diff --git a/docs/fabric-standardization.md b/docs/fabric-standardization.md
index 16fd2b2..50ef92c 100644
--- a/docs/fabric-standardization.md
+++ b/docs/fabric-standardization.md
@@ -1,46 +1,135 @@
 # Fabric Standardization — Small EVPN-VXLAN Data Centers

 > **Status**: Draft — Phase 0 (#31)
-> **Scope**: POC for small data centers (2 spines, 3-8 leafs)
+> **Scope**: POC for small data centers (2-4 spines, up to 24 leaf pairs)
 > **Parent**: Epic #30

 ---

-## 1. Topology constraints
+## 1. Device model

-### 1.1 Spine layer
-- Always **2 spines** per fabric (redundancy, no single point of failure)
+### 1.1 Reference platform — Arista 7050SX3-48YC12
+
+All fabric devices (spines and leafs) use the same hardware model:
+
+| Attribute | Value |
+|-----------|-------|
+| Model | 7050SX3-48YC12 |
+| 25G SFP28 ports | 48 (Ethernet1–48) |
+| 100G QSFP100 ports | 12 (Ethernet49–60) |
+| Total ports | 60 |
+
+Port banks are physically separated, enabling clean role assignment with no overlap between host-facing and fabric traffic.
+
+### 1.2 Port role assignment — Production
+
+**Spine (production):**
+
+| Port bank | Role | Details |
+|-----------|------|---------|
+| Ethernet1–48 (25G) | Leaf downlinks | 1 port per leaf, routed P2P /31, MTU 9214 |
+| Ethernet49–60 (100G) | Reserved | Future: inter-spine links, DCI, monitoring |
+
+**Leaf (production):**
+
+| Port bank | Role | Details |
+|-----------|------|---------|
+| Ethernet1–48 (25G) | Host-facing | MLAG port-channels (trunk, LACP active) |
+| Ethernet49 (100G) | MLAG peer-link | Port-Channel999 (trunk, trunk-group `mlag-peer`) |
+| Ethernet50–{49+N_spines} (100G) | Spine uplinks | 1 per spine, routed P2P /31, MTU 9214 |
+| Remaining 100G | Reserved | Future use |
+
+### 1.3 Port role assignment — Containerlab (clab)
+
+In Containerlab, every device runs the `ceos` image and the lab models a **fixed 12-port device** (Ethernet1–12, uniform speed). The production layout is compressed into these 12 ports with a deterministic formula based on spine count:
+
+| Parameter | Formula |
+|-----------|---------|
+| Spine uplinks | Last N_spines ports: **Ethernet{13−N_spines}** through **Ethernet12** |
+| MLAG peer-link | Port just before uplinks: **Ethernet{12−N_spines}** |
+| Host-facing | All remaining: **Ethernet1** through **Ethernet{11−N_spines}** |
+| Host port count | **12 − N_spines − 1** |
+
+**Concrete layouts per spine count:**
+
+| Spines | Host ports | MLAG port | Spine uplinks | Host count |
+|--------|-----------|-----------|---------------|------------|
+| 2 | Eth1–9 | Eth10 | Eth11 (S1), Eth12 (S2) | 9 |
+| 3 | Eth1–8 | Eth9 | Eth10 (S1), Eth11 (S2), Eth12 (S3) | 8 |
+| 4 | Eth1–7 | Eth8 | Eth9 (S1), Eth10 (S2), Eth11 (S3), Eth12 (S4) | 7 |
+
+**Spine in clab:** all 12 ports are leaf downlinks (Ethernet1 → Leaf-01, Ethernet2 → Leaf-02, ...).
+
+---
+
+## 2. Topology constraints
+
+### 2.1 Derived limits
+
+All limits derive from the device model and the MLAG pairing rule (leafs always in pairs):
+
+| Constraint | Production (7050SX3-48YC12) | Clab (12-port) | Source |
+|------------|----------------------------|-----------------|--------|
+| Min spines | 2 | 2 | Redundancy requirement |
+| Max spines | 4 | 4 | Small DC scope (hardware allows more) |
+| Max leafs per fabric | 48 | 12 | Spine downlink port count |
+| Max leaf pairs per fabric | 24 | 6 | Max leafs ÷ 2 |
+| Min leaf pairs per fabric | 1 | 1 | Minimum viable fabric |
+| Max host ports per leaf | 48 | 12 − N_spines − 1 | Production: full 25G bank; clab: 12 ports minus uplinks and peer-link |
+| Spine uplinks per leaf | N_spines | N_spines | 1 uplink per spine |
+| MLAG peer-link ports | 1 (100G) | 1 | Fixed: Po999 |
+
+### 2.2 Validation rules
+
+The fabric generator must enforce:
+
+1. **N_spines ∈ {2, 3, 4}** — minimum 2 for redundancy, max 4 for small DC scope
+2. **N_leafs is even** — leafs always come in MLAG pairs
+3. **N_leafs ≥ 2** — at least 1 MLAG pair
+4. **N_leafs ≤ spine downlink port count** — cannot exceed physical ports (48 in production, 12 in clab)
+5. **Every leaf connects to every spine** — full-mesh underlay
+6. **Every spine connects to every leaf** — symmetric fabric
+
+### 2.3 Spine layer
 - Spines are **pure L3 routers** — no VTEPs, no VLANs, no MLAG
-- Each spine connects to **every leaf** via a dedicated P2P link
+- Each spine connects to every leaf via a dedicated P2P link
+- All spines share the same ASN within a fabric

-### 1.2 Leaf layer
+### 2.4 Leaf layer
 - Leafs always come in **MLAG pairs** (2 leafs = 1 VTEP)
-- Minimum **3 pairs** (6 leafs), maximum **4 pairs** (8 leafs) per fabric
-- Each leaf connects to **both spines** via dedicated uplinks
+- Each leaf connects to **all spines** via dedicated uplinks
 - Each leaf pair shares a **VTEP loopback IP** (Loopback1)

-### 1.3 Host connectivity
+### 2.5 Host connectivity
 - Hosts connect **directly to leaf pairs** — no access switches
 - Every host is **dual-homed** via MLAG (LACP active)
 - Each host-facing port-channel gets a unique MLAG ID

-### 1.4 Containerlab model
-- All devices use the `ceos` image (Arista cEOS)
-- The lab simulates a fixed 12-port model per device
+### 2.6 POC defaults
+
+For the initial POC, the following defaults apply:
+
+| Parameter | Default | Notes |
+|-----------|---------|-------|
+| Spines | 2 | Minimum for redundancy |
+| Leaf pairs | 3 (6 leafs) | Enough to validate multi-VTEP behavior |
+| Platform | clab (ceos, 12 ports) | Production model as reference only |

 ---

-## 2. Port assignment — Spine
+## 3. Port assignment — Spine

-Spines use a simple sequential mapping: one Ethernet port per leaf.
+Spines use sequential port mapping: one port per leaf, starting from Ethernet1.

 | Port | Role | Connected to |
 |------|------|-------------|
-| Ethernet1 | Underlay downlink | Leaf 01 |
-| Ethernet2 | Underlay downlink | Leaf 02 |
-| Ethernet3 | Underlay downlink | Leaf 03 |
+| Ethernet1 | Underlay downlink | Leaf-01 |
+| Ethernet2 | Underlay downlink | Leaf-02 |
 | ... | ... | ... |
-| Ethernet{N} | Underlay downlink | Leaf {N} |
+| Ethernet{N_leafs} | Underlay downlink | Leaf-{N_leafs} |
+
+- **Production:** Ethernet1–48 (25G), remaining ports reserved.
+- **Clab:** Ethernet1–12, all available for downlinks.

 All spine downlinks are:
 - **Routed** (`no switchport`)
@@ -49,39 +138,36 @@ All spine downlinks are:

 ---

-## 3. Port assignment — Leaf
+## 4. Port assignment — Leaf

-Leaf port allocation follows a fixed layout. Ports are divided into 3 zones:
-
-| Port range | Role | Details |
-|------------|------|---------|
-| **Ethernet1 — Ethernet9** | Host-facing | MLAG port-channels (trunk, LACP active) |
-| **Ethernet10** | MLAG peer-link | Port-Channel999 (trunk, trunk-group `mlag-peer`) |
-| **Ethernet11** | Spine1 uplink | Routed P2P /31, MTU 9214 |
-| **Ethernet12** | Spine2 uplink | Routed P2P /31, MTU 9214 |
-
-### 3.1 Host-facing ports (Ethernet1-9)
-- Each physical port is a member of a Port-Channel
+### 4.1 Host-facing ports
+- Each physical port maps to a Port-Channel
 - Port-Channel number = MLAG ID = host index (e.g., host 01 → Po1, MLAG 1)
 - Mode: `switchport mode trunk`
 - VLANs: only the VLANs needed by the host
 - LACP fallback enabled (timeout 5, individual)

-### 3.2 MLAG peer-link (Ethernet10)
+### 4.2 MLAG peer-link
 - Always **Port-Channel999**
 - Trunk mode with trunk-group `mlag-peer`
 - Spanning-tree link-type point-to-point
 - Carries MLAG control traffic + VLANs 4090, 4091

-### 3.3 Spine uplinks (Ethernet11-12)
-- Ethernet11 → spine 01, Ethernet12 → spine 02
-- Fixed mapping, never changes regardless of fabric size
+- **Production:** Ethernet49 (100G)
+- **Clab:** Ethernet{12−N_spines} (derived from formula in §1.3)
+
+### 4.3 Spine uplinks
+- One uplink per spine, routed P2P /31, MTU 9214
+- Fixed mapping: the k-th uplink port always connects to Spine-k
+
+- **Production:** Ethernet50 → Spine-01, Ethernet51 → Spine-02, ... Ethernet{49+N_spines} → Spine-{N_spines}
+- **Clab:** Ethernet{13−N_spines} → Spine-01, ..., Ethernet12 → Spine-{N_spines}

 ---

-## 4. Naming conventions
+## 5. Naming conventions

-### 4.1 Device hostname format
+### 5.1 Device hostname format

 All device hostnames follow the pattern:

@@ -115,7 +201,7 @@ All device hostnames follow the pattern:
 | Spine 1 | `LY-DR-SPINE-01` |
 | Leaf 1 | `LY-DR-LEAF-01` |

-### 4.2 Leaf pairing rule
+### 5.2 Leaf pairing rule

 Leafs are numbered sequentially. **Odd = primary, even = secondary** within a pair:

@@ -126,7 +212,7 @@ Leafs are numbered sequentially. **Odd = primary, even = secondary** within a pa
 | 3 | `{SITE}-{ZONE}-LEAF-05` | `{SITE}-{ZONE}-LEAF-06` | VTEP 3 |
 | 4 | `{SITE}-{ZONE}-LEAF-07` | `{SITE}-{ZONE}-LEAF-08` | VTEP 4 |

-### 4.3 Fabric name
+### 5.3 Fabric name

 The fabric is identified by `{SITE}-{ZONE}` (lowercase in Infrahub objects):

@@ -135,16 +221,15 @@ The fabric is identified by `{SITE}-{ZONE}` (lowercase in Infrahub objects):
 | `InfraFabric` | `{site}-{zone}` | `pa-dc` |
 | `LocationSite` | `{site}-{zone}` | `pa-dc` |

-### 4.4 Interface descriptions
+### 5.4 Interface descriptions

 Interface descriptions reference the **full hostname** of the remote device:

 | Interface | Description format | Example (on PA-DC-LEAF-01) |
 |-----------|-------------------|----------------------------|
-| Spine uplink Eth11 | `to {REMOTE_HOSTNAME}` | `to PA-DC-SPINE-01` |
-| Spine uplink Eth12 | `to {REMOTE_HOSTNAME}` | `to PA-DC-SPINE-02` |
-| MLAG peer-link Eth10 | `mlag peer link` | `mlag peer link` |
-| Host-facing Eth1 | `to {HOST_HOSTNAME}` | `to PA-DC-HOST-01` |
+| Spine uplink | `to {REMOTE_HOSTNAME}` | `to PA-DC-SPINE-01` |
+| MLAG peer-link | `mlag peer link` | `mlag peer link` |
+| Host-facing | `to {HOST_HOSTNAME}` | `to PA-DC-HOST-01` |
 | Loopback0 | `Router-ID` | `Router-ID` |
 | Loopback1 | `VTEP` | `VTEP` |

@@ -152,10 +237,10 @@ On spines:

 | Interface | Description format | Example (on PA-DC-SPINE-01) |
 |-----------|-------------------|-----------------------------|
-| Downlink Eth1 | `to {REMOTE_HOSTNAME}` | `to PA-DC-LEAF-01` |
+| Downlink | `to {REMOTE_HOSTNAME}` | `to PA-DC-LEAF-01` |
 | Loopback0 | `Router-ID` | `Router-ID` |

-### 4.5 MLAG domain
+### 5.5 MLAG domain

 | Parameter | Value | Notes |
 |-----------|-------|-------|
@@ -164,7 +249,7 @@ On spines:
 | Peer-link VLAN | 4090 | Fixed |
 | iBGP peering VLAN | 4091 | Fixed |

-### 4.6 BGP descriptions
+### 5.6 BGP descriptions

 | Session type | Description format | Example |
 |-------------|-------------------|---------|
@@ -172,7 +257,7 @@ On spines:
 | iBGP MLAG peer | `ibgp to {REMOTE_HOSTNAME}` | `ibgp to PA-DC-LEAF-02` |
 | EVPN overlay | `evpn to {REMOTE_HOSTNAME}` | `evpn to PA-DC-SPINE-01` |

-### 4.7 IPAM identifiers (for resource pool idempotence)
+### 5.7 IPAM identifiers (for resource pool idempotence)

 All identifiers use **lowercase**, with the fabric name `{site}-{zone}`:

@@ -186,8 +271,10 @@ All identifiers use **lowercase**, with the fabric name `{site}-{zone}`:
 | MLAG peer /31 | `mlag-peer-{site}-{zone}-pair{NN}` | `mlag-peer-pa-dc-pair01` |
 | MLAG iBGP /31 | `mlag-ibgp-{site}-{zone}-pair{NN}` | `mlag-ibgp-pa-dc-pair01` |
 | Leaf ASN | `asn-{site}-{zone}-pair{NN}` | `asn-pa-dc-pair01` |
+| L2 VNI | `l2vni-{site}-{zone}-vlan{NNNN}` | `l2vni-pa-dc-vlan0040` |
+| L3 VNI | `l3vni-{site}-{zone}-{vrf_name}` | `l3vni-pa-dc-gold` |

-### 4.8 Site prefix registry
+### 5.8 Site prefix registry

 To avoid conflicts, site prefixes must be registered:

@@ -204,39 +291,49 @@ To avoid conflicts, site prefixes must be registered:

 ---

-## 5. IPAM — IP addressing plan
+## 6. IPAM — IP addressing plan

-### 5.1 Supernets (global)
+### 6.1 Design principle
+
+Only the **two supernets** are fixed. All intermediate allocations (site prefixes, fabric pools, individual subnets) are delegated to **Infrahub's resource manager**, which picks the smallest available prefix that satisfies the request. Prefix sizes mentioned in this document are illustrative defaults — the generator requests a number of allocations from a pool and Infrahub handles sizing and placement.
+
+### 6.2 Supernets (global, fixed)

 | Role | Supernet | Description |
 |------|----------|-------------|
 | Infrastructure | `10.0.0.0/8` | Loopbacks, underlay, MLAG |
 | Services | `172.16.0.0/12` | L2/L3 VXLAN user subnets |

-### 5.2 Site allocation (from supernets)
+These are the only hardcoded prefixes. Everything below is allocated dynamically.

-Each site receives:
-- **1x /16** from `10.0.0.0/8` for infrastructure
-- **1x /16** from `172.16.0.0/12` for services
+### 6.3 Site allocation (from supernets)

-### 5.3 Fabric pools (from site infra /16)
+Each site receives one prefix from each supernet, allocated by Infrahub:
+- **1 prefix** from `10.0.0.0/8` for infrastructure (e.g., /16)
+- **1 prefix** from `172.16.0.0/12` for services (e.g., /16)

-| Pool | Prefix size | Allocation unit | Pool type |
-|------|-------------|-----------------|-----------|
-| Loopback0 (router-id) | /24 | /32 per device | `CoreIPAddressPool` |
-| Loopback1 (VTEP) | /24 | /32 per MLAG pair | `CoreIPAddressPool` |
-| Underlay P2P | /24 | /31 per spine-leaf link | `CoreIPPrefixPool` |
-| MLAG peer-link SVI | /24 | /31 per MLAG pair | `CoreIPPrefixPool` |
-| MLAG iBGP peering | /24 | /31 per MLAG pair | `CoreIPPrefixPool` |
+### 6.4 Fabric pools (from site infra prefix)

-### 5.4 Service pools (from site services /16)
+The fabric generator creates pools within the site's infrastructure prefix. Each pool serves a specific role and allocates individual addresses (/32) or subnets (/31) on demand:
+
+| Pool | Allocation unit | Pool type | Example size |
+|------|-----------------|-----------|-------------|
+| Loopback0 (router-id) | /32 per device | `CoreIPAddressPool` | /24 |
+| Loopback1 (VTEP) | /32 per MLAG pair | `CoreIPAddressPool` | /24 |
+| Underlay P2P | /31 per spine-leaf link | `CoreIPPrefixPool` | /24 or /23 |
+| MLAG peer-link SVI | /31 per MLAG pair | `CoreIPPrefixPool` | /24 |
+| MLAG iBGP peering | /31 per MLAG pair | `CoreIPPrefixPool` | /24 |
+
+> **Example sizes are not prescriptive.** Infrahub allocates the parent prefix for each pool based on the number of resources requested. For example, a 2-spine / 6-leaf fabric needs 12 underlay /31s, while a 4-spine / 48-leaf fabric needs 192; a /24 holds only 128 /31s, so the larger fabric needs a /23. The resource manager adapts accordingly.
+
+### 6.5 Service pools (from site services prefix)

 | Pool | Allocation unit | Pool type |
 |------|-----------------|-----------|
-| L2 VXLAN subnets | /24 per VLAN (customizable) | `CoreIPPrefixPool` |
-| L3 VXLAN subnets (VRF SVIs) | /24 per VRF SVI (customizable) | `CoreIPPrefixPool` |
+| L2 VXLAN subnets | Per VLAN (e.g., /24) | `CoreIPPrefixPool` |
+| L3 VXLAN subnets (VRF SVIs) | Per VRF SVI (e.g., /24) | `CoreIPPrefixPool` |

-### 5.5 Special VLANs (reserved, not from pools)
+### 6.6 Special VLANs (reserved, not from pools)

 | VLAN | Name | Purpose | Trunk group |
 |------|------|---------|-------------|
@@ -245,32 +342,33 @@ Each site receives:

 ---

-## 6. BGP — Autonomous System assignment
+## 7. BGP — Autonomous System assignment

-### 6.1 Spine ASN
+### 7.1 Spine ASN
 - **Single ASN** shared by all spines in a fabric
 - Defined as an attribute on `InfraFabric`
 - Default for POC: **65000**

-### 6.2 Leaf ASN
+### 7.2 Leaf ASN
 - **One ASN per MLAG pair** (iBGP within pair, eBGP to spines)
 - Allocated from a `CoreNumberPool` (range: 65001–65099)
 - Deterministic via identifier: `asn-{site}-{zone}-pair{NN}`

-### 6.3 BGP configuration standards
+### 7.3 BGP configuration standards

 | Parameter | Value | Notes |
 |-----------|-------|-------|
 | `no bgp default ipv4-unicast` | Always | Explicit activation per AFI |
+| `bgp log-neighbor-changes` | Always | Operational visibility for BGP state transitions |
 | `distance bgp` | `20 200 200` | eBGP preferred over iBGP |
-| `maximum-paths` | `4 ecmp 64` | Multi-path for spine redundancy |
+| `maximum-paths` | `{N_spines × 2} ecmp 64` | Multi-path scaled to spine count (e.g., 2 spines → `4`, 4 spines → `8`) |
 | `maximum-routes` | `12000 warning-only` | Per neighbor |
 | `ebgp-multihop` | `3` | EVPN overlay (loopback peering) |
 | `send-community extended` | Always | Required for EVPN route-targets |
 | `next-hop-unchanged` | Spine EVPN peer-group | Preserve leaf next-hop in overlay |
 | `next-hop-self` | Leaf iBGP peer-group | Required for iBGP convergence |

-### 6.4 Peer groups (per device)
+### 7.4 Peer groups (per device)

 **Leaf peer groups:**

@@ -288,7 +386,7 @@ Each site receives:

 Spine underlay neighbors are configured individually (no peer-group) since each leaf has a different ASN.

-### 6.5 Address families
+### 7.5 Address families

 | AFI | Activated on | Networks advertised |
 |-----|-------------|---------------------|
@@ -297,7 +395,7 @@ Spine underlay neighbors are configured individually (no peer-group) since each

 ---

-## 7. MLAG standards
+## 8. MLAG standards

 | Parameter | Value |
 |-----------|-------|
@@ -310,39 +408,39 @@ Spine underlay neighbors are configured individually (no peer-group) since each
 | Heartbeat | Via Management0 (VRF mgmt) |
 | Virtual MAC | `c001.cafe.babe` (fabric-wide anycast gateway) |

-### 7.1 Primary/secondary assignment
-- **Odd-numbered leaf** (LEAF-01, LEAF-03, LEAF-05, LEAF-07): lower IP on MLAG VLANs (e.g., x.x.x.0/31)
-- **Even-numbered leaf** (LEAF-02, LEAF-04, LEAF-06, LEAF-08): higher IP (e.g., x.x.x.1/31)
+### 8.1 Primary/secondary assignment
+- **Odd-numbered leaf** (LEAF-01, LEAF-03, ..., LEAF-47): lower IP on MLAG VLANs (e.g., x.x.x.0/31)
+- **Even-numbered leaf** (LEAF-02, LEAF-04, ..., LEAF-48): higher IP (e.g., x.x.x.1/31)

 ---

-## 8. VXLAN standards
+## 9. VXLAN standards

-### 8.1 VTEP interface
+### 9.1 VTEP interface
 - Interface: `Vxlan1` on every leaf
 - Source interface: `Loopback1` (shared IP within MLAG pair)
 - UDP port: `4789`
 - Learning: `vxlan learn-restrict any` (EVPN-controlled)

-### 8.2 VNI allocation
+### 9.2 VNI allocation

-| Type | NumberPool range | Usage |
-|------|-----------------|-------|
-| L2 VNI | 100001–199999 | One VNI per extended VLAN (EVPN Type-2) |
-| L3 VNI | 200001–299999 | One VNI per VRF (EVPN Type-5) |
+| Type | NumberPool name | Range | Usage | Identifier pattern |
+|------|----------------|-------|-------|--------------------|
+| L2 VNI | `l2-vni-pool` | 100001–199999 | One VNI per extended VLAN (EVPN Type-2) | `l2vni-{site}-{zone}-vlan{NNNN}` |
+| L3 VNI | `l3-vni-pool` | 200001–299999 | One VNI per VRF (EVPN Type-5) | `l3vni-{site}-{zone}-{vrf_name}` |

-VNIs are allocated from `CoreNumberPool` with deterministic identifiers.
+VNIs are allocated from `CoreNumberPool` with deterministic identifiers for idempotent sync.

-### 8.3 Route distinguisher and route target
+### 9.3 Route distinguisher and route target

 | Service type | RD format | RT format |
 |-------------|-----------|-----------|
 | L2 VXLAN (per VLAN) | `{ASN}:{VNI}` | `{VLAN_ID}:{VNI}` (import/export) |
-| L3 VXLAN (per VRF) | `{Loopback0_IP}:{VRF_index}` | `{VRF_index}:{VNI}` (import/export evpn) |
+| L3 VXLAN (per VRF) | `{Loopback0_IP}:{VRF_index}` (16-bit field; L3 VNIs do not fit) | `{VRF_index}:{L3_VNI}` (import/export evpn) |

 ---

-## 9. Global parameters
+## 10. Global parameters

 | Parameter | Value | Notes |
 |-----------|-------|-------|
@@ -355,12 +453,12 @@ VNIs are allocated from `CoreNumberPool` with deterministic identifiers.

 ---

-## 10. Out of scope (for now)
+## 11. Out of scope (for now)

 - **Access switches** — hosts connect directly to leafs
 - **Multi-fabric / DCI** — single fabric per site
 - **IPv6 underlay** — IPv4 only
 - **BFD** — not configured in initial POC
 - **Route-maps / prefix-lists** — no filtering in the underlay
-- **More than 2 spines** — fixed at 2 for the POC
+- **More than 4 spines** — capped for small DC scope
 - **Non-Arista platforms** — EOS only
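The leaf port layouts in §1.2 (production) and §1.3 (clab) are pure functions of the spine count. The sketch below computes a leaf's host ports, MLAG peer-link port and spine uplinks from those formulas. `leaf_layout` and `LeafLayout` are hypothetical names, not part of any existing generator. The sketch assumes production leafs always use the whole 25G bank for hosts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LeafLayout:
    host_ports: List[str]
    mlag_port: str
    uplinks: Dict[int, str]  # spine index (1-based) -> leaf interface


def leaf_layout(n_spines: int, platform: str = "clab") -> LeafLayout:
    """Port roles for one leaf, following the formulas in §1.2 and §1.3."""
    if n_spines not in (2, 3, 4):
        raise ValueError("N_spines must be 2, 3 or 4")
    if platform == "production":
        # 25G bank for hosts, Ethernet49 for the peer-link, Ethernet50+ for uplinks
        return LeafLayout(
            host_ports=[f"Ethernet{i}" for i in range(1, 49)],
            mlag_port="Ethernet49",
            uplinks={s: f"Ethernet{49 + s}" for s in range(1, n_spines + 1)},
        )
    if platform != "clab":
        raise ValueError(f"unknown platform {platform!r}")
    # clab: the last N_spines ports are uplinks, the port before them is the peer-link,
    # and Ethernet1..Ethernet{11 - N_spines} are host-facing.
    first_uplink = 13 - n_spines
    return LeafLayout(
        host_ports=[f"Ethernet{i}" for i in range(1, 12 - n_spines)],
        mlag_port=f"Ethernet{12 - n_spines}",
        uplinks={s: f"Ethernet{first_uplink + s - 1}" for s in range(1, n_spines + 1)},
    )


if __name__ == "__main__":
    layout = leaf_layout(3)
    print(layout.mlag_port, layout.uplinks)
    # Ethernet9 {1: 'Ethernet10', 2: 'Ethernet11', 3: 'Ethernet12'}
```

Running it for 2, 3 and 4 spines reproduces the rows of the "Concrete layouts per spine count" table.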
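The §2.2 validation rules can be checked up front, before the generator creates any object. Below is a minimal sketch, assuming the generator knows N_spines, N_leafs and the target platform; `validate_fabric` is an illustrative name. Rules 5 and 6 describe the same full-mesh property from each side, so one set comparison over an optional list of `(spine_index, leaf_index)` links covers both.

```python
from typing import Iterable, List, Optional, Tuple

SPINE_DOWNLINKS = {"production": 48, "clab": 12}  # §2.1: spine downlink port count


def validate_fabric(
    n_spines: int,
    n_leafs: int,
    platform: str = "clab",
    links: Optional[Iterable[Tuple[int, int]]] = None,
) -> List[str]:
    """Return the violated §2.2 rules; an empty list means the fabric is valid."""
    errors: List[str] = []
    if n_spines not in (2, 3, 4):
        errors.append(f"rule 1: N_spines={n_spines} not in {{2, 3, 4}}")
    if n_leafs % 2 != 0:
        errors.append(f"rule 2: N_leafs={n_leafs} is odd (leafs come in MLAG pairs)")
    if n_leafs < 2:
        errors.append(f"rule 3: N_leafs={n_leafs} < 2 (need at least one MLAG pair)")
    max_leafs = SPINE_DOWNLINKS[platform]
    if n_leafs > max_leafs:
        errors.append(f"rule 4: N_leafs={n_leafs} > {max_leafs} spine downlinks ({platform})")
    if links is not None:
        # Rules 5-6: every (spine, leaf) combination must be cabled.
        expected = {(s, l) for s in range(1, n_spines + 1) for l in range(1, n_leafs + 1)}
        missing = expected - set(links)
        if missing:
            errors.append(f"rules 5-6: {len(missing)} spine-leaf links missing")
    return errors


if __name__ == "__main__":
    print(validate_fabric(2, 6))   # POC defaults from §2.6: no violations
    print(validate_fabric(5, 7))   # violates rules 1 and 2
```

Returning every violation, rather than stopping at the first one, lets an operator fix a bad fabric definition in a single pass.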
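Three rules fit together when a clab fabric is cabled:

- spine port N goes to Leaf-N (§3);
- the k-th leaf uplink goes to Spine-k (§4.3);
- every interface description reads `to {REMOTE_HOSTNAME}` (§5.4).

This sketch builds the full-mesh clab underlay and derives the descriptions for both ends of each link. It assumes hostnames take the uppercase `{SITE}-{ZONE}-{ROLE}-{NN}` form seen in the examples (e.g. `PA-DC-SPINE-01`). `Link`, `clab_underlay` and `descriptions` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Link(NamedTuple):
    spine: str
    spine_if: str
    leaf: str
    leaf_if: str


def clab_underlay(site: str, zone: str, n_spines: int, n_leafs: int) -> List[Link]:
    """Every spine to every leaf, using the clab uplink formula from §1.3."""
    prefix = f"{site}-{zone}".upper()  # assumed uppercase hostnames, e.g. PA-DC
    first_uplink = 13 - n_spines       # Ethernet{13 - N_spines} -> Spine-01
    links: List[Link] = []
    for s in range(1, n_spines + 1):
        for leaf in range(1, n_leafs + 1):
            links.append(Link(
                spine=f"{prefix}-SPINE-{s:02d}",
                spine_if=f"Ethernet{leaf}",                  # spine port N -> Leaf-N
                leaf=f"{prefix}-LEAF-{leaf:02d}",
                leaf_if=f"Ethernet{first_uplink + s - 1}",  # k-th uplink -> Spine-k
            ))
    return links


def descriptions(links: List[Link]) -> Dict[Tuple[str, str], str]:
    """(device, interface) -> 'to {REMOTE_HOSTNAME}' for both ends of each link."""
    out: Dict[Tuple[str, str], str] = {}
    for link in links:
        out[(link.spine, link.spine_if)] = f"to {link.leaf}"
        out[(link.leaf, link.leaf_if)] = f"to {link.spine}"
    return out


if __name__ == "__main__":
    for link in clab_underlay("pa", "dc", n_spines=2, n_leafs=2):
        print(f"{link.spine}:{link.spine_if} <-> {link.leaf}:{link.leaf_if}")
```

The link list comes from a nested loop over all spines and leafs, so §2.2 rules 5 and 6 hold by construction.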
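Three rules depend only on the leaf number: the pairing rule (§5.2), the per-pair and per-VNI IPAM identifiers (§5.7), and the primary/secondary address split on MLAG /31s (§8.1). The sketch below assumes the uppercase hostname form from the examples, and that each MLAG /31 gives its lower address to the odd (primary) leaf, as §8.1 states. All function names are illustrative. The address `10.255.0.0/31` is an arbitrary example, not a reserved value.

```python
import ipaddress
from typing import Tuple


def hostname(site: str, zone: str, role: str, index: int) -> str:
    """Assumed form {SITE}-{ZONE}-{ROLE}-{NN}, uppercase (e.g. PA-DC-LEAF-01)."""
    return f"{site}-{zone}-{role}-{index:02d}".upper()


def pair_number(leaf_index: int) -> int:
    """LEAF-01/02 -> pair 1, LEAF-03/04 -> pair 2, ... (§5.2)."""
    return (leaf_index + 1) // 2


def is_primary(leaf_index: int) -> bool:
    """Odd-numbered leafs are primary, even-numbered leafs are secondary."""
    return leaf_index % 2 == 1


def pair_identifier(kind: str, site: str, zone: str, pair: int) -> str:
    """Per-pair identifiers from §5.7, e.g. kind='asn' -> 'asn-pa-dc-pair01'."""
    return f"{kind}-{site}-{zone}-pair{pair:02d}".lower()


def l2vni_identifier(site: str, zone: str, vlan_id: int) -> str:
    return f"l2vni-{site}-{zone}-vlan{vlan_id:04d}".lower()


def l3vni_identifier(site: str, zone: str, vrf_name: str) -> str:
    return f"l3vni-{site}-{zone}-{vrf_name}".lower()


def mlag_addresses(p2p: str, leaf_index: int) -> Tuple[str, str]:
    """Return (local interface address, peer address) on an MLAG /31 (§8.1)."""
    net = ipaddress.ip_network(p2p)
    low, high = net[0], net[1]
    local, peer = (low, high) if is_primary(leaf_index) else (high, low)
    return f"{local}/{net.prefixlen}", str(peer)


if __name__ == "__main__":
    leaf = 3
    print(hostname("pa", "dc", "leaf", leaf))                            # PA-DC-LEAF-03
    print(pair_identifier("mlag-peer", "pa", "dc", pair_number(leaf)))  # mlag-peer-pa-dc-pair02
    print(mlag_addresses("10.255.0.0/31", leaf))                        # ('10.255.0.0/31', '10.255.0.1')
```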
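The deterministic identifiers in §5.7, §7.2 and §9.2 exist so that re-running the generator returns the same ASN or VNI instead of consuming a new one. The toy class below models only that behaviour, in memory. It is **not** Infrahub's `CoreNumberPool` API. Its "lowest free number" policy is an assumption made for the illustration.

```python
from typing import Dict


class ToyNumberPool:
    """In-memory stand-in for an identifier-keyed number pool (not the Infrahub API)."""

    def __init__(self, start: int, end: int) -> None:
        self.start, self.end = start, end
        self._by_identifier: Dict[str, int] = {}

    def allocate(self, identifier: str) -> int:
        # Idempotence: the same identifier always gets the same number back.
        if identifier in self._by_identifier:
            return self._by_identifier[identifier]
        used = set(self._by_identifier.values())
        # Assumed policy: hand out the lowest free number in the range.
        for number in range(self.start, self.end + 1):
            if number not in used:
                self._by_identifier[identifier] = number
                return number
        raise RuntimeError(f"pool {self.start}-{self.end} exhausted")


if __name__ == "__main__":
    asn_pool = ToyNumberPool(65001, 65099)      # leaf ASNs, §7.2
    l2vni_pool = ToyNumberPool(100001, 199999)  # L2 VNIs, §9.2
    print(asn_pool.allocate("asn-pa-dc-pair01"))        # 65001
    print(asn_pool.allocate("asn-pa-dc-pair02"))        # 65002
    print(asn_pool.allocate("asn-pa-dc-pair01"))        # 65001 again: a re-run is a no-op
    print(l2vni_pool.allocate("l2vni-pa-dc-vlan0040"))  # 100001
```

Without the identifier lookup, each sync would allocate fresh ASNs and VNIs and drift the rendered configs.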
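Both the example-size note in §6.4 and the `maximum-paths` rule in §7.3 come down to simple arithmetic. This sketch computes:

- the number of underlay /31s;
- the longest parent prefix that can hold a given number of child blocks;
- the resulting `maximum-paths` line.

It assumes a pool is sized to the exact power of two it needs. Infrahub's resource manager may choose a different parent size.

```python
def underlay_links(n_spines: int, n_leafs: int) -> int:
    """Full-mesh underlay (§2.2 rules 5-6): one /31 per spine-leaf link."""
    return n_spines * n_leafs


def parent_prefix_len(n_children: int, child_len: int) -> int:
    """Longest parent prefix length that holds n_children blocks of /child_len."""
    if n_children < 1:
        raise ValueError("need at least one child")
    # ceil(log2(n_children)) computed with integers: bits needed to number the children
    extra_bits = (n_children - 1).bit_length()
    return child_len - extra_bits


def maximum_paths(n_spines: int) -> str:
    """§7.3: maximum-paths scales as N_spines x 2, with the ECMP width fixed at 64."""
    return f"maximum-paths {n_spines * 2} ecmp 64"


if __name__ == "__main__":
    for spines, leafs in [(2, 6), (4, 48)]:
        links = underlay_links(spines, leafs)
        print(spines, leafs, links, f"/{parent_prefix_len(links, 31)}", maximum_paths(spines))
    # 2 6 12 /27 maximum-paths 4 ecmp 64
    # 4 48 192 /23 maximum-paths 8 ecmp 64
```

The same helper applies to the address pools. A 4-spine / 48-leaf fabric needs 52 router-IDs, and `parent_prefix_len(52, 32)` returns 26, well inside the /24 example size.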
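§9.3 builds route distinguishers and route targets from ASNs, VNIs and loopback addresses. Both use a 6-byte value field with three layouts:

- 2-byte ASN + 32-bit number;
- IPv4 address + 16-bit number;
- 4-byte ASN + 16-bit number.

RFC 4364 defines these layouts for RDs; RFC 4360 and RFC 5668 define them for RTs. So at least one side must fit in 16 bits, and a number that follows an IPv4 address always must. The sketch below checks a candidate `{admin}:{number}` pair against those layouts; `encodable` is a hypothetical helper. It shows that an L3 VNI from the 200001–299999 range cannot follow a loopback IP in an RD, nor appear on both sides of an RT.

```python
import ipaddress

U16 = 0xFFFF
U32 = 0xFFFFFFFF


def encodable(admin: str, number: int) -> bool:
    """True if '{admin}:{number}' fits one of the RD/RT value layouts.

    Layouts (6 bytes after the type field):
      2-byte ASN   : 32-bit number  (RD type 0 / two-octet-AS RT)
      IPv4 address : 16-bit number  (RD type 1 / IPv4-address RT)
      4-byte ASN   : 16-bit number  (RD type 2 / four-octet-AS RT)
    """
    if not 0 <= number <= U32:
        return False
    try:
        ipaddress.IPv4Address(admin)
    except ValueError:
        pass  # not an IPv4 address, so treat admin as an ASN below
    else:
        return number <= U16
    asn = int(admin)
    if not 0 <= asn <= U32:
        return False
    if asn <= U16:
        return True  # 2-byte ASN leaves 32 bits for the number
    return number <= U16  # 4-byte ASN leaves only 16 bits


if __name__ == "__main__":
    for admin, number in [("65001", 100040), ("10.0.0.1", 200001), ("200001", 200001)]:
        print(f"{admin}:{number}", encodable(admin, number))
```

The L2 formats (`{ASN}:{VNI}` with a 2-byte leaf ASN, `{VLAN_ID}:{VNI}`) pass. The L3 VNI only fits in a 32-bit position.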