docs: update fabric standardization — device model, flexible spines, dynamic IPAM

- Add §1 Device model (7050SX3-48YC12, prod vs clab port layouts)
- Rework §2 topology: 2-4 spines, limits derived from hardware
- §6 IPAM: only supernets fixed, all else delegated to Infrahub resource manager
- §7 BGP: maximum-paths = N_spines×2, add bgp log-neighbor-changes
- §9 VXLAN: RD/RT use L3_VNI, add VNI pool names and identifiers

Refs: #31
2026-03-15 19:55:55 +00:00
parent 96244df528
commit a5f0652ea5


@@ -1,46 +1,135 @@
# Fabric Standardization — Small EVPN-VXLAN Data Centers
> **Status**: Draft — Phase 0 (#31)
> **Scope**: POC for small data centers (2 spines, 3-8 leafs)
> **Scope**: POC for small data centers (2-4 spines, up to 24 leaf pairs)
> **Parent**: Epic #30
---
## 1. Topology constraints
## 1. Device model
### 1.1 Spine layer
- Always **2 spines** per fabric (redundancy, no single point of failure)
### 1.1 Reference platform — Arista 7050SX3-48YC12
All fabric devices (spines and leafs) use the same hardware model:
| Attribute | Value |
|-----------|-------|
| Model | 7050SX3-48YC12 |
| 25G SFP28 ports | 48 (Ethernet1-48) |
| 100G QSFP100 ports | 12 (Ethernet49-60) |
| Total ports | 60 |
Port banks are physically separated, enabling clean role assignment with no overlap between host-facing and fabric traffic.
### 1.2 Port role assignment — Production
**Spine (production):**
| Port bank | Role | Details |
|-----------|------|---------|
| Ethernet1-48 (25G) | Leaf downlinks | 1 port per leaf, routed P2P /31, MTU 9214 |
| Ethernet49-60 (100G) | Reserved | Future: inter-spine links, DCI, monitoring |
**Leaf (production):**
| Port bank | Role | Details |
|-----------|------|---------|
| Ethernet1-48 (25G) | Host-facing | MLAG port-channels (trunk, LACP active) |
| Ethernet49 (100G) | MLAG peer-link | Port-Channel999 (trunk, trunk-group `mlag-peer`) |
| Ethernet50-{49+N_spines} (100G) | Spine uplinks | 1 per spine, routed P2P /31, MTU 9214 |
| Remaining 100G | Reserved | Future use |
### 1.3 Port role assignment — Containerlab (clab)
Containerlab uses the `ceos` image with a **fixed 12-port model** (Ethernet1-12, uniform speed). The production layout is compressed into 12 ports using a deterministic formula based on spine count:
| Parameter | Formula |
|-----------|---------|
| Spine uplinks | Last N ports: **Ethernet{13-N_spines}** through **Ethernet12** |
| MLAG peer-link | Port just before uplinks: **Ethernet{12-N_spines}** |
| Host-facing | All remaining: **Ethernet1** through **Ethernet{11-N_spines}** |
| Host port count | **12 - N_spines - 1** |
**Concrete layouts per spine count:**
| Spines | Host ports | MLAG port | Spine uplinks | Host count |
|--------|-----------|-----------|---------------|------------|
| 2 | Eth1-9 | Eth10 | Eth11 (S1), Eth12 (S2) | 9 |
| 3 | Eth1-8 | Eth9 | Eth10 (S1), Eth11 (S2), Eth12 (S3) | 8 |
| 4 | Eth1-7 | Eth8 | Eth9 (S1), Eth10 (S2), Eth11 (S3), Eth12 (S4) | 7 |
**Spine in clab:** all 12 ports are leaf downlinks (Ethernet1 → Leaf-01, Ethernet2 → Leaf-02, ...).
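
The clab leaf layout can be derived mechanically from the spine count. The sketch below is illustrative only: the function name `clab_leaf_layout` and the returned dictionary shape are assumptions, not part of the fabric generator. It places the uplinks on the last `N_spines` ports (starting at Ethernet{13 - N_spines}), the MLAG peer-link on Ethernet{12 - N_spines}, and hosts on everything below that.

```python
def clab_leaf_layout(n_spines: int, total_ports: int = 12) -> dict:
    """Derive leaf port roles for the fixed 12-port clab model (hypothetical helper)."""
    if n_spines not in (2, 3, 4):
        raise ValueError("n_spines must be 2, 3 or 4")

    first_uplink = total_ports + 1 - n_spines  # Ethernet{13 - N_spines}
    mlag_port = total_ports - n_spines         # Ethernet{12 - N_spines}

    # Host-facing ports are everything below the MLAG peer-link port.
    host_ports = [f"Ethernet{p}" for p in range(1, mlag_port)]

    # Uplink port order follows spine order: first uplink -> Spine-01, and so on.
    uplinks = {
        f"Ethernet{first_uplink + i}": f"Spine-{i + 1:02d}"
        for i in range(n_spines)
    }
    return {
        "host_ports": host_ports,
        "mlag_port": f"Ethernet{mlag_port}",
        "uplinks": uplinks,
    }
```

For example, `clab_leaf_layout(3)` yields eight host ports (Ethernet1 through Ethernet8), Ethernet9 as the peer-link, and Ethernet10 through Ethernet12 as uplinks to Spine-01 through Spine-03.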
---
## 2. Topology constraints
### 2.1 Derived limits
All limits derive from the device model and the MLAG pairing rule (leafs always in pairs):
| Constraint | Production (7050SX3-48YC12) | Clab (12-port) | Source |
|------------|----------------------------|-----------------|--------|
| Min spines | 2 | 2 | Redundancy requirement |
| Max spines | 4 | 4 | Small DC scope (hardware allows more) |
| Max leafs per fabric | 48 | 12 | Spine downlink port count |
| Max leaf pairs per fabric | 24 | 6 | Max leafs ÷ 2 |
| Min leaf pairs per fabric | 1 | 1 | Minimum viable fabric |
| Max host ports per leaf | 48 | 12 - N_spines - 1 | Port bank size minus fabric ports |
| Spine uplinks per leaf | N_spines | N_spines | 1 uplink per spine |
| MLAG peer-link ports | 1 (100G) | 1 | Fixed: Po999 |
### 2.2 Validation rules
The fabric generator must enforce:
1. **N_spines ∈ {2, 3, 4}** — minimum 2 for redundancy, max 4 for small DC scope
2. **N_leafs is even** — leafs always come in MLAG pairs
3. **N_leafs ≥ 2** — at least 1 MLAG pair
4. **N_leafs ≤ spine downlink port count** — cannot exceed physical ports
5. **Every leaf connects to every spine** — full-mesh underlay
6. **Every spine connects to every leaf** — symmetric fabric
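
As a rough sketch of how these rules could be checked, the following helper validates rules 1 to 4 and builds the full-mesh link list that satisfies rules 5 and 6 by construction. The names `validate_fabric` and `full_mesh_links` are hypothetical; the actual generator may structure its checks differently.

```python
from itertools import product


def validate_fabric(n_spines: int, n_leafs: int, spine_downlink_ports: int) -> list[str]:
    """Return a list of rule violations (empty list means the inputs are valid)."""
    errors = []
    if n_spines not in (2, 3, 4):
        errors.append("N_spines must be 2, 3 or 4")
    if n_leafs % 2 != 0:
        errors.append("N_leafs must be even (MLAG pairs)")
    if n_leafs < 2:
        errors.append("N_leafs must be at least 2 (one MLAG pair)")
    if n_leafs > spine_downlink_ports:
        errors.append("N_leafs exceeds the spine downlink port count")
    return errors


def full_mesh_links(n_spines: int, n_leafs: int) -> list[tuple[int, int]]:
    """Every (spine, leaf) combination, so rules 5 and 6 hold by construction."""
    return list(product(range(1, n_spines + 1), range(1, n_leafs + 1)))
```

With the POC defaults on clab, `validate_fabric(2, 6, 12)` returns an empty list and `full_mesh_links(2, 6)` produces 12 links.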
### 2.3 Spine layer
- Spines are **pure L3 routers** — no VTEPs, no VLANs, no MLAG
- Each spine connects to **every leaf** via a dedicated P2P link
- Each spine connects to every leaf via a dedicated P2P link
- All spines share the same ASN within a fabric
### 1.2 Leaf layer
### 2.4 Leaf layer
- Leafs always come in **MLAG pairs** (2 leafs = 1 VTEP)
- Minimum **3 pairs** (6 leafs), maximum **4 pairs** (8 leafs) per fabric
- Each leaf connects to **both spines** via dedicated uplinks
- Each leaf connects to **all spines** via dedicated uplinks
- Each leaf pair shares a **VTEP loopback IP** (Loopback1)
### 1.3 Host connectivity
### 2.5 Host connectivity
- Hosts connect **directly to leaf pairs** — no access switches
- Every host is **dual-homed** via MLAG (LACP active)
- Each host-facing port-channel gets a unique MLAG ID
### 1.4 Containerlab model
- All devices use the `ceos` image (Arista cEOS)
- The lab simulates a fixed 12-port model per device
### 2.6 POC defaults
For the initial POC, the following defaults apply:
| Parameter | Default | Notes |
|-----------|---------|-------|
| Spines | 2 | Minimum for redundancy |
| Leaf pairs | 3 (6 leafs) | Enough to validate multi-VTEP behavior |
| Platform | clab (ceos, 12 ports) | Production model as reference only |
---
## 2. Port assignment — Spine
## 3. Port assignment — Spine
Spines use a simple sequential mapping: one Ethernet port per leaf.
Spines use sequential port mapping: one port per leaf, starting from Ethernet1.
| Port | Role | Connected to |
|------|------|-------------|
| Ethernet1 | Underlay downlink | Leaf 01 |
| Ethernet2 | Underlay downlink | Leaf 02 |
| Ethernet3 | Underlay downlink | Leaf 03 |
| Ethernet1 | Underlay downlink | Leaf-01 |
| Ethernet2 | Underlay downlink | Leaf-02 |
| ... | ... | ... |
| Ethernet{N} | Underlay downlink | Leaf {N} |
| Ethernet{N_leafs} | Underlay downlink | Leaf-{N_leafs} |
**Production:** Ethernet1-48 (25G), remaining ports reserved.
**Clab:** Ethernet1-12, all available for downlinks.
All spine downlinks are:
- **Routed** (`no switchport`)
@@ -49,39 +138,36 @@ All spine downlinks are:
---
## 3. Port assignment — Leaf
## 4. Port assignment — Leaf
Leaf port allocation follows a fixed layout. Ports are divided into 3 zones:
| Port range | Role | Details |
|------------|------|---------|
| **Ethernet1 — Ethernet9** | Host-facing | MLAG port-channels (trunk, LACP active) |
| **Ethernet10** | MLAG peer-link | Port-Channel999 (trunk, trunk-group `mlag-peer`) |
| **Ethernet11** | Spine1 uplink | Routed P2P /31, MTU 9214 |
| **Ethernet12** | Spine2 uplink | Routed P2P /31, MTU 9214 |
### 3.1 Host-facing ports (Ethernet1-9)
- Each physical port is a member of a Port-Channel
### 4.1 Host-facing ports
- Each physical port maps to a Port-Channel
- Port-Channel number = MLAG ID = host index (e.g., host 01 → Po1, MLAG 1)
- Mode: `switchport mode trunk`
- VLANs: only the VLANs needed by the host
- LACP fallback enabled (timeout 5, individual)
### 3.2 MLAG peer-link (Ethernet10)
### 4.2 MLAG peer-link
- Always **Port-Channel999**
- Trunk mode with trunk-group `mlag-peer`
- Spanning-tree link-type point-to-point
- Carries MLAG control traffic + VLANs 4090, 4091
### 3.3 Spine uplinks (Ethernet11-12)
- Ethernet11 → spine 01, Ethernet12 → spine 02
- Fixed mapping, never changes regardless of fabric size
**Production:** Ethernet49 (100G)
**Clab:** Ethernet{12-N_spines} (derived from formula in §1.3)
### 4.3 Spine uplinks
- One uplink per spine, routed P2P /31, MTU 9214
- Fixed mapping: uplink port index matches spine index
**Production:** Ethernet50 → Spine-01, Ethernet51 → Spine-02, ... Ethernet{49+N_spines} → Spine-{N_spines}
**Clab:** Ethernet{13-N_spines} → Spine-01, ..., Ethernet12 → Spine-{N_spines}
---
## 4. Naming conventions
## 5. Naming conventions
### 4.1 Device hostname format
### 5.1 Device hostname format
All device hostnames follow the pattern:
@@ -115,7 +201,7 @@ All device hostnames follow the pattern:
| Spine 1 | `LY-DR-SPINE-01` |
| Leaf 1 | `LY-DR-LEAF-01` |
### 4.2 Leaf pairing rule
### 5.2 Leaf pairing rule
Leafs are numbered sequentially. **Odd = primary, even = secondary** within a pair:
@@ -126,7 +212,7 @@ Leafs are numbered sequentially. **Odd = primary, even = secondary** within a pa
| 3 | `{SITE}-{ZONE}-LEAF-05` | `{SITE}-{ZONE}-LEAF-06` | VTEP 3 |
| 4 | `{SITE}-{ZONE}-LEAF-07` | `{SITE}-{ZONE}-LEAF-08` | VTEP 4 |
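
The odd/even pairing rule reduces to simple arithmetic: pair `p` contains leafs `2p - 1` and `2p`. A minimal sketch, assuming a hypothetical helper name `leaf_pair_hostnames` and the two-digit numbering shown in the table:

```python
def leaf_pair_hostnames(site: str, zone: str, pair: int) -> tuple[str, str]:
    """Return (primary, secondary) hostnames for a 1-based MLAG pair index."""
    if pair < 1:
        raise ValueError("pair index starts at 1")
    primary = 2 * pair - 1   # odd leaf number = primary
    secondary = 2 * pair     # even leaf number = secondary
    prefix = f"{site.upper()}-{zone.upper()}-LEAF-"
    return f"{prefix}{primary:02d}", f"{prefix}{secondary:02d}"
```

For instance, pair 3 of the `PA-DC` fabric maps to `PA-DC-LEAF-05` and `PA-DC-LEAF-06`.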
### 4.3 Fabric name
### 5.3 Fabric name
The fabric is identified by `{SITE}-{ZONE}` (lowercase in Infrahub objects):
@@ -135,16 +221,15 @@ The fabric is identified by `{SITE}-{ZONE}` (lowercase in Infrahub objects):
| `InfraFabric` | `{site}-{zone}` | `pa-dc` |
| `LocationSite` | `{site}-{zone}` | `pa-dc` |
### 4.4 Interface descriptions
### 5.4 Interface descriptions
Interface descriptions reference the **full hostname** of the remote device:
| Interface | Description format | Example (on PA-DC-LEAF-01) |
|-----------|-------------------|----------------------------|
| Spine uplink Eth11 | `to {REMOTE_HOSTNAME}` | `to PA-DC-SPINE-01` |
| Spine uplink Eth12 | `to {REMOTE_HOSTNAME}` | `to PA-DC-SPINE-02` |
| MLAG peer-link Eth10 | `mlag peer link` | `mlag peer link` |
| Host-facing Eth1 | `to {HOST_HOSTNAME}` | `to PA-DC-HOST-01` |
| Spine uplink | `to {REMOTE_HOSTNAME}` | `to PA-DC-SPINE-01` |
| MLAG peer-link | `mlag peer link` | `mlag peer link` |
| Host-facing | `to {HOST_HOSTNAME}` | `to PA-DC-HOST-01` |
| Loopback0 | `Router-ID` | `Router-ID` |
| Loopback1 | `VTEP` | `VTEP` |
@@ -152,10 +237,10 @@ On spines:
| Interface | Description format | Example (on PA-DC-SPINE-01) |
|-----------|-------------------|----------------------------|
| Downlink Eth1 | `to {REMOTE_HOSTNAME}` | `to PA-DC-LEAF-01` |
| Downlink | `to {REMOTE_HOSTNAME}` | `to PA-DC-LEAF-01` |
| Loopback0 | `Router-ID` | `Router-ID` |
### 4.5 MLAG domain
### 5.5 MLAG domain
| Parameter | Value | Notes |
|-----------|-------|-------|
@@ -164,7 +249,7 @@ On spines:
| Peer-link VLAN | 4090 | Fixed |
| iBGP peering VLAN | 4091 | Fixed |
### 4.6 BGP descriptions
### 5.6 BGP descriptions
| Session type | Description format | Example |
|-------------|-------------------|---------|
@@ -172,7 +257,7 @@ On spines:
| iBGP MLAG peer | `ibgp to {REMOTE_HOSTNAME}` | `ibgp to PA-DC-LEAF-02` |
| EVPN overlay | `evpn to {REMOTE_HOSTNAME}` | `evpn to PA-DC-SPINE-01` |
### 4.7 IPAM identifiers (for resource pool idempotence)
### 5.7 IPAM identifiers (for resource pool idempotence)
All identifiers use **lowercase**, with the fabric name `{site}-{zone}`:
@@ -186,8 +271,10 @@ All identifiers use **lowercase**, with the fabric name `{site}-{zone}`:
| MLAG peer /31 | `mlag-peer-{site}-{zone}-pair{NN}` | `mlag-peer-pa-dc-pair01` |
| MLAG iBGP /31 | `mlag-ibgp-{site}-{zone}-pair{NN}` | `mlag-ibgp-pa-dc-pair01` |
| Leaf ASN | `asn-{site}-{zone}-pair{NN}` | `asn-pa-dc-pair01` |
| L2 VNI | `l2vni-{site}-{zone}-vlan{NNNN}` | `l2vni-pa-dc-vlan0040` |
| L3 VNI | `l3vni-{site}-{zone}-{vrf_name}` | `l3vni-pa-dc-gold` |
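
The identifier patterns in this table can be produced by plain string formatting. The sketch below covers the per-pair and VNI patterns visible above; the function names are hypothetical and only illustrate the zero-padding and lowercase conventions.

```python
def pair_identifier(kind: str, site: str, zone: str, pair: int) -> str:
    """Build a per-pair identifier such as 'asn-pa-dc-pair01'.

    kind is the pattern prefix from the table: 'mlag-peer', 'mlag-ibgp' or 'asn'.
    """
    return f"{kind}-{site}-{zone}-pair{pair:02d}".lower()


def l2vni_identifier(site: str, zone: str, vlan_id: int) -> str:
    """Build an L2 VNI identifier; the VLAN ID is zero-padded to four digits."""
    return f"l2vni-{site}-{zone}-vlan{vlan_id:04d}".lower()


def l3vni_identifier(site: str, zone: str, vrf_name: str) -> str:
    """Build an L3 VNI identifier keyed on the VRF name."""
    return f"l3vni-{site}-{zone}-{vrf_name}".lower()
```

Because the identifier depends only on fabric name and object index, repeated generator runs request the same resource rather than allocating a new one.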
### 4.8 Site prefix registry
### 5.8 Site prefix registry
To avoid conflicts, site prefixes must be registered:
@@ -204,39 +291,49 @@ To avoid conflicts, site prefixes must be registered:
---
## 5. IPAM — IP addressing plan
## 6. IPAM — IP addressing plan
### 5.1 Supernets (global)
### 6.1 Design principle
Only the **two supernets** are fixed. All intermediate allocations (site prefixes, fabric pools, individual subnets) are delegated to **Infrahub's resource manager**, which picks the smallest available prefix that satisfies the request. Prefix sizes mentioned in this document are illustrative defaults — the generator requests a number of allocations from a pool and Infrahub handles sizing and placement.
### 6.2 Supernets (global, fixed)
| Role | Supernet | Description |
|------|----------|-------------|
| Infrastructure | `10.0.0.0/8` | Loopbacks, underlay, MLAG |
| Services | `172.16.0.0/12` | L2/L3 VXLAN user subnets |
### 5.2 Site allocation (from supernets)
These are the only hardcoded prefixes. Everything below is allocated dynamically.
Each site receives:
- **1x /16** from `10.0.0.0/8` for infrastructure
- **1x /16** from `172.16.0.0/12` for services
### 6.3 Site allocation (from supernets)
### 5.3 Fabric pools (from site infra /16)
Each site receives one prefix from each supernet, allocated by Infrahub:
- **1 prefix** from `10.0.0.0/8` for infrastructure (e.g., /16)
- **1 prefix** from `172.16.0.0/12` for services (e.g., /16)
| Pool | Prefix size | Allocation unit | Pool type |
|------|-------------|-----------------|-----------|
| Loopback0 (router-id) | /24 | /32 per device | `CoreIPAddressPool` |
| Loopback1 (VTEP) | /24 | /32 per MLAG pair | `CoreIPAddressPool` |
| Underlay P2P | /24 | /31 per spine-leaf link | `CoreIPPrefixPool` |
| MLAG peer-link SVI | /24 | /31 per MLAG pair | `CoreIPPrefixPool` |
| MLAG iBGP peering | /24 | /31 per MLAG pair | `CoreIPPrefixPool` |
### 6.4 Fabric pools (from site infra prefix)
### 5.4 Service pools (from site services /16)
The fabric generator creates pools within the site's infrastructure prefix. Each pool serves a specific role and allocates individual subnets on demand:
| Pool | Allocation unit | Pool type | Example size |
|------|-----------------|-----------|-------------|
| Loopback0 (router-id) | /32 per device | `CoreIPAddressPool` | /24 |
| Loopback1 (VTEP) | /32 per MLAG pair | `CoreIPAddressPool` | /24 |
| Underlay P2P | /31 per spine-leaf link | `CoreIPPrefixPool` | /24 or /23 |
| MLAG peer-link SVI | /31 per MLAG pair | `CoreIPPrefixPool` | /24 |
| MLAG iBGP peering | /31 per MLAG pair | `CoreIPPrefixPool` | /24 |
> **Example sizes are not prescriptive.** Infrahub allocates the parent prefix for each pool based on the number of resources requested. A 2-spine / 6-leaf fabric needs far fewer /31s than a 4-spine / 48-leaf fabric — the resource manager adapts accordingly.
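
To make the sizing difference concrete: the underlay needs one /31 per spine-leaf link, so `N_spines × N_leafs` /31s in total. The sketch below computes that count and the smallest single prefix that could hold them. It is arithmetic only; the helper name is hypothetical, and the prefix Infrahub actually allocates is its own decision.

```python
def underlay_p2p_requirements(n_spines: int, n_leafs: int) -> dict:
    """Count the /31s needed for the underlay and the smallest enclosing prefix."""
    links = n_spines * n_leafs   # one P2P link per spine-leaf combination
    addresses = 2 * links        # each /31 consumes two addresses
    # Smallest power of two >= addresses, expressed as a prefix length.
    host_bits = (addresses - 1).bit_length()
    return {"p2p_links": links, "min_prefix_len": 32 - host_bits}
```

A 2-spine / 6-leaf fabric needs 12 /31s (fits in a /27), while a 4-spine / 48-leaf fabric needs 192 /31s (384 addresses, which requires a /23).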
### 6.5 Service pools (from site services prefix)
| Pool | Allocation unit | Pool type |
|------|-----------------|-----------|
| L2 VXLAN subnets | /24 per VLAN (customizable) | `CoreIPPrefixPool` |
| L3 VXLAN subnets (VRF SVIs) | /24 per VRF SVI (customizable) | `CoreIPPrefixPool` |
| L2 VXLAN subnets | Per VLAN (e.g., /24) | `CoreIPPrefixPool` |
| L3 VXLAN subnets (VRF SVIs) | Per VRF SVI (e.g., /24) | `CoreIPPrefixPool` |
### 5.5 Special VLANs (reserved, not from pools)
### 6.6 Special VLANs (reserved, not from pools)
| VLAN | Name | Purpose | Trunk group |
|------|------|---------|-------------|
@@ -245,32 +342,33 @@ Each site receives:
---
## 6. BGP — Autonomous System assignment
## 7. BGP — Autonomous System assignment
### 6.1 Spine ASN
### 7.1 Spine ASN
- **Single ASN** shared by all spines in a fabric
- Defined as an attribute on `InfraFabric`
- Default for POC: **65000**
### 6.2 Leaf ASN
### 7.2 Leaf ASN
- **One ASN per MLAG pair** (iBGP within pair, eBGP to spines)
- Allocated from a `CoreNumberPool` (range: 65001-65099)
- Deterministic via identifier: `asn-{site}-{zone}-pair{NN}`
### 6.3 BGP configuration standards
### 7.3 BGP configuration standards
| Parameter | Value | Notes |
|-----------|-------|-------|
| `no bgp default ipv4-unicast` | Always | Explicit activation per AFI |
| `bgp log-neighbor-changes` | Always | Operational visibility for BGP state transitions |
| `distance bgp` | `20 200 200` | eBGP preferred over iBGP |
| `maximum-paths` | `4 ecmp 64` | Multi-path for spine redundancy |
| `maximum-paths` | `{N_spines × 2} ecmp 64` | Multi-path scaled to spine count (e.g., 2 spines → `4`, 4 spines → `8`) |
| `maximum-routes` | `12000 warning-only` | Per neighbor |
| `ebgp-multihop` | `3` | EVPN overlay (loopback peering) |
| `send-community extended` | Always | Required for EVPN route-targets |
| `next-hop-unchanged` | Spine EVPN peer-group | Preserve leaf next-hop in overlay |
| `next-hop-self` | Leaf iBGP peer-group | Required for iBGP convergence |
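
As an illustration of how the spine-count-dependent value could be rendered, the sketch below emits the global `router bgp` lines from the table above. The function name `bgp_global_lines` is hypothetical, and the output is a fragment, not a complete BGP configuration.

```python
def bgp_global_lines(n_spines: int) -> list[str]:
    """Render the fabric-wide BGP settings that do not depend on a neighbor."""
    if n_spines not in (2, 3, 4):
        raise ValueError("n_spines must be 2, 3 or 4")
    return [
        "no bgp default ipv4-unicast",
        "bgp log-neighbor-changes",
        "distance bgp 20 200 200",
        # maximum-paths scales with the spine count: N_spines x 2
        f"maximum-paths {n_spines * 2} ecmp 64",
    ]
```

With 2 spines the last line is `maximum-paths 4 ecmp 64`; with 4 spines it becomes `maximum-paths 8 ecmp 64`.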
### 6.4 Peer groups (per device)
### 7.4 Peer groups (per device)
**Leaf peer groups:**
@@ -288,7 +386,7 @@ Each site receives:
Spine underlay neighbors are configured individually (no peer-group) since each leaf has a different ASN.
### 6.5 Address families
### 7.5 Address families
| AFI | Activated on | Networks advertised |
|-----|-------------|---------------------|
@@ -297,7 +395,7 @@ Spine underlay neighbors are configured individually (no peer-group) since each
---
## 7. MLAG standards
## 8. MLAG standards
| Parameter | Value |
|-----------|-------|
@@ -310,39 +408,39 @@ Spine underlay neighbors are configured individually (no peer-group) since each
| Heartbeat | Via Management0 (VRF mgmt) |
| Virtual MAC | `c001.cafe.babe` (fabric-wide anycast gateway) |
### 7.1 Primary/secondary assignment
### 8.1 Primary/secondary assignment
- **Odd-numbered leaf** (LEAF-01, LEAF-03, LEAF-05, LEAF-07): lower IP on MLAG VLANs (e.g., x.x.x.0/31)
- **Even-numbered leaf** (LEAF-02, LEAF-04, LEAF-06, LEAF-08): higher IP (e.g., x.x.x.1/31)
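
The lower/higher address rule can be expressed with Python's standard `ipaddress` module. This is a minimal sketch: the helper name `mlag_vlan_ip` is hypothetical, and the prefix used in the usage note is a made-up example, not an allocated value.

```python
import ipaddress


def mlag_vlan_ip(prefix: str, leaf_number: int) -> str:
    """Pick the leaf's address inside a /31 allocated to its MLAG pair.

    Odd-numbered leafs take the lower address, even-numbered the higher one.
    """
    network = ipaddress.ip_network(prefix)
    if network.prefixlen != 31:
        raise ValueError("MLAG VLAN prefixes are expected to be /31")
    address = network[0] if leaf_number % 2 == 1 else network[1]
    return f"{address}/31"
```

Given a hypothetical allocation of `10.0.3.0/31`, LEAF-01 would receive `10.0.3.0/31` and LEAF-02 would receive `10.0.3.1/31`.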
---
## 8. VXLAN standards
## 9. VXLAN standards
### 8.1 VTEP interface
### 9.1 VTEP interface
- Interface: `Vxlan1` on every leaf
- Source interface: `Loopback1` (shared IP within MLAG pair)
- UDP port: `4789`
- Learning: `vxlan learn-restrict any` (EVPN-controlled)
### 8.2 VNI allocation
### 9.2 VNI allocation
| Type | NumberPool range | Usage |
|------|-----------------|-------|
| L2 VNI | 100001-199999 | One VNI per extended VLAN (EVPN Type-2) |
| L3 VNI | 200001-299999 | One VNI per VRF (EVPN Type-5) |
| Type | NumberPool name | Range | Usage | Identifier pattern |
|------|----------------|-------|-------|--------------------|
| L2 VNI | `l2-vni-pool` | 100001-199999 | One VNI per extended VLAN (EVPN Type-2) | `l2vni-{site}-{zone}-vlan{NNNN}` |
| L3 VNI | `l3-vni-pool` | 200001-299999 | One VNI per VRF (EVPN Type-5) | `l3vni-{site}-{zone}-{vrf_name}` |
VNIs are allocated from `CoreNumberPool` with deterministic identifiers.
VNIs are allocated from `CoreNumberPool` with deterministic identifiers for idempotent sync.
### 8.3 Route distinguisher and route target
### 9.3 Route distinguisher and route target
| Service type | RD format | RT format |
|-------------|-----------|-----------|
| L2 VXLAN (per VLAN) | `{ASN}:{VNI}` | `{VLAN_ID}:{VNI}` (import/export) |
| L3 VXLAN (per VRF) | `{Loopback0_IP}:{VRF_index}` | `{VRF_index}:{VNI}` (import/export evpn) |
| L3 VXLAN (per VRF) | `{Loopback0_IP}:{L3_VNI}` | `{L3_VNI}:{L3_VNI}` (import/export evpn) |
---
## 9. Global parameters
## 10. Global parameters
| Parameter | Value | Notes |
|-----------|-------|-------|
@@ -355,12 +453,12 @@ VNIs are allocated from `CoreNumberPool` with deterministic identifiers.
---
## 10. Out of scope (for now)
## 11. Out of scope (for now)
- **Access switches** — hosts connect directly to leafs
- **Multi-fabric / DCI** — single fabric per site
- **IPv6 underlay** — IPv4 only
- **BFD** — not configured in initial POC
- **Route-maps / prefix-lists** — no filtering in the underlay
- **More than 2 spines** — fixed at 2 for the POC
- **More than 4 spines** — capped for small DC scope
- **Non-Arista platforms** — EOS only