From 05f9623bf53830420866aaf69d28aaa8ea770abb Mon Sep 17 00:00:00 2001
From: Damien Arnodo
Date: Sun, 15 Mar 2026 15:31:09 +0000
Subject: [PATCH] docs: add fabric standardization draft (Phase 0 - #31)

---
 docs/fabric-standardization.md | 280 +++++++++++++++++++++++++++++++++
 1 file changed, 280 insertions(+)
 create mode 100644 docs/fabric-standardization.md

diff --git a/docs/fabric-standardization.md b/docs/fabric-standardization.md
new file mode 100644
index 0000000..932506c
--- /dev/null
+++ b/docs/fabric-standardization.md
@@ -0,0 +1,280 @@
+# Fabric Standardization — Small EVPN-VXLAN Data Centers
+
+> **Status**: Draft — Phase 0 (#31)
+> **Scope**: POC for small data centers (2 spines, 6-8 leafs in 3-4 MLAG pairs)
+> **Parent**: Epic #30
+
+---
+
+## 1. Topology constraints
+
+### 1.1 Spine layer
+- Always **2 spines** per fabric (redundancy, no single point of failure)
+- Spines are **pure L3 routers** — no VTEPs, no VLANs, no MLAG
+- Each spine connects to **every leaf** via a dedicated P2P link
+
+### 1.2 Leaf layer
+- Leafs always come in **MLAG pairs** (2 leafs = 1 logical VTEP)
+- Minimum **3 pairs** (6 leafs), maximum **4 pairs** (8 leafs) per fabric
+- Each leaf connects to **both spines** via dedicated uplinks
+- Each leaf pair shares a **VTEP loopback IP** (Loopback1)
+
+### 1.3 Host connectivity
+- Hosts connect **directly to leaf pairs** — no access switches
+- Every host is **dual-homed** via MLAG (LACP active)
+- Each host-facing port-channel gets a unique MLAG ID
+
+### 1.4 Containerlab model
+- All devices use the `ceos` image (Arista cEOS)
+- The lab simulates a fixed 12-port model per device
+
+---
+
+## 2. Port assignment — Spine
+
+Spines use a simple sequential mapping: one Ethernet port per leaf.
+
+| Port | Role | Connected to |
+|------|------|-------------|
+| Ethernet1 | Underlay downlink | leaf1 |
+| Ethernet2 | Underlay downlink | leaf2 |
+| Ethernet3 | Underlay downlink | leaf3 |
+| ... | ... | ... |
+| Ethernet{N} | Underlay downlink | leaf{N} |
+
+All spine downlinks are:
+- **Routed** (`no switchport`)
+- Addressed from **/31 P2P** prefixes
+- Set to **MTU 9214** (jumbo frames for VXLAN encapsulation)
+
+---
+
+## 3. Port assignment — Leaf
+
+Leaf port allocation follows a fixed layout. Ports are divided into 3 zones (host-facing, MLAG peer-link, spine uplinks):
+
+| Port range | Role | Details |
+|------------|------|---------|
+| **Ethernet1 — Ethernet9** | Host-facing | MLAG port-channels (trunk, LACP active) |
+| **Ethernet10** | MLAG peer-link | Port-Channel999 (trunk, trunk-group `mlag-peer`) |
+| **Ethernet11** | Spine1 uplink | Routed P2P /31, MTU 9214 |
+| **Ethernet12** | Spine2 uplink | Routed P2P /31, MTU 9214 |
+
+### 3.1 Host-facing ports (Ethernet1-9)
+- Each physical port is a member of a Port-Channel
+- Port-Channel number = MLAG ID = host index (e.g., host1 → Po1, MLAG 1)
+- Mode: `switchport mode trunk`
+- VLANs: only the VLANs needed by the host
+- LACP fallback enabled (timeout 5, individual)
+
+### 3.2 MLAG peer-link (Ethernet10)
+- Always **Port-Channel999**
+- Trunk mode with trunk-group `mlag-peer`
+- Spanning-tree link-type point-to-point
+- Carries MLAG control traffic plus VLANs 4090 and 4091
+
+### 3.3 Spine uplinks (Ethernet11-12)
+- Ethernet11 → spine1, Ethernet12 → spine2
+- Fixed mapping, never changes regardless of fabric size
+
+---
+
+## 4. Naming conventions
+
+### 4.1 Devices
+
+| Role | Pattern | Examples |
+|------|---------|---------|
+| Spine | `spine{N}` | spine1, spine2 |
+| Leaf | `leaf{N}` | leaf1, leaf2, ..., leaf8 |
+| Host | `host{N}` | host1, host2, host3, host4 |
+
+- Leafs are numbered sequentially: odd = primary, even = secondary within a pair
+  - Pair 1: leaf1 (primary), leaf2 (secondary)
+  - Pair 2: leaf3 (primary), leaf4 (secondary)
+  - Pair N: leaf{2N-1} (primary), leaf{2N} (secondary)
+
+### 4.2 MLAG domain
+- Domain ID: `leafs` (same for all pairs; the ID only has to match between the two peers of a pair)
+- Peer-link: Port-Channel999
+- Peer-link VLAN: 4090
+- iBGP peering VLAN: 4091
+
+### 4.3 Fabric
+- Pattern: `{site}-{function}` (e.g., `dc1-evpn`, `paris-evpn`)
+
+### 4.4 IPAM identifiers (for resource pool idempotence)
+
+| Object | Identifier pattern | Example |
+|--------|-------------------|---------|
+| Site infra prefix | `site-{site}-infra` | `site-dc1-infra` |
+| Site services prefix | `site-{site}-services` | `site-dc1-services` |
+| Device loopback0 IP | `lo0-{fabric}-{device}` | `lo0-evpnlab-leaf1` |
+| Device loopback1 IP | `lo1-{fabric}-vtep{pair}` | `lo1-evpnlab-vtep1` |
+| Underlay P2P /31 | `p2p-{fabric}-{spine}-{leaf}` | `p2p-evpnlab-spine1-leaf1` |
+| MLAG peer /31 | `mlag-peer-{fabric}-pair{N}` | `mlag-peer-evpnlab-pair1` |
+| MLAG iBGP /31 | `mlag-ibgp-{fabric}-pair{N}` | `mlag-ibgp-evpnlab-pair1` |
+| Leaf ASN | `asn-{fabric}-pair{N}` | `asn-evpnlab-pair1` |
+
+---
+
+## 5. IPAM — IP addressing plan
+
+### 5.1 Supernets (global)
+
+| Role | Supernet | Description |
+|------|----------|-------------|
+| Infrastructure | `10.0.0.0/8` | Loopbacks, underlay, MLAG |
+| Services | `172.16.0.0/12` | L2/L3 VXLAN user subnets |
+
+### 5.2 Site allocation (from supernets)
+
+Each site receives:
+- **1x /16** from `10.0.0.0/8` for infrastructure
+- **1x /16** from `172.16.0.0/12` for services
+
+### 5.3 Fabric pools (from site infra /16)
+
+| Pool | Prefix size | Allocation unit | Pool type |
+|------|-------------|-----------------|-----------|
+| Loopback0 (router-id) | /24 | /32 per device | `CoreIPAddressPool` |
+| Loopback1 (VTEP) | /24 | /32 per MLAG pair | `CoreIPAddressPool` |
+| Underlay P2P | /24 | /31 per spine-leaf link | `CoreIPPrefixPool` |
+| MLAG peer-link SVI | /24 | /31 per MLAG pair | `CoreIPPrefixPool` |
+| MLAG iBGP peering | /24 | /31 per MLAG pair | `CoreIPPrefixPool` |
+
+### 5.4 Service pools (from site services /16)
+
+| Pool | Allocation unit | Pool type |
+|------|-----------------|-----------|
+| L2 VXLAN subnets | /24 per VLAN (customizable) | `CoreIPPrefixPool` |
+| L3 VXLAN subnets (VRF SVIs) | /24 per VRF SVI (customizable) | `CoreIPPrefixPool` |
+
+### 5.5 Special VLANs (reserved, not from pools)
+
+| VLAN | Name | Purpose | Trunk group |
+|------|------|---------|-------------|
+| 4090 | mlag-peer | MLAG peer-link SVI | mlag-peer |
+| 4091 | mlag-ibgp | MLAG iBGP peering | mlag-peer |
+
+---
+
+## 6. BGP — Autonomous System assignment
+
+### 6.1 Spine ASN
+- **Single ASN** shared by all spines in a fabric
+- Defined as an attribute on `InfraFabric`
+- Default for POC: **65000**
+
+### 6.2 Leaf ASN
+- **One ASN per MLAG pair** (iBGP within pair, eBGP to spines)
+- Allocated from a `CoreNumberPool` (range: 65001–65099)
+- Deterministic via identifier: `asn-{fabric}-pair{N}`
+
+### 6.3 BGP configuration standards
+
+| Parameter | Value | Notes |
+|-----------|-------|-------|
+| `no bgp default ipv4-unicast` | Always | Explicit activation per AFI |
+| `distance bgp` | `20 200 200` | eBGP preferred over iBGP |
+| `maximum-paths` | `4 ecmp 64` | Multi-path for spine redundancy |
+| `maximum-routes` | `12000 warning-only` | Per neighbor |
+| `ebgp-multihop` | `3` | EVPN overlay (loopback peering) |
+| `send-community extended` | Always | Required for EVPN route-targets |
+| `next-hop-unchanged` | Spine EVPN peer-group | Preserve leaf next-hop in overlay |
+| `next-hop-self` | Leaf iBGP peer-group | Lets the MLAG peer resolve next hops of eBGP-learned routes |
+
+### 6.4 Peer groups (per device)
+
+**Leaf peer groups:**
+
+| Peer group | Type | Remote AS | Neighbors |
+|------------|------|-----------|-----------|
+| `underlay` | eBGP | spine ASN | Spine P2P IPs |
+| `underlay_ibgp` | iBGP | own ASN | MLAG peer via VLAN 4091 |
+| `evpn` | eBGP | spine ASN | Spine loopback0 IPs |
+
+**Spine peer groups:**
+
+| Peer group | Type | Neighbors |
+|------------|------|-----------|
+| `evpn` | eBGP | All leaf loopback0 IPs (each with its own remote-as) |
+
+Spine underlay neighbors are configured individually (no peer-group) since each leaf pair has a different ASN.
+
+### 6.5 Address families
+
+| AFI | Activated on | Networks advertised |
+|-----|-------------|---------------------|
+| IPv4 unicast | underlay, underlay_ibgp | Loopback0/32, Loopback1/32 |
+| EVPN | evpn | (routes from VLAN/VRF config) |
+
+---
+
+## 7. MLAG standards
+
+| Parameter | Value |
+|-----------|-------|
+| Domain ID | `leafs` |
+| Peer-link interface | Port-Channel999 |
+| Peer-link VLAN | 4090 (IP: /31 from MLAG peer pool) |
+| iBGP VLAN | 4091 (IP: /31 from MLAG iBGP pool, MTU 9214) |
+| Peer VLAN autostate | Disabled (`no autostate`) |
+| Dual-primary detection | Enabled (delay 10, errdisable all-interfaces) |
+| Heartbeat | Via Management0 (VRF mgmt) |
+| Virtual MAC | `c001.cafe.babe` (fabric-wide anycast gateway) |
+
+### 7.1 Primary/secondary assignment
+- **Odd-numbered leaf** (leaf1, leaf3, leaf5, leaf7): lower IP on MLAG VLANs (e.g., x.x.x.254/31)
+- **Even-numbered leaf** (leaf2, leaf4, leaf6, leaf8): higher IP (e.g., x.x.x.255/31)
+
+---
+
+## 8. VXLAN standards
+
+### 8.1 VTEP interface
+- Interface: `Vxlan1` on every leaf
+- Source interface: `Loopback1` (shared IP within MLAG pair)
+- UDP port: `4789`
+- Learning: `vxlan learn-restrict any` (EVPN-controlled)
+
+### 8.2 VNI allocation
+
+| Type | NumberPool range | Usage |
+|------|-----------------|-------|
+| L2 VNI | 100001–199999 | One VNI per extended VLAN (EVPN Type-2) |
+| L3 VNI | 200001–299999 | One VNI per VRF (EVPN Type-5) |
+
+VNIs are allocated from `CoreNumberPool` with deterministic identifiers.
+
+### 8.3 Route distinguisher and route target
+
+| Service type | RD format | RT format |
+|-------------|-----------|-----------|
+| L2 VXLAN (per VLAN) | `{ASN}:{VNI}` | `{VLAN_ID}:{VNI}` (import/export) |
+| L3 VXLAN (per VRF) | `{Loopback0_IP}:{VRF_index}` | `{VRF_index}:{VNI}` (import/export evpn) |
+
+---
+
+## 9. Global parameters
+
+| Parameter | Value | Notes |
+|-----------|-------|-------|
+| Underlay MTU | 9214 | All P2P and iBGP links |
+| Anycast gateway MAC | `c001.cafe.babe` | `ip virtual-router mac-address` |
+| Routing model | `multi-agent` | `service routing protocols model multi-agent` |
+| Spanning-tree | Disabled on VLAN 4090, 4091 | MLAG VLANs only |
+| LLDP | Management0 | `lldp management-address Management0` |
+| gNMI | Enabled | `management api gnmi` with `provider eos-native` |
+
+---
+
+## 10. Out of scope (for now)
+
+- **Access switches** — hosts connect directly to leafs
+- **Multi-fabric / DCI** — single fabric per site
+- **IPv6 underlay** — IPv4 only
+- **BFD** — not configured in initial POC
+- **Route-maps / prefix-lists** — no filtering in the underlay
+- **More than 2 spines** — fixed at 2 for the POC
+- **Non-Arista platforms** — EOS only
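The numbering and naming rules in sections 2, 3.3, 4.1 and 4.4 are mechanical, so they can be checked with a few lines of code. The following is a minimal Python sketch under stated assumptions: `LeafPair`, `leaf_number`, `pair_of`, `spine_port` and `ipam_identifiers` are hypothetical helper names invented for illustration, not part of this patch or of any generator in the repository, and the fabric name `evpnlab` is taken from the examples in section 4.4.

```python
from dataclasses import dataclass

# Fixed by the draft: exactly two spines, leaf uplinks on Ethernet11/12.
SPINES = ("spine1", "spine2")
LEAF_UPLINK_PORT = {"spine1": "Ethernet11", "spine2": "Ethernet12"}


@dataclass(frozen=True)
class LeafPair:
    """One MLAG pair; index is the 1-based pair number N (hypothetical type)."""

    index: int

    @property
    def primary(self) -> str:
        # Section 4.1: primary is leaf{2N-1}
        return f"leaf{2 * self.index - 1}"

    @property
    def secondary(self) -> str:
        # Section 4.1: secondary is leaf{2N}
        return f"leaf{2 * self.index}"


def leaf_number(leaf: str) -> int:
    """Extract N from a 'leaf{N}' device name."""
    if not leaf.startswith("leaf"):
        raise ValueError(f"not a leaf name: {leaf!r}")
    return int(leaf[len("leaf"):])


def pair_of(leaf: str) -> int:
    """Pair number for a leaf: leaf1/leaf2 -> 1, leaf3/leaf4 -> 2, ..."""
    return (leaf_number(leaf) + 1) // 2


def spine_port(leaf: str) -> str:
    """Section 2: the spine port facing leaf{N} is Ethernet{N}."""
    return f"Ethernet{leaf_number(leaf)}"


def ipam_identifiers(fabric: str, pair: LeafPair) -> list:
    """Build the per-pair identifiers following the patterns in section 4.4."""
    ids = [
        f"lo1-{fabric}-vtep{pair.index}",
        f"mlag-peer-{fabric}-pair{pair.index}",
        f"mlag-ibgp-{fabric}-pair{pair.index}",
        f"asn-{fabric}-pair{pair.index}",
    ]
    for leaf in (pair.primary, pair.secondary):
        ids.append(f"lo0-{fabric}-{leaf}")
        ids.extend(f"p2p-{fabric}-{spine}-{leaf}" for spine in SPINES)
    return ids


if __name__ == "__main__":
    pair = LeafPair(1)
    print(pair.primary, pair.secondary, spine_port(pair.secondary))
    for identifier in ipam_identifiers("evpnlab", pair):
        print(identifier)
```

Because every identifier is derived only from the fabric name and the pair or device number, rerunning such a function yields the same strings, which is the property section 4.4 relies on for idempotent pool allocation.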