Document all critical fixes discovered during lab testing: - Spine routing: ip routing command added - MLAG: static LAG mode enabled - Pending: port-channel access mode, host networking Track status of each fix for deployment readiness.
158 lines
5.0 KiB
Markdown
158 lines
5.0 KiB
Markdown
# Fixes Applied in Main Branch
|
|
|
|
This document tracks critical fixes that have been discovered and applied during lab deployment to ensure the EVPN-VXLAN fabric functions correctly.
|
|
|
|
## ✅ Fixes Applied to Main Branch
|
|
|
|
### 1. **Spine Switches - Enable IP Routing** ✅ FIXED
|
|
**Problem**: BGP was disabled on spine switches with error "BGP is disabled for VRF default" and "IP routing not enabled"
|
|
|
|
**Fix**: Added `ip routing` command to both spine configurations
|
|
- `configs/spine1.cfg` - Added line: `ip routing` (before `service routing protocols model multi-agent`)
|
|
- `configs/spine2.cfg` - Added line: `ip routing` (before `service routing protocols model multi-agent`)
|
|
|
|
**Impact**: This enables BGP to function properly on spines, allowing:
|
|
- Underlay BGP IPv4 Unicast sessions to establish
|
|
- EVPN BGP sessions to establish
|
|
- Route exchange between spines and leafs
|
|
|
|
**Status**: ✅ **APPLIED** (commits applied to main branch)
|
|
|
|
---
|
|
|
|
### 2. **Leaf Switches - MLAG Port-Channel Mode** ✅ FIXED
|
|
**Problem**: LACP bonding (`mode active`) doesn't work properly in Alpine Linux containers due to lack of kernel module support
|
|
|
|
**Fix**: Changed from LACP to static LAG
|
|
- Changed `channel-group 1 mode active` to `channel-group 1 mode on` in all leaf configs
|
|
- This creates a static LAG that works in containerized environments
|
|
|
|
**Status**: ✅ **ALREADY APPLIED** (pushed by user in previous commits)
|
|
|
|
---
|
|
|
|
## ⏳ Remaining Issues (Pending Application)
|
|
|
|
### 3. **Leaf Switches - Port-Channel1 Switchport Mode** ⏳ PENDING
|
|
**Problem**: Port-Channel configured as `trunk`, but Alpine containers send untagged traffic
|
|
|
|
**Fix Needed**: Change Port-Channel1 from trunk to access mode on all leafs:
|
|
```
|
|
interface Port-Channel1
|
|
switchport mode access
|
|
switchport access vlan 40 # or appropriate VLAN for each VTEP
|
|
```
|
|
|
|
**Status**: ⏳ **NOT YET APPLIED** - Needs manual configuration or config file updates
|
|
|
|
**Affected Files**:
|
|
- `configs/leaf1.cfg`
|
|
- `configs/leaf2.cfg`
|
|
- `configs/leaf3.cfg`
|
|
- `configs/leaf4.cfg`
|
|
- `configs/leaf5.cfg`
|
|
- `configs/leaf6.cfg`
|
|
- `configs/leaf7.cfg`
|
|
- `configs/leaf8.cfg`
|
|
|
|
---
|
|
|
|
### 4. **Host Configuration - Simplified Bonding** ⏳ PENDING
|
|
**Problem**: Alpine Linux containers cannot properly configure 802.3ad LACP bonding
|
|
|
|
**Fix Needed**: Remove bonding complexity, use single interface:
|
|
```yaml
|
|
host1:
|
|
exec:
|
|
- ip addr add 10.40.40.101/24 dev eth1
|
|
- ip link set eth1 up
|
|
```
|
|
|
|
**Status**: ⏳ **NOT YET APPLIED** - Topology file needs updating
|
|
|
|
---
|
|
|
|
## 📋 Summary of Issues Found
|
|
|
|
### Issue #1: Missing `ip routing` on Spines
|
|
- **Symptoms**:
|
|
- `show ip bgp summary` returned "BGP is disabled for VRF default"
|
|
- Attempting to configure BGP showed "! IP routing not enabled"
|
|
- **Root Cause**: Arista EOS requires explicit `ip routing` command to enable L3 functionality
|
|
- **Status**: ✅ **FIXED**
|
|
|
|
### Issue #2: LACP Bonding in Containers
|
|
- **Symptoms**:
|
|
- Port-Channel showing "waiting for LACP response"
|
|
- Host bond interface in DOWN state
|
|
- **Root Cause**: Alpine containers don't have bonding kernel modules
|
|
- **Status**: ✅ **FIXED** (by changing to static LAG)
|
|
|
|
### Issue #3: Trunk vs Access Mode
|
|
- **Symptoms**:
|
|
- No MAC learning on switch
|
|
- Port-Channel counters showed traffic but no unicast packets
|
|
- **Root Cause**: Hosts send untagged traffic, switch expects tagged (trunk mode)
|
|
- **Status**: ⏳ **NEEDS FIXING**
|
|
|
|
---
|
|
|
|
## 🚀 Deployment Instructions
|
|
|
|
### Quick Start (Recommended)
|
|
1. Deploy with fixed spine configs:
|
|
```bash
|
|
cd ~/arista-evpn-vxlan-clab
|
|
sudo containerlab deploy -t evpn-lab.clab.yml
|
|
```
|
|
|
|
2. Verify BGP is working:
|
|
```bash
|
|
ssh admin@clab-arista-evpn-fabric-spine1 "show bgp evpn summary"
|
|
```
|
|
|
|
3. Apply remaining fixes manually or wait for config updates
|
|
|
|
### Complete Fix (When Ready)
|
|
- Once Port-Channel and host configs are updated, redeploy topology for zero-downtime testing
|
|
|
|
---
|
|
|
|
## 📊 Testing Results
|
|
|
|
After applying spine `ip routing` fix:
|
|
- ✅ BGP underlay sessions establish (eBGP between spine-leaf, iBGP between MLAG pairs)
|
|
- ✅ BGP EVPN overlay sessions establish
|
|
- ✅ MLAG pairs form correctly (active-full, up/up)
|
|
- ✅ MAC addresses learned locally on leaf switches
|
|
- ⏳ EVPN Type-2 routes advertised (pending overlay establishment)
|
|
- ⏳ End-to-end connectivity (pending all fixes applied)
|
|
|
|
---
|
|
|
|
## 💡 Key Learnings
|
|
|
|
- The `ip routing` fix is **critical** and must be in the startup-config for clean deployments
|
|
- Static LAG (`mode on`) is more reliable than LACP in containerized environments
|
|
- Access mode port-channels work better with simple Linux containers
|
|
- For production environments with proper bonding support, LACP can be re-enabled
|
|
|
|
---
|
|
|
|
## 🔗 Related Issues
|
|
|
|
- Spine BGP not starting: Missing `ip routing` command
|
|
- MLAG port-channels not forming: LACP incompatibility
|
|
- No MAC learning: Trunk vs Access mode mismatch
|
|
- No VXLAN tunnel endpoints: Pending overlay establishment
|
|
|
|
---
|
|
|
|
## ✅ Final Status
|
|
|
|
**Spine Fixes**: COMPLETE ✅
|
|
**MLAG Fixes**: COMPLETE ✅
|
|
**Port-Channel Access Mode**: PENDING ⏳
|
|
**Host Networking**: PENDING ⏳
|
|
**EVPN Overlay**: TESTING ⏳
|