## Summary

This PR merges all fixes and improvements from the troubleshooting journey to make the Arista EVPN-VXLAN lab fully operational with both L2 and L3 VXLAN connectivity.

## What's Changed

### 🎯 Major Achievements

- ✅ **L2 VXLAN fully operational** - host1 ↔ host3 connectivity verified
- ✅ **L3 VXLAN fully operational** - host2 ↔ host4 connectivity verified (VRF gold)
- ✅ **LACP bonding working** - dual-homed hosts with proper Port-Channel negotiation
- ✅ **All BGP/EVPN sessions established** - complete underlay and overlay working

### 🔧 Infrastructure Fixes

#### BGP & Routing

- Added `ip routing` command to all spine and leaf switches
- Fixed duplicate BGP network statements on leaf3, leaf4, leaf7, leaf8
- Activated EVPN neighbors on spine switches
- Added loopback network advertisements to BGP

#### MLAG Configuration

- Configured MLAG peer-link in trunk mode (not access) for VLAN 4090/4091
- Added dual-active detection via management interface
- Configured virtual router MAC for MLAG pairs

#### Switch Port Configuration

- Port-Channel1 configured in **trunk mode** on all leaf switches
- Added `switchport trunk allowed vlan` for host VLANs (34, 40, 78)
- Removed `no shutdown` from Port-Channel interfaces

### 🖥️ Host Networking - Complete Redesign

#### Image Change

- **Old:** `alpine:latest` (had bonding syntax issues)
- **New:** `ghcr.io/hellt/network-multitool` (networking tools pre-installed)

#### LACP Bonding Configuration

Proper LACP setup following network-multitool best practices:

```yaml
- ip link add bond0 type bond mode 802.3ad
- ip link set dev bond0 type bond xmit_hash_policy layer3+4
- ip link set dev eth1 down
- ip link set dev eth2 down
- ip link set eth1 master bond0
- ip link set eth2 master bond0
- ip link set dev eth1 up
- ip link set dev eth2 up
- ip link set dev bond0 type bond lacp_rate fast
- ip link set dev bond0 up
```

#### VLAN Configuration

- **L2 VXLAN hosts (host1, host3):** VLAN 40 tagged on bond0
- **L3 VXLAN hosts (host2, host4):** VLANs 34 and 78 tagged on bond0

#### Routing Strategy

- Kept management default route (172.16.0.254 via eth0)
- Added **specific routes** for L3 VXLAN networks instead of default routes:
  - host2: `ip route add 10.78.78.0/24 via 10.34.34.1`
  - host4: `ip route add 10.34.34.0/24 via 10.78.78.1`

### 📁 Files Changed

#### Switch Configurations (Updated)

- `configs/spine1.cfg` - Added ip routing, EVPN activation
- `configs/spine2.cfg` - Added ip routing, EVPN activation
- `configs/leaf1.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf2.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf3.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf4.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf5.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf6.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf7.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf8.cfg` - Added ip routing, loopback ads, Port-Channel config

#### Topology (Updated)

- `evpn-lab.clab.yml` - Updated all host configurations with network-multitool image and proper LACP/VLAN setup

#### Documentation (New)

- `hosts/README.md` - Host interface configuration guide
- `hosts/host1_interfaces` - Interface file for host1 (not currently used, kept for reference)
- `hosts/host2_interfaces` - Interface file for host2 (not currently used, kept for reference)
- `hosts/host3_interfaces` - Interface file for host3 (not currently used, kept for reference)
- `hosts/host4_interfaces` - Interface file for host4 (not currently used, kept for reference)

## Testing & Verification

### ✅ L2 VXLAN (VLAN 40)

```
host1 (10.40.40.101) → host3 (10.40.40.103)
- Connectivity: VERIFIED ✓
- VXLAN tunnel: VTEP1 ↔ VTEP3
- MAC learning: Working via EVPN Type-2
```

### ✅ L3 VXLAN (VRF gold)

```
host2 (10.34.34.102) → host4 (10.78.78.104)
- Connectivity: VERIFIED ✓
- Ping results: 0% packet loss, TTL=62
- Routing: Via EVPN Type-5 through fabric
```

### ✅ Infrastructure Status

- BGP Underlay: All sessions ESTAB
- EVPN Overlay: All neighbors ESTAB
- MLAG: All 4 pairs operational
- Port-Channels: LACP negotiated on all hosts

## Related Issues

Fixes #1 - Lab deployment and configuration fixes
Fixes #2 - BGP EVPN neighbors stuck in Connect state
Fixes #3 - Ready for deployment with EVPN activation
Fixes #4 - Lab convergence in progress
Fixes #5 - BGP EVPN neighbors stuck in Active state
Fixes #11 - Host LACP bonding configuration
Fixes #13 - L3 VXLAN default route issue

## Key Technical Learnings

1. **Arista EOS requires explicit `ip routing`** before BGP can function
2. **MLAG peer-link must be trunk mode** to allow VLAN 4090/4091 traversal
3. **VLAN tagging location matters** - hosts tag, switches use trunk mode
4. **network-multitool image** superior to Alpine for LACP bonding
5. **Specific routes better than default routes** when management network present
6. **LACP rate fast** ensures quick negotiation with Arista switches

## Deployment

After merging, deploy with:

```bash
cd ~/arista-evpn-vxlan-clab
sudo containerlab destroy -t evpn-lab.clab.yml --cleanup
sudo containerlab deploy -t evpn-lab.clab.yml
```

No manual post-deployment configuration needed - everything works from initial deployment!

## Breaking Changes

⚠️ **Host image changed** from `alpine:latest` to `ghcr.io/hellt/network-multitool`
⚠️ **Host configuration completely redesigned** - old exec commands replaced

## Reviewers

@Damien - Please review and merge when ready

---

**This PR represents the complete troubleshooting journey and brings the lab to production-ready status with full L2 and L3 VXLAN functionality.** 🚀

Reviewed-on: #14
Co-authored-by: Damien <damien@arnodo.fr>
Co-committed-by: Damien <damien@arnodo.fr>
Deployment & Testing Checklist
✅ What Was Fixed
- Host VLAN tagging configuration in topology file
- All 4 hosts now create VLAN subinterfaces (bond0.XX)
- Leaf port-channels properly configured for trunk mode (hosts send tagged frames on bond0.XX)
- BGP configuration in leafs includes `ip routing` command
- MLAG configurations validated on all 4 leaf pairs
- VXLAN VTEP configuration in place
- EVPN overlay configuration complete
🚀 Deployment Steps
1. Check Current Branch
cd ~/arista-evpn-vxlan-clab
git branch
git status
Should show: fix-bgp-and-mlag branch
2. Destroy Current Lab (if running)
sudo containerlab destroy -t evpn-lab.clab.yml --cleanup
3. Deploy Fixed Lab
sudo containerlab deploy -t evpn-lab.clab.yml
# Wait 60-90 seconds for all containers to start
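Instead of sleeping for a fixed 60-90 seconds, the wait can be expressed as a poll. The following is a minimal sketch; `wait_for` is a hypothetical helper that is not part of this repository, and the 90-second budget simply mirrors the upper bound given above.

```bash
#!/usr/bin/env bash
# wait_for TIMEOUT INTERVAL CMD...
# Hypothetical helper: rerun CMD until it succeeds or TIMEOUT seconds elapse.
# Returns 0 on the first success, 1 if the time budget runs out.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Example (assumes the same SSH access used by the checks in this document):
# wait_for 90 5 ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary"
```

The command only proves that the node answers over SSH; it says nothing about BGP state, which the checks below cover.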
4. Verify Lab is Running
sudo containerlab inspect -t evpn-lab.clab.yml
Should show all 14 nodes (2 spines + 8 leaves + 4 hosts) as RUNNING
📋 Pre-Testing Checks (Run in Order)
Check 1: Spine BGP Underlay
ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary"
Expected: All 8 leaf neighbors in ESTABLISHED state
10.0.1.1 4 65001 22 18 Estab 3
10.0.1.3 4 65001 20 17 Estab 3
10.0.1.5 4 65002 19 18 Estab 0 ← Check this, should be 0 or more
...
Status: ☐ Pass / ☐ Fail
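Counting established neighbors by eye is error-prone with 8 leaves, so the check can be scripted. This is a sketch under the assumption that each neighbor row starts with the neighbor's IPv4 address and carries the state as the literal field `Estab`, as in the excerpt above; `count_estab` is a hypothetical name.

```bash
#!/usr/bin/env bash
# count_estab: read a BGP summary on stdin and print "<established>/<total>".
# Assumption: neighbor rows begin with an IPv4 address and show "Estab"
# as a separate whitespace-delimited field when the session is up.
count_estab() {
  awk '
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      total++
      for (i = 2; i <= NF; i++) {
        if ($i == "Estab") { estab++; break }
      }
    }
    END { printf "%d/%d\n", estab, total }
  '
}

# Example:
# ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary" | count_estab
```

On a healthy spine the result would be expected to read `8/8`; the same function can be pointed at the EVPN summary in Check 3.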
Check 2: Leaf MLAG Status
ssh admin@clab-arista-evpn-fabric-leaf1 "show mlag detail"
ssh admin@clab-arista-evpn-fabric-leaf3 "show mlag detail"
Expected: All pairs show MLAG is active
MLAG is active
Active per VLAN: yes
Status: ☐ Pass / ☐ Fail
Check 3: Leaf BGP EVPN
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn summary"
Expected: Both spine neighbors in ESTABLISHED
10.0.250.1 4 65000 8 9 Estab 0
10.0.250.2 4 65000 8 8 Estab 0
Status: ☐ Pass / ☐ Fail
Check 4: Host VLAN Interfaces
docker exec clab-arista-evpn-fabric-host1 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host2 ip -d link show bond0.34
docker exec clab-arista-evpn-fabric-host3 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host4 ip -d link show bond0.78
Expected: All show VLAN tagging
vlan protocol 802.1Q id 40 <BROADCAST,MULTICAST,UP,LOWER_UP>
Status: ☐ Pass / ☐ Fail
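The four commands above can be folded into one loop that compares the VLAN ID reported by the kernel with the one expected for each host. A minimal sketch, assuming the `vlan protocol 802.1Q id N` wording shown in the expected output; `vlan_id` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# vlan_id: read `ip -d link show` output on stdin and print the 802.1Q VLAN ID.
# Prints nothing when no VLAN line is present.
vlan_id() {
  sed -n 's/.*vlan protocol 802\.1Q id \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example loop over the host/VLAN pairs from this check:
# for pair in host1:40 host2:34 host3:40 host4:78; do
#   host=${pair%%:*}; vid=${pair##*:}
#   got=$(docker exec "clab-arista-evpn-fabric-$host" ip -d link show "bond0.$vid" | vlan_id)
#   if [ "$got" = "$vid" ]; then echo "$host: PASS"; else echo "$host: FAIL"; fi
# done
```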
🧪 Connectivity Tests
Test 1: Host to Gateway (VLAN40)
docker exec clab-arista-evpn-fabric-host1 ping -c 2 10.40.40.1
docker exec clab-arista-evpn-fabric-host3 ping -c 2 10.40.40.1
Expected: 2/2 packets successful
Status: ☐ Pass / ☐ Fail
Time: ~5 seconds
Test 2: L2 VXLAN Connectivity (Host1 → Host3)
docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103
Expected: 4/4 packets successful
PING 10.40.40.103 (10.40.40.103): 56 data bytes
64 bytes from 10.40.40.103: seq=0 ttl=64 time=X.XXms
Status: ☐ Pass / ☐ Fail
Time: ~10 seconds
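To turn "4/4 packets successful" into a scripted pass/fail, the loss percentage can be read from the ping summary line. This sketch assumes the `N% packet loss` wording printed by both iputils and BusyBox ping; `packet_loss` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# packet_loss: read ping output on stdin and print the loss percentage
# (without the % sign). Prints nothing if no summary line is found.
packet_loss() {
  sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p' | head -n 1
}

# Example:
# loss=$(docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103 | packet_loss)
# if [ "$loss" = "0" ]; then echo "L2 VXLAN: PASS"; else echo "L2 VXLAN: FAIL (loss=${loss:-unknown}%)"; fi
```

The same helper applies unchanged to the gateway and L3 VXLAN pings in Tests 1, 6 and 7.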
Test 3: MAC Learning on Leaf1
ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"
Expected: At least 1 MAC learned
Vlan Mac Address Type Ports
40 XXXX.XXXX.XXXX DYNAMIC Po1
Status: ☐ Pass / ☐ Fail
Test 4: Remote MAC Learning via VXLAN
ssh admin@clab-arista-evpn-fabric-leaf1 "show vxlan address-table vlan 40"
Expected: MAC from host3 learned via Vxlan1
VLAN Mac Address Type Prt VTEP
40 XXXX.XXXX.XXXX EVPN Vx1 10.0.255.13
Status: ☐ Pass / ☐ Fail
Test 5: EVPN Type-2 Routes
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn route-type mac-ip | head -20"
Expected: Both local and remote MACs advertised
RD: 65001:110040 mac-ip XXXX.XXXX.XXXX
- -
RD: 65003:110040 mac-ip XXXX.XXXX.XXXX
10.0.255.13
Status: ☐ Pass / ☐ Fail
Test 6: Host to Gateway (VLAN34)
docker exec clab-arista-evpn-fabric-host2 ping -c 2 10.34.34.1
Expected: 2/2 packets successful
Status: ☐ Pass / ☐ Fail
Time: ~5 seconds
Test 7: L3 VXLAN Connectivity (Host2 → Host4)
docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104
Expected: 4/4 packets successful
Status: ☐ Pass / ☐ Fail
Time: ~10 seconds
Test 8: VRF Routing on Leaf3
ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"
Expected: Routes to both 10.34.34.0/24 and 10.78.78.0/24
C 10.34.34.0/24 is directly connected, Vlan34
B E 10.78.78.0/24 [200/0] via VTEP 10.0.255.14
Status: ☐ Pass / ☐ Fail
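The "both prefixes present" expectation can be checked mechanically. A minimal sketch: `has_prefixes` is a hypothetical helper that only tests whether each prefix string appears somewhere in the table; it does not verify the route type or next hop.

```bash
#!/usr/bin/env bash
# has_prefixes PREFIX...
# Read a routing table on stdin; succeed only if every PREFIX appears in it.
# The first missing prefix is reported on stderr.
has_prefixes() {
  local table prefix
  table=$(cat)
  for prefix in "$@"; do
    if ! grep -qwF -- "$prefix" <<<"$table"; then
      echo "missing: $prefix" >&2
      return 1
    fi
  done
}

# Example:
# ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold" \
#   | has_prefixes 10.34.34.0/24 10.78.78.0/24 && echo "VRF gold: PASS"
```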
Test 9: EVPN Type-5 Routes
ssh admin@clab-arista-evpn-fabric-leaf3 "show bgp evpn route-type ip-prefix ipv4"
Expected: IP prefixes for both VTEPs
RD: 10.0.250.13:1 ip-prefix 10.34.34.0/24
RD: 10.0.250.17:1 ip-prefix 10.78.78.0/24
Status: ☐ Pass / ☐ Fail
📊 Summary Table
| Component | Check | Expected | Actual | Status |
|---|---|---|---|---|
| Spine BGP | All leaves established | 8/8 ESTAB | ? | ☐ |
| Leaf MLAG | Pair status | active/active | ? | ☐ |
| EVPN | Spine peers | 2/2 ESTAB | ? | ☐ |
| Host Interfaces | VLAN tags | 4 VLAN ifaces | ? | ☐ |
| L2 Gateway | Ping host→gw | 2/2 success | ? | ☐ |
| L2 VXLAN | Host1→Host3 | 4/4 success | ? | ☐ |
| MAC Learning | Leaf1 VLAN40 | ≥1 MAC | ? | ☐ |
| Remote MACs | VXLAN table | MACs from Vx1 | ? | ☐ |
| Type-2 Routes | EVPN MACs | Local + Remote | ? | ☐ |
| L3 Gateway | Ping host→gw | 2/2 success | ? | ☐ |
| L3 VXLAN | Host2→Host4 | 4/4 success | ? | ☐ |
| VRF Routes | Leaf3 VRF gold | 2+ routes | ? | ☐ |
| Type-5 Routes | EVPN prefixes | Local + Remote | ? | ☐ |
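The Status column can also be filled in by a small tally script. The sketch below is one possible shape, not something shipped in the repository: `run_check` is a hypothetical helper that treats a zero exit status as a pass.

```bash
#!/usr/bin/env bash
pass=0
fail=0

# run_check NAME CMD...
# Run CMD quietly, print one result line, and update the pass/fail tallies.
run_check() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    pass=$((pass + 1))
    printf '%-16s PASS\n' "$name"
  else
    fail=$((fail + 1))
    printf '%-16s FAIL\n' "$name"
  fi
}

# Example rows, using commands from the tests above:
# run_check "L2 Gateway" docker exec clab-arista-evpn-fabric-host1 ping -c 2 10.40.40.1
# run_check "L2 VXLAN"   docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103
# run_check "L3 VXLAN"   docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104
# echo "passed=$pass failed=$fail"
```

A zero exit status from `ping` does not by itself prove that every packet was answered, so rows marked PASS this way are a first pass; the packet counts in the Expected column still need to be confirmed from the ping output.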
🔧 If Tests Fail
L2 ping fails
# 1. Check host VLAN interface
docker exec clab-arista-evpn-fabric-host1 ip addr show bond0.40
# Should show: inet 10.40.40.101/24 dev bond0.40
# 2. Check port-channel status
ssh admin@clab-arista-evpn-fabric-leaf1 "show interface Port-Channel1"
# Should show: up, up
# 3. Check VLAN 40 exists on leaf
ssh admin@clab-arista-evpn-fabric-leaf1 "show vlan 40"
# Should show: VLAN 40 exists
# 4. Check MAC learning (generate traffic)
docker exec clab-arista-evpn-fabric-host1 arping -c 3 10.40.40.1
ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"
# Should show host1 MAC
L3 ping fails
# 1. Check VRF VLAN interface
ssh admin@clab-arista-evpn-fabric-leaf3 "show interface Vlan34"
# Should show: up, up
# 2. Check VRF routing enabled
ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"
# Should show routes
# 3. Check VXLAN VRF mapping
ssh admin@clab-arista-evpn-fabric-leaf3 "show interface Vxlan1"
# Should show: vxlan vrf gold vni 100001
📝 Notes for Next Steps
- If all tests pass ✅
  - Create pull request to merge `fix-bgp-and-mlag` into `main`
  - Document the changes in FIXES_APPLIED.md
  - Update main branch documentation
- If specific tests fail ⚠️
  - Review the troubleshooting section above
  - Check device logs: `show log`
  - Review configuration with `show running-config`
- Keep for reference
  - END_TO_END_TESTING.md - Comprehensive testing guide
  - VLAN_TAGGING_FIX_EXPLANATION.md - Explains the root cause and fix
🎯 Success Criteria
Lab is ready for production use when:
- ✓ All pre-testing checks pass
- ✓ All 9 connectivity tests pass
- ✓ No errors in device logs
- ✓ MLAG is active/active on all pairs
- ✓ BGP neighbors all established
- ✓ EVPN routes being advertised