Complete Lab Fixes - L2 and L3 VXLAN Fully Operational (#14)

## Summary

This PR merges all fixes and improvements from the troubleshooting journey to make the Arista EVPN-VXLAN lab fully operational with both L2 and L3 VXLAN connectivity.

## What's Changed

### 🎯 Major Achievements
- **L2 VXLAN fully operational** - host1 ↔ host3 connectivity verified
- **L3 VXLAN fully operational** - host2 ↔ host4 connectivity verified (VRF gold)
- **LACP bonding working** - dual-homed hosts with proper Port-Channel negotiation
- **All BGP/EVPN sessions established** - complete underlay and overlay working

### 🔧 Infrastructure Fixes

#### BGP & Routing
- Added `ip routing` command to all spine and leaf switches
- Fixed duplicate BGP network statements on leaf3, leaf4, leaf7, leaf8
- Activated EVPN neighbors on spine switches
- Added loopback network advertisements to BGP

#### MLAG Configuration
- Configured MLAG peer-link in trunk mode (not access) for VLAN 4090/4091
- Added dual-active detection via management interface
- Configured virtual router MAC for MLAG pairs

#### Switch Port Configuration
- Port-Channel1 configured in **trunk mode** on all leaf switches
- Added `switchport trunk allowed vlan` for host VLANs (34, 40, 78)
- Removed `no shutdown` from Port-Channel interfaces

### 🖥️ Host Networking - Complete Redesign

#### Image Change
- **Old:** `alpine:latest` (had bonding syntax issues)
- **New:** `ghcr.io/hellt/network-multitool` (networking tools pre-installed)

#### LACP Bonding Configuration
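Each dual-homed host builds an LACP (802.3ad) bond from eth1 and eth2 with these `exec` commands. They follow network-multitool conventions: the member links are taken down before being enslaved, and the bond is brought up last.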
```yaml
- ip link add bond0 type bond mode 802.3ad
- ip link set dev bond0 type bond xmit_hash_policy layer3+4
- ip link set dev eth1 down
- ip link set dev eth2 down
- ip link set eth1 master bond0
- ip link set eth2 master bond0
- ip link set dev eth1 up
- ip link set dev eth2 up
- ip link set dev bond0 type bond lacp_rate fast
- ip link set dev bond0 up
```
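After these commands run, you can confirm that the bond actually negotiated LACP by reading the bonding driver's status file. `check_bond` below is a hypothetical helper, not part of the lab. It assumes the standard Linux bonding status text (`Bonding Mode: IEEE 802.3ad ...`, `LACP rate: fast`, `Slave Interface: ...`) and the eth1/eth2 member names used above.

```bash
# Hypothetical helper (not part of the lab): sanity-check an 802.3ad bond.
# Argument: path to the bonding status file (default /proc/net/bonding/bond0).
# The file is read only once, so a copied file or a bash <(...) path also works.
check_bond() {
  local src status rc member
  src="${1:-/proc/net/bonding/bond0}"
  rc=0
  status=$(cat "$src") || return 2

  # Mode must be 802.3ad, otherwise the leaf pair sees two independent links.
  if ! printf '%s\n' "$status" | grep -q 'Bonding Mode: IEEE 802.3ad'; then
    echo "bond is not in 802.3ad (LACP) mode"
    rc=1
  fi

  # Should reflect the `lacp_rate fast` setting from the exec commands.
  if ! printf '%s\n' "$status" | grep -q 'LACP rate: fast'; then
    echo "LACP rate is not fast"
    rc=1
  fi

  # An all-zero partner MAC means no LACPDUs were received from the leafs.
  if printf '%s\n' "$status" | grep -q 'Partner Mac Address: 00:00:00:00:00:00'; then
    echo "no LACP partner detected"
    rc=1
  fi

  # Both host links must be members of the bond.
  for member in eth1 eth2; do
    if ! printf '%s\n' "$status" | grep -q "Slave Interface: ${member}\$"; then
      echo "${member} is not a member of the bond"
      rc=1
    fi
  done

  return "$rc"
}
```

For example, run `docker exec clab-arista-evpn-fabric-host1 cat /proc/net/bonding/bond0 > /tmp/host1-bond0 && check_bond /tmp/host1-bond0`. The helper prints nothing and returns 0 when every check passes.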

#### VLAN Configuration
- **L2 VXLAN hosts (host1, host3):** VLAN 40 tagged on bond0
- **L3 VXLAN hosts:** host2 tags VLAN 34 and host4 tags VLAN 78 on bond0
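The list above gives the VLAN each host tags but not the commands. Below is a minimal sketch of the iproute2 steps, wrapped in a hypothetical `vlan_subif_cmds` helper that only prints them. The /24 prefix lengths are an assumption, chosen to match the /24 routes in the routing strategy.

```bash
# Hypothetical helper: print (not run) the iproute2 commands that place a host
# on a tagged VLAN carried over its bond.
# Arguments: VLAN ID, address in CIDR form, optional parent (default bond0).
vlan_subif_cmds() {
  local vlan cidr parent ifname
  vlan="$1"; cidr="$2"; parent="${3:-bond0}"
  ifname="${parent}.${vlan}"

  # 802.1Q subinterface: frames leave bond0 tagged with $vlan, so the
  # leaf Port-Channel must be a trunk that allows this VLAN.
  printf 'ip link add link %s name %s type vlan id %s\n' "$parent" "$ifname" "$vlan"
  printf 'ip addr add %s dev %s\n' "$cidr" "$ifname"
  printf 'ip link set dev %s up\n' "$ifname"
}
```

For example, use `vlan_subif_cmds 40 10.40.40.101/24` for host1, `vlan_subif_cmds 34 10.34.34.102/24` for host2, and `vlan_subif_cmds 78 10.78.78.104/24` for host4. You can paste the output into the host's `exec` list or pipe it to `docker exec -i <container> sh`.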

#### Routing Strategy
- Kept management default route (172.16.0.254 via eth0)
- Added **specific routes** to the remote L3 VXLAN subnet instead of a second default route, so management traffic stays on eth0:
  - host2: `ip route add 10.78.78.0/24 via 10.34.34.1`
  - host4: `ip route add 10.34.34.0/24 via 10.78.78.1`
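Why this matters: if host2 also had a default route via 10.34.34.1, it could compete with the management default route via eth0. With only specific routes, just the remote VXLAN subnet uses the fabric. You can check the kernel's choice with `ip route get`. The sketch below uses hypothetical helpers and assumes the usual iproute2 output, where the outgoing interface follows the `dev` keyword.

```bash
# Hypothetical helper: print the interface that follows "dev" in
# `ip route get` output read from stdin.
egress_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: fail unless traffic from CONTAINER to DEST leaves
# through EXPECTED (for example bond0.34 rather than eth0).
check_egress() {
  local container dest expected dev
  container="$1"; dest="$2"; expected="$3"
  dev=$(docker exec "$container" ip route get "$dest" | egress_dev)
  if [ "$dev" = "$expected" ]; then
    echo "OK: $container reaches $dest via $dev"
  else
    echo "FAIL: $container reaches $dest via ${dev:-<no route>}, expected $expected"
    return 1
  fi
}
```

For example, run `check_egress clab-arista-evpn-fabric-host2 10.78.78.104 bond0.34` and `check_egress clab-arista-evpn-fabric-host4 10.34.34.102 bond0.78`.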

### 📁 Files Changed

#### Switch Configurations (Updated)
- `configs/spine1.cfg` - Added ip routing, EVPN activation
- `configs/spine2.cfg` - Added ip routing, EVPN activation
- `configs/leaf1.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf2.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf3.cfg` - Added ip routing, loopback advertisements, Port-Channel config
- `configs/leaf4.cfg` - Added ip routing, loopback advertisements, Port-Channel config
- `configs/leaf5.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf6.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf7.cfg` - Added ip routing, loopback advertisements, Port-Channel config
- `configs/leaf8.cfg` - Added ip routing, loopback advertisements, Port-Channel config

#### Topology (Updated)
- `evpn-lab.clab.yml` - Updated all host configurations with network-multitool image and proper LACP/VLAN setup

#### Documentation (New)
- `hosts/README.md` - Host interface configuration guide
- `hosts/host1_interfaces` - Interface file for host1 (not currently used, kept for reference)
- `hosts/host2_interfaces` - Interface file for host2 (not currently used, kept for reference)
- `hosts/host3_interfaces` - Interface file for host3 (not currently used, kept for reference)
- `hosts/host4_interfaces` - Interface file for host4 (not currently used, kept for reference)

## Testing & Verification

### L2 VXLAN (VLAN 40)
```
host1 (10.40.40.101) → host3 (10.40.40.103)
- Connectivity: VERIFIED ✓
- VXLAN tunnel: VTEP1 ↔ VTEP3
- MAC learning: Working via EVPN Type-2
```

### L3 VXLAN (VRF gold)
```
host2 (10.34.34.102) → host4 (10.78.78.104)
- Connectivity: VERIFIED ✓
- Ping results: 0% packet loss, TTL=62
- Routing: Via EVPN Type-5 through fabric
```

### Infrastructure Status
- BGP Underlay: All sessions ESTAB
- EVPN Overlay: All neighbors ESTAB
- MLAG: All 4 pairs operational
- Port-Channels: LACP negotiated on all hosts

## Related Issues

- Fixes #1 - Lab deployment and configuration fixes
- Fixes #2 - BGP EVPN neighbors stuck in Connect state
- Fixes #3 - Ready for deployment with EVPN activation
- Fixes #4 - Lab convergence in progress
- Fixes #5 - BGP EVPN neighbors stuck in Active state
- Fixes #11 - Host LACP bonding configuration
- Fixes #13 - L3 VXLAN default route issue

## Key Technical Learnings

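1. **Arista EOS has IPv4 routing disabled by default**, so `ip routing` must be configured explicitly before BGP can function
2. **MLAG peer-link must be in trunk mode** to carry VLANs 4090/4091 between the MLAG peers
3. **VLAN tagging location matters**: hosts tag frames on bond0 subinterfaces, so the leaf Port-Channels must be trunks
4. **network-multitool image avoided the bonding syntax issues** hit with Alpine and ships the needed networking tools
5. **Use specific routes instead of a second default route** when the management network already provides one
6. **`lacp_rate fast`** asks the Arista partner to send LACPDUs every second instead of every 30 seconds, so negotiation completes quickly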

## Deployment

After merging, deploy with:
```bash
cd ~/arista-evpn-vxlan-clab
sudo containerlab destroy -t evpn-lab.clab.yml --cleanup
sudo containerlab deploy -t evpn-lab.clab.yml
```

No manual post-deployment configuration needed - everything works from initial deployment!

## Breaking Changes

⚠️ **Host image changed** from `alpine:latest` to `ghcr.io/hellt/network-multitool`
⚠️ **Host configuration completely redesigned** - old exec commands replaced

## Reviewers

@Damien - Please review and merge when ready

---

**This PR represents the complete troubleshooting journey and brings the lab to production-ready status with full L2 and L3 VXLAN functionality.** 🚀

Reviewed-on: #14
Co-authored-by: Damien <damien@arnodo.fr>
Co-committed-by: Damien <damien@arnodo.fr>
This commit was merged in pull request #14.
2025-11-30 10:24:29 +00:00
committed by Damien Arnodo
parent 9502302b76
commit 1080bf07bb
23 changed files with 2632 additions and 74 deletions

END_TO_END_TESTING.md Normal file
# End-to-End Connectivity Testing Guide
## Overview
This document provides a step-by-step guide to test the EVPN VXLAN fabric after deploying the updated topology with proper VLAN tagging on hosts.
## Recent Changes
### Fixed Issues
1. **Host VLAN Tagging**
   - Hosts now create VLAN subinterfaces on top of their bonded interface (bond0)
   - Host1 & Host3: VLAN 40 tagged (L2 VXLAN test)
   - Host2: VLAN 34 tagged (L3 VXLAN test)
   - Host4: VLAN 78 tagged (L3 VXLAN test)
2. **Leaf Port-Channel Configuration**
   - All leaf Port-Channel1 interfaces are in **trunk mode**, because the hosts send tagged frames
   - `switchport trunk allowed vlan` is limited to the host VLANs (34, 40, 78)
   - MLAG provides active-active forwarding across each leaf pair
## Pre-Test Verification
### 1. Check MLAG Status on All Leaf Pairs
```bash
# Leaf Pair 1 (leaf1 & leaf2)
ssh admin@clab-arista-evpn-fabric-leaf1 "show mlag detail"
ssh admin@clab-arista-evpn-fabric-leaf2 "show mlag detail"
# Leaf Pair 2 (leaf3 & leaf4)
ssh admin@clab-arista-evpn-fabric-leaf3 "show mlag detail"
ssh admin@clab-arista-evpn-fabric-leaf4 "show mlag detail"
# Leaf Pair 3 (leaf5 & leaf6)
ssh admin@clab-arista-evpn-fabric-leaf5 "show mlag detail"
ssh admin@clab-arista-evpn-fabric-leaf6 "show mlag detail"
# Leaf Pair 4 (leaf7 & leaf8)
ssh admin@clab-arista-evpn-fabric-leaf7 "show mlag detail"
ssh admin@clab-arista-evpn-fabric-leaf8 "show mlag detail"
```
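Rather than reading eight outputs by eye, you can reduce the MLAG check to one line per leaf. This is a sketch with hypothetical helpers. It assumes the `state : Active` line that EOS prints under "MLAG Status" and the container naming used throughout this guide.

```bash
# Hypothetical helper: print the value of the "state" field from EOS
# `show mlag` text on stdin (for example "Active").
mlag_state() {
  awk -F':' '$1 ~ /^[ \t]*state[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: query every leaf and fail if any MLAG state is not Active.
check_all_mlag() {
  local rc leaf state
  rc=0
  for leaf in leaf1 leaf2 leaf3 leaf4 leaf5 leaf6 leaf7 leaf8; do
    # -n keeps ssh from consuming the script's stdin.
    state=$(ssh -n "admin@clab-arista-evpn-fabric-${leaf}" "show mlag" | mlag_state)
    printf '%-6s %s\n' "$leaf" "${state:-unknown}"
    [ "$state" = "Active" ] || rc=1
  done
  return "$rc"
}
```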
### 2. Check BGP Underlay Status
```bash
# On Spines
ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary"
ssh admin@clab-arista-evpn-fabric-spine2 "show bgp ipv4 unicast summary"
# Expected: All leaf neighbors should be in ESTABLISHED state
```
### 3. Check BGP EVPN Status
```bash
# On any leaf
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn summary"
# Expected: Both spine neighbors should be ESTABLISHED
```
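The underlay and overlay summaries can both be checked by one filter that passes only when every neighbor row shows `Estab`. This is a sketch with a hypothetical helper. It assumes that each neighbor row in the EOS summary table starts with the neighbor's IPv4 address.

```bash
# Hypothetical helper: read an EOS `show bgp ... summary` table on stdin and
# succeed only if every neighbor row is Established ("Estab").
# Assumes the neighbor IPv4 address is the first column of each row.
all_neighbors_estab() {
  awk '
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      total++
      if ($0 ~ /Estab/) up++
      else print "not established: " $1
    }
    END {
      printf "%d/%d neighbors Established\n", up, total
      if (total > 0 && up == total) exit 0
      exit 1
    }'
}
```

For the underlay, pipe `ssh -n admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary"` into `all_neighbors_estab` on each spine. For the overlay, pipe `show bgp evpn summary` from any leaf the same way. An empty table counts as a failure.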
## L2 VXLAN Testing (VLAN 40)
### Hosts Involved
- **Host1** (10.40.40.101) - Connected to Leaf1/Leaf2 (VTEP1)
- **Host3** (10.40.40.103) - Connected to Leaf5/Leaf6 (VTEP3)
### Test Sequence
#### Step 1: Verify Host Network Interfaces
```bash
# Check host1 VLAN interface
docker exec clab-arista-evpn-fabric-host1 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host1 ip addr show bond0.40
# Check host3 VLAN interface
docker exec clab-arista-evpn-fabric-host3 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host3 ip addr show bond0.40
```
#### Step 2: Verify Leaf Port-Channel Configuration
```bash
# Leaf1 Port-Channel1
ssh admin@clab-arista-evpn-fabric-leaf1 "show interface Port-Channel1 switchport"
# Expected output (trunk mode, since the hosts tag VLAN 40 themselves):
# Administrative Mode: trunk
# Trunking VLANs Enabled: list includes 40
# Spanning Tree Portfast: enabled
```
#### Step 3: Test L2 Connectivity (Ping Test)
```bash
echo "=== L2 VXLAN Ping Test (Host1 → Host3) ==="
timeout 10 docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103
```
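The echoed exit status above only shows whether `ping` succeeded. To gate a script on the actual loss figure, parse ping's summary line. This is a sketch with hypothetical helpers. It assumes the Linux ping summary format `N packets transmitted, N received, X% packet loss`.

```bash
# Hypothetical helper: print the packet-loss percentage from ping's summary
# line on stdin (e.g. "4 packets transmitted, 4 received, 0% packet loss").
ping_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: ping DEST from CONTAINER and pass only on 0% loss.
ping_check() {
  local container dest loss
  container="$1"; dest="$2"
  loss=$(timeout 10 docker exec "$container" ping -c 4 "$dest" | ping_loss)
  if [ "$loss" = "0" ]; then
    echo "PASS: $container -> $dest (0% packet loss)"
  else
    echo "FAIL: $container -> $dest (packet loss: ${loss:-?}%)"
    return 1
  fi
}
```

For the L2 test, run `ping_check clab-arista-evpn-fabric-host1 10.40.40.103`.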
#### Step 4: Verify MAC Learning
```bash
# On Leaf1 - check local MAC learning
ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"
# Expected: MAC from host1 should appear on Port-Channel1
# On Leaf5 - check MAC learning
ssh admin@clab-arista-evpn-fabric-leaf5 "show mac address-table vlan 40"
# Expected: MAC from host3 should appear on Port-Channel1
```
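To check for a specific host's MAC rather than scanning the table, read the MAC from the host and convert it to the dotted notation EOS uses in MAC tables. This is a sketch with hypothetical helpers. It assumes the VLAN subinterface is named `bond0.<vlan>`, as in Step 1.

```bash
# Hypothetical helper: convert a Linux MAC (aa:bb:cc:dd:ee:ff) to the dotted
# form EOS prints in MAC tables (aabb.ccdd.eeff).
to_eos_mac() {
  printf '%s\n' "$1" | tr -d ':' | tr '[:upper:]' '[:lower:]' \
    | sed 's/^\(....\)\(....\)\(....\)$/\1.\2.\3/'
}

# Hypothetical helper: look up HOST's bond0.VLAN MAC in LEAF's MAC table.
# Prints the matching row (its port shows where the MAC was learned);
# the exit status is grep's, so it fails when the MAC is absent.
mac_row() {
  local host leaf vlan mac
  host="$1"; leaf="$2"; vlan="$3"
  mac=$(docker exec "clab-arista-evpn-fabric-${host}" cat "/sys/class/net/bond0.${vlan}/address") || return 2
  mac=$(to_eos_mac "$mac")
  ssh -n "admin@clab-arista-evpn-fabric-${leaf}" "show mac address-table vlan ${vlan}" \
    | grep -i -- "$mac"
}
```

`mac_row host1 leaf1 40` should show Port-Channel1. `mac_row host3 leaf1 40` should show the remote MAC on Vxlan1, as listed in the expected results table.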
#### Step 5: Verify VXLAN Learning
```bash
# Check remote VXLAN endpoints
ssh admin@clab-arista-evpn-fabric-leaf1 "show vxlan vtep"
# Expected: Should show VTEP3 (10.0.255.13)
# Check VXLAN address table
ssh admin@clab-arista-evpn-fabric-leaf1 "show vxlan address-table"
# Expected: Should show MACs learned via Vxlan1 interface
```
#### Step 6: Verify EVPN Type-2 Routes
```bash
# Check BGP EVPN routes on Leaf1
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn route-type mac-ip"
# Expected:
# - Local MAC (host1) with RD 65001:110040
# - Remote MAC (host3) with RD 65003:110040 pointing to VTEP 10.0.255.13
```
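Steps 5 and 6 both reduce to "does this token appear in the output". The sketch below uses hypothetical helpers and matches the VTEP address and the remote RD as whole, fixed strings. That way the dots in 10.0.255.13 are not regex wildcards, and 10.0.255.130 does not count as a match.

```bash
# Hypothetical helper: succeed if stdin contains TOKEN as a whole word
# (fixed string match, not a regular expression).
has_token() {
  grep -qwF -- "$1"
}

# Hypothetical helper: on leaf1, check that VTEP3 is known and that
# Type-2 routes with VTEP3's RD were received.
check_remote_vtep3() {
  local leaf rc
  leaf="admin@clab-arista-evpn-fabric-leaf1"
  rc=0
  if ssh -n "$leaf" "show vxlan vtep" | has_token 10.0.255.13; then
    echo "OK: VTEP3 10.0.255.13 is a known remote VTEP"
  else
    echo "FAIL: VTEP3 10.0.255.13 not listed"
    rc=1
  fi
  if ssh -n "$leaf" "show bgp evpn route-type mac-ip" | has_token 65003:110040; then
    echo "OK: Type-2 routes with RD 65003:110040 received"
  else
    echo "FAIL: no Type-2 routes with RD 65003:110040"
    rc=1
  fi
  return "$rc"
}
```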
## L3 VXLAN Testing (VRF gold)
### Hosts Involved
- **Host2** (10.34.34.102) - Connected to Leaf3/Leaf4 (VTEP2) in VRF gold VLAN 34
- **Host4** (10.78.78.104) - Connected to Leaf7/Leaf8 (VTEP4) in VRF gold VLAN 78
### Test Sequence
#### Step 1: Verify Host Network Interfaces
```bash
# Check host2 VLAN interface
docker exec clab-arista-evpn-fabric-host2 ip -d link show bond0.34
docker exec clab-arista-evpn-fabric-host2 ip addr show bond0.34
# Check host4 VLAN interface
docker exec clab-arista-evpn-fabric-host4 ip -d link show bond0.78
docker exec clab-arista-evpn-fabric-host4 ip addr show bond0.78
```
#### Step 2: Verify Leaf VRF VLAN Configuration
```bash
# On Leaf3
ssh admin@clab-arista-evpn-fabric-leaf3 "show vlan 34"
ssh admin@clab-arista-evpn-fabric-leaf3 "show interface Vlan34"
# Expected:
# - VLAN 34 exists
# - Vlan34 interface is in VRF gold with IP 10.34.34.2/24
# - Virtual router address 10.34.34.1 is configured
```
#### Step 3: Test L3 Connectivity (Ping Test)
```bash
echo "=== L3 VXLAN Ping Test (Host2 → Host4) ==="
timeout 10 docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104
```
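The PR reports TTL=62 for this test. Assuming both hosts use the Linux default TTL of 64, that value is consistent with each reply being routed twice: once by the leaf7/leaf8 pair out of VLAN 78 and once by the leaf3/leaf4 pair into VLAN 34. A sketch with hypothetical helpers that extracts the reply TTL so it can be compared:

```bash
# Hypothetical helper: print the TTL of the first echo reply in ping output on stdin.
first_reply_ttl() {
  sed -n 's/.*ttl=\([0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: ping host4 from host2 and expect a reply TTL of 62
# (64 minus two routed hops, under the default-TTL assumption above).
check_l3_ttl() {
  local ttl
  ttl=$(timeout 10 docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104 | first_reply_ttl)
  echo "reply TTL: ${ttl:-none}"
  [ "$ttl" = "62" ]
}
```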
#### Step 4: Verify VRF Routing Tables
```bash
# On Leaf3 - check routes in VRF gold
ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"
# Expected: Should include routes to 10.34.34.0/24 and 10.78.78.0/24
# On Leaf4
ssh admin@clab-arista-evpn-fabric-leaf4 "show ip route vrf gold"
```
#### Step 5: Verify EVPN Type-5 Routes
```bash
# Check BGP EVPN routes on Leaf3
ssh admin@clab-arista-evpn-fabric-leaf3 "show bgp evpn route-type ip-prefix ipv4"
# Expected:
# - Local subnets (10.34.34.0/24 from Leaf3/Leaf4)
# - Remote subnets (10.78.78.0/24 from Leaf7/Leaf8)
```
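Steps 4 and 5 check for the same two subnets in different tables, so one filter covers both. This is a sketch with a hypothetical helper that matches each prefix as a whole, fixed string.

```bash
# Hypothetical helper: succeed only if every PREFIX argument appears as a
# whole token in the table read from stdin.
has_prefixes() {
  local table prefix rc
  table=$(cat)
  rc=0
  for prefix in "$@"; do
    if printf '%s\n' "$table" | grep -qwF -- "$prefix"; then
      echo "present: $prefix"
    else
      echo "missing: $prefix"
      rc=1
    fi
  done
  return "$rc"
}
```

For Step 4, pipe `ssh -n admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"` into `has_prefixes 10.34.34.0/24 10.78.78.0/24`. For Step 5, do the same with `show bgp evpn route-type ip-prefix ipv4`.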
## Complete End-to-End Test Script
```bash
#!/bin/bash
echo "======================================"
echo "EVPN VXLAN Fabric Testing"
echo "======================================"
# 1. Underlay connectivity
echo ""
echo "=== Testing Underlay BGP ==="
ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary" | tail -20
# 2. EVPN overlay connectivity
echo ""
echo "=== Testing EVPN Overlay ==="
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn summary" | tail -5
# 3. L2 VXLAN connectivity
echo ""
echo "=== Testing L2 VXLAN (Host1 → Host3) ==="
timeout 10 docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103
echo "Status: $?"
# 4. L3 VXLAN connectivity
echo ""
echo "=== Testing L3 VXLAN (Host2 → Host4) ==="
timeout 10 docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104
echo "Status: $?"
# 5. MAC learning verification
echo ""
echo "=== Verifying MAC Learning ==="
echo "Leaf1 VLAN 40:"
ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"
echo ""
echo "Leaf5 VLAN 40:"
ssh admin@clab-arista-evpn-fabric-leaf5 "show mac address-table vlan 40"
# 6. VRF routing verification
echo ""
echo "=== Verifying VRF Routing ==="
echo "Leaf3 VRF gold routes:"
ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"
```
## Troubleshooting
### Ping fails - Hosts can't reach each other
1. **Check host connectivity to leaf:**
```bash
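docker exec clab-arista-evpn-fabric-host1 ip route
# Should show 10.40.40.0/24 on bond0.40; the default route stays on eth0 (management network)
docker exec clab-arista-evpn-fabric-host2 ip route
# Should also show 10.78.78.0/24 via 10.34.34.1
docker exec clab-arista-evpn-fabric-host2 ping -c 2 10.34.34.1
# Should reach the leaf3/leaf4 virtual router gateway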
```
2. **Check leaf port-channel status:**
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show interface Port-Channel1"
# Should show "Port-Channel1 is up, line protocol is up"
```
3. **Check VXLAN interface status:**
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show interface Vxlan1"
# Should show "Vxlan1 is up, line protocol is up"
```
4. **Check MLAG status:**
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show mlag detail"
# Should show "state: Active" and "negotiation status: Connected"
```
### Empty MAC table on leafs
1. **Verify host is sending traffic:**
```bash
docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.1
# Generate some ARP/ICMP traffic
```
2. **Check for spanning-tree blocking:**
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show spanning-tree detail vlan 40"
```
### No EVPN routes exchanged
1. **Check BGP EVPN session state:**
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn summary"
# Must show ESTABLISHED, not Connect or Active
```
2. **Check EVPN configuration:**
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show running-config section router bgp"
# Look for the rd and route-target lines under each vlan and vrf block
```
## Expected Results
| Test | Expected Outcome | Status |
|------|------------------|--------|
| Spine BGP | All leaves established | ✓ Expected |
| Leaf BGP | All spines established | ✓ Expected |
| EVPN neighbors | Established with spines | ✓ Expected |
| L2 ping (Host1→Host3) | 4/4 packets successful | ✓ Expected |
| L3 ping (Host2→Host4) | 4/4 packets successful | ✓ Expected |
| MAC learning | MACs learned on Vxlan1 | ✓ Expected |
| EVPN Type-2 | Routes learned for MACs | ✓ Expected |
| EVPN Type-5 | Routes learned for subnets | ✓ Expected |
---
## Lab Deployment Steps
To deploy the lab with the fixes:
```bash
cd ~/arista-evpn-vxlan-clab
git checkout fix-bgp-and-mlag
sudo containerlab destroy -t evpn-lab.clab.yml
sudo containerlab deploy -t evpn-lab.clab.yml
```
The lab should now have:
- Proper VLAN tagging on all hosts
- Correct VXLAN VTEP configuration
- Working BGP EVPN overlay
- End-to-end connectivity between remote VTEPs