arista-evpn-vxlan-clab/TESTING_CHECKLIST.md
Damien 1080bf07bb Complete Lab Fixes - L2 and L3 VXLAN Fully Operational (#14)
## Summary

This PR merges all fixes and improvements from the troubleshooting journey to make the Arista EVPN-VXLAN lab fully operational with both L2 and L3 VXLAN connectivity.

## What's Changed

### 🎯 Major Achievements
- **L2 VXLAN fully operational** - host1 ↔ host3 connectivity verified
- **L3 VXLAN fully operational** - host2 ↔ host4 connectivity verified (VRF gold)
- **LACP bonding working** - dual-homed hosts with proper Port-Channel negotiation
- **All BGP/EVPN sessions established** - complete underlay and overlay working

### 🔧 Infrastructure Fixes

#### BGP & Routing
- Added `ip routing` command to all spine and leaf switches
- Fixed duplicate BGP network statements on leaf3, leaf4, leaf7, leaf8
- Activated EVPN neighbors on spine switches
- Added loopback network advertisements to BGP

#### MLAG Configuration
- Configured MLAG peer-link in trunk mode (not access) for VLAN 4090/4091
- Added dual-active detection via management interface
- Configured virtual router MAC for MLAG pairs

#### Switch Port Configuration
- Port-Channel1 configured in **trunk mode** on all leaf switches
- Added `switchport trunk allowed vlan` for host VLANs (34, 40, 78)
- Removed `no shutdown` from Port-Channel interfaces

### 🖥️ Host Networking - Complete Redesign

#### Image Change
- **Old:** `alpine:latest` (had bonding syntax issues)
- **New:** `ghcr.io/hellt/network-multitool` (networking tools pre-installed)

#### LACP Bonding Configuration
Proper LACP setup following network-multitool best practices:
```yaml
- ip link add bond0 type bond mode 802.3ad
- ip link set dev bond0 type bond xmit_hash_policy layer3+4
- ip link set dev eth1 down
- ip link set dev eth2 down
- ip link set eth1 master bond0
- ip link set eth2 master bond0
- ip link set dev eth1 up
- ip link set dev eth2 up
- ip link set dev bond0 type bond lacp_rate fast
- ip link set dev bond0 up
```
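One way to confirm that the bond actually came up in LACP mode is to inspect `/proc/net/bonding/bond0` inside the host container. The helper below is a minimal sketch, not part of the lab repository: `bond_is_lacp` is a hypothetical name, and it assumes the standard Linux bonding driver report format (`Bonding Mode:` and `Slave Interface:` lines).

```bash
# Hypothetical helper: reads the contents of /proc/net/bonding/bond0 on stdin.
# Succeeds only if the bond reports 802.3ad (LACP) mode and has at least
# two member links attached.
bond_is_lacp() {
    awk '
        /^Bonding Mode: IEEE 802.3ad/ { lacp = 1 }
        /^Slave Interface:/           { members++ }
        END { exit !(lacp && members >= 2) }
    '
}
```

Possible usage: `docker exec clab-arista-evpn-fabric-host1 cat /proc/net/bonding/bond0 | bond_is_lacp && echo "bond0 is LACP with 2+ members"`.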

#### VLAN Configuration
- **L2 VXLAN hosts (host1, host3):** VLAN 40 tagged on bond0
- **L3 VXLAN hosts (host2, host4):** VLANs 34 and 78 tagged on bond0

#### Routing Strategy
- Kept management default route (172.16.0.254 via eth0)
- Added **specific routes** for L3 VXLAN networks instead of default routes:
  - host2: `ip route add 10.78.78.0/24 via 10.34.34.1`
  - host4: `ip route add 10.34.34.0/24 via 10.78.78.1`
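
The specific routes can be checked from the host's routing table. This is a small sketch under the assumption that `ip route show` prints lines of the form `PREFIX via GATEWAY dev IFACE`; `has_route` is an illustrative name, not something defined in the repository.

```bash
# Hypothetical helper: reads `ip route show` output on stdin and succeeds
# if a route for prefix $1 via gateway $2 is present.
has_route() {
    awk -v p="$1" -v g="$2" '
        $1 == p && $2 == "via" && $3 == g { found = 1 }
        END { exit !found }
    '
}
```

Possible usage on host2: `docker exec clab-arista-evpn-fabric-host2 ip route show | has_route 10.78.78.0/24 10.34.34.1 && echo "route present"`.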

### 📁 Files Changed

#### Switch Configurations (Updated)
- `configs/spine1.cfg` - Added ip routing, EVPN activation
- `configs/spine2.cfg` - Added ip routing, EVPN activation
- `configs/leaf1.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf2.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf3.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf4.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf5.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf6.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf7.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf8.cfg` - Added ip routing, loopback ads, Port-Channel config

#### Topology (Updated)
- `evpn-lab.clab.yml` - Updated all host configurations with network-multitool image and proper LACP/VLAN setup

#### Documentation (New)
- `hosts/README.md` - Host interface configuration guide
- `hosts/host1_interfaces` - Interface file for host1 (not currently used, kept for reference)
- `hosts/host2_interfaces` - Interface file for host2 (not currently used, kept for reference)
- `hosts/host3_interfaces` - Interface file for host3 (not currently used, kept for reference)
- `hosts/host4_interfaces` - Interface file for host4 (not currently used, kept for reference)

## Testing & Verification

### L2 VXLAN (VLAN 40)
```
host1 (10.40.40.101) → host3 (10.40.40.103)
- Connectivity: VERIFIED ✓
- VXLAN tunnel: VTEP1 ↔ VTEP3
- MAC learning: Working via EVPN Type-2
```

### L3 VXLAN (VRF gold)
```
host2 (10.34.34.102) → host4 (10.78.78.104)
- Connectivity: VERIFIED ✓
- Ping results: 0% packet loss, TTL=62
- Routing: Via EVPN Type-5 through fabric
```

### Infrastructure Status
- BGP Underlay: All sessions ESTAB
- EVPN Overlay: All neighbors ESTAB
- MLAG: All 4 pairs operational
- Port-Channels: LACP negotiated on all hosts

## Related Issues

Fixes #1 - Lab deployment and configuration fixes
Fixes #2 - BGP EVPN neighbors stuck in Connect state
Fixes #3 - Ready for deployment with EVPN activation
Fixes #4 - Lab convergence in progress
Fixes #5 - BGP EVPN neighbors stuck in Active state
Fixes #11 - Host LACP bonding configuration
Fixes #13 - L3 VXLAN default route issue

## Key Technical Learnings

1. **Arista EOS requires explicit `ip routing`** before BGP can function
2. **MLAG peer-link must be trunk mode** to allow VLAN 4090/4091 traversal
3. **VLAN tagging location matters** - hosts tag, switches use trunk mode
4. **network-multitool image** superior to Alpine for LACP bonding
5. **Specific routes better than default routes** when management network present
6. **LACP rate fast** ensures quick negotiation with Arista switches

## Deployment

After merging, deploy with:
```bash
cd ~/arista-evpn-vxlan-clab
sudo containerlab destroy -t evpn-lab.clab.yml --cleanup
sudo containerlab deploy -t evpn-lab.clab.yml
```

No manual post-deployment configuration needed - everything works from initial deployment!

## Breaking Changes

⚠️ **Host image changed** from `alpine:latest` to `ghcr.io/hellt/network-multitool`
⚠️ **Host configuration completely redesigned** - old exec commands replaced

## Reviewers

@Damien - Please review and merge when ready

---

**This PR represents the complete troubleshooting journey and brings the lab to production-ready status with full L2 and L3 VXLAN functionality.** 🚀

Reviewed-on: #14
Co-authored-by: Damien <damien@arnodo.fr>
Co-committed-by: Damien <damien@arnodo.fr>
2025-11-30 10:24:29 +00:00

Deployment & Testing Checklist

What Was Fixed

  • Host VLAN tagging configuration in topology file
  • All 4 hosts now create VLAN subinterfaces (bond0.XX)
  • Leaf port-channels properly configured for trunk mode (hosts tag their own VLANs on bond0.XX)
  • BGP configuration in leafs includes ip routing command
  • MLAG configurations validated on all 4 leaf pairs
  • VXLAN VTEP configuration in place
  • EVPN overlay configuration complete

🚀 Deployment Steps

1. Check Current Branch

cd ~/arista-evpn-vxlan-clab
git branch
git status

Should show: fix-bgp-and-mlag branch

2. Destroy Current Lab (if running)

sudo containerlab destroy -t evpn-lab.clab.yml --cleanup

3. Deploy Fixed Lab

sudo containerlab deploy -t evpn-lab.clab.yml
# Wait 60-90 seconds for all containers to start

4. Verify Lab is Running

sudo containerlab inspect -t evpn-lab.clab.yml

Should show all 14 nodes (2 spines + 8 leaves + 4 hosts) as RUNNING
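
As a quick cross-check, the running containers can also be counted directly. The sketch below assumes every lab container name starts with `clab-arista-evpn-fabric-`, as in the commands used throughout this checklist; `count_lab_nodes` is a hypothetical helper name. With 2 spines, 8 leaves and 4 hosts, the expected count is 14.

```bash
# Hypothetical helper: reads container names on stdin (one per line) and
# prints how many of them belong to this lab.
count_lab_nodes() {
    # grep -c exits non-zero when the count is 0, so mask that case.
    grep -c '^clab-arista-evpn-fabric-' || true
}
```

Possible usage: `docker ps --format '{{.Names}}' | count_lab_nodes` (`docker ps` lists only running containers by default).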


📋 Pre-Testing Checks (Run in Order)

Check 1: Spine BGP Underlay

ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary"

Expected: All 8 leaf neighbors in ESTABLISHED state

10.0.1.1  4 65001  22  18  Estab  3
10.0.1.3  4 65001  20  17  Estab  3
10.0.1.5  4 65002  19  18  Estab  0    ← last column (prefixes received) may be 0 or more
...

Status: ☐ Pass / ☐ Fail
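
Counting the established sessions by eye is error-prone with eight neighbors. The following sketch counts neighbor lines whose state column reads `Estab`; it assumes each neighbor line begins with an IPv4 address, as in the sample above, and `count_estab` is an illustrative name.

```bash
# Hypothetical helper: reads a BGP summary on stdin and prints the number of
# neighbor lines (those starting with an IPv4 address) in the Estab state.
count_estab() {
    awk '
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            for (i = 2; i <= NF; i++)
                if ($i == "Estab") { n++; break }
        }
        END { print n + 0 }
    '
}
```

Possible usage: `ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary" | count_estab` should print 8. The same helper can be applied to the EVPN summary on a leaf, where the expected count is 2.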


Check 2: Leaf MLAG Status

ssh admin@clab-arista-evpn-fabric-leaf1 "show mlag detail"
ssh admin@clab-arista-evpn-fabric-leaf3 "show mlag detail"

Expected: All pairs show MLAG is active

MLAG is active
Active per VLAN: yes

Status: ☐ Pass / ☐ Fail
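
The two commands above cover only leaf1 and leaf3. To check every leaf rather than a sample, a loop over all eight leaves can be used. This is a sketch that assumes the leaves are named `leaf1` through `leaf8` with the same container prefix as above; `leaf_nodes` is a hypothetical helper.

```bash
# Hypothetical helper: prints the container name of every leaf, one per line.
leaf_nodes() {
    local n
    for n in 1 2 3 4 5 6 7 8; do
        echo "clab-arista-evpn-fabric-leaf${n}"
    done
}
```

Possible usage: `for leaf in $(leaf_nodes); do echo "== $leaf"; ssh "admin@$leaf" "show mlag detail"; done`.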


Check 3: Leaf BGP EVPN

ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn summary"

Expected: Both spine neighbors in ESTABLISHED

10.0.250.1  4 65000  8  9  Estab  0
10.0.250.2  4 65000  8  8  Estab  0

Status: ☐ Pass / ☐ Fail


Check 4: Host VLAN Interfaces

docker exec clab-arista-evpn-fabric-host1 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host2 ip -d link show bond0.34
docker exec clab-arista-evpn-fabric-host3 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host4 ip -d link show bond0.78

Expected: All show VLAN tagging

vlan protocol 802.1Q id 40 <REORDER_HDR>

Status: ☐ Pass / ☐ Fail
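
To compare the configured tag against the expected one without reading the full output, the VLAN ID can be extracted from the `vlan protocol 802.1Q id N` detail line. A minimal sketch, with `vlan_id` as an illustrative helper name:

```bash
# Hypothetical helper: reads `ip -d link show <iface>` output on stdin and
# prints the 802.1Q VLAN ID, or nothing if the interface is not a VLAN.
vlan_id() {
    sed -n 's/.*vlan protocol 802\.1Q id \([0-9][0-9]*\).*/\1/p'
}
```

Possible usage: `[ "$(docker exec clab-arista-evpn-fabric-host1 ip -d link show bond0.40 | vlan_id)" = "40" ] && echo "host1 tag OK"`.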


🧪 Connectivity Tests

Test 1: Host to Gateway (VLAN40)

docker exec clab-arista-evpn-fabric-host1 ping -c 2 10.40.40.1
docker exec clab-arista-evpn-fabric-host3 ping -c 2 10.40.40.1

Expected: 2/2 packets successful

Status: ☐ Pass / ☐ Fail
Time: ~5 seconds


Test 2: L2 VXLAN Connectivity (Host1 → Host3)

docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103

Expected: 4/4 packets successful

PING 10.40.40.103 (10.40.40.103): 56 data bytes
64 bytes from 10.40.40.103: seq=0 ttl=64 time=X.XXms

Status: ☐ Pass / ☐ Fail
Time: ~10 seconds
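
For scripted runs, the pass/fail decision of the ping tests can be derived from the packet-loss figure in the ping summary line. The sketch below assumes the summary contains `N% packet loss` with an integer N, which holds for a 4-packet run; `ping_loss` is a hypothetical helper name.

```bash
# Hypothetical helper: reads ping output on stdin and prints the integer
# packet-loss percentage taken from the summary line.
ping_loss() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}
```

Possible usage: `[ "$(docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103 | ping_loss)" = "0" ] && echo "L2 VXLAN OK"`.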


Test 3: MAC Learning on Leaf1

ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"

Expected: At least 1 MAC learned

Vlan    Mac Address       Type        Ports
40      XXXX.XXXX.XXXX   DYNAMIC     Po1

Status: ☐ Pass / ☐ Fail


Test 4: Remote MAC Learning via VXLAN

ssh admin@clab-arista-evpn-fabric-leaf1 "show vxlan address-table vlan 40"

Expected: MAC from host3 learned via Vxlan1

VLAN  Mac Address     Type     Prt  VTEP
40    XXXX.XXXX.XXXX  EVPN     Vx1  10.0.255.13

Status: ☐ Pass / ☐ Fail


Test 5: EVPN Type-2 Routes

ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn route-type mac-ip | head -20"

Expected: Both local and remote MACs advertised

RD: 65001:110040 mac-ip XXXX.XXXX.XXXX
                  -                -  
RD: 65003:110040 mac-ip XXXX.XXXX.XXXX
                  10.0.255.13      

Status: ☐ Pass / ☐ Fail
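
A quick way to confirm that both local and remote MACs are present is to list the distinct route distinguishers in the output; more than one RD suggests that routes from another VTEP have arrived. This sketch assumes each route line contains `RD: <value> mac-ip`, as in the sample above, and `mac_ip_rds` is an illustrative name.

```bash
# Hypothetical helper: reads `show bgp evpn route-type mac-ip` output on stdin
# and prints each distinct route distinguisher once, sorted.
mac_ip_rds() {
    sed -n 's/.*RD: \([^ ]*\) mac-ip.*/\1/p' | sort -u
}
```

Possible usage: `ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn route-type mac-ip" | mac_ip_rds`.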


Test 6: Host to Gateway (VLAN34)

docker exec clab-arista-evpn-fabric-host2 ping -c 2 10.34.34.1

Expected: 2/2 packets successful

Status: ☐ Pass / ☐ Fail
Time: ~5 seconds


Test 7: L3 VXLAN Connectivity (Host2 → Host4)

docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104

Expected: 4/4 packets successful

Status: ☐ Pass / ☐ Fail
Time: ~10 seconds


Test 8: VRF Routing on Leaf3

ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"

Expected: Routes to both 10.34.34.0/24 and 10.78.78.0/24

C    10.34.34.0/24 is directly connected, Vlan34
B E  10.78.78.0/24 [200/0] via VTEP 10.0.255.14

Status: ☐ Pass / ☐ Fail
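
The presence of both prefixes can be asserted in one step. The helper below is a sketch that simply looks for each prefix as a whitespace-delimited token in the route table; `has_prefixes` is a hypothetical name, and it does not validate next hops or route types.

```bash
# Hypothetical helper: reads a route table on stdin and succeeds only if
# every prefix given as an argument appears in it.
has_prefixes() {
    local table prefix
    table=$(cat)
    for prefix in "$@"; do
        printf '%s\n' "$table" | grep -qF -- " $prefix " || return 1
    done
}
```

Possible usage: `ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold" | has_prefixes 10.34.34.0/24 10.78.78.0/24 && echo "VRF gold routes OK"`.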


Test 9: EVPN Type-5 Routes

ssh admin@clab-arista-evpn-fabric-leaf3 "show bgp evpn route-type ip-prefix ipv4"

Expected: IP prefixes for both VTEPs

RD: 10.0.250.13:1 ip-prefix 10.34.34.0/24
RD: 10.0.250.17:1 ip-prefix 10.78.78.0/24

Status: ☐ Pass / ☐ Fail


📊 Summary Table

| Component | Check | Expected | Actual | Status |
|---|---|---|---|---|
| Spine BGP | All leaves established | 8/8 ESTAB | ? | |
| Leaf MLAG | Pair status | active/active | ? | |
| EVPN | Spine peers | 2/2 ESTAB | ? | |
| Host Interfaces | VLAN tags | 4 VLAN ifaces | ? | |
| L2 Gateway | Ping host→gw | 2/2 success | ? | |
| L2 VXLAN | Host1→Host3 | 4/4 success | ? | |
| MAC Learning | Leaf1 VLAN40 | ≥1 MAC | ? | |
| Remote MACs | VXLAN table | MACs from Vx1 | ? | |
| Type-2 Routes | EVPN MACs | Local + Remote | ? | |
| L3 Gateway | Ping host→gw | 2/2 success | ? | |
| L3 VXLAN | Host2→Host4 | 4/4 success | ? | |
| VRF Routes | Leaf3 VRF gold | 2+ routes | ? | |
| Type-5 Routes | EVPN prefixes | Local + Remote | ? | |
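
When the checks are scripted, the status column can be filled in mechanically from each command's exit code. The following is a minimal sketch, with `record` as a hypothetical helper that runs a command quietly and prints the component name followed by PASS or FAIL.

```bash
# Hypothetical helper: record NAME COMMAND [ARGS...]
# Runs the command with its output suppressed and prints one summary row.
record() {
    local name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf '%-16s PASS\n' "$name"
    else
        printf '%-16s FAIL\n' "$name"
    fi
}
```

Possible usage: `record "L2 VXLAN" docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103`. Note that this only reflects the exit code, so checks that need output inspection (for example the BGP summary) still require a parsing step.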

🔧 If Tests Fail

L2 ping fails

# 1. Check host VLAN interface
docker exec clab-arista-evpn-fabric-host1 ip addr show bond0.40
# Should show an inet 10.40.40.101/24 address on bond0.40

# 2. Check port-channel status
ssh admin@clab-arista-evpn-fabric-leaf1 "show interface Port-Channel1"
# Should show: up, up

# 3. Check VLAN 40 exists on leaf
ssh admin@clab-arista-evpn-fabric-leaf1 "show vlan 40"
# Should show: VLAN 40 exists

# 4. Check MAC learning (generate traffic)
docker exec clab-arista-evpn-fabric-host1 arping -c 3 10.40.40.1
ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"
# Should show host1 MAC

L3 ping fails

# 1. Check VRF VLAN interface
ssh admin@clab-arista-evpn-fabric-leaf3 "show interface Vlan34"
# Should show: up, up

# 2. Check VRF routing enabled
ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"
# Should show routes

# 3. Check VXLAN VRF mapping
ssh admin@clab-arista-evpn-fabric-leaf3 "show interface Vxlan1"
# Should show VRF gold mapped to VNI 100001

📝 Notes for Next Steps

  1. If all tests pass

    • Create pull request to merge fix-bgp-and-mlag into main
    • Document the changes in FIXES_APPLIED.md
    • Update main branch documentation
  2. If specific tests fail ⚠️

    • Review the troubleshooting section above
    • Check device logs: show log
    • Review configuration with show running-config
  3. Keep for reference

    • END_TO_END_TESTING.md - Comprehensive testing guide
    • VLAN_TAGGING_FIX_EXPLANATION.md - Explains the root cause and fix

🎯 Success Criteria

Lab is ready for production use when:

  • ✓ All pre-testing checks pass
  • ✓ All 9 connectivity tests pass
  • ✓ No errors in device logs
  • ✓ MLAG is active/active on all pairs
  • ✓ BGP neighbors all established
  • ✓ EVPN routes being advertised