arista-evpn-vxlan-clab/TESTING_CHECKLIST.md
Damien 1080bf07bb Complete Lab Fixes - L2 and L3 VXLAN Fully Operational (#14)
## Summary

This PR merges all fixes and improvements from the troubleshooting journey to make the Arista EVPN-VXLAN lab fully operational with both L2 and L3 VXLAN connectivity.

## What's Changed

### 🎯 Major Achievements
- **L2 VXLAN fully operational** - host1 ↔ host3 connectivity verified
- **L3 VXLAN fully operational** - host2 ↔ host4 connectivity verified (VRF gold)
- **LACP bonding working** - dual-homed hosts with proper Port-Channel negotiation
- **All BGP/EVPN sessions established** - complete underlay and overlay working

### 🔧 Infrastructure Fixes

#### BGP & Routing
- Added `ip routing` command to all spine and leaf switches
- Fixed duplicate BGP network statements on leaf3, leaf4, leaf7, leaf8
- Activated EVPN neighbors on spine switches
- Added loopback network advertisements to BGP

#### MLAG Configuration
- Configured MLAG peer-link in trunk mode (not access) for VLAN 4090/4091
- Added dual-active detection via management interface
- Configured virtual router MAC for MLAG pairs

#### Switch Port Configuration
- Port-Channel1 configured in **trunk mode** on all leaf switches
- Added `switchport trunk allowed vlan` for host VLANs (34, 40, 78)
- Removed `no shutdown` from Port-Channel interfaces

### 🖥️ Host Networking - Complete Redesign

#### Image Change
- **Old:** `alpine:latest` (had bonding syntax issues)
- **New:** `ghcr.io/hellt/network-multitool` (networking tools pre-installed)

#### LACP Bonding Configuration
Proper LACP setup following network-multitool best practices:
```yaml
- ip link add bond0 type bond mode 802.3ad
- ip link set dev bond0 type bond xmit_hash_policy layer3+4
- ip link set dev eth1 down
- ip link set dev eth2 down
- ip link set eth1 master bond0
- ip link set eth2 master bond0
- ip link set dev eth1 up
- ip link set dev eth2 up
- ip link set dev bond0 type bond lacp_rate fast
- ip link set dev bond0 up
```
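
Once the bond is up, the kernel's view of it can be read from `/proc/net/bonding/bond0`. The helper below is a minimal sketch and not part of the lab repository: `bond_is_lacp` is a hypothetical name, and it only confirms that the reported bonding mode is 802.3ad, not that the partner switch has negotiated LACP.

```bash
# Hypothetical helper: reads the contents of /proc/net/bonding/<bond> on stdin
# and succeeds only if the kernel reports the bond in 802.3ad (LACP) mode.
bond_is_lacp() {
  grep -q '^Bonding Mode: IEEE 802.3ad'
}

# Example, run on the machine hosting the lab (container name as used in this lab):
#   docker exec clab-arista-evpn-fabric-host1 cat /proc/net/bonding/bond0 \
#     | bond_is_lacp && echo "bond0 is in LACP mode"
```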

#### VLAN Configuration
- **L2 VXLAN hosts (host1, host3):** VLAN 40 tagged on bond0
- **L3 VXLAN hosts (host2, host4):** VLANs 34 and 78 tagged on bond0

#### Routing Strategy
- Kept management default route (172.16.0.254 via eth0)
- Added **specific routes** for L3 VXLAN networks instead of default routes:
  - host2: `ip route add 10.78.78.0/24 via 10.34.34.1`
  - host4: `ip route add 10.34.34.0/24 via 10.78.78.1`

### 📁 Files Changed

#### Switch Configurations (Updated)
- `configs/spine1.cfg` - Added ip routing, EVPN activation
- `configs/spine2.cfg` - Added ip routing, EVPN activation
- `configs/leaf1.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf2.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf3.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf4.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf5.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf6.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf7.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf8.cfg` - Added ip routing, loopback ads, Port-Channel config

#### Topology (Updated)
- `evpn-lab.clab.yml` - Updated all host configurations with network-multitool image and proper LACP/VLAN setup

#### Documentation (New)
- `hosts/README.md` - Host interface configuration guide
- `hosts/host1_interfaces` - Interface file for host1 (not currently used, kept for reference)
- `hosts/host2_interfaces` - Interface file for host2 (not currently used, kept for reference)
- `hosts/host3_interfaces` - Interface file for host3 (not currently used, kept for reference)
- `hosts/host4_interfaces` - Interface file for host4 (not currently used, kept for reference)

## Testing & Verification

### L2 VXLAN (VLAN 40)
```
host1 (10.40.40.101) → host3 (10.40.40.103)
- Connectivity: VERIFIED ✓
- VXLAN tunnel: VTEP1 ↔ VTEP3
- MAC learning: Working via EVPN Type-2
```

### L3 VXLAN (VRF gold)
```
host2 (10.34.34.102) → host4 (10.78.78.104)
- Connectivity: VERIFIED ✓
- Ping results: 0% packet loss, TTL=62
- Routing: Via EVPN Type-5 through fabric
```

### Infrastructure Status
- BGP Underlay: All sessions ESTAB
- EVPN Overlay: All neighbors ESTAB
- MLAG: All 4 pairs operational
- Port-Channels: LACP negotiated on all hosts

## Related Issues

Fixes #1 - Lab deployment and configuration fixes
Fixes #2 - BGP EVPN neighbors stuck in Connect state
Fixes #3 - Ready for deployment with EVPN activation
Fixes #4 - Lab convergence in progress
Fixes #5 - BGP EVPN neighbors stuck in Active state
Fixes #11 - Host LACP bonding configuration
Fixes #13 - L3 VXLAN default route issue

## Key Technical Learnings

1. **Arista EOS requires explicit `ip routing`** before BGP can function
2. **MLAG peer-link must be trunk mode** to allow VLAN 4090/4091 traversal
3. **VLAN tagging location matters** - hosts tag, switches use trunk mode
4. **network-multitool image** superior to Alpine for LACP bonding
5. **Specific routes better than default routes** when management network present
6. **LACP rate fast** ensures quick negotiation with Arista switches

## Deployment

After merging, deploy with:
```bash
cd ~/arista-evpn-vxlan-clab
sudo containerlab destroy -t evpn-lab.clab.yml --cleanup
sudo containerlab deploy -t evpn-lab.clab.yml
```

No manual post-deployment configuration needed - everything works from initial deployment!

## Breaking Changes

⚠️ **Host image changed** from `alpine:latest` to `ghcr.io/hellt/network-multitool`
⚠️ **Host configuration completely redesigned** - old exec commands replaced

## Reviewers

@Damien - Please review and merge when ready

---

**This PR represents the complete troubleshooting journey and brings the lab to production-ready status with full L2 and L3 VXLAN functionality.** 🚀

Reviewed-on: #14
Co-authored-by: Damien <damien@arnodo.fr>
Co-committed-by: Damien <damien@arnodo.fr>
2025-11-30 10:24:29 +00:00

# Deployment & Testing Checklist
## ✅ What Was Fixed
- [x] Host VLAN tagging configuration in topology file
- [x] All 4 hosts now create VLAN subinterfaces (bond0.XX)
- [x] Leaf port-channels properly configured in trunk mode to carry the host VLAN tags
- [x] BGP configuration in leafs includes `ip routing` command
- [x] MLAG configurations validated on all 4 leaf pairs
- [x] VXLAN VTEP configuration in place
- [x] EVPN overlay configuration complete
## 🚀 Deployment Steps
### 1. Check Current Branch
```bash
cd ~/arista-evpn-vxlan-clab
git branch
git status
```
Should show: `fix-bgp-and-mlag` branch
### 2. Destroy Current Lab (if running)
```bash
sudo containerlab destroy -t evpn-lab.clab.yml --cleanup
```
### 3. Deploy Fixed Lab
```bash
sudo containerlab deploy -t evpn-lab.clab.yml
# Wait 60-90 seconds for all containers to start
```
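
Instead of a fixed wait, the start-up delay can be handled by polling until a node answers. The function below is a sketch under stated assumptions: `retry` is a hypothetical helper, and the attempt count and delay in the example are illustrative values chosen to cover roughly the 90 seconds mentioned above.

```bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between failed attempts. Returns 0 as soon as the command succeeds,
# or 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: poll spine1 over SSH every 5 seconds, up to 18 times (about 90 s).
#   retry 18 5 ssh admin@clab-arista-evpn-fabric-spine1 "show version"
```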
### 4. Verify Lab is Running
```bash
sudo containerlab inspect -t evpn-lab.clab.yml
```
Should show all 14 nodes (2 spines + 8 leaves + 4 hosts) as RUNNING
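
The node count can also be checked without reading the table by eye. This is a sketch, assuming every lab container name starts with `clab-arista-evpn-fabric` (the prefix used in the commands throughout this checklist); `count_nodes` is a hypothetical helper name.

```bash
# Hypothetical helper: count the non-empty lines on stdin
# (one container name per line).
count_nodes() {
  awk 'NF { n++ } END { print n + 0 }'
}

# 2 spines + 8 leaves + 4 hosts
expected_nodes=14

# Example, run on the machine hosting the lab:
#   running=$(docker ps --filter "name=clab-arista-evpn-fabric" \
#     --filter "status=running" --format '{{.Names}}' | count_nodes)
#   [ "$running" -eq "$expected_nodes" ] && echo "all nodes running" \
#     || echo "only $running of $expected_nodes nodes running"
```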
---
## 📋 Pre-Testing Checks (Run in Order)
### Check 1: Spine BGP Underlay
```bash
ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary"
```
**Expected:** All 8 leaf neighbors in ESTABLISHED state
```
10.0.1.1 4 65001 22 18 Estab 3
10.0.1.3 4 65001 20 17 Estab 3
10.0.1.5 4 65002 19 18 Estab 0 ← prefix count; 0 or more is fine as long as the state is Estab
...
```
**Status:** ☐ Pass / ☐ Fail
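
Counting established sessions is less error-prone than scanning the summary by eye. The sketch below assumes only that each established neighbor line contains the word `Estab` as its own column, as in the sample output above; `count_estab` is a hypothetical helper name. The same function could be applied to `show bgp evpn summary` on a leaf, where two spine neighbors are expected.

```bash
# Hypothetical helper: count the lines on stdin that have "Estab" as a
# whitespace-separated field. Each neighbor line is counted at most once.
count_estab() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "Estab") { n++; break } }
       END { print n + 0 }'
}

# Example: expect 8 on each spine (one session per leaf).
#   ssh admin@clab-arista-evpn-fabric-spine1 "show bgp ipv4 unicast summary" | count_estab
```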
---
### Check 2: Leaf MLAG Status
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show mlag detail"
ssh admin@clab-arista-evpn-fabric-leaf3 "show mlag detail"
```
**Expected:** All pairs show `MLAG is active`
```
MLAG is active
Active per VLAN: yes
```
**Status:** ☐ Pass / ☐ Fail
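
The two commands above sample only leaf1 and leaf3. To cover every MLAG member, the check can be looped over all eight leaves. This is a sketch: `leaf_nodes` is a hypothetical helper, the leaf names are assumed to follow the `leaf1` through `leaf8` pattern of the config files, and the `grep` pattern matches the `MLAG is active` line from the expected output above, so adjust it if your EOS version prints the state differently.

```bash
# Hypothetical helper: print the container hostnames of all eight leaves,
# one per line.
leaf_nodes() {
  local i
  for i in 1 2 3 4 5 6 7 8; do
    echo "clab-arista-evpn-fabric-leaf$i"
  done
}

# Example: report any leaf whose output lacks the expected line.
#   for node in $(leaf_nodes); do
#     ssh "admin@$node" "show mlag detail" | grep -q "MLAG is active" \
#       || echo "$node: MLAG not active"
#   done
```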
---
### Check 3: Leaf BGP EVPN
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn summary"
```
**Expected:** Both spine neighbors in ESTABLISHED
```
10.0.250.1 4 65000 8 9 Estab 0
10.0.250.2 4 65000 8 8 Estab 0
```
**Status:** ☐ Pass / ☐ Fail
---
### Check 4: Host VLAN Interfaces
```bash
docker exec clab-arista-evpn-fabric-host1 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host2 ip -d link show bond0.34
docker exec clab-arista-evpn-fabric-host3 ip -d link show bond0.40
docker exec clab-arista-evpn-fabric-host4 ip -d link show bond0.78
```
**Expected:** All show VLAN tagging
```
vlan protocol 802.1Q id 40 <BROADCAST,MULTICAST,UP,LOWER_UP>
```
**Status:** ☐ Pass / ☐ Fail
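
The four checks can be folded into one loop that also verifies the VLAN ID in the output. A minimal sketch: `has_vlan_id` is a hypothetical helper, and the host-to-VLAN pairs are the ones used in the commands above.

```bash
# Hypothetical helper: reads `ip -d link show <iface>` output on stdin and
# succeeds only if it reports an 802.1Q VLAN with exactly the given ID.
has_vlan_id() {
  grep -Eq "vlan protocol 802\.1Q id $1( |\$)"
}

# Example: check each host's VLAN subinterface in turn.
#   for pair in host1:40 host2:34 host3:40 host4:78; do
#     host=${pair%%:*}; vlan=${pair##*:}
#     docker exec "clab-arista-evpn-fabric-$host" ip -d link show "bond0.$vlan" \
#       | has_vlan_id "$vlan" \
#       && echo "$host bond0.$vlan OK" || echo "$host bond0.$vlan MISSING"
#   done
```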
---
## 🧪 Connectivity Tests
### Test 1: Host to Gateway (VLAN40)
```bash
docker exec clab-arista-evpn-fabric-host1 ping -c 2 10.40.40.1
docker exec clab-arista-evpn-fabric-host3 ping -c 2 10.40.40.1
```
**Expected:** 2/2 packets successful
**Status:** ☐ Pass / ☐ Fail
**Time:** ~5 seconds
---
### Test 2: L2 VXLAN Connectivity (Host1 → Host3)
```bash
docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103
```
**Expected:** 4/4 packets successful
```
PING 10.40.40.103 (10.40.40.103): 56 data bytes
64 bytes from 10.40.40.103: seq=0 ttl=64 time=X.XXms
```
**Status:** ☐ Pass / ☐ Fail
**Time:** ~10 seconds
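
When `ping` runs without a deadline, its exit status may still report success if only some replies arrive, so a strict 4/4 check is better done on the summary line. The sketch below assumes the summary contains the usual `N% packet loss` text; `ping_no_loss` is a hypothetical helper name.

```bash
# Hypothetical helper: reads ping output on stdin and succeeds only if the
# summary reports exactly 0% packet loss (so "100%" or "50%" do not match).
ping_no_loss() {
  grep -Eq '(^|[^0-9])0% packet loss'
}

# Example:
#   docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103 \
#     | ping_no_loss && echo "L2 VXLAN: PASS" || echo "L2 VXLAN: FAIL"
```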
---
### Test 3: MAC Learning on Leaf1
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"
```
**Expected:** At least 1 MAC learned
```
Vlan Mac Address Type Ports
40 XXXX.XXXX.XXXX DYNAMIC Po1
```
**Status:** ☐ Pass / ☐ Fail
---
### Test 4: Remote MAC Learning via VXLAN
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show vxlan address-table vlan 40"
```
**Expected:** MAC from host3 learned via Vxlan1
```
VLAN Mac Address Type Prt VTEP
40 XXXX.XXXX.XXXX EVPN Vx1 10.0.255.13
```
**Status:** ☐ Pass / ☐ Fail
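
To confirm that the remote entry really is host3's MAC, the address read on the host has to be converted from Linux's colon-separated form to the dotted form shown in the switch tables. This is a sketch: `mac_to_eos` is a hypothetical helper, and the example assumes the VLAN subinterface on host3 is `bond0.40`, as in Check 4.

```bash
# Hypothetical helper: convert a MAC such as AA:C1:AB:12:34:56 into the
# lowercase dotted-quad form aac1.ab12.3456.
mac_to_eos() {
  printf '%s\n' "$1" | tr -d ':' | tr 'A-F' 'a-f' \
    | sed -E 's/^(.{4})(.{4})(.{4})$/\1.\2.\3/'
}

# Example: look up host3's MAC in leaf1's VXLAN address table.
#   mac=$(docker exec clab-arista-evpn-fabric-host3 cat /sys/class/net/bond0.40/address)
#   ssh admin@clab-arista-evpn-fabric-leaf1 "show vxlan address-table vlan 40" \
#     | grep -i "$(mac_to_eos "$mac")"
```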
---
### Test 5: EVPN Type-2 Routes
```bash
ssh admin@clab-arista-evpn-fabric-leaf1 "show bgp evpn route-type mac-ip | head -20"
```
**Expected:** Both local and remote MACs advertised
```
RD: 65001:110040 mac-ip XXXX.XXXX.XXXX
- -
RD: 65003:110040 mac-ip XXXX.XXXX.XXXX
10.0.255.13
```
**Status:** ☐ Pass / ☐ Fail
---
### Test 6: Host to Gateway (VLAN34)
```bash
docker exec clab-arista-evpn-fabric-host2 ping -c 2 10.34.34.1
```
**Expected:** 2/2 packets successful
**Status:** ☐ Pass / ☐ Fail
**Time:** ~5 seconds
---
### Test 7: L3 VXLAN Connectivity (Host2 → Host4)
```bash
docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104
```
**Expected:** 4/4 packets successful
**Status:** ☐ Pass / ☐ Fail
**Time:** ~10 seconds
---
### Test 8: VRF Routing on Leaf3
```bash
ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"
```
**Expected:** Routes to both 10.34.34.0/24 and 10.78.78.0/24
```
C 10.34.34.0/24 is directly connected, Vlan34
B E 10.78.78.0/24 [200/0] via VTEP 10.0.255.14
```
**Status:** ☐ Pass / ☐ Fail
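
The presence of both prefixes can be asserted in one step. A minimal sketch: `has_all_prefixes` is a hypothetical helper that does a plain substring match, so it does not distinguish a connected route from a BGP route and could match a longer address that happens to contain the prefix text.

```bash
# Hypothetical helper: reads routing-table output on stdin and succeeds only
# if every prefix given as an argument appears somewhere in it.
has_all_prefixes() {
  local routes prefix
  routes=$(cat)
  for prefix in "$@"; do
    printf '%s\n' "$routes" | grep -Fq -- "$prefix" || return 1
  done
}

# Example:
#   ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold" \
#     | has_all_prefixes 10.34.34.0/24 10.78.78.0/24 \
#     && echo "VRF gold routes: PASS" || echo "VRF gold routes: FAIL"
```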
---
### Test 9: EVPN Type-5 Routes
```bash
ssh admin@clab-arista-evpn-fabric-leaf3 "show bgp evpn route-type ip-prefix ipv4"
```
**Expected:** IP prefixes for both VTEPs
```
RD: 10.0.250.13:1 ip-prefix 10.34.34.0/24
RD: 10.0.250.17:1 ip-prefix 10.78.78.0/24
```
**Status:** ☐ Pass / ☐ Fail
---
## 📊 Summary Table
| Component | Check | Expected | Actual | Status |
|-----------|-------|----------|--------|--------|
| Spine BGP | All leaves established | 8/8 ESTAB | ? | ☐ |
| Leaf MLAG | Pair status | active/active | ? | ☐ |
| EVPN | Spine peers | 2/2 ESTAB | ? | ☐ |
| Host Interfaces | VLAN tags | 4 VLAN ifaces | ? | ☐ |
| L2 Gateway | Ping host→gw | 2/2 success | ? | ☐ |
| L2 VXLAN | Host1→Host3 | 4/4 success | ? | ☐ |
| MAC Learning | Leaf1 VLAN40 | ≥1 MAC | ? | ☐ |
| Remote MACs | VXLAN table | MACs from Vx1 | ? | ☐ |
| Type-2 Routes | EVPN MACs | Local + Remote | ? | ☐ |
| L3 Gateway | Ping host→gw | 2/2 success | ? | ☐ |
| L3 VXLAN | Host2→Host4 | 4/4 success | ? | ☐ |
| VRF Routes | Leaf3 VRF gold | 2+ routes | ? | ☐ |
| Type-5 Routes | EVPN prefixes | Local + Remote | ? | ☐ |
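
The Status column can be filled in from a small runner that executes each check and prints one result line per row. This is a sketch, not part of the repository: `run_check` is a hypothetical helper, and it treats a command's exit status as the pass/fail signal, which suits the ping rows better than the `show` commands.

```bash
# Hypothetical helper: run a command quietly and print PASS or FAIL followed
# by the given label. Returns the same pass/fail result as its exit status.
run_check() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$name"
  else
    printf 'FAIL  %s\n' "$name"
    return 1
  fi
}

# Example:
#   run_check "L2 Gateway host1" docker exec clab-arista-evpn-fabric-host1 ping -c 2 10.40.40.1
#   run_check "L2 VXLAN host1->host3" docker exec clab-arista-evpn-fabric-host1 ping -c 4 10.40.40.103
#   run_check "L3 VXLAN host2->host4" docker exec clab-arista-evpn-fabric-host2 ping -c 4 10.78.78.104
```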
---
## 🔧 If Tests Fail
### L2 ping fails
```bash
# 1. Check host VLAN interface
docker exec clab-arista-evpn-fabric-host1 ip addr show bond0.40
# Should show: inet 10.40.40.101/24 ... scope global bond0.40
# 2. Check port-channel status
ssh admin@clab-arista-evpn-fabric-leaf1 "show interface Port-Channel1"
# Should show: up, up
# 3. Check VLAN 40 exists on leaf
ssh admin@clab-arista-evpn-fabric-leaf1 "show vlan 40"
# Should show: VLAN 40 exists
# 4. Check MAC learning (generate traffic)
docker exec clab-arista-evpn-fabric-host1 arping -c 3 10.40.40.1
ssh admin@clab-arista-evpn-fabric-leaf1 "show mac address-table vlan 40"
# Should show host1 MAC
```
### L3 ping fails
```bash
# 1. Check VRF VLAN interface
ssh admin@clab-arista-evpn-fabric-leaf3 "show interface Vlan34"
# Should show: up, up
# 2. Check VRF routing enabled
ssh admin@clab-arista-evpn-fabric-leaf3 "show ip route vrf gold"
# Should show routes
# 3. Check VXLAN VRF mapping
ssh admin@clab-arista-evpn-fabric-leaf3 "show interface Vxlan1"
# Should show: vxlan vrf gold vni 100001
```
---
## 📝 Notes for Next Steps
1. **If all tests pass**
   - Create pull request to merge `fix-bgp-and-mlag` into `main`
   - Document the changes in FIXES_APPLIED.md
   - Update main branch documentation
2. **If specific tests fail** ⚠️
   - Review the troubleshooting section above
   - Check device logs: `show log`
   - Review configuration with `show running-config`
3. **Keep for reference**
   - END_TO_END_TESTING.md - Comprehensive testing guide
   - VLAN_TAGGING_FIX_EXPLANATION.md - Explains the root cause and fix
---
## 🎯 Success Criteria
**Lab is ready for production use when:**
- ✓ All pre-testing checks pass
- ✓ All 9 connectivity tests pass
- ✓ No errors in device logs
- ✓ MLAG is active/active on all pairs
- ✓ BGP neighbors all established
- ✓ EVPN routes being advertised