Complete Lab Fixes - L2 and L3 VXLAN Fully Operational #14

Merged
Damien merged 87 commits from fix-bgp-and-mlag into main 2025-11-30 10:24:29 +00:00
Showing only changes of commit 1f6bd4f978

BUGFIX_EVPN_ACTIVATION.md Normal file

@@ -0,0 +1,114 @@
# BGP EVPN Activation Bug - Critical Fix
## Issue Description
All BGP EVPN neighbors on the leaves were stuck in **Active** state instead of **Established** state, with **0 messages sent/received**.
```
Neighbor    V  AS     MsgRcvd  MsgSent  InQ  OutQ  Up/Down   State   PfxRcd  PfxAcc
10.0.250.1  4  65000  0        0        0    0     00:02:05  Active
10.0.250.2  4  65000  0        0        0    0     00:02:05  Active
```
A neighbor in Active state with zero messages sent and received has never exchanged a BGP message with its peer: the TCP connection was **never completed**.
## Root Cause
The **spine BGP configurations were missing the EVPN address family activation**.
In both `configs/spine1.cfg` and `configs/spine2.cfg`:
```
address-family evpn
   neighbor evpn activate   ← This line was MISSING!
```
Without the activation line in the EVPN address family on the spines:
1. The spines accept the EVPN neighbor definitions.
2. The spines do not respond to connection attempts from those EVPN neighbors.
3. The leaves try to establish sessions, but the spines never answer.
4. Each connection attempt times out, leaving the leaf-side neighbor in Active state.
This **differs from the IPv4 underlay**, which was working because the IPv4 address family **was activated** on the spines.
## Solution Applied
### Before (Broken)
```
router bgp 65000
   ...
   address-family evpn
      ! Missing activation line!
```
### After (Fixed)
```
router bgp 65000
   ...
   address-family evpn
      neighbor evpn activate
```
## Files Modified
- `configs/spine1.cfg` - Added `neighbor evpn activate` in EVPN address family
- `configs/spine2.cfg` - Added `neighbor evpn activate` in EVPN address family
## Technical Explanation
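In Arista EOS BGP, a neighbor defined in the global BGP context does not participate in an address family **until it is explicitly activated in that address-family block**. IPv4 unicast is the one exception: EOS activates it by default unless `no bgp default ipv4-unicast` is configured.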
### Address Family Activation Rules
```
router bgp 65000
   neighbor 10.0.250.1 peer group evpn
   neighbor 10.0.250.1 remote-as 65000
   address-family evpn
      neighbor evpn activate          ← REQUIRED for EVPN sessions to work
   address-family ipv4
      neighbor 10.0.250.1 activate    ← Separate activation for IPv4
```
Without activation in the EVPN address family:
- The spines define the neighbor parameters ✓
- The spines accept the BGP configuration ✓
- The spines do NOT accept BGP connections (TCP 179) from the EVPN neighbors ✗
- Each leaf attempts a TCP connection to the spine loopback on port 179, but it never completes ✗
- The attempt times out → Active state ✗
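The missing activation can be caught before deployment with a small config check. The sketch below is not part of the lab tooling: the function name `activated_neighbors` and the sample strings `BROKEN` and `FIXED` are hypothetical, and the parser only understands the handful of line shapes shown in this document (it assumes plain config text without the `←` annotations and ignores VRFs).

```python
import re
from collections import defaultdict

# Matches "neighbor <name-or-address> activate" on an already-stripped line.
ACTIVATE_RE = re.compile(r"^neighbor\s+(\S+)\s+activate$")


def activated_neighbors(config_text):
    """Map each address family to the neighbors or peer groups activated in it."""
    activated = defaultdict(set)
    current_af = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("router bgp"):
            current_af = None  # back in the global BGP context
        elif line.startswith("address-family "):
            current_af = line.split(None, 1)[1]  # e.g. "evpn" or "ipv4"
        elif current_af is not None:
            match = ACTIVATE_RE.match(line)
            if match:
                activated[current_af].add(match.group(1))
    return dict(activated)


# Hypothetical sample configs modelled on the fragments in this document.
BROKEN = """\
router bgp 65000
   neighbor 10.0.250.1 peer group evpn
   neighbor 10.0.250.1 remote-as 65000
   address-family evpn
"""
FIXED = BROKEN + "      neighbor evpn activate\n"

print("evpn" in activated_neighbors(BROKEN).get("evpn", set()))  # False
print("evpn" in activated_neighbors(FIXED).get("evpn", set()))   # True
```

Running the same check against the text of `configs/spine1.cfg` and `configs/spine2.cfg` would flag the broken state described above, provided those files use the same line shapes.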
## Testing the Fix
After redeploying with the fix, the EVPN neighbors should reach **Established** within seconds, and the message counters should start to increase:
```bash
# Before fix
10.0.250.1 4 65000 0 0 0 0 00:02:05 Active
# After fix
10.0.250.1 4 65000 8 8 0 0 00:00:15 Estab
```
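The before/after comparison can also be checked mechanically. The following is a minimal sketch that assumes the summary rows have exactly the column order shown in this document (neighbor, V, AS, MsgRcvd, MsgSent, InQ, OutQ, Up/Down, State); `parse_summary_rows` and `not_established` are hypothetical helper names, and the input is pasted text rather than output fetched from a device.

```python
def parse_summary_rows(output):
    """Return one dict per neighbor row found in pasted summary output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Neighbor rows start with a dotted IPv4 address and have at least
        # nine columns; header and comment lines are skipped.
        if len(fields) < 9:
            continue
        address = fields[0]
        if address.count(".") != 3 or not address.replace(".", "").isdigit():
            continue
        rows.append({
            "neighbor": address,
            "msg_rcvd": int(fields[3]),
            "msg_sent": int(fields[4]),
            "state": fields[8],
        })
    return rows


def not_established(output):
    """List neighbors whose state column does not begin with 'Estab'."""
    return [
        row["neighbor"]
        for row in parse_summary_rows(output)
        if not row["state"].startswith("Estab")
    ]


BEFORE = """\
# Before fix
10.0.250.1 4 65000 0 0 0 0 00:02:05 Active
10.0.250.2 4 65000 0 0 0 0 00:02:05 Active
"""
AFTER = """\
# After fix
10.0.250.1 4 65000 8 8 0 0 00:00:15 Estab
"""

print(not_established(BEFORE))  # ['10.0.250.1', '10.0.250.2']
print(not_established(AFTER))   # []
```

An empty list means every listed neighbor is established; any address in the list is still stuck in a pre-established state.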
## Impact
This was a **critical bug** that:
- Prevented any EVPN overlay from functioning
- Made L2 VXLAN testing impossible
- Made L3 VXLAN testing impossible
- Prevented MAC learning via VXLAN
- Prevented EVPN route distribution
Once fixed, the entire EVPN overlay becomes operational immediately.
## Lesson Learned
In BGP multi-address-family configurations, **a neighbor must be activated in every address family it is meant to carry**, unless the platform activates that family by default. This includes:
- IPv4 unicast
- IPv6 unicast
- EVPN
- Route target filtering
- Any other address families being used
A common mistake is to define a neighbor globally but forget to activate it in all address families where it should be used.
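One way to guard against this mistake is to write down which neighbors are expected in each address family and compare that against what the configs actually activate. The sketch below is illustrative only: `missing_activations` and both dictionaries are hypothetical, and the `actual` mapping is typed in by hand here rather than parsed from a device or config file.

```python
def missing_activations(expected, actual):
    """Return sorted (address_family, neighbor) pairs expected but not activated."""
    missing = []
    for address_family, neighbors in expected.items():
        # Anything expected in this family but absent from the actual set.
        for neighbor in neighbors - actual.get(address_family, set()):
            missing.append((address_family, neighbor))
    return sorted(missing)


# Hypothetical intent: the names mirror the config fragments in this document.
expected = {"ipv4": {"10.0.250.1"}, "evpn": {"evpn"}}
# Hypothetical state before the fix: only the IPv4 activation is present.
actual = {"ipv4": {"10.0.250.1"}}

print(missing_activations(expected, actual))  # [('evpn', 'evpn')]
```

An empty result means every expected activation is present; each returned pair names an address family and the neighbor or peer group still missing from it.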