BGP EVPN Activation Bug - Critical Fix
Issue Description
All BGP EVPN neighbors on the leaves were stuck in Active state instead of Established state, with 0 messages sent/received.
Neighbor     V  AS     MsgRcvd  MsgSent  InQ  OutQ  Up/Down   State   PfxRcd  PfxAcc
10.0.250.1   4  65000        0        0    0     0  00:02:05  Active
10.0.250.2   4  65000        0        0    0     0  00:02:05  Active
An Active state with zero messages in both directions means the TCP connection to the peer never completed, so not even a BGP OPEN message was sent.
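The symptom above can be checked mechanically. The following is a minimal sketch, assuming the summary rows keep the whitespace-separated column order shown in the sample (neighbor, version, AS, MsgRcvd, MsgSent, InQ, OutQ, Up/Down, State); the names PeerRow, parse_summary and stuck_peers are hypothetical helpers, not part of any Arista tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerRow:
    neighbor: str
    remote_as: int
    msg_rcvd: int
    msg_sent: int
    state: str


def parse_summary(text: str) -> List[PeerRow]:
    """Parse data rows of a BGP summary table (column order assumed from the sample)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an IPv4 address and carry at least nine columns;
        # the header row and blank lines are skipped.
        if len(fields) < 9 or fields[0].count(".") != 3:
            continue
        rows.append(
            PeerRow(fields[0], int(fields[2]), int(fields[3]), int(fields[4]), fields[8])
        )
    return rows


def stuck_peers(rows: List[PeerRow]) -> List[str]:
    """Return neighbors that are not established and have exchanged no messages."""
    return [
        r.neighbor
        for r in rows
        if r.state != "Estab" and r.msg_rcvd == 0 and r.msg_sent == 0
    ]


SAMPLE = """\
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
10.0.250.1 4 65000 0 0 0 0 00:02:05 Active
10.0.250.2 4 65000 0 0 0 0 00:02:05 Active
"""

print(stuck_peers(parse_summary(SAMPLE)))  # ['10.0.250.1', '10.0.250.2']
```

Both sample neighbors are reported because neither has sent or received a single message, which matches the failure described here.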
Root Cause
The spine BGP configurations were missing the EVPN address family activation.
In both configs/spine1.cfg and configs/spine2.cfg:
address-family evpn
   neighbor evpn activate   ← This line was MISSING!
Without the activation line under the EVPN address family, the following happened:
- The spines accepted the EVPN neighbor definitions
- The spines did not bring up EVPN sessions with those neighbors
- The leaves kept trying to establish sessions, but the spines did not respond
- Each connection attempt timed out, leaving the leaves in Active state
This differs from the IPv4 underlay, which was working because its neighbors were activated in the IPv4 address family on the spines.
Solution Applied
Before (Broken)
router bgp 65000
   ...
   address-family evpn
      ! Missing activation line!
After (Fixed)
router bgp 65000
   ...
   address-family evpn
      neighbor evpn activate
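The before/after change can also be expressed as an idempotent patch over the config text. This is only a sketch under stated assumptions: the config uses space indentation with child lines indented deeper than their parent, the peer group is named evpn as in this document, and the function name ensure_evpn_activation is hypothetical.

```python
def _indent(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def ensure_evpn_activation(config: str, peer: str = "evpn") -> str:
    """Return config with `neighbor <peer> activate` under `address-family evpn`.

    If the line is already present, or no EVPN address-family block exists,
    the config is returned unchanged.
    """
    wanted = f"neighbor {peer} activate"
    lines = config.splitlines()
    for idx, line in enumerate(lines):
        if line.strip() != "address-family evpn":
            continue
        base = _indent(line)
        # The block body is every following non-blank line indented deeper
        # than the address-family header.
        end = idx + 1
        while end < len(lines) and lines[end].strip() and _indent(lines[end]) > base:
            end += 1
        body = [entry.strip() for entry in lines[idx + 1:end]]
        if wanted not in body:
            # Three-space child indentation is an assumption about the file style.
            lines.insert(idx + 1, " " * (base + 3) + wanted)
        break
    trailing = "\n" if config.endswith("\n") else ""
    return "\n".join(lines) + trailing


BROKEN = """\
router bgp 65000
   neighbor evpn peer group
   !
   address-family evpn
   !
   address-family ipv4
      neighbor 10.0.250.1 activate
"""

print(ensure_evpn_activation(BROKEN))
```

Running the function a second time on its own output leaves the text untouched, so it is safe to apply to both spine configs regardless of their current state.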
Files Modified
- configs/spine1.cfg - Added neighbor evpn activate in EVPN address family
- configs/spine2.cfg - Added neighbor evpn activate in EVPN address family
Technical Explanation
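In Arista EOS BGP, a neighbor defined in the global BGP context does not participate in an address family until it is explicitly activated in that address family block. IPv4 unicast is the usual exception: EOS activates it by default unless no bgp default ipv4-unicast is configured.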
Address Family Activation Rules
router bgp 65000
   neighbor 10.0.250.1 peer group evpn
   neighbor 10.0.250.1 remote-as 65000
   address-family evpn
      neighbor evpn activate   ← REQUIRED for EVPN sessions to work
   address-family ipv4
      neighbor 10.0.250.1 activate   ← Separate activation for IPv4
Without activation in the EVPN address family:
- The spines define the neighbor parameters ✓
- The spines accept the BGP configuration ✓
- The spines do NOT answer the leaves' EVPN session attempts ✗
- Each leaf's TCP connection to the spine loopback on port 179 goes unanswered ✗
- The attempt times out → Active state ✗
Testing the Fix
After redeploying with the fix, the EVPN neighbors should transition to Established within seconds (shown as Estab in the summary output):
# Before fix
10.0.250.1 4 65000 0 0 0 0 00:02:05 Active
# After fix
10.0.250.1 4 65000 8 8 0 0 00:00:15 Estab
Impact
This was a critical bug that:
- Prevented any EVPN overlay from functioning
- Made L2 VXLAN testing impossible
- Made L3 VXLAN testing impossible
- Prevented MAC learning via VXLAN
- Prevented EVPN route distribution
Once fixed, the entire EVPN overlay becomes operational immediately.
Lesson Learned
In BGP configurations that carry multiple address families, each neighbor must be activated in every address family it is meant to use. Depending on the design, this includes:
- IPv4 unicast
- IPv6 unicast
- EVPN
- Route target membership (route target filtering)
- Any other address families being used
A common mistake is to define a neighbor globally but forget to activate it in all address families where it should be used.
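A simple lint can catch this mistake before deployment. The sketch below is an illustration rather than a full EOS parser: it assumes an indented config (address-family bodies indented deeper than their header), recognises only the neighbor ... peer group ... and neighbor ... activate forms used in this document, and does not model platform defaults. The function name missing_activations and the expected mapping are hypothetical.

```python
from typing import Dict, List


def missing_activations(config: str, expected: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report, per address family, the expected peers that are never activated.

    A peer counts as activated if it, or the peer group it belongs to,
    has an explicit `activate` line inside that address-family block.
    """
    activated = {}  # address family name -> set of activated peer/group names
    group_of = {}   # peer address -> peer group name
    current_af, af_indent = None, 0

    for line in config.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Any line at or above the header's indentation ends the current block.
        if current_af is not None and indent <= af_indent:
            current_af = None
        if tokens[0] == "address-family" and len(tokens) >= 2:
            current_af, af_indent = " ".join(tokens[1:]), indent
            activated.setdefault(current_af, set())
        elif tokens[0] == "neighbor" and len(tokens) >= 3:
            if current_af is not None and tokens[2:] == ["activate"]:
                activated[current_af].add(tokens[1])
            elif current_af is None and len(tokens) == 5 and tokens[2:4] == ["peer", "group"]:
                group_of[tokens[1]] = tokens[4]

    gaps = {}
    for af, peers in expected.items():
        active = activated.get(af, set())
        missing = [p for p in peers if p not in active and group_of.get(p) not in active]
        if missing:
            gaps[af] = missing
    return gaps


SPINE = """\
router bgp 65000
   neighbor evpn peer group
   neighbor 10.0.250.1 peer group evpn
   neighbor 10.0.250.1 remote-as 65000
   !
   address-family evpn
   !
   address-family ipv4
      neighbor 10.0.250.1 activate
"""

# Which peers should be active in which address family is a design decision,
# so it is passed in explicitly rather than inferred.
print(missing_activations(SPINE, {"evpn": ["10.0.250.1"], "ipv4": ["10.0.250.1"]}))
```

Passing the expected peers explicitly avoids false positives for neighbors that are intentionally left out of an address family.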