Add guide explaining VXLAN monitoring through existing metrics

2025-12-16 20:06:58 +00:00
parent 33407445fb
commit 2ac535cdae


@@ -0,0 +1,212 @@
# VXLAN Monitoring Without Native Paths
## The Problem
Arista's VXLAN-specific telemetry paths (`arista-exp-eos-vxlan`) don't have well-documented OpenConfig equivalents, and the native paths are not standardized.
## The Solution
**You already have VXLAN visibility** through existing subscriptions! Here's how:
### 1. VXLAN Interface Metrics (Already Collected!)
The `Vxlan1` interface IS your VXLAN endpoint. Our existing `interfaces` subscription captures:
```prometheus
# VXLAN tunnel traffic
gnmic_interfaces_interface_state_counters_in_octets{interface_name="Vxlan1"}
gnmic_interfaces_interface_state_counters_out_octets{interface_name="Vxlan1"}
# VXLAN tunnel errors
gnmic_interfaces_interface_state_counters_in_errors{interface_name="Vxlan1"}
gnmic_interfaces_interface_state_counters_out_errors{interface_name="Vxlan1"}
# VXLAN interface status
gnmic_interfaces_interface_state_oper_status{interface_name="Vxlan1"}
```
### 2. VTEP Reachability (via BGP EVPN!)
BGP EVPN neighbor state serves as a proxy for VTEP reachability:
```prometheus
# EVPN neighbor state (1 = Established, VTEP is up)
gnmic_bgp_neighbors_neighbor_state_session_state{neighbor_address="10.0.250.13"}
# EVPN routes received = VNI propagation working
gnmic_bgp_neighbors_neighbor_afi_safis_state_prefixes_received{
  neighbor_address="10.0.250.1",
  afi_safi_name="L2VPN_EVPN"
}
```
### 3. Underlay Health = VXLAN Health
Healthy underlay (spine-leaf) interfaces and established BGP sessions are prerequisites for VXLAN tunnels, so underlay status is an early indicator of overlay health:
```prometheus
# Underlay interfaces to spines
gnmic_interfaces_interface_state_oper_status{
  interface_name=~"Ethernet1[12]",
  role="leaf"
}
```
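Prometheus anchors `=~` patterns at both ends, so `Ethernet1[12]` selects only `Ethernet11` and `Ethernet12`. The Python sketch below mimics that behaviour with `re.fullmatch` to preview which interfaces a matcher would select. The function name and the sample interface list are illustrative assumptions, not part of the collector config.

```python
import re


def select_interfaces(pattern, interface_names):
    """Return the names a Prometheus-style =~ matcher would select.

    Prometheus regex matchers are fully anchored, which re.fullmatch
    reproduces for simple patterns like the one used above.
    """
    compiled = re.compile(pattern)
    return [name for name in interface_names if compiled.fullmatch(name)]


# Hypothetical interface inventory for one leaf.
sample = ["Ethernet1", "Ethernet11", "Ethernet12", "Ethernet112", "Vxlan1"]
uplinks = select_interfaces(r"Ethernet1[12]", sample)
print(uplinks)
```

Running a candidate pattern through a check like this before putting it in a dashboard helps catch regexes that silently match host-facing ports.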
## Grafana Queries for VXLAN Monitoring
### VXLAN Tunnel Bandwidth
```promql
# VXLAN tunnel TX rate (bits/sec)
rate(gnmic_interfaces_interface_state_counters_out_octets{interface_name="Vxlan1"}[1m]) * 8
# VXLAN tunnel RX rate (bits/sec)
rate(gnmic_interfaces_interface_state_counters_in_octets{interface_name="Vxlan1"}[1m]) * 8
```
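To make the `rate(...) * 8` arithmetic concrete, here is a simplified Python sketch that turns raw octet-counter samples into bits per second. It is an approximation under stated assumptions: it treats any decrease as a counter reset, as Prometheus does, but skips the window extrapolation that `rate()` applies. The function name and sample values are hypothetical.

```python
def bits_per_second(samples):
    """Approximate rate(counter[window]) * 8.

    samples: list of (timestamp_seconds, octets) tuples in time order.
    Returns None when there is not enough data to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter reset, so count from zero.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase * 8 / elapsed


# Three hypothetical Vxlan1 out_octets samples taken 30 seconds apart.
print(bits_per_second([(0, 1000), (30, 4000), (60, 7000)]))
```

The multiplication by 8 is the only unit conversion involved: the counters are in octets, while bandwidth panels conventionally display bits per second.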
### VTEP Reachability Matrix
```promql
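# Show which VTEPs can reach each other (via EVPN)
# session-state is a per-neighbor leaf, so the afi_safi_name label may be absent;
# if this returns nothing, filter on neighbor_address instead.
gnmic_bgp_neighbors_neighbor_state_session_state{
  afi_safi_name="L2VPN_EVPN"
} == 1  # 1 = Established, matching section 2 (6 is the BGP4-MIB code, not an OpenConfig value)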
```
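OpenConfig reports `session-state` as an enum string, so the number a query compares against depends on how the collector converts it. The Python sketch below shows one possible convention, assumed here purely for illustration: `ESTABLISHED` becomes 1 and every other state becomes 0. The function name is hypothetical and this is not gnmic's built-in behaviour.

```python
# BGP session states defined by the OpenConfig BGP model.
SESSION_STATES = (
    "IDLE",
    "CONNECT",
    "ACTIVE",
    "OPENSENT",
    "OPENCONFIRM",
    "ESTABLISHED",
)


def session_state_to_gauge(state):
    """Map a session-state enum string to 1 (up) or 0 (not up).

    Assumed convention: only ESTABLISHED counts as up.
    """
    normalized = state.strip().upper()
    if normalized not in SESSION_STATES:
        raise ValueError(f"unknown BGP session state: {state!r}")
    return 1 if normalized == "ESTABLISHED" else 0


print(session_state_to_gauge("ESTABLISHED"))
print(session_state_to_gauge("ACTIVE"))
```

Whatever mapping the pipeline uses, the dashboard queries and any alert thresholds need to compare against the same value.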
### VNI Count per VTEP
```promql
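# EVPN prefixes received: a rough proxy for VNI activity, not a literal VNI count
# (the total includes per-MAC type-2 routes as well as per-VNI type-3 routes)
gnmic_bgp_neighbors_neighbor_afi_safis_state_prefixes_received{
  afi_safi_name="L2VPN_EVPN"
}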
```
### VXLAN Errors
```promql
# VXLAN tunnel errors
rate(gnmic_interfaces_interface_state_counters_in_errors{interface_name="Vxlan1"}[5m])
```
## What You're Missing (and Why It's OK)
### ❌ Not Directly Available:
- Per-VNI packet/byte counters
- Individual VTEP discovery lists
- Flood list details
- VNI-to-VLAN mappings
### ✅ Why It's OK:
1. **Total VXLAN traffic** (Vxlan1 interface) is usually more useful than per-VNI
2. **VTEP reachability** is inferred from BGP EVPN neighbor states
3. **VNI health** is inferred from EVPN route counts
4. **Configuration info** (VNI-to-VLAN) doesn't change often, can be in docs
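The inference described in this list can be sketched as a small Python function that folds the individual signals into one verdict per leaf. Everything here is an illustrative assumption: the `LeafSignals` fields, the thresholds, and the `ok`/`degraded`/`down` labels are hypothetical and would need tuning for a real fabric.

```python
from dataclasses import dataclass


@dataclass
class LeafSignals:
    vxlan_oper_up: bool             # Vxlan1 oper_status is up
    evpn_sessions_established: int  # EVPN neighbors in Established state
    evpn_sessions_total: int        # EVPN neighbors configured
    evpn_prefixes_received: int     # Sum of L2VPN_EVPN prefixes received
    uplinks_up: int                 # Underlay uplinks that are oper-up
    uplinks_total: int              # Underlay uplinks expected


def vxlan_health(s):
    """Return 'down', 'degraded', or 'ok' from the inferred signals."""
    # Without the tunnel interface, any uplink, or any EVPN session,
    # the overlay cannot work at all.
    if not s.vxlan_oper_up or s.uplinks_up == 0 or s.evpn_sessions_established == 0:
        return "down"
    # Lost redundancy or an empty EVPN table means reduced service.
    if (
        s.uplinks_up < s.uplinks_total
        or s.evpn_sessions_established < s.evpn_sessions_total
        or s.evpn_prefixes_received == 0
    ):
        return "degraded"
    return "ok"


print(vxlan_health(LeafSignals(True, 2, 2, 120, 2, 2)))
```

The same logic could be expressed as Prometheus alert rules; the sketch only makes the decision order explicit.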
## If You Really Need Native VXLAN Paths
### Discovery Method:
```bash
# SSH to a leaf
ssh admin@172.16.0.25
# Enter bash
bash
# Try to get native VXLAN paths (the paths and CLI syntax below are unverified guesses; adjust for your EOS release)
gnmi -get /Sysdb/bridging/vxlan/status
gnmi -get /Smash/bridging/status/vxlanStatus
# Or use EOS native provider in gnmi config
```
### Add to gnmic.yaml (if discovery works):
```yaml
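subscriptions:
  arista_vxlan:
    paths:
      # If this works. Native paths may need the eos_native origin,
      # e.g. "eos_native:/Sysdb/bridging/vxlan/status" (unverified here).
      - /Sysdb/bridging/vxlan/status
    mode: stream
    stream-mode: sample
    sample-interval: 30s
    encoding: json  # Note: probably needs 'json' not 'json_ietf'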
```
### Add to switch config:
```
management api gnmi
   transport grpc default
   provider eos-native
```
This exposes Arista's native state paths (such as `/Sysdb/...`) over gNMI alongside the OpenConfig paths.
## Recommended Dashboard Panels
### 1. VXLAN Tunnel Bandwidth (per VTEP)
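Shows total VXLAN-encapsulated traffic sent by each leaf (the `Vxlan1` counters aggregate all remote VTEPs, so this is per leaf rather than per leaf pair):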
```promql
sum by (source, vtep) (
  rate(gnmic_interfaces_interface_state_counters_out_octets{
    interface_name="Vxlan1",
    role="leaf"
  }[1m]) * 8
)
```
### 2. VTEP Connectivity Heat Map
Matrix showing which VTEPs can reach each other:
```promql
# session-state is a per-neighbor leaf; drop the matcher if the afi_safi_name label is absent
gnmic_bgp_neighbors_neighbor_state_session_state{
  afi_safi_name="L2VPN_EVPN"
}
```
### 3. EVPN Route Count (Proxy for VNI Health)
```promql
gnmic_bgp_neighbors_neighbor_afi_safis_state_prefixes_received{
  afi_safi_name="L2VPN_EVPN"
}
```
### 4. VXLAN vs Underlay Traffic Comparison
Compare VXLAN encapsulated vs total underlay:
```promql
# VXLAN traffic (overlay)
sum(rate(gnmic_interfaces_interface_state_counters_out_octets{interface_name="Vxlan1"}[1m])) * 8
# vs
# Total Ethernet traffic (the regex also matches host-facing ports; narrow it to the uplinks for a strict underlay figure)
sum(rate(gnmic_interfaces_interface_state_counters_out_octets{interface_name=~"Ethernet.*"}[1m])) * 8
```
## Summary
**You already have comprehensive VXLAN monitoring** through:
- ✅ Vxlan1 interface metrics (tunnel traffic)
- ✅ BGP EVPN neighbors (VTEP reachability)
- ✅ EVPN route counts (VNI propagation)
- ✅ Underlay interface health (tunnel foundation)
This is **sufficient for production monitoring** and supplies the data your Flow Plugin visualization needs.
If you discover the native Arista VXLAN paths, we can add them as an enhancement, but they're not required for a functional monitoring stack.
## Next Steps
1. **Use current config** - It's production-ready
2. **Create VXLAN dashboard** - Use the queries above
3. **Optional: Discover native paths** - If you need per-VNI details later
The beauty of this approach: **It works right now** and covers most of what you need for VXLAN monitoring!