From 2ac535cdae8e5a8aaf890b08914cc604bda81ac1 Mon Sep 17 00:00:00 2001
From: Damien Arnodo
Date: Tue, 16 Dec 2025 20:06:58 +0000
Subject: [PATCH] Add guide explaining VXLAN monitoring through existing metrics

---
 monitoring/VXLAN_MONITORING_GUIDE.md | 212 +++++++++++++++++++++++++++
 1 file changed, 212 insertions(+)
 create mode 100644 monitoring/VXLAN_MONITORING_GUIDE.md

diff --git a/monitoring/VXLAN_MONITORING_GUIDE.md b/monitoring/VXLAN_MONITORING_GUIDE.md
new file mode 100644
index 0000000..fdb0b24
--- /dev/null
+++ b/monitoring/VXLAN_MONITORING_GUIDE.md
@@ -0,0 +1,212 @@
+# VXLAN Monitoring Without Native Paths
+
+## The Problem
+
+Arista's VXLAN-specific telemetry paths (`arista-exp-eos-vxlan`) don't have well-documented OpenConfig equivalents, and the native paths are not standardized.
+
+## The Solution
+
+**You already have VXLAN visibility** through existing subscriptions! Here's how:
+
+### 1. VXLAN Interface Metrics (Already Collected!)
+
+The `Vxlan1` interface IS your VXLAN endpoint. Our existing `interfaces` subscription captures:
+
+```prometheus
+# VXLAN tunnel traffic
+gnmic_interfaces_interface_state_counters_in_octets{interface_name="Vxlan1"}
+gnmic_interfaces_interface_state_counters_out_octets{interface_name="Vxlan1"}
+
+# VXLAN tunnel errors
+gnmic_interfaces_interface_state_counters_in_errors{interface_name="Vxlan1"}
+gnmic_interfaces_interface_state_counters_out_errors{interface_name="Vxlan1"}
+
+# VXLAN interface status
+gnmic_interfaces_interface_state_oper_status{interface_name="Vxlan1"}
+```
+
+### 2. VTEP Reachability (via BGP EVPN!)
+
+BGP EVPN neighbors = VTEP reachability:
+
+```prometheus
+# EVPN neighbor state (Established = VTEP is up; the queries below assume the enum is exported as 6)
+gnmic_bgp_neighbors_neighbor_state_session_state{neighbor_address="10.0.250.13"}
+
+# EVPN routes received = VNI propagation working
+gnmic_bgp_neighbors_neighbor_afi_safis_state_prefixes_received{
+  neighbor_address="10.0.250.1",
+  afi_safi_name="L2VPN_EVPN"
+}
+```
+
+### 3. Underlay Health = VXLAN Health
+
+If underlay (spine-leaf) interfaces are up and BGP is established, VXLAN tunnels will form automatically:
+
+```prometheus
+# Underlay interfaces to spines
+gnmic_interfaces_interface_state_oper_status{
+  interface_name=~"Ethernet1[12]",
+  role="leaf"
+}
+```
+
+## Grafana Queries for VXLAN Monitoring
+
+### VXLAN Tunnel Bandwidth
+
+```promql
+# VXLAN tunnel TX rate (bits/sec)
+rate(gnmic_interfaces_interface_state_counters_out_octets{interface_name="Vxlan1"}[1m]) * 8
+
+# VXLAN tunnel RX rate (bits/sec)
+rate(gnmic_interfaces_interface_state_counters_in_octets{interface_name="Vxlan1"}[1m]) * 8
+```
+
+### VTEP Reachability Matrix
+
+```promql
+# Show which VTEPs can reach each other (via EVPN); drop the afi_safi_name matcher if session_state lacks that label
+gnmic_bgp_neighbors_neighbor_state_session_state{
+  afi_safi_name="L2VPN_EVPN"
+} == 6 # 6 = Established, assuming the session-state enum is mapped to 1..6 (Idle..Established)
+```
+
+### VNI Count per VTEP
+
+```promql
+# EVPN prefixes received per neighbor: a rough proxy for VNI activity, not an exact VNI count
+gnmic_bgp_neighbors_neighbor_afi_safis_state_prefixes_received{
+  afi_safi_name="L2VPN_EVPN"
+}
+```
+
+### VXLAN Errors
+
+```promql
+# VXLAN tunnel errors
+rate(gnmic_interfaces_interface_state_counters_in_errors{interface_name="Vxlan1"}[5m])
+```
+
+## What You're Missing (and Why It's OK)
+
+### ❌ Not Directly Available:
+- Per-VNI packet/byte counters
+- Individual VTEP discovery lists
+- Flood list details
+- VNI-to-VLAN mappings
+
+### ✅ Why It's OK:
+1. **Total VXLAN traffic** (Vxlan1 interface) is usually more useful than per-VNI
+2. **VTEP reachability** is inferred from BGP EVPN neighbor states
+3. **VNI health** is inferred from EVPN route counts
+4. **Configuration info** (VNI-to-VLAN) changes rarely and can live in the docs
+
+## If You Really Need Native VXLAN Paths
+
+### Discovery Method:
+
+```bash
+# SSH to a leaf
+ssh admin@172.16.0.25
+
+# Enter bash
+bash
+
+# Try to get native VXLAN paths
+gnmi -get /Sysdb/bridging/vxlan/status
+gnmi -get /Smash/bridging/status/vxlanStatus
+
+# Or use EOS native provider in gnmi config
+```
+
+### Add to gnmic.yaml (if discovery works):
+
+```yaml
+subscriptions:
+  arista_vxlan:
+    paths:
+      - /Sysdb/bridging/vxlan/status # If this works
+    mode: stream
+    stream-mode: sample
+    sample-interval: 30s
+    encoding: json # Note: probably needs 'json' not 'json_ietf'
+```
+
+### Add to switch config:
+
+```
+management api gnmi
+   transport grpc default
+   provider eos-native
+```
+
+This exposes Arista's EOS-native (Sysdb/Smash) paths alongside OpenConfig.
+
+## Recommended Dashboard Panels
+
+### 1. VXLAN Tunnel Bandwidth (per VTEP)
+
+Shows total VXLAN encapsulated traffic per leaf pair:
+
+```promql
+sum by (source, vtep) (
+  rate(gnmic_interfaces_interface_state_counters_out_octets{
+    interface_name="Vxlan1",
+    role="leaf"
+  }[1m]) * 8
+)
+```
+
+### 2. VTEP Connectivity Heat Map
+
+Matrix showing which VTEPs can reach each other:
+
+```promql
+gnmic_bgp_neighbors_neighbor_state_session_state{
+  afi_safi_name="L2VPN_EVPN"
+}
+```
+
+### 3. EVPN Route Count (Proxy for VNI Health)
+
+```promql
+gnmic_bgp_neighbors_neighbor_afi_safis_state_prefixes_received{
+  afi_safi_name="L2VPN_EVPN"
+}
+```
+
+### 4. VXLAN vs Underlay Traffic Comparison
+
+Compare VXLAN encapsulated vs total underlay:
+
+```promql
+# VXLAN traffic (overlay)
+sum(rate(gnmic_interfaces_interface_state_counters_out_octets{interface_name="Vxlan1"}[1m])) * 8
+
+# vs
+
+# Total underlay traffic
+sum(rate(gnmic_interfaces_interface_state_counters_out_octets{interface_name=~"Ethernet.*"}[1m])) * 8
+```
+
+## Summary
+
+**You already have comprehensive VXLAN monitoring** through:
+- ✅ Vxlan1 interface metrics (tunnel traffic)
+- ✅ BGP EVPN neighbors (VTEP reachability)
+- ✅ EVPN route counts (VNI propagation)
+- ✅ Underlay interface health (tunnel foundation)
+
+This is **sufficient for production monitoring** and provides the data your Flow Plugin visualization needs.
+
+If you discover the native Arista VXLAN paths, we can add them as an enhancement, but they're not required for a functional monitoring stack.
+
+## Next Steps
+
+1. **Use current config** - It's production-ready
+2. **Create VXLAN dashboard** - Use the queries above
+3. **Optional: Discover native paths** - If you need per-VNI details later
+
+The beauty of this approach: **It works right now** and covers most of what you need for VXLAN monitoring!
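
The VTEP reachability check can also be scripted outside Grafana. The following is a minimal sketch, not part of the guide's tooling: it takes the decoded JSON body of a Prometheus instant query for `gnmic_bgp_neighbors_neighbor_state_session_state` and lists neighbors that are not Established. The function name `down_neighbors`, the sample response, and the value `6` for Established are assumptions; the `source` and `neighbor_address` labels are taken from the queries above.

```python
from __future__ import annotations

from typing import Any

# Assumption: session_state is exported as a number and 6 means Established,
# matching the "== 6" query in the guide. Adjust if your gnmic processors
# map the enum differently.
ESTABLISHED = 6.0


def down_neighbors(
    response: dict[str, Any], established: float = ESTABLISHED
) -> list[tuple[str, str]]:
    """Return sorted (source, neighbor_address) pairs that are not Established.

    `response` is the decoded JSON body of a Prometheus instant query
    (GET /api/v1/query) for the session_state metric.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        # Instant-vector samples carry [unix_timestamp, "value-as-string"].
        _timestamp, raw_value = sample["value"]
        if float(raw_value) != established:
            down.append(
                (labels.get("source", "?"), labels.get("neighbor_address", "?"))
            )
    return sorted(down)


# Hypothetical response, shaped like the Prometheus HTTP API output.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"source": "leaf1", "neighbor_address": "10.0.250.1"},
                "value": [1765915618, "6"],
            },
            {
                "metric": {"source": "leaf1", "neighbor_address": "10.0.250.13"},
                "value": [1765915618, "1"],
            },
        ],
    },
}

print(down_neighbors(sample_response))  # [('leaf1', '10.0.250.13')]
```

Keeping the parsing separate from the HTTP call means the same function can be fed by any client that returns the Prometheus response body, and it can be tested without a live Prometheus instance.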