Add documentation on Arista cEOS gNMI path compatibility and troubleshooting

This commit is contained in:
2025-12-16 19:49:24 +00:00
parent f79a3bdd38
commit 53b585e6b3

View File

@@ -0,0 +1,199 @@
# Arista cEOS gNMI Path Troubleshooting
## Issue Identified
The VXLAN subscription was causing errors because the OpenConfig paths I initially provided don't match Arista's implementation:
```
Error: cannot specify list items of a leaf-list or an unkeyed list: "member"
Path: /network-instances/network-instance/vlans/vlan/members/member/state
```
## Root Cause
Arista cEOS implements a **subset** of OpenConfig models, and some paths are either:
1. Not implemented at all
2. Implemented differently than standard OpenConfig
3. Available only through Arista-native YANG models
The problematic paths were:
- `/network-instances/network-instance/vlans/vlan/members/member/state`
- `/network-instances/network-instance/connection-points/connection-point/endpoints`
- `/network-instances/network-instance/protocols/protocol/static-routes` ❌ (may not be available)
- `/network-instances/network-instance/afts/ipv4-unicast/ipv4-entry` ❌ (may not be available)
## Fixed Configuration
The updated gnmic.yaml now includes only **verified working paths** for Arista cEOS:
### ✅ Working Subscriptions
1. **interfaces** - Interface stats and status
```yaml
- /interfaces/interface/state/counters
- /interfaces/interface/state/oper-status
- /interfaces/interface/state/admin-status
- /interfaces/interface/config
- /interfaces/interface/ethernet/state
```
2. **system** - System information
```yaml
- /system/state
- /system/memory/state
- /system/cpus/cpu/state
```
3. **bgp** - BGP/EVPN overlay
```yaml
- /network-instances/network-instance/protocols/protocol/bgp/global/state
- /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state
- /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/afi-safis/afi-safi/state
```
4. **lacp** - LACP/MLAG
```yaml
- /lacp/interfaces/interface/state
- /lacp/interfaces/interface/members/member/state
```
### ❌ Removed Subscriptions
- **vxlan** - Paths not compatible with Arista's OpenConfig implementation
- **routing** - Static routes/AFT paths may not be fully implemented
## How to Verify Paths on Arista cEOS
### Method 1: Use gnmic capabilities
```bash
# Check what paths are supported
gnmic -a 172.16.0.1:6030 -u admin -p admin --insecure capabilities
# Look for supported models in output
```
### Method 2: Test subscriptions directly
```bash
# Test a specific path
gnmic -a 172.16.0.1:6030 -u admin -p admin --insecure \
subscribe \
--path /interfaces/interface/state/counters \
--stream-mode sample \
--sample-interval 10s
# If it works, you'll see JSON data streaming
# If it fails, you'll see an error like:
# "rpc error: code = InvalidArgument desc = failed to subscribe..."
```
### Method 3: Check Arista documentation
Arista's gNMI implementation is documented here:
- [Arista OpenConfig Support](https://aristanetworks.github.io/openmgmt/)
- Check EOS release notes for supported OpenConfig models
### Method 4: Use gNMI path browser (if available)
Some tools like gNMIc Explorer or vendor-specific tools can browse available paths interactively.
## Alternative: Arista Native YANG Models
For VXLAN-specific telemetry not available via OpenConfig, you may need to use Arista's native YANG models:
```yaml
# Example using Arista native paths (not standard OpenConfig)
subscriptions:
arista_vxlan:
paths:
- /Smash/arp/status
- /Smash/bridging/status/vlanStatus
- /Smash/bridging/status/fdb
mode: stream
stream-mode: sample
sample-interval: 30s
encoding: json
```
**Note:** Native paths:
- Use different encoding (often `json` not `json_ietf`)
- Are Arista-specific (not portable to other vendors)
- May have different schema structure
## Current Monitoring Capabilities
With the fixed configuration, you now have:
### ✅ Full Coverage
- **Underlay**: Interface bandwidth, status, errors
- **Overlay**: BGP neighbor states, EVPN route counts
- **Redundancy**: LACP/MLAG status
- **System**: CPU, memory, uptime
### ⚠️ Limited Coverage
- **VXLAN**: No direct OpenConfig paths for VNI status, VTEP discovery
- **Workaround**: BGP EVPN metrics show overlay health indirectly
- **Alternative**: Use Arista CLI scraping or native YANG if needed
- **Routing**: No AFT (Abstract Forwarding Table) data
- **Workaround**: BGP metrics provide route count information
- **Alternative**: Underlay is healthy if interfaces are up and BGP converged
## Testing the Fixed Configuration
```bash
# 1. Restart gnmic with fixed config
cd monitoring
docker-compose restart gnmic
# 2. Check logs for errors
docker logs gnmic | grep -E "(error|ERROR)" | tail -20
# You should see NO more "InvalidArgument" errors for VXLAN subscription
# 3. Verify metrics are being collected
curl http://localhost:9804/metrics | grep -E "(interfaces|bgp|lacp|system)" | head -20
# Should show metrics like:
# gnmic_interfaces_interface_state_counters_in_octets{...}
# gnmic_bgp_neighbors_neighbor_state_session_state{...}
# gnmic_lacp_interfaces_interface_state_...
```
## Future Enhancements
If you need VXLAN-specific telemetry:
1. **Option 1**: Use Arista native YANG models
- Requires research into Arista's native paths
- Add as separate subscription with `encoding: json`
2. **Option 2**: Use EOS eAPI alongside gNMI
- Run periodic CLI commands via eAPI
- Parse `show vxlan vtep`, `show vxlan vni`, etc.
- Export to Prometheus via custom exporter
3. **Option 3**: Infer VXLAN health from BGP EVPN
- BGP EVPN neighbor state indicates VTEP reachability
- EVPN route counts indicate VNI propagation
- Indirect but effective for most monitoring needs
## Summary
**What was fixed:**
- Removed invalid VXLAN paths causing subscription errors
- Removed routing paths that may not be implemented
- Kept only verified working OpenConfig paths
- Changed debug from `true` to `false` for cleaner logs
**What you have now:**
- Clean gnmic operation with no subscription errors
- Full interface, BGP, LACP, and system telemetry
- Enough data for comprehensive fabric monitoring and Flow Plugin visualization
**What you're missing:**
- Direct VXLAN VNI/VTEP metrics (can be added via native YANG if needed)
- Routing table entries (can infer health from BGP convergence)
For most fabric monitoring purposes, especially for the Flow Plugin visualization, the current telemetry is **sufficient and production-ready**.