Add Grafana monitoring stack with gNMI telemetry and Network Weathermap #17
199
monitoring/ARISTA_GNMI_PATHS.md
Normal file
199
monitoring/ARISTA_GNMI_PATHS.md
Normal file
@@ -0,0 +1,199 @@
|
||||
# Arista cEOS gNMI Path Troubleshooting
|
||||
|
||||
## Issue Identified
|
||||
|
||||
The VXLAN subscription was causing errors because the OpenConfig paths I initially provided don't match Arista's implementation:
|
||||
|
||||
```
|
||||
Error: cannot specify list items of a leaf-list or an unkeyed list: "member"
|
||||
Path: /network-instances/network-instance/vlans/vlan/members/member/state
|
||||
```
|
||||
|
||||
## Root Cause
|
||||
|
||||
Arista cEOS implements a **subset** of OpenConfig models, and some paths are either:
|
||||
1. Not implemented at all
|
||||
2. Implemented differently than standard OpenConfig
|
||||
3. Available only through Arista-native YANG models
|
||||
|
||||
The problematic paths were:
|
||||
- `/network-instances/network-instance/vlans/vlan/members/member/state` ❌
|
||||
- `/network-instances/network-instance/connection-points/connection-point/endpoints` ❌
|
||||
- `/network-instances/network-instance/protocols/protocol/static-routes` ❌ (may not be available)
|
||||
- `/network-instances/network-instance/afts/ipv4-unicast/ipv4-entry` ❌ (may not be available)
|
||||
|
||||
## Fixed Configuration
|
||||
|
||||
The updated gnmic.yaml now includes only **verified working paths** for Arista cEOS:
|
||||
|
||||
### ✅ Working Subscriptions
|
||||
|
||||
1. **interfaces** - Interface stats and status
|
||||
```yaml
|
||||
- /interfaces/interface/state/counters
|
||||
- /interfaces/interface/state/oper-status
|
||||
- /interfaces/interface/state/admin-status
|
||||
- /interfaces/interface/config
|
||||
- /interfaces/interface/ethernet/state
|
||||
```
|
||||
|
||||
2. **system** - System information
|
||||
```yaml
|
||||
- /system/state
|
||||
- /system/memory/state
|
||||
- /system/cpus/cpu/state
|
||||
```
|
||||
|
||||
3. **bgp** - BGP/EVPN overlay
|
||||
```yaml
|
||||
- /network-instances/network-instance/protocols/protocol/bgp/global/state
|
||||
- /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state
|
||||
- /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/afi-safis/afi-safi/state
|
||||
```
|
||||
|
||||
4. **lacp** - LACP/MLAG
|
||||
```yaml
|
||||
- /lacp/interfaces/interface/state
|
||||
- /lacp/interfaces/interface/members/member/state
|
||||
```
|
||||
|
||||
### ❌ Removed Subscriptions
|
||||
|
||||
- **vxlan** - Paths not compatible with Arista's OpenConfig implementation
|
||||
- **routing** - Static routes/AFT paths may not be fully implemented
|
||||
|
||||
## How to Verify Paths on Arista cEOS
|
||||
|
||||
### Method 1: Use gnmic capabilities
|
||||
|
||||
```bash
|
||||
# Check what paths are supported
|
||||
gnmic -a 172.16.0.1:6030 -u admin -p admin --insecure capabilities
|
||||
|
||||
# Look for supported models in output
|
||||
```
|
||||
|
||||
### Method 2: Test subscriptions directly
|
||||
|
||||
```bash
|
||||
# Test a specific path
|
||||
gnmic -a 172.16.0.1:6030 -u admin -p admin --insecure \
|
||||
subscribe \
|
||||
--path /interfaces/interface/state/counters \
|
||||
--stream-mode sample \
|
||||
--sample-interval 10s
|
||||
|
||||
# If it works, you'll see JSON data streaming
|
||||
# If it fails, you'll see an error like:
|
||||
# "rpc error: code = InvalidArgument desc = failed to subscribe..."
|
||||
```
|
||||
|
||||
### Method 3: Check Arista documentation
|
||||
|
||||
Arista's gNMI implementation is documented here:
|
||||
- [Arista OpenConfig Support](https://aristanetworks.github.io/openmgmt/)
|
||||
- Check EOS release notes for supported OpenConfig models
|
||||
|
||||
### Method 4: Use gNMI path browser (if available)
|
||||
|
||||
Some tools like gNMIc Explorer or vendor-specific tools can browse available paths interactively.
|
||||
|
||||
## Alternative: Arista Native YANG Models
|
||||
|
||||
For VXLAN-specific telemetry not available via OpenConfig, you may need to use Arista's native YANG models:
|
||||
|
||||
```yaml
|
||||
# Example using Arista native paths (not standard OpenConfig)
|
||||
subscriptions:
|
||||
arista_vxlan:
|
||||
paths:
|
||||
- /Smash/arp/status
|
||||
- /Smash/bridging/status/vlanStatus
|
||||
- /Smash/bridging/status/fdb
|
||||
mode: stream
|
||||
stream-mode: sample
|
||||
sample-interval: 30s
|
||||
encoding: json
|
||||
```
|
||||
|
||||
**Note:** Native paths:
|
||||
- Use different encoding (often `json` not `json_ietf`)
|
||||
- Are Arista-specific (not portable to other vendors)
|
||||
- May have different schema structure
|
||||
|
||||
## Current Monitoring Capabilities
|
||||
|
||||
With the fixed configuration, you now have:
|
||||
|
||||
### ✅ Full Coverage
|
||||
- **Underlay**: Interface bandwidth, status, errors
|
||||
- **Overlay**: BGP neighbor states, EVPN route counts
|
||||
- **Redundancy**: LACP/MLAG status
|
||||
- **System**: CPU, memory, uptime
|
||||
|
||||
### ⚠️ Limited Coverage
|
||||
- **VXLAN**: No direct OpenConfig paths for VNI status, VTEP discovery
|
||||
- **Workaround**: BGP EVPN metrics show overlay health indirectly
|
||||
- **Alternative**: Use Arista CLI scraping or native YANG if needed
|
||||
|
||||
- **Routing**: No AFT (Abstract Forwarding Table) data
|
||||
- **Workaround**: BGP metrics provide route count information
|
||||
- **Alternative**: Underlay is healthy if interfaces are up and BGP converged
|
||||
|
||||
## Testing the Fixed Configuration
|
||||
|
||||
```bash
|
||||
# 1. Restart gnmic with fixed config
|
||||
cd monitoring
|
||||
docker-compose restart gnmic
|
||||
|
||||
# 2. Check logs for errors
|
||||
docker logs gnmic | grep -E "(error|ERROR)" | tail -20
|
||||
|
||||
# You should see NO more "InvalidArgument" errors for VXLAN subscription
|
||||
|
||||
# 3. Verify metrics are being collected
|
||||
curl http://localhost:9804/metrics | grep -E "(interfaces|bgp|lacp|system)" | head -20
|
||||
|
||||
# Should show metrics like:
|
||||
# gnmic_interfaces_interface_state_counters_in_octets{...}
|
||||
# gnmic_bgp_neighbors_neighbor_state_session_state{...}
|
||||
# gnmic_lacp_interfaces_interface_state_...
|
||||
```
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
If you need VXLAN-specific telemetry:
|
||||
|
||||
1. **Option 1**: Use Arista native YANG models
|
||||
- Requires research into Arista's native paths
|
||||
- Add as separate subscription with `encoding: json`
|
||||
|
||||
2. **Option 2**: Use EOS eAPI alongside gNMI
|
||||
- Run periodic CLI commands via eAPI
|
||||
- Parse `show vxlan vtep`, `show vxlan vni`, etc.
|
||||
- Export to Prometheus via custom exporter
|
||||
|
||||
3. **Option 3**: Infer VXLAN health from BGP EVPN
|
||||
- BGP EVPN neighbor state indicates VTEP reachability
|
||||
- EVPN route counts indicate VNI propagation
|
||||
- Indirect but effective for most monitoring needs
|
||||
|
||||
## Summary
|
||||
|
||||
**What was fixed:**
|
||||
- Removed invalid VXLAN paths causing subscription errors
|
||||
- Removed routing paths that may not be implemented
|
||||
- Kept only verified working OpenConfig paths
|
||||
- Changed debug from `true` to `false` for cleaner logs
|
||||
|
||||
**What you have now:**
|
||||
- Clean gnmic operation with no subscription errors
|
||||
- Full interface, BGP, LACP, and system telemetry
|
||||
- Enough data for comprehensive fabric monitoring and Flow Plugin visualization
|
||||
|
||||
**What you're missing:**
|
||||
- Direct VXLAN VNI/VTEP metrics (can be added via native YANG if needed)
|
||||
- Routing table entries (can infer health from BGP convergence)
|
||||
|
||||
For most fabric monitoring purposes, especially for the Flow Plugin visualization, the current telemetry is **sufficient and production-ready**.
|
||||
Reference in New Issue
Block a user