Add Grafana monitoring stack with gNMI telemetry and Network Weathermap #17

Closed
Damien wants to merge 28 commits from feature/grafana-monitoring into main

Summary

Adds a complete monitoring stack for the EVPN-VXLAN fabric using gNMI streaming telemetry with gnmic, Prometheus, and Grafana Network Weathermap.

Components

  • gnmic: Collects gNMI telemetry from all 10 switches (port 6030)
  • Prometheus: Stores metrics with 15-day retention
  • Grafana: Visualizes with Weathermap plugin pre-installed

Files Added

  • monitoring/docker-compose.yml - Stack orchestration
  • monitoring/gnmic/gnmic.yaml - gNMI collector config for all devices
  • monitoring/prometheus/prometheus.yml - Scrape configuration
  • monitoring/grafana/dashboards/ - Pre-built dashboards including weathermap
  • monitoring/deploy.sh - Quick deployment script

Usage

cd monitoring
docker-compose up -d
# Grafana: http://localhost:3000 (admin/admin)

Collected Metrics

  • Interface counters (in/out octets, packets, errors)
  • BGP neighbor state
  • MLAG status
  • System metrics (CPU, memory)
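The interface counters are cumulative octet counts, so a weathermap or traffic panel has to turn them into a rate. The Python sketch below assumes the dashboards compute something equivalent to PromQL `rate(...) * 8`. The function name `octets_to_bps` and the sample values are hypothetical. Unlike Prometheus, it does no boundary extrapolation.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (unix timestamp in seconds, cumulative octet counter)


def octets_to_bps(samples: List[Sample]) -> float:
    """Average bits per second over consecutive counter samples.

    Simplified stand-in for rate(<octets>[window]) * 8: sums the increase
    between each pair of samples, treating any decrease as a counter reset,
    and divides by the time spanned. No extrapolation to window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    increase = 0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted (e.g. a switch reboot), so the
        # new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase * 8 / elapsed  # octets -> bits, per second


if __name__ == "__main__":
    # Hypothetical out-octets samples taken 10 s apart.
    samples = [(0.0, 1_000), (10.0, 126_000), (20.0, 251_000)]
    print(octets_to_bps(samples))  # prints 100000.0
```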
Damien added 9 commits 2025-12-16 13:27:19 +00:00
Damien added 1 commit 2025-12-16 13:44:43 +00:00
Damien added 1 commit 2025-12-16 14:19:41 +00:00
- Remove invalid 'add-target: target' (must be overwrite|if-not-present|empty)
- Enable debug mode for troubleshooting
- Simplify interface paths to /interfaces/interface/state (Arista compatible)
- Simplify system paths to /system/state
- Remove complex BGP path that may not work on cEOS
- Add retry and timeout parameters for reliability
- Add expiration to prevent stale metrics
- Add skip-verify for insecure connections
- Increase sample intervals for stability
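Here is a minimal Python sketch of the idea behind the expiration change. It assumes the exporter drops any series not refreshed within the expiration window before it answers a scrape. `ExpiringMetricCache` is a hypothetical class for illustration, not gnmic's implementation.

```python
from typing import Dict, Tuple


class ExpiringMetricCache:
    """Hypothetical exporter cache that forgets series which stop updating."""

    def __init__(self, expiration_s: float) -> None:
        self.expiration_s = expiration_s
        # series name -> (last value, time of last update)
        self._series: Dict[str, Tuple[float, float]] = {}

    def update(self, name: str, value: float, now: float) -> None:
        """Record a fresh telemetry value for a series."""
        self._series[name] = (value, now)

    def scrape(self, now: float) -> Dict[str, float]:
        """Drop series older than the expiration window, then return the rest."""
        stale = [n for n, (_, ts) in self._series.items() if now - ts > self.expiration_s]
        for n in stale:
            del self._series[n]
        return {n: v for n, (v, _) in self._series.items()}
```

Without such a window, a counter from a switch that went away would keep being exported with its last value, which is the stale-metric problem this commit targets.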
Damien added 1 commit 2025-12-16 14:21:46 +00:00
The flags --insecure and --skip-verify are mutually exclusive in gNMIc.
Since we're using insecure connections (no TLS), skip-verify is not needed.
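The reasoning in this commit fits in a few lines. Insecure means plaintext gRPC with no TLS. Skip-verify means TLS without certificate verification, so the two cannot both apply. `TargetTransport` below is a hypothetical Python model of that rule, not gnmic's own option handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetTransport:
    """Hypothetical model of a gNMI target's transport options."""

    insecure: bool = False     # plaintext gRPC, no TLS handshake at all
    skip_verify: bool = False  # TLS, but the server certificate is not verified

    def __post_init__(self) -> None:
        # skip-verify only means something once TLS is in use, so combining
        # it with a plaintext connection is contradictory.
        if self.insecure and self.skip_verify:
            raise ValueError("insecure and skip-verify are mutually exclusive")

    def describe(self) -> str:
        """Short label for the resulting connection type."""
        if self.insecure:
            return "plaintext"
        return "tls-unverified" if self.skip_verify else "tls-verified"
```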
Damien added 1 commit 2025-12-16 14:26:03 +00:00
Changed from:
- gnmic_interfaces_interface_state_counters_out_octets
- gnmic_interfaces_interface_state_counters_in_octets
- target label

To:
- gnmic_interfaces_out_octets
- gnmic_interfaces_in_octets  
- source label

These match the actual metrics generated by gNMIc with the simplified
/interfaces/interface/state path and trim-prefixes processor.
Damien added 1 commit 2025-12-16 14:26:36 +00:00
Changed from:
- gnmic_interfaces_interface_state_counters_* with target label

To:
- gnmic_interfaces_* with source label

Also added:
- Interfaces Monitored stat panel
- MLAG Peer-Link Traffic panel

These match the actual metrics generated by gNMIc.
Damien added 1 commit 2025-12-16 18:51:31 +00:00
Damien added 1 commit 2025-12-16 18:51:48 +00:00
Damien added 1 commit 2025-12-16 18:52:08 +00:00
Damien added 1 commit 2025-12-16 18:52:43 +00:00
Damien added 1 commit 2025-12-16 18:53:42 +00:00
Damien added 1 commit 2025-12-16 18:54:17 +00:00
Damien added 1 commit 2025-12-16 19:48:44 +00:00
Damien added 1 commit 2025-12-16 19:49:25 +00:00
Damien added 1 commit 2025-12-16 19:50:04 +00:00
Damien added 1 commit 2025-12-16 20:06:59 +00:00
Damien added 1 commit 2025-12-16 20:07:45 +00:00
Damien added 1 commit 2025-12-16 20:14:54 +00:00
Damien added 1 commit 2025-12-16 20:15:39 +00:00
Damien added 1 commit 2025-12-16 21:16:17 +00:00
Changes:
- Remove path-base transform that was stripping metric names to just leaf elements
- Change VXLAN subscription from on_change to sample mode (30s interval)
  to ensure consistent metric collection
- Remove unused event-processors from Prometheus output
- Clean up processor configuration
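The Python sketch below shows one plausible reason sample mode gives more consistent series. It assumes the exporter expires series that stop receiving updates. A VXLAN leaf whose value never changes is sent once in on_change mode and then ages out, whereas a 30 s sample subscription keeps refreshing it. The tick-based functions are hypothetical and ignore gNMI features such as heartbeat intervals.

```python
from typing import List


def update_ticks(values: List[str], mode: str, sample_interval: int = 30) -> List[int]:
    """Seconds at which the switch would send an update for one leaf.

    values[t] is the leaf's value at second t. 'on_change' sends the initial
    value and then only changes; 'sample' sends every sample_interval seconds.
    """
    if mode == "on_change":
        return [t for t, v in enumerate(values) if t == 0 or v != values[t - 1]]
    if mode == "sample":
        return list(range(0, len(values), sample_interval))
    raise ValueError(f"unknown mode: {mode}")


def is_exported(ticks: List[int], now: int, expiration: int) -> bool:
    """True if the latest update at or before `now` is still within the expiration window."""
    past = [t for t in ticks if t <= now]
    return bool(past) and now - past[-1] <= expiration


if __name__ == "__main__":
    vni_state = ["up"] * 120  # hypothetical VNI status that never changes over 2 minutes
    for mode in ("on_change", "sample"):
        ticks = update_ticks(vni_state, mode)
        print(mode, ticks, is_exported(ticks, now=119, expiration=60))
```

With a 60 s expiration, the on_change series is no longer exported at t=119, while the sampled one (last update at t=90) still is.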

This fixes metric naming to match Grafana dashboard expectations:
- Before: gnmic_interfaces_out_octets
- After: gnmic_interfaces_interface_state_counters_out_octets

The full path names provide better clarity and match standard OpenConfig
metric naming conventions used in dashboards.
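Both naming schemes in this PR can be reproduced with a small Python approximation of how a gNMI path becomes a Prometheus metric name. Path keys move into labels, the remaining elements are joined with `_`, and `-` becomes `_`. The following are assumptions, not gnmic's documented behaviour: the `<element>_<key>` label names, the subscription name `interfaces`, and the `leaf_only` switch, which stands in for the removed path-base transform.

```python
import re
from typing import Dict, List, Optional, Tuple

# One path element: a name, followed by zero or more [key=value] selectors.
# Key values may themselves contain '/', e.g. [name=Ethernet1/1].
ELEM_RE = re.compile(r"([^/\[\]]+)((?:\[[^\]]*\])*)")
KEY_RE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def to_metric(prefix: str, path: str, subscription: Optional[str] = None,
              leaf_only: bool = False) -> Tuple[str, Dict[str, str]]:
    """Approximate metric name and labels for a gNMI path (illustrative only)."""
    names: List[str] = []
    labels: Dict[str, str] = {}
    for m in ELEM_RE.finditer(path):
        elem, keys = m.group(1), m.group(2)
        names.append(elem)
        for k, v in KEY_RE.findall(keys):
            # Hypothetical label naming: <element>_<key>, e.g. interface_name.
            labels[f"{elem}_{k}".replace("-", "_")] = v
    if leaf_only:
        # Stand-in for a transform that keeps only the leaf element.
        names = names[-1:]
    parts = [prefix] + ([subscription] if subscription else []) + names
    return "_".join(parts).replace("-", "_"), labels


path = "/interfaces/interface[name=Ethernet1]/state/counters/out-octets"
print(to_metric("gnmic", path))
print(to_metric("gnmic", path, subscription="interfaces", leaf_only=True))
```

The first call yields the full-path name `gnmic_interfaces_interface_state_counters_out_octets`. The second yields the shorter `gnmic_interfaces_out_octets` that the dashboards briefly targeted.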
Damien closed this pull request 2025-12-26 14:49:51 +00:00
Damien deleted branch feature/grafana-monitoring 2025-12-26 14:49:51 +00:00

No Reviewers
No Label
1 Participant
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Damien/arista-evpn-vxlan-clab#17