From bcb3160c9be42e50126fac7bffc12b9487d547c4 Mon Sep 17 00:00:00 2001
From: Damien Arnodo
Date: Tue, 16 Dec 2025 18:54:15 +0000
Subject: [PATCH] Add quick start deployment guide for monitoring stack

---
 monitoring/QUICKSTART.md | 246 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 246 insertions(+)
 create mode 100644 monitoring/QUICKSTART.md

diff --git a/monitoring/QUICKSTART.md b/monitoring/QUICKSTART.md
new file mode 100644
index 0000000..7bbc186
--- /dev/null
+++ b/monitoring/QUICKSTART.md
@@ -0,0 +1,246 @@
+# Quick Start Guide - EVPN-VXLAN Monitoring Stack
+
+## Prerequisites
+
+1. **ContainerLab topology deployed** with management network named `evpn-mgmt`
+2. **Docker and Docker Compose** installed
+3. **gNMI enabled on all switches** (should already be configured)
+
+## Deployment Steps
+
+### 1. Deploy the Monitoring Stack
+
+```bash
+# Navigate to monitoring directory
+cd monitoring
+
+# Start all services
+docker-compose up -d
+
+# Verify all services are running
+docker-compose ps
+
+# Expected output:
+# NAME         STATUS          PORTS
+# gnmic        Up (healthy)    0.0.0.0:9804->9804/tcp
+# prometheus   Up (healthy)    0.0.0.0:9090->9090/tcp
+# grafana      Up (healthy)    0.0.0.0:3000->3000/tcp
+```
+
+### 2. Verify gnmic is Collecting Metrics
+
+```bash
+# Check gnmic logs
+docker logs gnmic
+
+# Should see successful subscription messages like:
+# "starting connection to target 'spine1'"
+# "target 'spine1' gNMI connection established"
+
+# Check metrics endpoint
+curl http://localhost:9804/metrics | grep gnmic_interfaces | head -5
+
+# Should see interface metrics:
+# gnmic_interfaces_interface_state_counters_in_octets{...} 12345
+# gnmic_interfaces_interface_state_counters_out_octets{...} 67890
+```
+
+### 3. Verify Prometheus is Scraping
+
+```bash
+# Check Prometheus targets
+curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job, health}'
+
+# Should show gnmic target as "up":
+# {
+# "job": "gnmic",
+# "health": "up"
+# }
+
+# Query a specific metric
+curl -G http://localhost:9090/api/v1/query \
+  --data-urlencode 'query=gnmic_interfaces_interface_state_counters_out_octets{source="spine1"}' \
+  | jq '.data.result[0]'
+```
+
+### 4. Access Grafana
+
+1. **Open browser**: http://localhost:3000
+2. **Login** (optional): admin/admin
+   - Or use anonymous access (Viewer role)
+3. **Navigate to dashboards**:
+   - Dashboards → Browse
+   - Select "EVPN-VXLAN Fabric Flow Topology"
+
+### 5. Generate Traffic (Optional)
+
+To see bandwidth visualization in action:
+
+```bash
+# From your lab directory (not monitoring/)
+cd ..
+
+# Generate traffic between clients
+# (Assumes you have traffic generation scripts)
+bash scripts/generate-traffic.sh
+```
+
+## Accessing the Stack
+
+### Service URLs
+
+| Service | URL | Credentials |
+|---------|-----|-------------|
+| Grafana | http://localhost:3000 | admin/admin or anonymous |
+| Prometheus | http://localhost:9090 | None |
+| gnmic metrics | http://localhost:9804/metrics | None |
+
+### Available Dashboards
+
+1. **EVPN-VXLAN Fabric Flow Topology** (`fabric-flow-topology.json`)
+   - Interactive flowchart of fabric topology
+   - Real-time bandwidth overlays on links
+   - Spine and leaf interface graphs
+
+2. **Fabric Overview** (`fabric-overview.json`)
+   - General fabric statistics
+   - Device health overview
+
+## Troubleshooting
+
+### Problem: gnmic not collecting data
+
+**Check switch gNMI configuration:**
+```bash
+# SSH to any switch
+ssh admin@172.16.0.1
+
+# Verify gNMI is enabled
+show management api gnmi
+
+# Should show:
+# Enabled: yes
+# Transport: GRPC
+```
+
+**If not enabled, add to switch configs:**
+```
+management api gnmi
+   transport grpc default
+```
+
+### Problem: Prometheus shows no data
+
+**Check:**
+```bash
+# 1. Verify gnmic is exposing metrics
+curl http://localhost:9804/metrics | grep gnmic
+
+# 2. Check Prometheus logs
+docker logs prometheus | tail -20
+
+# 3. Check Prometheus config is valid
+docker exec prometheus promtool check config /etc/prometheus/prometheus.yml
+```
+
+### Problem: Grafana dashboard shows "No Data"
+
+**Check:**
+1. **Prometheus datasource**: Configuration → Data Sources → Prometheus
+   - URL should be: http://prometheus:9090
+   - Click "Save & Test" - should show green "Data source is working"
+
+2. **Query in Explore**:
+   - Menu → Explore
+   - Select "Prometheus" datasource
+   - Run query: `gnmic_interfaces_interface_state_counters_out_octets`
+   - Should return results
+
+3. **Time range**: Ensure dashboard time range shows recent data (last 1h)
+
+### Problem: Flow diagram not rendering
+
+**Check:**
+1. **Plugin installed**:
+   ```bash
+   docker exec grafana grafana-cli plugins ls | grep agenty
+   ```
+   Should show: agenty-flowcharting-panel
+
+2. **If missing, reinstall**:
+   ```bash
+   docker-compose down
+   docker-compose up -d
+   ```
+
+## Stopping the Stack
+
+```bash
+# Stop all services
+docker-compose down
+
+# Stop and remove volumes (fresh start)
+docker-compose down -v
+```
+
+## Updating Configuration
+
+### Update gnmic subscriptions
+
+1. Edit `gnmic/gnmic.yaml`
+2. Restart gnmic:
+   ```bash
+   docker-compose restart gnmic
+   ```
+
+### Update Prometheus scrape config
+
+1. Edit `prometheus/prometheus.yml`
+2. Reload Prometheus (no restart needed):
+   ```bash
+   curl -X POST http://localhost:9090/-/reload
+   ```
+
+### Update Grafana dashboards
+
+1. Edit JSON files in `grafana/dashboards/`
+2. Restart Grafana:
+   ```bash
+   docker-compose restart grafana
+   ```
+   OR update via UI and export
+
+## Next Steps
+
+1. **Explore metrics**: Use Prometheus Explore to see all available metrics
+2. **Create custom dashboards**: Build specific views for your use cases
+3. **Add alerting**: Configure Prometheus alerting rules
+4. **Add more visualizations**: Enhanced BGP, VXLAN, and MLAG dashboards
+
+## Useful Commands
+
+```bash
+# View logs for all services
+docker-compose logs -f
+
+# View logs for specific service
+docker-compose logs -f gnmic
+
+# Restart specific service
+docker-compose restart prometheus
+
+# Check resource usage
+docker stats gnmic prometheus grafana
+
+# Execute command in container
+docker exec -it gnmic sh
+```
+
+## Support
+
+- **gnmic**: https://gnmic.openconfig.net
+- **Prometheus**: https://prometheus.io/docs
+- **Grafana**: https://grafana.com/docs
+- **Flow Plugin**: https://grafana.com/grafana/plugins/agenty-flowcharting-panel/
+
+For issues specific to this lab, check the main repository documentation.
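
Note on the patch: the manual checks in deployment steps 1 to 3 could be collected into one smoke test. The following is only a minimal sketch and is not part of this commit. The function names (`count_samples`, `check_endpoint`, `check_stack`) are hypothetical, the ports are the defaults listed in the guide, and the sketch assumes that Prometheus answers on its `/-/ready` endpoint and Grafana on `/api/health` in this lab's container images.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test for the monitoring stack (illustrative only).
# Assumes the default ports from the guide: gnmic 9804, Prometheus 9090, Grafana 3000.

# Count samples whose metric name starts with PREFIX in Prometheus
# exposition text read from stdin. "# HELP" / "# TYPE" lines do not start
# with the prefix, so they are not counted. Prints 0 when nothing matches.
count_samples() {
  local prefix="$1"
  grep -c "^${prefix}" || true
}

# Probe one HTTP endpoint; print OK or FAIL and return non-zero on failure.
check_endpoint() {
  local name="$1" url="$2"
  if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
    echo "OK   ${name} (${url})"
  else
    echo "FAIL ${name} (${url})"
    return 1
  fi
}

# Run every check; succeeds only if all endpoints respond and gnmic
# exposes at least one interface metric.
check_stack() {
  local failed=0 samples
  check_endpoint gnmic      "http://localhost:9804/metrics"    || failed=1
  check_endpoint prometheus "http://localhost:9090/-/ready"    || failed=1
  check_endpoint grafana    "http://localhost:3000/api/health" || failed=1

  samples=$(curl -fsS --max-time 5 "http://localhost:9804/metrics" 2>/dev/null \
    | count_samples gnmic_interfaces)
  echo "gnmic_interfaces samples: ${samples}"

  [ "$failed" -eq 0 ] && [ "${samples:-0}" -gt 0 ]
}
```

After `docker-compose up -d`, sourcing the file and running `check_stack` would print one line per service plus the interface sample count, and return non-zero if any check fails. Keeping `count_samples` separate from the network calls allows it to be exercised against saved `/metrics` output without a running stack.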