Quick Start Guide - EVPN-VXLAN Monitoring Stack

Prerequisites

  1. Containerlab topology deployed, with its management network named evpn-mgmt
  2. Docker and Docker Compose installed
  3. gNMI enabled on all switches (the lab's switch configs should already include this; see Troubleshooting if not)

Deployment Steps

1. Deploy the Monitoring Stack

# Navigate to monitoring directory
cd monitoring

# Start all services
docker-compose up -d

# Verify all services are running
docker-compose ps

# Expected output:
# NAME         STATUS          PORTS
# gnmic       Up (healthy)    0.0.0.0:9804->9804/tcp
# prometheus  Up (healthy)    0.0.0.0:9090->9090/tcp
# grafana     Up (healthy)    0.0.0.0:3000->3000/tcp
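
Checking the status column by eye gets tedious while containers are still starting. The sketch below is one hedged way to automate it: all_healthy is a hypothetical helper (not part of this repository) that reads `docker-compose ps` output on stdin and succeeds only when gnmic, prometheus, and grafana each appear with "(healthy)" in their line. It assumes the container name is the first column, as in the expected output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every expected container is listed
# as "(healthy)" in `docker-compose ps` output supplied on stdin.
# Assumption: the container name is the first whitespace-separated column.
all_healthy() {
  awk -v want="gnmic prometheus grafana" '
    BEGIN { n = split(want, names, " ") }
    {
      for (i = 1; i <= n; i++)
        if ($1 == names[i] && $0 ~ /\(healthy\)/)
          ok[names[i]] = 1
    }
    END {
      bad = 0
      for (i = 1; i <= n; i++)
        if (!(names[i] in ok)) {
          print "not healthy: " names[i]
          bad = 1
        }
      exit bad
    }
  '
}
```

A possible way to use it after sourcing the function: until docker-compose ps | all_healthy; do sleep 5; done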

2. Verify gnmic is Collecting Metrics

# Check gnmic logs
docker logs gnmic

# Should see successful subscription messages like:
# "starting connection to target 'spine1'"
# "target 'spine1' gNMI connection established"

# Check metrics endpoint
curl -s http://localhost:9804/metrics | grep gnmic_interfaces | head -5

# Should see interface metrics:
# gnmic_interfaces_interface_state_counters_in_octets{...} 12345
# gnmic_interfaces_interface_state_counters_out_octets{...} 67890
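
The first five lines only prove that some target is reporting. To see whether every switch contributes metrics, the samples can be grouped by target. This is a minimal sketch: series_per_source is a hypothetical helper name, and it assumes each sample carries a source="..." label identifying the switch, as the Prometheus query later in this guide suggests.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus-format metrics on stdin and print
# "<source> <sample count>" for every gnmic_interfaces_* sample.
# Assumption: each sample has a source="<target>" label.
series_per_source() {
  grep '^gnmic_interfaces_' |
    sed -n 's/.*source="\([^"]*\)".*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

Possible usage: curl -s http://localhost:9804/metrics | series_per_source. A switch that is missing from the output is a good candidate for the gNMI checks in Troubleshooting.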

3. Verify Prometheus is Scraping

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health}'

# Should show gnmic target as "up":
# {
#   "job": "gnmic",
#   "health": "up"
# }

# Query a specific metric
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=gnmic_interfaces_interface_state_counters_out_octets{source="spine1"}' \
  | jq '.data.result[0]'
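
The query above returns the raw octet counter, which only ever increases. Bandwidth is the counter's rate of change multiplied by 8 (octets to bits). The sketch below builds such an expression; bps_query and query_bps are hypothetical helper names, and the 1m default window is an assumption that should be at least a few times the scrape interval configured in prometheus/prometheus.yml.

```shell
#!/bin/sh
# Hypothetical helper: print a PromQL expression for egress bits per second.
#   $1 = target name as it appears in the "source" label (e.g. spine1)
#   $2 = rate window (assumed default: 1m)
bps_query() {
  printf 'rate(gnmic_interfaces_interface_state_counters_out_octets{source="%s"}[%s]) * 8\n' \
    "$1" "${2:-1m}"
}

# Hypothetical wrapper: send the expression to Prometheus and print each
# series' label set with its current value. Requires curl and jq.
query_bps() {
  curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=$(bps_query "$@")" |
    jq '.data.result[] | {metric, value: .value[1]}'
}
```

Possible usage: query_bps spine1, or query_bps spine1 5m for a smoother average.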

4. Access Grafana

  1. Open browser: http://localhost:3000
  2. Log in (optional): admin/admin
    • Or use anonymous access (Viewer role)
  3. Navigate to dashboards:
    • Dashboards → Browse
    • Select "EVPN-VXLAN Fabric Flow Topology"

5. Generate Traffic (Optional)

To see bandwidth visualization in action:

# From your lab directory (not monitoring/)
cd ..

# Generate traffic between clients
# (Assumes you have traffic generation scripts)
bash scripts/generate-traffic.sh

Accessing the Stack

Service URLs

Service        | URL                            | Credentials
---------------|--------------------------------|-------------------------
Grafana        | http://localhost:3000          | admin/admin or anonymous
Prometheus     | http://localhost:9090          | None
gnmic metrics  | http://localhost:9804/metrics  | None

Available Dashboards

  1. EVPN-VXLAN Fabric Flow Topology (fabric-flow-topology.json)

    • Interactive flowchart of fabric topology
    • Real-time bandwidth overlays on links
    • Spine and leaf interface graphs
  2. Fabric Overview (fabric-overview.json)

    • General fabric statistics
    • Device health overview

Troubleshooting

Problem: gnmic not collecting data

Check switch gNMI configuration:

# SSH to any switch
ssh admin@172.16.0.1

# Verify gNMI is enabled
show management api gnmi

# Should show:
# Enabled: yes
# Transport: GRPC

If gNMI is not enabled, add the following to each switch's configuration:

management api gnmi
   transport grpc default

Problem: Prometheus shows no data

Check:

# 1. Verify gnmic is exposing metrics
curl -s http://localhost:9804/metrics | grep gnmic

# 2. Check Prometheus logs
docker logs prometheus | tail -20

# 3. Check Prometheus config is valid
docker exec prometheus promtool check config /etc/prometheus/prometheus.yml

Problem: Grafana dashboard shows "No Data"

Check:

  1. Prometheus datasource: Configuration → Data Sources → Prometheus

  2. Query in Explore:

    • Menu → Explore
    • Select "Prometheus" datasource
    • Run query: gnmic_interfaces_interface_state_counters_out_octets
    • Should return results
  3. Time range: Ensure the dashboard time range covers recent data (for example, the last 1 hour)
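
To tell whether the gap is in Grafana or further upstream, the same query can be sent to Prometheus directly, bypassing Grafana. The following is a minimal sketch rather than a definitive check: prom_has_data is a hypothetical helper that relies on Prometheus answering an instant query with "status":"success" and an empty "result":[] array when no series match.

```shell
#!/bin/sh
# Hypothetical helper: run an instant query against Prometheus and report
# whether any series came back.
#   $1 = PromQL expression
# Exit status: 0 = series returned, 1 = query ok but empty, 2 = query failed.
prom_has_data() {
  resp=$(curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=$1") || return 2
  case $resp in
    *'"status":"success"'*'"result":[]'*)
      echo "query ok, but no series returned"
      return 1 ;;
    *'"status":"success"'*)
      echo "query ok, series returned"
      return 0 ;;
    *)
      echo "query failed: $resp"
      return 2 ;;
  esac
}
```

Possible usage: prom_has_data gnmic_interfaces_interface_state_counters_out_octets. If Prometheus returns series but the dashboard stays empty, the datasource or time range in Grafana is the more likely culprit; if Prometheus returns nothing, continue with the gnmic and Prometheus checks above.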

Problem: Flow diagram not rendering

Check:

  1. Plugin installed:

    docker exec grafana grafana-cli plugins ls | grep agenty
    

    Should show: agenty-flowcharting-panel

  2. If it is missing, recreate the containers, which should reinstall the plugin at startup:

    docker-compose down
    docker-compose up -d
    

Stopping the Stack

# Stop all services
docker-compose down

# Stop and remove volumes (fresh start)
docker-compose down -v

Updating Configuration

Update gnmic subscriptions

  1. Edit gnmic/gnmic.yaml
  2. Restart gnmic:
    docker-compose restart gnmic
    

Update Prometheus scrape config

  1. Edit prometheus/prometheus.yml
  2. Reload Prometheus (no restart needed; the reload endpoint only works if Prometheus was started with --web.enable-lifecycle):
    curl -X POST http://localhost:9090/-/reload
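
A reload with a broken file leaves Prometheus running on its previous configuration, which can be confusing. One hedged way to avoid that is to validate before reloading. In this sketch, reload_prometheus is a hypothetical helper that chains the promtool check from Troubleshooting with the reload call; it assumes the container is named prometheus and the config is mounted at /etc/prometheus/prometheus.yml, as elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helper: validate the Prometheus config inside the container,
# then request a reload only if validation passes.
reload_prometheus() {
  if ! docker exec prometheus promtool check config /etc/prometheus/prometheus.yml; then
    echo "config invalid; not reloading" >&2
    return 1
  fi
  # -f makes curl exit non-zero on an HTTP error status (for example when
  # the lifecycle endpoint is disabled); -sS hides progress but keeps errors.
  curl -fsS -X POST http://localhost:9090/-/reload && echo "reload requested"
}
```

Possible usage: run reload_prometheus after editing prometheus/prometheus.yml.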
    

Update Grafana dashboards

  1. Edit JSON files in grafana/dashboards/
  2. Restart Grafana:
    docker-compose restart grafana
    
    Alternatively, edit the dashboard in the Grafana UI and export the updated JSON.

Next Steps

  1. Explore metrics: Use Grafana Explore (or the Prometheus expression browser at http://localhost:9090) to see all available metrics
  2. Create custom dashboards: Build specific views for your use cases
  3. Add alerting: Configure Prometheus alerting rules
  4. Add more visualizations: Enhanced BGP, VXLAN, and MLAG dashboards

Useful Commands

# View logs for all services
docker-compose logs -f

# View logs for specific service
docker-compose logs -f gnmic

# Restart specific service
docker-compose restart prometheus

# Check resource usage
docker stats gnmic prometheus grafana

# Execute command in container
docker exec -it gnmic sh

Support

For issues specific to this lab, check the main repository documentation.