# Quick Start Guide - EVPN-VXLAN Monitoring Stack

## Prerequisites

1. **ContainerLab topology deployed** with management network named `evpn-mgmt`
2. **Docker and Docker Compose** installed
3. **gNMI enabled on all switches** (should already be configured)

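
The prerequisites above can be checked before deploying. The following is a minimal sketch, not part of the lab repository: the helper names `require_cmd` and `preflight` are hypothetical, and it assumes the `evpn-mgmt` management network is visible to Docker as a network with exactly that name.

```bash
# Hypothetical preflight helpers; the names are illustrative only.

# Return 0 if the given command is on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report each prerequisite; return non-zero if any is missing.
preflight() {
  local cmd rc=0
  for cmd in docker docker-compose; do
    if require_cmd "$cmd"; then
      echo "ok: $cmd found"
    else
      echo "missing: $cmd" >&2
      rc=1
    fi
  done
  # Assumption: the ContainerLab management network appears as a Docker network.
  if require_cmd docker && docker network inspect evpn-mgmt >/dev/null 2>&1; then
    echo "ok: network evpn-mgmt exists"
  else
    echo "missing: docker network evpn-mgmt" >&2
    rc=1
  fi
  return "$rc"
}
```

Run `preflight` after sourcing the snippet; a non-zero exit status means at least one prerequisite was not found. It does not check gNMI on the switches.
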
## Deployment Steps

### 1. Deploy the Monitoring Stack

```bash
# Navigate to monitoring directory
cd monitoring

# Start all services
docker-compose up -d

# Verify all services are running
docker-compose ps

# Expected output:
# NAME         STATUS         PORTS
# gnmic        Up (healthy)   0.0.0.0:9804->9804/tcp
# prometheus   Up (healthy)   0.0.0.0:9090->9090/tcp
# grafana      Up (healthy)   0.0.0.0:3000->3000/tcp
```

### 2. Verify gnmic is Collecting Metrics

```bash
# Check gnmic logs
docker logs gnmic

# You should see successful subscription messages like:
# "starting connection to target 'spine1'"
# "target 'spine1' gNMI connection established"

# Check the metrics endpoint (-s hides curl's progress meter)
curl -s http://localhost:9804/metrics | grep gnmic_interfaces | head -5

# You should see interface metrics:
# gnmic_interfaces_interface_state_counters_in_octets{...} 12345
# gnmic_interfaces_interface_state_counters_out_octets{...} 67890
```

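
To confirm that every switch is reporting, not just one, the metrics output can be grouped by target. This is a minimal sketch under one assumption: each series carries a `source` label naming the target, as in the `source="spine1"` selector used elsewhere in this guide. The function name `series_per_source` is hypothetical.

```bash
# Count exported series per target, keyed on the `source` label.
# Reads Prometheus text-format metrics on stdin and prints "<source> <count>".
series_per_source() {
  grep -v '^#' \
    | grep -o 'source="[^"]*"' \
    | sort \
    | uniq -c \
    | awk '{ gsub(/source=|"/, "", $2); print $2, $1 }'
}
```

Usage: `curl -s http://localhost:9804/metrics | series_per_source`. A switch that is missing from the output has no series exported yet, which usually points at its gNMI subscription.
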
### 3. Verify Prometheus is Scraping

```bash
# Check Prometheus targets (the job name is stored under .labels.job)
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health}'

# Should show gnmic target as "up":
# {
#   "job": "gnmic",
#   "health": "up"
# }

# Query a specific metric
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=gnmic_interfaces_interface_state_counters_out_octets{source="spine1"}' \
  | jq '.data.result[0]'
```

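
The octet counters queried above only ever increase, so bandwidth has to be derived from them with `rate()` and a multiplication by 8. The sketch below shows one way to do that from the shell. The helper names `bps_query` and `query_bps` are hypothetical, and the `1m` window is an assumption that should span several scrape intervals in your Prometheus configuration.

```bash
# Hypothetical helpers; the names are illustrative only.

# Print a PromQL expression for outbound bits per second on one target.
# Assumption: a 1m rate window covers several scrape intervals.
bps_query() {
  printf 'rate(gnmic_interfaces_interface_state_counters_out_octets{source="%s"}[1m]) * 8' "$1"
}

# Run the expression against Prometheus and print one JSON object per series.
query_bps() {
  curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=$(bps_query "$1")" \
    | jq -c '.data.result[] | {labels: .metric, bps: .value[1]}'
}
```

Usage: `query_bps spine1`. Values stay at 0 until traffic is flowing and Prometheus has collected at least two samples inside the window.
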
### 4. Access Grafana

1. **Open browser**: http://localhost:3000
2. **Log in** (optional): admin/admin
   - Or use anonymous access (Viewer role)
3. **Navigate to dashboards**:
   - Dashboards → Browse
   - Select "EVPN-VXLAN Fabric Flow Topology"

### 5. Generate Traffic (Optional)

To see bandwidth visualization in action:

```bash
# From your lab directory (not monitoring/)
cd ..

# Generate traffic between clients
# (Assumes you have traffic generation scripts)
bash scripts/generate-traffic.sh
```

## Accessing the Stack

### Service URLs

| Service | URL | Credentials |
|---------|-----|-------------|
| Grafana | http://localhost:3000 | admin/admin or anonymous |
| Prometheus | http://localhost:9090 | None |
| gnmic metrics | http://localhost:9804/metrics | None |

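
The three services in the table can be probed in one pass. This is a minimal sketch with hypothetical helper names (`health_url`, `check_stack`). It uses Prometheus's `/-/healthy` and Grafana's `/api/health` endpoints, and for gnmic it simply fetches the metrics endpoint listed above, since this guide does not describe a dedicated gnmic health endpoint.

```bash
# Hypothetical helpers; the names are illustrative only.

# Print the URL used to probe a service; return 1 for unknown names.
health_url() {
  case "$1" in
    grafana)    echo "http://localhost:3000/api/health" ;;
    prometheus) echo "http://localhost:9090/-/healthy" ;;
    gnmic)      echo "http://localhost:9804/metrics" ;;
    *)          return 1 ;;
  esac
}

# Probe every service; return non-zero if any probe fails.
check_stack() {
  local svc rc=0
  for svc in gnmic prometheus grafana; do
    if curl -fsS -o /dev/null --max-time 5 "$(health_url "$svc")"; then
      echo "ok: $svc"
    else
      echo "unreachable: $svc" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run `check_stack` after `docker-compose up -d`. The `-f` flag makes curl treat HTTP error responses as failures, so a service that is listening but unhealthy is reported as unreachable.
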
### Available Dashboards

1. **EVPN-VXLAN Fabric Flow Topology** (`fabric-flow-topology.json`)
   - Interactive flowchart of fabric topology
   - Real-time bandwidth overlays on links
   - Spine and leaf interface graphs

2. **Fabric Overview** (`fabric-overview.json`)
   - General fabric statistics
   - Device health overview

## Troubleshooting

### Problem: gnmic not collecting data

**Check switch gNMI configuration:**
```bash
# SSH to any switch
ssh admin@172.16.0.1

# Verify gNMI is enabled (run this on the switch CLI, not the host shell)
show management api gnmi

# Should show:
# Enabled: yes
# Transport: GRPC
```

**If not enabled, add to switch configs:**
```
management api gnmi
   transport grpc default
```

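
If gNMI is enabled but gnmic still cannot connect, it can help to rule out basic TCP reachability from the host. This is a minimal sketch with a hypothetical helper name (`port_open`). The port in the usage note is an assumption: use whatever port your gnmic targets are configured with.

```bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp redirection, so it needs bash rather than plain sh,
# and the coreutils `timeout` command.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Usage, assuming the switch at `172.16.0.1` serves gNMI on port 6030: `port_open 172.16.0.1 6030 && echo reachable || echo unreachable`. This checks reachability from the host only; the gnmic container reaches the switches over its own Docker network attachment.
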
### Problem: Prometheus shows no data

**Check:**
```bash
# 1. Verify gnmic is exposing metrics
curl -s http://localhost:9804/metrics | grep gnmic

# 2. Check the last 20 lines of the Prometheus logs
#    (--tail is used because a pipe would miss log lines written to stderr)
docker logs --tail 20 prometheus

# 3. Check Prometheus config is valid
docker exec prometheus promtool check config /etc/prometheus/prometheus.yml
```

### Problem: Grafana dashboard shows "No Data"

**Check:**
1. **Prometheus datasource**: Configuration → Data Sources → Prometheus
   - URL should be: http://prometheus:9090
   - Click "Save & Test"; it should show a green "Data source is working" message

2. **Query in Explore**:
   - Menu → Explore
   - Select "Prometheus" datasource
   - Run query: `gnmic_interfaces_interface_state_counters_out_octets`
   - Should return results

3. **Time range**: Ensure dashboard time range shows recent data (last 1h)

### Problem: Flow diagram not rendering

**Check:**
1. **Plugin installed**:
   ```bash
   docker exec grafana grafana-cli plugins ls | grep agenty
   ```
   Should show: agenty-flowcharting-panel

2. **If missing, reinstall**:
   ```bash
   docker-compose down
   docker-compose up -d
   ```

## Stopping the Stack

```bash
# Stop all services
docker-compose down

# Stop and remove volumes (fresh start)
docker-compose down -v
```

## Updating Configuration

### Update gnmic subscriptions

1. Edit `gnmic/gnmic.yaml`
2. Restart gnmic:
   ```bash
   docker-compose restart gnmic
   ```

### Update Prometheus scrape config

1. Edit `prometheus/prometheus.yml`
2. Reload Prometheus (no restart needed; this endpoint only works if Prometheus was started with `--web.enable-lifecycle`):
   ```bash
   curl -X POST http://localhost:9090/-/reload
   ```

### Update Grafana dashboards

1. Edit JSON files in `grafana/dashboards/`
2. Restart Grafana:
   ```bash
   docker-compose restart grafana
   ```
   Alternatively, update the dashboard in the UI and export its JSON.

## Next Steps

1. **Explore metrics**: Use the Prometheus expression browser (http://localhost:9090) or Grafana Explore to see all available metrics
2. **Create custom dashboards**: Build specific views for your use cases
3. **Add alerting**: Configure Prometheus alerting rules
4. **Add more visualizations**: Enhanced BGP, VXLAN, and MLAG dashboards

## Useful Commands

```bash
# View logs for all services
docker-compose logs -f

# View logs for specific service
docker-compose logs -f gnmic

# Restart specific service
docker-compose restart prometheus

# Check resource usage
docker stats gnmic prometheus grafana

# Execute command in container
docker exec -it gnmic sh
```

## Support

- **gnmic**: https://gnmic.openconfig.net
- **Prometheus**: https://prometheus.io/docs
- **Grafana**: https://grafana.com/docs
- **Flow Plugin**: https://grafana.com/grafana/plugins/agenty-flowcharting-panel/

For issues specific to this lab, check the main repository documentation.