Commit Graph

69 Commits

SHA1 Message Date
bcb3160c9b Add quick start deployment guide for monitoring stack 2025-12-16 18:54:15 +00:00
4b657a4e1e Add comprehensive configuration review documentation 2025-12-16 18:53:40 +00:00
903522dd82 Create Flow Plugin-based topology dashboard to replace weathermap 2025-12-16 18:52:42 +00:00
011541b7f2 Update docker-compose to use Flow Plugin instead of archived weathermap 2025-12-16 18:52:07 +00:00
b77f461967 Enhance Prometheus config with better metric filtering for Flow Plugin 2025-12-16 18:51:46 +00:00
b34b0eed7d Enhance gnmic config for Flow Plugin support with BGP/EVPN telemetry 2025-12-16 18:51:30 +00:00
b23353bf15 fix(grafana): correct metric names in fabric-overview dashboard
Changed from:
- gnmic_interfaces_interface_state_counters_* with target label

To:
- gnmic_interfaces_* with source label

Also added:
- Interfaces Monitored stat panel
- MLAG Peer-Link Traffic panel

These match the actual metrics generated by gNMIc.
2025-12-16 14:26:35 +00:00
ca55e2ff59 fix(grafana): correct metric names in weathermap queries
Changed from:
- gnmic_interfaces_interface_state_counters_out_octets
- gnmic_interfaces_interface_state_counters_in_octets
- target label

To:
- gnmic_interfaces_out_octets
- gnmic_interfaces_in_octets
- source label

These match the actual metrics generated by gNMIc with the simplified
/interfaces/interface/state path and trim-prefixes processor.
2025-12-16 14:26:01 +00:00
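
The rename described in commits b23353bf15 and ca55e2ff59 can be illustrated with a small helper. This is only a sketch: `migrate_query` is a hypothetical function that is not part of the repository, the `leaf1` label value in the usage note is made up, and the regex only covers label matchers (not `by (...)` grouping clauses).

```python
import re

# Prefixes taken from the commit messages: the long counter prefix is
# shortened, and the `target` label becomes `source`.
OLD_PREFIX = "gnmic_interfaces_interface_state_counters_"
NEW_PREFIX = "gnmic_interfaces_"

# Matches the label key `target` when it is followed by a PromQL matcher
# operator (=, !=, =~, !~). Grouping clauses are deliberately not handled.
TARGET_MATCHER = re.compile(r"\btarget(\s*(?:=~|!=|!~|=))")


def migrate_query(expr: str) -> str:
    """Rewrite one PromQL expression from the old naming to the new one."""
    expr = expr.replace(OLD_PREFIX, NEW_PREFIX)
    return TARGET_MATCHER.sub(r"source\1", expr)
```

For example, `migrate_query('rate(gnmic_interfaces_interface_state_counters_out_octets{target="leaf1"}[1m])')` yields `rate(gnmic_interfaces_out_octets{source="leaf1"}[1m])`.
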
6c08b9ecf7 fix(gnmic): remove skip-verify (mutually exclusive with insecure)
The flags --insecure and --skip-verify are mutually exclusive in gNMIc.
Since we're using insecure connections (no TLS), skip-verify is not needed.
2025-12-16 14:21:43 +00:00
5fdf374fa4 fix(gnmic): rewrite config with correct parameters and simplified paths
- Remove invalid 'add-target: target' (must be overwrite|if-not-present|empty)
- Enable debug mode for troubleshooting
- Simplify interface paths to /interfaces/interface/state (Arista compatible)
- Simplify system paths to /system/state
- Remove complex BGP path that may not work on cEOS
- Add retry and timeout parameters for reliability
- Add expiration to prevent stale metrics
- Add skip-verify for insecure connections
- Increase sample intervals for stability
2025-12-16 14:19:39 +00:00
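
The two option constraints named in commits 6c08b9ecf7 and 5fdf374fa4 could be checked before deployment with something like the following. It is a minimal sketch under stated assumptions: `check_gnmic_options` is a hypothetical helper, and the flat dictionary is a simplification that does not mirror the real layout of a gNMIc configuration file. The rules themselves are taken from the commit messages.

```python
# Allowed values for `add-target`, as listed in commit 5fdf374fa4.
ALLOWED_ADD_TARGET = {"overwrite", "if-not-present", ""}


def check_gnmic_options(cfg: dict) -> list[str]:
    """Return human-readable problems found in a flat option dict."""
    problems = []
    # Commit 6c08b9ecf7: insecure and skip-verify cannot be combined.
    if cfg.get("insecure") and cfg.get("skip-verify"):
        problems.append("insecure and skip-verify are mutually exclusive")
    # Commit 5fdf374fa4: add-target only accepts a fixed set of values.
    add_target = cfg.get("add-target", "")
    if add_target not in ALLOWED_ADD_TARGET:
        problems.append(
            f"add-target must be overwrite, if-not-present or empty, got {add_target!r}"
        )
    return problems
```

An empty list means neither rule was violated. The option set that the two commits moved away from (`insecure`, `skip-verify` and `add-target: target` together) would produce two entries.
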
d01598f9ce Fix gnmic config: remove mlag and vxlan subscriptions (not available via OpenConfig on cEOS) 2025-12-16 13:44:41 +00:00
92e8556e1f Add Network Weathermap dashboard template 2025-12-16 13:05:55 +00:00
1c08b156d6 Add monitoring stack deployment script 2025-12-16 13:05:14 +00:00
c12bd2a701 Add Docker Compose for monitoring stack 2025-12-16 13:04:51 +00:00
c975945d27 Add Grafana fabric overview dashboard 2025-12-16 13:04:33 +00:00
6f873c8584 Add Grafana dashboard provisioning configuration 2025-12-16 12:14:41 +00:00
35123308c2 Add Grafana datasource provisioning for Prometheus 2025-12-16 12:14:35 +00:00
da5a8997d3 Add Prometheus configuration 2025-12-16 12:14:25 +00:00
442211ed5b Add gnmic configuration for gNMI telemetry collection 2025-12-16 12:14:16 +00:00
2762a5040b Add monitoring stack README 2025-12-16 12:13:54 +00:00
d9327ed95f feat(configs): enable gNMI API on all network devices
Enables the gNMI (gRPC Network Management Interface) API across all leaf
and spine switches to allow for telemetry streaming and programmatic
device management.

Configuration details:
- Transport: grpc default
- Provider: eos-native
2025-12-16 12:00:20 +00:00
481f3d996d Chore(annotations): Update AS information 2025-12-16 10:35:34 +00:00
bbcf2c9cb9 chore(configs): update admin password for spine switches 2025-12-14 19:26:39 +00:00
5c55732941 clab annotation 2025-12-14 19:04:37 +00:00
8ea0fdcf64 adapt line width 2025-12-14 19:02:33 +00:00
308781e6eb Update readme 2025-12-04 10:12:06 +00:00
51f21f16c9 Add topology visualization and annotations
Add SVG topology diagram and containerlab annotations file with AS number labels and node positioning for the EVPN VXLAN lab environment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 10:11:03 +00:00
1a084d0c3f Add detailed information to README
Expanded the README to provide more details about the configuration,
including:

- AS Numbers
- IP Addressing (Management, Loopback, Underlay P2P, Host)
- VXLAN Network Identifiers (VNI)
- Features Implemented
- Test Connectivity
2025-11-30 19:16:56 +00:00
db54e56b41 chore: Repository cleanup - Remove unnecessary files (#16)
## Summary

Repository cleanup to remove unnecessary files and streamline documentation after the successful EVPN-VXLAN lab implementation.

Closes #15

---

## Changes

### Files Removed (13 files total)

**Scripts folder:**
- `scripts/deploy.sh`
- `scripts/test-connectivity.sh`
- `scripts/cleanup.sh`

**Root-level markdown files:**
- `BRANCH_SUMMARY.md`
- `BUGFIX_EVPN_ACTIVATION.md`
- `DEPLOYMENT_GUIDE.md`
- `FIXES_APPLIED.md`
- `TESTING_CHECKLIST.md`
- `VLAN_TAGGING_FIX_EXPLANATION.md`

**docs/ folder (entire folder removed):**
- `docs/HOST_INTERFACE_CONFIGURATION.md`
- `docs/configuration-guide.md`
- `docs/quick-reference.md`
- `docs/validation-commands.md`

### Files Updated
- `hosts/README.md` - Fixed broken links
- `README.md` - Updated repository structure section

---

## Final Repository Structure

```
├── .gitignore
├── README.md                    # Main documentation
├── TROUBLESHOOTING.md           # Troubleshooting guide
├── END_TO_END_TESTING.md        # Testing procedures
├── evpn-lab.clab.yml            # ContainerLab topology
├── configs/                     # Switch configurations (10 files)
└── hosts/                       # Host interface configs (5 files)
```

---

## Testing

- [x] Lab redeployed successfully with `containerlab deploy -t evpn-lab.clab.yml`
- [x] L2 VXLAN connectivity verified (host1 ↔ host3)
- [x] L3 VXLAN connectivity verified (host2 ↔ host4)
- [x] All BGP EVPN sessions established
- [x] MLAG pairs operational

Reviewed-on: #16
2025-11-30 19:07:22 +00:00
1080bf07bb Complete Lab Fixes - L2 and L3 VXLAN Fully Operational (#14)
## Summary

This PR merges all fixes and improvements from the troubleshooting journey to make the Arista EVPN-VXLAN lab fully operational with both L2 and L3 VXLAN connectivity.

## What's Changed

### 🎯 Major Achievements
- **L2 VXLAN fully operational** - host1 ↔ host3 connectivity verified
- **L3 VXLAN fully operational** - host2 ↔ host4 connectivity verified (VRF gold)
- **LACP bonding working** - dual-homed hosts with proper Port-Channel negotiation
- **All BGP/EVPN sessions established** - complete underlay and overlay working

### 🔧 Infrastructure Fixes

#### BGP & Routing
- Added `ip routing` command to all spine and leaf switches
- Fixed duplicate BGP network statements on leaf3, leaf4, leaf7, leaf8
- Activated EVPN neighbors on spine switches
- Added loopback network advertisements to BGP

#### MLAG Configuration
- Configured MLAG peer-link in trunk mode (not access) for VLAN 4090/4091
- Added dual-active detection via management interface
- Configured virtual router MAC for MLAG pairs

#### Switch Port Configuration
- Port-Channel1 configured in **trunk mode** on all leaf switches
- Added `switchport trunk allowed vlan` for host VLANs (34, 40, 78)
- Removed `no shutdown` from Port-Channel interfaces

### 🖥️ Host Networking - Complete Redesign

#### Image Change
- **Old:** `alpine:latest` (had bonding syntax issues)
- **New:** `ghcr.io/hellt/network-multitool` (networking tools pre-installed)

#### LACP Bonding Configuration
Proper LACP setup following network-multitool best practices:
```yaml
- ip link add bond0 type bond mode 802.3ad
- ip link set dev bond0 type bond xmit_hash_policy layer3+4
- ip link set dev eth1 down
- ip link set dev eth2 down
- ip link set eth1 master bond0
- ip link set eth2 master bond0
- ip link set dev eth1 up
- ip link set dev eth2 up
- ip link set dev bond0 type bond lacp_rate fast
- ip link set dev bond0 up
```

#### VLAN Configuration
- **L2 VXLAN hosts (host1, host3):** VLAN 40 tagged on bond0
- **L3 VXLAN hosts (host2, host4):** VLANs 34 and 78 tagged on bond0

#### Routing Strategy
- Kept management default route (172.16.0.254 via eth0)
- Added **specific routes** for L3 VXLAN networks instead of default routes:
  - host2: `ip route add 10.78.78.0/24 via 10.34.34.1`
  - host4: `ip route add 10.34.34.0/24 via 10.78.78.1`
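
The reason specific routes work alongside the management default route is longest-prefix matching: the more specific `/24` wins over `0.0.0.0/0`. The sketch below models this for host2 with Python's `ipaddress` module. The table is a simplified assumption built from the addresses in this PR (the connected `10.34.34.0/24` entry and its mask are assumed), and `lookup` is a hypothetical helper, not the kernel's actual implementation.

```python
import ipaddress

# Simplified routing table for host2: (prefix, next hop).
HOST2_ROUTES = [
    ("0.0.0.0/0", "172.16.0.254"),      # management default route via eth0
    ("10.34.34.0/24", "connected"),     # host2's own subnet (assumed /24)
    ("10.78.78.0/24", "10.34.34.1"),    # specific route towards host4's subnet
]


def lookup(dest: str, routes=HOST2_ROUTES) -> str:
    """Return the next hop of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # The route with the longest prefix length is selected.
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return next_hop
```

With this table, `lookup("10.78.78.104")` selects `10.34.34.1` (the fabric gateway), while any address outside the listed subnets still falls back to the management gateway `172.16.0.254`.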

### 📁 Files Changed

#### Switch Configurations (Updated)
- `configs/spine1.cfg` - Added ip routing, EVPN activation
- `configs/spine2.cfg` - Added ip routing, EVPN activation
- `configs/leaf1.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf2.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf3.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf4.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf5.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf6.cfg` - Port-Channel trunk mode, VLAN config
- `configs/leaf7.cfg` - Added ip routing, loopback ads, Port-Channel config
- `configs/leaf8.cfg` - Added ip routing, loopback ads, Port-Channel config

#### Topology (Updated)
- `evpn-lab.clab.yml` - Updated all host configurations with network-multitool image and proper LACP/VLAN setup

#### Documentation (New)
- `hosts/README.md` - Host interface configuration guide
- `hosts/host1_interfaces` - Interface file for host1 (not currently used, kept for reference)
- `hosts/host2_interfaces` - Interface file for host2 (not currently used, kept for reference)
- `hosts/host3_interfaces` - Interface file for host3 (not currently used, kept for reference)
- `hosts/host4_interfaces` - Interface file for host4 (not currently used, kept for reference)

## Testing & Verification

### L2 VXLAN (VLAN 40)
```
host1 (10.40.40.101) → host3 (10.40.40.103)
- Connectivity: VERIFIED ✓
- VXLAN tunnel: VTEP1 ↔ VTEP3
- MAC learning: Working via EVPN Type-2
```

### L3 VXLAN (VRF gold)
```
host2 (10.34.34.102) → host4 (10.78.78.104)
- Connectivity: VERIFIED ✓
- Ping results: 0% packet loss, TTL=62
- Routing: Via EVPN Type-5 through fabric
```

### Infrastructure Status
- BGP Underlay: All sessions ESTAB
- EVPN Overlay: All neighbors ESTAB
- MLAG: All 4 pairs operational
- Port-Channels: LACP negotiated on all hosts

## Related Issues

Fixes #1 - Lab deployment and configuration fixes
Fixes #2 - BGP EVPN neighbors stuck in Connect state
Fixes #3 - Ready for deployment with EVPN activation
Fixes #4 - Lab convergence in progress
Fixes #5 - BGP EVPN neighbors stuck in Active state
Fixes #11 - Host LACP bonding configuration
Fixes #13 - L3 VXLAN default route issue

## Key Technical Learnings

1. **Arista EOS requires explicit `ip routing`** before BGP can function
2. **MLAG peer-link must be in trunk mode** to allow VLAN 4090/4091 traversal
3. **VLAN tagging location matters** - hosts tag, switches use trunk mode
4. **network-multitool image** works better than Alpine for LACP bonding (Alpine had bonding syntax issues)
5. **Specific routes are better than default routes** when a management network is present
6. **LACP rate fast** ensures quick negotiation with Arista switches

## Deployment

After merging, deploy with:
```bash
cd ~/arista-evpn-vxlan-clab
sudo containerlab destroy -t evpn-lab.clab.yml --cleanup
sudo containerlab deploy -t evpn-lab.clab.yml
```

No manual post-deployment configuration needed - everything works from initial deployment!

## Breaking Changes

⚠️ **Host image changed** from `alpine:latest` to `ghcr.io/hellt/network-multitool`
⚠️ **Host configuration completely redesigned** - old exec commands replaced

## Reviewers

@Damien - Please review and merge when ready

---

**This PR represents the complete troubleshooting journey and brings the lab to production-ready status with full L2 and L3 VXLAN functionality.** 🚀

Reviewed-on: #14
Co-authored-by: Damien <damien@arnodo.fr>
Co-committed-by: Damien <damien@arnodo.fr>
2025-11-30 10:24:29 +00:00
9502302b76 Revert topology to original version before accidental commits 2025-11-30 08:55:08 +00:00
d6acdfbe75 Revert accidental commit to main - remove HOST_CONFIGURATION.md 2025-11-30 08:53:58 +00:00
e6210267a6 Revert accidental commit to main - remove host4-interfaces 2025-11-30 08:53:44 +00:00
67816b84a1 Revert accidental commit to main - remove host3-interfaces 2025-11-30 08:53:32 +00:00
aef0890f34 Revert accidental commit to main - remove host2-interfaces 2025-11-30 08:53:21 +00:00
0baec91037 Revert accidental commit to main - remove host interface files 2025-11-30 08:53:11 +00:00
7c08125826 Add comprehensive host interface configuration documentation 2025-11-30 08:21:24 +00:00
e04a531a86 Update topology to use persistent interface files via binds instead of exec commands 2025-11-30 08:20:58 +00:00
926ab47337 Add host4 interface configuration for LACP bonding with VLAN 78 2025-11-30 08:20:31 +00:00
ce76b0c342 Add host3 interface configuration for LACP bonding with VLAN 40 2025-11-30 08:20:26 +00:00
31e1e345cd Add host2 interface configuration for LACP bonding with VLAN 34 2025-11-30 08:20:21 +00:00
050a529c68 Add host1 interface configuration for LACP bonding with VLAN 40 2025-11-30 08:20:16 +00:00
4fc902ee13 Add DEPLOYMENT_GUIDE.md with step-by-step instructions
Provide clear deployment instructions, verification checklist, and
troubleshooting guidance for the EVPN-VXLAN lab with applied fixes.
2025-11-28 09:28:13 +00:00
783e12cea6 Add FIXES_APPLIED.md to main branch
Document all critical fixes discovered during lab testing:
- Spine routing: ip routing command added
- MLAG: static LAG mode enabled
- Pending: port-channel access mode, host networking

Track status of each fix for deployment readiness.
2025-11-28 09:27:56 +00:00
62efe9fc93 Apply critical fix: Add ip routing command to spine2
This enables BGP and IP forwarding on spine switches. Without this command,
BGP sessions cannot be established and routing is disabled. This is essential
for the underlay fabric to function properly.
2025-11-28 09:27:19 +00:00
41a7e5e9e3 Apply critical fix: Add ip routing command to spine1
This enables BGP and IP forwarding on spine switches. Without this command,
BGP sessions cannot be established and routing is disabled. This is essential
for the underlay fabric to function properly.
2025-11-28 09:27:03 +00:00
8c3eb7f2d2 Fix MLAG and host interface configuration
Changed the channel-group mode to "on" on the host-facing interfaces.

Changed the bonding mode to balance-rr in the clab file.
2025-11-28 09:15:36 +00:00
02d41fde2e Add host network addressing and test connectivity steps
Add host network addressing information. Add L2 and L3 VXLAN testing
steps. Add commands to verify EVPN routes on the switches.
2025-11-24 20:10:05 +00:00
8a291426f9 Add admin user configuration to all devices 2025-11-21 12:09:23 +00:00
3946d6b2e5 Add EOS image to .gitignore and update topology
Update topology to use arista_ceos kind and new image.
2025-11-17 19:27:57 +00:00