Fabric Orchestrator
Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics
A workflow-based orchestration system that uses InfraHub as Source of Truth, Prefect for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.
🎯 Project Vision
Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:
- Intent is defined in InfraHub (custom schema, Git-versioned)
- Orchestration is handled by Prefect (Python-native @flow and @task decorators)
- State is continuously monitored via gNMI Subscribe
- Changes are computed as diffs and applied atomically via gNMI Set
- Drift is detected and optionally auto-remediated
Think terraform plan and terraform apply, but for your network fabric — powered by Prefect flows.
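The "plan" step described above comes down to comparing desired state with observed state. The following is a minimal sketch of such a diff, not the project's actual diff engine: the resource keys (`vlan/100`), the attribute names, and the `op` labels are all assumptions made for illustration.

```python
from typing import Any


def compute_diff(want: dict[str, Any], have: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the operations needed to turn `have` into `want`.

    Both arguments map a resource key (hypothetical format: "vlan/100")
    to that resource's attributes.
    """
    changes: list[dict[str, Any]] = []
    # Present in intent only: must be created on the fabric.
    for key in sorted(want.keys() - have.keys()):
        changes.append({"op": "create", "key": key, "value": want[key]})
    # Present on both sides: update only when the attributes differ.
    for key in sorted(want.keys() & have.keys()):
        if want[key] != have[key]:
            changes.append(
                {"op": "update", "key": key, "value": want[key], "old": have[key]}
            )
    # Present on the fabric only: drift that should be removed.
    for key in sorted(have.keys() - want.keys()):
        changes.append({"op": "delete", "key": key, "old": have[key]})
    return changes


# Illustrative data only; real intent would come from InfraHub.
intent = {
    "vlan/100": {"name": "production", "vni": 10100},
    "vlan/200": {"name": "staging", "vni": 10200},
}
current = {
    "vlan/200": {"name": "stage", "vni": 10200},
    "vlan/300": {"name": "legacy", "vni": 10300},
}

plan = compute_diff(want=intent, have=current)
for change in plan:
    print(change["op"], change["key"])
```

An empty result means intent and fabric agree. A non-empty result is what a dry run would display before anything is sent to the devices.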
🏗️ Architecture
🎯 Why InfraHub?
We chose InfraHub over NetBox as Source of Truth for several reasons:
| Feature | NetBox | InfraHub |
|---|---|---|
| Schema | Fixed DCIM/IPAM model | Fully customizable YAML schema |
| Git Integration | External sync needed | Native - branches = data branches |
| Versioning | Changelog only | True Git-like versioning with merges |
| Test/Redeploy | Dump/restore | git clone = complete environment |
| Transforms | Limited | Built-in Jinja2 + Python transforms |
| GraphQL | Yes | Yes (auto-generated from schema) |
Key benefits for this project:
- Custom Schema - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
- Git-native - Schema + data versioned together, easy test environment setup
- Transforms - Generate device configs directly from InfraHub
- Branches - Test fabric changes in isolated branches before merge
📦 Repository as InfraHub Backend
This repository serves as the single source of truth for both code and infrastructure data:
fabric-orchestrator/
├── .infrahub.yml              # InfraHub repository config
│
├── schemas/                   # InfraHub schema definitions
│   └── fabric.yml             # Custom EVPN-VXLAN fabric schema
│
├── data/                      # Infrastructure objects (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml        # Spines, Leafs, VTEP pairs
│   ├── network/
│   │   ├── vlans.yml          # VLANs + L2VNI mappings
│   │   ├── vrfs.yml           # VRFs + L3VNI mappings
│   │   └── interfaces.yml     # Interface configs
│   └── routing/
│       ├── bgp_sessions.yml   # Underlay + EVPN overlay
│       └── evpn.yml           # Route targets, RDs
│
├── transforms/                # Jinja2 templates for config generation
│   └── arista/
│       ├── base.j2
│       ├── interfaces.j2
│       ├── bgp.j2
│       └── evpn.j2
│
└── src/                       # Python orchestration code
Workflow
# 1. Edit data files (e.g., add a VLAN)
vim data/network/vlans.yml
# 2. Commit & push
git commit -am "Add VLAN 100 for production"
git push
# 3. InfraHub syncs automatically from Git
# → Data available via GraphQL
# 4. Prefect flow detects change → reconciles fabric
Benefits
- Reproducibility: git clone → docker compose up → complete environment
- Code Review: Infrastructure changes go through PR review
- History: Full audit trail via Git
- Testing: Create a branch, test changes, merge when validated
🎛 Why Prefect?
| Feature | Benefit |
|---|---|
| Python-native workflows | Use @flow and @task decorators — no YAML, just Python |
| Free secrets management | Native Secret blocks for credentials (free in OSS) |
| Built-in UI | Dashboard, logs, metrics, execution history via prefect server start |
| No containerization required | Run flows directly with .serve() — no Docker needed |
| Event-driven triggers | Schedule, webhooks (via FastAPI), flow triggers out of the box |
| Task dependencies | Automatic dependency ordering via task result passing or wait_for |
| Retry & error handling | Built-in retry policies with @task(retries=3) |
| Human-in-the-loop | Native pause_flow_run() for approval workflows |
🎯 Target Fabric
This project is designed for the Arista EVPN-VXLAN ContainerLab topology:
- 2 Spines (BGP Route Reflectors, AS 65000)
- 8 Leafs (4 MLAG VTEP pairs, AS 65001-65004)
- cEOS 4.35.0F with gNMI enabled
- EVPN Type-2 (L2 VXLAN) and Type-5 (L3 VXLAN) support
Reference: arista-evpn-vxlan-clab
📋 Project Phases
Progress is tracked via issues. See all issues or filter by phase:
| Phase | Description | Status |
|---|---|---|
| Phase 1 | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | ✅ Complete |
| Phase 2 | InfraHub Setup & Core Reconciler - Schema, diff engine, YANG mappers | 🔄 In Progress |
| Phase 3 | Full Fabric Coverage - BGP, MLAG, VRFs mappers | 📋 Planned |
| Phase 4 | Prefect Integration - Flows, webhooks, drift detection | 📋 Planned |
📁 Project Structure
fabric-orchestrator/
├── README.md
├── pyproject.toml
├── .infrahub.yml              # InfraHub config (points to schemas/)
│
├── schemas/                   # InfraHub schema definitions
│   └── fabric.yml             # Custom EVPN-VXLAN fabric schema
│
├── data/                      # Infrastructure data (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml
│   ├── network/
│   │   ├── vlans.yml
│   │   ├── vrfs.yml
│   │   └── interfaces.yml
│   └── routing/
│       ├── bgp_sessions.yml
│       └── evpn.yml
│
├── transforms/                # Jinja2 config templates
│   └── arista/
│       └── *.j2
│
├── src/                       # Python package
│   ├── __init__.py
│   ├── cli.py                 # CLI for YANG discovery
│   │
│   ├── flows/                 # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py       # @flow fabric_reconcile
│   │   ├── drift.py           # @flow handle_drift
│   │   └── remediation.py     # @flow drift_remediation
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py          # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/              # InfraHub integration
│   │   ├── __init__.py
│   │   ├── client.py          # InfraHub SDK wrapper
│   │   └── queries.py         # GraphQL queries
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py          # InfraHub intent → YANG paths
│       ├── paths.py           # YANG path definitions
│       └── mappers/           # Resource-specific mappers
│           ├── vlan.py
│           ├── interface.py
│           ├── bgp.py
│           └── vxlan.py
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md
    └── yang-paths.md
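As an illustration of what a resource-specific mapper under src/yang/mappers/ might do, the sketch below turns one VLAN intent record into a (path, value) pair. It is not the project's implementation: the function name, the intent field names, and the use of the OpenConfig VLAN path under the default network instance are assumptions that would need checking against the paths documented in docs/yang-paths.md.

```python
from typing import Any

# Assumed OpenConfig path; the path actually validated on EOS may differ.
VLAN_PATH = (
    "/network-instances/network-instance[name=default]"
    "/vlans/vlan[vlan-id={vlan_id}]/config"
)


def vlan_to_updates(vlan: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Map one VLAN intent record (hypothetical shape) to gNMI-style updates."""
    path = VLAN_PATH.format(vlan_id=vlan["vlan_id"])
    payload = {"vlan-id": vlan["vlan_id"], "name": vlan["name"]}
    return [(path, payload)]


updates = vlan_to_updates({"vlan_id": 100, "name": "production"})
for path, value in updates:
    print(path, value)
```

Returning plain (path, value) tuples keeps the mapper free of any transport dependency, so it can be unit-tested without a device and handed to the gNMI client wrapper afterwards.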
🛠️ Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Source of Truth | InfraHub | Intent definition via custom schema |
| Data Storage | This Git repo | Schema + data versioned together |
| Orchestrator | Prefect | Python-native workflow orchestration |
| Transport | gNMI | Configuration and telemetry |
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
| Python Library | pygnmi + infrahub-sdk | gNMI/InfraHub interactions |
| CLI | Click + Rich | YANG discovery tools |
| Validation | Pydantic v2 | Intent data validation |
| Lab | ContainerLab + cEOS | Development environment |
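The validation row above can be illustrated with a small intent model. To keep the sketch dependency-free it uses a stdlib dataclass rather than Pydantic v2, so it is a stand-in for the idea, not the project's model. The class and field names are hypothetical; the numeric bounds are the usable 802.1Q VLAN ID range (1-4094) and the 24-bit VXLAN VNI space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VlanIntent:
    vlan_id: int
    name: str
    vni: int

    def __post_init__(self) -> None:
        # 802.1Q reserves VLAN IDs 0 and 4095.
        if not 1 <= self.vlan_id <= 4094:
            raise ValueError(f"vlan_id out of range: {self.vlan_id}")
        # A VXLAN VNI is a 24-bit identifier.
        if not 1 <= self.vni <= 2**24 - 1:
            raise ValueError(f"vni out of range: {self.vni}")
        if not self.name:
            raise ValueError("name must not be empty")


valid = VlanIntent(vlan_id=100, name="production", vni=10100)
print(valid)

try:
    VlanIntent(vlan_id=5000, name="broken", vni=10100)
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting bad intent before a diff is computed keeps invalid values from ever reaching a gNMI Set.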
🔗 Related Projects
- arista-evpn-vxlan-clab - Target fabric topology
- InfraHub - Source of Truth platform
- InfraHub Schema Library - Reference schemas
- Arista YANG Models - EOS 4.35.0F YANG definitions
🚀 Getting Started
Prerequisites
- Python 3.12+
- uv package manager
- Docker (for InfraHub)
- Access to ContainerLab with cEOS images
Quick Start
# Clone the repository (includes schema + data)
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator
# Install Python dependencies
uv sync
# Start InfraHub (loads schema & data from this repo)
docker compose up -d
# Configure Prefect secrets
python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable
Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
Variable.set('infrahub_url', 'http://localhost:8000')
Variable.set('gnmi_username', 'admin')
"
# Verify gNMI connectivity
uv run fabric-orch discover capabilities --target leaf1:6030
# Run reconciliation
uv run fabric-orch plan
uv run fabric-orch apply
Prefect Flow Example
from prefect import flow, task
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
    from infrahub_sdk import InfrahubClient

    client = InfrahubClient(address=Variable.get("infrahub_url"))
    # Query fabric intent via GraphQL
    return client.query(...)


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine

    return diff_engine(want=intent, have=current)


# get_current_state and apply_changes are tasks defined elsewhere (not shown here).
@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(device: str | None = None, dry_run: bool = True) -> dict:
    """Reconcile fabric state with InfraHub intent."""
    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ Fabric is in sync")
        return {"in_sync": True}

    # Only push changes to the fabric when dry_run is explicitly disabled.
    if not dry_run:
        apply_changes(changes)

    return {"changes": changes, "applied": not dry_run}
Status: 🚧 Active Development - Phase 2 (InfraHub Setup & Core Reconciler)