Fabric Orchestrator
Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics
A workflow-based orchestration system that uses InfraHub as Source of Truth, Prefect for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.
🎯 Project Vision
Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:
- Intent is defined in InfraHub (custom schema, Git-versioned)
- Orchestration is handled by Prefect (Python-native @flow and @task decorators)
- State is continuously monitored via gNMI Subscribe
- Changes are computed as diffs and applied atomically via gNMI Set
- Drift is detected and optionally auto-remediated
Think terraform plan and terraform apply, but for your network fabric — powered by Prefect flows.
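The plan step can be sketched as a pure function over two dictionaries. This is a minimal illustration and not the project's diff engine: the `Change` type, the `plan` function, and the flat `{resource_key: attributes}` state shape are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Change:
    """One planned operation on a single resource (hypothetical type)."""

    action: str       # "create", "update", or "delete"
    key: str          # resource identifier, e.g. "vlan/100"
    want: Any = None  # desired attributes (None for a delete)
    have: Any = None  # current attributes (None for a create)


def plan(want: dict[str, Any], have: dict[str, Any]) -> list[Change]:
    """Compare desired and current state and return the changes needed."""
    changes: list[Change] = []
    # Walk the union of keys in sorted order so the plan is deterministic.
    for key in sorted(want.keys() | have.keys()):
        if key not in have:
            changes.append(Change("create", key, want=want[key]))
        elif key not in want:
            changes.append(Change("delete", key, have=have[key]))
        elif want[key] != have[key]:
            changes.append(Change("update", key, want=want[key], have=have[key]))
    return changes


# Illustrative data only: intent as it might come from InfraHub,
# current state as it might be read from a device.
intent = {"vlan/100": {"name": "production"}, "vlan/200": {"name": "staging"}}
current = {"vlan/100": {"name": "prod"}, "vlan/300": {"name": "legacy"}}

for change in plan(intent, current):
    print(change.action, change.key)
```

An empty plan means the fabric is in sync. The same comparison could also serve drift detection by passing state observed over gNMI Subscribe as `have`, though how the project wires that up is not specified here.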
🏗️ Architecture
🎯 Why InfraHub?
We chose InfraHub over NetBox as Source of Truth for several reasons:
| Feature | NetBox | InfraHub |
|---|---|---|
| Schema | Fixed DCIM/IPAM model | Fully customizable YAML schema |
| Git Integration | External sync needed | Native - branches = data branches |
| Versioning | Changelog only | True Git-like versioning with merges |
| Test/Redeploy | Dump/restore | git clone = complete environment |
| Transforms | Limited | Built-in Jinja2 + Python transforms |
| GraphQL | Yes | Yes (auto-generated from schema) |
Key benefits for this project:
- Custom Schema - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
- Git-native - Schema + data versioned together, easy test environment setup
- Transforms - Generate device configs directly from InfraHub
- Branches - Test fabric changes in isolated branches before merge
📦 Repository as InfraHub Backend
This repository serves as the single source of truth for both code and infrastructure data:
fabric-orchestrator/
├── .infrahub.yml              # InfraHub repository config
│
├── schemas/                   # InfraHub schema definitions
│   └── fabric.yml             # Custom EVPN-VXLAN fabric schema
│
├── data/                      # Infrastructure objects (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml        # Spines, Leafs, VTEP pairs
│   ├── network/
│   │   ├── vlans.yml          # VLANs + L2VNI mappings
│   │   ├── vrfs.yml           # VRFs + L3VNI mappings
│   │   └── interfaces.yml     # Interface configs
│   └── routing/
│       ├── bgp_sessions.yml   # Underlay + EVPN overlay
│       └── evpn.yml           # Route targets, RDs
│
├── transforms/                # Jinja2 templates for config generation
│   └── arista/
│       ├── base.j2
│       ├── interfaces.j2
│       ├── bgp.j2
│       └── evpn.j2
│
└── src/                       # Python orchestration code
Workflow
# 1. Edit data files (e.g., add a VLAN)
vim data/network/vlans.yml
# 2. Commit & push
git commit -am "Add VLAN 100 for production"
git push
# 3. InfraHub syncs automatically from Git
# → Data available via GraphQL
# 4. Prefect flow detects change → reconciles fabric
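Because a data change goes straight from Git into the fabric, it helps to check it before committing. The sketch below validates a list of VLAN entries; the field names `vlan_id`, `name`, and `vni` are assumptions for illustration, since the real attribute names are defined in schemas/fabric.yml. The numeric ranges are the standard ones: VLAN IDs 1 to 4094 and 24-bit VXLAN VNIs.

```python
def validate_vlans(vlans: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: list[str] = []
    seen_ids: set[int] = set()
    seen_vnis: set[int] = set()
    for entry in vlans:
        vlan_id, vni = entry["vlan_id"], entry["vni"]
        if not 1 <= vlan_id <= 4094:
            errors.append(f"VLAN {vlan_id}: ID outside 1-4094")
        if not 1 <= vni <= 2**24 - 1:
            errors.append(f"VLAN {vlan_id}: VNI {vni} outside 1-16777215")
        if vlan_id in seen_ids:
            errors.append(f"VLAN {vlan_id}: duplicate VLAN ID")
        if vni in seen_vnis:
            errors.append(f"VLAN {vlan_id}: VNI {vni} already in use")
        seen_ids.add(vlan_id)
        seen_vnis.add(vni)
    return errors


# Hypothetical entries, as they might look after loading data/network/vlans.yml.
vlans = [
    {"vlan_id": 100, "name": "production", "vni": 10100},
    {"vlan_id": 200, "name": "staging", "vni": 10100},   # VNI reused by mistake
    {"vlan_id": 5000, "name": "invalid", "vni": 15000},  # ID out of range
]

for problem in validate_vlans(vlans):
    print(problem)
```

A check like this could run as a pre-commit hook or CI step so that invalid data never reaches the reconciliation flow.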
Benefits
- Reproducibility: git clone → docker compose up → complete environment
- Code Review: Infrastructure changes go through PR review
- History: Full audit trail via Git
- Testing: Create a branch, test changes, merge when validated
🎛 Why Prefect?
| Feature | Benefit |
|---|---|
| Python-native workflows | Use @flow and @task decorators — no YAML, just Python |
| Free secrets management | Native Secret blocks for credentials (free in OSS) |
| Built-in UI | Dashboard, logs, metrics, execution history via prefect server start |
| No containerization required | Run flows directly with .serve() — no Docker needed |
| Event-driven triggers | Schedule, webhooks (via FastAPI), flow triggers out of the box |
| Task dependencies | Automatic dependency ordering via task result passing or wait_for |
| Retry & error handling | Built-in retry policies with @task(retries=3) |
| Human-in-the-loop | Native pause_flow_run() for approval workflows |
🎯 Target Fabric
This project is designed for the Arista EVPN-VXLAN ContainerLab topology:
- 2 Spines (BGP Route Reflectors, AS 65000)
- 8 Leafs (4 MLAG VTEP pairs, AS 65001-65004)
- cEOS 4.35.0F with gNMI enabled
- EVPN Type-2 (L2 VXLAN) and Type-5 (L3 VXLAN) support
Reference: arista-evpn-vxlan-clab
📋 Project Phases
Progress is tracked via issues. See all issues or filter by phase:
| Phase | Description | Status |
|---|---|---|
| Phase 1 | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | ✅ Complete |
| Phase 2 | InfraHub Setup & Core Reconciler - Schema, diff engine, YANG mappers | 🔄 In Progress |
| Phase 3 | Full Fabric Coverage - BGP, MLAG, VRFs mappers | 📋 Planned |
| Phase 4 | Prefect Integration - Flows, webhooks, drift detection | 📋 Planned |
📁 Project Structure
fabric-orchestrator/
├── README.md
├── pyproject.toml
├── .infrahub.yml              # InfraHub config (points to schemas/)
│
├── schemas/                   # InfraHub schema definitions
│   └── fabric.yml             # Custom EVPN-VXLAN fabric schema
│
├── data/                      # Infrastructure data (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml
│   ├── network/
│   │   ├── vlans.yml
│   │   ├── vrfs.yml
│   │   └── interfaces.yml
│   └── routing/
│       ├── bgp_sessions.yml
│       └── evpn.yml
│
├── transforms/                # Jinja2 config templates
│   └── arista/
│       └── *.j2
│
├── src/                       # Python package
│   ├── __init__.py
│   ├── cli.py                 # CLI for YANG discovery
│   │
│   ├── flows/                 # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py       # @flow fabric_reconcile
│   │   ├── drift.py           # @flow handle_drift
│   │   └── remediation.py     # @flow drift_remediation
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py          # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/              # InfraHub integration
│   │   ├── __init__.py
│   │   ├── client.py          # InfraHub SDK wrapper
│   │   └── queries.py         # GraphQL queries
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py          # InfraHub intent → YANG paths
│       ├── paths.py           # YANG path definitions
│       └── mappers/           # Resource-specific mappers
│           ├── vlan.py
│           ├── interface.py
│           ├── bgp.py
│           └── vxlan.py
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md
    └── yang-paths.md
🛠️ Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Source of Truth | InfraHub | Intent definition via custom schema |
| Data Storage | This Git repo | Schema + data versioned together |
| Orchestrator | Prefect | Python-native workflow orchestration |
| Transport | gNMI | Configuration and telemetry |
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
| Python Library | pygnmi + infrahub-sdk | gNMI/InfraHub interactions |
| CLI | Click + Rich | YANG discovery tools |
| Validation | Pydantic v2 | Intent data validation |
| Lab | ContainerLab + cEOS | Development environment |
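To show how the YANG and gNMI rows fit together, here is a minimal sketch of turning one piece of intent into a gNMI update. The function name `vlan_to_gnmi_update` is hypothetical and this is not the project's mapper. The path follows the OpenConfig network-instance VLAN model; whether EOS 4.35.0F accepts exactly this path should be confirmed against the project's YANG path documentation.

```python
def vlan_to_gnmi_update(
    vlan_id: int, name: str, instance: str = "default"
) -> tuple[str, dict]:
    """Map a VLAN intent to a (path, value) pair for a gNMI Set update."""
    path = (
        f"/network-instances/network-instance[name={instance}]"
        f"/vlans/vlan[vlan-id={vlan_id}]/config"
    )
    # The value mirrors the leaves under the OpenConfig VLAN config container.
    value = {"vlan-id": vlan_id, "name": name}
    return path, value


# Several updates collected in one list can be sent in a single Set request,
# which is what makes the change atomic from the device's point of view.
updates = [
    vlan_to_gnmi_update(100, "production"),
    vlan_to_gnmi_update(200, "staging"),
]

for path, value in updates:
    print(path, value)
```

The `(path, value)` tuple shape is chosen to match what pygnmi's `set(update=[...])` call takes, so a list like `updates` could be handed to the client wrapper without further conversion.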
🔗 Related Projects
- arista-evpn-vxlan-clab - Target fabric topology
- InfraHub - Source of Truth platform
- InfraHub Schema Library - Reference schemas
- Arista YANG Models - EOS 4.35.0F YANG definitions
🚀 Getting Started
Prerequisites
- Python 3.12+
- uv package manager
- Docker (for InfraHub)
- Access to ContainerLab with cEOS images
Quick Start
# Clone the repository (includes schema + data)
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator
# Install Python dependencies
uv sync
# Start InfraHub (loads schema & data from this repo)
docker compose up -d
# Configure Prefect secrets
python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable
Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
Variable.set('infrahub_url', 'http://localhost:8000')
Variable.set('gnmi_username', 'admin')
"
# Verify gNMI connectivity
uv run fabric-orch discover capabilities --target leaf1:6030
# Run reconciliation
uv run fabric-orch plan
uv run fabric-orch apply
Prefect Flow Example
from prefect import flow, task
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
    from infrahub_sdk import InfrahubClient

    client = InfrahubClient(address=Variable.get("infrahub_url"))
    # Query fabric intent via GraphQL
    return client.query(...)


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine

    return diff_engine(want=intent, have=current)


# get_current_state and apply_changes are tasks defined elsewhere (not shown here).
@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(device: str | None = None, dry_run: bool = True) -> dict:
    """Reconcile fabric state with InfraHub intent."""
    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ Fabric is in sync")
        return {"in_sync": True}

    # Only push changes to the devices when dry_run is disabled.
    if not dry_run:
        apply_changes(changes)

    return {"changes": changes, "applied": not dry_run}
Status: 🚧 Active Development - Phase 2 (InfraHub Setup & Core Reconciler)