Fabric Orchestrator
Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics
A workflow-based orchestration system that uses NetBox as Source of Truth, Prefect for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.
🎯 Project Vision
Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:
- Intent is defined in NetBox (Custom Fields, Native Models, BGP Plugin)
- Orchestration is handled by Prefect (Python-native @flow and @task decorators)
- State is continuously monitored via gNMI Subscribe
- Changes are computed as diffs and applied atomically via gNMI Set
- Drift is detected and optionally auto-remediated
Think terraform plan and terraform apply, but for your network fabric — powered by Prefect flows.
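As a rough illustration of the plan step, the sketch below compares desired and observed state once both have been flattened into dictionaries keyed by gNMI path. The function name `plan_changes`, the flat path-to-value layout, and the shortened example paths are assumptions made for this example; the project's actual diff engine may be structured differently.

```python
from typing import Any


def plan_changes(want: dict[str, Any], have: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the operations needed to turn `have` into `want`.

    Both arguments map a gNMI path to its value (a simplifying assumption).
    """
    changes: list[dict[str, Any]] = []
    # Paths that are missing on the device or carry a different value.
    for path in sorted(want):
        if path not in have or have[path] != want[path]:
            changes.append({"op": "update", "path": path, "value": want[path]})
    # Paths present on the device but absent from intent.
    for path in sorted(set(have) - set(want)):
        changes.append({"op": "delete", "path": path})
    return changes


if __name__ == "__main__":
    # Illustrative paths only, not validated against EOS.
    want = {"/vlans/vlan[vlan-id=10]/config/name": "blue"}
    have = {
        "/vlans/vlan[vlan-id=10]/config/name": "red",
        "/vlans/vlan[vlan-id=30]/config/name": "legacy",
    }
    for change in plan_changes(want, have):
        print(change)
```

An empty list corresponds to the "in sync" case. Sending all resulting operations in one gNMI SetRequest is what would make the apply step atomic, since the gNMI specification treats a single SetRequest as one transaction.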
🏗️ Architecture
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                                     INTENT LAYER                                      │
│  ┌─────────────────────┐    ┌──────────────────────────┐    ┌──────────────────────┐  │
│  │       NetBox        │    │     Custom Fields /      │    │      netbox-bgp      │  │
│  │        (SoT)        │◄───│      Native Models       │◄───│        Plugin        │  │
│  └──────────┬──────────┘    └──────────────────────────┘    └──────────────────────┘  │
└─────────────┼─────────────────────────────────────────────────────────────────────────┘
              │ Webhook / Polling
              ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATION LAYER (PREFECT)                         │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                         Prefect Flows (Python)                         │  │
│  │  ┌───────────────────┐  ┌───────────────────┐  ┌─────────────────────┐ │  │
│  │  │ fabric_reconcile  │  │   handle_drift    │  │  drift_remediation  │ │  │
│  │  │   (plan/apply)    │  │    (subscribe)    │  │     (auto-fix)      │ │  │
│  │  └───────────────────┘  └───────────────────┘  └─────────────────────┘ │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                         Prefect Tasks (Python)                         │  │
│  │  ┌─────────────────┐   ┌─────────────────┐   ┌───────────────────────┐ │  │
│  │  │  Intent Parser  │   │   Diff Engine   │   │      gNMI Client      │ │  │
│  │  │  (NetBox→YANG)  │   │ (Want vs Have)  │   │   (pygnmi wrapper)    │ │  │
│  │  └─────────────────┘   └─────────────────┘   └───────────────────────┘ │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │   FastAPI Webhook Receiver │ Prefect .serve() │ Prefect Server (UI)    │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
                                       │ gNMI Get/Set/Subscribe
                                       ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 DEVICE LAYER                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    spine1    │  │    spine2    │  │    leaf1     │  │    leaf2     │  ... │
│  │  gNMI:6030   │  │  gNMI:6030   │  │  gNMI:6030   │  │  gNMI:6030   │      │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘      │
└──────────────────────────────────────────────────────────────────────────────┘
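To make the drift path through the diagram concrete, here is a small sketch of how a single gNMI Subscribe update could be checked against intent before handing off to a drift flow. `DriftEvent`, `detect_drift`, and the path-to-value dictionaries are hypothetical names and shapes chosen for this example, not the project's drift monitor.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DriftEvent:
    """One observed deviation from intent (hypothetical structure)."""

    device: str
    path: str
    expected: Any
    observed: Any


def detect_drift(
    device: str, intent: dict[str, Any], update: dict[str, Any]
) -> list[DriftEvent]:
    """Compare one telemetry update against the intended values.

    `update` maps gNMI path -> observed value. Paths that intent does not
    manage are ignored, so unmanaged state never counts as drift.
    """
    return [
        DriftEvent(device, path, intent[path], observed)
        for path, observed in sorted(update.items())
        if path in intent and intent[path] != observed
    ]


if __name__ == "__main__":
    intent = {"/interfaces/interface[name=Ethernet1]/config/enabled": True}
    update = {"/interfaces/interface[name=Ethernet1]/config/enabled": False}
    for event in detect_drift("leaf1", intent, update):
        print(event)
```

Each returned event carries enough context (device, path, expected, observed) to either log the deviation or pass it to an auto-remediation step.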
🎛 Why Prefect?
We chose Prefect as the orchestration engine for several reasons:
| Feature | Benefit |
|---|---|
| Python-native workflows | Use @flow and @task decorators — no YAML, just Python |
| Free secrets management | Native Secret blocks for credentials (free in OSS) |
| Built-in UI | Dashboard, logs, metrics, execution history via prefect server start |
| No containerization required | Run flows directly with .serve() — no Docker needed |
| Event-driven triggers | Schedule, webhooks (via FastAPI), flow triggers out of the box |
| Task dependencies | Automatic dependency ordering via task result passing or wait_for |
| Retry & error handling | Built-in retry policies with @task(retries=3) |
| Human-in-the-loop | Native pause_flow_run() for approval workflows |
🎯 Target Fabric
This project is designed for the Arista EVPN-VXLAN ContainerLab topology:
- 2 Spines (BGP Route Reflectors, AS 65000)
- 8 Leafs (4 MLAG VTEP pairs, AS 65001-65004)
- cEOS 4.35.0F with gNMI enabled
- EVPN Type-2 (L2 VXLAN) and Type-5 (L3 VXLAN) support
Reference: arista-evpn-vxlan-clab
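The AS numbering above can be expressed as simple arithmetic. The sketch below assumes that consecutive leafs form an MLAG pair (leaf1 and leaf2 in pair 1, leaf3 and leaf4 in pair 2, and so on) and that both members of a pair share one AS number; that pairing is an assumption for illustration and should be checked against the referenced topology.

```python
SPINE_ASN = 65000   # Spines (route reflectors), from the list above
LEAF_COUNT = 8      # 8 leafs in 4 MLAG pairs
LEAFS_PER_PAIR = 2


def leaf_asn(leaf_number: int) -> int:
    """Return the AS number for leaf<leaf_number> under the assumed pairing."""
    if not 1 <= leaf_number <= LEAF_COUNT:
        raise ValueError(f"leaf number must be 1..{LEAF_COUNT}, got {leaf_number}")
    # Ceiling division: leafs 1-2 -> pair 1, leafs 3-4 -> pair 2, ...
    pair = (leaf_number + LEAFS_PER_PAIR - 1) // LEAFS_PER_PAIR
    return SPINE_ASN + pair


if __name__ == "__main__":
    for n in range(1, LEAF_COUNT + 1):
        print(f"leaf{n}: AS {leaf_asn(n)}")
```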
📋 Project Phases
Progress is tracked via issues. See all issues or filter by phase:
| Phase | Description | Issues |
|---|---|---|
| Phase 1 | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | phase-1-yang-discovery |
| Phase 2 | Core Components - NetBox client, diff engine, gNMI operations | phase-2-minimal-reconciler |
| Phase 3 | Full Fabric - BGP, MLAG, VRFs, YANG mappers | phase-3-full-fabric |
| Phase 4 | Prefect Integration - Flows, webhooks, drift detection | phase-4-event-driven |
📌 Project Board: View Kanban
📁 Project Structure
fabric-orchestrator/
├── README.md
├── pyproject.toml
│
├── src/                        # Python package
│   ├── __init__.py
│   ├── cli.py                  # CLI for YANG discovery (discover commands)
│   │
│   ├── flows/                  # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py        # @flow fabric_reconcile (plan/apply)
│   │   ├── drift.py            # @flow handle_drift
│   │   └── remediation.py      # @flow drift_remediation
│   │
│   ├── api/                    # FastAPI webhook receiver
│   │   ├── __init__.py
│   │   └── webhooks.py         # NetBox webhook endpoint
│   │
│   ├── services/               # Long-running services
│   │   ├── __init__.py
│   │   └── drift_monitor.py    # gNMI Subscribe drift detection
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py           # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── netbox/
│   │   ├── __init__.py
│   │   ├── client.py           # NetBox API client (pynetbox)
│   │   └── models.py           # Pydantic models for intent validation
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py           # NetBox intent → YANG paths
│       ├── paths.py            # YANG path definitions
│       └── dependencies.py     # Dependency ordering graph
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md       # CLI documentation
    ├── yang-paths.md           # Documented YANG paths
    └── netbox-data-model.md    # NetBox schema documentation
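The dependency ordering graph listed under yang/ suggests that configuration objects must be applied in a safe order. One possible way to do this, shown here purely as a sketch, is a topological sort with the standard library's `graphlib`. The object names and their dependencies below are hypothetical examples, not the contents of dependencies.py.

```python
from graphlib import TopologicalSorter

# Each key depends on the items in its set (hypothetical example graph).
DEPENDENCIES: dict[str, set[str]] = {
    "vrf": set(),
    "vlan": set(),
    "svi": {"vlan", "vrf"},
    "vxlan_vni_mapping": {"vlan", "vrf"},
    "bgp_evpn": {"vxlan_vni_mapping"},
}


def apply_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return object types so that every dependency precedes its dependents.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    order = apply_order(DEPENDENCIES)
    print("create order:", order)
    # Deletions would typically run in the opposite direction.
    print("delete order:", list(reversed(order)))
```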
🛠️ Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Source of Truth | NetBox + BGP Plugin | Intent definition via native models |
| Orchestrator | Prefect | Python-native workflow orchestration |
| Webhooks | FastAPI | Receive NetBox webhooks |
| Transport | gNMI | Configuration and telemetry |
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
| Python Library | pygnmi + pynetbox | gNMI/NetBox interactions |
| CLI | Click + Rich | YANG discovery tools |
| Validation | Pydantic v2 | Intent data validation |
| Lab | ContainerLab + cEOS | Development environment |
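To show how the Source of Truth, validation, and data model rows fit together, the following sketch validates a VLAN intent and maps it to a path and value pair. It uses a standard-library dataclass instead of Pydantic so it runs without dependencies. `VlanIntent` and `vlan_to_update` are hypothetical names, and the path follows the OpenConfig network-instance VLAN model; whether EOS 4.35.0F accepts it in exactly this form is something the YANG discovery phase would need to confirm.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class VlanIntent:
    """Validated VLAN intent (stand-in for a Pydantic model)."""

    vlan_id: int
    name: str

    def __post_init__(self) -> None:
        if not 1 <= self.vlan_id <= 4094:
            raise ValueError(f"VLAN ID out of range: {self.vlan_id}")
        if not self.name:
            raise ValueError("VLAN name must not be empty")


def vlan_to_update(
    vlan: VlanIntent, network_instance: str = "default"
) -> tuple[str, dict[str, Any]]:
    """Map a VLAN intent to a (path, value) pair for a gNMI update."""
    path = (
        f"/network-instances/network-instance[name={network_instance}]"
        f"/vlans/vlan[vlan-id={vlan.vlan_id}]/config"
    )
    return path, {"vlan-id": vlan.vlan_id, "name": vlan.name}


if __name__ == "__main__":
    print(vlan_to_update(VlanIntent(vlan_id=100, name="web")))
```

The (path, value) tuple shape was chosen because a list of such tuples is what pygnmi's `set(update=...)` expects, so the output could be passed to the gNMI client wrapper with little glue code.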
🔗 Related Projects
- arista-evpn-vxlan-clab - Target fabric topology
- projet-vxlan-automation - Previous NetBox RenderConfig work
- Arista YANG Models - EOS 4.35.0F YANG definitions
- Prefect Documentation - Orchestration platform docs
🚀 Getting Started
Prerequisites
- Python 3.12+
- uv package manager
- Access to ContainerLab with cEOS images
- NetBox instance with BGP plugin
Quick Start
# Clone the repository
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator
# Install Python dependencies
uv sync
# Configure Prefect secrets
python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable
Secret(value='your-netbox-token').save('netbox-token', overwrite=True)
Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
Variable.set('netbox_url', 'https://netbox.example.com')
Variable.set('gnmi_username', 'admin')
"
# Start Prefect server (optional, for UI)
prefect server start
# Verify gNMI connectivity to your fabric
uv run fabric-orch discover capabilities --target leaf1:6030
# Explore YANG paths
uv run fabric-orch discover get --target leaf1:6030 \
--path "/interfaces/interface[name=Ethernet1]/state"
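The --path argument above uses gNMI's string path syntax, where list keys appear in square brackets. As a sketch of how such a string breaks down into path elements, the hypothetical helper below splits on slashes outside brackets, so interface names such as Ethernet3/1 stay intact. It does not handle escaped characters inside key values.

```python
def parse_gnmi_path(path: str) -> list[tuple[str, dict[str, str]]]:
    """Split a gNMI string path into (element name, keys) tuples."""
    # Step 1: split on "/" only when not inside a [key=value] selector.
    elements: list[str] = []
    current = ""
    depth = 0
    for char in path:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "/" and depth == 0:
            if current:
                elements.append(current)
            current = ""
        else:
            current += char
    if current:
        elements.append(current)

    # Step 2: separate each element name from its key selectors.
    parsed: list[tuple[str, dict[str, str]]] = []
    for element in elements:
        name, _, rest = element.partition("[")
        keys: dict[str, str] = {}
        parts = rest.split("][") if rest else []
        for part in parts:
            key, _, value = part.rstrip("]").partition("=")
            keys[key] = value
        parsed.append((name, keys))
    return parsed


if __name__ == "__main__":
    for name, keys in parse_gnmi_path("/interfaces/interface[name=Ethernet1]/state"):
        print(name, keys)
```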
Running Flows
from src.flows.reconcile import fabric_reconcile
# Plan only (dry-run)
result = fabric_reconcile(dry_run=True)
# Plan for a specific device
result = fabric_reconcile(device="leaf1", dry_run=True)
# Apply changes automatically
result = fabric_reconcile(auto_apply=True, dry_run=False)
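The two flags interact: in the flow example in this README, changes are applied only when `auto_apply and not dry_run` is true. The standalone sketch below reproduces just that expression (the helper name `will_apply` is made up for illustration) and prints the outcome for every combination.

```python
from itertools import product


def will_apply(auto_apply: bool = False, dry_run: bool = True) -> bool:
    """Mirror the flow's decision: apply only if auto_apply and not dry_run."""
    return auto_apply and not dry_run


if __name__ == "__main__":
    for auto_apply, dry_run in product((False, True), repeat=2):
        outcome = "apply" if will_apply(auto_apply, dry_run) else "plan only"
        print(f"auto_apply={auto_apply!s:<5} dry_run={dry_run!s:<5} -> {outcome}")
```

With the defaults nothing is applied, and setting only one of the two flags still results in a plan. Both must be set explicitly before the fabric is touched.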
Deploying with Scheduling
# Start the flow with scheduling (runs every 6 hours)
python -m src.flows.reconcile
# Or trigger a run of the served deployment via the Prefect CLI
prefect deployment run fabric-reconcile/fabric-reconcile-scheduled
Starting the Webhook Receiver
# Start FastAPI webhook server
uvicorn src.api.webhooks:app --host 0.0.0.0 --port 8000
Prefect Flow Example
from prefect import flow, task
from prefect.blocks.system import Secret
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from NetBox."""
    from src.netbox import FabricNetBoxClient

    netbox_url = Variable.get("netbox_url")
    netbox_token = Secret.load("netbox-token").get()
    client = FabricNetBoxClient(url=netbox_url, token=netbox_token)
    return client.get_fabric_intent() if not device else client.get_device_intent(device)


@task(retries=2, retry_delay_seconds=10)
def get_current_state(device: str | None = None) -> dict:
    """Retrieve current device state via gNMI Get."""
    # Placeholder body: collect state via gNMI Get...
    return {}


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine

    return diff_engine(want=intent, have=current)


@task(retries=1)
def apply_changes(changes: list[dict], dry_run: bool = True) -> dict:
    """Apply changes via gNMI Set."""
    if dry_run:
        return {"applied": False, "changes": changes}
    # Apply via gNMI...
    return {"applied": True, "changes": changes}


@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(
    device: str | None = None,
    auto_apply: bool = False,
    dry_run: bool = True,
) -> dict:
    """Reconcile fabric state with NetBox intent."""
    print("🔄 Starting fabric reconciliation")
    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)
    if not changes:
        print("✅ No changes detected - fabric is in sync")
        return {"changes": [], "in_sync": True}

    # Changes are applied only when auto_apply is set and dry_run is off.
    should_apply = auto_apply and not dry_run
    result = apply_changes(changes, dry_run=not should_apply)
    return {"changes": changes, "applied": result["applied"]}


if __name__ == "__main__":
    # Serve the flow as a deployment that runs every 6 hours.
    fabric_reconcile.serve(
        name="fabric-reconcile-scheduled",
        cron="0 */6 * * *",
        tags=["network", "fabric"],
    )
Status: 🚧 Active Development - Phase 2 (Core Components) & Phase 4 (Prefect Integration)