Fabric Orchestrator

Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics

A workflow-based orchestration system that uses NetBox as Source of Truth, Prefect for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.

🎯 Project Vision

Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:

  • Intent is defined in NetBox (Custom Fields, Native Models, BGP Plugin)
  • Orchestration is handled by Prefect (Python-native @flow and @task decorators)
  • State is continuously monitored via gNMI Subscribe
  • Changes are computed as diffs and applied atomically via gNMI Set
  • Drift is detected and optionally auto-remediated

Think terraform plan and terraform apply, but for your network fabric — powered by Prefect flows.
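The plan step comes down to comparing desired state (want) with running state (have). Below is a minimal sketch. It assumes both sides are nested dicts keyed by path segment. `flatten` and `plan` are hypothetical helpers, not the project's diff engine.

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into {"/a/b/c": leaf_value} pairs."""
    flat: dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))  # descend into non-empty containers
        else:
            flat[path] = value  # leaves (and empty containers) are compared as-is
    return flat


def plan(want: dict[str, Any], have: dict[str, Any]) -> list[dict[str, Any]]:
    """List the create/update/delete operations that turn `have` into `want`."""
    want_flat, have_flat = flatten(want), flatten(have)
    changes: list[dict[str, Any]] = []
    for path in sorted(want_flat):
        if path not in have_flat:
            changes.append({"op": "create", "path": path, "value": want_flat[path]})
        elif have_flat[path] != want_flat[path]:
            changes.append(
                {"op": "update", "path": path, "value": want_flat[path], "old": have_flat[path]}
            )
    # Anything present on the device but absent from intent is slated for removal.
    for path in sorted(have_flat.keys() - want_flat.keys()):
        changes.append({"op": "delete", "path": path})
    return changes


if __name__ == "__main__":
    want = {"vlans": {"10": {"name": "web"}, "20": {"name": "db"}}}
    have = {"vlans": {"10": {"name": "old"}, "30": {"name": "tmp"}}}
    for change in plan(want, have):
        print(change)
```

On the sample data this yields three changes: an update for VLAN 10's name, a create for VLAN 20 and a delete for VLAN 30. Deleting every path that is missing from intent is deliberately blunt. A production reconciler would only delete inside the subtrees it owns.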

🏗️ Architecture

┌───────────────────────────────────────────────────────────────────────────────────────┐
│                              INTENT LAYER                                             │
│  ┌─────────────────────┐    ┌──────────────────────────┐    ┌──────────────────────┐  │
│  │       NetBox        │    │   Custom Fields /        │    │   netbox-bgp         │  │
│  │       (SoT)         │◄───│   Native Models          │◄───│   Plugin             │  │
│  └──────────┬──────────┘    └──────────────────────────┘    └──────────────────────┘  │
└─────────────┼─────────────────────────────────────────────────────────────────────────┘
              │ Webhook / Polling
              ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                       ORCHESTRATION LAYER (PREFECT)                          │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                    Prefect Flows (Python)                              │  │
│  │  ┌───────────────────┐  ┌───────────────────┐  ┌─────────────────────┐ │  │
│  │  │ fabric_reconcile  │  │ handle_drift      │  │ drift_remediation   │ │  │
│  │  │   (plan/apply)    │  │  (subscribe)      │  │   (auto-fix)        │ │  │
│  │  └───────────────────┘  └───────────────────┘  └─────────────────────┘ │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                    Prefect Tasks (Python)                              │  │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌───────────────────────┐   │  │
│  │  │ Intent Parser   │  │  Diff Engine    │  │   gNMI Client         │   │  │
│  │  │ (NetBox→YANG)   │  │ (Want vs Have)  │  │   (pygnmi wrapper)    │   │  │
│  │  └─────────────────┘  └─────────────────┘  └───────────────────────┘   │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  FastAPI Webhook Receiver │ Prefect .serve() │ Prefect Server (UI)     │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
              │ gNMI Get/Set/Subscribe
              ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                              DEVICE LAYER                                    │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐         │
│  │   spine1     │ │   spine2     │ │   leaf1      │ │   leaf2      │  ...    │
│  │  gNMI:6030   │ │  gNMI:6030   │ │  gNMI:6030   │ │  gNMI:6030   │         │
│  └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘         │
└──────────────────────────────────────────────────────────────────────────────┘
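In the diagram, handle_drift sits on a gNMI Subscribe stream. The sketch below shows the core check such a handler might perform. It assumes a simplified notification shape (one path plus a value, or a delete flag) rather than pygnmi's actual response format, and it assumes intent has been flattened to path/value pairs. `Notification` and `detect_drift` are hypothetical names.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Notification:
    """Simplified Subscribe notification: one path and its new value, or a delete."""
    path: str
    value: Any = None
    deleted: bool = False


def detect_drift(intent: dict[str, Any], stream: Iterable[Notification]) -> list[str]:
    """Return intended paths whose observed state no longer matches intent."""
    drifted: list[str] = []
    for note in stream:
        if note.path not in intent:
            continue  # not managed by the orchestrator, so not drift
        if note.deleted or note.value != intent[note.path]:
            drifted.append(note.path)
    return drifted
```

Paths outside intent are ignored rather than reported. drift_remediation could then feed the returned paths back into the plan/apply cycle instead of patching values directly.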

🎛 Why Prefect?

We chose Prefect as the orchestration engine for several reasons:

| Feature | Benefit |
|---|---|
| Python-native workflows | @flow and @task decorators: no YAML, just Python |
| Free secrets management | Native Secret blocks for credentials, included in open-source Prefect |
| Built-in UI | Dashboard, logs, metrics, and execution history via prefect server start |
| No containerization required | Run flows directly with .serve(); no Docker needed |
| Event-driven triggers | Schedules, webhooks (via FastAPI), and flow triggers out of the box |
| Task dependencies | Automatic ordering via task result passing or wait_for |
| Retry & error handling | Built-in retry policies, e.g. @task(retries=3) |
| Human-in-the-loop | Native pause_flow_run() for approval workflows |

🎯 Target Fabric

This project is designed for the Arista EVPN-VXLAN ContainerLab topology:

  • 2 Spines (BGP Route Reflectors, AS 65000)
  • 8 Leafs (4 MLAG VTEP pairs, AS 65001-65004)
  • cEOS 4.35.0F with gNMI enabled
  • EVPN Type-2 (L2 VXLAN) and Type-5 (L3 VXLAN) support
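The AS plan above is regular enough to derive rather than enter by hand. In this sketch, the AS numbers come from the list above. The pairing scheme (leaf1 with leaf2, leaf3 with leaf4, and so on) and the hostname format are assumptions, so check them against the ContainerLab topology. `leaf_pair` and `asn_for` are hypothetical helpers.

```python
SPINE_ASN = 65000  # stated in the topology: both spines use AS 65000
LEAF_ASN_BASE = 65000  # assumption: MLAG pair N uses AS 65000 + N


def leaf_pair(leaf_number: int) -> int:
    """1-based MLAG pair for leafN, assuming consecutive leafs form a pair."""
    if leaf_number < 1:
        raise ValueError("leaf numbers start at 1")
    return (leaf_number + 1) // 2


def asn_for(hostname: str) -> int:
    """Map a hostname such as 'spine2' or 'leaf7' to its BGP AS number."""
    if hostname.startswith("spine"):
        return SPINE_ASN
    if hostname.startswith("leaf"):
        return LEAF_ASN_BASE + leaf_pair(int(hostname.removeprefix("leaf")))
    raise ValueError(f"cannot infer a role from {hostname!r}")
```

Under these assumptions, leaf1 and leaf2 both get AS 65001, and leaf7 and leaf8 get AS 65004.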

Reference: arista-evpn-vxlan-clab

📋 Project Phases

Progress is tracked in the repository's issues, labeled by phase:

| Phase | Description | Issues |
|---|---|---|
| Phase 1 | YANG Path Discovery: map EOS 4.35.0F YANG models, validate gNMI | phase-1-yang-discovery |
| Phase 2 | Core Components: NetBox client, diff engine, gNMI operations | phase-2-minimal-reconciler |
| Phase 3 | Full Fabric: BGP, MLAG, VRFs, YANG mappers | phase-3-full-fabric |
| Phase 4 | Prefect Integration: flows, webhooks, drift detection | phase-4-event-driven |

📌 Project Board: View Kanban

📁 Project Structure

fabric-orchestrator/
├── README.md
├── pyproject.toml
│
├── src/                            # Python package
│   ├── __init__.py
│   ├── cli.py                      # CLI for YANG discovery (discover commands)
│   │
│   ├── flows/                      # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py            # @flow fabric_reconcile (plan/apply)
│   │   ├── drift.py                # @flow handle_drift
│   │   └── remediation.py          # @flow drift_remediation
│   │
│   ├── api/                        # FastAPI webhook receiver
│   │   ├── __init__.py
│   │   └── webhooks.py             # NetBox webhook endpoint
│   │
│   ├── services/                   # Long-running services
│   │   ├── __init__.py
│   │   └── drift_monitor.py        # gNMI Subscribe drift detection
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py               # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── netbox/
│   │   ├── __init__.py
│   │   ├── client.py               # NetBox API client (pynetbox)
│   │   └── models.py               # Pydantic models for intent validation
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py               # NetBox intent → YANG paths
│       ├── paths.py                # YANG path definitions
│       └── dependencies.py         # Dependency ordering graph
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md           # CLI documentation
    ├── yang-paths.md               # Documented YANG paths
    └── netbox-data-model.md        # NetBox schema documentation
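The tree lists yang/dependencies.py as a dependency ordering graph without showing its contents. One way to derive such an ordering is with Python's standard-library `graphlib.TopologicalSorter`. The section names and edges below are illustrative assumptions, not the project's actual graph.

```python
from graphlib import CycleError, TopologicalSorter  # CycleError signals a loop

# Hypothetical graph: each config section maps to the sections it depends on,
# i.e. the ones that must already exist on the device before it is applied.
DEPENDENCIES: dict[str, set[str]] = {
    "vlans": set(),
    "vrfs": set(),
    "interfaces": {"vlans"},
    "svis": {"vlans", "vrfs"},
    "mlag": {"interfaces"},
    "bgp": {"interfaces", "vrfs"},
    "evpn": {"bgp", "svis"},
}


def apply_order(graph: dict[str, set[str]]) -> list[str]:
    """Return sections in an order where every dependency comes first.

    Raises CycleError if the graph contains a loop.
    """
    return list(TopologicalSorter(graph).static_order())


def delete_order(graph: dict[str, set[str]]) -> list[str]:
    """Deletions run in reverse: dependents are removed before what they use."""
    return list(reversed(apply_order(graph)))
```

`static_order()` guarantees only that dependencies come first, not a specific order among independent sections. Callers should therefore not rely on, say, vlans always preceding vrfs.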

🛠️ Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Source of Truth | NetBox + BGP Plugin | Intent definition via native models, custom fields, and the BGP plugin |
| Orchestrator | Prefect | Python-native workflow orchestration |
| Webhooks | FastAPI | Receive NetBox webhooks |
| Transport | gNMI | Configuration and telemetry |
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
| Python Libraries | pygnmi + pynetbox | gNMI/NetBox interactions |
| CLI | Click + Rich | YANG discovery tools |
| Validation | Pydantic v2 | Intent data validation |
| Lab | ContainerLab + cEOS | Development environment |

🚀 Getting Started

Prerequisites

  • Python 3.12+
  • uv package manager
  • Access to ContainerLab with cEOS images
  • NetBox instance with BGP plugin

Quick Start

```bash
# Clone the repository
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator

# Install Python dependencies
uv sync

# Store credentials (Secret blocks) and settings (Variables) in Prefect
uv run python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable

Secret(value='your-netbox-token').save('netbox-token', overwrite=True)
Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)

Variable.set('netbox_url', 'https://netbox.example.com', overwrite=True)
Variable.set('gnmi_username', 'admin', overwrite=True)
"

# Start the Prefect server (optional, for the UI; it runs in the foreground, so use a separate terminal)
uv run prefect server start

# Verify gNMI connectivity to your fabric
uv run fabric-orch discover capabilities --target leaf1:6030

# Explore YANG paths
uv run fabric-orch discover get --target leaf1:6030 \
  --path "/interfaces/interface[name=Ethernet1]/state"
```
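A path such as /interfaces/interface[name=Ethernet1]/state is shorthand for what gNMI actually sends: a list of path elements, each with a name and a map of keys. pygnmi performs this conversion internally. The hypothetical `parse_path` below only shows the structure. It does not handle escaped brackets or quoted key values.

```python
import re

# One element: a name, then zero or more [key=value] predicates.
_ELEM = re.compile(r"([^/\[\]]+)((?:\[[^=\]]+=[^\]]*\])*)")
_KEY = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_path(path: str) -> list[tuple[str, dict[str, str]]]:
    """Split an XPath-style gNMI path into (element name, keys) pairs."""
    elems: list[tuple[str, dict[str, str]]] = []
    for match in _ELEM.finditer(path):
        name, predicates = match.group(1), match.group(2)
        elems.append((name, dict(_KEY.findall(predicates))))
    return elems
```

Because key values may contain slashes (Ethernet1/1 on modular ports), the parser matches whole bracketed predicates rather than naively splitting the string on "/".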

Running Flows

```python
from src.flows.reconcile import fabric_reconcile

# Plan only (dry run): compute the diff, push nothing
result = fabric_reconcile(dry_run=True)

# Plan for a specific device
result = fabric_reconcile(device="leaf1", dry_run=True)

# Apply changes: requires both auto_apply=True and dry_run=False
result = fabric_reconcile(auto_apply=True, dry_run=False)
```
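The two flags interact. In the fabric_reconcile example further down, the flow computes should_apply = auto_apply and not dry_run, and dry_run defaults to True. The truth table below spells out the consequence: setting only auto_apply=True, or only dry_run=False, still produces a plan without pushing anything.

```python
from itertools import product


def should_apply(auto_apply: bool, dry_run: bool) -> bool:
    """Same guard as fabric_reconcile: push only if asked to and not a dry run."""
    return auto_apply and not dry_run


if __name__ == "__main__":
    # Walk all four flag combinations and show what the flow would do.
    for auto_apply, dry_run in product((False, True), repeat=2):
        action = "apply" if should_apply(auto_apply, dry_run) else "plan only"
        print(f"auto_apply={auto_apply!s:5} dry_run={dry_run!s:5} -> {action}")
```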

Deploying with Scheduling

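```bash
# Serve the flow with its schedule (runs every 6 hours); this process stays in the foreground
uv run python -m src.flows.reconcile

# From another terminal, trigger an ad-hoc run of the served deployment
uv run prefect deployment run fabric-reconcile/fabric-reconcile-scheduled
```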
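The cron string 0 */6 * * * used by .serve() means "minute 0 of every hour divisible by 6", that is, 00:00, 06:00, 12:00 and 18:00. The hypothetical `next_runs` helper below lists the next few fire times, which is handy for sanity-checking a schedule. It ignores time zones and DST and only handles hour steps that divide 24.

```python
from datetime import datetime, timedelta


def next_runs(after: datetime, count: int = 4, every_hours: int = 6) -> list[datetime]:
    """Next `count` fire times of cron "0 */every_hours * * *" strictly after `after`."""
    if 24 % every_hours:
        raise ValueError("every_hours must divide 24 for this shortcut")
    # Start at the top of the current hour and step forward one hour at a time.
    slot = after.replace(minute=0, second=0, microsecond=0)
    runs: list[datetime] = []
    while len(runs) < count:
        if slot > after and slot.hour % every_hours == 0:
            runs.append(slot)
        slot += timedelta(hours=1)
    return runs
```

Starting from 07:30, for example, the next runs fall at 12:00 and 18:00 the same day, then at 00:00 and 06:00 the following day.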

Starting the Webhook Receiver

```bash
# Start the FastAPI webhook server
uv run uvicorn src.api.webhooks:app --host 0.0.0.0 --port 8000
```

Prefect Flow Example

```python
from prefect import flow, task
from prefect.blocks.system import Secret
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from NetBox (whole fabric, or a single device)."""
    from src.netbox import FabricNetBoxClient

    netbox_url = Variable.get("netbox_url")
    netbox_token = Secret.load("netbox-token").get()

    client = FabricNetBoxClient(url=netbox_url, token=netbox_token)
    return client.get_device_intent(device) if device else client.get_fabric_intent()


@task(retries=2, retry_delay_seconds=10)
def get_current_state(device: str | None = None) -> dict:
    """Read running state via gNMI Get, for the same scope as the intent."""
    # Placeholder: collect state via gNMI Get...
    return {}


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired (want) and current (have) state."""
    from src.reconciler.diff import compute_diff as diff_engine
    return diff_engine(want=intent, have=current)


@task(retries=1)
def apply_changes(changes: list[dict], dry_run: bool = True) -> dict:
    """Apply changes via gNMI Set, or only report them when dry_run is True."""
    if dry_run:
        return {"applied": False, "changes": changes}
    # Apply via gNMI...
    return {"applied": True, "changes": changes}


@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(
    device: str | None = None,
    auto_apply: bool = False,
    dry_run: bool = True,
) -> dict:
    """Reconcile fabric state with NetBox intent."""
    print(f"🔄 Starting fabric reconciliation (scope: {device or 'whole fabric'})")

    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ No changes detected - fabric is in sync")
        return {"changes": [], "in_sync": True}

    # Push only when explicitly requested: auto_apply=True and dry_run=False.
    should_apply = auto_apply and not dry_run
    result = apply_changes(changes, dry_run=not should_apply)

    return {"changes": changes, "applied": result["applied"]}


if __name__ == "__main__":
    # Serve the flow as a long-running process with a 6-hour cron schedule.
    fabric_reconcile.serve(
        name="fabric-reconcile-scheduled",
        cron="0 */6 * * *",
        tags=["network", "fabric"],
    )
```
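The example leaves the body of apply_changes as a placeholder (# Apply via gNMI...). One way to fill it is sketched below. The sketch assumes each change is a dict with op, path and (for creates and updates) value keys, and it uses pygnmi's `gNMIclient.set()` per the technology stack. `to_set_request` and `push` are hypothetical names. Verify the value encoding your EOS version expects. The gNMI specification treats every operation in one SetRequest as a single transaction, with deletes processed before replaces and updates. Grouping the whole plan into one Set is therefore what makes the apply atomic.

```python
from typing import Any


def to_set_request(changes: list[dict[str, Any]]) -> dict[str, list]:
    """Group plan changes into the update/delete lists of a single gNMI Set."""
    update: list[tuple[str, Any]] = []
    delete: list[str] = []
    for change in changes:
        if change["op"] in ("create", "update"):
            update.append((change["path"], change["value"]))
        elif change["op"] == "delete":
            delete.append(change["path"])
        else:
            raise ValueError(f"unknown op {change['op']!r}")
    return {"update": update, "delete": delete}


def push(host: str, port: int, username: str, password: str,
         changes: list[dict[str, Any]]) -> Any:
    """Send all changes in one Set (needs pygnmi and a reachable device)."""
    # Imported lazily so to_set_request stays usable without pygnmi installed.
    from pygnmi.client import gNMIclient

    request = to_set_request(changes)
    # insecure=True skips TLS verification: acceptable in the lab, not in production.
    with gNMIclient(target=(host, port), username=username,
                    password=password, insecure=True) as gc:
        return gc.set(update=request["update"] or None,
                      delete=request["delete"] or None)
```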

Status: 🚧 Active Development - Phase 2 (Core Components) & Phase 4 (Prefect Integration)
