Fabric Orchestrator

Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics

A workflow-based orchestration system that uses InfraHub as Source of Truth, Prefect for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.

🎯 Project Vision

Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:

  • Intent is defined in InfraHub (custom schema, Git-versioned)
  • Orchestration is handled by Prefect (Python-native @flow and @task decorators)
  • State is continuously monitored via gNMI Subscribe
  • Changes are computed as diffs and applied atomically via gNMI Set
  • Drift is detected and optionally auto-remediated

Think terraform plan and terraform apply, but for your network fabric — powered by Prefect flows.
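The plan/apply analogy can be made concrete with a small diff sketch. This is not the project's reconciler. The `Change` type, the flat `path -> value` state layout, and the `+`/`~`/`-` plan symbols are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Change:
    """One planned change (hypothetical type, for illustration only)."""
    action: str  # "create", "update" or "delete"
    path: str
    want: Any = None
    have: Any = None


def compute_diff(want: dict[str, Any], have: dict[str, Any]) -> list[Change]:
    """Compare desired and current state, both keyed by a path string."""
    changes: list[Change] = []
    for path in sorted(want.keys() | have.keys()):
        if path not in have:
            changes.append(Change("create", path, want=want[path]))
        elif path not in want:
            changes.append(Change("delete", path, have=have[path]))
        elif want[path] != have[path]:
            changes.append(Change("update", path, want=want[path], have=have[path]))
    return changes


def render_plan(changes: list[Change]) -> list[str]:
    """Format changes in a terraform-plan-like style."""
    symbols = {"create": "+", "update": "~", "delete": "-"}
    return [f"{symbols[c.action]} {c.path}" for c in changes]


# Example states; the path strings are placeholders, not real YANG paths.
want = {"vlan/100": {"name": "web"}, "vlan/200": {"name": "db"}}
have = {"vlan/100": {"name": "WEB"}, "vlan/300": {"name": "old"}}
plan = compute_diff(want, have)
print("\n".join(render_plan(plan)))
```

An empty plan means intent and device state agree, which is the "in sync" case the reconciliation flow checks for.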

🏗️ Architecture

[Figure: architecture diagram]

🎯 Why InfraHub?

We chose InfraHub over NetBox as Source of Truth for several reasons:

| Feature | NetBox | InfraHub |
|---|---|---|
| Schema | Fixed DCIM/IPAM model | Fully customizable YAML schema |
| Git Integration | External sync needed | Native - branches = data branches |
| Versioning | Changelog only | True Git-like versioning with merges |
| Transforms | Limited | Built-in Jinja2 + Python transforms |
| GraphQL | Yes | Yes (auto-generated from schema) |

Key benefits for this project:

  1. Custom Schema - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
  2. Git-native - Schema + data versioned together, easy test environment setup
  3. Transforms - Generate device configs directly from InfraHub
  4. Branches - Test fabric changes in isolated branches before merge

🎛 Why Prefect?

| Feature | Benefit |
|---|---|
| Python-native workflows | Use @flow and @task decorators: no YAML, just Python |
| Free secrets management | Native Secret blocks for credentials (free in OSS) |
| Built-in UI | Dashboard, logs, metrics, execution history via prefect server start |
| No containerization required | Run flows directly with .serve(), no Docker needed |
| Event-driven triggers | Schedule, webhooks (via FastAPI), flow triggers out of the box |
| Task dependencies | Automatic dependency ordering via task result passing or wait_for |
| Retry & error handling | Built-in retry policies with @task(retries=3) |
| Human-in-the-loop | Native pause_flow_run() for approval workflows |

🎯 Target Fabric

This project is designed for Arista EVPN-VXLAN fabrics:

  • 2 Spines (BGP Route Reflectors)
  • 8 Leafs (4 MLAG VTEP pairs)
  • cEOS 4.35.0F with gNMI enabled
  • EVPN Type-2 (L2 VXLAN) and Type-5 (L3 VXLAN) support
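As a rough illustration of the target topology, the sketch below models 2 spines and 4 MLAG pairs (8 leaves) with plain dataclasses. The class names, the `spineN`/`leafN`/`vtepN` naming, and the pairing of consecutive leaves are assumptions made for the example, not the project's InfraHub schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlagPair:
    """Two leaves acting as one logical VTEP (hypothetical model)."""
    name: str
    members: tuple[str, str]


@dataclass
class Fabric:
    spines: list[str]
    mlag_pairs: list[MlagPair]

    @property
    def leaves(self) -> list[str]:
        # Flatten the MLAG pairs into a list of leaf names.
        return [leaf for pair in self.mlag_pairs for leaf in pair.members]

    def validate(self) -> None:
        """Reject empty spine layers and duplicated device names."""
        if not self.spines:
            raise ValueError("fabric needs at least one spine")
        devices = self.spines + self.leaves
        if len(set(devices)) != len(devices):
            raise ValueError("device names must be unique")


def build_reference_fabric() -> Fabric:
    """Build the 2-spine, 8-leaf layout described above (names assumed)."""
    spines = [f"spine{i}" for i in (1, 2)]
    pairs = [
        MlagPair(f"vtep{n}", (f"leaf{2 * n - 1}", f"leaf{2 * n}"))
        for n in range(1, 5)
    ]
    return Fabric(spines, pairs)


fabric = build_reference_fabric()
fabric.validate()
print(fabric.spines, fabric.leaves)
```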

Reference lab topology: arista-evpn-vxlan-clab

📋 Project Phases

Progress is tracked via issues. See all issues or filter by phase:

| Phase | Description | Status |
|---|---|---|
| Phase 1 | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | Complete |
| Phase 2 | InfraHub Client & Core Reconciler - SDK client, diff engine, YANG mappers | 🔄 In Progress |
| Phase 3 | Full Fabric Coverage - BGP, MLAG, VRFs mappers | 📋 Planned |
| Phase 4 | Prefect Integration - Flows, webhooks, drift detection | 📋 Planned |
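The drift detection planned for Phase 4 could, in its simplest form, compare each telemetry update against the stored intent. The sketch below assumes updates arrive as `(path, value)` tuples, as they might after decoding gNMI Subscribe notifications. The function names and the `auto_remediate` switch are hypothetical.

```python
from typing import Any, Iterable, Iterator


def detect_drift(
    intent: dict[str, Any],
    updates: Iterable[tuple[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield one drift event per update that disagrees with intent."""
    for path, observed in updates:
        if path not in intent:
            continue  # Path is not managed by intent; ignore it.
        if observed != intent[path]:
            yield {"path": path, "want": intent[path], "have": observed}


def decide(event: dict[str, Any], auto_remediate: bool) -> str:
    """Pick a reaction to a drift event: fix it or just report it."""
    return "remediate" if auto_remediate else "alert"


# Placeholder paths and values for the example.
intent = {"/vlan/100/name": "web"}
updates = [
    ("/vlan/100/name", "web"),   # matches intent
    ("/vlan/100/name", "WEB"),   # drift
    ("/unmanaged/leaf", 1),      # not covered by intent
]
events = list(detect_drift(intent, updates))
for event in events:
    print(decide(event, auto_remediate=False), event)
```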

📁 Project Structure

fabric-orchestrator/
├── README.md
├── pyproject.toml
│
├── src/                                  # Python package
│   ├── __init__.py
│   ├── cli.py                            # CLI for YANG discovery
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py                     # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/                         # InfraHub integration
│   │   ├── __init__.py
│   │   ├── client.py                     # InfraHub SDK wrapper
│   │   ├── models.py                     # Pydantic intent models
│   │   └── exceptions.py                 # Client exceptions
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py                     # InfraHub intent → YANG paths
│       ├── paths.py                      # YANG path definitions
│       └── mappers/                      # Resource-specific mappers
│           ├── vlan.py
│           ├── interface.py
│           ├── bgp.py
│           └── vxlan.py
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md
    └── yang-paths.md
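To give a feel for what a resource mapper such as `mappers/vlan.py` might do, here is a minimal sketch that turns a VLAN intent into a gNMI path and JSON payload. `VlanIntent` and `vlan_to_gnmi_update` are hypothetical names, and the path follows the OpenConfig network-instance VLAN layout as an assumption. It should be checked against the paths validated on the target EOS release (see docs/yang-paths.md).

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class VlanIntent:
    """Desired VLAN state (hypothetical, stands in for the real intent model)."""
    vlan_id: int
    name: str


def vlan_to_gnmi_update(
    vlan: VlanIntent,
    network_instance: str = "default",
) -> tuple[str, dict[str, Any]]:
    """Map a VLAN intent to a (path, payload) pair for a gNMI update."""
    if not 1 <= vlan.vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID: {vlan.vlan_id}")
    # Assumed OpenConfig-style path; verify against the device's YANG models.
    path = (
        f"/network-instances/network-instance[name={network_instance}]"
        f"/vlans/vlan[vlan-id={vlan.vlan_id}]/config"
    )
    payload = {"vlan-id": vlan.vlan_id, "name": vlan.name}
    return path, payload


path, payload = vlan_to_gnmi_update(VlanIntent(vlan_id=100, name="web"))
print(path)
print(payload)
```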

🛠️ Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Source of Truth | InfraHub | Intent definition via custom schema |
| Orchestrator | Prefect | Python-native workflow orchestration |
| Transport | gNMI | Configuration and telemetry |
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
| Python Library | pygnmi + infrahub-sdk | gNMI/InfraHub interactions |
| CLI | Click + Rich | YANG discovery tools |
| Validation | Pydantic v2 | Intent data validation |
| Lab | ContainerLab + cEOS | Development environment |
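Since a gNMI SetRequest is processed as a single transaction, one way to keep changes atomic is to collect every change for a device into one request. The sketch below only builds the request data with the standard library. The change dictionaries and `build_set_request` are hypothetical, and passing the result to pygnmi (for example as `delete=` and `update=` arguments to its `set` call) is an assumption to verify against the pygnmi version in use.

```python
from typing import Any


def build_set_request(changes: list[dict[str, Any]]) -> dict[str, list]:
    """Group changes into the delete/replace/update lists of one Set call."""
    request: dict[str, list] = {"delete": [], "replace": [], "update": []}
    for change in changes:
        action, path = change["action"], change["path"]
        if action == "delete":
            request["delete"].append(path)
        elif action in ("create", "update"):
            # Creating and updating both merge the payload at the path.
            request["update"].append((path, change["value"]))
        elif action == "replace":
            request["replace"].append((path, change["value"]))
        else:
            raise ValueError(f"unknown action: {action}")
    # Drop empty operation lists so only real work is sent.
    return {op: items for op, items in request.items() if items}


# Placeholder paths and payloads for the example.
request = build_set_request([
    {"action": "delete", "path": "/a"},
    {"action": "update", "path": "/b", "value": {"x": 1}},
])
print(request)
```

Sending one request per device, rather than one per change, means a rejected change leaves the device untouched instead of half configured.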

🚀 Getting Started

Prerequisites

  • Python 3.12+
  • uv package manager
  • Access to an InfraHub instance with the EVPN-VXLAN fabric schema loaded
  • Access to ContainerLab with cEOS images (for lab testing)

Quick Start

# Clone the repository
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator

# Install Python dependencies
uv sync

# Set InfraHub connection (point to your InfraHub instance)
export INFRAHUB_ADDRESS="http://localhost:8000"
export INFRAHUB_API_TOKEN="your-token"

# Verify gNMI connectivity
uv run fabric-orch discover capabilities --target leaf1:6030

# Run reconciliation
uv run fabric-orch plan
uv run fabric-orch apply

Prefect Flow Example

from prefect import flow, task
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
async def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
    from src.infrahub.client import FabricInfrahubClient

    # Inside an async task, Variable.get returns an awaitable, so await it.
    # Variables are not encrypted; for a real token, prefer a Secret block.
    async with FabricInfrahubClient(
        url=await Variable.get("infrahub_url"),
        api_token=await Variable.get("infrahub_token"),
    ) as client:
        return await client.get_device(device)


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine
    return diff_engine(want=intent, have=current)


# get_current_state and apply_changes are tasks defined elsewhere in the
# project (gNMI Get and gNMI Set); they are not shown in this example.


@flow(log_prints=True, name="fabric-reconcile")
async def fabric_reconcile(device: str | None = None, dry_run: bool = True) -> dict:
    """Reconcile fabric state with InfraHub intent."""
    # The flow is async so that the async get_fabric_intent task can be awaited.
    intent = await get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ Fabric is in sync")
        return {"in_sync": True}

    # In dry-run mode the changes are only reported, never pushed.
    if not dry_run:
        apply_changes(changes)

    return {"changes": changes, "applied": not dry_run}

Status: 🚧 Active Development - Phase 2 (InfraHub Client & Core Reconciler)