# Fabric Orchestrator

**Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics**

A workflow-based orchestration system that uses [InfraHub](https://github.com/opsmill/infrahub) as Source of Truth, [Prefect](https://prefect.io) for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.

## 🎯 Project Vision

Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:

- **Intent** is defined in InfraHub (custom schema, Git-versioned)
- **Orchestration** is handled by Prefect (Python-native `@flow` and `@task` decorators)
- **State** is continuously monitored via gNMI Subscribe
- **Changes** are computed as diffs and applied atomically via gNMI Set
- **Drift** is detected and optionally auto-remediated

Think `terraform plan` and `terraform apply`, but for your network fabric, powered by Prefect flows.

## 🏗️ Architecture

![architecture](docs/assets/architecture/fabric-orchestration-archi.excalidraw.svg)

## 🎯 Why InfraHub?

We chose [InfraHub](https://github.com/opsmill/infrahub) over NetBox as Source of Truth for several reasons:

| Feature             | NetBox                | InfraHub                             |
| ------------------- | --------------------- | ------------------------------------ |
| **Schema**          | Fixed DCIM/IPAM model | Fully customizable YAML schema       |
| **Git Integration** | External sync needed  | Native - branches = data branches    |
| **Versioning**      | Changelog only        | True Git-like versioning with merges |
| **Test/Redeploy**   | Dump/restore          | `git clone` = complete environment   |
| **Transforms**      | Limited               | Built-in Jinja2 + Python transforms  |
| **GraphQL**         | Yes                   | Yes (auto-generated from schema)     |

**Key benefits for this project:**

1. **Custom Schema** - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
2. **Git-native** - Schema + data versioned together, easy test environment setup
3. **Transforms** - Generate device configs directly from InfraHub
4. **Branches** - Test fabric changes in isolated branches before merge

## 📦 Repository as InfraHub Backend

This repository serves as the **single source of truth** for both code and infrastructure data:

```
fabric-orchestrator/
├── .infrahub.yml              # InfraHub repository config
│
├── schemas/                   # InfraHub schema definitions
│   └── fabric.yml             # Custom EVPN-VXLAN fabric schema
│
├── data/                      # Infrastructure objects (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml        # Spines, Leafs, VTEP pairs
│   ├── network/
│   │   ├── vlans.yml          # VLANs + L2VNI mappings
│   │   ├── vrfs.yml           # VRFs + L3VNI mappings
│   │   └── interfaces.yml     # Interface configs
│   └── routing/
│       ├── bgp_sessions.yml   # Underlay + EVPN overlay
│       └── evpn.yml           # Route targets, RDs
│
├── transforms/                # Jinja2 templates for config generation
│   └── arista/
│       ├── base.j2
│       ├── interfaces.j2
│       ├── bgp.j2
│       └── evpn.j2
│
└── src/                       # Python orchestration code
```

### Workflow

```bash
# 1. Edit data files (e.g., add a VLAN)
vim data/network/vlans.yml

# 2. Commit & push
git commit -am "Add VLAN 100 for production"
git push

# 3. InfraHub syncs automatically from Git
#    → Data available via GraphQL

# 4. Prefect flow detects change → reconciles fabric
```

### Benefits

- **Reproducibility**: `git clone` → `docker compose up` → complete environment
- **Code Review**: Infrastructure changes go through PR review
- **History**: Full audit trail via Git
- **Testing**: Create a branch, test changes, merge when validated

## 🎛 Why Prefect?
| Feature                          | Benefit                                                                |
| -------------------------------- | ---------------------------------------------------------------------- |
| **Python-native workflows**      | Use `@flow` and `@task` decorators - no YAML, just Python              |
| **Free secrets management**      | Native `Secret` blocks for credentials (free in OSS)                   |
| **Built-in UI**                  | Dashboard, logs, metrics, execution history via `prefect server start` |
| **No containerization required** | Run flows directly with `.serve()` - no Docker needed                  |
| **Event-driven triggers**        | Schedule, webhooks (via FastAPI), flow triggers out of the box         |
| **Task dependencies**            | Automatic dependency ordering via task result passing or `wait_for`    |
| **Retry & error handling**       | Built-in retry policies with `@task(retries=3)`                        |
| **Human-in-the-loop**            | Native `pause_flow_run()` for approval workflows                       |

## 🎯 Target Fabric

This project is designed for the Arista EVPN-VXLAN ContainerLab topology:

- **2 Spines** (BGP Route Reflectors, AS 65000)
- **8 Leafs** (4 MLAG VTEP pairs, AS 65001-65004)
- **cEOS 4.35.0F** with gNMI enabled
- **EVPN Type-2** (L2 VXLAN) and **Type-5** (L3 VXLAN) support

Reference: [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab)

## 📋 Project Phases

Progress is tracked via issues.
See [all issues](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues) or filter by phase:

| Phase       | Description                                                          | Status         |
| ----------- | -------------------------------------------------------------------- | -------------- |
| **Phase 1** | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI     | ✅ Complete    |
| **Phase 2** | InfraHub Setup & Core Reconciler - Schema, diff engine, YANG mappers | 🔄 In Progress |
| **Phase 3** | Full Fabric Coverage - BGP, MLAG, VRFs mappers                       | 📋 Planned     |
| **Phase 4** | Prefect Integration - Flows, webhooks, drift detection               | 📋 Planned     |

## 📁 Project Structure

```
fabric-orchestrator/
├── README.md
├── pyproject.toml
├── .infrahub.yml              # InfraHub config (points to schemas/)
│
├── schemas/                   # InfraHub schema definitions
│   └── fabric.yml             # Custom EVPN-VXLAN fabric schema
│
├── data/                      # Infrastructure data (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml
│   ├── network/
│   │   ├── vlans.yml
│   │   ├── vrfs.yml
│   │   └── interfaces.yml
│   └── routing/
│       ├── bgp_sessions.yml
│       └── evpn.yml
│
├── transforms/                # Jinja2 config templates
│   └── arista/
│       └── *.j2
│
├── src/                       # Python package
│   ├── __init__.py
│   ├── cli.py                 # CLI for YANG discovery
│   │
│   ├── flows/                 # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py       # @flow fabric_reconcile
│   │   ├── drift.py           # @flow handle_drift
│   │   └── remediation.py     # @flow drift_remediation
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py          # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/              # InfraHub integration
│   │   ├── __init__.py
│   │   ├── client.py          # InfraHub SDK wrapper
│   │   └── queries.py         # GraphQL queries
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py          # InfraHub intent → YANG paths
│       ├── paths.py           # YANG path definitions
│       └── mappers/           # Resource-specific mappers
│           ├── vlan.py
│           ├── interface.py
│           ├── bgp.py
│           └── vxlan.py
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md
    └── yang-paths.md
```

## 🛠️ Technology Stack

| Component       | Technology                 | Purpose                              |
| --------------- | -------------------------- | ------------------------------------ |
| Source of Truth | **InfraHub**               | Intent definition via custom schema  |
| Data Storage    | **This Git repo**          | Schema + data versioned together     |
| Orchestrator    | **Prefect**                | Python-native workflow orchestration |
| Transport       | gNMI                       | Configuration and telemetry          |
| Data Models     | YANG (OpenConfig + Arista) | Structured configuration             |
| Python Library  | pygnmi + infrahub-sdk      | gNMI/InfraHub interactions           |
| CLI             | Click + Rich               | YANG discovery tools                 |
| Validation      | Pydantic v2                | Intent data validation               |
| Lab             | ContainerLab + cEOS        | Development environment              |

## 🔗 Related Projects

- [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab) - Target fabric topology
- [InfraHub](https://github.com/opsmill/infrahub) - Source of Truth platform
- [InfraHub Schema Library](https://github.com/opsmill/schema-library) - Reference schemas
- [Arista YANG Models](https://github.com/aristanetworks/yang/tree/master/EOS-4.35.0F) - EOS 4.35.0F YANG definitions

## 📚 References

### InfraHub

- [InfraHub Documentation](https://docs.infrahub.app)
- [InfraHub Schema Guide](https://docs.infrahub.app/guides/create-schema)
- [InfraHub Python SDK](https://github.com/opsmill/infrahub-sdk-python)
- [InfraHub .infrahub.yml Reference](https://docs.infrahub.app/reference/dotinfrahub)

### Prefect

- [Prefect Documentation](https://docs.prefect.io)
- [Prefect Flows](https://docs.prefect.io/latest/develop/write-flows/)
- [Prefect Tasks](https://docs.prefect.io/latest/develop/write-tasks/)

### YANG / gNMI

- [Arista gNMI Documentation](https://aristanetworks.github.io/openmgmt/configuration/gnmi/)
- [OpenConfig Models](https://github.com/openconfig/public)
- [pygnmi Library](https://github.com/akarneliuk/pygnmi)

### EVPN-VXLAN

- [Arista BGP EVPN Configuration Example](https://overlaid.net/2019/01/27/arista-bgp-evpn-configuration-example/)

## 🚀 Getting Started

### Prerequisites

- Python 3.12+
- `uv` package manager
- Docker (for InfraHub)
- Access to ContainerLab with cEOS images

### Quick Start

```bash
# Clone the repository (includes schema + data)
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator

# Install Python dependencies
uv sync

# Start InfraHub (loads schema & data from this repo)
docker compose up -d

# Configure Prefect secrets (run inside the uv-managed environment)
uv run python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable
Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
Variable.set('infrahub_url', 'http://localhost:8000')
Variable.set('gnmi_username', 'admin')
"

# Verify gNMI connectivity
uv run fabric-orch discover capabilities --target leaf1:6030

# Run reconciliation
uv run fabric-orch plan
uv run fabric-orch apply
```

## Prefect Flow Example

```python
from prefect import flow, task
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
    # The synchronous client is used because this task is a plain (non-async) function.
    from infrahub_sdk import InfrahubClientSync

    client = InfrahubClientSync(address=Variable.get("infrahub_url"))
    # Query fabric intent via GraphQL (the query text itself is elided here)
    return client.execute_graphql(query=...)


@task
def get_current_state(device: str | None = None) -> dict:
    """Placeholder: read the running state from the fabric (e.g. via gNMI Get)."""
    raise NotImplementedError


@task
def apply_changes(changes: list[dict]) -> None:
    """Placeholder: push the computed changes to the fabric (e.g. via gNMI Set)."""
    raise NotImplementedError


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine

    return diff_engine(want=intent, have=current)


@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(device: str | None = None, dry_run: bool = True) -> dict:
    """Reconcile fabric state with InfraHub intent."""
    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ Fabric is in sync")
        return {"in_sync": True}

    # Only touch the fabric when dry_run is explicitly disabled.
    if not dry_run:
        apply_changes(changes)

    return {"changes": changes, "applied": not dry_run}
```

---

**Status**: 🚧 Active Development - Phase 2 (InfraHub Setup & Core Reconciler)
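The flow example calls a diff engine as `compute_diff(want=..., have=...)` and expects a list of change dicts back. The README does not show that engine, so the following is only a minimal sketch of how such a diff could work, assuming intent and state are both plain nested dicts. The `flatten` helper, the `op`/`path`/`old`/`value` keys, and the VLAN sample data are hypothetical and are not taken from the project.

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested dict into {"/a/b/c": leaf_value} form (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            # Scalars (and empty dicts) are treated as leaves.
            flat[path] = value
    return flat


def compute_diff(want: dict[str, Any], have: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the operations that would turn `have` into `want`."""
    want_flat, have_flat = flatten(want), flatten(have)
    changes: list[dict[str, Any]] = []
    # Walk the union of paths in a stable order so the plan is deterministic.
    for path in sorted(want_flat.keys() | have_flat.keys()):
        if path not in have_flat:
            changes.append({"op": "add", "path": path, "value": want_flat[path]})
        elif path not in want_flat:
            changes.append({"op": "delete", "path": path})
        elif want_flat[path] != have_flat[path]:
            changes.append(
                {"op": "change", "path": path, "old": have_flat[path], "value": want_flat[path]}
            )
    return changes


# Hypothetical sample data: desired VLANs vs. VLANs currently on a device.
want = {"vlans": {"100": {"name": "production", "vni": 10100},
                  "200": {"name": "dev", "vni": 10200}}}
have = {"vlans": {"100": {"name": "prod", "vni": 10100},
                  "300": {"name": "legacy", "vni": 10300}}}

for change in compute_diff(want, have):
    print(change)
```

An empty result means the two states match, which is the condition the flow reports as "in sync". A real engine would additionally need to map each path onto the YANG paths handled by the mappers before anything is sent over gNMI.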