Damien 2a2e0dfe73 fix(objects): split BGP sessions from peer groups for load ordering — refs #52
InfraBGPSession references InfraBGPPeerGroup via peer_group HFID,
so peer groups must be committed before sessions are created.

Split 09-bgp.yml into:
- 09-bgp.yml: InfraBGPRouterConfig + InfraBGPPeerGroup
- 10-bgp-sessions.yml: InfraBGPSession + InfraBGPAddressFamily

Renamed: 10-vrfs→11, 11-mlag→12. Now 12 object files total.
2026-02-15 20:25:36 +01:00

Fabric Orchestrator

Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics

A workflow-based orchestration system that uses InfraHub as Source of Truth, Prefect for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.

🎯 Project Vision

Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:

  • Intent is defined in InfraHub (custom schema, Git-versioned)
  • Orchestration is handled by Prefect (Python-native @flow and @task decorators)
  • State is continuously monitored via gNMI Subscribe
  • Changes are computed as diffs and applied atomically via gNMI Set
  • Drift is detected and optionally auto-remediated

Think terraform plan and terraform apply, but for your network fabric — powered by Prefect flows.
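
The plan/apply analogy can be made concrete with a small sketch. It assumes that intent and device state have both been flattened into dictionaries keyed by YANG path; this representation, the `Change` type, and the abbreviated paths are illustrative and are not taken from the project's diff engine. Under that assumption, a plan is the set of paths to update plus the set of paths to delete:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Change:
    """One planned operation against a device (hypothetical shape)."""
    op: str          # "update" or "delete"
    path: str        # YANG path the operation targets
    value: Any = None


def plan(want: dict[str, Any], have: dict[str, Any]) -> list[Change]:
    """Return the changes that would turn `have` into `want`."""
    changes = []
    # Paths that are missing on the device or carry a different value.
    for path in sorted(want):
        if path not in have or have[path] != want[path]:
            changes.append(Change("update", path, want[path]))
    # Paths present on the device but absent from intent.
    for path in sorted(have.keys() - want.keys()):
        changes.append(Change("delete", path))
    return changes


# Abbreviated, illustrative paths (not the project's real YANG paths).
want = {"/vlans/vlan[vlan-id=100]/config/name": "production"}
have = {
    "/vlans/vlan[vlan-id=100]/config/name": "prod",
    "/vlans/vlan[vlan-id=200]/config/name": "legacy",
}
for change in plan(want, have):
    print(change.op, change.path, change.value)
```

An empty list corresponds to the "in sync" case. A non-empty list maps naturally onto the update and delete entries of a single gNMI Set request.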

🏗️ Architecture

[Figure: architecture diagram]

🎯 Why InfraHub?

We chose InfraHub over NetBox as Source of Truth for several reasons:

| Feature | NetBox | InfraHub |
|---|---|---|
| Schema | Fixed DCIM/IPAM model | Fully customizable YAML schema |
| Git Integration | External sync needed | Native - branches = data branches |
| Versioning | Changelog only | True Git-like versioning with merges |
| Test/Redeploy | Dump/restore | git clone = complete environment |
| Transforms | Limited | Built-in Jinja2 + Python transforms |
| GraphQL | Yes | Yes (auto-generated from schema) |

Key benefits for this project:

  1. Custom Schema - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
  2. Git-native - Schema + data versioned together, easy test environment setup
  3. Transforms - Generate device configs directly from InfraHub
  4. Branches - Test fabric changes in isolated branches before merge

📦 Repository as InfraHub Backend

This repository serves as the single source of truth for both code and infrastructure data:

fabric-orchestrator/
├── .infrahub.yml                 # InfraHub repository config
│
├── schemas/                      # InfraHub schema definitions
│   └── fabric.yml                # Custom EVPN-VXLAN fabric schema
│
├── data/                         # Infrastructure objects (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml           # Spines, Leafs, VTEP pairs
│   ├── network/
│   │   ├── vlans.yml             # VLANs + L2VNI mappings
│   │   ├── vrfs.yml              # VRFs + L3VNI mappings
│   │   └── interfaces.yml        # Interface configs
│   └── routing/
│       ├── bgp_sessions.yml      # Underlay + EVPN overlay
│       └── evpn.yml              # Route targets, RDs
│
├── transforms/                   # Jinja2 templates for config generation
│   └── arista/
│       ├── base.j2
│       ├── interfaces.j2
│       ├── bgp.j2
│       └── evpn.j2
│
└── src/                          # Python orchestration code

Workflow

# 1. Edit data files (e.g., add a VLAN)
vim data/network/vlans.yml

# 2. Commit & push
git commit -am "Add VLAN 100 for production"
git push

# 3. InfraHub syncs automatically from Git
#    → Data available via GraphQL

# 4. Prefect flow detects change → reconciles fabric

Benefits

  • Reproducibility: git clone + docker compose up → complete environment
  • Code Review: Infrastructure changes go through PR review
  • History: Full audit trail via Git
  • Testing: Create a branch, test changes, merge when validated

🎛 Why Prefect?

| Feature | Benefit |
|---|---|
| Python-native workflows | Use @flow and @task decorators; no YAML, just Python |
| Free secrets management | Native Secret blocks for credentials (free in OSS) |
| Built-in UI | Dashboard, logs, metrics, execution history via prefect server start |
| No containerization required | Run flows directly with .serve(); no Docker needed |
| Event-driven triggers | Schedule, webhooks (via FastAPI), flow triggers out of the box |
| Task dependencies | Automatic dependency ordering via task result passing or wait_for |
| Retry & error handling | Built-in retry policies with @task(retries=3) |
| Human-in-the-loop | Native pause_flow_run() for approval workflows |

🎯 Target Fabric

This project is designed for the Arista EVPN-VXLAN ContainerLab topology:

  • 2 Spines (BGP Route Reflectors, AS 65000)
  • 8 Leafs (4 MLAG VTEP pairs, AS 65001-65004)
  • cEOS 4.35.0F with gNMI enabled
  • EVPN Type-2 (L2 VXLAN) and Type-5 (L3 VXLAN) support

Reference: arista-evpn-vxlan-clab
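
As a rough illustration of the numbering above, the sketch below derives a device-to-ASN table for this topology. The hostnames (`spine1`, `leaf1`, and so on) and the assumption that consecutive leafs form an MLAG pair sharing one AS are illustrative. The reference lab defines the actual names and pairings.

```python
SPINE_ASN = 65000
FIRST_LEAF_ASN = 65001


def build_fabric(spines: int = 2, leafs: int = 8) -> dict[str, dict]:
    """Map hostname -> role, ASN and MLAG pair index (None for spines)."""
    fabric = {}
    for i in range(1, spines + 1):
        fabric[f"spine{i}"] = {"role": "spine", "asn": SPINE_ASN, "mlag_pair": None}
    for i in range(1, leafs + 1):
        # Assumed pairing: leaf1+leaf2 -> pair 1, leaf3+leaf4 -> pair 2, ...
        pair = (i + 1) // 2
        fabric[f"leaf{i}"] = {
            "role": "leaf",
            "asn": FIRST_LEAF_ASN + pair - 1,
            "mlag_pair": pair,
        }
    return fabric


fabric = build_fabric()
for name, attrs in fabric.items():
    print(f"{name:<7} AS {attrs['asn']}  pair={attrs['mlag_pair']}")
```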

📋 Project Phases

Progress is tracked via issues. See all issues or filter by phase:

| Phase | Description | Status |
|---|---|---|
| Phase 1 | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | Complete |
| Phase 2 | InfraHub Setup & Core Reconciler - Schema, diff engine, YANG mappers | 🔄 In Progress |
| Phase 3 | Full Fabric Coverage - BGP, MLAG, VRFs mappers | 📋 Planned |
| Phase 4 | Prefect Integration - Flows, webhooks, drift detection | 📋 Planned |

📁 Project Structure

fabric-orchestrator/
├── README.md
├── pyproject.toml
├── .infrahub.yml                     # InfraHub config (points to schemas/)
│
├── schemas/                          # InfraHub schema definitions
│   └── fabric.yml                    # Custom EVPN-VXLAN fabric schema
│
├── data/                             # Infrastructure data (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml
│   ├── network/
│   │   ├── vlans.yml
│   │   ├── vrfs.yml
│   │   └── interfaces.yml
│   └── routing/
│       ├── bgp_sessions.yml
│       └── evpn.yml
│
├── transforms/                       # Jinja2 config templates
│   └── arista/
│       └── *.j2
│
├── src/                              # Python package
│   ├── __init__.py
│   ├── cli.py                        # CLI for YANG discovery
│   │
│   ├── flows/                        # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py              # @flow fabric_reconcile
│   │   ├── drift.py                  # @flow handle_drift
│   │   └── remediation.py            # @flow drift_remediation
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py                 # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/                     # InfraHub integration
│   │   ├── __init__.py
│   │   ├── client.py                 # InfraHub SDK wrapper
│   │   └── queries.py                # GraphQL queries
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py                 # InfraHub intent → YANG paths
│       ├── paths.py                  # YANG path definitions
│       └── mappers/                  # Resource-specific mappers
│           ├── vlan.py
│           ├── interface.py
│           ├── bgp.py
│           └── vxlan.py
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md
    └── yang-paths.md
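
To make the role of `yang/mapper.py` (InfraHub intent → YANG paths) more tangible, here is a minimal sketch of what a VLAN mapper might produce. It assumes the OpenConfig network-instance VLAN model and a hypothetical `VlanIntent` record. The function name is also illustrative, and the paths the project actually relies on are the ones recorded in `docs/yang-paths.md`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VlanIntent:
    """Hypothetical intent record for one VLAN."""
    vlan_id: int
    name: str


def vlan_to_update(vlan: VlanIntent, instance: str = "default") -> tuple[str, dict]:
    """Return a (path, payload) pair suitable for a gNMI Set update."""
    if not 1 <= vlan.vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan.vlan_id}")
    # Assumed OpenConfig path layout; verify against the device's YANG models.
    path = (
        f"/network-instances/network-instance[name={instance}]"
        f"/vlans/vlan[vlan-id={vlan.vlan_id}]/config"
    )
    payload = {"vlan-id": vlan.vlan_id, "name": vlan.name}
    return path, payload


path, payload = vlan_to_update(VlanIntent(vlan_id=100, name="production"))
print(path)
print(payload)
```

Keeping mappers as pure functions that return path and payload pairs makes them easy to unit-test without a device. The gNMI client wrapper can then collect such pairs into one Set request.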

🛠️ Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Source of Truth | InfraHub | Intent definition via custom schema |
| Data Storage | This Git repo | Schema + data versioned together |
| Orchestrator | Prefect | Python-native workflow orchestration |
| Transport | gNMI | Configuration and telemetry |
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
| Python Library | pygnmi + infrahub-sdk | gNMI/InfraHub interactions |
| CLI | Click + Rich | YANG discovery tools |
| Validation | Pydantic v2 | Intent data validation |
| Lab | ContainerLab + cEOS | Development environment |

🚀 Getting Started

Prerequisites

  • Python 3.12+
  • uv package manager
  • Docker (for InfraHub)
  • Access to ContainerLab with cEOS images

Quick Start

# Clone the repository (includes schema + data)
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator

# Install Python dependencies
uv sync

# Start InfraHub (loads schema & data from this repo)
docker compose up -d

# Configure Prefect secrets
uv run python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable

Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
Variable.set('infrahub_url', 'http://localhost:8000')
Variable.set('gnmi_username', 'admin')
"

# Verify gNMI connectivity
uv run fabric-orch discover capabilities --target leaf1:6030

# Run reconciliation
uv run fabric-orch plan
uv run fabric-orch apply

Prefect Flow Example

from prefect import flow, task
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
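    # InfrahubClient is async; the synchronous client fits a non-async task.
    from infrahub_sdk import InfrahubClientSync

    client = InfrahubClientSync(address=Variable.get("infrahub_url"))
    # Query fabric intent via GraphQL (query text elided here)
    return client.execute_graphql(query=...)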


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine
    return diff_engine(want=intent, have=current)


@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(device: str | None = None, dry_run: bool = True) -> dict:
    """Reconcile fabric state with InfraHub intent.

    get_current_state and apply_changes are tasks defined elsewhere (not shown).
    """
    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)
    
    if not changes:
        print("✅ Fabric is in sync")
        return {"in_sync": True}
    
    if not dry_run:
        apply_changes(changes)
    
    return {"changes": changes, "applied": not dry_run}

Status: 🚧 Active Development - Phase 2 (InfraHub Setup & Core Reconciler)
