Damien Arnodo 77ca22bd0a docs: Update README for InfraHub migration
- Replace NetBox with InfraHub as Source of Truth
- Update architecture diagram
- Explain InfraHub benefits (Git-native, custom schema)
- Update project structure (remove netbox references)
- Update technology stack
- Revise project phases for new approach
2026-02-05 08:42:51 +00:00


# Fabric Orchestrator
**Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics**
A workflow-based orchestration system that uses [InfraHub](https://github.com/opsmill/infrahub) as Source of Truth, [Prefect](https://prefect.io) for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.
## 🎯 Project Vision
Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:
- **Intent** is defined in InfraHub (custom schema, Git-versioned)
- **Orchestration** is handled by Prefect (Python-native `@flow` and `@task` decorators)
- **State** is continuously monitored via gNMI Subscribe
- **Changes** are computed as diffs and applied atomically via gNMI Set
- **Drift** is detected and optionally auto-remediated
Think `terraform plan` and `terraform apply`, but for your network fabric — powered by Prefect flows.
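
The plan step can be pictured as a comparison between desired and observed state. The sketch below is a minimal illustration, not the project's diff engine: it assumes both states are flat dictionaries keyed by path, the paths are simplified placeholders rather than validated YANG paths, and the `Change` type and `plan` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Change:
    """One planned operation (hypothetical type, for illustration only)."""
    op: str            # "update" or "delete"
    path: str          # path the operation targets
    value: Any = None  # intended value, unused for deletes


def plan(want: dict[str, Any], have: dict[str, Any]) -> list[Change]:
    """Return the changes needed to move `have` to `want`."""
    changes = []
    for path in sorted(want):
        # Missing or different on the device: set it to the intended value.
        if path not in have or have[path] != want[path]:
            changes.append(Change("update", path, want[path]))
    for path in sorted(have):
        # Present on the device but absent from intent: remove it.
        if path not in want:
            changes.append(Change("delete", path))
    return changes


# Placeholder paths and values, chosen only to exercise each branch.
want = {"/vlans/vlan[id=40]/name": "web", "/vlans/vlan[id=50]/name": "db"}
have = {"/vlans/vlan[id=40]/name": "WEB", "/vlans/vlan[id=60]/name": "old"}
for change in plan(want, have):
    print(change.op, change.path, change.value)
```

An empty list corresponds to a clean `terraform plan`: nothing to apply. Sending the whole list in a single gNMI Set is what would keep the apply step atomic.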
## 🏗️ Architecture
```
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 INTENT LAYER                                 │
│  ┌─────────────────────────┐    ┌──────────────────────────────────────────┐ │
│  │        InfraHub         │    │              Git Repository              │ │
│  │    (Source of Truth)    │◄──►│ - Schema definitions (YAML)              │ │
│  │                         │    │ - Transforms (Jinja2/Python)             │ │
│  │ • Custom fabric schema  │    │ - Version-controlled intent              │ │
│  │ • GraphQL API           │    └──────────────────────────────────────────┘ │
│  │ • Branch-based changes  │                                                 │
│  └────────────┬────────────┘                                                 │
└───────────────┼──────────────────────────────────────────────────────────────┘
                │ GraphQL / SDK
┌──────────────────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATION LAYER (PREFECT)                         │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                         Prefect Flows (Python)                         │  │
│  │  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐   │  │
│  │  │ fabric_reconcile  │  │ handle_drift      │  │ drift_remediation │   │  │
│  │  │ (plan/apply)      │  │ (subscribe)       │  │ (auto-fix)        │   │  │
│  │  └───────────────────┘  └───────────────────┘  └───────────────────┘   │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                         Prefect Tasks (Python)                         │  │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────┐ │  │
│  │  │ Intent Parser   │  │ Diff Engine     │  │ gNMI Client             │ │  │
│  │  │ (InfraHub→YANG) │  │ (Want vs Have)  │  │ (pygnmi wrapper)        │ │  │
│  │  └─────────────────┘  └─────────────────┘  └─────────────────────────┘ │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  Prefect Server (UI)  │   Prefect .serve()    │    Webhook Receiver    │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────┬───────────────────────────────────────────────────┘
                           │ gNMI Get/Set/Subscribe
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 DEVICE LAYER                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    spine1    │  │    spine2    │  │    leaf1     │  │    leaf2     │ ...  │
│  │  gNMI:6030   │  │  gNMI:6030   │  │  gNMI:6030   │  │  gNMI:6030   │      │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘      │
└──────────────────────────────────────────────────────────────────────────────┘
```
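
To make the drift path through these layers more concrete, the sketch below checks telemetry updates against intent. It rests on several assumptions: updates are modelled as plain `(path, value)` tuples rather than real gNMI Subscribe notifications, intent is a flat dictionary keyed by path, and `detect_drift` is a hypothetical helper, not part of the project.

```python
from collections.abc import Iterable
from typing import Any


def detect_drift(
    intent: dict[str, Any],
    updates: Iterable[tuple[str, Any]],
) -> list[dict[str, Any]]:
    """Return one record per update whose value disagrees with intent."""
    drift = []
    for path, observed in updates:
        if path not in intent:
            # Paths outside the managed intent are ignored in this sketch.
            continue
        if observed != intent[path]:
            drift.append({"path": path, "want": intent[path], "have": observed})
    return drift


# Example values only; the MTU figures are not taken from the lab topology.
intent = {"/interfaces/interface[name=Ethernet1]/config/mtu": 9214}
updates = [
    ("/interfaces/interface[name=Ethernet1]/config/mtu", 1500),  # drifted
    ("/interfaces/interface[name=Ethernet2]/config/mtu", 1500),  # unmanaged
]
print(detect_drift(intent, updates))
```

Each drift record carries both the wanted and the observed value, which is the information a remediation flow would need to decide whether to correct the device or only report.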
## 🎯 Why InfraHub?
We chose [InfraHub](https://github.com/opsmill/infrahub) over NetBox as Source of Truth for several reasons:
| Feature | NetBox | InfraHub |
|---------|--------|----------|
| **Schema** | Fixed DCIM/IPAM model | Fully customizable YAML schema |
| **Git Integration** | External sync needed | Native - branches = data branches |
| **Versioning** | Changelog only | True Git-like versioning with merges |
| **Test/Redeploy** | Dump/restore | `git clone` = complete environment |
| **Transforms** | Limited | Built-in Jinja2 + Python transforms |
| **GraphQL** | Yes | Yes (auto-generated from schema) |
**Key benefits for this project:**
1. **Custom Schema** - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
2. **Git-native** - Schema + data versioned together, easy test environment setup
3. **Transforms** - Generate device configs directly from InfraHub
4. **Branches** - Test fabric changes in isolated branches before merge
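
As a rough picture of what a custom fabric model captures, the sketch below expresses an MLAG VTEP pair in plain Python. It is not InfraHub schema syntax (real schemas are YAML), and every class name, field name, and address here is hypothetical. The project lists Pydantic for intent validation; dataclasses are used only to keep the sketch dependency-free.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class Vtep:
    """A leaf acting as VXLAN tunnel endpoint (hypothetical model)."""
    name: str
    loopback: str  # VTEP source address (example values below)

    def __post_init__(self) -> None:
        # Raises ValueError for anything that is not a valid IP address.
        ipaddress.ip_address(self.loopback)


@dataclass(frozen=True)
class MlagPair:
    """Two leafs that share one BGP AS (hypothetical model)."""
    asn: int
    members: tuple[Vtep, Vtep]

    def __post_init__(self) -> None:
        if not 1 <= self.asn <= 4294967295:
            raise ValueError(f"invalid ASN: {self.asn}")
        if self.members[0].name == self.members[1].name:
            raise ValueError("an MLAG pair needs two distinct devices")


pair = MlagPair(
    asn=65001,
    members=(Vtep("leaf1", "10.0.255.11"), Vtep("leaf2", "10.0.255.12")),
)
print(pair.asn, [member.name for member in pair.members])
```

Validating at construction time means malformed intent fails before any diff is computed or any device is touched.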
## 🎛 Why Prefect?
We chose [Prefect](https://prefect.io) as the orchestration engine for several reasons:
| Feature | Benefit |
|---------|---------|
| **Python-native workflows** | Use `@flow` and `@task` decorators — no YAML, just Python |
| **Free secrets management** | Native `Secret` blocks for credentials (free in OSS) |
| **Built-in UI** | Dashboard, logs, metrics, execution history via `prefect server start` |
| **No containerization required** | Run flows directly with `.serve()` — no Docker needed |
| **Event-driven triggers** | Schedules and flow triggers out of the box; webhooks through a FastAPI receiver |
| **Task dependencies** | Automatic dependency ordering via task result passing or `wait_for` |
| **Retry & error handling** | Built-in retry policies with `@task(retries=3)` |
| **Human-in-the-loop** | Native `pause_flow_run()` for approval workflows |
## 🎯 Target Fabric
This project is designed for the Arista EVPN-VXLAN ContainerLab topology:
- **2 Spines** (BGP Route Reflectors, AS 65000)
- **8 Leafs** (4 MLAG VTEP pairs, AS 65001-65004)
- **cEOS 4.35.0F** with gNMI enabled
- **EVPN Type-2** (L2 VXLAN) and **Type-5** (L3 VXLAN) support
Reference: [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab)
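
The topology above is regular enough to enumerate in code. The following sketch builds a device list from the stated counts, AS numbers, and gNMI port. The `leafN` naming beyond `leaf2` and the assignment of consecutive leafs to the same MLAG pair are assumptions, and `Device` and `build_inventory` are hypothetical names.

```python
from dataclasses import dataclass

GNMI_PORT = 6030
SPINE_ASN = 65000
FIRST_LEAF_ASN = 65001


@dataclass(frozen=True)
class Device:
    name: str
    role: str
    asn: int

    @property
    def target(self) -> str:
        """gNMI target in host:port form."""
        return f"{self.name}:{GNMI_PORT}"


def build_inventory(spines: int = 2, leaf_pairs: int = 4) -> list[Device]:
    """Enumerate fabric devices; naming and pairing are assumptions."""
    devices = [Device(f"spine{i}", "spine", SPINE_ASN) for i in range(1, spines + 1)]
    for pair in range(leaf_pairs):
        asn = FIRST_LEAF_ASN + pair      # one AS per MLAG pair
        for member in (1, 2):
            index = pair * 2 + member    # leaf1 and leaf2 form the first pair
            devices.append(Device(f"leaf{index}", "leaf", asn))
    return devices


for device in build_inventory():
    print(device.target, device.role, device.asn)
```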
## 📋 Project Phases
Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues) or filter by phase:
| Phase | Description | Status |
|-------|-------------|--------|
| **Phase 1** | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | ✅ Complete |
| **Phase 2** | InfraHub Setup & Core Reconciler - Schema, diff engine, YANG mappers | 🔄 In Progress |
| **Phase 3** | Full Fabric Coverage - BGP, MLAG, VRFs mappers | 📋 Planned |
| **Phase 4** | Prefect Integration - Flows, webhooks, drift detection | 📋 Planned |
## 📁 Project Structure
```
fabric-orchestrator/
├── README.md
├── pyproject.toml
├── src/                         # Python package
│   ├── __init__.py
│   ├── cli.py                   # CLI for YANG discovery (discover commands)
│   │
│   ├── flows/                   # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py         # @flow fabric_reconcile (plan/apply)
│   │   ├── drift.py             # @flow handle_drift
│   │   └── remediation.py       # @flow drift_remediation
│   │
│   ├── api/                     # FastAPI webhook receiver
│   │   ├── __init__.py
│   │   └── webhooks.py          # InfraHub webhook endpoint
│   │
│   ├── services/                # Long-running services
│   │   ├── __init__.py
│   │   └── drift_monitor.py     # gNMI Subscribe drift detection
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py            # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/                # InfraHub integration (TODO)
│   │   ├── __init__.py
│   │   ├── client.py            # InfraHub SDK client
│   │   └── models.py            # Pydantic models for intent validation
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py            # InfraHub intent → YANG paths
│       ├── paths.py             # YANG path definitions
│       └── dependencies.py      # Dependency ordering graph
├── schemas/                     # InfraHub schema definitions (TODO)
│   └── fabric.yml               # Custom fabric schema
├── tests/
└── docs/
    ├── cli-user-guide.md        # CLI documentation
    └── yang-paths.md            # Documented YANG paths
```
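
The structure lists a dependency ordering graph under `yang/dependencies.py`. One way such ordering could work is a topological sort, sketched below with the standard library's `graphlib`. The object names and their dependencies are illustrative guesses, not taken from the project, and `apply_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each key depends on the items in its set. These names are illustrative
# and are not taken from the project's dependencies.py.
DEPENDS_ON: dict[str, set[str]] = {
    "vlan": set(),
    "vrf": set(),
    "vxlan_interface": set(),
    "vlan_vni_mapping": {"vlan", "vxlan_interface"},
    "vrf_vni_mapping": {"vrf", "vxlan_interface"},
    "bgp_evpn": {"vlan_vni_mapping", "vrf_vni_mapping"},
}


def apply_order(graph: dict[str, set[str]]) -> list[str]:
    """Return object types in an order that satisfies every dependency."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


order = apply_order(DEPENDS_ON)
print(order)
```

Several valid orders usually exist, so only the relative position of dependent items is guaranteed. Removing configuration would typically walk the same order in reverse.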
## 🛠️ Technology Stack
| Component | Technology | Purpose |
|-----------|------------|---------|
| Source of Truth | **InfraHub** | Intent definition via custom schema |
| Orchestrator | **Prefect** | Python-native workflow orchestration |
| Webhooks | FastAPI | Receive InfraHub webhooks |
| Transport | gNMI | Configuration and telemetry |
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
| Python Library | pygnmi + infrahub-sdk | gNMI/InfraHub interactions |
| CLI | Click + Rich | YANG discovery tools |
| Validation | Pydantic v2 | Intent data validation |
| Lab | ContainerLab + cEOS | Development environment |
## 🔗 Related Projects
- [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab) - Target fabric topology
- [InfraHub](https://github.com/opsmill/infrahub) - Source of Truth platform
- [InfraHub Schema Library](https://github.com/opsmill/schema-library) - Reference schemas
- [Arista YANG Models](https://github.com/aristanetworks/yang/tree/master/EOS-4.35.0F) - EOS 4.35.0F YANG definitions
- [Prefect Documentation](https://docs.prefect.io) - Orchestration platform docs
## 📚 References
### InfraHub
- [InfraHub Documentation](https://docs.infrahub.app)
- [InfraHub Schema Guide](https://docs.infrahub.app/guides/create-schema)
- [InfraHub Python SDK](https://github.com/opsmill/infrahub-sdk-python)
### Prefect
- [Prefect Documentation](https://docs.prefect.io)
- [Prefect Flows](https://docs.prefect.io/latest/develop/write-flows/)
- [Prefect Tasks](https://docs.prefect.io/latest/develop/write-tasks/)
- [Prefect Deployments](https://docs.prefect.io/latest/deploy/run-flows-in-local-processes/)
- [Prefect Secrets](https://docs.prefect.io/latest/develop/blocks/#secret)
### YANG / gNMI
- [Arista gNMI Documentation](https://aristanetworks.github.io/openmgmt/configuration/gnmi/)
- [OpenConfig Models](https://github.com/openconfig/public)
- [pygnmi Library](https://github.com/akarneliuk/pygnmi)
### EVPN-VXLAN
- [Arista BGP EVPN Configuration Example](https://overlaid.net/2019/01/27/arista-bgp-evpn-configuration-example/)
- [Arista EVPN Deployment Guide](https://www.arista.com/en/solutions/evpn-vxlan)
## 🚀 Getting Started
### Prerequisites
- Python 3.12+
- `uv` package manager
- Access to ContainerLab with cEOS images
- Docker (for InfraHub)
### Quick Start
```bash
# Clone the repository
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator
# Install Python dependencies
uv sync
# Start InfraHub (see InfraHub docs for full setup)
# docker compose -f infrahub-docker-compose.yml up -d
# Configure Prefect secrets
python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable
Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
Secret(value='your-infrahub-token').save('infrahub-token', overwrite=True)
Variable.set('infrahub_url', 'http://localhost:8000')
Variable.set('gnmi_username', 'admin')
"
# Start Prefect server (optional, for UI); it runs in the foreground, so use a separate terminal
prefect server start
# Verify gNMI connectivity to your fabric
uv run fabric-orch discover capabilities --target leaf1:6030
# Explore YANG paths
uv run fabric-orch discover get --target leaf1:6030 \
  --path "/interfaces/interface[name=Ethernet1]/state"
```
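
The `--path` argument above uses the usual gNMI string form, where list keys appear in square brackets. As an illustration of that structure, the sketch below splits such a string into elements and keys. It is a simplified parser under stated assumptions: key values must not contain `/` or `]`, so names such as `Ethernet1/1` are out of scope, and `parse_path` is a hypothetical helper rather than part of the CLI or pygnmi.

```python
import re

# One path element: a name followed by zero or more [key=value] selectors.
_ELEM = re.compile(r"^(?P<name>[^\[\]]+)(?P<keys>(\[[^\[\]=]+=[^\[\]]*\])*)$")
_KEY = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")


def parse_path(path: str) -> list[tuple[str, dict[str, str]]]:
    """Split a path string into (element name, keys) pairs."""
    elems = []
    for raw in path.strip("/").split("/"):
        match = _ELEM.match(raw)
        if match is None:
            raise ValueError(f"malformed path element: {raw!r}")
        keys = dict(_KEY.findall(match.group("keys")))
        elems.append((match.group("name"), keys))
    return elems


for name, keys in parse_path("/interfaces/interface[name=Ethernet1]/state"):
    print(name, keys)
```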
## Prefect Flow Example
```python
from prefect import flow, task
from prefect.blocks.system import Secret
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
    from infrahub_sdk import Config, InfrahubClientSync

    infrahub_url = Variable.get("infrahub_url")
    infrahub_token = Secret.load("infrahub-token").get()
    # The SDK takes connection settings through a Config object;
    # the sync client is used because this task is not async.
    client = InfrahubClientSync(
        config=Config(address=infrahub_url, api_token=infrahub_token)
    )

    # Query fabric intent via GraphQL
    # ...
    intent: dict = {}  # placeholder until the query above is implemented
    return intent


@task(retries=2, retry_delay_seconds=10)
def get_current_state(device: str | None = None) -> dict:
    """Retrieve current device state via gNMI Get (placeholder task)."""
    # Fetch state via gNMI...
    current: dict = {}
    return current


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine

    return diff_engine(want=intent, have=current)


@task(retries=1)
def apply_changes(changes: list[dict], dry_run: bool = True) -> dict:
    """Apply changes via gNMI Set."""
    if dry_run:
        return {"applied": False, "changes": changes}

    # Apply via gNMI...
    return {"applied": True, "changes": changes}


@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(
    device: str | None = None,
    auto_apply: bool = False,
    dry_run: bool = True,
) -> dict:
    """Reconcile fabric state with InfraHub intent."""
    print("🔄 Starting fabric reconciliation")

    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ No changes detected - fabric is in sync")
        return {"changes": [], "in_sync": True}

    # Apply only when explicitly requested and not in dry-run mode.
    should_apply = auto_apply and not dry_run
    result = apply_changes(changes, dry_run=not should_apply)
    return {"changes": changes, "applied": result["applied"]}


if __name__ == "__main__":
    # Serve the flow locally and run it every six hours.
    fabric_reconcile.serve(
        name="fabric-reconcile-scheduled",
        cron="0 */6 * * *",
        tags=["network", "fabric"],
    )
```
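
The two flags in the example interact: changes are pushed only when `auto_apply` is set and `dry_run` is cleared, so the defaults produce a plan and nothing else. The small sketch below mirrors that condition in isolation to show all four combinations. `resolve_mode` is a hypothetical helper written for this illustration.

```python
def resolve_mode(auto_apply: bool = False, dry_run: bool = True) -> str:
    """Mirror the example's gating: apply only if auto_apply and not dry_run."""
    should_apply = auto_apply and not dry_run
    return "apply" if should_apply else "plan"


for auto_apply in (False, True):
    for dry_run in (True, False):
        mode = resolve_mode(auto_apply, dry_run)
        print(f"auto_apply={auto_apply!s:5} dry_run={dry_run!s:5} -> {mode}")
```

Requiring both flags makes a scheduled run safe by default: a deployment served without parameters can report drift but never changes a device.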
---
**Status**: 🚧 Active Development - Migrating to InfraHub as Source of Truth