# Fabric Orchestrator

**Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics**

A workflow-based orchestration system that uses [InfraHub](https://github.com/opsmill/infrahub) as Source of Truth, [Prefect](https://prefect.io) for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.

## 🎯 Project Vision

Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:

- **Intent** is defined in InfraHub (custom schema, Git-versioned)
- **Orchestration** is handled by Prefect (Python-native `@flow` and `@task` decorators)
- **State** is continuously monitored via gNMI Subscribe
- **Changes** are computed as diffs and applied atomically via gNMI Set
- **Drift** is detected and optionally auto-remediated

Think `terraform plan` and `terraform apply`, but for your network fabric, powered by Prefect flows.
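
To make the "plan" idea concrete, here is a minimal sketch of a diff between desired and observed state. It is not the project's diff engine: the `plan` function, the flat `path -> value` dictionaries, and the `op`/`path`/`value` keys are all assumptions chosen for illustration.

```python
from typing import Any


def plan(want: dict[str, Any], have: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the operations needed to turn `have` into `want` (hypothetical format)."""
    ops: list[dict[str, Any]] = []
    # Present in intent but missing on the device: create.
    for path in sorted(want.keys() - have.keys()):
        ops.append({"op": "create", "path": path, "value": want[path]})
    # Present on both sides with different values: update.
    for path in sorted(want.keys() & have.keys()):
        if want[path] != have[path]:
            ops.append({"op": "update", "path": path, "value": want[path]})
    # Present on the device but absent from intent: delete (drift).
    for path in sorted(have.keys() - want.keys()):
        ops.append({"op": "delete", "path": path})
    return ops


# Illustrative data only; real intent would come from InfraHub and state from gNMI.
intent = {"vlan/10": {"name": "web"}, "vlan/20": {"name": "db"}}
state = {"vlan/20": {"name": "database"}, "vlan/30": {"name": "old"}}
changes = plan(intent, state)
```

An empty result means the device already matches intent; a non-empty result is what a dry run would report before anything is applied.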

## 🏗️ Architecture

## 🎯 Why InfraHub?

We chose [InfraHub](https://github.com/opsmill/infrahub) over NetBox as Source of Truth for several reasons:

| Feature             | NetBox                | InfraHub                             |
| ------------------- | --------------------- | ------------------------------------ |
| **Schema**          | Fixed DCIM/IPAM model | Fully customizable YAML schema       |
| **Git Integration** | External sync needed  | Native - branches = data branches    |
| **Versioning**      | Changelog only        | True Git-like versioning with merges |
| **Transforms**      | Limited               | Built-in Jinja2 + Python transforms  |
| **GraphQL**         | Yes                   | Yes (auto-generated from schema)     |

**Key benefits for this project:**

1. **Custom Schema** - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
2. **Git-native** - Schema + data versioned together, easy test environment setup
3. **Transforms** - Generate device configs directly from InfraHub
4. **Branches** - Test fabric changes in isolated branches before merge

## 🎛 Why Prefect?

| Feature                          | Benefit                                                                |
| -------------------------------- | ---------------------------------------------------------------------- |
| **Python-native workflows**      | Use `@flow` and `@task` decorators: no YAML, just Python               |
| **Free secrets management**      | Native `Secret` blocks for credentials (free in OSS)                   |
| **Built-in UI**                  | Dashboard, logs, metrics, execution history via `prefect server start` |
| **No containerization required** | Run flows directly with `.serve()`, no Docker needed                   |
| **Event-driven triggers**        | Schedule, webhooks (via FastAPI), flow triggers out of the box         |
| **Task dependencies**            | Automatic dependency ordering via task result passing or `wait_for`    |
| **Retry & error handling**       | Built-in retry policies with `@task(retries=3)`                        |
| **Human-in-the-loop**            | Native `pause_flow_run()` for approval workflows                       |

## 🎯 Target Fabric

This project is designed for Arista EVPN-VXLAN fabrics:

- **2 Spines** (BGP Route Reflectors)
- **8 Leafs** (4 MLAG VTEP pairs)
- **cEOS 4.35.0F** with gNMI enabled
- **EVPN Type-2** (L2 VXLAN) and **Type-5** (L3 VXLAN) support

Reference lab topology: [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab)
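
As a rough illustration of that shape, the sketch below builds a 2-spine, 8-leaf inventory and derives the four MLAG pairs. The `spineN`/`leafN` hostnames and the rule that consecutive leafs form a pair are assumptions for the example; the actual naming and pairing live in the lab repository and in InfraHub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    role: str  # "spine" or "leaf"


def build_fabric(spines: int = 2, leafs: int = 8) -> tuple[list[Device], list[tuple[str, str]]]:
    """Return the device list and MLAG pairs for a spine/leaf fabric (hypothetical naming)."""
    if leafs % 2:
        raise ValueError("MLAG pairing needs an even number of leafs")
    devices = [Device(f"spine{i}", "spine") for i in range(1, spines + 1)]
    leaf_names = [f"leaf{i}" for i in range(1, leafs + 1)]
    devices += [Device(name, "leaf") for name in leaf_names]
    # Assumption: consecutive leafs (leaf1/leaf2, leaf3/leaf4, ...) form an MLAG pair.
    mlag_pairs = list(zip(leaf_names[0::2], leaf_names[1::2]))
    return devices, mlag_pairs


devices, mlag_pairs = build_fabric()
```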

## 📋 Project Phases

Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues) or filter by phase:

| Phase       | Description                                                               | Status         |
| ----------- | ------------------------------------------------------------------------- | -------------- |
| **Phase 1** | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI          | ✅ Complete    |
| **Phase 2** | InfraHub Client & Core Reconciler - SDK client, diff engine, YANG mappers | 🔄 In Progress |
| **Phase 3** | Full Fabric Coverage - BGP, MLAG, VRFs mappers                            | 📋 Planned     |
| **Phase 4** | Prefect Integration - Flows, webhooks, drift detection                    | 📋 Planned     |

## 📁 Project Structure

```
fabric-orchestrator/
├── README.md
├── pyproject.toml
│
├── src/                          # Python package
│   ├── __init__.py
│   ├── cli.py                    # CLI for YANG discovery
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py             # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/                 # InfraHub integration
│   │   ├── __init__.py
│   │   ├── client.py             # InfraHub SDK wrapper
│   │   ├── models.py             # Pydantic intent models
│   │   └── exceptions.py         # Client exceptions
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py             # InfraHub intent → YANG paths
│       ├── paths.py              # YANG path definitions
│       └── mappers/              # Resource-specific mappers
│           ├── vlan.py
│           ├── interface.py
│           ├── bgp.py
│           └── vxlan.py
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md
    └── yang-paths.md
```
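
To give a feel for what a resource-specific mapper does, here is a minimal sketch that turns a VLAN intent into an OpenConfig path and payload. It is not the code in `src/yang/mappers/vlan.py`: the function name and signature are hypothetical, and the path follows the public OpenConfig network-instance model, which may differ from the paths the project settled on for EOS.

```python
def vlan_to_gnmi_update(
    vlan_id: int, name: str, network_instance: str = "default"
) -> tuple[str, dict]:
    """Map a VLAN intent to an (OpenConfig path, payload) pair (hypothetical helper)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    # OpenConfig keeps VLANs under a network instance, keyed by vlan-id.
    path = (
        f"/network-instances/network-instance[name={network_instance}]"
        f"/vlans/vlan[vlan-id={vlan_id}]/config"
    )
    payload = {"vlan-id": vlan_id, "name": name}
    return path, payload


# Example values only.
path, payload = vlan_to_gnmi_update(100, "web")
```

Returning a plain `(path, payload)` tuple keeps the mapper free of any transport dependency, so it can be unit-tested without a device.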

## 🛠️ Technology Stack

| Component       | Technology                 | Purpose                              |
| --------------- | -------------------------- | ------------------------------------ |
| Source of Truth | **InfraHub**               | Intent definition via custom schema  |
| Orchestrator    | **Prefect**                | Python-native workflow orchestration |
| Transport       | gNMI                       | Configuration and telemetry          |
| Data Models     | YANG (OpenConfig + Arista) | Structured configuration             |
| Python Library  | pygnmi + infrahub-sdk      | gNMI/InfraHub interactions           |
| CLI             | Click + Rich               | YANG discovery tools                 |
| Validation      | Pydantic v2                | Intent data validation               |
| Lab             | ContainerLab + cEOS        | Development environment              |

## 🔗 Related Projects

- [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab) - Lab topology, InfraHub schemas & data
- [InfraHub](https://github.com/opsmill/infrahub) - Source of Truth platform
- [InfraHub Schema Library](https://github.com/opsmill/schema-library) - Reference schemas
- [Arista YANG Models](https://github.com/aristanetworks/yang/tree/master/EOS-4.35.0F) - EOS 4.35.0F YANG definitions

## 📚 References

### InfraHub
- [InfraHub Documentation](https://docs.infrahub.app)
- [InfraHub Schema Guide](https://docs.infrahub.app/guides/create-schema)
- [InfraHub Python SDK](https://github.com/opsmill/infrahub-sdk-python)

### Prefect
- [Prefect Documentation](https://docs.prefect.io)
- [Prefect Flows](https://docs.prefect.io/latest/develop/write-flows/)
- [Prefect Tasks](https://docs.prefect.io/latest/develop/write-tasks/)

### YANG / gNMI
- [Arista gNMI Documentation](https://aristanetworks.github.io/openmgmt/configuration/gnmi/)
- [OpenConfig Models](https://github.com/openconfig/public)
- [pygnmi Library](https://github.com/akarneliuk/pygnmi)

### EVPN-VXLAN
- [Arista BGP EVPN Configuration Example](https://overlaid.net/2019/01/27/arista-bgp-evpn-configuration-example/)

## 🚀 Getting Started

### Prerequisites

- Python 3.12+
- `uv` package manager
- Access to an InfraHub instance with the EVPN-VXLAN fabric schema loaded
- Access to ContainerLab with cEOS images (for lab testing)

### Quick Start

```bash
# Clone the repository
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator

# Install Python dependencies
uv sync

# Set InfraHub connection (point to your InfraHub instance)
export INFRAHUB_ADDRESS="http://localhost:8000"
export INFRAHUB_API_TOKEN="your-token"

# Verify gNMI connectivity
uv run fabric-orch discover capabilities --target leaf1:6030

# Run reconciliation
uv run fabric-orch plan
uv run fabric-orch apply
```

## Prefect Flow Example

```python
from prefect import flow, task
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
async def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
    from src.infrahub.client import FabricInfrahubClient

    # Variable.get returns an awaitable when called from async code.
    async with FabricInfrahubClient(
        url=await Variable.get("infrahub_url"),
        api_token=await Variable.get("infrahub_token"),
    ) as client:
        return await client.get_device(device)


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine

    return diff_engine(want=intent, have=current)


# The flow is async so that it can await the async intent task.
@flow(log_prints=True, name="fabric-reconcile")
async def fabric_reconcile(device: str | None = None, dry_run: bool = True) -> dict:
    """Reconcile fabric state with InfraHub intent."""
    # get_current_state and apply_changes are tasks assumed to be defined
    # elsewhere; they are not shown in this example.
    intent = await get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ Fabric is in sync")
        return {"in_sync": True}

    if not dry_run:
        apply_changes(changes)

    return {"changes": changes, "applied": not dry_run}
```
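
The apply step is not shown above. One way it could prepare its input is to fold all computed changes into a single gNMI Set, so that the target applies them as one transaction. The sketch below only builds the arguments: the `op`/`path`/`value` change format and the `build_set_request` helper are assumptions, while the `update` list of `(path, value)` tuples and the `delete` list of paths mirror the arguments accepted by pygnmi's `gNMIclient.set()`.

```python
from typing import Any


def build_set_request(changes: list[dict[str, Any]]) -> dict[str, list]:
    """Group changes into the `update` and `delete` arguments of one gNMI Set."""
    request: dict[str, list] = {"update": [], "delete": []}
    for change in changes:
        op = change["op"]
        if op in ("create", "update"):
            # pygnmi expects updates as (path, value) tuples.
            request["update"].append((change["path"], change["value"]))
        elif op == "delete":
            request["delete"].append(change["path"])
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return request


# Illustrative changes in the hypothetical format described above.
example_changes = [
    {"op": "create", "path": "/vlans/vlan[vlan-id=10]/config", "value": {"vlan-id": 10}},
    {"op": "delete", "path": "/vlans/vlan[vlan-id=30]"},
]
set_request = build_set_request(example_changes)
```

With an open pygnmi client `gc`, the result could then be sent as `gc.set(**set_request)`; the sketch leaves that call out so it stays runnable without a device.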

---

**Status**: 🚧 Active Development - Phase 2 (InfraHub Client & Core Reconciler)