# Fabric Orchestrator

**Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics**

A workflow-based orchestration system that uses [InfraHub](https://github.com/opsmill/infrahub) as Source of Truth, [Prefect](https://prefect.io) for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.

## 🎯 Project Vision

Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:

- **Intent** is defined in InfraHub (custom schema, Git-versioned)
- **Orchestration** is handled by Prefect (Python-native `@flow` and `@task` decorators)
- **State** is continuously monitored via gNMI Subscribe
- **Changes** are computed as diffs and applied atomically via gNMI Set
- **Drift** is detected and optionally auto-remediated

Think `terraform plan` and `terraform apply`, but for your network fabric, powered by Prefect flows.

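The "changes are computed as diffs" idea can be illustrated with a minimal sketch. It assumes that both intent and device state have already been flattened into dictionaries keyed by a YANG-style path; that representation, the `op` field, and the sample paths are illustrative assumptions, not the project's actual diff engine.

```python
from typing import Any


def compute_diff(want: dict[str, Any], have: dict[str, Any]) -> list[dict[str, Any]]:
    """Compare desired and observed state, both keyed by a YANG-style path.

    Returns "update" operations for paths that are missing or different on
    the device, and "delete" operations for paths present only on the device.
    """
    changes: list[dict[str, Any]] = []
    for path in sorted(want):
        if path not in have or have[path] != want[path]:
            changes.append({"op": "update", "path": path, "value": want[path]})
    for path in sorted(have):
        if path not in want:
            changes.append({"op": "delete", "path": path})
    return changes


# Hypothetical data: VLAN 100 needs a rename, VLAN 200 exists only on the device.
want = {"vlans/vlan[vlan-id=100]": {"name": "production"}}
have = {
    "vlans/vlan[vlan-id=100]": {"name": "prod"},
    "vlans/vlan[vlan-id=200]": {"name": "stale"},
}
for change in compute_diff(want, have):
    print(change)
```

An empty result means intent and device state agree, which corresponds to the "no changes" outcome of a plan. Whether device-only paths should be deleted or merely reported as drift is a policy decision this sketch does not settle.
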
## 🏗️ Architecture

## 🎯 Why InfraHub?

We chose [InfraHub](https://github.com/opsmill/infrahub) over NetBox as Source of Truth for several reasons:

| Feature             | NetBox                | InfraHub                             |
| ------------------- | --------------------- | ------------------------------------ |
| **Schema**          | Fixed DCIM/IPAM model | Fully customizable YAML schema       |
| **Git Integration** | External sync needed  | Native - branches = data branches    |
| **Versioning**      | Changelog only        | True Git-like versioning with merges |
| **Test/Redeploy**   | Dump/restore          | `git clone` = complete environment   |
| **Transforms**      | Limited               | Built-in Jinja2 + Python transforms  |
| **GraphQL**         | Yes                   | Yes (auto-generated from schema)     |

**Key benefits for this project:**

1. **Custom Schema** - Model exactly what we need (VTEPs, MLAG pairs, fabric topology)
2. **Git-native** - Schema + data versioned together, easy test environment setup
3. **Transforms** - Generate device configs directly from InfraHub
4. **Branches** - Test fabric changes in isolated branches before merge

## 📦 Repository as InfraHub Backend

This repository serves as the **single source of truth** for both code and infrastructure data:

```
fabric-orchestrator/
├── .infrahub.yml              # InfraHub repository config
│
├── schemas/                   # InfraHub schema definitions
│   └── fabric.yml             # Custom EVPN-VXLAN fabric schema
│
├── data/                      # Infrastructure objects (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml        # Spines, Leafs, VTEP pairs
│   ├── network/
│   │   ├── vlans.yml          # VLANs + L2VNI mappings
│   │   ├── vrfs.yml           # VRFs + L3VNI mappings
│   │   └── interfaces.yml     # Interface configs
│   └── routing/
│       ├── bgp_sessions.yml   # Underlay + EVPN overlay
│       └── evpn.yml           # Route targets, RDs
│
├── transforms/                # Jinja2 templates for config generation
│   └── arista/
│       ├── base.j2
│       ├── interfaces.j2
│       ├── bgp.j2
│       └── evpn.j2
│
└── src/                       # Python orchestration code
```

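Because the data files carry mappings such as VLAN to L2VNI, a small consistency check before committing can catch obvious mistakes. The sketch below is only an illustration: the record fields (`vlan_id`, `name`, `l2vni`) are hypothetical and do not reflect the actual layout of `data/network/vlans.yml`, and the records are given inline rather than loaded from YAML.

```python
def validate_vlans(records: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the data looks valid."""
    errors: list[str] = []
    seen_vlans: set[int] = set()
    seen_vnis: set[int] = set()
    for rec in records:
        vlan_id, vni = rec["vlan_id"], rec["l2vni"]
        if not 1 <= vlan_id <= 4094:
            errors.append(f"VLAN {vlan_id}: ID outside 1-4094")
        if not 1 <= vni <= 16_777_215:  # VXLAN VNIs are 24-bit values
            errors.append(f"VLAN {vlan_id}: VNI {vni} outside 1-16777215")
        if vlan_id in seen_vlans:
            errors.append(f"VLAN {vlan_id}: duplicate VLAN ID")
        if vni in seen_vnis:
            errors.append(f"VLAN {vlan_id}: VNI {vni} already mapped")
        seen_vlans.add(vlan_id)
        seen_vnis.add(vni)
    return errors


# Hypothetical records: the second VLAN reuses the VNI of the first.
records = [
    {"vlan_id": 100, "name": "production", "l2vni": 10100},
    {"vlan_id": 200, "name": "staging", "l2vni": 10100},
]
print(validate_vlans(records))
```

In practice the InfraHub schema itself can enforce uniqueness constraints; a check like this would mainly serve as a fast pre-commit guard.
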
### Workflow

```bash
# 1. Edit data files (e.g., add a VLAN)
vim data/network/vlans.yml

# 2. Commit & push
git commit -am "Add VLAN 100 for production"
git push

# 3. InfraHub syncs automatically from Git
#    → Data available via GraphQL

# 4. Prefect flow detects change → reconciles fabric
```

### Benefits

- **Reproducibility**: `git clone` → `docker compose up` → complete environment
- **Code Review**: Infrastructure changes go through PR review
- **History**: Full audit trail via Git
- **Testing**: Create a branch, test changes, merge when validated

## 🎛️ Why Prefect?

| Feature                          | Benefit                                                                |
| -------------------------------- | ---------------------------------------------------------------------- |
| **Python-native workflows**      | Use `@flow` and `@task` decorators: no YAML, just Python               |
| **Free secrets management**      | Native `Secret` blocks for credentials (free in OSS)                   |
| **Built-in UI**                  | Dashboard, logs, metrics, execution history via `prefect server start` |
| **No containerization required** | Run flows directly with `.serve()`, no Docker needed                   |
| **Event-driven triggers**        | Schedule, webhooks (via FastAPI), flow triggers out of the box         |
| **Task dependencies**            | Automatic dependency ordering via task result passing or `wait_for`    |
| **Retry & error handling**       | Built-in retry policies with `@task(retries=3)`                        |
| **Human-in-the-loop**            | Native `pause_flow_run()` for approval workflows                       |

## 🎯 Target Fabric

This project is designed for the Arista EVPN-VXLAN ContainerLab topology:

- **2 Spines** (BGP Route Reflectors, AS 65000)
- **8 Leafs** (4 MLAG VTEP pairs, AS 65001-65004)
- **cEOS 4.35.0F** with gNMI enabled
- **EVPN Type-2** (L2 VXLAN) and **Type-5** (L3 VXLAN) support

Reference: [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab)

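The topology above can be written down as a small inventory, which is handy for tests and for iterating over gNMI targets. This is a sketch under assumptions: the `spineN`/`leafN` naming, the pairing of consecutive leafs into MLAG pairs, and one AS per pair are guesses based on the summary above, not taken from the lab repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    name: str
    role: str  # "spine" or "leaf"
    asn: int
    mlag_peer: Optional[str] = None  # set only for leafs


def build_inventory() -> list[Device]:
    """Build 2 spines (AS 65000) and 8 leafs in 4 MLAG pairs (AS 65001-65004)."""
    devices = [Device(f"spine{i}", "spine", 65000) for i in (1, 2)]
    for pair in range(4):
        asn = 65001 + pair  # assumed: both members of a pair share one AS
        first, second = f"leaf{2 * pair + 1}", f"leaf{2 * pair + 2}"
        devices.append(Device(first, "leaf", asn, mlag_peer=second))
        devices.append(Device(second, "leaf", asn, mlag_peer=first))
    return devices


for device in build_inventory():
    print(device)
```
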
## 📋 Project Phases

Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues) or filter by phase:

| Phase       | Description                                                          | Status         |
| ----------- | -------------------------------------------------------------------- | -------------- |
| **Phase 1** | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI     | ✅ Complete    |
| **Phase 2** | InfraHub Setup & Core Reconciler - Schema, diff engine, YANG mappers | 🔄 In Progress |
| **Phase 3** | Full Fabric Coverage - BGP, MLAG, VRFs mappers                       | 📋 Planned     |
| **Phase 4** | Prefect Integration - Flows, webhooks, drift detection               | 📋 Planned     |

## 📁 Project Structure

```
fabric-orchestrator/
├── README.md
├── pyproject.toml
├── .infrahub.yml                  # InfraHub config (points to schemas/)
│
├── schemas/                       # InfraHub schema definitions
│   └── fabric.yml                 # Custom EVPN-VXLAN fabric schema
│
├── data/                          # Infrastructure data (YAML)
│   ├── topology/
│   │   ├── sites.yml
│   │   └── devices.yml
│   ├── network/
│   │   ├── vlans.yml
│   │   ├── vrfs.yml
│   │   └── interfaces.yml
│   └── routing/
│       ├── bgp_sessions.yml
│       └── evpn.yml
│
├── transforms/                    # Jinja2 config templates
│   └── arista/
│       └── *.j2
│
├── src/                           # Python package
│   ├── __init__.py
│   ├── cli.py                     # CLI for YANG discovery
│   │
│   ├── flows/                     # Prefect flows
│   │   ├── __init__.py
│   │   ├── reconcile.py           # @flow fabric_reconcile
│   │   ├── drift.py               # @flow handle_drift
│   │   └── remediation.py         # @flow drift_remediation
│   │
│   ├── gnmi/
│   │   ├── __init__.py
│   │   ├── client.py              # gNMI client wrapper (pygnmi)
│   │   └── README.md
│   │
│   ├── infrahub/                  # InfraHub integration
│   │   ├── __init__.py
│   │   ├── client.py              # InfraHub SDK wrapper
│   │   └── queries.py             # GraphQL queries
│   │
│   └── yang/
│       ├── __init__.py
│       ├── mapper.py              # InfraHub intent → YANG paths
│       ├── paths.py               # YANG path definitions
│       └── mappers/               # Resource-specific mappers
│           ├── vlan.py
│           ├── interface.py
│           ├── bgp.py
│           └── vxlan.py
│
├── tests/
│
└── docs/
    ├── cli-user-guide.md
    └── yang-paths.md
```

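To give an idea of what a resource-specific mapper under `src/yang/mappers/` might do, here is a minimal sketch that turns a VLAN intent into a path/value pair. The function name and signature are hypothetical. The path follows the OpenConfig network-instance VLAN model; whether EOS 4.35.0F accepts exactly this path and payload should be checked against the Phase 1 findings in `docs/yang-paths.md`.

```python
def vlan_to_gnmi_update(
    vlan_id: int, name: str, network_instance: str = "default"
) -> tuple[str, dict]:
    """Map a VLAN intent to an OpenConfig-style (path, value) pair."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID: {vlan_id}")
    path = (
        f"/network-instances/network-instance[name={network_instance}]"
        f"/vlans/vlan[vlan-id={vlan_id}]/config"
    )
    return path, {"vlan-id": vlan_id, "name": name}


path, value = vlan_to_gnmi_update(100, "production")
print(path)
print(value)
```

The `(path, value)` tuple shape is chosen because pygnmi's `set()` takes its `update` argument as a list of such tuples, so mapper output could be collected and sent in a single Set request.
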
## 🛠️ Technology Stack

| Component       | Technology                 | Purpose                              |
| --------------- | -------------------------- | ------------------------------------ |
| Source of Truth | **InfraHub**               | Intent definition via custom schema  |
| Data Storage    | **This Git repo**          | Schema + data versioned together     |
| Orchestrator    | **Prefect**                | Python-native workflow orchestration |
| Transport       | gNMI                       | Configuration and telemetry          |
| Data Models     | YANG (OpenConfig + Arista) | Structured configuration             |
| Python Library  | pygnmi + infrahub-sdk      | gNMI/InfraHub interactions           |
| CLI             | Click + Rich               | YANG discovery tools                 |
| Validation      | Pydantic v2                | Intent data validation               |
| Lab             | ContainerLab + cEOS        | Development environment              |

## 🔗 Related Projects

- [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab) - Target fabric topology
- [InfraHub](https://github.com/opsmill/infrahub) - Source of Truth platform
- [InfraHub Schema Library](https://github.com/opsmill/schema-library) - Reference schemas
- [Arista YANG Models](https://github.com/aristanetworks/yang/tree/master/EOS-4.35.0F) - EOS 4.35.0F YANG definitions

## 📚 References

### InfraHub
- [InfraHub Documentation](https://docs.infrahub.app)
- [InfraHub Schema Guide](https://docs.infrahub.app/guides/create-schema)
- [InfraHub Python SDK](https://github.com/opsmill/infrahub-sdk-python)
- [InfraHub .infrahub.yml Reference](https://docs.infrahub.app/reference/dotinfrahub)

### Prefect
- [Prefect Documentation](https://docs.prefect.io)
- [Prefect Flows](https://docs.prefect.io/latest/develop/write-flows/)
- [Prefect Tasks](https://docs.prefect.io/latest/develop/write-tasks/)

### YANG / gNMI
- [Arista gNMI Documentation](https://aristanetworks.github.io/openmgmt/configuration/gnmi/)
- [OpenConfig Models](https://github.com/openconfig/public)
- [pygnmi Library](https://github.com/akarneliuk/pygnmi)

### EVPN-VXLAN
- [Arista BGP EVPN Configuration Example](https://overlaid.net/2019/01/27/arista-bgp-evpn-configuration-example/)

## 🚀 Getting Started

### Prerequisites

- Python 3.12+
- `uv` package manager
- Docker (for InfraHub)
- Access to ContainerLab with cEOS images

### Quick Start

```bash
# Clone the repository (includes schema + data)
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
cd fabric-orchestrator

# Install Python dependencies
uv sync

# Start InfraHub (loads schema & data from this repo)
docker compose up -d

# Configure Prefect secrets and variables
# (run through uv so the interpreter sees the dependencies installed by `uv sync`)
uv run python -c "
from prefect.blocks.system import Secret
from prefect.variables import Variable

Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
Variable.set('infrahub_url', 'http://localhost:8000')
Variable.set('gnmi_username', 'admin')
"

# Verify gNMI connectivity
uv run fabric-orch discover capabilities --target leaf1:6030

# Run reconciliation
uv run fabric-orch plan
uv run fabric-orch apply
```

## Prefect Flow Example

```python
from prefect import flow, task
from prefect.variables import Variable


@task(retries=2, retry_delay_seconds=10)
def get_fabric_intent(device: str | None = None) -> dict:
    """Retrieve fabric intent from InfraHub."""
    # The synchronous SDK client is used because this task is a plain function
    from infrahub_sdk import InfrahubClientSync

    client = InfrahubClientSync(address=Variable.get("infrahub_url"))
    # Query fabric intent via GraphQL (query string elided)
    return client.execute_graphql(query=...)


@task
def get_current_state(device: str | None = None) -> dict:
    """Read current device state via gNMI Get (placeholder, body omitted here)."""
    raise NotImplementedError


@task
def compute_diff(intent: dict, current: dict) -> list[dict]:
    """Compute diff between desired and current state."""
    from src.reconciler.diff import compute_diff as diff_engine
    return diff_engine(want=intent, have=current)


@task
def apply_changes(changes: list[dict]) -> None:
    """Apply changes via gNMI Set (placeholder, body omitted here)."""
    raise NotImplementedError


@flow(log_prints=True, name="fabric-reconcile")
def fabric_reconcile(device: str | None = None, dry_run: bool = True) -> dict:
    """Reconcile fabric state with InfraHub intent."""
    intent = get_fabric_intent(device)
    current = get_current_state(device)
    changes = compute_diff(intent, current)

    if not changes:
        print("✅ Fabric is in sync")
        return {"in_sync": True}

    # With dry_run=True (the default) the flow only reports the planned changes
    if not dry_run:
        apply_changes(changes)

    return {"changes": changes, "applied": not dry_run}
```

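Applying changes "atomically via gNMI Set" suggests collecting every change into one Set request, since gNMI treats the operations in a single SetRequest as one transaction. The sketch below shows one possible way to group a change list for that purpose. The change shape (`op`, `path`, `value`) and the function name are assumptions, not the project's actual data model. The result uses the `update` and `delete` keyword names that pygnmi's `set()` accepts, with updates as `(path, value)` tuples.

```python
def build_set_request(changes: list[dict]) -> dict[str, list]:
    """Group change records into the arguments of one gNMI Set call."""
    request: dict[str, list] = {"update": [], "delete": []}
    for change in changes:
        if change["op"] == "update":
            request["update"].append((change["path"], change["value"]))
        elif change["op"] == "delete":
            request["delete"].append(change["path"])
        else:
            raise ValueError(f"unknown operation: {change['op']!r}")
    return request


# Hypothetical change list, as a diff step might produce it.
changes = [
    {"op": "update", "path": "vlans/vlan[vlan-id=100]/config", "value": {"name": "production"}},
    {"op": "delete", "path": "vlans/vlan[vlan-id=200]"},
]
print(build_set_request(changes))
```
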
---

**Status**: 🚧 Active Development - Phase 2 (InfraHub Setup & Core Reconciler)