docs: migrate README from Kestra to Prefect orchestration
- Replace Kestra references with Prefect
- Update architecture diagram for Python-native flows
- Add Kestra vs Prefect comparison table
- Update project structure for Prefect flows
- Update technology stack table
- Add Prefect flow example with @flow/@task decorators
- Update Getting Started section for Prefect
- Update references and documentation links
290
README.md
@@ -2,80 +2,90 @@
|
||||
|
||||
**Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics**
|
||||
|
||||
A workflow-based orchestration system that uses NetBox as Source of Truth, [Kestra](https://kestra.io) for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.
|
||||
A workflow-based orchestration system that uses NetBox as Source of Truth, [Prefect](https://prefect.io) for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.
|
||||
|
||||
## 🎯 Project Vision
|
||||
|
||||
Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:
|
||||
|
||||
- **Intent** is defined in NetBox (Custom Fields, Native Models, BGP Plugin)
|
||||
- **Orchestration** is handled by Kestra (declarative YAML workflows)
|
||||
- **Orchestration** is handled by Prefect (Python-native `@flow` and `@task` decorators)
|
||||
- **State** is continuously monitored via gNMI Subscribe
|
||||
- **Changes** are computed as diffs and applied atomically via gNMI Set
|
||||
- **Drift** is detected and optionally auto-remediated
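
As a rough illustration of the "computed as diffs" step in the list above, the sketch below assumes that both the NetBox intent and the device state have already been flattened into `path -> value` dictionaries. That flattening, the function name `diff_paths`, and the sample paths are assumptions made for illustration; the project's actual diff engine is not shown in this README.

```python
from typing import Any


def diff_paths(want: dict[str, Any], have: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the operations needed to move `have` towards `want`.

    Hypothetical helper: paths only in `want` become "update" operations,
    paths whose value differs become "replace", and paths only in `have`
    become "delete".
    """
    ops: list[dict[str, Any]] = []
    for path in sorted(want):
        if path not in have:
            ops.append({"op": "update", "path": path, "value": want[path]})
        elif have[path] != want[path]:
            ops.append({"op": "replace", "path": path, "value": want[path]})
    for path in sorted(set(have) - set(want)):
        ops.append({"op": "delete", "path": path})
    return ops


# Illustrative data only; these values do not come from a real device.
want = {
    "/interfaces/interface[name=Ethernet1]/config/description": "to-spine1",
    "/interfaces/interface[name=Ethernet1]/config/mtu": 9214,
}
have = {
    "/interfaces/interface[name=Ethernet1]/config/description": "to-spine1",
    "/interfaces/interface[name=Ethernet1]/config/mtu": 1500,
    "/interfaces/interface[name=Ethernet2]/config/description": "stale",
}
changes = diff_paths(want, have)
```

An empty result corresponds to the "in sync" case; a non-empty list is the plan that would be sent to the device in a single gNMI Set request.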
|
||||
|
||||
Think `terraform plan` and `terraform apply`, but for your network fabric — powered by Kestra workflows.
|
||||
Think `terraform plan` and `terraform apply`, but for your network fabric — powered by Prefect flows.
|
||||
|
||||
## 🏗️ Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
┌──────────────────────────────────────────────────────────────────────────────┐
|
||||
│ INTENT LAYER │
|
||||
│ ┌─────────────────┐ ┌──────────────────────┐ ┌────────────────────┐ │
|
||||
│ ┌─────────────────────┐ ┌──────────────────────────┐ ┌──────────────────────┐ │
|
||||
│ │ NetBox │ │ Custom Fields / │ │ netbox-bgp │ │
|
||||
│ │ (SoT) │◄───│ Native Models │◄───│ Plugin │ │
|
||||
│ └────────┬────────┘ └──────────────────────┘ └────────────────────┘ │
|
||||
└───────────┼─────────────────────────────────────────────────────────────────┘
|
||||
│ └──────────┬──────────┘ └──────────────────────────┘ └──────────────────────┘ │
|
||||
└─────────────┼────────────────────────────────────────────────────────────────────────┘
|
||||
│ Webhook / Polling
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ ORCHESTRATION LAYER (KESTRA) │
|
||||
┌──────────────────────────────────────────────────────────────────────────────┐
|
||||
│ ORCHESTRATION LAYER (PREFECT) │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Kestra Workflows (YAML) │ │
|
||||
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │ │
|
||||
│ │ │ fabric-reconcile│ │ drift-detection │ │ netbox-webhook-handler │ │ │
|
||||
│ │ │ (plan/apply) │ │ (subscribe) │ │ (event trigger) │ │ │
|
||||
│ │ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │ │
|
||||
│ │ Prefect Flows (Python) │ │
|
||||
│ │ ┌───────────────────┐ ┌───────────────────┐ ┌─────────────────────┐ │ │
|
||||
│ │ │ fabric_reconcile │ │ handle_drift │ │ drift_remediation │ │ │
|
||||
│ │ │ (plan/apply) │ │ (subscribe) │ │ (auto-fix) │ │ │
|
||||
│ │ └───────────────────┘ └───────────────────┘ └─────────────────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Python Tasks (containerized) │ │
|
||||
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────────┐ │ │
|
||||
│ │ │ Intent Parser │ │ Diff Engine │ │ gNMI Client (Get/Set) │ │ │
|
||||
│ │ Prefect Tasks (Python) │ │
|
||||
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌───────────────────────┐ │ │
|
||||
│ │ │ Intent Parser │ │ Diff Engine │ │ gNMI Client │ │ │
|
||||
│ │ │ (NetBox→YANG) │ │ (Want vs Have) │ │ (pygnmi wrapper) │ │ │
|
||||
│ │ └───────────────┘ └───────────────┘ └───────────────────────────┘ │ │
|
||||
│ │ └─────────────────┘ └─────────────────┘ └───────────────────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
||||
│ │ Triggers: Webhook (NetBox) │ Schedule (cron) │ Flow (event-driven) ││
|
||||
│ └─────────────────────────────────────────────────────────────────────────┘│
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│ ┌────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ FastAPI Webhook Receiver │ Prefect .serve() │ Prefect Server (UI) │ │
|
||||
│ └────────────────────────────────────────────────────────────────────────┘ │
|
||||
└──────────────────────────────────────────────────────────────────────────────┘
|
||||
│ gNMI Get/Set/Subscribe
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
┌──────────────────────────────────────────────────────────────────────────────┐
|
||||
│ DEVICE LAYER │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ spine1 │ │ spine2 │ │ leaf1 │ │ leaf2 │ ... │
|
||||
│ │ gNMI:6030 │ │ gNMI:6030 │ │ gNMI:6030 │ │ gNMI:6030 │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
└──────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 🎛 Why Kestra?
|
||||
## 🎛 Why Prefect?
|
||||
|
||||
We chose [Kestra](https://kestra.io) as the orchestration engine for several reasons:
|
||||
We chose [Prefect](https://prefect.io) as the orchestration engine for several reasons:
|
||||
|
||||
| Feature | Benefit |
|
||||
|---------|---------|
|
||||
| **Declarative YAML workflows** | Infrastructure-as-Code for orchestration logic |
|
||||
| **Built-in UI** | Dashboard, logs, metrics, execution history — no custom development |
|
||||
| **Native webhooks** | Direct NetBox integration without custom FastAPI server |
|
||||
| **Event-driven triggers** | Schedule, webhook, flow triggers out of the box |
|
||||
| **Python task support** | Run containerized Python scripts with dependencies |
|
||||
| **DAG support** | Automatic dependency ordering with `io.kestra.core.tasks.flows.Dag` |
|
||||
| **Retry & error handling** | Built-in retry policies and error notifications |
|
||||
| **Secrets management** | Native secrets storage for credentials |
|
||||
| **Python-native workflows** | Use `@flow` and `@task` decorators — no YAML, just Python |
|
||||
| **Free secrets management** | Native `Secret` blocks for credentials (free in OSS) |
|
||||
| **Built-in UI** | Dashboard, logs, metrics, execution history via `prefect server start` |
|
||||
| **No containerization required** | Run flows directly with `.serve()` — no Docker needed |
|
||||
| **Event-driven triggers** | Schedule, webhooks (via FastAPI), flow triggers out of the box |
|
||||
| **Task dependencies** | Automatic dependency ordering via task result passing or `wait_for` |
|
||||
| **Retry & error handling** | Built-in retry policies with `@task(retries=3)` |
|
||||
| **Human-in-the-loop** | Native `pause_flow_run()` for approval workflows |
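
To make the "Task dependencies" row above more concrete, here is a minimal sketch of dependency ordering using the standard library's `graphlib`. It does not use Prefect (which derives ordering from how task results are passed) and is not the project's implementation; the resource names and the `DEPENDS_ON` graph are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key must be configured after
# every item in its set.
DEPENDS_ON: dict[str, set[str]] = {
    "vrf": set(),
    "vlan": set(),
    "vxlan_vni": {"vlan", "vrf"},
    "bgp_evpn": {"vxlan_vni"},
}


def apply_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the resources can be applied.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


order = apply_order(DEPENDS_ON)
```

Several orders can be valid (for example, `vrf` and `vlan` may appear in either order), so callers should rely only on the relative position of dependent items.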
|
||||
|
||||
### Kestra vs Prefect Comparison
|
||||
|
||||
| Aspect | Kestra | Prefect |
|--------|--------|---------|
| **Workflow definition** | External YAML files | Python code (`@flow`, `@task`) |
| **Secrets management** | ❌ Paid feature, not in OSS | ✅ Free (`Secret` blocks) |
| **Code packaging** | Container required | Not required |
| **Integration** | Isolated scripts | Native Python integration |
| **Webhooks** | Built-in triggers | FastAPI + `run_deployment()` |
|
||||
|
||||
## 🎯 Target Fabric
|
||||
|
||||
@@ -97,7 +107,7 @@ Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/
|
||||
| **Phase 1** | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | [phase-1-yang-discovery](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=1) |
|
||||
| **Phase 2** | Core Components - NetBox client, diff engine, gNMI operations | [phase-2-minimal-reconciler](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=2) |
|
||||
| **Phase 3** | Full Fabric - BGP, MLAG, VRFs, YANG mappers | [phase-3-full-fabric](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=3) |
|
||||
| **Phase 4** | Kestra Integration - Workflows, webhooks, drift detection | [phase-4-kestra](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=4) |
|
||||
| **Phase 4** | Prefect Integration - Flows, webhooks, drift detection | [phase-4-event-driven](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=4) |
|
||||
|
||||
📌 **Project Board**: [View Kanban](https://gitea.arnodo.fr/Damien/fabric-orchestrator/projects)
|
||||
|
||||
@@ -107,35 +117,40 @@ Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/
|
||||
fabric-orchestrator/
|
||||
├── README.md
|
||||
├── pyproject.toml
|
||||
├── docker-compose.yml # Kestra + PostgreSQL
|
||||
│
|
||||
├── kestra/ # Kestra workflows
|
||||
│ └── flows/
|
||||
│ ├── fabric-reconcile.yml # Main plan/apply workflow
|
||||
│ ├── netbox-webhook.yml # NetBox webhook handler
|
||||
│ ├── drift-detection.yml # Drift monitoring workflow
|
||||
│ └── device-config.yml # Per-device configuration
|
||||
│
|
||||
├── src/ # Python package (reusable code)
|
||||
├── src/ # Python package
|
||||
│ ├── __init__.py
|
||||
│ ├── cli.py # CLI for YANG discovery (discover commands)
|
||||
│ │
|
||||
│ ├── flows/ # Prefect flows
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── reconcile.py # @flow fabric_reconcile (plan/apply)
|
||||
│ │ ├── drift.py # @flow handle_drift
|
||||
│ │ └── remediation.py # @flow drift_remediation
|
||||
│ │
|
||||
│ ├── api/ # FastAPI webhook receiver
|
||||
│ │ ├── __init__.py
|
||||
│ │ └── webhooks.py # NetBox webhook endpoint
|
||||
│ │
|
||||
│ ├── services/ # Long-running services
|
||||
│ │ ├── __init__.py
|
||||
│ │ └── drift_monitor.py # gNMI Subscribe drift detection
|
||||
│ │
|
||||
│ ├── gnmi/
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── client.py # gNMI client wrapper (pygnmi)
|
||||
│ │ └── README.md
|
||||
│ │
|
||||
│ ├── netbox/
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── client.py # NetBox API client (pynetbox)
|
||||
│ │ └── models.py # Pydantic models for intent validation
|
||||
│ │
|
||||
│ └── yang/
|
||||
│ ├── __init__.py
|
||||
│ ├── mapper.py # NetBox intent → YANG paths
|
||||
│ └── paths.py # YANG path definitions
|
||||
│
|
||||
├── scripts/ # Scripts called by Kestra workflows
|
||||
│ ├── get_fabric_intent.py
|
||||
│ ├── diff_engine.py
|
||||
│ └── apply_changes.py
|
||||
│ ├── paths.py # YANG path definitions
|
||||
│ └── dependencies.py # Dependency ordering graph
|
||||
│
|
||||
├── tests/
|
||||
│
|
||||
@@ -150,7 +165,8 @@ fabric-orchestrator/
|
||||
| Component | Technology | Purpose |
|
||||
|-----------|------------|---------|
|
||||
| Source of Truth | NetBox + BGP Plugin | Intent definition via native models |
|
||||
| Orchestrator | **Kestra** | Declarative workflow orchestration |
|
||||
| Orchestrator | **Prefect** | Python-native workflow orchestration |
|
||||
| Webhooks | FastAPI | Receive NetBox webhooks |
|
||||
| Transport | gNMI | Configuration and telemetry |
|
||||
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
|
||||
| Python Library | pygnmi + pynetbox | gNMI/NetBox interactions |
|
||||
@@ -163,14 +179,16 @@ fabric-orchestrator/
|
||||
- [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab) - Target fabric topology
|
||||
- [projet-vxlan-automation](https://gitea.arnodo.fr/Damien/projet-vxlan-automation) - Previous NetBox RenderConfig work
|
||||
- [Arista YANG Models](https://github.com/aristanetworks/yang/tree/master/EOS-4.35.0F) - EOS 4.35.0F YANG definitions
|
||||
- [Kestra Documentation](https://kestra.io/docs) - Orchestration platform docs
|
||||
- [Prefect Documentation](https://docs.prefect.io) - Orchestration platform docs
|
||||
|
||||
## 📚 References
|
||||
|
||||
### Kestra
|
||||
- [Kestra Documentation](https://kestra.io/docs)
|
||||
- [Kestra Python Plugin](https://kestra.io/plugins/plugin-script-python)
|
||||
- [Kestra Webhook Triggers](https://kestra.io/docs/workflow-components/triggers/webhook-trigger)
|
||||
### Prefect
|
||||
- [Prefect Documentation](https://docs.prefect.io)
|
||||
- [Prefect Flows](https://docs.prefect.io/latest/develop/write-flows/)
|
||||
- [Prefect Tasks](https://docs.prefect.io/latest/develop/write-tasks/)
|
||||
- [Prefect Deployments](https://docs.prefect.io/latest/deploy/run-flows-in-local-processes/)
|
||||
- [Prefect Secrets](https://docs.prefect.io/latest/develop/blocks/#secret)
|
||||
|
||||
### YANG / gNMI
|
||||
- [Arista gNMI Documentation](https://aristanetworks.github.io/openmgmt/configuration/gnmi/)
|
||||
@@ -185,10 +203,10 @@ fabric-orchestrator/
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Docker and Docker Compose
|
||||
- Python 3.12+
|
||||
- `uv` package manager
|
||||
- Access to ContainerLab with cEOS images
|
||||
- NetBox instance with BGP plugin
|
||||
|
||||
### Quick Start
|
||||
|
||||
@@ -197,15 +215,24 @@ fabric-orchestrator/
|
||||
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
|
||||
cd fabric-orchestrator
|
||||
|
||||
# Start Kestra
|
||||
docker compose up -d
|
||||
|
||||
# Access Kestra UI
|
||||
open http://localhost:8080
|
||||
|
||||
# Install Python dependencies (for CLI tools)
|
||||
# Install Python dependencies
|
||||
uv sync
|
||||
|
||||
# Configure Prefect secrets
|
||||
python -c "
|
||||
from prefect.blocks.system import Secret
|
||||
from prefect.variables import Variable
|
||||
|
||||
Secret(value='your-netbox-token').save('netbox-token', overwrite=True)
|
||||
Secret(value='your-gnmi-password').save('gnmi-password', overwrite=True)
|
||||
|
||||
Variable.set('netbox_url', 'https://netbox.example.com')
|
||||
Variable.set('gnmi_username', 'admin')
|
||||
"
|
||||
|
||||
# Start Prefect server (optional, for UI)
|
||||
prefect server start
|
||||
|
||||
# Verify gNMI connectivity to your fabric
|
||||
uv run fabric-orch discover capabilities --target leaf1:6030
|
||||
|
||||
@@ -214,64 +241,105 @@ uv run fabric-orch discover get --target leaf1:6030 \
|
||||
--path "/interfaces/interface[name=Ethernet1]/state"
|
||||
```
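
The `--path` argument above uses the usual gNMI string form, where list keys appear in square brackets. As a minimal sketch of how such a string breaks down into path elements (not the project's or pygnmi's parser; `split_elems` and `parse_path` are hypothetical helpers), the code below splits on `/` only outside brackets, so key values such as `Ethernet1/1` survive intact.

```python
import re

# One path element: a name followed by zero or more [key=value] selectors.
_ELEM = re.compile(r"^(?P<name>[^\[\]]+)(?P<keys>(?:\[[^=\[\]]+=[^\[\]]*\])*)$")
_KEY = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def split_elems(path: str) -> list[str]:
    """Split a path on '/' characters that are not inside [...]."""
    elems: list[str] = []
    buf: list[str] = []
    depth = 0
    for ch in path.strip("/"):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            elems.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if buf:
        elems.append("".join(buf))
    return elems


def parse_path(path: str) -> list[tuple[str, dict[str, str]]]:
    """Parse a path string into (element name, keys) pairs."""
    parsed: list[tuple[str, dict[str, str]]] = []
    for raw in split_elems(path):
        match = _ELEM.match(raw)
        if match is None:
            raise ValueError(f"malformed path element: {raw!r}")
        parsed.append((match["name"], dict(_KEY.findall(match["keys"]))))
    return parsed


elems = parse_path("/interfaces/interface[name=Ethernet1]/state")
```

This sketch ignores escaping rules and origin prefixes, so it should be read as an illustration of the path structure rather than a complete parser.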
|
||||
|
||||
### Kestra Workflow Example
|
||||
### Running Flows
|
||||
|
||||
```yaml
|
||||
id: fabric-reconcile
|
||||
namespace: network.fabric
|
||||
description: Reconcile fabric state with NetBox intent
|
||||
```python
|
||||
from src.flows.reconcile import fabric_reconcile
|
||||
|
||||
inputs:
|
||||
- id: device
|
||||
type: STRING
|
||||
required: false
|
||||
- id: auto_apply
|
||||
type: BOOLEAN
|
||||
defaults: false
|
||||
# Plan only (dry-run)
|
||||
result = fabric_reconcile(dry_run=True)
|
||||
|
||||
tasks:
|
||||
- id: get_intent
|
||||
type: io.kestra.plugin.scripts.python.Script
|
||||
containerImage: ghcr.io/damien/fabric-orchestrator:latest
|
||||
script: |
|
||||
from kestra import Kestra
|
||||
# Plan for a specific device
|
||||
result = fabric_reconcile(device="leaf1", dry_run=True)
|
||||
|
||||
# Apply changes automatically
|
||||
result = fabric_reconcile(auto_apply=True, dry_run=False)
|
||||
```
|
||||
|
||||
### Deploying with Scheduling
|
||||
|
||||
```bash
|
||||
# Serve the flow on its cron schedule (every 6 hours); this process must stay running
|
||||
python -m src.flows.reconcile
|
||||
|
||||
# Or trigger an immediate run of the served deployment via the Prefect CLI
|
||||
prefect deployment run fabric-reconcile/fabric-reconcile-scheduled
|
||||
```
|
||||
|
||||
### Starting the Webhook Receiver
|
||||
|
||||
```bash
|
||||
# Start FastAPI webhook server
|
||||
uvicorn src.api.webhooks:app --host 0.0.0.0 --port 8000
|
||||
```
|
||||
|
||||
## Prefect Flow Example
|
||||
|
||||
```python
|
||||
from prefect import flow, task
|
||||
from prefect.blocks.system import Secret
|
||||
from prefect.variables import Variable
|
||||
|
||||
|
||||
@task(retries=2, retry_delay_seconds=10)
|
||||
def get_fabric_intent(device: str | None = None) -> dict:
|
||||
"""Retrieve fabric intent from NetBox."""
|
||||
from src.netbox import FabricNetBoxClient
|
||||
|
||||
client = FabricNetBoxClient()
|
||||
intent = client.get_fabric_intent()
|
||||
Kestra.outputs({"intent": intent.model_dump()})
|
||||
netbox_url = Variable.get("netbox_url")
|
||||
netbox_token = Secret.load("netbox-token").get()
|
||||
|
||||
- id: compute_diff
|
||||
type: io.kestra.plugin.scripts.python.Script
|
||||
containerImage: ghcr.io/damien/fabric-orchestrator:latest
|
||||
script: |
|
||||
from kestra import Kestra
|
||||
# Compute diff between intent and current state
|
||||
Kestra.outputs({"changes": changes, "has_changes": len(changes) > 0})
|
||||
client = FabricNetBoxClient(url=netbox_url, token=netbox_token)
|
||||
return client.get_fabric_intent() if not device else client.get_device_intent(device)
|
||||
|
||||
- id: apply_changes
|
||||
type: io.kestra.plugin.scripts.python.Script
|
||||
runIf: "{{ outputs.compute_diff.vars.has_changes and inputs.auto_apply }}"
|
||||
containerImage: ghcr.io/damien/fabric-orchestrator:latest
|
||||
script: |
|
||||
from src.gnmi import GNMIClient
|
||||
# Apply changes via gNMI Set
|
||||
|
||||
triggers:
|
||||
- id: netbox_webhook
|
||||
type: io.kestra.plugin.core.trigger.Webhook
|
||||
key: "{{ secret('NETBOX_WEBHOOK_KEY') }}"
|
||||
@task
|
||||
def compute_diff(intent: dict, current: dict) -> list[dict]:
|
||||
"""Compute diff between desired and current state."""
|
||||
from src.reconciler.diff import compute_diff as diff_engine
|
||||
return diff_engine(want=intent, have=current)
|
||||
|
||||
- id: schedule
|
||||
type: io.kestra.plugin.core.trigger.Schedule
|
||||
cron: "0 */6 * * *"
|
||||
|
||||
errors:
|
||||
- id: notify_failure
|
||||
type: io.kestra.plugin.notifications.slack.SlackExecution
|
||||
url: "{{ secret('SLACK_WEBHOOK') }}"
|
||||
@task(retries=1)
|
||||
def apply_changes(changes: list[dict], dry_run: bool = True) -> dict:
|
||||
"""Apply changes via gNMI Set."""
|
||||
if dry_run:
|
||||
return {"applied": False, "changes": changes}
|
||||
# Apply via gNMI...
|
||||
return {"applied": True, "changes": changes}
|
||||
|
||||
|
||||
@flow(log_prints=True, name="fabric-reconcile")
|
||||
def fabric_reconcile(
|
||||
device: str | None = None,
|
||||
auto_apply: bool = False,
|
||||
dry_run: bool = True
|
||||
) -> dict:
|
||||
"""Reconcile fabric state with NetBox intent."""
|
||||
    print("🔄 Starting fabric reconciliation")
|
||||
|
||||
intent = get_fabric_intent(device)
|
||||
    current = get_current_state(device)  # get_current_state is a task not shown in this example
|
||||
changes = compute_diff(intent, current)
|
||||
|
||||
if not changes:
|
||||
print("✅ No changes detected - fabric is in sync")
|
||||
return {"changes": [], "in_sync": True}
|
||||
|
||||
should_apply = auto_apply and not dry_run
|
||||
result = apply_changes(changes, dry_run=not should_apply)
|
||||
|
||||
return {"changes": changes, "applied": should_apply}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
fabric_reconcile.serve(
|
||||
name="fabric-reconcile-scheduled",
|
||||
cron="0 */6 * * *",
|
||||
tags=["network", "fabric"]
|
||||
)
|
||||
```
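
One detail of the flow above is easy to miss: changes are pushed only when `auto_apply` is true and `dry_run` is false. The short sketch below isolates that condition in a hypothetical helper, `will_apply`, with the same defaults as `fabric_reconcile`, and enumerates all four combinations.

```python
from itertools import product


def will_apply(auto_apply: bool = False, dry_run: bool = True) -> bool:
    """Mirror the flow's `should_apply = auto_apply and not dry_run`."""
    return auto_apply and not dry_run


# Map (auto_apply, dry_run) -> whether changes would be applied.
table = {
    (auto_apply, dry_run): will_apply(auto_apply, dry_run)
    for auto_apply, dry_run in product([False, True], repeat=2)
}

for (auto_apply, dry_run), applied in table.items():
    print(f"auto_apply={auto_apply!s:<5} dry_run={dry_run!s:<5} -> applied={applied}")
```

Because `dry_run` defaults to `True`, calling the flow with only `auto_apply=True` still produces a plan without applying it; both arguments must be set explicitly to push changes.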
|
||||
|
||||
---
|
||||
|
||||
**Status**: 🚧 Active Development - Migrating to Kestra orchestration (Phase 4)
|
||||
**Status**: 🚧 Active Development - Migrating to Prefect orchestration (Phase 4)
|
||||
|
||||