docs: update README with Kestra orchestration architecture
- Replace custom Python asyncio/Redis architecture with Kestra
- Update architecture diagram to show Kestra workflow layer
- Add "Why Kestra?" section explaining the choice
- Update project structure for Kestra workflows
- Update technology stack table
- Add Kestra workflow example
- Update references with Kestra documentation
235
README.md
@@ -2,59 +2,82 @@
|
|||||||
|
|
||||||
**Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics**
|
**Declarative Network Infrastructure Management for Arista EVPN-VXLAN Fabrics**
|
||||||
|
|
||||||
A Terraform-like orchestration system that uses NetBox as Source of Truth and gNMI/YANG for atomic configuration management of Arista data center fabrics.
|
A workflow-based orchestration system that uses NetBox as Source of Truth, [Kestra](https://kestra.io) for orchestration, and gNMI/YANG for atomic configuration management of Arista data center fabrics.
|
||||||
|
|
||||||
## 🎯 Project Vision
|
## 🎯 Project Vision
|
||||||
|
|
||||||
Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:
|
Transform network infrastructure management from imperative scripting to true declarative infrastructure-as-code, where:
|
||||||
|
|
||||||
- **Intent** is defined in NetBox (ConfigContexts, Custom Fields)
|
- **Intent** is defined in NetBox (Custom Fields, Native Models, BGP Plugin)
|
||||||
|
- **Orchestration** is handled by Kestra (declarative YAML workflows)
|
||||||
- **State** is continuously monitored via gNMI Subscribe
|
- **State** is continuously monitored via gNMI Subscribe
|
||||||
- **Changes** are computed as diffs and applied atomically via gNMI Set
|
- **Changes** are computed as diffs and applied atomically via gNMI Set
|
||||||
- **Drift** is detected and optionally auto-remediated
|
- **Drift** is detected and optionally auto-remediated
|
||||||
|
|
||||||
Think `terraform plan` and `terraform apply`, but for your network fabric.
|
Think `terraform plan` and `terraform apply`, but for your network fabric — powered by Kestra workflows.
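
The "plan" half of that analogy can be sketched in a few lines of Python. This is a minimal illustration, not the project's diff engine: the `Change` shape, the `plan` function, and the path strings are all hypothetical, and real intent and state would come from NetBox and gNMI Get rather than from literal dictionaries.

```python
from dataclasses import dataclass
from typing import Any

# Sentinel so that a path holding None is not confused with a missing path.
_MISSING = object()


@dataclass(frozen=True)
class Change:
    """One planned operation against a device (hypothetical shape)."""
    op: str            # "update" or "delete"
    path: str          # comparison key; illustrative, not a validated YANG path
    value: Any = None


def plan(want: dict[str, Any], have: dict[str, Any]) -> list[Change]:
    """Compare intended state with observed state, both keyed by path."""
    changes: list[Change] = []
    # Paths that are missing or hold a different value must be written.
    for path in sorted(want):
        if have.get(path, _MISSING) != want[path]:
            changes.append(Change("update", path, want[path]))
    # Paths present on the device but absent from intent are removed.
    for path in sorted(have.keys() - want.keys()):
        changes.append(Change("delete", path))
    return changes


if __name__ == "__main__":
    want = {"vlan[10]/name": "prod", "vlan[20]/name": "dev"}
    have = {"vlan[10]/name": "prod", "vlan[20]/name": "test", "vlan[30]/name": "old"}
    for change in plan(want, have):
        print(change)
```

An empty result means the device already matches intent, which is the condition under which an apply step would be skipped.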
|
||||||
|
|
||||||
## 🏗️ Architecture
|
## 🏗️ Architecture
|
||||||
|
|
||||||
```
|
```
|
||||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
│ INTENT LAYER │
|
│ INTENT LAYER │
|
||||||
│ ┌─────────────┐ ┌──────────────────┐ ┌─────────────────────────────┐ │
|
│ ┌─────────────────┐ ┌──────────────────────┐ ┌────────────────────┐ │
|
||||||
│ │ NetBox │ │ ConfigContexts │ │ Custom Fields / Tags │ │
|
│ │ NetBox │ │ Custom Fields / │ │ netbox-bgp │ │
|
||||||
│ │ (SoT) │◄──►│ (Structured │◄──►│ (VLAN, VNI, VRF, BGP AS) │ │
|
│ │ (SoT) │◄───│ Native Models │◄───│ Plugin │ │
|
||||||
│ │ │ │ Intent Data) │ │ │ │
|
│ └────────┬────────┘ └──────────────────────┘ └────────────────────┘ │
|
||||||
│ └──────┬──────┘ └──────────────────┘ └─────────────────────────────┘ │
|
└───────────┼─────────────────────────────────────────────────────────────────┘
|
||||||
└─────────┼───────────────────────────────────────────────────────────────────┘
|
|
||||||
│ Webhook / Polling
|
│ Webhook / Polling
|
||||||
▼
|
▼
|
||||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
│ ORCHESTRATION LAYER │
|
│ ORCHESTRATION LAYER (KESTRA) │
|
||||||
|
│ │
|
||||||
|
│ ┌────────────────────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ Kestra Workflows (YAML) │ │
|
||||||
|
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │ │
|
||||||
|
│ │ │ fabric-reconcile│ │ drift-detection │ │ netbox-webhook-handler │ │ │
|
||||||
|
│ │ │ (plan/apply) │ │ (subscribe) │ │ (event trigger) │ │ │
|
||||||
|
│ │ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │ │
|
||||||
|
│ └────────────────────────────────────────────────────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌────────────────────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ Python Tasks (containerized) │ │
|
||||||
|
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────────┐ │ │
|
||||||
|
│ │ │ Intent Parser │ │ Diff Engine │ │ gNMI Client (Get/Set) │ │ │
|
||||||
|
│ │ │ (NetBox→YANG) │ │ (Want vs Have)│ │ (pygnmi wrapper) │ │ │
|
||||||
|
│ │ └───────────────┘ └───────────────┘ └───────────────────────────┘ │ │
|
||||||
|
│ └────────────────────────────────────────────────────────────────────────┘ │
|
||||||
|
│ │
|
||||||
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
||||||
│ │ State Reconciliation Engine ││
|
│ │ Triggers: Webhook (NetBox) │ Schedule (cron) │ Flow (event-driven) ││
|
||||||
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────────┐ ││
|
|
||||||
│ │ │ Intent Parser │ │ Diff Engine │ │ Transaction Planner │ ││
|
|
||||||
│ │ │ (NetBox→YANG) │──►│ (Want vs Have)│──►│ (Ordered gNMI SetReqs) │ ││
|
|
||||||
│ │ └───────────────┘ └───────────────┘ └───────────────────────────┘ ││
|
|
||||||
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
||||||
│ │ │
|
|
||||||
│ ┌─────────────────────────────────┼───────────────────────────────────────┐│
|
|
||||||
│ │ Event Bus (Redis / NATS) ││
|
|
||||||
│ │ • config_drift_detected • intent_changed • apply_complete ││
|
|
||||||
│ └─────────────────────────────────────────────────────────────────────────┘│
|
│ └─────────────────────────────────────────────────────────────────────────┘│
|
||||||
└─────────────────────────────────────────────────────────────────────────────┘
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
│ gNMI Subscribe (Telemetry) │ gNMI Set (Config)
|
│ gNMI Get/Set/Subscribe
|
||||||
▼ ▼
|
▼
|
||||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
│ DEVICE LAYER │
|
│ DEVICE LAYER │
|
||||||
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
|
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||||
│ │ spine1 │ │ spine2 │ │ leaf1 │ │ leaf2 │ ... │
|
│ │ spine1 │ │ spine2 │ │ leaf1 │ │ leaf2 │ ... │
|
||||||
│ │ gNMI:6030 │ │ gNMI:6030 │ │ gNMI:6030 │ │ gNMI:6030 │ │
|
│ │ gNMI:6030 │ │ gNMI:6030 │ │ gNMI:6030 │ │ gNMI:6030 │ │
|
||||||
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
|
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||||
└─────────────────────────────────────────────────────────────────────────────┘
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
## 🔧 Target Fabric
|
## 🎛 Why Kestra?
|
||||||
|
|
||||||
We chose [Kestra](https://kestra.io) as the orchestration engine for several reasons:

| Feature | Benefit |
|---------|---------|
| **Declarative YAML workflows** | Infrastructure-as-Code for orchestration logic |
| **Built-in UI** | Dashboard, logs, metrics, and execution history with no custom development |
| **Native webhooks** | Direct NetBox integration without custom FastAPI server |
| **Event-driven triggers** | Schedule, webhook, flow triggers out of the box |
| **Python task support** | Run containerized Python scripts with dependencies |
| **DAG support** | Automatic dependency ordering with `io.kestra.core.tasks.flows.Dag` |
| **Retry & error handling** | Built-in retry policies and error notifications |
| **Secrets management** | Native secrets storage for credentials |
|
|
||||||
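
To make the "automatic dependency ordering" row concrete, here is a small Python sketch of what ordering a DAG means. It uses the standard library's `graphlib` and is not how Kestra implements its DAG task. The step names and the `DEPENDS_ON` mapping are hypothetical examples of fabric configuration stages.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key lists the steps that must finish first.
DEPENDS_ON: dict[str, set[str]] = {
    "vlan": set(),
    "vrf": set(),
    "vxlan_vni": {"vlan", "vrf"},
    "bgp_evpn": {"vxlan_vni"},
}


def execution_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every step follows its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(execution_order(DEPENDS_ON))
```

Steps with no dependency between them (here `vlan` and `vrf`) may appear in either order, which is also why a DAG runner can execute them in parallel.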
|
## 🎯 Target Fabric
|
||||||
|
|
||||||
This project is designed for the Arista EVPN-VXLAN ContainerLab topology:
|
This project is designed for the Arista EVPN-VXLAN ContainerLab topology:
|
||||||
|
|
||||||
@@ -70,11 +93,11 @@ Reference: [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-v
|
|||||||
Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues) or filter by phase:
|
Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues) or filter by phase:
|
||||||
|
|
||||||
| Phase | Description | Issues |
|
| Phase | Description | Issues |
|
||||||
| ----------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
|
|-------|-------------|--------|
|
||||||
| **Phase 1** | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | [phase-1-yang-discovery](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=1) |
|
| **Phase 1** | YANG Path Discovery - Map EOS 4.35.0F YANG models, validate gNMI | [phase-1-yang-discovery](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=1) |
|
||||||
| **Phase 2** | Minimal Reconciler - VLANs/VNIs, diff engine, CLI plan/apply | [phase-2-minimal-reconciler](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=2) |
|
| **Phase 2** | Core Components - NetBox client, diff engine, gNMI operations | [phase-2-minimal-reconciler](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=2) |
|
||||||
| **Phase 3** | Full Fabric - BGP, MLAG, VRFs, dependency ordering | [phase-3-full-fabric](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=3) |
|
| **Phase 3** | Full Fabric - BGP, MLAG, VRFs, YANG mappers | [phase-3-full-fabric](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=3) |
|
||||||
| **Phase 4** | Event-Driven - gNMI Subscribe, drift detection, webhooks | [phase-4-event-driven](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=4) |
|
| **Phase 4** | Kestra Integration - Workflows, webhooks, drift detection | [phase-4-kestra](https://gitea.arnodo.fr/Damien/fabric-orchestrator/issues?type=all&state=all&labels=4) |
|
||||||
|
|
||||||
📌 **Project Board**: [View Kanban](https://gitea.arnodo.fr/Damien/fabric-orchestrator/projects)
|
📌 **Project Board**: [View Kanban](https://gitea.arnodo.fr/Damien/fabric-orchestrator/projects)
|
||||||
|
|
||||||
@@ -84,46 +107,55 @@ Progress is tracked via issues. See [all issues](https://gitea.arnodo.fr/Damien/
|
|||||||
fabric-orchestrator/
|
fabric-orchestrator/
|
||||||
├── README.md
|
├── README.md
|
||||||
├── pyproject.toml
|
├── pyproject.toml
|
||||||
├── docker-compose.yml # Redis, API server
|
├── docker-compose.yml # Kestra + PostgreSQL
|
||||||
├── src/
|
│
|
||||||
|
├── kestra/ # Kestra workflows
|
||||||
|
│ └── flows/
|
||||||
|
│ ├── fabric-reconcile.yml # Main plan/apply workflow
|
||||||
|
│ ├── netbox-webhook.yml # NetBox webhook handler
|
||||||
|
│ ├── drift-detection.yml # Drift monitoring workflow
|
||||||
|
│ └── device-config.yml # Per-device configuration
|
||||||
|
│
|
||||||
|
├── src/ # Python package (reusable code)
|
||||||
│ ├── __init__.py
|
│ ├── __init__.py
|
||||||
│ ├── cli.py # CLI interface (plan, apply, drift)
|
│ ├── cli.py # CLI for YANG discovery (discover commands)
|
||||||
│ ├── api.py # FastAPI server for webhooks
|
|
||||||
│ ├── reconciler/
|
|
||||||
│ │ ├── engine.py # Core reconciliation logic
|
|
||||||
│ │ ├── diff.py # State comparison
|
|
||||||
│ │ └── planner.py # Change ordering/dependencies
|
|
||||||
│ ├── yang/
|
|
||||||
│ │ ├── mapper.py # NetBox intent → YANG paths
|
|
||||||
│ │ ├── paths.py # YANG path definitions
|
|
||||||
│ │ └── validators.py # Schema validation
|
|
||||||
│ ├── gnmi/
|
│ ├── gnmi/
|
||||||
│ │ ├── client.py # gNMI client wrapper
|
│ │ ├── __init__.py
|
||||||
│ │ └── transactions.py # Atomic operations
|
│ │ ├── client.py # gNMI client wrapper (pygnmi)
|
||||||
|
│ │ └── README.md
|
||||||
│ ├── netbox/
|
│ ├── netbox/
|
||||||
│ │ ├── client.py # NetBox API client
|
│ │ ├── __init__.py
|
||||||
│ │ └── models.py # Intent data models
|
│ │ ├── client.py # NetBox API client (pynetbox)
|
||||||
│ └── events/
|
│ │ └── models.py # Pydantic models for intent validation
|
||||||
│ ├── handlers.py # Event handlers
|
│ └── yang/
|
||||||
│ └── bus.py # Event bus (Redis)
|
│ ├── __init__.py
|
||||||
|
│ ├── mapper.py # NetBox intent → YANG paths
|
||||||
|
│ └── paths.py # YANG path definitions
|
||||||
|
│
|
||||||
|
├── scripts/ # Scripts called by Kestra workflows
|
||||||
|
│ ├── get_fabric_intent.py
|
||||||
|
│ ├── diff_engine.py
|
||||||
|
│ └── apply_changes.py
|
||||||
|
│
|
||||||
├── tests/
|
├── tests/
|
||||||
|
│
|
||||||
└── docs/
|
└── docs/
|
||||||
├── architecture.md
|
├── cli-user-guide.md # CLI documentation
|
||||||
├── yang-paths.md # Documented YANG paths
|
├── yang-paths.md # Documented YANG paths
|
||||||
└── netbox-schema.md # ConfigContext schema
|
└── netbox-data-model.md # NetBox schema documentation
|
||||||
```
|
```
|
||||||
|
|
||||||
## 🛠️ Technology Stack
|
## 🛠️ Technology Stack
|
||||||
|
|
||||||
| Component | Technology | Purpose |
|
| Component | Technology | Purpose |
|
||||||
| --------------- | -------------------------- | ------------------------------------ |
|
|-----------|------------|---------|
|
||||||
| Source of Truth | NetBox | Intent definition via ConfigContexts |
|
| Source of Truth | NetBox + BGP Plugin | Intent definition via native models |
|
||||||
|
| Orchestrator | **Kestra** | Declarative workflow orchestration |
|
||||||
| Transport | gNMI | Configuration and telemetry |
|
| Transport | gNMI | Configuration and telemetry |
|
||||||
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
|
| Data Models | YANG (OpenConfig + Arista) | Structured configuration |
|
||||||
| Orchestrator | Python (asyncio) | Reconciliation engine |
|
| Python Library | pygnmi + pynetbox | gNMI/NetBox interactions |
|
||||||
| CLI | Click + Rich | User interface |
|
| CLI | Click + Rich | YANG discovery tools |
|
||||||
| API | FastAPI | Webhook receiver |
|
| Validation | Pydantic v2 | Intent data validation |
|
||||||
| Event Bus | Redis | Async event handling |
|
|
||||||
| Lab | ContainerLab + cEOS | Development environment |
|
| Lab | ContainerLab + cEOS | Development environment |
|
||||||
|
|
||||||
## 🔗 Related Projects
|
## 🔗 Related Projects
|
||||||
@@ -131,9 +163,15 @@ fabric-orchestrator/
|
|||||||
- [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab) - Target fabric topology
|
- [arista-evpn-vxlan-clab](https://gitea.arnodo.fr/Damien/arista-evpn-vxlan-clab) - Target fabric topology
|
||||||
- [projet-vxlan-automation](https://gitea.arnodo.fr/Damien/projet-vxlan-automation) - Previous NetBox RenderConfig work
|
- [projet-vxlan-automation](https://gitea.arnodo.fr/Damien/projet-vxlan-automation) - Previous NetBox RenderConfig work
|
||||||
- [Arista YANG Models](https://github.com/aristanetworks/yang/tree/master/EOS-4.35.0F) - EOS 4.35.0F YANG definitions
|
- [Arista YANG Models](https://github.com/aristanetworks/yang/tree/master/EOS-4.35.0F) - EOS 4.35.0F YANG definitions
|
||||||
|
- [Kestra Documentation](https://kestra.io/docs) - Orchestration platform docs
|
||||||
|
|
||||||
## 📚 References
|
## 📚 References
|
||||||
|
|
||||||
|
### Kestra
|
||||||
|
- [Kestra Documentation](https://kestra.io/docs)
|
||||||
|
- [Kestra Python Plugin](https://kestra.io/plugins/plugin-script-python)
|
||||||
|
- [Kestra Webhook Triggers](https://kestra.io/docs/workflow-components/triggers/webhook-trigger)
|
||||||
|
|
||||||
### YANG / gNMI
|
### YANG / gNMI
|
||||||
- [Arista gNMI Documentation](https://aristanetworks.github.io/openmgmt/configuration/gnmi/)
|
- [Arista gNMI Documentation](https://aristanetworks.github.io/openmgmt/configuration/gnmi/)
|
||||||
- [OpenConfig Models](https://github.com/openconfig/public)
|
- [OpenConfig Models](https://github.com/openconfig/public)
|
||||||
@@ -145,26 +183,95 @@ fabric-orchestrator/
|
|||||||
|
|
||||||
## 🚀 Getting Started
|
## 🚀 Getting Started
|
||||||
|
|
||||||
*Coming in Phase 1*
|
### Prerequisites
|
||||||
|
|
||||||
|
- Docker and Docker Compose
|
||||||
|
- Python 3.12+
|
||||||
|
- `uv` package manager
|
||||||
|
- Access to ContainerLab with cEOS images
|
||||||
|
|
||||||
|
### Quick Start
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Clone the repository
|
# Clone the repository
|
||||||
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
|
git clone https://gitea.arnodo.fr/Damien/fabric-orchestrator.git
|
||||||
cd fabric-orchestrator
|
cd fabric-orchestrator
|
||||||
|
|
||||||
# Install dependencies
|
# Start Kestra
|
||||||
|
docker compose up -d
|
||||||
|
|
||||||
|
# Access Kestra UI
|
||||||
|
open http://localhost:8080
|
||||||
|
|
||||||
|
# Install Python dependencies (for CLI tools)
|
||||||
uv sync
|
uv sync
|
||||||
|
|
||||||
# Verify gNMI connectivity to your fabric
|
# Verify gNMI connectivity to your fabric
|
||||||
fabric-orch discover --target leaf1:6030
|
uv run fabric-orch discover capabilities --target leaf1:6030
|
||||||
|
|
||||||
# Generate execution plan
|
# Explore YANG paths
|
||||||
fabric-orch plan
|
uv run fabric-orch discover get --target leaf1:6030 \
|
||||||
|
--path "/interfaces/interface[name=Ethernet1]/state"
|
||||||
|
```
|
||||||
|
|
||||||
# Apply changes
|
### Kestra Workflow Example
|
||||||
fabric-orch apply
|
|
||||||
|
```yaml
id: fabric-reconcile
namespace: network.fabric
description: Reconcile fabric state with NetBox intent

inputs:
  - id: device
    type: STRING
    required: false
  - id: auto_apply
    type: BOOLEAN
    defaults: false

tasks:
  - id: get_intent
    type: io.kestra.plugin.scripts.python.Script
    containerImage: ghcr.io/damien/fabric-orchestrator:latest
    script: |
      from kestra import Kestra
      from src.netbox import FabricNetBoxClient

      client = FabricNetBoxClient()
      intent = client.get_fabric_intent()
      Kestra.outputs({"intent": intent.model_dump()})

  - id: compute_diff
    type: io.kestra.plugin.scripts.python.Script
    containerImage: ghcr.io/damien/fabric-orchestrator:latest
    script: |
      from kestra import Kestra

      # Compute diff between intent and current state.
      # Placeholder: an empty list stands in for the real diff engine output.
      changes = []
      Kestra.outputs({"changes": changes, "has_changes": len(changes) > 0})

  - id: apply_changes
    type: io.kestra.plugin.scripts.python.Script
    runIf: "{{ outputs.compute_diff.vars.has_changes and inputs.auto_apply }}"
    containerImage: ghcr.io/damien/fabric-orchestrator:latest
    script: |
      from src.gnmi import GNMIClient
      # Apply changes via gNMI Set

triggers:
  - id: netbox_webhook
    type: io.kestra.plugin.core.trigger.Webhook
    key: "{{ secret('NETBOX_WEBHOOK_KEY') }}"

  - id: schedule
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 */6 * * *"

errors:
  - id: notify_failure
    type: io.kestra.plugin.notifications.slack.SlackExecution
    url: "{{ secret('SLACK_WEBHOOK') }}"
```
|
```
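
For the `netbox_webhook` trigger in the flow above, NetBox needs a URL to call. The sketch below builds one in Python, assuming the `/api/v1/executions/webhook/{namespace}/{flowId}/{key}` path shape described in the Kestra webhook trigger documentation. Some Kestra releases and multi-tenant setups add a tenant segment, so check the path against the deployed version. The `webhook_url` helper and the example key are hypothetical.

```python
from urllib.parse import quote


def webhook_url(base_url: str, namespace: str, flow_id: str, key: str) -> str:
    """Build the URL that triggers a Kestra flow through its webhook trigger.

    Assumption: the path shape /api/v1/executions/webhook/<ns>/<flow>/<key>.
    """
    # Percent-encode each segment so unusual characters cannot alter the path.
    segments = "/".join(quote(part, safe="") for part in (namespace, flow_id, key))
    return f"{base_url.rstrip('/')}/api/v1/executions/webhook/{segments}"


if __name__ == "__main__":
    # "example-key" is a stand-in for the NETBOX_WEBHOOK_KEY secret value.
    print(webhook_url("http://localhost:8080", "network.fabric",
                      "fabric-reconcile", "example-key"))
```

The resulting URL is what would be entered as the payload URL of a NetBox webhook. Because the key is part of the URL, it should be treated as a credential.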
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
**Status**: 🚧 Active Development - Phase 1
|
**Status**: 🚧 Active Development - Migrating to Kestra orchestration (Phase 4)