Senior-level enterprise networking reference platform — Containerlab multi-site simulation (datacenter spine-leaf with EVPN-VXLAN, enterprise WAN, hybrid AWS connectivity), Nornir/Ansible automation, Batfish + pyATS CI validation, Prometheus observability.
Reference Architecture Notice: Terraform apply is manual-dispatch only; no live AWS account is required or assumed. The Containerlab simulation is fully runnable locally with make lab-up. See ADR-006.
A production-grade reference platform simulating a multi-site enterprise network for ~5,000 users:
- Data center fabric: 2-spine / 4-leaf topology with eBGP underlay and EVPN-VXLAN overlay
- Enterprise WAN: HQ + Branch routers running OSPF multi-area
- Hybrid cloud: IPSec/BGP VPN from on-premises cloud-edge to AWS Transit Gateway
- Multi-region AWS: Two VPCs per region, cross-region TGW peering
- Automation: Nornir and Ansible scripts for provisioning, validation, change management
- CI validation: Batfish (offline config analysis) + pyATS (live lab assertions)
- Observability: Prometheus + SNMP Exporter + Grafana dashboards
┌──────────────────────────────────────┐
│              AWS Cloud               │
│  TGW us-east ◄──peer──► TGW us-west  │
│       │                      │       │
│  VPC prod-east         VPC prod-west │
└───────┬──────────────────────────────┘
        │ Site-to-Site VPN (IKEv2)
┌───────┴──────────────────────────────┐
│             On-Premises              │
│  [cloud-edge ASN 65300]              │
│       │ OSPF                         │
│  [wan-hq ASN 65200] ── [wan-branch]  │
│       │ eBGP                         │
│  [spine1] ── [spine2]  ASN 65100     │
│  (EVPN Route Reflectors)             │
│  [leaf1][leaf2][leaf3][leaf4]        │
│  [host1][host2][host3][host4]        │
└──────────────────────────────────────┘
For detailed diagrams see diagrams/ (Mermaid source) or rendered PNGs.
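The three ASNs shown in the diagram all sit in the 16-bit private range (64512 to 65534, RFC 6996). A minimal sketch of a sanity check for such a plan follows; the dictionary keys and the function name `validate_asn_plan` are hypothetical, and leaf ASNs are omitted because the diagram does not list them.

```python
# 16-bit private-use ASNs per RFC 6996: 64512 through 65534 inclusive.
PRIVATE_ASN_16BIT = range(64512, 65535)

# ASNs copied from the diagram; the key names are illustrative only.
ASN_PLAN = {"spines": 65100, "wan-hq": 65200, "cloud-edge": 65300}


def validate_asn_plan(plan: dict[str, int]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan passes."""
    errors = []
    seen: dict[int, str] = {}
    for name, asn in plan.items():
        if asn not in PRIVATE_ASN_16BIT:
            errors.append(f"{name}: ASN {asn} is outside the 16-bit private range")
        if asn in seen:
            errors.append(f"{name}: ASN {asn} is already used by {seen[asn]}")
        seen.setdefault(asn, name)
    return errors


if __name__ == "__main__":
    problems = validate_asn_plan(ASN_PLAN)
    print("ASN plan OK" if not problems else "\n".join(problems))
```

A check like this could run before config generation, so that an ASN collision surfaces before any eBGP session is attempted.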
| Layer | Technology |
|---|---|
| Routing | FRRouting (FRR) v9.1 |
| Overlay | EVPN-VXLAN (RFC 7432, RFC 8365) |
| Underlay | eBGP (RFC 7938) |
| WAN | OSPFv2 multi-area |
| IPSec | IKEv2, AES-256-GCM |
| Cloud IaC | Terraform ≥ 1.6 |
| AWS | Transit Gateway, VPC, Site-to-Site VPN |
| Automation | Nornir 3.x + Ansible 9.x |
| Testing | Batfish, pyATS/Genie |
| Observability | Prometheus, Grafana, SNMP Exporter |
| Lab | Containerlab 0.54+ |
Prerequisites: Docker, Containerlab, Python 3.11+, make
git clone https://github.com/Kanim21/enterprise-network-platform
cd enterprise-network-platform
# Boot the full lab (spines, leaves, WAN, cloud-edge, hosts)
make lab-up
# Validate BGP convergence
make validate
# Run full test suite
make test
# Start observability stack
cd observability && docker compose up -d
# Tear down
make lab-down
This platform models a 5,000-user multi-site enterprise with:
- DC HQ: Primary compute fabric (EVPN-VXLAN for VM and container workloads)
- Branch office: 500 users on a stub WAN segment; default route via HQ
- AWS us-east-1: Primary cloud workloads (prod VPC + transit VPC)
- AWS us-west-2: DR region, reached via TGW peering
- Hybrid connectivity: Encrypted IPSec VPN with BGP route exchange; on-prem workloads reach AWS and vice versa
| Decision | Trade-off |
|---|---|
| eBGP-only fabric (no OSPF/ISIS in DC) | Simpler, loop-free, scales to 100s of leafs; slightly more config per leaf |
| EVPN-VXLAN instead of OTV | Standards-based, multi-vendor; underlay MTU must absorb VXLAN encapsulation overhead, so jumbo frames (MTU 9000) are used |
| FRR over vendor images | Zero licensing cost, fully open-source; syntax differs from Cisco/Juniper |
| Hub-spoke TGW | Fewer connections, centralized routing; ~1ms additional latency vs VPC peering |
| Batfish in CI | Catches config bugs without live lab; covers control-plane only, not dataplane |
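The jumbo-frame requirement in the EVPN row follows from VXLAN encapsulation overhead. The sketch below works through the arithmetic for an IPv4 underlay; the function names are illustrative and not part of the repository. Strictly, the underlay only needs about 50 bytes of headroom over the tenant MTU; 9000 is a common round choice that leaves room to spare.

```python
# Header sizes in bytes for VXLAN over an IPv4 underlay.
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
INNER_DOT1Q = 4      # only when the inner frame keeps its 802.1Q tag


def vxlan_overhead(inner_tagged: bool = False) -> int:
    """Bytes added on top of the tenant IP MTU, measured at the underlay IP layer."""
    overhead = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4
    return overhead + (INNER_DOT1Q if inner_tagged else 0)


def required_underlay_mtu(tenant_mtu: int, inner_tagged: bool = False) -> int:
    """Smallest underlay IP MTU that carries tenant_mtu without fragmentation."""
    return tenant_mtu + vxlan_overhead(inner_tagged)


def max_tenant_mtu(underlay_mtu: int, inner_tagged: bool = False) -> int:
    """Largest tenant IP MTU that fits inside the given underlay IP MTU."""
    return underlay_mtu - vxlan_overhead(inner_tagged)


if __name__ == "__main__":
    print(required_underlay_mtu(1500))  # 1550
    print(max_tenant_mtu(9000))         # 8950
```

With an underlay MTU of 9000, tenant interfaces can run at up to 8950 bytes, or stay at 1500 with no fragmentation risk.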
| Failure | Impact | Recovery |
|---|---|---|
| Single spine down | 50% bandwidth (ECMP halved); fabric remains reachable | BFD detects in <1s; runbook: bgp-session-flap |
| Single leaf down | Hosts on that leaf unreachable | Restart container; re-provision config |
| WAN-HQ down | Branch + cloud isolated from DC | Failover to secondary WAN path (if deployed) |
| Cloud-edge down | Hybrid VPN down; DC continues | Restart strongSwan; runbook: hybrid-vpn-down |
| AWS AZ failure | NAT GW + EC2 fail over via ASG; TGW is AZ-resilient | AWS-managed HA |
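Two figures in the first row can be reproduced with simple arithmetic: BFD detection time is the negotiated interval multiplied by the detect multiplier (RFC 5880), and ECMP capacity scales with the number of surviving spines. The sketch below assumes a 300 ms interval and a multiplier of 3; these timer values are assumptions for illustration and are not taken from the repository's configs.

```python
def bfd_detection_ms(interval_ms: int, detect_multiplier: int) -> int:
    """Time to declare a BFD session down: negotiated interval x detect multiplier."""
    if interval_ms <= 0 or detect_multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    return interval_ms * detect_multiplier


def ecmp_capacity_fraction(total_spines: int, failed_spines: int) -> float:
    """Fraction of leaf uplink bandwidth left when some spines are down."""
    if total_spines <= 0 or not 0 <= failed_spines <= total_spines:
        raise ValueError("failed_spines must be between 0 and total_spines")
    return (total_spines - failed_spines) / total_spines


if __name__ == "__main__":
    # Assumed timers: 300 ms interval, multiplier 3.
    print(bfd_detection_ms(300, 3))      # 900
    # The 2-spine fabric from this README with one spine down.
    print(ecmp_capacity_fraction(2, 1))  # 0.5
```

Under these assumed timers, detection takes 900 ms, consistent with the sub-second figure in the table. Adding spines reduces the per-failure bandwidth loss: with four spines, a single failure leaves 75% of uplink capacity.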
- More leafs: Add leaf config + Nornir inventory entry; spines auto-accept new eBGP peers
- More VLANs: change_vlan.py --commit or Ansible vlan role
- More AWS regions: New terraform/environments/<region>/ directory; add TGW peering
- More branch sites: Add WAN router to OSPF area 1 (stub); no core changes needed
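Adding a leaf means one new point-to-point link and one new eBGP session per spine. The sketch below shows one way to derive the /31 link addressing for a new leaf deterministically from its index. The pool `10.255.0.0/24`, the function name, and the dictionary keys are all hypothetical; the project's actual addressing is defined in the Low-Level Design.

```python
import ipaddress
from itertools import islice


def p2p_links_for_leaf(pool: str, leaf_index: int, spines: list[str]) -> list[dict]:
    """Allocate one /31 per spine for the leaf with the given 1-based index.

    Each leaf consumes len(spines) consecutive /31 subnets from the pool,
    so the allocation depends only on the leaf index.
    """
    if leaf_index < 1:
        raise ValueError("leaf_index is 1-based")
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=31)
    start = (leaf_index - 1) * len(spines)
    chosen = list(islice(subnets, start, start + len(spines)))
    if len(chosen) != len(spines):
        raise ValueError("address pool exhausted")
    return [
        {
            "spine": spine,
            "subnet": str(subnet),
            "spine_ip": str(subnet[0]),  # lower address on the spine side
            "leaf_ip": str(subnet[1]),   # upper address on the leaf side
        }
        for spine, subnet in zip(spines, chosen)
    ]


if __name__ == "__main__":
    # Hypothetical fifth leaf joining the existing 2-spine fabric.
    for link in p2p_links_for_leaf("10.255.0.0/24", 5, ["spine1", "spine2"]):
        print(link)
```

The output of a helper like this could feed the new leaf's inventory entry, keeping the link plan reproducible as the fabric grows.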
- High-Level Design
- Low-Level Design — IP tables, VLAN matrix, ASN plan, IPSec params
- Architecture Decision Records
- Runbooks
See CONTRIBUTING.md. Security issues: see SECURITY.md.