Data Center Fabric Design and Migration: Spine-Leaf, VXLAN EVPN, ACI

Primary data center engineering references include RFC 7432 (BGP MPLS-Based Ethernet VPN / EVPN), Cisco Nexus 9000 VXLAN Fabric with BGP EVPN, and Arista EOS Data Center reference architectures.

Multi-CCIE engineers design and migrate data center fabrics across Cisco Nexus 9000, Arista 7000-series, Juniper QFX, and NVIDIA Spectrum — every engagement a fixed-fee SOW, not T&M.

WiFi Hotshots is a vendor-agnostic enterprise network engineering firm serving enterprise customers, data center architects, fabric engineers, and infrastructure buyers across Southern California and the broader US market.

Ekahau ECSE — Certified Survey Engineer on every engagement

Multi-CCIE engineering bench

Fixed-fee SOW — no T&M surprises

25 years of enterprise networking leadership

A WiFi Hotshots data center fabric engagement starts with a spine-leaf reference design, a VXLAN EVPN control-plane specification, and a staged cutover plan — every deliverable anchored to a fixed-fee SOW, not hourly billing. We design and migrate fabrics across Cisco Nexus 9000 with NX-OS 10.4 or ACI 6.0, Arista 7000-series with EOS 4.32 and CloudVision, Juniper QFX with Apstra 5.0 intent-based closed-loop automation, and NVIDIA Spectrum-4 Cumulus Linux 5.10 for AI training clusters. See the enterprise services overview, our network security architecture for microsegmentation integration, or the AI-ready infrastructure page for GPU training fabric designs. Send us the switch inventory and rack elevations to start a scope call.

Spine-Leaf Architecture: Why It Replaced 3-Tier

The traditional 3-tier design (core, aggregation, access) was built for north-south client-to-server traffic patterns that no longer describe a modern data center. Virtualization, container orchestration, distributed storage, microservices, and AI training shifted traffic east-west: server-to-server traffic within the same data center exceeds 70% of total fabric bytes in most enterprise environments. A 3-tier design forces that east-west traffic up to an aggregation layer and back down, adding latency and creating oversubscription chokepoints at the aggregation-to-core uplinks.

Spine-leaf flattens the topology: every leaf connects to every spine, any two servers on different leafs are separated by the same leaf-spine-leaf path, and the fabric scales by adding leaf pairs without re-architecting the core. Oversubscription is designed in at the leaf uplink ratio: typically 1:3 for general enterprise workloads, tightening to 1:1 non-blocking for AI training and high-performance storage fabrics where tail latency matters more than port cost.

Leaf Platform Selection: 10G, 25G, 100G, and 400G Access

Leaf switch selection is driven by server NIC speed, oversubscription target, and whether the fabric is greenfield or a brownfield replacement. A Cisco Nexus N9K-C93180YC-FX3 provides 48x 10/25G SFP28 server-facing ports and 6x 40/100G QSFP28 uplinks: a 1:2 oversubscription leaf for a typical 25G server population with all six uplinks at 100G, or 1:3 with four uplinks in use. Arista 7050X4 and 7060X5 lines cover the same band with lower per-port latency in cut-through mode.

For 100G-native server connectivity (GPU servers, distributed storage nodes, NVMe-over-Fabrics targets), Cisco N9K-C9364E-SG2 delivers 64x 100/400/800G ports in a single 2RU chassis, and Arista 7800R3 and Juniper QFX5220 offer equivalent density. Leaf-to-spine uplinks are 400G QSFP-DD on 2024-era greenfield builds, with 800G OSFP showing up in hyperscale and AI-first builds where LPO (Linear Pluggable Optics) power savings become measurable at scale.

Spine Platform Selection: Non-Blocking Backplane Capacity

Spine switches are pure transit: no server-facing ports, no policy enforcement, no local VXLAN termination in most designs. The spine's job is to provide equal-cost multipath (ECMP) reachability between every pair of leafs at line rate. A Cisco Nexus 9364E-SG2 spine with 64x 800G ports, each broken out to 2x 400G, supports up to 128 leaf switches (one 400G uplink per leaf per spine) in a single-pod fabric before a second pod is required. Arista 7800R3 modular spines scale past that density for hyperscale deployments.

The spine selection decision is almost always about port count at the target uplink speed, not about feature set — the routing and policy live at the leaf VTEP, and the spine is intentionally dumb. For fabrics anticipating a 400G-to-800G leaf-uplink migration in the next 36 months, the spine must be spec’d for the future speed on day one — replacing spines is a fabric-wide event, not a rolling upgrade.

  • Oversubscription target: 1:3 for general enterprise (48x 25G down / 4x 100G up), 1:1 non-blocking for AI/HPC
  • Leaf uplink speed: 100G QSFP28 for 2020-era refreshes, 400G QSFP-DD for 2024-onward greenfield, 800G OSFP for AI-first builds
  • Spine port count determines max leaf count — scope for 36-month growth on day one, not at the 80% mark
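
The relationship between spine port count and maximum leaf count can be sketched as simple arithmetic. This is a minimal illustration, not a sizing tool: the function name, the breakout factor, and the example values (four spines, four uplinks per leaf, 48 server ports per leaf) are assumptions chosen for the example.

```python
def fabric_capacity(spine_ports, breakout, spines, uplinks_per_leaf, server_ports_per_leaf):
    """Return (max leaf count, max server-facing ports) for a single pod.

    All parameters are hypothetical planning inputs, not vendor limits.
    """
    if uplinks_per_leaf <= 0 or uplinks_per_leaf % spines:
        raise ValueError("uplinks per leaf must be a positive multiple of the spine count")
    # Each leaf spreads its uplinks evenly, so it consumes this many ports on every spine.
    links_per_spine = uplinks_per_leaf // spines
    # Breakout multiplies the usable port count, e.g. 64 x 800G split into 128 x 400G.
    logical_ports = spine_ports * breakout
    max_leaves = logical_ports // links_per_spine
    return max_leaves, max_leaves * server_ports_per_leaf


# 64-port spine, 2x breakout, 4 spines, 4 uplinks per leaf, 48 server ports per leaf
print(fabric_capacity(64, 2, 4, 4, 48))
# Same spine without breakout
print(fabric_capacity(64, 1, 4, 4, 48))
```

Running the same function against the 36-month leaf forecast shows whether the spine chosen on day one still has free ports at the end of the planning window.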

VXLAN EVPN Design: Control Plane, Data Plane, and Route Types

VXLAN EVPN is the current industry-standard overlay for spine-leaf data center fabrics. VXLAN (RFC 7348) provides the data-plane encapsulation — a 24-bit VNI (VXLAN Network Identifier) allows 16 million Layer 2 segments compared to the 4,094 usable VLANs in a 12-bit 802.1Q tag. MP-BGP EVPN (RFC 7432 for MPLS EVPN, RFC 8365 for VXLAN EVPN) replaces flood-and-learn with a proper routing control plane, so MAC address learning, ARP/ND suppression, and multi-homing election happen via BGP updates instead of broadcast traffic.
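
To make the 24-bit VNI concrete, the sketch below packs and parses the 8-byte VXLAN header following the RFC 7348 layout (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits). The function names and the example VNI 10100 are illustrative assumptions; a real VTEP does this in the forwarding ASIC, not in software.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348
MAX_VNI = 2**24 - 1          # 24-bit identifier space
USABLE_VLANS = 2**12 - 2     # 12-bit 802.1Q tag minus reserved IDs 0 and 4095


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits. Word 2: VNI, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


header = pack_vxlan_header(10100)
print(header.hex(), unpack_vni(header))
print(MAX_VNI + 1, "VNIs versus", USABLE_VLANS, "usable VLANs")
```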

The practical outcome: Layer 2 segments extend across a routed underlay without creating a spanning-tree domain that spans the fabric, and server mobility (MAC/IP moves during VM migration or container re-scheduling) is announced as a BGP update with a sequence number rather than discovered by a MAC table ager timeout.
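
The sequence-number behavior can be sketched as a small selection function. It follows the RFC 7432 MAC mobility rule that the highest sequence number wins, with the lowest advertising IP address as the tie-break. The `Type2Route` class, its field names, and the addresses are hypothetical and stand in for routes a real BGP speaker would hold.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Type2Route:
    """Hypothetical, simplified view of an EVPN Type-2 route for one MAC."""
    mac: str
    vtep: str  # advertising VTEP address
    seq: int   # MAC mobility sequence number


def best_route(routes):
    """Pick the winning advertisement: highest sequence, then lowest VTEP address."""
    return min(routes, key=lambda r: (-r.seq, ipaddress.ip_address(r.vtep)))


# A VM moves from the leaf at 10.0.0.11 to the leaf at 10.0.0.12.
before_move = Type2Route("00:50:56:aa:bb:cc", "10.0.0.11", seq=0)
after_move = Type2Route("00:50:56:aa:bb:cc", "10.0.0.12", seq=1)
print(best_route([before_move, after_move]).vtep)
```

Comparing addresses as `ipaddress` objects rather than strings matters for the tie-break: as strings, "10.0.0.9" would sort after "10.0.0.12".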

EVPN Route Types That Matter in Production

Five EVPN route types dominate day-two fabric operation. Type-2 (MAC/IP advertisement) carries individual host MAC addresses with their associated IP and VNI — the dominant route type on any fabric with endpoint mobility. Type-3 (Inclusive Multicast Ethernet Tag) handles BUM (broadcast, unknown unicast, multicast) replication either via Ingress Replication (IR, the default on most enterprise fabrics) or an underlay multicast tree (PIM-SM or BIDIR-PIM, preferred at hyperscale).

Type-5 (IP Prefix) announces routed prefixes between VRFs and between fabrics — essential for DCI, for inter-VRF routing, and for summarization from leafs up to border leafs. Type-4 (Ethernet Segment) elects a designated forwarder for ESI-LAG multi-homed servers. Type-1 (Ethernet Auto-Discovery) supports ESI fast convergence. An engineer reading your fabric’s BGP table and seeing only Type-2 routes is looking at a fabric with no Layer 3 extensibility — every inter-subnet routing decision is being made at a single tenant-edge device, which is usually not the design intent.
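
The Type-2-only check described above can be sketched as a route-type tally. The input is assumed to be a list of route-type numbers already extracted from the BGP EVPN table by whatever tooling is in use; the function name and warning text are illustrative.

```python
from collections import Counter
from enum import IntEnum


class EvpnRouteType(IntEnum):
    ETHERNET_AUTO_DISCOVERY = 1
    MAC_IP_ADVERTISEMENT = 2
    INCLUSIVE_MULTICAST = 3
    ETHERNET_SEGMENT = 4
    IP_PREFIX = 5


def summarize(route_type_numbers):
    """Count routes per type and flag a table with Type-2 routes but no Type-5."""
    counts = Counter(EvpnRouteType(n) for n in route_type_numbers)
    warnings = []
    if counts[EvpnRouteType.MAC_IP_ADVERTISEMENT] and not counts[EvpnRouteType.IP_PREFIX]:
        warnings.append("Type-2 present but no Type-5: check where inter-subnet routing happens")
    return counts, warnings


counts, warnings = summarize([2, 2, 2, 3, 3])
for route_type, count in sorted(counts.items()):
    print(route_type.name, count)
print(warnings)
```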

Anycast Gateway and Distributed Routing

An anycast gateway places the same default-gateway IP and MAC on every leaf that hosts the subnet. A server’s first-hop routing decision terminates at the local leaf VTEP — traffic leaves the subnet on the leaf it ingressed, not at a centralized gateway. That eliminates the traffic-tromboning problem of centralized Layer 3 gateways, where an east-west flow between two servers in different subnets on the same leaf would have to traverse the fabric to reach the gateway and back.

Cisco, Arista, Juniper, and NVIDIA all implement anycast gateway under VXLAN EVPN — the configuration details differ but the behavior is standardized. Pair that with distributed ARP/ND suppression (the leaf answers ARP locally from BGP EVPN Type-2 data rather than flooding the request across the fabric) and BUM traffic drops to the vanishing point in a correctly designed fabric.
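
The leaf-side decision behind anycast gateway and ARP suppression can be sketched as a lookup. This is a simplified model under stated assumptions: the gateway table, the host table populated from Type-2 routes, and every address shown are hypothetical, and real platforms add aging, VRF scoping, and ND handling.

```python
# Hypothetical anycast gateway: same IP and MAC configured on every leaf hosting the subnet.
ANYCAST_GATEWAYS = {"10.10.10.1": "00:00:5e:00:01:01"}

# Hypothetical host table built from BGP EVPN Type-2 routes (IP to MAC).
EVPN_HOSTS = {
    "10.10.10.21": "00:50:56:aa:00:21",
    "10.10.10.22": "00:50:56:aa:00:22",
}


def handle_arp_request(target_ip, gateways=ANYCAST_GATEWAYS, hosts=EVPN_HOSTS):
    """Return (action, mac) for an ARP request arriving on a server-facing port."""
    if target_ip in gateways:
        # The local leaf owns the gateway address, so the first hop never leaves this leaf.
        return ("gateway-reply", gateways[target_ip])
    if target_ip in hosts:
        # Known from the control plane: answer locally instead of flooding.
        return ("proxy-reply", hosts[target_ip])
    # Unknown target: fall back to BUM replication across the fabric.
    return ("flood", None)


for ip in ("10.10.10.1", "10.10.10.22", "10.10.10.99"):
    print(ip, handle_arp_request(ip))
```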

Ingress Replication vs. Multicast Underlay

BUM traffic in a VXLAN fabric needs a replication mechanism. Ingress Replication (IR) is the default on most enterprise fabrics: the ingress leaf VTEP replicates a BUM frame to every other leaf VTEP that has a member of the VNI, sending unicast VXLAN-encapsulated copies across the underlay. IR is operationally simple — no multicast routing configuration on the underlay — but it scales as O(n) per BUM frame where n is the leaf count with members of that VNI.

A multicast underlay (PIM-SM with RPs on the spine, or BIDIR-PIM for simpler state) uses a single multicast tree per VNI and replicates in the fabric, dropping BUM cost to O(1) at the ingress leaf. The decision: stay on IR for fabrics under 64 leafs with low BUM volume, move to multicast for hyperscale or for VDI/PXE boot environments where BUM is a non-trivial fraction of traffic.
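
The O(n) versus O(1) difference at the ingress leaf can be sketched by counting copies per BUM frame. The function names, mode strings, and the 5 Mbps BUM rate are illustrative assumptions; the model counts only what the ingress leaf transmits, not replication inside the underlay.

```python
def ingress_copies(vteps_in_vni, mode):
    """Copies the ingress leaf sends for one BUM frame in a VNI with this many member VTEPs."""
    remote_vteps = max(vteps_in_vni - 1, 0)  # the ingress leaf does not send to itself
    if mode == "ingress-replication":
        return remote_vteps          # one unicast copy per remote VTEP
    if mode == "multicast":
        return min(remote_vteps, 1)  # one copy onto the tree, replicated by the underlay
    raise ValueError(f"unknown mode: {mode}")


def ingress_bum_mbps(bum_mbps, vteps_in_vni, mode):
    """Uplink bandwidth the ingress leaf spends on BUM traffic for one VNI."""
    return bum_mbps * ingress_copies(vteps_in_vni, mode)


for leafs in (8, 64, 256):
    print(leafs,
          ingress_bum_mbps(5, leafs, "ingress-replication"),
          ingress_bum_mbps(5, leafs, "multicast"))
```

Feeding measured BUM rates per VNI into a model like this gives a rough sense of when ingress replication stops being cheap.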

Cisco ACI vs. NX-OS EVPN: Choosing the Right Data Center Policy Model

Cisco offers two production spine-leaf options on the same N9K hardware: ACI (Application Centric Infrastructure) with APIC 6.0 as the controller and an EPG-contract policy model, or NX-OS 10.4 with VXLAN EVPN and Nexus Dashboard Fabric Controller (NDFC, the DCNM successor) as the fabric management layer. Both build a spine-leaf underlay with a VXLAN overlay. NX-OS exposes standards-based BGP EVPN as the overlay control plane, while ACI runs its own controller-managed control plane inside the fabric and uses BGP EVPN between pods and sites.

The distinction is policy: ACI models applications as EPGs (Endpoint Groups) with contracts defining permitted traffic — a declarative whitelist model enforced by the APIC as the policy authority. NX-OS EVPN leaves policy at the leaf (ACL, VRF, route-map) and manages the fabric with NDFC as a configuration orchestrator rather than a policy authority. The choice rarely comes down to capability — both can build a secure, scalable fabric — and almost always comes down to operating model.

When ACI Is the Right Choice

ACI is the stronger fit when the operations team thinks in applications rather than subnets, when microsegmentation is a day-one design requirement rather than a day-three retrofit, and when the security and network teams agree the APIC will be the policy source of truth.

A multi-tenant enterprise fabric with strict application-to-application isolation (a common requirement in regulated verticals like financial services trading floors or academic medical centers) is often cleaner in ACI because the EPG/contract model makes the policy graph explicit and auditable. ACI also supports Multi-Site as a first-class design (a separate APIC cluster per data center, coordinated by Nexus Dashboard Orchestrator, with BGP EVPN carrying endpoint reachability between sites), and the integration with the VMware NSX 4.2 (formerly NSX-T) distributed firewall for L7 policy extends the EPG model into the hypervisor.

When NX-OS EVPN Is the Right Choice

NX-OS EVPN is the stronger fit when the operations team already thinks in routing, VRFs, and ACLs; when the fabric is one of many Cisco-adjacent systems and fabric-level abstraction adds overhead rather than clarity; and when the team wants vendor-interoperable EVPN (NX-OS EVPN peers cleanly with Arista EOS and Juniper Junos EVPN, while ACI’s northbound is a closed Cisco ecosystem at the APIC boundary).

A simpler enterprise data center with a unified ops team running Ansible against network devices with YANG/NETCONF models will often find NX-OS EVPN with NDFC less opinionated — closer to the way the team already works. The vendor-agnostic equivalent of this stance — Arista EOS with CloudVision, Juniper QFX with Apstra — is a valid design path when the customer has no Cisco-preference mandate.

Vendor-Agnostic Fabric Design: Cisco, Arista, Juniper, NVIDIA

WiFi Hotshots is vendor-agnostic by charter: we design and migrate fabrics on the platform that fits the customer’s existing operations stack, the growth roadmap, and the procurement constraints, not the platform a sales quota depends on, because we carry no vendor quota. All four of the 2025 Gartner Magic Quadrant Leaders for Data Center Switching (Cisco, Arista, Juniper, and Huawei; published March 31, 2025) are in production in WFHS engagements, plus NVIDIA Spectrum-X / InfiniBand fabrics for AI workloads (NVIDIA is not in the Gartner MQ but is the reference AI-network platform). The design methodology is consistent across vendors: spine-leaf underlay with OSPF or eBGP, VXLAN EVPN overlay, anycast gateway, per-tenant VRFs, microsegmentation at the leaf, and a separate out-of-band management fabric.

Cisco Nexus 9000 with NX-OS or ACI

The Nexus 9000 line covers the full range from 1G/10G edge leaf to 800G spine. The N9K-C93180YC-FX3 is the workhorse 48x 25G leaf. The N9K-C9348D-GX2A delivers 48x 400G for spine and high-density leaf roles. The N9K-C9364E-SG2 is the 64x 800G platform for 2024-onward spine builds and AI aggregation. NX-OS 10.4 is the current mainline release; ACI 6.0 is the current controller-led release. Nexus Dashboard Fabric Controller (NDFC) replaces DCNM as the NX-OS fabric management layer — underlay provisioning, overlay template orchestration, day-two compliance and drift reporting.

Arista EOS with CloudVision

Arista 7280R3 (deep-buffer DCI and border leaf), 7800R3 (modular spine and AI), 7050X4 (fixed spine/leaf), and 7060X5 (high-density 100G/400G leaf) cover the equivalent range to the Nexus 9000 line. EOS 4.32 is the current mainline release. Arista’s value proposition concentrates on operational consistency — the same EOS image, the same CLI, the same streaming telemetry model across the entire platform line — plus CloudVision 2024.2 for fabric orchestration, network change management, and telemetry aggregation. Arista is the default choice in a non-trivial number of hyperscale and financial services fabrics; MLAG is the Arista equivalent of Cisco vPC for dual-homed server connectivity.

Juniper QFX with Apstra

Juniper QFX5120 (25G leaf), QFX5130 (100G leaf), and QFX5220 (400G spine) running Junos form the Juniper spine-leaf building blocks. Apstra 5.0 is Juniper’s intent-based fabric management platform — the key differentiator vs. Cisco NDFC and Arista CloudVision is Apstra’s closed-loop intent verification model, where the fabric state is continuously validated against the intent graph and drift is flagged as a policy violation rather than a quiet configuration diff. Apstra supports multi-vendor fabrics (Juniper, Cisco, Arista, SONiC) as a design choice, not as an afterthought. ESI-LAG is the Juniper EVPN multi-homing mechanism — the standards-track equivalent of Cisco vPC and Arista MLAG.

NVIDIA Spectrum-4 with Cumulus Linux

NVIDIA Spectrum-4 (SN5600 family) with Cumulus Linux 5.10 is the emerging platform of choice for AI training fabrics. The Spectrum-4 ASIC is optimized for RoCEv2 with aggressive ECN/DCQCN feedback loops and lossless Ethernet (PFC 802.1Qbb) tuned for GPU-to-GPU collective operations over InfiniBand-equivalent Ethernet. The SN5600 delivers 64x 800G OSFP ports at 51.2 Tbps aggregate — the current non-blocking spine for 16,000-GPU training clusters. Cumulus Linux runs BGP EVPN natively, integrates with NVIDIA NetQ for AI-fabric-specific telemetry, and is the preferred stack when the customer already operates a Linux-native network tooling pipeline (Ansible, Terraform, Prometheus) rather than vendor-specific orchestration.

Switch inventory, rack elevations, and server NIC counts are all we need to scope a fabric design — most engagements are quoted on a fixed-fee SOW within three business days of a 30-60 minute scoping call.

Oversubscription Decisions: 1:3 vs. 1:1 Non-Blocking

Oversubscription, in the 1:N notation used here, is the ratio of uplink bandwidth to server-facing bandwidth at each leaf, and it is the single largest cost-per-port driver in a spine-leaf build. A 48x 25G leaf with 6x 100G uplinks is 1:2 (48 * 25 = 1200G down, 6 * 100 = 600G up). Change the uplinks to 4x 100G and the ratio becomes 1:3 (1200/400). Convert the uplinks to 4x 400G and the ratio becomes 1:0.75 (1200/1600), which is over-provisioned on uplink. The right ratio is workload-dependent, and it should be computed from measured east-west traffic, not defaulted to a vendor reference architecture.
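
The ratio arithmetic can be sketched as a short helper that reproduces the three cases in the 1:N notation, with the uplink side normalized to 1. The function name and return shape are assumptions for illustration.

```python
from fractions import Fraction


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Return (downlink Gbps, uplink Gbps, ratio string) for one leaf."""
    down = down_ports * down_gbps
    up = up_ports * up_gbps
    # Normalize the uplink side to 1, so N is downlink bandwidth per unit of uplink.
    n = Fraction(down, up)
    return down, up, f"1:{float(n):g}"


# 48 x 25G server ports against three uplink options
for up_ports, up_gbps in [(6, 100), (4, 100), (4, 400)]:
    print(f"{up_ports}x {up_gbps}G uplinks ->", oversubscription(48, 25, up_ports, up_gbps))
```

Replacing the nominal port speeds with measured peak east-west throughput per leaf gives the effective ratio the fabric actually runs at.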

General Enterprise: 1:3 Is the Working Default

A typical enterprise VM and container fabric — mixed application workloads, VDI, general-purpose storage over iSCSI or NFS — rarely sustains more than 30-35% average utilization on its server-facing ports. 1:3 oversubscription gives enough headroom for 99th-percentile bursts, keeps the spine port count manageable, and saves significant capex on 100G and 400G optics. Design the fabric with port-level telemetry exposed to Prometheus or the vendor equivalent (Arista EOS streaming telemetry, Cisco Model-Driven Telemetry over gRPC) so the ratio can be revisited on real data after 90 days in production, not on a refresh cycle three years later.

AI Training and HPC: 1:1 Non-Blocking Is Non-Negotiable

GPU-to-GPU collective operations (all-reduce, all-gather, broadcast) in distributed training workloads generate sustained, synchronized, east-west bursts that can saturate every uplink on every leaf simultaneously for milliseconds at a time. Oversubscription under those conditions creates tail-latency events that stall the slowest GPU and waste every other GPU in the job for the duration of the stall.

AI training fabrics are designed 1:1 non-blocking — every server-facing 400G port has 400G of dedicated uplink capacity — and the fabric is tuned with RoCEv2, PFC 802.1Qbb, and ECN/DCQCN feedback loops so that congestion signals propagate to the NIC before packets drop. Storage fabrics for NVMe-over-Fabrics follow the same 1:1 non-blocking discipline for the same reason.
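
The cost of oversubscription under synchronized bursts can be sketched with a deliberately simplified worst-case model: if every server on a leaf transmits at once, each NIC gets at most 1/N of its line rate through the uplinks. The function name and the 100 MB message size are assumptions, and the model ignores the collective algorithm, congestion control, and in-leaf traffic.

```python
def burst_transfer_ms(message_mb, nic_gbps, oversub_n):
    """Milliseconds to move one message when all NICs on the leaf burst together.

    oversub_n is the N in the 1:N notation (1 means non-blocking).
    """
    # MB * 8 gives megabits; megabits divided by Gbps gives milliseconds.
    return message_mb * 8 * oversub_n / nic_gbps


for n in (1, 2, 3):
    print(f"1:{n} ->", burst_transfer_ms(100, 400, n), "ms per 100 MB exchange")
```

Because a collective completes only when its slowest participant finishes, the stretched transfer time on one oversubscribed leaf becomes the step time for the whole job.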

Microsegmentation: Fabric-Native, Hypervisor, and Agent-Based

Microsegmentation moves the enforcement boundary from the fabric perimeter (north-south firewall) to inside the fabric (east-west policy). A flat VLAN with a perimeter firewall protects the data center from the outside world and does essentially nothing about an attacker who has already compromised a server inside the fabric — the lateral-movement problem. The 2024-era enterprise answer is layered: fabric-native segmentation at the leaf VTEP for coarse tenant boundaries, hypervisor distributed firewall for VM-to-VM policy, and agent-based segmentation for workload-identity-aware policy that follows the workload across environments.

Fabric-Native: ACI EPGs, EVPN VRF, and Leaf ACL

Cisco ACI implements microsegmentation natively via the EPG-and-contract model: any endpoint tagged into an EPG inherits the permitted-contract list, and traffic not matching a contract is dropped at the ingress leaf. NX-OS EVPN and the equivalent Arista and Juniper fabrics implement microsegmentation via per-VRF route separation plus leaf-applied ACLs (typically 500-2,000 ACEs per leaf before TCAM capacity becomes a constraint). Fabric-native segmentation is coarse but cheap: no additional licensing, no agent to deploy, policy enforced at the first switch hop.
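
The whitelist behavior of the EPG-and-contract model can be sketched as a default-deny lookup. This is a conceptual illustration only, not the APIC object model or API: the EPG names, ports, and data structure are hypothetical, and the sketch assumes traffic inside a single EPG is permitted.

```python
# Hypothetical contracts: (consumer EPG, provider EPG) -> set of permitted (protocol, port).
CONTRACTS = {
    ("web", "app"): {("tcp", 8443)},
    ("app", "db"): {("tcp", 5432)},
}


def permitted(src_epg, dst_epg, protocol, port, contracts=CONTRACTS):
    """Default deny between EPGs: allow only what a contract explicitly lists."""
    if src_epg == dst_epg:
        return True  # assumption: no intra-EPG isolation configured
    return (protocol, port) in contracts.get((src_epg, dst_epg), set())


print(permitted("web", "app", "tcp", 8443))  # listed in a contract
print(permitted("web", "db", "tcp", 5432))   # no web-to-db contract exists
```

The same table doubles as an audit artifact: any flow an application owner expects but the table does not list is a policy gap to resolve before cutover.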

Hypervisor: VMware NSX 4.2 Distributed Firewall

VMware NSX 4.2 (the product formerly named NSX-T) implements a distributed firewall in the hypervisor kernel: policy is enforced on the vNIC before the frame reaches the physical wire. That moves the enforcement point from the leaf to the host, which is useful when the workload density per leaf exceeds what an ACL table can cleanly represent, when the policy needs to be tied to a VM tag rather than an IP address, and when the same policy needs to apply to workloads that migrate between hosts via vMotion. NSX integrates cleanly with ACI for L7 application policy and with any EVPN-VXLAN fabric for L2/L3 overlay carriage.

Agent-Based: Illumio Core and Akamai Guardicore

Illumio Core and Akamai Guardicore Centra deploy lightweight agents on Linux and Windows servers to collect flow data, build a process-to-process dependency map, and enforce policy at the OS-level firewall (iptables, Windows Filtering Platform). Agent-based segmentation is the right tool when policy must follow the workload across bare metal, VM, and container without dependence on the underlying network or hypervisor. It is also the standard answer for east-west policy in brownfield environments where the fabric is not yet microsegmentation-capable and replacing the fabric to add policy is not justifiable on its own. Pair with WFHS network security architecture to integrate segmentation policy with the broader zero-trust design.

Data Center Interconnect: EVPN Multisite, DCI-EVPN, and MPLS L3VPN

Data Center Interconnect (DCI) extends Layer 2 and Layer 3 reachability between data centers — active/active site pairs, primary/DR site pairs, hybrid cloud on-ramps to AWS Direct Connect or Azure ExpressRoute, and colo-to-enterprise extensions.

The dominant 2024-era design is EVPN multisite: each data center is its own VXLAN EVPN fabric with a pair of border leafs acting as multisite gateways, and the border leafs peer across the DCI link as EVPN peers with a route-target rewrite that controls which tenants extend and which stay local. Cisco Nexus 9000 supports EVPN multisite natively in NX-OS 10.4; Arista supports DCI-EVPN with an equivalent border-leaf model; Juniper supports the same pattern via Apstra-managed border-leaf policies.

When Layer 2 Extension Is the Right Answer

Layer 2 extension between sites is justified for cold migration of legacy applications that depend on IP-address preservation, for active-active clustering protocols that require L2 adjacency (some database heartbeat protocols, some HA appliance pairs), and for VM mobility where the operational team has not yet invested in overlay mobility tooling like NSX Tier-0/Tier-1 gateways or equivalent. Layer 2 extension is operationally expensive: BUM traffic traverses the DCI link, a broadcast storm at one site can propagate to the other, and MAC mobility events churn BGP updates across the DCI peering. The policy should therefore be: extend only the VNIs that require it, route the rest.

MPLS L3VPN and Legacy OTV

For enterprise-to-colo extension without full EVPN peering — typical when the colo fabric is a managed service — MPLS Layer 3 VPN over an MPLS provider WAN remains a valid design, particularly for VRF-isolated traffic between enterprise data centers and co-located cloud on-ramps. Cisco OTV (Overlay Transport Virtualization) was the previous-generation DCI L2 extension answer on Nexus 7000 and ASR 1000 platforms — Cisco has sunsetted OTV for new designs in favor of EVPN multisite, and WFHS migrates OTV-based DCIs to EVPN multisite as part of the data center fabric refresh cycle.

AI Training Fabrics: RoCEv2, PFC, and Lossless Ethernet

AI training fabrics are engineered differently from general-purpose enterprise data center fabrics, and the differences are not subtle. Training workloads use collective operations — all-reduce is the dominant pattern — that synchronize gradient updates across every GPU in the job on every training step. A 4,096-GPU training run generates a synchronized east-west burst every few milliseconds for the entire duration of the training job, which may run for weeks. Under those conditions, the fabric must deliver line rate with near-zero packet loss (loss triggers TCP retransmit or RoCEv2 NACK, both of which stall the collective and waste every other GPU in the job) and with tightly bounded tail latency.

RoCEv2 and PFC 802.1Qbb

RoCEv2 (RDMA over Converged Ethernet v2) runs RDMA traffic over UDP/IP on a lossless Ethernet fabric. Lossless behavior is provided by Priority Flow Control (PFC, 802.1Qbb): when a switch's buffer for a specific traffic class crosses its threshold, it sends a PFC pause frame to the upstream switch or NIC, and the upstream device pauses transmission of that class until the pause timer expires or a resume frame arrives.

PFC prevents packet loss but can cascade into head-of-line blocking and PFC storms if not paired with ECN feedback. With ECN (Explicit Congestion Notification) and DCQCN rate control, switches mark packets as experiencing congestion before the buffer overflows, the receiving NIC returns a Congestion Notification Packet (CNP) to the sender, and the sending NIC reduces its rate. Congestion is resolved proactively rather than by dropping packets or pausing the link.
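
The switch-side marking step can be sketched as a RED-style probability curve, the shape used for ECN marking in DCQCN: no marking below a minimum queue depth, a linear ramp up to a maximum probability, and marking of every packet above the maximum threshold. The threshold values below are placeholders for illustration, not tuning recommendations for any platform.

```python
def ecn_mark_probability(queue_kb, kmin_kb, kmax_kb, pmax):
    """Probability that a packet is ECN-marked at the given egress queue depth."""
    if queue_kb <= kmin_kb:
        return 0.0   # queue is shallow: no congestion signal
    if queue_kb > kmax_kb:
        return 1.0   # queue is deep: mark everything
    # Linear ramp from 0 at kmin to pmax at kmax.
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


# Placeholder thresholds: Kmin 100 KB, Kmax 400 KB, Pmax 10%
for depth_kb in (50, 100, 250, 400, 500):
    print(depth_kb, ecn_mark_probability(depth_kb, 100, 400, 0.10))
```

The design intent is that the marking thresholds sit below the PFC pause threshold, so the sender slows down before a pause frame is ever needed.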

Platform Options for AI Fabrics

NVIDIA Spectrum-4 SN5600 is the current reference platform for Ethernet AI training fabrics: 51.2 Tbps of non-blocking bandwidth in a 2RU chassis, optimized RoCEv2 ASIC behavior, and deep integration with NVIDIA NetQ telemetry and BlueField-3 DPUs at the server NIC. Arista 7800R3 and Cisco Nexus 9364E-SG2 are valid alternatives when the customer prefers a single-vendor relationship across general enterprise and AI fabrics.

The design discipline is the same across platforms: 1:1 non-blocking oversubscription, RoCEv2 and PFC tuned to the NIC vendor’s recommendation, ECN enabled with DCQCN parameters matched to the GPU cluster size, separate traffic classes for RoCEv2 vs. storage vs. management, and careful buffer tuning on every leaf and spine. See the AI-ready infrastructure page for cabling, power, cooling, and network-to-GPU integration detail.

Fabric Telemetry, Observability, and Legacy Migration

A modern spine-leaf fabric is instrumented by design, not after the fact. Streaming telemetry (Cisco Model-Driven Telemetry over gRPC, Arista EOS streaming telemetry to CloudVision, Juniper Junos Telemetry Interface, NVIDIA NetQ) replaces SNMP polling with push-based sub-second metrics. Prometheus and InfluxDB are the dominant open-source collectors; Splunk, Elastic, and Grafana are the dominant visualization layers. Flow data is captured via sFlow or IPFIX on every leaf and sent to a flow collector for east-west traffic analysis. BGP EVPN route convergence events are logged and time-synced via PTP so that root-cause analysis on a leaf failure has millisecond-accurate correlation data.

Migrating from Legacy 3-Tier or Collapsed Core

Legacy data center fabrics (Catalyst 6500/6800 cores, Nexus 5000/7000 aggregation, Catalyst 3750/3850 access) are migrated to spine-leaf EVPN-VXLAN with a staged parallel-cutover approach. The new fabric is built alongside the legacy fabric. Layer 2 bridging between fabrics is established at a border leaf or via a temporary L2 trunk. Server VLANs are migrated subnet-by-subnet, with the default gateway moving from the legacy core to the new-fabric anycast gateway on a per-subnet cutover window.

Storage and database segments are migrated last, after the general application workload is stable on the new fabric. Legacy decommissioning happens only after a 30-day bake period on the new fabric with no unexplained packet loss or latency events. This approach keeps the rollback path intact through the entire migration and limits production risk to a single subnet at a time.
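
The cutover sequencing and the decommissioning gate can be sketched as two small functions. The `Subnet` fields, workload labels, and subnet names are hypothetical; the ordering rule (storage and database last) and the 30-day bake period with no unexplained events come from the approach described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LATE_WORKLOADS = {"storage", "database"}  # migrated after general application subnets


@dataclass(frozen=True)
class Subnet:
    name: str
    vlan: int
    workload: str  # hypothetical label, e.g. "application", "storage", "database"


def cutover_order(subnets):
    """General workloads first (by VLAN ID), storage and database segments last."""
    return sorted(subnets, key=lambda s: (s.workload in LATE_WORKLOADS, s.vlan))


def can_decommission(last_cutover, today, unexplained_events, bake_days=30):
    """Gate legacy removal on a clean bake period after the final cutover."""
    return unexplained_events == 0 and (today - last_cutover) >= timedelta(days=bake_days)


plan = cutover_order([
    Subnet("db-prod", 120, "database"),
    Subnet("web-prod", 210, "application"),
    Subnet("nfs", 110, "storage"),
    Subnet("app-prod", 200, "application"),
])
print([s.name for s in plan])
print(can_decommission(date(2025, 1, 1), date(2025, 1, 31), unexplained_events=0))
```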

Scoping a Data Center Fabric Project

Send the switch inventory, rack elevations, server NIC counts, and current VLAN/VRF list. We return a fixed-fee SOW with a migration sequence, cutover plan, and rollback gates — not a T&M estimate that drifts over 18 months.

Data Center Fabric Credentials and Engagement Model

WiFi Hotshots is an engineer-led, vendor-agnostic network services firm with 25 years of leadership in enterprise networking. We are minority-owned, headquartered in Valencia, CA, with nationwide rollout capability. Every data center fabric engagement is staffed by our multi-CCIE engineering bench with Data Center track specialization, and every engagement is delivered under a fixed-fee SOW rather than hourly T&M billing. Our reference engagements across verticals include a top-tier academic medical center multi-campus fabric refresh, a global tier-1 financial services firm trading-floor DCI design, and multiple national retail distribution-center fabrics. No client is named on this site — proof is referenced only by vertical and scale to preserve VAR and NDA boundaries.

See the about page for the full engineering credential list and leadership bios, the partners page for our vendor authorizations, or the services overview for the complete service portfolio. Wireless engineering, campus LAN, SD-WAN, security architecture, and voice/UC services are delivered by the same engineering bench under the same fixed-fee model.

Data Center Fabric Design FAQs

What is the scope boundary between fabric design and server-to-fabric integration?

WFHS designs and migrates the fabric itself — spine, leaf, underlay routing, VXLAN EVPN overlay, border leaf, DCI, management fabric, and microsegmentation policy. Server-side NIC configuration (VMware vSphere Distributed Switch, Linux bonding, Windows NIC teaming), storage-fabric-specific tuning (iSCSI multipathing, NVMe-oF initiator config), and application-layer firewall policy are coordinated with the server and platform teams but implemented by them.

We will document the fabric-side parameters (VLAN/VNI assignments, MTU, LACP configuration, DCBX on the leaf) in a handoff specification that server engineers can implement against, and we validate end-to-end on a representative server before declaring a leaf pair production-ready.

How does a VXLAN EVPN migration compare to a Cisco ACI deployment for the same enterprise?

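Both build a spine-leaf fabric on the same underlying Cisco Nexus 9000 hardware with the same VXLAN data plane. The control planes differ in exposure: NX-OS runs standards-based BGP EVPN that the operator configures directly, while ACI runs a controller-managed control plane inside the fabric and uses BGP EVPN between pods and sites. The larger operational difference is where policy lives. ACI puts policy on APIC 6.0 as a central authority with EPG and contract abstractions; day-two operations work through the APIC.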

NX-OS EVPN leaves policy on the leaf (ACL, VRF, route-map) and uses Nexus Dashboard Fabric Controller (NDFC) as a configuration orchestrator rather than a policy authority. ACI is stronger for application-centric microsegmentation, multi-tenant isolation, and integrated L7 policy.

NX-OS EVPN is stronger for interoperability with non-Cisco EVPN peers, for teams that prefer routing-centric thinking, and for environments where fabric-level abstraction adds more complexity than it removes.

What is the migration strategy from a legacy Catalyst 6500 core or Nexus 7000 aggregation layer to a spine-leaf EVPN-VXLAN fabric?

Parallel build, staged subnet cutover, and bake period before decommissioning. The new spine-leaf fabric is built alongside the legacy core with its own power, cabling, and management.

A temporary Layer 2 bridge at the border leaf extends each subnet from the legacy fabric to the new fabric, and the default gateway for that subnet moves from the legacy SVI to the new-fabric anycast gateway on a cutover window per subnet.

General application subnets migrate first; storage, database, and backup subnets migrate last after the general workload is stable on the new fabric. A 30-day bake period on the new fabric with no unexplained packet loss or latency events precedes legacy decommissioning. Rollback at any point is a single-subnet revert — the legacy SVI is re-activated and the BGP EVPN route for that subnet is withdrawn.

How is an AI GPU training cluster fabric different from a general enterprise data center fabric?

Three differences dominate the design. First, oversubscription: AI training fabrics are 1:1 non-blocking because collective operations (all-reduce, all-gather) generate synchronized east-west bursts that saturate every uplink simultaneously, and oversubscription creates tail-latency events that stall the entire training job.

Second, the transport: RoCEv2 with PFC 802.1Qbb and ECN/DCQCN is tuned for lossless Ethernet behavior, because packet loss triggers RDMA NACK and wastes GPU cycles. Third, platform selection leans toward NVIDIA Spectrum-4 SN5600 (51.2 Tbps in 2RU, optimized RoCEv2 ASIC, NetQ telemetry), with Arista 7800R3 and Cisco Nexus 9364E-SG2 as valid alternatives when a single-vendor relationship is preferred.

The general-enterprise disciplines (BGP EVPN overlay, anycast gateway, microsegmentation) still apply, but the buffer tuning, traffic class separation, and NIC-to-fabric integration are materially more detailed than on a general workload fabric.

When should we use Ingress Replication (IR) versus a multicast underlay for BUM traffic?

Ingress Replication is the correct default for most enterprise fabrics under roughly 64 leafs with low BUM volume — it is operationally simpler (no multicast routing on the underlay) and the O(n) cost per BUM frame is manageable at that scale.

A multicast underlay (PIM-SM with RPs on the spine, or BIDIR-PIM for simpler state) becomes the right answer at hyperscale, in VDI environments where PXE boot and broadcast-heavy profiles generate non-trivial BUM, and in multicast application workloads (market data, video distribution, clustering protocols) where the single multicast tree in the underlay is more efficient than ingress replication in the overlay.

The decision is a trade between operational simplicity and BUM efficiency, and it should be made on measured BUM-as-a-fraction-of-total data, not on vendor reference architectures alone.
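To put numbers on that trade, the sketch below estimates the underlay load leaving an ingress leaf for a given BUM rate, and the BUM share of total traffic. It is a simplified model under stated assumptions: every VTEP in the VNI is a receiver, and a multicast underlay lets the ingress leaf send a single copy. The function names and sample figures are illustrative.

```python
def ir_copies(vteps_in_vni: int) -> int:
    """Ingress replication sends one unicast copy per remote VTEP in the VNI."""
    return max(vteps_in_vni - 1, 0)


def ingress_bum_load_mbps(bum_mbps: float, vteps_in_vni: int,
                          multicast_underlay: bool) -> float:
    """Underlay load leaving the ingress leaf for an offered BUM rate."""
    copies = ir_copies(vteps_in_vni)
    if multicast_underlay:
        # Assumption: the underlay tree replicates, so the source sends at most one copy.
        copies = min(copies, 1)
    return bum_mbps * copies


def bum_fraction(bum_bytes: int, total_bytes: int) -> float:
    """Measured BUM bytes as a fraction of total fabric bytes."""
    return bum_bytes / total_bytes if total_bytes else 0.0


print(ingress_bum_load_mbps(5.0, 64, multicast_underlay=False))  # 315.0
print(ingress_bum_load_mbps(5.0, 64, multicast_underlay=True))   # 5.0
```

Feeding measured counters into `bum_fraction` gives the BUM-as-a-fraction-of-total figure the decision should rest on.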

What are the standard EVPN route types we should expect to see in a production fabric?

Type-2 (MAC/IP advertisement) is the dominant route type on any fabric with endpoint mobility: every learned host MAC and every learned host IP is announced via a Type-2 route tagged with its VNI. Type-3 (Inclusive Multicast Ethernet Tag) sets up BUM replication per VNI, either as Ingress Replication or as a reference to a multicast underlay group.

Type-5 (IP Prefix) carries routed prefixes between VRFs and between fabrics — essential for DCI, for inter-VRF route leaking, and for summarization from leafs to border leafs. Type-4 (Ethernet Segment) elects a designated forwarder for ESI-LAG multi-homed servers, and Type-1 (Ethernet Auto-Discovery) supports fast convergence on ESI failure.

A fabric BGP table dominated by Type-2 with no Type-5 is a fabric where Layer 3 extensibility lives at a single tenant-edge device — usually a design gap rather than a design intent, and usually worth revisiting before the next fabric expansion.
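The Type-2-without-Type-5 check can be run against any export of the EVPN table. The sketch below assumes the route types have already been parsed into a plain list of integers; the parsing step, the `summarize` and `has_type5_gap` helpers, and the sample data are hypothetical.

```python
from collections import Counter

ROUTE_TYPE_NAMES = {
    1: "Ethernet Auto-Discovery",
    2: "MAC/IP Advertisement",
    3: "Inclusive Multicast Ethernet Tag",
    4: "Ethernet Segment",
    5: "IP Prefix",
}


def summarize(route_types: list[int]) -> dict[str, int]:
    """Count routes per EVPN route type, including types with zero routes."""
    counts = Counter(route_types)
    return {ROUTE_TYPE_NAMES[t]: counts.get(t, 0) for t in sorted(ROUTE_TYPE_NAMES)}


def has_type5_gap(route_types: list[int]) -> bool:
    """True when the table carries Type-2 host routes but no Type-5 prefixes."""
    counts = Counter(route_types)
    return counts[2] > 0 and counts[5] == 0


sample = [2] * 1200 + [3] * 40 + [1] * 16 + [4] * 8  # hypothetical table, no Type-5
print(summarize(sample)["MAC/IP Advertisement"])  # 1200
print(has_type5_gap(sample))                      # True
```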

How should DCI be designed between two enterprise data centers in 2024?

EVPN multisite is the default answer on new designs. Each data center is its own VXLAN EVPN fabric. A pair of border leafs at each site acts as the multisite gateway, peering with the other site’s border leafs across the DCI link as EVPN peers with a route-target rewrite.

That architecture controls which tenants extend across sites and which stay local on a per-VRF and per-VNI basis — the policy is explicit rather than implicit. Cisco Nexus 9000 supports EVPN multisite natively in NX-OS 10.4; Arista and Juniper support the equivalent border-leaf model under their respective orchestration.

For enterprise-to-colo extensions without full EVPN peering (typical when the colo is a managed service), MPLS Layer 3 VPN over a provider WAN is a valid design. Cisco OTV is sunsetted for new builds; treat it as something to migrate away from rather than something to deploy.

What is the right microsegmentation approach for a brownfield fabric that was not designed for it?

Layered, starting with what can be deployed without touching the fabric. Agent-based segmentation (Illumio Core or Akamai Guardicore Centra) is the standard brownfield answer because the agent runs on the server OS and enforces policy at the iptables or Windows Filtering Platform layer — no dependency on the fabric being microsegmentation-capable.

That gets a working policy graph in place while the longer-horizon fabric refresh is planned. If the brownfield environment is heavily virtualized, VMware NSX-T 4.2 distributed firewall is the alternate entry point — policy in the hypervisor, no fabric change, follows the workload across vMotion.

Fabric-native segmentation (ACI EPGs, EVPN VRF with leaf ACL) comes in with the next fabric refresh, and the pre-built policy graph from the agent or hypervisor layer becomes the reference for the fabric-native contract library.

In data center network design, what byte overhead does VXLAN add per tenant frame, and what underlay MTU do we plan for?

VXLAN adds 50 bytes with an IPv4 outer header and 70 bytes with an IPv6 outer header — 14 bytes outer Ethernet, 20 or 40 bytes outer IP, 8 bytes UDP, and 8 bytes VXLAN header per RFC 7348. Because every tenant frame gets re-encapsulated at the ingress VTEP, the underlay MTU must exceed the tenant payload MTU by at least the applicable overhead.

That is why jumbo frames at 9216 bytes are the de facto standard on EVPN-VXLAN underlays — it leaves room for the encapsulation plus safety margin for nested tagging or IPv6 extension headers.

Our data center fabric engineers validate MTU end-to-end before any VXLAN cutover — a 1500-byte underlay drops encapsulated frames silently and the failure mode looks like random packet loss.
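The overhead arithmetic above can be captured in a few lines for use in a pre-cutover check. This is a minimal sketch: the header sizes come from the breakdown above, while the function names are illustrative.

```python
# Encapsulation components in bytes, per the breakdown above.
OUTER_ETHERNET = 14
OUTER_IP = {4: 20, 6: 40}
UDP_HEADER = 8
VXLAN_HEADER = 8


def vxlan_overhead(ip_version: int = 4) -> int:
    """Total bytes VXLAN adds to a tenant frame for the given outer IP version."""
    return OUTER_ETHERNET + OUTER_IP[ip_version] + UDP_HEADER + VXLAN_HEADER


def min_underlay_mtu(tenant_mtu: int, ip_version: int = 4) -> int:
    """Smallest underlay MTU that carries a tenant MTU without drops."""
    return tenant_mtu + vxlan_overhead(ip_version)


def fits(underlay_mtu: int, tenant_mtu: int, ip_version: int = 4) -> bool:
    return underlay_mtu >= min_underlay_mtu(tenant_mtu, ip_version)


print(vxlan_overhead(4), vxlan_overhead(6))  # 50 70
print(min_underlay_mtu(1500))                # 1550
print(fits(1500, 1500), fits(9216, 9000))    # False True
```

The `fits(1500, 1500)` case is the silent-drop scenario: a standard 1500-byte tenant MTU does not survive a 1500-byte underlay.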

In data center network design, which UDP destination port is standardized for VXLAN, and how do we handle legacy 8472 deployments?

IANA assigned UDP port 4789 as the VXLAN destination port per RFC 7348, and that value should be used by default on every greenfield fabric. Early Linux kernel implementations and some pre-standard deployments used UDP 8472, which persists in a handful of brownfields where the original VTEP was built before the RFC finalized.

Greenfield and vendor-interop designs run 4789 without exception; brownfield integrations should be audited for port mismatch before any flows are cut over, because a 4789-to-8472 VTEP pair will silently drop encapsulated frames and log only generic forwarding counters.

Spec the destination port explicitly in the fixed-fee SOW so the cutover runbook flags any non-4789 VTEP discovered during cutover.
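A port audit of this kind reduces to a filter over the VTEP inventory. The sketch below assumes the inventory has already been collected into a name-to-port mapping; the device names and the `non_standard_vteps` helper are hypothetical.

```python
IANA_VXLAN_PORT = 4789  # standardized VXLAN destination port


def non_standard_vteps(vtep_ports: dict[str, int]) -> dict[str, int]:
    """Return the VTEPs whose configured UDP destination port is not 4789."""
    return {name: port for name, port in vtep_ports.items() if port != IANA_VXLAN_PORT}


inventory = {
    "leaf-101": 4789,
    "leaf-102": 4789,
    "legacy-hv-07": 8472,  # hypothetical pre-standard host VTEP
}
print(non_standard_vteps(inventory))  # {'legacy-hv-07': 8472}
```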

What does the 24-bit VXLAN VNI give us that a 12-bit 802.1Q VLAN ID cannot?

RFC 7348 defines the VXLAN Network Identifier as a 24-bit value, yielding roughly 16 million distinct overlay segments versus the 4,094 usable VLANs in 802.1Q. That scale ceiling is what makes EVPN-VXLAN viable for hyperscale multi-tenancy, IoT fleet isolation, and merged-entity fabrics where VLAN-ID collisions are guaranteed.

The 24-bit VNI is carried in the VXLAN header between VTEPs; from the tenant’s perspective the segment is still a normal bridge domain, which means legacy 802.1Q-tagged workloads stitch into a VNI without application changes.

At scale this is the single number that decouples data-center segmentation from the legacy VLAN ceiling.
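The scale difference and the header layout can both be shown in a short sketch. The 8-byte layout follows RFC 7348 (flags byte with the I bit set, 24 reserved bits, 24-bit VNI, 8 reserved bits); the helper names and the sample VNI are illustrative.

```python
import struct

MAX_VNI = (1 << 24) - 1       # 16,777,215: largest value a 24-bit field holds
USABLE_VLANS = (1 << 12) - 2  # 4,094: 12-bit VLAN ID minus reserved 0 and 4095


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} does not fit in 24 bits")
    flags_word = 0x08 << 24  # I flag set, remaining 24 bits reserved as zero
    vni_word = vni << 8      # VNI in the upper 24 bits, low 8 bits reserved
    return struct.pack("!II", flags_word, vni_word)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    _, vni_word = struct.unpack("!II", header)
    return vni_word >> 8


header = pack_vxlan_header(10100)
print(header.hex())        # 0800000000277400
print(unpack_vni(header))  # 10100
```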

What are the five EVPN route types our fabric actually carries, and what does each do?

RFC 7432 defines Types 1 through 4, and RFC 9136 adds Type 5. Type 1 is Ethernet Auto-Discovery: per-ES and per-EVI routes that signal all-active multihoming and enable mass-withdrawal on link failure. Type 2 is MAC/IP Advertisement, the workhorse that carries host learning and enables ARP suppression. Type 3 is Inclusive Multicast Ethernet Tag, which advertises L2 domain membership and builds the BUM replication list.

Type 4 is Ethernet Segment, which drives Designated Forwarder election across multihoming peers.

Type 5 is IP Prefix, used for inter-VRF prefix advertisement and subnet summarization between L3 VNIs.

All five types appear in production Arista, Cisco Nexus 9000, and Juniper QFX fabrics. Diagnosing a silent-host or stuck-DF problem always starts with dumping Type-2 and Type-4 routes from a route reflector.

How does ESI length and DF election work for all-active EVPN multihoming?

RFC 7432 defines the Ethernet Segment Identifier as a 10-octet integer carried in Type-4 routes. Every PE attached to the same multihomed CE uses the same ESI, which is how the fabric identifies peers that share a segment.

Default Designated Forwarder election uses service carving: each PE builds an ordered IP-address list of all segment members, then applies (V mod N) equals i, where V is the VLAN, N is the PE count, and i is the PE’s ordinal position.

That formula deterministically spreads BUM forwarding responsibility across all segment members, which is why an all-active ESI with three PEs does not bottleneck BUM on a single forwarder.
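The service-carving rule above is small enough to sketch directly. This example assumes ordinals start at 0 in ascending numeric IP order and that all PEs use the same address family; the addresses and the `elect_df` helper are illustrative.

```python
import ipaddress


def elect_df(pe_addresses: list[str], vlan: int) -> str:
    """Return the Designated Forwarder for a VLAN using (V mod N) = i.

    PEs are ordered numerically by IP address (not as strings), and the PE
    whose ordinal i equals vlan mod N becomes the DF.
    """
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[vlan % len(ordered)]


pes = ["10.0.0.3", "10.0.0.1", "10.0.0.2"]
for vlan in (100, 101, 102):
    print(vlan, elect_df(pes, vlan))
# 100 10.0.0.2
# 101 10.0.0.3
# 102 10.0.0.1
```

Consecutive VLANs land on different PEs, which is the load-spreading behavior the formula is designed to produce.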

Symmetric or asymmetric IRB — which do we deploy on EVPN-VXLAN fabrics?

Symmetric IRB by default, asymmetric only when tenant route counts are genuinely small. Per RFC 9135, symmetric IRB does matching MAC and IP lookups on both ingress and egress PEs and requires an L3 VNI per tenant VRF; in exchange it saves ARP and bridge-table memory at scale because each PE only holds ARP and MAC entries for the subnets attached to it locally. Asymmetric IRB requires every PE to hold ARP entries for every remote host on every tenant, which breaks at a few thousand endpoints.

Cisco Nexus 9000, Arista, and Juniper EVPN fabrics all default to symmetric — it is the right answer for any fabric that will grow.

Our data center security architects review the L3 VNI assignment against the tenant VRF plan before the fabric is built.

Why do webscale Clos fabrics run eBGP as the underlay instead of OSPF or IS-IS?

RFC 7938 spells out the reasoning. eBGP assigns a unique ASN per leaf and a shared ASN across the spine tier, which exploits BGP’s path-vector scope — failure events propagate only where they affect reachability rather than flooding an entire IGP area. Multipath-relax and multipath-multiple-AS options engage every parallel leaf-spine uplink for ECMP, so traffic spreads across all available paths rather than hashing to a single next-hop.

The result is a fabric that scales to thousands of leafs without LSA-flooding storms or area boundaries, and where any single link or spine failure produces a bounded, local reconvergence rather than a fabric-wide event.
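The ASN layout described above (one shared spine ASN, one unique ASN per leaf) can be sketched as a small allocator. This is an illustration under stated assumptions: it draws from the 16-bit private range 64512 to 65534, and the device names, default values, and `asn_plan` helper are hypothetical.

```python
PRIVATE_ASN_16BIT = range(64512, 65535)  # 64512-65534 inclusive


def asn_plan(spines: list[str], leafs: list[str],
             spine_asn: int = 64512, first_leaf_asn: int = 64513) -> dict[str, int]:
    """Assign one shared ASN to all spines and a unique ASN to each leaf."""
    last_leaf_asn = first_leaf_asn + len(leafs) - 1
    for asn in (spine_asn, first_leaf_asn, last_leaf_asn):
        if asn not in PRIVATE_ASN_16BIT:
            raise ValueError(f"ASN {asn} is outside the 16-bit private range")
    if first_leaf_asn <= spine_asn <= last_leaf_asn:
        raise ValueError("spine ASN collides with the leaf ASN block")
    plan = {name: spine_asn for name in spines}
    plan.update({name: first_leaf_asn + i for i, name in enumerate(leafs)})
    return plan


plan_asn = asn_plan(["spine1", "spine2"], ["leaf1", "leaf2", "leaf3"])
print(plan_asn)
# {'spine1': 64512, 'spine2': 64512, 'leaf1': 64513, 'leaf2': 64514, 'leaf3': 64515}
```

Fabrics with more leafs than the 16-bit private range allows would need 32-bit private ASNs instead, which this sketch does not cover.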

What does Arista MLAG deliver that EVPN multihoming does not, and when is each appropriate?

Arista MLAG is a two-chassis LACP dual-homing scheme built on a peer-link port-channel, a shared domain-id, and a heartbeat keep-alive with a 4000 ms default interval. The peer is declared dead after 30 seconds of missed heartbeats, at which point the surviving chassis takes over the LAG. MLAG is Layer-2 and chassis-pair limited; it cannot scale past two switches.

EVPN multihoming using Type-1 and Type-4 routes scales to many leafs and does all-active forwarding across any number of segment members.

Use MLAG for legacy L2 server dual-homing or when a classic pair-at-the-top is the operational model; use EVPN multihoming for multi-leaf all-active designs.

Our engineers scope the choice against the server team’s NIC-bonding posture before the fixed-fee SOW is cut.

What is Arista VARP, and how does it differ from HSRP, VRRP, or EVPN Distributed Anycast Gateway?

VARP lets multiple Arista switches answer ARP for the same virtual IP and MAC, providing active-active first-hop routing without a master or standby election — effectively a static anycast gateway. HSRP and VRRP are active/standby protocols with preemption timers, where only the master forwards on behalf of the virtual IP.

EVPN Distributed Anycast Gateway does what VARP does but at fabric scale, using EVPN Type-2 advertisements so every VTEP across the fabric shares the same gateway IP and MAC.

Picking between VARP and DAG is usually a fabric-scope question: VARP inside an MLAG pair, DAG across a spine-leaf fabric.

When does a design require Arista 7800R3 deep-buffer spines instead of shallow-buffer 7060X6?

The 7800R3 uses a deep-buffer Virtual Output Queue architecture — 7800R3-36P line cards carry 24 GB of buffer per card, and the flagship 7816R3 has 384 GB of system buffer. FIB scale on the L3-XXXL profile reaches 3,950k IPv4 routes, 384k MAC, and 112k ARP.

The shallow-buffer 7060X6 is the opposite trade-off: 51.2 Tbps of 800G OSFP with small on-chip buffers, tuned for AI lossless fabrics that rely on PFC and ECN rather than on-switch buffering.

Pick 7800R3 for traditional DC fabrics with bursty north-south or incast patterns; pick 7060X6 for GPU training clusters running RoCE.

What Arista platform do we spec for an 800G AI training leaf, and what capacity does it deliver?

Arista 7060X6 is the current 800G AI leaf — positioned on Arista’s platforms page as a best-of-breed 800G solution optimized for AI workloads, with 32 to 64 ports of 800G OSFP800. Pair it with the 7800R4 spine, which delivers up to 460 Tbps of throughput and 576 ports of 800G or 1152 ports of 400G per Arista’s AI networking documentation.

For very large training clusters, the 7700R4 Distributed Etherlink Switch scales to more than 30,000 400GbE accelerators in a single fabric.

The right split between 7060X6, 7800R4, and 7700R4 depends on the GPU count and the rail topology the cluster operator has chosen.

Our AI-ready infrastructure team sizes the leaf-spine ratio against the collective-traffic profile.

How does Arista Etherlink position against proprietary InfiniBand for RDMA and RoCE AI fabrics?

Arista Etherlink is standards-based Ethernet engineered for AI networking. The stack provides RDMA-aware QoS and load-balancing capabilities that ensure reliable packet delivery to NICs supporting RoCE, AI Analyzer with workload and NIC integration for end-to-end visibility, AVA machine-learning for anomaly detection, and forward compatibility with Ultra Ethernet Consortium specifications as they finalize.

The operational argument against InfiniBand is simple: Etherlink reuses the same EOS toolchain, the same CloudVision telemetry, and the same vendor-agnostic Ethernet cable plant that the rest of the data center already runs.

For organizations that do not want a parallel InfiniBand silo with its own OpEx, Etherlink is the path to keep AI training on a single fabric family.

Does our DCI link support IEEE 802.1AE MACsec end-to-end, and at what cipher strength?

Yes, on platforms that support it. IEEE 802.1AEbn-2011 added GCM-AES-256 as an available cipher suite beyond the original GCM-AES-128 baseline. MACsec peer discovery and authentication use the MACsec Key Agreement Protocol defined in IEEE 802.1X, which is what lets two MACsec endpoints establish mutually authenticated sessions without a pre-shared secret exchange in the clear.

On the Arista side, the 7280R3A, R3AM, and R3AK variants provide AES-256-GCM MACsec on applicable port types while keeping FlexRoute FIB scale intact, which is what you need for an encrypted DCI that also carries full internet tables.

What is the Arista 7388X5, and when is a 25.6 Tbps single-ASIC chassis preferable to a multi-ASIC build?

The 7388X5 is a 4U modular chassis built on a single 25.6 Tbps packet processor — 64 ports of 400G QSFP-DD or 128 ports of 200G QSFP56, 10.6 billion packets-per-second forwarding, 825 ns port-to-port latency, 114 MB of buffer, and under 10 W typical per 200G port. Use the 7388X5 for hyperscale or AI fabrics that value single-ASIC cut-through latency over the deep-buffer profile of a 7800R3.

Traffic that never leaves the single ASIC avoids the internal-fabric hop of a multi-ASIC chassis, which matters for latency-sensitive collectives and low-jitter workloads.

If bursty incast is the dominant pattern, a deep-buffer 7800R3 is still the right spine.

Where does the Arista 7050X3 sit in a spine-leaf build, and what are its real scale limits?

The 7050X3 is the fixed 1U general-purpose leaf option — up to 32 ports of 100G or 128 ports of 10/25G, 48xSFP25 through 96x25G SFP variants, port-to-port latency of 800 ns on most models (one variant at 3 microseconds), a fully shared 32 MB packet buffer tuned for lossless networks, MAC table up to 288k, and 64-way MLAG.

It is the right leaf for general-purpose 25G server access, enterprise DC tenants, and sites that do not need AI-specific shallow-buffer 800G leafs.

It is not the right leaf for GPU training clusters or for high-radix 100/400G aggregation — those workloads belong on 7060X6 or 7388X5.

Why do EVPN multihoming and fast convergence need BGP unnumbered in Cumulus and FRR leaf stacks?

Cumulus Linux uses FRR for BGP and implements RFC 5549 unnumbered: peers exchange IPv4 routes with IPv6 link-local next hops, which eliminates per-interface /30 or /31 addressing across hundreds of fabric links. That removes an entire class of addressing errors at scale and makes Auto-BGP viable for Clos designs, since Cumulus Auto-BGP auto-generates 32-bit ASNs for two-tier leaf-spine topologies without operator intervention.

Cumulus ships ECMP by default on the data plane, so the combination of unnumbered peering plus default ECMP means a greenfield Cumulus fabric reaches all-active leaf-spine forwarding with a minimum of bespoke configuration.

For brownfield moves off traditional IGP underlays this is a large operational simplification.
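The addressing saving is easy to size for a given Clos. The sketch below counts leaf-spine links and the /31 subnets a numbered design would need for them; the fabric dimensions and helper names are hypothetical.

```python
def fabric_links(leafs: int, spines: int, links_per_pair: int = 1) -> int:
    """Point-to-point links in a two-tier Clos where every leaf connects to every spine."""
    return leafs * spines * links_per_pair


def numbered_addressing(links: int) -> dict[str, int]:
    """Addressing a numbered underlay needs: one /31 (two addresses) per link."""
    return {"subnets_31": links, "addresses": links * 2}


links = fabric_links(leafs=64, spines=4)
print(links)                       # 256
print(numbered_addressing(links))  # {'subnets_31': 256, 'addresses': 512}
```

With unnumbered peering, none of those per-link subnets need to be planned, assigned, or audited.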

What are the three Arista EVPN service models, and when do we pick each?

Per Arista EOS there are three. VLAN-Based is one-to-one VLAN-to-MAC-VRF — granular route targets, finest policy control, largest route-table footprint. VLAN Bundle is N-to-one with a single bridge table across all VLANs in the MAC-VRF — smallest route-table footprint but no per-VLAN policy.

VLAN Aware Bundle keeps per-VLAN bridge tables inside a single MAC-VRF, combining bundling efficiency with VLAN awareness — the right choice for most enterprise tenants because it delivers per-VLAN policy without per-VLAN route-target bloat.

Pick VLAN-Based when every VLAN needs independent policy; pick VLAN Bundle for high-density IoT where policy is uniform; pick VLAN Aware Bundle for almost everything else.

What does Arista CloudVision give us that APIC, Apstra, and Nexus Dashboard do not?

EOS is built on a publish-subscribe SysDB that is evolving to NetDB, a streaming state layer, and it ships as a single binary image across every Arista platform. CloudVision consumes the NetDB stream for telemetry, change control, and compliance across the entire portfolio, with first-party Ansible integration and Arista Validated Designs automation on top.

The operational delta versus APIC, Apstra, or Nexus Dashboard is the single-binary, single-state-model story: the same EOS image runs on a 7050X3 leaf, a 7280R3 DCI router, and a 7800R4 AI spine, and every one of them streams state into the same CloudVision instance.

That consistency is what removes per-platform tooling forks.

Which Juniper QFX do we spec as a 400G data-center spine today?

Juniper QFX5700 is the current 400G spine. Per Juniper’s QFX Series page it supports up to 32 ports of 400GbE or 144 ports of 50/40/25/10GbE and is positioned for data-center fabric spine, EVPN-VXLAN fabric, and data-center interconnect. The QFX5120 is the general-purpose 1/10/25G leaf with 8×40/100G uplinks, and the modern portfolio now groups QFX5240, 5230, 5220, 5210, 5200, 5130, 5120, 5110, and 5700 as the data-center fabric family.

For a greenfield EVPN-VXLAN spine on Juniper, QFX5700 is the right answer until Juniper refreshes the 800G tier.

What is the scale delta between Arista 7280R3 standard and 7280R3K, and when does the K variant matter?

Standard 7280R3 models carry roughly 1,450k IPv4 base routes plus 1,792k FlexRoute routes. The 7280R3K variant scales to 2,250k base plus 2,048k FlexRoute, approaching 5 million total routes across profiles. The K variant matters when the box sits at a DCI edge, a peering boundary, or an internet-facing spine where full IPv4 and IPv6 tables plus regional carrier overlap all have to fit in FIB simultaneously.

The 7280DR3A-54 pairs 24 GB deep buffers with 54 ports of 400G QSFP-DD for lossless incast on the same platform family.

Pick 7280R3 for internal DC routing; pick 7280R3K when the FIB budget is a real constraint.

Data Center — Further Reading

Adjacent disciplines that intersect with the data center fabric in any modern enterprise build. Each link below describes how the destination service line interacts specifically with EVPN-VXLAN, microsegmentation, fabric overlays, and DCI workstreams — not with the data center practice in the abstract.

  • Enterprise wireless engineering — the campus AP fabric whose controller, anchor, and policy north-south traffic ultimately lands at the DC services-VRF: per-SSID dynamic VLAN assignment from ISE / ClearPass / Mist that maps a guest, employee, IoT, or clinical Wi-Fi role to a fabric tenant via EVPN Type-5 prefix routes per IETF RFC 9136, plus the controller-cluster east-west sync (CAPWAP, mobility-tunnel, anchor pair) that traverses the EVPN-VXLAN overlay per IETF RFC 7348 as a tenant flow with its own VRF placement and microseg policy distinct from data-plane payloads.
  • Campus LAN refresh — the wired access fabric whose core / distribution uplinks terminate on the DC edge leafs as the DCI handoff: campus aggregation hands traffic into a DC services-VRF with PBR or VRF-leak for shared services using EVPN Type-5 prefix routes per IETF RFC 9136 alongside symmetric integrated routing and bridging per IETF RFC 9135, the deep-buffer requirement on the DC edge leaf that absorbs incast at the DCI seam without head-of-line blocking, and the leaf-uplink oversubscription budget that determines whether 1:3 (general enterprise) or 1:1 non-blocking (AI / HPC) ratio applies on the campus-to-fabric boundary.
  • SD-WAN fabric design and migration — branch-to-DC traffic that terminates on an SD-WAN head-end behind a DC perimeter VRF: multi-region active / active topologies require EVPN multisite design per IETF RFC 8365 with per-region border gateways advertising Type-2 (host MAC / IP) and Type-5 (prefix) routes selectively to control failover-domain blast radius, plus the IPsec / IKEv2 underlay per IETF RFC 7296 that hands branch-overlay traffic into the DC fabric tenant VRF without flattening the segmentation policy at the head-end interface.
  • Network security architecture — the firewall, NAC, and SASE perimeter stack at the DC north-south edge that feeds policy intent down into the fabric VRFs as Endpoint Security Groups (Cisco ACI ESG), NSX-T Distributed Firewall rules, or Arista MSS-Group tags running over the EVPN-VXLAN overlay per IETF RFC 7348, RFC 7432, and RFC 8365, with MACsec link-layer encryption per IEEE 802.1AE-2018 on inter-fabric DCI links and zero-trust east-west enforcement aligned to NIST SP 800-207.
  • Unified communications migrations — the UC platforms (CUCM 14 / 15 publisher and subscriber clusters, IM&P, Unity Connection, Webex Calling Local Gateway, Teams Phone direct-routing SBCs) that ride as VMs or containers on the DC compute fabric: intra-cluster signaling per IETF RFC 3261, SRTP media flows per IETF RFC 3711, and call-recording media (Dodd-Frank, MiFID II compliance) traverse the fabric VRFs as east-west traffic with their own QoS marking and microsegmentation policy distinct from the rest of the application estate, and SBC HA pair anchoring determines whether voice media stays east-west on the leaf or crosses a tenant boundary.
  • Structured cabling — the DC fabric optical plant the spine-leaf topology consumes: pre-terminated MPO-12 / MPO-16 / MPO-24 trunks per ANSI/TIA-568.3-E, zoned MDA / HDA / EDA / ZDA cabinet architecture per ANSI/TIA-942-C and BICSI 002-2024, OS2 single-mode trunks for 400GBASE-FR4 / 800GBASE-FR4 reach, OM4 / OM5 short-reach for 100GBASE-SR4 and 400GBASE-SR8, and the polarity Method A / B / C decision documented before procurement so transceivers light at link-up rather than failing at TX-to-RX pair flip.
  • AI-ready infrastructure — the GPU training-cluster fabric that consumes the leaf-spine plane as a 1:1 non-blocking RoCEv2 transport per IBTA RoCEv2 Annex A17 with PFC IEEE 802.1Qbb and ECN feedback so congestion signals propagate to the BlueField / ConnectX NIC before packets drop, rail-optimized topologies for NCCL all-reduce and all-gather collectives, in-network SHARP collective acceleration where the fabric supports it, and Ultra Ethernet Consortium 1.0 packet-spray transport on the silicon families that publish UEC compliance.
  • Independent validation testing — post-deployment proof of EVPN-VXLAN convergence under leaf failure, anycast-gateway move (per Type-2 MAC mobility extended-community signaling per IETF RFC 7432), BUM replication mode (ingress replication vs. multicast underlay) behavior, microsegmentation enforcement at the leaf VTEP, and DCI multisite border-gateway failover; deliverable maps to ANSI/TIA-942-C rated-tier infrastructure expectations and NIST SP 800-160 systems-engineering verification rather than a vendor self-attested telemetry dashboard.