Inside the Cisco Nexus 6004 Switch
The Nexus 6004 is a 10-Gbps/40-Gbps data center switch. While a bit of a power hog compared with competitors' boxes, it promises line-rate forwarding on all ports with 1-microsecond latency. I look at the architecture that makes this possible.
April 24, 2013
The Cisco Nexus 6004 is a 4U L2/L3 switch with up to 96 ports of 40-Gbps Ethernet, or 384 ports of 10-Gbps Ethernet. The ports are QSFP, which means a breakout cable is required to support 10 Gbps.
The primary target market for this switch is the data center that requires a non-blocking aggregation layer running 40-Gbps uplinks, although the 6004 could conceivably play a core role in certain deployments.
An obvious use case for the 6004 is as part of the backbone in a leaf-spine network topology, where access-layer leaf switches are uplinked to a series of spine-layer aggregation switches. This design keeps hosts topologically close to one another, minimizing hop counts and latency.
The 6004 ships with 48 fixed ports (sans optics), and offers four linecard expansion module (LEM) slots. Only one LEM is available as of this writing, with at least two more reportedly being considered by Cisco.
The N6K-C6004-M12Q, which is currently shipping, offers 12 ports of 40-Gbps Ethernet or FCoE. A Unified Port LEM (tentative availability in the second half of 2013) would offer native 2/4/8-Gbps Fibre Channel support, as well as 1/10 GbE SFP+ ports.
Cisco is considering a 100 GbE LEM, which would most likely include FCoE support. Tentative availability is the first half of 2014.
The 6004 is not a feature match for the Nexus 5K line, in that it doesn't offer native Fibre Channel. Then again, the 6004 isn't really positioned in that space. The 6004's focus is that of a 40 GbE monster Ethernet switch with line-rate forwarding capacity on all ports.
40 GbE isn't a bandwidth requirement for SAN fabrics just yet, and designers requiring storage at the outset have the option of FCoE--even multihop. A lack of native Fibre Channel seems a minor issue that's likely to be addressed by Cisco within the next 12 months.
Power consumption is a possible consideration for those evaluating the 6004. While I'm not aware of a direct competitor to the 6004 at this port density, I compared it against a few other (as it happens, fixed-configuration) 40 GbE switches using numbers published on the vendors' websites.
Based on a nominal analysis of per-port power draw using each vendor's specified maximum wattage, the 6004 does look like a bit of an electron chewer; the quick math is sketched after the list.
• The Arista Networks 7050Q has 16 40 GbE ports and is rated for 303W max, which equals 18.93W per port.
• The Juniper Networks QFX3600 has 16 40 GbE ports and is rated for 345W max, which equals 21.56W per port.
• The Dell Force10 Z9000 has 32 40 GbE ports and is rated for 800W max, which equals 25W max per port.
• The Cisco Nexus 6004 has 96 ports (when fully populated) and is rated for 3300W max, for 34.375W max per port.
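For anyone who wants to reproduce those per-port figures, the arithmetic is simple division; here it is as a short Python sketch, using only the vendor-published maximum wattages and port counts quoted above (worst-case ratings, not measured draw).

```python
# Rough per-port power comparison from the vendors' published maximum
# wattage figures quoted in the list above.
switches = {
    "Arista 7050Q":       {"ports_40gbe": 16, "max_watts": 303},
    "Juniper QFX3600":    {"ports_40gbe": 16, "max_watts": 345},
    "Dell Force10 Z9000": {"ports_40gbe": 32, "max_watts": 800},
    "Cisco Nexus 6004":   {"ports_40gbe": 96, "max_watts": 3300},  # fully populated
}

for name, spec in switches.items():
    watts_per_port = spec["max_watts"] / spec["ports_40gbe"]
    print(f"{name}: {watts_per_port:g} W per 40 GbE port")
```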
Cisco states that the 6004 forwards at line rate using any combination of 10 Gbps or 40 Gbps ports at a latency of 1 microsecond. The 1-microsecond latency is consistent even when interfaces are loaded with functionality such as security and QoS policies. To understand how the switch accomplishes this feat, I'll review the architectural details inside the switch.
Overall Fabric Architecture
The job of any switch is to accept chunks of data flowing into it, determine where those chunks of data should go and send them. Going forward, I'll refer to these chunks as "packets," although that could mean L2 frames or L3 packets.
The key underlying components of the Nexus 6004 that create the non-blocking architecture are two Cisco ASICs that a packet passes through on its trip across the 6004:
• Unified Port Controller (UPC)
• Crossbar Switch Fabric
Let's take a look at each. I'll examine the UPC at both ingress and egress.
Ingress Unified Port Controller
As implied by the name "ingress," this is where the packet flows into the switch. There's one UPC for every three 40 GbE (or twelve 10 GbE) ports in the Nexus 6004. Whether traffic is flowing into or out of the UPC, it is a busy chip, with four important functions (a quick code sketch of how they fit together follows the list).
1. Media access control is a lower-level function that handles things like Ethernet framing and flow control.
2. The forwarding controller determines *if* a packet is to be forwarded, *where* it will go and *what* it will look like when it gets there. Policy is applied here (access lists and so on). Functions such as tunnel encapsulation/decapsulation and header rewrites happen here as well.
3. The buffer manager handles queuing and dequeuing of packets. The need for buffers might seem counterintuitive in a non-blocking fabric, but buffers help to manage contention of multiple packets on the wire if they try to access a single egress port during the same clock cycle.
4. The queuing subsystem manages the virtual output and egress queues themselves, as opposed to the packets in the queues (which is what the buffer manager does).
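To make the division of labor a little more concrete, here's a purely illustrative Python sketch of how those four functions hand a packet off to one another. Every name in it (IngressUPC, the dict-based packets, the callable access lists) is my own invention for the sake of the sketch; the real UPC does all of this in silicon at line rate.

```python
# Purely illustrative model of the four ingress UPC functions listed above.
# Packets are plain dicts here; all names are invented for this sketch.
class IngressUPC:
    def __init__(self, forwarding_table, access_lists, voqs):
        self.forwarding_table = forwarding_table   # destination -> egress port
        self.access_lists = access_lists           # policy: callables returning True/False
        self.voqs = voqs                           # queuing subsystem: (egress, cos) -> queue

    def receive(self, packet):
        # 1. Media access control: framing and FCS checks would happen here.
        if not packet.get("fcs_ok", True):
            return
        # 2. Forwarding controller: apply policy, decide *if* and *where*.
        if not all(acl(packet) for acl in self.access_lists):
            return                                 # dropped by policy
        egress_port = self.forwarding_table.get(packet["dst"])
        if egress_port is None:
            return                                 # unknown destination (flood/drop omitted)
        # ...header rewrites or tunnel encap/decap would also happen here.
        # 3 & 4. Buffer manager + queuing subsystem: park it in the right VOQ.
        self.voqs[(egress_port, packet.get("cos", 0))].append(packet)

# Minimal usage of the sketch above.
upc = IngressUPC(
    forwarding_table={"host-b": 7},
    access_lists=[lambda p: p.get("dst") != "blocked-host"],
    voqs={(7, 0): []},
)
upc.receive({"dst": "host-b", "cos": 0, "payload": b"hello"})
```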
Once the packet has made it into the UPC, has had policy applied to it, and has been rewritten or encapsulated if necessary, the UPC determines the egress port. The packet is then buffered in the appropriate queue to prepare for its trip to the egress queue via the crossbar switch fabric. In the ingress UPC, the packet is buffered in a virtual output queue (VOQ) if necessary. Let's look in more detail at VOQs, a common feature of input-queued switches.
In the 6004, every ingress interface gets eight VOQs (one per 802.1p priority class) per egress interface. With a maximum of 384 physical interfaces in the system, that translates to 3,072 VOQs per ingress interface.
Practically speaking, that means that as traffic flows into a switch port, the switch determines which port it should egress on and tries to forward it there. But if the egress port is busy, the ingress port doesn't have to stall all the other incoming packets. Instead, the ingress port can service traffic flowing into the other VOQs and send it along.
Therefore, VOQs eliminate what's known as "head-of-line blocking," where the packet at the front of the line holds up all the packets queued behind it. The 6004 has enough VOQs to be able to forward from any ingress port to any egress port on any traffic class. Note that the 3,072 VOQs I refer to are for unicast traffic. Multicast traffic gets 8,192 VOQs of its own, and another 32 VOQs are dedicated to SPAN traffic (that is, port mirroring).
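The queue count is straightforward multiplication: eight classes times 384 possible egress interfaces. Here's a small sketch of how the per-ingress-port unicast VOQ bookkeeping could be laid out; the structure and names are illustrative, not Cisco's.

```python
# VOQ bookkeeping for a single ingress port, following the numbers above:
# 8 traffic classes (802.1p) x 384 possible egress interfaces = 3,072
# unicast VOQs per ingress port.
from collections import deque

TRAFFIC_CLASSES = 8          # one per 802.1p priority
MAX_EGRESS_PORTS = 384       # fully populated 10 GbE configuration

unicast_voqs = {
    (egress, cos): deque()
    for egress in range(MAX_EGRESS_PORTS)
    for cos in range(TRAFFIC_CLASSES)
}
print(len(unicast_voqs))     # 3072

# Head-of-line blocking is avoided because a busy egress port only stalls
# its own (egress, cos) queues; packets destined elsewhere keep flowing.
def enqueue(packet, egress_port, cos):
    unicast_voqs[(egress_port, cos)].append(packet)
```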
When a packet has been assigned to a VOQ, it's ready to make its trip across the crossbar fabric.
Crossbar Switch Fabric
The Nexus 6004 has four crossbar fabric modules, each providing a cross-connect matrix of 192 x 384 (ingress x egress) 10-Gbps paths. This asymmetric design helps to reduce contention in the fabric, as there are twice as many fabric connections on the egress side of a UPC as on the ingress side. Let's look in more detail at the connections between the UPCs and the crossbar fabric modules; a quick tally of the bandwidth follows the list.
1. The ingress UPC connects to all four fabric modules with four connections each, for a total of 16 connections between the ingress UPC and the crossbar fabric.
2. Each of these 16 connections offers 14 Gbps of bandwidth, for a total of 224 Gbps from the ingress UPC to the fabric.
3. The four crossbar fabrics connect to the egress UPC with eight connections each, for a total of 32 connections between the egress UPC and the crossbar fabric.
4. Each of these 32 connections offers 14 Gbps of bandwidth, for a total of 448 Gbps from the fabric to the egress UPC.
5. A switch fabric scheduler moves packets from the ingress UPC across the crossbar fabric matrix to the egress UPC.
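Here's the quick tally promised above, using only the connection counts and per-link bandwidth from the list; it also makes the 2:1 egress-to-ingress fabric ratio explicit.

```python
# Per-UPC fabric bandwidth, from the connection counts listed above.
FABRIC_MODULES = 4
LINK_GBPS = 14

ingress_links = FABRIC_MODULES * 4    # 4 links to each of 4 fabric modules = 16
egress_links = FABRIC_MODULES * 8     # 8 links from each of 4 fabric modules = 32

ingress_gbps = ingress_links * LINK_GBPS   # 224 Gbps from the ingress UPC into the fabric
egress_gbps = egress_links * LINK_GBPS     # 448 Gbps from the fabric to the egress UPC

# Each UPC fronts three 40 GbE ports (120 Gbps of front-panel bandwidth),
# so the 224 Gbps into the fabric leaves headroom on the ingress side too.
print(ingress_gbps, egress_gbps, egress_gbps / ingress_gbps)   # 224 448 2.0
```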
Egress Unified Port Controller
Architecturally, the egress UPC is the same chip as the ingress UPC. However, because the directional flow is different (egress instead of ingress), the buffering structure is also different. There are a total of 16 egress queues: eight for unicast and eight for multicast, each set of eight corresponding to the eight 802.1p classes of service. If desired, the switch operator can adjust how buffer resources are divided among the queues.
When traffic on the egress UPC becomes congested (packets stacking up as they wait to be delivered), packets are moved out of the egress queues by a deficit round-robin scheduler, with the exception of one strict-priority queue. The strict-priority queue is serviced first to guarantee timely, jitter-free delivery, while the remaining queues are dequeued according to their weights. To help avoid further congestion, random early detection decides which packets should be marked with explicit congestion notification.
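As a rough illustration of that dequeuing behavior, here's a simplified deficit round-robin scheduler with a single strict-priority queue. The class, the quantum values and the one-turn-at-a-time structure are mine, not Cisco's, and the RED/ECN marking step is left out.

```python
# Simplified egress scheduling sketch: one strict-priority queue serviced
# first, remaining queues drained by deficit round-robin according to their
# weights (quanta). Names and structure are illustrative only.
from collections import deque

class EgressScheduler:
    def __init__(self, weights):
        # weights: DRR quantum in bytes per class of service, e.g. {1: 1500, 5: 3000}
        self.priority_queue = deque()                  # the one strict-priority queue
        self.queues = {cos: deque() for cos in weights}
        self.quantum = dict(weights)
        self.deficit = {cos: 0 for cos in weights}
        self.order = deque(weights)                    # round-robin visiting order

    def enqueue(self, packet, cos=None, strict=False):
        if strict:
            self.priority_queue.append(packet)
        else:
            self.queues[cos].append(packet)

    def service_next(self):
        """One scheduling turn: strict priority first, then one DRR turn."""
        if self.priority_queue:
            return [self.priority_queue.popleft()]
        for _ in range(len(self.order)):
            cos = self.order[0]
            self.order.rotate(-1)                      # next turn starts at the next queue
            queue = self.queues[cos]
            if not queue:
                self.deficit[cos] = 0                  # idle queues don't bank credit
                continue
            self.deficit[cos] += self.quantum[cos]
            sent = []
            # Drain packets while this queue's accumulated deficit covers them.
            while queue and len(queue[0]) <= self.deficit[cos]:
                self.deficit[cos] -= len(queue[0])
                sent.append(queue.popleft())
            if sent:
                return sent
        return []
```

Over sustained load, a queue with a 3,000-byte quantum drains roughly twice the bytes per round as one with a 1,500-byte quantum, while anything placed in the strict-priority queue always jumps the line; the RED/ECN marking described above would sit alongside the enqueue path and is omitted from the sketch.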
Once a packet is ready to be taken from the egress queue and delivered by the egress UPC to its destination, the same process of encapsulation/decapsulation, destination lookup, framing, and so on follows so that the packet can be sent.
Summary
The Cisco Nexus 6004 switch is positioned as a heavy lifter. In a data center design, the switch can play roles at the core, aggregation or access layer, and can be relied upon to deliver traffic with very low latency. The 6004 doesn't ask the network designer to compromise performance simply because a particular type of encapsulation or security policy is applied. Nor is Layer 3 forwarding a bolt-on afterthought; instead, line-rate routing functionality is built into the ASICs, giving a network designer great flexibility and making the 6004 a candidate for a number of interesting roles in both greenfield and brownfield deployments.
This article summarizes key points from the Cisco Nexus 6004 Switch Architecture document soon to be published by Cisco, which was kind enough to let me review an advance copy. I also used the Cisco presentation BRKARC-3453, available from CiscoLive365.com, as a reference. The illustrations used in this post come from Cisco.