The Aruba ESP Campus Switching Design section describes the technologies and design principles used in the design of an Aruba ESP Layer 2 LAN topology and control plane.
Table of contents
- Switching Design
- Quality of Service
- Spanning Tree Protocol
- Network Resiliency
- Aruba ESP Campus LAN Design Summary
Quality of service (QoS) refers to a network’s ability to provide higher levels of service using traffic prioritization and control mechanisms. Applying the proper QoS policy is important for real-time traffic (such as on Skype or Zoom) and business-critical applications (such as Oracle or Salesforce).
To accurately configure QoS on a network, consider several aspects, including bit rate, throughput, path availability, delay, jitter, and loss. The last three—delay, jitter, and loss—are easily improved with a queuing algorithm that enables the administrator to prioritize applications with higher requirements over those with lower requirements. The areas of the network that require queuing are those with constrained bandwidth, such as the wireless or WAN sections. Wired LAN uplinks are designed to carry the appropriate amount of traffic for the expected bandwidth requirements. Since QoS queuing does not take effect until active congestion occurs, it is not typically needed on LAN switches.
The easiest strategy to deploy QoS is to identify the critical applications running in the network and give them a higher level of service using the QoS scheduling techniques described in this guide. The remaining applications stay in the best-effort queue to minimize upfront configuration and lower the day-to-day operational effort of troubleshooting a complex QoS policy. If additional applications become critical, they are added to the list. This can be repeated as needed without requiring a comprehensive policy for all applications. This strategy is often used by organizations that do not have a corporatewide QoS policy or that are troubleshooting application performance problems in specific areas of the network.
One example prioritizes real-time applications along with a few other critical applications that require fast response times because users are waiting on remote servers. Identifying business-critical applications and giving them special treatment on the network helps employees remain productive. Real-time applications are placed into a strict priority queue, and business-critical applications are given a higher level of service during congestion.
The administrator must limit the amount of traffic placed into strict priority queues to prevent oversaturation of interface buffers. Giving all traffic priority defeats the purpose of creating a QoS strategy. After critical traffic is identified and placed in the appropriate queues, the rest of the traffic is placed in a default queue with a lower level of service. If the higher-priority applications do not use the bandwidth assigned, the default queue uses all available bandwidth.
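The behavior described above can be sketched with a minimal strict-priority drain in Python. The queue names and traffic labels are illustrative, not an Aruba implementation:

```python
# Minimal strict-priority scheduling sketch: the priority queue is always
# served first; the default queue gets whatever capacity remains, which is
# all of it once the priority queue is empty.
from collections import deque

priority = deque(["voice1", "voice2"])          # strict priority traffic
default = deque(["web1", "web2", "web3"])       # best-effort traffic

def dequeue():
    """Serve the strict priority queue first, then the default queue."""
    if priority:
        return priority.popleft()
    if default:
        return default.popleft()
    return None

sent = [dequeue() for _ in range(5)]
print(sent)  # ['voice1', 'voice2', 'web1', 'web2', 'web3']
```

Note how the default queue is never starved permanently: once priority traffic is drained, best-effort traffic consumes all remaining service opportunities.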
Aruba recommends using the access switch as a QoS policy enforcement point for traffic over the LAN. This means selected applications are identified by IP address and port number at the access switch and marked for special treatment. Optionally, traffic can be queued on the uplinks, but this is not required in an adequately designed campus LAN environment. Any applications that are not identified are re-marked with a value of zero, giving them a best-effort level of service. The diagram below shows where traffic is classified, marked, and queued as it passes through the switch.
Classification, Marking, and Queuing
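As a rough illustration of the enforcement point described above, the sketch below classifies traffic by destination IP address and port and returns a DSCP marking, with unrecognized flows re-marked to zero. The addresses, ports, and lookup table are hypothetical and not an Aruba API:

```python
# Hypothetical classification table for an access switch: traffic is matched
# on destination IP and port and marked with a DSCP value; anything that does
# not match is re-marked to 0 (best effort). Addresses/ports are illustrative.
CLASSIFIER = {
    ("10.1.1.10", 5004): 46,  # real-time media server -> EF
    ("10.1.2.20", 1521): 18,  # database server -> AF21
}

def mark(dst_ip: str, dst_port: int) -> int:
    """Return the DSCP value to write into the packet's ToS byte."""
    return CLASSIFIER.get((dst_ip, dst_port), 0)

print(mark("10.1.1.10", 5004))  # 46
print(mark("192.0.2.1", 80))    # 0  (unrecognized -> best effort)
```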
In a typical enterprise network, applications with similar characteristics are categorized based on how they operate over the network. Then, the applications go in different queues according to category. For example, if broadcast video or multimedia streaming applications are not used for business purposes, there is no need to account for them in a QoS policy. However, if Skype and Zoom are used for making business-related calls, the traffic must be identified and given a higher priority. Specific traffic that is not important to the business (for example: YouTube, gaming, and general web browsing) should be identified and placed into the “scavenger” class, where it is dropped first and with the highest frequency during times of congestion.
A comprehensive QoS policy requires categorization of business-relevant and scavenger-class applications on the network. Layer 3 and layer 4 classifications group applications into categories with similar characteristics. After sorting the critical applications from the others, they are combined into groups for queuing.
Scheduling algorithms rely on classification markings to identify applications passing through a network device. Applications marked at layer 3 rather than layer 2 will carry the markings throughout the life of a packet.
The goal of the QoS policy is to allow critical applications to share the available bandwidth with minimal system impact and engineering effort.
Marking at layer 3 uses the IP type of service (ToS) byte, with either the three most significant bits as an IP precedence value (0 to 7) or the six most significant bits as a DSCP value (0 to 63). DSCP values are more common because they provide a higher level of QoS granularity. They are also backward compatible with IP precedence because of their leftmost placement in the ToS byte.
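The bit layout described above can be checked with a short sketch. DSCP occupies the six most significant bits of the ToS byte, and its own top three bits line up with the IP precedence field, which is the source of the backward compatibility:

```python
# Sketch of the ToS-byte layout: DSCP sits in the six most significant bits
# of the byte (DSCP << 2), and IP precedence is the top three bits of the
# DSCP value itself (DSCP >> 3).

def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value into the 8-bit ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0-63")
    return dscp << 2

def dscp_to_precedence(dscp: int) -> int:
    """IP precedence equivalent of a DSCP value."""
    return dscp >> 3

# EF (voice) is DSCP 46: ToS byte 0xB8, backward compatible with precedence 5.
print(dscp_to_tos(46), dscp_to_precedence(46))  # 184 5
```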
Layer 3 markings are added in the standards-based IP header, so they remain with the packet as it travels across the network. When an additional IP header is added to a packet, such as traffic in an IPsec tunnel, the inner header DSCP marking must be copied to the outer header to allow the network equipment along the path to use the values.
Several RFCs are associated with the DSCP values as they pertain to the per-hop behaviors (PHBs) of traffic passing through the various network devices along its path. The diagram below shows the relationship between PHB and DSCP and their associated RFCs.
DSCP relationship with PHBs
Voice traffic is marked with the highest priority using an Expedited Forwarding (EF) class. Multimedia applications, broadcast, and video conferencing are placed into an assured forwarding (AF31) class to give them a percentage of the available bandwidth as they cross the network. Signaling, network management, transactional, and bulk applications are given an assured forwarding (AF21) class. Finally, default and scavenger applications are placed into the default (DF) class to provide them a reduced amount of bandwidth but not completely starve them during times of interface congestion. The figure below shows an example of six application types mapped to a four-class LAN queuing model.
Six application types in a 4-class LAN queuing model
The best-effort entry at the end of the QoS policy places all application flows not recognized by the layer 3 / layer 4 classification into the best-effort queue. This prevents end users who mark their own packets from receiving higher-priority access across the network.
The Aruba switch supports up to eight queues, but this example uses only four queues. The real-time interactive applications are combined into one strict priority queue. In contrast, multimedia and transactional applications are placed into deficit round-robin (DRR) queues, and the last queue is used for scavenger and default traffic. DRR is a packet-based scheduling algorithm that groups applications into classes and shares available capacity among them according to a percentage of bandwidth defined by the network administrator. Each DRR queue gets its fair share of bandwidth during congestion, but all of them can use as much bandwidth as needed when there is no congestion.
The outbound interface requires the DSCP values shown in the second column to queue the applications. The weighted values in the DRR LAN scheduler column are summed, and each DRR queue is given a share of the total. The values must be adjusted according to the volume of traffic in each category; this is often a trial-and-error process, because the effect of the QoS policy depends on the application mix in the environment. As shown in the four-class example below, the queues are assigned sequentially in top-down order.
QoS Summary for Aruba Switch
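To make the weighting arithmetic concrete, the sketch below converts DRR weights into guaranteed bandwidth shares. The weight values and queue names are hypothetical; actual values depend on the traffic volumes in each category:

```python
# Hypothetical DRR weights for the three non-priority queues in the
# four-class model. The strict priority queue is scheduled first and
# does not participate in DRR.
weights = {"multimedia": 30, "transactional": 20, "default": 10}

def drr_shares(weights: dict) -> dict:
    """Each DRR queue's guaranteed share during congestion is its weight
    divided by the sum of all weights. When there is no congestion, any
    queue can use all available bandwidth."""
    total = sum(weights.values())
    return {queue: w / total for queue, w in weights.items()}

for queue, share in drr_shares(weights).items():
    print(f"{queue}: {share:.0%}")
```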
High availability is the primary goal for any enterprise conducting business on an ongoing basis. Layer 2 loops cause catastrophic network disruptions, making prevention and timely removal of loops a critical part of network administration.
The Spanning Tree Protocol (STP) dynamically discovers and removes layer 2 loops in a network. This section covers STP topology, the type of STP to use, and supplemental features to enable.
One method of increasing availability between network infrastructure devices is to establish layer 2 redundancy using multiple links, so that an individual link failure does not result in a network outage. Multiple strategies can be applied to prevent the redundant connections from forwarding layer 2 frames in an infinite loop. The Aruba ESP architecture uses Virtual Switching Extension (VSX), discussed in the following Network Resiliency section, to prevent loops between centrally administered network switches. STP in combination with Loop Protect is configured primarily to resolve accidental loops created by users in the access layer.
With many different types of STP and varying network devices using different defaults, it is important to standardize on a common version for predictable STP topology. The recommended STP version for Aruba Gateways and Switches is Rapid per VLAN Spanning Tree (Rapid PVST+).
Enable STP on all devices providing layer 2 services to prevent accidental loops.
STP creates a loop-free topology by selecting a root bridge and subsequently permitting only one port on each non-root switch to forward frames in the direction of the root. The root bridge is dynamically selected, using the lowest bridge ID as its primary selector. The bridge ID begins with a bridge priority, which can be set administratively to influence root bridge selection. Aggregation switches and collapsed core switches should have low bridge priorities to ensure that a switch at that layer becomes the root bridge of the network. The root bridge should be a device that is central to the aggregation of VLANs for downstream devices. In the campus topologies discussed in this guide, the root bridge candidates are the collapsed core, access aggregation, and services aggregation devices.
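The root election described above can be sketched as choosing the lowest bridge ID, where priority is compared first and the MAC address breaks ties. The switch names, priorities, and MAC addresses below are illustrative:

```python
# Illustrative bridge IDs as (priority, MAC address) pairs. STP elects the
# switch with the lowest bridge ID as root: priority is compared first,
# then the MAC address breaks ties.
bridges = {
    "agg-1":    (4096,  "00:00:5e:00:01:0a"),
    "agg-2":    (8192,  "00:00:5e:00:01:0b"),
    "access-1": (32768, "00:00:5e:00:01:0c"),
}

def elect_root(bridges: dict) -> str:
    """Return the name of the bridge with the lowest bridge ID
    (Python tuple comparison matches the priority-then-MAC ordering)."""
    return min(bridges, key=lambda name: bridges[name])

print(elect_root(bridges))  # agg-1
```

Setting a low administrative priority on the intended root, as recommended above, guarantees the comparison never falls through to the MAC address tiebreaker.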
In the three-tier wired design, the root bridges are the access aggregation switches and service aggregation switches. Virtual Switching Extension (VSX) and multichassis link aggregation groups (MC-LAGs) are used to allow dual connections between the access and aggregation layers without the need for STP on individual links. Each set of aggregation switches is a separate layer 2 domain with its own root bridge, but they do not interfere with each other because STP does not extend over the layer 3 boundary between the devices. In this example, the aggregation switches are set with a low bridge priority to ensure that one switch in each VSX pair becomes the root bridge. The core devices are layer 3 switches and do not run STP.
Spanning Tree Protocol root bridge placement
STP has several supplemental features that help keep the network stable. Here is a brief overview of supplemental features, with justifications for enabling them.
Root Guard blocks an interface when a superior bridge protocol data unit (BPDU) is received on it, preventing a downstream switch from becoming the root bridge. Enable Root Guard on the aggregation or collapsed core downlinks to prevent access switches from becoming the root of the network. Do not enable it on the links connecting the aggregation switches to the core switch.
Admin Edge allows a port to begin forwarding immediately, without going through the listening and learning phases of STP. It should be used only on ports connected to a single device, or to a PC daisy-chained behind a phone. Use Admin Edge with caution: STP does not run on these ports, so if a loop forms through them, STP cannot detect it. This feature should be used only for client-facing ports on access switches.
BPDU Guard automatically blocks any port on which it detects a BPDU. Enable this feature on client-facing ports of access switches to ensure that BPDUs are never accepted there, preventing loops and spoofed BPDU packets.
BPDU Filter drops BPDUs received on an interface and suppresses the switch's own BPDUs on that interface. The main use for this feature is a multitenancy environment in which the servicing network does not want to participate in the customer's STP topology. A BPDU Filter-enabled interface still allows other switches to participate in their own STP topologies. Using BPDU Filter is not recommended unless the network infrastructure does not need to participate in the STP topology.
Loop Protect is a supplemental protocol to STP that can detect loops when a device creating a loop does not support STP and also drops BPDUs. Loop Protect automatically disables ports to block a detected loop and re-enables ports when the loop disappears. This feature should be turned on for all access port interfaces to prevent accidental loops from port to port. Loop Protect should not be enabled on the uplink interfaces of access switches or in the core, aggregation, or collapsed core layers of the network.
Fault Monitor automatically detects excessive traffic and link errors. It can log events, send SNMP traps, or temporarily disable a port. Enable Fault Monitor in notification mode for all recognized faults and on all interfaces for data continuity, but do not enable its port-disable action, because Loop Protect is used to stop loops.
For campus switches, Aruba recommends either a two-tier LAN with collapsed core or a three-tier LAN with a routed core. In both designs, common features can be enabled to ensure that the network is highly resilient. The two-tier campus is shown on the left below; the three-tier campus is on the right.
Two-Tier and Three-Tier Wired
VSX enables two AOS-CX switches to appear as a single switch to downstream-connected devices. Use VSX in a collapsed core or aggregation point to add redundancy. In a standard link aggregation group (LAG), multiple physical interfaces between two devices are combined into a single logical link. VSX extends this capability by combining ports across two AOS-CX switches on one side of the LAG, referred to as a multi-chassis LAG (MC-LAG). The VSX pair appears as a single layer 2 switch to downstream connected devices, which can be another switch, a gateway, or an individual network host. The active gateway feature enables configuration of a redundant layer 3 network gateway using a shared IP and MAC address. Dual-homing connected devices in this manner adds network resiliency by eliminating a single point of failure for upstream connectivity.
From a management and control plane perspective, each switch is independent. VSX pairs synchronize MAC, ARP, STP, and other state tables over an inter-switch link (ISL). The ISL is a standard LAG between the VSX pair designated to run the ISL protocol.
VSX is supported on Aruba CX 6400, CX 8320, CX 8325, CX 8360, and CX 8400 models. A VSX pair cannot mix different models; for example, a CX 8320 cannot be paired with a CX 8325.
VSX Pair Placement
The access switch uses a standard LAG connection. From the access switch perspective, the VSX pair is a single upstream switch. This minimizes the fault domains for links by separating the connections between the VSX-paired switches. This also minimizes the service impact with the Live Upgrade feature because each device has its own control plane and link to the downstream access devices.
Traditional STP vs VSX
MC-LAG enables all uplinks between adjacent switches to be active and passing traffic for higher capacity and availability, as shown in the right side of the figure below.
Note: When using LAG or MC-LAG, STP is not required but should remain enabled as an additional loop-protection mechanism.
LACP—The Link Aggregation Control Protocol (LACP) combines two or more physical ports into a single trunk interface for redundancy and increased capacity.
LACP Fallback—LAGs with LACP Fallback enabled allow an active LACP interface to connect with its peer before it receives LACP protocol data units (PDUs). This feature is useful for access switches using Zero Touch Provisioning (ZTP) connecting to LACP-configured aggregation switches.
Inter-switch link—Best practice for configuring the ISL LAG is to permit all VLANs. Specifying a restrictive list of VLANs is valid if the network administrator requires more control.
MC-LAG—These LAGs should be configured with the specific VLANs and use LACP active mode. MC-LAGs should NOT be configured with all-VLAN permission.
VSX keepalive—The VSX keepalive is a User Datagram Protocol (UDP) probe that sends hellos between the two VSX nodes to detect a split-brain situation. The keepalives should be sent using the out-of-band management (OOBM) port connected to a dedicated management network, or enabled on a direct IP connection using a dedicated physical link between the VSX pair members.
Active gateway—This is the default gateway for endpoints within the subnet. It must be configured on VSX primary and secondary switches. Both devices also must have the same virtual MAC address configured from the private MAC address spaces listed below. There are four ranges reserved for private use.
Note: x is any hexadecimal value.
PIM Dual—If the network is multicast-enabled, the default PIM DR is the VSX primary switch. The VSX secondary also can establish PIM peering to avoid a long convergence time in case of VSX primary failure. Aruba recommends configuring PIM as active-active for each VSX pair.
VSX and Rapid PVST—Aruba recommends configuring VSX and STP per the design guidelines outlined in the best-practices document below.
Note: Certain VSX use cases fall outside this design guidance but are covered in detail in VSX Configuration Best Practices. It includes in-depth information about STP interactions, traffic flows, active forwarding, and the Live Upgrade process.
Stacking allows multiple access switches to be connected to each other and behave like a single switch. Stacking combines multiple physical devices into one virtual switch, increasing port density and allowing management and configuration from one IP address. This reduces the total number of managed devices while better utilizing the port capacity in an access wiring closet. Stack members share the uplink ports, which provides additional bandwidth and redundancy.
AOS-CX access switches provide front plane stacking through the Virtual Switching Framework (VSF) feature, which uses two of the four front-panel SFP ports operating at 10G, 25G, or 50G speeds. VSF combines the control and management planes of the switches in a VSF stack, allowing simpler management and redundancy in the access closet. VSF is supported on Aruba CX 6200 and 6300 switches.
VSF supports up to 10 members on a 6300 and up to 8 members on a 6200. Aruba recommends a ring topology for the stacked switches. A ring topology can be used for 2–10 switches and allows for link-failure fault tolerance, because the devices can still reach the commander or standby switch using the secondary path.
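The ring's single-link fault tolerance can be illustrated with a small reachability check. The member count and the graph model are illustrative, a sketch of why the secondary path keeps the stack connected:

```python
# A VSF ring of N members: each switch links to its two neighbors. Removing
# any single link still leaves every member reachable, because traffic can
# take the secondary path around the ring.
from collections import deque

def ring_links(n: int) -> set:
    """Links of an n-member ring: member i connects to member (i+1) mod n."""
    return {(i, (i + 1) % n) for i in range(n)}

def all_reachable(n: int, links: set) -> bool:
    """BFS from member 0 over the remaining (bidirectional) links."""
    adj = {i: [] for i in range(n)}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {0}, deque([0])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == n

n = 10
for failed in ring_links(n):
    assert all_reachable(n, ring_links(n) - {failed})
print("ring survives any single link failure")
```

A second simultaneous link failure can split the ring into two segments, which is why the split-detection mechanism described below is still required.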
The commander and standby switches should have separate connections to the pair of upstream aggregation switches. If the commander fails, the standby can still forward traffic, limiting the failure to the commander switch. The recommended interface for switch stacking links is a 50G direct-attach cable (DAC), which allows enough bandwidth for traffic across members.
There are three stacking-device roles:
- The commander conducts overall management of the stack and synchronizes forwarding databases with the standby.
- The standby provides redundancy for the stack and takes over stack management operations if the commander becomes unavailable or if an administrator forces a commander failover.
- Members are not part of the overall stack management, but they must manage their local subsystems and ports to operate correctly as part of the stack. The commander and standby also are responsible for their own local subsystems and ports.
To mitigate the effects of a VSF split stack, a split-detection mechanism known as multi-active detection (MAD) must be enabled on the commander and standby. MAD uses a connection between the OOBM ports on the primary and secondary members to detect when a split occurs. Aruba recommends connecting the OOBM ports directly with an Ethernet cable.
VSF OOBM MAD and links
The ESP campus wired LAN provides network access for employees, APs, and Internet of Things (IoT) devices. The campus LAN also becomes the logical choice for interconnecting the WAN, data center, and Internet access, making it a critical part of the network.
The simplified access, aggregation, and core design provides the following benefits:
- An intelligent access layer protects from attacks while maintaining user transparency within the layer 2 VLAN boundaries.
- The aggregation and core layers provide IP routing using OSPF and IP multicast using PIM sparse mode with redundant BSRs and RPs.
- The services aggregation connects critical networking devices such as corporate servers, WAN routers, and Internet edge firewalls.
- The core is a high-speed dual-switch interconnect that provides path redundancy and subsecond failover for nonstop forwarding of packets.
- Combining the core and services aggregation into a single layer allows the network to scale when a standalone core is not required.
When overlay networks are needed, Aruba provides the flexibility to choose between centralized or distributed overlays to address different traffic and policy requirements. Both overlay models support the “Colorless Ports” feature, which enables automated client onboarding and access control for ease of operations.