Aruba ESP Data Center Policy Design
The Aruba Edge Services Platform (ESP) Data Center provides powerful policy options to support multi-tenancy and meet the security goals that protect critical data.
Aruba ESP Data Center Policy Layer
Implementation of the Aruba ESP data center policy layer is dependent on the chosen network architecture. An EVPN-VXLAN design offers a rich combination of overlay technologies and traffic filtering mechanisms to isolate user and application traffic, configured primarily on leaf switches. A Layer 2 two-tier design offers many of the same filtering options, requiring configuration at both the core and access layers.
The Aruba CX 10000 Distributed Services Switch (DSS) enforces east-west traffic policy using an inline stateful firewall implemented in hardware within the switch. A DSS optimizes performance and traffic flow characteristics over a centralized firewall strategy and can replace hypervisor-based firewalls, freeing hypervisor CPU and memory resources for hosted workloads. The CX 10000 can be placed in both EVPN and Layer 2 two-tier architectures, but with greater policy flexibility in an EVPN design.
Aruba Fabric Composer (AFC) integration with both vCenter and AMD Pensando Policy and Services Manager (PSM) provides a powerful combination for managing east-west data center policy using CX 10000 switches and VM guest policy assignment. Network and security administrators can manage all policy elements centrally, while empowering VM administrators to assign VM guests to a policy block in their own independent workflow through the assignment of VM tags. AFC also supports centralized configuration of access control lists (ACLs).
This document presents components, design recommendations, and best practices for designing the policy layer of an Aruba ESP data center network.
Out-of-Band Management Network
Organizations should plan a physically separate management LAN and implement role-based access control (RBAC) for network devices.
A separate management network prevents the data plane from becoming an attack surface for switch management compromise and ensures that data center switch reachability is not lost when modifying data plane policy.
RBAC requires login authentication against an enterprise directory, typically accomplished using either TACACS+ or RADIUS protocols with a policy server such as Aruba ClearPass Policy Manager.
Logging facilities, log management, and log analysis also should be considered.
Segmentation and Policy Prerequisites
Data center applications are deployed in many different ways. Applications can be implemented as VMs using hypervisors or hosted on bare-metal servers. Containerized apps are highly distributed and usually require connectivity between multiple computing and service nodes. In some cases, a single data center hosts applications for multiple tenants while offering a set of shared services across them. Because the majority of application traffic remains contained within the data center, it is incorrect to assume that all security threats are external.
Successful data center policy design begins with understanding the requirements of the applications that run in the environment. It is often necessary to re-profile legacy applications when there is insufficient documentation of the requirements. From a networking perspective, application profiling should document all network connections required for an application to run successfully. These might include connections to backend databases or cloud-hosted services. To properly define policy regarding which connections must be permitted and which must be denied, it is necessary to know the application profile.
Similarly, analyzing the profile of the users accessing the applications and data is typically required. Never leave a data center wide open to a campus, even if the campus is assumed to be a secure environment. To restrict access, understand the various user profiles associated with the required applications and data. It is important to identify the requirements of on-campus users, remote branches, mobile field workers, and public Internet clients so that appropriate data center access profiles can be developed to represent their unique requirements.
Segmentation Overview
Segmentation is a logical separation of data center hosts between which a security policy can be enforced. A network segment can represent a large set of hosts such as a data center tenant, or it can be as granular as a single host.
VRF Segmentation
Establishing routing domains is a key method of segmenting internal data center communication and a requirement for an EVPN-VXLAN overlay. Network hosts that are members of one routing domain are not capable of communicating with hosts in another routing domain by default. Each routing domain contains a set of hosts allowed to communicate with one another, while traffic into or out of the routing domain can be controlled. Multiple strategies are available to share connectivity between routing domains, when necessary.
A switch supports multiple routing domains by implementing virtual routing and forwarding (VRF) instances. A VRF instance can correlate with a customer, an application, a set of hosts with common security requirements (e.g., PCI), or a set of hosts with other common characteristics (e.g., production or development environments). Each VRF consists of a unique route table, member interfaces that forward traffic based on the route table, and routing protocols that build the route table. Different VRFs may contain overlapping IP address ranges because the individual route tables are discrete.
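As a conceptual illustration of discrete route tables, the behavior can be modeled in a few lines of Python. This is a sketch, not device configuration; the VRF names, prefixes, and next-hop labels are hypothetical:

```python
from ipaddress import ip_address, ip_network

# Hypothetical VRFs whose route tables deliberately contain the same prefix.
vrfs = {
    "tenant-a": {
        ip_network("10.1.0.0/16"): "leaf1-vlan10",
        ip_network("0.0.0.0/0"): "border-leaf",
    },
    "tenant-b": {ip_network("10.1.0.0/16"): "leaf2-vlan20"},
}

def lookup(vrf: str, dest: str):
    """Longest-prefix match performed only within the named VRF's table."""
    table = vrfs[vrf]
    matches = [net for net in table if ip_address(dest) in net]
    return table[max(matches, key=lambda n: n.prefixlen)] if matches else None

print(lookup("tenant-a", "10.1.5.9"))  # leaf1-vlan10
print(lookup("tenant-b", "10.1.5.9"))  # leaf2-vlan20 (same IP, discrete table)
print(lookup("tenant-b", "8.8.8.8"))   # None: tenant-b has no default route
```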
VRF member interfaces that connect to external networks provide a natural point for implementing north-south security policy.
In an EVPN-VXLAN design, VRFs are a required construct in the overlay and are implemented on leaf switches. EVPN-VXLAN allows VRF segments to be extended to additional data center locations in a multi-fabric environment. The default VRF is reserved for underlay connectivity.
In a Layer 2 two-tier design, VRFs are optional and are implemented at the data center core. Data centers large enough to require VRF segmentation generally use an EVPN solution, but VRFs remain a useful segmentation strategy in a Layer 2 two-tier design. By default, all hosts in a Layer 2 two-tier data center are members of the default VRF.
VRFs should be added in a thoughtful manner, because the operational complexity of a network increases as the number of VRFs grows. Minimizing complexity results in a network implementation that is easier to maintain and troubleshoot. Each organization should define its own policy that clearly states the criteria to be met when adding a VRF to the network. For example, a service provider that supports multiple tenants will have different criteria and require more VRFs than a university data center.
Common best practice is to use the minimum number of VRFs required to achieve clearly defined organizational goals. VRFs are employed to support the following use cases:
- Separate production and development application environments. This provides a development sandbox while minimizing risk to production application uptime, and it supports overlapping IP space when required.
- Apply policy to segmented traffic that requires strict regulatory compliance, such as PCI or HIPAA.
- Apply policy to traffic from hosts identified by organizational policy as requiring segmentation and possessing a common set of security requirements. These sets of hosts often share a common administrative domain.
- Isolate Layer 3 route reachability in a multi-tenancy data center, while supporting overlapping IP space.
Inter-VRF Route Forwarding (IVRF) can be used within a data center to share IP prefixes between VRFs. For example, to provide shared services in a data center, a services VRF can be created to offer a common set of resources to some or all other data center VRFs. IVRF allows Layer 3 reachability between applications in the services VRF and hosts in other VRFs.
Note: IVRF can circumvent inter-VRF policy by joining previously discrete routing domains together, and it does not support overlapping IP address space.
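The effect can be sketched by extending a simple route-table model: leaking copies a prefix from one table into another, which is why the routing domains are no longer fully discrete and why overlapping address space cannot be supported. Prefixes and names below are illustrative:

```python
from ipaddress import ip_network

# Illustrative tables: a shared services VRF and one tenant VRF.
services = {ip_network("10.99.0.0/24"): "services-leaf"}
tenant_a = {ip_network("10.1.0.0/16"): "leaf1-vlan10"}

# IVRF leak: copy the shared prefix into the tenant's table. The tenant can
# now reach 10.99.0.0/24, but the two domains are joined for that prefix,
# and a leaked prefix must not overlap any existing tenant prefix.
tenant_a.update(services)
print(sorted(tenant_a.items(), key=lambda kv: str(kv[0])))
```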
VLAN/Subnet Segmentation
In addition to limiting broadcast domain size, a VLAN can be used to group sets of data center hosts by role, application, and administrative domain. A VLAN is typically associated with a single IP subnet. Traffic between VLANs must be routed, and all host traffic between VLANs is forwarded via an IP gateway interface, where security policy can be applied.
ACLs applied to Layer 3 VLAN interfaces are typically used to enforce a base policy between subnets. When more sophisticated policy requirements arise, a common solution is to deploy a centralized firewall and make it the default gateway for application subnets. This results in an inefficient traffic pattern: routed traffic between VLANs is hairpinned at the central firewall, unnecessarily consuming data center bandwidth.
The Aruba ESP data center provides a more elegant option: CX 10000 ToR switches offer hardware-based, Layer 4 firewall capabilities at the host’s connection layer. This model optimizes data center bandwidth capacity and eliminates the need to hairpin traffic through a central firewall.
Microsegmentation
Microsegmentation extends Layer 2 VLAN segmentation to an individual workload level using isolated and primary VLAN constructs available in the private VLAN (PVLAN) feature set. Similar to a hypervisor-based firewall, a PVLAN microsegmentation strategy provides the ability to enforce policy between VM guests on the same hypervisor.
Private VLANs coupled with the CX 10000 switch support microsegmentation policy enforcement local to the workload’s attachment point. Data center workloads assigned to the same isolated private VLAN cannot communicate directly with each other over Layer 2. Each isolated VLAN is associated with a primary VLAN, whose SVI serves as the default gateway for the isolated hosts. Proxy-ARP is configured on the primary VLAN’s SVI to enable communication between isolated hosts via the primary VLAN SVI.
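A minimal sketch of the resulting forwarding behavior follows; the labels are hypothetical, and the actual PVLAN configuration is applied on the switch and is not shown:

```python
# Isolated hosts cannot bridge to each other directly; proxy-ARP on the
# primary VLAN's SVI answers for their neighbors, pulling every flow to the
# gateway, where firewall policy can be enforced.
def next_hop(vlan_type: str, dst: str) -> str:
    if vlan_type == "isolated":
        return "primary-SVI"   # proxy-ARP: traffic is routed via the gateway
    return dst                 # standard VLAN: direct Layer 2 bridging

print(next_hop("isolated", "vm-b"))  # primary-SVI -> policy enforcement point
print(next_hop("standard", "vm-b"))  # vm-b (no enforcement point in the path)
```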
A centralized firewall also can be used to achieve microsegmentation, but traffic to and from every microsegmented workload must be hairpinned through the central device; as the number of microsegmented workloads grows rapidly, this inefficient traffic pattern exhausts overall data center bandwidth much more quickly.
An EVPN-VXLAN solution using CX 10000 leaf switches provides the most flexible policy assignment. Stateful firewall policy is assigned to the primary VLAN associated with the segmented workload’s isolated VLAN. Since all traffic for a workload assigned to an isolated VLAN is forwarded to the primary VLAN, all traffic for the individual workload is subject to policy enforcement. Both egress and ingress firewall policy can be applied to the workload.
In a Layer 2 two-tier solution using CX 10000 access switches, firewall policy is limited to the egress direction, applied to the workload’s isolated PVLAN. Policy is enforced when traffic traverses the access layer toward the primary VLAN’s gateway IP defined at the core.
The CX 10000 provides a unified data center microsegmentation strategy that is hypervisor agnostic (supporting VMware, Microsoft Hyper-V, KVM, etc.) and also supports bare metal servers. Using the CX 10000 in place of a hypervisor-based implementation offloads policy enforcement cycles from a VM host CPU to dedicated switch hardware.
In both the EVPN-VXLAN and Layer 2 two-tier solutions, ACL policy can be applied to the primary VLAN’s SVI interface, and PVLANs can be extended across multiple switches.
Microsegmentation can be applied to a subset of hosts requiring a high level of scrutiny, or it can be applied more broadly to maximize a data center’s security posture.
Policy Overview
Security policy specifies the type of traffic allowed between network segments. Network-based policy is typically enforced using a firewall or ACL. If traffic is permitted by a stateful firewall, dynamic state is created to permit return traffic for the session. An ACL is applied to traffic in one direction only, and no dynamic state is created.
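This distinction can be modeled in a short sketch. The addresses are illustrative, and the session key is simplified to the endpoint pair; a real firewall keys state on the full 5-tuple:

```python
Flow = tuple[str, str, str, int]  # (src_ip, dst_ip, proto, dst_port)

acl_permits = {("10.1.10.5", "10.1.20.8", "tcp", 443)}  # one-direction rules
sessions: set[frozenset] = set()                        # firewall state table

def acl_allows(flow: Flow) -> bool:
    """Stateless: each direction needs its own explicit rule."""
    return flow in acl_permits

def firewall_allows(flow: Flow) -> bool:
    """Stateful: permit by rule, then remember the session for the reply."""
    key = frozenset((flow[0], flow[1]))  # simplified, direction-agnostic key
    if key in sessions:
        return True                      # reply matches the recorded session
    if flow in acl_permits:
        sessions.add(key)                # create state on the first permit
        return True
    return False

fwd = ("10.1.10.5", "10.1.20.8", "tcp", 443)
rev = ("10.1.20.8", "10.1.10.5", "tcp", 443)
print(acl_allows(rev))       # False: no rule for the reverse direction
print(firewall_allows(fwd))  # True: rule hit, session created
print(firewall_allows(rev))  # True: matches the recorded session
```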
Applying network security policy plays a significant role in reducing the attack surface exposed by data center hosts. Blocking unnecessary protocols reduces the available tactics a threat actor can use in host exploitation. Scoping allowed outbound traffic inhibits command and control structures and blocks common methods of data exfiltration. Applying intra-data center security policy constrains the options for lateral threat movement if a host has been compromised.
Data Center Perimeter Policy
Data center routes must be shared with campus and other external networks. Applying policy at the edge between the data center and external networks provides the first layer of security for data center applications. These policies limit access to only permitted networks and hosts while monitoring those connections, and they can be implemented using perimeter firewall appliances or ACLs.
In an EVPN-VXLAN spine-and-leaf design, a pair of leaf switches is the single entry and exit point to the data center. This border leaf is not required to be dedicated to that function. Border and services leaf functions are commonly combined, and, less frequently, computing hosts are also attached. In a Layer 2 two-tier design, the core layer provides a common ingress and egress point for traffic between the data center and external networks. In both cases, this data center network edge is where a set of policies is implemented to control access into and out of the data center network.
Perimeter policy is applied at a VRF segment level. If a single VRF contains all data center hosts, a single pair of policies is configured at the data center edge: one policy for ingress traffic and a second policy for egress traffic. If multiple VRFs are configured, a unique pair of policies should be implemented on a per-VRF basis at the data center edge.
Multiple data center VRFs can be extended to the upstream edge device on a single physical interface or aggregated link using 802.1Q tagging. A VLAN is associated with each VRF, and the VLAN’s corresponding SVI is used for routing protocol peering. The upstream edge device may have one or multiple VRFs defined. A direct VRF-to-VRF peering between a data center edge VRF and its campus VRF neighbor enables IP segmentation to be extended into the campus, an approach referred to as VRF-lite.
In addition to filtering traffic between the data center and external networks, multiple data center VRFs can peer with a single external routing instance to create a combined enforcement point for external and inter-VRF policy, provided that overlapping IP address space is not implemented. Policy between data center VRFs is then enforced by hairpinning traffic to the upstream device.
Perimeter Firewalls
Dedicated security systems at the perimeter can offer advanced monitoring, application-aware policy enforcement, and threat detection.
Perimeter firewalls are deployed in transparent or routed mode. In transparent mode, the firewalls behave like a bump in the wire, meaning they do not participate in Layer 3 network routing. From the perspective of directly attached switches, they are no different than a transparent bridge, but the firewall forwards only explicitly permitted user and network control traffic. In routed mode, a firewall participates in the routing control plane and generally has more flexibility with deep packet inspection and policy enforcement options. It is important to note that stateful firewalls require symmetric forwarding to apply policy correctly to subsequent traffic in a session.
When multiple data center VRFs contain overlapping IP address space or VRF segmentation must be extended beyond the perimeter firewall, the firewall must support a virtualization mechanism to allow route table isolation. This can be virtualization of the firewall itself into distinct logical instances or support for VRFs.
Perimeter ACLs
When IP subnets inside the data center are designed to map to security groups or business functions, ACLs at the border leaf can provide policy enforcement from user locations into data center applications. If subnets cannot be mapped to security groups, ACLs can become difficult to manage and scale in larger environments. The primary benefit of perimeter ACLs is that they can be implemented directly on the switching infrastructure to enforce a policy foundation from which to establish data center access. Policies implemented using switch ACLs specifically target Layer 3 and Layer 4 constructs. Switch ACLs are not stateful or application-aware.
East-West Security Policy
The majority of traffic in a modern data center is east-west traffic between the data center workloads themselves. Policy enforcement can be implemented between VRF, VLAN, and microsegmentation segments using firewalls or ACLs.
Firewalls offer more comprehensive filtering capabilities than ACLs. Firewall policy inside an Aruba ESP data center can be implemented using two methods at the network layer: inline using distributed services switches (DSSs) or centrally using a firewall appliance in a services leaf.
Distributed Services Switch Policy Enforcement
The AMD Pensando programmable data processing unit (DPU) extends Aruba CX 10000 switches to include stateful firewall capabilities. Using this built-in hardware feature, firewall enforcement is provided inline as part of the switch data plane.
There are several advantages to this approach. Data paths are optimized by applying policy at the workload attachment point, without hairpinning data through a centralized firewall. Firewall policy can be granular to the host, with support for microsegmentation. The Pensando DPU provides wire-rate performance and moves firewall services to dedicated switch hardware, relieving hypervisor-based firewalls of the resource consumption associated with processing large data flows.
CX 10000 switches are deployed as leaf switches in an EVPN-VXLAN solution and as access switches in a Layer 2 two-tier solution.
Central Firewall Policy Enforcement
Another commonly deployed policy enforcement approach is placing a firewall appliance in a central location that provides IP gateway services to data center hosts.
In an EVPN-VXLAN solution, central firewalls are connected to a services leaf and are Layer 2 adjacent to fabric hosts using the overlay network. The default gateway for hosts requiring policy enforcement is moved from the ToR to the centralized firewall. Similar to a border leaf, the services leaf is not required to be dedicated to this function. One advantage of this approach is the ease with which a Layer 2 overlay network can be used to transport host traffic to the firewall.
In a Layer 2 two-tier network, central firewalls are connected to the core switches, and the default gateway for hosts is moved from the core switches to the central firewalls.
There are several disadvantages to using a centralized firewall in both EVPN-VXLAN and Layer 2 two-tier topologies. A centralized firewall requires multiple switch hops from the data center host to enforce policy, which undermines the efficient data path delivery of an EVPN-VXLAN model. Policy enforcement between two hosts attached to the same switch must be forwarded to the central firewall. Hairpinning a large volume of east-west traffic through a central point can create a bottleneck and reduces effective data center bandwidth. Microsegmentation can be supported centrally, but it is not recommended because it requires all policy-driven data center traffic to traverse a single point in the network, significantly increasing the risk of a bottleneck.
The diagram below illustrates the inefficient traffic hairpin when using a services leaf firewall.
Hypervisor-based Firewall Enforcement
Some vendors offer virtualized firewall services within a hypervisor environment. This approach can provide granular, service-level policy enforcement while allowing for the use of active gateways. VMware NSX is an example of a product that can integrate in this way. VXLAN overlays can be implemented in both hardware and software to achieve optimal network virtualization and distributed firewall services while securing east–west traffic inside the data center.
Virtualized firewalls can consume a large volume of CPU resources, reducing CPU resources available for compute processing in VM infrastructure. The CX 10000 alleviates this pressure by moving firewall inspection to dedicated hardware on switch infrastructure that can support microsegmentation between VMs.
Applying Aruba CX 10000 Policy
Aruba CX 10000 with AMD Pensando delivers a powerful policy enforcement engine. This section provides background information and details on how to implement CX 10000 firewall policy.
The Pensando Policy and Services Manager (PSM) application defines policy and associated elements that are pushed to the AMD Pensando DPU. Aruba Fabric Composer (AFC) can be used to orchestrate policy via PSM’s API.
PSM Policy Foundations
PSM policy can be assigned to two different object types: Network and VRF. PSM associates a Network object with a VLAN configured on a CX 10000 switch. Defining a PSM Network informs the switch to redirect traffic for the associated VLAN to the on-board, AMD Pensando DPU-based firewall for Layer 4 policy enforcement. Policy assigned to a Network is applied only to traffic in the VLAN associated with the Network object. PSM associates a VRF object with a VRF configured on a CX 10000 switch.
Policy assigned to a VRF object applies to all VLANs in the associated VRF that also are associated with a Network object. VRF policy enforcement does not require that a policy is assigned to a Network object, but it does require that a Network object exists for each VLAN in the associated VRF to redirect traffic to the Pensando DPU. VLANs in a VRF with assigned PSM policy that do not have a corresponding Network object defined do not forward traffic to the Pensando DPU.
Caution: Define a Network object for each VLAN within a VRF, if any VLAN or the VRF requires policy enforcement. Network communication failures may result, if only a subset of VLANs in a VRF have corresponding Network objects defined.
When traffic is redirected to the Pensando DPU for firewall enforcement, both VRF and Network policies are enforced. PSM evaluates the two policy types using a logical AND function; the policies are not concatenated or evaluated sequentially. If both policies permit the traffic, the traffic is forwarded. If either policy denies the traffic, it is dropped. If a policy is not assigned at one level, the other level’s policy is enforced alone. If no policy is assigned at either level, traffic is permitted. The table below summarizes when traffic is permitted or denied.
| Network Policy | VRF Policy | Result |
|----------------|------------|--------|
| Permit         | Permit     | Permit |
| Deny           | Permit     | Deny   |
| Permit         | Deny       | Deny   |
| Deny           | Deny       | Deny   |
| Permit         | No Policy  | Permit |
| Deny           | No Policy  | Deny   |
| No Policy      | Permit     | Permit |
| No Policy      | Deny       | Deny   |
| No Policy      | No Policy  | Permit |
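A small Python function makes the combination rule concrete; this models the table above and is not the PSM API:

```python
def combined_result(network: str | None, vrf: str | None) -> str:
    """Each argument is 'permit', 'deny', or None (no policy assigned)."""
    if "deny" in (network, vrf):
        return "deny"    # a deny at either level drops the traffic
    return "permit"      # permitted by both, or by the only assigned level

# Reproduce the table above.
for net in ("permit", "deny", None):
    for vrf in ("permit", "deny", None):
        print(f"{net!s:>9} | {vrf!s:>9} -> {combined_result(net, vrf)}")
```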
PSM firewall policy is a set of rules that specify source and destination addresses, and the type of traffic allowed between them using IP protocol and port numbers. PSM policy is assigned to a Network or VRF in either an ingress or egress direction, from the perspective of the connected host. Traffic destined to a directly attached host is considered ingress, and traffic sourced from a directly attached host is considered egress. This ingress/egress relationship is the reverse of applying switch ACLs, from the perspective of the switch’s network interface.
PSM Ingress Policy
Ingress policy applies to traffic destined to a host in a VLAN with an associated PSM Network, with the exception of traffic between two hosts in the same VLAN attached to the same CX 10000. Ingress policy is generally applied to filter inbound traffic destined to an application server.
Ingress policy applies to traffic routed between hosts attached to the same CX 10000, when running AOS-CX 10.10.1000 and above. Previous versions of AOS-CX are constrained to applying egress policy between hosts attached to the same switch for both routed and bridged traffic.
When using an EVPN fabric overlay, ingress policy applies to VXLAN-forwarded traffic sourced by hosts in the same VLAN connected to other fabric switches. After the traffic arrives at the destination switch’s VTEP interface, it is forwarded to the Pensando DPU for evaluation. A microsegmentation strategy using private VLANs and proxy ARP can be used to apply ingress policy between hosts in the same isolated VLAN, as described below under Workload Microsegmentation.
Ingress policy is not applied to traditional Layer 2 bridged traffic between hosts on the same switch over a standard VLAN, and thus cannot be applied in a Layer 2 two-tier topology using CX 10000 access switches.
PSM Egress Policy
Egress policy defines the traffic that hosts directly attached to a DSS switch are allowed to initiate. Defining egress policy can protect the data center from lateral movement by a compromised host and prevent data exfiltration. For example, backend database servers may only need to initiate communication with peer database servers for synchronization and with DNS, NTP, authentication, and local update servers. This communication can be scoped using Layer 4 filters to allow only the desired traffic types between hosts.
Egress policy is applied to all traffic sourced by hosts in an inspected VLAN, irrespective of where the destination host resides. Unlike ingress policy, egress policy filters Layer 2 bridged traffic between hosts in the same VLAN on the same switch. Egress policy is applied on both EVPN-VXLAN leaf switches and Layer 2 two-tier access switches to enforce inter-VLAN and microsegmentation policy.
CX 10000 Mixed Environment Considerations
Applying a consistent PSM-based policy across a data center fabric is achieved more easily when all leaf or access switches are CX 10000s. This supports uniform policy enforcement without the need to manage exceptions.
Ubiquitous host mobility within a fabric requires that all leaf and access switches support the same capabilities. DSS stateful firewall security policies are not available on non-DSS switches, so VM mobility must be constrained in a network with a mix of DSS and non-DSS switches. For example, when using dynamic tools such as VMware’s Distributed Resource Scheduler (DRS), ensure that virtual switch and port group resources are defined to prevent automated movement of a VM guest requiring firewall services to a VM hypervisor host that is not connected to a DSS switch.
A mixed environment of DSS- and non-DSS capable switches is fully supported. For example, a VSX-pair of CX 10000s can be used to protect applications in a services leaf.
Workload Microsegmentation
Microsegmentation enables the enforcement of PSM firewall policy between VMs hosted by the same hypervisor in the same VLAN. Although egress policy inspects Layer 2 bridged traffic between hosts, traffic between VMs on the same hypervisor does not reach the switch in a standard configuration. To perform microsegmentation, a private VLAN (PVLAN) strategy is used to force traffic from a VM to traverse the upstream DSS switch connected to the hypervisor. Enabling proxy ARP on the SVI of the primary PVLAN allows communication between hosts in the same private VLAN via the DSS switch. This strategy enables the application of both ingress and egress firewall policy between hosts in the same VLAN on the same hypervisor.
When using a Layer 2 two-tier topology, only egress policy can be applied to microsegmented traffic, because the VLAN SVI is configured at the core layer, while the CX 10000 policy engine is positioned at the access layer.
When using an EVPN-VXLAN topology, in addition to egress policy, ingress policy can be applied to microsegmented traffic, because the VLAN gateway IP is configured on every leaf switch using the active gateway feature.
PSM Policy Considerations
Rules in a policy are applied in the order they appear in the list. An implicit “deny all” rule is applied at the end of each rule set. Rules matched most often should be placed higher in the list.
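The first-match behavior with an implicit trailing deny can be sketched as follows; the rule fields and the string-equality matching are simplified for illustration and do not reflect PSM syntax:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    src: str          # source prefix or "any" (string matching for illustration)
    dst: str          # destination prefix or "any"
    proto: str        # "tcp" or "udp"
    port: int | None  # None matches any port
    action: str       # "permit" or "deny"

    def matches(self, src: str, dst: str, proto: str, port: int) -> bool:
        return (self.src in ("any", src) and self.dst in ("any", dst)
                and self.proto == proto and self.port in (None, port))

def evaluate(rules: list[Rule], src: str, dst: str, proto: str, port: int) -> str:
    for rule in rules:  # order matters: the first matching rule wins
        if rule.matches(src, dst, proto, port):
            return rule.action
    return "deny"       # implicit "deny all" at the end of the rule set

policy = [
    Rule("any", "10.1.20.0/24", "tcp", 443, "permit"),  # hot rule near the top
    Rule("any", "any", "udp", 53, "permit"),
]
print(evaluate(policy, "10.9.9.9", "10.1.20.0/24", "tcp", 443))  # permit
print(evaluate(policy, "10.9.9.9", "10.1.20.0/24", "tcp", 22))   # deny (implicit)
```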
Defining a PSM Network redirects all traffic for the associated VLAN to the Pensando DPU firewall as described above. Network requirements of all VLAN members must be considered when building policy rule sets.
When an ingress policy is configured, policy applies to all hosts in the destination VLAN for any routed or VXLAN-forwarded Layer 2 traffic, which includes traffic sourced from outside the data center.
When an egress policy is configured, policy applies to all traffic sourced by hosts in the VLAN, so all destinations within and outside the data center must be considered, including all hosts on the same VLAN. When defining an egress policy, rules allowing underlying services such as DNS, logging, and authentication are required.
When applying firewall policy between VM guests, a data center design must use PVLAN-based microsegmentation or assign VMs that require policy between them to different VLANs. Microsegmentation forces traffic between VMs on the same hypervisor to traverse the DSS switch, allowing policy to be applied. VMs in different VLANs also send traffic to the DSS switch for inspection.
It is recommended to apply only one PSM policy level (Network or VRF) for a particular enforcement direction. The primary advantage of defining policy at the VRF level is having a single policy represent the complete rule set for the entire data center in a single enforcement direction. Defining policy at the Network level reduces the size of an individual policy and ensures that policy applied to one Network does not impact another Network.
Mixing policy levels for a single direction is fully supported, but it can add complexity and duplicate rules across both policies. When defining policies at both levels, one level must end with a rule permitting any traffic; placing this rule in the Network-level policy is recommended. The rules defined above the “permit any” rule should use a “deny” action. This allows the VRF-level policy to define what is permitted globally within the fabric, with more granular deny restrictions applied at the Network level.
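A sketch of this recommended split, using illustrative predicate-based rules rather than PSM syntax:

```python
def evaluate(rules, flow):
    for action, match in rules:  # first match wins
        if match(flow):
            return action
    return "deny"                # implicit deny

# VRF-level policy: what is permitted fabric-wide in this direction.
vrf_policy = [
    ("permit", lambda f: f["proto"] == "tcp" and f["port"] in (443, 1433)),
]

# Network-level policy: granular denies above an explicit permit-any.
network_policy = [
    ("deny",   lambda f: f["src"] == "10.1.30.0/24"),
    ("permit", lambda f: True),
]

def forwarded(flow) -> bool:
    """Both levels must permit the traffic (logical AND)."""
    return (evaluate(vrf_policy, flow) == "permit"
            and evaluate(network_policy, flow) == "permit")

print(forwarded({"src": "10.1.10.0/24", "proto": "tcp", "port": 443}))  # True
print(forwarded({"src": "10.1.30.0/24", "proto": "tcp", "port": 443}))  # False
```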
It is best practice to define a complete set of rules before applying a policy to a network. If the complete rule set is unknown, an “allow all” rule can be applied to collect log data on observed traffic. A complete rule set can then be built by inserting rules that allow more specific traffic above the “allow all” rule. When no legitimate traffic matches the “allow all” rule at the bottom of the rule set, remove it.
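One way to model this staging workflow (names and rules illustrative):

```python
# Staging a rule set: specific permits sit above a temporary "allow all"
# that logs whatever still reaches it. Once the log stays empty, the
# allow-all is removed, restoring the implicit deny.
observed = []  # flows that fell through to the trailing allow-all

def evaluate(rules, flow):
    for action, match in rules:
        if match(flow):
            return action
    observed.append(flow)  # log the fall-through as a candidate rule
    return "permit"        # temporary allow-all while profiling traffic

rules = [("permit", lambda f: f["port"] == 443)]  # known-good traffic so far
evaluate(rules, {"src": "10.1.10.5", "port": 5432})
print(observed)  # [{'src': '10.1.10.5', 'port': 5432}] -> add a specific rule
```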