Roaming and the Key Management Service

Deep dive into how roaming is accomplished in AOS 10 and the Key Management Service (KMS) that helps to enable the process.

The Key Management Service (KMS) is a service introduced in HPE Aruba Networking Wireless Operating System Software 10 (AOS 10) to support seamless wireless roaming. Its primary function is to distribute key material, namely the Pairwise Master Key (PMK) or the 802.11r R1 key, among neighboring APs. Because neighboring APs hold this key material before the user arrives, the user can roam quickly and without interruption.

In addition to key sharing, KMS serves as a conduit for disseminating crucial user-related data. This includes details such as VLAN assignments, user role information, and, when machine authentication is in use, the authentication state of the user’s device. These data elements collectively form a station record for each user, which plays a pivotal role in the roaming process.

The core responsibility of KMS is to efficiently communicate these station records to neighboring APs, thereby enabling them to provide uninterrupted service as users move between APs. The list of neighboring APs is sourced from the AirMatch service, which plays a complementary role in optimizing wireless network performance.

Both KMS and AirMatch services operate within the broader framework of HPE Aruba Networking Central and work collaboratively to facilitate the key-sharing process.

Workflows

Initial state

In this workflow, we delve into the key stages of how KMS manages and disseminates vital data, such as Pairwise Master Keys (PMKs), 802.11r R1 keys, VLAN assignments, user roles, and authentication states, to create a seamless and secure wireless user experience.

Key Management Service workflow

  1. A wireless user associates with an access point (AP1) and completes 802.1X authentication. The result is either a Pairwise Master Key (PMK) or, if 802.11r is enabled, an R0 key derived from the master session key.

  2. Subsequently, AP1 transmits the user’s station record to KMS located within HPE Aruba Networking Central. This comprehensive station record contains user-specific details, including the PMK or R0 key, VLAN ID, user role, and machine authentication state (if machine authentication is enabled).

  3. Upon receipt of the user’s station record, KMS stores this information in its cache and simultaneously retrieves the list of neighboring APs associated with AP1 through the AirMatch service.

  4. Leveraging the list of neighboring APs for AP1, KMS accesses the cached user station record, including the PMK or R0 key. If the network employs the 802.11r fast roaming protocol, KMS proceeds to generate R1 keys for each of the neighboring APs. However, if the Opportunistic Key Caching (OKC) roaming protocol is utilized, the R1 key generation step is omitted.

  5. To ensure seamless roaming, KMS distributes the user’s station record to all neighboring APs of AP1. When the user later roams to AP2 or AP3, full authentication is not required: the target AP already holds the user’s PMK or R1 key, so only the four-way key exchange between the user and that AP takes place, which shortens the roam.
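The workflow above can be sketched as a small in-memory model. This is an illustration only, not the KMS implementation: the names `StationRecord`, `KmsSketch`, and `derive_r1_placeholder` are hypothetical, the neighbor map stands in for the list that AirMatch provides, and the HMAC call is merely a placeholder for the 802.11r key derivation, not the real KDF.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StationRecord:
    """Hypothetical station record; field names are assumptions."""
    client_mac: str
    key: bytes                                # PMK, or R0 key when 802.11r is enabled
    vlan_id: int
    user_role: str
    machine_auth_state: Optional[str] = None  # present only with machine authentication


def derive_r1_placeholder(r0_key: bytes, ap_id: str) -> bytes:
    """Stand-in for per-AP R1 key generation. NOT the 802.11r KDF."""
    return hmac.new(r0_key, ap_id.encode(), hashlib.sha256).digest()


class KmsSketch:
    def __init__(self, neighbors: Dict[str, List[str]], use_11r: bool) -> None:
        self.neighbors = neighbors            # stands in for the AirMatch neighbor list
        self.use_11r = use_11r
        self.cache: Dict[str, StationRecord] = {}

    def on_station_record(self, source_ap: str,
                          record: StationRecord) -> Dict[str, StationRecord]:
        """Steps 3-5: cache the record, look up neighbors, build per-AP copies."""
        self.cache[record.client_mac] = record
        pushed: Dict[str, StationRecord] = {}
        for ap in self.neighbors.get(source_ap, []):
            if self.use_11r:
                # 802.11r: each neighbor receives its own R1 key.
                pushed[ap] = replace(record, key=derive_r1_placeholder(record.key, ap))
            else:
                # OKC: the PMK is shared as-is; no R1 generation step.
                pushed[ap] = record
        return pushed


kms = KmsSketch({"AP1": ["AP2", "AP3"]}, use_11r=True)
rec = StationRecord("aa:bb:cc:dd:ee:ff", b"\x01" * 32, vlan_id=10, user_role="employee")
pushed = kms.on_station_record("AP1", rec)
```

In this sketch, `pushed` maps each neighbor of AP1 to the copy of the record it would receive. With `use_11r=False`, every neighbor receives the unmodified record.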

Bridged user roaming

AOS 10 introduces two distinct user types: bridged users and tunneled users. Bridged users encompass all individuals connected to a bridge-mode SSID. In this configuration, user traffic remains localized within the AP’s network and is not routed through a gateway. For bridged users, the associated VLANs are established on the uplink switches of APs and are permitted by the uplink ports of these APs.

Illustrated below is an example of a bridged user engaging in fast roaming by leveraging the capabilities of KMS.

Bridged user roaming workflow with KMS

  1. Following the initial association with AP1 and the completion of the first-time full authentication, the wireless user eventually transitions to neighboring AP2 during the course of their wireless session.

  2. AP2 promptly updates KMS with the user’s new location, ensuring seamless handoff within the network.

  3. KMS, driven by the user’s movement to AP2, retrieves the list of neighboring APs specific to AP2 from the AirMatch service.

  4. Building upon this list of neighboring APs for AP2, KMS references the cached user station record, which includes PMK or R0 key, and generates R1 keys for each neighboring AP. This process is contingent on the utilization of the 802.11r fast roaming protocol, while the R1 key generation step is omitted if OKC roaming protocol is in use.

  5. KMS commences the distribution of the user station record solely to those neighboring APs of AP2 that do not possess a cache of the user station record. This process avoids redundancy by excluding neighbors common to both AP1 and AP2.

  6. AP2 initiates the synchronization of user sessions by transmitting a broadcast user session sync request message across the user VLAN. This synchronization action pertains to the top 120 user datapath sessions.

  7. The user, now associated with AP2, engages in a four-way key exchange with AP2 as part of the seamless roaming process.

  8. AP2 effectively communicates with AP1, instructing it to clear all entries related to the user, such as datapath entries. Subsequently, the user resumes data transmission through the new access point, AP2, ensuring a smooth and uninterrupted wireless experience.
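The selection in step 5 amounts to a set difference between the neighbors of the new AP and the APs that already hold the station record. The sketch below illustrates that logic; the function name, the AP names, and the neighbor map are hypothetical and do not reflect how KMS tracks synchronized APs internally.

```python
from typing import Dict, Set


def aps_needing_record(new_ap: str,
                       neighbors: Dict[str, Set[str]],
                       already_synced: Set[str]) -> Set[str]:
    """Return the neighbors of new_ap that do not yet hold the station record."""
    return neighbors.get(new_ap, set()) - already_synced - {new_ap}


# Hypothetical topology: AP3 neighbors both AP1 and AP2; AP4 neighbors only AP2.
neighbors = {"AP1": {"AP2", "AP3"}, "AP2": {"AP1", "AP3", "AP4"}}

# After the initial association on AP1, AP1 and its neighbors hold the record.
synced = {"AP1"} | neighbors["AP1"]

# The user roams to AP2: only APs without the record are targeted.
targets = aps_needing_record("AP2", neighbors, synced)
synced |= targets
```

Here `targets` contains only AP4, because AP1 and AP3 received the record during the initial association.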

Tunneled user roaming

In AOS 10, a gateway cluster is strongly recommended when scalability becomes a primary concern, typically when a network exceeds 500 APs or serves more than 5000 clients. A gateway cluster supports large numbers of APs and clients and provides centralized management of user VLANs, unified firewall policies spanning wireless and wired users, RADIUS proxy capabilities, and more.

When gateways are present, wireless users are tunneled users: all of their traffic is tunneled through the gateway cluster. APs no longer manage user VLANs, because this function is centralized at the gateway. One notable advantage is that APs no longer need to belong to the same layer 2 domain for smooth client roaming. When a tunneled user roams between APs, session synchronization relies on communication with the user’s User Designated Gateway (UDG).

Illustrated below is a tunneled user executing fast roaming facilitated by KMS. This approach ensures network scalability while maintaining seamless and uninterrupted user experiences.

Tunneled user roaming workflow with KMS

  1. Following a wireless user’s initial association with AP1 and the completion of full authentication, the user may eventually roam to a neighboring AP2.

  2. AP2 promptly updates KMS with the user’s new location.

  3. KMS, in turn, retrieves the list of neighbor APs associated with AP2 from the AirMatch service.

  4. Leveraging this list, KMS fetches the user station record, encompassing the PMK or R0 key from its cache, and proceeds to generate the R1 keys for each neighboring AP present in the list if 802.11r fast roaming protocol is used for roaming.

  5. KMS initiates the distribution of the user record to the neighboring APs of AP2 that lack the cached user station record. KMS refrains from repeating the station record distribution process for any APs that happen to be neighbors to both AP1 and AP2.

  6. AP2 broadcasts a user session synchronization request message over the user VLAN.

  7. The User Designated Gateway (UDG) forwards this session synchronization message to the user’s original AP, AP1.

  8. AP2 proceeds to synchronize the top 120 user datapath sessions with AP1.

  9. A start accounting notice is dispatched by AP2 to the UDG.

  10. When the UDG gets the start accounting packet, it changes the bridge or user entry to send traffic to the AP2 tunnel. If the user is the first one from that VLAN on AP2, the multicast group gets updated with the client’s VLAN information.

  11. The user embarks on a four-way key exchange with AP2.

  12. AP2 then notifies AP1 to perform cleanup, which includes purging all entries related to the user, such as datapath entries. Following this, the user begins forwarding traffic through AP2.
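Step 10 can be pictured as a small state update on the UDG. The following is a minimal sketch under stated assumptions: the class `UdgSketch`, its tables, and the method name are hypothetical and only illustrate the two effects described above, repointing the user entry at the new AP tunnel and updating the multicast group when the user is the first one from that VLAN on the AP.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class UdgSketch:
    """Hypothetical model of the UDG's per-user forwarding state."""

    def __init__(self) -> None:
        self.user_tunnel: Dict[str, str] = {}   # client MAC -> AP tunnel
        # (AP, VLAN) -> clients of that VLAN currently behind that AP
        self.vlan_members: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
        # AP -> VLANs present in that AP's multicast group
        self.multicast_vlans: Dict[str, Set[int]] = defaultdict(set)

    def on_accounting_start(self, client_mac: str, vlan: int, ap: str) -> bool:
        """Repoint the user entry at `ap`.

        Returns True if the multicast group was updated, that is, if the
        client is the first user from `vlan` on `ap`. Cleanup of state for
        the previous AP is out of scope for this sketch.
        """
        self.user_tunnel[client_mac] = ap
        first_on_vlan = not self.vlan_members[(ap, vlan)]
        self.vlan_members[(ap, vlan)].add(client_mac)
        if first_on_vlan:
            self.multicast_vlans[ap].add(vlan)
        return first_on_vlan


udg = UdgSketch()
udg.on_accounting_start("client-1", vlan=20, ap="AP1")             # initial association
updated = udg.on_accounting_start("client-1", vlan=20, ap="AP2")   # roam to AP2
```

After the second call, traffic for `client-1` is directed to the AP2 tunnel, and `updated` is true because no other user from VLAN 20 was on AP2.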

Non-fast-roaming users

In older versions of AOS 10, user cache synchronization, which included user key information, was exclusively reserved for fast-roaming users like 802.11r users, OKC users, or MPSK users. However, a pressing need arose for cache synchronization among non-fast-roaming users, such as Captive Portal users and MAC authentication users. This need stems from the desire to prevent reauthentication when these users transition from one access point to another. To address this requirement, cache synchronization between neighboring APs was introduced and has been supported from AOS 10.4 onwards.

Cache classification

To optimize cache distribution, cache entries are classified into three distinct types:

  1. Partial Roam Cache: This cache structure exclusively contains essential information necessary during roaming. For non-fast-roaming users, the partial roam cache is synchronized with neighboring APs.

  2. Full Roam Cache: In addition to the data found in the partial roam cache, the full roam cache includes supplementary station-related state information that may not be immediately required during roaming. The full roam cache entry is consistently available in KMS and on the AP to which the client is currently associated.

  3. Key Cache: This specific cache structure is exclusively employed by fast-roaming users. It houses station keys essential for fast roaming, including PMK (Pairwise Master Key), PMKR0, PMKR1 (per-BSSID), and MPSK, alongside comprehensive full roam cache information.
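The relationship between the three cache types can be sketched as nested data structures. This is an illustrative model only: the class and field names are assumptions, with the fields chosen to follow the attributes named in this section (user role, user VLAN, username, ESSID, sequence number, Class ID, multi-session ID, timeouts, and the key types listed above).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PartialRoamCache:
    """Only what a neighboring AP needs during a roam."""
    user_role: str
    user_vlan: int
    username: str
    essid: str
    sequence_number: int


@dataclass
class FullRoamCache(PartialRoamCache):
    """Partial roam cache plus additional station state."""
    class_id: Optional[str] = None
    multi_session_id: Optional[str] = None
    idle_timeout_s: Optional[int] = None
    session_timeout_s: Optional[int] = None

    def to_partial(self) -> PartialRoamCache:
        """Project the subset that is synchronized to neighboring APs."""
        return PartialRoamCache(self.user_role, self.user_vlan, self.username,
                                self.essid, self.sequence_number)


@dataclass
class KeyCache(FullRoamCache):
    """Fast-roaming users only: full roam cache plus station keys."""
    pmk: Optional[bytes] = None
    pmk_r0: Optional[bytes] = None
    pmk_r1_by_bssid: Dict[str, bytes] = field(default_factory=dict)  # PMKR1 is per-BSSID
    mpsk: Optional[bytes] = None


full = FullRoamCache("guest", 30, "alice", "Corp-Guest", 1, class_id="class-a")
partial = full.to_partial()
```

Modeling the types this way makes the containment explicit: a key cache includes everything in a full roam cache, which in turn includes everything in a partial roam cache.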

Workflows

Initial state

The diagram below provides an overview of the process for creating and synchronizing cache entries among neighboring APs for non-fast-roaming users.

Cache entry creation and synchronization for non-fast-roaming users

  1. The user establishes a connection with the AP and successfully completes the authentication process.

  2. In this step, the AP generates a full roam cache entry. Within this full cache entry, the partial roam cache information includes user-specific details such as user role, user VLAN, username, ESSID, and sequence number. In addition to the partial roam cache, the full cache incorporates various user state attributes like Class ID, multi-session ID, idle/session timeout, and more.

  3. The AP transmits the full roam cache information of the user to KMS.

  4. KMS retrieves the list of neighboring APs associated with this particular AP.

  5. KMS proceeds to distribute the partial cache information of the user to all the neighboring APs linked to the same AP. This ensures that neighboring APs possess the essential cache data for seamless user roaming and authentication.

Roaming

The roaming workflow for non-fast-roaming users closely resembles that of fast-roaming users, with a notable distinction: the complete roam cache is exclusively retained by the AP and KMS, while only a partial roam cache is distributed to neighboring APs.

Illustrated below are the primary steps in the roaming process for non-fast-roaming clients.

Non-fast-roaming user roaming workflow with KMS

  1. The user initiates a roam from AP1 to AP2.

  2. AP2 transmits a roaming notification to KMS.

  3. KMS retrieves the list of neighboring APs for AP2 from the AirMatch service.

  4. KMS dispatches the partial roam cache for this user to the neighboring APs of AP2, excluding those that overlap with AP1. For instance, in this scenario, AP3 is a common neighbor of both AP1 and AP2. Since AP3 already received the partial roam cache when the user initially connected to AP1, KMS only sends the partial roam cache to AP4 at this stage.

  5. AP2 broadcasts a session synchronization request. In an underlay scenario, the request is sent to AP1 within the user’s VLAN. In an overlay scenario, it is sent to AP1 via AP2’s UDG. If the user’s cache is unavailable on AP2, the request is sent within the default VLAN of the SSID.

  6. AP1 responds to the session synchronization request by sharing the top 120 user sessions.

  7. AP2 forwards a user move request to AP1.

  8. AP1 acknowledges the move request.

  9. KMS sends the user’s full roam cache to AP2, the AP to which the user has roamed.

  10. AP2 sends an accounting start message to AP1 in an underlay scenario, or to AP2’s UDG in an overlay scenario.

  11. AP1 cleans up the user entry, deletes the user’s full roam cache, and installs the partial roam cache. In an overlay scenario, AP2’s UDG updates the bridge or user entry to direct traffic toward the AP2 tunnel. If the user is the first one from that VLAN on AP2, the multicast group is updated with the client’s VLAN information.
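The effect of this workflow on where each cache type lives can be traced with a small sketch. It is a simplified model under stated assumptions: the function name and the `"full"` / `"partial"` labels are hypothetical, and the topology mirrors the scenario described in step 4 (AP3 neighbors both AP1 and AP2, AP4 neighbors only AP2).

```python
from typing import Dict, Set


def apply_roam(cache_level: Dict[str, str], old_ap: str, new_ap: str,
               neighbors: Dict[str, Set[str]]) -> Set[str]:
    """Apply steps 4, 9, and 11; return the APs that newly received a partial cache."""
    newly_synced = {ap for ap in neighbors[new_ap] if ap not in cache_level}
    for ap in newly_synced:
        cache_level[ap] = "partial"   # step 4: only neighbors without a cache entry
    cache_level[new_ap] = "full"      # step 9: KMS sends the full roam cache to the new AP
    cache_level[old_ap] = "partial"   # step 11: the old AP keeps only the partial cache
    return newly_synced


neighbors = {"AP1": {"AP2", "AP3"}, "AP2": {"AP1", "AP3", "AP4"}}

# State after the initial connection to AP1: full on AP1, partial on its neighbors.
cache_level = {"AP1": "full", "AP2": "partial", "AP3": "partial"}

newly_synced = apply_roam(cache_level, old_ap="AP1", new_ap="AP2", neighbors=neighbors)
```

After the roam, only AP4 appears in `newly_synced`, AP2 holds the full roam cache, and AP1, AP3, and AP4 hold the partial roam cache.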

Configuration

To configure fast roaming in AOS 10, follow these steps:

  1. Navigate to the WLANs section and select the specific SSID you want to configure.

  2. Access the Security tab on the AP configuration page.

Fast roaming configuration

By default, 802.11r fast roaming is enabled, while OKC is disabled.

For optimal 802.11r configuration, it is highly recommended to set up the Mobility Domain Identification (MDID). MDID represents a cluster of APs that create a continuous radio frequency space, allowing 802.11r R1 keys for devices to be shared and enabling fast roaming.

Additionally, it is recommended to enable 802.11k. This standard facilitates swift AP discovery for devices searching for available roaming targets by creating an optimized channel list. As the signal strength from the current AP weakens, the device scans for target APs based on this list.

When 802.11k is enabled, 802.11v is automatically activated in the background. 802.11v facilitates BSS (Basic Service Set) transition messages between APs and wireless devices. These messages exchange information to help guide the device to a better AP during the 802.11r fast roaming process.

Verification

Command Line Interface

AP CLI command for checking the PMK or R1 key caching of wireless users of the AP:

show ap pmkcache

APIs

  • Retrieving the neighbor APs list for an AP:

    https://<central-url>/airmatch/ap_nbr_graph/v1/Ap/NeighborList/<AP Serial Number>

  • Retrieving the client record:

    https://<app-url>/keymgmt/v1/keycache/{client_mac}

  • Retrieving the encryption key hash:

    https://<app-url>/keymgmt/v1/keyhash

  • Retrieving the client key synced AP list:

    https://<app-url>/keymgmt/v1/syncedaplist/{client_mac}

  • Retrieving the stats per AP:

    https://<app-url>/keymgmt/v1/Stats/ap/{AP_Serial}

  • Checking on the health of KMS:

    https://<app-url>/keymgmt/health
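The endpoint paths above can be assembled programmatically. The helper below is a hypothetical convenience function, not part of any HPE Aruba Networking SDK. It only builds URL strings from the paths listed in this section and sends no requests. Authentication and the expected MAC address format are not described here, so both are left out as open assumptions.

```python
from typing import Dict
from urllib.parse import quote


def build_verification_urls(central_url: str, app_url: str,
                            client_mac: str, ap_serial: str) -> Dict[str, str]:
    """Build the verification URLs listed above. Hostnames are supplied by the caller."""
    mac = quote(client_mac, safe=":")   # keep colons; escape anything unexpected
    serial = quote(ap_serial, safe="")
    kms = f"https://{app_url}/keymgmt"
    return {
        "neighbor_list": (f"https://{central_url}/airmatch/ap_nbr_graph"
                          f"/v1/Ap/NeighborList/{serial}"),
        "client_record": f"{kms}/v1/keycache/{mac}",
        "key_hash": f"{kms}/v1/keyhash",
        "synced_ap_list": f"{kms}/v1/syncedaplist/{mac}",
        "ap_stats": f"{kms}/v1/Stats/ap/{serial}",
        "health": f"{kms}/health",
    }


# Placeholder hostnames, MAC address, and serial number for illustration only.
urls = build_verification_urls("central.example.com", "app.example.com",
                               "aa:bb:cc:dd:ee:ff", "SERIAL123")
```

Each value in `urls` can then be passed to the HTTP client of your choice, along with whatever credentials your HPE Aruba Networking Central account requires.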

Survivability

Client roaming

If connectivity to HPE Aruba Networking Central is lost, the station records and roam cache information of existing users have typically already been synchronized among neighboring APs. Consequently, the fast roaming experience for these users remains unaffected.

It is, however, possible that during a network outage, the station records or cache information for new users cannot be synchronized among neighboring APs. In this scenario:

  • For new users who roam during this period, their user devices will undergo full authentication during the roaming event.

  • Despite the full authentication process, these users will continue to enjoy uninterrupted service.

In summary, while connectivity issues with HPE Aruba Networking Central may necessitate full authentication for new users, it does not disrupt their ongoing communications on the network.

Cloud Fallback

In light of the earlier sections detailing user roaming workflows, it is important to highlight that there are two specific steps in which the new AP might not receive a response from the previous AP due to a timeout in the network:

  • Datapath session synchronization: In this phase, the new AP attempts to synchronize datapath sessions with the previous AP.

  • User state cleanup in the previous AP: During this step, the new AP requests the previous AP to clean up user-related information.

To address potential timeouts in these situations, KMS employs the Cloud Fallback mechanism. When a session synchronization or user state cleanup request times out, the new AP communicates with KMS to report the lack of response from the previous AP. KMS then searches the client-AP association table. If a client entry is found, KMS facilitates the communication between both APs, enabling them to coordinate the above-mentioned steps effectively.
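The fallback logic can be sketched as a try-direct-then-relay pattern. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the callables stand in for the AP-to-AP and KMS-relayed exchanges, and returning `None` when no client entry is found is an assumption, since the behavior in that case is not described here.

```python
from typing import Callable, Dict, Optional


def request_with_cloud_fallback(direct_request: Callable[[], str],
                                client_mac: str,
                                association_table: Dict[str, str],
                                relay_via_kms: Callable[[str], str]) -> Optional[str]:
    """Try the previous AP directly; on timeout, ask KMS to relay the request."""
    try:
        return direct_request()
    except TimeoutError:
        # The new AP reports the timeout; KMS looks up the client-AP association.
        previous_ap = association_table.get(client_mac)
        if previous_ap is None:
            return None  # assumption: nothing to relay if KMS has no client entry
        return relay_via_kms(previous_ap)


def session_sync_times_out() -> str:
    raise TimeoutError("no response from previous AP")


result = request_with_cloud_fallback(
    session_sync_times_out,
    "aa:bb:cc:dd:ee:ff",
    {"aa:bb:cc:dd:ee:ff": "AP1"},
    lambda ap: f"relayed to {ap} via KMS",
)
```

The same pattern applies to both steps that can time out, session synchronization and user state cleanup, with a different request passed in each case.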


Last modified: March 29, 2024 (b258226)