Features

Gateway clustering with AOS 10.

Cluster Features

Seamless Roaming

Introducing the UDG significantly improves client roaming within a cluster. Once a client associates to an AP, the AP hashes the client’s MAC address and assigns the client a UDG using the bucket map published for the cluster. Each client’s traffic is always anchored to its UDG, which remains the same regardless of which AP the client roams to. Because each AP maintains GRE tunnels to every cluster node, any AP the client roams to automatically forwards the client’s traffic to the UDG upon association and authentication.
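The lookup described above can be sketched in a few lines of Python. This is an illustration only: the source does not state the hash function or the bucket map size, so CRC32, the 256-bucket map, and the names `mac_to_bucket`, `build_bucket_map` and `lookup_udg` are all assumptions made for the example.

```python
import zlib

NUM_BUCKETS = 256  # assumed size; the real bucket map size is not given in the text


def mac_to_bucket(mac: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a client MAC address to a bucket index (CRC32 is a stand-in hash)."""
    normalized = mac.lower().replace(":", "").replace("-", "")
    return zlib.crc32(bytes.fromhex(normalized)) % num_buckets


def build_bucket_map(gateways, num_buckets=NUM_BUCKETS):
    """Hypothetical bucket map: each bucket names a UDG and a different S-UDG."""
    n = len(gateways)
    return [
        {"udg": gateways[i % n], "s_udg": gateways[(i + 1) % n] if n > 1 else None}
        for i in range(num_buckets)
    ]


def lookup_udg(mac: str, bucket_map) -> str:
    """Any AP holding the same bucket map resolves the same UDG for a client."""
    return bucket_map[mac_to_bucket(mac, len(bucket_map))]["udg"]


bucket_map = build_bucket_map(["GW-A", "GW-B", "GW-C"])
client = "aa:bb:cc:dd:ee:01"

# Two different APs perform the same lookup against the same published map.
udg_seen_by_ap1 = lookup_udg(client, bucket_map)
udg_seen_by_ap2 = lookup_udg(client, bucket_map)
```

Because the result depends only on the client’s MAC address and the published map, the client keeps the same UDG wherever it roams within the cluster.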

A visual representation of the roaming process within a cluster is displayed below. In this example, GW-B is the assigned UDG for the client:

Seamless client roaming

Stateful Failover

Stateful failover is a critical aspect of cluster operations that safeguards clients from the impact of a Gateway failure. When multiple Gateways are present in a cluster, each client’s state is fully synchronized between the UDG and the S-UDG, meaning that the station table, the user table, layer 2 user state, and layer 3 user state are all shared between both Gateways.

In addition, high-value sessions such as FTP and DPI-qualified sessions are also synced to the S-UDG. Synchronizing client state and high-value session information enables the S-UDG to assume the role of the client’s new UDG if the client’s current UDG fails. This permits stateful failover with no client de-authentication when clients move from their UDG to their S-UDG.
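A minimal sketch of this idea follows, assuming a simplified in-memory model. The `Gateway` class, `sync_client` and `active_gateway` are hypothetical names for illustration and do not reflect how Gateways actually store or exchange state.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    name: str
    alive: bool = True
    client_state: dict = field(default_factory=dict)  # MAC -> synced client state


def sync_client(mac, state, udg, s_udg):
    """Store the same client state on both the UDG and the S-UDG."""
    udg.client_state[mac] = dict(state)
    s_udg.client_state[mac] = dict(state)


def active_gateway(mac, udg, s_udg):
    """Return the Gateway serving the client, falling back to the S-UDG."""
    if udg.alive:
        return udg
    if s_udg.alive and mac in s_udg.client_state:
        return s_udg  # state is already present, so no re-authentication is needed
    raise RuntimeError("no Gateway holds state for this client")


gw_a, gw_b = Gateway("GW-A"), Gateway("GW-B")
mac = "aa:bb:cc:dd:ee:01"
sync_client(mac, {"role": "employee", "authenticated": True}, udg=gw_a, s_udg=gw_b)

gw_a.alive = False  # simulate failure of the client's UDG
serving = active_gateway(mac, gw_a, gw_b)
```

The point of the sketch is that the standby already holds the client’s state before the failure, so taking over requires no new authentication exchange.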

Event Driven Load Balancing

Client and device distribution is greatly simplified in AOS 10. One major change is that load balancing is no longer performed periodically during run-time and is instead event driven, triggered as Gateways are added to or removed from the cluster. Client distribution between cluster nodes is performed using the published bucket map for the cluster, while device distribution is performed by the cluster leader based on each Gateway’s device capacity.

The goal of load balancing during a node addition or removal is to avoid disruption to clients and devices. When a Gateway in a cluster is taken down for maintenance or fails, impacted UDG, DDG and S-DDG sessions seamlessly transition to their standby nodes with little or no impact to traffic:

  • The cluster leader recomputes the bucket map and publishes it to all devices. Publication is deliberately delayed to provide sufficient time to activate standby client entries. The new bucket map includes the new S-UDG assignments for the clients.

  • The cluster leader reassigns the S-DDG/S-SDG sessions which are immediately published.
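The bucket map recomputation after a Gateway is removed can be sketched as follows. Promoting the S-UDG to UDG follows the failover behavior described earlier; the rule used here to pick a replacement S-UDG (the surviving Gateway with the fewest standby buckets) is an assumption, as is the function name `recompute_after_failure`.

```python
from collections import Counter


def recompute_after_failure(bucket_map, failed):
    """Return a new bucket map that no longer references the failed Gateway."""
    survivors = sorted(
        {gw for e in bucket_map for gw in (e["udg"], e["s_udg"]) if gw and gw != failed}
    )
    new_map = []
    for e in bucket_map:
        # Buckets anchored on the failed node move to their standby.
        udg = e["s_udg"] if e["udg"] == failed else e["udg"]
        if udg is None:
            raise ValueError("bucket had no standby to promote")
        # The standby slot is empty if it failed or was just promoted.
        s_udg = None if e["s_udg"] in (failed, udg) else e["s_udg"]
        new_map.append({"udg": udg, "s_udg": s_udg})

    # Fill empty standby slots, spreading them over the surviving Gateways.
    standby_load = Counter(e["s_udg"] for e in new_map if e["s_udg"])
    for e in new_map:
        if e["s_udg"] is None:
            candidates = [gw for gw in survivors if gw != e["udg"]]
            if candidates:
                pick = min(candidates, key=lambda gw: standby_load[gw])
                e["s_udg"] = pick
                standby_load[pick] += 1
    return new_map


gws = ["GW-A", "GW-B", "GW-C"]
old_map = [{"udg": gws[i % 3], "s_udg": gws[(i + 1) % 3]} for i in range(6)]
new_map = recompute_after_failure(old_map, failed="GW-B")
```

Buckets whose UDG survived keep that UDG, so only clients anchored on the removed Gateway change nodes.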

If the cluster leader is taken down for maintenance or fails, a new cluster leader is elected, and a role change notification is sent to all devices. The new cluster leader is responsible for recomputing and distributing the new bucket map for the cluster and performing DDG/SDG reassignments.

When a Gateway is added to a cluster, the cluster leader recomputes UDG and S-UDG assignments in two passes to avoid disruption to clients. The bucket map from the first pass is published after 120 seconds, and the bucket map from the second pass is published after 165 seconds.

DDG assignments are also recomputed when Gateways are added to a cluster. If the cluster was operating with a single node, S-DDG assignments are made for all devices that don’t have an S-DDG assignment. The cluster leader also performs load balancing and reassigns DDG and S-DDG sessions based on each Gateway’s capacity.
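One way to picture capacity-based device distribution is the sketch below. The text does not describe the leader’s actual algorithm, so the policy here (each device goes to the Gateway with the largest fraction of free capacity, and the next-best Gateway becomes its S-DDG) and the name `assign_devices` are assumptions for illustration.

```python
from fractions import Fraction


def assign_devices(devices, capacity):
    """Assign each device a DDG and, when possible, a different S-DDG."""
    load = {gw: 0 for gw in capacity}

    def free_ratio(gw):
        # Exact fraction of capacity still unused on this Gateway.
        return Fraction(capacity[gw] - load[gw], capacity[gw])

    assignments = {}
    for dev in devices:
        # Most free capacity first; ties are broken by Gateway name.
        ranked = sorted(capacity, key=lambda gw: (-free_ratio(gw), gw))
        ddg = ranked[0]
        s_ddg = ranked[1] if len(ranked) > 1 else None  # single node: no standby
        load[ddg] += 1
        assignments[dev] = {"ddg": ddg, "s_ddg": s_ddg}
    return assignments


# Hypothetical capacities: GW-A can hold twice as many devices as GW-B.
capacity = {"GW-A": 100, "GW-B": 50}
devices = [f"AP-{i}" for i in range(1, 7)]
assignments = assign_devices(devices, capacity)
ddg_counts = {gw: sum(a["ddg"] == gw for a in assignments.values()) for gw in capacity}
```

With these example capacities the six devices split 4 to 2, in proportion to capacity. A single-node cluster yields no S-DDG, which is why standby assignments have to be made once a second Gateway joins.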

Live Upgrades

In AOS 10, Gateways are configured, managed, and upgraded independently from APs. AOS 10 APs and Gateways can run different AOS 10 software versions, and each can be upgraded independently with zero network downtime as maintenance windows allow.

The live upgrade feature for Gateways allows cluster nodes to be upgraded with minimal or no impact to clients. When a live upgrade is initiated, the new firmware version is downloaded to the specified partition on every Gateway in the cluster. Once the new firmware version has been downloaded and validated, the Gateways are upgraded and then rebooted sequentially so that all tunneled sessions stay synchronized as UDGs, DDGs and SDGs are rebooted.
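The two phases (stage the image everywhere, then reboot one node at a time) can be sketched as below. This is a simulation under stated assumptions: the dictionary-based Gateway records, the `live_upgrade` function, the `reboot` callback and the image labels are hypothetical and are not part of any real management API.

```python
def live_upgrade(gateways, image, reboot):
    """Stage the image on every node, then reboot the nodes sequentially."""
    # Phase 1: every node downloads and validates the image before any reboot.
    for gw in gateways:
        gw["staged"] = image

    # Phase 2: reboot one node at a time so the other nodes keep serving sessions.
    order = []
    for gw in gateways:
        if not all(other["up"] for other in gateways if other is not gw):
            raise RuntimeError("refusing to reboot while another node is down")
        gw["up"] = False
        reboot(gw)  # stand-in for the actual reboot
        gw["version"], gw["up"] = gw["staged"], True
        order.append(gw["name"])
    return order


cluster = [
    {"name": name, "version": "old-image", "up": True}
    for name in ("GW-A", "GW-B", "GW-C")
]
down_counts = []


def fake_reboot(gw):
    # Record how many nodes are down while this node is rebooting.
    down_counts.append(sum(not g["up"] for g in cluster))


order = live_upgrade(cluster, "new-image", fake_reboot)
```

In this model at most one node is ever down, which is what lets standby nodes carry the sessions of the node being rebooted.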

When a live upgrade is initiated for a cluster, the upgrade status of each node is displayed. Each node will first download the specified firmware image from the cloud and will upgrade the target partition. Once upgraded, the nodes are sequentially rebooted to minimize the impact to clients and devices:

Example of Live Upgrade

Live upgrades can be performed on demand or scheduled. A scheduled upgrade can be set for any time within 1 week of the current date and time, and requires a time zone, a date and a start time in hours and minutes. Scheduled live upgrades can be cancelled any time prior to the scheduled event. The following example shows a live upgrade being scheduled for an individual cluster, where new firmware will be downloaded and installed on the Gateways’ primary partitions. The time zone is set to UTC, and the date and time are specified.

Live Upgrade scheduling
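The scheduling constraints described above can be expressed as a small validation routine. This is a sketch only: `validate_schedule` and `MAX_LEAD` are hypothetical names, and the check simply encodes the stated rules (a time zone is required, the start must be in the future and within 1 week, and the start time has hour and minute granularity).

```python
from datetime import datetime, timedelta, timezone

MAX_LEAD = timedelta(weeks=1)  # upgrades can be scheduled at most 1 week ahead


def validate_schedule(start: datetime, now: datetime) -> datetime:
    """Check a requested start time and return it truncated to the minute."""
    if start.tzinfo is None:
        raise ValueError("a time zone must be specified")
    if start <= now:
        raise ValueError("start time must be in the future")
    if start - now > MAX_LEAD:
        raise ValueError("start time must be within 1 week of the current time")
    return start.replace(second=0, microsecond=0)


now = datetime(2024, 2, 28, 12, 0, tzinfo=timezone.utc)
requested = datetime(2024, 3, 2, 1, 30, 45, tzinfo=timezone.utc)
scheduled = validate_schedule(requested, now)
```

Passing the current time in as a parameter keeps the check deterministic, which makes the one-week boundary easy to test.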


Last modified: February 28, 2024 (614bf13)