ArubaOS 8.6.0.0 Help Center
Troubleshooting Cluster

This section provides commands that can be used to troubleshoot different scenarios in a cluster configuration.

The control plane processes in the cluster are the GSM manager (GSM), cluster manager (CM), Station Manager (STM), and AUTH. STM is the process that handles AP management and user association. On the AP, the main modules are A-STM and ASAP (datapath).

The following is a list of some common troubleshooting scenarios in a cluster: 

Cluster Formation Unsuccessful

All managed devices in a cluster are collectively known as cluster members. The cluster formation is successful when all the managed devices in the cluster are connected to each other.

A cluster formation can be unsuccessful for the following reasons:

  1. The cluster group-membership command has not been executed.
  2. Not all managed devices are listed in the cluster.
  3. There is a connectivity issue and the managed devices are not able to reach their peers.
  4. The IPsec (Internet Protocol security) SA (Security Association) is not formed.

To check the status of the cluster formation, execute the show lc-cluster group-membership command.

(host) [mynode] #show lc-cluster group-membership
Mon Dec 21 17:30:51.952 2015
Cluster Enabled, Profile Name = "6NodeCluster"
Redundancy Mode On
Active Client Rebalance Threshold = 50%
Standby Client Rebalance Threshold = 75%
Unbalance Threshold = 5%

Cluster Info Table
------------------
Type IPv4 Address    Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
self 10.15.116.3     128      N/A             ISOLATED (Leader)
peer 10.15.116.4     128      L3-Connected    CONNECTED-FROM-SELF-DISCONNECTED-FROM-PEERS
peer 10.15.116.5     128      L3-Connected    CONNECTED-FROM-SELF-DISCONNECTED-FROM-PEERS
peer 10.15.116.8     128      L3-Connected    CONNECTED-FROM-SELF-DISCONNECTED-FROM-PEERS
peer 10.15.116.9     128      N/A             SECURE-TUNNEL-NEGOTIATING
peer 10.15.116.10    128      N/A             SECURE-TUNNEL-NEGOTIATING

 

The STATUS column can display any of the following states:

  • DISCONNECTED
  • INCOMPATIBLE
  • DISCONNECTED-FROM-SELF-CONNECTED-FROM-PEERS
  • CONNECTED-FROM-SELF-DISCONNECTED-FROM-PEERS
  • SECURE-TUNNEL-NEGOTIATING
  • SECURE-TUNNEL-ESTABLISHED
  • CONNECTED
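When the output is captured to a file or a script, the members that are not fully connected can be picked out automatically. The following is a minimal Python sketch, not an ArubaOS tool: it assumes the Cluster Info Table rows look like the sample above (whitespace-separated columns, status last) and treats the first word of the STATUS column as the state. The names SAMPLE, parse_members, and unhealthy_peers are illustrative only.

```python
import re

# A shortened copy of the Cluster Info Table from the sample output above.
SAMPLE = """\
Type IPv4 Address    Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
self 10.15.116.3     128      N/A             ISOLATED (Leader)
peer 10.15.116.4     128      L3-Connected    CONNECTED-FROM-SELF-DISCONNECTED-FROM-PEERS
peer 10.15.116.9     128      N/A             SECURE-TUNNEL-NEGOTIATING
"""

# Assumed row layout: type, IPv4 address, priority, connection type, status.
ROW = re.compile(
    r"^(?P<type>self|peer)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<priority>\d+)\s+"
    r"(?P<conn>\S+)\s+(?P<status>.+?)\s*$"
)


def parse_members(text):
    """Return one dict per table row; header and separator lines are skipped."""
    members = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            members.append(match.groupdict())
    return members


def unhealthy_peers(members):
    """Return peers whose state (first word of STATUS) is not CONNECTED."""
    return [
        m for m in members
        if m["type"] == "peer" and m["status"].split()[0] != "CONNECTED"
    ]


members = parse_members(SAMPLE)
for member in unhealthy_peers(members):
    print(member["ip"], member["status"])
```

Any peer that this sketch reports is a candidate for the checks described in Table 1.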

Table 1: Cluster state

State: INCOMPATIBLE
Reason: This state can occur in the following scenario:

  • Two managed devices are running different ArubaOS versions. A build string mismatch is found and the managed devices do not become part of the cluster.

State: DISCONNECTED
Reason: This state can occur in the following scenarios:

  • None of the managed devices in the cluster are in the CONNECTED state.
  • There is an issue with the physical connectivity among the managed devices in the cluster.
  • One of the ports is an untrusted node.

State: SECURE-TUNNEL-NEGOTIATING
Reason: This status is displayed for a very short period of time until the IPsec tunnel is set up. If the status persists, it indicates that there is an issue in the IPsec tunnel setup.

State: CONNECTED-FROM-SELF-DISCONNECTED-FROM-PEERS
Reason: This state can occur in the following scenario:

  • Managed device 1 and managed device 2 are connected. Managed device 3 is later introduced in the cluster. Managed device 1 and managed device 3 are connected, but managed device 2 and managed device 3 are not connected.
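The state-to-reason mapping in Table 1 can also be kept as a small lookup for use in scripts. The following Python sketch only restates the table; the keys use the hyphenated spelling shown in the command output, and the names STATE_REASONS and explain are illustrative, not part of ArubaOS.

```python
# Likely causes per state, paraphrased from Table 1.
STATE_REASONS = {
    "INCOMPATIBLE": (
        "Managed devices run different ArubaOS versions (build string mismatch)."
    ),
    "DISCONNECTED": (
        "No member is CONNECTED, there is a physical connectivity issue, "
        "or one of the ports is untrusted."
    ),
    "SECURE-TUNNEL-NEGOTIATING": (
        "Normal for a short time; if it persists, the IPsec tunnel setup has an issue."
    ),
    "CONNECTED-FROM-SELF-DISCONNECTED-FROM-PEERS": (
        "This device reaches the peer, but the peer is not connected to "
        "at least one other member."
    ),
}


def explain(status):
    """Map a STATUS value such as 'ISOLATED (Leader)' to a Table 1 reason."""
    tokens = status.split()
    state = tokens[0] if tokens else ""
    return STATE_REASONS.get(state, "Not covered by Table 1.")


print(explain("SECURE-TUNNEL-NEGOTIATING"))
```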

After the cluster moves to the CONNECTED state, check whether it is L2-connected, that is, whether every VLAN on the peer is reachable as determined by VLAN probing. Use the following command to check the VLAN probing status:

(host) [mynode] #show lc-cluster vlan-probe status

If you have made any VLAN changes to the distribution switch, execute the VLAN probing algorithm on the managed device:

(host) [mynode] (config) #lc-cluster start-vlan-probe

AP Rebootstrap

An AP rebootstraps when an S-AAC is not assigned to it. An AP can rebootstrap for the following reasons:

  1. Platform capacity: The managed device has reached its maximum capacity or already hosts the maximum number of APs it can support.

    To resolve this issue, perform the following steps:

    • Add another managed device or upgrade an existing managed device to support more APs.
    • Rework the network configuration.
  2. Multiple managed devices are down: If an S-AAC goes down, the Standby Controller (S-UAC) is made the Active Controller (A-UAC). However, if the S-UAC also goes down, then the AP rebootstraps.

    To resolve this issue, ensure that the distribution switch you select can handle the required scale.
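Both reasons above come down to whether enough capacity remains for the APs. The following Python sketch is a rough planning aid under simplifying assumptions: it only compares the AP count with the summed AP capacity of the members that survive a worst-case failure, and it ignores how the cluster actually accounts for active and standby load. The capacity figures and function names are hypothetical.

```python
def remaining_capacity(capacities, failures):
    """Sum the AP capacity left after the largest `failures` members go down."""
    survivors = sorted(capacities)[: max(len(capacities) - failures, 0)]
    return sum(survivors)


def can_host_all_aps(capacities, ap_count, failures=1):
    """True if the surviving members could still host every AP."""
    return remaining_capacity(capacities, failures) >= ap_count


# Hypothetical cluster: three members with per-member AP capacities.
capacities = [512, 512, 1024]
print(can_host_all_aps(capacities, ap_count=1000, failures=1))
print(can_host_all_aps(capacities, ap_count=1000, failures=2))
```

If the check fails for the failure count you need to tolerate, adding or upgrading a managed device, as described in the first resolution, is the likely remedy.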

Users are Unable to Connect to a Cluster

The following is a list of some reasons why a user might be unable to connect to a cluster:

  1. The AP and the managed device have different roles for the user.

    Every user has an A-UAC. If the AP's record of the UAC for a user differs from the managed device's record, and the managed device has no information about the user, the managed device rejects the user.

  2. The IPsec tunnel is not established.

    If CPsec (Control Plane Security) is enabled on the APs, the APs are expected to have an IPsec tunnel established with all the managed devices in the cluster. If the IPsec tunnel is not established, the user cannot connect to the cluster.

  3. The AP configuration is incomplete for an 802.1X client.

    For 802.1X clients to connect, the multicast key (mkey) has to go from the AAC to the UAC. If the mkey is not available in the UAC, the status is not displayed and the user cannot connect. To check for incomplete AP configurations, execute the show auth-tracebuf command.

Users are Getting Deauthenticated

The following is a list of some reasons why a user might get deauthenticated:

  1. Cluster failover: If a user is deauthenticated in a cluster, check whether a cluster failover occurred at the same time. To check when a managed device in DOWN status was first disconnected, use the show lc-cluster heartbeat counters command.
    1. If a failover occurs when the managed devices are down, check whether the managed devices are L2-connected using the show lc-cluster vlan-probe status command.
    2. If the managed devices are L3-connected, fix the VLAN probe using the lc-cluster exclude-vlan <vlan-number> command.
  2. If the managed devices are L2-connected and the issue persists, check for a solution in AP Rebootstrap.
  3. If the AP does not rebootstrap and there is no failover, contact the Technical Support team.
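The checks above form a short decision sequence, which can be written down as a function. This Python sketch is only one reading of the list: the three boolean inputs are facts you gather yourself from the commands named above, the function name is illustrative, and the case of an AP rebootstrap without a failover is an assumption that the list does not cover explicitly.

```python
def deauth_next_step(failover_seen, l2_connected, ap_rebootstrapped):
    """Suggest the next troubleshooting step for a deauthenticated user."""
    if failover_seen:
        if not l2_connected:
            # Members are only L3-connected: fix VLAN probing first.
            return "Fix the VLAN probe: lc-cluster exclude-vlan <vlan-number>"
        return "See AP Rebootstrap"
    if ap_rebootstrapped:
        # Assumption: a rebootstrap without a failover is also handled there.
        return "See AP Rebootstrap"
    return "Contact the Technical Support team"


print(deauth_next_step(failover_seen=True, l2_connected=False, ap_rebootstrapped=False))
```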

Enabling Debug

In a cluster setup, a lightweight tracing mechanism is added to collect debug information with minimal performance impact on the cluster.

On a 7200 Series managed device, the debug information is collected in the flash1 partition of the managed device and can be used for future troubleshooting. The 7000 Series and 7205 managed devices have no flash1 partition, so a USB device is needed to collect this debug information, which can be used for future debugging or for reporting an issue.

Execute the following trace commands to collect debug information for the cluster:

(host) #gsm trace channel ap application stm

(host) #gsm trace channel ap application dds

(host) #gsm trace channel ap application cluster_mgr

(host) #gsm trace channel radio application stm

(host) #gsm trace channel radio application dds

(host) #gsm trace channel sta application stm

(host) #gsm trace channel sta application auth

(host) #gsm trace channel sta application dds

(host) #gsm trace channel sta application cluster_mgr

(host) #gsm trace channel mac_user application auth

(host) #gsm trace channel mac_user application dds

(host) #gsm trace channel mac_user application cluster_mgr

(host) #gsm trace channel ip_user application auth

(host) #gsm trace channel ip_user application dds

(host) #gsm trace channel user application auth

(host) #gsm trace channel user application dds

(host) #gsm trace channel sectun application dds

(host) #gsm trace channel sectun application cluster_mgr

(host) #gsm trace channel key_cache application auth

(host) #gsm trace channel key_cache application dds

(host) #gsm trace channel pmk_cache application stm

(host) #gsm trace channel pmk_cache application auth

(host) #gsm trace channel pmk_cache application dds

(host) #gsm trace channel rep_key application dds

(host) #gsm trace channel rep_key application cluster_mgr

(host) #gsm trace channel cluster application dds

(host) #gsm trace channel cluster application cluster_mgr

(host) #gsm trace channel bucket_map application stm

(host) #gsm trace channel bucket_map application auth

(host) #gsm trace channel bucket_map application dds

(host) #gsm trace channel bucket_map application cluster_mgr

(host) #gsm trace channel cluster_bss application dds

(host) #gsm trace channel cluster_bss application cluster_mgr

(host) #gsm trace channel cluster_aac application dds

(host) #gsm trace channel cluster_aac application cluster_mgr

(host) #gsm trace channel cluster_ap application dds

(host) #gsm trace channel cluster_ap application cluster_mgr

(host) #gsm trace channel bss application stm

(host) #gsm trace channel bss application auth

(host) #gsm trace channel bss application cluster_mgr

(host) #dds trace receive channel sta peer $peerIP

(host) #dds trace transmit channel sta peer $peerIP

(host) #dds trace receive channel ip_user peer $peerIP

(host) #dds trace transmit channel ip_user peer $peerIP

(host) #dds trace receive channel mac_user peer $peerIP

(host) #dds trace transmit channel mac_user peer $peerIP

(host) #dds trace receive channel key_cache peer $peerIP

(host) #dds trace transmit channel key_cache peer $peerIP

(host) #dds trace receive channel pmk_cache peer $peerIP

(host) #dds trace transmit channel pmk_cache peer $peerIP

(host) #dds trace receive channel bucket_map peer $peerIP

(host) #dds trace transmit channel bucket_map peer $peerIP

(host) #dds trace receive channel cluster_bss peer $peerIP

(host) #dds trace transmit channel cluster_bss peer $peerIP

(host) #dds trace receive channel cluster_sta peer $peerIP

(host) #dds trace transmit channel cluster_sta peer $peerIP

(host) #dds trace receive channel cac_usage peer $peerIP

(host) #dds trace transmit channel cac_usage peer $peerIP

(host) #dds trace receive channel cluster_aac peer $peerIP

(host) #dds trace transmit channel cluster_aac peer $peerIP

(host) #dds trace receive channel cluster_ap peer $peerIP

(host) #dds trace transmit channel cluster_ap peer $peerIP

(host) #ap debug stm-trace category all loglevel debug

(host) #aaa auth-trace loglevel debug

(host) #scm initiate audit <peerip>
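The dds trace commands repeat the same receive and transmit pair for every channel and peer, so they can be generated instead of typed. The following Python sketch builds the command strings for one peer from the channel list used above; it only produces text to paste into the CLI and does not talk to the managed device. The names DDS_CHANNELS and dds_trace_commands are illustrative.

```python
import ipaddress

# Channels taken from the dds trace commands listed above.
DDS_CHANNELS = [
    "sta", "ip_user", "mac_user", "key_cache", "pmk_cache", "bucket_map",
    "cluster_bss", "cluster_sta", "cac_usage", "cluster_aac", "cluster_ap",
]


def dds_trace_commands(peer_ip):
    """Return the receive/transmit dds trace commands for one cluster peer."""
    ipaddress.ip_address(peer_ip)  # raises ValueError for a malformed address
    return [
        f"dds trace {direction} channel {channel} peer {peer_ip}"
        for channel in DDS_CHANNELS
        for direction in ("receive", "transmit")
    ]


for command in dds_trace_commands("10.15.116.4"):
    print(command)
```

Run the function once per peer IP address in the cluster to cover every peer.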
