Classic STP Convergence: Handling Direct and Indirect Link Failures

Original: Understanding STP and RSTP Convergence (by Petr Lapukhov, CCIE #16379)

Introduction

  • Switches store the most recent BPDU for every port that receives one, even blocked ports. Only the best information is relayed downstream.
  • There are two important stability properties incorporated in the spanning tree algorithm:
    • Topology synchronization timeout. If a port is unblocked, it must go through the Listening and Learning states. This process takes exactly 2 x Forward_Delay seconds (2 x 15 seconds = 30 seconds by default). The delay ensures that new information is disseminated to the other switches and that MAC addresses are re-learned.
    • Aging out old information. Every configuration BPDU contains two fields: Max_Age and Message_Age. The Message_Age field is incremented every time a BPDU traverses a switch. When a switch stores the BPDU with the respective port, it will count the time in seconds, starting from Message_Age up to the Max_Age. If during this interval, no further BPDUs are received, the current BPDU information is expired and the port is declared designated. This procedure ensures that the old root information is eventually aged out of the topology. 
  • Receiving an inferior BPDU is handled differently than receiving a superior BPDU. 
  • A BPDU is considered inferior if it carries information about the root bridge that is worse than the BPDU currently stored on the port.
  • Inferior BPDUs may appear when a neighboring switch loses its root port and has no alternate path, so it declares itself the new root for the topology.
  • Switches ignore inferior BPDUs until the old BPDU information expires after Max_Age - Message_Age seconds.
  • Ignoring inferior BPDUs allows for guaranteed recovery in situations when a switch receiving inferior BPDUs still has an active path to the real root bridge. 
  • In cases when the root bridge goes down, however, this process slows convergence, because switches must wait for the Max_Age timer to expire.
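
The aging and inferior-BPDU rules above can be sketched in a few lines of Python. This is a simplified model, not a wire-format implementation: the Bpdu class, its field layout, and the helper names are assumptions made for illustration, and the comparison looks only at the root bridge ID and root path cost rather than the full priority vector.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Bpdu:
    """Simplified configuration BPDU (hypothetical model, not a wire format)."""
    root_id: Tuple[int, int]   # (bridge priority, MAC as int); lower is better
    root_path_cost: int        # cost to the root; lower is better
    message_age: int           # seconds, grows as the BPDU traverses switches
    max_age: int = 20          # default Max_Age in seconds


def is_inferior(received: Bpdu, stored: Bpdu) -> bool:
    """True if `received` describes a worse root (or a worse path) than `stored`."""
    return (received.root_id, received.root_path_cost) > (
        stored.root_id, stored.root_path_cost)


def seconds_until_expiry(stored: Bpdu) -> int:
    """Lifetime of stored information: Max_Age - Message_Age."""
    return max(stored.max_age - stored.message_age, 0)


def should_accept(received: Bpdu, stored: Optional[Bpdu], elapsed: int) -> bool:
    """Decide whether a port replaces its stored BPDU with `received`.

    `elapsed` is the number of seconds since `stored` was last refreshed.
    """
    if stored is None or elapsed >= seconds_until_expiry(stored):
        return True                           # nothing stored, or it has aged out
    return not is_inferior(received, stored)  # inferior BPDUs are ignored
```

For example, with a stored BPDU whose Message_Age is 1, an inferior BPDU from a neighbor that has declared itself root is rejected at elapsed=5 and accepted only once elapsed reaches 19 seconds.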


Handling Direct Link Failures

  • A failure can be detected in two ways: by sensing signal loss at the physical layer, or by missing BPDUs for Max_Age - Message_Age seconds.
  • Depending on the port state, the spanning tree algorithm will handle the failure differently:
    • If the port was blocking, nothing happens except the associated BPDU information is expired.
    • If the port was designated, the local switch does nothing. However, the downstream switch may have lost its root port and will start reconverging.
    • If the port was a root port, the BPDU information associated with the port is invalidated and the switch attempts to elect a new root port based on stored information. If such a port is found, it is unblocked and transitioned through the Listening and Learning states.
    • If a root port cannot be found, the switch declares itself as the root and starts announcing it in BPDUs. Downstream switches will ignore this information until the old information expires.
  • At best, the convergence process would take 2 x Forward_Delay (30 seconds) if the link failure is detected at the physical layer.
  • If BPDU aging is used, the convergence process takes (Max_Age - Message_Age) + 2 x Forward_Delay (up to 20 + 30 = 50 seconds with default timers).
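
As a rough illustration of the root-port case and the two timing figures above, the following Python sketch re-elects a root port from stored information and computes the convergence time. The function names, the port names, and the (root bridge ID, root path cost) tuple are hypothetical; the sketch also assumes the dictionary holds only ports that store BPDUs received from neighbors, with the local port cost already added.

```python
from typing import Dict, Optional, Tuple

FORWARD_DELAY = 15  # seconds, default
MAX_AGE = 20        # seconds, default

# (root bridge ID, root path cost); lower tuples are better
Vector = Tuple[int, int]


def elect_root_port(stored: Dict[str, Vector], failed_port: str) -> Optional[str]:
    """Pick the best remaining port, or None if the switch must declare itself root."""
    candidates = {p: v for p, v in stored.items() if p != failed_port}
    if not candidates:
        return None
    # Best stored information wins; the port name only breaks ties deterministically.
    return min(candidates, key=lambda p: (candidates[p], p))


def direct_failure_convergence(carrier_loss: bool, message_age: int = 0,
                               max_age: int = MAX_AGE,
                               forward_delay: int = FORWARD_DELAY) -> int:
    """Seconds until the new root port forwards after a direct link failure."""
    # Carrier loss is detected at once; otherwise the stored BPDU must age out.
    detection = 0 if carrier_loss else max(max_age - message_age, 0)
    # The new root port then spends Forward_Delay in Listening and in Learning.
    return detection + 2 * forward_delay
```

With default timers, direct_failure_convergence(True) gives 30 seconds and direct_failure_convergence(False) gives 50 seconds, matching the best and worst cases listed above.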


Handling Indirect Link Failures

  • Indirect link failures occur on an upstream switch, not on a link attached to the local switch.
  • There are two types of indirect link failures: either the upstream switch elects a new root port, or the upstream switch loses all paths to the root.
  • If an upstream switch loses its root port but has an alternate path, a new root port is elected, and BPDUs continue to flow, possibly with a different root path cost. 
  • The downstream switch receives these BPDUs on either its root port or blocked port. 
  • Based on the new information, the switch may elect to unblock the blocked port and change the root port. 
  • If a new root port is elected, it takes 2 x Forward_Delay to make it forwarding by transitioning it through the Listening and Learning states.
  • If a new root port is not elected, no reconvergence is required.
  • The total time to respond to the indirect link failure can be as low as 2 x Forward_Delay if the upstream switch detects the root port failure quickly (carrier loss), or as much as Max_Age + 2 x Forward_Delay if the switch needs to expire the original BPDU information and unblock alternate port(s).
  • If the switch loses all paths to the root, the original root bridge information is expired (immediately or in up to Max_Age seconds) and the switch chooses itself as a new root and starts sending (inferior) BPDUs declaring itself as the new root.
  • The "downstream" switch ignores this new (inferior) BPDU information for Max_Age - Message_Age seconds, retaining information about the original root.
  • Two possible outcomes:
    • If the "downstream" switch still hears the original root, it will transition the previously blocked port receiving inferior BPDUs through the Listening and Learning states and start relaying current root bridge information. The previously "upstream" switch becomes the downstream one and adapts to its new root port. Convergence takes at most Max_Age + 2 x Forward_Delay seconds.
    • If the "downstream" switch detects a loss of the original root, either by losing all directly connected root and alternate ports or by expiring the original BPDU information in Max_Age seconds, it may now accept inferior BPDU information. Based on its local priority, the switch either agrees to the new root information or starts announcing itself as the root, forcing the previously "upstream" switch to adapt. The total convergence time is once again Max_Age + 2 x Forward_Delay seconds.
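
The first outcome can be laid out as a timeline. The Python sketch below is only an illustration under stated assumptions: the function name and event wording are made up here, the upstream switch is assumed to detect its failure immediately (carrier loss), and the downstream switch is assumed to wait for the stored BPDU on its blocked port to age out before reacting.

```python
from typing import List, Tuple


def indirect_failure_timeline(message_age: int = 1, max_age: int = 20,
                              forward_delay: int = 15) -> List[Tuple[int, str]]:
    """Return (seconds, event) pairs for the downstream switch's blocked port."""
    # Stored information on the blocked port lives for Max_Age - Message_Age.
    t_expire = max(max_age - message_age, 0)
    return [
        (0, "upstream switch loses all paths to the root and sends inferior BPDUs"),
        (t_expire, "stored BPDU expires; blocked port becomes designated (Listening)"),
        (t_expire + forward_delay, "port moves to Learning"),
        (t_expire + 2 * forward_delay,
         "port is Forwarding and relays the original root information"),
    ]


if __name__ == "__main__":
    for seconds, event in indirect_failure_timeline():
        print(f"t={seconds:>2}s  {event}")
```

With Message_Age set to 0 the last event lands at 50 seconds, which is the Max_Age + 2 x Forward_Delay upper bound for default timers.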

