Spanning Tree Protocol (STP) is a networking protocol with a particularly long history. STP was designed back in 1985 specifically to prevent Layer 2 network loops, and the broadcast storms they cause, from disrupting networks. The protocol operates at Layer 2 of the OSI model and acts as a link management protocol.
STP uses an algorithm to decide which links forward traffic and which are blocked, so that only one active path exists between any two points on the network. The brains behind the algorithm was Radia Perlman of the Digital Equipment Corporation, and the protocol was later standardized as IEEE 802.1D. In this article, we’re going to take you through what STP is and look at some of the most important concepts surrounding this critical protocol.
What is a Broadcast Storm?
Broadcast storms are a particularly pervasive problem in networks that use interconnected switches. In Local Area Networks (LANs), switches are often interconnected for a number of reasons, one of which is connection stability.
The rationale behind interconnecting switches is redundancy: if a switch or link fails, the others pick up the slack. The problem is that redundant connections between switches create loops, and loops lead to broadcast storms unless they are carefully managed.
Essentially, a broadcast storm starts when a message broadcast across the network by a host triggers other hosts to respond with their own broadcasts. Where switches are connected in a loop, these frames are forwarded around it again and again, creating a circulation of broadcasts that consumes an extraordinary amount of network bandwidth. A broadcast storm monopolizes network resources and prevents normal traffic from flowing freely.
The Spanning Tree Protocol prevents broadcast storms by choosing some paths to block and leaving others in a forwarding state. The result is a loop-free network in which frames can’t circulate endlessly and congest the network. The setup also adapts on the fly. For instance, if a path in the forwarding state goes down, one of the blocked paths is activated to keep the connection up and running. In the next section, we’re going to look at how Spanning Tree Protocol chooses which paths to block.
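To make the effect of a loop concrete, here is a minimal Python sketch of naive Layer 2 flooding. The four-switch topology, the switch names, and the `flood` helper are all assumptions made up for illustration; the sketch only counts copies of a single broadcast frame and ignores link speeds and timing.

```python
def flood(links, source, rounds):
    """Simulate flooding of one broadcast frame.

    links: iterable of (switch_a, switch_b) pairs that are forwarding.
    Returns the number of frame copies in flight after each round.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Each in-flight copy is (current_switch, switch_it_arrived_from).
    in_flight = [(source, None)]
    history = []
    for _ in range(rounds):
        next_hop = []
        for switch, came_from in in_flight:
            # A switch floods a broadcast out of every port except the ingress port.
            for peer in neighbors.get(switch, ()):
                if peer != came_from:
                    next_hop.append((peer, switch))
        in_flight = next_hop
        history.append(len(in_flight))
    return history


# Hypothetical full mesh of four switches: every pair is linked, so loops abound.
mesh = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
# The same switches with the redundant links blocked, leaving a tree rooted at A.
tree = [("A", "B"), ("A", "C"), ("A", "D")]

storm = flood(mesh, "A", 5)      # [3, 6, 12, 24, 48]
loop_free = flood(tree, "A", 5)  # [3, 0, 0, 0, 0]
```

In the looped topology the number of copies doubles every round, because Ethernet frames carry no hop limit that would eventually discard them. With the redundant links blocked, the broadcast is delivered once and dies out.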
How Does Spanning Tree Protocol Stop Broadcast Storms and Loops?
One of the main ways that STP stops broadcast storms and loops from occurring is by preventing redundant links between devices from carrying traffic. To do this, the devices on the network need a common reference point. Electing a root bridge defines this reference point.
Electing the Root Bridge
The root bridge is the bridge that the other bridges elect as the reference point, and all of its ports are in the forwarding state. The process begins with every bridge claiming to be the root bridge and sending Bridge Protocol Data Units (BPDUs) throughout the network every two seconds by default. All the connected bridges then decide which bridge is to become the root based on the bridge ID each one advertises. The bridge ID is made up of the bridge priority (a number assigned to a bridge or switch) and the bridge’s MAC address.
The election works by choosing the switch with the lowest bridge ID. The priority is compared first; if all priority values are the same, the switch with the lowest MAC address wins instead. As bridges exchange BPDUs, any bridge that hears of a lower bridge ID than its own drops out of the election until only one remains.
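The comparison at the heart of the election can be sketched in a few lines of Python. The switch names, MAC addresses, and the `BridgeID` and `elect_root` helpers below are hypothetical; the sketch only models the final outcome of the election, not the exchange of BPDUs that leads to it.

```python
from typing import NamedTuple


class BridgeID(NamedTuple):
    priority: int  # lower wins
    mac: str       # tie-breaker when priorities are equal

    def sort_key(self):
        # Compare the priority first, then the MAC address as a number.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridges[name].sort_key())


# Hypothetical switches that all share the same priority.
bridges = {
    "sw1": BridgeID(32768, "00:1a:2b:00:00:30"),
    "sw2": BridgeID(32768, "00:1a:2b:00:00:10"),
    "sw3": BridgeID(32768, "00:1a:2b:00:00:20"),
}
root = elect_root(bridges)  # priorities tie, so the lowest MAC (sw2) wins

# Lowering the priority on sw3 makes it win regardless of its MAC address.
bridges["sw3"] = BridgeID(4096, "00:1a:2b:00:00:20")
root_after_tuning = elect_root(bridges)
```

The second half shows why administrators usually set the priority explicitly: if every switch keeps the same priority, the root is decided by whichever MAC address happens to be lowest.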
Electing the Root Port
Once the root bridge has been elected, it sends BPDUs out of its forwarding ports. Every other bridge receives these BPDUs and works out which of its ports is closest to the root. This port, the root port, is calculated by every bridge apart from the root bridge, and it is the port with the lowest path cost to the root. The cost of a path is the sum of the costs of the links along it, and the cost of each link depends on its bandwidth: the faster the link, the lower the cost value. We’ve listed these cost values below:
|Link Speed|Cost Value|
|10 Mbps|100|
|100 Mbps|19|
|1 Gbps|4|
|10 Gbps|2|
The distance between a port and the root bridge is therefore expressed as a path cost rather than a hop count. The simplest way to remember this is that the port closest to the root bridge is the one with the lowest cost value. That port becomes the root port and is placed in the forwarding state, and any remaining ports that are neither root ports nor designated ports are blocked. It is important to note that a non-root bridge can only have one root port.
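As a rough illustration of the root port calculation, the Python sketch below adds the cost advertised by each neighbor to the cost of the local link and picks the lowest total. The port names and the `pick_root_port` helper are hypothetical, and the cost values are the classic IEEE 802.1D short path costs, of which 10 Gbps = 2 is one.

```python
# Classic 802.1D short path costs, keyed by link speed in Mbps.
STP_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def pick_root_port(ports):
    """ports maps a local port name to (advertised_root_cost, link_speed_mbps).

    The total cost through a port is the cost the neighbor advertises in its
    BPDU plus the cost of the local link the BPDU arrived on.
    """
    totals = {
        name: advertised + STP_COST[speed]
        for name, (advertised, speed) in ports.items()
    }
    # Ties are broken by port name only to keep this sketch deterministic;
    # real STP compares the sender's bridge ID and port ID instead.
    best = min(totals, key=lambda name: (totals[name], name))
    return best, totals


# Hypothetical non-root switch with two ways to reach the root.
ports = {
    "Gi0/1": (0, 100),   # direct 100 Mbps link to the root bridge
    "Gi0/2": (4, 1000),  # 1 Gbps link to a neighbor that is one 1 Gbps hop from the root
}
root_port, totals = pick_root_port(ports)
# totals == {"Gi0/1": 19, "Gi0/2": 8}, so Gi0/2 becomes the root port
```

Note that the two-hop path wins here: "closest" in STP terms means lowest cumulative cost, not fewest hops.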
STP Basic Rules
Another way to look at how STP works is through rules. STP makes its way through these rules systematically. We’ve outlined the process in more detail above, but here are the basic rules:
- Every port of the root bridge/switch needs to be in forwarding mode (the only exception is if there is a self-looped port).
- The root port must be set to forwarding mode.
- The designated port on each network segment must be set to forwarding mode.
- The remaining ports of all switches must be set to blocking mode. Note this applies only to ports that connect to other bridges or switches. Ports connected to PCs and other end devices are set to forwarding.
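The four rules above can be applied to a small topology in code. What follows is a simplified Python sketch, not the distributed algorithm that switches actually run: it assumes a connected network of point-to-point links, computes path costs centrally, and uses hypothetical switch names, bridge IDs, and a hypothetical `spanning_tree` helper.

```python
import heapq


def spanning_tree(bridge_ids, links):
    """bridge_ids: {name: (priority, mac_as_int)}; links: [(a, b, cost)].

    Returns (root, states), where states maps (bridge, link_index) to
    'forwarding' or 'blocking'.
    """
    root = min(bridge_ids, key=bridge_ids.get)

    # Lowest path cost from every bridge to the root (Dijkstra).
    adj = {name: [] for name in bridge_ids}
    for i, (a, b, cost) in enumerate(links):
        adj[a].append((b, cost, i))
        adj[b].append((a, cost, i))
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for peer, cost, _ in adj[node]:
            if d + cost < dist.get(peer, float("inf")):
                dist[peer] = d + cost
                heapq.heappush(heap, (d + cost, peer))

    states = {}
    # Rule 2: every non-root bridge forwards on its root port.
    for name in bridge_ids:
        if name != root:
            _, _, best_link = min(
                (dist[peer] + cost, bridge_ids[peer], i)
                for peer, cost, i in adj[name]
            )
            states[(name, best_link)] = "forwarding"
    # Rules 1 and 3: on each link, the end with the lower (cost, bridge ID)
    # is the designated port. The root always wins on its own links.
    for i, (a, b, _) in enumerate(links):
        designated = min((a, b), key=lambda n: (dist[n], bridge_ids[n]))
        states[(designated, i)] = "forwarding"
    # Rule 4: every remaining inter-switch port blocks.
    for i, (a, b, _) in enumerate(links):
        states.setdefault((a, i), "blocking")
        states.setdefault((b, i), "blocking")
    return root, states


# Hypothetical triangle of switches joined by 1 Gbps links (cost 4 each).
bridge_ids = {"sw1": (4096, 0x10), "sw2": (32768, 0x20), "sw3": (32768, 0x30)}
links = [("sw1", "sw2", 4), ("sw1", "sw3", 4), ("sw2", "sw3", 4)]
root, states = spanning_tree(bridge_ids, links)
blocked = sorted(port for port, state in states.items() if state == "blocking")
```

In this example sw1 becomes the root, and the only blocked port is sw3's end of the sw2 to sw3 link. That single blocked port is enough to break the loop while keeping every switch reachable.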
Limitations of STP
Even though STP does a great job of nipping broadcast storms in the bud, there are areas where it is quite problematic. The main problem is what happens when the protocol fails. Because STP relies on bridges exchanging BPDUs, anything that stops BPDUs from arriving can cause the algorithm to make the wrong decision. When this happens it is extremely difficult to troubleshoot because of the lack of visibility. No less inconvenient is the fact that algorithm failures often lead to bridging loops.
A bridging loop is where a port that should be blocked proceeds to forward traffic instead. While STP is reliable the majority of the time, the few instances where this occurs can be difficult to deal with, particularly when a bridging loop emerges seemingly out of nowhere.
It is important to note that there are many reasons why STP can fail. We’ve listed some of these below:
- Spanning Tree Convergence
- Duplex Mismatch
- Corrupted packets
- Unidirectional Link
- Resource Errors
Spanning Tree Convergence
Spanning tree convergence is the process by which bridges and switches settle their ports into forwarding or blocking states. If BPDUs are lost while the network is converging, a port that should stay blocked can become a forwarding port. This can lead to failure of the STP algorithm.
Duplex Mismatch
Duplex mismatch is a configuration issue where one end of a link is set to one duplex mode and the other end to another. Having one bridge on half duplex and the other on full duplex results in collisions on the half-duplex side; BPDUs can be lost as a result, which can cause bridging loops. This problem can be difficult to diagnose and troubleshoot as it is dependent on how your network is configured.
Corrupted Packets
Packet corruption is another common cause of disruption with STP. It occurs when packets are damaged in transit, which can result from anything from duplex mismatches to unsuitable cables or cables that are too long. Packet corruption is a problem because corrupted BPDUs are discarded, and a blocking port that stops receiving BPDUs can move into a forwarding state.
Unidirectional Link
A unidirectional link is one where the link is up at both ends of the connection but traffic only passes in one direction: the sending device isn’t receiving packets from the receiving device, which should be responding with packets of its own.
Unidirectional links commonly occur on fiber links where there are no detection measures in place. Cisco designed the Unidirectional Link Detection (UDLD) protocol to help detect unidirectional links before a forwarding loop takes root.
Resource Errors
In many cases, the failure lies not with STP itself but with the device that the protocol runs on. If a device’s CPU resources are overstretched, you’re likely to run into problems with transmitting BPDUs. When other devices don’t get the BPDUs they should, STP’s storm prevention abilities go out the window.
How to Troubleshoot Spanning Tree Protocol
When you experience a failure with STP you’re in an awkward predicament because there is no consensus on how to troubleshoot STP faults. It is well known that bridging loops are a particularly tricky occurrence to deal with effectively. Finding the root cause or even the solution to a bridging loop often takes time that administrators simply don’t have.
Before you can start troubleshooting Spanning Tree Protocol, you need to know the topology of the bridge network, the location of the root bridge, and the location of the blocked ports. The reason is that you need to know how the network is supposed to look when it’s up and running. Without this reference point, you’ll be running your troubleshooting efforts blind.
If you have this information, then you can troubleshoot Spanning Tree Protocol with some of the following options:
- Diagnose a Bridging Loop
- Restore Connectivity
- Check Ports
- Look for Resource Errors
Diagnose a Bridging Loop
Bridging loops are more of an annoyance on high-speed networks, but for more old-fashioned networks they can be a nightmare. The most basic way to tell if you have a bridging loop is if all users on one bridge domain have connection problems at one time. Another way to tell if a bridging loop is taking place is to capture traffic from an inundated link and to look for the same or similar packets over and over.
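If you have a capture from the inundated link, counting identical frames is easy to automate. The Python sketch below assumes you have already extracted the raw frames as byte strings; the `repeated_frames` helper, the threshold of three, and the sample data are made up for illustration.

```python
from collections import Counter
import hashlib


def repeated_frames(frames, threshold=3):
    """frames: iterable of raw frame bytes taken from a capture.

    Returns {digest: count} for every frame seen at least `threshold` times.
    """
    counts = Counter(hashlib.sha256(frame).hexdigest() for frame in frames)
    return {digest: n for digest, n in counts.items() if n >= threshold}


# Hypothetical capture: one broadcast seen 40 times among normal traffic.
capture = [b"arp who-has 10.0.0.5"] * 40 + [b"http get /", b"dns query example"]
suspects = repeated_frames(capture)
```

A handful of repeats is normal on a busy network, so the threshold is something you would tune. The same frame showing up dozens of times within a short capture is a strong hint that it is circulating in a loop.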
Restore Connectivity Quickly
Most of the time you’re going to want to solve the problem as soon as possible regardless of what the problem is. One possible solution is to disable all the ports that provide redundancy. A more targeted approach is to disable ports one at a time and check after each one whether connectivity is restored. The port that stops the loop when you disable it points you to the original failure site.
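The one-port-at-a-time approach is essentially a loop with a check after every step. Here is a minimal Python sketch of that procedure. The `shutdown` and `connectivity_restored` callables are placeholders for whatever you actually use to disable a port and test the network, and the port names are hypothetical.

```python
def isolate_loop(candidate_ports, shutdown, connectivity_restored):
    """Disable ports one at a time until connectivity comes back.

    Returns (culprit, disabled): the port whose shutdown cleared the loop
    (or None if none did) and every port that was disabled along the way.
    """
    disabled = []
    for port in candidate_ports:
        shutdown(port)
        disabled.append(port)
        if connectivity_restored():
            return port, disabled
    return None, disabled


# Toy stand-in for a real network: the loop clears once Gi0/3 is shut.
down = set()
culprit, disabled = isolate_loop(
    ["Gi0/1", "Gi0/2", "Gi0/3", "Gi0/4"],
    shutdown=down.add,
    connectivity_restored=lambda: "Gi0/3" in down,
)
```

Keeping the list of disabled ports matters in practice, because every port shut down before the culprit was found needs to be re-enabled once the underlying fault is fixed.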
Check Ports
When it comes to troubleshooting network ports there are a number of things you can do:
- The first thing you’re going to want to do is to investigate the blocking ports. In particular, you want to see if blocked ports still receive BPDUs. On Cisco IOS you can check BPDUs by entering the show spanning-tree bridge-group # command. This will show you how many BPDUs were received for each interface. Enter it a few times to see if the count is still increasing. On CatOS you can use the show spantree statistics module#/port# vlan# command. This will show you the number of BPDUs a port has received for an individual VLAN.
- Another common problem you’ll run into is mismatched duplexes. This is where two devices that are connected to each other are operating in different duplex modes. For instance, one device operates in full duplex whereas the other operates in half duplex. To look for a duplex mismatch on Cisco IOS, enter the show interfaces [interface interface-number] status command to check the speed and duplex status of the port. On CatOS you need to use the show port module#/port# command to check the speed and duplex.
- You will also want to check port utilization to make sure that the interface hasn’t been overwhelmed by traffic. When this happens, BPDUs can fail to be transmitted. To troubleshoot this on Cisco IOS software you need to use the show interfaces command to view the utilization. In particular, you’re looking for load and packet input/output. On CatOS you would use the show mac module#/port# command to display packet statistics, along with the show top command, which analyzes port utilization for 30 seconds and shows you the result.
- Packet corruption is another common reason for packets, BPDUs included, not making it to their destination in good condition. On Cisco IOS you can check for corruption by looking for errors with the show interfaces command. The output includes error counters so you can look for unusual activity. On CatOS, you can use the show port module#/port# command to get similar information, and the show counters module#/port# command to view more detailed statistics.
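Once you have noted down the figures from these commands, the first two checks come down to simple comparisons. The Python sketch below assumes the counters and duplex settings have already been collected by hand into dictionaries; the helper names, port names, and values are all hypothetical, and no attempt is made to parse real command output.

```python
def stalled_bpdu_ports(first_sample, second_sample):
    """Return ports whose received-BPDU counter did not increase between samples."""
    return sorted(
        port
        for port, count in second_sample.items()
        if count <= first_sample.get(port, 0)
    )


def duplex_mismatches(links, duplex):
    """links: [(port_a, port_b)]; duplex: {port: 'full' or 'half'}."""
    return [(a, b) for a, b in links if duplex[a] != duplex[b]]


# Hypothetical BPDU counters for two blocked ports, read a few seconds apart.
first = {"Gi0/1": 120, "Gi0/2": 118}
second = {"Gi0/1": 125, "Gi0/2": 118}
stalled = stalled_bpdu_ports(first, second)

# Hypothetical duplex settings at both ends of one inter-switch link.
links = [("sw1:Gi0/2", "sw2:Gi0/1")]
duplex = {"sw1:Gi0/2": "full", "sw2:Gi0/1": "half"}
mismatched = duplex_mismatches(links, duplex)
```

In this example Gi0/2 has stopped receiving BPDUs, and the link behind it has a duplex mismatch, which would make it the first place to look.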
Look for Resource Errors
Errors on individual resources, such as high CPU use, can cause failure of the STP algorithm. To check for high CPU you can run the show processes cpu command on Cisco IOS. This shows you how high the device’s CPU utilization is. Similarly, on CatOS, you can use the show proc cpu command. You can also view the number of logical ports or interfaces per VLAN with the show spanning-tree summary totals command on Cisco IOS or the show spantree summary command on CatOS.
You want to look in the STP Active column for the total number and make sure that it doesn’t exceed the maximum number supported by the device’s Supervisor Engine type.
For Cisco IOS:
- show interfaces
- show spanning-tree
- show bridge
- show processes cpu
- debug spanning-tree
- logging buffered
For CatOS:
- show port
- show mac
- show spantree
- show spantree statistics
- show spantree blockedports
- show spantree summary
- show top
- show proc cpu
- show system
- show counters
- set spantree root [secondary]
- set spantree uplinkfast
- set logging level
- set logging buffered
There are a number of terms that come up when discussing Spanning Tree Protocol. For convenience we’ve listed these here:
- Bridge Priority – A bridge priority is a number assigned to a bridge or switch. The default bridge priority on Cisco switches is 32768.
- Bridge ID – The Bridge ID is the identification of each switch in the network. It’s made up of the bridge priority and the MAC address of the switch.
- Root Bridge – The root bridge is the bridge that is elected as the root of the tree of bridges. Every other bridge calculates its paths relative to the root bridge. The bridge that has the lowest bridge ID is designated as the root bridge.
- Bridge Protocol Data Unit (BPDU) – The BPDU is the data exchanged between switches that is used to elect the root bridge. It is also used for configuring the network. On a Cisco switch, the interval between BPDUs (the hello time) can be set to between 1 and 10 seconds, with 2 seconds as the default.
- Root Port – The port on a non-root bridge that offers the fastest (lowest-cost) path to the root bridge.
- Designated Port – Bridges on a network segment work together to see which one has the best path from that segment to the root. The port that connects this bridge to the network segment becomes the designated port.
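To show how the bridge priority and MAC address combine into a single bridge ID, here is a short Python sketch of the original 802.1D layout: a 2-byte priority followed by the 6-byte MAC address. The `pack_bridge_id` helper and the addresses are hypothetical, and the sketch ignores the extended system ID that many switches use to fold a VLAN number into the priority field.

```python
def pack_bridge_id(priority, mac):
    """Build the 8-byte bridge ID: 2-byte priority followed by 6-byte MAC."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if not 0 <= priority <= 0xFFFF or len(mac_bytes) != 6:
        raise ValueError("priority must fit in 16 bits and the MAC in 48 bits")
    return priority.to_bytes(2, "big") + mac_bytes


# Default priority with a low MAC versus a tuned priority with a high MAC.
default_id = pack_bridge_id(32768, "00:1a:2b:3c:4d:5e")
tuned_id = pack_bridge_id(4096, "00:ff:ff:ff:ff:ff")

# default_id.hex() == "8000001a2b3c4d5e"
# tuned_id < default_id, so the tuned switch would win the election
```

Because the priority occupies the most significant bytes, comparing two bridge IDs as plain byte strings gives the same result as comparing the priority first and the MAC address second.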
Spanning Tree Port States
Any given switch port running STP is going to be in one of five states:
- Blocking – Port won’t transmit or receive data but will still listen to BPDUs. All ports are in the blocking state once powered up.
- Listening – Port processes BPDUs and takes part in the spanning tree election but doesn’t forward frames or learn MAC addresses. Data frames are received but discarded without any action taken.
- Learning – Port listens to BPDUs and learns MAC addresses, which are added to the switch’s CAM table. Doesn’t forward frames.
- Forwarding – Port learns MAC addresses and forwards traffic, and continues to process BPDUs.
- Disabled – The port has been shut down. It doesn’t forward frames and doesn’t take part in STP, so it neither sends nor listens to BPDUs.
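The way a port moves between these states can be modelled as a small state machine. The Python sketch below uses the classic 802.1D default timers (a max age of 20 seconds and a forward delay of 15 seconds); the `PortState` enum and the `time_to_forwarding` helper are hypothetical names, and the sketch ignores faster variants such as Rapid STP.

```python
from enum import Enum


class PortState(Enum):
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"
    DISABLED = "disabled"


# Classic 802.1D default timers, in seconds.
MAX_AGE = 20
FORWARD_DELAY = 15

# current state -> (next state, seconds spent waiting before the move)
TRANSITIONS = {
    PortState.BLOCKING: (PortState.LISTENING, MAX_AGE),
    PortState.LISTENING: (PortState.LEARNING, FORWARD_DELAY),
    PortState.LEARNING: (PortState.FORWARDING, FORWARD_DELAY),
}


def time_to_forwarding(start):
    """Walk the state machine from `start` and total the default timer delays."""
    path, elapsed, state = [start], 0, start
    while state in TRANSITIONS:
        state, delay = TRANSITIONS[state]
        elapsed += delay
        path.append(state)
    if state is not PortState.FORWARDING:
        raise ValueError(f"a {start.value} port never reaches forwarding on its own")
    return path, elapsed


path, seconds = time_to_forwarding(PortState.BLOCKING)
# A blocked port that stops hearing BPDUs needs 20 + 15 + 15 = 50 seconds
# before it forwards traffic with the default timers.
```

That delay of up to 50 seconds is the price classic STP pays for safety: a port only forwards once the switch is confident that doing so won't create a loop.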
That concludes our guide to Spanning Tree Protocol. As you can see, even though Spanning Tree Protocol is adept at preventing broadcast storms it is not without its problems. If you’re considering using Spanning Tree Protocol, we recommend that you develop a sound knowledge of your network topology. Having this knowledge will provide you with a strong frame of reference to conduct your troubleshooting.
When STP fails it can be a nightmare to find what caused the failure, but if you take the time to go through the process you’ll be able to take action and address the root cause. Ultimately, the challenges you run into will depend on the type of hardware you’re using just as much as the usage requirements of your network. The more you understand about your environment the better you’ll be able to respond to a Spanning Tree Protocol failure.