On Ethernet and switchport interfaces, the “Discard” stat can increment for many different reasons, some indicating healthy network operation and others indicating a network issue. Understanding the discard stat is important when evaluating the health of your network. This document explains the discard stat thoroughly, gives the reasons a discard is incremented, and describes what action, if any, you should take when you see one.
One of the biggest misconceptions concerning the discard statistic is that a discard is an error. In fact, if you examine the other Ethernet-based stats (including those on switchports and VLANs), you will see there is a general set of errors called “input errors” and “output errors”. Whenever one of the other statistics like “CRC errors” or “overruns” increments, so does one of these general error counters. When a discard increments, the input and output errors stay the same. This means that the discard stat is its own unique counter and should not be treated as an error.
In any healthy network, traffic needs to be discarded at certain points. Consider configuring a switchport to trunk mode. For security reasons, the administrator only allows VLANs 1 and 2 on the link with the switchport trunk allowed VLAN 1,2 command. If a packet is received with a VLAN tag of 3, it will be dropped. In this case, a discard will be incremented indicating the interface is working as configured.
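The allowed-VLAN example above can be modeled in a few lines of Python. This is only an illustrative sketch, not the switch's actual implementation; the function name `trunk_accepts` and the `counters` dictionary are hypothetical.

```python
def trunk_accepts(vlan_tag, allowed_vlans, counters):
    """Return True if a tagged frame may enter the trunk.

    Frames tagged with a VLAN outside the allowed list are dropped and
    counted as input discards, not as input errors.
    """
    if vlan_tag in allowed_vlans:
        return True
    counters["input_discards"] += 1
    return False


# Hypothetical trunk that allows only VLANs 1 and 2, as in the example above.
counters = {"input_discards": 0, "input_errors": 0}
allowed = {1, 2}
results = [trunk_accepts(tag, allowed, counters) for tag in (1, 2, 3)]
print(results, counters)
```

The frame tagged with VLAN 3 is the only one dropped, and only the discard counter moves, which matches the point that a discard here reflects the interface working as configured.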
Additional reasons that a unit can legitimately increment a discard are explained later in this document. Before continuing, it is very important to remember that a healthy network will absolutely have discards, and their presence does not necessarily indicate a network problem. The cause of discards should be investigated and troubleshot if they are incrementing at a high rate that cannot be explained, or when they can be correlated to a specific network problem.
When a packet/frame is received on an interface, and again when it is put in a queue to exit an interface, several checks are run to make sure it is something that should be transmitted. The following sections discuss when a discard may be incremented on the different types of interfaces for one of these reasons. Note that this list only includes the most common reasons that a packet may be discarded; it does not include every unique situation where a discard may occur. However, discards for reasons other than the following should be very rare and would not by themselves warrant troubleshooting.
- Discards on Layer 2 Interfaces
Layer 2 interfaces are considered to be switchport interfaces (all types), interfaces that function as switchports but have the Ethernet moniker, and Ethernet Subinterfaces (802.1q mode). A discard can be incremented for any of the following reasons:
- Receiving frames tagged in a VLAN the unit does not have configured
- Commonly in networks with multiple VLANs, each switch will have all VLANs in its VLAN database and the inter-switch links will be designated as trunks so that they can carry all VLANs across the link. However, if one of the two switches does not have a VLAN configured that a frame may be tagged with, the link on that switch will increment discards for each such frame because it will not know what to do with the unknown VLAN tag.
- Frames received that are tagged in the Native VLAN
- Every trunk port in a network has a “native” VLAN which means this VLAN’s traffic will not be tagged across the link. If this link does receive a packet that is actually tagged in that VLAN instead of untagged, the packet will be dropped as it does not conform to port expectations (this is a security measure).
- For example, consider two switches connected by a trunk link. Switch A has the command switchport trunk native vlan 2 configured and switch B has the command switchport trunk native vlan 3. In this case, when switch A sends a frame tagged with VLAN 3, Switch B will drop it because it expects that VLAN to be untagged. The same will happen in the other direction when Switch B tags VLAN 2.
- Spanning Tree blocked ports (in a stable topology or in a transitional state)
- A port that is in the “blocking” state will increment input discards for any traffic that is sent across the link. Because this port should not generally be a destination port in the MAC address table, this will mostly be broadcast and multicast traffic.
- Unknown Layer 2 Protocols
- Most of these are broadcast or multicast at L2. In this case, we will forward the frames out and also increment a discard, because the frame was technically also destined for us (assuming it uses a broadcast MAC address like FF:FF:FF:FF:FF:FF).
- Most unknown protocols will be counted as “unknown protocol” and as an error, but in certain cases these protocols automatically fail discard checks because of the unknown format and are therefore counted as a discard instead.
- Port-authentication and Port-security violations
- If a violation occurs and a port is put into “restrict” or “protect” mode, the violating unit’s packets will be discarded.
- If a MAC address that is bound to a port is plugged into another port with or without port security, all its frames will be discarded.
- Frames exceeding storm control limits
- Frames exceeding the Destination Lookup Failure (DLF) limit
- DLF applies to units that continue to send to addresses that the unit cannot locate.
- This can also happen if the MAC table does not have an entry for a unit but we do have a host entry in the route cache. Fix this by setting the MAC table timeout to 21 minutes and setting all endpoints to edgeport.
- All-zero MAC addresses for either the source or destination address.
- This is considered an invalid MAC address.
- Gratuitous ARPs with all 0’s for the IP address
- This is generally due to an end user unit misconfiguration.
- This presents a debug message when using the debug arp command as well.
- The source and destination MAC Addresses are the same.
- This is called a “Land” attack and is actually more prevalent in the IP layer (layer 3) with IP addresses equaling each other.
- This attack was originally developed because early networking equipment did not know what to do with a packet whose source and destination addresses are equal.
- The destination interface for the frame and the source interface for the frame are the same.
- This occurs when the sending unit does not know where the destination unit is located, but the receiving unit does. Normally this is seen when a switch broadcasts a frame out because it does not have the destination address in its CAM table. A switch downstream may receive it but have that MAC address learned on the same port where the broadcast entered. In this case, the switch will drop the frame instead of sending it back, to avoid congestion.
- Result of a Hardware ACL dropping traffic.
- This happens if a particular MAC address is denied in a hardware ACL. If the hardware ACL is using Layer 3 addresses to block and allow traffic, drops are not incremented as layer 2 discards.
- The hardware queue on the interface is full (overbooking).
- Though an interface may be able to transmit at 100Mbps, traffic does not always follow a strict pattern when being sent. Bursts of traffic can come in causing an interface to become overbooked for a short period of time. Instead of just dropping all packets that do not conform to the interface rate, the interface has a hardware output buffer to keep the extra traffic in until the momentary congestion is gone.
- A hardware output buffer has a non-configurable depth. When this hardware buffer reaches its limit, it will trigger a hardware interrupt. This will shut down the interface queuing for a short amount of time to let it catch up before it begins queuing frames to be transmitted again. Any frames queued to be sent during this time will increment output discards. Nothing in the queue before the interrupt will be dropped. The number of discards will not exactly match the number of frames lost because the interface stops processing frames during this period of time.
- This can happen if the other side is using flow control and wants us to slow down, but we have too many frames in the output queue.
- Another example would be 11+ 100M ports receiving traffic at line rate and the output port is a Gig port. It simply doesn’t have the capacity to keep up with the output.
- Interfaces on different switches have different hardware buffer depths, and the same is true between 10/100 ports and 10/100/1000 ports. Generally, the more powerful the switch and the faster the port, the bigger the hardware output queue.
- This has nothing to do with the software queues and the CPU. If one port is overbooked and starts discarding, other interfaces will not necessarily discard.
- The addendum to this is that you will see discards on other ports if they receive a frame destined out the discarding interface during that time period. For example, if swx 0/1 receives a burst of traffic and starts discarding, and swx 0/3 receives a frame destined to leave swx 0/1, an input discard will increment on swx 0/3.
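The overbooking example above (eleven or more 100 Mbps ports feeding one Gigabit port) comes down to simple arithmetic, sketched here in Python. The function name `oversubscription_ratio` is hypothetical, and the model assumes every ingress port sends at full line rate toward the same output port.

```python
def oversubscription_ratio(ingress_mbps, egress_mbps):
    """Ratio of offered ingress load to egress capacity (>1 means overbooked)."""
    return sum(ingress_mbps) / egress_mbps


# Eleven 100 Mbps ports at line rate feeding a single 1000 Mbps port.
ingress = [100] * 11
ratio = oversubscription_ratio(ingress, 1000)
excess_mbps = max(0, sum(ingress) - 1000)
print(f"ratio={ratio:.2f}, excess={excess_mbps} Mbps")
```

Any excess has to sit in the hardware output buffer. If the excess is sustained rather than a short burst, the buffer fills and output discards begin to increment.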
- Discards on Layer 3 Interfaces
For VLAN interfaces only
- If a layer 2 discard increments, due to a reason mentioned above, on a switchport in access mode, a discard will also increment on the associated VLAN interface.
- If a layer 2 discard increments on a switchport in trunk mode, it will show up on either the sending VLAN interface or the receiving VLAN interface, depending on the stage of processing at which the frame was discarded.
- For example, if a frame is received on a trunk port sourced from VLAN 1, and it is discarded upon entry as the source and destination mac addresses are equal to each other, VLAN 1 will increment a discard. If a new frame is received from VLAN 1 destined for VLAN 2 and the destination port is in the discarding state due to overbooking on the output port, VLAN 2 will increment a discard.
On Ethernet Ports only
- Routed Ethernet ports are unique in that they possess some layer 2 functions with the added layer 3 functions. Though the interface does not switch, it does perform mac address operations and also performs checks on the Ethernet frames that are input (for example, if the MAC address is all zeros). When one of the non-switching examples from the Layer 2 Discards section occurs on an Ethernet port, it will increment a discard.
VLANs and Ethernet Interfaces
- The L3 software buffer is full and the unit cannot process incoming and outgoing packets.
- This can happen because of high CPU utilization (i.e. the CPU does not have enough resources to process the software queues).
- This generally happens because of the thread “PacketRouting” which performs the majority of the router functions.
- If the PacketRouting process hits a queue depth of 80%, it will trigger an interrupt which will cause the unit to stop transmitting and processing traffic for a short period of time (microseconds), causing the interfaces to increment discards as input and output queues fill up.
- You can see the current processor utilization with the show process cpu command.
- You can also use the show process queue command to see the maximum depth that each queue has reached, as a percentage. This does not reset until it is manually cleared or the unit is rebooted. This will not be indicative of spikes in utilization, but rather of consistently high utilization.
- As in the hardware buffer case, the number of discards incremented is not exactly equal to the number of packets actually lost because the CPU stops processing packets.
- This will not affect anything that is routed or switched in hardware. This would only affect packets that are sent to the processor for pure layer 3 routing, firewall, etc.
- Addendum: if the hardware route-cache is full, the overage is being routed by the processor. If the processor becomes over-utilized as well then discards would begin incrementing.
- Configured QoS and traffic-shaping policies discard packets based on prioritization.
- Packets discarded because no route exists to the destination.
- This check happens upon entry to the unit (to save processing down the road if the traffic cannot be routed anyway), so generally these will only be input discards unless routing information changes while the packet is being processed.
- Packets multicast at layer 3 that cannot be routed (we will also discard a copy of the one “destined for us”, for example IPv4 address 224.0.0.1).
- Packets with invalid IPv4 and/or IPv6 address information
- This would include source and destination address being equal, invalid field lengths, etc.
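As a rough illustration of the last item, the following Python sketch flags packets whose address information would fail a basic sanity check. It is not the unit's actual logic: the function name `address_check` and the reason strings are hypothetical, and only the checks named above (equal source and destination, unparseable addresses) are modeled.

```python
import ipaddress


def address_check(src, dst):
    """Return a reason string if the packet looks invalid, else None."""
    try:
        src_ip = ipaddress.ip_address(src)
        dst_ip = ipaddress.ip_address(dst)
    except ValueError:
        # Stands in for malformed address fields in a real packet.
        return "malformed address"
    if src_ip == dst_ip:
        return "source equals destination"
    return None


print(address_check("192.0.2.1", "192.0.2.1"))
print(address_check("192.0.2.1", "192.0.2.2"))
```

The same kind of check can be run against addresses exported from a packet capture to see whether such packets account for some of the discards.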
It is important to note a couple of things before continuing with this section. First, as stated earlier in this document, discards are a normal byproduct of network operation. You should only troubleshoot discards on your units if you notice an actual network problem that could correlate with them, or if you notice them incrementing at a higher rate than is normal for that interface. Second, you should read through the above sections explaining the types of discards. Not only are the troubleshooting steps below based on these examples, but many further troubleshooting steps are implied by those sections, and not all of them are covered below. For example, you know from the sections above that when the source and destination MAC addresses in a frame are equal, the frame is dropped. This implies that if you see such frames in your network through a packet analyzer, they explain at least some of the discards. This is not directly discussed below because it was fully covered in the description section.
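One way to judge whether discards are incrementing faster than normal is to sample the counter twice and compare the resulting rate with a baseline for that interface. The Python sketch below assumes the counter values are read manually from the interface statistics; the function names and the threshold factor of 3 are arbitrary choices for illustration, not vendor recommendations.

```python
def discard_rate(prev_count, curr_count, interval_seconds):
    """Discards per second between two counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # Counter was cleared or the unit rebooted; use the current value.
        delta = curr_count
    return delta / interval_seconds


def is_unusual(rate, baseline_rate, factor=3.0):
    """Flag a rate that exceeds the interface's normal rate by a chosen factor."""
    return rate > baseline_rate * factor


rate = discard_rate(prev_count=1200, curr_count=1500, interval_seconds=60)
print(rate, is_unusual(rate, baseline_rate=1.0))
```

The baseline has to come from your own observation of the interface during normal operation, since a healthy discard rate differs from port to port.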
- Layer 2 Interfaces
- Verify VLAN configurations on ports and switches experiencing the discards
- It is important to make sure the port is in the correct mode (trunk or access).
- If a trunk, make sure the unit plugged into it is not tagging traffic in a VLAN that is not configured on the switch. This can be done by verifying that unit’s configuration, or by using a port mirror to take a packet capture.
- All the VLANs that have a path to the particular unit you are using should be added to the unit’s VLAN database using the vlan <VLAN ID> command.
- Similarly, make sure there are no non-used VLANs configured. Not only does this create a security concern, but if a unit is accidentally placed in this VLAN, all its traffic may cause discards to increment on other switches.
- This requires that you also check VLAN configuration on units connected to this unit to make sure they are correct as well.
- Check the Spanning-Tree Topology
- This can be done using the show spanning-tree blockedports command. You can see whether the port incrementing discards is in the blocking state.
- Check the interface bandwidth to see if it is possibly overbooked
- This can be done using the show interface command.
- The output reports the bandwidth currently being used on the interface, which tells you whether it is close to being overbooked.
- Check the MAC address table in the unit to make sure there are no more entries than the unit supports.
- Check the hardware ACLs that are in the unit (if any).
- Layer 3 Interfaces
- As with a layer 2 interface, check the interface stats using the show interface <type> <slot/port> command to make sure the interface bandwidth is not being exceeded.
- Check the unit’s QoS and shaping policies to see what type of traffic is dropped and at what point it should be dropped.
- Check show ip route to make sure there is a route to all destinations.
- Check the CPU for over-utilization
- show process cpu shows current utilization, as covered in the description section above.
- show process queue shows queue depths since the last clearing of the queues or a reboot. Note: Once an individual process queue hits 80%, interrupts will start, possibly causing discards.
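The 80% rule above can be applied mechanically once you have the maximum queue depths in hand. The Python sketch below assumes you have copied the percentages by hand from the show process queue output into a dictionary; it does not parse the command output, whose exact format is not reproduced here. “PacketRouting” is the thread named in this document, and the other queue name is a placeholder.

```python
QUEUE_DEPTH_LIMIT = 80  # percent, the threshold cited in this document


def queues_at_risk(max_depths, limit=QUEUE_DEPTH_LIMIT):
    """Return the names of queues whose recorded max depth reached the limit."""
    return sorted(name for name, pct in max_depths.items() if pct >= limit)


# Hypothetical readings entered by hand.
sample = {"PacketRouting": 85, "ExampleThread": 12}
print(queues_at_risk(sample))
```

Because the maximum depth persists until it is cleared or the unit reboots, a flagged queue only shows that the limit was reached at some point, not that it is being reached now.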
The best way to know what could be causing the discards in a network is to know your network. If you aren’t aware of the protocols in your network, how they function, which types of hosts are connected to each interface, and so on, you won't be able to fully understand the root cause of discards. You should make sure you are familiar with the following:
- Protocols in your network.
- Do they use multicast or broadcast traffic?
- Are there proprietary protocols that your units may not participate in or understand?
- How much bandwidth do these applications use?
- VLAN configuration
- Which units should be in each VLAN?
- Are the units in your network from different vendors consistent in their VLAN tagging and treatment of access and trunk ports?
- Know which sections of your network require what amount of bandwidth
- If you have sections of the network that serve as bottlenecks for larger bandwidth network portions, this should be resolved as it will cause discarded traffic.
- Design your network so as to avoid bottlenecks whenever possible.
- Set up QoS on a bottleneck to make sure that less time-sensitive traffic is dropped first during periods of over-utilization.
- Make sure you have purchased the correct equipment sufficient for handling the amount of load and features you require.
- This will help prevent overutilization issues.
- Make sure your network is secure.
- An insecure network can experience problems that may cause discards like a denial of service attack using up available bandwidth, or an attack using insecure VLANs to transmit traffic.
In the end, the best way to troubleshoot discards is to take a packet capture on the interface or interfaces seeing the excess stat. This will tell you what is going on because you can see actual packets and match them with all the potential causes described above.
Several important notes to remember:
- Running port scanners and monitoring programs commonly causes discards because they send uncontrolled, bursty traffic.
- Frames/packets discarded because of CRC errors, runts, giants, and other errors are not included in the discard count.
- Packets dropped by the firewall do not increment discards.
- Packets dropped by access-groups do not increment discards.