By David Davis, CCIE, MCSE+I, SCSA
Wednesday, November 09 2005 01:04 PM
URL: http://www.zdnetasia.com/techguide/network/0,3800010800,39289060,00.htm
Do you have network congestion? If you don't now, you
probably have before, or you likely will in the future. How do you fight
network congestion?
While there isn't one quick-hit solution, you have several
available options. Let's look at how you can begin troubleshooting network
congestion and discuss some possible solutions.
Ask these questions
Before we begin troubleshooting, you need to answer some
questions about your network. Even if you think you already know the answers, you
still need to use tools to validate them.
Start off with these questions:
- What does your network look like? Do you have a diagram?
- What size are the network links?
- What types of applications are running on the network?
- What are the characteristics of those applications? Are they latency-sensitive
or latency-insensitive? How much traffic do they generate? What are their
traffic patterns?
- When did the congestion start? Was it all of a sudden, or has it slowly developed
over time?
- Is the congestion constant, or does it come and go? Does it happen at a
certain time of the day, week, or month?
- Has anything recently changed that could have caused the congestion (e.g., new
applications, hardware changes, or applied patches)?
Validate your answers
Using your answers to these questions, you may think that you
know what's causing the congestion. However, you need to use tools to verify these
deductions.
So how do you corroborate that the congested link is really
the one you think it is? On a Cisco router, this may be as simple as using the
show interface command. Here's an example:
Router# show interface s3/0
Serial3/0 is up, line protocol is up
Hardware is QUICC with integrated T1 CSU/DSU
Internet address is 10.0.100.2/30
MTU 1500 bytes, BW 512 Kbit, DLY 20000 usec,
reliability 255/255, txload 36/255, rxload 255/255
Encapsulation HDLC, loopback not set
Keepalive set (10 sec)
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 1/75/0/0 (size/max/drops/flushes); Total output drops: 4281
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 498000 bits/sec, 400 packets/sec
5 minute output rate 73000 bits/sec, 110 packets/sec
148239286 packets input, 3250920677 bytes, 0 no buffer
Received 536509 broadcasts, 0 runts, 5 giants, 0 throttles
31566 input errors, 2219 CRC, 14502 frame, 0 overrun, 0 ignored, 14840 abort
148886376 packets output, 1823664299 bytes, 0 underruns
0 output errors, 0 collisions, 200 interface resets
0 output buffer failures, 0 output buffers swapped out
17 carrier transitions
DCD=up DSR=up DTR=up RTS=up CTS=up
Router#
As you can see, the receive load on this 512K circuit is
pegged at 255/255, and the 5-minute input rate of 498,000 bits/sec is close to
the link's full capacity. These results show that this circuit is indeed
congested in the inbound direction.
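The same check can be done numerically. The following is a minimal Python sketch, not a Cisco tool: it pulls the BW value, the load fractions, and the 5-minute rates out of the output shown above and expresses them as percentages. The function name link_utilization and the regular expressions are illustrative assumptions written only against the line formats in this example; other IOS versions or interface types may print these lines differently.

```python
import re

# Relevant lines copied from the "show interface s3/0" output above.
SAMPLE_OUTPUT = """\
MTU 1500 bytes, BW 512 Kbit, DLY 20000 usec,
reliability 255/255, txload 36/255, rxload 255/255
5 minute input rate 498000 bits/sec, 400 packets/sec
5 minute output rate 73000 bits/sec, 110 packets/sec
"""


def link_utilization(text):
    """Return rate-based and load-based utilization as percentages.

    Hypothetical helper: assumes the line formats shown in SAMPLE_OUTPUT.
    """
    bw_kbit = int(re.search(r"BW (\d+) Kbit", text).group(1))
    in_bps = int(re.search(r"input rate (\d+) bits/sec", text).group(1))
    out_bps = int(re.search(r"output rate (\d+) bits/sec", text).group(1))
    txload = int(re.search(r"txload (\d+)/255", text).group(1))
    rxload = int(re.search(r"rxload (\d+)/255", text).group(1))

    capacity_bps = bw_kbit * 1000  # BW is reported in kilobits per second
    return {
        "input_pct": 100 * in_bps / capacity_bps,
        "output_pct": 100 * out_bps / capacity_bps,
        "rxload_pct": 100 * rxload / 255,
        "txload_pct": 100 * txload / 255,
    }


if __name__ == "__main__":
    for name, value in link_utilization(SAMPLE_OUTPUT).items():
        print(f"{name}: {value:.1f}%")
```

For this sample, the input rate works out to roughly 97 percent of the configured bandwidth, while the output rate is only about 14 percent. That lines up with the rxload of 255/255 and the txload of 36/255.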
You can also use Paessler's PRTG, an easy, graphical tool
for monitoring utilization, to validate your answers. However,
while these tools can help you make sure you're on the right track, neither PRTG
nor the show interface command can tell you where the traffic is coming
from or what traffic it is.
Determine what the traffic is
To get a better idea of the traffic, you'll need to take a
packet capture or use a tool such as Packeteer,
Network General Sniffer, or Network Instruments Observer.
These tools sport remote hardware that can capture those packets and bring them
back to a decoding station (such as your desktop). They then decode the
traffic so that you can interpret it. (Packeteer can also block traffic.)
Or, if you're local to the site with the congestion, determining
the problematic traffic could be as simple as mirroring the port on the switch
going to that router and using a PC with Ethereal
to view the traffic. There are a lot of ways to find out what that traffic is,
so choose a method you're comfortable and familiar with.
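Whatever capture method you choose, the goal is the same: group the captured packets by conversation and see which ones account for the most bytes. Here is a rough Python sketch of that idea. The packet records, addresses, ports, and the top_talkers name are made up for illustration; they are not the export format of Ethereal or of any other tool mentioned here.

```python
from collections import Counter

# Hypothetical capture summary:
# (source IP, destination IP, destination port, bytes).
PACKETS = [
    ("10.0.1.15", "10.0.100.2", 9100, 1500),
    ("10.0.1.15", "10.0.100.2", 9100, 1500),
    ("10.0.1.22", "10.0.100.2", 1494, 200),
    ("10.0.1.15", "10.0.100.2", 9100, 1500),
    ("10.0.1.30", "10.0.100.2", 80, 600),
]


def top_talkers(packets, count=3):
    """Total the bytes per (source, destination, port) and return the largest."""
    totals = Counter()
    for src, dst, port, size in packets:
        totals[(src, dst, port)] += size
    return totals.most_common(count)


if __name__ == "__main__":
    for (src, dst, port), total in top_talkers(PACKETS):
        print(f"{src} -> {dst}:{port}  {total} bytes")
```

In a real capture you would feed in far more records, but the conversation at the top of the list is usually the first place to look.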
Decide how to deal with the traffic
Once you've determined what the traffic is, you basically have
two options. You can stop the traffic, or you can choose to allow the traffic.
If you're lucky, opting to stop the traffic should resolve
the congestion. You can stop it with an access control list, or you can terminate
it at the source.
On the other hand, if you choose to allow the traffic, you
then have a few choices for how to deal with the congestion. Of course, there
are pros and cons to each option.
- Add more bandwidth.
- Perform quality of service (QoS) on the traffic.
- Compress the traffic.
Weigh your options
Adding more bandwidth (at least on a WAN link) means you can
expect to pay a higher price per month. In some cases, however, this is your only
option.
For example, if you have 25 users who are all trying to use
Citrix over a 56-K dedicated frame-relay circuit, no amount of QoS or compression
will resolve the extreme slowness. You just need more bandwidth.
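A quick back-of-the-envelope calculation shows why. This Python sketch simply divides the link evenly among the users. The even split and the 2:1 compression ratio are simplifying assumptions for illustration, not measured figures, and per_user_bps is a hypothetical helper name.

```python
def per_user_bps(link_bps, users, compression_ratio=1.0):
    """Even share of the link per user, in bits per second.

    compression_ratio is a hypothetical best-case multiplier (2.0 means 2:1).
    """
    return link_bps * compression_ratio / users


LINK_BPS = 56_000  # the 56-K circuit from the example
USERS = 25

plain = per_user_bps(LINK_BPS, USERS)
compressed = per_user_bps(LINK_BPS, USERS, compression_ratio=2.0)

if __name__ == "__main__":
    print(f"Without compression: {plain:.0f} bits/sec per user")
    print(f"With assumed 2:1 compression: {compressed:.0f} bits/sec per user")
```

Each user gets about 2,240 bits/sec, or about 4,480 bits/sec even if compression doubled the effective capacity. Either way, the share is a small fraction of the circuit's already modest 56 Kbps.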
On the other hand, let's say you already have a reasonable
amount of bandwidth for your Citrix and VoIP traffic, but users complain of
periodic slowness. This slowness happens when users print 10-MB PDF files over
the 256-K WAN link. In this case, you need to perform QoS.
This solution goes back to the question about the
requirements of the applications running on your network (in this case,
latency-sensitive vs. non-latency-sensitive). The non-latency-sensitive print
jobs are slowing down the latency-sensitive traffic, and the latency-sensitive
traffic needs higher priority. Most users won't notice if their print job takes
a little longer to print out, but they will notice if their phone call sounds
bad or their Citrix session is slow.
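Some simple arithmetic shows how much room a single print job takes up. The sketch below is in Python and rests on stated assumptions: 1 MB is taken as 1,000,000 bytes, packets are a full 1,500 bytes, protocol overhead is ignored, and the link is otherwise idle. The helper names are hypothetical.

```python
LINK_BPS = 256_000                # the 256-K WAN link from the example
PRINT_JOB_BYTES = 10 * 1_000_000  # 10-MB PDF, taking 1 MB as 10**6 bytes
PACKET_BYTES = 1500               # assumed full-size packet


def seconds_on_wire(size_bytes, link_bps=LINK_BPS):
    """Time needed to clock size_bytes onto the link, ignoring overhead."""
    return size_bytes * 8 / link_bps


def fifo_wait_seconds(packets_ahead, packet_bytes=PACKET_BYTES, link_bps=LINK_BPS):
    """Delay for a packet queued behind packets_ahead full-size packets (FIFO)."""
    return packets_ahead * seconds_on_wire(packet_bytes, link_bps)


if __name__ == "__main__":
    print(f"Print job: {seconds_on_wire(PRINT_JOB_BYTES):.1f} seconds")
    print(f"One packet: {seconds_on_wire(PACKET_BYTES) * 1000:.1f} ms")
    print(f"Behind 10 packets: {fifo_wait_seconds(10) * 1000:.1f} ms")
```

Under these assumptions, one print job fills the link for more than five minutes (312.5 seconds). With plain FIFO queueing, a voice packet stuck behind just 10 print packets waits almost half a second. With priority queueing, it should wait for roughly one packet (about 47 ms) at most, which is why QoS helps here even though the total bandwidth hasn't changed.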
As for the third option, you can use compression in place of
additional bandwidth. However, keep in mind that there are several caveats that
go along with compression.
One big stipulation is that this solution doesn't always
work. Compression only works for certain types of traffic, and it can cause
delay on other types of traffic. In addition, compression can be expensive if
you have several locations because you'll need a compression unit at each one.
While Cisco routers can carry out compression, doing so causes
a bit of delay and a larger increase in CPU utilization. Cisco routers can also
perform QoS, but QoS on a router isn't very friendly to configure--nor is it
easy to see what's going on.
Although a dedicated QoS device like Packeteer will cost
you, in my opinion, it's far superior to trying to perform QoS inside a Cisco
router. As much as I love Cisco routers and try to use them as much as
possible, sometimes you need to take the "best-of-breed" approach.