[UFO Chicago] [DLC]help in investigating a possible packet storm

Politik Durden politikdurden at yahoo.com
Fri Apr 2 15:32:24 PDT 2010


Thanks all for the replies and good advice. Two loops and a shady coupler (on an uplink line, of all places) were the culprits. Pulling some logs showed that the overnight scripts' run time had been creeping up steadily, from about an hour to almost 8 hours, over the past 2 or 3 months. It's fitting that all these issues came to a head at closing time on April Fools' Day.
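
For anyone chasing something similar: loops like these are easier to spot on paper if you have an inventory of switch-to-switch cables. The sketch below is a minimal illustration, assuming such an inventory exists as a list of (switch, switch) pairs; the switch names and topology are hypothetical, not the client's. It uses union-find to keep a loop-free set of links and reports every link whose two ends are already connected, i.e. each link that closes a loop (a doubled cable between the same two switches counts too).

```python
def find_redundant_links(links):
    """Return the links that close a loop.

    links: list of (switch_a, switch_b) pairs, one per physical cable.
    Union-find keeps a loop-free subset (a spanning forest); any link whose
    two ends are already connected by earlier links is reported.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # a and b were already connected: loop
        else:
            parent[root_a] = root_b  # merge the two groups of switches
    return redundant


# Hypothetical topology: sw1 is cabled to sw2 as well as to the core, and
# sw2-sw3 has two parallel cables.
links = [("core", "sw1"), ("core", "sw2"), ("sw1", "sw2"),
         ("sw2", "sw3"), ("sw3", "sw2")]
print(find_redundant_links(links))  # [('sw1', 'sw2'), ('sw3', 'sw2')]
```

Each reported link is a candidate to unplug. Which link of a loop gets reported depends only on the order of the list here, not on anything like spanning tree's path costs.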

We're running Wireshark (thanks John) and virus scans (thanks Greg) overnight, but traffic is normal... for now. They're also going to update the firmware on all the switches this weekend.
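
Once an overnight capture is saved, a broadcast storm would typically show up as a jump in the number of broadcast frames per second. The sketch below counts frames sent to the Ethernet broadcast address ff:ff:ff:ff:ff:ff for each second of a capture. It is a minimal illustration that assumes the capture is in the classic libpcap format with Ethernet framing; newer Wireshark versions save pcapng by default, so such a file would first need converting (for example with `editcap -F pcap`). Multicast frames are not counted.

```python
import struct
from collections import Counter

BROADCAST_MAC = b"\xff" * 6  # Ethernet broadcast destination address
LINKTYPE_ETHERNET = 1


def broadcasts_per_second(data: bytes) -> Counter:
    """Count broadcast Ethernet frames per capture second in classic pcap data."""
    if len(data) < 24:
        raise ValueError("too short to contain a pcap global header")
    # The magic number gives the file's byte order (and usec vs nsec stamps).
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic in (0xA1B2C3D4, 0xA1B23C4D):
        endian = "<"
    elif magic in (0xD4C3B2A1, 0x4D3CB2A1):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    counts = Counter()
    offset = 24  # first record follows the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: seconds, sub-second part, captured length, original length.
        ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        # The destination MAC is the first 6 bytes of an Ethernet frame.
        if frame[:6] == BROADCAST_MAC:
            counts[ts_sec] += 1
    return counts
```

Usage, with a hypothetical file name: `counts = broadcasts_per_second(open("overnight.pcap", "rb").read())`, then print `sorted(counts.items())`. The sketch reads the whole file into memory, which is fine for a quick check but not for very large captures.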

Thanks again :-)

--- On Thu, 4/1/10, Politik Durden <politikdurden at yahoo.com> wrote:

From: Politik Durden <politikdurden at yahoo.com>
Subject: [DLC]help in investigating a possible packet storm
To: "Depaul Linux" <dlc at mailman.depaul.edu>, "UFO Mail list" <ufo at ufo.chicago.il.us>
Date: Thursday, April 1, 2010, 11:15 PM

Hello all, 

Going to a client site at 6 AM tomorrow because at about 5 PM today (Thursday), all network traffic started getting really, really slow.

Here's what I know:

- no recent changes (no new switch, NIC, changes to static routes, config changes, patches/upgrades, etc.)

- about a dozen switches feed into a central 3Com switch (no model numbers yet). Ballpark of 200 to 300 nodes total.

- no switch protocols or features are enabled; all devices are in "dumb" mode and act as plain ol' switches. Some can be managed, but no features (SNMP, etc.) are turned on.

- most nodes *seem* to be pingable from both sides of the firewall, but everything is just crawling. 

- nothing (reports, scripts, etc.) is timing out, but everything is just super, super slow.

They tried swapping out switches one at a time to narrow down the culprit, and that helped for a bit, but then traffic slowed down again and they couldn't really do any more during production hours.

Theories: 

- Can one bad port cause this kind of traffic jam? They started diags on all the major nodes (server NICs, the central 3Com switch, etc.), but nothing obvious so far.

- Was some protocol/feature turned on by mistake, and now all the switches are confused? A quick "topeka" (ha!!) turns up stories of spanning tree causing these kinds of traffic jams.

- Could a loop have been introduced somehow?
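
On the spanning tree and loop theories: Ethernet frames carry no hop count, so without spanning tree a broadcast that enters a loop is never dropped. Each switch floods a broadcast out every port except the one it arrived on, so the copies keep circulating, and in a mesh with more than one loop the number of copies can grow with every hop. The toy simulation below illustrates this under simplifying assumptions: it models only switch-to-switch links, one hop per time step, no MAC learning and no link speeds. The topologies and switch names are hypothetical.

```python
from collections import defaultdict


def flood_counts(links, origin, hops):
    """Simulate one broadcast frame flooding between switches with no STP.

    links: list of (switch_a, switch_b) inter-switch links; parallel links allowed.
    origin: the switch where a host sent the broadcast.
    Returns how many copies of the frame are on inter-switch links at each hop.
    """
    ports = defaultdict(list)  # switch -> [(link_id, neighbour), ...]
    for link_id, (a, b) in enumerate(links):
        ports[a].append((link_id, b))
        ports[b].append((link_id, a))

    # The origin switch floods the broadcast out of all its inter-switch links.
    # Each copy in flight is (link it travels on, switch it is heading to).
    in_flight = list(ports[origin])
    history = []
    for _ in range(hops):
        history.append(len(in_flight))
        next_hop = []
        for ingress_link, switch in in_flight:
            # Flood out every port except the one the frame arrived on.
            for link_id, neighbour in ports[switch]:
                if link_id != ingress_link:
                    next_hop.append((link_id, neighbour))
        in_flight = next_hop
    return history


tree = [("core", "sw1"), ("core", "sw2")]
triangle = tree + [("sw1", "sw2")]
mesh = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
print(flood_counts(tree, "sw1", 5))      # [1, 1, 0, 0, 0]
print(flood_counts(triangle, "sw1", 5))  # [2, 2, 2, 2, 2]
print(flood_counts(mesh, "a", 5))        # [3, 6, 12, 24, 48]
```

In the tree the frame dies out after two hops; in the triangle it circulates forever; in the fully meshed case it doubles every hop. In a real network each circulating copy is also flooded to host ports, which is one way a loop can make everything crawl without anything outright failing.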

What I really need are suggestions for a good free traffic tool, something we can install on two or three laptops to put each switch through its paces. Any ideas?
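
Dedicated free tools such as iperf already do this kind of laptop-to-laptop testing. For illustration only, the sketch below shows the basic idea using just Python's standard library: one laptop listens and times how long it takes to receive a stream of bytes over TCP, the other sends them, and the receiver converts the result to megabits per second. The chunk size, port and address are arbitrary assumptions, and a single TCP stream is only a rough measure of a path, not a full switch test.

```python
import socket
import time

CHUNK = 64 * 1024  # bytes per send/recv call; an arbitrary choice for this sketch


def receive(listener: socket.socket) -> tuple[int, float]:
    """Accept one connection, read until the sender closes it, return (bytes, seconds)."""
    conn, _addr = listener.accept()
    total = 0
    with conn:
        start = time.perf_counter()
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read: the sender closed the connection
                break
            total += len(data)
        elapsed = time.perf_counter() - start
    return total, elapsed


def send(host: str, port: int, total_bytes: int) -> None:
    """Connect to host:port, push total_bytes of zeros, then close."""
    payload = memoryview(bytes(CHUNK))
    remaining = total_bytes
    with socket.create_connection((host, port)) as s:
        while remaining > 0:
            n = min(CHUNK, remaining)
            s.sendall(payload[:n])
            remaining -= n


def megabits_per_second(nbytes: int, seconds: float) -> float:
    """Convert a byte count and duration into Mbit/s."""
    return nbytes * 8 / seconds / 1_000_000 if seconds > 0 else float("inf")
```

On the receiving laptop, `nbytes, secs = receive(socket.create_server(("0.0.0.0", 5001)))` followed by `print(megabits_per_second(nbytes, secs))`; on the sending laptop, `send("192.0.2.10", 5001, 100_000_000)`, where port 5001 and address 192.0.2.10 are placeholders. Repeating the run across each switch and comparing the numbers is one way to spot a slow segment.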

Thanks in advance for your comments. This lot always points me in the right direction :-)




