Networking-Forums.com

General Category => Forum Lobby => Topic started by: dlots on March 30, 2016, 08:22:09 AM

Title: Bad 6509
Post by: dlots on March 30, 2016, 08:22:09 AM
We had a 6509 go bad over a reboot last Friday: the system just wouldn't come back up.  After lots of troubleshooting and several RMAs we found we had
1 bad chassis
2 bad sup 720s
1 bad 10gb fiber line card
1 bad 48 port GE card
Not everything in the chassis, but quite a bit.  I have never seen that much stuff crap out... especially for a reboot.

Gonna try and get them to do the Cisco GOLD (Generic Online Diagnostics) thingy next time (basically runs the POST tests while the system is running to make sure it will come back up).
Title: Re: Bad 6509
Post by: routerdork on March 30, 2016, 08:26:35 AM
That was the only thing I hated about our 7600's. A reboot could instantly change a maintenance window. Only had one or two modules at a time though. You sir might hold a record now.
Title: Re: Bad 6509
Post by: deanwebb on March 30, 2016, 08:32:45 AM
:ckfacepalm:

Yeah, that's a tough set of crap to hit your fan.
Title: Re: Bad 6509
Post by: icecream-guy on March 30, 2016, 08:39:01 AM
....and I _just_ finished reading this about 5 mins ago

Is the Cisco 6500 Series invincible?
http://www.networkworld.com/article/3049220/network-switch/is-the-cisco-6500-series-invincible.html



LOL
Title: Re: Bad 6509
Post by: deanwebb on March 30, 2016, 09:00:55 AM
Quote from: ristau5741 on March 30, 2016, 08:39:01 AM
....and I _just_ finished reading this about 5 mins ago

Is the Cisco 6500 Series invincible?
http://www.networkworld.com/article/3049220/network-switch/is-the-cisco-6500-series-invincible.html

:haha2:
Title: Re: Bad 6509
Post by: dlots on March 30, 2016, 09:54:21 AM
Quote from: ristau5741 on March 30, 2016, 08:39:01 AM

Is the Cisco 6500 Series invincible?



There we go, now the answer is yes... yes it is
Title: Re: Bad 6509
Post by: NetworkGroover on March 30, 2016, 12:54:14 PM
Quote from: dlots on March 30, 2016, 08:22:09 AM
We had a 6509 go bad over a reboot last Friday: the system just wouldn't come back up.  After lots of troubleshooting and several RMAs we found we had
1 bad chassis
2 bad sup 720s
1 bad 10gb fiber line card
1 bad 48 port GE card
Not everything in the chassis, but quite a bit.  I have never seen that much stuff crap out... especially for a reboot.

Gonna try and get them to do the Cisco GOLD (Generic Online Diagnostics) thingy next time (basically runs the POST tests while the system is running to make sure it will come back up).

Holy crap!  :wtf:
Title: Re: Bad 6509
Post by: wintermute000 on March 30, 2016, 04:54:49 PM
I've had 50% of the line cards fail on me before thanks to this little gem. The fault doesn't reveal itself until reboot; the faulty module can run for years without any noticeable symptoms.

http://www.cisco.com/c/en/us/support/docs/field-notices/637/fn63743.html
Title: Re: Bad 6509
Post by: deanwebb on March 30, 2016, 08:20:07 PM
Quote from: wintermute000 on March 30, 2016, 04:54:49 PM
I've had 50% of the line cards fail on me before thanks to this little gem. The fault doesn't reveal itself until reboot; the faulty module can run for years without any noticeable symptoms.

http://www.cisco.com/c/en/us/support/docs/field-notices/637/fn63743.html

:kramer:

NEVER. EVER. UPGRADE.
Title: Re: Bad 6509
Post by: Dieselboy on March 31, 2016, 04:19:45 AM
Interesting. How does the unit still go on functioning when the memory goes bad? Pretty clever.
Title: Re: Bad 6509
Post by: wintermute000 on April 02, 2016, 04:27:10 AM
It's some kind of electrical tolerance thingy, so the blades run fine in normal operation with normal electrical supply specs, then when subjected to startup/reload voltages or amps (don't ask me, I'm no sparkie!) they crap out.


I talked to a few guys from my former MSP after I got nailed by it, and they told me that at one point they had a level 1 grunt doing almost nothing but replacement calls for this particular fault. They reckon they had several come in every night for a year or two. Because the ROMMON output is so specific, it's easy to nail it down to this bug.
Title: Re: Bad 6509
Post by: mlan on April 20, 2016, 05:27:51 PM
We got hit with this memory component bug pretty hard in our 28xx router fleet, and I'm afraid every component in our 6509 is going to suffer the same thing on the next reload.
Title: Re: Bad 6509
Post by: Dieselboy on April 22, 2016, 02:45:16 AM
So it's not just specific to the 6509 either?
Title: Re: Bad 6509
Post by: EOS on April 22, 2016, 06:07:55 AM
DDAAAMMNNN!!!!!

That is not what you'd expect of a simple reboot.
Title: Re: Bad 6509
Post by: deanwebb on April 22, 2016, 07:44:07 AM
And now you know why there are several zillion Windows XP boxes around the world, running business-critical applications, with brilliantly-colored post-it notes slapped on them, bearing the stern warning, "DO NOT REBOOT!"

:ckfacepalm:
Title: Re: Bad 6509
Post by: dlots on April 22, 2016, 10:30:02 AM
Do you know if that's an error Cisco's GOLD will find?


diagnostic start system test non-disruptive

diagnostic start system test all
Running test(s) may disrupt normal operation
Do you want to continue? [no]: yes

show diagnostic result module all
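
A minimal sketch for skimming the output of that last command for anything that did not pass. It only summarizes what the switch reports; it says nothing about whether GOLD catches this particular fault. It assumes the output contains one line per module of the form "Overall Diagnostic Result for Module N : PASS", so check that against what your own IOS release prints. The function name and the sample text are made up for illustration.

```python
import re

# Assumed line format: "Overall Diagnostic Result for Module 1 : PASS".
# Verify against real output from your own switch before relying on it.
OVERALL_RE = re.compile(
    r"Overall Diagnostic Result for Module\s+(\S+)\s*:\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def failed_modules(show_output):
    """Return {module: result} for every module whose overall result is not PASS."""
    failures = {}
    for match in OVERALL_RE.finditer(show_output):
        module = match.group(1)
        result = match.group(2).upper()
        if result != "PASS":
            failures[module] = result
    return failures


# Made-up sample text, not captured from a real switch.
SAMPLE = """
Module 1: hypothetical line card
  Overall Diagnostic Result for Module 1 : PASS
Module 5: hypothetical supervisor
  Overall Diagnostic Result for Module 5 : MINOR ERROR
"""

if __name__ == "__main__":
    for module, result in failed_modules(SAMPLE).items():
        print(f"Module {module}: {result}")
```

Paste the captured output into a string (or read it from a saved log file) and pass it to failed_modules(); an empty result means every module that matched the pattern reported PASS.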
Title: Re: Bad 6509
Post by: mlan on April 25, 2016, 05:16:24 PM
Quote from: Dieselboy on April 22, 2016, 02:45:16 AM
So it's not just specific to the 6509 either?

No, indeed.

http://www.cisco.com/c/en/us/about/supplier-sustainability/memory.html
Title: Re: Bad 6509
Post by: Otanx on April 26, 2016, 01:51:20 PM
That was one notice I was always worried we would hit. We had close to 100 devices that fell under that. Never had one fail (knock on wood). I think we have 10 devices left. All others were refreshed last year as part of a network upgrade.

-Otanx
Title: Re: Bad 6509
Post by: mlan on April 26, 2016, 04:16:58 PM
Yeah, I believe we have lost over fifty 2821's from this memory failure.  Thankfully, you could just pop in a new SDRAM module to resolve the issue in a pinch.  Cisco has been good about replacing them, but they will only replace-on-fail.  I am hoping to replace our 6509's before I have to reload them again, but I'm not holding my breath.  I am fully expecting both the sup720-3C's and all the line cards to fail on the next reload.
Title: Re: Bad 6509
Post by: icecream-guy on April 27, 2016, 07:30:49 AM
Quote from: mlan on April 26, 2016, 04:16:58 PM
Yeah, I believe we have lost over fifty 2821's from this memory failure.  Thankfully, you could just pop in a new SDRAM module to resolve the issue in a pinch.  Cisco has been good about replacing them, but they will only replace-on-fail.  I am hoping to replace our 6509's before I have to reload them again, but I'm not holding my breath.  I am fully expecting both the sup720-3C's and all the line cards to fail on the next reload.

ProTip: Open a Proactive TAC case before the upgrade, explain that you are upgrading a device that is affected by the memory issue, include a show inventory in the case, and have TAC verify replacement parts for the system you are upgrading, making sure they are in stock at your local depot. That way, in case of failure, you aren't stuck with NBD when you have a 4-hour turnaround.

I've been doing this process for some time, due to devices / cards / modules failing on reboot, either related or unrelated to the memory issue; sometimes things just don't come up right.  Not had any problems with TAC opening a case for this either.

I've still got 95 pieces of hardware in production that are affected by the memory issue, and yes, we had AS look everything up. We sent them a list of hardware inventory; I assume they used serial numbers to identify manufacture date / location or something like that to produce a list for us.
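
A rough sketch of the cross-check described above: pull PID and serial number out of show inventory output and flag anything that appears on an affected-serial list. It assumes the usual two-line NAME/DESCR then PID/VID/SN layout; the function names, PIDs, and serial numbers below are invented for illustration, and the real affected list would have to come from Cisco.

```python
import re

# Assumed layout of each "show inventory" entry (two lines per item):
#   NAME: "1", DESCR: "some description"
#   PID: SOME-PID          , VID: V01, SN: ABC12345678
ENTRY_RE = re.compile(
    r'NAME:\s*"([^"]*)",\s*DESCR:\s*"([^"]*)"\s*'
    r"PID:\s*(\S*)\s*,\s*VID:\s*(\S*)\s*,\s*SN:\s*(\S*)"
)


def parse_inventory(text):
    """Return a list of dicts (name, descr, pid, vid, sn), one per inventory entry."""
    keys = ("name", "descr", "pid", "vid", "sn")
    return [dict(zip(keys, m.groups())) for m in ENTRY_RE.finditer(text)]


def affected_items(items, affected_serials):
    """Keep only the entries whose serial number is on the affected list."""
    wanted = {serial.upper() for serial in affected_serials}
    return [item for item in items if item["sn"].upper() in wanted]


# Invented sample data, not from a real device.
SAMPLE = '''
NAME: "Chassis", DESCR: "hypothetical chassis"
PID: EXAMPLE-CHASSIS   , VID: V01, SN: FAKE0000001
NAME: "1", DESCR: "hypothetical 48 port line card"
PID: EXAMPLE-LINECARD  , VID: V02, SN: FAKE0000002
'''

if __name__ == "__main__":
    inventory = parse_inventory(SAMPLE)
    for item in affected_items(inventory, ["FAKE0000002"]):
        print(f'{item["name"]}: {item["pid"]} ({item["sn"]})')
```

The flagged entries are the ones worth listing in the proactive TAC case so the depot stock can be checked ahead of the reload.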