Is this ever done? If so, what do you tweak and why?
BGP timers in the DC for peering and dead time.
Direct links on stable gear: drop them all the way down. The defaults were designed for unpredictable WAN links, not DC links.
Honestly, BFD and fast-reroute, when available, have made timers a non-issue.
Quote from: that1guy15 on December 10, 2016, 02:07:08 PM
BGP timers in the DC for peering and dead time.
Direct links on stable gear: drop them all the way down. The defaults were designed for unpredictable WAN links, not DC links.
Honestly, BFD and fast-reroute, when available, have made timers a non-issue.
I think I'm going to wind up learning something here... but that's what this place is for, right? We ask and answer questions to learn and to keep sharp.
I'll start with, "what is peering and dead time?"
Sorry, mixed-up terms on my part. The timers I'm talking about are the keepalive and hold timers. I had "dead timers" mixed up in my head with BGP Dead Peer Detection, which is used to quickly detect when a BGP neighbor/peer is lost. Once detected, the peer and its prefixes are flushed immediately, which can reduce convergence from 3+ minutes to seconds or less.
Hold timers are the number of seconds to keep a peer session "Established" without receiving a keepalive. Once the hold timer expires, the peer session drops to "Active" and prefixes from the peer are flushed. By default the keepalive is sent every 60 seconds and the hold time is 3x the keepalive (180 seconds).
This means if a peer goes down silently, it can take up to 3 minutes to notice and start the convergence process.
My recommendation was to reduce the timers to the smallest values possible: a keepalive of 1 second and a hold time of 3 seconds (3x the keepalive, which is the minimum most platforms allow). So 3 seconds to detect a peer loss. Not bad, but as my original post said, BFD and Dead Peer Detection drop this well below a second.
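For the curious, here's roughly what that looks like in Cisco-IOS-style syntax (neighbor address and AS numbers are made up for illustration; check your platform's minimum hold time before deploying):

```
router bgp 64512
 neighbor 10.0.0.1 remote-as 64513
 ! keepalive every 1 second, declare the peer dead after 3 seconds
 neighbor 10.0.0.1 timers 1 3
```

Most platforms also let you set this per peer-group, which is handy in a leaf/spine fabric.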
Sorry for all the mixup.
Honestly, this is all new to me, so I had no idea if you had a mixup or not. :D
But now that leads me to ask, why would the defaults be set the way they are if most rapid detection is preferable?
In my opinion, I wouldn't touch any of the timers. Between BFD and fast-fallover or whatever it's called, convergence time should be minimal.
Am I misunderstanding? Can you propose a scenario I can build out in vEOS to tinker (If I ever have time).
Dunno if this helps at all, but I tinkered with it a bit while I was writing http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
Check out "The Need for Fast Failure Detection"
Quote from: AspiringNetworker on December 11, 2016, 08:09:38 PM
In my opinion, I wouldn't touch any of the timers. Between BFD and fast-fallover or whatever it's called, convergence time should be minimal.
Am I misunderstanding? Can you propose a scenario I can build out in vEOS to tinker (If I ever have time).
Dunno if this helps at all, but I tinkered with it a bit while I was writing http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
Check out "The Need for Fast Failure Detection"
I tested both ways during our DC turn-up testing last year, and timers played no role. We put both in place.
It's free and an additional safety net beneath BFD. Also, not everything / every scenario can run BFD (peerings via SVIs on some hardware/software, etc.). Also, P2MP topologies.
^^This.
Dean, essentially you want to tweak the BGP timers because they are so drastically long by default (naturally, because BGP is traditionally an external routing protocol). You want to tweak this across your DC... well, because itz SUPA FASTTTTTT
OK, thanks. I think I'm getting an understanding here. Remember, I didn't go to no routin' school! :)
That being said, what's BFD?
Bidirectional Forwarding Detection. AKA an advanced heartbeat between device interfaces, protocols, etc., used to very quickly detect failures or issues.
Supported across almost all vendors now.
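For reference, wiring it into BGP is usually two pieces: BFD intervals on the interface, plus a per-neighbor knob telling BGP to act on BFD's verdict. A Cisco-IOS-style sketch with made-up addresses and intervals (exact syntax varies by vendor):

```
interface GigabitEthernet0/0
 ! send every 300 ms, expect every 300 ms, declare down after 3 missed packets
 bfd interval 300 min_rx 300 multiplier 3
!
router bgp 64512
 neighbor 10.0.0.1 remote-as 64513
 ! tear the session down as soon as BFD reports the neighbor gone
 neighbor 10.0.0.1 fall-over bfd
```

At 300 ms x 3 that's roughly one-second detection; hardware-offloaded BFD can go well below that.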
OK, that explains BFD. Thanks.
Yep. Multi-vendor support, and the fact that it's done at the line-card level, are really the benefits to it, other than really awesome convergence. VMware started offering it within NSX as of late as well, so that's pretty neato.
Anything else?
that1guy15 mentions unstable WAN links... what makes a WAN link unstable?
WAN links are typically underpowered: T1s, small Ethernet circuits, etc. They also carry the same thing that makes proxies and firewalls unstable, i.e. YouTube, Facebook, etc.
Quote from: deanwebb on December 13, 2016, 01:55:38 PM
that1guy15 mentions unstable WAN links... what makes a WAN link unstable?
I use that as a pretty broad term. Your internal 3-foot fiber connections or IDF-MDF connections are always more stable than any connection to your ISP and to the internet. You also have lower latency and packet loss over these links. WAN links can vary greatly in this regard, but they are better than they used to be.
Also, routes/prefixes on the internet are constantly being added and removed, and BGP was designed to handle a large number of prefixes (600K+ IPv4 now) and constant churn without falling over from having to process it all. That is why you see BGP with timers and convergence so slow compared to IGPs.
IGPs actively process when something changes; BGP detects peer failures passively, based on timers.
When you bring BGP in-house to a more stable environment, the longer timers are not needed anymore.
Ok, so we did a test with a customer where they built out the same BGP DC with us and Cisco. Ixia traffic was run in both directions and we did a slew of single-failure convergence scenarios. The worst packet-loss duration we experienced was 4.67 seconds, and only in one direction (it was 69 milliseconds in the other). This was during a Leaf/ToR reboot. The second-worst performance was after no-shutting a Leaf/ToR port after it had been shut down (I didn't write down exactly what the port was... I suspect a host or uplink port, though it was too long ago to remember) - and that was only 1.15 seconds in one direction (4 ms in the other).
Other than that, everything was pretty much ~ 50 ms or less - a lot of them zero. So I don't see how playing with timers is going to help you there.
Here's a spine BGP config from that environment - all spines configured similarly:
router bgp 64700
router-id 10.146.16.11
bgp log-neighbor-changes
maximum-paths 64 ecmp 64
neighbor BD_eBGP_GROUP peer-group
neighbor BD_eBGP_GROUP fall-over bfd
neighbor BD_eBGP_GROUP maximum-routes 12000
neighbor 10.146.0.65 peer-group BD_eBGP_GROUP
neighbor 10.146.0.65 remote-as 64702
neighbor 10.146.0.67 peer-group BD_eBGP_GROUP
neighbor 10.146.0.67 remote-as 64702
neighbor 10.146.0.69 peer-group BD_eBGP_GROUP
neighbor 10.146.0.69 remote-as 64703
neighbor 10.146.0.71 peer-group BD_eBGP_GROUP
neighbor 10.146.0.71 remote-as 64703
neighbor 10.146.0.73 peer-group BD_eBGP_GROUP
neighbor 10.146.0.73 remote-as 64704
neighbor 10.146.0.75 peer-group BD_eBGP_GROUP
neighbor 10.146.0.75 remote-as 64704
address-family ipv4
neighbor BD_eBGP_GROUP activate
network 10.146.16.11/32
And here's a Leaf/ToR BGP config from that environment - all Leafs are configured similarly:
router bgp 64702
router-id 10.146.16.13
bgp log-neighbor-changes
maximum-paths 64 ecmp 64
neighbor BD_eBGP_GROUP peer-group
neighbor BD_eBGP_GROUP fall-over bfd
neighbor BD_eBGP_GROUP maximum-routes 12000
neighbor 10.146.0.64 peer-group BD_eBGP_GROUP
neighbor 10.146.0.64 remote-as 64700
neighbor 10.146.0.76 peer-group BD_eBGP_GROUP
neighbor 10.146.0.76 remote-as 64701
neighbor 10.146.0.105 remote-as 64702
neighbor 10.146.0.105 next-hop-self
neighbor 10.146.0.105 fall-over bfd
neighbor 10.146.0.105 maximum-routes 12000
address-family ipv4
neighbor 10.146.0.105 activate
neighbor BD_eBGP_GROUP activate
network 10.146.16.13/32
network 10.146.32.128/26
Here's my short-hand list of tests we did:
- Shut 40G between ToR1 and Spine1 at ToR1
- no shut
- Shut 40G between ToR2 and Spine1 at ToR1
- no shut
- Shut ToR1 port 1
- no shut
- Shutdown peer-link port 1 on ToR1
- No shut
- Reboot Spine1
- Reboot ToR1
- Unplug 40G between ToR1 and Spine1 at ToR1
- Plug
- Unplug 40G between ToR2 and Spine1 at ToR1
- Plug
- Unplug ToR1 port 1
- Plug
- Unplug peer-link port 1 on ToR1
- Plug
- Unplug Spine1 sup 1
- Plug
- Unplug Spine1 PS1
- Plug
- Power off Spine1
- Power on
- Power off ToR1
- Power on
I don't disagree with you when you have BFD and Dead Peer Detection. There are scenarios where it's not an option, so I'm trying to outline why.
Quote from: that1guy15 on December 14, 2016, 11:45:00 AM
I don't disagree with you when you have BFD and Dead Peer Detection. There are scenarios where it's not an option, so I'm trying to outline why.
Ah k - sorry if I jumped the gun.
Quote from: AspiringNetworker on December 14, 2016, 11:54:30 AM
Quote from: that1guy15 on December 14, 2016, 11:45:00 AM
I don't disagree with you when you have BFD and Dead Peer Detection. There are scenarios where it's not an option, so I'm trying to outline why.
Ah k - sorry if I jumped the gun.
Nah, all cool. We're kinda drilling down pretty far into a single aspect of the original question :)
Tweaking route metrics when redistributing from one routing protocol into another is a popular endeavor, to control which routes are preferred in the routing table. Some routing protocols require a network engineer to set a metric for redistribution.
One could also change the administrative distance of a routing protocol so that the routing table prefers one routing protocol over another.
Network engineers tweak OSPF cost to prefer faster links. With the default reference bandwidth of 100 Mbps, any link at 100 Mbps or faster currently gets a cost of 1.
So traffic going over a 10 Gbps link would have the same cost as going over a 100 Mbps link. Changing the cost (or the reference bandwidth) lets OSPF actually prefer the faster links.
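The usual fix is to raise the reference bandwidth so faster links get distinct costs. A Cisco-IOS-style sketch (the value is in Mbps, and it has to be set consistently on every router in the area or costs won't compare sanely):

```
router ospf 1
 ! reference bandwidth 100000 Mbps = 100 Gbps
 ! cost = reference / link speed: 100G -> 1, 10G -> 10, 1G -> 100, 100M -> 1000
 auto-cost reference-bandwidth 100000
```

Alternatively, "ip ospf cost" on an interface overrides the calculated value for one-off tuning.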
:wtf:
Ahhh yeah, I got laser-focused on TIMERS, which I loathe toying with. TIMERS aren't really metrics.
Yes, tweaking routing metrics is a common exercise for traffic engineering:
- I want this flow to go this way while this other flow goes a different way, but I still want both paths available for HA
- I want corp traffic to take the faster path, but my less-important R&D traffic to take this slower path so as not to hinder corp traffic - but again, I need both paths available to either traffic for HA.
etc... etc..
Does one generally tweak metrics in conjunction with QoS or are those kept separate concerns?
Quote from: deanwebb on December 15, 2016, 12:17:04 PM
Does one generally tweak metrics in conjunction with QoS or are those kept separate concerns?
One may use PBR (which overrides the routing table rather than tweaking a metric) to have VoIP traffic take a faster path to a destination to keep latency lower.
But generally, routing tables and interface QoS are ships in the night.
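For example, a minimal PBR sketch of that VoIP case, Cisco-IOS-style (the ACL name, port range, next hop, and interface are all hypothetical):

```
! match VoIP media traffic (assumed RTP port range)
ip access-list extended VOIP-TRAFFIC
 permit udp any any range 16384 32767
!
route-map VOIP-PATH permit 10
 match ip address VOIP-TRAFFIC
 ! steer matches toward the low-latency next hop, bypassing the routing table
 set ip next-hop 192.0.2.1
!
interface GigabitEthernet0/1
 ip policy route-map VOIP-PATH
```

Traffic that doesn't match the ACL falls through to the normal routing table, which is what keeps the two concerns "ships in the night."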