Hey guys,
I've been browsing through some documentation on internet edge designs, and I am curious how y'all are doing this. Let's assume 2 ISPs with BGP (and you want active/active with failover).
Here is what I have seen from different design guides:
1) Run iBGP on all equipment, and advertise prefixes (half of internet on one edge router, and half of internet on other edge router).
2) HSRP with static routing (which really is not true active/active)
3) GLBP with static routing (I guess this will work, but meh)
4) Redistribute iBGP into OSPF and then advertise routes in OSPF
5) (anything else)?
What is your preferred way of internet edge design (assuming active/active), and why?!
BGP everything, OSPF for loopbacks. There's no reason (in utopia) why your entire DC isn't built like this. SP large-scale design best practices FTW
But failing that then BGP to OSPF is pretty standard. Just don't forget to avoid redistribution feedback, and that unless you are running MPLS (obviously not) then every router in the iBGP path must have the iBGP routes and the correct next-hops. I've seen people resort to GRE tunnels for this (works, if messy and then dealing with frag/MTU). Then there's also the religious debate about next-hop-self vs redistributing the ISP transit subnet (CCDE says latter for faster convergence, FWIW).
I assume re: iBGP you are referring to your internal peerings, as it will always be eBGP to the ISP.....
I also assume that you have multiple /24s to go active/active as you're not doing full tables (if you were then this question would be moot)
every HSRP/GLBP/static route "solution" is an instant fail
Yes I was assuming eBGP to the providers. haha.
I agree that anything static is a fail. Let's assume you are getting a full table from your ISPs. Do you then advertise two default routes to your iBGP peers? Or do you kill them with the full table?
edit: Why do you need multiple /24's for active active?
Sorry not full picture. Let me elaborate.
A /24 is the longest prefix (smallest block) that is generally accepted on the internet. So if you have two blocks, you can preference them differently for inbound, and hence get active/active.
You could advertise the same block to both ISPs with the same attributes, but then it's pot luck which inbound path is chosen, and you'll get asymmetry.
Full tables let you take the optimal link, BGP-wise, outbound. Again, asymmetry. Which is fine from a pure routing POV but messes with services that tend to be stateful.
There's also pbr and other duct tape. LOL
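To make the two-block idea concrete, here is a minimal sketch in Python (not router config). It assumes a hypothetical 198.18.0.0/23 aggregate, placeholder ISP names, and AS-path prepending as the way to de-preference a block; `inbound_plan` is an illustrative helper, not part of any BGP tooling.

```python
import ipaddress

def inbound_plan(aggregate, isps):
    """Split an aggregate into /24s and decide how many times to prepend
    our own AS toward each ISP (0 = preferred inbound path for that /24)."""
    block = ipaddress.ip_network(aggregate)
    plan = {}
    for i, net in enumerate(block.subnets(new_prefix=24)):
        preferred = isps[i % len(isps)]  # alternate /24s between the ISPs
        # Assumption: 2 prepends is enough to make the other path less attractive.
        plan[str(net)] = {isp: (0 if isp == preferred else 2) for isp in isps}
    return plan

# Placeholder aggregate and ISP names, for illustration only.
for prefix, prepends in inbound_plan("198.18.0.0/23", ["ISP1", "ISP2"]).items():
    print(prefix, prepends)
```

Each /24 is still announced to both ISPs, so either link can carry all inbound traffic if the other fails; only the preference differs.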
I got where you are coming from, from an inbound perspective. But from an outbound perspective, what is the best way to "load share" outgoing traffic. Would this be to advertise two defaults to downstream iBGP peers, and then allow maximum-paths?
Or use route maps to advertise a prefix range (first octet 0-127) out one router, and everything else out the other.
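A rough sketch of that range split, assuming "0-127" means the first octet of the destination, so the two halves are 0.0.0.0/1 and 128.0.0.0/1. The names `edge1` and `edge2` are placeholders, and this is IPv4 only.

```python
import ipaddress

LOW_HALF = ipaddress.ip_network("0.0.0.0/1")     # first octet 0-127
HIGH_HALF = ipaddress.ip_network("128.0.0.0/1")  # first octet 128-255

def preferred_edge(destination):
    """Return which edge router (placeholder names) would be the preferred
    exit for an IPv4 destination under a simple half-and-half split."""
    addr = ipaddress.ip_address(destination)
    return "edge1" if addr in LOW_HALF else "edge2"

print(preferred_edge("8.8.8.8"))       # falls in 0.0.0.0/1
print(preferred_edge("203.0.113.10"))  # falls in 128.0.0.0/1
```

The split is by address space, not by traffic volume, so the actual load on each link could still be far from 50/50.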
Last place we used one ISP link for inbound, the other ISP link for outbound. Later on, had to apply filtering to advertise subnet routes from either one when one of the pipes got full..
Yeah. I believe I have a few options because you could either:
1) Advertise two full tables
2) Advertise two partial tables, and two defaults from the providers (then use maximum paths + multi path relax)?
3)?
Rule #1: Do Not make yourself a transit link between the two ISP's
Rule #2: Always use BFD
Quote from: ristau5741 on July 13, 2017, 06:28:31 AM
Rule #1: Do Not make yourself a transit link between the two ISP's
Is there a story to go with this? :drama:
What I would probably do is
Give Core 1 (or whatever your 1st hop out of the edge is) a default static route to Edge 1, and a crappy static route to Edge 2. (redistribute the static route)
Give Core 2 a default static route to Edge 2 and a crappy static route to Edge 1. (redistribute the static route)
Edge 1 and 2 have iBGP running between them, but don't advertise routes they learn from one another out to the ISPs. (In theory your ISP should limit what routes they learn from you, but unless you want to pay your ISPs to move data between them, I would advise against trusting them.) I would advise setting your BGP metrics so that Edge 1 uses ISP 1 and Edge 2 uses ISP 2. If you have a full BGP table and a route is advertised by one ISP but not the other, the traffic will go to the appropriate Edge and still make it out. If you don't have a full BGP table and one ISP goes down, you still go out the other Edge.
Now you just need to tweak your routing on the Distribution devices to prefer Core1 or Core2. In this way you can balance how much traffic is going out which ISP by moving your Distribution devices between cores. This gives you more control, especially if you are doing a full BGP table. For example, if you are at a college, the ISP with the better route to Netflix, Hulu, and YouTube will always be getting hit way harder than the one with the worse routes.
The main downside to this is that if one of your cores goes down, one of your ISPs will get hammered.
To fix that last issue, have a condition on the proxy to block FB, YouTube, and NetFlix if it can't ping one core or the other.
That would work
Another option, if you have a full BGP table, is to have an IP SLA on the edge that monitors the core. If a core goes down, it triggers an EEM script that modifies your BGP metric so they take the best BGP route. A second script puts the BGP metric back when the IP SLA comes back up.
re: outbound, I'd probably prefer to (not knowing your specific requirements) Keep It Simple Stupid as follows, if you are open to running full tables
- Full tables + iBGP in edge block
- ECMP into edge block
- let full tables take it wherever.
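For anyone unsure what ECMP into the edge block means per flow, here is a minimal sketch. It assumes a plain hash over the 5-tuple; real routers use vendor-specific hash functions and inputs, and `pick_next_hop` plus the addresses are purely illustrative.

```python
import hashlib

def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Map a flow's 5-tuple onto one of several equal-cost next hops.
    The same flow always hashes to the same next hop, which keeps the
    packets of a single session on one path."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

edges = ["edge1", "edge2"]  # placeholder names for the two edge routers
flow = ("10.0.0.5", "203.0.113.10", "tcp", 51000, 443)
print(pick_next_hop(*flow, edges))
```

Per-flow hashing keeps each session on one edge router in the outbound direction, but it does nothing for the return path, which is still chosen by the rest of the internet.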
PBR is also another option, although rapidly gets messy.
If partial tables or straight default, what dlot says - each router prefers a different ISP then you can control which link yourself in your internal routing. However most of the time, for enterprise at least, outbound traffic is much less of a concern than inbound balancing.
If active/active is not mandatory, classic design using AS-PATH prepend out the secondary link and inbound local-pref - maximum stability and simplicity, also keeps jitter to minimum (deterministic).
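A heavily simplified sketch of why prepend plus local-pref gives a deterministic primary/secondary. It models only two steps of BGP best-path selection (highest local-pref, then shortest AS path) and ignores every other tie-breaker; the AS numbers are from the documentation range and the `Path` type is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    via: str              # which ISP the path goes through
    as_path: tuple        # AS numbers as seen by the router making the choice
    local_pref: int = 100

def best_path(paths):
    """Higher local-pref wins first; a shorter AS path breaks ties."""
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))

# Inbound: how a remote AS might see our prefix (our AS assumed to be 64496).
remote_view = [
    Path("ISP1", (64500, 64496)),                # primary, no prepend
    Path("ISP2", (64501, 64496, 64496, 64496)),  # secondary, prepended twice
]
print(best_path(remote_view).via)

# Outbound: we raise local-pref on routes learned from the primary ISP.
our_view = [
    Path("ISP1", (64500, 64510), local_pref=200),
    Path("ISP2", (64501,), local_pref=100),
]
print(best_path(our_view).via)
```

Prepending is only a hint for inbound: a remote network that sets its own local-pref will ignore the AS-path length entirely.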
In any event you will need to deal with the asymmetric routing in front of any stateful services (FWs), so you won't be able to terminate the ISPs directly on firewalls with any non-deterministic design.
re: not becoming transit, regex is your friend (the classic ^$)
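For anyone who hasn't met it, ^$ matches an empty AS path, which means the route was originated inside your own AS. A small sketch of the filtering logic in Python (the table contents and the `routes_to_advertise` helper are made up for illustration; on a router this would be an AS-path filter applied outbound to each ISP):

```python
import re

LOCALLY_ORIGINATED = re.compile(r"^$")  # empty AS path = originated in our AS

def routes_to_advertise(bgp_table):
    """Keep only prefixes with an empty AS path, so nothing learned from
    one ISP is ever re-advertised to the other."""
    return [prefix for prefix, as_path in bgp_table
            if LOCALLY_ORIGINATED.match(as_path)]

# Hypothetical table of (prefix, AS path as a string).
table = [
    ("198.18.0.0/24", ""),             # our own prefix
    ("192.0.2.0/24", "64500 64510"),   # learned via ISP 1
    ("203.0.113.0/24", "64501"),       # learned via ISP 2
]
print(routes_to_advertise(table))
```

Only the locally originated prefix survives the filter, so neither ISP ever hears the other's routes from you.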
Whenever customers ask for active-active I always ask them 'so if your normal demands are so high that you need to utilise your 'secondary' link as well, do you think performance will be acceptable if you lose your primary link?'.
Quote from: wintermute000 on July 14, 2017, 06:54:04 AM
Whenever customers ask for active-active I always ask them 'so if your normal demands are so high that you need to utilise your 'secondary' link as well, do you think performance will be acceptable if you lose your primary link?'.
... and then there was silence...
:wha?:
People shooting for five nines and max ROI simply do not understand the concept of "backup capacity". These are the same types that will insist that all business functions are 100% critical, everything is top priority, and that you can buy unicorns in bulk via eBay and Amazon Prime.
Once upon a time, 80% utilization of an asset was considered "full capacity". Anything over that was "over capacity" and would get the business owner to look at expansion plans before utilization got critical at 85%+.
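The capacity question can be put as a quick back-of-the-envelope check. This sketch assumes the 80% figure as the threshold and uses made-up link sizes and loads; `survives_failover` is a hypothetical helper.

```python
def survives_failover(link_capacity_mbps, load_mbps, threshold=0.80):
    """For each link, check whether the remaining links could carry the
    total load at or below the threshold if that one link failed.
    Assumes at least two links, so remaining capacity is never zero."""
    total_load = sum(load_mbps.values())
    result = {}
    for failed in link_capacity_mbps:
        remaining = sum(cap for name, cap in link_capacity_mbps.items()
                        if name != failed)
        result[failed] = (total_load / remaining) <= threshold
    return result

capacity = {"ISP1": 1000, "ISP2": 1000}  # made-up link sizes in Mbps
print(survives_failover(capacity, {"ISP1": 400, "ISP2": 300}))  # 70% on the survivor
print(survives_failover(capacity, {"ISP1": 500, "ISP2": 450}))  # 95% on the survivor
```

In the second case both links look comfortable at around 50% each, yet losing either one pushes the survivor well past the 80% mark.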
Quote from: deanwebb on July 14, 2017, 09:59:48 AM
types that will insist that all business functions are 100% critical, everything is top priority, and that you can buy unicorns in bulk via eBay and Amazon Prime.
Amazon really does have everything
https://www.amazon.com/UNICORN-RUBBER-DUCKIES-DOZEN-BULK/dp/B0076ZWTZ6/ref=sr_1_2?ie=UTF8&qid=1500046904&sr=8-2&keywords=bulk+unicorns
You should always go active/active if you can. Treating it as a capacity discussion is the wrong direction imo. In other words, you shouldn't have active/active be a requirement in order for your network to handle the load on it. It's purely an HA discussion.
EDIT - I'm now realizing we're really talking about the edge and it may not be as appropriate here... but I stand by my point that any one link should be able to handle the load. I'm probably being unrealistic.
Quote from: AspiringNetworker on July 14, 2017, 01:45:39 PM
You should always go active/active if you can. Treating it as a capacity discussion is the wrong direction imo. In other words, you shouldn't have active/active be a requirement in order for your network to handle the load on it. It's purely an HA discussion.
EDIT - I'm now realizing we're really talking about the edge and it may not be as appropriate here... but I stand by my point that any one link should be able to handle the load. I'm probably being unrealistic.
True. Any *one* link should handle the load. But when managers get grabby and decide that they can increase the load so that there is no redundancy, then that backup link is now a second primary link.
Quote from: ristau5741 on July 14, 2017, 10:43:33 AM
Quote from: deanwebb on July 14, 2017, 09:59:48 AM
types that will insist that all business functions are 100% critical, everything is top priority, and that you can buy unicorns in bulk via eBay and Amazon Prime.
Amazon really does have everything
https://www.amazon.com/UNICORN-RUBBER-DUCKIES-DOZEN-BULK/dp/B0076ZWTZ6/ref=sr_1_2?ie=UTF8&qid=1500046904&sr=8-2&keywords=bulk+unicorns
Yes, but those are not eligible for Prime shipping, and that's where they get ya!
Ever since then, all my Amazon internet ads are full of unicorns... LoL :wub:
Quote from: ristau5741 on July 19, 2017, 08:30:28 AM
Ever since then, all my Amazon internet ads are full of unicorns... LoL :wub:
Should have seen mine after I went in drag one halloween... it was scary :-(
Does that mean you are now a brony?
:disappoint:
Quote from: deanwebb on July 19, 2017, 10:26:59 AM
Does that mean you are now a brony?
:disappoint:
no, unicorns, not My Little Pony.
:barf:
It's probably already bad that I know what you are referring to..
PINKY PIE IS THE BEST PONY!!
:dj:
https://youtu.be/W29uMcp5BWU?t=9
:dj:
So... back to the need to have a proper backup line capacity... and to make sure the ISP doesn't route through your network...
Man,
So I go on vacation, and I come back to this. This thread took a HUGE DETOUR. The idea/philosophy behind active/active is load sharing, while also using both links for BPA. Many execs do get greedy, and it can become messy very quickly. I understand. But why would you force traffic out only one ISP if the other has a much better path?
There is only so much you can do for inbound, and that would be AS-PATH prepending. I am just surprised that there is nothing out there that can make an engineer's life simple while giving you granular control over your internet routing.
Sorry about the detour, LynK, but I think we got the cart back on the track...
For inbound traffic, there's currently no way to signal a link-balance situation across multiple ISPs. It would be nice if there could be a designated VIP, but we then get into the question of which vendor hosts the VIP and what happens if that vendor goes down... gets messy there, real quick.
We'd need another protocol for this. Say there are two firms that want to send traffic to each other. Both have a desire to link-balance traffic. Firm A could send a message to firm B's routers with information on which paths to balance on, then firm B can respond likewise to firm A. Both firms update their routing tables to balance across those links. Should one or more links go down, routing tables update to prefer the active link. Balance is restored after a link comes back online and a keepalive message verifies the link is up and active, and then the routing tables go back to balancing the links.
Quote from: LynK on July 24, 2017, 07:41:52 AM
Man,
So I go on vacation, and I come back to this. This thread took a HUGE DETOUR. The idea/philosophy behind active/active is load sharing, while also using both links for BPA. Many execs do get greedy, and it can become messy very quickly. I understand. But why would you force traffic out only one ISP if the other has a much better path?
There is only so much you can do for inbound, and that would be AS-PATH prepending. I am just surprised that there is nothing out there that can make an engineer's life simple while giving you granular control over your internet routing.
You shouldn't have gone on vacation, you should have stayed around and kept the thread on track yourself. LOL :(