How would one configure jumbo MTU support for individual server ports connected to a Nexus 5k?
The server admins want me to set MTU to 9000 on a few of their servers.
It looks like a global config, either on or off, and not supported on a per-interface basis.
Having said that, there are some things that can be set in a service policy,
but setting MTU is a network-qos function, which is not supported in an interface configuration.
I thought you could do it per port with a policy-map, but it looks like it's in the system qos context and box-wide.
You're referring to setting IP MTU specifically? That's a QoS function?? What a PITA.
On the 5k's, you have to do it as a policy-map, not per port.
http://www.cisco.com/c/en/us/support/docs/switches/nexus-5000-series-switches/112080-config-mtu-nexus.html
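Roughly what that doc walks you through, if memory serves (the policy name is just an example, and plenty of people use 9216 rather than 9000 to leave headroom for encapsulation overhead):

policy-map type network-qos JUMBO
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos JUMBO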
Yeah, that's what I was figuring... don't know how it will affect everything else flowing through the switch.
Quote from: ristau5741 on June 06, 2016, 02:37:46 PM
Yeah, that's what I was figuring... don't know how it will affect everything else flowing through the switch.
In what regard?
Quote from: ristau5741 on June 06, 2016, 02:37:46 PM
Yeah, that's what I was figuring... don't know how it will affect everything else flowing through the switch.
Probably won't affect it at all. It will only enable larger frames than what you're passing today; I can't see what that would break. It's not the L3 MTU.
Ristau, I've had this configured on our N3Ks since their inception back in 2013:
policy-map type network-qos JUMBO-FRAMES
  class type network-qos class-default
    mtu 9000
system qos
  service-policy type network-qos JUMBO-FRAMES
That was taken from a Cisco doc somewhere.
All this does is allow the switch to switch jumbo frames at 9000 bytes. It has not changed the routing MTU:
Quote from: switch
3048-1# show int vl 7
Vlan7 is up, line protocol is up, autostate enabled
Hardware is EtherSVI, address is hoho.haha.cb3c
Description: VM-MGMT SVI
Internet Address is 192.168.7.2/24
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec
Last clearing of "show interface" counters never
Unfortunately for me, the storage guy didn't bother enabling jumbos on the SAN, so although the underlying network supports them, they aren't used.
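If you want to double-check that the L2 side actually took the change, something like this should show the MTU the hardware is running per class (the interface number is just an example):

show queuing interface ethernet 1/1
show policy-map system type network-qos

The queuing output should list the jumbo MTU per qos-group once the policy has taken effect.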
Enabling jumbo frames doesn't really do anything on the switch in terms of traffic flowing through it. I assume it allocates more memory to certain processes or to the ASIC so that it can store and forward 9000-byte frames.
The question I would be asking is: what are your server guys trying to do? In my experience, the server guys have some idea of the end goal but don't usually understand the full scope of what that involves.
Are the server guys enabling jumbos on dedicated server storage NICs, so that jumbos aren't used for non-storage data packets?
Quote from: ristau5741 on June 06, 2016, 02:37:46 PM
Yeah, that's what I was figuring... don't know how it will affect everything else flowing through the switch.
We recently configured this on a pair of Nexus 5672UPs for our Virtual Team. They needed it to support vMotion.
It did not affect anything on the switch... If a jumbo frame comes through the Nexus, it can now handle it without fragmenting.
Quote from: EOS on June 07, 2016, 05:47:31 AM
It did not affect anything on the switch... If a jumbo frame comes through the Nexus, it can now handle it without fragmenting.
Actually, this config means the switch can handle it, full stop. Without the config, the switch would drop the frame. A router would fragment the frame so it can be routed at a different MTU.
Quote from: Dieselboy on June 07, 2016, 02:18:11 AM
Are the server guys enabling jumbos on dedicated server storage NICs, so that jumbos aren't used for non-storage data packets?
Not dedicated storage NICs. Storage is connected to the same switch on different ports in a different VLAN, so the server packets and storage packets are mixed.
And this...
Quote from: EOS on June 07, 2016, 05:47:31 AM
...They needed it to support VMotion.
Not that the packets would be affected, but the ESX servers are on trunk ports, with a whole lot of VLANs trunked, including the storage VLANs.
I don't get a warm fuzzy feeling when the interface is mixing MTU sizes, 1500-byte frames for the other servers and 9000-byte frames for storage.
I feel like there is going to be lag there: queuing or slow server response issues, with the switch interface processing 9000-byte frames sitting in front of 1500-byte frames in the queue heading to the server.
Quote from: ristau5741 on June 07, 2016, 08:10:08 AM
I don't get a warm fuzzy feeling when the interface is mixing MTU sizes, 1500-byte frames for the other servers and 9000-byte frames for storage.
I feel like there is going to be lag there: queuing or slow server response issues, with the switch interface processing 9000-byte frames sitting in front of 1500-byte frames in the queue heading to the server.
I suppose 10G interfaces make this a moot point.
You can actually calculate this.
8*9000/10,000,000,000 = 0.000 007 2 seconds, or 7.2 microseconds to serialize a jumbo frame
8*1500/10,000,000,000 = 0.000 001 2 seconds, or 1.2 microseconds to serialize a standard 1500-byte frame
You lose 6 microseconds for each jumbo frame in front of you. I don't think it will make a difference for a typical application, really.
Yes, L2 and L3 MTUs are separate; I've seen plenty of deployments with 9k on L2 but standard 1500 on the SVIs.
The 1500 routing MTU is blissfully unaffected by the 9k underlying L2 MTU.
Quote from: wintermute000 on June 07, 2016, 10:02:01 PM
Yes, L2 and L3 MTUs are separate; I've seen plenty of deployments with 9k on L2 but standard 1500 on the SVIs.
The 1500 routing MTU is blissfully unaffected by the 9k underlying L2 MTU.
Unless it needs to be routed :) Hence the "Storage VLAN".
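If you ever did need to route the jumbo frames, the SVI (or routed port) MTU has to come up as well. On most NX-OS platforms that's just an mtu statement under the interface, a sketch like the below using the earlier SVI as an example, though it's worth checking how your particular 5k / L3 module handles it first:

interface Vlan7
  mtu 9216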
Ristau, your comment yesterday means I'm going to provision a "vmotion" VLAN for our Red Hat system (there it's called live migration). If I can make VMs migrate quicker then I'll do it. I have 90+ VMs running on 4 beefy servers with multiple 10Gb connections between them. However, even Red Hat says to only migrate a few at a time to avoid any issues. I don't think Red Hat knows about 10Gb yet either; we've just upgraded our RHEV to the latest, so we'll see if they've fixed the bug where 10Gb is seen as 1Gb on the reporting console.
Reggle, thanks for the formulae!
Ristau, regarding mixing MTU sizes, I see what you're saying. But if you do a packet capture you'll probably find that you have lots of small frames anyway, everything from <200 bytes up through 1500 and on to 9000. The switch just switches :) It's all done in hardware anyway, and it's microseconds of difference.
Even a 9000-byte jumbo frame at 1Gb interface speed only takes 72us (microseconds) using the above calculation.
Quote from: Dieselboy on June 08, 2016, 04:52:24 AM
Unless it needs to be routed :) Hence the "Storage VLAN".
Agree, good clarification :)
I'm not much of a storage person, but it looks like most, if not all, of the VLANs trunked to the ESX servers are also trunked to the NetApp storage device.
I don't know if that makes a difference; there doesn't seem to be a need for routing with a configuration like that.
Quote from: ristau5741 on June 07, 2016, 08:10:08 AM
I don't get a warm fuzzy feeling when the interface is mixing MTU sizes, 1500-byte frames for the other servers and 9000-byte frames for storage.
I feel like there is going to be lag there: queuing or slow server response issues, with the switch interface processing 9000-byte frames sitting in front of 1500-byte frames in the queue heading to the server.
Wouldn't Path MTU discovery address this?
Quote from: AspiringNetworker on June 08, 2016, 10:48:44 AM
Quote from: ristau5741 on June 07, 2016, 08:10:08 AM
I don't get a warm fuzzy feeling when the interface is mixing MTU sizes, 1500-byte frames for the other servers and 9000-byte frames for storage.
I feel like there is going to be lag there: queuing or slow server response issues, with the switch interface processing 9000-byte frames sitting in front of 1500-byte frames in the queue heading to the server.
Wouldn't Path MTU discovery address this?
It's really a moot point by Reggle's calculations.
Erm, moot in what regard? I'm probably just having an ADD moment and not paying enough attention, but I'd say serialization delay isn't the only concern with mismatched MTU between hosts???
EDIT - http://networkengineering.stackexchange.com/questions/3524/mtu-and-fragmentation
Am I just going down an unnecessary rabbit hole and completely missing the point?
This is a good link:
http://www.ccierants.com/2013/10/ccie-dc-definitive-jumbo-frames.html
I wrote a reply yesterday but it's not here so I probably was going off on a tangent and didn't post it :)
I calculated that a 9000-byte packet on a 1Gb link would take 72us to serialise, if I blindly use the calculation from earlier. That's 0.072ms. I guess it could add up over a whole day depending on the number of packets.
PMTUD would not come into play here. I had a quick google and I don't think it would come into play at all if two servers were communicating with different MTU sizes on their NICs across correctly configured network switches. I think their communication could break in one direction.
Quote from: Dieselboy on June 08, 2016, 10:52:46 PM
I wrote a reply yesterday but it's not here so I probably was going off on a tangent and didn't post it :)
I calculated that a 9000-byte packet on a 1Gb link would take 72us to serialise, if I blindly use the calculation from earlier. That's 0.072ms. I guess it could add up over a whole day depending on the number of packets.
PMTUD would not come into play here. I had a quick google and I don't think it would come into play at all if two servers were communicating with different MTU sizes on their NICs across correctly configured network switches. I think their communication could break in one direction.
According to that article:
"The (N5K) switch supports jumbo frames by default."
Quote from: ristau5741 on June 09, 2016, 09:30:53 AM
Quote from: Dieselboy on June 08, 2016, 10:52:46 PM
I wrote a reply yesterday but it's not here so I probably was going off on a tangent and didn't post it :)
I calculated that a 9000-byte packet on a 1Gb link would take 72us to serialise, if I blindly use the calculation from earlier. That's 0.072ms. I guess it could add up over a whole day depending on the number of packets.
PMTUD would not come into play here. I had a quick google and I don't think it would come into play at all if two servers were communicating with different MTU sizes on their NICs across correctly configured network switches. I think their communication could break in one direction.
According to that article:
"The (N5K) switch supports jumbo frames by default."
At L2 maybe, same as all Arista switches - but we're talking IP MTU here, or no?
Quote from: Dieselboy on June 08, 2016, 10:52:46 PM
I wrote a reply yesterday but it's not here so I probably was going off on a tangent and didn't post it :)
I calculated that a 9000-byte packet on a 1Gb link would take 72us to serialise, if I blindly use the calculation from earlier. That's 0.072ms. I guess it could add up over a whole day depending on the number of packets.
PMTUD would not come into play here. I had a quick google and I don't think it would come into play at all if two servers were communicating with different MTU sizes on their NICs across correctly configured network switches. I think their communication could break in one direction.
Don't hosts do PMTUD? If a host sends a jumbo packet with the DF bit set, the receiving end needs to send an ICMP response, "Fragmentation needed and DF bit set" (or something of that nature). In a network that is properly configured with jumbo from end to end it won't matter within the network, of course, but the hosts still do it.
We just ran into this with a customer who had issues with their DNS because of a mismatched MTU, 9000 on one side and 1500 on the other, while the network was jumbo all the way through. Of course the network got blamed and we had to prove otherwise.
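For proving that sort of thing, a DF-bit ping sweep from the hosts usually settles it quickly. A rough example from a Linux box, assuming a 9000-byte MTU end to end (8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header; the addresses are made up):

ping -M do -s 8972 192.168.7.10   # towards another jumbo host: should get replies if jumbo is clean end to end
ping -M do -s 8972 192.168.7.20   # towards a 1500-MTU host: typically just times out, which is the silent breakage described above

Windows is the same idea with ping -f -l 8972.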
I'm honestly not sure, but I didn't see that hosts themselves would send an ICMP "packet too big" response. My understanding is that if a host receives a packet larger than its MTU on that interface, it drops it. But I could be wrong, because when you install a VPN client, doesn't that drop the MTU to 1380 or something anyway? I would need to test it, but I'm in a hotel right now so can't.
Quote from: Dieselboy on June 09, 2016, 12:59:24 PM
I'm honestly not sure, but I didn't see that hosts themselves would send an ICMP "packet too big" response. My understanding is that if a host receives a packet larger than its MTU on that interface, it drops it. But I could be wrong, because when you install a VPN client, doesn't that drop the MTU to 1380 or something anyway? I would need to test it, but I'm in a hotel right now so can't.
It's a confusing topic and I've heard/seen mixed messages. It gets even more complex with DNS because apparently there's a separate setting on DNS servers for how large a message they'll receive...
So I'm right there with you about not being sure, and I guess the answer may be, like 90% of all things in IT, "it depends"? It's one of those Networking 101 things you remember studying years ago but don't touch anymore unless you need to.
A host should never have to send "packet too big" ICMP messages. As part of the TCP handshake the systems report their MSS, and the lowest wins. So unless some network stack just ignores the MSS, the end host should not get anything larger. What if they do? I don't know; I've never had it happen.
Wait, you say, what about UDP? UDP and PMTUD is just broken. How can a host resend a UDP packet if it did not keep the information? How long should a host use the smaller size when there is no session? It depends on your network stack. Many just ignore PMTUD and say UDP is unreliable, good luck. Others respect the new MTU for X minutes. This is why DNS and many other UDP applications limit packet size to 512 or 576 bytes (the minimum IP MTU); that way they are not going to be fragmented.
*Note to self: Find out what happens if I use IPsec on a physical interface with an MTU of 576.
What really gets ugly is when firewalls or ACLs block ICMP in one direction only. Then the ICMP "packet too big" that PMTUD relies on only works sometimes, depending on who sent the first large packet.
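If you ever want to see what a host has actually cached after one of those ICMPs comes back, on Linux something like this shows it (the address is just an example):

ip route get 10.10.10.10

An "mtu" value only appears in that output once a smaller path MTU has been learned and cached for that destination.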
-Otanx
Quote from: Otanx on June 09, 2016, 07:15:39 PM
A host should never have to send "packet too big" ICMP messages. As part of the TCP handshake the systems report their MSS, and the lowest wins. So unless some network stack just ignores the MSS, the end host should not get anything larger.
That's a good point in the case of TCP.
EDIT - AND a good point about UDP. I think that was directly related to the DNS issue we saw.
EDIT #2 - AND a good point about blocking ICMP. That's a message I've seen from multiple sources: you need to intelligently evaluate how and where to block ICMP instead of just blatantly blocking it altogether and preventing networks from doing their job.
I believe that the host doesn't send ICMP "too big" responses, only intermediate L3 devices do, if we're assuming RFC compliance.
So yeah, interesting point: with intra-VLAN/subnet traffic there is a potential for MTU mismatch, but as you say TCP should negotiate the MSS correctly.
I've enjoyed this thread discussion :)