Networking-Forums.com

Professional Discussions => Routing and Switching => Topic started by: Dieselboy on April 29, 2016, 06:33:56 AM

Title: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on April 29, 2016, 06:33:56 AM
I have a 2921 connected to a WAN switch. Also in the WAN switch are 2 internet circuits.
The router port 0/0 has one circuit /30 on the physical interface and a sub interface with a different internet circuit /30. I'm using VRF lite to segregate the two internet circuits from each other.

If I ping the IP on the physical interface I get good response and no dropped packets.
If I ping the IP on the subinterface I get packet loss.

example:
This is the ping to the physical interface from the internet:

TP-2901V#ping 116.[] rep 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 116.[], timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 99 percent (999/1000), round-trip min/avg/max = 4/4/40 ms
TP-2901V#


This is the ping to the subinterface from the internet:


TP-2901V#ping 139.[] rep 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 139.[], timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!.!!!.!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!.!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.
!!!!!!!!!!!!!!!!!!!!
Success rate is 98 percent (985/1000), round-trip min/avg/max = 1/4/156 ms
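
For anyone who wants to compare runs like the two above without eyeballing rows of exclamation marks, here is a minimal Python sketch that tallies the `!` and `.` characters in pasted IOS ping output. It is not something the router provides; the function name `ping_loss` and the sample text (including the 192.0.2.1 address) are made up for illustration.

```python
def ping_loss(output: str) -> tuple[int, int, float]:
    """Return (replies, timeouts, loss_percent) from pasted IOS ping output.

    Only lines made up entirely of '!' and '.' are counted, so the banner
    and the 'Success rate' summary line are ignored. Lines containing other
    result codes are skipped by this simple sketch.
    """
    replies = timeouts = 0
    for line in output.splitlines():
        line = line.strip()
        if line and set(line) <= {"!", "."}:
            replies += line.count("!")
            timeouts += line.count(".")
    total = replies + timeouts
    loss = 100.0 * timeouts / total if total else 0.0
    return replies, timeouts, loss


# Hypothetical short sample, not the real capture from this post.
sample = """Sending 20, 100-byte ICMP Echos to 192.0.2.1, timeout is 2 seconds:
!!!!!!!.!!
!!.!!!!!!!
Success rate is 90 percent (18/20), round-trip min/avg/max = 1/4/8 ms"""

print(ping_loss(sample))
```

Counting the raw characters also gives the exact loss figure, which the rounded "Success rate" line hides when comparing 999/1000 with 985/1000.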


It's the same if I ping through the router to other devices internally. But likewise if I ping through the good physical interface, no issues.

If I take the router out of the equation, put a laptop into the switch where the internet circuit resides and configure the laptop NIC with the IP of the router, I get ping response like the one that is good. All physical interfaces are 1 Gb (including the laptop) and there are no incrementing errors or duplex issues.

The config of the subinterface that is getting the dropped packets is the simpler of the two (there's no service policy applying QoS), hence the confusion.
CPU use is <20% on the router.


!
interface GigabitEthernet0/0
description Auto 1GB link to HP CIN-1620WAN-1 port 1 - INTERNET FACING
bandwidth 20000
ip vrf forwarding ~~VRF-1
ip address 116.[] 255.255.255.252
ip access-group GI00-~~-INBOUND in
no ip redirects
no ip proxy-arp
duplex auto
speed auto
no cdp enable
no mop enabled
service-policy output QOS-~~-OUT
end



interface GigabitEthernet0/0.132
description ~~ interface to ~~ /30
bandwidth 50000
encapsulation dot1Q 132
ip vrf forwarding ~~-TID-VRF-2
ip address 139.[] 255.255.255.252
ip access-group GI02.132-IPv4-~~-INBOUND in
no ip redirects
no ip unreachables
no ip proxy-arp
no cdp enable
end


There are policers applied to the ISP circuit upstream which police traffic to 50M. All I'm doing is pinging from another Cisco 2900 router with the default data size, so I won't be anywhere near the 50M and I would not expect the policer to be kicking in. Likewise if I ping my laptop with the internet IP I can get fast ping results and no loss.
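
A rough back-of-envelope check of that claim, as a Python sketch. It assumes the router sends the next echo only after the previous reply arrives, so the packet rate is about 1/RTT, and it ignores layer 2 overhead. The 100-byte size and the 4 ms average round trip are taken from the ping output earlier in this post; the function name is made up.

```python
def ping_rate_mbps(packet_bytes: int, rtt_ms: float) -> float:
    """Approximate one-way bit rate of a sequential ping, in Mbit/s.

    Assumption: one packet is in flight at a time, so the sender emits
    roughly 1000 / rtt_ms packets per second.
    """
    packets_per_second = 1000.0 / rtt_ms
    return packet_bytes * 8 * packets_per_second / 1_000_000


rate = ping_rate_mbps(100, 4.0)
print(f"about {rate:.2f} Mbit/s against a 50 Mbit/s policer")
```

Under those assumptions the test traffic is a small fraction of a megabit per second, so the policer is an unlikely explanation on its own.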

To rule out an issue with the switch, I connected my laptop into the switch in the same VLAN as the internet circuit and ran the same pings - results were good.

Last thing I can do is disconnect the router entirely and connect in my laptop into the same port, configured with VLAN on the NIC and then run the same test.

Wanted to post here to see if anyone else had something similar...

:o
Title: Re: What could cause a router sub interface to drop random pings?
Post by: routerdork on April 29, 2016, 10:41:42 AM
I've not seen anyone run a config like this without two sub-interfaces. Unless that's a VRF Lite thing? I see that you did drop one packet on the other ping so that leads me to believe all is not perfect in the world. What do the show interface outputs on each side of the link show?
Title: Re: What could cause a router sub interface to drop random pings?
Post by: NetworkGroover on April 29, 2016, 10:46:33 AM
Yeahh if you know the ICMP packets are getting there.... then this is where counters can help... if they expose any that are useful.  Again though, that's assuming they are getting there.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Reggle on April 29, 2016, 01:47:34 PM
First of all, you have a service policy on the physical link. That will likely affect subinterfaces too.
Which brings me to the second point: I would use two subinterfaces to avoid any accidental interference.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: NetworkGroover on April 29, 2016, 03:31:52 PM
Quote from: Reggle on April 29, 2016, 01:47:34 PM
First of all, you have a service policy on the physical link. That will likely affect subinterfaces too.
Which brings me to the second point: I would use two subinterfaces to avoid any accidental interference.

If the service policy were affecting both, shouldn't there be similar behavior?
Title: Re: What could cause a router sub interface to drop random pings?
Post by: deanwebb on April 29, 2016, 03:34:50 PM
Does one path go through a load balancer and the other not?
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Reggle on April 29, 2016, 05:56:27 PM
Quote from: AspiringNetworker on April 29, 2016, 03:31:52 PM
Quote from: Reggle on April 29, 2016, 01:47:34 PM
First of all, you have a service policy on the physical link. That will likely affect subinterfaces too.
Which brings me to the second point: I would use two subinterfaces to avoid any accidental interference.

If the service policy were affecting both, shouldn't there be similar behavior?
If you go for two subinterfaces, apply the service policy on the subinterfaces only in that case. I believe it's possible, though I may be mistaken.
If you mean that the service policy should affect both the main and the subinterface, that's indeed the case. But I also see differing bandwidth statements and the service policy class and ACL (not shown) may consider the subinterface traffic more interesting to drop.

I don't know, this setup just doesn't "feel" right.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on April 29, 2016, 06:55:12 PM
Hi guys thanks for all your replies.

You're all completely correct about the service policy on the physical / subinterface. I'm moving the physical configuration to a subinterface today as I will have a window.
How this happened is that the router was configured for Internet on the physical interface, and then we had another circuit procured, so during business hours I set that up on a subinterface. You can either apply a service policy to the physical interface only, or leave it off the physical interface and apply one to each individual subinterface.
That said, removing the service policy entirely makes no difference.

There's no load balancer. At the moment it goes like this on both circuits
Internet > fibre > NTE > my switch > router
I think I know what you're getting at in terms of load balancer - upstream mac address. I did check already to see if the ISP mac address changed when I did a clear arp but it did not.

The physical interface is 1 Gb. There are no errors, but there are unknown protocol drops, and there are output drops due to the service policy shaping at 20 Mbps. Removing the service policy entirely makes no difference, though.

I'll move the interfaces to how it should be and see if there's any change. If that's the only thing that sticks out then it's a good place to start. I was wary of making that change and risking the same issue on the in-production circuit.
Cheers
Tony
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on April 30, 2016, 12:10:31 AM
Moving the interfaces has had no effect at all on the lost ICMP but I have gone and applied individual QoS per sub interface at least.

Since the ASAs are connected into the same switch, I moved the IP to the ASA and the ASA can ping the upstream gateway fine. If I do this test on the router I get the same odd packet loss.

So I then moved the IP to another subinterface on the router, in case there were any issues with the interface itself, physical cable (even though no errors), or other interface issue and I still get the same packet loss.

So to summarise:
ISP1 -> router = good
ISP2 -> router = bad
ISP2 -> ASA = good

So this got me thinking... What changed? I've ruled out a lot of things.

I then decided to change the mac manually on the interface.

So what I did was (since I've still got a window to do intrusive changes):
- copy the ASA mac address from the show int on the ASA
- remove the VLAN from the switch going to the ASA (I left the config there for the moment, but can be deleted)
- under the physical interface of the router, set the mac address from the Burned in address (4c00.828a.cf00) to the virtual one from the ASA (00a0.c9c0.8201)
- run pings

And what do you know, there's no packet loss from either pinging from the router, or pinging from my Cisco router at home across the internet to the router:


- my main ISP connection, came back before I had a chance to clear arp. I was expecting this to drop until I changed the mac address back.

TP-2901V#ping 116.[] rep 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 116.[], timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 1/4/160 ms
TP-2901V#ping 139.[] rep 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 139.[], timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 1/3/8 ms


If the mac address change "fixed" it, why didn't it fix it when I moved the config from gi0/0 (4c00.828a.cf00) to Gi0/2 (4c00.828a.cf02) ?

Could the issue be at the ISP end, meaning they have a mac in their table the same as both of my BIAs?

Just had a thought - if they're doing QinQ somewhere (the ISP) then this might be the issue. I saw some mac address re-learn issue between our Datacentres back in 2010. Going to have to think to remember what the cause was and the resolution.

What do you guys make of it? :) I don't really want to have to specify mac addresses like this.

I checked the mac table of my switch and it does not have an entry for the burned in address any more. So whatever the issue is, it's on the other side of the NTE which is a Cisco ME3400 provided by the ISP.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on April 30, 2016, 01:46:11 AM
Just remembered regarding the QinQ and packet loss. The issue was when we used HSRP mac addresses, because they were the same in each VLAN. When the packets transited the QinQ, every VLAN shared the same outer VLAN, so the MAC addresses were not unique and the core switches were re-learning the mac each time in different locations.
The fix was to move to HSRPv2 and use unique FHRP macs.
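
To illustrate why the HSRP macs collided, here is a small Python sketch of how the well-known HSRP virtual MAC is derived from the group number: 0000.0c07.acXX for version 1 (groups 0-255) and 0000.0c9f.fXXX for version 2 (groups 0-4095). The helper name is made up, and using the VLAN ID as the group number is only one possible convention, not necessarily what was done back then.

```python
def hsrp_virtual_mac(group: int, version: int = 1) -> str:
    """Return the HSRP virtual MAC for a group, in Cisco dotted notation."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups are 0-255")
        raw = f"00000c07ac{group:02x}"
    elif version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups are 0-4095")
        raw = f"00000c9ff{group:03x}"
    else:
        raise ValueError("version must be 1 or 2")
    return ".".join(raw[i:i + 4] for i in range(0, 12, 4))


# Same group number reused in two VLANs: identical virtual MAC in both.
print(hsrp_virtual_mac(1), hsrp_virtual_mac(1))
# Hypothetical convention: group number = VLAN ID, which HSRPv2's range allows.
print(hsrp_virtual_mac(100, 2), hsrp_virtual_mac(132, 2))
```

The virtual MAC depends only on the group number, so reusing one group in every VLAN gives the same MAC everywhere, and once the VLANs are hidden behind a single outer tag the provider switches cannot tell them apart.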

I'm not using HSRP or any FHRP, and the upstream gateway mac is 0014.1bd5.8c00. So still unsure at this time.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Reggle on April 30, 2016, 07:05:23 AM
Very interesting, thanks for the feedback. Since I'll be deploying a new QinQ soon myself I can use it. However, have you confirmation from your service provider that this is the case? Because the router MAC address should be globally unique.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on April 30, 2016, 08:11:54 AM
Haven't managed to confirm anything yet.. I called them after my last post but I can only speak to 1st and 2nd line. To quote "there is no way they can contact 3rd line" they have to leave them a message. Of course that's BS though. It's the weekend anyway, I've reverted my config from earlier.

Indeed the router physical address must be unique, but it's confusing why I get packet loss when using that address and not with a virtual address. This makes me think that there is a duplicate mac or similar, as these were the same symptoms as when we had the QinQ issue.

One other thing worth mentioning is that ISP 1 and ISP 2 are just different physical ports on the NTE ME3400 switch. Originally before I joined the company, the company sourced fibre internet with ISP 1 but they don't run their own fibre, they get their fibre from ISP2. So as we are "on-net" ISP 2 just had to configure port 2 on the ME3400.
Now, my router uses the same MAC as the customer interface for both ISPs (the burned in address). So the same MAC will be seen on port 1 and port 2 of their ME3400, albeit in different VLANs... it must be different VLANs on their 3400... However, if the problem was because of the same mac being seen then the issue wouldn't be fixed by using a virtual mac. So then this makes me lean toward my Mac not being unique. But even if this was the case, I also tried with another physical interface on that router and different burned in mac - same problem. I would have thought that duplicate mac is possible but highly unlikely. 2 duplicates can't be true.

So I really am not sure. I look forward to speaking with them soon.

One final thing, if the issue was at my end then I would expect it to affect the working internet connection I have too. The problem must be in a cloud somewhere. I'll go and pray.  :awesome:
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 02, 2016, 03:28:46 AM
I told the ISP that when I previously used my laptop to test the circuit I got zero packet loss (because the BIA is different). And I said that if they really wanted, I could copy the MAC from the router and set it in my NIC driver, so my known-good laptop would use the MAC which is experiencing packet loss.
And as expected, extreme packet loss when sourcing from that specific MAC address. It is worse during business hours. Yesterday (Sunday) when I was on a call with the ISP I was only getting 4 drops within 1000. During the day I'm getting a lot more but not as much as I've seen (15% packet loss) in previous times.

So all day long I've been sending 1500 byte ICMP pings sourcing from the bad MAC in the hope it's having a negative impact elsewhere in the ISP network so someone else logs a fault :)
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 02, 2016, 03:48:17 AM
To again rule out any issues with my WAN switches, I plugged my laptop directly into the NTE. The attached screenshot shows good pings with the burned in address of my laptop. I've circled the point at which I specified the MAC address, copied from the Cisco router.

While I'm here, I used MAC addresses from 4C00828ACF00 to 4C00828ACF09 and they all exhibit the same behavior.

I really don't think I can do any more tests from my end.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: deanwebb on May 02, 2016, 01:42:23 PM
You know it's a crazy case when you be messing with MAC addresses.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: routerdork on May 02, 2016, 04:05:29 PM
I've seen some ISPs that use the MACs for security but in that case if it didn't match it shouldn't work. Unless maybe they are shaping/policing unknown MACs.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: NetworkGroover on May 02, 2016, 04:18:56 PM
Yuck - I seriously hope the issue is on their end at this point.

You can use Wireshark to create an I/O graph to show your ICMP requests vs. responses - though of course the only way you could 100% trust it is if it was on the wire itself and not a device.  That would eliminate half your troubleshooting though if you know for a fact the response isn't making it back to your edge device.
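
As a rough idea of what that graph is computing: in Wireshark the two series would be the display filters `icmp.type == 8` (echo request) and `icmp.type == 0` (echo reply). The Python sketch below does the same bucketing on a list of already-parsed packets. The tuple layout, function names and sample capture are all made up for illustration; it does not read a pcap file.

```python
from collections import Counter

ECHO_REPLY, ECHO_REQUEST = 0, 8  # ICMP type values


def per_second_counts(packets: list) -> tuple:
    """Bucket echo requests and replies per whole second of capture time.

    packets is a list of (timestamp_seconds, icmp_type, sequence_number).
    """
    requests, replies = Counter(), Counter()
    for ts, icmp_type, _seq in packets:
        bucket = int(ts)
        if icmp_type == ECHO_REQUEST:
            requests[bucket] += 1
        elif icmp_type == ECHO_REPLY:
            replies[bucket] += 1
    return requests, replies


def unanswered(packets: list) -> list:
    """Return sequence numbers that were requested but never answered."""
    asked = {seq for _, t, seq in packets if t == ECHO_REQUEST}
    answered = {seq for _, t, seq in packets if t == ECHO_REPLY}
    return sorted(asked - answered)


# Hypothetical capture: request 2 never gets a reply.
capture = [(0.000, 8, 1), (0.004, 0, 1), (0.300, 8, 2),
           (1.100, 8, 3), (1.104, 0, 3)]
requests, replies = per_second_counts(capture)
print(dict(requests), dict(replies), unanswered(capture))
```

Any second where the request count is higher than the reply count is where the loss shows up, which is what the gap between the two lines on the I/O graph would show.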
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Otanx on May 02, 2016, 04:43:27 PM
Quote from: Dieselboy on May 02, 2016, 03:48:17 AM

While I'm here, I used MAC addresses from 4C00828ACF00 to 4C00828ACF09 and they all exhibit the same behavior.


Take a look at this discussion on reddit

https://www.reddit.com/r/networking/comments/4hduco/mpls_throughput_issue/

In short apparently MPLS routers can have an issue if your MAC address starts with a 4 or 6. I don't know if this is your issue or not, but thought of you when reading it, and then looking saw your MAC addresses started with a 4.

-Otanx
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 02, 2016, 08:29:26 PM
Quote from: deanwebb on May 02, 2016, 01:42:23 PM
You know it's a crazy case when you be messing with MAC addresses.

I know!  :awesome:

Quote from: routerdork on May 02, 2016, 04:05:29 PM
I've seen some ISPs that use the MACs for security but in that case if it didn't match it shouldn't work. Unless maybe they are shaping/policing unknown MACs.

Level 3 tech said they're not doing anything with MACs at all.

Quote from: AspiringNetworker on May 02, 2016, 04:18:56 PM
Yuck - I seriously hope the issue is on their end at this point.

You can use Wireshark to create an I/O graph to show your ICMP requests vs. responses - though of course the only way you could 100% trust it is if it was on the wire itself and not a device.  That would eliminate half your troubleshooting though if you know for a fact the response isn't making it back to your edge device.

WOW can you do that?? I was thinking yesterday when I was sending thousands of pings "wouldn't it be nice if...". Do you have a link for that?
PS the issue is 100% definitely on their end, because:
1. the issue is present even when my laptop is connected directly to their NTE switch (Cisco ME3400)
2. configuring the problematic MAC address in my laptop NIC driver and sourcing traffic from it reproduces the loss
3. using a non-problematic MAC on the same physical equipment (my laptop or other hardware), directly connected to their NTE, gives no issues.

Quote from: Otanx on May 02, 2016, 04:43:27 PM
Take a look at this discussion on reddit

https://www.reddit.com/r/networking/comments/4hduco/mpls_throughput_issue/

In short apparently MPLS routers can have an issue if your MAC address starts with a 4 or 6. I don't know if this is your issue or not, but thought of you when reading it, and then looking saw your MAC addresses started with a 4.

-Otanx

Otanx, this might be it I wonder! Because while I had my laptop directly connected to the ISP switch, I did start playing around with MAC addresses whilst clenching my buttocks. I found:

5C00828ACF0F is GOOD
4D00828ACF0F is GOOD
4C00928ACF0F is BAD
4C00938ACF0F is BAD

Going to read through the Reddit whilst consuming the morning coffee. Thanks a bunch :)
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Otanx on May 02, 2016, 09:32:53 PM
Quote from: Dieselboy on May 02, 2016, 08:29:26 PM

Otanx, this might be it I wonder! Because while I had my laptop directly connected to the ISP switch, I did start playing around with MAC addresses whilst clenching my buttocks. I found:

5C00828ACF0F is GOOD
4D00828ACF0F is GOOD
4C00928ACF0F is BAD
4C00938ACF0F is BAD

Going to read through the Reddit whilst consuming the morning coffee. Thanks a bunch :)

What if you do it without clenching your buttocks? Do you get different results? What about standing on one foot, and type only using your thumbs? Then swap feet, then try with only pinkies. We will get this solved eventually. We just need to test everything!

-Otanx
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 02, 2016, 09:58:29 PM
Waiting for them to call me to discuss..
It's worth me mentioning this to them, although the article and the Reddit thread describe no packet loss, only odd TCP throughput through an MPLS network. In my case I have up to 15% packet loss when using macs starting with 4c00..... I'm now worried I'll get a similar provider response to what the Reddit OP had.

I know a different mac will fix the issue but I'm reluctant as it's non-standard and will affect the physical router interface, not just the subinterface.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 03, 2016, 03:07:30 AM
They state there is no MPLS in path.

What are the implications of me going ahead and configuring a new mac address on the router interface? I can use the Cisco 00a0.c9xx.xxxx which is from the Cisco ASA documentation. I've already designed a mac address standard for configuring ASA subinterfaces, I can adapt it for a router so it will be unique in my network.. But that's as much as I can assure.

It's not something I've needed to do before, ever. I don't like doing something odd like this mainly because of unknowns. So I'm reluctant to go ahead due to any possible issues arising which would be service affecting.
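
One sanity check worth doing before hard-coding any address, sketched in Python: look at the two flag bits in the first octet (0x01 = group/multicast, 0x02 = locally administered) and at the first hex digit, since the 4/6 theory is in play. The function name is made up, and this only inspects the address; it says nothing about how a given platform will behave with it.

```python
def mac_flags(mac: str) -> dict:
    """Report basic properties of a MAC given as 4c00.828a.cf00, 4c:00:..., etc."""
    digits = "".join(c for c in mac if c not in ".:-").lower()
    if len(digits) != 12:
        raise ValueError("expected 12 hex digits")
    first_octet = int(digits[:2], 16)  # raises ValueError on non-hex input
    return {
        "multicast": bool(first_octet & 0x01),
        "locally_administered": bool(first_octet & 0x02),
        "first_nibble": digits[0],
    }


print(mac_flags("4c00.828a.cf00"))  # the router BIA from this thread
print(mac_flags("00a0.c9c0.8201"))  # the virtual MAC copied from the ASA
print(mac_flags("0200.0000.0001"))  # made-up locally administered example
```

Both addresses from the thread are unicast with the locally administered bit clear, so the 00a0.c9 prefix looks like an ordinary vendor-assigned address on the wire; what differs is the first hex digit.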
Title: Re: What could cause a router sub interface to drop random pings?
Post by: NetworkGroover on May 03, 2016, 10:34:50 AM
Quote from: Dieselboy on May 02, 2016, 08:29:26 PM
WOW can you do that?? I was thinking yesterday when I was sending thousands of pings "wouldn't it be nice if...". Do you have a link for that?

Heck yeah you can.  I thank God every day for the HORRIBLE time I had in Tech Support at Websense - because it got me thorough experience with Wireshark, and that has never stopped helping me to this day.

In fact, funny thing about the link - it's a Websense Tech Support article I wrote years ago for analyzing DNS request vs. response which is critical for good proxy behavior - otherwise you get that, "WTF mah interwebz are teh slow".

So take a look at that, and instead just use the icmp request and response filters... I can find them for you if you need me to but I'm sure you're capable ;)

http://www.websense.com/support/article/kbarticle/Identify-DNS-related-errors-using-Wireshark (http://www.websense.com/support/article/kbarticle/Identify-DNS-related-errors-using-Wireshark)

EDIT - Oh, you can watch it live during the capture as well if you like.  I've used this to verify QoS policy behavior (look at rate of traffic for particular DSCP value), etc. - pretty cool stuff.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 05, 2016, 05:50:32 AM
So the latest update is that Level 3 have passed this off to the "SME" and I had a nice chat with her. She looked into the network at my end (customer end) and was going on about seeing two MACs at my end. I said well I can see two macs coming from your end, one is the gateway and she confirmed the other is the NTE which is a Cisco ME3400. I said it's probably spanning tree / cdp / lldp or something like that.
I explained regardless, even if I have only my laptop connected, we get packet loss with a certain MAC.

I mentioned I had sent in the document explaining the bug with MPLS but I had previously been told there's no MPLS between. She said yes she's seen it and actually there IS MPLS between the customer port and the L3 gateway I'm routing to! But she said "but it's Ethernet"... I'm not sure if she meant that it's a VPLS-type carrier (i.e. not MPLS) or MPLS on top of an L2 switched network.
From what I can remember, MPLS is just layer "2.5" but uses the same routing equipment. I didn't really probe it / her.

So I said a good test would then be to try with my MAC address starting with a 4 and a 6 and see if we get packet loss on both of those. And we do get packet loss with both 4c00 and 6c00 but NOT 5c00. So she's passing it off to some other team.

How many departments in a network team?

Anyway, if this is the issue Otanx mentioned, which it is looking like it is, then it's not just because the MAC begins 4xxx like the document says. There's more to it than that. I saw packet loss with 4c00 but not 4D00... This might be why it's not been picked up yet by other customers of theirs, not sure.
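
Putting the results so far against the simple "starts with a 4 or a 6" rule, as a quick Python sketch. The observations are the ones reported in this thread (the 6C00 entry is only a prefix, because that is all that was noted); the rule function is a deliberately naive assumption.

```python
# (MAC or MAC prefix as tested, packet loss seen?)
observations = [
    ("5C00828ACF0F", False),
    ("4D00828ACF0F", False),
    ("4C00928ACF0F", True),
    ("4C00938ACF0F", True),
    ("6C00", True),  # only the prefix was reported for this test
]


def naive_rule(mac: str) -> bool:
    """Predict loss purely from the first hex digit being 4 or 6."""
    return mac[0] in "46"


mismatches = [mac for mac, lossy in observations if naive_rule(mac) != lossy]
print(mismatches)
```

The only address the naive rule gets wrong is the 4D one, which is the reason for thinking the second hex digit matters too.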

I'll be happy to get this resolved as the Sri Lanka to Australia routing is only 130ms on this service compared to 320-450ms average on the current one. :)

Nice work Otanx :)
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Otanx on May 05, 2016, 10:31:51 AM
Quote from: Dieselboy on May 05, 2016, 05:50:32 AM
Anyway, if this is the issue Otanx mentioned, which it is looking like it is, then it's not just because the MAC begins 4xxx like the document says. There's more to it than that. I saw packet loss with 4c00 but not 4D00... This might be why it's not been picked up yet by other customers of theirs, not sure.

I wish I had the audio or notes to go with the presentation, but I think you are right. The 4 or 6 is just the starting point of the problem. What I understand is happening is that when one of the MPLS routers tries to load balance, it needs to look at the packet being encapsulated. The router does not know if this is an IPv4 packet, an IPv6 packet, or an Ethernet frame. The logic being used says look at the first four bits. If they are a 4 or a 6, assume that the field is an IP version field and the encapsulated data is an IP packet. If the first four bits are not 4 or 6, assume an Ethernet frame. The router then uses this assumption to decide which part of the encapsulated data it will use to load balance. So if you take the MAC address and map it onto an IPv4 header, the second hex digit lands in the header length field, which changes where the router looks for data as well. If you get lucky, the bits picked are static from packet to packet and everything works. Shift the bits selected for load balancing (by changing the second digit from a D to a C) and the data is no longer static, and you get issues.
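
A minimal Python sketch of that guessing logic as I understand it. This is an illustration of the heuristic described above, not any vendor's actual code, and the function names are made up. It treats the first bytes of the encapsulated frame (the destination MAC) as if they might be an IP header, where the top four bits of byte 0 are the version and, for IPv4, the low four bits are the header length in 32-bit words.

```python
def guess_payload(first_bytes: bytes) -> str:
    """Guess the payload type from the top four bits of the first byte."""
    nibble = first_bytes[0] >> 4
    if nibble == 4:
        return "ipv4"
    if nibble == 6:
        return "ipv6"
    return "ethernet"


def apparent_ipv4_header_len(first_bytes: bytes) -> int:
    """Header length in bytes if byte 0 is (mis)read as IPv4 version/IHL."""
    return (first_bytes[0] & 0x0F) * 4


for mac in ("4c00828acf00", "4d00828acf0f", "5c00828acf0f", "6c00828acf0f"):
    frame_start = bytes.fromhex(mac)
    print(mac, guess_payload(frame_start), apparent_ipv4_header_len(frame_start))
```

Read this way, a 4C address looks like an IPv4 header of 48 bytes and a 4D address like one of 52 bytes, so whatever the router hashes on after the "header" comes from different offsets in the frame, which would fit one prefix being stable and the other not.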

-Charles
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 05, 2016, 08:40:54 PM
Great explanation :)

I love finding these types of issues because you learn a lot at a real deep level. These types of issues I find only come up every now and then. Sometimes they take MONTHS to resolve because finding the cause is difficult. I think you've saved the ISP months of troubleshooting :)
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 13, 2016, 01:49:09 AM
Update: They use Juniper switches and the layer 2 technology is VPLS.
They're taking this issue to the vendor.
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 30, 2016, 07:42:40 AM
Got a call from their overseas call centre to say they had fixed the issue and I should test. I've not been able to test yet as I need an outage window but I'm now losing 10% ESP packets on the VPN using this internet circuit.
:matrix:
Title: Re: What could cause a router sub interface to drop random pings?
Post by: deanwebb on May 30, 2016, 07:53:46 AM
Ouch, 10% loss is not good, at all.

I mean, ifwe lost te percent f the chaacters I yped, tha would mae things ard to unerstand, or sure!

ee what Idid there
Title: Re: What could cause a router sub interface to drop random pings?
Post by: Dieselboy on May 30, 2016, 09:34:43 PM
The other VPNs aren't perfect either though. 1% loss on the back up tunnel :( :( :(