MS Teams voice Direct Routing - call disconnects after 2 minutes on mute

Started by Dieselboy, May 25, 2022, 03:56:59 AM


Dieselboy

Scenario:

When joining a voice conference, we go on mute. Around 2 minutes later, the call gets disconnected.
or
When calling or receiving a call from a BT remote party, the call is disconnected after going on mute for 2 minutes (muting MS Teams).

In our scenario the issue has shown up with BT; however, this can affect any SIP peer connected via an MS Teams Direct Routing provider, for any organisation. It is therefore a high-impact scenario that may go unnoticed until someone stays on mute for 2+ minutes.

Voice path:

MS Teams app -> "PSTN call" -> remote party is British Telecom (BT)

"PSTN call" = a SIP service provided by a carrier that supports MS Teams Direct Routing.

Captures taken at the PSTN carrier (on the BT-facing side), together with BT's response, tell us that BT disconnect the call (normal call clearing) after a 2-minute timer, because while we are on mute BT no longer receive any media or signalling packets whatsoever. So essentially, BT see the call as hung up at the remote side (from their perspective, the remote side is MS Teams).
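To make the failure mode concrete, here is a minimal Python sketch of the kind of inactivity timer the captures suggest BT applies: if nothing arrives for 120 seconds, the call is cleared. The class name, the millisecond clock and the assumption that every RTP, RTCP or in-dialog SIP packet resets the timer are all illustrative; BT's actual implementation is not known here.

```python
from dataclasses import dataclass

# Assumed value: the ~2 minute timer seen in the captures, in milliseconds.
MEDIA_TIMEOUT_MS = 120_000


@dataclass
class MediaWatchdog:
    """Hypothetical far-end timer that clears a call after a silent period."""
    timeout_ms: int = MEDIA_TIMEOUT_MS
    last_activity_ms: int = 0

    def on_packet(self, now_ms: int) -> None:
        # Assumption: any RTP, RTCP or in-dialog SIP packet counts as activity.
        self.last_activity_ms = now_ms

    def should_clear(self, now_ms: int) -> bool:
        # True once nothing at all has arrived for the whole timeout window.
        return now_ms - self.last_activity_ms >= self.timeout_ms


if __name__ == "__main__":
    wd = MediaWatchdog()
    # Talk for 30 s with a packet every 20 ms, then mute and send nothing.
    for t in range(0, 30_000, 20):
        wd.on_packet(t)
    for check_ms in (60_000, 149_000, 151_000):
        print(check_ms, wd.should_clear(check_ms))
```

The last packet arrives at 29.98 s, so the timer fires 120 s later at 149.98 s and the run prints False, False, True. While packets keep flowing every 20 ms the timer never fires, which is why a muted leg that sends literally nothing is the problem.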

Where I am

I'm just coming into this issue now. My peers tell me that Microsoft do not send anything when on mute (no signalling, no media), and also that Microsoft say they refuse to fix this because they don't need to for US customers.

I have my doubts about this, so I am pursuing my own investigation.

However, I wanted to reach out to the forum to see whether any of you have come across this issue.

What I *think* is actually happening is that MS Teams does send signalling to the PSTN carrier, and this signalling gets lost or dropped and doesn't make it to BT. If so, this would be a PSTN provider issue, which I can try to push.

Looking around the internet, these people have the exact same issue:

https://techcommunity.microsoft.com/t5/microsoft-teams/call-disconnects-while-on-mute/m-p/1020879


Dieselboy

So which RFC comes into play during the mute stage of a call?

Muting a call basically either stops your side from sending media, or carries on sending media with silence inside the packets. In our case, media is halted. However, we can't just stop the RTP media and send nothing else, because to the far end that looks the same as hanging up. So if RTP is stopped, then something must be signalled to the remote side to say "still here, just not sending you any audio".

Microsoft list a whole bunch of RFCs that they have implemented in Teams Direct Routing. One of them is RFC 6337 (SIP usage of the offer/answer model), which among other things covers putting media on hold. I admittedly have never looked at this before, so I'm unsure about it.

https://docs.microsoft.com/en-us/microsoftteams/direct-routing-protocols
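For reference, this is roughly how hold is expressed on the wire: a re-INVITE whose SDP changes the media direction attribute, with the answer mirroring it as described in RFC 3264 section 6.1 (RFC 6337 gives guidance on using this in SIP). The Python below is a simplified sketch assuming a single audio stream and considering only the first direction attribute; the helper names are hypothetical. Whether Teams sends anything like this on mute, as opposed to hold, is exactly what is unconfirmed in this thread.

```python
# Media direction attributes defined by RFC 3264 (sendrecv is the default).
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")

# One permitted answer for each offered direction (RFC 3264 section 6.1).
# The answerer may also narrow further, e.g. answer "inactive" to anything.
ANSWER_FOR_OFFER = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",   # offerer holds the call: we only receive
    "recvonly": "sendonly",
    "inactive": "inactive",
}


def media_direction(sdp: str) -> str:
    """Return the first direction attribute found, defaulting to sendrecv."""
    for line in sdp.splitlines():
        if line.startswith("a=") and line[2:].strip() in DIRECTIONS:
            return line[2:].strip()
    return "sendrecv"


def set_direction(sdp: str, direction: str) -> str:
    """Rewrite every direction attribute (or append one) for a re-offer."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    out, replaced = [], False
    for line in sdp.splitlines():
        if line.startswith("a=") and line[2:].strip() in DIRECTIONS:
            out.append(f"a={direction}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"a={direction}")
    return "\r\n".join(out) + "\r\n"


if __name__ == "__main__":
    offer = ("v=0\r\no=- 1 1 IN IP4 192.0.2.10\r\ns=-\r\n"
             "c=IN IP4 192.0.2.10\r\nt=0 0\r\n"
             "m=audio 49170 RTP/AVP 0\r\na=sendrecv\r\n")
    hold_offer = set_direction(offer, "sendonly")
    print(media_direction(hold_offer))                    # sendonly
    print(ANSWER_FOR_OFFER[media_direction(hold_offer)])  # recvonly
```

Note that a real re-offer must also increment the version in the o= line, which this sketch leaves untouched. The point is that hold is an explicit, visible SIP/SDP exchange, so it should show up in a capture if it is being sent.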

deanwebb

To start the conversation: Microsoft will not budge from where they sit.

So, could the solution be to mute physically (on the headset) rather than in software, so that the unmuted software keeps sending media?
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

Dieselboy

That's the workaround, but with these new Jabra headsets that are "MS Teams certified", muting the headset in turn mutes Teams 🤣

I think I have got the PSTN provider to acknowledge that this is a fault on their side. I've suggested that they capture the traffic arriving from Microsoft to confirm what is being received. I think the MS Teams RFC list covers the SIP re-INVITE process, so if that is indeed what is coming in, why is it not getting to BT? Or is their SBC discarding it? I recall having to use SIP translation rules in the past to get some interop working.

By the way, the impact of this is high, as it affects every customer that has Teams.

Dieselboy

The latest on this is that it now appears intermittent. For some calls it works, but it is not yet known why.

New theory: the PSTN provider uses multiple SIP trunks for resilience and load balancing, and the applied configuration differs between them somewhere.

To be continued...

icecream-guy

:professorcat:

My Moral Fibers have been cut.

Dieselboy

Quote from: icecream-guy on May 31, 2022, 02:57:28 PM
need to do a packet capture or several, that will tell all.

For some reason the telco won't do it. They have a capture, but just one, and it seems to be from somewhere in their core network. When I ask for captures from multiple points, I get wishy-washy responses.

Dieselboy

The telco said they are unable to take multiple captures because the traffic on the MS Teams side is encrypted. :squint:


Anyway, this is resolved now. The provider uses an AudioCodes SBC, which has a specific option to always send media during silence, and I suggested that as a possible resolution. Another option, which I found online, was to "force transcoding". The provider did make a codec change, because they noticed a difference between the two resilient trunks to the upstream carrier (BT). When I mentioned transcoding as a possible solution, they realised that the few calls which didn't get disconnected while on mute were being transcoded anyway, because the upstream was advertising the wrong G.711 codec.
Forcing transcoding means that the media is terminated on the provider's side and then originated again from the provider towards the upstream carrier, so during a mute event the upstream carrier still receives RTP packets, which is why it works around the issue. We're unsure, but maybe "always send media during silence" does much the same thing.
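Why forced transcoding helps can be illustrated with a sketch of what an SBC does when it originates its own media leg: it keeps emitting an RTP packet every packetisation interval even if the inbound leg has gone quiet. The Python below builds 20 ms G.711 μ-law packets (PCMU, static payload type 0) filled with the μ-law silence byte 0xFF, using the RFC 3550 header layout. It is an assumption that an option like "always send media during silence" behaves this way; a real SBC might send comfort noise (RFC 3389) instead, and must keep sequence numbers and timestamps continuous with the stream it already started. The function names and values are hypothetical.

```python
import struct

PCMU_PAYLOAD_TYPE = 0    # static RTP payload type for G.711 u-law (RFC 3551)
SAMPLES_PER_20MS = 160   # 8000 samples/s * 0.020 s
ULAW_SILENCE = 0xFF      # u-law code for a zero-amplitude sample


def rtp_packet(seq: int, timestamp: int, ssrc: int, payload: bytes,
               payload_type: int = PCMU_PAYLOAD_TYPE) -> bytes:
    """Build a minimal RTP packet (RFC 3550): V=2, no padding/extension/CSRC."""
    first_byte = 2 << 6                 # version 2 in the top two bits -> 0x80
    second_byte = payload_type & 0x7F   # marker bit left at 0
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload


def silence_packets(ssrc: int, start_seq: int, start_ts: int, count: int):
    """Yield `count` consecutive 20 ms PCMU packets carrying silence."""
    payload = bytes([ULAW_SILENCE]) * SAMPLES_PER_20MS
    for i in range(count):
        yield rtp_packet(start_seq + i, start_ts + i * SAMPLES_PER_20MS,
                         ssrc, payload)


if __name__ == "__main__":
    # Hypothetical values; a real SBC continues the stream it already started.
    for pkt in silence_packets(ssrc=0x1234ABCD, start_seq=65534,
                               start_ts=8000, count=3):
        seq, ts = struct.unpack("!HI", pkt[2:8])
        print(len(pkt), seq, ts)
```

Each packet is 172 bytes (12-byte header plus 160 bytes of payload); the sequence number wraps from 65535 to 0 and the timestamp advances by 160 per packet. To a far-end inactivity timer, this stream is indistinguishable from a live but quiet call.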

Unfortunately I didn't get to hear:
a) what the actual resolution was
and
b) if ms teams are indeed signalling the mute signal in some way or not

For b): when an MS Teams call goes on mute, Microsoft must not simply stop sending traffic, because that is picked up as a lost call (hence the disconnection). Microsoft MUST be compliant in some way, such as by sending RTCP or a SIP re-INVITE, and I wasn't able to confirm that they do. That said, Microsoft publish in their docs a whole list of RFCs they have implemented for MS Teams Direct Routing, and included in there is "media call hold", which I believe covers the mute scenario. The MS docs explain that they signal this through a re-INVITE. Our provider was sure that they're not receiving this, so this is actually back with them and I probably won't be involved from here on.
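If the provider does capture the Microsoft-facing leg, one way to answer question b) is to look for an in-dialog INVITE (one whose To header already carries a tag) whose SDP puts the media on hold. The Python heuristic below checks a single SIP message given as text; it recognises a=sendonly and a=inactive, plus the older c=IN IP4 0.0.0.0 hold convention. The function name and sample message are hypothetical, only the "t" compact header form is handled, and it assumes the signalling has already been decrypted: Direct Routing SIP runs over TLS, so it has to be inspected on the SBC or exported from it.

```python
# SDP lines that signal hold (the 0.0.0.0 form is the older convention).
HOLD_SDP_LINES = ("a=sendonly", "a=inactive", "c=IN IP4 0.0.0.0")


def is_hold_reinvite(message: str) -> bool:
    """Heuristic: True for an in-dialog INVITE whose SDP signals hold."""
    head, _, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    if not lines[0].startswith("INVITE "):
        return False                    # not an INVITE request at all
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    to_header = headers.get("to") or headers.get("t", "")
    # A tag on the To header means the dialog exists, i.e. a re-INVITE.
    if ";tag=" not in to_header.replace(" ", "").lower():
        return False
    return any(line.strip() in HOLD_SDP_LINES for line in body.splitlines())


if __name__ == "__main__":
    reinvite = (
        "INVITE sip:b@sbc.example.com SIP/2.0\r\n"
        "From: <sip:a@example.com>;tag=1\r\n"
        "To: <sip:b@example.com>;tag=2\r\n"
        "Call-ID: demo@example.com\r\n"
        "CSeq: 2 INVITE\r\n"
        "Content-Type: application/sdp\r\n"
        "\r\n"
        "v=0\r\nm=audio 49170 RTP/AVP 0\r\na=sendonly\r\n"
    )
    print(is_hold_reinvite(reinvite))                                      # True
    print(is_hold_reinvite(reinvite.replace("a=sendonly", "a=sendrecv")))  # False
```

Running this over every INVITE in a capture taken at the moment someone presses mute would show directly whether a hold-style re-INVITE was received, which is the question the provider and Microsoft disagreed on.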

Looking around the web, and with another PSTN provider we tested, there are a fair number of people complaining of this issue, some not really knowing why their calls were disconnected. When we raised it with Microsoft, they said it's completely normal and not an issue, but I think this was just a case of a support engineer not knowing what to do and/or trying to close a stale ticket.

Dieselboy

Just trying to remember this for my notes.
I found an old forum post, from 2016 I think, where someone had this exact issue with calls dropping and was able to resolve it by adjusting their AudioCodes SBC. They posted that forcing "transcode always" resolves the issue, as does the other option, "always send media during silence". Depending on the software version, these options may or may not be available. I've never touched an AudioCodes device, so I'm not sure what the current software version is, but the 2016 post stated that v6.9 has "transcode always", while "always send media during silence" is only available from v7.1+.

Sort of found the links:
https://flinchbot.com/ucnow/index.php/2016/11/29/bt-sip-trunk-calls-dropping-after-2-minutes-on-mute/

The original website that hosted the forum is still down, as it was back in June, so the only way I could view the content was through the Wayback Machine:
https://web.archive.org/web/20161228204721/http://www.gecko-studio.co.uk/bt-sip-trunk-calls-dropping-after-2-minutes-on-mute/

deanwebb

Would be interesting to know whether this is also your fix, or whether things have changed so that it isn't.

Dieselboy

Yes indeed, though at that point I was just grateful it was fixed. It was not easy to make progress, and with the ITSP, the telco and even Microsoft stating that it was "normal and no issue, closing case", I was grateful to have made any progress at all.
It did get to the point where I was leaning towards moving all of our impacted customers away from that design onto something that worked for them. However, when we tested another ITSP we had intermittent call disconnects anyway, probably due to the same issue on the back end.