WebRTC enables real-time communication through the web, while SIP is a protocol commonly used for initiating and maintaining real-time communication sessions, particularly in telephony networks.
Bridging WebRTC with SIP is essential in many industries, such as remote healthcare, education, and customer support, where modern video solutions must communicate with telephony infrastructure at scale. The integration of WebRTC-based video conferencing with legacy SIP-based systems enables seamless communication across platforms and devices. In this presentation, we will talk about lessons learned and explore different approaches to bridging WebRTC and SIP, discussing their advantages and disadvantages.
2. Agenda
@WebRTCventures
• WebRTC and SIP
• Why?
• Integration Challenges
• Architectural options
• New WebRTC and SIP integrated applications
• Q&A
Bridging WebRTC with SIP!
3. WebRTC
An open framework that enables Real-Time Communications (RTC) capabilities in
apps and in the browser. Officially standardized in 2021.
[Chart: latency vs. other low-latency protocols]
4. SIP
SIP (Session Initiation Protocol) is a signaling protocol used for initiating, modifying,
and terminating multimedia communication, such as voice and video calls, over IP
networks. SIP is widely adopted in VoIP and UC (Unified Communications) systems.
5. Why use both technologies?
• WebRTC excels at browser and mobile RTC.
• WebRTC provides an easy-to-use library for P2P media exchange.
• SIP brings extensive interoperability. It handles the session establishment and call control needed for integration with existing telephony systems.
7. We should care: what are the differences between SIP/VoIP and WebRTC?
Some interoperability differences between SIP and WebRTC; (o) means optional.
8. Architecture philosophies for your apps to integrate with VoIP
• Use SIP as much as you can
• Use your preferred signaling protocol with WebRTC. Translate to SIP only when necessary.
If you don’t need to host the infrastructure, you could just use a CPaaS that provides this functionality (you can ALMOST forget about SIP and WebRTC)
10. Architecture Option A, just use SIP end to end!
Pros:
• Each participant downloads only one stream each for audio and video
Cons:
• Needs a central SIP server (MCU) that mixes all audio and video. Heavy processing is required on the MCU, but bandwidth requirements are more predictable
• Layout limitations, since layouts are managed server side
• Complex, SIP knowledge required
[Diagram: clients connected to a central MCU over SIP/RTP]
11. Architecture Option B, WebRTC-SIP PBX and your signaling top choice
Pros:
• Same as with A, each participant downloads one stream each for audio and video if using an MCU
• Easier for developers not familiar with SIP, who can use other, more dev-friendly protocols like WS and JSON
Cons:
• Same as with A, needs a central SIP server that mixes all audio and video. High server CPU requirements.
• Layout limitations
[Diagram: clients connected over WS/WebRTC to a central server labeled “?”]
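Option B's idea of speaking JSON over WebSocket to the app and SIP only on the other leg can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the JSON message shape (`type`, `from`, `to`, `sdp`), the function name, and the `gateway.example.invalid` host are hypothetical, and a real PBX or gateway would also manage transactions, authentication, and SDP adaptation between the WebRTC and SIP legs.

```python
import json
import uuid


def json_call_to_sip_invite(message: str,
                            local_host: str = "gateway.example.invalid") -> str:
    """Translate a hypothetical JSON 'call' message into a SIP INVITE request.

    Only the mandatory RFC 3261 request headers are produced; this is not a
    full SIP stack.
    """
    msg = json.loads(message)
    if msg.get("type") != "call":
        raise ValueError("only 'call' messages are translated in this sketch")

    sdp = msg["sdp"]
    body_length = len(sdp.encode("utf-8"))  # Content-Length counts bytes

    call_id = f"{uuid.uuid4().hex}@{local_host}"
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie
    from_tag = uuid.uuid4().hex[:8]
    caller_user = msg["from"].split("@")[0]

    lines = [
        f"INVITE sip:{msg['to']} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{msg['from']}>;tag={from_tag}",
        f"To: <sip:{msg['to']}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller_user}@{local_host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {body_length}",
    ]
    # Headers and body are separated by an empty line (CRLF CRLF).
    return "\r\n".join(lines) + "\r\n\r\n" + sdp


if __name__ == "__main__":
    example = json.dumps({
        "type": "call",
        "from": "alice@example.invalid",
        "to": "bob@example.invalid",
        "sdp": "v=0\r\n",
    })
    print(json_call_to_sip_invite(example))
```

The point of the sketch is the split of responsibilities: app developers only ever see the JSON message, while the SIP details stay inside the server.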
12. Architecture Option C, WebRTC media server to SIP
Pros:
• Lighter server-side processing if we use an SFU WebRTC media server for video
• Leverage capabilities of WebRTC
Cons:
• Each participant downloads one stream per remote participant
• Distributed architecture complexities
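The download trade-off between the MCU options (A and B) and the SFU option (C) can be made concrete with a rough calculation. This is only a sketch under stated assumptions: the bitrates (1500 kbps for a composited MCU stream, 500 kbps per forwarded SFU stream) are illustrative numbers rather than measurements, the function names are hypothetical, and a participant is assumed not to receive its own stream back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownlinkEstimate:
    video_streams: int
    audio_streams: int
    video_kbps: int


def mcu_downlink(participants: int, composite_kbps: int = 1500) -> DownlinkEstimate:
    """MCU: the server mixes everything, so each client downloads one stream each."""
    if participants < 1:
        raise ValueError("participants must be >= 1")
    return DownlinkEstimate(video_streams=1, audio_streams=1,
                            video_kbps=composite_kbps)


def sfu_downlink(participants: int, per_stream_kbps: int = 500) -> DownlinkEstimate:
    """SFU: the server forwards streams, so each client downloads one per remote peer."""
    if participants < 1:
        raise ValueError("participants must be >= 1")
    remote = participants - 1
    return DownlinkEstimate(video_streams=remote, audio_streams=remote,
                            video_kbps=remote * per_stream_kbps)


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, mcu_downlink(n), sfu_downlink(n))
```

With these assumed numbers the MCU download stays flat as the call grows, while the SFU download grows linearly, which is why simulcast and layer selection matter for larger calls.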
13. Architecture Option C. A real world example of a media flow with hybrid SFU for video and MCU for audio
14. Real world situations, existing WebRTC solutions that need SIP
Bringing in a headless service to provide SIP audio and video on demand…
15. What’s next in the future of telephony and WebRTC integrations?
• Advanced call insights
• Omnichannel
• Chat and voice bots
• 2D/3D virtual conferences and avatars
17. Thank you!
Learn more about us:
https://webrtc.ventures
Follow us:
@WebRTCventures
@lbertogon
Editor's Notes
We are experts in building, integrating, testing and managing live video and chat applications for web and mobile.
We have worked quite a bit with these technologies, and I am going to talk about why and how these two technologies interconnect.
Some examples of situations where you might need sub-second latencies:
Video chat: when you want to have an interactive call
Professional events: online meetings and webinars
Live broadcasting: when you want to broadcast an event or a sports game to large audiences with minimum latency (Super Bowl)
Telepresence: when you want to remotely operate trucks, drones, cars, etc.
Gaming: when you want to send the visuals of a game to another player in real time; also online betting or auctions
IoT endpoints: endpoints like ATMs, kiosks, bus stops, and vending machines can be embedded with WebRTC engines
Session Initiation Protocol
Combining WebRTC and SIP allows leveraging the strengths of both technologies. Useful for:
Integrating WebRTC-based video conferencing solutions with legacy video conferencing systems
Connecting WebRTC-based voice or video calls to traditional PSTN networks
So, should we care?
I somewhat agree with the generic answer provided here. It is difficult to predict whether SIP will still exist in the next decade, but it is highly probable. The main reason, as highlighted in bullet point number three, is the well-established ecosystem in telephony infrastructures.
It seems we aren’t going to get a definitive answer from ChatGPT… but it looks like WebRTC and SIP aren’t going anywhere soon.
Integrating these two technologies isn’t that simple…
What do these differences mean to you? Well, for one, you might need to transcode media if, for example, H264 is used for your VoIP infrastructure and VP8 for web apps.
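One way to spot the transcoding case early is to compare the video codecs offered on each leg. The sketch below assumes both sides are described by SDP text and only looks at `a=rtpmap` lines inside the `m=video` section; the function names are hypothetical, and real negotiation also depends on profiles and `fmtp` parameters that are ignored here.

```python
import re

# Matches lines such as "a=rtpmap:96 VP8/90000".
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([A-Za-z0-9\-]+)/(\d+)")

# Retransmission and FEC payloads are not real video codecs, so skip them.
NON_MEDIA = {"RTX", "RED", "ULPFEC", "FLEXFEC-03"}


def video_codecs(sdp: str) -> set:
    """Return the upper-cased video codec names found in the m=video section."""
    codecs = set()
    in_video = False
    for line in sdp.splitlines():
        if line.startswith("m="):
            in_video = line.startswith("m=video")
        elif in_video:
            match = RTPMAP.match(line)
            if match and match.group(2).upper() not in NON_MEDIA:
                codecs.add(match.group(2).upper())
    return codecs


def needs_transcoding(sdp_a: str, sdp_b: str) -> bool:
    """True when the two legs share no video codec at all."""
    return not (video_codecs(sdp_a) & video_codecs(sdp_b))


if __name__ == "__main__":
    web_leg = ("v=0\r\nm=video 9 UDP/TLS/RTP/SAVPF 96 97\r\n"
               "a=rtpmap:96 VP8/90000\r\na=rtpmap:97 rtx/90000\r\n")
    voip_leg = "v=0\r\nm=video 5004 RTP/AVP 99\r\na=rtpmap:99 H264/90000\r\n"
    print(needs_transcoding(web_leg, voip_leg))
```

If the check returns true, either one side has to be reconfigured to offer a common codec or a media server has to transcode, which adds CPU cost and latency.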
Another real-world integration issue happened to us when dealing with Cisco Z70 devices, a hardware videophone that uses SIP video.
When we missed the initial frame, we wouldn’t get video. This happened because of implementation differences on each leg!
The SIP implementation of these devices didn’t automatically handle keyframe generation as WebRTC does, so we had to send a SIP INFO message with an XML Schema for Media Control body to trigger a keyframe from the videophone.
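For reference, the body of such a keyframe request can be sketched as follows. This assumes the message follows RFC 5168 (XML Schema for Media Control), where a `picture_fast_update` primitive is carried with the `application/media_control+xml` content type; the helper names are hypothetical, and sending the INFO inside the existing SIP dialog is left to whatever SIP stack is in use.

```python
import xml.etree.ElementTree as ET

MEDIA_CONTROL_CONTENT_TYPE = "application/media_control+xml"


def picture_fast_update_body() -> str:
    """Build the XML body that asks the remote encoder for a new keyframe."""
    root = ET.Element("media_control")
    primitive = ET.SubElement(root, "vc_primitive")
    to_encoder = ET.SubElement(primitive, "to_encoder")
    ET.SubElement(to_encoder, "picture_fast_update")
    xml = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + xml


def info_headers(body: str) -> dict:
    """Headers describing the body of the SIP INFO request."""
    return {
        "Content-Type": MEDIA_CONTROL_CONTENT_TYPE,
        "Content-Length": str(len(body.encode("utf-8"))),
    }


if __name__ == "__main__":
    body = picture_fast_update_body()
    print(info_headers(body))
    print(body)
```

A gateway would typically send this when the WebRTC side asks for a keyframe (for example after packet loss or when a new viewer joins), so the request is translated rather than dropped at the SIP leg.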
In the past we worked with companies with two very clearly identifiable architecture preferences for these integrated solutions…
SIP everywhere
SIP only when absolutely necessary
VERY high-level view of 3 options; keep in mind that I didn’t include the SIP proxies or SBCs that are common in telephony architectures
Bottom, where client apps or web apps connect directly via SIP, using RTP or WebRTC
Top, where clients also connect directly, but an alternative signaling mechanism is used (SIP is only spoken on the other leg of the PBX)
Left, a hybrid approach with WebRTC media servers later connected with the PBX
A messaging server would be ideal, but if that is not possible we might need to use SIP from the legacy SIP devices… and handle the conversion.
For simplicity not included in diagram.
An MCU (Multipoint Conferencing Unit) controls a composited layout of the video for everyone, which can be nice but also introduces latency.
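To give a feel for what "a composited layout" means, here is a minimal sketch of the geometry an MCU-style mixer has to decide before it draws each participant onto one output frame. The equal-tile grid, the 1280x720 canvas, and the function name are assumptions for illustration; real MCUs offer other layouts (active speaker, picture-in-picture) and handle aspect ratios.

```python
import math
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in pixels


def grid_layout(count: int, canvas_w: int = 1280, canvas_h: int = 720) -> List[Rect]:
    """Place `count` participants on an equal-tile grid, row by row."""
    if count < 1:
        return []
    cols = math.ceil(math.sqrt(count))
    rows = math.ceil(count / cols)
    tile_w = canvas_w // cols
    tile_h = canvas_h // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i in range(count)]


if __name__ == "__main__":
    for rect in grid_layout(5):
        print(rect)
```

Every participant receives the same composited frame, which is why the layout is decided on the server and cannot easily be customized per viewer.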
PJSIP is hard, mobile native SIP is hard, and there are no easy-to-use demos out there
WebRTC-SIP gateway, which integrates WebRTC and SIP into a single platform
Eliminates the need for an intermediary gateway, BUT may not be as flexible. If I want, for example, to initialize calls or conferences, I will need to follow whatever the PBX implemented
“?” because the server could be an SFU, although that is not common with legacy SIP networks that don’t support it.
Suitable only when all the endpoints are using the same codec, for example
A common approach is to use WebRTC SFUs for video and MCU for audio only.
Describe SFU architecture... Selective Forwarding Unit
Simulcast allows WebRTC clients to publish multiple versions of the same source. Critical for large calls!
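What the SFU does with those multiple versions can be sketched as a simple selection rule: forward the best layer that fits each receiver's estimated bandwidth. The layer names (`q`, `h`, `f`), resolutions, and bitrates below are assumptions chosen for illustration, not values from any particular media server.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SimulcastLayer:
    rid: str      # layer identifier chosen by the publisher
    height: int   # vertical resolution in pixels
    kbps: int     # approximate bitrate of the layer


# Hypothetical three-layer ladder: quarter, half and full resolution.
DEFAULT_LAYERS = (
    SimulcastLayer("q", 180, 150),
    SimulcastLayer("h", 360, 500),
    SimulcastLayer("f", 720, 1500),
)


def pick_layer(available_kbps: int,
               layers: Sequence[SimulcastLayer] = DEFAULT_LAYERS
               ) -> Optional[SimulcastLayer]:
    """Return the highest-bitrate layer that fits, or None if none fits."""
    fitting = [layer for layer in layers if layer.kbps <= available_kbps]
    if not fitting:
        return None
    return max(fitting, key=lambda layer: layer.kbps)


if __name__ == "__main__":
    for estimate in (100, 600, 5000):
        print(estimate, pick_layer(estimate))
```

Because the choice is made per receiver, one participant on a weak connection does not force everyone else down to the lowest quality.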
Opus RED is also useful: it sends redundant audio to handle packet loss
Noise reduction and echo cancellation are built in
More complex logic to manage and challenges like monitoring complexity due to having a more distributed infrastructure
SIP Video not possible as described! We need an MCU or a system acting as an MCU to support SIP video
(20:00 min, if more skip) Using the Janus API and FreeSWITCH mod_verto to manage the offer/answer and other signaling mechanisms
There are still challenges with this approach. One would be the detection of the active speaker. To solve that we’d need to detect who is speaking before audio mixing. (esl)
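Detecting the speaker before mixing can be sketched as comparing per-participant audio levels on the unmixed streams. This is a generic illustration and not the FreeSWITCH mechanism mentioned in the note: it assumes access to one frame of signed 16-bit PCM samples per participant, and the RMS threshold of 500 is an arbitrary assumed value.

```python
import math
from typing import Dict, Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def active_speaker(frames: Dict[str, Sequence[int]],
                   threshold: float = 500.0) -> Optional[str]:
    """Return the loudest participant id, or None if everyone is below threshold."""
    levels = {participant: rms(pcm) for participant, pcm in frames.items()}
    if not levels:
        return None
    loudest = max(levels, key=levels.get)
    return loudest if levels[loudest] >= threshold else None


if __name__ == "__main__":
    frame = {
        "alice": [0, 10, -10, 5],
        "bob": [4000, -4000, 3000, -3500],
    }
    print(active_speaker(frame))
```

A production system would smooth the decision over several frames to avoid flickering between speakers, but the key point stands: the comparison has to happen before the streams are summed into one mix.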
SIP video could be possible too, but with some changes: doing video mixing in FreeSWITCH or in the WebRTC media server
Cool, now we can communicate, but is there anything else to it?
Once we have figured out the architecture, that’s not all. There are more things that can be built around these integrations…
(21min or more? skip!!)
Definitely not the most efficient, but powerful and less complex to do. It doesn’t require any changes to the existing WebRTC infrastructure
Cool, so if I haven’t confused you yet with so many options to do this: we can now communicate, but is there anything else to it?
---
flowchart TB
    subgraph SFUServer[SFU Server]
        SFU((SFU Server))
    end
    subgraph User1[User 1]
        A((User 1))
    end
    subgraph User2[User 2]
        B((User 2))
    end
    subgraph Headless[Headless Web Server]
        C(("Headless WebRTC<br/>client"))
        CLayout(("Layout Generation<br/>with Webcodecs"))
        CGateway((WebRTC to SIP service))
    end
    subgraph IPPBX["IP/PBX"]
        PBX(("IP/PBX"))
    end
    A -->|WebRTC| SFU
    B -->|WebRTC| SFU
    SFU -->|WebRTC| A
    SFU -->|WebRTC| B
    C -->|WebRTC| SFU
    SFU -->|WebRTC| C
    C -->|Media| CLayout
    CLayout -->|WebRTC| CGateway
    CGateway -->|RTP| PBX
    style Headless fill:#E6C91E
(21:00 min)
Advanced Call Insights:
Real-time analytics provide deep insights into call interaction, quality, user engagement, and performance metrics.
For example: HR audio interviews, healthcare, etc.
Leveraging data for continuous improvement in call experiences (call centers for example)
Omnichannel Communication:
Smooth transitions across various communication channels (voice, video, text) while maintaining context.
Enhances user experience and reduces friction
Chat and Voice Bots: (we saw it in the presentations on Tuesday by the SignalWire team, for example)
Integration of AI-powered chatbots and voice bots for enhanced customer interactions, e.g. call centers
Automated responses and intelligent routing streamline support and engagement.
2D/3D Virtual Conferences and Avatars:
Engaging virtual conferences enriched with 2D and 3D avatars for more immersive interactions.
More common for gaming and social verticals, and also for one-to-many streaming scenarios. But I will show a case where they could be used for calls…
I wanted to share some screenshots showcasing these use cases, but I thought a short 3-minute video with 4 demos would be better
Keep in mind that some are integrations with 3rd party platforms or cloud providers
Call agent real time and post call insights
Chat bots for scheduling
Voice bots for management of schedules
Audio avatars
Uses three-vrm npm module
That brings us to the end of the presentation. I’d like to summarize some main tips:
Use cases: one to one, multiparty, webinars, one-to-many broadcast, etc. That will influence which architecture you use. Or maybe you don’t even need low-latency streaming?
Don’t underestimate the challenges of the infrastructure setup: low-latency systems require faster-than-usual autoscalable services, and we need to make sure we recover quickly from failure to meet demand
What are you willing to compromise? Quality, latency, battery performance, cost? Do you really need low latency?
Optimize: architecture changes and some optimizations can make the difference in a product: codecs, configuration options like SVC, hints for the type of media…
With that, here concludes my presentation about Bridging WebRTC with SIP.
I wanted to conclude with some of those quick demos as some food for thought about how SIP and WebRTC can be integrated, along with some integration examples from today.
Just like the interconnected gears in a well-oiled machine, the convergence of these technologies empowers industries like healthcare, education, and customer support.
If you are interested in these topics, feel free to ping me directly on Twitter or follow us on any social platform!
Thank you