Telecom of Tomorrow
TTI/Vanguard - The Power of Networks
We Still Live in 1938
In the 1930s, Bell Labs conducted studies on human hearing.
They selected 300 Hz to 3400 Hz as a reasonable band for voice quality.
This choice was enshrined in the digital telephony era via the 16-bit, 8 kHz mono PCM format.
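As a rough illustration of why the 8 kHz format locks in the narrow band, the sketch below computes the Nyquist limit and the raw bitrate for the format named on the slide. The function names are illustrative only and the figures are for uncompressed linear PCM; real telephony codecs may pack samples differently.

```python
# Format parameters as stated on the slide.
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 16
CHANNELS = 1  # mono


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a sampled signal can represent (half the sample rate)."""
    return sample_rate_hz / 2


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw bitrate of uncompressed linear PCM, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    print("Nyquist limit (Hz):", nyquist_hz(SAMPLE_RATE_HZ))
    print("Raw bitrate (bit/s):", pcm_bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS))
```

At 8 kHz sampling nothing above 4000 Hz can be represented at all, so the 3400 Hz upper edge of the 1930s band fits with a small margin, and anything higher is lost before the network ever sees it.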
Intelligibility
Original Primary Concern
“Frequency limitation is essentially an economic one, subject to
change as conditions change.”
- A.H. Inglis, 1938
Why are we tethered to 83+-year-old
economic considerations?
Important information is contained in the ranges below 250 Hz and above 4000 Hz
- Subjects could distinguish talking from singing, and determine the sex of the speaker, with reasonable accuracy when all data below 5000 Hz was removed
Previous models didn’t contend with loud ambient noises, conference rooms, or kids on Zoom school in the next room.
Critically important with the rapid rise of remote collaboration amongst all genders/ages/languages
Speech quality as conceived by H.W. Gierlich, which can be extended to video
Focus on Overall Quality
[Diagram: Overall Quality depends on Sound Quality & Naturalness, Listening Effort, Talking Effort, Conversational Effort, Double Talk, Speech & Video Characteristics, Expectation, Network Conditions, Background Noise, Intelligibility (the original primary concern), and Video Quality.]
These new requirements need to be
baked into the network
2020s Telecom Network
- Loosely coupled to legacy PSTN networks
- Elastic, running distributed on any
compute/networking equipment worldwide
- Uses a vast array of connectivity paths
- Can provide 1-to-1 or centralized mixed audio/video
- Programmability via APIs that provide rich, seamless
control and metadata
Programmability
All features and metadata accessible via API
- Allow for complete global command & control of all resources
- Realtime contextual information about on-going calls, conferences, or access
- Get distance pings for all participants, geolocation information, realtime network analytics
- Full command & control of conference layout, participant actions, and settings
The Network Must Satisfy Quality
& Programmability Requirements
Telecom of Tomorrow
Background Noise
- Suppression as a built-in commodity
Telecom of Tomorrow
Background Noise
- Suppression as a built-in commodity
Double Talk
- Excellent echo cancellers in WebRTC
- Full room cancellation
Use Case - Hybrid Work
As companies return to the office, how do we elegantly engage remote employees?
Telecom of Tomorrow
Background Noise
- Suppression as a built-in commodity
Double Talk
- Excellent echo cancellers in WebRTC
- Full room cancellation
Network Conditions
- Latency as a top-tier concern
- ML + SD-WAN
Recommended Latency
VoLTE defines the requirement for voice call latency as 100 ms or less (one-way), and for VoLTE video latency as 150 ms
Speed of Sound
343 m/s @ 20 °C
Maximum Latency is
1 Arc de Triomphe
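One way to read this comparison is to ask how far sound travels through air during each latency budget. The sketch below does that arithmetic using the 343 m/s figure from the slide. The Arc de Triomphe height of roughly 50 m is an assumption that does not appear in the slides, and the function name is illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0   # from the slide, at 20 °C
ARC_DE_TRIOMPHE_HEIGHT_M = 50.0  # assumption: approximate height, not from the slides


def sound_distance_m(latency_ms: float) -> float:
    """Distance sound travels in air during the given one-way latency."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000.0


if __name__ == "__main__":
    for label, latency_ms in (("VoLTE voice", 100.0), ("VoLTE video", 150.0)):
        distance = sound_distance_m(latency_ms)
        arcs = distance / ARC_DE_TRIOMPHE_HEIGHT_M
        print(f"{label}: {latency_ms:.0f} ms is about {distance:.0f} m of air, or {arcs:.2f} arcs")
```

Under these assumptions the 100 ms voice budget corresponds to about 34 m and the 150 ms video budget to about 51 m, which is roughly the height of the monument. In other words, a call at the maximum recommended latency feels like talking to someone standing one Arc de Triomphe away.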
Intelligent Network Routing
- Utilize Cloud, Near-Edge, and Edge nodes to actively manage
participants, with a 25 ms one-way latency goal
- Location of centralized muxing can be moved during a conversation
- Optimize location for greatest participant happiness
- Constantly examine the state of the network to provide optimal paths
- ML-based network analysis can detect disruptions or uncover path
optimizations on a per-endpoint basis
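The idea of moving the mixing location can be sketched as a small selection problem. The example below is a minimal sketch that assumes "participant happiness" can be approximated by the worst one-way latency any participant sees; that proxy, the node names, and the measurements are all hypothetical and not taken from the talk. Re-running the selection whenever measurements change would model moving the mixer mid-conversation.

```python
from typing import Dict

LATENCY_GOAL_MS = 25.0  # one-way latency goal named on the slide

# measurements[node][participant] = measured one-way latency in milliseconds.
Measurements = Dict[str, Dict[str, float]]


def worst_case_latency(measurements: Measurements, node: str) -> float:
    """Largest one-way latency from the node to any participant."""
    return max(measurements[node].values())


def pick_mixer_node(measurements: Measurements) -> str:
    """Choose the node that minimizes the worst participant latency.

    Raises ValueError if no candidate nodes are given.
    """
    return min(measurements, key=lambda node: worst_case_latency(measurements, node))


def meets_goal(measurements: Measurements, node: str) -> bool:
    """True when every participant is within the one-way latency goal."""
    return worst_case_latency(measurements, node) <= LATENCY_GOAL_MS


if __name__ == "__main__":
    # Hypothetical node names and latencies, for illustration only.
    sample = {
        "cloud-us-east": {"alice": 18.0, "bob": 72.0, "chen": 95.0},
        "near-edge-paris": {"alice": 60.0, "bob": 12.0, "chen": 40.0},
        "edge-frankfurt": {"alice": 66.0, "bob": 9.0, "chen": 24.0},
    }
    best = pick_mixer_node(sample)
    print(best, worst_case_latency(sample, best), meets_goal(sample, best))
```

Minimizing the worst case is only one possible objective. A production system might weight active speakers more heavily or add hysteresis so the mixer does not bounce between nodes.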
Use Case - Karaoke
Distributed Video Compute
Centralized muxing for a broadcast-level experience
- Everyone receives the same video & audio experience
- Can be siloed, air-gapped, or run independent of centralized command & control when necessary
- Works for 1000+ people in an interactive experience
- GPU offloading, dedicated video cores allow for massive expansion
- Scaling to unlimited number worldwide
- FHE (fully homomorphic encryption) to allow for edge-based node computation without risk
Can take in any format (SIP, PSTN, NDI, H.264, VP8/9, AV1) and bridge them together
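To make the audio side of centralized muxing concrete, the sketch below sums one frame of 16-bit samples from several participants into a single mix that everyone would receive. It is a deliberately simple illustration, not the mixer described in the talk: it assumes all inputs are already decoded to the same sample rate and frame length, and it uses hard clipping where a real mixer would more likely apply gain control.

```python
from typing import List, Sequence

INT16_MIN = -32768
INT16_MAX = 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Mix equal-length frames of signed 16-bit samples into one frame.

    Samples are summed position by position and clipped to the int16 range.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")

    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Hard clip so the result still fits in a 16-bit sample.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


if __name__ == "__main__":
    participant_frames = [
        [1000, -2000, 300],
        [500, 30000, -300],
        [0, 10000, 0],
    ]
    print(mix_frames(participant_frames))
```

Because the work per output sample grows with the number of participants, this is the kind of loop that benefits from the offloading and distribution described above once the participant count reaches the thousands.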
Use Case - Interactive Concerts / Events
- Musicians can host major concerts with the audio of thousands mixed and reproduced live
- Sports venues can create stadium audiences from at-home fans
- Jam sessions or Behind-the-Scenes moments that are in real-time and organic
Telecom of Tomorrow
Background Noise
- Suppression as a built-in commodity
Double Talk
- Excellent echo cancellers in WebRTC
- Full room cancellation
Network Conditions
- Latency as a top-tier concern
- AI + SD-WAN
Listening / Talking / Conversational Effort
- Cognitive challenges of degraded speech are significant
Jonathan Peelle, Department of Otolaryngology, Washington University in Saint Louis
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5821557
- Hardware Improvements - beamforming, higher quality cameras
- Broadcast-quality audio mixing
- Realtime speaker diarization/translation
Seamlessly engage and disengage from conversations
Use Case - Remote Work / Interaction
- Instant-on/instant-off video and audio allow for seamless interaction across language boundaries
- Health/Wellness - Camera sensor systems for health monitoring, Virtual AI Doctor
- Work - Remote Working Teams, Realtime Translation with Language Generation
- Fitness - Community-based VR interaction
- Remote Learning - Enabling hyperlocal/global opportunities
Telecom of Tomorrow
Background Noise
- Suppression as a built-in commodity
Double Talk
- Excellent echo cancellers in WebRTC
- Full room cancellation
Network Conditions
- Latency as a top-tier concern
- AI + SD-WAN
Listening / Talking Effort
- Cognitive challenges of degraded speech are significant
- Hardware Improvements
- Broadcast-quality audio mixing
Sound / Video Quality
- Better Codecs
- ML Augmentation
Use Case - Field Workers / Servicemen
- Endpoints can all be connected to a mesh drone and/or direct-to-satellite (base stations in space)
- A/V shared amongst teams while disconnected from central C&C
- HD resolution stored locally while a different resolution can be transmitted
- Information about the client endpoints (approx. distance, etc.) is sent over side data channels
Machine Learning
Super Resolution Video
- Upscale poor endpoint performance in realtime or in post-production.
Audio Optimization
- Compensate on both the server and client side for poor audio / dropouts
Voice Deconstruction/Reconstruction
- Original voice speech models to allow for multi-modal, ultra-bandwidth-constrained, or other conditions
where text is preferred.
Telecom of Tomorrow
Background Noise
- Suppression as a built-in commodity
Double Talk
- Excellent echo cancellers in WebRTC
- Full room cancellation
Network Conditions
- Latency as a top-tier concern
- AI + SD-WAN
Listening / Talking Effort
- Cognitive challenges of degraded speech are significant
- Hardware Improvements
- Broadcast-quality audio mixing
Intelligibility / Sound Quality
- Better Codecs
- Better hardware
Expectation
Perfection
2028 doesn’t need to be
like 1938
Evan McGee
CTO & Founder @
Twitter / Instagram / Reddit: @startledmarmot
SignalWire - Telecom of Tomorrow
