This document summarizes an Interactive Voice Conference presentation on voice processing theory and algorithms for successful smart speakers and voice-enabled products. The agenda covers voice recognition algorithms, audio front-end processing, trigger word detection, beamforming, noise reduction, acoustic echo cancellation, and considerations for microphone and loudspeaker integration in product design, along with performance metrics and the factors that affect each voice processing technique.
To help newcomers to SolidWorks grasp the software's features, the Advance CAD training center is sharing the contents of its SolidWorks training curriculum so that more people can teach themselves.
Full immersion is achieved by simultaneously focusing on the broader dimensions of visual quality, sound quality, and intuitive interactions. This presentation discusses how:
- Technology improvements continue to drive more immersive experiences, especially for VR and AR
- High Dynamic Range (HDR) will enhance the visual quality on all our screens
- Scene-based audio is a new paradigm for 3D audio
- Natural user interfaces like voice, gestures, and eye tracking are making interactions more intuitive
New code requirements in the USA and Canada are coming into effect that will mean changes to voice communication systems. This presentation by David Sylvester of the Mircom Group of Companies discusses the changes and what they will mean for the life safety & fire protection industries.
Sound Matters in Multiscreen Entertainment Delivery - TVNext 2012 (Ellis Reid)
This presentation, given by JC Morizur at TVNext 2012, outlines how service providers can use high quality multichannel audio to help them improve the quality of experience (QoE) and quality of service (QoS) of their multiscreen entertainment offerings. Topics covered in this presentation include the following:
• What consumers are looking for in their multiscreen viewing and listening experience.
• How delivering a high quality multichannel audio experience enhances overall video QoE.
• How providers are using adaptive audio and video bit-rate switching that seamlessly adjusts to bandwidth availability while maintaining a high quality experience.
• Examples of how some vendors and service providers are using sound to differentiate their products and services.
• How the audio experience impacts ad delivery and content monetization.
"Embracing Web 2.0 and New Media Communications" (Aaron Rester)
Presentation by Renee Basick, Interim Director, Chicago Media Initiatives Group (University of Chicago) and Aaron Rester, Manager of Electronic Communications, University of Chicago Law School. Presented Monday, December 10th, at the CASE V Conference "Connecting the Best" in Chicago, Illinois.
HD Voice: The Hurdles and How to Overcome the Codec War (John Gallagher)
Global IP Solutions (GIPS) held a webinar on July 14, 2009.
"It’s all Gone HD! Overcoming the hurdles to supplying HD Voice and resolving the Codec war"
A COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptx (JessicaWein1)
MaestroVision's CEO, Claude Turcotte, has over 20 years of experience installing hundreds of interview recording systems for police departments and child advocacy centers in the US & Canada. He has used his audio/video and broadcasting expertise to create this insightful (unbiased) presentation, which will help you find the right interview recording system, including:
- How to select your equipment
- What features to look for in your recording software
- What security features are a must for protecting your recordings
- What minimum capabilities should be required when submitting a bid for an interview room
- How to detail the room to ensure optimal sound and video quality, and more!
(Useful for Police, Detectives, Investigators, Child Advocacy Centers, Victim Service Nonprofits, and more!)
Email jwein@maestrovision.com to book your presentation today!
RTASC Lite - Real Time Audio System Check Lite (Dru Wynings)
RTASC helps you verify hardware and software functionality of an audio product. It's specifically targeted at OEMs that make Voice-enabled products with both microphones and loudspeakers.
Has video really killed the audio star? (Cisco Canada)
Video has drastically transformed the way we work with remote teams, business partners and customers. We have gone from faceless “who just joined?” audio only solutions to HD quality “better than being there” video options that foster active participation no matter your location or device.
Cisco has modernized and simplified our Video solutions. In this session, we will cover our repertoire of end points (from the pocket to the boardroom), the infrastructure that powers these end points (cloud, hybrid and on-premise), and the integration with other collaboration tools and applications (interoperability with Cisco and other vendor soft phones, hard phones, and conferencing).
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The keynote covers the key trends across hardware, cloud, and open source, exploring how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. The webinar also covers the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Attendees can expect to deepen their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report was prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
2. Introducing the Speakers
Paul Beckmann – Founder and CTO of DSP Concepts
• PhD, MS, and BS from MIT, all in EE
• Technical specialties: signal processing, audio product development, and tools
Mike Klasco – Founder and CEO, Menlo Scientific
• Combined MS/PhD ABT, NYU
• Audio product development, acoustics, transducers, materials and sourcing
2020 Interactive Voice Con
3. Outline
• Kickoff
• Voice Processing Theory [45 minutes]
  • Algorithms
  • Measuring performance
  • Processor requirements
  • Product design guidelines
• Break [15 minutes]
• Demos? [15 minutes]
• What Happens in Practice [30 minutes]
  • Microphone integration issues
  • The enclosure – the space between the mics and speakers
  • Loudspeakers, acoustics
• Q&A [15 minutes]
5. Types of Voice Recognition Algorithms
• Voice trigger
  • Identifies a single word or phrase like "Alexa" or "Hey Siri"
• Small vocabulary voice recognition
  • Fixed vocabulary set for embedded applications; tens of words
  • "Turn on the lights", "Next track", etc.
• Full voice recognition
  • Large vocabulary set; thousands of words
  • "Play Beatles"
• Natural language understanding (NLU)
  • Combines application-specific information for a more flexible user interface
  • "Play Music by the Beatles", "Give me Beatles Music", "I want to listen to music by the Beatles"
  • Can be combined with a small vocabulary set
6. Audio Front End = Microphone Cleaner
The Audio Front End (AFE) cleans up signals to improve the performance of the voice recognition. It is like glasses for a camera.
[Diagram: Mic Array (N channels) → Audio Front End → (1 channel) → Voice Recognition; the mics pick up the desired speech, interfering noise, and device playback]
7. Audio Front End Details
[Diagram: the AFE comprises an Echo Canceler, Direction of Arrival estimator, Beamformer, and Noise Reduction, taking the N-channel mic array down to 1 channel for the Trigger Word & Voice Recognition]
• Echo Canceler – eliminates loudspeaker sound during device playback
• Direction of Arrival – determines the location of the sound source; used to steer the beamformer
• Beamformer – combines multiple microphone signals to improve signal quality
• Noise Reduction – removes various types of noise
8. Comparing Amazon and Google
Google (Google AFE and Trigger Word → ASR):
• 2 microphones only
• 65 to 71 mm spacing
• Mono or stereo
• High-end application processor required
• No variation in products
• No variation in performance
• Performance lags behind AVS
Amazon (3rd Party AFE → Amazon Trigger Word → ASR):
• Any number of microphones
• Any spacing
• Any number of playback channels
• Application processor or MCU solutions
• Wide variety of designs: 2 to 7 microphones, different form factors
• Better performance
• Low cost designs possible
9. AVS Integration for AWS IoT
(AKA "Alexa for Microcontrollers")
• Cost-effective way to add Alexa voice features
• Connects to the cloud
• Uses an RTOS and a lightweight MQTT network stack
• Suitable for low-cost microcontrollers
• Will expand voice to a much larger number of products
https://docs.aws.amazon.com/iot/latest/developerguide/avs-integration-aws-iot.html
10. Trigger Word
• Voice recognition algorithm trained for a single word or phrase
  • "Alexa", "OK Google", "Bixby", "Siri", "Cortana", etc.
• Available from multiple suppliers
  • Amazon, Google, Baidu, etc.
  • Sensory "Truly Handsfree"
  • PicoVoice / SoundHound / Cyberon / etc.
• They all use machine learning
• Often optimized for low power consumption
  • Sound → Voice Activity Detector → Keyword detector
• Large models perform better
  • Sensory: 17 kbyte → 1 Mbyte
11. Characterizing Trigger Performance
• Probability of False Alarm
  • How many times does the algorithm accidentally trigger over a 24-hour period?
• Probability of Miss
  • What % of trigger words are not detected by the algorithm?
• Trigger word algorithms have an adjustable "sensitivity" setting that allows you to trade off false alarms and misses
• Amazon requires <3 false alarms per 24 hours of continuous speech
[Chart: Probability of Detection vs. False Alarm Rate; the ideal operating point is 100% detection with no false alarms; tune sensitivity based on the allowable false alarm rate]
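The sensitivity tuning described above can be sketched as a threshold search over detector scores. This is an illustrative sketch, not any vendor's actual tuning procedure; `pick_threshold` and its score data are hypothetical.

```python
import numpy as np

def pick_threshold(negative_scores, hours_of_audio, max_false_alarms_per_24h=3.0):
    """Choose the detector threshold so the expected false-alarm rate
    stays under a budget (e.g. Amazon's <3 per 24 hours).

    negative_scores: detector scores on non-trigger audio (hypothetical data)
    hours_of_audio: how many hours of audio those scores cover
    """
    allowed = max_false_alarms_per_24h * hours_of_audio / 24.0
    # Sort descending: the (k+1)-th largest negative score is the tightest
    # threshold that still permits at most k false alarms.
    s = np.sort(np.asarray(negative_scores))[::-1]
    k = int(np.floor(allowed))
    if k >= len(s):
        return s[-1]  # budget never exceeded; any threshold works
    return s[k]       # scores strictly above this would false-alarm
```

Lowering the budget raises the threshold, which also raises the miss probability; that is exactly the tradeoff the sensitivity setting exposes.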
12. Wake Word Performance in Noise
SNR at the microphone is the main driver of wake word performance:
• Independent of distance
• Independent of room reflections / reverb (for normal household environments)
Improve your SNR to improve your wake word performance.
14. Beamforming Principles
• Beamformers are spatial filters. They pass signals from certain directions and reduce signals from other directions.
• Performance depends heavily upon the geometry of the microphone array.
• Fixed beamformers utilize FIR filters
  • Time domain or frequency domain
  • There are many ways to compute the filter coefficients (MVDR, DAS, etc.)
[Diagram: each microphone feeds an FIR filter h1[n] … h4[n], and the filter outputs are combined to form the beam]
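The simplest fixed beamformer mentioned above, delay-and-sum (DAS), can be sketched in a few lines: delay each microphone so the look direction adds coherently, then average. A minimal sketch assuming integer-sample delays; practical designs use the per-mic FIR filters described above, which also handle fractional delays.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Delay-and-sum beamformer sketch.

    mic_signals: (num_mics, num_samples) array
    delays_samples: per-mic integer delays that align the look direction
    """
    num_mics, n = mic_signals.shape
    out = np.zeros(n)
    for m in range(num_mics):
        d = int(delays_samples[m])
        if d == 0:
            out += mic_signals[m]
        else:
            # Shift channel m by d samples (zero-padded) and accumulate
            out[d:] += mic_signals[m, :n - d]
    return out / num_mics  # coherent speech is preserved; uncorrelated noise averages down
```

Coherent signals add in amplitude while uncorrelated mic self-noise adds in power, which is where the SNR gain comes from.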
15. DSPC Design Method: Maximize SNR
• Inputs to the design
  • Microphone geometry
  • Look angle and beam width
  • Diffuse field noise level
  • Microphone SNR
• Signal is the person's voice in the specified beam
• Noise = diffuse field noise + microphone self noise
• Iterative design procedure maximizes SNR
17. Optimal Array Geometries
Far-field products:
• 180 or 360 degree coverage – smart speakers, middle of the room
• 180 degree coverage – set-top box, side of the room
• Flat line array – TVs, appliances, on a wall
For circular arrays, 40 to 70 mm diameter works; 70 mm works the best. For line arrays, use 25 mm spacing between mics (75 mm total length).
[Figure: high-end, standard, and low-cost geometries for each product class, with SNR gains ranging from +2 dB to +7 dB depending on geometry]
18. SNR vs. Mic Geometry
Assumptions:
• 71 mm diameter
• Microphone array is in diffuse field noise with SNR = 50 dB
• Speech is at 60 dB in the direction of the beam
• Beam width is 45 degrees
• Microphone SNR = 65 dB
• Look angle = 0 degrees
19. Linear Arrays
• Linear arrays work well in an end-fire configuration.
  • Requires the person to be in a specified location.
  • Provides 4 to 5 dB SNR improvement.
• Broadside arrays work poorly and should be avoided.
  • Very little SNR improvement at low frequencies, where the bulk of speech energy is.
  • Use broadside arrays only as a last resort when the industrial design dictates no other options (televisions, wall panels).
Intuition: beamformers use time differences to steer the beam. In a broadside configuration, voice arrives at the same time at both mics.
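The end-fire vs. broadside intuition above is just far-field geometry: the inter-mic time difference scales with the cosine of the arrival angle, so it collapses to zero at broadside. A small sketch (the helper name is ours) makes the numbers concrete.

```python
import math

def inter_mic_delay_us(spacing_m, angle_deg, c=343.0):
    """Time difference of arrival between two mics for a far-field source,
    in microseconds. angle_deg = 0 is end-fire (along the array axis);
    90 degrees is broadside. c is the speed of sound in m/s.
    """
    return spacing_m * math.cos(math.radians(angle_deg)) / c * 1e6
```

At the 25 mm spacing recommended above, end-fire arrival gives roughly 73 µs of usable delay, while broadside arrival gives essentially none, so the beamformer has nothing to work with.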
21. Stationary Noise Reduction
• Effective against:
  • Fan noise
  • Automotive road noise
  • Microphone self noise
• Creates a model of the background noise and then removes it in real time
• Improves ASR performance by 2 to 3 dB
[Audio example: before/after comparison demonstrating the improvement in automotive environments]
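One classic way to "model the background noise and remove it" is magnitude spectral subtraction. This is a generic textbook sketch, not DSP Concepts' algorithm; `spectral_subtract` is a hypothetical helper that processes one analysis frame.

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.05):
    """One frame of magnitude spectral subtraction: subtract an estimated
    background-noise magnitude spectrum (e.g. measured during silence)
    from the frame's magnitude, keep the noisy phase, and resynthesize.
    `floor` limits over-subtraction (musical-noise) artifacts.
    """
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    phase = np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))
```

A real-time implementation runs this per overlapping windowed frame and keeps updating `noise_mag` whenever no speech is detected.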
22. Interference Canceler
• Effective against noise from:
  • TVs
  • Appliance self noise
  • Air conditioners
• Requires a minimum of 2 microphones
• Combines beamforming, adaptive filtering, and other statistical signal processing techniques
• Effective for music and speech interferers
• Improves ASR performance by up to 30 dB!
(2 microphone example)
23. Adaptive Interference Canceler Performance
Test conditions:
• Measured in a typical living room environment
• Interfering music noise played
• Speech at a constant level (62 dBC) at the DUT
• Music level varied
• Speech and noise 2 meters from the DUT
Results relative to the Amazon Echo Plus (7-mic) and Echo 2 (7-mic):
• DSPC 4-mic: 8 dB better
• DSPC 6-mic: 11 dB better
[Chart also includes a DSPC 2-mic configuration]
25. Acoustic Echo Cancellers (AEC)
• Eliminates loudspeaker sound at the microphone
• Enables the Voice UI to function while music or text-to-speech is active
  • Music is usually ducked after the wake word is detected
• Best algorithms operate in the frequency domain
  • Better cancellation
  • Faster convergence
  • Lower computation
• ERL = Echo Return Loss quantifies performance: how many dB of loudspeaker signal is canceled by the AEC
Demo setup: single microphone with a loudspeaker close to the mic; mono playback in a home environment.
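The core of an AEC is an adaptive filter that models the loudspeaker-to-microphone path from the playback reference and subtracts the predicted echo. Below is a minimal time-domain NLMS sketch; as the slide notes, the best production AECs work in the frequency domain, so treat this as illustrative only.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Time-domain NLMS echo canceller sketch.

    far_end: playback (reference) signal
    mic: microphone signal containing the echo
    Returns the residual (echo-removed) signal.
    """
    w = np.zeros(taps)            # adaptive estimate of the echo path
    x = np.zeros(taps)            # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x[1:] = x[:-1]
        x[0] = far_end[n]
        echo_hat = w @ x          # predicted echo at the mic
        e = mic[n] - echo_hat     # residual after echo removal
        out[n] = e
        w += mu * e * x / (x @ x + eps)  # normalized LMS update
    return out
```

The `taps` parameter is the echo tail length in samples, which is exactly the tradeoff discussed on the next slide.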
26. Factors Affecting AEC Performance
• What type of algorithm are you using?
  • Time domain vs. frequency domain
  • LMS vs. Kalman vs. other?
• Echo tail length
  • How many msec of audio can you cancel?
  • Longer is better but requires more processing and memory
  • Far-field smart speakers require 150 to 200 msec of echo tail
• Reverberation time of the room (lower is better)
• Linearity of your loudspeakers
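The tail-length cost above is easy to quantify for a time-domain canceller: one adaptive coefficient per sample of tail. A trivial sketch (the helper name is ours):

```python
def echo_tail_taps(tail_ms, fs_hz):
    """FIR taps needed to cover an echo tail of tail_ms at sample rate fs_hz."""
    return int(tail_ms * fs_hz / 1000)

# A far-field smart speaker needing 200 ms of tail at a 16 kHz
# voice-processing rate must adapt 3200 coefficients per mic/channel pair,
# which is one reason frequency-domain AECs are preferred.
```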
27. Speaker Distortion Affects AEC
• This is usually the limiting factor for AEC performance
• Loudspeakers distort when playing loud or at low frequencies
• Speakers need to be tuned to minimize distortion
• Rule of thumb:
  1% THD  → AEC up to 40 dB
  2% THD  → AEC up to 34 dB
  3% THD  → AEC up to 30 dB
  5% THD  → AEC up to 26 dB
  10% THD → AEC up to 20 dB
• Product developers must trade off low-frequency sound quality vs. voice performance
28. Rule of Thumb for Speaker Distortion
1. Play a low-frequency sine wave through your loudspeaker and plot the spectrum
2. You'll see harmonics at multiples of the fundamental frequency
3. The largest harmonic determines the absolute limit of the echo canceler
4. ERLE performance is based on the difference between the fundamental and the largest harmonic
5. Repeat at different output levels and frequencies
[Spectra: a harmonic 30 dB down means 30 dB max ERLE (OK); a harmonic only 15 dB down means 15 dB max ERLE (bad)]
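Steps 1–4 can be sketched on a simulated capture (the 100 Hz tone and its second harmonic 30 dB down are made-up numbers for illustration, standing in for a real measurement of your loudspeaker):

```python
import numpy as np

fs, f0, dur = 48000, 100.0, 1.0
t = np.arange(int(fs * dur)) / fs
# Simulated mic capture: fundamental plus a 2nd harmonic 30 dB down
x = (np.sin(2 * np.pi * f0 * t)
     + 10 ** (-30 / 20) * np.sin(2 * np.pi * 2 * f0 * t))

spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
fund = spec[int(f0 * dur)]                       # 1 Hz bins, so bin = freq
harms = [spec[int(k * f0 * dur)] for k in range(2, 6)]
worst_db = 20 * np.log10(max(harms) / fund)      # largest harmonic vs fundamental
print(f"worst harmonic: {worst_db:.1f} dB -> max ERLE ~ {-worst_db:.0f} dB")
```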
29. AECs and Speaker Processing
• The AEC reference signal must be taken after any nonlinear processing
• DRC = dynamic range compression; this includes nonlinear processing like compressors and limiters
• Crossovers after the DRC are allowed; higher-order crossovers perform better
[Diagram: playback chain EQ → DRC → (Cross-Over) → DAC → AMP, with the reference tap taken after the DRC]
30. Multichannel Echo Cancelers
• Some applications require multichannel echo cancelers (e.g., soundbars)
• For optimal performance, you need to cancel all the channels; downmixing reduces performance
• Example: a 3-channel product with a 2-channel AEC
  • Full performance when using a 3-channel AEC to cancel the L, R, and C speakers
  • Reduced performance when downmixing to 2 channels (L' = L + 0.5 * C, R' = R + 0.5 * C) and using a stereo echo canceler
  • Performance reduced by 5 to 10 dB
31. Woofer Reference Mic
• Work done in conjunction with Vesper
• Uses a new high-AOP microphone placed directly in front of the woofer
• Advanced processing improves ERL by up to 15 dB
• Trigger word performance at max playback level:
  • Standard processing: 63%
  • Advanced processing: 91%
• A similar feature is used in the HomePod
34. Understanding Amazon Results
• False Alarm Tests
  • Number of false alarms using Amazon's 24-hour continuous talking test track
  • The lower the better
• Trigger Detection
  • % of the time that the device wakes up when "Alexa" is spoken
  • Tested in silence, kitchen noise, music noise, and during music playback
  • The higher the better
• Response Accuracy Rate (RAR)
  • % of the time that the cloud accurately understood the question (e.g., "Alexa, what is the capital of China?")
  • Tested in silence, kitchen noise, and music noise
  • The higher the better
35. Testing Scenarios
• Silence: no interfering sound, uttering "Alexa" at 62 dBC
• Kitchen Noise (0, -3, -6 dB SNR): "Alexa" utterance at 62 dBC / noise at 62, 65, and 68 dBC
• Music Noise (0, -3, -6 dB SNR): "Alexa" utterance at 62 dBC / music at 62, 65, and 68 dBC
• Acoustic Echo Canceler: music playback at 90 dBC while trigger words are played at 62 dBC
39. Many Performance Levels
• Low Power / Near-field: 1 or 2 mics; ARM Cortex-M4; 20 to 30 MHz
• Basic Far-Field: 2 mics, mono; ARM Cortex-M7 or Cortex-A53; 200 MHz
• High-Performance Far-Field: 4+ mics, stereo; ARM Cortex-A53; 350 to 600 MHz
• High-Performance Far-Field: 4+ mics, multichannel; ARM Cortex-A53; 900 to 1200 MHz
40. Processor Comparisons
Processor efficiency per MHz (relative to the Tensilica HiFi 4; the larger the better):

Processor          Efficiency   Available from
ARM Cortex-M4      0.26         ST, NXP, Renesas, Ambiq, Quicklogic
ARM Cortex-M7      0.45         ST, NXP
ARM Cortex-A35     0.37         Mediatek
ARM Cortex-A53     0.48         NXP, Amlogic, Qualcomm
ARM Cortex-A72     0.98         Coming soon!
Tensilica HiFi 4   1.00         NXP, Mediatek, Amlogic

ARM Cortex-A53 is the sweet spot for smart speakers.
42. Smart Speaker Designs
• 360-degree operation
• Microphones on top of the product
• 40 to 75 mm diameter
• Physically separate microphones and loudspeakers for best performance
• Mono or stereo playback
[Images: High-End and Standard designs]
43. Sound Bar Designs
• 180-degree operation
• Microphones on top of the product near the center of the device
• 60 to 75 mm design
• Physically separate microphones and loudspeakers for best performance
• Stereo or multichannel playback (up to 7 reference channels)
• Compatible with Dolby Atmos
[Images: High-End and Standard designs]
44. TV Designs
Placement options
• Top is better than bottom (further away from the speakers)
• Bottom usually wins out because of lower cost
• Mics do not have to be centered
• 2 mics are sufficient
[Images: Good and Better placements]
45. Set-Top Box Designs
• Top of device
  • 180-degree operation
  • Microphones on top of the product
• Tethered "puck"
  • 360-degree operation
  • Microphones on top of the product
• Support for an optional internal speaker for voice playback
• Audio playback through HDMI
[Images: High-End and Standard designs]
46. Appliance / Tablet Designs
• 180-degree operation
• 2 or 4 microphone linear array
• 25 to 75 mm design
• Physically separate microphones and loudspeakers for best performance
• Mono or stereo playback
[Images: Good and Better designs]
47. Design Guidelines – Microphones
Far Field Products
• Microphones should be placed on the top of the product, if possible
• Microphones should be on a flat horizontal surface
• Microphones should be visible to the user (not occluded)
• Flat line arrays are not recommended and should only be a last resort (microphone arrays work best when the microphones are displaced in the horizontal plane)
• Microphones need to be properly ported (see the design guidelines from your microphone vendor)
• 4 microphones are sufficient for most products
48. Design Guidelines – Microphones
Far Field Products
• SNR of 65 dB. Higher SNRs provide no benefit for voice recognition but do have benefits for voice communication
• Gain matching:
  • +/- 1 dB from 200 Hz to 6 kHz (recommended)
  • +/- 1 dB from 200 Hz to 4 kHz and +/- 3 dB from 4 kHz to 7 kHz (required)
• Microphone AOP must be high enough that the system doesn't clip when the loudspeakers are played at full volume. Recommendations:
  • 120 dB SPL for smart speakers
  • 130 dB SPL for sound bars
• 40 to 70 mm microphone spacing is recommended. As small as 20 mm is possible with some degradation in performance
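One reason spacing matters: a microphone pair only resolves direction unambiguously below the spatial aliasing frequency, roughly c/(2d) for a pair spaced d apart. This is the standard free-field relation, not a figure from the slides, but it shows the trade the recommendation is making:

```python
c = 343.0  # speed of sound in air, m/s
for d_mm in (20, 40, 70):
    f_alias = c / (2 * d_mm / 1e3)  # spatial aliasing frequency, Hz
    print(f"{d_mm} mm spacing -> unambiguous up to ~{f_alias:.0f} Hz")
```

Tighter spacing pushes aliasing higher but reduces the phase difference available at low frequencies, which is the degradation noted for 20 mm arrays.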
49. Microphone Acoustical Porting
• You need individual gaskets to make a direct connection between each MEMS mic and its vent hole in the case (no common cavity)
• A design with a common cavity shared by all microphones won't work
• Sanity check: if you block a microphone hole with putty, you should see the level drop by at least 30 dB
[Diagrams: MEMS mics on the PCB, each sealed by its own gasket to its own vent hole in the case, vs. the failing common-cavity design]
50. Design Guidelines – Microphones (A)
In-Ear Products
• 2 microphones are sufficient for most products
• Use 2 microphones in an end-fire configuration pointing towards the mouth
• Space the microphones as far apart as possible: 10 mm is the minimum spacing; 20 mm is preferred
• A microphone on the end of a "boom" improves performance
52. Overview
What Happens in Practice
• Microphone selection
• The physical world in front of the mic
• No man's land between the mic and speaker (leakage)
• Loudspeakers – the good, the bad, and the ugly
• Software integration issues
53. MEMS Microphone Selection Cheat Sheet
• Analog or digital?
• Analog single-ended or balanced?
• Top or bottom port?
• Standard size or compact?
• AOP – acoustic overload point?
• S/N – signal-to-noise ratio?
• Sensitivity (ASIC gain)?
• Robustness (IPXX rating)?
2020 Interactive Voice Con
54. MEMS Microphones – What Is Inside?
• MEMS mic element + ASIC in a package
• Wiring between the MEMS mic die and the ASIC
• Typical package envelope of 3.50 mm x 2.65 mm x 0.98 mm
• Smaller footprint on some models, but reduced back volume = reduced S/N
• Faraday shield on some models
55. Microphones – Analog vs. Digital?
What are the mic inputs on your codec or SoC (system on chip)?
• Analog single-ended
• Analog pseudo-balanced
• Digital – PDM
56. Microphones – Top or Bottom Port?
• The MEMS SMT package can have the sound aperture on either the top or the bottom
• If on the bottom, the circuit board (or flex PCB) the mic is reflow-soldered to must have a hole that aligns with the MEMS mic port
• Bottom port warning: sealing matters – use a back-port SMT seal eyelet
57. Microphones – Signal to Noise
• S/N was once a deal killer for most serious applications, but MEMS mics have caught up with ECMs, with commodity analog and digital MEMS reaching beyond 60 dB S/N
• Active noise canceling headphones, hearing aids, and voice command applications want 65 dB S/N or better
• 70+ dB expected from a few vendors by the start of 2021 (but this keeps slipping!)
• Better S/N = fewer mics? There is some discussion that higher S/N enables a reduction in the number of mics required
2020 Interactive Voice Con
58. Microphones
• Analog MEMS mics – single-ended or balanced differential outputs?
• A balanced analog output is good defensive engineering if your product will have longer wire runs, digital noise, or EMI/RF floating around
• How differential is the MEMS mic topology? True differential capacitive MEMS mics use dual grids for improved noise immunity over single-ended designs
2020 Interactive Voice Con
59. Microphones – Digital
• Digital MEMS mics offer greater immunity to interference than analog MEMS
• For time-to-market reasons (avoiding having to tweak and rework your board layout if noise problems await you), digital is the way to go
• If mic performance is critical for your type and class of product, analog may be better with an external premium codec (for both AOP and noise floor)
2020 Interactive Voice Con
60. Microphones – Acoustic Overload Point (AOP)
• Is the AOP due to mic element saturation or ASIC overload clipping?
• Analog MEMS mics typically have a better acoustic overload point, which is where serious distortion sets in (often the codec overloads before the MEMS mic element)
• Analog MEMS overload a bit more gracefully than digital: when an A/D codec overloads, it is a hard line in the sand and sounds nasty
• Digital MEMS AOP can be as low as 116 dB and is more typically 120 dB; analog AOP tends to be over 120 dB and can be 130+ dB on some MEMS mics
• Vesper's piezo MEMS mics have versions with very high AOP
2020 Interactive Voice Con
61. Microphones – Directivity
• MEMS mics are omnidirectional
• To achieve directional characteristics, they are used in arrays
• One requirement for mic arrays is that the mics are closely matched in sensitivity and response, and can maintain that uniformity over time
2020 Interactive Voice Con
63. Microphones – The World Around the Mic
Key topics
• MEMS mics are mounted to the flex PCB using SMT reflow along with the rest of the SMT components
• Port Helmholtz resonance – moving it out of band
• The port and wind noise
• Laminar entry
• Acoustic mesh
2020 Interactive Voice Con
64. Microphones – What Are Membranes For?
Woven and non-woven membranes are used for:
• Wind noise and water blocking
• Acoustic resistance, which determines the crossover to DSP wind-noise filtering
• Dust problems – an internal membrane (within the package) blocks SMT reflow gasses
• Field use issue – performance shifts over time (e.g., gunk in the membrane over mics facing a stove top)
2020 Interactive Voice Con
65. Microphones – The World Around the Mic
Wind noise blocking / acoustic mesh
• The mic element can be overloaded/saturated by wind
• Wind pressure must be blocked acoustically (acoustic resistance membrane)
• Mic overload cannot be fixed by DSP (but some turbulence can be filtered out)
• Acoustic mesh can also block liquids (hydrophobic and oleophobic versions)
2020 Interactive Voice Con
66. Microphones – The World Around the Mic
Port and wind noise
• Laminar entry (flared aperture); turbulence in the port is to be avoided
• Port Helmholtz resonance peak – moving it out of band
• Acoustic mesh damps the peak Q
2020 Interactive Voice Con
67. The Physical World Between the Mic and Speaker
68. Leakage Between the Mic & Speaker
Audio output leakage is both airborne and through the enclosure structure
• Minimizing airborne leakage
  • Keep the mic(s) and speakers as far apart as possible
  • Avoid overlapping the mic(s) pickup pattern and the speaker radiation pattern
• Structural transmission (microphonics)
  • Enclosure housing – ribs, joints, wall thickness
  • Plastics are not all equal
  • Speaker sub-enclosure isolation mounts (grommets or gaskets)
  • Mic isolation
2020 Interactive Voice Con
69. Construction and Materials
• Plastics have different acoustical characteristics
• Stiffness and damping are key factors
• Compatibility considerations:
  • Shrink
  • Tool temperature
  • Flow
  • Impact strength
  • Sink marks / wall thickness
2020 Interactive Voice Con
71. Construction and Materials
Acoustically engineered plastics
• TreBlend (Ineos) PA/SAN
• Cellulose plastics
  • Treva (Eastman)
  • Symbio (Sappi)
• Thicker walls/ribs without sink marks
[Image: Genelec M040 – NCE enclosure]
72. The Physical World of Speakers and the AEC Achilles Heel – Distortion
73. Enclosure Mechanical Engineering
• Open the window more and more bugs come in
• More power and more bass = no gain without pain
• Increase acoustic output before feedback and AEC breakdown by reducing the cabinet resonance peak
• Extending the low-end response of the product will shake things up more
2020 Interactive Voice Con
74. Speaker Nonlinearities and AEC Issues
• Speaker distortion nonlinearities are the enemy of AEC
• Loudspeaker nonlinearities affect the AEC: low-end distortion hurts cancellation even when it is not audible for listening
• Fine tuning of suspension and motor nonlinearities is critical
• Or source off-the-shelf application-specific speakers optimized for AEC and ANC
2020 Interactive Voice Con
75. 50 mm AEC / ANC Optimized Speakers
• Application-specific ANC and AEC high-linearity / lower-distortion speakers to meet TIA 930
• Typically around 50 mm diameter
• Vendors: SEAS, Tymphany, Stetron
2020 Interactive Voice Con
76. subVo Servo Feedback Correction
Next generation solution for increased AEC headroom
• The subVo bend sensor provides distortion reduction in the lower octaves, enabling increased AEC headroom
• A precision position sensor provides error-correction feedback
• 10 dB of feedback = 10 dB of piston-range distortion reduction
2020 Interactive Voice Con
78. Software Integration Challenges
• Real-time CPU load
• Wrong interrupt levels
• Dropping samples / blocks
• Non-constant latency between mics and reference signals
• Misconfigured PDM filters
• Different clocks for mics and reference signals
2020 Interactive Voice Con
79. Example #1: Noisy PDM Microphones
[Diagram: PDM bitstream → PDM-to-PCM converter → PCM samples]
Problem Statement
• ASR accuracy only 72% in quiet speech conditions
• High quality microphone: -41 dB sensitivity / 66 dB SNR
• Noise floor expected at 28 dBA, but measured at 39 dBA
Root cause
• The PDM-to-PCM converter was implemented with 16-bit math
• The generated noise floor was at -96 dBFS → 39 dBA
Solution
• Reimplement the PDM-to-PCM conversion in software (with sufficient precision)
• ASR accuracy improved to 94%
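The dBA figures in this example can be cross-checked from the mic specs: sensitivity maps dBFS back to dB SPL (via the 94 dB SPL reference), and 16-bit math puts the quantization floor near -96 dBFS:

```python
sens_dbfs = -41  # mic output level (dBFS) at the 94 dB SPL reference
snr_db = 66

# Expected floor: the mic's own self-noise
expected_floor_spl = 94 - snr_db                        # 28 dB(A)

# Floor imposed by 16-bit math, mapped from dBFS back to SPL
quant_floor_dbfs = -96
quant_floor_spl = 94 + (quant_floor_dbfs - sens_dbfs)   # 39 dB(A)

print(expected_floor_spl, quant_floor_spl)  # 28 39
```

The 11 dB gap between the two floors is exactly the discrepancy the team measured, which is what pointed to the converter rather than the microphone.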
80. Example #2: Incorrect Thread Priorities
CPU Load Problems
• Audio processing was taking 18% CPU on average, but there were large spikes: the Bluetooth thread priority was incorrectly set higher than the real-time audio processing
Corrected Thread Priorities
• Steady and consistent CPU load
[Charts: CPU load over time (peak and average), before and after the priority fix]