We Can Hear You with Wi-Fi!
Guanhua Wang
Yongpan Zou, Zimu Zhou, Kaishun Wu, Lionel M. Ni
Hong Kong University of Science and Technology
Advanced Research in ISM band
• Localization
• Gesture Recognition
• Object Detection
Full Duplex Backscatter. In HotNets 2013
• Object Classification
They enable Wi-Fi to “SEE” target objects.
Can we enable Wi-Fi signals to HEAR talks?
What is WiHear?
• “Hearing” human talk with Wi-Fi signals
• Non-invasive and device-free
• Hearing through walls and doors
• Understanding complicated human behavior (e.g. mood)
• Hearing multiple people simultaneously (MIMO technology)
• Easy to implement in commercial Wi-Fi products
[Figure: speech bubbles “Hello” and “I am upset.”]
How does WiHear work?
WiHear Framework
[Figure: WiHear framework. Labels: AP (MIMO beamforming), people, laptop. Mouth Motion Profiling: filtering (remove noise), partial multipath removal, profile building, wavelet transform. Learning-based Lip Reading: segmentation, feature extraction, classification & error correction, vowels and consonants.]
Mouth Motion Profiling
• Locating on Mouth
• Filtering Out-Band Interference
• Partial Multipath Removal
• Mouth Motion Profile Construction
• Discrete Wavelet Packet Decomposition
Locating on Mouth
[Figure: the AP sweeps its beam toward the person over time slots T1, T2, T3; the laptop (receiver) reports back T3.]
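The slot-selection step of beam locating can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the names `gesture_score` and `select_beam_slot`, the use of amplitude variance as the measure of how "notable" the gesture is, and the sample readings are all hypothetical.

```python
from statistics import pvariance

def gesture_score(samples):
    """Hypothetical notability score: variance of the received amplitude."""
    return pvariance(samples)

def select_beam_slot(rounds):
    """Pick the sweep time slot whose summed score over all rounds is highest.

    rounds: list of sweep rounds, each a dict mapping slot name -> amplitude samples.
    """
    totals = {}
    for sweep in rounds:
        for slot, samples in sweep.items():
            totals[slot] = totals.get(slot, 0.0) + gesture_score(samples)
    return max(totals, key=totals.get)

# Made-up readings for two sweep rounds; the beam hits the mouth during T3.
rounds = [
    {"T1": [1.0, 1.1, 1.0, 0.9], "T2": [1.0, 1.3, 0.8, 1.1], "T3": [1.0, 2.0, 0.2, 1.9]},
    {"T1": [1.0, 1.0, 1.1, 1.0], "T2": [0.9, 1.2, 0.9, 1.2], "T3": [0.3, 1.8, 0.1, 2.1]},
]
best_slot = select_beam_slot(rounds)  # the receiver would send this slot back to the transmitter
print(best_slot)
```

A score based on the energy at the gesture's repetition rate would be a natural alternative to plain variance.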
Filtering Out-Band Interference
• Signal changes caused by mouth motion: 2-5 Hz
• Adopt a third-order Butterworth IIR band-pass filter
  Cancel the DC component
  Cancel the wink issue (<1 Hz)
  Cancel high-frequency interference
[Figure: the impact of a wink (denoted by the dashed red box).]
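A minimal sketch of this filtering step, using SciPy's Butterworth design. The 100 Hz sampling rate, the helper name `bandpass_mouth_motion`, the zero-phase (forward-backward) application, and the synthetic test signal are assumptions for illustration; the slides only state the filter order and the 2-5 Hz band.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_mouth_motion(x, fs, low_hz=2.0, high_hz=5.0, order=3):
    """Third-order Butterworth band-pass, applied forward and backward (zero phase)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 100.0                                  # assumed sampling rate in Hz
t = np.arange(0, 20, 1 / fs)
mouth = np.sin(2 * np.pi * 3.0 * t)         # in-band component (3 Hz)
wink = 2.0 * np.sin(2 * np.pi * 0.5 * t)    # slow component below 1 Hz
hf = 0.5 * np.sin(2 * np.pi * 30.0 * t)     # high-frequency interference
x = 5.0 + mouth + wink + hf                 # 5.0 is the DC component

y = bandpass_mouth_motion(x, fs)
# Away from the edges, y should closely follow the 3 Hz component alone.
print(np.max(np.abs(y[500:1500] - mouth[500:1500])))
```

The forward-backward application is a design choice that avoids phase distortion; a causal single-pass filter would be needed for online processing.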
Partial Multipath Removal
• Mouth movement: non-rigid
• Convert CSI (Channel State Information) from the frequency domain to the time domain via IFFT
• Multipath removal threshold: >500 ns
• Convert the processed CSI (with multipath <500 ns) back to the frequency domain via FFT
The multipath threshold value can be adjusted to achieve better performance.
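The IFFT, truncate, FFT sequence above can be sketched with NumPy. This is a simplified illustration: the function name `partial_multipath_removal`, the 20 MHz bandwidth (which gives 50 ns between delay taps), and the two-path test channel are assumptions, and the sketch treats every IFFT tap index as a positive delay.

```python
import numpy as np

def partial_multipath_removal(csi, bandwidth_hz=20e6, max_delay_s=500e-9):
    """Zero out delay taps beyond max_delay_s and return the cleaned CSI."""
    csi = np.asarray(csi, dtype=complex)
    taps = np.fft.ifft(csi)                      # frequency domain -> time domain
    delays = np.arange(csi.size) / bandwidth_hz  # tap k sits at k / bandwidth (assumed)
    taps[delays > max_delay_s] = 0.0             # drop long-delay reflections
    return np.fft.fft(taps)                      # back to the frequency domain

# Synthetic channel: a short path at 100 ns and a long reflection at 1000 ns.
n = 64
h = np.zeros(n, dtype=complex)
h[2] = 1.0    # 2 taps * 50 ns = 100 ns, kept
h[20] = 0.6   # 20 taps * 50 ns = 1000 ns, removed
csi = np.fft.fft(h)

cleaned = partial_multipath_removal(csi)
h_clean = np.fft.ifft(cleaned)
print(abs(h_clean[2]), abs(h_clean[20]))
```

Exposing `max_delay_s` as a parameter reflects the slide's remark that the threshold can be tuned per scenario.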
Discrete Wavelet Packet Decomposition
• A Symlet wavelet filter of order 4 is selected
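To show what a wavelet packet decomposition does structurally, the sketch below splits both the approximation and the detail branch at every level. Note the swap: it uses the Haar filter, not the order-4 Symlet selected on the slide, so that it stays dependency-free. With a wavelet library that ships Symlet filters (for example PyWavelets, where this filter is named `sym4`) the same tree would be built with that filter instead. The function names and the test profile are hypothetical.

```python
import math

def haar_step(x):
    """One orthonormal Haar analysis step: (approximation, detail), each half length."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet(x, levels):
    """Return the 2**levels leaf nodes in natural (not frequency) order.

    Unlike the plain wavelet transform, the detail branch is split again too.
    """
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

# Stand-in for a mouth motion profile: 32 samples of a sinusoid.
profile = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
leaves = wavelet_packet(profile, levels=3)
energies = [sum(c * c for c in leaf) for leaf in leaves]  # per-subband energy
print(len(leaves), [round(e, 3) for e in energies])
```

Per-subband coefficients or energies like these are the kind of representation that could be passed on to the learning stage.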
Lip Reading
• Segmentation
• Feature Extraction
• Classification
• Context-based Error Correction
Segmentation
• Inter-word segmentation
  Silent interval between words
• Inner-word segmentation
  Words are divided into phonetic events
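Inter-word segmentation by silent intervals can be sketched as a simple threshold scan. The function name `segment_words`, the amplitude threshold, and the minimum silence length are hypothetical parameters chosen for illustration; the slides do not specify how silence is detected.

```python
def segment_words(amplitude, threshold, min_silence):
    """Return (start, end) index pairs, end exclusive, of active segments.

    A segment closes only after at least min_silence consecutive quiet samples,
    so short pauses inside a word do not split it.
    """
    segments = []
    start = None
    quiet = 0
    for i, a in enumerate(amplitude):
        if abs(a) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:  # signal ended inside a segment
        segments.append((start, len(amplitude) - quiet))
    return segments

signal = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0]
print(segment_words(signal, threshold=0.5, min_silence=3))
```

In this toy signal the one-sample pause at index 10 is shorter than `min_silence`, so it stays inside the second segment.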
Feature Extraction
• Multi-Cluster/Class Feature Selection (MCFS) scheme
Extending To Multiple Targets
• MIMO: Spatial diversity via multiple Rx antennas
• ZigZag decoding: a single Rx antenna
Implementation
[Figure: Floor plan of the testing environment.]
[Figure: Experimental scenario layouts. (a) line of sight; (b) non-line-of-sight; (c) through wall, Tx side; (d) through wall, Rx side; (e) multiple Rx; (f) multiple link pairs.]
Vocabulary
• Syllables:
[æ], [e], [i], [u], [s], [l], [m], [h], [v], [ɔ], [w], [b], [j], [ʃ].
• Words:
see, good, how, are, you, fine, look, open, is, the, door,
thank, boy, any, show, dog, bird, cat, zoo, yes, meet, some,
watch, horse, sing, play, dance, lady, ride, today, like, he,
she.
Automatic Segmentation Accuracy
[Figure: Automatic segmentation accuracy for (a) inner-word segmentation on commercial devices; (b) inter-word segmentation on commercial devices; (c) inner-word segmentation on USRP; (d) inter-word segmentation on USRP.]
Classification Accuracy
Training Overhead
Impact of Context-based Error Correction
Performance with Multiple Receivers
Example of different views for pronouncing words
Performance for Multiple Targets
Performance of multiple users with
multiple link pairs.
Performance of zigzag decoding for
multiple users.
Through Wall Performance
Performance of two through wall scenarios. Performance of through wall with multiple Rx.
Resistance to Environmental Dynamics
Waveform of a 4-word sentence without
interference of ISM band signals or irrelevant
human motions
Impact of irrelevant human movements
interference
Impact of ISM band interference
Conclusion
• WiHear is the first prototype in the world that tries to use Wi-Fi signals to sense and recognize human talk.
• WiHear takes the first step toward bridging human speech and wireless signals.
• WiHear introduces a new way for machines to sense more complicated human behaviors (e.g. mood).
Thank you for listening!
Questions?
Guanhua Wang
gwangab@cse.ust.hk
WiHear
We Can Hear You With Wi-Fi !

WiHear - We Can Hear You with Wi-Fi!

Editor's Notes

  1. Good afternoon, everyone. Today I will present the paper “We Can Hear You with Wi-Fi!”. This work was done by Guanhua Wang, with the help of Yongpan Zou, Zimu Zhou, Kaishun Wu, and Lionel Ni. The authors are from the CSE department of the Hong Kong University of Science and Technology.
  2. Recent research has pushed the limit of ISM band radiometric detection to a new level, including localization
  3. Gesture Recognition
  4. And even Object Detection.
  5. By detecting and analyzing signal reflection, they enable Wi-Fi to “SEE” the target objects.
  6. In this paper, we ask the following question: Can we enable Wi-Fi signals to HEAR talks?
  7. We present WiHear.
  8. So what is WiHear?
  9. The key insight of WiHear is to sense and recognize the radiometric impact of mouth movement on WiFi signals. WiHear exploits the radiometric characteristics of mouth movements to analyze micro-motion in a non-invasive and device-free manner.
  10. Since Wi-Fi signals do not require line of sight, WiHear can “hear” people talk through walls and doors within radio range. In addition, WiHear can “hear” and “understand” what you say, which conveys more complicated information than traditional gesture-based interaction interfaces such as Xbox Kinect (e.g. if someone wants to express that they are upset, simple body gestures are unlikely to deliver it, but speech can).
  11. By using MIMO technology, WiHear can be extended to hear multiple people talking simultaneously. In addition, since it requires no changes to commercial Wi-Fi signals, WiHear can be easily implemented in commercial Wi-Fi infrastructure such as access points and mobile devices.
  12. So how does WiHear work?
  13. This slide gives a general view of how WiHear works. The basic prototype consists of a transmitter and a receiver. First, the transmitter focuses a radio beam on the target’s mouth to reduce irrelevant multipath effects. Then the WiHear receiver analyzes the signal reflections from mouth motions in two steps: (1) Wavelet-based Mouth Motion Profiling: WiHear analyzes the received signals by filtering out-band interference and partially eliminating multipath. It then constructs mouth motion profiles via discrete wavelet packet decomposition. (2) Learning-based Lip Reading: Once WiHear has extracted mouth motion profiles, it applies machine learning to recognize pronunciations, and translates them via classification and context-based error correction. In the rest of this talk, we will describe each part in detail. First, we focus on the mouth motion profiling process.
  14. Mouth motion profiling consists of five steps: Locating on Mouth, Filtering Out-Band Interference, Partial Multipath Removal, Mouth Motion Profile Construction, and Discrete Wavelet Packet Decomposition.
  15. First, let’s look at the process of locating the radio beam on the mouth.
  16. We exploit MIMO beamforming techniques or directional antennas to make the radio beam locate and focus on the mouth, which both introduces less irrelevant multipath propagation and magnifies the signal changes induced by mouth motions. It works as follows: The transmitter sweeps its beam for multiple rounds while the user repeats a predefined gesture (e.g. pronouncing [æ] once per second). Meanwhile, the receiver searches for the time when the gesture pattern is most notable during each round of sweeping. The receiver sends the selected time stamp back to the transmitter, and the transmitter then adjusts and fixes its beam accordingly. In this example, after the transmitter has swept the beam for several rounds, the receiver sends time slot 3 back to the transmitter.
  17. After the beam locating process, we need to remove the radio information of irrelevant frequency components
  18. Signal changes caused by mouth motion in the temporal domain are often within 2-5 Hz. Therefore, we apply band-pass filtering to the received samples to eliminate out-band interference. This cancels the DC component, the wink issue (<1 Hz), and high-frequency interference.
  19. After filtering out-band interference, we apply our proposed “partial multipath removal” procedure.
  20. Unlike previous works in gesture recognition, where multipath reflections are eliminated thoroughly, WiHear performs partial multipath removal. The rationale is that mouth motions are non-rigid compared with arm or leg movements. It is common for the tongue, lips, and jaws to move in different patterns. Therefore, we need to remove reflections with long delays (often due to reflections from surroundings), and retain those within a delay threshold (corresponding to non-rigid movements of the mouth). The procedure detail is to remove multipath effects over 500ns via IFFT and FFT. Further, in order to achieve better performance, the multipath threshold value can be dynamically adjusted in different application scenarios.
  21. To reduce computational complexity while keeping the temporal-spectral characteristics, we select a single representative value for each time slot. For more details, please refer to the paper.
  22. WiHear then performs discrete wavelet packet decomposition on the obtained mouth motion profiles, and the result serves as input for the learning-based lip reading.
  23. Based on our experimental results, we select a Symlet wavelet filter of order 4.
  24. This slide gives a general view of how WiHear works. The basic prototype consists of a transmitter and a receiver. First, the transmitter focuses the radio beam on the target's mouth to reduce irrelevant multipath effects. Then, the WiHear receiver analyzes the signal reflections caused by mouth motions in two steps: 1. Wavelet-based Mouth Motion Profiling: WiHear processes the received signals by filtering out-band interference and partially eliminating multipath. It then constructs mouth motion profiles via discrete wavelet packet decomposition. 2. Learning-based Lip Reading: Once WiHear extracts mouth motion profiles, it applies machine learning to recognize pronunciations, and translates them via classification and context-based error correction. In the rest of this talk, we will describe each part in detail. First, we focus on the mouth motion profiling process.
  25. Next, we come to the second part, lip reading. It includes segmentation, feature extraction, classification, and context-based error correction.
  26. The first component is the segmentation of collected signals.
  27. We have two options: inner-word segmentation and inter-word segmentation. For inner-word segmentation, we divide words into phonetic events and then directly match these phonetic events against training samples. For inter-word segmentation, we leverage the silent intervals between words to do the segmentation.
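Inter-word segmentation by silent intervals can be sketched with a simple short-time energy detector. This is only an illustration of the idea, not the method used in WiHear: the window length, relative threshold, minimum gap, and the function name are all hypothetical choices.

```python
import numpy as np

def inter_word_segments(signal, fs, win_s=0.2, rel_threshold=0.1, min_gap_s=0.3):
    """Return (start, end) sample indices of active (non-silent) intervals.

    All parameter values are illustrative assumptions.
    """
    x = np.asarray(signal, dtype=float)
    win = max(1, int(win_s * fs))
    # Short-time energy via a moving average of the squared signal.
    energy = np.convolve(x ** 2, np.ones(win) / win, mode="same")
    active = energy > rel_threshold * energy.max()

    # Collect contiguous runs of active samples.
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(x)))

    # Merge runs separated by gaps too short to be a pause between words.
    merged = []
    for s, e in segments:
        if merged and (s - merged[-1][1]) < min_gap_s * fs:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

# Two synthetic one-second "words" separated by one second of silence.
fs = 100
t = np.arange(fs) / fs
word = np.sin(2 * np.pi * 3 * t)
silence = np.zeros(fs)
stream = np.concatenate([silence, word, silence, word, silence])
segments = inter_word_segments(stream, fs)
```

For this synthetic stream the detector returns two intervals, one around each burst.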
  28. Next is feature extraction.
  29. We apply a sophisticated scheme, MCFS, to extract the signal features of each pronunciation. The figure shows 9 examples of selected features and the corresponding mouth movements.
  30. For a specific individual, the speed and rhythm of speaking each word share similar patterns. We can thus directly compare the similarity of the current signals and previously sampled ones using generalized least squares.
  31. Since the spoken pronunciations are correlated, we can leverage context-aware approaches widely used in automatic speech recognition to improve recognition accuracy.
  32. To support debates or discussions, we need to extend WiHear to track multiple talks simultaneously. A natural approach is to leverage MIMO technology, that is, to use spatial diversity via multiple receive antennas. WiHear can also hear multiple people with a single antenna via zigzag decoding. The key insight of zigzag decoding is that, in most circumstances, multiple people do not begin pronouncing each word at exactly the same time. As shown in the figure, after segmentation and classification, each word is enclosed in a dashed red box. The three words from user 1 have different starting and ending times from those of user 2. Taking the first word of the two users as an example, we first recognize the beginning part of user 1's word 1, and then use the predicted ending part of user 1's word 1 to cancel it from the combined signal of user 1's and user 2's word 1. Thus we use one antenna to decode two users' words simultaneously.
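The cancellation idea can be sketched in a heavily simplified form. The example below assumes that the start offset of the second speaker is known, that per-word template waveforms of equal length are available, and that matching is done by squared error. These are all illustrative assumptions, and the function names are hypothetical. Full zigzag decoding is more involved than this single cancellation step.

```python
import numpy as np

def best_match(segment, templates):
    """Name of the template closest (in squared error) to the given segment."""
    errors = {
        name: float(np.sum((segment - tpl[:len(segment)]) ** 2))
        for name, tpl in templates.items()
    }
    return min(errors, key=errors.get)

def decode_two_users(mixed, templates, offset):
    """Decode one word per user from a single-antenna mixture.

    offset: sample index at which user 2 starts speaking (assumed known).
    """
    mixed = np.array(mixed, dtype=float)
    # 1. The first `offset` samples contain only user 1, so recognize from them.
    first = best_match(mixed[:offset], templates)
    # 2. Subtract the predicted full waveform of user 1's word.
    residual = mixed.copy()
    n = len(templates[first])
    residual[:n] -= templates[first]
    # 3. What remains after the offset is user 2's word.
    second = best_match(residual[offset:offset + n], templates)
    return first, second

# Two hypothetical word templates of equal length.
t = np.arange(100) / 100.0
templates = {
    "word_a": np.sin(2 * np.pi * 2 * t),
    "word_b": np.sin(2 * np.pi * 5 * t),
}
offset = 30
mixed = np.zeros(130)
mixed[:100] += templates["word_a"]            # user 1 starts at sample 0
mixed[offset:offset + 100] += templates["word_b"]  # user 2 starts 30 samples later
mixed += 0.05 * np.random.default_rng(0).standard_normal(130)  # small noise
decoded = decode_two_users(mixed, templates, offset)
```

The sketch relies on the overlap-free prefix being long enough to identify the first word, which mirrors the observation that speakers rarely start at exactly the same time.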
  33. We tested WiHear in a typical office environment and ran our experiments with 4 people (1 female and 3 males). We conducted experiments on both USRP N210s and commercial Wi-Fi products. We extensively evaluated WiHear's performance in six scenarios: (a) line of sight; (b) non-line-of-sight; (c) through wall, Tx side; (d) through wall, Rx side; (e) multiple Rx; (f) multiple link pairs.
  34. The vocabulary that WiHear can currently recognize is listed here; it includes 14 syllables and 33 words.
  35. Fig. 9 shows the inner-word and inter-word segmentation accuracy. We mainly focus on LOS and NLOS scenarios. The correct rate of inter-word segmentation is higher than that of inner-word segmentation. The main reason is that for inner-word segmentation, we directly use the waveform of each vowel or consonant to match the test waveform. Different segmentations lead to different combinations of vowels and consonants, and some of these combinations do not even exist. In contrast, inter-word segmentation is relatively easy since there is a silent interval between two adjacent words. Further, we find that the overall segmentation performance of commercial devices is slightly better than that of USRPs.
  36. This figure depicts the recognition accuracy on both USRP N210s and commercial Wi-Fi infrastructure in LOS and NLOS. The commercial Wi-Fi infrastructure system achieves 91% accuracy on average for sentences of no more than 6 words. In addition, with multiple receivers deployed, WiHear achieves 91% on average for sentences of fewer than 10 words. Since commercial Wi-Fi devices perform better overall than the USRP N210, we mainly focus on commercial Wi-Fi devices in the following evaluations.
  37. As a whole, we can see that for each word or syllable, the accuracy of the word-based scheme is higher than that of the syllable-based scheme. Given this result, we empirically choose between 50 and 100 training samples, which gives good recognition accuracy with acceptable training overhead.
  38. Here we evaluate the importance of context-based error correction in LOS and NLOS scenarios. As we can see, without context-based error correction the performance drops dramatically. In particular, for sentences of more than 6 words, context-based error correction yields a 13% performance gain over recognition without it. This is because the longer the sentence, the more context information can be exploited for error correction.
  39. The figure below shows that the same person pronouncing the word "GOOD" has different radiometric impacts on the received signals from different perspectives (at angles of 0°, 90° and 180°). As shown in the upper figure, with multi-dimensional training data (3 dimensions in our case), WiHear can achieve 87% accuracy even when the user speaks more than 6 words. This brings the overall accuracy to 91% across all three word-group scenarios. Given this, if high accuracy of Wi-Fi hearing is needed, we recommend deploying more receivers with different views.
  40. The left figure shows the use of multiple link pairs for hearing multiple targets. Compared with a single target, the overall performance decreases as the number of targets increases. Further, the performance drops dramatically when each user speaks more than 6 words. For zigzag decoding, shown on the right, the performance drops more severely than with multiple link pairs. The worst case (i.e., 3 users, more than 6 words) achieves less than 30% recognition accuracy. Thus we recommend using the zigzag cancellation scheme with no more than 2 users who speak fewer than 6 words.
  41. As shown in the left figure, although the through-wall recognition accuracy is fairly low (around 18% on average), it is acceptable compared with the probability of a random guess (i.e., 1/33 ≈ 3%). Performance with the target on the Tx side is better. As depicted in the right figure, with trained samples from different views, multiple receivers can enhance through-wall performance.
  42. Here we evaluate WiHear's resistance to environmental dynamics. We let one user repeatedly speak a 4-word sentence. For each of the following 3 cases, we collect the radio sequences of the 4-word sentence spoken 30 times and draw the combined waveform. In the first case, we keep the surroundings stable. As shown in the upper figure, we can easily recognize the 4 words that the user speaks. In the second case, we let three men stroll randomly in the room but always stay 3 m away from WiHear's link pair. As shown in the middle, the words can still be correctly detected even though the waveform is looser than in the first case. In the third case, we use a mobile phone to communicate with an AP (e.g., surfing online) and keep them 3 m away from WiHear's link pair. As shown at the bottom, the generated waveform fluctuates a little compared with the first case, but is still recognizable.