SoundSense by AndriusAndrijauskas
Introduction Today’s mobile phones come with various embedded sensors such as GPS, WiFi, compass, etc. Arguably, one of the most overlooked sensors, available on every device, is the microphone. Sound captured by a mobile phone’s microphone can be a rich source of information.
Introduction From the captured sound, various inferences can be made about the person carrying the phone:
Activity recognition
Location classification
Scaling People live in different environments, containing a wide variety of everyday sounds. Typically, sounds encountered by a student will be different from those encountered by a truck driver. It is not feasible to collect every possible sound and classify it.
Robustness People carry phones in a number of different ways; for example, in the pocket, on a belt, in a bag. Location of the phone with respect to the body presents various challenges because in the same environment sound levels will vary based on the phone’s location.
Robustness Phone context alters the volume:
B – in pocket of a user facing the source
C – in hand of a user facing away from the source
D – in pocket of a user facing away from the source
SoundSense SoundSense – a scalable sound sensing framework for mobile phones. It is the first general-purpose sound event classification system designed specifically to address scaling, robustness, and device integration.
Architecture
Sensing and Preprocessing Audio stream is segmented into uniform frames. Frame admission control is required since frames may or may not contain “interesting” audio content. Uninteresting audio content can be white noise, silence, or insufficient amount of signal captured. How is this determined?
Sensing and Preprocessing Determining “interesting” audio content.
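The editor's notes describe frame admission as a check on two quantities, RMS energy and spectral entropy, with a frame discarded only when it fails both. A minimal sketch of that rule follows, assuming a naive DFT and Shannon entropy in bits; the function names and the threshold values used in the example are illustrative assumptions, not taken from the SoundSense implementation.

```python
import cmath
import math

def rms(frame):
    """Root mean square (quadratic mean) of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectral_entropy(frame):
    """Shannon entropy (bits) of the magnitude spectrum normalized to sum to 1."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        # Assumption: treat pure silence as maximally flat (highest entropy).
        return math.log2(len(mags))
    probs = [m / total for m in mags]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def admit_frame(frame, rms_threshold, entropy_threshold):
    """Keep a frame unless it is both too quiet and spectrally flat."""
    loud_enough = rms(frame) >= rms_threshold
    patterned = spectral_entropy(frame) <= entropy_threshold
    return loud_enough or patterned

# Example with hypothetical thresholds: a quiet tone is kept, a quiet click is not.
n = 64
quiet_tone = [0.005 * math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
quiet_click = [0.01] + [0.0] * (n - 1)
print(admit_frame(quiet_tone, 0.01, 3.0), admit_frame(quiet_click, 0.01, 3.0))
```

The quiet tone passes because its spectrum is concentrated in one bin (low entropy), which is the "informative low volume" case the notes mention.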
Architecture
Feature Extraction: Zero Crossing Rate, Low Energy Frame Rate, Spectral Flux, Spectral Rolloff, Bandwidth, Spectral Centroid
Feature Extraction - ZCR Zero Crossing Rate – the number of time-domain zero crossings within a frame. Human voice consists of voiced and unvoiced sounds. Voiced frames – low ZCR values. Unvoiced (noise-like) frames – high ZCR values. Human voice will therefore have high ZCR variation. Typically, music does not have high ZCR variation.
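As a rough illustration of the ZCR feature, the sketch below counts sign changes between consecutive samples and then measures how much that count varies across a window of frames. Treating a sample of exactly zero as non-negative, and using the population variance as the "variation" measure, are assumptions made for this example.

```python
def zero_crossing_rate(frame):
    """Count sign changes between consecutive samples in one frame."""
    crossings = 0
    for prev, cur in zip(frame, frame[1:]):
        # Assumption: a sample equal to zero counts as non-negative.
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    return crossings

def zcr_variance(frames):
    """Population variance of the per-frame ZCR over a window of frames."""
    rates = [zero_crossing_rate(f) for f in frames]
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)

# A rapidly alternating frame next to a smooth one gives a non-zero variance.
window = [[1, -1, 1, -1], [1, 2, 3, 4]]
print(zero_crossing_rate(window[0]), zcr_variance(window))
```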
Feature Extraction – Low Energy Frame Rate Low Energy Frame Rate – number of frames within a frame window that have an RMS value less than 50% of the mean RMS for the entire window. Because human voice has voiced and unvoiced frames, this feature will be higher for speech than for music and constant noise.
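The definition above translates almost directly into code. The sketch below is one possible reading, assuming the window is passed as a list of frames and that the result is a raw count rather than a fraction; the function names are illustrative.

```python
import math

def rms(frame):
    """Root mean square of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def low_energy_frame_rate(frames):
    """Number of frames whose RMS is below 50% of the window's mean RMS."""
    values = [rms(f) for f in frames]
    mean_rms = sum(values) / len(values)
    return sum(1 for v in values if v < 0.5 * mean_rms)

# Three loud frames and one quiet frame: only the quiet one falls below the cutoff.
window = [[1.0, 1.0], [1.0, 1.0], [0.1, 0.1], [1.0, 1.0]]
print(low_energy_frame_rate(window))
```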
Feature Extraction – Spectral Flux Spectral flux – measures how quickly the spectrum of the signal is changing. Calculated by comparing the power spectrum for one frame against the power spectrum from the previous frame. Speech generally switches quickly between voiced and unvoiced frames, resulting in a higher SF value.
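The slide does not say how the two power spectra are compared. One common choice, assumed here, is the sum of squared differences between corresponding bins; the naive DFT helper is only meant for short illustrative frames.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power (squared magnitude) for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]

def spectral_flux(prev_frame, frame):
    """Sum of squared bin-wise differences between consecutive power spectra.

    Assumption: squared-difference comparison; other norms are also used.
    """
    prev_p = power_spectrum(prev_frame)
    cur_p = power_spectrum(frame)
    return sum((c - p) ** 2 for c, p in zip(cur_p, prev_p))

# Two tones at different frequencies produce a large flux; identical frames give zero.
n = 16
low_tone = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
print(spectral_flux(low_tone, low_tone), spectral_flux(low_tone, high_tone) > 0)
```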
Feature Extraction – Spectral Rolloff Spectral Rolloff is a measure of the skewness of the spectral distribution. Right skewed (higher frequency) distribution will have higher spectral rolloff values. Music will typically have high spectral rolloff values.
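The slide gives no formula for rolloff. A widely used definition, assumed here, is the lowest frequency bin below which a fixed fraction of the total spectral power lies. The 0.85 default and the function name are assumptions for illustration; the input is a precomputed power spectrum.

```python
def spectral_rolloff(power, fraction=0.85):
    """Smallest bin index at which cumulative power reaches `fraction` of the total."""
    total = sum(power)
    if total == 0:
        return 0  # Assumption: report bin 0 for an all-zero spectrum.
    cumulative = 0.0
    for k, p in enumerate(power):
        cumulative += p
        if cumulative >= fraction * total:
            return k
    return len(power) - 1

# Power concentrated in high bins pushes the rolloff index up.
print(spectral_rolloff([10, 0, 0, 0]), spectral_rolloff([0, 0, 0, 10]))
```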
Feature Extraction – Spectral Centroid Spectral Centroid indicates where the “center of mass” of the spectrum is. Music usually involves high-frequency sounds, which means it will have higher spectral centroid values.
Feature Extraction – Bandwidth Bandwidth – range of frequencies that the signal occupies. In this case, it shows if the spectrum is concentrated around the spectral centroid or spread out over the whole spectrum. Most ambient sound consists of a limited range of frequencies. Music often consists of a broader mixture of frequencies than voice and ambient sound.
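Centroid and bandwidth are closely related: the centroid is the power-weighted mean frequency, and the bandwidth describes how far the spectrum spreads around it. The sketch below assumes the power-weighted standard deviation as the spread measure and works in units of bin index rather than hertz; both choices are assumptions for illustration.

```python
import math

def spectral_centroid(power):
    """Power-weighted mean bin index (the spectrum's center of mass)."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(k * p for k, p in enumerate(power)) / total

def spectral_bandwidth(power):
    """Power-weighted standard deviation of bin index around the centroid."""
    total = sum(power)
    if total == 0:
        return 0.0
    c = spectral_centroid(power)
    return math.sqrt(sum(((k - c) ** 2) * p for k, p in enumerate(power)) / total)

# A single-bin spectrum has zero bandwidth; a spread-out one does not.
print(spectral_centroid([0, 0, 4, 0]), spectral_bandwidth([0, 0, 4, 0]))
print(spectral_centroid([1, 0, 1]), spectral_bandwidth([1, 0, 1]))
```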
Feature Extraction – Normalized Weighted Phase Deviation Normalized Weighted Phase Deviation – shows the phase deviations of the frequency bins. Usually ambient sound and music will have a smaller phase deviation than voice.
Feature Extraction – Relative Spectral Entropy Relative spectral entropy - differentiates speech and other sounds.
Architecture
Decision tree classification Extracted features are then fed to one of the following models: Gaussian Mixture Model, K-Nearest Neighbors, Hidden Markov Model. Essentially these are machine learning / pattern recognition type models that in this case help us classify sounds (as voice, music, or ambient sound) based on the extracted features. All three models yielded about 80% classification accuracy.
Architecture
Markov Recognizer Decision tree classification sample output:
1 – music
2 – speech
Architecture
Unsupervised Adaptive Classification The Unsupervised Adaptive Classification component copes with all ambient sound events. The objective of this component is to discover, over time, environmental sounds that are significant in everyday life in addition to voice and music.
Mel Frequency Cepstral Coefficient (MFCC) Waveform → Convert to Frames → DFT → Take Log of amplitude spectrum → Mel-scaling and smoothing → Discrete Cosine Transform → MFCC Features
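The pipeline on this slide can be sketched for a single frame as below. This is a simplified illustration under several assumptions: a naive DFT, the common 2595·log10(1 + f/700) mel formula, plain averaging over mel-spaced bins instead of triangular filters, an 8000 Hz sample rate, and arbitrary bin and coefficient counts. None of these parameters come from the slides.

```python
import cmath
import math

def hz_to_mel(f):
    """A commonly used mel-scale formula (assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_amplitude_spectrum(frame):
    """Naive DFT for bins 0..N/2, then log of the amplitude."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        amp = abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        out.append(math.log(amp + 1e-10))  # small offset avoids log(0)
    return out

def mel_smooth(log_spec, sample_rate, n_bins):
    """Average the log spectrum over n_bins bins equally spaced on the mel scale."""
    n_spec = len(log_spec)
    nyquist = sample_rate / 2.0
    edges_mel = [hz_to_mel(nyquist) * i / n_bins for i in range(n_bins + 1)]
    # Convert mel edges to (fractional) indices into the spectrum.
    edges = [mel_to_hz(m) / nyquist * (n_spec - 1) for m in edges_mel]
    smoothed = []
    for b in range(n_bins):
        lo = int(math.floor(edges[b]))
        hi = min(max(lo + 1, int(math.ceil(edges[b + 1]))), n_spec)
        band = log_spec[lo:hi]
        smoothed.append(sum(band) / len(band))
    return smoothed

def dct(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def mfcc(frame, sample_rate=8000, n_bins=12, n_coeffs=6):
    """Frame -> DFT -> log amplitude -> mel smoothing -> DCT."""
    return dct(mel_smooth(log_amplitude_spectrum(frame), sample_rate, n_bins), n_coeffs)

frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
print(len(mfcc(frame)))
```

Because the mel bins are narrow at low frequencies and wide at high frequencies, the low end of the spectrum keeps more detail after smoothing, which matches the motivation given in the editor's notes.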
Unsupervised Adaptive Classification Once an event is classified as ambient sound, the UAC algorithm starts to run. The UAC algorithm extracts MFCC features of the sound event. The system checks whether the event has already been classified. If not, the user is asked to label the unknown sound event. The system can store up to 100 of these events.
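The lookup-then-label flow described on this slide might be sketched as follows. This is a heavily simplified illustration: matching by Euclidean distance to a stored feature vector, the distance threshold, and evicting the least-encountered entry when the store is full are all assumptions, and the class and method names are hypothetical. Only the 100-event capacity comes from the slide.

```python
import math

class AmbientSoundBins:
    """Toy store of labeled ambient sound events keyed by a feature vector."""

    def __init__(self, max_bins=100, match_distance=1.0):
        self.max_bins = max_bins              # capacity stated on the slide
        self.match_distance = match_distance  # hypothetical matching threshold
        self.bins = []                        # entries: label, features, count

    @staticmethod
    def _distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify(self, features):
        """Return the label of the closest known event, or None if unknown."""
        best = None
        for entry in self.bins:
            d = self._distance(features, entry["features"])
            if d <= self.match_distance and (best is None or d < best[0]):
                best = (d, entry)
        if best is None:
            return None
        best[1]["count"] += 1
        return best[1]["label"]

    def add(self, features, label):
        """Store a newly labeled event, evicting the least-encountered one if full."""
        if len(self.bins) >= self.max_bins:
            self.bins.remove(min(self.bins, key=lambda e: e["count"]))
        self.bins.append({"label": label, "features": list(features), "count": 1})

store = AmbientSoundBins()
event = [0.2, -1.3, 0.7]
label = store.classify(event)
if label is None:
    # In the described system the user would be asked for a label here.
    store.add(event, "vacuum cleaner")
print(store.classify(event))
```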
Implementation The SoundSense prototype is implemented as a self-contained piece of software that runs on the Apple iPhone. It contains approximately 5,500 lines of code, written mostly in C/C++ with Objective-C used for the iPhone application. The complete package is about 300 KB.
Implementation
Evaluation In a quiet environment, the system adopts a long duty cycle and CPU usage is less than 5%. Once acoustic processing starts, CPU usage increases to about 25%. Maximum memory usage (with all bins being used) is about 5 MB.
Evaluation
Unsupervised Adaptive Classification accuracy

SoundSense

  • 41. Evaluation SoundSense in everyday life http://metrosense.cs.dartmouth.edu/projects.html
  • 42. Conclusions SoundSense is a lightweight, scalable framework capable of recognizing a broad set of sound events. In contrast to traditional audio context recognition systems that are offline, SoundSense performs online classification at a lower computational cost and yields results that are comparable to the offline systems.
  • 44. References SoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones; Mel Frequency Cepstral Coefficients for Music Modeling; Wikipedia

Editor's Notes

  1. that can be used to make accurate inferences about the person carrying the phone, their environment, and social events.
  2. Scaling – it is difficult to scale to a large number of people because each person has different everyday environments. Robustness – physical phone location matters: is the person carrying the phone in hand, is the phone in the pocket, or is it in the backpack? All of these will affect the sound received by the microphone differently. Device integration – algorithms must be simple enough not to require a lot of computing power and resources, they must not hinder the phone’s primary function, and they must also safeguard the privacy of the user.
  3. A -
  4. RMS is the quadratic mean of a continuous-time waveform. Frame admission is done on the basis of energy level and spectral entropy. A low energy level indicates silence or an undesirable phone context, which prevents meaningful classification. Spectral entropy gives cues about the frequency pattern of the sound. A flat spectrum (silence or white noise) results in high entropy. A low value indicates a strong pattern in the spectrum. Entropy helps differentiate “informative low volume” samples from the non-informative ones. Two thresholds, rms_threshold and entropy_threshold, are used for frame admission control. If a frame fails both of these criteria, it is discarded.
  5. Feature extraction limitation – clipping at 8kHz
  6. The models are way too complex to get into any detail. Essentially these are machine learning / pattern recognition type models that in this case help us classify sounds based on the extracted features.
  7. MFCC provides fine details in the frequency bands to which the human ear is most sensitive, while also capturing coarser information over the other spectral ranges. The MFCC-based algorithm is unsupervised and is processed on the go.
  8. Mel-scaling and smoothing – this step is used to emphasize perceptually meaningful frequencies. This is achieved by collecting (say) 256 spectral components into (say) 40 frequency bins. Although one would expect these bins to be equally spaced in frequency, it has been found that for speech, the lower frequencies are perceptually more important than higher frequencies. Therefore the bin spacing follows the so-called “Mel” frequency scale. Spectral components are averaged over Mel-spaced bins to produce a smoother spectrum. Discrete cosine transform is used to “clean up” the signal, and finally MFCC features are obtained.