[Opening slide graphic: scattered word fragments including mjav, meong, miaou, mja, meow, já, ya, ?]

Automatic Language Identification (LiD)

Alexander Hosford
@awhosford
  
The LiD Problem

•  The identification of a given spoken language from a speech signal
•  94% of the global population speaks only 6% of the world's languages
  
Real World Situations

•  Automation of LiD is desirable
•  Offers many benefits to international service industries
•  Hotels, Airports, Global Call Centres
  
Language Differences

•  Languages contain information that makes one discernible from another:
   –  Phonemes
   –  Prosody
   –  Phonotactics
   –  Syntax
  
Human Abilities

•  Bias towards native language arises in infancy
•  Prosodic features are some of the first cues to be recognised
•  Humans can make a reasonable estimate of the language heard within 3-4 seconds of audition
•  Even unfamiliar languages may be plausibly judged
  
	
  
Previous Attempts

•  Attempts as early as 1974: USAF work, therefore classified
•  Methodological investigations as early as 1977 (House & Neuburg)
•  Studies for the most part center on phonotactic constraints and phoneme modeling
•  Raw acoustic waveforms have also been explored (Kwasny 1993)
  
A Simpler Approach?

•  Phonotactic approaches require expert linguistic knowledge
•  Phoneme and phonotactic modeling is time-consuming
•  Given the speed of human LiD abilities, discrimination is most likely based on acoustic features
  
System Overview

[Diagram, left to right: Pre-processed Speech Utterance → Onset Detection → MFCC Vectors / nPVI / Pitch Contour (SuperCollider) → ARFF File Generation → SuperCollider SystemCmd → WEKA Tools]

Feature Extraction

•  Spectral information: MFCC vectors
•  Pitch contour information
•  Handled by the SCMIR library (Collins, 2010)
•  Speech rhythm: Normalised Pairwise Variability Index (nPVI)

Feature Type        Implemented Measure
Spectral Content    MFCC Vector UGen
Pitch Contour       Tartini UGen
Speech Rhythm       nPVI function
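
The deck does not show its nPVI function, so the following is only a minimal Python sketch of the standard nPVI definition: 100 times the mean, over successive pairs of durations, of the absolute difference divided by the pair's mean. Treating the gaps between detected onsets as the durations is an assumption based on the system overview, and the names `npvi` and `inter_onset_intervals` are hypothetical.

```python
from typing import List, Sequence


def inter_onset_intervals(onsets: Sequence[float]) -> List[float]:
    """Turn a sorted list of onset times (seconds) into successive durations."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def npvi(durations: Sequence[float]) -> float:
    """Normalised Pairwise Variability Index of successive positive durations.

    nPVI = 100 / (m - 1) * sum(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    total = 0.0
    for current, following in zip(durations, durations[1:]):
        # Each pair contributes its difference, normalised by the pair's mean.
        total += abs(current - following) / ((current + following) / 2.0)
    return 100.0 * total / (len(durations) - 1)


if __name__ == "__main__":
    # Hypothetical onset times in seconds, for illustration only.
    onsets = [0.0, 0.2, 0.6, 0.8]
    print(npvi(inter_onset_intervals(onsets)))
```

Perfectly regular intervals give an nPVI of 0, and the value rises as neighbouring intervals become more unequal, which is why it is used as a rhythm measure.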
  
Classification

•  Handled by the WEKA toolkit
•  Built-in Multilayer Perceptron
•  Called from the command line through a SuperCollider system command
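
As a rough illustration of this hand-off, the Python sketch below writes feature vectors in WEKA's ARFF text format and assembles a command line for WEKA's `MultilayerPerceptron` classifier. The original system does this from SuperCollider, so the Python helpers (`to_arff`, `weka_mlp_command`), the attribute names, the language labels, the `weka.jar` path and the 10-fold setting are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def to_arff(relation: str,
            feature_names: Sequence[str],
            classes: Sequence[str],
            rows: Sequence[Tuple[Sequence[float], str]]) -> str:
    """Render numeric feature rows plus a nominal class label as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    # The class is a nominal attribute listing every allowed label.
    lines.append("@attribute language {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for features, label in rows:
        if len(features) != len(feature_names):
            raise ValueError("row length does not match the attribute list")
        if label not in classes:
            raise ValueError(f"unknown class label: {label}")
        lines.append(",".join(f"{value:.6g}" for value in features) + "," + label)
    return "\n".join(lines) + "\n"


def weka_mlp_command(arff_path: str, weka_jar: str = "weka.jar",
                     folds: int = 10) -> List[str]:
    """Build (but do not run) a WEKA command: -t is the training file, -x the CV folds."""
    return ["java", "-cp", weka_jar,
            "weka.classifiers.functions.MultilayerPerceptron",
            "-t", arff_path, "-x", str(folds)]


if __name__ == "__main__":
    # Hypothetical two-feature rows for two hypothetical language labels.
    text = to_arff("lid", ["mfcc1", "mfcc2"], ["english", "german"],
                   [([0.5, -1.25], "english"), ([0.1, 2.0], "german")])
    print(text)
    print(" ".join(weka_mlp_command("lid.arff")))
```

Keeping the command as a list of arguments means it can be passed to a process launcher without shell quoting problems.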
  
Comparisons Made

•  66 language pairs from 12 languages
•  A comparison within language families
•  A comparison of all 12 languages
•  Averaged & segmented data
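
The figure of 66 pairs follows from choosing 2 of 12 languages. A short Python check is sketched below. The deck does not list the 12 languages, so the labels are placeholders, and the chance levels assume equally sized classes.

```python
from itertools import combinations
from math import comb

# Placeholder labels: the deck does not name the 12 languages.
languages = [f"lang{index:02d}" for index in range(1, 13)]

# Every unordered pair of distinct languages is one two-way comparison.
pairs = list(combinations(languages, 2))
pair_count = len(pairs)

# Chance accuracy with balanced classes: two-way versus twelve-way.
pairwise_chance = 100.0 / 2
all_languages_chance = 100.0 / len(languages)

if __name__ == "__main__":
    print(pair_count, comb(12, 2))
    print(pairwise_chance, round(all_languages_chance, 2))
```

The different chance levels matter when reading the results: a score near 50% is no better than guessing for a language pair, but well above guessing for the 12-way task.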
  
Results Within Families

[Chart: y-axis from 20.00 to 100.00; series: Germanic, Romance, Slavic, SinoAltaic, Mean; feature sets: 4 MFCC, 13 MFCC Tartini, 41 MFCC Tartini, All Features]
*Data averaged across files
  
Results Within Families

[Chart: y-axis from 20.00 to 100.00; series: Germanic, Romance, Slavic, SinoAltaic, Mean; feature sets: 4 MFCC, 13 MFCC Tartini, 41 MFCC Tartini]
*Data in 5 second segments
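
The deck does not say how the averaged and 5 second data sets were built. One plausible reading, sketched in Python below, is that frame-level feature vectors are either averaged over a whole file or averaged within consecutive 5 second windows. The helper names, the frame rate and the choice to drop a trailing partial window are assumptions.

```python
from typing import List, Sequence


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a run of equal-length feature frames into a single vector."""
    if not frames:
        raise ValueError("no frames to average")
    dimensions = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames)
            for d in range(dimensions)]


def segment_means(frames: Sequence[Sequence[float]],
                  frames_per_second: float,
                  segment_seconds: float = 5.0) -> List[List[float]]:
    """Average frames within consecutive fixed-length segments.

    A trailing partial segment is dropped (an assumption of this sketch).
    """
    size = int(round(frames_per_second * segment_seconds))
    if size < 1:
        raise ValueError("segment is shorter than one frame")
    return [mean_vector(frames[start:start + size])
            for start in range(0, len(frames) - size + 1, size)]


if __name__ == "__main__":
    # Twelve hypothetical two-dimensional frames at one frame per second.
    frames = [[float(index), 1.0] for index in range(12)]
    print(mean_vector(frames))         # one vector for the whole file
    print(segment_means(frames, 1.0))  # one vector per full 5 s segment
```

Averaging gives one training instance per file, while segmenting gives several shorter instances per file, so the two charts are not trained on the same number of examples.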
  
                              	
  
Results From All Languages

[Chart: y-axis from 10.00 to 100.00; series: Averaged (with Mean), 5s Segments (with Mean); feature sets: 4 MFCC, 4 MFCC Tartini, 4 MFCC Tartini nPVI, 13 MFCC, 13 MFCC Tartini, 13 MFCC Tartini nPVI, 41 MFCC, 41 MFCC Tartini, 41 MFCC Tartini nPVI]
  
Extensions

•  nPVI function to use vowel onsets
•  Phonemic segmentation
•  A larger dataset
•  Better efficiency
•  Real-time operation
  
Robustness

•  Real world signals are very different from processed ‘clean’ data.
•  ‘Ideal’ LiD systems: independent & robust
•  A need to analyse only the part of the signal that matters.
  
CASA

•  A computational modeling of human ‘Auditory Scene Analysis’ (Bregman 1990)
•  Separation of signal into component parts and reconstitution into meaningful ‘streams’
  
Using Noisy Signals

I’m open to suggestions!
  
alexanderhosford@googlemail.com
07814 692 549
@awhosford
  
