EDF2013: Data Science Curriculum: Nick Campbell: Speech Technology & big data

Data Science Curriculum: Nick Campbell, Speech Communication Lab, Trinity College Dublin, Ireland, at the European Data Forum 2013, 10 April 2013 in Dublin, Ireland: Speech Technology & big data


  1. Nick Campbell, Speech Communication Lab, Trinity College Dublin, Ireland
  2. • TCD – Stokes Professor (Dublin)
     • CNGL – PI – Delivery & Interaction
     • ELRA – board member / VP – speech
     • ISCA – board member – workshops
     • IEEE – Sig Proc Soc – SLTC member
     • ATR/NiCT – research director (Japan)
     • Speech Prosody 2014 (Dublin) host
     • Speech scientist/researcher/corpus analyst
  3. • AT&T Bell Labs: the ideas people – think ‘BIG’
     • IBM UK Scientific Centre: the corpus people – ‘collect it all’
     • ATR basic telecom research: the fundamentals – learn how to ‘infer’ from it
  4. • we used to be considered BIG – speech data (and now multimedia) gobbled up memory
     • I collected 1500 hours of everyday chat/daily conversations in 2000 (@1GB per minute) – it took 5 years to process!
     • now Apple, Google, Microsoft, ... get that each minute (but the secret is in the metadata)
     • we need accessible data & tools for everybody!
  5. but we need to manage privacy issues first!
  6. • and we need a way to protect IP as well
     • written publications have the ISBN standard
     • work is now underway (cf. ELRA & COCOSDA) to institute ISLRN for Language Resources
     • researchers need to get credit for corpora as well as for publishing research results
     • the community needs a way to identify, acknowledge, attribute, and reference data
  7. • tools for processing speech & multimodal data
     • HTK, HTS, R, etc.: not simple to use
     • little consensus on what features to encode
     • manual bootstrap – much too time-consuming!
  8. • social interaction
     • personal idiosyncrasies
     • group dynamics – multimodal data (TB/hr)
     • issues of robustness / domain specificity / privacy / storage & archiving / redistribution
  9. context analytics:
     • cultural and language-specific needs
     • multimodal – multimedia – multilingual
     • tools for ‘less-well-supported’ languages
     • e.g., U-STAR consortium for speech research – sharing tools & data & knowledge for research
  10. • European Language Resources Association
      • COCOSDA – int’l coordinating committee
      • IEEE SLTC, ISCA SIGs: there are places to go
      • but are they ready for really BIG data? perhaps not yet ...
  11. • curricula prepare people
      • what standards to rely on?
      • what resources available?
      • what features to extract?
      • what tools to work with?
      • what use to put it to?
      • what info to hide?
      • what to do next?
