EDF2013: Data Science Curriculum: Nick Campbell: Speech Technology & big data

Data Science Curriculum: Nick Campbell (Speech Communication Lab, Trinity College Dublin, Ireland) at the European Data Forum 2013, 10 April 2013, Dublin, Ireland. Speech Technology & big data.

Transcript

  • 1. Nick Campbell, Speech Communication Lab, Trinity College Dublin, Ireland
  • 2. * TCD – Stokes Professor (Dublin)
       * CNGL – PI, Delivery & Interaction
       * ELRA – board member / VP, speech
       * ISCA – board member, workshops
       * IEEE Signal Processing Society – SLTC member
       * ATR/NiCT – research director (Japan)
       * Speech Prosody 2014 (Dublin) – host
       * speech scientist / researcher / corpus analyst
  • 3. * AT&T Bell Labs – the ideas people: think ‘BIG’
       * IBM UK Scientific Centre – the corpus people: ‘collect it all’
       * ATR basic telecom research – the fundamentals: learn how to ‘infer’ from it
  • 4. * we used to be considered BIG: speech data (and now multimedia) gobbled up memory
       * I collected 1500 hours of everyday chat and daily conversations in 2000 (at about 1 GB per minute); it took five years to process!
       * now Apple, Google, Microsoft, ... get that much each minute (but the secret is in the metadata)
       * we need accessible data and tools for everybody!
  • 5. * but we need to manage privacy issues first!
  • 6. * and we need a way to protect IP as well
       * written publications have the ISBN standard
       * work is now underway (cf. ELRA and COCOSDA) to institute the ISLRN (International Standard Language Resource Number) for language resources
       * researchers need to get credit for corpora as well as for published research results
       * the community needs a way to identify, acknowledge, attribute, and reference data
  • 7. * tools for processing speech and multimodal data
       * HTK, HTS, R, etc. are not simple to use
       * there is little consensus on what features to encode
       * manual bootstrapping is much too time-consuming!
  • 8. * social interaction
       * personal idiosyncrasies
       * group dynamics – multimodal data (TB/hr)
       * issues of robustness, domain specificity, privacy, storage and archiving, and redistribution
  • 9. context analytics:
       * cultural and language-specific needs
       * multimodal, multimedia, multilingual
       * tools for ‘less-well-supported’ languages
       * e.g., the U-STAR consortium for speech research, sharing tools, data, and knowledge for research
  • 10. * European Language Resources Association (ELRA)
        * COCOSDA – international coordinating committee
        * IEEE SLTC, ISCA SIGs: there are places to go
        * but are they ready for really BIG data? perhaps not yet...
  • 11. * curricula prepare people:
        * what standards to rely on?
        * what resources are available?
        * what features to extract?
        * what tools to work with?
        * what use to put it to?
        * what info to hide?
        * what to do next?
  • 12.
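As a rough check on the scale mentioned on slide 4, here is the arithmetic. It takes the stated rate of about 1 GB per minute at face value, since the slide does not say whether that figure covers audio only or full multimodal recordings, and it uses decimal units (1 TB = 1000 GB). On that basis the 1500-hour collection comes to roughly 90 TB:

```latex
1500\,\text{h} \times 60\,\tfrac{\text{min}}{\text{h}} \times 1\,\tfrac{\text{GB}}{\text{min}}
  = 90\,000\,\text{GB} \approx 90\,\text{TB}
```

In binary units (1 TiB = 1024 GiB), the same total is about 88 TiB.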