Transcription verhaal2010

Searching in spoken words. Disclosure of recorded content in MediaMosa.

  • Éénéén
  • Q-go images
  • Transcription verhaal2010

    1. 1. Arjan van Hessen<br />Speech & Language Technology<br />A.J.vanHessen@ewi.utwente.nl<br />Searching in spoken words<br />Disclosure of recorded content in MediaMosa<br />SURFnet Relatiedagen 2010<br />Noordwijkerhout, December 9, 2010<br />
    2. 2. Content<br />Introduction<br />Why speech is so important<br />What is HLT?<br />Working applications:<br />Self-service (Internet & Telephony)<br />Searching in audiovisual recordings<br />Demonstrations<br />
    3. 3. Humans as speaking creatures<br />Human speech began some 100,000 years ago.<br />Before that, the shape of the vocal tract was not “ready” for modern speech. The larynx was situated too high, as can still be seen in chimpanzees.<br />
    4. 4. Humans as writing creatures<br />Sumerian (3300 BC, Mesopotamia) is probably the oldest written language.<br />Now<br />-3300: text<br />-10,000: farming<br />-100,000: speech<br />
    5. 5. What is HLT?<br />Human Language Technology is the technology that mimics the human language capacity.<br />Language<br />UNDERSTANDING<br />speech<br />text<br />sign<br />
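    6. 6. Redundancy<br />Vlgones een oznrdeeok op een Eglnese uvinretsiet mkaat het neit uit in wlkee vloogdre de ltteers in een wrood saatn, het einge wat blegnaijrk is is dat de eretse en de ltaatse ltteer op de jiutse patals saatn. De rset van de ltteers mgoen wllikueirg gpletaast wdoren en je knut vrelvogens gwoeon lzeen wat er saatt. Dit kmot odmat we neit ekle ltteer op zcih lzeen maar het wrood als gheeel.<br />(Deliberately scrambled Dutch. In English: According to research at an English university, it does not matter in which order the letters of a word appear; the only important thing is that the first and the last letter are in the right place. The remaining letters can be placed at random and you can still simply read what it says. This is because we do not read each letter by itself but the word as a whole.)<br />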
    7. 7. Pensez a cequevsavezfnit et demandezvs i<<est coque 3 ai<br />
    8. 8. Working applications<br />Dialogue systems (telephony, real time, limited complexity)<br />Disclosure systems (high quality audio, offline, complex)<br />
    9. 9. ContactCenter<br />Voice<br />HLT<br />Natural Language Search<br />Web<br />Mobile<br />
    10. 10. Companies using speech technology<br />
    11. 11. How may I help you<br />Why are they calling?<br />Classification based on the recognition of the question: “how may I help you”<br />Who is calling?<br />Identification via ZIP-code and house number<br />
    12. 12. Organisations using speech technology<br />
    13. 13. Disclosure of audiovisual archives<br />The number of AV archives on the Internet is increasing rapidly<br />Archiving is not enough: disclosure and reuse are required!<br />HLT is needed (human labour costs too much).<br />
    14. 14. Digitized (historical) collections<br />Digitally recorded collections<br />WFH<br />H.M. Queen<br />Wilhelmina<br />Second feminist wave<br />Buchenwald<br />Memories of Indonesia<br />LVSR<br />
    15. 15. Searching in historic radio recordings: Radio Oranje<br />
    16. 16. Oral History: Buchenwald<br />
    17. 17. Oral History: Brandgrens, Rotterdam<br />Ten witnesses of the bombing of Rotterdam (May 1940) tell their story. Language and speech technology (HLT) is used to search the testimonies.<br />
    18. 18. Searching in the radio interviews of WFH<br />
    19. 19. Searching in 46 interview collections: Getuigenverhalen (600 hours)<br />
    20. 20. Searching in 500 interviews in Croatia<br />
    21. 21. CroMe - Audio Search<br />Searching for: commandant<br />Phrase boundaries<br />5 fragments found<br />Found word<br />(5x commandant)<br />
    22. 22. CroMe - Audio Search<br />Search word<br />traumas<br />Language<br />found<br />
    23. 23. Political meetings<br />
    24. 24. Parliament<br />transcriptions<br />Yesterday there was a discussion concerning the relations between the Netherlands and Flanders<br />
    25. 25. Recognition of lectures<br />Record the speech<br />Record the PPT<br />Recognise the speech<br />Use the display time of each slide as THE time unit<br />Use the recognised speech as keywords for each slide<br />
    26. 26. Searching in news broadcasts<br />
    27. 27. Metadata -> Language model<br />Text in the slide(s)<br />Lecture handouts<br />Language model<br />Environmental texts<br />
    28. 28. Questions?<br />
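
The lecture-recognition steps on slide 25 (use the display time of each slide as the time unit, and the recognised speech as keywords for that slide) could be sketched roughly as follows. This is only an illustration, not the system shown in the talk: the function name `index_words_by_slide`, the input shapes (slide start times in seconds, and `(time, word)` pairs from a recogniser), the stop-word filter, and the choice of plain word frequency for picking keywords are all assumptions.

```python
from bisect import bisect_right
from collections import Counter


def index_words_by_slide(slide_starts, words, stopwords=frozenset(), top_n=3):
    """Group recognised words by the slide that was on screen when they were spoken.

    slide_starts: ascending list of times (seconds) at which each slide appeared.
    words: iterable of (time_in_seconds, word) pairs from a speech recogniser.
    Returns one keyword list per slide, most frequent words first.
    """
    counts = [Counter() for _ in slide_starts]
    for time, word in words:
        # The slide on screen is the last one that started at or before `time`.
        slide = bisect_right(slide_starts, time) - 1
        if slide < 0:
            continue  # spoken before the first slide appeared
        token = word.lower()
        if token not in stopwords:
            counts[slide][token] += 1
    return [[w for w, _ in c.most_common(top_n)] for c in counts]


if __name__ == "__main__":
    # Hypothetical data: three slides and a handful of recognised words.
    starts = [0.0, 60.0, 150.0]
    recognised = [
        (5.0, "speech"), (12.0, "the"), (20.0, "speech"), (30.0, "larynx"),
        (70.0, "archive"), (80.0, "Archive"), (155.0, "search"),
    ]
    for number, keywords in enumerate(
        index_words_by_slide(starts, recognised, stopwords={"the"}), start=1
    ):
        print(number, keywords)
```

Searching for a word then reduces to looking it up in the per-slide keyword lists and jumping to that slide's start time, so the slide rather than the individual word is the unit of retrieval.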
