4Developers 2015: Talking and listening to web pages - Aurelio De Rosa


Speaker: Aurelio De Rosa

Language: English

As web developers, our job is to build nice, fast, and reliable websites, web apps, or web services. But our role isn't limited to this. We have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs created with this purpose in mind. One of these is the Web Speech API, which provides speech input and text-to-speech output features in a web browser.

In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks in your web pages.


Published in: Software

  1. TALKING AND LISTENING TO WEB PAGES
     Aurelio De Rosa
     Warsaw, Poland - 20 April 2015
  2. WEB & APP DEVELOPER
     CONTRIBUTE(D) TO ...
     • jQuery
     • CanIUse
     • PureCSS
     WRITE(D) FOR ...
     • SitePoint
     • Tuts+
     • .NET magazine
     • php [architect] magazine
     • Telerik
     • Web & PHP magazine
  4. WHAT WE'LL COVER
     • Natural language processing (NLP)
     • Why it matters
     • The Web Speech API
     • Speech recognition
     • Speech synthesis
     • Issues and inconsistencies
     • Demo
  5. NATURAL LANGUAGE PROCESSING (NLP)
     A field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
  6. NATURAL LANGUAGE PROCESSING (NLP)
     It all started in 1950 when Alan Turing published an article titled “Computing Machinery and Intelligence” where he proposed what is now called the Turing test.
  8. ONCE UPON A TIME...
  9. VOICEXML
     It's an XML language for writing Web pages you interact with by listening to spoken prompts and other forms of audio that you can control by providing spoken inputs.
     Specifications:
  10. VOICEXML: EXAMPLE
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <vxml version="3.0" lang="en">
        <form>
          <field name="city">
            <prompt>Where do you want to travel to?</prompt>
            <option>New York</option>
            <option>London</option>
            <option>Tokyo</option>
          </field>
          <block>
            <submit next="" namelist="city"/>
          </block>
        </form>
      </vxml>
  11. JAVA APPLET
      It's an application written in Java and delivered to users in the form of bytecode through a web page. The applet is then executed within a Java Virtual Machine (JVM) in a process separated from the browser itself.
  12. WHY I CARE
  13. WHY YOU SHOULD CARE
      • A step toward closing the gap with native apps
      • Improves the user experience
      • A feature needed by some applications, such as navigators
      • Helps people with disabilities
  14. “DEMO IT OR IT DIDN'T HAPPEN”™
      [Demo form: Register to our website, with Name, Surname, and Nationality fields and a Start button]
      This demo can be found at
  15. WEB SPEECH API
      The Web Speech API allows you to deal with two aspects of computer-human interaction: Automatic Speech Recognition (ASR) and Text-to-Speech (TTS).
      Specifications:
  16. WEB SPEECH API
      • Introduced at the end of 2012
      • Defines two interfaces: one for recognition and one for synthesis
      • Requires the user's permission before acquiring audio
      • Agnostic of the underlying technology
  17. SPEECH RECOGNITION
      There are two types of recognition available: one-shot and continuous. The first stops as soon as the user stops talking; the second must be stopped programmatically. To instantiate a new speech recognizer, call the SpeechRecognition() constructor:
      var recognizer = new SpeechRecognition();
  18. SPEECH RECOGNITION: BROWSER SUPPORT
      Explorer   Chrome          Safari   Firefox   Opera
      None       25+ (-webkit)   None     None      None
      Data updated to 18th April 2015
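Because the only supporting browser listed above exposes the constructor behind the -webkit prefix, a page usually has to look for both names before creating a recognizer. The following is a minimal sketch of that check, not code from the talk. The helper names getSpeechRecognition and createRecognizer, the options object, and the 'en-US' default are assumptions made for illustration; the global object is passed in as scope so the helpers can also be exercised outside a browser.

```javascript
// Hypothetical helper: returns the recognition constructor exposed by the
// given global object (prefixed or not), or null when it is unavailable.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Hypothetical helper: creates and configures a recognizer, or returns null
// when the browser doesn't support speech recognition.
function createRecognizer(scope, options) {
  var Recognition = getSpeechRecognition(scope);
  if (!Recognition) {
    return null;
  }
  var recognizer = new Recognition();
  // continuous and lang are properties of the recognizer object
  recognizer.continuous = Boolean(options && options.continuous);
  recognizer.lang = (options && options.lang) || 'en-US'; // assumed default
  return recognizer;
}

// In a browser:
// var recognizer = createRecognizer(window, { continuous: true });
// if (!recognizer) { /* fall back to keyboard input */ }
```

Returning null instead of throwing lets the calling code fall back to a regular form when recognition isn't available.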
  19. SPEECH RECOGNITION: PROPERTIES
      • continuous
      • grammars*
      • interimResults
      • lang
      • maxAlternatives
      • serviceURI**
      *Up to Chrome 42, adding a grammar to the grammars property does nothing. This happens because the group is currently discussing options for which grammar formats should be supported, how built-in grammar types are specified, and default grammars when not specified.
      **serviceURI isn't exposed by any browser.
  20. SPEECH RECOGNITION: METHODS
      • start()
      • stop()
      • abort()
  21. SPEECH RECOGNITION: EVENTS
      • start
      • end*
      • audiostart
      • audioend
      • soundstart
      • soundend
      • speechstart
      • speechend
      • result
      • nomatch
      • error
      *Up to version 42, Chrome on Windows 8.1 doesn't fire the result or the error event before the end event when only noises are produced (issue #428873).
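When the error event fires, the event object carries an error attribute holding a short code. A minimal sketch of turning those codes into messages for the user follows. The codes are the ones defined by the Web Speech API specification; the helper name describeRecognitionError and the wording of every message are assumptions of this sketch.

```javascript
// Error codes defined for the `error` attribute of the recognition error event.
// The wording of each message is this sketch's own, not part of the API.
var RECOGNITION_ERROR_MESSAGES = {
  'no-speech': 'No speech was detected.',
  'aborted': 'Recognition was aborted.',
  'audio-capture': 'No microphone could be used.',
  'network': 'A network error occurred.',
  'not-allowed': 'Permission to use the microphone was denied.',
  'service-not-allowed': 'The speech service is not allowed.',
  'bad-grammar': 'A grammar could not be used.',
  'language-not-supported': 'The language is not supported.'
};

// Hypothetical helper: maps an error code to a readable message.
function describeRecognitionError(code) {
  if (Object.prototype.hasOwnProperty.call(RECOGNITION_ERROR_MESSAGES, code)) {
    return RECOGNITION_ERROR_MESSAGES[code];
  }
  return 'Unknown recognition error: ' + code;
}

// In a browser:
// recognizer.addEventListener('error', function (event) {
//   console.log(describeRecognitionError(event.error));
// });
```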
  22. “IT'S SHOWTIME!”
      [Demo controls: Start, Stop]
      This demo can be found at
  23. SPEECH RECOGNITION: RESULTS
      Results are obtained as an object (that implements the SpeechRecognitionEvent interface) passed as the first argument of the handler attached to the result event.
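To make the shape of that object concrete, here is a small sketch that reads every alternative of the newest result. In the event object, results is a list of results, each result is itself a list of alternatives, and each alternative has a transcript and a confidence; resultIndex points at the first result that changed. The helper name alternativesOf is an assumption of this sketch, and how many alternatives arrive depends on the maxAlternatives property.

```javascript
// Hypothetical helper: returns the alternatives of the newest result carried
// by a SpeechRecognitionEvent-like object as plain objects.
function alternativesOf(event) {
  var result = event.results[event.resultIndex];
  var alternatives = [];
  for (var i = 0; i < result.length; i++) {
    alternatives.push({
      transcript: result[i].transcript,
      confidence: result[i].confidence
    });
  }
  return alternatives;
}

// In a browser:
// recognizer.maxAlternatives = 3;
// recognizer.addEventListener('result', function (event) {
//   console.log(alternativesOf(event));
// });
```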
  24. PROBLEM: SOMETIMES RECOGNITION SUCKS!
      Imagine a user of your website or web app says a command but the recognizer returns the wrong string. Your system is good and it asks the user to repeat it, but the recognition fails again. How can you get out of this loop?
  26. SOLUTION: LEVENSHTEIN DISTANCE
      An approach that isn't ideal but that you can use today.
  27. LEVENSHTEIN DISTANCE: EXAMPLE
      Commands available: "Send email", "Call"
      Names in the phonebook: "Aurelio De Rosa", "Annarita Tranfici", "John Doe"
      [Demo output: Recognized text, Updated text; control: Start]
      This demo can be found at
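The Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. A minimal sketch of using it to snap a misrecognized string to the nearest known command or name follows. It is not the code of the demo: the function names levenshtein and closestMatch and the case-insensitive comparison are assumptions of this sketch.

```javascript
// Computes the Levenshtein distance with the classic dynamic-programming
// approach, keeping only the previous row of the table in memory.
function levenshtein(a, b) {
  var previous = [];
  var i, j;
  for (j = 0; j <= b.length; j++) {
    previous[j] = j; // distance from the empty string to b.slice(0, j)
  }
  for (i = 1; i <= a.length; i++) {
    var current = [i]; // distance from a.slice(0, i) to the empty string
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,       // deletion
        current[j - 1] + 1,    // insertion
        previous[j - 1] + cost // substitution (or match)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical helper: returns the candidate closest to the recognized text,
// ignoring case, or null when there are no candidates.
function closestMatch(recognized, candidates) {
  var best = null;
  var bestDistance = Infinity;
  var needle = recognized.toLowerCase();
  candidates.forEach(function (candidate) {
    var distance = levenshtein(needle, candidate.toLowerCase());
    if (distance < bestDistance) {
      bestDistance = distance;
      best = candidate;
    }
  });
  return best;
}

var names = ['Aurelio De Rosa', 'Annarita Tranfici', 'John Doe'];
console.log(closestMatch('Aurelio De Rose', names)); // prints "Aurelio De Rosa"
```

In practice you would probably also set a maximum acceptable distance, so that a string far from every candidate is rejected instead of being forced onto the nearest one.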
  28. SPEECH SYNTHESIS
      Provides text-to-speech functionality in the browser. This is especially useful for blind people and those with visual impairments in general. The feature is exposed via a speechSynthesis object that possesses static methods.
  29. SPEECH SYNTHESIS: BROWSER SUPPORT
      Explorer   Chrome   Safari   Firefox   Opera
      None       33+      7+       None      27+
      Data updated to 18th April 2015
  30. SPEECH SYNTHESIS: PROPERTIES
      • pending
      • speaking
      • paused* **
      *Up to Chrome 42, pausing the utterance isn't reflected in a change of the paused property (issue #425553).
      **In Opera 27, pausing the utterance results in an erroneous, reversed change of the paused property (issue #DNA-37487).
  31. SPEECH SYNTHESIS: METHODS
      • speak()*
      • cancel()
      • pause()
      • resume()
      • getVoices()
      *Up to Chrome 42, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).
  33. SPEECH SYNTHESIS: EVENTS
      • voiceschanged
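The list returned by getVoices() can be empty until the voices have loaded, which is what the voiceschanged event signals. A minimal sketch of choosing a voice for a given language once the list is available follows. The helper name pickVoice and its fallback rule (an exact language tag first, then any voice sharing the primary language) are assumptions of this sketch; each voice object is only assumed to expose the lang attribute.

```javascript
// Hypothetical helper: returns the first voice whose lang matches exactly,
// otherwise the first voice sharing the primary language (e.g. "en"),
// otherwise null.
function pickVoice(voices, lang) {
  var wanted = lang.toLowerCase();
  var language = wanted.split('-')[0];
  var fallback = null;
  for (var i = 0; i < voices.length; i++) {
    var voiceLang = voices[i].lang.toLowerCase();
    if (voiceLang === wanted) {
      return voices[i];
    }
    if (!fallback && voiceLang.split('-')[0] === language) {
      fallback = voices[i];
    }
  }
  return fallback;
}

// In a browser:
// speechSynthesis.onvoiceschanged = function () {
//   var voice = pickVoice(speechSynthesis.getVoices(), 'en-US');
//   if (voice) { utterance.voice = voice; }
// };
```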
  34. SPEECH SYNTHESIS: UTTERANCE INTERFACE
      The SpeechSynthesisUtterance interface represents the utterance (i.e. the text) that will be spoken by the synthesizer.
  35. SPEECH SYNTHESIS: UTTERANCE PROPERTIES
      • lang
      • pitch*
      • rate*
      • text**
      • voice
      • volume*
      *Up to Chrome 42, changing the pitch, the volume, and the rate properties does nothing (issue #376280).
      **Up to Chrome 42, the text property can't be set to an SSML (Speech Synthesis Markup Language) document because it isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).
  36. SPEECH SYNTHESIS: UTTERANCE EVENTS
      • start
      • end
      • pause
      • resume
      • boundary*
      • mark*
      • error
      *boundary and mark are not supported by any browser because they are fired by the interaction with SSML documents.
  37. SHOW ME TEH CODEZ
      To set the text to emit, we can either pass it when instantiating an utterance object or set it later using the text property.
  38. EXAMPLE 1
      var utterance = new SpeechSynthesisUtterance('Hello!');
      utterance.lang = 'en-US';
      utterance.rate = 1.2;
      utterance.addEventListener('end', function() {
        console.log('Speech completed');
      });
      speechSynthesis.speak(utterance);
  39. EXAMPLE 2
      var utterance = new SpeechSynthesisUtterance();
      utterance.text = 'Hello!';
      utterance.lang = 'en-US';
      utterance.rate = 1.2;
      utterance.addEventListener('end', function() {
        console.log('Speech completed');
      });
      speechSynthesis.speak(utterance);
  40. SPEECH SYNTHESIS: DEMO IT MAN!
      I know my voice isn't very sexy, but I still want to say that this conference is wonderful and the audience of my talk is even better. You all rock!
      This demo can be found at
  41. HOW I DID IT
  42. INTERACTIVE FORM: RECIPE
      • Promises (to avoid the callback hell)
      • Speech recognition
      • Speech synthesis
      The actual code is a bit different, but I made the changes for the sake of brevity and the limited size of the screen.
  43. INTERACTIVE FORM: STEP 1 - HTML
      <form id="form">
        <label for="name"
          data-question="What's your name?">Name:</label>
        <input id="name" />
        <label for="surname"
          data-question="What's your surname?">Surname:</label>
        <input id="surname" />
        <!-- Other label/element pairs here -->
        <input id="btn-voice" type="submit" value="Start" />
      </form>
  44. INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY
      Create a Speech object containing two methods, speak and recognize, that each return a Promise.
      var Speech = {
        speak: function(text) {
          return new Promise(function(resolve, reject) { /* ... */ });
        },
        recognize: function() {
          return new Promise(function(resolve, reject) { /* ... */ });
        }
      };
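The slide leaves the body of both promises out. One way they could be filled in is sketched here, as an illustration and not as the author's implementation. The factory name createSpeech, the scope parameter (the global object, so that the code can be tried outside a browser), the error messages, and the choice to resolve recognize() with the top transcript of the newest result are all assumptions of this sketch.

```javascript
// Hypothetical factory: builds an object with promise-returning speak() and
// recognize() methods on top of the given global object.
function createSpeech(scope) {
  var Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;

  return {
    // Resolves when the utterance has been spoken, rejects on error.
    speak: function (text) {
      return new Promise(function (resolve, reject) {
        if (!scope.speechSynthesis || !scope.SpeechSynthesisUtterance) {
          reject(new Error('Speech synthesis is not supported'));
          return;
        }
        var utterance = new scope.SpeechSynthesisUtterance(text);
        utterance.addEventListener('end', function () {
          resolve();
        });
        utterance.addEventListener('error', function (event) {
          reject(new Error('Speech synthesis failed: ' + event.error));
        });
        scope.speechSynthesis.speak(utterance);
      });
    },

    // Resolves with the recognized text, rejects when nothing is recognized.
    recognize: function () {
      return new Promise(function (resolve, reject) {
        if (!Recognition) {
          reject(new Error('Speech recognition is not supported'));
          return;
        }
        var recognizer = new Recognition();
        recognizer.addEventListener('result', function (event) {
          resolve(event.results[event.resultIndex][0].transcript);
        });
        recognizer.addEventListener('error', function (event) {
          reject(new Error('Speech recognition failed: ' + event.error));
        });
        // A promise settles only once, so this has no effect after a result.
        recognizer.addEventListener('end', function () {
          reject(new Error('No speech was recognized'));
        });
        recognizer.start();
      });
    }
  };
}

// In a browser: var Speech = createSpeech(window);
```

Rejecting on the end event keeps the promise from staying pending forever when the recognizer stops without producing a result.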
  45. INTERACTIVE FORM: STEP 3 - JS 1/2
      function formData(i) {
        return promise.then(function() {
          return Speech.speak(
            fieldLabels[i].dataset.question
          );
        }).then(function() {
          return Speech.recognize().then(function(text) {
            document.getElementById(
              fieldLabels[i].getAttribute('for')
            ).value = text;
          });
        });
      }
  46. INTERACTIVE FORM: STEP 3 - JS 2/2
      var form = document.getElementById('form');
      form.addEventListener('click', function(event) {
        var fieldLabels = document.querySelectorAll('label');
        // Start from an already-resolved promise so the first question has something to chain onto
        var promise = Promise.resolve();
        function formData(i) { /* code here */ }
        for (var i = 0; i < fieldLabels.length; i++) {
          promise = formData(i);
        }
        promise.then(function() {
          return Speech.speak('Thank you for filling...');
        }).catch(function(error) { alert(error); });
      });
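The loop in this step builds one long chain so that each question is asked only after the previous answer has arrived. The same idea can be expressed as a small generic helper, sketched here with reduce. The name runSequentially and the decision to collect every result in an array are assumptions of this sketch, not part of the talk's code.

```javascript
// Hypothetical helper: runs task(item, index) for every item, one after the
// other, and resolves with the array of results in the original order.
function runSequentially(items, task) {
  var results = [];
  return items.reduce(function (chain, item, index) {
    return chain.then(function () {
      return task(item, index); // starts only after the previous task is done
    }).then(function (value) {
      results.push(value);
    });
  }, Promise.resolve()).then(function () {
    return results;
  });
}

// In a browser, with a Speech object like the one from step 2:
// runSequentially(fieldLabels, function (label) {
//   return Speech.speak(label.dataset.question).then(Speech.recognize);
// });
// Note: fieldLabels is a NodeList, so convert it to an array first,
// for example with Array.prototype.slice.call(fieldLabels).
```

If any task rejects, the chain stops there and the rejection reaches the caller's catch handler, which matches the behavior of the loop on the slide.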
  47. DICTATION: RECIPE
      • Speech recognition
      The actual code is a bit different, but I made the changes for the sake of brevity and the limited size of the screen.
  48. DICTATION: STEP 1 - HTML
      <div id="transcription" contenteditable="true"></div>
      <button id="btn-start">Start</button>
      <button id="btn-stop">Stop</button>
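  49. DICTATION: STEP 2 - JS 1/3
      // Chrome exposes the constructor with a prefix, as webkitSpeechRecognition
      var recognizer = new SpeechRecognition();
      recognizer.interimResults = true;
      recognizer.continuous = true;
      var transcr = document.getElementById('transcription');
      var currTranscr = document.createElement('span');
      // The left-hand side of the next assignment is missing from the source:
      // = 'current-transcription';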
  50. DICTATION: STEP 2 - JS 2/3
      recognizer.addEventListener('result', function(event) {
        currTranscr.textContent = '';
        var i = event.resultIndex;
        while (i < event.results.length) {
          var result = event.results[i++];
          if (result.isFinal) {
            transcr.removeChild(currTranscr);
            transcr.textContent += result[0].transcript;
            transcr.appendChild(currTranscr);
          } else {
            currTranscr.textContent += result[0].transcript;
          }
        }
      });
  51. DICTATION: STEP 2 - JS 3/3
      var btnStart = document.getElementById('btn-start');
      btnStart.addEventListener('click', function() {
        transcr.textContent = '';
        transcr.appendChild(currTranscr);
        recognizer.start();
      });
      var btnStop = document.getElementById('btn-stop');
      btnStop.addEventListener('click', function() {
        recognizer.stop();
      });
  52. ONE LAST DEMO...
      Video courtesy of Szymon Nowak (@szimek):
  53. THANK YOU!
  54. QUESTIONS?
  55. CONTACTS
      Website:
      Email:
      Twitter: @AurelioDeRosa