AI in RTC - RTC Korea 2018

Chad Hart examines the use of AI and Machine Learning (ML) in Real Time Communications (RTC) applications including speech analytics, voicebots, computer vision, and ML optimization of RTC components. Chad includes examples from his AI in RTC research report, webrtcHacks, and cogint.ai.

Published in: Technology

AI in RTC - RTC Korea 2018

  1. cwh.consulting Artificial Intelligence in Real Time Communications (AI in RTC), RTC Korea, 1 November 2018
  2. About Me: Chad Hart, Analyst & Product Consultant (https://cwh.consulting, @chadwallacehart, chad@cwh.consulting) • webrtcHacks.com (@webrtcHacks): a blog for WebRTC developers • cogint.ai (@cogintai): AI & RTC blog • krankygeek.com: WebRTC and ML for Developer Event, November 16, 2018 in San Francisco
  3. AI in RTC Research Study • Authors: Chad Hart (cwh.consulting) and Tsahi Levent-Levi (BlogGeek.me) • Methodology: 40+ 1-on-1 vendor interviews; ~100 respondent web survey; analysis of 126 companies & all major products • Output: 147-page report
  4. What is AI in RTC? [Figure: AI + RTC] Image source: pixabay.com/en/a-i-ai-anatomy-2729782
  5. AI in RTC use case categories • Speech analytics • Voicebots • RTC optimization • Computer vision Image source: pixabay.com/en/a-i-ai-anatomy-2729782
  6. Speech Analytics • Call center agent monitoring • Transcription • Translation • Agent coaching • Customer engagement
  7. Promise: machine transcription at human levels. Source: Google I/O 2017 keynote
  8. Reality: transcription quality is often not so great • Machine transcription: “My name is a chat heart of you might be familiar with Dave from a brand or if you are, a web or to see people I've done about five years, I'm or so a of an independent analyst. So I'm mostly do park management strategy type. For a product, marketing.” • Actual transcription: “My name is Chad Hart. You might be familiar with me from a brand -- if you are WebRTC people; I've done webrtcHacks now for about five years or so. Outside of webrtcHacks, I have been an independent analyst. I mostly do product management and strategy type work and product marketing.” https://www.nojitter.com/post/240173958/when-speech-analytics-makes-gibberish-useful
  9. Reality: transcription quality is often not so great (the same machine vs. actual transcription comparison as on the previous slide, annotated with the sources of error) • Non-standard spelling • Industry jargon • Speech disfluencies • US-English language assumption https://www.nojitter.com/post/240173958/when-speech-analytics-makes-gibberish-useful
  10. Higher-level speech analytics • Perfect transcription is not needed to provide useful analysis. • Higher-level speech analytics systems look for patterns in speech. • These patterns can be matched to business outcomes, such as whether a caller ended up purchasing or gave a good customer satisfaction score. • There are often meaningful patterns beyond the words that were spoken, like how fast each party was speaking or how often the agent talked compared to the customer. • There is also a lot of work going into analyzing caller emotion and sentiment. Source: CallMiner
  11. Voicebots – Smart Speakers & Assistants • IVR replacement • Starting meetings • In-call assistance
  12. Voicebots – Smart Speakers & Assistants • Another area we examined was voicebots. • These are smart speakers like the Google Home, which was recently made available in South Korea, and AI assistants like Bixby or Siri. • Building a voicebot is complex. You not only need to transcribe the speech and run some natural language understanding on it, as in speech analytics, but you also need to generate speech and handle interactivity with the customer in real time. • There is very broad interest in using these voicebots. • Every telephony device maker is interested in adding a voice user interface to their products, and this is a natural fit since people already “talk” to these devices. • Typical conference room equipment is already set up with microphone arrays to capture good-quality audio with minimal noise from a variety of locations throughout the room. • However, most companies are just starting to figure out how to use them in their products.
  13. Flattening the IVR: humans don’t speak in menus [Figure: 10 potential responses in an IVR menu hierarchy vs. a voicebot, laid out over time. Traditional IVR menu: nested layers of Menu → DTMF → Response. Voicebot: a single Utterance matched to one of 10 Intent → Response pairs.] https://cogint.ai/dialogflow-phone-bot/
  14. Flattening the IVR: humans don’t speak in menus • One major area where voicebots will have an impact is in IVRs. • Traditional IVRs were designed for DTMF input and are usually set up with multiple levels of menus. • Because people cannot remember more than a few menu options at a time, you cannot put too many options in each menu. • As a result, to fit many options, you need a complex menu with many layers. • Users hate these menus because they are difficult to navigate and take too long. • Voicebots help to flatten the IVR into just a few layers. • Rather than navigating a complex menu, users can just say what they want and use natural language to get the information they need. • This is good for call centers too because users are more likely to stay in the IVR instead of immediately dropping out to an operator. https://cogint.ai/dialogflow-phone-bot/
  15. New voicebots: consumer ⇨ business [Figure: Notable Consumer Voicebot Market Milestones. Source: Kranky Geek Research, krankygeek.com/research]
  16. New voicebot technology threatens IVRs [Chart: ability to offload human tasks (y-axis) vs. time (x-axis), with “today” marked]
  17. Computer vision • Funny hats • Face detection • Gestures • Object detection • Emotion analysis
  18. Object detection over WebRTC with TensorFlow • Using open source libraries and existing work, it is relatively simple to set up your own server and process real-time video, even without a PhD in computer vision. • Here is an example of a server I set up to do real-time analysis of a WebRTC stream. Blog post: https://webrtchacks.com/webrtc-cv-tensorflow/ Demo video: https://youtu.be/vzTXW0hGINM
  19. Object detection over WebRTC with TensorFlow – example architecture [Diagram: the Browser (index.html, local.js, objDetect.js) sends a GET for web assets to the Flask Server and receives the web assets; it then sends a POST with an image and receives object details. The Flask Server runs TensorFlow Object Detection.] • This is just a very basic example that uses an HTTP POST to send several images per second to a cloud-based server for processing. • As you saw in the video, there can be a little bit of lag. • Using a GPU-accelerated server, or even something like Google’s TPUs, which were specifically designed to accelerate heavy machine learning graphs, would have helped. • But ultimately, streaming high-quality images will always have its limits. • Wouldn’t it be nice if you could do the heavy processing locally with hardware acceleration, just like you can hardware accelerate codecs like H.264? https://webrtchacks.com/webrtc-cv-tensorflow/
  20. ML processing moving to the edge, with faster, local processing • That’s exactly what you can do with some new chipsets from vendors like Intel. • This is an example of a kit from Google called the AIY Vision Kit that includes the Intel Movidius processor. • The Movidius is designed to run deep neural networks locally and is especially well-suited to low-power computer vision applications. • This kit runs on a tiny, single-core Raspberry Pi Zero with only 512MB of RAM. • Google used to sell just the vision bonnet add-on part of the chip for $45. Now you can buy the complete kit with the Raspberry Pi for $90 in the US. • Note that Amazon also has a computer vision kit it calls DeepLens. That runs on something more like an Intel NUC mini-PC and costs $250.
  21. ML processing moving to the edge, with faster, local processing https://webrtchacks.com/aiy-vision-kit-uv4l-web-server/
  22. Improvements with edge hardware (demonstration) • Let’s look at this in action. • This all runs locally on the Pi. • So in this case, I am doing the computer vision processing locally while sending the stream and annotations remotely. Blog post: https://webrtchacks.com/aiy-vision-kit-uv4l-web-server Video: https://youtu.be/h0O18R1rI9U
  23. Fun use cases with native mobile libraries • With new native mobile libraries like Apple’s Core ML and Google’s ML Kit, it is relatively simple to run computer vision on the device. • Some of the engineers at Houseparty wrote a blog post demonstrating how to do smile detection. • Similar libraries are available that detect facial boundaries and let you put hats, sunglasses, beards, and other silly masks on people. I am sure you have seen some of these! • Similar techniques can be used in a business context to blur out backgrounds for remote workers who call into a video conference. https://webrtchacks.com/ml-kit-smile-detection/
  24. ML Kit CPU consumption: high framerates are not practical (without special hardware) [Chart: CPU usage (%) for different framerates processed by ML Kit] https://webrtchacks.com/ml-kit-smile-detection/
  25. ML Kit resource consumption is small compared to WebRTC https://webrtchacks.com/ml-kit-smile-detection/
  26. WebRTC CV is coming to the browser • This is from a W3C document examining use cases for the next version of WebRTC. https://w3c.github.io/webrtc-nv-use-cases/#funnyhats*
  27. RTC optimization • Noise suppression • Echo cancellation • Error correction • Route optimization
  28. Mozilla RNNoise – real-time, low-power noise suppression with deep learning • One example is a research project from Mozilla that uses deep learning to provide better real-time noise suppression. • This is designed for low-power devices and does not require any specialized hardware. • We do not have time now, but you can go to the link below and try some demos. • Unfortunately this was just a research project, but it gives you some idea of what could be done in this and other areas. https://people.xiph.org/~jm/demo/rnnoise/
  29. Special discount for RTC Korea: use code RTC-KOREA until November 7 for $1000.00 off. Purchase at krankygeek.com/research or email me.
  30. Questions?
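
The higher-level speech analytics slide notes that useful patterns exist beyond the words themselves, such as how fast each party speaks and how much the agent talks compared to the customer. A minimal Python sketch of those two measures follows. It assumes a diarized transcript is already available as timed segments; the `Segment` type, the speaker labels, and the `talk_metrics` function are hypothetical names introduced for illustration and are not taken from the talk or from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of speech by one speaker (hypothetical structure)."""
    speaker: str    # e.g. "agent" or "customer"; labels are assumptions
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    text: str       # transcript text; it does not have to be perfect


def talk_metrics(segments):
    """Return per-speaker talk time, share of talk time, and words per minute."""
    totals = {}
    for seg in segments:
        duration = max(seg.end_s - seg.start_s, 0.0)
        stats = totals.setdefault(seg.speaker, {"seconds": 0.0, "words": 0})
        stats["seconds"] += duration
        stats["words"] += len(seg.text.split())

    all_seconds = sum(s["seconds"] for s in totals.values())
    metrics = {}
    for speaker, s in totals.items():
        metrics[speaker] = {
            "seconds": s["seconds"],
            # Fraction of all talk time taken by this speaker.
            "talk_share": s["seconds"] / all_seconds if all_seconds else 0.0,
            # Speaking rate; word counts tolerate some transcription errors.
            "words_per_minute": (
                s["words"] / s["seconds"] * 60 if s["seconds"] else 0.0
            ),
        }
    return metrics
```

Because the metrics depend only on timing and word counts, they stay usable when individual words are mis-transcribed, which is the point the slide makes about imperfect transcription.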
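
The "Flattening the IVR" slides contrast a layered DTMF menu with a voicebot that maps one utterance directly to an intent. The Python sketch below illustrates that structural difference only. The menu contents, the intent names, and the keyword sets are all made up for this example, and the keyword overlap in `match_intent` is a deliberately naive stand-in for a real natural language understanding service.

```python
import re

# Hypothetical DTMF menu tree: each digit leads to a sub-menu (dict)
# or to a final response (str).
IVR_MENU = {
    "1": {"1": "account balance", "2": "recent payments"},
    "2": {
        "1": {"1": "order status", "2": "return an item"},
        "2": "store hours",
    },
}


def navigate_ivr(menu, digits):
    """Follow DTMF digits through the tree; return (response, menus_heard)."""
    node = menu
    for count, digit in enumerate(digits, start=1):
        node = node[digit]
        if isinstance(node, str):
            return node, count
    raise ValueError("digits ended before reaching a response")


# Flat intent table: every response is one utterance away.
# Keyword sets are an assumption standing in for trained NLU.
INTENTS = {
    "account balance": {"balance"},
    "recent payments": {"payment", "payments"},
    "order status": {"order", "status"},
    "return an item": {"return", "refund"},
    "store hours": {"hours", "open"},
}


def match_intent(utterance):
    """Return the intent with the most keyword hits, or None if none match."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best, best_hits = None, 0
    for intent, keywords in INTENTS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best
```

In this toy tree a caller must listen to three menus and press "2", "1", "1" to reach the order status response, whereas `match_intent("Where is my order?")` reaches the same response in a single step.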
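
The object detection architecture slide describes sending only several images per second to the server, and the ML Kit slide notes that processing high framerates is not practical without special hardware. One common way to respect such a limit is to drop frames before they reach the detector. The sketch below is an illustration under that assumption, not the code from the referenced blog posts; `FrameThrottle` and its parameters are hypothetical names.

```python
class FrameThrottle:
    """Decide which camera frames to hand to an ML step (hypothetical helper)."""

    def __init__(self, max_fps):
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        # Minimum spacing between processed frames, in milliseconds.
        self.min_interval_ms = 1000.0 / max_fps
        self.last_processed_ms = None

    def should_process(self, timestamp_ms):
        """Return True if enough time has passed since the last processed frame."""
        if (
            self.last_processed_ms is None
            or timestamp_ms - self.last_processed_ms >= self.min_interval_ms
        ):
            self.last_processed_ms = timestamp_ms
            return True
        return False
```

For example, with a 25 fps camera (one frame every 40 ms) and `FrameThrottle(max_fps=5)`, only the frames at 0, 200, 400, 600, and 800 ms of the first second would be passed on for detection; the rest would still be displayed or streamed as usual.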
