Designing complex interactions for experiences that target XR headsets (MR/VR/AR) can be challenging due to the limited input schemes. While voice commands can be used to augment XR input peripherals, adhering to a rigid keyword-based system can break immersion and hinder user adoption. Advances in Machine Learning (ML) now allow developers to easily leverage Natural Language Understanding through reusable techniques. The combination of XR and AI opens new possibilities for gaming, entertainment, and enterprise scenarios. This session is an exploration of how speech and language understanding can be used to augment Mixed Reality and VR experiences. We'll explore the use of speech recognition and Natural Language Understanding to build advanced voice commands, translate languages from within XR environments, and also look at the creation of intelligent conversation assistants to be used as interactive entities in Mixed Reality and VR apps and games. In a world where speech is the primary form of input, using Machine Learning to process language input and understand the user's intent is of paramount importance.
9. Cognitive Services
• Computer Vision + Holographic/AR
• Language services for MR
Custom AI & Data Services
• Access to cloud data (SQL, Cosmos, etc.) from MR
• Calling Azure ML APIs from MR
Immersive Agents
• Smart assistants powered by Bots
• Learning agents powered by ML
Local AI Services
• Offline AI access via Windows ML
• Access to Deep Learning frameworks (CNTK, TF, etc.)
10. Microsoft Cognitive Services (https://www.microsoft.com/cognitive-services/)
• Speech: Speech Synthesis & Recognition, Custom Speech Recognition (CRIS), Speaker Recognition, Translator Speech, Custom Voice
• Vision: Computer Vision, Face, Emotion, Video, Custom Vision
• Language: Language Understanding, Linguistic Analysis, Text Analytics, WebLM, Bing Spell Check
• Knowledge: Entity Linking, Knowledge Exploration, Academic Knowledge, Recommendations
• Search: Bing Image Search, Bing Video Search, Bing Web Search, Bing News Search, Bing Autosuggest
12. Why Speech Matters in XR Experiences
• Speech is the most convenient input method
• No keyboard, mouse, or touch screen (MR, VR)
• Limited input options with gestures and/or motion controllers (MR, VR)
• Limited ability to interact with the screen (phone-based AR)
• Voice recognition vs. speech recognition
• Speech recognition vs. intent recognition
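To make the last distinction concrete, here is a minimal, illustrative sketch (not the LUIS API or any real speech SDK): a rigid keyword system only fires on exact phrases, while a simple intent recognizer, here scored by word overlap against example utterances, also handles paraphrases. All names and thresholds are invented for illustration.

```python
from typing import Optional

# Keyword-based: the user must say an exact phrase, or nothing matches.
KEYWORD_COMMANDS = {
    "open menu": "OPEN_MENU",
    "take screenshot": "TAKE_SCREENSHOT",
}

def keyword_recognize(utterance: str) -> Optional[str]:
    """Exact-match lookup: fails on any rewording."""
    return KEYWORD_COMMANDS.get(utterance.lower().strip())

# Intent-based: score each intent's example phrases by Jaccard word
# overlap, so paraphrases like "could you show the menu" still resolve.
INTENT_EXAMPLES = {
    "OPEN_MENU": ["open the menu", "show me the menu", "bring up the menu"],
    "TAKE_SCREENSHOT": ["take a screenshot", "capture the screen"],
}

def intent_recognize(utterance: str) -> Optional[str]:
    """Return the best-matching intent, or None below a small threshold."""
    words = set(utterance.lower().split())
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            ex_words = set(example.split())
            score = len(words & ex_words) / len(words | ex_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score > 0.2 else None
```

A real XR app would delegate this scoring to a cloud NLU service, but the contrast is the same: the keyword map rejects "could you show the menu" outright, while the intent recognizer still maps it to OPEN_MENU.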
13. Not just for XR
Many games can benefit from voice input, speech recognition & synthesis:
• FPS squad orders
• RPG dialogue
• RTS commands
• Space sim controls
• Interactive fiction & trivia choices
• Hands-free gaming
• Voiced computers, robots, etc.
20. Bot Framework architecture (diagram)
• { Your Code }: conversational and business logic; canvas-aware and context-sensitive
• SDK: Bot Builder SDK
• Platform: platform services, REST endpoints, Direct Line protocol over HTTP
• AI: intelligent tools
21. Goals
• Start simple. Add complexity. No dead-ends.
• Bot adapts to the user, based on context
• Composable and intelligent controls to manage complexity
Bot Controls: LUIS, query over a database via Azure Search, form filling, QnA
(Diagram: customer's business logic & data in C#, connected through the Bot Connector)
What?
• Tools for building REST web services
• Services to enrich them
• Mechanisms for receiving events
• Data to debug and analyze
Why?
• Implements standard protocols
• Modeling conversations is hard. Tools help!
• UI across multiple canvases is hard. Cards rock!
• Language understanding is hard
• Common and well-understood patterns
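"Modeling conversations is hard" is easiest to see with form filling, one of the bot controls listed above. The sketch below is not the Bot Builder SDK; it is a hypothetical, minimal dialog control showing the pattern such tools encapsulate: the control tracks which fields are still missing and prompts for them in turn, so the business logic never manages conversation state by hand.

```python
class FormFillingDialog:
    """Toy form-filling control: prompts for each missing field in order."""

    def __init__(self, fields):
        self.fields = fields   # ordered list of field names to collect
        self.answers = {}      # field name -> user's reply

    def next_prompt(self):
        """Return the next question, or None when the form is complete."""
        for field in self.fields:
            if field not in self.answers:
                return f"What is your {field}?"
        return None

    def handle_reply(self, reply):
        """Store the reply against the first unanswered field."""
        for field in self.fields:
            if field not in self.answers:
                self.answers[field] = reply
                return

# Example: a pizza-ordering dialog collecting two fields turn by turn.
dialog = FormFillingDialog(["size", "topping"])
prompts = []
for reply in ["large", "mushrooms"]:
    prompts.append(dialog.next_prompt())
    dialog.handle_reply(reply)
```

Even this toy version shows why a reusable control beats ad hoc state flags: adding a field to the form is one list entry, not a new branch in every message handler.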
38.
1. Simple reusable solution that easily demonstrates the potential of Mixed Reality combined with AI services and a cloud backend in Azure
2. The HoloBot model can easily be replaced to match any company-branded asset using custom textures or full 3D models
3. HoloBot can be integrated as a virtual assistant for any immersive/VR or holographic Mixed Reality experience, powered by Bot Framework
4. Beyond LUIS, bots can connect to more advanced Machine Learning models or data sources, allowing voice-activated, touch-free access