Deep Learning and Intelligent
Applications
Dr Xuedong Huang
Distinguished Engineer & Head of Advanced Technology Group
Microsoft Technology and Research
xdh@microsoft.com
What drives speech technology progress?
• Stage: our computing infrastructure, including GPUs, is a stage for the performers
• Oxygen: big usage data is the oxygen that beautifies our performance
• Performers: deep learning is changing everything as our top performer
[Chart: 2015 system vs. the 4% human error rate.]
Speech recognition could reach human parity in the next 3 years.
Dong Yu and Xuedong Huang: Microsoft Computational Network Toolkit (5/25/2016)
ImageNet: Microsoft 2015 ResNet

ImageNet classification top-5 error (%):

ILSVRC 2010  NEC America  28.2
ILSVRC 2011  Xerox        25.8
ILSVRC 2012  AlexNet      16.4
ILSVRC 2013  Clarifai     11.7
ILSVRC 2014  VGG           7.3
ILSVRC 2014  GoogleNet     6.7
ILSVRC 2015  ResNet        3.5

Microsoft took first place in all 5 entries this year: ImageNet classification, ImageNet localization, ImageNet detection, COCO detection, and COCO segmentation.
Nvidia CEO's View
http://blogs.nvidia.com/blog/2016/01/12/accelerating-ai-artificial-intelligence-gpus/
Design Goal of CNTK
• A deep learning tool that balances
  • Efficiency: can train production systems as fast as possible
  • Performance: can achieve state-of-the-art performance on benchmark tasks and production systems
  • Flexibility: can support various tasks such as speech, image, and text, and can try out new ideas quickly
• Inspiration: Legos
  • Each brick is very simple and performs a specific function
  • Create arbitrary objects by combining many bricks
• CNTK enables the creation of existing and novel models by combining simple functions in arbitrary ways.
Functionality
• Supports
  • CPU and GPU, with a focus on GPU clusters
  • Windows and Linux
  • Automatic numerical differentiation
  • Efficient static and recurrent network training through batching
  • Data parallelization within and across machines with 1-bit quantized SGD
  • Memory sharing during execution planning
• Modularized: separation of
  • computational networks
  • execution engine
  • learning algorithms
  • model description
  • data readers
• Models can be described and modified with
  • C++ code
  • Network definition language (NDL) and model editing language (MEL)
  • BrainScript (beta)
  • Python and C# (planned)
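The 1-bit quantized SGD item above compresses each gradient value to a single bit before exchanging gradients between machines, carrying the quantization error forward so it is not lost. A minimal sketch of the idea (an illustration only, not CNTK's actual implementation):

```python
# Sketch of 1-bit gradient quantization with error feedback, the idea
# behind 1-bit quantized SGD (illustration only, not CNTK's code).
def quantize_1bit(grad, residual, scale=1.0):
    """Quantize each gradient value to +scale or -scale.

    The quantization error is stored in `residual` and added back on the
    next step (error feedback), so nothing is permanently lost.
    """
    adjusted = [g + r for g, r in zip(grad, residual)]
    quantized = [scale if a >= 0 else -scale for a in adjusted]
    new_residual = [a - q for a, q in zip(adjusted, quantized)]
    return quantized, new_residual

grad = [0.3, -1.2, 0.05]
residual = [0.0, 0.0, 0.0]
q, residual = quantize_1bit(grad, residual)
print(q)  # [1.0, -1.0, 1.0]
# residual now holds the error (approximately [-0.7, -0.2, -0.95]),
# which is added to the next step's gradient before quantizing.
```

Each worker sends only the sign bits plus a scale, cutting communication roughly 32x while the residual keeps training accurate.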
Architecture

[Architecture diagram: a model description is used by CNBuilder to build the computational network (CN); an IDataReader (task-specific reader) loads features and labels; an ILearner (SGD, AdaGrad, etc.) computes gradients; an IExecutionEngine evaluates the network on CPU/GPU.]
At the Heart: Computational Networks
• A generalization of machine learning models that can be described as a series of computational steps
  • E.g., DNN, CNN, RNN, LSTM, DSSM, log-linear model
• Representation:
  • A list of computational nodes, denoted n = {node name : operation name}
  • The parent-children relationship describing the operands: {n : c_1, ..., c_Kn}, where K_n is the number of children of node n (for leaf nodes K_n = 0)
  • The order of the children matters: e.g., XY is different from YX
  • Given the inputs (operands), the value of a node can be computed
• Can flexibly describe deep learning models
• Adopted by many other popular tools as well
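The representation above can be sketched as a tiny evaluator: each node pairs an operation name with an ordered list of children, and a node's value is computed from its children's values. This is an illustration of the concept, not CNTK's implementation; the node and operation names are made up.

```python
# Tiny computational-network evaluator: each node has an operation name
# and an ordered list of children (order matters, e.g. XY != YX).
# Leaf nodes (K_n = 0) hold values directly.
OPS = {
    "Plus": lambda a, b: a + b,
    "Times": lambda a, b: a * b,  # scalar stand-in for a matrix product
}

def evaluate(name, nodes, cache=None):
    """Recursively compute the value of node `name`, memoizing results."""
    cache = {} if cache is None else cache
    if name in cache:
        return cache[name]
    op, children = nodes[name]
    if op == "Input":                 # leaf node: holds a raw value
        value = children
    else:
        args = [evaluate(c, nodes, cache) for c in children]
        value = OPS[op](*args)
    cache[name] = value
    return value

# y = W*x + b, a one-layer log-linear model as a computational network
nodes = {
    "x":  ("Input", 2.0),
    "W":  ("Input", 3.0),
    "b":  ("Input", 1.0),
    "Wx": ("Times", ["W", "x"]),
    "y":  ("Plus",  ["Wx", "b"]),
}
print(evaluate("y", nodes))  # 7.0
```

The memoizing cache mirrors how shared sub-networks are evaluated once per pass; gradient computation walks the same graph in reverse.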
CNTK Summary
• CNTK is a powerful tool that supports CPU/GPU and runs under Windows/Linux
• CNTK is extensible thanks to its low-coupling modular design: adding new readers and new computation nodes is easy
• The network definition language, macros, and model editing language (as well as BrainScript and Python bindings in the future) make network design and modification easy
• Compared to other tools, CNTK strikes a great balance between efficiency, performance, and flexibility
CNTK Computational Performance

[Bar chart: speed comparison in frames/second (higher is better) for CNTK, Theano, TensorFlow, Torch 7, and Caffe, with 1 GPU, 1 x 4 GPUs, and 2 x 4 GPUs (8 GPUs).]

Theano only supports 1 GPU. We report 8 GPUs (2 machines) for CNTK only, as it is the only public toolkit that can scale beyond a single machine. Our system can scale beyond 8 GPUs across multiple machines with superior distributed-system performance.
Project Oxford – Adding “smart” to your applications

A portfolio of APIs, SDKs, and apps that enables developers to easily add intelligent services, such as vision or speech capabilities, to their solutions.

Understand the data around your application.
Computer Vision APIs

Analyze an Image
Understand content within an image
OCR
Detect and recognize words within an image
Generate Thumbnail
Scale and crop images, while retaining key content
Analyze Image

Type of Image:
  Clip Art Type: 0 (non-clipart)
  Line Drawing Type: 0 (non-line drawing)
  Black & White Image: False
Content of Image:
  Categories: [{ "name": "people_swimming", "score": 0.099609375 }]
  Adult Content: False
  Adult Score: 0.18533889949321747
  Faces: [{ "age": 27, "gender": "Male", "faceRectangle": { "left": 472, "top": 258, "width": 199, "height": 199 }}]
Image Colors:
  Dominant Color Background: White
  Dominant Color Foreground: Grey
  Dominant Colors: White
  Accent Color:
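A hypothetical helper that reduces an Analyze Image response like the one above to a short summary. The field names (`categories`, `faces`, `faceRectangle`) follow the slide; the exact response schema of the service may differ.

```python
import json

# Sample response shaped like the slide above (values are illustrative;
# the exact schema of the service may differ).
sample = json.loads("""
{
  "categories": [{ "name": "people_swimming", "score": 0.099609375 }],
  "faces": [{ "age": 27, "gender": "Male",
              "faceRectangle": { "left": 472, "top": 258, "width": 199, "height": 199 } }]
}
""")

def summarize(analysis):
    """Pick the top-scoring category and describe each detected face."""
    top = max(analysis["categories"], key=lambda c: c["score"])
    faces = ["{} (age {})".format(f["gender"], f["age"]) for f in analysis["faces"]]
    return {"top_category": top["name"], "faces": faces}

print(summarize(sample))  # {'top_category': 'people_swimming', 'faces': ['Male (age 27)']}
```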
OCR
LIFE IS LIKE
RIDING A BICYCLE
TO KEEP YOUR BALANCE
YOU MUST KEEP MOVING
JSON:
{
"language": "en",
"orientation": "Up",
"regions": [
{
"boundingBox": "41,77,918,440",
"lines": [
{
"boundingBox": "41,77,723,89",
"words": [
{
"boundingBox": "41,102,225,64",
"text": "LIFE"
},
{
"boundingBox": "356,89,94,62",
"text": "IS"
},
{
"boundingBox": "539,77,225,64",
"text": "LIKE"
}
. . .
Good at:
• Scanned documents
• Photos with text
• Fine-grained location information

Needs to improve:
• Vehicle license plates
• Handwritten text
• Characters with large sizes
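The nested OCR JSON above can be flattened back into plain text by walking regions, then lines, then words. A small helper (illustrative, not part of the API) sketching this:

```python
# Flatten an OCR response (regions -> lines -> words) back into text.
# The structure mirrors the JSON shown above; values are illustrative.
ocr_result = {
    "language": "en",
    "regions": [
        {"lines": [
            {"words": [{"text": "LIFE"}, {"text": "IS"}, {"text": "LIKE"}]},
            {"words": [{"text": "RIDING"}, {"text": "A"}, {"text": "BICYCLE"}]},
        ]}
    ],
}

def ocr_to_text(result):
    """Join words within each line, and lines across all regions."""
    lines = []
    for region in result["regions"]:
        for line in region["lines"]:
            lines.append(" ".join(w["text"] for w in line["words"]))
    return "\n".join(lines)

print(ocr_to_text(ocr_result))
# LIFE IS LIKE
# RIDING A BICYCLE
```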
Smart Thumbnail

[Example: the same image with smart cropping off vs. smart cropping on.]
Face APIs

Face Detection
Detect faces and their attributes within an image
Face Verification
Check if two faces belong to the same person
Similar Face Searching
Find similar faces within a set of images
Face Grouping
Organize many faces into groups
Face Identification
Search which person a face belongs to
Face APIs
Detection
"faceRectangle": {"width": 193, "height": 193, "left": 326, "top": 204}
…
Feature Attributes
"attributes": { "age": 42, "gender": "male",
"headPose": { "roll": "8.2", "yaw": "-37.8", "pitch": "0.0" }}
Identification
Jasper Williams
Grouping
Emotion APIs

Recognize Emotions
Detect emotions based on facial expressions
Face Detection
"faceRectangle": {"width": 193, "height": 193, "left": 326, "top": 204}
…
Emotion Scores
"scores": { "anger": 5.182241e-8,
"contempt": 0.0000242813,
"disgust": 5.621025e-7,
"fear": 0.00115027453,
"happiness": 1.06114619e-8,
"neutral": 0.003540177,
"sadness": 9.30888746e-7,
"surprise": 0.9952837}
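The API returns one score per emotion, as shown above; the predicted emotion is simply the highest-scoring key. A one-line sketch:

```python
# Emotion scores as returned above; the prediction is the argmax.
scores = {
    "anger": 5.182241e-8,
    "contempt": 0.0000242813,
    "disgust": 5.621025e-7,
    "fear": 0.00115027453,
    "happiness": 1.06114619e-8,
    "neutral": 0.003540177,
    "sadness": 9.30888746e-7,
    "surprise": 0.9952837,
}

top_emotion = max(scores, key=scores.get)
print(top_emotion)  # surprise
```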
Video APIs
Stabilization
Smooth and stabilize shaky video
Face Detection and Tracking
Detect and track faces in videos
Motion Detection
Detect when motion occurs
Stabilization
The Stabilization API provides automatic video stabilization and smoothing for shaky videos.
This API uses many of the same technologies found in Microsoft Hyperlapse.
Best For:
Small camera motions, with or without rolling shutter effects (e.g. holding a static camera, walking at a slow pace).
Face Detection and Tracking
High-precision face location detection and tracking.
Can detect up to 64 human faces in a video (faces no smaller than 24x24 pixels).
Detected and tracked faces are returned with coordinates and a Face ID to track throughout the
video.
Time (sec)  Face ID  x, y        Width, Height
0           0        0.59, 0.23  0.09, 0.16
0           1        0.38, 0.15  0.07, 0.12
1           0        0.54, 0.25  0.09, 0.15
1           1        0.23, 0.18  0.07, 0.12
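The coordinates in the table above appear to be normalized to [0, 1]; a hypothetical helper (not part of the API) to convert them to pixel values for a given frame size:

```python
# Convert normalized face-rectangle coordinates to pixels (assumes the
# x, y, width, height values above are fractions of the frame size).
def to_pixels(x, y, w, h, frame_w, frame_h):
    return (round(x * frame_w), round(y * frame_h),
            round(w * frame_w), round(h * frame_h))

# Face 0 at t=0 in a 1280x720 frame:
print(to_pixels(0.59, 0.23, 0.09, 0.16, 1280, 720))  # (755, 166, 115, 115)
```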
Motion Detection

Indicates when motion occurs against a fixed background (e.g. surveillance video).
Trained to reduce false alarms, such as lighting and shadow changes.
Current limitations:
• No support for night-vision videos
• Semi-transparent and small objects are not detected well

Start Time  End Time  In Region
1.9         3.6       0
5.2         15.1      0
Speech APIs

Voice Recognition (Speech to Text)
Convert spoken audio to text
Voice Output (Text to Speech)
Synthesize audio from text
Speaker ID & Diarisation
Coming soon
Voice Recognition

                   Short Form      Long Form
Duration of Audio  < 15 seconds    < 2 minutes
Final Result       n-best choices  Best choice, delivered at sentence pauses
Partial Results    Yes             Yes
Voice Output

Synthesize audio from text via POST request
Maximum audio return of 15 seconds
17 languages supported

<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts"
       xml:lang="en-US">
  <voice name="Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)">
    Synthesize audio from text, to speak to your users.
  </voice>
</speak>
Speaker Recognition APIs

Speaker Verification
Check if two voices are the same
Speaker Identification
Identify who is speaking
Speaker Recognition APIs
Enrollment
Create a unique voiceprint for a profile
Recognition
After enrolling one or more voices, identify who is speaking
from an audio clip
Verification
Confirm if a voice belongs to a previously enrolled profile
[Illustration: verification asks “Is this Anna’s voice?”; identification asks “Whose voice is this?” given enrolled speakers such as Anna, Mike, and Mary.]
CRIS: Custom Recognition Intelligent Service

Customize both language and acoustic models; tailor speech recognition to your app & environment.

Create custom language models for the vocabulary of the application.
Adapt acoustic models to better match the expected environment of the application’s users.
Deploy to a custom endpoint and access from any device.
Spell Check APIs

State-of-the-art, cloud-based spelling algorithms that recognize a wide variety of spelling errors.

Recognizes name errors and homonyms in context: errors that are hard to spot because they depend on the context of the surrounding words.

Updates over time: support for new brands and coined expressions as they emerge.
Check a single word or a whole sentence:
“Our engineers developed this four you!”
Corrected text: “four” → “for”

Identify errors and get suggestions:
"spellingErrors": [
{
"offset": 5,
"token": "gona",
"type": "UnknownToken",
"suggestions": [
{ "token": "gonna" }
] }
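Given the offset/token/suggestions fields shown above, corrections can be applied mechanically. A small helper (illustrative, not part of the API) that applies the first suggestion for each error, working right-to-left so earlier offsets stay valid:

```python
# Apply spell-check suggestions using the offset/token fields from the
# response shown above (helper is illustrative, not part of the API).
def apply_corrections(text, errors):
    # Apply right-to-left so earlier offsets remain valid.
    for err in sorted(errors, key=lambda e: e["offset"], reverse=True):
        if err["suggestions"]:
            start = err["offset"]
            end = start + len(err["token"])
            text = text[:start] + err["suggestions"][0]["token"] + text[end:]
    return text

errors = [{"offset": 5, "token": "gona", "type": "UnknownToken",
           "suggestions": [{"token": "gonna"}]}]
print(apply_corrections("It's gona rain", errors))  # It's gonna rain
```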
LUIS: Language Understanding Intelligent Service

Understand what your users are saying.

Use pre-built Bing & Cortana models or create your own.
Reduce labeling effort with interactive featuring.
Use visualizations to gauge performance and improvements.
Leverage speech recognition with seamless integration.
Deploy using just a few examples with active learning.
{
  "entities": [
    {
      "entity": "flight_delays",
      "type": "Topic"
    }
  ],
  "intents": [
    { "intent": "FindNews",  "score": 0.99853384 },
    { "intent": "None",      "score": 0.07289317 },
    { "intent": "ReadNews",  "score": 0.0167122427 },
    { "intent": "ShareNews", "score": 1.0919299E-06 }
  ]
}
Language Understanding Models
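LUIS returns a score per intent, as in the response above; an app typically dispatches on the highest-scoring one. A minimal sketch:

```python
# Pick the top-scoring intent from a LUIS-style response (values mirror
# the JSON shown above).
response = {
    "entities": [{"entity": "flight_delays", "type": "Topic"}],
    "intents": [
        {"intent": "FindNews",  "score": 0.99853384},
        {"intent": "None",      "score": 0.07289317},
        {"intent": "ReadNews",  "score": 0.0167122427},
        {"intent": "ShareNews", "score": 1.0919299e-06},
    ],
}

top = max(response["intents"], key=lambda i: i["score"])
print(top["intent"])  # FindNews
```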
Sign up and resources:
https://social.msdn.microsoft.com/forums/azure/en-US/home?forum=mlapi
http://www.projectoxford.ai/doc
https://github.com/Microsoft/ProjectOxford-ClientSDK
https://github.com/matvelloso/twinsornot (the story)
Intelligent Services Made Easy
What is next?
Q&A?
Email me any follow-up questions: xdh@microsoft.com

Editor's Notes

  • #3 Key messages: We believe our research in service of these three ambitions will lead to what we think of as an “invisible revolution”. Where increases in our capabilities are powered by technology that moves further out of sight. The innovation will come from the shift to the cloud…move from having device power right in front of you magnified and moved to the cloud. Invisible processing power, storage, intelligence, etc. The innovation will come from what’s happening inside and around the device versus the object you can see. Capabilities that come from machine learning, powerful algorithms, cloud, intelligence, and so on. The innovation will come from an ecosystem of computing that surrounds you. Pervasive computing which will become so natural, it disappears into the background. When technology becomes more powerful, but less intrusive, it can fit into more parts of our world and solve an even wider range of problems.
• #5 Complexity factors: each device brings different memory, CPU, and power constraints; networks bring latency; new languages bring new data sources; new scenarios bring novel acoustics. Strategies: re-architect the runtime to reduce device footprint and latency; universal models, semi-supervised and unsupervised learning; personalization; modular AMs; crowdsourced field collection programs.
• #22 Project Oxford is broken up into three areas of understanding. Each of these areas offers a range of APIs that developers can use within their applications. The first area is Vision: the ability to understand the content within photos and videos. Then, we have Speech: the ability to understand and generate spoken words given a segment of audio. And finally, Language: the ability to understand language and the context in which it applies. All of these APIs are available for public use today.
• #42 Pre-built models from Bing and Cortana: LUIS provides access to select Cortana models, and lets you include these in your app. These are deep models that know that (for example) “Ronald Reagan” is a US President, actor, author, and person; and can convert “5:45 PM tomorrow” into a machine-readable form. Interactive featuring to reduce labeling effort: other platforms let developers improve models by providing more labels; LUIS goes further by allowing developers to provide features. This lets a developer tell LUIS that “car, truck, motorcycle, and SUV” are all types of vehicles. These classes help LUIS generalize faster, so if it sees “I need insurance for my SUV”, it generalizes to all types of vehicles immediately. This reduces labeling effort. Powerful visualizations to gauge performance and pinpoint improvements: a suite of visualizations enables developers to gauge the performance of their models, and pinpoint if and where improvements are needed. Active learning for continual improvement: once a model is deployed and utterances begin rolling in, active learning prioritizes them so only the utterances which will yield the biggest improvement need to be labeled, again reducing labeling effort. Built on statistical models, not rules.