Deep Learning and Intelligent
Applications
Dr Xuedong Huang
Distinguished Engineer & Head of Advanced Technology Group
Microsoft Technology and Research
xdh@microsoft.com
What drives speech technology progress?
Stage: Our computing infrastructure, including GPUs, is a stage for performers
Oxygen: Big usage data is oxygen to beautify our performance
Performers: Deep learning is changing everything as our top performer
2015 System
Human Error Rate 4%
Speech recognition could reach human parity in the next 3 years
5/25/2016
Dong Yu and Xuedong Huang: Microsoft Computational
Network Toolkit
ImageNet: Microsoft 2015 ResNet
ImageNet classification top-5 error (%):
ILSVRC 2010 (NEC America): 28.2
ILSVRC 2011 (Xerox): 25.8
ILSVRC 2012 (AlexNet): 16.4
ILSVRC 2013 (Clarifai): 11.7
ILSVRC 2014 (VGG): 7.3
ILSVRC 2014 (GoogLeNet): 6.7
ILSVRC 2015 (ResNet): 3.5
Microsoft took first place with all 5 of its entries this year: ImageNet classification,
ImageNet localization, ImageNet detection, COCO detection, and COCO
segmentation
Nvidia CEO's View
http://blogs.nvidia.com/blog/2016/01/12/accelerating-ai-artificial-intelligence-gpus/
Design Goal of CNTK
• A deep learning tool that balances
• Efficiency: Can train production systems as fast as possible
• Performance: Can achieve state-of-the-art performance on benchmark tasks
and production systems
• Flexibility: Can support various tasks such as speech, image, and text, and can
try out new ideas quickly
• Inspiration: Legos
• Each brick is very simple and performs a specific function
• Create arbitrary objects by combining many bricks
• CNTK enables the creation of existing and novel models by combining
simple functions in arbitrary ways.
Functionality
• Supports
• CPU and GPU with a focus on GPU Cluster
• Windows and Linux
• automatic differentiation
• Efficient static and recurrent network training through batching
• data parallelization within and across machines with 1-bit quantized SGD
• memory sharing during execution planning
• Modularized: separation of
• computational networks
• execution engine
• learning algorithms
• model description
• data readers
• Models can be described and modified with
• C++ code
• Network definition language (NDL) and model editing language (MEL)
• BrainScript (beta)
• Python and C# (planned)
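The "1-bit quantized SGD" bullet above refers to compressing each gradient value to a single bit before it is exchanged between workers, while carrying the quantization error forward into the next minibatch. The following is only a rough sketch of that idea in plain Python. The function names and the choice of reconstruction values (the mean of each sign group) are assumptions made for illustration, not CNTK's actual implementation.

```python
def quantize_1bit(gradient, residual):
    """Quantize a gradient vector to one bit per element with error feedback.

    Hypothetical helper for illustration only.
    Returns (bits, pos_value, neg_value, new_residual).
    """
    # Add back the quantization error left over from the previous minibatch.
    corrected = [g + r for g, r in zip(gradient, residual)]

    # One bit per element: 1 for non-negative values, 0 for negative values.
    bits = [1 if v >= 0.0 else 0 for v in corrected]

    # Assumed reconstruction values: the mean of each sign group.
    positives = [v for v in corrected if v >= 0.0]
    negatives = [v for v in corrected if v < 0.0]
    pos_value = sum(positives) / len(positives) if positives else 0.0
    neg_value = sum(negatives) / len(negatives) if negatives else 0.0

    # Whatever the two reconstruction values cannot express is kept locally.
    reconstructed = [pos_value if b else neg_value for b in bits]
    new_residual = [v - q for v, q in zip(corrected, reconstructed)]
    return bits, pos_value, neg_value, new_residual


def dequantize_1bit(bits, pos_value, neg_value):
    """Rebuild the approximate gradient on the receiving side."""
    return [pos_value if b else neg_value for b in bits]


bits, pos_value, neg_value, residual = quantize_1bit(
    [0.5, -0.25, 1.5, -0.75], [0.0, 0.0, 0.0, 0.0]
)
approx = dequantize_1bit(bits, pos_value, neg_value)
```

Only the bit vector and the two reconstruction values would need to travel over the network; the residual stays on the worker and is added to the next gradient.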
Architecture
[Architecture diagram]
• CN Description → (use) ICNBuilder → (build) CN
• IDataReader: task-specific reader; loads features & labels and gets data
• ILearner: SGD, AdaGrad, etc.
• IExecutionEngine: CPU/GPU; evaluates the CN and computes gradients
At the Heart: Computational Networks
• A generalization of machine learning models that can be described as a
series of computational steps.
• E.g., DNN, CNN, RNN, LSTM, DSSM, Log-linear model
• Representation:
• A list of computational nodes denoted as
n = {node name : operation name}
• The parent-children relationship describing the operands
{n : c_1, ..., c_{K_n}}
• K_n is the number of children of node n. For leaf nodes K_n = 0.
• Order of the children matters: e.g., XY is different from YX
• Given the inputs (operands) the value of the node can be computed.
• Can flexibly describe deep learning models.
• Adopted by many other popular tools as well
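The representation described above (a map from node name to operation name, plus an ordered list of children per node) can be sketched in a few lines of Python. This is an illustrative toy and not CNTK code: the operation names, the `evaluate` helper, and the tiny network are all assumptions chosen for the example.

```python
import math

# Hypothetical operation table. Each operation receives its operands in
# child order, which is why Times(W, X) differs from Times(X, W).
OPERATIONS = {
    "Times": lambda w, x: [sum(a * b for a, b in zip(row, x)) for row in w],
    "Plus": lambda a, b: [u + v for u, v in zip(a, b)],
    "Sigmoid": lambda z: [1.0 / (1.0 + math.exp(-v)) for v in z],
}

# n = {node name : operation name}
nodes = {
    "X": "Input", "W": "Parameter", "b": "Parameter",
    "WX": "Times", "Z": "Plus", "H": "Sigmoid",
}

# {n : c_1, ..., c_{K_n}}; leaf nodes have no entry (K_n = 0).
children = {"WX": ["W", "X"], "Z": ["WX", "b"], "H": ["Z"]}


def evaluate(node, leaf_values, cache=None):
    """Compute a node's value by first evaluating its children in order."""
    cache = {} if cache is None else cache
    if node in cache:
        return cache[node]
    kids = children.get(node, [])
    if not kids:
        value = leaf_values[node]  # leaf: value is supplied from outside
    else:
        operands = [evaluate(child, leaf_values, cache) for child in kids]
        value = OPERATIONS[nodes[node]](*operands)
    cache[node] = value
    return value


leaf_values = {
    "W": [[1.0, -1.0], [0.0, 0.0]],
    "X": [2.0, 2.0],
    "b": [0.0, 0.0],
}
hidden = evaluate("H", leaf_values)
```

Here `H = Sigmoid(Plus(Times(W, X), b))` is one layer of a DNN; stacking more nodes in the same two dictionaries would describe deeper models.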
CNTK Summary
• CNTK is a powerful tool that supports CPU/GPU and runs under
Windows/Linux
• CNTK is extensible with the low-coupling modular design: adding new
readers and new computation nodes is easy with a new reader design
• Network definition language, macros, and model editing language (as
well as BrainScript and Python bindings in the future) make network
design and modification easy
• Compared to other tools CNTK has a great balance between
efficiency, performance, and flexibility
Theano only supports 1 GPU
We report 8 GPUs (2 machines) for CNTK only as it is the only
public toolkit that can scale beyond a single machine. Our system
can scale beyond 8 GPUs across multiple machines with superior
distributed system performance.
[Chart: Speed Comparison (Frames/Second, The Higher the Better) for CNTK, Theano, TensorFlow, Torch 7, and Caffe, with bars for 1 GPU, 1 x 4 GPUs, and 2 x 4 GPUs (8 GPUs)]
CNTK Computational Performance
A portfolio of APIs, SDKs and apps that enable developers to easily add intelligent
services, such as vision or speech capabilities, to their solutions
Project Oxford – Adding “smart” to your applications
Understand the data around
your application
PROJECT OXFORD
Analyze an Image
Understand content within an image
OCR
Detect and recognize words within an image
Generate Thumbnail
Scale and crop images, while retaining key content
Computer Vision APIs
Analyze Image
Type of Image:
  Clip Art Type: 0 (Non-clipart)
  Line Drawing Type: 0 (Non-Line Drawing)
  Black & White Image: False
Content of Image:
  Categories: [{ "name": "people_swimming", "score": 0.099609375 }]
  Adult Content: False
  Adult Score: 0.18533889949321747
  Faces: [{ "age": 27, "gender": "Male", "faceRectangle": {"left": 472, "top": 258, "width": 199, "height": 199}}]
Image Colors:
  Dominant Color Background: White
  Dominant Color Foreground: Grey
  Dominant Colors: White
  Accent Color
OCR
LIFE IS LIKE
RIDING A BICYCLE
TO KEEP YOUR BALANCE
YOU MUST KEEP MOVING
JSON:
{
  "language": "en",
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "41,77,918,440",
      "lines": [
        {
          "boundingBox": "41,77,723,89",
          "words": [
            {
              "boundingBox": "41,102,225,64",
              "text": "LIFE"
            },
            {
              "boundingBox": "356,89,94,62",
              "text": "IS"
            },
            {
              "boundingBox": "539,77,225,64",
              "text": "LIKE"
            }
. . .
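The OCR response above nests words inside lines inside regions. As a rough sketch, the text can be recovered by walking that hierarchy. The helper names are hypothetical, the sample closes the truncated response by hand using only the first line shown on the slide, and reading each `boundingBox` string as "left,top,width,height" is an assumption the slide does not spell out.

```python
import json


def parse_bounding_box(box):
    """Split a comma-separated box string into named integers.

    Assumption: the four numbers are left, top, width, height.
    """
    left, top, width, height = (int(part) for part in box.split(","))
    return {"left": left, "top": top, "width": width, "height": height}


def extract_lines(ocr_result):
    """Return one string per recognized line, with words joined by spaces."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


# Hand-completed sample: only the first line of the slide's response.
SAMPLE = json.loads("""
{
  "language": "en",
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "41,77,918,440",
      "lines": [
        {
          "boundingBox": "41,77,723,89",
          "words": [
            {"boundingBox": "41,102,225,64", "text": "LIFE"},
            {"boundingBox": "356,89,94,62", "text": "IS"},
            {"boundingBox": "539,77,225,64", "text": "LIKE"}
          ]
        }
      ]
    }
  ]
}
""")

text_lines = extract_lines(SAMPLE)
region_box = parse_bounding_box(SAMPLE["regions"][0]["boundingBox"])
```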
Good At:
• Scanned Documents
• Photos with Text
• Fine Grained Location Information
Need to Improve:
• Vehicle License Plate
• Hand-written Text
• Characters with Large Sizes
Smart Thumbnail
Smart Cropping Off / Smart Cropping On
Face Detection
Detect faces and their attributes within an image
Face Verification
Check if two faces belong to the same person
Similar Face Searching
Find similar faces within a set of images
Face APIs
Face Grouping
Organize many faces into groups
Face Identification
Search which person a face belongs to
Face APIs
Detection
"faceRectangle": {"width": 193, "height": 193, "left": 326, "top": 204}
…
Feature Attributes
"attributes": { "age": 42, "gender": "male",
"headPose": { "roll": "8.2", "yaw": "-37.8", "pitch": "0.0" }}
Identification
Jasper Williams
Grouping
Recognize Emotions
Detect emotions based on facial expressions
Emotion APIs
Emotion APIs
Face Detection
"faceRectangle": {"width": 193, "height": 193, "left": 326, "top": 204}
…
Emotion Scores
"scores": { "anger": 5.182241e-8,
"contempt": 0.0000242813,
"disgust": 5.621025e-7,
"fear": 0.00115027453,
"happiness": 1.06114619e-8,
"neutral": 0.003540177,
"sadness": 9.30888746e-7,
"surprise": 0.9952837}
Video APIs
Stabilization
Smooth and stabilize shaky video
Face Detection and Tracking
Detect and track faces in videos
Motion Detection
Detect when motion occurs
Stabilization
The Stabilization API provides automatic video stabilization and smoothing for shaky videos.
This API uses many of the same technologies found in Microsoft Hyperlapse.
Best For:
Small camera motions, with or without rolling shutter effects (e.g. holding a static camera, walking
with a slow speed).
Face Detection and Tracking
High precision face location detection and tracking.
Can detect up to 64 human faces in a video (no smaller than 24x24 pixels)
Detected and tracked faces are returned with coordinates and a Face ID to track throughout the
video.
Time (sec)   Face ID   x, y         Width, Height
0            0         0.59, 0.23   0.09, 0.16
0            1         0.38, 0.15   0.07, 0.12
1            0         0.54, 0.25   0.09, 0.15
1            1         0.23, 0.18   0.07, 0.12
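Because every detection carries a Face ID, the rows in the table above can be regrouped into one track per face. The sketch below assumes the rows have already been parsed into tuples in the column order shown; the tuple layout and the `group_by_face` helper are illustrative and do not reflect the API's real response format.

```python
from collections import defaultdict

# Rows copied from the table: (time_sec, face_id, x, y, width, height).
ROWS = [
    (0, 0, 0.59, 0.23, 0.09, 0.16),
    (0, 1, 0.38, 0.15, 0.07, 0.12),
    (1, 0, 0.54, 0.25, 0.09, 0.15),
    (1, 1, 0.23, 0.18, 0.07, 0.12),
]


def group_by_face(rows):
    """Collect detections into per-face tracks ordered by time."""
    tracks = defaultdict(list)
    for time_sec, face_id, x, y, width, height in rows:
        tracks[face_id].append((time_sec, x, y, width, height))
    for detections in tracks.values():
        detections.sort()  # tuples sort by their first field, the timestamp
    return dict(tracks)


tracks = group_by_face(ROWS)
```

With the sample rows, face 1 moves from x = 0.38 to x = 0.23 between the first and second timestamps.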
Motion Detection
Indicates when motion occurs against a fixed background (e.g. surveillance video)
Trained to reduce false alarms, such as lighting and shadow changes.
Current limitations:
• No support for night-vision videos
• Semi-transparent and small objects are not detected well
Start Time   End Time   In Region
1.9          3.6        0
5.2          15.1       0
Speech APIs
Voice Recognition (Speech to Text)
Converts spoken audio to text
Voice Output (Text to Speech)
Synthesize audio from text
Speaker ID & Diarisation
Coming soon
Voice Recognition
                    Short Form       Long Form
Duration of Audio   < 15 seconds     < 2 minutes
Final Result        n-best choice    Best Choice, delivered at sentence pauses
Partial Results     Yes              Yes
Voice Output
Synthesize audio from text via POST request
Maximum audio return of 15 seconds
17 languages supported
<speak version="1.0"
xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="http://www.w3.org/2001/mstts"
xml:lang="en-US">
<voice name="Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)">
Synthesize audio from text, to speak to your users.
</voice></speak>
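A request body like the SSML above is easier to get right when it is generated rather than concatenated by hand, since an XML library escapes characters such as "&" in the spoken text. The following is a minimal sketch using Python's standard library. The `build_ssml` helper is hypothetical, and it leaves out the `mstts` namespace declaration from the slide because the sample does not use any element from it.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements without a prefix, as in the slide's sample.
ET.register_namespace("", SSML_NS)


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML document that speaks `text` with the given voice."""
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    voice.text = text  # special characters are escaped on serialization
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(
    "Tom & Jerry say hi.",
    "Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)",
)
```

The resulting string would be sent as the body of the POST request mentioned on the slide; the endpoint and headers are not shown here because the slide does not give them.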
Speaker Verification
Check if two voices are the same
Speaker Identification
Identify who is speaking
Speaker Recognition APIs
Speaker Recognition APIs
Enrollment
Create a unique voiceprint for a profile
Recognition
After enrolling one or more voices, identify who is speaking
from an audio clip
Verification
Confirm if a voice belongs to a previously enrolled profile
Verification: Is this Anna’s voice? (Anna)
Identification: Whose voice is this? (Anna, Mike, Marry)
CRIS
Customize both language and acoustic
models
Tailor speech recognition to your app &
environment
Create custom language models for the vocabulary of the
application
Adapt acoustic models to better match the expected
environment of the application’s users
Deploy to a custom endpoint and access from any device
Custom Recognition Intelligent Service
State-of-the-art cloud based spelling algorithms
Recognizes a wide variety of spelling errors
Spell Check APIs
Recognize name errors and homonyms in context
Catches difficult-to-spot errors by using the context of the words around
them
Updates over time
Support for new brands and coined expressions as they
emerge
Spell Check APIs
Check a single word or a whole sentence
“Our engineers developed this four you!”
Corrected Text: “four” → “for”
Identify errors and get suggestions
"spellingErrors": [
  {
    "offset": 5,
    "token": "gona",
    "type": "UnknownToken",
    "suggestions": [
      { "token": "gonna" }
    ]
  }
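Each entry in `spellingErrors` gives the character offset of the flagged token and a list of suggestions, which is enough to patch the original string. The sketch below shows one way to do that. The input sentence is a made-up example (the slide does not show the text that produced this response), and `apply_suggestions` is a hypothetical helper that simply takes the first suggestion.

```python
def apply_suggestions(text, spelling_errors):
    """Replace each flagged token with its first suggestion."""
    # Work from right to left so earlier offsets stay valid after edits.
    ordered = sorted(spelling_errors, key=lambda err: err["offset"], reverse=True)
    for error in ordered:
        suggestions = error.get("suggestions", [])
        if not suggestions:
            continue
        start = error["offset"]
        end = start + len(error["token"])
        if text[start:end] != error["token"]:
            continue  # offset does not line up with the text; skip it
        text = text[:start] + suggestions[0]["token"] + text[end:]
    return text


# Error entry copied from the slide; the sentence is an assumed input.
ERRORS = [
    {
        "offset": 5,
        "token": "gona",
        "type": "UnknownToken",
        "suggestions": [{"token": "gonna"}],
    }
]

corrected = apply_suggestions("I am gona go", ERRORS)
```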
LUIS
Understand what your users are saying
Use pre-built Bing & Cortana models or
create your own
Reduce labeling effort with interactive featuring
Use visualizations to gauge performance and improvements
Leverage Speech recognition with seamless integration
Deploy using just a few examples with active learning
Language Understanding Intelligent Service
{
  "entities": [
    {
      "entity": "flight_delays",
      "type": "Topic"
    }
  ],
  "intents": [
    {
      "intent": "FindNews",
      "score": 0.99853384
    },
    {
      "intent": "None",
      "score": 0.07289317
    },
    {
      "intent": "ReadNews",
      "score": 0.0167122427
    },
    {
      "intent": "ShareNews",
      "score": 1.0919299E-06
    }
  ]
}
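An application consuming a response like the one above typically acts on the highest-scoring intent and falls back when nothing is confident enough. The following sketch illustrates that pattern. The `top_intent` helper and the 0.5 threshold are assumptions for illustration, not part of LUIS itself.

```python
# Response copied from the slide.
RESPONSE = {
    "entities": [{"entity": "flight_delays", "type": "Topic"}],
    "intents": [
        {"intent": "FindNews", "score": 0.99853384},
        {"intent": "None", "score": 0.07289317},
        {"intent": "ReadNews", "score": 0.0167122427},
        {"intent": "ShareNews", "score": 1.0919299e-06},
    ],
}


def top_intent(response, threshold=0.5):
    """Return the best-scoring intent name, or "None" below the threshold."""
    intents = response.get("intents", [])
    if not intents:
        return "None"
    best = max(intents, key=lambda item: item["score"])
    return best["intent"] if best["score"] >= threshold else "None"


def entities_of_type(response, entity_type):
    """List the entity strings tagged with the given type."""
    return [
        item["entity"]
        for item in response.get("entities", [])
        if item["type"] == entity_type
    ]


intent = top_intent(RESPONSE)
topics = entities_of_type(RESPONSE, "Topic")
```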
Language Understanding Models
oxfordSignUp
https://social.msdn.microsoft.com/forums/azure/en-US/home?forum=mlapi
http://www.projectoxford.ai/doc
https://github.com/Microsoft/ProjectOxford-ClientSDK
https://github.com/matvelloso/twinsornot (the story)
Intelligent Services Made Easy
What is next?
Q&A?
Email me any follow-up questions: xdh@microsoft.com

 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Xuedong Huang - Deep Learning and Intelligent Applications

  • 1. Deep Learning and Intelligent Applications Dr Xuedong Huang Distinguished Engineer & Head of Advanced Technology Group Microsoft Technology and Research xdh@microsoft.com
  • 2.
  • 3.
  • 4. What drives speech technology progress? Stage: Our computing infrastructure, including GPU, is a stage for performers. Oxygen: Big usage data is oxygen to beautify our performance. Performers: Deep learning is changing everything as our top performer.
  • 5. 2015 System. Human Error Rate: 4%. Speech recognition could reach human parity in the next 3 years.
  • 6. 5/25/2016 Dong Yu and Xuedong Huang: Microsoft Computational Network Toolkit 6
  • 7. ImageNet: Microsoft 2015 ResNet. ImageNet classification top-5 error (%): ILSVRC 2010 NEC America 28.2; ILSVRC 2011 Xerox 25.8; ILSVRC 2012 AlexNet 16.4; ILSVRC 2013 Clarifai 11.7; ILSVRC 2014 VGG 7.3; ILSVRC 2014 GoogLeNet 6.7; ILSVRC 2015 ResNet 3.5. Microsoft took first place with all 5 of its entries this year: ImageNet classification, ImageNet localization, ImageNet detection, COCO detection, and COCO segmentation.
  • 8.
  • 10. Design Goal of CNTK • A deep learning tool that balances • Efficiency: Can train production systems as fast as possible • Performance: Can achieve state-of-the-art performance on benchmark tasks and production systems • Flexibility: Can support various tasks such as speech, image, and text, and can try out new ideas quickly • Inspiration: Legos • Each brick is very simple and performs a specific function • Create arbitrary objects by combining many bricks • CNTK enables the creation of existing and novel models by combining simple functions in arbitrary ways.
  • 11. Functionality • Supports • CPU and GPU with a focus on GPU clusters • Windows and Linux • Automatic numerical differentiation • Efficient static and recurrent network training through batching • Data parallelization within and across machines with 1-bit quantized SGD • Memory sharing during execution planning • Modularized: separation of • computational networks • execution engine • learning algorithms • model description • data readers • Models can be described and modified with • C++ code • Network definition language (NDL) and model editing language (MEL) • BrainScript (beta) • Python and C# (planned)
  • 12. Architecture (diagram labels): CN; ICNBuilder; CN Description; Use; Build; ILearner; IDataReader; Features & Labels; Load; Get data; IExecutionEngine; CPU/GPU; Task-specific reader; SGD, AdaGrad, etc.; Evaluate; Compute Gradient
  • 13. At the Heart: Computational Networks • A generalization of machine learning models that can be described as a series of computational steps. • E.g., DNN, CNN, RNN, LSTM, DSSM, log-linear model • Representation: • A list of computational nodes denoted as n = {node name : operation name} • The parent-children relationship describing the operands {n : c_1, ..., c_{K_n}} • K_n is the number of children of node n. For leaf nodes K_n = 0. • Order of the children matters: e.g., XY is different from YX • Given the inputs (operands), the value of the node can be computed. • Can flexibly describe deep learning models. • Adopted by many other popular tools as well
  • 14. CNTK Summary • CNTK is a powerful tool that supports CPU/GPU and runs under Windows/Linux • CNTK is extensible thanks to its low-coupling modular design: adding new readers and new computation nodes is easy with a new reader design • Network definition language, macros, and model editing language (as well as BrainScript and Python bindings in the future) make network design and modification easy • Compared to other tools, CNTK strikes a good balance between efficiency, performance, and flexibility
  • 15. CNTK Computational Performance. Speed comparison (frames/second, the higher the better) of CNTK, Theano, TensorFlow, Torch 7, and Caffe on 1 GPU, 1 x 4 GPUs, and 2 x 4 GPUs (8 GPUs). Theano only supports 1 GPU. We report 8 GPUs (2 machines) for CNTK only, as it is the only public toolkit that can scale beyond a single machine. Our system can scale beyond 8 GPUs across multiple machines with superior distributed system performance.
  • 16.
  • 17. A portfolio of APIs, SDKs and apps that enable developers to easily add intelligent services, such as vision or speech capabilities, to their solutions Project Oxford – Adding “smart” to your applications
  • 18. Understand the data around your application
  • 19.
  • 20.
  • 22. Computer Vision APIs. Analyze an Image: Understand content within an image. OCR: Detect and recognize words within an image. Generate Thumbnail: Scale and crop images while retaining key content.
  • 23. Analyze Image. Type of Image: Clip Art Type 0 (Non-clipart); Line Drawing Type 0 (Non-Line Drawing); Black & White Image False. Content of Image: Categories [{ "name": "people_swimming", "score": 0.099609375 }]; Adult Content False; Adult Score 0.18533889949321747; Faces [{ "age": 27, "gender": "Male", "faceRectangle": {"left": 472, "top": 258, "width": 199, "height": 199}}]. Image Colors: Dominant Color Background White; Dominant Color Foreground Grey; Dominant Colors White; Accent Color
  • 24. OCR. Recognized text: LIFE IS LIKE RIDING A BICYCLE TO KEEP YOUR BALANCE YOU MUST KEEP MOVING. JSON: { "language": "en", "orientation": "Up", "regions": [ { "boundingBox": "41,77,918,440", "lines": [ { "boundingBox": "41,77,723,89", "words": [ { "boundingBox": "41,102,225,64", "text": "LIFE" }, { "boundingBox": "356,89,94,62", "text": "IS" }, { "boundingBox": "539,77,225,64", "text": "LIKE" } . . . Good At: • Scanned Documents • Photos with Text • Fine Grained Location Information. Need to Improve: • Vehicle License Plate • Hand-written Text • Characters with Large Sizes
  • 25. Smart Thumbnail: Smart Cropping Off / Smart Cropping On
  • 26. Face APIs. Face Detection: Detect faces and their attributes within an image. Face Verification: Check if two faces belong to the same person. Similar Face Searching: Find similar faces within a set of images. Face Grouping: Organize many faces into groups. Face Identification: Search which person a face belongs to.
  • 27. Face APIs Detection "faceRectangle": {"width": 193, "height": 193, "left": 326, "top": 204} … Feature Attributes "attributes": { "age": 42, "gender": "male", "headPose": { "roll": "8.2", "yaw": "-37.8", "pitch": "0.0" }} Identification Jasper Williams Grouping
  • 28. Emotion APIs. Recognize Emotions: Detect emotions based on facial expressions.
  • 29. Emotion APIs Face Detection "faceRectangle": {"width": 193, "height": 193, "left": 326, "top": 204} … Emotion Scores "scores": { "anger": 5.182241e-8, "contempt": 0.0000242813, "disgust": 5.621025e-7, "fear": 0.00115027453, "happiness": 1.06114619e-8, "neutral": 0.003540177, "sadness": 9.30888746e-7, "surprise": 0.9952837}
  • 30. Video APIs. Stabilization: Smooth and stabilize shaky video. Face Detection and Tracking: Detect and track faces in videos. Motion Detection: Detect when motion occurs.
  • 31. Stabilization The Stabilization API provides automatic video stabilization and smoothing for shaky videos. This API uses many of the same technologies found in Microsoft Hyperlapse. Best For: Small camera motions, with or without rolling shutter effects (e.g. holding a static camera, walking with a slow speed).
  • 32. Face Detection and Tracking. High-precision face location detection and tracking. Can detect up to 64 human faces in a video (no smaller than 24x24 pixels). Detected and tracked faces are returned with coordinates and a Face ID to track throughout the video. Example output (Time (sec) | Face ID | x, y | Width, Height): 0 | 0 | 0.59, 0.23 | 0.09, 0.16; 0 | 1 | 0.38, 0.15 | 0.07, 0.12; 1 | 0 | 0.54, 0.25 | 0.09, 0.15; 1 | 1 | 0.23, 0.18 | 0.07, 0.12
  • 33. Motion Detection. Indicates when motion occurs against a fixed background (e.g. surveillance video). Trained to reduce false alarms, such as lighting and shadow changes. Current limitations: • No support for night-vision videos • Semi-transparent and small objects are not detected well. Example output (Start Time | End Time | In Region): 1.9 | 3.6 | 0; 5.2 | 15.1 | 0
  • 34. Speech APIs. Voice Recognition (Speech to Text): Converts spoken audio to text. Voice Output (Text to Speech): Synthesize audio from text. Speaker ID & Diarisation: Coming soon.
  • 36. Voice Recognition. Short Form: duration of audio < 15 seconds; final result is an n-best choice; partial results: yes. Long Form: duration of audio < 2 minutes; final result is the best choice, delivered at sentence pauses; partial results: yes.
  • 37. Voice Output. Synthesize audio from text via POST request. Maximum audio return of 15 seconds. 17 languages supported. <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"> <voice name="Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)"> Synthesize audio from text, to speak to your users. </voice></speak>
  • 38. Speaker Recognition APIs. Speaker Verification: Check if two voices are the same. Speaker Identification: Identify who is speaking.
  • 39. Speaker Recognition APIs. Enrollment: Create a unique voiceprint for a profile. Recognition: After enrolling one or more voices, identify who is speaking from an audio clip ("Whose voice is this?" Anna, Mike, Marry). Verification: Confirm if a voice belongs to a previously enrolled profile ("Is this Anna’s voice?" Anna).
  • 40. CRIS. Customize both language and acoustic models. Tailor speech recognition to your app & environment.
  • 41. Custom Recognition Intelligent Service. Create custom language models for the vocabulary of the application. Adapt acoustic models to better match the expected environment of the application’s users. Deploy to a custom endpoint and access from any device.
  • 42. Spell Check APIs. State-of-the-art cloud-based spelling algorithms: Recognizes a wide variety of spelling errors. Recognize name errors and homonyms in context: Difficult-to-spot errors that use the context of the words around them. Updates over time: Support for new brands and coined expressions as they emerge.
  • 43. Spell Check APIs. Check a single word or a whole sentence: "Our engineers developed this four you!" Corrected text: "four" → "for". Identify errors and get suggestions: "spellingErrors": [ { "offset": 5, "token": "gona", "type": "UnknownToken", "suggestions": [ { "token": "gonna" } ] }
  • 44. LUIS. Understand what your users are saying. Use pre-built Bing & Cortana models or create your own.
  • 45. Language Understanding Intelligent Service. Reduce labeling effort with interactive featuring. Use visualizations to gauge performance and improvements. Leverage speech recognition with seamless integration. Deploy using just a few examples with active learning.
  • 46. Language Understanding Models { "entities": [ { "entity": "flight_delays", "type": "Topic" } ], "intents": [ { "intent": "FindNews", "score": 0.99853384 }, { "intent": "None", "score": 0.07289317 }, { "intent": "ReadNews", "score": 0.0167122427 }, { "intent": "ShareNews", "score": 1.0919299E-06 } ] }
  • 47.
  • 48.
  • 52.
  • 53.
  • 54. Q&A? Email me any follow-up questions: xdh@microsoft.com
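
The computational-network representation from slide 13 can be sketched in a few lines. This is not CNTK code: the `Node` class, the `evaluate` function, and the operation names `Times` and `Plus` are hypothetical names chosen for this illustration. It only shows the idea that each node has an operation and an ordered list of children, and that a node's value follows from its operands.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product; the order of the operands matters."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical operation table: operation name -> function of the ordered operands.
OPS: Dict[str, Callable[..., Matrix]] = {"Times": matmul, "Plus": add}

@dataclass
class Node:
    name: str
    op: str                                   # "Input" marks a leaf node (K_n = 0)
    children: List["Node"] = field(default_factory=list)

def evaluate(node: Node, inputs: Dict[str, Matrix]) -> Matrix:
    """Compute a node's value by first evaluating its children in order."""
    if not node.children:                     # leaf: the caller supplies the value
        return inputs[node.name]
    operands = [evaluate(child, inputs) for child in node.children]
    return OPS[node.op](*operands)

X = Node("X", "Input")
Y = Node("Y", "Input")
XY = Node("XY", "Times", [X, Y])              # children order (X, Y)
YX = Node("YX", "Times", [Y, X])              # same children, reversed order

values = {"X": [[1.0, 2.0], [3.0, 4.0]], "Y": [[0.0, 1.0], [1.0, 0.0]]}
xy = evaluate(XY, values)
yx = evaluate(YX, values)
```

Here `xy` and `yx` differ even though both nodes have the same two children, which is the point of the slide's remark that XY is different from YX.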

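The OCR response on slide 24 nests words inside lines inside regions. A minimal sketch of reading it back might look like the following. The helper names `parse_box` and `lines_of_text` are made up for this example, the JSON is only the fragment shown on the slide (the closing brackets are supplied so it parses; the elided remainder is not reconstructed), and the reading of `boundingBox` as left, top, width, height is an assumption.

```python
import json
from typing import List, Tuple

# Fragment of the response shown on slide 24. The slide elides the rest, so the
# closing brackets below are added only to make this fragment valid JSON.
RESPONSE = """
{"language": "en", "orientation": "Up",
 "regions": [{"boundingBox": "41,77,918,440",
   "lines": [{"boundingBox": "41,77,723,89",
     "words": [{"boundingBox": "41,102,225,64", "text": "LIFE"},
               {"boundingBox": "356,89,94,62", "text": "IS"},
               {"boundingBox": "539,77,225,64", "text": "LIKE"}]}]}]}
"""

def parse_box(box: str) -> Tuple[int, int, int, int]:
    """Split a comma-separated boundingBox string into four integers
    (assumed here to be left, top, width, height)."""
    a, b, c, d = (int(v) for v in box.split(","))
    return a, b, c, d

def lines_of_text(response: dict) -> List[str]:
    """Join the words of every line, region by region, in reading order."""
    return [" ".join(word["text"] for word in line["words"])
            for region in response["regions"]
            for line in region["lines"]]

ocr = json.loads(RESPONSE)
text_lines = lines_of_text(ocr)
first_word_box = parse_box(ocr["regions"][0]["lines"][0]["words"][0]["boundingBox"])
```

Keeping the box parsing separate from the text extraction lets an application use the fine-grained location information only where it needs it.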
Editor's Notes

  1. Key messages: We believe our research in service of these three ambitions will lead to what we think of as an “invisible revolution”, where increases in our capabilities are powered by technology that moves further out of sight. The innovation will come from the shift to the cloud: the device power that used to sit right in front of you is magnified and moved to the cloud, giving invisible processing power, storage, intelligence, etc. The innovation will come from what’s happening inside and around the device rather than from the object you can see: capabilities that come from machine learning, powerful algorithms, cloud, intelligence, and so on. The innovation will come from an ecosystem of computing that surrounds you: pervasive computing that will become so natural it disappears into the background. When technology becomes more powerful but less intrusive, it can fit into more parts of our world and solve an even wider range of problems.
  2. Complexity factors: Each device brings different memory, CPU, and power constraints. Networks bring latency. New languages bring new data sources. New scenarios bring novel acoustics. Strategies: Re-architect the runtime to reduce device footprint and latency. Universal models, semi-supervised and unsupervised learning. Personalization. Modular AMs. Crowdsourced field collection programs.
  3. Project Oxford is broken up into three areas of understanding. Each of these areas offers a range of APIs that developers can use within their applications. The first area is Vision: the ability to understand the content within photos and videos. Then we have Speech: the ability to understand and generate spoken words given a segment of audio. And finally, Language: the ability to understand language and the context in which it applies. All of these APIs are available for public use today.
  4. Xiaomin
  5. Xiaomin
  6. Xiaomin
  7. Xiaomin
  8. Xiaomin
  9. Xiaomin
  10. Xiaomin
  11. Xiaomin
  12. Xiaomin
  13. Xiaomin
  14. Xiaomin
  15. Xiaomin
  16. Xiaomin
  17. Xiaomin
  18. Xiaomin
  19. Xiaomin
  20. Xiaomin
  21. Xiaomin
  22. Xiaomin
  23. Pre-built models from Bing and Cortana: LUIS provides access to select Cortana models and lets you include these in your app. These are deep models that know that (for example) “Ronald Reagan” is a US President, actor, author, and person, and can convert “5:45 PM tomorrow” into a machine-readable form. Interactive featuring to reduce labeling effort: Other platforms let developers improve models by providing more labels; LUIS goes further by allowing developers to provide features. This lets a developer tell LUIS that “car, truck, motorcycle, and SUV” are all types of vehicles. These classes help LUIS generalize faster, so if it sees “I need insurance for my SUV”, it generalizes to all types of vehicles immediately. This reduces labeling effort. Powerful visualizations to gauge performance and pinpoint improvements: A suite of visualizations enables developers to gauge the performance of their models and pinpoint if and where improvements are needed. Active learning for continual improvement: Once a model is deployed and utterances begin rolling in, active learning prioritizes them so only the utterances that will yield the biggest improvement need to be labeled, again reducing labeling effort. Built on statistical models, not rules.
  24. Xiaomin
  25. Xiaomin
  26. Xiaomin
  27. Pre-built models from Bing and Cortana: LUIS provides access to select Cortana models and lets you include these in your app. These are deep models that know that (for example) “Ronald Reagan” is a US President, actor, author, and person, and can convert “5:45 PM tomorrow” into a machine-readable form. Interactive featuring to reduce labeling effort: Other platforms let developers improve models by providing more labels; LUIS goes further by allowing developers to provide features. This lets a developer tell LUIS that “car, truck, motorcycle, and SUV” are all types of vehicles. These classes help LUIS generalize faster, so if it sees “I need insurance for my SUV”, it generalizes to all types of vehicles immediately. This reduces labeling effort. Powerful visualizations to gauge performance and pinpoint improvements: A suite of visualizations enables developers to gauge the performance of their models and pinpoint if and where improvements are needed. Active learning for continual improvement: Once a model is deployed and utterances begin rolling in, active learning prioritizes them so only the utterances that will yield the biggest improvement need to be labeled, again reducing labeling effort. Built on statistical models, not rules.
  28. Zhipeng
  29. Zhipeng
  30. Key messages: We believe our research in service of these three ambitions will lead to what we think of as an “invisible revolution”, where increases in our capabilities are powered by technology that moves further out of sight. The innovation will come from the shift to the cloud: the device power that used to sit right in front of you is magnified and moved to the cloud, giving invisible processing power, storage, intelligence, etc. The innovation will come from what’s happening inside and around the device rather than from the object you can see: capabilities that come from machine learning, powerful algorithms, cloud, intelligence, and so on. The innovation will come from an ecosystem of computing that surrounds you: pervasive computing that will become so natural it disappears into the background. When technology becomes more powerful but less intrusive, it can fit into more parts of our world and solve an even wider range of problems.
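
The LUIS response on slide 46 returns a list of scored intents plus the entities found in the utterance. One way an application might consume it is sketched below, assuming the JSON shape shown on that slide. The helper names `top_intent` and `entities_of_type` and the 0.5 confidence threshold are hypothetical choices for this example, not part of the service.

```python
import json
from typing import List

# Response copied from slide 46 (with straight quotes so that it parses).
RESPONSE = """
{"entities": [{"entity": "flight_delays", "type": "Topic"}],
 "intents": [{"intent": "FindNews", "score": 0.99853384},
             {"intent": "None", "score": 0.07289317},
             {"intent": "ReadNews", "score": 0.0167122427},
             {"intent": "ShareNews", "score": 1.0919299E-06}]}
"""

def top_intent(response: dict, threshold: float = 0.5) -> str:
    """Return the highest-scoring intent, or "None" when its score is below
    the (assumed) confidence threshold."""
    best = max(response["intents"], key=lambda item: item["score"])
    return best["intent"] if best["score"] >= threshold else "None"

def entities_of_type(response: dict, entity_type: str) -> List[str]:
    """Collect the entity strings that carry the given type label."""
    return [e["entity"] for e in response["entities"] if e["type"] == entity_type]

luis = json.loads(RESPONSE)
intent = top_intent(luis)
topics = entities_of_type(luis, "Topic")
```

Falling back to "None" below a threshold keeps the application from acting on a low-confidence guess; the right cut-off depends on the application and would need tuning.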