Microsoft Build
From mass media…
…to mobile, on-demand, personal media
MOBILE & ON-DEMAND
SNACKABLE
PERSONALIZED & INTERACTIVE
ANYTIME, ANYWHERE
USER-GENERATED
• VOD Portals
• Live Events & Replay
• Corporate Video Galleries
• Training/EDU Solutions
• News / Sports / Music
• Marketing / Advertising
http://aka.ms/amse
What we’re
announcing at Build
What’s New in Media Services APIs v3
Simplified development model
• CLI 2.0 support
• Single API for Media Services using ARM
• HTTP(s) ingest support on Jobs
• New Transform template
New functionality
• New presets for Video and Audio analysis
• New Live resources that help simplify the broadcast workflow
• Role Based Access Control (RBAC)
New SDKs
.NET Core, Python, Java, Node.js, Go
Better Integration across Azure Services
Event Grid, Logic Apps, Azure Functions
Azure integration & Workflows
IncomingStreamReceived
JobStateChange
EncoderConnected
EncoderDisconnected
ConnectionRejected
TrackDiscontinuityDetected
IncomingDataChunkDropped
IncomingStreamsOutOfSync
IngestHeartbeat
IncomingVideoStreamsOutOfSync
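A minimal sketch of how a subscriber (for example an Azure Function or Logic App step) might sort the events listed above before acting on them. The event names are taken from the slide; the three categories, the `categorize` helper, and the assumption that the name arrives in an `eventType` field (possibly with a prefix) are illustrative and not confirmed by the deck.

```python
# Event names exactly as listed on the slide. The grouping into three
# categories is an assumption made for this illustration.
EVENT_CATEGORIES = {
    "job": ("JobStateChange",),
    "connection": ("EncoderConnected", "EncoderDisconnected", "ConnectionRejected"),
    "stream_health": (
        "IncomingStreamReceived",
        "TrackDiscontinuityDetected",
        "IncomingDataChunkDropped",
        "IncomingStreamsOutOfSync",
        "IncomingVideoStreamsOutOfSync",
        "IngestHeartbeat",
    ),
}


def categorize(event: dict) -> str:
    """Return the category of an event, or "unknown" if its name is not listed."""
    # Assumption: the event name is carried in an "eventType" field and may be
    # prefixed with a namespace, so we match on the suffix only.
    event_type = event.get("eventType", "")
    for category, names in EVENT_CATEGORIES.items():
        if any(event_type.endswith(name) for name in names):
            return category
    return "unknown"


if __name__ == "__main__":
    # Hypothetical payload; only the "eventType" field is inspected.
    sample = {"eventType": "Example.JobStateChange", "data": {}}
    print(categorize(sample))
```

Matching on the suffix keeps the sketch independent of the exact type strings a real subscription delivers; a production handler would compare against the documented names instead.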
Video AI
• Automatically transcribe your video or audio content
• Flag any adult or objectionable content before you publish to web sites
• Enable true video search – for any spoken word, face, object, topic, or even a product – across your entire video archive
• Create captions for your video content – in any language
• Understand which parts of your video were most popular/interesting to your viewers and make real recommendations on similar content
• Create automated summaries of video content based on specific people, topics, or scenes within the video
2 ways to use Video AI…
…to create & return data insights
Video Indexer | Bundling 20 AI Features Together
Linguistic Transcript
Convert speech to text for 10 languages
Face detection
Find when each face appears in the video
Contextual Search
Understand the context of search results
Keywords Extraction
Find out the keywords discussed in each segment
Noise canceling
Eliminate background noise for call recordings using Skype filters
In-place Editing
Make manual fixes for errors detected
Face grouping
Identify multiple appearances of the same person
Call recording enhancement
Optimized ingestion for calls
Sentiment Analysis
Compare levels of positive vs negative spoken or written moments over the timeline
Spoken Language ID
Detect spoken language to support multi-language content
Annotations
Tag objects such as cat, table, car, ball, etc. when they appear
Identification
See name, job and biography of celebrities and ordinary people
Content Moderation
Detect explicit visuals or text in audio or overlay
Sub-Clipping
Source video is stored once for multiple playlists of video segments
Custom Vocabulary
Fit to industry, market and domain-specific terms
OCR
Extract text that appears in video as overlay, slides or background
Shot Detection
Detect when a shot starts/ends based on visual analysis
Recommendations
Find more videos with similar people discussing similar topics
Programmatic APIs
Index / Search API and UI widgets enable embedding in other websites/apps
Speaker Diarization
Map and understand who spoke when
Brand detection
Track brand mentions in speech or on screen overheads
Translation
Translate source to 54 languages – text or voice
Live Analytics*
Analyze content coming from a live broadcast source
Emotion sensing*
Detect emotions expressed in speech, vocal signals and facial expressions
Logos*
Identify visual logos that appear on screen
* Work in progress
Video Indexer
Media Intelligence & Insights
http://video.ai
Lap around Video Indexer
Video Indexer | Typical Workflow
Portal
Insights
Player
APIs
Metadata
Search
Video
Streaming
Indexing
{JSON}
Platform Services
Widgets
Video Indexer under the cover
Editing & Encoding
Video Indexer JSON insights
Cloud Content Management with Box
82K Customers
69% Fortune 500
Box Skills
Enhance files in Box using machine learning algorithms
Enhance files in Box with machine learning
Image Intelligence: Basic object recognition, categorization and text extraction for images
Audio Intelligence: Speech-to-text transcription and basic keyword/topic identification for audio files
Video Intelligence: Speech-to-text transcription, basic keyword/topic identification and facial recognition for videos
Demo
Video Analysis with Azure Functions & Cognitive Services
Video Intelligence Skill
Azure Media Analytics API
Retrieve video metadata
Users
User uploads video file
HTTP POST containing event payload (FILE.UPLOADED) & Access Token sent to Function invoke URL
Video file retrieved from Box using short-term Access Token
Video metadata written as metadata to video file object in Box using Access Token
Image Analysis with Azure Functions & Cognitive Services
Image Intelligence Skill
Vision API
Retrieve the labels
Users
User uploads image file
HTTP POST containing event payload (FILE.UPLOADED) & Access Token sent to Function invoke URL
Image file retrieved from Box using short-term Access Token
Labels written to image file object as metadata in Box using Access Token
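The video and image flows above share the same shape: receive the FILE.UPLOADED event with an access token, fetch the file, analyze it, and write the result back as metadata. The sketch below captures that shape only. The payload field names (`event`, `file_id`, `access_token`) and the three injected callables are hypothetical stand-ins, not the Box or Azure APIs.

```python
from typing import Callable, Optional


def handle_upload_event(
    payload: dict,
    download: Callable[[str, str], bytes],
    analyze: Callable[[bytes], dict],
    write_metadata: Callable[[str, str, dict], None],
) -> Optional[dict]:
    """Run the upload -> analyze -> write-back flow for one event payload."""
    # Only react to the upload event named on the slide.
    if payload.get("event") != "FILE.UPLOADED":
        return None

    # Hypothetical field names for the file reference and short-term token.
    file_id = payload["file_id"]
    token = payload["access_token"]

    content = download(file_id, token)        # file retrieved using the token
    insights = analyze(content)               # video metadata or image labels
    write_metadata(file_id, token, insights)  # written back onto the file object
    return insights


if __name__ == "__main__":
    # In-memory fakes so the sketch runs without any external service.
    store = {}
    result = handle_upload_event(
        {"event": "FILE.UPLOADED", "file_id": "f1", "access_token": "t"},
        download=lambda file_id, token: b"fake-bytes",
        analyze=lambda content: {"labels": ["example"], "size": len(content)},
        write_metadata=lambda file_id, token, data: store.update({file_id: data}),
    )
    print(result, store)
```

Passing the download, analysis, and write-back steps in as functions lets the same handler serve both the video skill and the image skill, with only the `analyze` step swapped.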
Video workflows
Azure Media Services, Azure Logic App & Azure Functions
Azure Functions
http://aka.ms/amsfunctions
Content Delivery Network (CDN)
Content Flagged for follow-up
Content workflow Quality workflow
n times
Architecture
Web App
“Show me all programs with this actor”
“actor”
Face at (28, 456)
“Matt Damon”
List of assets
Natural language
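One way to read the chain above is: a natural-language query is reduced to an intent ("actor"), the face at the given screen position is resolved to a name, and the name is looked up in an index of assets. The sketch below follows that reading. The keyword-based intent check, the function names, and the sample asset ids are all hypothetical; a real system would use a language-understanding service and the indexed face data instead.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]


def detect_intent(query: str) -> Optional[str]:
    """Very rough stand-in for natural-language understanding."""
    return "actor" if "actor" in query.lower() else None


def find_assets(
    query: str,
    face_position: Position,
    faces_on_screen: Dict[Position, str],
    face_index: Dict[str, List[str]],
) -> List[str]:
    """Resolve "this actor" to a name via the on-screen face, then list assets."""
    if detect_intent(query) != "actor":
        return []
    name = faces_on_screen.get(face_position)
    if name is None:
        return []
    return face_index.get(name, [])


if __name__ == "__main__":
    # Hypothetical data: one recognized face and an index of made-up asset ids.
    faces = {(28, 456): "Matt Damon"}
    index = {"Matt Damon": ["asset-1", "asset-2"]}
    print(find_assets("Show me all programs with this actor", (28, 456), faces, index))
```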
Call to Action
http://aka.ms/AzureMediaBuild
Evolve your app’s video experience with Azure: Processing and Video AI at scale