Computer Vision and Media Analytics
Tony Emerson
Managing Director, Worldwide Media and Cable
Creating New Opportunities for Value-Add Connected TV Services
Our OTT Vision
• Capitalize on existing content rights
• Unlock new niche markets
• Create a better viewing experience
• Create a closer connection to the audience
• Grow the business
One of the biggest networks in the world
• 30 regions are generally available.
• 6 regions are coming soon.
https://azure.microsoft.com/ja-jp/regions/
Platform Services
Security & Management
Infrastructure Services
Web Apps
Mobile Apps
API Management
API Apps
Logic Apps
Notification Hubs
HDInsight
Machine Learning
Stream Analytics
Data Factory
Event Hubs
Mobile Engagement
Active Directory
Multi-Factor Authentication
Automation
Portal
Key Vault
Store / Marketplace
Hybrid Operations
Backup
StorSimple
Site Recovery
Import/Export
SQL Database
DocumentDB
Redis Cache
Search
Tables
SQL Data Warehouse
Azure AD Connect Health
AD Privileged Identity Management
Operational Insights
Cloud Services
Batch
Remote App
Service Fabric
Visual Studio
Application Insights
Azure SDK
Team Project
VM Image Gallery & VM Depot
Content Delivery Network (CDN)
Media Insights
VoD/Live Transcoding
Azure Media Player
Multi DRM
VoD/Live Channel Streaming
Rio Olympics most successful in history
Rio 2016 by the Numbers
Microsoft Azure and Partners deliver globally
Plus a growing ecosystem of value-add third party partner components
Live & On Demand Streaming with integrated CDN
Content Protection
Encoding & Media Analytics
Cloud Upload & Storage
Scalable components for building custom media workflows in the cloud
Azure Media Services
Player Clients
Wide Adoption
Premium video on-demand content, broadcasts & live event streaming, online video platforms for web and mobile, enterprise video management…. And more!
What will you do to differentiate your service?
How can you increase the value of content?
Can you make it easier to search – for text, faces, logos and images, specific actions?
How can you pull more data out of the content to enhance discoverability, viewability?
How can you deal more efficiently with legal and regulatory compliance?
Make Video and Audio Searchable
Creating a database of rich metadata pulled directly out of the video and audio content itself
Powerful new media processors
• Speech-to-Text
• Facial and Emotion Detection & Facial Redaction
• Motion Detection, Stabilization, and Acceleration
• Object, Character, and Logo Recognition
• Automated Video Summarization
Azure Media Analytics – Enhancing Your Content
• Enables speech to text conversion
• Languages supported
  • English & Spanish as GA
  • German, French, Italian, Chinese, Portuguese, Arabic as Preview
• Use cases
  • Deep Search & First-pass captions
• Capable of custom vocabulary adaptation
  • User provides list of words related to video to improve speech recognition
Indexer
AZURE MEDIA INDEXER TECHNICAL DETAILS
Azure Media Indexer
Audio Decoding
Vocabulary Adaptation
Segmentation
Speech Recognition
Caption Alignment
Closed captions (TTML/WebVTT/SAMI)
Audio or Video: MP4, WMV, MP3, M4A, AAC, WAV, WMA
Audio Indexing Blob (AIB) for use with SQL Server and custom IFilter add-on (link)
Flexible metadata files (keywords, word info)
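As a rough illustration of the deep-search use case, the sketch below parses WebVTT captions (one of the closed-caption formats listed above) and returns the cue start times at which a keyword is spoken. The sample captions and the helper names `parse_vtt` and `find_keyword` are made up for illustration; this is not the Indexer's own API.

```python
import re

# Hypothetical sample of WebVTT closed captions (illustrative text only).
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the evening news.

00:00:04.500 --> 00:00:08.000
Tonight we cover the Olympic games.
"""

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
CUE_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def parse_vtt(text):
    """Return a list of (start, end, caption_text) tuples from WebVTT text."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if match:
                # Everything after the timing line is the caption payload.
                cues.append((match.group(1), match.group(2), " ".join(lines[i + 1:])))
                break
    return cues


def find_keyword(cues, keyword):
    """Return start times of cues whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [start for start, _end, caption in cues if needle in caption.lower()]


cues = parse_vtt(SAMPLE_VTT)
print(find_keyword(cues, "olympic"))
```

The returned start times could be used to jump a player directly to the moment a word is spoken.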
• Detect faces that appear in your videos
• Track faces as they move around the frame
• Output Metadata with face locations and timestamps
• Age Detection
• Gender Detection
• Facial Recognition
Face Detection
• Recognize the emotion of a person or crowd over time based on the facial expressions in the video
• Designed for real emotions in-the-wild.
• Identifies emotions based on expressions that psychological research has identified as universal
• A solution for personalizing experiences, analyzing responses to media and products, and crowd analytics
• Recognize: happiness, sadness, surprise, anger, contempt, fear, disgust, neutral
• Use cases – Audience Analytics, Personalization etc.
Emotion Recognition
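For audience analytics, emotion scores over time would typically be rolled up into coarser windows. The sketch below assumes a hypothetical output of (time, scores) samples, which is not the service's documented schema, and reports the dominant emotion per window.

```python
from collections import defaultdict

# Hypothetical per-sample output: (time in seconds, {emotion: score}).
# This layout is an assumption for illustration, not the service's real schema.
SAMPLES = [
    (0.0, {"happiness": 0.7, "neutral": 0.3}),
    (1.0, {"happiness": 0.6, "neutral": 0.4}),
    (5.0, {"surprise": 0.8, "neutral": 0.2}),
    (6.0, {"surprise": 0.5, "neutral": 0.5}),
]


def dominant_emotion_per_window(samples, window_seconds):
    """Sum scores per time window and return {window_start: dominant_emotion}."""
    totals = defaultdict(lambda: defaultdict(float))
    for time_s, scores in samples:
        # Bucket each sample into the window that contains its timestamp.
        window_start = int(time_s // window_seconds) * window_seconds
        for emotion, score in scores.items():
            totals[window_start][emotion] += score
    return {start: max(scores, key=scores.get) for start, scores in sorted(totals.items())}


print(dominant_emotion_per_window(SAMPLES, 5))
```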
• Extract typeset words from video content
• Select your own sampling rate to balance performance and quality
• Specify where in the video to look (e.g. bottom third for captions)
• Output describes text with location
Video OCR
Text: Who are we?
Location: (200,100,250,50)
Time: 0:45:02
Text: Who are you and who is the person sitting next to you?
Location: (100,250,350,90)
Time: 0:45:02
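The sample output above can be post-processed to keep only caption-like text. This sketch assumes the location tuple means (x, y, width, height) in pixels and uses a hypothetical frame height of 360; neither is stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class OcrResult:
    text: str
    box: tuple  # assumed layout: (x, y, width, height) in pixels
    time: str


# The two sample results shown on the slide.
RESULTS = [
    OcrResult("Who are we?", (200, 100, 250, 50), "0:45:02"),
    OcrResult(
        "Who are you and who is the person sitting next to you?",
        (100, 250, 350, 90),
        "0:45:02",
    ),
]


def in_bottom_third(result, frame_height):
    """True when the top edge of the text box lies in the bottom third of the frame."""
    _x, y, _w, _h = result.box
    return y >= frame_height * 2 / 3


FRAME_HEIGHT = 360  # hypothetical frame height, not given in the source
captions = [r.text for r in RESULTS if in_bottom_third(r, FRAME_HEIGHT)]
print(captions)
```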
• Transforms first-person videos into smooth time-lapses
• Designed for forward-moving camera scenarios (action sports, dash camera)
Hyperlapse
• Creates an automatic summary for videos to let people see a preview or snapshot of their video
• Frames are selected based on video quality, diversity, and stability of the footage
Video Summarization
• Detect video content policy violations
• Save time and money spent manually reviewing your content for offensive, illicit and inappropriate material
• Currently supports adult content classification
Content Moderation
Indexer Success Story
"As a company dedicated to building intelligent cloud solutions across industries, we’re excited to incorporate Microsoft Azure Media Analytics’ advanced machine learning technology in speech and vision onto our platform."
Ryan Steelberg, President of Veritone Media and Co-Founder of Veritone Inc.
Veritone’s Cognitive Media Platform (CMP) is an open cloud ecosystem of cognitive tools to harness the power of media
GrayMeta™ - The video & metadata experts behind MetaFarm
MetaFarm is a powerful platform that tackles big data and metadata problems, saving businesses time and money and bringing insight to the data that is already there.
• Connect to dispersed and siloed data for the right reasons, enabling easier migration and adoption of Azure and other services
• Extract embedded metadata from any file type across all file systems, databases and data feeds
• Create new metadata leveraging the exponential growth of cognitive, machine learning & AI services powered by Azure and the Cortana Suite
Easy Upload to Azure (Signiant)
Powerful Search & Discovery
Review, Consume, Share and Take Action
Bring Cognitive, AI & Machine Learning to the data in an easy-to-digest way
Azure Consumption
• Storage
• Compute
• Cognitive Services
- Vision
- Speech
- Language
- Knowledge
- Search
SORT, ORGANIZE & ACCESS
“What data do I really have? How many duplicates?”
Find and organize all assets / data across all departments, which also helps prior to data migration.
SEARCH & DISCOVERY ACROSS ENTERPRISE DATA SILOS
“I need the same asset another group has, why do I need to create or source it again?”
Leverage data / content across Linear, VOD and OTT within a broadcaster’s multiple data locations and increase efficiencies across the enterprise while driving cost down.
TIME & COST EFFICIENCIES
“I need more data quicker and the FTE approach doesn’t have the ROI.”
Bring efficiencies to the workplace, reducing the need for manual tagging and labeling of content and reducing the time to access the right data by 10x.
Video Stream Networks S.L. – Copyright © 2016 www.vsn-tv.com
MICROSOFT AZURE
• VSN is a leading end-to-end IT developer for the Broadcast and M&E industries, with over 1000 clients in more than 100 countries.
• VSNEXPLORER provides corporations with a secure, always-on media asset management solution that allows companies and users to collaborate with their media archive, optimizing their processes and enhancing their capabilities from any location
MICROSOFT AZURE MEDIA SERVICES
VSNEXPLORER and Azure Media Services
Integration with Speech to Text and Translation
VSNEXPLORER working with Azure Media Services
Celebrity Search
• Input – Entertainment videos
• Output – Search index based on celebrities in videos
• User should be able to search for videos where
  • Celebrity X and Y were sighted together
  • Celebrity X said certain words or phrases
  • …..
Media Analytics
Cognitive Vision API
Azure Search
Face recognition
• Input – Videos of any type (entertainment, surveillance etc.)
• Output – Search index based on list of known faces
• User should be able to search for videos where
  • Person X and Y were sighted together
  • Person X said certain words or phrases
  • …..
Media Analytics
Cognitive Face API
Azure Search
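The "sighted together" query can be sketched as an interval-overlap check over per-video sighting metadata. The data layout and the `sighted_together` helper below are hypothetical and stand in for whatever a real search index would store; they are not the Face API or Azure Search.

```python
# Hypothetical face-recognition metadata: video id -> list of (person, start_s, end_s).
SIGHTINGS = {
    "video-1": [("Person X", 10, 40), ("Person Y", 30, 60)],
    "video-2": [("Person X", 0, 20), ("Person Y", 50, 70)],
    "video-3": [("Person Y", 5, 15)],
}


def sighted_together(sightings, person_a, person_b):
    """Return ids of videos where both people are on screen during overlapping times."""
    matches = []
    for video_id, entries in sightings.items():
        spans_a = [(s, e) for p, s, e in entries if p == person_a]
        spans_b = [(s, e) for p, s, e in entries if p == person_b]
        # Two intervals overlap when each one starts before the other ends.
        if any(sa < eb and sb < ea for sa, ea in spans_a for sb, eb in spans_b):
            matches.append(video_id)
    return matches


print(sighted_together(SIGHTINGS, "Person X", "Person Y"))
```

In this made-up data only the first video qualifies: both people appear in the second video, but never at the same time.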
Audio redaction
• Input – Videos of any type
• Output – Videos with keywords redacted
• Useful in the following scenarios
  • Identity protection (security videos)
  • Applying censorship (broadcasting on public channels)
  • …..
Media Indexer
Transcript
Filter
AMS Audio Overlays
Redaction timecodes
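The filter step of this flow, turning a transcript into redaction timecodes, might look like the following sketch. It assumes a word-level transcript with start and end times, which is a hypothetical format, and merges touching keyword spans so that each redaction is one continuous interval.

```python
# Hypothetical word-level transcript: (word, start_s, end_s).
TRANSCRIPT = [
    ("my", 0.0, 0.2), ("name", 0.2, 0.5), ("is", 0.5, 0.6),
    ("john", 0.6, 1.0), ("smith", 1.0, 1.5), ("from", 1.5, 1.8), ("tokyo", 1.8, 2.4),
]


def redaction_timecodes(transcript, keywords, padding=0.0):
    """Return merged (start, end) intervals covering every keyword occurrence."""
    targets = {k.lower() for k in keywords}
    spans = sorted(
        (max(0.0, start - padding), end + padding)
        for word, start, end in transcript
        if word.lower() in targets
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Extend the previous interval when spans touch or overlap.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(redaction_timecodes(TRANSCRIPT, ["John", "Smith", "Tokyo"]))
```

The resulting intervals are what an audio overlay step would mute or bleep; the `padding` parameter is an illustrative extra for widening each interval.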
• Identifies objects and categories that are within a video frame
• Uses a trained model with over 2000 tags
• Output metadata with video tags by frame
Video Tagging
• Identifies the actions taking place within a sequence of frames
• Starting with 61 categories (46 sports + 15 daily activities)
• Output metadata with action and time stamp
Action Recognition
• Protect the identities of individuals by blurring the video
• Automatically detect and redact faces
• Tag and blur identifiable information in dynamic settings, such as license plates
Redaction
Computer Vision and Media Analytics
Imagine what you can do with these services
#azurejp
https://www.facebook.com/dahatake/
https://twitter.com/dahatake/
https://github.com/dahatake/
https://daiyuhatakeyama.wordpress.com/
Tsutaya-TV - 4K / HEVC trial
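Speech-to-text
Extracts spoken text
Currently supports 8 languages
Face & Emotion detection
Counts faces and estimates gender, age, and emotion
Hyperlapse
Stabilization and time-lapse
Video summarization
Automatically creates a summary video from highlight scenes
Motion detection
Detects where motion occurred
Object Character Recognition (OCR)
Extracts text from images in the video
450 6th St. San Francisco
Face Redaction
Blurs the faces of specific people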
InterBEE 2016: What Impact Will Cloud-Centered "Digital Transformation" Have on the Media Industry?
