Automatic multi-modal metadata annotation based on trained cognitive solutions
Jakob Rosinski
Lead Architect Video & Broadcast
IBM GBS Europe
Lead Architect Video & Broadcast, IBM GBS Europe
Member IBM Global Center of Competence Telco, Media & Entertainment
Member IBM Technical Expert Council Central (TEC CR)
Product Owner IBM AREMA
Jakob Rosinski is the Lead Architect for Video & Broadcast at IBM Global Business Services Europe and a member of IBM's Global Center of Competence for Telecom, Media & Entertainment. In this role he is also the product owner of IBM AREMA, a workflow and essence management solution that is widely used by broadcasters for essence archives and workflow automation.
Over the last decade Jakob has been responsible for various projects in the media industry at HBO, France24, ORF, SRF, RTL Mediengruppe and Deutsche Bundesliga/Sportcast. He is a subject matter expert for multi-site and multi-tier essence management and workflow automation for ingest, archive, production and distribution.
He is also well recognized in topics such as cognitive content enrichment and broadcast integration.
Dipl.-Inf. (M.Sc.) Jakob Rosinski
Agenda
1. Introduction
2. Components
3. Training & Optimization
4. Analysis & Aggregation
5. Overall process & Integration
Introduction
“Rich metadata is the key to content discovery and monetization. It powers advanced video search and recommendation engines...”
FKTG Magazin 03/2017, p. 84
Analysis of successful trailers
Scene Detection / Segmentation
Deep Video Analysis
• People, object and context detection
• Classification of actors based on 24 emotions
• Classification of scenes based on 22,000 categories
Deep Audio Analysis
• Background
• Actor sentiment and tone
Analysis of scene composition
• Classification of light and color
https://www.youtube.com/watch?v=gJEzuYynaiw
Content enrichment for Bundesliga archive
Automatic content enrichment of 40+ years of soccer content
• Annotation using a portfolio of cognitive solutions (IBM, FRH, Google, MS)
• Audio: Speech-to-text / transcript
• Audio: Speaker detection
• Audio: Atmosphere (cheers, whistles, ...)
• Video: Camera angle and context detection
• Video: Face and object detection
• Domain-trained services, including a training portal
• Sharpening of results through domain knowledge, creation of timelines and identification of concepts
Link with game and player data
• Optimize content analysis and search based on game and player statistics
• Guided search
Persona-based user experience
• Personalized discovery, suggestions, design and projects
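Annotating with a portfolio of services means that several engines may label the same keyframe. The following is a minimal sketch of one way to bring such results together, assuming each service result has already been reduced to a (service, label, confidence) tuple. The service names, the labels and the merge rule (normalize the label text, keep the highest confidence, remember which services agreed) are illustrative assumptions, not the rule used in the Bundesliga project.

```python
from collections import defaultdict


def merge_labels(results):
    """Merge (service, label, confidence) tuples into one entry per label."""
    merged = defaultdict(lambda: {"confidence": 0.0, "services": []})
    for service, label, confidence in results:
        key = label.strip().lower()  # normalize spelling differences between services
        entry = merged[key]
        entry["confidence"] = max(entry["confidence"], confidence)
        if service not in entry["services"]:
            entry["services"].append(service)
    return dict(merged)


# Hypothetical results for a single keyframe (names and values are made up)
frame_results = [
    ("service_a", "Goalkeeper", 0.81),
    ("service_b", "goalkeeper ", 0.74),
    ("service_b", "Stadium", 0.92),
]
merged = merge_labels(frame_results)
```

Keeping the list of agreeing services alongside the confidence makes it possible to treat labels confirmed by more than one engine differently later on.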
Components
Magical Metadata
Target: deeply enriched content, second by second
Visual Recognition allows us to understand the contents of an image or video frame, answering the question: “What is in this image?” It returns class, class description, face detection and text recognition.
• Enhanced and automated understanding of the personalities and objects present in the frame
Speech to Text / Audiomining lets us transcribe audio into text by leveraging machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal.
• Activate decade-old material by running it through the STT API and then performing deeper analytics
Natural Language Understanding delivers several tools to distill text and dialogue into fundamental concepts of relevance, such as concepts, document-level emotions, sentiment, entities, keywords and language.
• Deeper understanding of concepts, recognized entities, keywords and relationships
Pattern Detection & Similarity Search indexes visual content based on patterns and makes a similarity search available.
• Search image and video data for untrained objects or contexts
IBM Watson Visual Recognition
Visual Recognition understands the contents of images: it tags an image with visual concepts, finds human faces, approximates age and gender, and finds similar images in a collection. You can also train the service by creating your own custom concepts. Use Visual Recognition to detect a dress type in retail, identify spoiled fruit in inventory, and more.
• Image recognition
• Text recognition
• Face and person detection
• Pattern search / collections
• Trainable
IBM Watson Visual Recognition – a multi-layered trainable architecture for image analysis
• Need to learn effective semantic classifiers using a wide diversity of audio-visual features and models
• Need to design a rich space of semantic concepts that captures multiple facets of audio-visual content
[Figure: multi-layered trainable architecture for multimedia data]
• Features: color, edges, shape, texture, regions, background, camera motion, motion, moving objects, tracks, scene dynamics, shot boundaries, energy, frequency spectrum, zero-crossings
• Models: AdaBoost, k-means, regression, Bayes net, nearest neighbor, neural net, deep belief nets, GMM, SVMs, clustering, Markov model, decision tree, expectation maximization, factor graph, ensemble classifiers, active learning
• Training data: labeled data (positive and negative examples) and unlabeled data
• Semantics: scenes, locations, settings, places, objects, vehicles, cars, animals, living, people, faces, activities, actions, behaviors, events
Microsoft Cognitive Services
• Image recognition
This feature returns information about visual content found in an image. Use tagging, descriptions and domain-specific models to identify content and label it with confidence. Apply the adult/racy settings to enable automated restriction of adult content. Identify image types and color schemes in pictures.
• Text recognition
Optical Character Recognition (OCR) detects text in an image and extracts the recognized words into a machine-readable character stream. Analyze images to detect embedded text, generate character streams and enable searching. Allow users to take photos of text instead of copying it, saving time and effort.
• Face and person detection
The Celebrity Model is an example of a domain-specific model. Microsoft's new celebrity recognition model recognizes 200K celebrities from business, politics, sports and entertainment around the world. Domain-specific models are a continuously evolving feature of the Computer Vision API.
• Emotion detection
Google Vision
Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"), detects individual objects and faces within images, and finds and reads printed words contained within images. You can build metadata on your image catalog, moderate offensive content, or enable new marketing scenarios through image sentiment analysis. Analyze images uploaded in the request or integrate with your image storage on Google Cloud Storage.
• Image recognition
• Text recognition
• Face detection
• Emotion detection
• Text analysis (not German)
OpenCV
OpenCV is released under a BSD license and hence is free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency, with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform. Adopted all around the world, OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 14 million. Usage ranges from interactive art to mine inspection, stitching maps on the web and advanced robotics.
• Image recognition
• Face and person detection
• Trainable
Clarifai Image and Video Recognition API
Predict / Classify
• Predict analyzes your images and tells you what's inside them.
• The API returns a list of concepts with corresponding probabilities of how likely it is that these concepts are contained within the image.
Search
• The Search API allows you to send images (URL or bytes) to the service and have them indexed by 'general' model concepts and their visual representations.
• Once indexed, you can search for images by concept or using reverse image search.
Train
• Clarifai provides many different models that 'see' the world differently. A model contains a group of concepts and only sees the concepts it contains.
Imagga Auto-Tagging
Imagga is an Image Recognition Platform-as-a-Service providing Image Tagging APIs for developers & businesses to build scalable, image-intensive cloud apps.
Fraunhofer IAIS Audiomining
• Segmentation
• Speaker and language detection
• Emotion detection
• Trainable
• Keyword extraction
Alternatives
• IBM Watson Speech to Text (see later)
• Microsoft Cognitive Services – Bing Speech
• Google Speech
{"segments": [
…
{
"segmentNumber": 1,
"startTime": 4480,
"duration": 3190,
"endTime": 7670,
"speaker": 1,
"gender": "female",
"transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."
},
...
{
"segmentNumber": 20,
"startTime": 238980,
"duration": 23620,
"endTime": 262600,
"speaker": 2,
"gender": "male",
"transcript": "Großbritannien raus aus der Europäischen Union für viele unvorstellbar das weiß auch der britische Premierminister Cameron und er nutzt es um die EU Partner unter Druck zu setzen entweder das Staatenbündnis ist zu Reformen bereit oder bei der geplanten Volksabstimmung über die EU Mitgliedschaft droht ein Nein heute hatte EU Ratspräsident Tosca ein Kompromisspapier vorgelegt dass die Briten besänftigen soll."
},
Fraunhofer IAIS Audiomining
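A result of this shape can be turned into a speaker-indexed transcript with a few lines of code. The sketch below assumes that startTime, duration and endTime are milliseconds (the sample values are consistent with that, but the slide does not state the unit) and uses only the fields visible in the sample. The transcript of segment 20 is shortened here, and the helper names are hypothetical.

```python
import json

# Two segments taken from the sample above; the second transcript is shortened.
SAMPLE = """
{"segments": [
  {"segmentNumber": 1, "startTime": 4480, "duration": 3190, "endTime": 7670,
   "speaker": 1, "gender": "female",
   "transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."},
  {"segmentNumber": 20, "startTime": 238980, "duration": 23620, "endTime": 262600,
   "speaker": 2, "gender": "male",
   "transcript": "Großbritannien raus aus der Europäischen Union ..."}
]}
"""


def to_timecode(ms):
    """Format a millisecond offset as HH:MM:SS.mmm (unit is an assumption)."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def segments_by_speaker(doc):
    """Group (start, end, transcript) triples by speaker id."""
    by_speaker = {}
    for seg in doc["segments"]:
        # In the sample, endTime equals startTime + duration; skip anything else.
        if seg["startTime"] + seg["duration"] != seg["endTime"]:
            continue
        by_speaker.setdefault(seg["speaker"], []).append(
            (to_timecode(seg["startTime"]), to_timecode(seg["endTime"]), seg["transcript"])
        )
    return by_speaker


index = segments_by_speaker(json.loads(SAMPLE))
```

With timecodes attached, each transcript segment can be placed on a timeline next to the results of the visual services.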
IBM Watson Speech to Text
https://www-03.ibm.com/press/us/en/pressrelease/51790.wss
IBM Watson Natural Language Understanding (NLU)
Extraction of
• Sentiment
• Emotion
• Keywords
• Entities
• Categories
• Concepts
• Semantic Roles
Visual Atoms
FIND is a high-speed, high-accuracy, image visual search solution.
Our state-of-the-art visual search engine enables the matching of images
depicting the same objects or scenes based on visual similarities, without the
need for manual annotations or metadata.
If you are a provider of image editing or management solutions, the
FIND engine will equip your product with the necessary tools for the creation
of image databases which are searchable using images as queries. Your
end users will be able to create and maintain their own image databases and
efficiently organise, manage and search their image assets.
For providers of image hosting solutions, the FIND engine will allow the
creation of image databases which users can search using visual queries.
For developers of mobile apps, such as for e-commerce, tourism
or entertainment, the FIND engine will give your app cloud-based and/or
terminal-based visual search functionality for retrieval of relevant images
and associated information.
With a streamlined API, the FIND engine is designed so that it can be
easily integrated in any third-party application or workflow.
Alternatives: IBM Watson VR Collections, Clarifai Search
Training & Optimization
...
Why is training necessary?
Visual Recognition - Training
Domain-specific model
Domain-specific model - Trainer
...
Optimization of keyframe extraction – poor extraction vs. use of adaptive extraction
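The slide does not say how adaptive extraction works. One common reading is that a keyframe is taken whenever the picture has changed enough since the last keyframe, rather than at fixed positions. The sketch below illustrates that idea under this assumption. The frame representation (a small normalized histogram per frame), the distance measure and the threshold of 0.2 are all hypothetical choices for illustration.

```python
def frame_distance(a, b):
    """Mean absolute difference between two equally sized feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def adaptive_keyframes(frames, threshold=0.2):
    """Return indices of frames that differ enough from the last keyframe."""
    if not frames:
        return []
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if frame_distance(frames[i], frames[keyframes[-1]]) > threshold:
            keyframes.append(i)
    return keyframes


# Toy "video": each frame is a hypothetical 3-bin brightness histogram.
frames = [
    [0.9, 0.1, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.2, 0.7],  # hard cut to a different scene
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
selected = adaptive_keyframes(frames)
```

In this toy example only the first frame and the frame after the cut are selected, so near-identical frames are not sent to the recognition services repeatedly.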
...
Analysis & Aggregation
Cognitive model for German Soccer League Archive
Cognitive model for German Soccer League Archive – multi-modal analyses
Metadata (technical, statistics, ticker, etc.)
Essences (audio, video, keyframes, etc.)
Analyses of different orders (audio mining, image recognition, face recognition, pattern recognition, etc.)
Timelines of different orders (atmosphere, context, perspective, persons, etc.)
Cognitive model for German Soccer League Archive – example of a first-order timeline
Uses only results from analysis
Cognitive model for German Soccer League Archive – example of a second-order timeline
Uses results from analyses as well as other timelines
Cognitive model for German Soccer League Archive – example of a third-order timeline
Uses results from analyses as well as other timelines
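The difference between the orders can be sketched in a few lines. A first-order timeline is built directly from analysis results, while a higher-order timeline also takes an existing timeline as input. The event labels, the time values and the overlap rule below are hypothetical and only illustrate the layering, not the rules used for the Bundesliga archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    start: float  # seconds
    end: float    # seconds
    label: str


def first_order(analysis_results, wanted):
    """Build a timeline directly from raw (start, end, label) analysis results."""
    return [Event(s, e, label) for s, e, label in analysis_results if label in wanted]


def second_order(timeline, analysis_results, wanted, label):
    """Derive a timeline from an existing timeline plus further analysis results.

    An event is emitted wherever a timeline event overlaps a wanted raw result.
    """
    derived = []
    for ev in timeline:
        for s, e, raw_label in analysis_results:
            if raw_label not in wanted:
                continue
            start, end = max(ev.start, s), min(ev.end, e)
            if start < end:
                derived.append(Event(start, end, label))
    return derived


# Hypothetical analysis output: audio atmosphere and camera classification
audio_results = [(10.0, 18.0, "cheers"), (40.0, 42.0, "whistle")]
video_results = [(12.0, 20.0, "goalline"), (30.0, 45.0, "normal")]

atmosphere = first_order(audio_results, {"cheers"})
goal_scenes = second_order(atmosphere, video_results, {"goalline"}, "possible goal scene")
```

A third-order timeline would follow the same pattern, taking a second-order timeline such as goal_scenes as one of its inputs.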
Cognitive Aggregator for Timelines
Camera Timeline
Speed Timeline
[Figure: camera classification per analysis event with confidence: Normal 60 %, Spidercam 80 %, SlowMo 55 %, CloseUp 83 %, Normal 67 %, Goalline 77 %, Normal 83 %, Spidercam 76 %, Normal 87 %, Spidercam 77 %]
• Combine timelines
• Combine and sharpen SlowMo
• Combine timelines and frames due to near similarity
• +20 %
Reduce and sharpen from 20 analysis events to 4
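Reducing many analysis events to a few can be pictured as merging neighbouring events that carry the same label and discarding events that remain weak. The sketch below is one possible way to do this, not the aggregator's actual logic. The class names are borrowed from the slide, while the time ranges, the gap of one second, the use of the maximum confidence and the 0.6 cut-off are assumptions.

```python
def aggregate(events, max_gap=1.0, min_confidence=0.6):
    """Merge neighbouring events with the same label and drop weak ones.

    events: list of (start, end, label, confidence), sorted by start time.
    """
    merged = []
    for start, end, label, conf in events:
        if merged and merged[-1][2] == label and start - merged[-1][1] <= max_gap:
            prev = merged[-1]
            # Extend the previous event and keep the stronger confidence.
            merged[-1] = (prev[0], max(prev[1], end), label, max(prev[3], conf))
        else:
            merged.append((start, end, label, conf))
    return [e for e in merged if e[3] >= min_confidence]


# Hypothetical camera timeline (times in seconds are made up)
events = [
    (0.0, 2.0, "Normal", 0.60),
    (2.0, 4.0, "Normal", 0.67),
    (4.0, 6.0, "Spidercam", 0.80),
    (6.5, 8.0, "Spidercam", 0.76),
    (8.0, 9.0, "SlowMo", 0.55),
    (9.0, 12.0, "CloseUp", 0.83),
]
result = aggregate(events)
```

Here six input events collapse into three: the two Normal and the two Spidercam events are merged, and the low-confidence SlowMo event is dropped unless another timeline, such as the speed timeline, confirms it.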
Overall process & Integration
IBM AREMA & Watson at Hackdays/SRF
„Die Zukunft der Mediennutzung“ (“The Future of Media Usage”)
Services involved so far:
• Watson VR - ClassifyImage
• Watson VR - DetectFaces
• Watson VR - RecognizeText
• Watson Speech to Text
• Alchemy API
Used to find meaningful content in SRG's archives
Cognitive process with Trainer, Analysis Workflow and Aggregator
[Figure: cognitive process with the components Media Ingestion, Cognitive Analysis Workflow, Cognitive Trainer, Cognitive Aggregator, Image Classifier Inbox, Image Classifier Repository, Taxonomy Database and Metadata Repository (MAM), connected by steps 1 to 6]
1. Configure the taxonomy (add classifiers, categories, etc.)
2. Show and organize classifier images
3. Move good classifiers to the repository to optimize training
4. Use the classifier repository to train services and perform custom analysis
5. Move the current frame to the inbox when the confidence is OK
6. Use the taxonomy for rule creation
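Step 5 can be read as a simple confidence gate: frames that a classifier recognizes with sufficient confidence are placed in the classifier inbox, where they can be reviewed (step 2) and promoted to the repository (step 3). The sketch below assumes that reading. The threshold value, the tuple layout and the frame identifiers are hypothetical.

```python
def route_frames(classified_frames, threshold=0.8):
    """Split classified frames into inbox candidates and the rest.

    classified_frames: iterable of (frame_id, label, confidence).
    """
    inbox, rest = [], []
    for frame_id, label, confidence in classified_frames:
        target = inbox if confidence >= threshold else rest
        target.append((frame_id, label))
    return inbox, rest


# Hypothetical output of the analysis workflow for three frames
classified = [
    ("frame_0001", "Spidercam", 0.91),
    ("frame_0002", "CloseUp", 0.62),
    ("frame_0003", "Goalline", 0.80),
]
inbox, rest = route_frames(classified)
```

Frames that land in the inbox are only candidates. In the process above, a person still organizes them and moves the good ones to the repository before they are used for training.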
Future?
Upcoming: Watson for Media, announced in April 2017 at [logo]
First use cases available at IBC in September 2017

More Related Content

Similar to Automatic multi-modal metadata annotation based on trained cognitive solutions - Rosinski, Jakob

How can you get started with machine learning
How can you get started with machine learning How can you get started with machine learning
How can you get started with machine learning
Omar Badawi
 
Using Cognitive Services
Using Cognitive ServicesUsing Cognitive Services
Using Cognitive Services
Eng Teong Cheah
 
Machine learning, WTF!?
Machine learning, WTF!? Machine learning, WTF!?
Machine learning, WTF!?
Alê Borba
 
Intelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human IngenuityIntelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human Ingenuity
David J Rosenthal
 
Using Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a CylonUsing Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a Cylon
Todd Whitehead
 
Microsoft Azure beyond IaaS
Microsoft Azure  beyond IaaSMicrosoft Azure  beyond IaaS
Microsoft Azure beyond IaaS
Bipeen Sinha
 
Microsoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive ServicesMicrosoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive Services
AI Leadership Institute
 
Design Day Workshop
Design Day WorkshopDesign Day Workshop
Design Day Workshop
Prottay Karim
 
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Codemotion
 
Azure beyond IaaS
Azure  beyond IaaSAzure  beyond IaaS
Azure beyond IaaS
Bipeen Sinha
 
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis JugoO365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
NCCOMMS
 
Microsoft Cognitive Services at a Glance
Microsoft Cognitive Services at a GlanceMicrosoft Cognitive Services at a Glance
Microsoft Cognitive Services at a Glance
Marvin Heng
 
Prior AI consulting use cases
Prior AI consulting use casesPrior AI consulting use cases
Prior AI consulting use cases
Harendra Singh
 
AI NOTES.docx
AI NOTES.docxAI NOTES.docx
AI NOTES.docx
gfgcmagadi
 
20160813 102-59-kim youngwook
20160813 102-59-kim youngwook20160813 102-59-kim youngwook
20160813 102-59-kim youngwook
itproman35
 
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш....NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
NETFest
 
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptxunleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
Usama Wahab Khan Cloud, Data and AI
 
Rita Arrigo, Microsoft
Rita Arrigo, Microsoft Rita Arrigo, Microsoft
Rita Arrigo, Microsoft
Hilary Ip
 
Inteligencia artificial para todos
Inteligencia artificial para todosInteligencia artificial para todos
Inteligencia artificial para todos
Juan Nieto García
 

Similar to Automatic multi-modal metadata annotation based on trained cognitive solutions - Rosinski, Jakob (20)

How can you get started with machine learning
How can you get started with machine learning How can you get started with machine learning
How can you get started with machine learning
 
Using Cognitive Services
Using Cognitive ServicesUsing Cognitive Services
Using Cognitive Services
 
Machine learning, WTF!?
Machine learning, WTF!? Machine learning, WTF!?
Machine learning, WTF!?
 
Intelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human IngenuityIntelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human Ingenuity
 
Using Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a CylonUsing Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a Cylon
 
Microsoft Azure beyond IaaS
Microsoft Azure  beyond IaaSMicrosoft Azure  beyond IaaS
Microsoft Azure beyond IaaS
 
Microsoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive ServicesMicrosoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive Services
 
Design Day Workshop
Design Day WorkshopDesign Day Workshop
Design Day Workshop
 
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
 
Guru_poster
Guru_posterGuru_poster
Guru_poster
 
Azure beyond IaaS
Azure  beyond IaaSAzure  beyond IaaS
Azure beyond IaaS
 
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis JugoO365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
 
Microsoft Cognitive Services at a Glance
Microsoft Cognitive Services at a GlanceMicrosoft Cognitive Services at a Glance
Microsoft Cognitive Services at a Glance
 
Prior AI consulting use cases
Prior AI consulting use casesPrior AI consulting use cases
Prior AI consulting use cases
 
AI NOTES.docx
AI NOTES.docxAI NOTES.docx
AI NOTES.docx
 
20160813 102-59-kim youngwook
20160813 102-59-kim youngwook20160813 102-59-kim youngwook
20160813 102-59-kim youngwook
 
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш....NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
 
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptxunleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
 
Rita Arrigo, Microsoft
Rita Arrigo, Microsoft Rita Arrigo, Microsoft
Rita Arrigo, Microsoft
 
Inteligencia artificial para todos
Inteligencia artificial para todosInteligencia artificial para todos
Inteligencia artificial para todos
 

More from FIAT/IFTA

2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey
FIAT/IFTA
 
20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List
FIAT/IFTA
 
WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020
FIAT/IFTA
 
OOMEN MEZARIS ReTV
OOMEN MEZARIS ReTVOOMEN MEZARIS ReTV
OOMEN MEZARIS ReTV
FIAT/IFTA
 
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
FIAT/IFTA
 
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉCULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
FIAT/IFTA
 
HULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiativesHULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiatives
FIAT/IFTA
 
WILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC ScotlandWILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC Scotland
FIAT/IFTA
 
GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!
FIAT/IFTA
 
LORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal depositLORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal deposit
FIAT/IFTA
 
BIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formatsBIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formats
FIAT/IFTA
 
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
FIAT/IFTA
 
BERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memoriesBERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memories
FIAT/IFTA
 
AOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archiveAOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archive
FIAT/IFTA
 
HULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open upHULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open up
FIAT/IFTA
 
PERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archivesPERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archives
FIAT/IFTA
 
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AIAICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
FIAT/IFTA
 
VINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methodsVINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methods
FIAT/IFTA
 
LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?
FIAT/IFTA
 
AZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archiveAZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archive
FIAT/IFTA
 

More from FIAT/IFTA (20)

2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey
 
20211021 FIAT/IFTA Most Wanted List
Automatic multi-modal metadata annotation based on trained cognitive solutions - Rosinski, Jakob

  • 1. Automatic multi-modal metadata annotation based on trained cognitive solutions Jakob Rosinski Lead Architect Video & Broadcast IBM GBS Europe
  • 2. Dipl.-Inf. (M.Sc.) Jakob Rosinski: Lead Architect Video & Broadcast, IBM GBS Europe; Member, IBM Global Center of Competence Telco, Media & Entertainment; Member, IBM Technical Expert Council Central (TEC CR); Product Owner, IBM AREMA. Jakob Rosinski is the Lead Architect for Video & Broadcast at IBM Global Business Services Europe and a member of IBM's Global Center of Competence for Telecom, Media & Entertainment. In this role he is also the product owner of IBM AREMA, a workflow and essence management solution that is widely used by broadcasters for essence archives and workflow automation. Over the last decade Jakob was responsible for various projects in the media industry at HBO, France24, ORF, SRF, RTL Mediengruppe and Deutsche Bundesliga/Sportcast. He is a subject matter expert for multi-site and multi-tier essence management and for workflow automation across ingest, archive, production and distribution. He is also well recognized in topics such as cognitive content enrichment and broadcast integration.
  • 3. Agenda: 1. Introduction 2. Components 3. Training & Optimization 4. Analysis & Aggregation 5. Overall process & Integration
  • 4. Introduction: “Rich metadata is the key to content discovery and monetization. It powers advanced video search and recommendation engines...” (FKTG Magazin 03/2017, p. 84)
  • 6. Analysis of successful trailers (https://www.youtube.com/watch?v=gJEzuYynaiw)
      Scene detection / segmentation
      Deep video analysis
        - People, object and context detection
        - Classification of actors based on 24 emotions
        - Classification of scenes based on 22,000 categories
      Deep audio analysis
        - Background
        - Actor sentiment and tone
      Analysis of scene composition
        - Classification of light and color
  • 8. Content enrichment for Bundesliga archive
      Automatic content enrichment of 40+ years of soccer content
        - Annotation using a portfolio of cognitive solutions (IBM, FRH, Google, MS)
        - Audio: speech-to-text / transcript
        - Audio: speaker detection
        - Audio: atmosphere (cheers, whistles, ...)
        - Video: angle/camera and context detection
        - Video: face and object detection
        - Domain-trained services, including a training portal
        - Sharpening of results through domain knowledge, creation of timelines, and identification of concepts
      Link with game and player data
        - Optimize content analysis and search based on game and player statistics
      Guided search, persona-based user experience
        - Personalized discovery, suggestions, design & projects
  • 10. Magical Metadata
      Target: deeply enriched content, second by second
        - Visual recognition allows us to understand the contents of an image or video frame, answering the question: “What is in this image?” It returns class, class description, face detection, and text recognition. Result: enhanced and automated understanding of the personalities and objects present in the frame.
        - Speech to text / Audiomining lets us transcribe audio into text by leveraging machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal. Result: decade-old material is activated by running it through the STT API and then performing deeper analytics.
        - Natural Language Understanding delivers several tools to distill text and dialogue into fundamental concepts of relevance, such as concepts, document-level emotions, sentiment, entities, keywords, and language. Result: deeper understanding of concepts, recognized entities, keywords, and relationships.
        - Pattern Detection & Similarity Search indexes visual content based on patterns and makes a similarity search available. Result: image and video data can be searched for objects or contexts that have not been trained.
  • 12. IBM Watson Visual Recognition
      Visual Recognition understands the contents of images: it tags an image with visual concepts, finds human faces, approximates age and gender, and finds similar images in a collection. You can also train the service by creating your own custom concepts. Use Visual Recognition to detect a dress type in retail, identify spoiled fruit in inventory, and more.
        - Image recognition
        - Text recognition
        - Face and person detection
        - Pattern search / collection
        - Trainable
  • 13. IBM Watson Visual Recognition
  • 14. IBM Watson Visual Recognition – a multi-layered trainable architecture for image analysis
        - Need to learn effective semantic classifiers using a wide diversity of audio-visual features and models
        - Need to design a rich space of semantic concepts that captures multiple facets of audio-visual content
      [Diagram labels]
        - Multimedia data: shot boundaries, regions, tracks, moving objects, scene dynamics
        - Features: color, edges, shape, texture, motion, camera motion, background, energy, zero-crossings, frequencies, spectrum
        - Labeled data (positive examples, negative examples) and unlabeled data
        - Models: AdaBoost, k-means, regression, Bayes net, nearest neighbor, neural net, deep belief nets, GMM, SVMs, clustering, Markov model, decision tree, expectation maximization, factor graph, ensemble classifiers, active learning
        - Semantics: scenes, locations, settings, places, objects, vehicles, cars, animals, living, people, faces, activities, actions, behaviors, events
  • 15. Microsoft Cognitive Services
        - Image recognition: This feature returns information about visual content found in an image. Use tagging, descriptions and domain-specific models to identify content and label it with confidence. Apply the adult/racy settings to enable automated restriction of adult content. Identify image types and color schemes in pictures.
        - Text recognition: Optical Character Recognition (OCR) detects text in an image and extracts the recognized words into a machine-readable character stream. Analyze images to detect embedded text, generate character streams and enable searching. Allow users to take photos of text instead of copying to save time and effort.
        - Face and person detection: The Celebrity Model is an example of domain-specific models. Our new celebrity recognition model recognizes 200K celebrities from business, politics, sports and entertainment around the world. Domain-specific models are a continuously evolving feature within the Computer Vision API.
        - Emotion detection
  • 16. Google Vision
      Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"), detects individual objects and faces within images, and finds and reads printed words contained within images. You can build metadata on your image catalog, moderate offensive content, or enable new marketing scenarios through image sentiment analysis. Analyze images uploaded in the request or integrate with your image storage on Google Cloud Storage.
        - Image recognition
        - Text recognition
        - Face detection
        - Emotion detection
        - Text analysis (not German)
  • 17. OpenCV
      OpenCV is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform. Adopted all around the world, OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 14 million. Usage ranges from interactive art to mines inspection, stitching maps on the web, and advanced robotics.
        - Image recognition
        - Face and person detection
        - Trainable
  • 18. Clarifai Image and Video Recognition API
      Predict / Classify
        - Predict analyzes your images and tells you what's inside of them.
        - The API returns a list of concepts with corresponding probabilities of how likely it is that these concepts are contained within the image.
      Search
        - The Search API allows you to send images (URL or bytes) to the service and have them indexed by 'general' model concepts and their visual representations.
        - Once indexed, you can search for images by concept or using reverse image search.
      Train
        - Clarifai provides many different models that 'see' the world differently. A model contains a group of concepts. A model will only see the concepts it contains.
  • 19. Imagga Auto-Tagging
      Imagga is an Image Recognition Platform-as-a-Service providing Image Tagging APIs for developers & businesses to build scalable, image-intensive cloud apps.
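The services on slides 12 to 19 each return concepts with some probability or confidence, and slide 8 speaks of a portfolio of such services. A minimal sketch of how their tag lists could be merged is given below. It does not call any vendor API: the input is assumed to be already-parsed (concept, probability) pairs, and the service names, the `merge_tags` helper and the threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_tags(
    results: Dict[str, List[Tuple[str, float]]],
    min_probability: float = 0.5,
) -> List[Tuple[str, float, List[str]]]:
    """Merge per-service concept lists into one ranked list.

    For each concept (case-insensitive) keep the highest probability seen
    and remember which services reported it. Concepts whose best
    probability is below min_probability are dropped.
    """
    best: Dict[str, float] = {}
    sources: Dict[str, List[str]] = {}
    for service, concepts in results.items():
        for name, probability in concepts:
            key = name.strip().lower()
            best[key] = max(best.get(key, 0.0), probability)
            sources.setdefault(key, []).append(service)
    merged = [
        (key, probability, sorted(sources[key]))
        for key, probability in best.items()
        if probability >= min_probability
    ]
    # Highest probability first, ties broken alphabetically.
    return sorted(merged, key=lambda item: (-item[1], item[0]))


# Hypothetical, already-parsed results from two unnamed services.
results = {
    "service_a": [("Sailboat", 0.92), ("lion", 0.10)],
    "service_b": [("sailboat", 0.85), ("water", 0.70)],
}
merged = merge_tags(results)
print(merged)
```

Taking the maximum is only one possible policy; averaging or weighting by service would be equally plausible and is not something the slides specify.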
  • 21. Fraunhofer IAIS Audiomining
        - Segmentation
        - Speaker and language detection
        - Emotion detection
        - Trainable
        - Keyword extraction
      Alternatives
        - IBM Watson Speech2Text (see later)
        - Microsoft Cognitive Services – Bing Speech
        - Google Speech
  • 22. Fraunhofer IAIS Audiomining
      {"segments": [
        …
        {
          "segmentNumber": 1,
          "startTime": 4480,
          "duration": 3190,
          "endTime": 7670,
          "speaker": 1,
          "gender": "female",
          "transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."
        },
        ...
        {
          "segmentNumber": 20,
          "startTime": 238980,
          "duration": 23620,
          "endTime": 262600,
          "speaker": 2,
          "gender": "male",
          "transcript": "Großbritannien raus aus der Europäischen Union für viele unvorstellbar das weiß auch der britische Premierminister Cameron und er nutzt es um die EU Partner unter Druck zu setzen entweder das Staatenbündnis ist zu Reformen bereit oder bei der geplanten Volksabstimmung über die EU Mitgliedschaft droht ein Nein heute hatte EU Ratspräsident Tosca ein Kompromisspapier vorgelegt dass die Briten besänftigen soll."
        },
  • 23. IBM Watson Speech to Text
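A segment list like the one on slide 22 can be turned into simple, searchable metadata. The sketch below uses only the field names visible on the slide; the time unit is assumed to be milliseconds (the slide does not state it, but `startTime + duration = endTime` holds for both segments), the second transcript is shortened, and the helper names are hypothetical.

```python
import json
from collections import defaultdict

# Two segments copied from the slide; the second transcript is shortened
# for the example.
sample = json.loads("""
{"segments": [
  {"segmentNumber": 1, "startTime": 4480, "duration": 3190, "endTime": 7670,
   "speaker": 1, "gender": "female",
   "transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."},
  {"segmentNumber": 20, "startTime": 238980, "duration": 23620, "endTime": 262600,
   "speaker": 2, "gender": "male",
   "transcript": "Großbritannien raus aus der Europäischen Union für viele unvorstellbar"}
]}
""")


def speaking_time(segments):
    """Sum the duration of all segments per speaker id (unit as in the input)."""
    totals = defaultdict(int)
    for segment in segments:
        totals[segment["speaker"]] += segment["duration"]
    return dict(totals)


def find_segments(segments, keyword):
    """Return (startTime, endTime) of every segment whose transcript contains keyword."""
    needle = keyword.lower()
    return [
        (segment["startTime"], segment["endTime"])
        for segment in segments
        if needle in segment["transcript"].lower()
    ]


segments = sample["segments"]
print(speaking_time(segments))
print(find_segments(segments, "Tagesschau"))
```

The (start, end) pairs returned by `find_segments` are the kind of time-coded hits that make a transcript usable for jumping into the video at the right position.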
  • 26. IBM Watson Natural Language Understanding (NLU)
      Extraction of
        - Sentiment
        - Emotion
        - Keywords
        - Entities
        - Categories
        - Concepts
        - Semantic roles
  • 28. Visual Atoms FIND
      FIND is a high-speed, high-accuracy visual search solution for images. Our state-of-the-art visual search engine enables the matching of images depicting the same objects or scenes based on visual similarities, without the need for manual annotations or metadata. If you are a provider of image editing or management solutions, the FIND engine will equip your product with the necessary tools for the creation of image databases which are searchable using images as queries. Your end users will be able to create and maintain their own image databases and efficiently organise, manage and search their image assets. For providers of image hosting solutions, the FIND engine will allow the creation of image databases which users can search using visual queries. For developers of mobile apps, such as for e-commerce, tourism or entertainment, the FIND engine will give your app cloud-based and/or terminal-based visual search functionality for retrieval of relevant images and associated information. With a streamlined API, the FIND engine is designed so that it can be easily integrated in any third-party application or workflow.
      Alternatives: IBM Watson VR Collections, Clarifai Search
  • 30. ... Why is training necessary?
  • 31. Visual Recognition - Training
  • 33. Domain-specific model - Trainer
  • 34. ... Optimization of keyframe extraction: avoid poor extraction, use adaptive extraction ...
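The slide does not say how the adaptive extraction works. One common reading, assumed here, is that a keyframe is taken whenever the picture has changed enough since the last keyframe, instead of at a fixed interval. The sketch below works on plain feature vectors (for example normalised histogram bins) rather than real video frames; the function names and the threshold are hypothetical.

```python
from typing import List, Sequence


def frame_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def adaptive_keyframes(frames: List[Sequence[float]], threshold: float = 0.3) -> List[int]:
    """Return indices of frames that differ enough from the last chosen keyframe.

    The first frame is always a keyframe. Each later frame is compared with
    the most recent keyframe, so slow drifts eventually trigger a new one.
    """
    if not frames:
        return []
    keyframes = [0]
    for index in range(1, len(frames)):
        if frame_difference(frames[keyframes[-1]], frames[index]) > threshold:
            keyframes.append(index)
    return keyframes


# Toy input: two near-identical frames, a cut, two near-identical frames, a cut.
frames = [
    [0.10, 0.10, 0.10],
    [0.10, 0.12, 0.10],
    [0.90, 0.80, 0.90],
    [0.90, 0.82, 0.90],
    [0.20, 0.20, 0.20],
]
keyframes = adaptive_keyframes(frames)
print(keyframes)
```

Compared with taking every n-th frame, this avoids sending many near-identical frames of a static shot to the recognition services while still catching short shots.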
  • 36. Cognitive model for German Soccer League Archive
        - Metadata (technical, statistics, ticker, etc.)
        - Essences (audio, video, keyframes, etc.)
        - Analyses of different orders (audio mining, image recognition, face recognition, pattern recognition, etc.)
        - Timelines of different orders (atmosphere, context, perspective, persons, etc.)
  • 37. Cognitive model for German Soccer League Archive – multi-modal analyses
  • 38. Cognitive model for German Soccer League Archive – example of a first-order timeline: uses only results from analysis
  • 39. Cognitive model for German Soccer League Archive – example of a second-order timeline: uses results from analyses as well as other timelines
  • 40. Cognitive model for German Soccer League Archive – example of a third-order timeline: uses results from analyses as well as other timelines
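The difference between the timeline orders can be illustrated with a small sketch. It assumes a first-order timeline is a filtered, time-sorted list of events from a single analysis, and builds a second-order timeline from the overlap of two first-order timelines. The `Event` type, the labels, the confidence values and the overlap rule are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    start: float  # seconds from start of the recording
    end: float
    label: str
    confidence: float


def first_order(results: List[Event], min_confidence: float) -> List[Event]:
    """Timeline built directly from one analysis: confident events, sorted by time."""
    kept = [event for event in results if event.confidence >= min_confidence]
    return sorted(kept, key=lambda event: event.start)


def second_order(a: List[Event], b: List[Event], label: str) -> List[Event]:
    """Timeline derived from two other timelines: one event per overlapping pair."""
    derived = []
    for x in a:
        for y in b:
            start, end = max(x.start, y.start), min(x.end, y.end)
            if start < end:
                derived.append(Event(start, end, label, min(x.confidence, y.confidence)))
    return sorted(derived, key=lambda event: event.start)


# Hypothetical analysis results for audio atmosphere and camera perspective.
audio_results = [Event(10, 18, "cheers", 0.9), Event(40, 42, "whistle", 0.4)]
video_results = [Event(12, 20, "SlowMo", 0.8), Event(50, 55, "CloseUp", 0.83)]

atmosphere = first_order(audio_results, min_confidence=0.5)
camera = first_order(video_results, min_confidence=0.5)
highlights = second_order(atmosphere, camera, "possible highlight")
print(highlights)
```

A third-order timeline would follow the same pattern, taking second-order timelines (and further analysis results or game data) as its inputs.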
  • 42. Cognitive Aggregator for Timelines
        - Inputs: camera timeline, speed timeline
        - Example confidences from the diagram: Normal 60 %, Spidercam 80 %, SlowMo 55 %, CloseUp 83 %, Normal 67 %, Goalline 77 %, Normal 83 %, Spidercam 76 %, Normal 87 %, Spidercam 77 %
        - Reduce and sharpen from 20 analysis events to 4
        - Steps shown: combine timelines; combine and sharpen SlowMo; combine timelines and frames due to near similarity (+20 %)
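The reduce-and-sharpen idea can be sketched as two steps: keep only the best label per analysed frame, then merge consecutive frames that share a label into one event. The confidence values below reuse numbers from the slide, but their pairing per frame and their order are assumptions, as are the function names; the "+20 %" similarity bonus is not modelled.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def sharpen(frame_scores: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Keep only the highest-scoring label for each analysed frame."""
    return [max(scores.items(), key=lambda item: item[1]) for scores in frame_scores]


def combine(winners: List[Tuple[str, float]]) -> List[Tuple[str, int, int, float]]:
    """Merge runs of consecutive frames with the same label into single events.

    Each event is (label, first_frame, last_frame, mean_confidence).
    """
    events = []
    index = 0
    for label, run in groupby(winners, key=lambda winner: winner[0]):
        confidences = [confidence for _, confidence in run]
        mean = round(sum(confidences) / len(confidences), 2)
        events.append((label, index, index + len(confidences) - 1, mean))
        index += len(confidences)
    return events


# Competing labels per frame (pairing and order assumed for illustration).
frames = [
    {"Normal": 0.60, "Spidercam": 0.80},
    {"Normal": 0.83, "Spidercam": 0.76},
    {"Normal": 0.87, "Spidercam": 0.77},
    {"SlowMo": 0.55, "CloseUp": 0.83},
    {"Normal": 0.67, "Goalline": 0.77},
]
events = combine(sharpen(frames))
print(events)
```

In this toy run ten label candidates on five frames collapse into four events, which mirrors the reduction described on the slide at a much smaller scale.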
  • 43. Overall process & Integration
  • 44. IBM AREMA & Watson at Hackdays/SRF „Die Zukunft der Mediennutzung“ ("The Future of Media Use")
  • 45. IBM AREMA & Watson at Hackdays/SRF „Die Zukunft der Mediennutzung“
      Used to find meaningful content in SRG's archives. Services involved so far:
        - Watson VR - ClassifyImage
        - Watson VR - DetectFaces
        - Watson VR - RecognizeText
        - Watson Speech2Text
        - Alchemy API
  • 46. IBM AREMA & Watson at Hackdays/SRF „Die Zukunft der Mediennutzung“
  • 47. Cognitive Process with Trainer, Analysis Workflow and Aggregator
      Components: Cognitive Analysis Workflow, Cognitive Trainer, Cognitive Aggregator, Image Classifier Inbox, Taxonomy Database, Image Classifier Repository, Media Ingestion, Metadata Repository (MAM)
        1. Configure the taxonomy (add classifiers, categories, etc.)
        2. Show and organize classifier images
        3. Move good classifiers to the repository to optimize training
        4. Use the classifier repository to train services and perform custom analysis
        5. Move the current frame to the inbox when its confidence is sufficient
        6. Use the taxonomy for rule creation
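Step 5 of the process above can be sketched as a small routing rule: a classified frame is filed in the classifier inbox only if its category exists in the taxonomy and the confidence is high enough. The `ClassifierInbox` class, the `route_frame` function, the frame ids and the 0.8 threshold are hypothetical stand-ins, not part of IBM AREMA or Watson.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ClassifierInbox:
    """In-memory stand-in for the 'Image Classifier Inbox' on the slide."""

    items: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, category: str, frame_id: str) -> None:
        self.items.setdefault(category, []).append(frame_id)


def route_frame(
    inbox: ClassifierInbox,
    taxonomy: Set[str],
    frame_id: str,
    category: str,
    confidence: float,
    threshold: float = 0.8,
) -> bool:
    """File the frame as a training candidate if category and confidence qualify."""
    if category in taxonomy and confidence >= threshold:
        inbox.add(category, frame_id)
        return True
    return False


taxonomy = {"Spidercam", "CloseUp", "Goalline"}
inbox = ClassifierInbox()
accepted = route_frame(inbox, taxonomy, "frame-0001", "CloseUp", 0.83)
rejected = route_frame(inbox, taxonomy, "frame-0002", "SlowMo", 0.55)
print(inbox.items)
```

Frames collected this way would still be reviewed in steps 2 and 3 before they reach the classifier repository, so the threshold only needs to pre-filter candidates.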
  • 48. Future? Upcoming: Watson For Media, announced in April 2017 at First use cases available at IBC in September 2017