Jakob Rosinski presented on automatic multi-modal metadata annotation based on trained cognitive solutions. He discussed using various cognitive services like IBM Watson, Microsoft Cognitive Services, Google Vision, and OpenCV to analyze video and audio content. This includes scene detection, people and object detection, emotion detection, speech to text, and more. The extracted metadata can then be used to enrich content and power advanced search and discovery tools.
2. Lead Architect Video & Broadcast, IBM GBS Europe
Member, IBM Global Center of Competence Telco, Media & Entertainment
Member, IBM Technical Expert Council Central (TEC CR)
Product Owner, IBM AREMA
Jakob Rosinski is the Lead Architect for Video & Broadcast at IBM Global Business Services Europe and a member of IBM's Global Center of Competence for Telecom, Media & Entertainment. In this role he is also the product owner of IBM AREMA, a workflow and essence management solution widely used at broadcasters for essence archives and workflow automation.
Over the last decade Jakob was responsible for various projects in the media industry at HBO, France24, ORF, SRF, RTL Mediengruppe and Deutsche Bundesliga/Sportcast. He is a subject matter expert for multi-site & multi-tier essence management and workflow automation for ingest, archive, production & distribution, and is well recognized in topics such as cognitive content enrichment and broadcast integration.
Dipl.-Inf. (M.Sc.) Jakob Rosinski
4. Introduction
"Rich metadata is the key to content discovery and monetization. It powers advanced video search and recommendation engines..."
FKTG Magazin 03/2017, p. 84
6. Scene Detection / Segmentation
Deep video analysis: people, object and context detection; classification of actors based on 24 emotions; classification of scenes based on 22,000 categories
Deep audio analysis: background; actor sentiment and tone
Analysis of scene composition: classification of light and color
Analysis of successful trailers
https://www.youtube.com/watch?v=gJEzuYynaiw
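An illustrative sketch (not from the talk): a classic way to implement the scene/shot segmentation step is to compare colour histograms of consecutive frames and declare a cut when the difference spikes. Frames are represented here simply as normalised histograms; the threshold value is an assumption.

```python
def histogram_diff(h1, h2):
    """L1 distance between two normalised frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(histograms, threshold=0.5):
    """Return the frame indices where a hard cut is likely."""
    cuts = []
    for i in range(1, len(histograms)):
        if histogram_diff(histograms[i - 1], histograms[i]) > threshold:
            cuts.append(i)
    return cuts

# Two frames of a dark scene, then an abrupt change to a bright scene.
frames = [[0.9, 0.1, 0.0], [0.85, 0.15, 0.0], [0.0, 0.1, 0.9]]
print(detect_cuts(frames))  # [2]
```

Production systems add gradual-transition (fade/dissolve) handling on top of this, but the hard-cut case already covers most segmentation work.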
8. Automatic content enrichment of 40+ years of soccer content
Annotation using a portfolio of cognitive solutions (IBM, FRH, Google, MS):
Audio: speech-to-text / transcript
Audio: speaker detection
Audio: atmosphere (cheers, whistles, ...)
Video: angle/camera & context detection
Video: face & object detection
Domain-trained services, including a training portal
Sharpening of results through domain knowledge, creation of timelines, identification of concepts
Linking with game and player data
Optimized content analysis and search based on game and player statistics
Guided search
Persona-based user experience: personalized discovery, suggestions, design & projects
Content enrichment for the Bundesliga archive
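The "linking with game and player data" step can be pictured as a timecode join: time-coded annotations from the cognitive services are matched against official match events so that, say, a detected celebration can be attributed to a player. This is a hypothetical sketch; field names and the tolerance are assumptions, not the AREMA data model.

```python
def link_annotations(annotations, match_events, tolerance_s=10):
    """Attach the closest official match event (within tolerance) to each annotation."""
    linked = []
    for ann in annotations:
        best = None
        for ev in match_events:
            dist = abs(ev["time_s"] - ann["time_s"])
            if dist <= tolerance_s and (best is None or dist < abs(best["time_s"] - ann["time_s"])):
                best = ev
        linked.append({**ann, "event": best})
    return linked

annotations = [{"time_s": 1502, "label": "cheering"}, {"time_s": 310, "label": "crowd"}]
events = [{"time_s": 1500, "type": "goal", "player": "Mueller"}]
result = link_annotations(annotations, events)
print(result[0]["event"]["player"])  # Mueller
print(result[1]["event"])            # None
```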
10. Magical Metadata
Target: deeply enriched content, second-to-second.
Visual recognition allows us to understand the contents of an image or video frame, answering the question "What is in this image?". It returns class, class description, face detection, and text recognition, enabling enhanced and automated understanding of the personalities and objects present in the frame.
Speech to text / audio mining lets us transcribe audio into text by leveraging machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal. It activates decades-old material by running it through the STT API and then performing deeper analytics.
Natural Language Understanding delivers several tools to distill text and dialogue into fundamental concepts of relevance: concepts, document-level emotions, sentiment, entities, keywords, language, etc. It provides a deeper understanding of concepts, recognized entities, keywords, and relationships.
Pattern detection & similarity search indexes visual content based on patterns and makes a similarity search available, enabling search over image and video data for objects or contexts that have not been trained.
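A minimal sketch (assumed structure, not the IBM implementation) of how the per-service results above can be folded into one second-by-second metadata index, the "deeply enriched content, second-to-second" target:

```python
from collections import defaultdict

def build_timeline(*service_outputs):
    """Each service output is a list of (second, facet, value) triples."""
    timeline = defaultdict(dict)
    for output in service_outputs:
        for second, facet, value in output:
            timeline[second].setdefault(facet, []).append(value)
    return dict(timeline)

# Made-up results from three services for second 12 of a video.
visual = [(12, "faces", "person_a"), (12, "objects", "ball")]
speech = [(12, "transcript", "what a goal")]
nlu    = [(12, "keywords", "goal")]
print(build_timeline(visual, speech, nlu)[12])
# {'faces': ['person_a'], 'objects': ['ball'], 'transcript': ['what a goal'], 'keywords': ['goal']}
```

Search and recommendation engines can then query this per-second index instead of the raw service responses.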
12. IBM Watson Visual Recognition
Visual Recognition understands the contents of images: visual concepts tag the image, find human faces, approximate age and gender, and find similar images in a collection. You can also train the service by creating your own custom concepts. Use Visual Recognition to detect a dress type in retail, identify spoiled fruit in inventory, and more.
Image recognition
Text recognition
Face & person detection
Pattern search / collections
Trainable
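A hedged sketch of consuming such a service: the dictionary below mirrors the typical shape of a Watson Visual Recognition v3 classify response (images → classifiers → classes with scores), and the helper flattens it into (class, score) pairs above a confidence threshold. The exact response fields should be checked against the current API reference.

```python
def top_classes(response, min_score=0.5):
    """Flatten a classify-style response into confident (class, score) pairs."""
    results = []
    for image in response.get("images", []):
        for classifier in image.get("classifiers", []):
            for cls in classifier.get("classes", []):
                if cls["score"] >= min_score:
                    results.append((cls["class"], cls["score"]))
    return sorted(results, key=lambda c: -c[1])

sample = {"images": [{"classifiers": [{"classes": [
    {"class": "soccer", "score": 0.94},
    {"class": "stadium", "score": 0.81},
    {"class": "night", "score": 0.32}]}]}]}
print(top_classes(sample))  # [('soccer', 0.94), ('stadium', 0.81)]
```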
14. IBM Watson Visual Recognition – a multi-layered, trainable architecture for image analysis
• Need to learn effective semantic classifiers using a wide diversity of audio-visual features and models
• Need to design a rich space of semantic concepts that captures multiple facets of audio-visual content
[Architecture diagram: multimedia data is segmented at shot boundaries and described by low-level features (color, texture, shape, edges, motion, camera motion, background, frequency spectrum, energy, zero-crossings). Models such as AdaBoost, k-means, regression, Bayes nets, nearest neighbor, neural nets, deep belief nets, GMM clustering, Markov models, decision trees, expectation maximization, factor graphs, SVMs and ensemble classifiers are trained on labeled positive/negative examples and unlabeled data (active learning, regions, scene dynamics, tracks, moving objects), yielding semantics: scenes, locations, settings, objects, activities, actions, behaviors, people, vehicles, animals, cars, faces, places, events.]
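A toy illustration of the "learn semantic classifiers from labelled positive/negative examples" idea above: a nearest-centroid classifier over low-level feature vectors. All names and data are made up; real systems use the far richer model zoo listed on the slide.

```python
def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train_concept(positives, negatives):
    """A 'model' is just the centroid of each class."""
    return centroid(positives), centroid(negatives)

def predict(model, features):
    """True if the features are closer to the positive centroid."""
    pos_c, neg_c = model
    d_pos = sum((a - b) ** 2 for a, b in zip(features, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(features, neg_c))
    return d_pos < d_neg

# Hypothetical concept "stadium": green-dominated colour histograms as positives.
model = train_concept(positives=[[0.9, 0.1], [0.8, 0.2]],
                      negatives=[[0.1, 0.9], [0.2, 0.8]])
print(predict(model, [0.85, 0.15]))  # True
```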
15. Microsoft Cognitive Services
Image recognition: returns information about visual content found in an image. Use tagging, descriptions and domain-specific models to identify content and label it with confidence. Apply the adult/racy settings to enable automated restriction of adult content. Identify image types and color schemes in pictures.
Text recognition: Optical Character Recognition (OCR) detects text in an image and extracts the recognized words into a machine-readable character stream. Analyze images to detect embedded text, generate character streams and enable searching. Allow users to take photos of text instead of copying it, to save time and effort.
Face & person detection: the Celebrity Model is an example of the domain-specific models. The celebrity recognition model recognizes 200K celebrities from business, politics, sports and entertainment around the world. Domain-specific models are a continuously evolving feature within the Computer Vision API.
Emotion detection
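A sketch of using the analyze output described above: the response shape (tags with confidences plus an adult score) is an assumed simplification of the Computer Vision analyze result, and the helper keeps confident tags while applying the adult/racy restriction the slide mentions.

```python
def safe_tags(analysis, min_confidence=0.7, max_adult_score=0.5):
    """Return confident tags, or nothing if the adult score is too high."""
    if analysis.get("adult", {}).get("adultScore", 0.0) > max_adult_score:
        return []  # automated restriction of adult content
    return [t["name"] for t in analysis.get("tags", [])
            if t["confidence"] >= min_confidence]

sample = {"adult": {"adultScore": 0.01},
          "tags": [{"name": "grass", "confidence": 0.99},
                   {"name": "outdoor", "confidence": 0.97},
                   {"name": "blurry", "confidence": 0.41}]}
print(safe_tags(sample))  # ['grass', 'outdoor']
```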
16. Google Vision
The Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"), detects individual objects and faces within images, and finds and reads printed words contained within images. You can build metadata on your image catalog, moderate offensive content, or enable new marketing scenarios through image sentiment analysis. Analyze images uploaded in the request or integrate with your image storage on Google Cloud Storage.
Image recognition
Text recognition
Face detection
Emotion detection
Text analysis (not available in German)
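The "images uploaded in the request" part works roughly like this sketch of a Cloud Vision REST request body: the image bytes go in as base64 and each desired capability is listed as a feature. The feature type names are real Vision API types, but the surrounding call (endpoint, auth) is omitted.

```python
import base64
import json

def vision_request(image_bytes,
                   features=("LABEL_DETECTION", "FACE_DETECTION", "TEXT_DETECTION")):
    """Build the JSON body for an images:annotate call."""
    return {"requests": [{
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": f, "maxResults": 10} for f in features],
    }]}

body = vision_request(b"\x89PNG...")  # placeholder bytes, not a real image
print(json.dumps(body)[:60])
```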
17. OpenCV
OpenCV is released under a BSD license and hence is free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform. Adopted all around the world, OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 14 million. Usage ranges from interactive art to mine inspection, stitching maps on the web, and advanced robotics.
Image recognition
Face & person detection
Trainable
18. Clarifai Image and Video Recognition API
Predict / Classify: Predict analyzes your images and tells you what's inside of them. The API returns a list of concepts with corresponding probabilities of how likely it is that these concepts are contained within the image.
Search: the Search API allows you to send images (URL or bytes) to the service and have them indexed by 'general' model concepts and their visual representations. Once indexed, you can search for images by concept or use reverse image search.
Train: Clarifai provides many different models that 'see' the world differently. A model contains a group of concepts, and a model will only see the concepts it contains.
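A hedged example of the Predict output described above: a list of concepts with probabilities. The response shape here is an assumed simplification; the helper just filters it down to confident labels.

```python
def confident_concepts(output, threshold=0.9):
    """Keep only concepts whose probability clears the threshold."""
    return [(c["name"], c["value"]) for c in output["data"]["concepts"]
            if c["value"] >= threshold]

sample = {"data": {"concepts": [
    {"name": "people", "value": 0.99},
    {"name": "stadium", "value": 0.95},
    {"name": "indoors", "value": 0.40}]}}
print(confident_concepts(sample))  # [('people', 0.99), ('stadium', 0.95)]
```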
19. Imagga Auto-Tagging
Imagga is an Image Recognition Platform-as-a-Service providing image tagging APIs for developers & businesses to build scalable, image-intensive cloud apps.
21. Fraunhofer IAIS Audiomining
Segmentation
Speaker and language detection
Emotion detection
Trainable
Keyword extraction
Alternatives:
IBM Watson Speech2Text (see later)
Microsoft Cognitive Services – Bing Speech
Google Speech
22. Fraunhofer IAIS Audiomining – example output
{"segments": [
  ...
  {
    "segmentNumber": 1,
    "startTime": 4480,
    "duration": 3190,
    "endTime": 7670,
    "speaker": 1,
    "gender": "female",
    "transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."
  },
  ...
  {
    "segmentNumber": 20,
    "startTime": 238980,
    "duration": 23620,
    "endTime": 262600,
    "speaker": 2,
    "gender": "male",
    "transcript": "Großbritannien raus aus der Europäischen Union für viele unvorstellbar das weiß auch der britische Premierminister Cameron und er nutzt es um die EU Partner unter Druck zu setzen entweder das Staatenbündnis ist zu Reformen bereit oder bei der geplanten Volksabstimmung über die EU Mitgliedschaft droht ein Nein heute hatte EU Ratspräsident Tosca ein Kompromisspapier vorgelegt dass die Briten besänftigen soll."
  },
  ...
]}
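Segment structures like the one above can be consumed directly. This sketch (using abbreviated, made-up segments of the same shape, times in milliseconds) sums speaking time per speaker and concatenates the transcripts for downstream text analytics:

```python
def speaking_time(segments):
    """Total duration (ms) per speaker id."""
    totals = {}
    for seg in segments:
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0) + seg["duration"]
    return totals

segments = [
    {"speaker": 1, "duration": 3190, "transcript": "Hier ist das erste deutsche Fernsehen."},
    {"speaker": 2, "duration": 23620, "transcript": "Großbritannien raus aus der EU ..."},
    {"speaker": 1, "duration": 1200, "transcript": "Guten Abend."},
]
print(speaking_time(segments))  # {1: 4390, 2: 23620}
full_text = " ".join(s["transcript"] for s in segments)  # input for NLU, see below
```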
26. IBM Watson Natural Language Understanding (NLU)
Extraction of
• Sentiment
• Emotion
• Keywords
• Entities
• Categories
• Concepts
• Semantic Roles
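A sketch of turning NLU output into search metadata: the response shape below is an assumed simplification (keywords and entities with relevance scores), and the helper collects the confident terms into an index list for the archive.

```python
def index_terms(nlu_result, min_relevance=0.5):
    """Collect searchable terms from keywords and entities."""
    terms = [k["text"] for k in nlu_result.get("keywords", [])
             if k["relevance"] >= min_relevance]
    terms += [e["text"] for e in nlu_result.get("entities", [])
              if e["relevance"] >= min_relevance]
    return terms

# Made-up result for a transcript about the Brexit referendum.
sample = {"keywords": [{"text": "Volksabstimmung", "relevance": 0.9}],
          "entities": [{"text": "Cameron", "relevance": 0.8},
                       {"text": "EU", "relevance": 0.3}]}
print(index_terms(sample))  # ['Volksabstimmung', 'Cameron']
```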
28. Visual Atoms
FIND is a high-speed, high-accuracy image visual search solution. Our state-of-the-art visual search engine enables the matching of images depicting the same objects or scenes based on visual similarities, without the need for manual annotations or metadata.
If you are a provider of image editing or management solutions, the FIND engine will equip your product with the necessary tools for the creation of image databases which are searchable using images as queries. Your end users will be able to create and maintain their own image databases and efficiently organise, manage and search their image assets.
For providers of image hosting solutions, the FIND engine will allow the creation of image databases which users can search using visual queries.
For developers of mobile apps, such as for e-commerce, tourism or entertainment, the FIND engine will give your app cloud-based and/or terminal-based visual search functionality for retrieval of relevant images and associated information.
With a streamlined API, the FIND engine is designed so that it can be easily integrated in any third-party application or workflow.
Alternatives: IBM Watson VR Collections, Clarifai Search
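A conceptual sketch (not the FIND engine itself): at its core, a visual similarity search ranks stored feature vectors by closeness to a query vector, here with plain cosine similarity over tiny made-up descriptors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(query, index, top_k=2):
    """index: {image_id: feature_vector}; returns the top_k closest image ids."""
    ranked = sorted(index, key=lambda k: cosine(query, index[k]), reverse=True)
    return ranked[:top_k]

index = {"img_a": [1.0, 0.0], "img_b": [0.9, 0.1], "img_c": [0.0, 1.0]}
print(most_similar([1.0, 0.05], index))  # ['img_a', 'img_b']
```

Real engines replace the linear scan with approximate nearest-neighbour indexes so the search stays fast over millions of images.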