Image Annotation
Presented by: Yomna Mahmoud Ibrahim Hassan
Submitted to : Dr. Abeer El Korany
Summary from previous presentation
 Existence of Heterogeneous data
 Types of annotation
 Annotation / labeling of various data types
 Unifying data formats
 Usage of domain-based ontologies to enhance annotation
Image Annotation
What is Image Annotation?
Why Image Annotation? (Motivation)
 Summarization
 Applications such as video search and retrieval
 Minimizing features to search with (textual) data
 Minimizing storage
 Video reconstruction
User Involvement
Generalized steps for Image Annotation
 Image Capturing and Pre-processing
 Feature Extraction
 Scene semantic concept from objects / feature matching
Image Capturing and Pre-processing
 Image capturing distribution and management
 Noise filtering: blurriness, transport noise, etc.
 Color saturation adjustment (a minimal pre-processing sketch follows below)
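A minimal pre-processing sketch under assumed tooling (OpenCV, which the slides do not prescribe): median filtering to suppress noise, followed by a simple saturation adjustment in HSV space. The kernel size and saturation factor are illustrative placeholders.

```python
# Minimal pre-processing sketch (OpenCV assumed; not prescribed by the slides):
# median filtering for noise, then a simple saturation boost in HSV space.
import cv2
import numpy as np

def preprocess(path, saturation_scale=1.2):
    img = cv2.imread(path)                                   # BGR image
    denoised = cv2.medianBlur(img, 3)                        # noise filtering
    hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 1] = np.clip(hsv[:, :, 1] * saturation_scale, 0, 255)  # saturation
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```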
Image Capturing and Pre-processing: Issues
and Solutions
 Data inaccuracy / lack of information: fusion
 Time consumption: focus on higher-level concepts,
ontology-directed annotation
Feature Extraction
 Types of features needed: low-level / high-level features
 Low-level information such as SIFT descriptors, colors,
textures, edges, resolution, and image size.
 High-level concepts (objects, events) must be linked to the
presence or absence of other concepts, and statistical models
must be chosen for combining these concept models into a
high-level model. (A feature-extraction sketch follows.)
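A short sketch of the low-level features mentioned above, assuming OpenCV (cv2.SIFT_create() needs a recent build); the slides do not fix the exact feature set, so this is illustrative only.

```python
# Low-level feature sketch: SIFT descriptors plus a global colour histogram.
# Assumes an OpenCV build that exposes cv2.SIFT_create(); illustrative only.
import cv2

def extract_low_level_features(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)  # local features
    hist = cv2.calcHist([img_bgr], [0, 1, 2], None,
                        [8, 8, 8], [0, 256, 0, 256, 0, 256])    # colour feature
    return descriptors, cv2.normalize(hist, hist).flatten()
```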
Example of Feature Extraction for Traffic Systems
Scene semantic concept from Objects
The semantic concept "parade" is defined as:
- a collection of people
- music
- a context in which the clip is interpreted as a parade
 Model a semantic concept as a class-conditional probability
density function over a feature space.
 Given the set of semantic concepts and a feature observation,
choose the label whose class-conditional density yields the
maximum likelihood of the observed feature.
 Since the true class-conditional densities are not available,
modeling assumptions are made; the usual choices are:
 GMMs for independent observation vectors
 HMMs for time-series data
(A maximum-likelihood labelling sketch follows below.)
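A minimal sketch of this maximum-likelihood labelling rule using one GMM per concept; scikit-learn is an assumption here, and the feature arrays and concept names are placeholders.

```python
# Maximum-likelihood labelling sketch: one GMM per semantic concept.
# scikit-learn is an assumption; features_by_concept is a placeholder
# mapping each concept name to an (n_samples, n_features) array.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_concept_models(features_by_concept, n_components=4):
    return {concept: GaussianMixture(n_components=n_components).fit(X)
            for concept, X in features_by_concept.items()}

def label(observation, models):
    # Pick the concept whose class-conditional density gives the highest
    # (log-)likelihood of the observed feature vector.
    x = np.atleast_2d(observation)
    return max(models, key=lambda c: models[c].score(x))
```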
Annotation of Multiple Images
 Data can be:
(1) Complementary: when the information provided by the
input sources represents different parts of the scene and could
thus be used to obtain more complete global information. For
example, in the case of visual sensor networks, the information
on the same target provided by two cameras with different
fields of view is considered complementary.
(2) Redundant: when two or more input sources provide
information about the same target and could thus be fused to
increase confidence. For example, the data coming from
overlapped areas in visual sensor networks are considered
redundant.
(3) Cooperative: when the provided information is combined
into new information that is typically more complex than the
original information. For example, multi-modal (audio and
video) data fusion is considered cooperative.
Data Fusion of Multiple Image Data
 Fusion at the image level (in the pre-processing phase)
 Fusion at the feature level (after feature extraction)
 Fusion at the textual level (post-annotation phase)
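A toy sketch of these three fusion levels, under the simplifying assumptions that the two images are already registered and that the annotations are plain keyword lists; it only illustrates where each fusion step sits in the pipeline.

```python
# Toy sketch of the three fusion levels. Assumes the two images are already
# registered and the feature vectors/keyword lists come from earlier stages.
import numpy as np

def fuse_image_level(img_a, img_b, alpha=0.5):
    # Pixel-level blend of two registered images (pre-processing phase).
    return (alpha * img_a + (1 - alpha) * img_b).astype(img_a.dtype)

def fuse_feature_level(feat_a, feat_b):
    # Concatenate feature vectors after feature extraction.
    return np.concatenate([feat_a, feat_b])

def fuse_textual_level(keywords_a, keywords_b):
    # Merge the annotation keyword sets after annotation.
    return sorted(set(keywords_a) | set(keywords_b))
```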
Current Research
WNtags: A Web-Based Tool For Image Labeling And
Retrieval With Lexical Ontologies [1]
 Existing tools for labeling are domain-specific
 They suggest usage of generalized ontologies as a
resource (Example: WordNet)
 Node distance metrics, using synonyms with weights
(similarity, semantic distance), as illustrated below
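An illustration of a WordNet node-distance metric using NLTK (an assumption; the slide does not name a library, and the exact weighting scheme used in WNtags is not given here).

```python
# Illustration of a WordNet node-distance metric (NLTK assumed; requires
# nltk.download('wordnet')). The actual weighting used by WNtags may differ.
from nltk.corpus import wordnet as wn

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(dog.path_similarity(cat))          # similarity from shortest path length
print(dog.shortest_path_distance(cat))   # raw node distance in the hierarchy
```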
Architecture and protocol of a semantic system
designed for video tagging with sensor data in mobile
devices [2]
 The paper presents a system that tags the frames of video
recorded on mobile phones with the data collected by the
phones' embedded sensors.
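A hypothetical sketch of the basic tagging idea (not the paper's actual protocol or data model): each frame timestamp is paired with the sensor reading taken at or just after it. Function and variable names are placeholders.

```python
# Hypothetical sketch of the tagging idea (not the paper's actual protocol):
# pair each frame timestamp with the sensor reading taken at or just after it.
import bisect

def tag_frames(frame_times, sensor_samples):
    # sensor_samples: list of (timestamp, reading) tuples sorted by timestamp
    times = [t for t, _ in sensor_samples]
    tagged = []
    for ft in frame_times:
        i = min(bisect.bisect_left(times, ft), len(times) - 1)
        tagged.append((ft, sensor_samples[i][1]))
    return tagged
```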
Real-time annotation of images in a human
assistive environment [3]
Semi-automatic image annotation [4]
 When one or more new (un-annotated) images are added into the
database, an unconfirmed automatic annotation process can take
place.
 The system automatically uses each new image as a query and
performs a content-based image retrieval process.
 For the top N images most similar to the query (N can be set
by the user), the keywords in their annotations are analyzed.
 A list of keywords sorted by their frequency in these N images is
stored in an unconfirmed keyword list for the input (query) image.
The new image is thus annotated (though virtually and without
confirmation) using the unconfirmed keywords.
 An interface option could be provided to let the user manually
confirm these keywords. The user may only need to confirm one or
two keywords if he or she is reluctant to confirm all relevant
keywords. The unconfirmed annotation will be refined (e.g., changing
unconfirmed keywords to confirmed) through daily use of the image
database in the future.
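A small sketch of this unconfirmed-annotation step; the retrieval function and the `keywords` attribute are placeholders for a content-based image retrieval routine and the stored annotations of each database image.

```python
# Sketch of the unconfirmed-annotation step. `retrieve` and the `keywords`
# attribute are placeholders for a content-based retrieval function and the
# stored annotations of each database image.
from collections import Counter

def unconfirmed_keywords(query_image, database, retrieve, n=10):
    similar = retrieve(query_image, database, n)        # top-N similar images
    counts = Counter(kw for img in similar for kw in img.keywords)
    # Keywords sorted by frequency form the unconfirmed annotation list.
    return [kw for kw, _ in counts.most_common()]
```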
Semantic Annotation of Complex Human
Scenes for Multimedia Surveillance [5]
 Focus on the automatic, real-time extraction of
semantic content from video recordings obtained in
outdoor surveillance environments.
 Aim to evaluate complex behaviors involving both
humans and vehicles, detecting the significant
events as they develop, and providing semantic-oriented
outputs that are more easily handled for
indexing, search, and retrieval purposes, such as
annotated video streams, small sets of content-expressing
images, or summaries of occurrences.
 Time-based evaluation
Automatic Semantic Content Extraction in Videos
Using a Fuzzy Ontology and Rule-Based Model [6]
- Raw video data and low-level features alone are not sufficient
to fulfill the user’s need; that is, a deeper understanding of the
information at the semantic level is required in many video-
based applications.
- A semantic content extraction system that allows the user to
query and retrieve objects, events, and concepts that are
extracted automatically.
- An ontology-based fuzzy video semantic content model that
uses spatial/temporal relations in event and concept
definitions (extracted manually with user interaction). A toy
fuzzy spatial-relation sketch follows the module list below.
 Modules:
 Input Video
 Feature Extraction
 Domain Ontology
 Relation Extraction
 Spatial Relation
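A toy illustration of the kind of fuzzy spatial relation such a model could use (not the actual formulation of [6]): a membership degree for "A is left of B" computed from bounding-box centres.

```python
# Toy fuzzy spatial relation (not the formulation of [6]): membership degree
# for "A is left of B" from bounding-box centres, clipped to [0, 1].
def left_of(box_a, box_b):
    # Boxes given as (x_min, y_min, x_max, y_max).
    cx_a = (box_a[0] + box_a[2]) / 2
    cx_b = (box_b[0] + box_b[2]) / 2
    width = max(box_b[2] - box_b[0], 1e-6)
    return max(0.0, min(1.0, (cx_b - cx_a) / width))
```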
Summary
 Image / Video annotation benefits in our current
environment
 Annotation procedure
 Issues in Image annotation applications:
 Human Involvement
 Required computational effort
Potential Future Work (challenges in this
area)
 Automatic identification of possible semantic relations
for the identified object, and of neighboring objects
(correlation with the environment)
 Power-based annotation (based on resource availability):
optimization, status, resource manipulation
References
1. Horvat, Marko, Anton Grbin, and Gordan Gledec. "WNtags: a web-
based tool for image labeling and retrieval with lexical
ontologies." arXiv preprint arXiv:1302.2223 (2013).
2. Macias, Elsa, et al. "Architecture and protocol of a semantic
system designed for video tagging with sensor data in mobile
devices." Sensors 12.2 (2012): 2062-2087.
3. Connell, Jonathan, et al. "Real-time annotation of images in a
human assistive environment." U.S. Patent No. 8,265,342. 11 Sep.
2012.
4. Moehrmann, Julia, and Gunther Heidemann. "Semi-automatic
Image Annotation." Computer Analysis of Images and Patterns.
Springer Berlin Heidelberg, 2013.
5. Fernández, Carles, et al. "Semantic annotation of complex human
scenes for multimedia surveillance." AI*IA 2007: Artificial
Intelligence and Human-Oriented Computing. Springer Berlin
Heidelberg, 2007. 698-709.
6. Yildirim, Yakup, Adnan Yazici, and Turgay Yilmaz. "Automatic
semantic content extraction in videos using a fuzzy ontology and
rule-based model."Knowledge and Data Engineering, IEEE
Transactions on 25.1 (2013): 47-61.
