Image Annotation
Presented by: Yomna Mahmoud Ibrahim Hassan
Submitted to : Dr. Abeer El Korany
Summary from previous presentation
 Existence of Heterogeneous data
 Types of annotation
 Annotation / labeling of various data types
 Unifying data formats
 Usage of domain-based ontologies to enhance annotation
Image Annotation
What is Image Annotation?
Why Image Annotation? (Motivation)
 Summarization
 Applications such as video search and retrieval
 Minimizing features to search with (textual) data
 Minimizing storage
 Video reconstruction
User Involvement
Generalized steps for Image Annotation
 Image Capturing and Pre-processing
 Feature Extraction
 Scene semantic concept from objects / feature matching
Image Capturing and Pre-processing
 Image capturing distribution and management
 Noise filtering: blurriness, transport noise, etc.
 Color saturation adjustment (a minimal pre-processing sketch follows below)
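A minimal pre-processing sketch under assumed tooling (OpenCV, which the slides do not prescribe): median filtering to suppress noise, followed by a simple saturation adjustment in HSV space. The kernel size and saturation factor are illustrative placeholders.

```python
# Minimal pre-processing sketch (OpenCV assumed; not prescribed by the slides):
# median filtering for noise, then a simple saturation boost in HSV space.
import cv2
import numpy as np

def preprocess(path, saturation_scale=1.2):
    img = cv2.imread(path)                                   # BGR image
    denoised = cv2.medianBlur(img, 3)                        # noise filtering
    hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 1] = np.clip(hsv[:, :, 1] * saturation_scale, 0, 255)  # saturation
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```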
Image Capturing and Pre-processing: Issues
and Solutions
 Data inaccuracy / lack of information: fusion
 Time consumption: focus on higher-level concepts,
ontology-directed annotation
Feature Extraction
 Types of features needed: low-level / high-level features
 Low-level information such as SIFT descriptors, colors,
textures, edges, resolution, and image size.
 High-level concepts (objects, events) must be linked to the
presence or absence of other concepts, and statistical models
must be chosen for combining these concept models into a
high-level model. (A feature-extraction sketch follows.)
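A short sketch of the low-level features mentioned above, assuming OpenCV (cv2.SIFT_create() needs a recent build); the slides do not fix the exact feature set, so this is illustrative only.

```python
# Low-level feature sketch: SIFT descriptors plus a global colour histogram.
# Assumes an OpenCV build that exposes cv2.SIFT_create(); illustrative only.
import cv2

def extract_low_level_features(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)  # local features
    hist = cv2.calcHist([img_bgr], [0, 1, 2], None,
                        [8, 8, 8], [0, 256, 0, 256, 0, 256])    # colour feature
    return descriptors, cv2.normalize(hist, hist).flatten()
```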
Example of Feature Extraction for Traffic Systems
Scene semantic concept from Objects
The semantic concept "parade" is defined as:
- a collection of people
- music
- a context in which the clip is interpreted as a parade
 Model a semantic concept as a class-conditional probability
density function over a feature space.
 Given the set of semantic concepts and a feature observation,
choose the label whose class-conditional density yields the
maximum likelihood of the observed feature.
 Since the true class-conditional densities are not available,
modeling assumptions are made; the usual choices are:
 GMMs for independent observation vectors
 HMMs for time-series data
(A maximum-likelihood labelling sketch follows below.)
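A minimal sketch of this maximum-likelihood labelling rule using one GMM per concept; scikit-learn is an assumption here, and the feature arrays and concept names are placeholders.

```python
# Maximum-likelihood labelling sketch: one GMM per semantic concept.
# scikit-learn is an assumption; features_by_concept is a placeholder
# mapping each concept name to an (n_samples, n_features) array.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_concept_models(features_by_concept, n_components=4):
    return {concept: GaussianMixture(n_components=n_components).fit(X)
            for concept, X in features_by_concept.items()}

def label(observation, models):
    # Pick the concept whose class-conditional density gives the highest
    # (log-)likelihood of the observed feature vector.
    x = np.atleast_2d(observation)
    return max(models, key=lambda c: models[c].score(x))
```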
Annotation of Multiple Images
 Data can be:
(1) Complementary: when the information provided by the
input sources represents different parts of the scene and could
thus be used to obtain more complete global information. For
example, in the case of visual sensor networks, the information
on the same target provided by two cameras with different
fields of view is considered complementary.
(2) Redundant: when two or more input sources provide
information about the same target and could thus be fused to
increase confidence. For example, the data coming from
overlapped areas in visual sensor networks are considered
redundant.
(3) Cooperative: when the provided information is combined
into new information that is typically more complex than the
original information. For example, multi-modal (audio and
video) data fusion is considered cooperative.
Data Fusion of Multiple Image Data
 Fusion at the image level (in the pre-processing phase)
 Fusion at the feature level (after feature extraction)
 Fusion at the textual level (post-annotation phase)
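A toy sketch of these three fusion levels, under the simplifying assumptions that the two images are already registered and that the annotations are plain keyword lists; it only illustrates where each fusion step sits in the pipeline.

```python
# Toy sketch of the three fusion levels. Assumes the two images are already
# registered and the feature vectors/keyword lists come from earlier stages.
import numpy as np

def fuse_image_level(img_a, img_b, alpha=0.5):
    # Pixel-level blend of two registered images (pre-processing phase).
    return (alpha * img_a + (1 - alpha) * img_b).astype(img_a.dtype)

def fuse_feature_level(feat_a, feat_b):
    # Concatenate feature vectors after feature extraction.
    return np.concatenate([feat_a, feat_b])

def fuse_textual_level(keywords_a, keywords_b):
    # Merge the annotation keyword sets after annotation.
    return sorted(set(keywords_a) | set(keywords_b))
```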
Current Research
WNtags: A Web-Based Tool For Image Labeling And
Retrieval With Lexical Ontologies [1]
 Existing tools for labeling are domain-specific
 They suggest usage of generalized ontologies as a
resource (Example: WordNet)
 Node distance metrics, using synonyms with weights
(similarity, semantic distance), as illustrated below
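An illustration of a WordNet node-distance metric using NLTK (an assumption; the slide does not name a library, and the exact weighting scheme used in WNtags is not given here).

```python
# Illustration of a WordNet node-distance metric (NLTK assumed; requires
# nltk.download('wordnet')). The actual weighting used by WNtags may differ.
from nltk.corpus import wordnet as wn

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(dog.path_similarity(cat))          # similarity from shortest path length
print(dog.shortest_path_distance(cat))   # raw node distance in the hierarchy
```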
Architecture and protocol of a semantic system
designed for video tagging with sensor data in mobile
devices [2]
 The paper presents a system that tags the frames of video
recorded on mobile phones with the data collected by the
phones' embedded sensors.
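A hypothetical sketch of the basic tagging idea (not the paper's actual protocol or data model): each frame timestamp is paired with the sensor reading taken at or just after it. Function and variable names are placeholders.

```python
# Hypothetical sketch of the tagging idea (not the paper's actual protocol):
# pair each frame timestamp with the sensor reading taken at or just after it.
import bisect

def tag_frames(frame_times, sensor_samples):
    # sensor_samples: list of (timestamp, reading) tuples sorted by timestamp
    times = [t for t, _ in sensor_samples]
    tagged = []
    for ft in frame_times:
        i = min(bisect.bisect_left(times, ft), len(times) - 1)
        tagged.append((ft, sensor_samples[i][1]))
    return tagged
```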
Real-time annotation of images in a human
assistive environment [3]
Semi-automatic image annotation [4]
 When one or more new (un-annotated) images are added into the
database, an unconfirmed automatic annotation process can take
place.
 The system automatically uses each new image as a query and
performs a content-based image retrieval process.
 For the top N images most similar to the query (N can be set
by the user), the keywords in their annotations are analyzed.
 A list of keywords sorted by their frequency in these N images is
stored in an unconfirmed keyword list for the input (query) image.
The new image is thus annotated (though virtually and without
confirmation) using the unconfirmed keywords.
 An interface option could be provided to let the user manually
confirm these keywords. The user may only need to confirm one or
two keywords if he or she is reluctant to confirm all relevant
keywords. The unconfirmed annotation will be refined (e.g., changing
unconfirmed keywords to confirmed) through daily use of the image
database in the future.
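A small sketch of this unconfirmed-annotation step; the retrieval function and the `keywords` attribute are placeholders for a content-based image retrieval routine and the stored annotations of each database image.

```python
# Sketch of the unconfirmed-annotation step. `retrieve` and the `keywords`
# attribute are placeholders for a content-based retrieval function and the
# stored annotations of each database image.
from collections import Counter

def unconfirmed_keywords(query_image, database, retrieve, n=10):
    similar = retrieve(query_image, database, n)        # top-N similar images
    counts = Counter(kw for img in similar for kw in img.keywords)
    # Keywords sorted by frequency form the unconfirmed annotation list.
    return [kw for kw, _ in counts.most_common()]
```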
Semantic Annotation of Complex Human
Scenes for Multimedia Surveillance [5]
 Focus on the automatic, real-time extraction of
semantic content from video recordings obtained in
outdoor surveillance environments.
 Aim to evaluate complex behaviors involving both
humans and vehicles, detecting the significant
events as they develop, and providing semantic-oriented
outputs that are more easily handled for
indexing, search, and retrieval purposes, such as
annotated video streams, small sets of content-expressing
images, or summaries of occurrences.
 Time-based evaluation
Automatic Semantic Content Extraction in Videos
Using a Fuzzy Ontology and Rule-Based Model [6]
- Raw video data and low-level features alone are not sufficient
to fulfill the user’s need; that is, a deeper understanding of the
information at the semantic level is required in many video-
based applications.
- A semantic content extraction system that allows the user to
query and retrieve objects, events, and concepts that are
extracted automatically.
- An ontology-based fuzzy video semantic content model that
uses spatial/temporal relations in event and concept
definitions (extracted manually with user interaction). A toy
fuzzy spatial-relation sketch follows the module list below.
 Modules:
 Input Video
 Feature Extraction
 Domain Ontology
 Relation Extraction
 Spatial Relation
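A toy illustration of the kind of fuzzy spatial relation such a model could use (not the actual formulation of [6]): a membership degree for "A is left of B" computed from bounding-box centres.

```python
# Toy fuzzy spatial relation (not the formulation of [6]): membership degree
# for "A is left of B" from bounding-box centres, clipped to [0, 1].
def left_of(box_a, box_b):
    # Boxes given as (x_min, y_min, x_max, y_max).
    cx_a = (box_a[0] + box_a[2]) / 2
    cx_b = (box_b[0] + box_b[2]) / 2
    width = max(box_b[2] - box_b[0], 1e-6)
    return max(0.0, min(1.0, (cx_b - cx_a) / width))
```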
Summary
 Image / Video annotation benefits in our current
environment
 Annotation procedure
 Issues in Image annotation applications:
 Human Involvement
 Required computational effort
Potential Future Work (challenges in this
area)
 Automatic identification of possible semantic relations
for the identified object, and of neighboring objects
(correlation with the environment)
 Power-based annotation (based on resource availability):
optimization, status, resource manipulation
References
1. Horvat, Marko, Anton Grbin, and Gordan Gledec. "WNtags: a web-
based tool for image labeling and retrieval with lexical
ontologies." arXiv preprint arXiv:1302.2223 (2013).
2. Macias, Elsa, et al. "Architecture and protocol of a semantic
system designed for video tagging with sensor data in mobile
devices." Sensors 12.2 (2012): 2062-2087.
3. Connell, Jonathan, et al. "Real-time annotation of images in a
human assistive environment." U.S. Patent No. 8,265,342. 11 Sep.
2012.
4. Moehrmann, Julia, and Gunther Heidemann. "Semi-automatic
Image Annotation." Computer Analysis of Images and Patterns.
Springer Berlin Heidelberg, 2013.
5. Fernández, Carles, et al. "Semantic annotation of complex human
scenes for multimedia surveillance." AI*IA 2007: Artificial
Intelligence and Human-Oriented Computing. Springer Berlin
Heidelberg, 2007. 698-709.
6. Yildirim, Yakup, Adnan Yazici, and Turgay Yilmaz. "Automatic
semantic content extraction in videos using a fuzzy ontology and
rule-based model."Knowledge and Data Engineering, IEEE
Transactions on 25.1 (2013): 47-61.
