Annotation seminar

Semantic Web Technologies
Annotation
Presented by: Albara Abdalkhalig Mansour
Sudan University - Web Technology
E-mail: Brra51@hotmail.com
Tel: 00249121200239

Definition
Annotations are comments, notes, explanations, or other types of external remarks that can be attached to a Web document or a selected part of the document. As they are external, it is possible to annotate any Web document independently, without needing to edit that document. From the technical point of view, annotations are usually seen as metadata, as they give additional information about an existing piece of data.
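
The definition above can be illustrated with a minimal sketch of an external ("standoff") annotation record. The class name, field names, and the example URL are assumptions made for illustration; they do not come from any particular annotation standard. The point is only that the annotation refers to the document by address and character offsets, so the document itself is never edited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An external remark attached to a selected part of a Web document."""
    target_url: str  # address of the annotated document (never modified)
    start: int       # character offset where the selected part begins
    end: int         # character offset just past the selected part
    body: str        # the comment, note, or explanation
    creator: str     # who wrote the annotation


def annotated_text(document_text: str, annotation: Annotation) -> str:
    """Return the part of the document that the annotation refers to."""
    return document_text[annotation.start:annotation.end]


# Hypothetical document text and annotation, for illustration only.
page = "Annotations are usually seen as metadata."
selection = "metadata"
start = page.index(selection)
note = Annotation(
    target_url="http://example.org/page.html",
    start=start,
    end=start + len(selection),
    body="Metadata gives additional information about existing data.",
    creator="reader-1",
)
print(annotated_text(page, note))
```

Because the record lives outside the page, several readers could annotate the same document independently, each storing their own records.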

What is annotation?
People make notes to themselves in order to preserve ideas that arise during a variety of activities.
The purpose of these notes is often to summarize, criticize, or emphasize specific phrases or events.
Semantic annotation tags instance data with ontology classes, mapping each instance to the class it belongs to.

Why use annotation?
Having the world's knowledge at one's fingertips seems possible.
The Internet is the platform for information.
Unfortunately, most of that information is provided in an unstructured and non-standardized form.

Annotation methods
- Manually
- Semi-automatically
- Automatically

(1) Manually
Manual annotation is the transformation of existing syntactic resources into interlinked knowledge structures that represent relevant underlying information.
Manual annotation is expensive, and it often overlooks that multiple perspectives on a data source, each requiring its own ontology, can help meet the needs of different users.

(2) Semi-automatic Annotation
Semi-automatic annotation systems rely on human intervention at some point in the annotation process.
The platforms vary in their architecture, information extraction tools and methods, initial ontology, amount of manual work required to perform annotation, performance, and other features, such as storage management.

(3) Automatic Annotation
The fully automatic creation of semantic annotations is an unsolved problem.
Automatic semantic annotation of the natural language sentences in Web pages is a daunting task, and we are often forced to do it manually or semi-automatically using handwritten rules.

Semantic Annotation Concerns
- Scale, volume
  - Existing and new documents on the Web
- Manual annotation
  - Expensive (economic, time)
  - Subject to personal motivation
- Schema complexity
- Storage
  - Support for multiple ontologies
  - Within or external to the source document?
- Knowledge base refinement
- Access: how are annotations accessed?
  - API, custom UI, plug-ins

Technical Solution
2.1 Annotation of text
  - Semi-automatic text annotation
  - KIM
2.2 Multimedia annotation
  - Levels of multimedia annotation
  - Tools for multimedia annotation

Annotation of text
Many systems apply manually created rules or wrappers that try to recognize patterns for the annotations.
Some systems learn how to annotate with the help of the user.
Supervised systems learn how to annotate from a training set that was manually created beforehand.
Semi-automatic approaches often apply information extraction technology, which analyzes natural language to pull out the information the user is interested in.
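
The first approach on this slide, manually created rules that recognize patterns, can be sketched with regular expressions. The two rules below are deliberately simplistic assumptions chosen for illustration; a real system would use far richer rules or learned models.

```python
import re

# Hand-written rules: each maps an annotation type to a pattern.
# Both patterns are simplified assumptions for illustration.
RULES = {
    "Date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "Money": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def annotate(text):
    """Return (start, end, type, matched text) tuples sorted by position."""
    found = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)


# Hypothetical input sentence.
sample = "The license cost $49.99 on 2010-05-17."
for start, end, label, surface in annotate(sample):
    print(f"{label}: {surface!r} at [{start}, {end})")
```

In a semi-automatic setting, a person would review the spans proposed by such rules and accept, correct, or reject them.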

A Walk-Through Example: GATE
GATE is a tool for:
- scientists performing experiments that involve processing human language;
- companies developing applications with language processing components;
- teachers and students of courses about language and language computation.
GATE comprises an architecture, framework (or SDK) and development environment, and has been in development since 1995 in the Sheffield NLP group.
The system has been used for many language processing projects, in particular for Information Extraction in many languages.
GATE is funded by the EPSRC and the EU.

KIM platform
- KIM = Knowledge and Information Management
- Developed by the semantic technology lab "Ontotext"
- Based on GATE

KIM platform
KIM performs information extraction (IE) based on an ontology and a massive knowledge base.

KIM KB
The KIM KB consists of more than 80,000 entities (50,000 locations, 8,400 organization instances, etc.).
Each location has geographic coordinates and several aliases (usually including English, French, Spanish, and sometimes the local transcription of the location name) as well as co-positioning relations (e.g. subRegionOf).
The organizations have locatedIn relations to the corresponding Country instances. The additionally imported information about the companies consists of a short description, URL, reference to an industry sector, reported sales, net income, and number of employees.
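
To make the shape of such a knowledge base concrete, here is a toy sketch with three locations and one organization. The identifiers, the dictionary layout, and the helper functions are assumptions for illustration only; they do not reflect KIM's actual storage format or API. Only the relation names subRegionOf and locatedIn are taken from the slide.

```python
# Hypothetical miniature knowledge base; all identifiers are made up.
ENTITIES = {
    "loc:Sofia": {
        "class": "City",
        "aliases": {"Sofia", "Sofiya"},
        "subRegionOf": "loc:Bulgaria",
    },
    "loc:Bulgaria": {
        "class": "Country",
        "aliases": {"Bulgaria", "Bulgarie"},
        "subRegionOf": "loc:Europe",
    },
    "loc:Europe": {
        "class": "Continent",
        "aliases": {"Europe"},
        "subRegionOf": None,
    },
    "org:ExampleCorp": {
        "class": "Company",
        "aliases": {"ExampleCorp"},
        "locatedIn": "loc:Bulgaria",
    },
}


def build_alias_index(entities):
    """Map each lower-cased alias to the set of entity ids that carry it."""
    index = {}
    for entity_id, record in entities.items():
        for alias in record["aliases"]:
            index.setdefault(alias.lower(), set()).add(entity_id)
    return index


def enclosing_regions(entity_id, entities):
    """Follow subRegionOf links upward and list every enclosing region."""
    regions = []
    parent = entities[entity_id].get("subRegionOf")
    while parent is not None:
        regions.append(parent)
        parent = entities[parent].get("subRegionOf")
    return regions


alias_index = build_alias_index(ENTITIES)
print(alias_index["bulgarie"])
print(enclosing_regions("loc:Sofia", ENTITIES))
```

Storing several aliases per entity is what lets a name in French or in a local transcription resolve to the same location as its English name.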

KIM platform
The KIM platform provides a novel infrastructure and services for automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content.

KIM platform
The most direct applications of KIM are:
- Generation of metadata for the Semantic Web, which allows hyper-linking and advanced visualization and navigation.
- Knowledge management, enhancing the efficiency of the existing indexing, retrieval, classification, and filtering applications.

KIM platform
The automatic semantic annotation is seen as a named-entity recognition (NER) and annotation process.
The traditional flat NE type sets consist of several general types (such as Organization, Person, Date, Location, Percent, Money). In KIM the NE type is specified by reference to an ontology.
The semantic descriptions of entities and the relations between them are kept in a knowledge base (KB) encoded in the KIM ontology and residing in the same semantic repository. Thus KIM provides for each entity reference in the text (i) a link (URI) to the most specific class in the ontology and (ii) a link to the specific instance in the KB.
Each extracted NE is linked to its specific type information (thus Arabian Sea would be identified as Sea, instead of the traditional Location).
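
The two links described above, one to the most specific class and one to the instance, can be sketched as follows. The class hierarchy, the URIs under example.org, and the function names are all hypothetical; this is not the KIM ontology or the KIM API, only an illustration of the idea using the slide's Arabian Sea example.

```python
# Hypothetical class hierarchy: each class points to its superclass.
SUPERCLASS = {"Sea": "Location", "Location": "Entity", "Entity": None}

# Hypothetical KB: surface form -> (instance URI, most specific class).
INSTANCES = {
    "Arabian Sea": ("http://example.org/kb#Arabian_Sea", "Sea"),
}

ONTOLOGY_BASE = "http://example.org/ontology#"


def link_entity(mention):
    """Return the two links for one entity reference found in the text."""
    instance_uri, specific_class = INSTANCES[mention]
    return {
        "mention": mention,
        "class": ONTOLOGY_BASE + specific_class,  # (i) most specific class
        "instance": instance_uri,                 # (ii) instance in the KB
    }


def class_chain(class_name):
    """List the class and all of its superclasses, most specific first."""
    chain = []
    while class_name is not None:
        chain.append(class_name)
        class_name = SUPERCLASS[class_name]
    return chain


annotation = link_entity("Arabian Sea")
print(annotation)
print(class_chain("Sea"))
```

Linking to the most specific class loses nothing compared with a flat type set: the general type Location can still be recovered by walking up the hierarchy.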

Multimedia Annotation
Different levels of annotations
- Metadata
  - Often technical metadata
- Content level
  - Semantic annotations
  - Keywords, domain ontologies, free text
- Multimedia level
  - Low-level annotations
  - Visual descriptors, such as dominant color

Metadata
Refers to information about technical details
- Creation details
  - creator, creationDate, …
- Camera details
  - settings
  - resolution
  - format
  - EXIF
- Access rights
  - administrated by the OS
  - owner, access rights, …

Content Level
Describes what is depicted and directly perceivable by a human
- Usually provided manually
  - keywords/tags
  - classification of content
- Seldom generated automatically
  - scene classification
  - object detection
- Different types of annotations
  - global vs. local
  - different semantic levels

Global vs. Local Annotations
- Global annotations are the most widely used
  - flickr: tagging is only global
  - organization within categories
  - free-text annotations
  - provide information about the content as a whole
  - no detailed information
- Local annotations are less supported
  - e.g. flickr and PhotoStuff allow annotating regions
  - especially important for semantic image understanding
    - allow relations to be extracted
    - provide a more complete view of the scene
  - provide information about different regions
  - and about the depicted relations and arrangements of objects
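
The difference between the two kinds of annotation can be sketched with a single record type in which the region is optional. The class, the (x, y, width, height) region convention, and the left_of helper are assumptions for illustration; they are not how flickr or PhotoStuff represent annotations.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ImageAnnotation:
    image: str
    label: str
    # (x, y, width, height) in pixels; None means the label is global.
    region: Optional[Tuple[int, int, int, int]] = None

    @property
    def is_global(self) -> bool:
        return self.region is None


def left_of(a: ImageAnnotation, b: ImageAnnotation) -> bool:
    """True if region a lies entirely to the left of region b."""
    if a.is_global or b.is_global:
        raise ValueError("spatial relations need local annotations")
    a_x, _, a_width, _ = a.region
    b_x = b.region[0]
    return a_x + a_width <= b_x


# Hypothetical annotations of one photo.
scene = ImageAnnotation("photo.jpg", "beach")                # global
dog = ImageAnnotation("photo.jpg", "dog", (10, 40, 50, 30))   # local
ball = ImageAnnotation("photo.jpg", "ball", (80, 50, 20, 20)) # local

print(scene.is_global, dog.is_global)
print(left_of(dog, ball))
```

Only the local annotations carry enough information to derive a relation such as "the dog is left of the ball"; the global label describes the picture as a whole.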

Semantic Levels
- Free-text annotations cover large aspects, but are less appropriate for sharing, organization, and retrieval
  - Probably the most natural for humans, but provide the least formal semantics
- Tagging provides lightweight semantics
  - Only useful if a fixed vocabulary is used
  - Allows some simple inference of related concepts by tag analysis (clustering)
  - No formal semantics, but provides benefits due to the fixed vocabulary
  - Requires more effort from the user
- Ontologies
  - Provide syntax and semantics to define complex domain vocabularies
  - Allow for the inference of additional knowledge
  - Leverage interoperability
  - Powerful way of semantic annotation, but hardly comprehensible to "normal users"
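
The "simple inference of related concepts by tag analysis" mentioned under tagging can be sketched with plain co-occurrence counting, which is one very simple stand-in for clustering. The tag sets, the threshold, and the function names below are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_items):
    """Count how often each pair of tags appears on the same item."""
    counts = Counter()
    for tags in tagged_items:
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


def related_tags(tag, counts, min_count=2):
    """Tags that co-occur with `tag` on at least `min_count` items."""
    related = set()
    for (first, second), count in counts.items():
        if count < min_count:
            continue
        if first == tag:
            related.add(second)
        elif second == tag:
            related.add(first)
    return related


# Hypothetical tag sets of four photos.
photos = [
    ["beach", "sea", "sunset"],
    ["beach", "sea"],
    ["city", "sunset"],
    ["sea", "boat"],
]
counts = cooccurrence(photos)
print(related_tags("sea", counts))
```

This only works when users draw on a shared vocabulary: if one person writes "sea" and another "ocean", the pairs never line up and no relation is found.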

Tools
- Web-based tools
  - flickr
  - riya

flickr
- Web 2.0 application
- Tagging photos globally
- Add comments to image regions marked by a bounding box
- Large user community, and tagging allows for easy sharing of images
- Partly fixed vocabularies have evolved
  - e.g. geo-tagging
