Semantic Web Technologies
Annotation
Presented by: Albara Abdalkhalig Mansour
Sudan University, Web Technology
E-mail: Brra51@hotmail.com
Tel: 00249121200239

Definition
Annotations are comments, notes, explanations, or other types of external remarks that can be attached to a Web document or to a selected part of it. Because they are external, any Web document can be annotated independently, without editing the document itself. From a technical point of view, annotations are usually seen as metadata, as they give additional information about an existing piece of data.

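The definition above can be sketched as a small data structure. This is a minimal illustration only: the `Annotation` record, its field names, and the example URL are assumptions made for this sketch, not part of any standard or tool named in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """An external remark about a Web document (hypothetical record layout)."""
    target_url: str               # document being annotated
    body: str                     # the comment, note, or explanation
    start: Optional[int] = None   # character offsets of the selected part,
    end: Optional[int] = None     # or None to annotate the whole document


def selected_text(document: str, ann: Annotation) -> str:
    """Return the part of the document that the annotation refers to."""
    if ann.start is None or ann.end is None:
        return document
    return document[ann.start:ann.end]


# The document itself is never modified; annotations live in a separate store.
page = "GATE has been in development since 1995."
year_start = page.index("1995")
store = [
    Annotation("http://example.org/page", "Check this date.",
               start=year_start, end=year_start + 4),
    Annotation("http://example.org/page", "Useful overview."),
]
```

Keeping the annotations in `store` rather than inside `page` mirrors the point that a document can be annotated without being edited.
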
What is annotation?
  • People make notes to themselves in order to preserve ideas that arise during a variety of activities.
  • The purpose of these notes is often to summarize, criticize, or emphasize specific phrases or events.
  • Semantic annotations tag instance data and map it to the corresponding ontology classes.

Why use annotation?
  • Having the world's knowledge at one's fingertips seems possible.
  • The Internet is the platform for information.
  • Unfortunately, most of that information is provided in an unstructured and non-standardized form.

Annotation methods
  (1) Manually
  (2) Semi-automatically
  (3) Automatically

(1) Manual Annotation
  • Manual annotation is the transformation of existing syntactic resources into interlinked knowledge structures that represent relevant underlying information.
  • It is an expensive process, and it often fails to consider that multiple perspectives on a data source, requiring multiple ontologies, can help support the needs of different users.

(2) Semi-automatic Annotation
  • Semi-automatic annotation systems rely on human intervention at some point in the annotation process.
  • The platforms vary in their architecture, information extraction tools and methods, initial ontology, amount of manual work required to perform annotation, performance, and other features such as storage management.

(3) Automatic Annotation
  • The fully automatic creation of semantic annotations is an unsolved problem.
  • Automatically annotating the natural-language sentences in Web pages is a daunting task, so annotation is often done manually, or semi-automatically using handwritten rules.

Semantic Annotation Concerns
  • Scale, volume
      • Existing and new documents on the Web
  • Manual annotation
      • Expensive in money and time
      • Subject to personal motivation
  • Schema complexity
  • Storage
      • Support for multiple ontologies
      • Within or external to the source document?
  • Knowledge base refinement
  • Access: how are annotations accessed?
      • API, custom UI, plug-ins

TECHNICAL SOLUTION
2.1 Annotation of text
  • Semi-automatic text annotation
  • GATE
  • KIM
2.2 Multimedia annotation
  • Levels of multimedia annotation
  • Tools for multimedia annotation

Annotation of text
  • Many systems apply manually created rules or wrappers that try to recognize patterns for the annotations.
  • Some systems learn how to annotate with the help of the user.
  • Supervised systems learn how to annotate from a training set that was manually created beforehand.
  • Semi-automatic approaches often apply information extraction technology, which analyzes natural language to pull out the information the user is interested in.

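The first approach, manually created rules that recognize patterns, can be sketched with regular expressions. This is a minimal sketch under stated assumptions: the three rules, their labels, and the sample sentence are made up for illustration and are far simpler than the rule sets real annotation systems use.

```python
import re

# Hand-written rules: (annotation type, pattern). Both are illustrative.
RULES = [
    ("Date", re.compile(r"\b(?:19|20)\d{2}\b")),
    ("Money", re.compile(r"\$\d+(?:\.\d+)?")),
    ("Percent", re.compile(r"\b\d+(?:\.\d+)?%")),
]


def annotate(text):
    """Return (start, end, type, matched text) tuples sorted by position."""
    found = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label, m.group()))
    return sorted(found)


sample = "Sales rose 12% to $40.5 in 2003."
annotations = annotate(sample)
for start, end, label, matched in annotations:
    print(f"{label:8} {start:3}-{end:<3} {matched}")
```

Each rule is cheap to write but only covers the pattern its author anticipated, which is one reason supervised and semi-automatic approaches are used alongside rules.
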
A Walk-Through Example: GATE
GATE is a tool for:
  • scientists performing experiments that involve processing human language;
  • companies developing applications with language processing components;
  • teachers and students of courses about language and language computation.
GATE comprises an architecture, a framework (or SDK), and a development environment, and has been in development since 1995 in the Sheffield NLP group. The system has been used for many language processing projects, in particular for Information Extraction in many languages. GATE is funded by the EPSRC and the EU.

KIM platform
  • KIM = Knowledge and Information Management
  • Developed by the semantic technology lab "Ontotext"
  • Based on GATE
  • KIM performs information extraction (IE) based on an ontology and a massive knowledge base.

KIM KB
  • The KIM KB consists of more than 80,000 entities (50,000 locations, 8,400 organization instances, etc.).
  • Each location has geographic coordinates and several aliases (usually including English, French, Spanish, and sometimes the local transcription of the location name), as well as co-positioning relations (e.g. subRegionOf).
  • Organizations have locatedIn relations to the corresponding Country instances.
  • The additionally imported information about companies consists of a short description, URL, reference to an industry sector, reported sales, net income, and number of employees.

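The shape of a location entry described above (coordinates, aliases, and a subRegionOf link) can be sketched as follows. This is a toy model only: the `Location` class, the `ex:` identifiers, and the values are assumptions for illustration and are not taken from the actual KIM KB or its storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Location:
    uri: str                                   # hypothetical identifier
    aliases: set = field(default_factory=set)  # names in several languages
    lat: Optional[float] = None                # geographic coordinates
    lon: Optional[float] = None
    sub_region_of: Optional[str] = None        # URI of the enclosing region


# Toy knowledge base; URIs and coordinates are illustrative, not KIM data.
KB = {
    "ex:Europe": Location("ex:Europe", {"Europe", "Europa"}),
    "ex:Bulgaria": Location("ex:Bulgaria", {"Bulgaria", "Bulgarie"},
                            42.7, 25.5, "ex:Europe"),
    "ex:Sofia": Location("ex:Sofia", {"Sofia", "Sofía"},
                         42.7, 23.3, "ex:Bulgaria"),
}


def find_by_alias(name):
    """Return URIs of all locations that list `name` as an alias."""
    return sorted(uri for uri, loc in KB.items() if name in loc.aliases)


def enclosing_regions(uri):
    """Follow subRegionOf links upward and return the chain of regions."""
    chain = []
    parent = KB[uri].sub_region_of
    while parent is not None:
        chain.append(parent)
        parent = KB[parent].sub_region_of
    return chain
```

Storing several aliases per entity is what lets a mention in French or Spanish resolve to the same entry, and following `sub_region_of` upward answers questions such as "which regions contain this city?".
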
KIM platform
The KIM platform provides a novel infrastructure and services for:
  • automatic semantic annotation,
  • indexing,
  • retrieval of unstructured and semi-structured content.

The most direct applications of KIM are:
  • Generation of metadata for the Semantic Web, which allows hyperlinking and advanced visualization and navigation.
  • Knowledge management, enhancing the efficiency of existing indexing, retrieval, classification and filtering applications.

KIM platform
  • Automatic semantic annotation is seen as a named-entity recognition (NER) and annotation process.
  • Traditional flat NE type sets consist of several general types (such as Organization, Person, Date, Location, Percent, Money). In KIM, the NE type is specified by reference to an ontology.
  • The semantic descriptions of entities and the relations between them are kept in a knowledge base (KB) encoded in the KIM ontology and residing in the same semantic repository.
  • Thus, for each entity reference in the text, KIM provides (i) a link (URI) to the most specific class in the ontology and (ii) a link to the specific instance in the KB.
  • Each extracted NE is linked to its specific type information (thus Arabian Sea would be identified as Sea, instead of the traditional Location).

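The difference between a flat type set and an ontology-based type can be sketched with a small class hierarchy. The class names below (including `WaterRegion`) and the `ex:` URI are hypothetical and chosen only to reproduce the Arabian Sea example; they are not claimed to be the actual class names of the KIM ontology.

```python
# Hypothetical fragment of a class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Location": "Entity",
    "WaterRegion": "Location",
    "Sea": "WaterRegion",
    "Organization": "Entity",
}

# Hypothetical instance data: instance URI -> most specific class.
INSTANCE_CLASS = {
    "ex:ArabianSea": "Sea",
}


def superclasses(cls):
    """Return the class itself followed by all of its ancestors."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def annotate_entity(instance_uri):
    """Build the two links described above: class link and instance link."""
    return {"class": INSTANCE_CLASS[instance_uri], "instance": instance_uri}


def is_a(instance_uri, cls):
    """A specific type still answers queries for its more general types."""
    return cls in superclasses(INSTANCE_CLASS[instance_uri])
```

Because `Sea` sits below `Location` in the hierarchy, annotating with the most specific class loses nothing: a query for all locations still finds the Arabian Sea.
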
MULTIMEDIA ANNOTATION

Multimedia Annotation
Different levels of annotations:
  • Metadata
      • Often technical metadata
  • Content level
      • Semantic annotations
      • Keywords, domain ontologies, free-text
  • Multimedia level
      • Low-level annotations
      • Visual descriptors, such as dominant color

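A low-level visual descriptor such as dominant color can be approximated with a simple color histogram. This is a minimal sketch, assuming the image is already available as a list of RGB tuples; it is a plain histogram, not the standardized descriptor used by any particular multimedia framework, and the function name and `levels` parameter are choices made for this example.

```python
from collections import Counter


def dominant_color(pixels, levels=4):
    """Return the most frequent quantized RGB color of an image.

    `pixels` is a non-empty iterable of (r, g, b) tuples with values
    in 0..255. Each channel is reduced to `levels` bins so that similar
    shades are counted together; the bin centre is reported as the color.
    """
    step = 256 // levels
    counts = Counter(
        (r // step, g // step, b // step) for r, g, b in pixels
    )
    (rb, gb, bb), _ = counts.most_common(1)[0]
    half = step // 2
    return (rb * step + half, gb * step + half, bb * step + half)


# A tiny "image": mostly reddish pixels with a few blue ones.
image = [(250, 10, 10), (240, 20, 5), (230, 0, 30), (10, 10, 200), (0, 0, 255)]
print(dominant_color(image))
```

Such a descriptor is computed from pixel values alone, which is why it belongs to the multimedia level rather than the content level: it says nothing about what the image depicts.
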
Metadata
Refers to information about technical details:
  • Creation details
      • creator, creationDate, …
  • Camera details
      • settings
      • resolution
      • format
      • EXIF
  • Access rights
      • administrated by the OS
      • owner, access rights, …

Content Level
  • Describes what is depicted and directly perceivable by a human
  • Usually provided manually
      • keywords/tags
      • classification of content
  • Seldom generated automatically
      • scene classification
      • object detection
  • Different types of annotations
      • global vs. local
      • different semantic levels

Global vs. Local Annotations
  • Global annotations are the most widely used
      • flickr: tagging is only global
      • organization within categories
      • free-text annotations
      • provide information about the content as a whole
      • no detailed information
  • Local annotations are less supported
      • e.g. flickr and PhotoStuff allow annotations of regions
      • especially important for semantic image understanding
          • allow relations to be extracted
          • provide a more complete view of the scene
      • provide information about different regions, and about the depicted relations and arrangements of objects

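The contrast between global and local annotations can be sketched as a data structure in which regions carry their own labels and bounding boxes. The class names, the `leftOf` relation, and the sample photo are assumptions for illustration and are not the data model of flickr or PhotoStuff.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    label: str
    x: int       # left edge of the bounding box, in pixels
    y: int       # top edge
    width: int
    height: int


@dataclass
class ImageAnnotation:
    global_tags: list = field(default_factory=list)   # describe the whole image
    regions: list = field(default_factory=list)       # local annotations


def left_of(a, b):
    """True if region a lies entirely to the left of region b."""
    return a.x + a.width <= b.x


def spatial_relations(annotation):
    """Derive (label, 'leftOf', label) triples from the local annotations."""
    return [
        (a.label, "leftOf", b.label)
        for a in annotation.regions
        for b in annotation.regions
        if a is not b and left_of(a, b)
    ]


photo = ImageAnnotation(
    global_tags=["beach", "holiday"],
    regions=[Region("person", 10, 40, 50, 120), Region("boat", 200, 60, 150, 80)],
)
print(spatial_relations(photo))
```

The global tags say only that the photo is about a beach holiday; the relation between the person and the boat can be derived only because the local annotations record where each object is.
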
Semantic Levels
  • Free-text annotations cover large aspects, but are less appropriate for sharing, organization and retrieval
      • Probably the most natural form for humans, but provide the least formal semantics
  • Tagging provides light-weight semantics
      • Only useful if a fixed vocabulary is used
      • Allows some simple inference of related concepts by tag analysis (clustering)
      • No formal semantics, but provides benefits due to the fixed vocabulary
      • Requires more effort from the user
  • Ontologies
      • Provide syntax and semantics to define complex domain vocabularies
      • Allow for the inference of additional knowledge
      • Leverage interoperability
      • Powerful way of semantic annotation, but hardly comprehensible by "normal users"

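The "simple inference of related concepts by tag analysis" mentioned for tagging can be sketched with co-occurrence counts. This is one possible reading of that bullet, not a description of how any tagging site works; the tag sets and the `min_count` threshold are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag sets, one per photo.
photos = [
    {"beach", "sea", "sunset"},
    {"beach", "sea", "holiday"},
    {"mountain", "snow"},
    {"sea", "boat", "beach"},
]

# Count how often each unordered pair of tags appears on the same photo.
pair_counts = Counter()
for tags in photos:
    for a, b in combinations(sorted(tags), 2):
        pair_counts[(a, b)] += 1


def related_tags(tag, min_count=2):
    """Tags that co-occur with `tag` on at least `min_count` photos."""
    related = []
    for (a, b), n in pair_counts.items():
        if n >= min_count and tag in (a, b):
            related.append(b if a == tag else a)
    return sorted(related)


print(related_tags("beach"))
```

The counts only relate tags that are spelled identically, which illustrates why tagging is most useful with a fixed vocabulary.
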
Tools
Web-based tools:
  • flickr
  • riya

flickr
  • Web 2.0 application
  • Tagging photos globally
  • Add comments to image regions marked by a bounding box
  • Large user community, and tagging allows for easy sharing of images
  • Partly fixed vocabularies have evolved
      • e.g. geo-tagging

riya
  • Similar to flickr in functionality
  • Adds automatic annotation features
  • Face recognition
      • Mark faces in photos
      • Associate a name
      • Train the system
      • Automatic recognition of the person in the future

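The mark, name, train, recognize loop listed above can be sketched with a toy nearest-neighbour recognizer. The slides do not say how riya recognizes faces, so everything here is an assumption: the `FaceTagger` class, the three-number feature vectors (standing in for whatever a real face-feature extractor would produce), and the distance threshold are all hypothetical.

```python
import math


class FaceTagger:
    """Toy nearest-neighbour recognizer over precomputed feature vectors."""

    def __init__(self, max_distance=0.5):
        self.examples = []              # list of (feature vector, name)
        self.max_distance = max_distance

    def train(self, features, name):
        """Store a face marked by the user together with the given name."""
        self.examples.append((tuple(features), name))

    def recognize(self, features):
        """Return the name of the closest stored face, or None if too far."""
        if not self.examples:
            return None
        query = tuple(features)
        stored, name = min(self.examples, key=lambda ex: math.dist(ex[0], query))
        if math.dist(stored, query) <= self.max_distance:
            return name
        return None


tagger = FaceTagger()
tagger.train([0.1, 0.9, 0.3], "Alice")
tagger.train([0.8, 0.2, 0.5], "Bob")
print(tagger.recognize([0.12, 0.88, 0.3]))
```

Returning `None` for faces that are far from every stored example keeps the system semi-automatic: unknown faces fall back to manual naming, which in turn adds training data.
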
References
Further reading:
  • B. Popov, A. Kiryakov, A. Kirilov, D. Manov, D. Ognyanoff, M. Goranov: "KIM – Semantic Annotation Platform", 2003.
  • GATE: http://gate.ac.uk/overview.html
  • M-OntoMat-Annotizer: http://www.acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html
  • KIM platform: http://www.ontotext.com/kim/
  • ALIPR: http://www.alipr.com
Wikipedia links:
  • http://en.wikipedia.org/wiki/Automatic_image_annotation
  • http://en.wikipedia.org/wiki/Games_with_a_purpose
  • http://en.wikipedia.org/wiki/General_Architecture_for_Text_Engineering

Editor's Notes

  • #15 KIM provides a Knowledge and Information Management (KIM) infrastructure and services for automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content. Within the annotation process, KIM also performs ontology population. As a baseline, KIM analyzes texts and recognizes references to entities (such as persons, organizations, locations, and dates). It then tries to match each reference with a known entity that has a unique URI and description in the knowledge base; if no match is found, a new URI and entity description are generated automatically. Finally, the reference in the document is annotated with the URI of the entity. This process, together with its result, is what KIM offers as semantic annotation. The resulting metadata is later used for semantic indexing, retrieval, visualization, and automatic hyperlinking of documents. KIM is a platform that offers a server, a web user interface, and an Internet Explorer plug-in. It is equipped with an upper-level ontology (KIMO) of about 250 classes and 100 properties, and is bundled with a knowledge base (KIM KB) pre-populated with up to 200 000 entity descriptions. In terms of underlying technology, KIM uses GATE, Sesame, and Lucene.
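The match-or-create step in the note above can be sketched as follows. This is a minimal sketch and not KIM's API: the `KNOWN` dictionary, the `ex:` URIs, the `resolve` and `annotate` functions, and the sample sentence are all hypothetical, and the list of recognized references is passed in by hand because entity recognition itself is out of scope here.

```python
import itertools

# Toy knowledge base: alias -> entity URI (illustrative URIs, not KIM data).
KNOWN = {
    "Sofia": "ex:Sofia",
}

_new_ids = itertools.count(1)


def resolve(reference):
    """Return the URI of a known entity, or mint a new one and remember it."""
    uri = KNOWN.get(reference)
    if uri is None:
        uri = f"ex:new/{next(_new_ids)}"
        KNOWN[reference] = uri      # ontology population: the KB grows
    return uri


def annotate(text, references):
    """Annotate each recognized reference with (start, end, URI).

    `references` stands in for the output of a named-entity recognizer.
    Only the first occurrence of each reference is annotated.
    """
    result = []
    for ref in references:
        start = text.find(ref)
        if start != -1:
            result.append((start, start + len(ref), resolve(ref)))
    return result


doc = "Acme opened an office in Sofia."
annotations = annotate(doc, ["Acme", "Sofia"])
print(annotations)
```

Because `resolve` stores every newly minted URI, a second mention of the same unknown name is linked to the same entity instead of creating a duplicate.
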