Knowledge Extraction Semantic Web

    Knowledge Extraction Semantic Web - Presentation Transcript

    1. Knowledge Extraction/Semantic Web. Language Technology I, 2005/06. Paul Buitelaar, German Research Center for Artificial Intelligence (DFKI)
    2. Overview
      • Semantic Web
        • Introduction
        • Semantic Web Representation and Query Languages
        • Semantic Web Tools
      • Ontologies and Knowledge Markup
        • Ontologies and other Knowledge Organization Systems
        • Knowledge Markup for Ontology Population
        • Ontology Life-Cycle
      • Knowledge Extraction
        • Ontology Population
        • Ontology Learning
      • Semantic Web
    3. Web: Docs, Data
    4. Web > Semantic Web: Docs, Data; Knowledge Markup
    5. Web > Semantic Web: Docs, Data; Knowledge Markup; Ontologies
    6. Web > Semantic Web: Knowledge Markup; Ontologies
    7. Accessing the Semantic Web - Machines: Knowledge Markup; Ontologies; Semantic Web Services
    8. Accessing the Semantic Web - Humans: Knowledge Markup; Ontologies; Semantic Web Services; Intelligent Man-Machine Interface
    9. Semantic Web Layer cake
      • Introduced by Tim Berners-Lee in 2001
      • Built upon existing WWW standards
    10. Resource Description Framework (RDF)
      • RDF is an extensible language for expressing graph-structures
      • Serializes to XML
      Graph: node1 has name "DFKI GmbH", location "Kaiserslautern", and www http://www.dfki.de
      <?xml version='1.0' ?>
      <rdf:RDF xmlns:rdf="… rdf-syntax-ns#" xmlns:rdfs="… rdf-schema#" xmlns="http://example.org">
        <rdf:Description rdf:nodeID="node1">
          <name>DFKI GmbH</name>
          <location>Kaiserslautern</location>
          <www rdf:resource="http://www.dfki.de" />
        </rdf:Description>
      </rdf:RDF>
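      The graph reading of the RDF/XML on this slide can be illustrated with a minimal Python sketch that turns each property element into a (subject, predicate, object) triple. It uses only the standard library rather than a dedicated RDF toolkit. The full RDF namespace URI below is the standard W3C one, which the slide abbreviates, and the function name `extract_triples` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Standard W3C RDF namespace (abbreviated on the slide).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version='1.0'?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) triples for simple rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # The subject is either a blank node ID or a resource URI.
        subject = desc.get(f"{{{RDF_NS}}}nodeID") or desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            predicate = prop.tag.split("}", 1)[-1]  # drop the namespace part
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # A property points either to another resource or to a literal value.
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

      This handles only the flat pattern shown on the slide; nested descriptions, typed nodes and datatypes are out of scope for the sketch.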
    11. RDF Schema (RDFS)
      • Adds a vocabulary for representing classes and properties to RDF
      Example graph: Teacher is-a Person; Student is-a Person; properties name (with value type rdf:Literal), teaches and enrolledIn; further class Course
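      What the class vocabulary buys in practice is inheritance: anything asserted to be a Teacher can also be treated as a Person. The following Python sketch illustrates that inference for the is-a edges on this slide. It assumes single inheritance, and the names `SUBCLASS_OF`, `superclasses` and `infer_types` are illustrative, not part of any RDFS toolkit.

```python
# is-a edges from the slide: Teacher and Student are subclasses of Person.
SUBCLASS_OF = {"Teacher": "Person", "Student": "Person"}


def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return all ancestors of cls, nearest first (single inheritance assumed)."""
    ancestors = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        ancestors.append(cls)
    return ancestors


def infer_types(asserted_type, subclass_of=SUBCLASS_OF):
    """Return the asserted type followed by every type implied by the hierarchy."""
    return [asserted_type] + superclasses(asserted_type, subclass_of)


print(infer_types("Teacher"))
```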
    12. Web Ontology Language (OWL)
      • OWL - Based on Description Logics
      • Adds further modelling vocabulary on top of RDFS
      • XML Schema: Data Types
      • Namespaces: Interpretation Context
      • RDF Schema formalizes: Classes (Inheritance), Properties
      • OWL formalizes: Classes, Class Definitions, Properties, Property Types (e.g. Transitivity)
      • Remaining diagram labels: XML, RDF, Syntax, Semantics
    13. Semantic Web Query Languages - SPARQL
      • SPARQL - query language developed by W3C
      • Syntactically based on SQL
      • Results available as XML Documents
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?foafName
      WHERE {
        ?x foaf:name ?foafName .
        OPTIONAL { ?x foaf:mbox ?mbox } .
      }
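      To make the role of OPTIONAL in the query concrete, here is a small Python sketch that evaluates the same pattern over an in-memory list of triples. It is not a SPARQL engine; the sample data (Alice, Bob) and the function name `select_names` are invented for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample data: Bob has a name but no mailbox.
TRIPLES = [
    ("_:a", FOAF + "name", "Alice"),
    ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:b", FOAF + "name", "Bob"),
]


def select_names(triples):
    """Mimic: SELECT ?foafName WHERE { ?x foaf:name ?foafName . OPTIONAL { ?x foaf:mbox ?mbox } }"""
    rows = []
    for subject, predicate, name in triples:
        if predicate != FOAF + "name":
            continue
        mboxes = [o for s, p, o in triples if s == subject and p == FOAF + "mbox"]
        # OPTIONAL: the solution is kept even when no mbox matches (mbox stays unbound).
        for mbox in (mboxes or [None]):
            rows.append({"foafName": name, "mbox": mbox})
    # Projection onto ?foafName, as in the SELECT clause.
    return [row["foafName"] for row in rows]


print(select_names(TRIPLES))
```

      Without the OPTIONAL handling, Bob would drop out of the result because he has no foaf:mbox triple.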
    14. Semantic Web Tools
      • Programming APIs
        • Jena - Java
        • Redland – Python, …
        • RAP - PHP
      • Editors
        • Protégé
        • OntoStudio
        • Triple20 - Prolog
      • Storage
        • Sesame
        • OntoBroker
      • Ontologies and Knowledge Markup
    15. Ontologies in Philosophy
      • Ontology is a branch of philosophy that deals with the nature and the organization of reality
      • Science of Being (Aristotle, Metaphysics)
        • What characterizes being?
        • Eventually, what is being?
    16. Ontologies in Computer Science
      • Ontology refers to an engineering artifact
        • a specific vocabulary used to describe a certain reality
        • a set of explicit assumptions regarding the intended meaning of the vocabulary
      • An Ontology is
        • an explicit specification of a conceptualization [Gruber 93]
        • a shared understanding of a domain of interest [Uschold/Gruninger 96]
    17. Why Develop an Ontology?
      • Make domain assumptions explicit
        • Easier to change domain assumptions
        • Easier to understand and update legacy data
      • Separate domain knowledge from operational knowledge
        • Re-use domain and operational knowledge separately
      • A community reference for applications
      • Shared understanding of what information means
    18. Types of Ontologies [Guarino, 98]
      • Top-level ontologies: Describe very general concepts like space, time, event, which are independent of a particular problem or domain. It seems reasonable to have unified top-level ontologies for large communities of users.
      • Domain ontologies: Describe the vocabulary related to a generic domain by specializing the concepts introduced in the top-level ontology.
      • Task ontologies: Describe the vocabulary related to a generic task or activity by specializing the top-level ontologies.
      • Application ontologies: These are the most specific ontologies. Concepts in application ontologies often correspond to roles played by domain entities while performing a certain activity.
    19. Ontologies and Their Relatives
      • Spectrum of increasing formality: Catalog / ID; Terms / Glossary; Thesauri; Informal Is-a; Formal Is-a; Formal Instance; Frames; Value Restrictions; General logical constraints
      • Axioms: Disjoint, Inverse Relations, ...
    20. Knowledge Organization Systems
      • Semantic Lexicons – e.g. WordNet
        • … group together words according to lexical semantic relations like synonymy, hyponymy, meronymy, antonymy, etc.
      • Thesauri
        • … group together domain terms according to a set of taxonomic relations, including broader term, narrower term, sibling, etc.
      • Semantic Networks and Ontologies
        • … group together classes of objects according to a set of relations that originate in the nature of the domain of application.
        • Ontologies are defined by a formal semantics, but semantic networks may be informally defined. Therefore all ontologies are semantic networks, but not all semantic networks are ontologies.
    21. Thesauri - Examples
      • MeSH (Medical Subject Headings) is organized by terms (currently over 250,000) that correspond to a specific medical subject. For each such term a list of syntactic, morphological or semantic variants is given.
        • MeSH Heading: Databases, Genetic
        • Entry Terms: Genetic Databases; Genetic Sequence Databases; OMIM; Online Mendelian Inheritance in Man; Genetic Data Banks; Genetic Data Bases; Genetic Databanks; Genetic Information Databases
        • See Also: Genetic Screening
      • EuroVoc covers terminology in all of the official EU languages for all fields that concern the EU institutions, e.g., politics, trade, law, science, energy, agriculture, 27 such fields in total.
        • MT 3606 natural and applied sciences
        • UF: gene pool; genetic resource; genetic stock; genotype; heredity
        • BT1 biology
        • BT2 life sciences
        • NT1 DNA
        • NT1 eugenics
        • RT genetic engineering (6411)
    22. Semantic Networks - Examples
      • UMLS (Unified Medical Language System) integrates linguistic, terminological and semantic information. The Semantic Network consists of 134 semantic types and 54 relations between types.
        • Pharmacologic Substance affects Pathologic Function
        • Pharmacologic Substance causes Pathologic Function
        • Pharmacologic Substance complicates Pathologic Function
        • Pharmacologic Substance diagnoses Pathologic Function
        • Pharmacologic Substance prevents Pathologic Function
        • Pharmacologic Substance treats Pathologic Function
      • GO (Gene Ontology) allows for “consistent descriptions of gene products in different databases, including several of the world’s major repositories for plant, animal and microbial genomes…” Organizing principles are molecular function, biological process and cellular component.
        • Accession: GO:0009292
        • Ontology: biological process
        • Synonyms: broad: genetic exchange
        • Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.
        • Term Lineage: all : all (164142) > GO:0008150 : biological process (115947) > GO:0007275 : development (11892) > GO:0009292 : genetic transfer (69)
    23. Example Ontology
      • Consider an Example Ontology for the Newspaper Domain
    24. Knowledge Markup
      • Ontologies are used to semantically organize and retrieve data (structured, textual, multimedia) through knowledge markup
        • Consider the following example:
      • Knowledge Markup from Text is based on Named-Entity Recognition, Semantic Tagging (Term to Class Mapping) and Relation Extraction
      <news:story xmlns:jobs="http://www.jobs.org/owl-jobs#" xmlns:com="http://www.companies.org/owl-companies#" xmlns:it="http://www.it.net/owl-it#">
        “We were surprised by several of the results, particularly the order of finish,” said <jobs:SystemsAnalyst>Dan Olds</jobs:SystemsAnalyst>. <com:Company>IBM</com:Company> finished first with very strong results, and <com:Company>HP</com:Company> scored a solid number two; we expected to see <com:Company>Sun Microsystems</com:Company> challenging for first place or at least a strong second place. As the largest <it:operatingsystem>UNIX</it:operatingsystem> vendor in terms of number of installed systems, a third place finish should put their management on notice that their installed base may be vulnerable.
      </news:story>
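      Once text carries knowledge markup like the example on this slide, retrieving the annotated entities is straightforward. The Python sketch below pulls out (prefix, class, text) triples with a regular expression; it assumes flat, non-nested annotations of the form <prefix:Class>text</prefix:Class>, and the names `ANNOTATION`, `extract_annotations` and `SAMPLE` are illustrative.

```python
import re

# Matches <prefix:Class> text </prefix:Class>; the backreferences force matching tags.
ANNOTATION = re.compile(r"<(\w+):(\w+)>\s*(.*?)\s*</\1:\2>", re.DOTALL)


def extract_annotations(marked_up_text):
    """Return (ontology prefix, class name, annotated text) for each annotation."""
    return [match.groups() for match in ANNOTATION.finditer(marked_up_text)]


# Shortened excerpt of the news story from the slide.
SAMPLE = (
    "said <jobs:SystemsAnalyst> Dan Olds </jobs:SystemsAnalyst>. "
    "<com:Company> IBM </com:Company> finished first, and "
    "<com:Company> HP </com:Company> scored a solid number two."
)

for prefix, cls, text in extract_annotations(SAMPLE):
    print(f"{text} is a {prefix}:{cls}")
```

      For real documents an XML parser would be the more robust choice; the regular expression only keeps the sketch short.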
    25. Knowledge Markup - Images
      • Semantic Annotation of Medical Images (miAKT Project - UK)
    26. Knowledge Markup - Images
      • Semantic Annotation of Video (SmartMedia – DFKI KM)
    27. Ontology Life-Cycle: Create/Select (Development and/or Selection); Populate (Knowledge Base Generation); Validate (Consistency Checks); Evolve (Extension, Modification); Maintain (Usability Tests); Deploy (Knowledge Retrieval)
      • Knowledge Extraction
      • Ontology Population & Ontology Learning
    28. Ontology Life-Cycle – Ontology Population: Create/Select (Development and/or Selection); Populate (Knowledge Base Generation); Validate (Consistency Checks); Evolve (Extension, Modification); Maintain (Usability Tests); Deploy (Knowledge Retrieval)
    29. Ontology Population with SOBA
      • SOBA: SmartWeb Ontology-based Annotation
      • Application Context
        • SmartWeb (http://www.smartweb-projekt.de/) – German Project around World-Cup 2006
        • Integrates
          • Multimodal Dialog Processing
          • IR-based Question Answering
          • Ontology-Based Information Extraction
          • Semantic Web Services
      • Ontology-Based Information Extraction …
        • Combines:
          • Semantic Wrapping of Semi-Structured Data
          • Semantic and Linguistic Annotation of Free Text
          • Inference Rules for Instantiation and Integration of Annotated Entities and Events
      • … and Display
        • Ontology-driven Hyperlink Generation for Display of Extracted Information
    30. SOBA – Processing and Data Flow
      • Diagram components: Documents; Ontologies; Wrapping of Semi-Structured Data; PDF Analysis; Image Extraction; Linguistic Annotation; Named Entity Recognition & Semantic Tagging; Inference Rules for Instantiation & Integration; Knowledge Base
    31. SWIntO: SmartWeb Integrated Ontology
      • Class hierarchy excerpt: SmartDOLCE:Entity; SmartSUMO:Attribute; SmartSUMO:SocialRole; SmartSUMO:Proposition; SportEvent:FootballPlayer; SportEvent:Goalkeeper; SportEvent:FootballOrganizationPerson; SportEvent:FootballClubPresident; …
      • SWIntO (by AIFB, DFKI KM/IUI, EML) covers
        • Foundational (DOLCE) and General (SUMO) Knowledge
        • Domain- and Task-Specific Knowledge
          • Football / Sport Events
          • Navigation, Discourse, Multimedia
          • other
    32. SmartWeb Integrated Ontology (by AIFB, DFKI KM/IUI, EML)
    33.  
    34. SmartWeb Corpus
      • (Growing) Web Corpus through Monitor on
        • http://fifaworldcup.yahoo.com/
        • http://www.uefa.com/competitions/worldcup
      • Semi-Structured Data
        • Tabular: Match Reports, Teams, etc.
      • Free Text
        • Match Reports
        • Image Captions
    35. Semi-Structured Data - HTML
    36. Semi-Structured Data - XML
    37. Semi-Structured Data – F-Logic
    38. Information Extraction from Free Text
      • Extracted: MatchEvent [Score, Team1, Team2]; FootballPlayer
    39. Information Extraction from Image Captions
      • Extracted: FoulEvent [FootballPlayer]; FootballPlayer
    40. Linguistic and Semantic Annotation
      • Input: Mark Crossley saved twice with his legs from Huckerby.
      • Named Entity Recognition & Semantic Tagging: [Mark Crossley GOALKEEPER] [saved GOALKEEPER_ACTION] twice with his legs from [Huckerby PLAYER].
      • Linguistic Annotation: [Mark Crossley GOALKEEPER : SUBJ] [saved PRED : GOALKEEPER_ACTION] twice [with his legs PP_OBJ] [from [Huckerby PLAYER] PP_ADJUNCT].
      • Result: [GOALKEEPER_ACTION = 'save', GOALKEEPER = 'Mark Crossley', PLAYER = 'Huckerby', MANNER = 'legs']
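      The semantic tagging step (term to class mapping) and the final slot template can be approximated with a lexicon lookup. The Python sketch below is a simplification under stated assumptions: the lexicon and lemma table are hand-written for this one sentence, the function names `tag` and `to_template` are invented, and the MANNER slot is left out because filling it needs the linguistic (dependency) annotation rather than a lookup.

```python
# Hypothetical lexicon mapping surface forms to ontology classes.
LEXICON = {
    "Mark Crossley": "GOALKEEPER",
    "Huckerby": "PLAYER",
    "saved": "GOALKEEPER_ACTION",
}

# Hypothetical lemma table used to normalise verb forms in the template.
LEMMAS = {"saved": "save"}


def tag(sentence, lexicon=LEXICON):
    """Return (surface form, class) pairs for lexicon entries, in text order."""
    found = []
    for term, cls in lexicon.items():
        start = sentence.find(term)
        if start != -1:
            found.append((start, term, cls))
    return [(term, cls) for _, term, cls in sorted(found)]


def to_template(tags, lemmas=LEMMAS):
    """Turn tagged spans into a slot template keyed by class name."""
    return {cls: lemmas.get(term, term) for term, cls in tags}


sentence = "Mark Crossley saved twice with his legs from Huckerby."
print(tag(sentence))
print(to_template(tag(sentence)))
```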
    41. Annotation/Extraction Example
      • Example Sentence from Match Report
        • Allerdings ist Petrow fuer die Partie gegen Schweden gesperrt und kann erst gegen Ungarn eingesetzt werden.
        • “However, Petrow is suspended for the match against Sweden and cannot be fielded again until the match against Hungary.”
      • Annotated/Extracted Information (with SProUT IE Tool - DFKI-LT)
        player_action & [GAME_EVENT "Ban",
                         AGENT player & [SURNAME "PETROW"],
                         IN_MATCH game & [TEAM2 "SWE", TOURNAMENT "Match"]]
        team & [NAME "HUN"]
    42. Knowledge Base Generation
      Transformation of SProUT Output to F-Logic via Declarative Mappings, e.g.:
      <type orig="player" target="dolce#natural-person-denomination">
        <link type="dolce#natural-person" method="dolce#HAS-DENOMINATION" id=""/>
        <map>
          <simple-mapping>
            <input>
              <arg orig="GIVEN_NAME" target="VAR1"/>
            </input>
            <output method="dolce#FIRSTNAME" value="VAR1"/>
          </simple-mapping>
          <simple-mapping>
            <input>
              <arg orig="SURNAME" target="VAR1"/>
            </input>
            <output method="dolce#LASTNAME" value="VAR1"/>
          </simple-mapping>
        </map>
      </type>
    43. SProUT to F-Logic
      SProUT output:
      <FS type="player_action">
        <F name="GAME_EVENT"><FS type="world champion"/></F>
        <F name="ACTION_TIME"><FS type="1990"/></F>
        <F name="ACTION_LOCATION"><FS type="Italy"/></F>
        <F name="AGENT">
          <FS type="player">
            <F name="SURNAME"><FS type="Buchwald"/></F>
            <F name="GIVEN_NAME"><FS type="Guido"/></F>
          </FS>
        </F>
      </FS>
      F-Logic:
      soba#player124:sportevent#FootballPlayer [sportevent#impersonatedBy -> soba#Guido_BUCHWALD].
      soba#Guido_BUCHWALD:dolce#"natural-person" [dolce#"HAS-DENOMINATION" -> soba#Guido_BUCHWALD_Denomination].
      soba#Guido_BUCHWALD_Denomination:dolce#"natural-person-denomination" [dolce#LASTNAME -> "Buchwald"; dolce#FIRSTNAME -> "Guido"].
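      The transformation from a player feature structure to F-Logic statements can be sketched as a table-driven string generation step. The Python below is only an illustration of the idea of a declarative mapping: `NAME_MAPPING` mirrors the two simple mappings shown on the previous slide, while the function name `to_flogic`, the dictionary input format and the identifier scheme are assumptions and not the actual SOBA implementation.

```python
# Source feature -> target F-Logic method, mirroring the declarative mappings.
NAME_MAPPING = {"SURNAME": "dolce#LASTNAME", "GIVEN_NAME": "dolce#FIRSTNAME"}


def to_flogic(player):
    """Render a player feature structure (a dict) as F-Logic-style statements."""
    # Assumed identifier scheme: given name plus upper-cased surname.
    person_id = f'soba#{player["GIVEN_NAME"]}_{player["SURNAME"].upper()}'
    denomination_id = f"{person_id}_Denomination"
    slots = "; ".join(
        f'{target} -> "{player[source]}"' for source, target in NAME_MAPPING.items()
    )
    return [
        f'{person_id}:dolce#"natural-person" '
        f'[dolce#"HAS-DENOMINATION" -> {denomination_id}].',
        f'{denomination_id}:dolce#"natural-person-denomination" [{slots}].',
    ]


for statement in to_flogic({"SURNAME": "Buchwald", "GIVEN_NAME": "Guido"}):
    print(statement)
```

      Keeping the feature-to-method table separate from the rendering code is what makes the mapping declarative: supporting a new feature means adding a table entry, not changing the function.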
    44. A Complex Example
      semistruct#"Bolivien_vs_Brasilien_09_Oct_05_16_00_Luis_CRISTALDO":sportevent#FieldMatchFootballPlayer [
        externalRepresentation@(de) ->> "Luis CRISTALDO (7)";
        sportevent#number -> 7;
        sportevent#impersonatedBy -> semistruct#"Luis_CRISTALDO" ].
      semistruct#"Bolivien_vs_Brasilien_09_Oct_05_16_00" [
        sportevent#matchEvents -> soba#ID25 ].
      soba#ID25:sportevent#Foul [
        sportevent#commitedBy -> semistruct#"Bolivien_vs_Brasilien_09_Oct_05_Luis_CRISTALDO" ].
      mediainst#ID67:media#Picture [
        media#URL -> "http://fifaworldcup.yahoo.com/06/de/photos/index.html?aid=124155&d=1";
        media#shows -> ID25 ].
    45. Display of Extracted Information
    46. Ontology Life-Cycle – Ontology Learning: Create/Select (Development and/or Selection); Populate (Knowledge Base Generation); Validate (Consistency Checks); Evolve (Extension, Modification); Maintain (Usability Tests); Deploy (Knowledge Retrieval)
    47. Ontology Learning Layer Cake
      • Terms: disease, doctor, hospital
      • (Multilingual) Synonyms: {disease, illness, Krankheit}
      • Concepts: DISEASE:=<Int, Ext, Lex>
      • Taxonomy: is_a(DOCTOR, PERSON)
      • Relations: cure(dom:DOCTOR, range:DISEASE)
      • Rules & Axioms
      • Introduced in: Philipp Cimiano, PhD Thesis, University of Karlsruhe, forthcoming
    48. Some Current Work on Ontology Learning from Text
      • Term Extraction
          • Statistical Analysis
          • Patterns
          • (Shallow) Linguistic Parsing
          • Term Disambiguation & Compositional Interpretation
          • Combinations
      • Taxonomy Extraction
          • Statistical Analysis & Clustering (e.g. FCA)
          • Patterns
          • (Shallow) Linguistic Parsing
          • WordNet
          • Combinations
      • Relation Extraction
          • Anonymous Relations (e.g. with Association Rules)
          • Named Relations (Linguistic Parsing)
          • (Linguistic) Compound Analysis
          • Web Mining, Social Network Analysis
          • Combinations
      • Definition Extraction
          • (Linguistic) Compound Analysis (incl. WordNet)
      Overview of Current Work: Paul Buitelaar, Philipp Cimiano, Bernardo Magnini. Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July 2005.
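      As one concrete instance of the "Patterns" approach to taxonomy extraction listed above, the Python sketch below applies a single lexico-syntactic pattern ("X such as A, B and C") to propose is_a candidates. The pattern, the function name `extract_is_a` and the example sentence are illustrative choices, not taken from the slides; real systems combine many patterns with lemmatization and noun-phrase chunking.

```python
import re

# "<hypernym> such as <hyponym>(, <hyponym>)* ((and|or) <hyponym>)?"
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:(?:, | and | or )\w+)*)")


def extract_is_a(text):
    """Return (hyponym, hypernym) candidate pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in re.split(r", | and | or ", match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


text = "diseases such as influenza, malaria and measles are treated in hospitals"
for hyponym, hypernym in extract_is_a(text):
    print(f"is_a({hyponym}, {hypernym})")
```

      The sketch matches single-word terms only and does not normalise plurals, so "diseases" is reported as written.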
    49. RelExt - Relation Extraction for Ontology Learning
      • Ontology Learning Layer Cake: Terms (disease, doctor, hospital); (Multilingual) Synonyms ({disease, illness, Krankheit}); Concepts (DISEASE:=<Int, Ext, Lex>); Taxonomy (is_a(DOCTOR, PERSON)); Relations (cure(dom:DOCTOR, range:DISEASE)); Rules & Axioms
    50. RelExt - Motivation
      • Extend Ontology with Relations
          • Currently ~ 60 Relations in the Sport Events Ontology
            • Mostly Properties, e.g. hasName, atMinute, …
          • Representation of (Verbal) Relations Enables Better Modeling of Events for Information Extraction Purposes
      • Example
          • “Ballack shoots the ball in the net.”
          • Relation: Shoot (Domain: FootballPlayer, Range: BallObject)
    51. RelExt – System Architecture
      • Linguistic Annotation: Corpus > Named-Entity Rec. & Semantic Tagging, Shallow Parsing > Annotated Corpus
      • Statistical Processing
        • Relevance Measure (Frequencies in BNC, NZZ) > Relevance Scores (Heads, Preds)
        • Co-occurrence Measure > Co-occurrence Scores (Heads <> Preds)
      • Triple Generation > Triples (Head : Pred : Head)
      • Relation Extraction and Evaluation
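      The relevance step compares how often a head noun or predicate occurs in the domain corpus with how often it occurs in a general reference corpus (the slide names BNC and NZZ). The slides do not state the exact measure, so the Python sketch below uses a simple smoothed ratio of relative frequencies as a stand-in; the function name `relevance_scores` and the toy token lists are assumptions.

```python
from collections import Counter


def relevance_scores(domain_tokens, reference_tokens):
    """Score each domain term by relative frequency in domain vs. reference corpus."""
    domain = Counter(domain_tokens)
    reference = Counter(reference_tokens)
    n_domain, n_reference = len(domain_tokens), len(reference_tokens)
    scores = {}
    for term, count in domain.items():
        relative_domain = count / n_domain
        # Add-one smoothing so terms unseen in the reference corpus do not divide by zero.
        relative_reference = (reference[term] + 1) / (n_reference + 1)
        scores[term] = relative_domain / relative_reference
    return scores


# Toy corpora for illustration only.
domain_tokens = ["ball", "ball", "the", "player"]
reference_tokens = ["the", "the", "the", "house"]
scores = relevance_scores(domain_tokens, reference_tokens)
for term in sorted(scores, key=scores.get, reverse=True):
    print(term, round(scores[term], 3))
```

      Terms that are frequent in the domain but rare in general language rise to the top, while function words such as "the" are pushed down.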
    52. Linguistic Annotation
      • Named-Entity Recognition
          • “Michael Ballack”: FootballPlayer
      • Semantic Tagging
          • “Ball” (ball), “Leder” (leather): BallObject
      • Shallow Parsing
        • Part-of-Speech Tagging
          • Fussballspieler (soccer player): Noun
        • Morphological Analysis
          • Fussballspieler: Fussball – Spieler
        • Dependency Structure Analysis
          • “The team won the second match.”
          • [The team SUBJECT] [won PREDICATE] [the second match DIRECT_OBJECT]
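      The morphological analysis of German compounds (Fussballspieler: Fussball – Spieler) can be illustrated with a naive lexicon-based splitter. This Python sketch tries every binary split and accepts the first one whose two parts are both known words; the function name `split_compound` and the tiny lexicon are assumptions, and real analysers also handle linking elements and compounds with more than two parts.

```python
def split_compound(word, lexicon):
    """Split word into two known parts, or return None if no binary split exists."""
    lower = word.lower()
    for i in range(1, len(lower)):
        left, right = lower[:i], lower[i:]
        if left in lexicon and right in lexicon:
            # German nouns are capitalised, so restore the capital on each part.
            return left.capitalize(), right.capitalize()
    return None


# Hypothetical mini-lexicon of known noun stems.
lexicon = {"fussball", "spieler", "ball"}
print(split_compound("Fussballspieler", lexicon))
```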
    53. Relevance Ranking
      • Top-10 Head-Nouns before and after mapping to Ontology Classes
      • Top-10 Predicates
    54. Co-Occurrence Analysis
      • ...
      • flanken (to cross) SUBJ: FOOTBALLPLAYER “Klasnic”
      • flanken DOBJ: FOOTBALLPLAYER “Klose”
      • flanken_in PP_ADJ “Zuschauer” (audience)
      • ...
      • beschimpfen (to insult) SUBJ: FOOTBALLPLAYER “Klasnic”
      • ...
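      A co-occurrence score tells how strongly a predicate is associated with the class of one of its arguments. The slides do not give the formula, so the Python sketch below uses a plain conditional relative frequency, count(predicate, class) / count(predicate), as a stand-in; the function name `cooccurrence_scores` and the observation list, which is loosely modelled on the rows above, are assumptions.

```python
from collections import Counter


def cooccurrence_scores(pairs):
    """Return count(predicate, class) / count(predicate) for each observed pair."""
    pair_counts = Counter(pairs)
    predicate_counts = Counter(predicate for predicate, _ in pairs)
    return {
        (predicate, cls): count / predicate_counts[predicate]
        for (predicate, cls), count in pair_counts.items()
    }


# (predicate, argument class or head) observations, for illustration only.
observations = [
    ("flanken", "FOOTBALLPLAYER"),
    ("flanken", "FOOTBALLPLAYER"),
    ("flanken", "Zuschauer"),
    ("beschimpfen", "FOOTBALLPLAYER"),
]
for (predicate, cls), score in cooccurrence_scores(observations).items():
    print(predicate, cls, round(score, 2))
```

      High-scoring pairs are candidates for the domain or range of a relation named after the predicate.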
    55. Integration into Ontology Development
    56. OntoLT – Protégé Plug-In for Ontology Extraction from Text
      • Ontology Learning Layer Cake: Terms (disease, doctor, hospital); (Multilingual) Synonyms ({disease, illness, Krankheit}); Concepts (DISEASE:=<Int, Ext, Lex>); Taxonomy (is_a(DOCTOR, PERSON)); Relations (cure(dom:DOCTOR, range:DISEASE)); Rules & Axioms
    57. OntoLT – Basic Idea
      • Middleware Solution in Ontology Development
        • Supports the Ontology Engineer through Semi-Automatic Extraction of Ontology Fragments from Domain-Relevant Document Collections
        • Download http://olp.dfki.de/OntoLT/OntoLT.htm
      • Based on
        • Automatic Linguistic Annotation
        • Manual Definition of Mapping Rules
        • Statistical Preprocessing (Option)
        • Interactive Validation of Candidates
        • Generation in Protégé of Ontology Fragments
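      The idea of mapping rules that turn linguistic annotation into ontology fragments can be sketched as follows. The rule below (head noun becomes a class, modifier plus head becomes a subclass) is a hypothetical example written for this illustration, and the function name `apply_rule` and its input format are assumptions; it is not the OntoLT rule language or API.

```python
def apply_rule(noun_phrases):
    """Map (modifier or None, head) pairs to candidate classes and subclass links."""
    classes = set()
    subclass_of = {}
    for modifier, head in noun_phrases:
        classes.add(head)  # head noun -> class candidate
        if modifier:
            subclass = f"{modifier}_{head}"
            classes.add(subclass)  # modifier + head -> more specific class
            subclass_of[subclass] = head
    return classes, subclass_of


# Noun phrases as they might come out of the linguistic annotation step.
classes, subclass_of = apply_rule([("research", "project"), (None, "lab")])
print(sorted(classes))
print(subclass_of)
```

      In an interactive setting the resulting candidates would still be validated by the ontology engineer before any fragment is generated.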
    58. OntoLT – System Architecture
    59. Corpus Example – KMI News
    60. Mapping Rules
    61. Statistical Relevance
    62. Extract Candidates
    63. Generate Ontology Fragments
    64. Exercises
      • Knowledge Extraction
        • Ontology Modeling (from Text)
        • Ontology Population
        • Ontology Learning (Extension)
        • Ontology Mapping
