Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating and Exploiting Complex Semantic Relationships

Amit P. Sheth, “Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating and Exploiting Complex Semantic Relationships,” Keynote at the 29th Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2002), Milovy, Czech Republic, November 22–29, 2002.

  • Earthquake: USGS. Nuclear: Oklahoma Observatory
  • Resource modeling: local completeness (you can get all Delta flights from delta.com), data characteristics (all flights from delta.com are Delta flights), binding patterns
  • Query and analyze: a powerful query interface beyond traditional keyword queries and queries on structured databases as in SQL
  • The ontology has relationships between test site and latitude/longitude, and correlation agent
  • Transcript

    • 1. Relationships at the Heart of Semantic Web. Amit Sheth, Large Scale Distributed Information Systems (LSDIS) Lab, University of Georgia; http://lsdis.cs.uga.edu. CTO, Semagix, Inc., http://www.semagix.com. Keynote, SOFSEM 2002, Milovy, Czech Republic, Nov 25 2002. © Amit Sheth, November 2002
    • 2.
      • The Semantic Web -- a vision with several views:
      • · “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what data means to process it.” [B99]
      • · “The semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [BHL01]
      • · “The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.” [W3C01]
      Semantics: The Next Step in the Web’s Evolution
    • 3. Semantics for the Web
      • On the Semantic Web every resource (people, enterprises, information services, application services, and devices) is augmented with machine-processable descriptions to support finding, reasoning about (e.g., which service is best), and using (e.g., executing or manipulating) the resource. The idea is that self-descriptions of data and other techniques would allow context-understanding programs to selectively find what users want, or allow programs to work on behalf of humans and organizations to make them more efficient and productive.
    • 4. Move from Syntax to Semantics in Information Systems (a personal perspective)
      • Generation III (information brokering), 1997...: Semantics (Ontology, Context, Relationships, KB). Semantic Web, some DL-II projects, Semagix SCORE, Applied Semantics; VideoAnywhere, InfoQuilt, OBSERVER
      • Generation II (mediators), 1990s: Metadata (Domain model). InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...; VisualHarness, InfoHarness
      • Generation I (federated DB/multidatabases), 1980s: Data (Schema, “semantic data modeling”). Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
    • 5. Semantics and Relationships
      • Semantics is derived from relationships. Consider the linguistics perspective.
      • “Semantics is the study of meaning. … We may distinguish a number of legitimate ways to approach semantics:
      • the relationship between linguistic expressions (e.g. synonymy, antonymy, hyperonymy, etc.): sense;
      • the relationship of linguistic expressions to the "real world": reference.”
      • Ontologies in KR help capture the above.
      • Quoted part from http://www.ncl.ac.uk/sml/staff/. © 2000 Jonathan West.
    • 6. Why is this a hard problem?
      • Are objects/entities equivalent/equal(same)?
      • How (well) are they related?
      • partial and fuzzy match: related, relevant
      • related in a “context”
      • degrees: semantic similarity, semantic proximity, semantic distance, ….
        • [differentiation, disjointedness]
      • Even the is-a link involves different notions: identity, unity, essence (Guarino and Welty 2002)
      • Semantic ambiguity, also based on incomplete, inconsistent, approximate information/knowledge
      • Many efforts have stumbled on these issues, e.g., schema integration (in the database management area)
    • 7. Semantics and Relationships
      • Increasing depth and sophistication in dealing with semantics by dealing with (identifying/searching to analyzing) documents, entities, and relationships.
      Past: documents. Current: entities. Future: relationships.
    • 8. Issues - Relationships
      • Identifying Relationship (extraction)
      • Expressing (specifying, representing) relationships
      • Discovering and Exploring Relationships
      • Hypothesizing and Validating Relationships
      • Utilizing/exploiting Relationships for Semantic Applications (in document search, querying metadata, analysis…)
    • 9. Expressing Relationships
      • Expressiveness of specification language
        • In relational model
        • In semantic data model, e.g., E-R variants
        • KR languages
        • In logic, e.g., description logics
    • 10. Relationship Modeling in Various Representation Models
      • Spectrum from simple taxonomies to expressive ontologies: Catalog/ID; Terms/glossary; Thesauri, “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints
      • Examples placed along the spectrum: Wordnet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS
      • After Deborah L. McGuinness (Stanford) and Tim Finin (UMBC)
    • 11. Sampling Issues in Relationships- outline of this talk
      • Simple Relationships – already known
        • Representation
        • Identification/Querying: “Which entities are related to entity X via relationship R?” where R is typically specified as a join condition or a path expression
      • Complex relationships
        • Rho: discovery from large document set with associated metadata and ontologies: “How is X related to Y?”
        • ISCAPEs: validation/ human-directed knowledge discovery
    • 12. Metadata and Ontology: Primary Semantic Web enablers
      • Data (Heterogeneous Types/Media)
      • Content Independent Metadata (creation-date, location, type-of-sensor...)
      • Content Dependent Metadata (size, max colors, rows, columns...)
      • Direct Content Based Metadata (inverted lists, document vectors, LSI)
      • Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
      • Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
      • Ontologies, Classifications, Domain Models
      • User
      • More Semantics for Relevance to tackle Information Overload!!
    • 13. SCORE technology
      • [Architecture diagram; component labels only]
      • Knowledge Sources (KS), Knowledge Agents (KA), KA Toolkit, Knowledge Agent Monitor
      • Content Sources (structured, semi-structured, unstructured: Databases, XML/Feeds, Websites, Email, Reports, Documents), Content Agents (CA), CA Toolkit, Content Agent Monitor
      • Semantic Enhancement Server: Entity Extraction, Enhanced Metadata, Automatic Classification, Classification Committee, Domain Experts
      • Ontology and Metabase; Semantic Query Server; Main Memory Index
      • Metadata adapters; Enterprise Content Applications
    • 14. Information Extraction for Metadata Creation METADATA EXTRACTORS Key challenge: Create/extract as much (semantics) metadata automatically as possible WWW, Enterprise Repositories Digital Maps Nexis UPI AP Feeds/ Documents Digital Audios Data Stores Digital Videos Digital Images . . . . . . . . .
    • 15. Video with Editorialized Text on the Web Automatic Classification & Metadata Extraction (Web page) Auto Categorization Semantic Metadata
    • 16. Semantic Annotation Limited tagging (mostly syntactic) COMTEX Tagging Content ‘ Enhancement’ Rich Semantic Metatagging Value-added Voquette Semantic Tagging
      • Value-added relevant metatags added by Voquette to existing COMTEX tags:
      • Private companies
      • Type of company
      • Industry affiliation
      • Sector
      • Exchange
      • Company Execs
      • Competitors
    • 17. Automatic Semantic Annotation of Text: Entity and Relationship Extraction
    • 18. Extraction Agent Enhanced Metadata Asset Ontology-directed Metadata Extraction (Semi-structured data) Web Page
    • 19. Entity and Semantic Metadata Extraction Semantic Metadata Syntax Metadata
    • 20. Semantic Metadata Enhancement
    • 21. Semantic Application Example – Research Dashboard Focused relevant content organized by topic ( semantic categorization ) Automatic Content Aggregation from multiple content providers and feeds Related relevant content not explicitly asked for (semantic associations) Competitive research inferred automatically Automatic 3 rd party content integration
    • 22. Related Stock News Semantic Web – Intelligent Content Industry News Technology Products COMPANY EPA Regulations Competition COMPANIES in Same or Related INDUSTRY COMPANIES in INDUSTRY with Competing PRODUCTS Impacting INDUSTRY or Filed By COMPANY Important to INDUSTRY or COMPANY SEC Intelligent Content = What You Asked for + What you need to know!
    • 23. led by Same entity Human-assisted inference Knowledge-based and Manual Associations Syntax Metadata Semantic Metadata
    • 24. Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)
    • 25. Physical link to Relationship
      • <TITLE> A Scenic Sunset at Lake Tahoe </TITLE>
      • <p>
      • Lake Tahoe is a popular tourist spot and <A HREF="http://www1.server.edu/lake_tahoe.txt"> some interesting facts </A> are available here. The scenic beauty of Lake Tahoe can be viewed in this photograph: <center> <IMG SRC="http://www2.server.edu/lake_tahoe.img"> </center>
      Correlation achieved by using physical links Done manually by user publishing the HTML document
    • 26. MREF Metadata Reference Link -- complementing HREF
      • Creating “logical web” through
      • Media Independent Metadata based Correlation
    • 27. Metadata Reference Link (<A MREF …>)
      • <A HREF=“URL”> Document Description </A>
      • physical link between document (components)
      • <A MREF KEYWORDS= <list-of-keywords>; THRESH=<real>> Document Description </A>
      • <A MREF ATTRIBUTES (<list-of-attribute-value-pairs>)> Document Description </A>
    • 28. Abstraction Layers METADATA DATA METADATA DATA MREF in RDF ONTOLOGY NAMESPACE ONTOLOGY NAMESPACE
    • 29. Model for Logical Correlation using Ontological Terms and Metadata Framework for Representing MREFs Serialization (one implementation choice)
    • 30. Correlation based on Content-based Metadata height, width and size Some interesting information on dams is available here. “ information on dams” is defined by an MREF defining keywords and metadata (which may be used for a query). water.gif (Data) Metadata Storage water.gif …… mpeg …… ppm Major component(RGB) Blue Content based Metadata Content Dependent Metadata
    • 31. An Example RDF Model for MREF
      • <?namespace href="http://www.foo.com/IQ" as="IQ"?>
      • <?namespace href="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
      • <RDF:serialization>
      • <RDF:bag id="MREF:12345">
          • <IQ:keyword>
          • <RDF:resource id="constraint_001">
          • <IQ:threshold>0.5</IQ:threshold>
          • <RDF:PropValue>dam</RDF:PropValue>
          • </RDF:resource>
          • </IQ:keyword>
          • <IQ:attribute>
          • <RDF:resource id="constraint_002">
          • <IQ:name>majorRGB</IQ:name>
          • <IQ:type>string</IQ:type>
          • <RDF:PropValue>blue</RDF:PropValue>
          • </RDF:resource>
          • </IQ:attribute>
      • </RDF:bag>
      • </RDF:serialization>
    • 32. Domain Specific Correlation
      • Potential locations for a future shopping mall, identified as all regions having a population greater than 500 and an area greater than 50 sq meters, with an urban land cover and moderate relief, <A MREF ATTRIBUTES(population > 500; area > 50 & region-type = 'block' & land-cover = 'urban' & relief = 'moderate')> can be viewed here </A>
      • => media-independent relationships between domain specific metadata : population, area, land cover, relief
      • => correlation between image and structured data at a higher domain specific level as opposed to physical “link-chasing” in the WWW
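
The domain-specific correlation on this slide amounts to evaluating a conjunction of attribute constraints over region metadata, whatever media the values were extracted from. The Python sketch below follows the prose statement of the constraints (population greater than 500, area greater than 50). The region records, the field names and the `matches` helper are hypothetical illustrations and not part of the MREF proposal.

```python
import operator

# Hypothetical region metadata; on the slide these values would come from
# Census, map and image sources rather than a literal list.
REGIONS = [
    {"id": "r1", "population": 1200, "area": 75, "region_type": "block",
     "land_cover": "urban", "relief": "moderate"},
    {"id": "r2", "population": 300, "area": 90, "region_type": "block",
     "land_cover": "urban", "relief": "moderate"},
    {"id": "r3", "population": 2500, "area": 60, "region_type": "block",
     "land_cover": "forest", "relief": "steep"},
]

# Constraints mirroring the MREF ATTRIBUTES clause: (attribute, comparison, value).
CONSTRAINTS = [
    ("population", operator.gt, 500),
    ("area", operator.gt, 50),
    ("region_type", operator.eq, "block"),
    ("land_cover", operator.eq, "urban"),
    ("relief", operator.eq, "moderate"),
]


def matches(region, constraints):
    """Return True when the region satisfies every attribute constraint."""
    return all(op(region[attr], value) for attr, op, value in constraints)


candidates = [r["id"] for r in REGIONS if matches(r, CONSTRAINTS)]
print(candidates)
```

Keeping the constraints as data rather than hard-coded conditions is one way to mirror the idea that an MREF carries a query, not a physical address.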
    • 33. TIGER/Line DB Repositories and the Media Types Population: Area: Boundaries: Land cover: Relief: Census DB Map DB Regions (SQL) Boundaries Image Features (IP routines)
    • 34.  
    • 35. Relationship Discovery
      • Problem
        • Huge volumes of data. Need to find relationships between two entities in the Semantic Web.
        • Application areas such as National Security, Intelligence Services, Bioinformatics.
        • Relationship can be of different kinds.
    • 36. Example [graph; labels only] Entities: AlQaida (Terrorist Organization); Mohammad Atta, Osama bin Laden, Ramzi Binalshibh (each with a name attribute). Relations: passengerOf, leaderOf, friendOf, memberOf
    • 37. Semantic Association
      • Complex relationships which capture connectivity and similarity of entities in a knowledge base
        • Complex
          • Involve multiple relations
        • Connectivity
          • Includes both directed paths and undirected paths
        • Similarity
          • Specific notion of an isomorphism, based on the similarity of roles of entities.
    • 38. Representing and analyzing metadata
      • By using a graph data model, Semantic Associations can be viewed in terms of graph traversals
      • We can distinguish between different types of Semantic Associations based on structural properties
      • For example, a path, intersecting paths, isomorphic paths.
      • We use the RDF Graph Data Model, to model Semantic Associations.
    • 39. Example Graph
      • [Figure: RDF schema and instance graph. Classes: Artist, Sculptor, Painter, Artifact, Sculpture, Painting, Museum, Ext. Resource. Properties: creates, sculpts, paints, exhibited, technique, material, fname, lname, title, file_size, last_modified, mime-type. Literal types: String, Date, Integer. Edge types: typeOf (instance), subClassOf (isA), subPropertyOf. Resources &r1 to &r8 with literals such as “Rembrandt”, “Pablo” “Picasso”, “August” “Rodin”, “Reina Sofia Museum”, “Rodin Museum”, “oil on canvas”, “image/jpeg”, 2000-02-01, 2000-6-09. Highlighted: nodes X, a, Y and properties p1, p2.]
      • ρ-pathConnected (x, y): is true if there is a path
        • <x, p1, a, p2, b, p3, …, y> in the knowledge base
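
The pathConnected definition can be read as a reachability test over (subject, property, object) triples. The sketch below is one possible reading that follows directed edges only; the triples are hypothetical stand-ins for the example graph, and `find_path` and `path_connected` are illustrative names, not operators defined in the talk.

```python
from collections import deque

# Hypothetical triples (subject, property, object), loosely modeled on the
# artist / artifact / museum example.
TRIPLES = [
    ("r1", "paints", "r2"),
    ("r2", "exhibited", "r3"),
    ("r4", "sculpts", "r5"),
    ("r5", "exhibited", "r6"),
]


def find_path(triples, x, y):
    """Breadth-first search for a directed path [x, p1, a, p2, ..., y].

    Returns the alternating node/property list, or None if no path exists.
    """
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    queue = deque([[x]])
    visited = {x}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == y and len(path) > 1:
            return path
        for p, o in outgoing.get(node, []):
            if o not in visited:
                visited.add(o)
                queue.append(path + [p, o])
    return None


def path_connected(triples, x, y):
    """True when some directed path leads from x to y."""
    return find_path(triples, x, y) is not None


print(find_path(TRIPLES, "r1", "r3"))
```

Breadth-first search returns a shortest path; enumerating all paths, as a path operator would, needs a different traversal and a bound on path length.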
    • 40. [Figure: the example graph of slide 39, highlighting nodes x, y, a, b, k, m, n]
      • ρ-joinConnected (x, y): is true if there are two paths P1, P2 such that:
        • P1 = <x, pa, a, pb, b, pc, c, pd, …, k, pl, l, pm, m> and
        • P2 = <y, pu, b, pv, …, k, pw, l, py, n>
        • Or
        • P1 = <a, pa, b, pb, …, k, pk, l, pl, x> and
        • P2 = <y, pu, b, pv, m, pw, l, …, k, p5, l, p6, n>
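
A simplified reading of joinConnected is that a path starting at x and a path starting at y share at least one node. The sketch below covers only that first case of the definition (both paths start at x and y) and ignores the variant in which x ends a path. The triples and helper names are hypothetical.

```python
# Hypothetical triples: a painter and a sculptor whose works hang in the
# same museum, plus an unrelated painter.
TRIPLES = [
    ("r1", "paints", "r2"),
    ("r2", "exhibited", "r3"),
    ("r4", "sculpts", "r5"),
    ("r5", "exhibited", "r3"),
    ("r7", "paints", "r8"),
]


def reachable(triples, start):
    """Set of nodes lying on some directed path that starts at `start`."""
    outgoing = {}
    for s, _p, o in triples:
        outgoing.setdefault(s, []).append(o)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in outgoing.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def join_nodes(triples, x, y):
    """Nodes shared by a path from x and a path from y, excluding x and y."""
    return (reachable(triples, x) & reachable(triples, y)) - {x, y}


def join_connected(triples, x, y):
    """True when paths from x and from y intersect in some other node."""
    return bool(join_nodes(triples, x, y))


print(join_nodes(TRIPLES, "r1", "r4"))
```

Here the two artists are not connected by any path, yet they are join-connected through the museum node where their paths meet.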
    • 41. [Figure: the example graph of slide 39, highlighting nodes X, Y, a, u, c, l and properties pa, pc, p1]
      • ρ-isoConnected (x, y): is true if there are two paths P1, P2 such that:
        • P1 = <x, pa, a, pb, b, pc, c> and
        • P2 = <y, pu, b, pv, m, pw, l>
        • and
        • x ≅ y, a ≅ b, c ≅ l, …
        • pa ≅ pu, pb ≅ pv, pc ≅ pw, …
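
The isoConnected definition compares two paths position by position. The sketch below assumes one particular similarity test, which the slides do not spell out: two properties are similar when they share a super-property, and two nodes are similar when they belong to the same class. The schema maps and the sample paths are hypothetical.

```python
# Hypothetical schema knowledge. Properties map to a common super-property,
# nodes map to a (super)class; both maps are assumptions for illustration.
SUPER_PROPERTY = {"paints": "creates", "sculpts": "creates"}
NODE_CLASS = {
    "r1": "Artist", "r4": "Artist",
    "r2": "Artifact", "r5": "Artifact",
    "r3": "Museum", "r6": "Museum",
}


def similar_property(p, q):
    """Properties are similar when equal or when they share a super-property."""
    return SUPER_PROPERTY.get(p, p) == SUPER_PROPERTY.get(q, q)


def similar_node(a, b):
    """Nodes are similar when both have a known class and the classes agree."""
    return a in NODE_CLASS and NODE_CLASS.get(a) == NODE_CLASS.get(b)


def iso_paths(path1, path2):
    """Compare two alternating [node, property, node, ...] lists position by position."""
    if len(path1) != len(path2):
        return False
    for i, (u, v) in enumerate(zip(path1, path2)):
        same = similar_node(u, v) if i % 2 == 0 else similar_property(u, v)
        if not same:
            return False
    return True


P1 = ["r1", "paints", "r2", "exhibited", "r3"]
P2 = ["r4", "sculpts", "r5", "exhibited", "r6"]
print(iso_paths(P1, P2))
```

Under these assumptions a painter whose painting is exhibited in one museum plays the same role as a sculptor whose sculpture is exhibited in another, even though the two paths share no node.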
    • 42. ρ Operators
      • The ρ Operator computes Semantic Associations between two entities.
      • Three kinds of ρ Operators are defined.
        • ρ Path: This operator returns all paths between two entities in the data model.
        • ρ Connect: This operator returns intersecting paths, on which the two entities lie.
        • ρ Iso: ρ-isomorphic paths imply a similarity of nodes and edges along the paths; this operator returns such similar paths between entities.
    • 43. Formalism
      • ρ-pathConnected (x, y): is true if there is a path
        • <x, p1, a, p2, b, p3, …, y> in the knowledge base
      • ρ-joinConnected (x, y): is true if there are two paths P1, P2 such that:
        • P1 = <x, pa, a, pb, b, pc, c, pd, …, k, pl, l, pm, m> and
        • P2 = <y, pu, b, pv, …, k, pw, l, py, n>
        • Or
        • P1 = <a, pa, b, pb, …, k, pk, l, pl, x> and
        • P2 = <y, pu, b, pv, m, pw, l, …, k, p5, l, p6, n>
    • 44. Complex Relationship Validation
      • Arise in several contexts, especially involving multiple ontologies (hence mappings)
        • information interoperability where related resources subscribe to different but related ontologies
        • information requestor and resource modelers choose to use different ontologies
        • information requests to support analysis, knowledge discovery, decision making, learning that requires linking multiple domains with different ontologies
      • Developing an all-encompassing, unified ontology has not been shown to be practical. Preexisting classifications/metadata standards/taxonomies are hard to ignore.
    • 45. Complex Relationships - Cause-Effects & Knowledge discovery AFFECTS VOLCANO LOCATION ASH RAIN PYROCLASTIC FLOW ENVIRON. LOCATION PEOPLE WEATHER PLANT BUILDING DESTROYS COOLS TEMP DESTROYS KILLS
    • 46. Knowledge Discovery - Example Earthquake Sources Nuclear Test Sources Nuclear Test May Cause Earthquakes Is it really true? Complex Relationship: How do you model this?
    • 47. Inter-Ontological Relationships
        • A nuclear test could have caused an earthquake if the earthquake occurred some time after the nuclear test was conducted and in a nearby region.
        • NuclearTest Causes Earthquake
        • <= dateDifference ( NuclearTest.eventDate, Earthquake.eventDate ) < 30
        • AND distance ( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000
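
The inter-ontological rule can be sketched as a predicate over one nuclear test and one earthquake. The slide gives the thresholds 30 and 10000 without units; the sketch assumes days and kilometers, computes distance with the haversine formula (a stand-in for whatever `distance` function the system used), and adds the requirement from the prose that the earthquake comes after the test. The sample events are made up.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

MAX_DAYS = 30          # threshold from the rule; days are an assumed unit
MAX_DISTANCE = 10000   # threshold from the rule; kilometers are an assumed unit


def date_difference(d1, d2):
    """Days from d1 to d2 (positive when d2 is later)."""
    return (d2 - d1).days


def distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def may_cause(test, quake):
    """Rule sketch: the earthquake follows the test within MAX_DAYS and MAX_DISTANCE."""
    days = date_difference(test["eventDate"], quake["eventDate"])
    near = distance(test["latitude"], test["longitude"],
                    quake["latitude"], quake["longitude"]) < MAX_DISTANCE
    return 0 <= days < MAX_DAYS and near


# Made-up events for illustration only.
TEST = {"eventDate": date(2000, 1, 1), "latitude": 10.0, "longitude": 20.0}
QUAKE_NEAR = {"eventDate": date(2000, 1, 15), "latitude": 12.0, "longitude": 22.0}
QUAKE_LATE = {"eventDate": date(2000, 3, 1), "latitude": 12.0, "longitude": 22.0}

print(may_cause(TEST, QUAKE_NEAR), may_cause(TEST, QUAKE_LATE))
```

A rule of this kind only proposes candidate pairs; whether the relationship actually holds is what the later knowledge-discovery slides set out to examine.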
    • 48. Knowledge Discovery - Example. When was the first recorded nuclear test conducted? Find the total number of earthquakes with a magnitude 5.8 or higher on the Richter scale per year starting from 1900. [Chart annotations: 1950; increase in number of earthquakes since 1945]
    • 49. Knowledge Discovery – exploring relationship… For each group of earthquakes with magnitudes in the ranges 5.8-6, 6-7, 7-8, 8-9, and >9 on the Richter scale per year starting from 1900, find the number of earthquakes. The number of earthquakes with magnitude > 7 is almost constant, so nuclear tests probably only cause earthquakes with magnitude < 7.
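
The grouping query on this slide can be sketched as a count per (year, magnitude range). The slide's ranges share their endpoints, so the sketch assumes half-open bins with the lower bound included; that choice, the function names and the sample records are assumptions, and the data is made up.

```python
from collections import Counter

# Magnitude ranges from the slide; each bin is [low, high), the last one open-ended.
BINS = [(5.8, 6.0), (6.0, 7.0), (7.0, 8.0), (8.0, 9.0), (9.0, float("inf"))]


def bin_label(magnitude):
    """Return the label of the bin holding `magnitude`, or None below 5.8."""
    for low, high in BINS:
        if low <= magnitude < high:
            return f">={low}" if high == float("inf") else f"{low}-{high}"
    return None


def count_by_year_and_bin(quakes, start_year=1900):
    """Count earthquakes per (year, magnitude bin) from start_year onward."""
    counts = Counter()
    for year, magnitude in quakes:
        label = bin_label(magnitude)
        if year >= start_year and label is not None:
            counts[(year, label)] += 1
    return counts


# Made-up (year, magnitude) records for illustration only.
QUAKES = [(1950, 5.9), (1950, 6.5), (1950, 6.7), (1951, 7.2), (1899, 8.0), (1951, 5.0)]
counts = count_by_year_and_bin(QUAKES)
print(sorted(counts.items()))
```

Comparing the per-bin series over time is what supports the slide's observation that the higher-magnitude counts stay roughly constant.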
    • 50. Knowledge Discovery - Example… Find nuclear tests and earthquakes that may have occurred as a result of the test KB
    • 51. InfoQuilt System Core capabilities
      • Ability to handle heterogeneous, static or dynamic content – wrappers & extractors, with resource modeling (completeness, data characteristics, binding patterns)
      • Information Extraction: Semi-Automatically or Automatically create domain-specific or contextually relevant metadata
      • Domain modeling with complex (user defined, inter-ontology) relationships, domain rules and FD
      • User defined Functions (esp. for fuzzy/approximate matching) and Simulation
      • Post processing result analysis (e.g., chart creator)
    • 52. IScape (Information Scape)
      • A computing paradigm that allows users to query and analyze the data available from diverse autonomous sources, gain a better understanding of the domains and their interactions, and discover and study relationships.
    • 53. IScape …a simple example
      • user’s request
        • for semantically related information (regardless of all forms of heterogeneity)
        • specified in terms of components of knowledge base (domain model, relationships, functions, simulations)
      “Find all earthquakes with epicenter less than 5000 miles from the location at latitude 60.790 North and longitude 97.570 East and find all tsunamis that they might have caused” Next - KD using IScapes
    • 54. Ontologies
      • [Diagram; labels only]
      • Hierarchies: Disaster; Natural Disaster (Volcano, Earthquake); Man-made Disaster (NuclearTest)
      • Terms/Concepts (Attributes): eventDate, description, site, latitude, longitude, damage, numberOfDeaths, damagePhoto, magnitude, bodyWaveMagnitude, conductedBy, explosiveYield
      • Functional Dependencies (FDs): site => latitude, longitude
      • Domain Rules: bodyWaveMagnitude > 0, bodyWaveMagnitude < 10, magnitude > 0, magnitude < 10
    • 55. Knowledge Builder
    • 56. IScape Builder
    • 57. IScape Execution IScape Plan Plan Knowledge IScape Query Query Query Data retrieved Final Results Final Results
    • 58. IScape 1
      • Resource: NuclearTestsDB ( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, [dc] waveMagnitude > 3 );
      • Resource: NuclearTestSites ( testSite, latitude, longitude );
      • Resource: SignificantEarthquakesDB ( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, [dc] eventDate > “January 1, 1970” );
      • Ontology: NuclearTest ( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, latitude, longitude, waveMagnitude > 0, waveMagnitude < 10, testSite -> latitude longitude );
      • Ontology: Earthquake ( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, magnitude > 0 );
      • Relationship: NuclearTest Causes Earthquake <= dateDifference ( NuclearTest.eventDate, Earthquake.eventDate ) < 30 AND distance ( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000
      • IScape: “Find all nuclear tests conducted by India or Pakistan after January 1, 1995 with seismic body wave magnitude > 4.5 and find all earthquakes that could have been caused due to these tests.”
      • Sources: USGS site; http://sun00781.dn.net/nuke/hew/Library/Catalog
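
The first half of this IScape can be sketched in two steps: use the functional dependency testSite -> latitude longitude to attach coordinates from NuclearTestSites to the rows of NuclearTestsDB, then apply the selection from the request. The rows below are hypothetical, the function names are illustrative, and the sketch does not reproduce how InfoQuilt actually plans or executes an IScape.

```python
from datetime import date

# Hypothetical rows standing in for the two nuclear-test resources.
NUCLEAR_TESTS_DB = [
    {"testSite": "S1", "waveMagnitude": 5.0, "eventDate": date(1998, 5, 1), "conductedBy": "India"},
    {"testSite": "S2", "waveMagnitude": 4.0, "eventDate": date(1998, 6, 1), "conductedBy": "Pakistan"},
    {"testSite": "S3", "waveMagnitude": 6.0, "eventDate": date(1990, 1, 1), "conductedBy": "India"},
]
NUCLEAR_TEST_SITES = [
    {"testSite": "S1", "latitude": 10.0, "longitude": 20.0},
    {"testSite": "S2", "latitude": 30.0, "longitude": 40.0},
    {"testSite": "S3", "latitude": 50.0, "longitude": 60.0},
]


def build_nuclear_tests(tests, sites):
    """Use the FD testSite -> latitude, longitude to attach coordinates to each test."""
    coords = {s["testSite"]: (s["latitude"], s["longitude"]) for s in sites}
    merged = []
    for t in tests:
        if t["testSite"] in coords:
            lat, lon = coords[t["testSite"]]
            merged.append({**t, "latitude": lat, "longitude": lon})
    return merged


def select_tests(tests):
    """Selection from the request: India or Pakistan, after 1995-01-01, magnitude > 4.5."""
    return [
        t for t in tests
        if t["conductedBy"] in {"India", "Pakistan"}
        and t["eventDate"] > date(1995, 1, 1)
        and t["waveMagnitude"] > 4.5
    ]


selected = select_tests(build_nuclear_tests(NUCLEAR_TESTS_DB, NUCLEAR_TEST_SITES))
print([t["testSite"] for t in selected])
```

The selected tests, now carrying coordinates, are the inputs on which the Causes relationship would then be evaluated against the earthquake resource.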
    • 59. IScape Processing Monitor
    • 60. Future
      • Future work in Semantic Web will increasingly focus on all dimensions of relationships, especially complex relationships.
      • New Semantic Applications (business/govt. intelligence) are being enabled.
    • 61. Further Information
      • Related Paper: Sheth, Arpinar, Kashyap: Relationships at the Heart of Semantic Web: Modeling, Discovering, and Exploiting Complex Semantic Relationships
      • http://lsdis.cs.uga.edu/lib/2002.html
      • InfoQuilt and Semantic Association Projects at the LSDIS Lab: http://lsdis.cs.uga.edu
      • Green, Bean and Myaeng: The Semantics of Relationships: An Interdisciplinary Perspective, Kluwer Academic Publishers 2002.
