Semantic Interoperability & Information Brokering in Global Information Systems

Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.

Key coverage:
Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277); InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search; MREF - putting metadata on HREF is way ahead of its time (see: http://knoesis.org/library/resource.php?id=00294); multi-ontology query processing in OBSERVER system (http://knoesis.org/library/resource.php?id=00273)

    Presentation Transcript

    • Semantic Interoperability and Information Brokering in Global Information Systems. Amit Sheth, Large Scale Distributed Information Systems Lab, University of Georgia, http://lsdis.cs.uga.edu. Bethesda, Maryland, April 6, 1999.
    • Three perspectives to GlobIS
      • Information Integration Perspective: distribution, autonomy, heterogeneity
      • Information Brokering Perspective: data, meta-data, semantic (terminological, contextual)
      • “Vision” Perspective: data, connectivity, computing, information, knowledge
    • Evolving targets and approaches in integrating data and information (a personal perspective)
      • Generation I (1980s): Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
      • Generation II (1990s): InfoHarness, VisualHarness, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...
      • Generation III (1997...): DL-II projects, ADEPT, InfoQuilt
      • Infocosm: a society for ubiquitous exchange of (tradeable) information in all digital forms of representation; information anywhere, anytime, any forms
    • Generation I
      • Data recognized as a corporate resource: leverage it!
      • Data predominantly in structured databases, with different data models, transitioning from network and hierarchical to relational DBMSs
      • Heterogeneity (system, modeling and schematic) as well as the need to support autonomy posed the main challenges; major issues were data access and connectivity
      • Information integration through a federated architecture
      • Support for corporate IS applications as the primary objective; updates often required, data integrity important
    • Generation I (heterogeneity in FDBMSs)
      • Hardware/System
        • instruction set
        • data representation/coding
        • configuration
      • Operating System
        • file system
        • naming, file types, operation
        • transaction support
        • IPC
      • Communication
      • Database System
        • Semantic Heterogeneity
        • Differences in DBMS
          • data models (abstractions, constraints, query languages)
          • System level support (concurrency control, commit, recovery)
      • Timeline shown on the slide: 1970s, 1980s
    • Generation I (Federated Database Systems: Schema Architecture)
      • Model Heterogeneity: Common/Canonical Data Model Schema Translation
      • Information sharing while preserving autonomy
      • Dimensions for interoperability and integration: distribution , autonomy and heterogeneity
      • Figure (schema architecture): each Component DBS has a Local Schema; schema translation produces a Component Schema; Export Schemas are combined by schema integration into a Federated Schema, which is presented through External Schemas
    • Generation I (characterization of schematic conflicts in multidatabase systems; Sheth & Kashyap, Kim & Seo)
      • Schematic Conflicts
        • Domain Definition Incompatibility: Naming Conflicts, Data Representation Conflicts, Data Scaling Conflicts, Data Precision Conflicts, Default Value Conflicts, Attribute Integrity Constraint Conflicts
        • Entity Definition Incompatibility: Naming Conflicts, Database Identifier Conflicts, Schema Isomorphism Conflicts, Missing Data Items Conflicts
        • Data Value Incompatibility: Known Inconsistency, Temporal Inconsistency, Acceptable Inconsistency
        • Abstraction Level Incompatibility: Generalization Conflicts, Aggregation Conflicts
        • Schematic Discrepancies: Data Value Attribute Conflict, Entity Attribute Conflict, Data Value Entity Conflict
      • BUT these techniques for dealing with schematic heterogeneity do not directly map to dealing with a much larger variety of heterogeneous media
    • Generation I (observations and lessons learnt)
      • “tightly coupled” vs “loosely coupled” debate: we were not able to develop “global schema” based systems
      • “good common data model” debate: we were not able to pick the best data model
      • can we have a metadata standard for a domain?
          • only for a limited purpose
          • must learn to live with multiple data types, multiple metadata models/standards, and multiple ontologies
    • Generation II
      • Significant improvements in computing and connectivity (standardization of protocol, public network, Internet/Web); remote data access taken as given
      • Increasing diversity in data formats, with focus on variety of textual data and semi-structured documents
      • Many more data sources, heterogeneous information sources, but not necessarily better understanding of data
      • Use of data beyond traditional business applications: mining + warehousing, marketing, e-commerce
      • Web search engines for keyword based querying against HTML pages; attribute-based querying available in a few search systems
      • Use of metadata for information access; early work on ontology support distribution applied to metadata in some cases
      • Mediator architecture for information management
    • Generation II (limited types of metadata, extractors, mappers, wrappers)
      • Example request: “Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years” (Junglee, SIGMOD Record, Dec. 1997)
      • METADATA EXTRACTORS over: Global/Enterprise Web Repositories, Digital Maps, Nexis, UPI, AP, Documents, Digital Audios, Data Stores, Digital Videos, Digital Images, ...
    • Generation II (a metadata classification: the information pyramid)
      • Levels, in the order listed on the slide:
        • Data (Heterogeneous Types/Media)
        • Content Independent Metadata (creation-date, location, type-of-sensor...)
        • Content Dependent Metadata (size, max colors, rows, columns...)
        • Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
        • Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
        • Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
        • Ontologies, Classifications, Domain Models
        • User
      • Move in this direction to tackle information overload!!
      • METADATA STANDARDS
        • General Purpose: Dublin Core, MCF
        • Domain/industry specific: Geographic (FGDC, UDK, …), Library (MARC, …)
    • VisualHarness – an example
    • Query processing and information requests: what’s next (after comprehensive use of metadata)?
      NOW
      • traditional queries based on keywords
      • attribute based queries
      • content-based queries
      NEXT
      • ‘high level’ information requests involving ontology-based, iconic, mixed-media, and media-independent information requests
      • user selected ontology, use of profiles
    • GIS Data Representation: example of multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain
      • Kansas State FGDC Metadata Model
        • Theme keywords: digital line graph, hydrography, transportation...
        • Title: Dakota Aquifer
        • Online linkage: http://gisdasc.kgs.ukans.edu/dasc/
        • Direct Spatial Reference Method: Vector
        • Horizontal Coordinate System Definition: Universal Transverse Mercator
        • ...
      • UDK Metadata Model
        • Search terms: digital line graph, hydrography, transportation...
        • Topic: Dakota Aquifer
        • Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
        • Measuring Techniques: Vector
        • Co-ordinate System: Universal Transverse Mercator
        • ...
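The tag-name mismatch on this slide can be illustrated with a minimal Python sketch. The tag names are copied from the slide (including the UDK spelling "Adress Id"); the shared term names such as `title` and `coordinate_system`, and the `normalize` helper, are assumptions made for illustration and are not part of either metadata model.

```python
# Tag names are copied from the slide; the shared term names on the
# right-hand side are hypothetical, chosen only for this illustration.
TAG_TO_TERM = {
    "FGDC": {
        "Theme keywords": "keywords",
        "Title": "title",
        "Online linkage": "location",
        "Direct Spatial Reference Method": "spatial_reference_method",
        "Horizontal Coordinate System Definition": "coordinate_system",
    },
    "UDK": {
        "Search terms": "keywords",
        "Topic": "title",
        "Adress Id": "location",
        "Measuring Techniques": "spatial_reference_method",
        "Co-ordinate System": "coordinate_system",
    },
}


def normalize(model: str, record: dict) -> dict:
    """Rename model-specific tags to shared terms; unknown tags are dropped."""
    mapping = TAG_TO_TERM[model]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


fgdc_record = {
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
udk_record = {
    "Topic": "Dakota Aquifer",
    "Adress Id": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}

# Once both records use the shared vocabulary they compare as equal.
same = normalize("FGDC", fgdc_record) == normalize("UDK", udk_record)
print(same)
```

A flat rename table like this only covers one-to-one tag correspondences; conflicts in units, precision, or structure would need more than a dictionary lookup.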
    • Generation III
      • Increasing information overload and broader variety of information content (video content, audio clips etc.) with increasing amount of visual information, scientific/engineering data
      • Continued standardization related to the Web for representational and metadata issues (MCF, RDF, XML)
      • Changes in Web architecture; distributed computing (CORBA, Java)
      • Users demand simplicity, but complexities continue to rise
      • The Web is no longer just another information source, but a basis for decision support through “data mining and information discovery, information fusion, information dissemination, knowledge creation and management”, “information management complemented by cooperation between the information system and humans”
      • Information Brokering Architecture proposed for information management
    • Information Brokering: An Enabler for the Infocosm
      • Problem: INFORMATION/DATA OVERLOAD
      • INFORMATION PROVIDERS: Newswires, Universities, Corporations, Research Labs (information systems, data repositories)
      • INFORMATION CONSUMERS: Corporations, Universities, People, Government, Programs (user queries, information requests)
      • INFORMATION BROKERING: arbitration between information consumers and providers for resolving information impedance
      • dynamic reinterpretation of information requests for determination of relevant information services and products; dynamic creation and composition of information products
    • Information Brokering: Three Dimensions
      • Objective: Reduce the problem of knowing the structure and semantics of data in the huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies
      • THREE DIMENSIONS
        • SEMANTICS, STRUCTURE, SYNTAX, SYSTEM
        • CONSUMERS, BROKERS, PROVIDERS
        • DATA, METADATA, VOCABULARY
    • What else can Information Brokering do?
      • WWW
        • a confusing heterogeneity of media, formats (Tower of Babel)
        • information correlation using physical (HREF) links at the extensional data level
        • location dependent browsing of information using physical (HREF) links
        • user has to keep track of information content!!
      • WWW + Information Brokering
        • Domain Specific Ontologies as “semantic conceptual views”
        • Information correlation using concept mappings at the intensional concept level
        • Browsing of information using terminological relationships across ontologies
        • Higher level of abstraction, closer to user view of information!!
    • Concepts, tools and techniques to support semantics: context, media-independent information correlations, semantic proximity, inter-ontological relations, ontologies (esp. domain-specific), profiles, domain-specific metadata
    • Tools to support semantics
      • Context, context, context
      • Media-independent information correlations
      • Multiple ontologies
        • Semantic Proximity (relationships between concepts within and across ontologies) using domain, context, modeling/abstraction/representation, state
        • Characterizing Loss of Information incurred due to differences in vocabulary
      • BIG challenge: identifying relationships or similarity between objects of different media, developed and managed by different persons and systems
    • Information Brokering over Heterogeneous Digital Data: A Metadata-based Approach
      • Systems Heterogeneity: information system heterogeneity (DBMSs, concurrency control); platform Heterogeneity (operating systems, hardware)
      • Syntactic Heterogeneity: different formats and storage for digital media ; machine readable aspects of data representation
      • Structural Heterogeneity: heterogeneity in data model constructs; schematic/ representational heterogeneity
      • Semantic Heterogeneity: terminological/vocabulary heterogeneity; contextual heterogeneity
      • Information Resource Discovery
        • which/where are the relevant information sources ?
      • Modeling of information Content
        • increasing number of modeling possibilities
      • Querying of Information Content
        • Information Focusing
        • Information Correlation
        • combinatorial number of ways of combining/subsetting information
      We shall focus on these! INFORMATION OVERLOAD = HETEROGENEITY + GLOBALIZATION
    • Heterogeneity... is a Tower of Babel!! From SEMANTIC HETEROGENEITY to SEMANTIC INTEROPERABILITY via metadata, ontologies, contexts
    • The InfoQuilt Project
      • THE INFOQUILT VISION
      • Semantic interoperability between systems, sharing knowledge using multiple ontologies
      • Logical correlation of information
      • Media independent information processing
      • REALIZATION OF THE VISION
      • fully distributed, adaptable, agent-based system
      • information/knowledge management supported by collaborative processes
      http://lsdis.cs.uga.edu/proj/iq/iq.html
    • InfoQuilt Project: using the Metadata REFerence link (MREF) http://lsdis.cs.uga.edu/proj/iq/iq.html
      • MREF complements HREF, creating a “logical web” through media independent ontology & metadata based correlation
      • It is a description of the information asset we want to retrieve
      • An MREF draws on domain ontologies and the IQ_Asset ontology + extension ontologies, and carries attributes, relations, constraints, keywords, content attributes (color, scene cuts, …)
      • Semantic Correlation using MREF
        • MREF (concept): model for logical correlation using ontological terms and metadata
        • RDF: framework for representing MREFs
        • XML: serialization (one implementation choice)
    • Domain Specific Correlation – example
      • Request: potential locations for a future shopping mall, identified by all regions having a population greater than 5000 and area greater than 50 sq. ft., having an urban land cover and moderate relief
      • <A MREF ATTRIBUTES(population > 5000; area > 50; region-type = ‘block’; land-cover = ‘urban’; relief = ‘moderate’)> can be viewed here </A>
      • => media-independent relationships between domain specific metadata: population, area, land cover, relief
      • => correlation between image and structured data at a higher domain specific level, as opposed to physical “link-chasing” in the WWW
      • domain specific metadata: terms chosen from domain specific ontologies
      • Metadata terms: Population, Area, Land cover, Relief, Boundaries
      • Data sources: Census DB, TIGER/Line DB, US Geological Survey
      • Processing: Regions (SQL), Boundaries, Image Features (image processing routines)
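To make the shopping-mall request concrete, here is a minimal Python sketch that parses the attribute constraints from the MREF on this slide and filters region metadata with them. The parser, the `matches` helper, and the three region records are hypothetical; the slides do not describe how InfoQuilt actually evaluates an MREF, and a real system would draw these values from the Census, TIGER/Line and USGS sources rather than from an in-memory list.

```python
import operator
import re

# Comparison operators that appear in the slide's MREF attribute list.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def parse_constraints(text):
    """Parse 'name op value; ...' into (name, op, value) triples."""
    triples = []
    for clause in text.split(";"):
        match = re.fullmatch(r"\s*([\w-]+)\s*(>|<|=)\s*(.+?)\s*", clause)
        if match is None:
            raise ValueError(f"cannot parse constraint: {clause!r}")
        name, op, raw = match.groups()
        value = raw.strip("'\"")
        # Numeric literals become floats; everything else stays a string.
        if value.replace(".", "", 1).isdigit():
            value = float(value)
        triples.append((name, op, value))
    return triples


def matches(region, constraints):
    """True when the region's metadata satisfies every constraint."""
    return all(
        name in region and OPS[op](region[name], value)
        for name, op, value in constraints
    )


# Constraint string taken from the slide (straight quotes used here).
MREF_ATTRIBUTES = (
    "population > 5000; area > 50; region-type = 'block'; "
    "land-cover = 'urban'; relief = 'moderate'"
)

# Hypothetical region metadata, invented for this example.
regions = [
    {"id": "r1", "population": 7200, "area": 120, "region-type": "block",
     "land-cover": "urban", "relief": "moderate"},
    {"id": "r2", "population": 900, "area": 300, "region-type": "block",
     "land-cover": "urban", "relief": "moderate"},
    {"id": "r3", "population": 8000, "area": 75, "region-type": "block",
     "land-cover": "forest", "relief": "moderate"},
]

constraints = parse_constraints(MREF_ATTRIBUTES)
selected = [r["id"] for r in regions if matches(r, constraints)]
print(selected)
```

The point of the sketch is that the request names domain-level metadata terms only; which database or image-processing routine supplies each value is hidden behind the region records.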
    • Domain Specific Correlation – example
    • A DL II approach for Information Brokering
      • DISCOVERING COLLECTIONS OF HETEROGENEOUS INFORMATION AND META-INFORMATION RESOURCES: Physical/Simulation World, Images, Data Stores, Documents, Digital Media
      • CONSTRUCTING ADDITIONAL META-INFORMATION RESOURCES: Domain Specific Ontologies, Domain Independent Ontologies
      • CONSTRUCTING APPROPRIATE INFORMATION LANDSCAPES: Iscape 1 ... Iscape N
    • ADEPT Information Landscape Concept Prototype (a scenario for Digital Earth: learning in the context of the “El Niño” phenomenon)
      • Sample Iscapes Requests:
        • How does El Niño affect sea animals? Look for broadcast videos of less than 2 minutes.
        • How are some regions affected by El Niño? Look at East/West Pacific regions.
        • What disasters have been related to El Niño?
        • What storm occurrences are attributed to El Niño?
        • Show reports related to El Niño that contain Clinton.
      TRY ISCAPE CONCEPT DEMO
      • request information using
        • keywords
        • domain-specific attributes
        • domain-independent attributes
    • Putting MREFs to work (architecture diagram)
      • Components: User, User Agent, Profile Manager, MREF Builder, Broker Agent, MREF repositories, User profiles, domain ontologies, IQ_Asset ontology + extension ontologies
      • Labeled interactions: user information, MREF request, retrieve profile, change profile, design MREF, construct new MREF, send MREF, retrieve MREF, send results, display results
    • Context: the lynchpin of semantics
      • “For instance, if you were to use Yahoo! or Infoseek to search the web for pizza, your results would probably be hundreds of matches for the word pizza. Many of these could be pizza parlors around the world. Yet if you run the same search within NeighborNet, you will allows you to order pizza to be delivered instead of shipped.”
      • From a Press Release of FutureOne, Inc. March 24, 1999 http://home.futureone.com/about/pr/021699.asp
      • Cricket
    • Constructing c-contexts from ontological terms
      • Advantages:
      • Use of ontologies for an intensional domain specific description of data
      • Representation of extra information
        • Relationships between objects not represented in the database schema
        • Using terminological relationships in the ontology
      • C-Context = <(C_1, V_1) (C_2, V_2) ... (C_k, V_k)>: a collection of contextual coordinates C_i (roles) and values V_i (concepts/concept descriptions)
      • Example
        • DATABASE OBJECTS: AGENCY (RegNo, Name, Affiliation); DOC (Id, Title, Agency)
        • ONTOLOGICAL TERMS: Agency Concept, Document Concept, related by hasOrganization
        • C-CONTEXT: “All documents stored in the database have been published by some agency” => C_def(DOC) = <(hasOrganization, AgencyConcept)>
    • Using c-contexts to reason about information in the database
      • Reasoning with c-contexts: glb(C_def(DOC), C_Q)
      • Ontological inferences: DocumentConcept, (hasOrganization, {“USGS”})
      • EXAMPLE
        • C_def(DOC) = <(hasOrganization, AgencyConcept)>
        • C_Q = <(hasOrganization, {“USGS”})>
        • glb(C_def(DOC), C_Q) = <(self, DocumentConcept), (hasOrganization, {“USGS”})>
      • Challenge 1: use of multiple ontologies
      • Challenge 2: estimating the loss of information
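The glb example on this slide can be traced with a small Python sketch. It rests on several assumptions that are not in the slides: a c-context is modeled as a dictionary from role to value, a value is either a concept name or a set of instances, the `self` coordinate is written explicitly into the definition context, and the toy `INSTANCES` table (including "EPA") is invented. The slides do not specify the actual glb algorithm, so this covers only the cases the example needs.

```python
# Hypothetical toy ontology: concept name -> known instances.
INSTANCES = {"AgencyConcept": {"USGS", "EPA"}}


def glb_value(a, b):
    """Most specific value implied by both a and b.

    A value is either a concept name (str) or a frozenset of instances.
    """
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return a & b
    if isinstance(a, frozenset):
        a, b = b, a  # make `a` the concept name
    if isinstance(b, frozenset):
        # Keep only the instances that belong to concept `a`.
        return frozenset(x for x in b if x in INSTANCES.get(a, set()))
    if a == b:
        return a
    raise ValueError("concept-vs-concept glb needs a subsumption hierarchy")


def glb(c1, c2):
    """Coordinate-wise greatest lower bound of two c-contexts."""
    result = {}
    for role in list(c1) + [r for r in c2 if r not in c1]:
        if role in c1 and role in c2:
            result[role] = glb_value(c1[role], c2[role])
        else:
            result[role] = c1.get(role, c2.get(role))
    return result


# C_def(DOC) from the slide, with the `self` coordinate made explicit
# (an assumption of this sketch).
c_def_doc = {"self": "DocumentConcept", "hasOrganization": "AgencyConcept"}
# C_Q from the slide.
c_q = {"hasOrganization": frozenset({"USGS"})}

result = glb(c_def_doc, c_q)
print(result)
```

The result pairs `self` with DocumentConcept and `hasOrganization` with the single instance USGS, which mirrors the slide's glb(C_def(DOC), C_Q).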
    • Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system: OBSERVER architecture (Eduardo Mena, III’98)
      • USER NODE: User Query, Query Processor, Ontology Server, Ontologies, Mappings, Data Repositories
      • COMPONENT NODES: each with Query Processor, Ontology Server, Ontologies, Mappings, Data Repositories
      • IRM NODE: IRM, holding Interontologies Terminological Relationships
    • Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system: query construction example (Eduardo Mena, III’98)
      • “Get title and number of pages of books written by Carl Sagan”
      • User ontology: WN
        • [name pages] for (AND book (FILLS creator “Carl Sagan”))
      • Target ontology: Stanford-I; integrated ontology: WN-Stanford-I
        • [title number-of-pages] for (AND book (FILLS doc-author-name “Carl Sagan”))
      • Ontologies sites:
        • http://www.cogsci.princeton.edu/~wn/w3wn.html
        • http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
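The rewriting from the WN query to the Stanford-I query can be sketched in Python as a term-by-term substitution. The synonym table below is read off the before/after queries on this slide; the nested-tuple query representation and the `translate` function are assumptions for illustration, not OBSERVER's actual data structures.

```python
# Synonym mappings from WN terms to Stanford-I terms, read off the
# slide's two queries. The table itself is an illustrative assumption.
WN_TO_STANFORD = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",
}


def translate(expr, mapping):
    """Recursively rewrite ontology terms.

    Operators (AND, FILLS) and quoted literals are not in the mapping,
    so they pass through unchanged.
    """
    if isinstance(expr, (list, tuple)):
        return type(expr)(translate(e, mapping) for e in expr)
    return mapping.get(expr, expr)


# [name pages] for (AND book (FILLS creator "Carl Sagan"))
projection = ["name", "pages"]
condition = ("AND", "book", ("FILLS", "creator", '"Carl Sagan"'))

translated = (
    translate(projection, WN_TO_STANFORD),
    translate(condition, WN_TO_STANFORD),
)
print(translated)
```

Literals are kept distinct from terms here only by their embedded quotation marks; a fuller representation would tag them explicitly. When a term has no synonym in the target ontology, substitution alone is not enough, which is where the plans and loss estimates on the following slides come in.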
    • Re-use of Knowledge: Bibliography Data Ontology (Stanford-I)
      • Concepts shown: Biblio-Thing, Document, Book, Edited-Book, Technical-Report, Periodical-Publication, Journal, Magazine, Newspaper, Miscellaneous-Publication, Technical-Manual, Computer-Program, Multimedia-Document, Artwork, Cartographic-Map, Thesis, Doctoral-Thesis, Master-Thesis, Proceedings, Conference, Agent, Person, Author, Organization, Publisher, University
    • Re-use of Knowledge: A subset of WordNet 1.5
      • Concepts shown: Print-Media, Press, Publication, Journalism, Newspaper, Magazine, Book, Periodical, Trade-Book, Brochure, TextBook, Reference-Book, SongBook, PrayerBook, Pictorial, Series, Journals, CookBook, Instruction-Book, WordBook, HandBook, Directory, Annual, Encyclopedia, Manual, Bible, GuideBook, Instructions, Reference-Manual
    • WN ontology and user query
    • Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system: estimating the loss of information (Eduardo Mena, III’98)
      • To choose the plan with the least loss
      • To present a level of confidence in the answer
      • Based on intensional information (terminological difference)
      • Based on extensional information (precision and recall)
      • Plans in the example
        • User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
        • Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”))
        • Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”))
        • Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”))
        • Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name “Carl Sagan”))
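The extensional side of loss estimation (precision and recall) can be sketched in Python as follows. The slides do not give the formula that combines the two, so the `loss` function below is an assumption: one minus a weighted harmonic mean of precision and recall, with a hypothetical weight `gamma`. The object ids and the three candidate substitutions are also invented and do not correspond to the numbered plans on the slide.

```python
def precision_recall(wanted: set, returned: set):
    """Precision and recall of `returned` with respect to `wanted`."""
    hits = len(wanted & returned)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(wanted) if wanted else 0.0
    return precision, recall


def loss(precision: float, recall: float, gamma: float = 0.5) -> float:
    """Assumed composite: 1 minus the weighted harmonic mean of P and R."""
    if precision == 0 or recall == 0:
        return 1.0
    return 1.0 - 1.0 / (gamma / precision + (1.0 - gamma) / recall)


# Hypothetical extensions (sets of object ids), invented for illustration.
wanted = {1, 2, 3, 4}
candidates = {
    "broader_term": {1, 2, 3, 4, 5, 6, 7, 8},  # returns everything plus extras
    "narrower_term": {1, 2, 3},                # misses one wanted object
    "unrelated_term": {7, 8},                  # no overlap at all
}

losses = {
    name: loss(*precision_recall(wanted, extension))
    for name, extension in candidates.items()
}

# Choose the candidate with the least estimated loss.
best = min(losses, key=losses.get)
print(best, losses)
```

With these toy sets the narrower term wins because it returns no unwanted objects and misses only one; changing `gamma` shifts the balance between penalizing extra results and penalizing missed ones.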
    • Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system: loss of information based on intensional information (Eduardo Mena, III’98)
      • User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
      • Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”))
      • book := (AND publication (AT-LEAST 1 ISBN))
      • publication := (AND document (AT-LEAST 1 place-of-publication))
      • Loss: “Instead of books written by Carl Sagan, OBSERVER is providing all the documents written by Carl Sagan (even if they do not have an ISBN and place of publication)”
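The intensional loss statement on this slide follows from unfolding the two definitions: replacing book with document drops every constraint on the path between them. A minimal Python sketch, assuming each term is defined as a parent term plus one extra constraint (the `DEFINITIONS` table transcribes the slide; the `dropped_constraints` helper is hypothetical):

```python
# Definitions transcribed from the slide: term -> (parent term, constraint).
DEFINITIONS = {
    "book": ("publication", "AT-LEAST 1 ISBN"),
    "publication": ("document", "AT-LEAST 1 place-of-publication"),
}


def dropped_constraints(term, substitute):
    """Constraints lost when `term` is replaced by its ancestor `substitute`."""
    lost = []
    current = term
    while current != substitute:
        if current not in DEFINITIONS:
            raise ValueError(f"{substitute!r} is not an ancestor of {term!r}")
        parent, constraint = DEFINITIONS[current]
        lost.append(constraint)
        current = parent
    return lost


lost = dropped_constraints("book", "document")
print(lost)
```

The two constraints returned are exactly the ones named in the slide's loss message: the answer may include documents without an ISBN and without a place of publication.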
    • Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system: example, loss for the plans (Eduardo Mena, III’98)
      • Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”)) [case 2] 91.57% < (1-Loss) < 91.75%
      • Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”)) [case 3] 94.03% < (1-Loss) < 100%
      • Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”)) [case 3] 98.56% < (1-Loss) < 100%
      • Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name “Carl Sagan”)) [case 1] 0% < (1-Loss) < 7.22%
    • Summary
      • Structured Databases: Data; Syntax, System; Federated DB
      • Text, Semi-structured: Metadata; Structural, Schematic; Mediator, Federated IS
      • Visual, Scientific/Eng.: Knowledge; Semantic; Knowledge Mgmt., Information Brokering, Cooperative IS
    • Agenda for research
      • Interoperation not at systems level, but at informational and possibly knowledge level
        • traditional database and information retrieval solutions do not suffice
        • need to understand context; measures of similarities
      • Need to give more impetus to semantic level issues involving terminological and contextual differences, and possibly perceptual or cognitive differences in the future
        • information systems and humans need to cooperate, possibly involving coordination and collaborative processes
    • Related Reading
      • Books:
        • Information Brokering for Digital Media, Kashyap and Sheth, Kluwer, 1999 (to appear)
        • Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Sheth and Klas Eds, McGraw-Hill, 1998
        • Cooperative Information Systems, Papazoglou and Schlageter Eds., Academic Press, 1998
        • Management of Heterogeneous and Autonomous Database Systems, Elmagarmid, Rusinkiewicz, Sheth Eds, Morgan Kaufmann, 1998.
      • Special Issues and Proceedings:
        • Formal Ontologies in Information Systems, Guarino Ed., IOS Press, 1998
        • Semantic Interoperability in Global Information Systems, Ouksel and Sheth, SIGMOD Record, March 1999.
      http://lsdis.cs.uga.edu [See publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt]
      Acknowledgements: Tarcisio Lima, Vipul Kashyap