IEEE Meta-Data Conference 1999 Keynote: Amit Sheth



Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," keynote given at IEEE Meta-Data, Bethesda, MD, April 6, 1999.



  1. Bethesda, Maryland, April 6, 1999. Amit Sheth, Large Scale Distributed Information Systems Lab, University of Georgia.
  2. Three perspectives on global information systems (GlobIS):
     • Information integration perspective: autonomy, distribution and heterogeneity (terminological, semantic, contextual)
     • Information brokering perspective: data, metadata, information, knowledge
     • "Vision" perspective: connectivity, computing, data
  3. Evolving targets and approaches in integrating data and information (a personal perspective). Goal: a society for ubiquitous exchange of (tradeable) information in all digital forms of representation; information anywhere, anytime, in any form.
     • Generation III (1997 onward): ADEPT, DL-II projects, InfoQuilt
     • Generation II (1990s): InfoSleuth, KMed, DL-I projects, VisualHarness, Infoscopes, HERMES, SIMS, InfoHarness, Garlic, TSIMMIS, Harvest, RUFUS, ...
     • Generation I (1980s): Mermaid, Multibase, MRDSM, ADDS, DDTS, IISS, Omnibase, ...
  4. Generation I
     • Data recognized as a corporate resource: leverage it!
     • Data predominantly in structured databases with different data models, transitioning from network and hierarchical to relational DBMSs
     • Heterogeneity (system, modeling and schematic) and the need to support autonomy posed the main challenges; the major issues were data access and connectivity
     • Information integration through a federated architecture
     • Support for corporate IS applications as the primary objective; updates often required, data integrity important
  5. Generation I (heterogeneity in FDBMSs), by level, with communication spanning all levels:
     • Database system (1980s): semantic heterogeneity; differences in DBMS: data models (abstractions, constraints, query languages) and system-level support (concurrency control, commit, recovery)
     • Operating system: file system; naming, file types, operations; transaction support; IPC
     • Hardware/system (1970s): instruction set; data representation/coding; configuration
  6. Generation I (Federated Database Systems: schema architecture). From bottom to top: component DBSs → local schemas → (schema translation into a common/canonical data model) → component schemas → export schemas → (schema integration) → federated schema → external schemas.
     • Dimensions for interoperability and integration: distribution, autonomy and heterogeneity
     • Model heterogeneity: handled through a common/canonical data model and schema translation
     • Goal: information sharing while preserving autonomy
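The schema levels above can be sketched as a pipeline of small functions. This is a minimal illustration that assumes a toy canonical model, in which a schema maps entity names to attribute tuples. The function names, the `EMP`/`Worker` schemas and the flattening rule for hierarchical schemas are all hypothetical and do not come from any FDBMS discussed in the talk.

```python
# Toy canonical data model (hypothetical): entity name -> tuple of attribute names.
Schema = dict[str, tuple[str, ...]]


def translate_relational(tables: dict[str, list[str]]) -> Schema:
    """Schema translation for a relational local schema: tables map 1:1 to entities."""
    return {name: tuple(columns) for name, columns in tables.items()}


def translate_hierarchical(records: dict, parent: str = "") -> Schema:
    """Schema translation for a hierarchical local schema.

    Each record type becomes an entity; a child record type gets an extra
    reference attribute naming its parent (a hypothetical flattening rule).
    """
    out: Schema = {}
    for name, node in records.items():
        ref = (parent.lower() + "_ref",) if parent else ()
        out[name] = tuple(node["fields"]) + ref
        out.update(translate_hierarchical(node.get("children", {}), name))
    return out


def export(component: Schema, shared: set[str]) -> Schema:
    """Export schema: each component DBS autonomously decides what it shares."""
    return {e: attrs for e, attrs in component.items() if e in shared}


def integrate(exports: list[Schema], renames: dict[str, str]) -> Schema:
    """Schema integration: rename to federation-wide names and merge attributes."""
    federated: Schema = {}
    for schema in exports:
        for entity, attrs in schema.items():
            name = renames.get(entity, entity)
            merged = federated.get(name, ())
            federated[name] = merged + tuple(a for a in attrs if a not in merged)
    return federated


def external(federated: Schema, view: dict[str, tuple[str, ...]]) -> Schema:
    """External schema: an application-specific projection of the federated schema."""
    return {e: tuple(a for a in attrs if a in federated[e]) for e, attrs in view.items()}


db1 = translate_relational({"EMP": ["ssn", "name", "salary"], "AUDIT": ["id"]})
db2 = translate_hierarchical(
    {"Dept": {"fields": ["dno"], "children": {"Worker": {"fields": ["ssn", "name", "phone"]}}}}
)
federated = integrate(
    [export(db1, {"EMP"}), export(db2, {"Worker"})],
    renames={"EMP": "Employee", "Worker": "Employee"},
)
print(federated)  # {'Employee': ('ssn', 'name', 'salary', 'phone', 'dept_ref')}
print(external(federated, {"Employee": ("name", "phone")}))  # {'Employee': ('name', 'phone')}
```

Note that `AUDIT` never reaches the federation. Its component system did not export it, which is how this sketch reflects sharing while preserving autonomy.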
  7. Generation I (characterization of schematic conflicts in multidatabase systems; Sheth & Kashyap, Kim & Seo). Schematic conflicts fall into:
     • Domain definition incompatibility: naming conflicts, data representation conflicts, data scaling conflicts, data precision conflicts, default value conflicts, attribute integrity constraint conflicts
     • Entity definition incompatibility: database identifier conflicts, naming conflicts, schema isomorphism conflicts, missing data items conflicts
     • Data value incompatibility: known inconsistency, temporal inconsistency, acceptable inconsistency
     • Abstraction level incompatibility: generalization conflicts, aggregation conflicts
     • Schematic discrepancies: data value/attribute conflicts, entity/attribute conflicts, data value/entity conflicts
     BUT these techniques for dealing with schematic heterogeneity do not map directly to the much larger variety of heterogeneous media.
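As an illustration of two of the domain definition conflicts, the sketch below resolves a data representation conflict and a data scaling conflict for one shared attribute. In the example, the same salary is stored as an integer in one database and as text in the other. It assumes two hypothetical component databases, `db1` and `db2`. Their storage conventions are invented for the example, and the resolver functions are illustrative rather than taken from any cited system.

```python
from decimal import Decimal


def salary_from_db1(raw: int) -> Decimal:
    """db1 (hypothetical): annual salary in US dollars, stored as an integer."""
    return Decimal(raw)


def salary_from_db2(raw: str) -> Decimal:
    """db2 (hypothetical): annual salary in thousands of dollars, stored as text.

    Resolves a data representation conflict (text vs. integer) and a
    data scaling conflict (thousands of dollars vs. dollars).
    """
    return Decimal(raw) * 1000


RESOLVERS = {"db1": salary_from_db1, "db2": salary_from_db2}


def unified_salaries(rows):
    """Map (source, employee, raw value) rows onto the federation's common domain."""
    return {employee: RESOLVERS[source](raw) for source, employee, raw in rows}


rows = [("db1", "ann", 52500), ("db2", "bob", "52.5"), ("db2", "eve", "61")]
salaries = unified_salaries(rows)
print(salaries["ann"] == salaries["bob"])  # True
```

`Decimal` is used so that converting "52.5" thousand to dollars does not introduce a binary floating-point error.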
  8. Generation II
     • Significant improvements in computing and connectivity (standardized protocols, public networks, the Internet/Web); remote data access taken as a given
     • Increasing diversity in data formats, with a focus on a variety of textual data and semi-structured documents
     • Many more data sources and heterogeneous information sources, but not necessarily a better understanding of the data
     • Use of data beyond traditional business applications: mining and warehousing, marketing, e-commerce
     • Web search engines for keyword-based querying against HTML pages; attribute-based querying available in a few search systems
     • Use of metadata for information access; early work on ontology support, applied to metadata in some cases
     • Mediator architecture for information management
  9. Generation II (limited types of metadata; extractors, mappers, wrappers). Extractors derive metadata from global/enterprise data stores and Web repositories: Nexis, UPI, AP, documents, digital maps, digital images, digital audio, digital video, ... Example query (Junglee, SIGMOD Record, Dec. 1997): "Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years."
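Once extractors have produced metadata, the example query becomes an attribute-based filter. Below is a minimal sketch of that step under these assumptions:

- The record layout, company names, coordinates and price series are hypothetical.
- Distance is the great-circle (haversine) distance.
- "At least 25% per year" is read as every year-over-year change being at least 25%. A compound-annual-growth reading is also possible.

```python
import math

SF = (37.7749, -122.4194)         # approximate San Francisco (lat, lon)
OAKLAND = (37.8044, -122.2712)    # approximate, about 8 miles from SF
PALO_ALTO = (37.4419, -122.1430)  # approximate, about 27 miles from SF
EARTH_RADIUS_MILES = 3958.8


def miles_between(a, b):
    """Great-circle distance in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def grew_every_year(prices, rate=0.25):
    """True if each year-over-year change in `prices` is at least `rate`."""
    return all(later >= earlier * (1 + rate) for earlier, later in zip(prices, prices[1:]))


# Hypothetical extracted metadata: job ads joined with company location and
# four year-end stock prices (three years of growth).
postings = [
    {"title": "Marketing Manager", "company": "A", "loc": OAKLAND, "prices": [10, 13, 17, 22]},
    {"title": "Marketing Manager", "company": "B", "loc": PALO_ALTO, "prices": [10, 13, 17, 22]},
    {"title": "Marketing Manager", "company": "C", "loc": OAKLAND, "prices": [10, 20, 21, 30]},
    {"title": "Sales Engineer", "company": "D", "loc": SF, "prices": [10, 13, 17, 22]},
]


def find_positions(postings, center=SF, radius=15.0):
    """Companies with a Marketing Manager opening near `center` and steady stock growth."""
    return [p["company"] for p in postings
            if p["title"] == "Marketing Manager"
            and miles_between(p["loc"], center) <= radius
            and grew_every_year(p["prices"])]


print(find_positions(postings))  # ['A']
```

Each posting fails for a different reason:

- B is too far from San Francisco.
- C had one slow year.
- D is the wrong position.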
  10. Generation II (a metadata classification: the information pyramid). From the data up toward the user:
     • Data (heterogeneous types/media)
     • Content-independent metadata (creation-date, location, type-of-sensor, ...)
     • Content-dependent metadata (size, max colors, rows, columns, ...)
     • Direct content-based metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
     • Domain-independent (structural) metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure, ...)
     • Domain-specific metadata (area, population (Census), land-cover, relief (GIS), concept descriptions from ontologies)
     • Domain models, classifications, ontologies
     Move in this direction (toward the user) to tackle information overload! Metadata standards: general purpose (Dublin Core, MCF); domain/industry specific (geographic: FGDC, UDK, ...; library: MARC, ...).
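The lower layers of the pyramid differ in where the metadata comes from:

- Content-independent metadata is supplied from outside the data.
- Content-dependent metadata is computed from the content.
- Domain-specific metadata uses terms from a domain ontology.

Here is a minimal sketch for an image, assuming it is given as a list of rows of color values. The class name, the sample values and the "hypothetical-sensor" label are illustrative. The slide's "max colors" is approximated here by the number of distinct colors used.

```python
from dataclasses import dataclass, field


@dataclass
class ImageMetadata:
    # Content-independent: cannot be derived from the pixels themselves.
    content_independent: dict
    # Content-dependent: computed from the content (size, colors, rows, columns).
    content_dependent: dict
    # Domain-specific: terms taken from a domain ontology (e.g. land-cover, relief).
    domain_specific: dict = field(default_factory=dict)


def describe_image(pixels, creation_date, location, sensor):
    """Build metadata for an image given as a list of rows of color values."""
    rows, columns = len(pixels), len(pixels[0])
    return ImageMetadata(
        content_independent={
            "creation-date": creation_date,
            "location": location,
            "type-of-sensor": sensor,
        },
        content_dependent={
            "rows": rows,
            "columns": columns,
            "size": rows * columns,
            # Distinct colors actually used (a stand-in for "max colors").
            "colors": len({value for row in pixels for value in row}),
        },
    )


md = describe_image([[0, 1, 1], [2, 1, 0]], "1999-04-06", "Bethesda, MD", "hypothetical-sensor")
md.domain_specific["land-cover"] = "urban"  # assigned from a (hypothetical) GIS ontology term
print(md.content_dependent)  # {'rows': 2, 'columns': 3, 'size': 6, 'colors': 3}
```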
  11. VisualHarness: an example
  12. What's next (after comprehensive use of metadata)? Query processing and information requests:
     • NOW: traditional keyword-based queries, attribute-based queries, content-based queries
     • NEXT: "high-level" information requests (ontology-based, iconic, mixed-media and media-independent); user-selected ontologies; use of profiles
  13. GIS data representation, example: multiple heterogeneous metadata models use different tag names for the same data in the same GIS domain (Kansas State FGDC metadata model vs. UDK metadata model):
     • Theme keywords (FGDC) = Search terms (UDK): digital line graph, hydrography, transportation, ...
     • Title (FGDC) = Topic (UDK): Dakota Aquifer
     • Online linkage (FGDC) = Address Id (UDK)
     • Spatial Reference Method (FGDC) = Measuring Techniques (UDK): Vector
     • Horizontal Coordinate System Definition (FGDC) = Co-ordinate System (UDK): Universal Transverse Mercator
     • ...
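At the syntactic level, the tag correspondences listed above can be handled by a renaming table. This minimal sketch uses only the five pairs from the slide. Any tag without a known counterpart is returned separately instead of being guessed, and "Hypothetical Tag" stands in for such a tag.

```python
# Tag correspondences from the slide (Kansas State FGDC model -> UDK model).
FGDC_TO_UDK = {
    "Theme keywords": "Search terms",
    "Title": "Topic",
    "Online linkage": "Address Id",
    "Spatial Reference Method": "Measuring Techniques",
    "Horizontal Coordinate System Definition": "Co-ordinate System",
}


def fgdc_to_udk(record: dict) -> tuple[dict, dict]:
    """Rename FGDC tags to UDK tags; return (translated, untranslatable) parts."""
    translated, leftover = {}, {}
    for tag, value in record.items():
        if tag in FGDC_TO_UDK:
            translated[FGDC_TO_UDK[tag]] = value
        else:
            leftover[tag] = value
    return translated, leftover


fgdc_record = {
    "Theme keywords": ["digital line graph", "hydrography", "transportation"],
    "Title": "Dakota Aquifer",
    "Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
    "Hypothetical Tag": "value",  # no known UDK counterpart
}
udk_record, unmapped = fgdc_to_udk(fgdc_record)
print(udk_record["Topic"], "|", udk_record["Co-ordinate System"])
# Dakota Aquifer | Universal Transverse Mercator
```

A fixed table like this only resolves naming differences. Differences in meaning between tags call for the ontology-based approaches discussed on later slides.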
  14. Generation III
     • Increasing information overload and a broader variety of information content (video, audio clips, etc.), with an increasing amount of visual information and scientific/engineering data
     • Continued Web-related standardization of representation and metadata (MCF, RDF, XML)
     • Changes in Web architecture; distributed computing (CORBA, Java)
     • Users demand simplicity, but complexity continues to rise
     • The Web is no longer just another information source but a basis for decision support through "data mining and information discovery, information fusion, information dissemination, knowledge creation and management" and "information management complemented by cooperation between the information system and humans"
     • Information brokering architecture proposed for information management
  15. Information Brokering: an enabler for the Infocosm
     • Information consumers (people, programs, corporations, universities, government) issue information requests and queries
     • Information providers (information systems and data repositories at newswires, corporations, universities, research labs) create information/data overload
     • Information brokering sits between them: arbitration between information consumers and providers to resolve information impedance; dynamic reinterpretation of information requests to determine relevant information services and products; dynamic creation and composition of information products
  16. Information Brokering: three dimensions. The diagram places brokers between consumers and providers, across the levels of vocabulary, metadata and data, and relates them to semantics, structure, syntax and system. Objective: reduce the problem of knowing the structure and semantics of data in the huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies.
  17. What else can Information Brokering do? WWW vs. WWW + Information Brokering:
     • WWW: a confusing heterogeneity of media and formats (a Tower of Babel); information correlation using physical (HREF) links at the extensional data level; location-dependent browsing of information using physical (HREF) links; the user has to keep track of information content!
     • WWW + Information Brokering: domain-specific ontologies as "semantic conceptual views"; information correlation using concept mappings at the intensional concept level; browsing of information using terminological relationships across ontologies; a higher level of abstraction, closer to the user's view of information!
  18. Concepts, tools and techniques to support semantics: context, semantic proximity, inter-ontological relations, media-independent information correlations, ontologies (especially domain-specific ones), profiles, domain-specific metadata.
  19. Tools to support semantics
     • Context, context, context
     • Media-independent information correlations
     • Multiple ontologies: semantic proximity (relationships between concepts within and across ontologies) based on domain, context, modeling/abstraction/representation, and state; characterizing the loss of information incurred due to differences in vocabulary
     BIG challenge: identifying relationships or similarity between objects of different media, developed and managed by different persons and systems.
  20. Heterogeneity... is a Tower of Babel!! Metadata, ontologies and contexts lead from semantic heterogeneity to semantic interoperability.
  21. The InfoQuilt Project
     • The InfoQuilt vision: semantic interoperability between systems, sharing knowledge using multiple ontologies; logical correlation of information; media-independent information processing
     • Realization of the vision: a fully distributed, adaptable, agent-based system; information/knowledge management supported by collaborative processes
  22. InfoQuilt Project: using the Metadata REFerence link (MREF)
     • MREF complements HREF, creating a "logical web" through media-independent, ontology- and metadata-based correlation
     • An MREF is a description of the information asset we want to retrieve
     • Semantic correlation using MREF: a model for logical correlation using ontological terms and metadata (concepts, constraints, relations, attributes), drawing on domain ontologies and the IQ_Asset ontology plus extension ontologies
     • Framework for representing MREFs: RDF; serialization: XML (one implementation choice); an MREF carries keywords, attributes and content attributes (color, scene cuts, ...)
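The slide names XML as one serialization choice for MREFs. The sketch below shows what such a serialization could look like using Python's standard `xml.etree.ElementTree`. All element and attribute names (`mref`, `keywords`, `constraint`, `content-attributes`, ...) are hypothetical placeholders, not InfoQuilt's actual format. The RDF representation mentioned on the slide is not modeled here.

```python
import xml.etree.ElementTree as ET


def mref_to_xml(keywords, attributes, content_attributes):
    """Serialize an MREF-like description to an XML string.

    Element and attribute names are hypothetical placeholders.
    `attributes` maps a metadata attribute to an (operator, value) constraint.
    """
    mref = ET.Element("mref")
    kw = ET.SubElement(mref, "keywords")
    for word in keywords:
        ET.SubElement(kw, "keyword").text = word
    attrs = ET.SubElement(mref, "attributes")
    for name, (op, value) in attributes.items():
        ET.SubElement(attrs, "constraint", name=name, op=op, value=str(value))
    content = ET.SubElement(mref, "content-attributes")
    for name, value in content_attributes.items():
        ET.SubElement(content, "attribute", name=name, value=str(value))
    return ET.tostring(mref, encoding="unicode")


xml_text = mref_to_xml(
    keywords=["shopping mall", "region"],
    attributes={"population": (">", 5000), "land-cover": ("=", "urban")},
    content_attributes={"dominant-color": "gray"},
)
print(xml_text)
```

ElementTree escapes the `>` operator inside attribute values automatically, so the output is well-formed and parses back to the same constraint.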
  23. Domain-specific correlation, example: potential locations for a future shopping mall are identified by all regions having a population greater than 5000 and an area greater than 50 sq. ft., with urban land cover and moderate relief:
     <A MREF ATTRIBUTES(population > 5000; area > 50; region-type = 'block'; land-cover = 'urban'; relief = 'moderate')>can be viewed here</A>
     • Domain-specific metadata: terms chosen from domain-specific ontologies (population, area, boundaries, land cover, relief)
     • Media-independent relationships between domain-specific metadata: population and area (regions, queried with SQL) vs. boundaries, land cover and relief (image features, computed by image-processing routines), drawn from the Census DB, the TIGER/Line DB and US Geological Survey data
     • Result: correlation between images and structured data at a higher, domain-specific level, as opposed to physical "link-chasing" in the WWW
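Evaluating the MREF above amounts to two steps. First, join per-region metadata from several sources. Then keep the regions that satisfy every attribute constraint. Below is a minimal sketch that parses the slide's `ATTRIBUTES(...)` constraint list. It assumes each source is a dictionary keyed by a shared region id. The region ids and values are hypothetical, and real correlation would also need the image-processing and SQL steps the slide describes.

```python
import operator
import re

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
CONSTRAINT = re.compile(r"\s*([\w-]+)\s*([<>=])\s*(.+?)\s*")


def parse_attributes(text):
    """Parse "population > 5000; region-type = 'block'; ..." into (attr, op, value) triples."""
    constraints = []
    for part in text.split(";"):
        match = CONSTRAINT.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse constraint: {part!r}")
        attr, op, raw = match.groups()
        value = float(raw) if NUMBER.fullmatch(raw) else raw.strip("'\"\u2018\u2019")
        constraints.append((attr, OPS[op], value))
    return constraints


def correlate(sources, constraints):
    """Join per-region metadata from several sources; keep regions meeting every constraint."""
    merged = {}
    for source in sources:
        for region, attrs in source.items():
            merged.setdefault(region, {}).update(attrs)
    return sorted(
        region for region, attrs in merged.items()
        if all(name in attrs and op(attrs[name], value) for name, op, value in constraints)
    )


# Hypothetical extracted metadata, keyed by region id.
census = {"r1": {"population": 7200}, "r2": {"population": 3100}, "r3": {"population": 9000}}
tiger = {r: {"area": a, "region-type": "block"} for r, a in [("r1", 80), ("r2", 120), ("r3", 60)]}
usgs = {
    "r1": {"land-cover": "urban", "relief": "moderate"},
    "r2": {"land-cover": "urban", "relief": "moderate"},
    "r3": {"land-cover": "forest", "relief": "moderate"},
}

mref = "population > 5000; area > 50; region-type = 'block'; land-cover = 'urban'; relief = 'moderate'"
print(correlate([census, tiger, usgs], parse_attributes(mref)))  # ['r1']
```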
  24. Domain-specific correlation: example
  26. ADEPT Information Landscape (Iscape) concept prototype (a scenario for Digital Earth: learning in the context of the "El Niño" phenomenon). Sample Iscape requests, which combine keywords, domain-specific attributes and domain-independent attributes:
     • How does El Niño affect sea animals? Look for broadcast videos of less than 2 minutes.
     • How are some regions affected by El Niño? Look at East/West Pacific regions.
     • What disasters have been related to El Niño?
     • What storm occurrences are attributed to El Niño?
     • Show reports related to El Niño that contain Clinton.
     TRY ISCAPE CONCEPT DEMO
  27. Putting MREFs to work: the components are the IQ_Asset ontology plus extension ontologies, domain ontologies, an MREF Builder (the user constructs new MREFs), an MREF repository, a User Agent with user profiles, a Broker Agent, and a Profile Manager.
  28. Context: the lynchpin of semantics. Cricket. "For instance, if you were to use Yahoo! or Infoseek to search the web for pizza, your results would probably be hundreds of matches for the word pizza. Many of these could be pizza parlors around the world. Yet if you run the same search within NeighborNet, you will allows you to order pizza to be delivered instead of shipped." From a press release of FutureOne, Inc., March 24, 1999.
  29. Constructing c-contexts from ontological terms
     • A c-context is a collection of contextual coordinates Ci (roles) and values Vi (concepts/concept descriptions): C-Context = <(C1, V1) (C2, V2) ... (Ck, Vk)>
     • Example c-context: "All documents stored in the database have been published by some agency"
     • Database objects DOC(Id, Title, Agency) and AGENCY(RegNo, Name, Affiliation) are described with the ontological terms Document Concept and Agency Concept: Cdef(DOC) = <(hasOrganization, AgencyConcept)>
     • Advantages: use of ontologies for an intensional, domain-specific description of data; representation of extra information; relationships between objects that are not represented in the database schema; use of terminological relationships in the ontology
  30. Using c-contexts to reason about information in the database. Example:
     • Cdef(DOC) = <(hasOrganization, AgencyConcept)>; query context CQ = <(hasOrganization, {"USGS"})>
     • Reasoning with c-contexts: glb(Cdef(DOC), CQ) = <(self, DocumentConcept), (hasOrganization, {"USGS"})>
     • Ontological inferences: DocumentConcept; (hasOrganization, {"USGS"})
     Challenge 1: use of multiple ontologies. Challenge 2: estimating the loss of information.
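The glb computation can be approximated in a few lines. For each role, take the more specific of the two values, and give up if the values conflict. This is a simplified sketch under several assumptions:

- A c-context is a dictionary with at most one value per role.
- A value is either a concept name or a set of named instances.
- The mini-ontology (`PARENT`, `INSTANCE_OF`) is hypothetical.
- The `(self, DocumentConcept)` pair, which appears in the slide's glb result, is placed in Cdef(DOC) here.

This is not the full description-logic reasoning used in the original work.

```python
# Hypothetical mini-ontology: concept -> parent concept, and instance -> concept.
PARENT = {"DocumentConcept": "Thing", "AgencyConcept": "Organization", "Organization": "Thing"}
INSTANCE_OF = {"USGS": "AgencyConcept", "NOAA": "AgencyConcept"}


def subsumes(general, specific):
    """True if concept `general` is `specific` or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


def meet(v1, v2):
    """Most specific common value of two role values, or None if they conflict.

    A value is either a concept name (str) or a frozenset of named instances.
    """
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return (v1 & v2) or None
    if isinstance(v1, frozenset) or isinstance(v2, frozenset):
        instances, concept = (v1, v2) if isinstance(v1, frozenset) else (v2, v1)
        kept = frozenset(i for i in instances if subsumes(concept, INSTANCE_OF.get(i)))
        return kept or None
    if subsumes(v1, v2):
        return v2
    if subsumes(v2, v1):
        return v1
    return None  # no common specialization in this toy model


def glb(c1, c2):
    """glb of two c-contexts (dicts role -> value); None if they are inconsistent."""
    result = dict(c1)
    for role, value in c2.items():
        if role in result:
            met = meet(result[role], value)
            if met is None:
                return None
            result[role] = met
        else:
            result[role] = value
    return result


cdef_doc = {"self": "DocumentConcept", "hasOrganization": "AgencyConcept"}
cq = {"hasOrganization": frozenset({"USGS"})}
print(glb(cdef_doc, cq))
# {'self': 'DocumentConcept', 'hasOrganization': frozenset({'USGS'})}
```

A query context that names an organization not known to be an agency yields `None`. In that case the database cannot contain matching documents under this context.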
  31. Estimating information loss for multi-ontology-based query processing in the OBSERVER/InfoQuilt system: the OBSERVER architecture (Eduardo Mena, III'98). The user node (where the user query is posed) and each component node run a query processor and an ontology server with ontologies, mappings and data repositories; the IRM node maintains interontology terminological relationships.
  32. Estimating information loss (continued): query construction, example. "Get title and number of pages of books written by Carl Sagan"
     • User ontology WN: [name pages] for (AND book (FILLS creator "Carl Sagan"))
     • Target ontology Stanford-I, integrated ontology WN-Stanford-I: [title number-of-pages] for (AND book (FILLS doc-author-name "Carl Sagan"))
  33. Estimating information loss (continued): query construction, example. Re-use of knowledge: the Stanford-I bibliography data ontology, whose root Biblio-Thing covers Document, Conference and Agent; other concepts shown include Person, Organization, Author, Publisher, University, Book, Edited-Book, Technical-Report, Miscellaneous-Publication, Proceedings, Thesis, Doctoral-Thesis, Master-Thesis, Periodical-Publication, Journal, Newspaper, Magazine, Technical-Manual, Cartographic-Map, Computer-Program, Multimedia-Document and Artwork. Same query example as slide 32.
  34. Estimating information loss (continued): query construction, example. Re-use of knowledge: a subset of WordNet 1.5 (the user ontology WN), with terms including Print-Media, Journalism, Press, Publication, Periodical, Newspaper, Magazine, Journals, Pictorial, Series, Book, Trade-Book, Brochure, TextBook, SongBook, Reference-Book, PrayerBook, CookBook, Encyclopedia, WordBook, Instruction-Book, HandBook, Directory, Annual, GuideBook, Manual, Bible, Instructions and Reference-Manual. Same query example as slide 32.
  35. Estimating information loss (continued): query construction, example: the WN ontology and the user query. Same query example as slide 32.
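When every term in the user query has a synonym in the target ontology, query construction reduces to term-by-term rewriting. That is the case for the Carl Sagan example (name → title, pages → number-of-pages, creator → doc-author-name). Below is a minimal sketch, assuming queries are nested tuples and literals are wrapped in a small `Value` class so they are never mistaken for ontology terms. The synonym table is taken from the example, while the representation and function names are hypothetical. A term with no synonym raises an error. This is the point where OBSERVER instead considers hypernym/hyponym substitutions, which can lose information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A literal in a query (kept apart from ontology terms)."""
    text: str


OPERATORS = {"AND", "FILLS"}
# Synonyms used in the example, WN term -> Stanford-I term.
WN_TO_STANFORD = {"name": "title", "pages": "number-of-pages",
                  "creator": "doc-author-name", "book": "book"}


def translate(expr, synonyms):
    """Rewrite every ontology term in a nested query expression via `synonyms`."""
    if isinstance(expr, Value):
        return expr
    if isinstance(expr, tuple):
        return tuple(translate(e, synonyms) for e in expr)
    if expr in OPERATORS:
        return expr
    if expr not in synonyms:
        # No synonym: a hypernym/hyponym substitution (with possible loss) would be needed.
        raise LookupError(f"no synonym for {expr!r}")
    return synonyms[expr]


def render(expr):
    """Print a query expression in the slide's notation."""
    if isinstance(expr, Value):
        return f'"{expr.text}"'
    if isinstance(expr, tuple):
        return "(" + " ".join(render(e) for e in expr) + ")"
    return expr


projection = ("name", "pages")
description = ("AND", "book", ("FILLS", "creator", Value("Carl Sagan")))
target = [translate(t, WN_TO_STANFORD) for t in projection]
print(f"[{' '.join(target)}] for {render(translate(description, WN_TO_STANFORD))}")
# [title number-of-pages] for (AND book (FILLS doc-author-name "Carl Sagan"))
```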
  36. Estimating information loss (continued): estimating the loss of information
     • Purpose: to choose the plan with the least loss, and to present a level of confidence in the answer
     • Based on intensional information (terminological difference) and on extensional information (precision and recall)
     • Plans in the example, for the user query (AND book (FILLS doc-author-name "Carl Sagan")):
       Plan 1: (AND document (FILLS doc-author-name "Carl Sagan"))
       Plan 2: (AND periodical-publication (FILLS doc-author-name "Carl Sagan"))
       Plan 3: (AND journal (FILLS doc-author-name "Carl Sagan"))
       Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name "Carl Sagan"))
  37. Estimating information loss (continued): loss of information based on intensional information
     • User query: (AND book (FILLS doc-author-name "Carl Sagan"))
     • Plan 1: (AND document (FILLS doc-author-name "Carl Sagan"))
     • Definitions: book := (AND publication (AT-LEAST 1 ISBN)); publication := (AND document (AT-LEAST 1 place-of-publication))
     • Loss: "Instead of books written by Carl Sagan, OBSERVER is providing all the documents written by Carl Sagan (even if they do not have an ISBN and place of publication)"
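The intensional loss statement above follows from unfolding the definitions. Replacing `book` with its generalization `document` drops every constraint that lies between the two concepts. Below is a minimal sketch using the two definitions from the slide. It treats `document` as primitive and handles only a substitute that lies on the same definition chain. It identifies which constraints are lost but does not attempt OBSERVER's numeric loss measure.

```python
# Definitions from the slide: concept -> (parent concept, extra constraints).
DEFINITIONS = {
    "book": ("publication", ["(AT-LEAST 1 ISBN)"]),
    "publication": ("document", ["(AT-LEAST 1 place-of-publication)"]),
}


def unfold(concept):
    """Expand definitions down to a primitive concept, collecting constraints on the way."""
    constraints = []
    while concept in DEFINITIONS:
        parent, extra = DEFINITIONS[concept]
        constraints += extra
        concept = parent
    return concept, constraints


def dropped_constraints(asked, substitute):
    """Constraints of `asked` that are no longer enforced when `substitute` is queried instead."""
    base_asked, asked_constraints = unfold(asked)
    base_sub, sub_constraints = unfold(substitute)
    if base_asked != base_sub:
        raise ValueError("concepts do not unfold to the same primitive")
    return [c for c in asked_constraints if c not in sub_constraints]


print(dropped_constraints("book", "document"))
# ['(AT-LEAST 1 ISBN)', '(AT-LEAST 1 place-of-publication)']
```

The two dropped constraints are exactly the ones named in the slide's loss message for Plan 1.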
  38. Estimating information loss (continued): example, loss for the plans
     • Plan 1: (AND document (FILLS doc-author-name "Carl Sagan")) [case 2]: 91.57% < (1 - Loss) < 91.75%
     • Plan 2: (AND periodical-publication (FILLS doc-author-name "Carl Sagan")) [case 3]: 94.03% < (1 - Loss) < 100%
     • Plan 3: (AND journal (FILLS doc-author-name "Carl Sagan")) [case 3]: 98.56% < (1 - Loss) < 100%
     • Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name "Carl Sagan")) [case 1]: 0% < (1 - Loss) < 7.22%
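Slide 36 states that loss estimates are used to choose the plan with the least loss. This minimal sketch selects a plan from the bounds printed above, taking them at face value. The selection rule is an assumption and is not necessarily the rule OBSERVER applies: maximize the guaranteed (lower-bound) value of 1 - Loss and break ties on the upper bound.

```python
# Bounds on (1 - Loss), in percent, as printed on the slide.
PLANS = {
    "Plan 1": ("document", 91.57, 91.75),
    "Plan 2": ("periodical-publication", 94.03, 100.0),
    "Plan 3": ("journal", 98.56, 100.0),
    "Plan 4": ("UNION(book, proceedings, thesis, misc-publication, technical-report)", 0.0, 7.22),
}


def pick_plan(plans):
    """Assumed rule: maximize the lower bound of 1 - Loss, then the upper bound."""
    return max(plans, key=lambda name: (plans[name][1], plans[name][2]))


best = pick_plan(PLANS)
print(best, PLANS[best][0])  # Plan 3 journal
```

Ranking by the midpoint of each interval would pick the same plan for these numbers.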
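  39. Summary: a progression from (1) structured databases, federated DBs, data, and syntax and system issues; to (2) semi-structured data and text, mediators and federated IS, metadata, and structural and schematic issues; to (3) knowledge management, visual and scientific/engineering content, information brokering and cooperative IS, information and knowledge, and semantic issues.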
  40. Agenda for research
     • Interoperation not at the system level, but at the information and possibly knowledge level: traditional database and information retrieval solutions do not suffice; we need to understand context and measures of similarity
     • Need to increase the impetus on semantic-level issues involving terminological and contextual differences, and possibly perceptual or cognitive differences in the future: information systems and humans need to cooperate, possibly involving coordination and collaborative processes
  41. Related reading
     Books:
     • Information Brokering for Digital Media, Kashyap and Sheth, Kluwer, 1999 (to appear)
     • Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Sheth and Klas, Eds., McGraw-Hill, 1998
     • Cooperative Information Systems, Papazoglou and Schlageter, Eds., Academic Press, 1998
     • Management of Heterogeneous and Autonomous Database Systems, Elmagarmid, Rusinkiewicz and Sheth, Eds., Morgan Kaufmann, 1998
     Special issues and proceedings:
     • Formal Ontologies in Information Systems, Guarino, Ed., IOS Press, 1998
     • Semantic Interoperability in Global Information Systems, Ouksel and Sheth, SIGMOD Record, March 1999
     [See publications on metadata, semantics, context, InfoHarness/InfoQuilt.]
     Acknowledgements: Tarcisio Lima, Vipul Kashyap