Bethesda, Maryland, April 6, 1999




                                    Amit Sheth
                                    Large Scale Distributed Information Systems Lab
                                    University of Georgia
                                    http://lsdis.cs.uga.edu
Three perspectives to GlobIS

        autonomy




                                   Information Integration Perspective
                    distribution

heterogeneity                                                      (terminological,
                                                        semantic
                                                                   contextual)


   Information Brokering Perspective                   meta-data


                                                          data
                   knowledge


                   information        "Vision" Perspective
  connectivity computing data
Evolving targets and approaches in integrating
data and information (a personal perspective)
   a society for ubiquitous exchange of (tradeable)
  information in all digital forms of representation;
      information anywhere, anytime, any forms



Generation III             ADEPT,
                                                              DL-II projects
   1997...                InfoQuilt



Generation II                                    InfoSleuth, KMed, DL-I projects
                      VisualHarness
                                                  Infoscopes, HERMES, SIMS,
   1990s               InfoHarness              Garlic,TSIMMIS,Harvest, RUFUS,...



Generation I             Mermaid                        Multibase, MRDSM, ADDS,
   1980s                  DDTS                               IISS, Omnibase, ...
Generation I

•Data recognized as corporate resource — leverage it!
• Data predominantly in structured databases, different data models,
  transitioning from network and hierarchical to relational DBMSs

• Heterogeneity (system, modeling and schematic) as well as need to
  support autonomy posed main challenges;
  major issues were data access and connectivity

• Information integration through Federated architecture
• Support for corporate IS applications as the primary objective,
  update often required, data integrity important
Generation I
(heterogeneity in FDBMSs)



         1980s          Database System
                        • Semantic heterogeneity
                        • Differences in DBMS
                             • data models
                               (abstractions, constraints, query languages)
                             • system-level support
                               (concurrency control, commit, recovery)

                        Operating System
                        • file system
                        • naming, file types, operation
                        • transaction support
                        • IPC

         1970s          Hardware/System
                        • instruction set
                        • data representation/coding
                        • configuration

                        Communication (shown alongside the operating system
                        and hardware levels)
Generation I
(Federated Database Systems: Schema Architecture)


Schema levels (top to bottom):

   External Schema, External Schema, ...
   Federated Schema                           (built by schema integration)
   Export Schema, Export Schema, ..., Export Schema
   Component Schema, ..., Component Schema    (built by schema translation)
   Local Schema, ..., Local Schema
   Component DBS, ..., Component DBS

• Dimensions for interoperability and integration:
  distribution, autonomy and heterogeneity
• Model Heterogeneity: Common/Canonical Data Model,
  Schema Translation
• Information sharing while preserving autonomy
Generation I
(characterization of schematic conflicts in multidatabase systems)

Schematic Conflicts

• Domain Definition Incompatibility
    Naming Conflicts
    Data Representation Conflicts
    Data Scaling Conflicts
    Data Precision Conflicts
    Default Value Conflicts
    Attribute Integrity Constraint Conflicts
• Data Value Incompatibility
    Known Inconsistency
    Temporal Inconsistency
    Acceptable Inconsistency
• Abstraction Level Incompatibility
    Generalization Conflicts
    Aggregation Conflicts
• Schematic Discrepancies
    Data Value Attribute Conflict
    Entity Attribute Conflict
    Data Value Entity Conflict
• Entity Definition Incompatibility
    Naming Conflicts
    Database Identifier Conflicts
    Schema Isomorphism Conflicts
    Missing Data Items Conflicts

                                     Sheth & Kashyap, Kim & Seo

BUT these techniques for dealing with schematic heterogeneity do not directly
map to dealing with much larger variety of heterogeneous media
Generation II

•   Significant improvements in computing and connectivity (standardization
    of protocol, public network, Internet/Web); remote data access as given;
•   Increasing diversity in data formats, with focus on variety of textual data
    and semi-structured documents
•   Many more data sources, heterogeneous information sources,
    but not necessarily better understanding of data
•   Use of data beyond traditional business applications:
    mining + warehousing, marketing, e-commerce
•   Web search engines for keyword based querying against HTML pages;
    attribute-based querying available in a few search systems
•   Use of metadata for information access; early work on ontology support;
    distribution applied to metadata in some cases
•   Mediator architecture for information management
Generation II
(limited types of metadata, extractors, mappers, wrappers)




                                                  Nexis        Digital Videos
                                                  UPI
                                                  AP
                                                             ...                  ...
                                                 Documents                              Data Stores
                            Global/Enterprise                      Digital Maps
                            Web Repositories
                                                                        ...
                                                    Digital Images            Digital Audios

                                                          EXTRACTORS
                                                          METADATA

"Find Marketing Manager positions in a company that is within 15 miles of
San Francisco and whose stock price has been growing at a rate of at least
25% per year over the last three years"
             Junglee, SIGMOD Record, Dec. 1997
Generation II
(a metadata classification: the information pyramid)


Pyramid levels (top to bottom):

   User
   Ontologies
      Classifications
      Domain Models
   Domain Specific Metadata
      area, population (Census),
      land-cover, relief (GIS), metadata
      concept descriptions from ontologies
   Domain Independent (structural) Metadata
      (C++ class-subclass relationships, HTML/SGML
      Document Type Definitions, C program structure...)
   Direct Content Based Metadata
      (inverted lists, document vectors, WAIS, Glimpse, LSI)
   Content Dependent Metadata (size, max colors, rows, columns...)
   Content Independent Metadata (creation-date, location, type-of-sensor...)
   Data (Heterogeneous Types/Media)

Move in this direction to tackle information overload!!

METADATA STANDARDS
   General Purpose: Dublin Core, MCF
   Domain/industry specific:
      Geographic (FGDC, UDK, …),
      Library (MARC,…)
VisualHarness – an example
What's next (after comprehensive use of metadata)?

     Query processing and information requests

    NOW
         traditional queries based on keywords
         attribute based queries
         content-based queries


    NEXT
         'high level' information requests involving
          ontology-based, iconic, mixed-media, and
          media-independent information requests
         user selected ontology, use of profiles
GIS Data Representation – Example

            multiple heterogeneous metadata models with different
             tag names for the same data in the same GIS domain

                                           Kansas State




    FGDC Metadata Model                                     UDK Metadata Model
  Theme keywords: digital line graph,                      Search terms: digital line graph,
    hydrography, transportation...                          hydrography, transportation...

          Title: Dakota Aquifer                                 Topic: Dakota Aquifer

              Online linkage:                                          Adress Id:
   http://gisdasc.kgs.ukans.edu/dasc/                     http://gisdasc.kgs.ukans.edu/dasc/

Direct Spatial Reference Method: Vector                     Measuring Techniques: Vector

Horizontal Coordinate System Definition:                       Co-ordinate System:
     Universal Transverse Mercator                         Universal Transverse Mercator
             … … … ...                                              … … … ...
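
The tag-name mismatch above can be illustrated with a small sketch. It assumes a hand-written correspondence table between the FGDC and UDK tag names shown on the slide; the function name `fgdc_to_udk` and the dictionary layout are illustrative only and are not part of either metadata standard.

```python
# Correspondence between FGDC and UDK tag names, read off the slide above.
# The table is a hand-built assumption, not an official crosswalk.
FGDC_TO_UDK = {
    "Theme keywords": "Search terms",
    "Title": "Topic",
    "Online linkage": "Adress Id",  # spelling as it appears on the slide
    "Direct Spatial Reference Method": "Measuring Techniques",
    "Horizontal Coordinate System Definition": "Co-ordinate System",
}


def fgdc_to_udk(record):
    """Rename FGDC tags to UDK tags; tags without a known mapping are kept as-is."""
    return {FGDC_TO_UDK.get(tag, tag): value for tag, value in record.items()}


# The Dakota Aquifer entry from the slide, expressed with FGDC tag names.
fgdc_record = {
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
udk_record = fgdc_to_udk(fgdc_record)
```

A flat rename like this only covers the terminological part of the problem; it says nothing about cases where the two models disagree on what a field means.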
Generation III

•   Increasing information overload and broader variety of information
    content (video content, audio clips etc) with increasing amount of visual
    information, scientific/engineering data

•   Continued standardization related to Web for representational and metadata
    issues (MCF, RDF, XML)

• Changes in Web architecture; distributed computing (CORBA, Java)
• Users demand simplicity, but complexities continue to rise
• Web is no longer just another information source, but decision support through
    "data mining and information discovery, information fusion, information
    dissemination, knowledge creation and management", "information management
    complemented by cooperation between the information system and humans"

•Information Brokering Architecture proposed for information management
Information Brokering: An Enabler for the Infocosm

  INFORMATION CONSUMERS                      arbitration between information
                   People                 consumers and providers for resolving
  Corporations
                         Programs                information impedance
    Universities    Government

                                          Information   Information   Information
  User         User           User          Request       Request       Request
  Query        Query          Query


      INFORMATION/DATA
                                          INFORMATION BROKERING
          OVERLOAD

Information     Data        Information   Information     Data        Information
  System      Repository      System        System      Repository      System



   Newswires       Corporations           dynamic reinterpretation of information
                                           requests for determination of relevant
     Universities Research Labs
                                             information services and products
   INFORMATION PROVIDERS                                     —
                                           dynamic creation and composition of
                                                   information products
Information Brokering: Three Dimensions


                    THREE DIMENSIONS


                               C O N S U M E R S

                              B R O K E R S




                                                                               VOCABULARY
                                                             M E T A D A T A
                          P R O V I D E R S


                       S E M A N T I C S




                                                   D A T A
                       S T R U C T U R E

                          S Y N T A X

                          S Y S T E M




                                 Objective:
  Reduce the problem of knowing structure and semantics of data in the huge
    number of information sources on a global scale to: understanding and
       navigating a significantly smaller number of domain ontologies
What else can Information Brokering do?


         WWW
         • a confusing heterogeneity of media, formats (Tower of Babel)
         • information correlation using physical (HREF) links
           at the extensional data level
         • location dependent browsing of information
           using physical (HREF) links
         • user has to keep track of information content !!

         WWW + Information Brokering
         • Domain Specific Ontologies as “semantic conceptual views”
         • Information correlation using concept mappings
           at the intensional concept level
         • Browsing of information using terminological
           relationships across ontologies
         • Higher level of abstraction, closer
           to user view of information !!
Concepts, tools and techniques to support semantics



    context                      semantic
                                 proximity     inter-ontological
                                                   relations

                            media-independent
                         information correlations

 ontologies
(esp. domain-specific)                              profiles


                          domain-specific metadata
Tools to support semantics


  •   Context, context, context

  •   Media-independent information correlations

  •   Multiple ontologies
       –   Semantic Proximity (relationships between concepts within
           and across ontologies) using domain, context,
           modeling/abstraction/representation, state
       –   Characterizing Loss of Information incurred due to
           differences in vocabulary



               BIG challenge: identifying relationship or
            similarity between objects of different media,
      developed and managed by different persons and systems
Heterogeneity...                   … is a Babel Tower!!


                   SEMANTIC HETEROGENEITY


                        metadata

                    ontologies

                   contexts


      SEMANTIC INTEROPERABILITY
The InfoQuilt Project

   THE INFOQUILT VISION
    Semantic interoperability between systems, sharing knowledge
    using multiple ontologies
    Logical correlation of information
    Media independent information processing



   REALIZATION OF THE VISION
    fully distributed, adaptable, agent-based system
    information/knowledge management supported by collaborative
    processes




                   http://lsdis.cs.uga.edu/proj/iq/iq.html
InfoQuilt Project: using the Metadata REFerence link

                                         MREF
         Complements HREF, creating a "logical web" through media
           independent ontology & metadata based correlation
        It is a description of the information asset we want to retrieve

 Semantic Correlation using MREF
     Diagram labels around the MREF:
       domain ontologies
       IQ_Asset ontology + extension ontologies
       constraints, relations, attributes
       keywords
       content attributes (color, scene cuts, …)

 MREF Concept
     MREF   Model for logical correlation using ontological terms and metadata
     RDF    Framework for representing MREFs
     XML    Serialization (one implementation choice)


                           http://lsdis.cs.uga.edu/proj/iq/iq.html
Domain Specific Correlation – example
     Potential locations for a future shopping mall identified by all regions having
   a population greater than 5000, and area greater than 50 sq. ft. having an urban
land cover and moderate relief <A MREF ATTRIBUTES(population > 5000; area > 50;
region-type = ‘block’; land-cover = ‘urban’; relief = ‘moderate’) can be viewed here</A>


   domain specific metadata: terms chosen from domain specific ontologies

   Diagram labels: Regions (SQL); Population; Area; Boundaries; Land cover;
   Relief; Image Features (image processing routines); Boundaries;
   Census DB; TIGER/Line DB; US Geological Survey

   => media-independent relationships between domain specific metadata:
      population, area, land cover, relief
   => correlation between image and structured data at a higher domain
      specific level as opposed to physical "link-chasing" in the WWW
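
A minimal sketch of how the attribute constraints in the MREF above might be checked against domain specific metadata. The constraint values come from the slide; the region records, the `satisfies` function, and the idea of merging metadata from several sources into one dictionary per region are assumptions made for illustration, not the InfoQuilt implementation.

```python
import operator

# Attribute constraints from the MREF on the slide: (attribute, comparison, value).
MREF_CONSTRAINTS = [
    ("population", operator.gt, 5000),
    ("area", operator.gt, 50),
    ("region-type", operator.eq, "block"),
    ("land-cover", operator.eq, "urban"),
    ("relief", operator.eq, "moderate"),
]


def satisfies(metadata, constraints=MREF_CONSTRAINTS):
    """True if the record meets every constraint; a missing attribute fails."""
    return all(
        attr in metadata and compare(metadata[attr], value)
        for attr, compare, value in constraints
    )


# Hypothetical records. Each one merges metadata that would come from different
# media: structured data (population, area) and image-derived features
# (land cover, relief).
regions = [
    {"id": "r1", "population": 8200, "area": 75, "region-type": "block",
     "land-cover": "urban", "relief": "moderate"},
    {"id": "r2", "population": 3000, "area": 90, "region-type": "block",
     "land-cover": "urban", "relief": "moderate"},
    {"id": "r3", "population": 9000, "area": 120, "region-type": "block",
     "land-cover": "forest", "relief": "steep"},
]
matches = [r["id"] for r in regions if satisfies(r)]
```

The point of the sketch is that the check runs over metadata terms only, so it does not matter which medium each attribute was originally extracted from.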
Domain Specific Correlation – example
A DL II approach for Information Brokering

               Iscape 1                               Iscape N

           CONSTRUCTING APPROPRIATE INFORMATION LANDSCAPES


                        CONSTRUCTING ADDITIONAL
                       META-INFORMATION RESOURCES




                       DISCOVERING COLLECTIONS OF
                     HETEROGENEOUS INFORMATION AND
                       META-INFORMATION RESOURCES

   Domain
   Specific                                                            Domain
  Ontologies                                                         Independent
                   Images    Data Stores   Documents Digital Media
                                                                      Ontologies




   Physical/Simulation
                World
ADEPT Information Landscape Concept Prototype
(a scenario for Digital Earth:
           learning in the context of the “El Niño” phenomenon)


      Sample Iscapes Requests:

           – How does El Niño affect sea animals? Look for
             broadcast videos of less than 2 minutes.

           – How are some regions affected by El Niño? Look at
             East/West Pacific regions.

           – What disasters have been related to El Niño?

           – What storm occurrences are attributed to El Niño?

           – Show reports related to El Niño that contain Clinton.

      Request information using:
           keywords
           domain-specific attributes
           domain-independent attributes


                  TRY ISCAPE CONCEPT DEMO
Putting MREFs to work



                                            IQ_Asset ontology +
                                            extension ontologies
   domain ontologies
                         MREF Builder
                                                         MREF
   User                construct new MREF              repository




                                                         MREF
                                                       repository
                                  User
                                  Agent


  User                  Profile      Broker Agent
 profiles              Manager
Context: the lynchpin of semantics

 Cricket


  "For instance, if you were to use Yahoo! or Infoseek to
    search the web for pizza, your results would probably
    be hundreds of matches for the word pizza. Many of
    these could be pizza parlors around the world. Yet if
    you run the same search within NeighborNet, you will
    allows you to order pizza to be delivered instead of
    shipped."

        From a Press Release of FutureOne, Inc. March 24, 1999
              http://home.futureone.com/about/pr/021699.asp
Constructing c-contexts from ontological terms
DATABASE OBJECTS
     AGENCY(RegNo, Name, Affiliation)
     DOC(Id, Title, Agency)

ONTOLOGICAL TERMS
     Agency Concept
     Document Concept

C-CONTEXT:
     "All documents stored in the database
     have been published by some agency"

     => Cdef(DOC) = <(hasOrganization, AgencyConcept)>

     C-Context = <(C1, V1) (C2, V2) ... (Ck, Vk)>
     a collection of contextual coordinates Ci (roles) and
     values Vi (concepts/concept descriptions)

Advantages:
     Use of ontologies for an intensional
     domain specific description of data
     Representation of extra information
          Relationships between objects not
          represented in the database schema
          Using terminological relationships in
          the ontology
Using c-contexts to reason about information in database
EXAMPLE

             Cdef(DOC)                                    CQ
 <(hasOrganization, AgencyConcept)>          <(hasOrganization, {"USGS"})>




                             glb(Cdef(DOC), CQ)
           <(self, DocumentConcept), (hasOrganization, {"USGS"})>


            - Reasoning with c-contexts: glb(Cdef(DOC), CQ)
            - Ontological Inferences:
              - DocumentConcept
              - (hasOrganization, {"USGS"})
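
The glb step can be sketched roughly as follows. This is a simplified illustration and not the c-context algebra itself: it assumes a toy ontology in which a concept is just the set of instances it covers, it treats the more specific of two values as their greatest lower bound, and it leaves out the `self` coordinate shown in the slide's result. The agency names other than "USGS" are made up.

```python
# Hypothetical toy ontology: a concept name maps to the instances it covers.
ONTOLOGY = {
    "AgencyConcept": frozenset({"USGS", "NASA", "NOAA"}),
}


def extension(value):
    """Instances covered by a value (a concept name or an explicit frozenset)."""
    return ONTOLOGY[value] if isinstance(value, str) else value


def glb_value(a, b):
    """Greatest lower bound of two values: the more specific one, else the overlap."""
    ext_a, ext_b = extension(a), extension(b)
    if ext_a <= ext_b:
        return a
    if ext_b <= ext_a:
        return b
    return ext_a & ext_b


def glb(c1, c2):
    """Combine two c-contexts, given as {coordinate: value} dictionaries."""
    result = dict(c1)
    for coord, value in c2.items():
        result[coord] = glb_value(result[coord], value) if coord in result else value
    return result


cdef_doc = {"hasOrganization": "AgencyConcept"}
cq = {"hasOrganization": frozenset({"USGS"})}
combined = glb(cdef_doc, cq)
```

Here the query value {"USGS"} is narrower than AgencyConcept, so it is the value kept for `hasOrganization`, which matches the second inference listed on the slide.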


             Challenge 1: use of multiple ontologies

             Challenge 2: estimating the loss of information
Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
OBSERVER architecture
                                                                    Data Repositories


                  IRM

                                                       Ontology
                                                        Server                          Mappings

                                                                         Ontologies
                    Interontologies
                    Terminological                        Query                    User
                    Relationships                       Processor                  Query

       IRM NODE                                                                            USER NODE


                                      COMPONENT NODE                                                 COMPONENT NODE

            Ontology                                                        Ontology
             Server                                                          Server
                                      Mappings                                                       Mappings



        Query           Ontologies                                     Query            Ontologies
      Processor                                                      Processor




                   Data Repositories                                             Data Repositories

                                                                                                   Eduardo Mena (III’98)
Query construction - Example

 “Get title and number of pages of books written by Carl Sagan”

 User ontology: WN
             [name pages] for
                         (AND book (FILLS creator “Carl Sagan”))
 Target ontology: Stanford-I
 Integrated ontology WN-Stanford-I
             [title number-of-pages] for
                        (AND book (FILLS doc-author-name “Carl Sagan”))
 Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
                   http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
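
The rewrite from the user ontology to the target ontology can be pictured as term substitution. The sketch below assumes the term correspondences implied by the two query forms on the slide and represents a query as nested Python lists; the `translate` function and that representation are illustrative choices, not OBSERVER's actual query processor.

```python
# Term correspondences between the user ontology (WN) and the target ontology
# (Stanford-I), read off the two query forms on the slide.
WN_TO_STANFORD = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",
}


def translate(expr, mapping):
    """Recursively rename ontology terms in a nested-list query expression."""
    if isinstance(expr, list):
        return [translate(item, mapping) for item in expr]
    # Operators and constants are not in the mapping and pass through unchanged.
    return mapping.get(expr, expr)


user_projection = ["name", "pages"]
user_query = ["AND", "book", ["FILLS", "creator", '"Carl Sagan"']]

target_projection = translate(user_projection, WN_TO_STANFORD)
target_query = translate(user_query, WN_TO_STANFORD)
```

This only handles terms with an exact counterpart. When a term has no synonym in the target ontology, it has to be replaced by a broader or narrower term, which is where loss of information comes in.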

Query construction - Example
Re-use of Knowledge: Bibliography Data Ontology (Stanford-I)

   [Ontology diagram; concepts shown: Biblio-Thing, Document, Conference, Agent,
   Person, Organization, Author, Publisher, University, Book, Edited-Book,
   Proceedings, Thesis, Doctoral-Thesis, Master-Thesis, Technical-Report,
   Technical-Manual, Miscellaneous-Publication, Periodical-Publication, Journal,
   Newspaper, Magazine, Cartographic-Map, Computer-Program, Multimedia-Document,
   Artwork]
Query construction - Example
Re-use of Knowledge: A subset of WordNet 1.5

   [Ontology diagram; concepts shown: Print-Media, Press, Publication, Journalism,
   Newspaper, Magazine, Periodical, Journals, Pictorial, Series, Book, Trade-Book,
   Brochure, TextBook, SongBook, Reference-Book, PrayerBook, CookBook,
   Encyclopedia, WordBook, Instruction-Book, HandBook, Directory, Annual,
   GuideBook, Manual, Bible, Instructions, Reference-Manual]
Estimating information loss for multi-ontology based
 query processing in the OBSERVER/InfoQuilt system
                                WN ontology and user query
Query construction - Example

“Get title and number of pages of books written by Carl Sagan”

 User ontology: WN
            [name pages] for
                        (AND book (FILLS creator “Carl Sagan”))
 Target ontology: Stanford-I
 Integrated ontology WN-Stanford-I
            [title number-of-pages] for
                       (AND book (FILLS doc-author-name “Carl Sagan”))
Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
                  http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/

Estimating the loss of information

   To choose the plan with the least loss
     To present a level of confidence in the answer
     Based on intensional information (terminological difference)
     Based on extensional information (precision and recall)

 Plans in the example
                             User Query: (AND book
                                       (FILLS doc-author-name “Carl Sagan”))

 Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”))
 Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”))
 Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”))
 Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report)
            (FILLS doc-author-name “Carl Sagan”))
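
As a rough illustration of the extensional measures named above, precision and recall of a plan can be computed by comparing the set of objects it retrieves with the set the user actually asked for. The object identifiers below are invented for the example, and the slide does not say how OBSERVER combines the two measures into a single loss value, so that step is left out.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical extensions (object ids are made up for illustration).
books_by_sagan = {"b1", "b2", "b3"}  # what the user query asks for
# Plan 1 substitutes "document" for "book", so it returns the books
# plus every other kind of document by the same author.
documents_by_sagan = books_by_sagan | {"d1", "d2", "d3", "d4", "d5", "d6"}

precision, recall = precision_recall(documents_by_sagan, books_by_sagan)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting a plan that generalizes the query term keeps recall at 1.0 and pays for it in precision.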

Loss of information based on intensional information

   User Query: (AND book (FILLS doc-author-name “Carl Sagan”))


 Plan 1:
 (AND document (FILLS doc-author-name “Carl Sagan”))
     book := (AND publication (AT-LEAST 1 ISBN))
     publication := (AND document (AT-LEAST 1 place-of-publication))


   Loss: “Instead of books written by Carl Sagan, OBSERVER is
   providing all the documents written by Carl Sagan (even if they
   do not have an ISBN and place of publication)”
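
The loss statement above can be derived mechanically by walking up the definition chain from the user's term to its substitute and collecting the constraints that are no longer enforced. This is a minimal sketch under assumptions: the `DEFINITIONS` table encodes only the two definitions on the slide, and `dropped_constraints` is a hypothetical helper, not part of OBSERVER.

```python
# Definitions copied from the slide: each defined term is a parent concept
# plus the constraints that distinguish it from that parent.
DEFINITIONS = {
    "book": ("publication", ["AT-LEAST 1 ISBN"]),
    "publication": ("document", ["AT-LEAST 1 place-of-publication"]),
}


def dropped_constraints(term, substitute, definitions):
    """Collect the constraints lost when `term` is replaced by an ancestor.

    Walks up the definition chain from `term` until `substitute` is reached.
    Raises ValueError if `substitute` is not an ancestor of `term`.
    """
    dropped = []
    current = term
    while current != substitute:
        if current not in definitions:
            raise ValueError(f"{substitute!r} is not an ancestor of {term!r}")
        parent, constraints = definitions[current]
        dropped.extend(constraints)
        current = parent
    return dropped


lost = dropped_constraints("book", "document", DEFINITIONS)
print("Constraints no longer enforced:", lost)
```

The two collected constraints correspond to the "ISBN" and "place of publication" conditions mentioned in the slide's loss message.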



Example: loss for the plans

  Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”))      [case 2]

                    91.57% < (1-Loss) < 91.75%

  Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”))      [case 3]

                    94.03% < (1-Loss) < 100%

  Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”))      [case 3]

                    98.56% < (1-Loss) < 100%

  Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report)
          (FILLS doc-author-name “Carl Sagan”))                   [case 1]

                    0% < (1-Loss) < 7.22%
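
Given interval estimates like these, choosing a plan comes down to ordering the intervals. The sketch below copies the bounds exactly as printed on the slide and ranks plans by the lower bound of (1-Loss), using the upper bound as a tie-breaker. That ranking rule is an assumption made for illustration; the slide does not state how OBSERVER compares intervals.

```python
# Bounds on (1 - Loss), in percent, copied from the slide as printed.
PLAN_BOUNDS = {
    "Plan 1": (91.57, 91.75),
    "Plan 2": (94.03, 100.0),
    "Plan 3": (98.56, 100.0),
    "Plan 4": (0.0, 7.22),
}


def rank_plans(bounds):
    """Order plans from highest to lowest (lower bound, upper bound).

    Comparing on the lower bound first is an assumption of this sketch:
    it prefers the plan with the best guaranteed value.
    """
    return sorted(bounds, key=lambda plan: bounds[plan], reverse=True)


ranking = rank_plans(PLAN_BOUNDS)
print(ranking)
```

A different policy, such as comparing interval midpoints, would only need a different `key` function.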



Summary


   Visual,                  Knowledge    Semantic           Knowledge Mgmt.,
   Scientific/Eng.                                          Information Brokering,
                                                            Cooperative IS

   Semi-structured          Metadata     Structural,        Mediator,
                                         Schematic          Federated IS

   Text,                    Data         Syntax,            Federated DB
   Structured Databases                  System
Agenda for research

  Interoperation not at systems level, but at informational and
  possibly knowledge level
   – traditional database and information retrieval solutions
     do not suffice
   – need to understand context; measures of similarities
  Need to increase impetus on semantic level issues involving
  terminological and contextual differences, and possibly perceptual
  or cognitive differences in the future
   – information systems and humans need to cooperate,
     possibly involving coordination and collaborative
     processes
Related Reading
   Books:
       Information Brokering for Digital Media, Kashyap and Sheth, Kluwer,
       1999 (to appear)
       Multimedia Data Management: Using Metadata to Integrate and Apply
       Digital Media, Sheth and Klas Eds., McGraw-Hill, 1998
       Cooperative Information Systems, Papazoglou and Schlageter Eds.,
       Academic Press, 1998
       Management of Heterogeneous and Autonomous Database Systems,
       Elmagarmid, Rusinkiewicz, Sheth Eds., Morgan Kaufmann, 1998.

 Special Issues and Proceedings:
       Formal Ontologies in Information Systems, Guarino Ed., IOS Press, 1998
       Semantic Interoperability in Global Information Systems, Ouksel and
       Sheth, SIGMOD Record, March 1999.

 http://lsdis.cs.uga.edu
 [See publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt]
 amit@cs.uga.edu

 Acknowledgements: Tarcisio Lima, Vipul Kashyap

Ieee metadata-conf-1999-keynote-amit sheth

  • 1. Bethesda, Maryland, April 6, 1999 Amit Sheth Large Scale Distributed Information Systems Lab University of Georgia http://lsdis.cs.uga.edu
  • 2. Three perspectives to GlobIS autonomy Information Integration Perspective distribution heterogeneity (terminological, semantic contextual) Information Brokering Perspective meta-data data knowledge information ―Vision‖ Perspective connectivity computing data
  • 3. Evolving targets and approaches in integrating data and information (a personal perspective) a society for ubiquitous exchange of (tradeable) information in all digital forms of representation; information anywhere, anytime, any forms Generation III ADEPT, DL-II projects 1997... InfoQuilt Generation II InfoSleuth, KMed, DL-I projects VisualHarness Infoscopes, HERMES, SIMS, 1990s InfoHarness Garlic,TSIMMIS,Harvest, RUFUS,... Generation I Mermaid Multibase, MRDSM, ADDS, 1980s DDTS IISS, Omnibase, ...
  • 4. Generation I •Data recognized as corporate resource — leverage it! • Data predominantly in structured databases, different data models, transitioning from network and hierarchical to relational DBMSs • Heterogeneity (system, modeling and schematic) as well as need to support autonomy posed main challenges; major issues were data access and connectivity • Information integration through Federated architecture • Support for corporate IS applications as the primary objective, update often required, data integrity important
  • 5. Generation I (heterogeneity in FDBMSs) Database System •Semantic Heterogeneity •Differences in DBMS • data models (abstractions, constraints, query languages) 1980s • System level support (concurrency control, commit, recovery) C Operating System o • file system m • naming, file types, operation m • transaction support u • IPC n 1970s Hardware/System i c • instruction set a • data representation/coding t • configuration i o n
  • 6. Generation I (Federated Database Systems: Schema Architecture) External External • Dimensions for Schema Schema interoperability and integration: Federated ... distribution, autonomy Schema schema and heterogeneity integration Export Export Export ... Schema Schema Schema •Model Heterogeneity: Component ... Component Common/Canonical Schema Schema Data Model schema translation Schema Translation Local ... Local Schema Schema • Information sharing while preserving Component ... Component autonomy DBS DBS
  • 7. Generation I (characterization of schematic conflicts in multidatabase systems) Schematic Conflicts Domain Definition Data Value Abstraction Level Schematic Entity Definition Incompatibility Incompatibility Incompatibility Discrepancies Incompatibility Naming Conflicts Known Generalization Data Value Naming Inconsistency Conflicts Attribute Conflicts Data Representation Conflict Database Conflicts Temporal Aggregation Inconsistency Conflicts Entity Attribute Identifier Data Scaling Conflicts Conflict Conflicts Acceptable Inconsistency Data Value Schema Data Precision Isomorphism Entity Conflict Conflicts Conflicts Default Value Missing Data Conflicts BUT Items Conflicts these techniques for dealing with schematic Attribute Integrity Sheth & Kashyap, Kim & Seo Constraint Conflicts heterogeneity do not directly map to dealing with much larger variety of heterogeneous media
  • 8. Generation II • Significant improvements in computing and connectivity (standardization of protocol, public network, Internet/Web); remote data access as given; • Increasing diversity in data formats, with focus on variety of textual data and semi-structured documents • Many more data sources, heterogeneous information sources, but not necessarily better understanding of data • Use of data beyond traditional business applications: mining + warehousing, marketing, e-commerce • Web search engines for keyword based querying against HTML pages; attribute-based querying available in a few search systems • Use of metadata for information access; early work on ontology support distribution applied to metadata in some cases • Mediator architecture for information management
  • 9. Generation II (limited types of metadata, extractors, mappers, wrappers) Nexis Digital Videos UPI AP ... ... Documents Data Stores Global/Enterprise Digital Maps Web Repositories ... Digital Images Digital Audios Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% EXTRACTORS per year over the last three years Junglee, SIGMOD Record, Dec. 1997 METADATA
  • 10. Generation II (a metadata classification: the informartion pyramid) METADATA STANDARDS User General Purpose: Ontologies Dublin Core, MCF Classifications Move in this Domain Models Domain/industry specific: direction to Geographic (FGDC, UDK, …), Domain Specific Metadata tackle Library (MARC,…) area, population (Census), information land-cover, relief (GIS),metadata overload!! concept descriptions from ontologies Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...) Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI) Content Dependent Metadata(size, max colors, rows, columns...) Content Independent Metadata(creation-date, location, type-of-sensor...) Data(Heterogeneous Types/Media)
  • 12. What‘s next (after comprehensive use of metadata)? Query processing and information requests NOW  traditional queries based on keywords  attribute based queries  content-based queries NEXT  ‗high level‘ information requests involving ontology-based, iconic, mixed-media, and media-independent information rrequests  user selected ontology, use of profiles
  • 13. GIS Data Representation – Example multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain Kansas State FGDC Metadata Model UDK Metadata Model Theme keywords: digital line graph, Search terms: digital line graph, hydrography, transportation... hydrography, transportation... Title: Dakota Aquifer Topic: Dakota Aquifer Online linkage: Adress Id: http://gisdasc.kgs.ukans.edu/dasc/ http://gisdasc.kgs.ukans.edu/dasc/ Direct Spatial Reference Method: Vector Measuring Techniques: Vector Horizontal Coordinate System Definition: Co-ordinate System: Universal Transverse Mercator Universal Transverse Mercator … … … ... … … … ...
  • 14. Generation III • Increasing information overload and broader variety of information content (video content, audio clips etc) with increasing amount of visual information, scientific/engineering data • Continued standardization related to Web for representational and metadata issues (MCF, RDF, XML) • Changes in Web architecture; distributed computing (CORBA, Java) • Users demand simplicity, but complexities continue to rise • Web is no longer just another information source, but decision support through ―data mining and information discovery, information fusion, information dissemination, knowledge creation and management‖, ―information management complemented by cooperation between the information system and humans‖ •Information Brokering Architecture proposed for information management
  • 15. Information Brokering: An Enabler for the Infocosm INFORMATION CONSUMERS arbitration between information People consumers and providers for resolving Corporations Programs information impedance Universities Government Information Information Information User User User Request Request Request Query Query Query INFORMATION/DATA INFORMATION BROKERING OVERLOAD Information Data Information Information Data Information System Repository System System Repository System Newswires Corporations dynamic reinterpretation of information requests for determination of relevant Universities Research Labs information services and products INFORMATION PROVIDERS — dynamic creation and composition of information products
  • 16. Information Brokering: Three Dimensions THREE DIMENSIONS C O N S U M E R S B R O K E R S VOCABULARY M E T A D A T A P R O V I D E R S S E M A N T I C S D A T A S T R U C T U R E S Y N T A X S Y S T E M Objective: Reduce the problem of knowing structure and semantics of data in the huge number of information sources on a global scale to: understanding and navigating a significantly smaller number of domain ontologies
  • 17. What else can Information Brokering do? W W W + Information Brokering WWW Domain Specific Ontologies as a confusing heterogeneity of media, “semantic (Tower of Babel) formats conceptual views” information correlation usingusing concept Information correlation physical (HREF) mappings at the extensional data level level links at the intensional concept Browsing of information using information location dependent browsing of terminological using physical (HREF) links relationships across ontologies user has to keep track of information content !! Higher level of abstraction, closer to user view of information !!
  • 18. Concepts, tools and techniques to support semantics: context; semantic proximity; inter-ontological relations; media-independent information correlations; ontologies (esp. domain-specific); profiles; domain-specific metadata
  • 19. Tools to support semantics • Context, context, context • Media-independent information correlations • Multiple ontologies – Semantic Proximity (relationships between concepts within and across ontologies) using domain, context, modeling/abstraction/representation, state – Characterizing Loss of Information incurred due to differences in vocabulary • BIG challenge: identifying relationships or similarity between objects of different media, developed and managed by different persons and systems
  • 20. Heterogeneity... … is a Babel Tower!! SEMANTIC HETEROGENEITY metadata ontologies contexts SEMANTIC INTEROPERABILITY
  • 21. The InfoQuilt Project • THE INFOQUILT VISION: semantic interoperability between systems, sharing knowledge using multiple ontologies; logical correlation of information; media-independent information processing • REALIZATION OF THE VISION: fully distributed, adaptable, agent-based system; information/knowledge management supported by collaborative processes • http://lsdis.cs.uga.edu/proj/iq/iq.html
  • 22. InfoQuilt Project: using the Metadata REFerence link (MREF) • MREF complements HREF, creating a “logical web” through media-independent, ontology- and metadata-based correlation • It is a description of the information asset we want to retrieve • Semantic correlation using MREF • MREF concept: constraints, relations, attributes • Model for logical correlation using ontological terms and metadata: domain ontologies, IQ_Asset ontology + extension ontologies • RDF: framework for representing MREFs • XML: serialization (one implementation choice) • Metadata: keywords, content attributes (color, scene cuts, …) • http://lsdis.cs.uga.edu/proj/iq/iq.html
  • 23. Domain Specific Correlation – example • Potential locations for a future shopping mall identified by all regions having a population greater than 5000, and area greater than 50 sq. ft. having an urban land cover and moderate relief <A MREF ATTRIBUTES(population > 5000; area > 50; region-type = ‘block’; land-cover = ‘urban’; relief = ‘moderate’) can be viewed here</A> • domain specific metadata: terms chosen from domain specific ontologies • => media-independent relationships between domain specific metadata: population, area, land cover, relief • => correlation between image and structured data at a higher domain specific level as opposed to physical “link-chasing” in the WWW • Diagram labels: Population, Area, Boundaries, Land cover, Relief; Regions (SQL); Image Features (image processing routines); Boundaries • Data sources: Census DB, TIGER/Line DB, US Geological Survey
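The attribute constraints in the shopping-mall MREF above can be read as a filter over domain-specific metadata records. The following is a minimal sketch of that reading, not the InfoQuilt implementation: the constraint list is taken from the slide, while `REGIONS`, `matches`, and `resolve_mref` are hypothetical names and data invented for illustration.

```python
import operator

# Comparison operators that appear in the slide's MREF attribute list.
OPS = {">": operator.gt, "=": operator.eq}

# Constraints copied from the slide's MREF ATTRIBUTES(...) clause.
MREF_CONSTRAINTS = [
    ("population", ">", 5000),
    ("area", ">", 50),
    ("region-type", "=", "block"),
    ("land-cover", "=", "urban"),
    ("relief", "=", "moderate"),
]

# Hypothetical metadata records; in the slide these values would come from
# different sources (census data, boundaries, image-derived features).
REGIONS = [
    {"id": "block-17", "population": 8200, "area": 75,
     "region-type": "block", "land-cover": "urban", "relief": "moderate"},
    {"id": "block-42", "population": 3100, "area": 120,
     "region-type": "block", "land-cover": "urban", "relief": "moderate"},
    {"id": "block-58", "population": 9900, "area": 60,
     "region-type": "block", "land-cover": "forest", "relief": "steep"},
]


def matches(metadata, constraints):
    """Return True if a metadata record satisfies every constraint."""
    return all(
        attr in metadata and OPS[op](metadata[attr], value)
        for attr, op, value in constraints
    )


def resolve_mref(regions, constraints):
    """Return the ids of all regions whose metadata satisfies the MREF."""
    return [r["id"] for r in regions if matches(r, constraints)]


if __name__ == "__main__":
    print(resolve_mref(REGIONS, MREF_CONSTRAINTS))
```

The point of the sketch is that the correlation is expressed over metadata terms (population, land cover, relief) rather than over physical links to specific files or pages.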
  • 25. A DL II approach for Information Brokering • Diagram layers: Iscape 1 ... Iscape N; CONSTRUCTING APPROPRIATE INFORMATION LANDSCAPES; CONSTRUCTING ADDITIONAL META-INFORMATION RESOURCES; DISCOVERING COLLECTIONS OF HETEROGENEOUS INFORMATION AND META-INFORMATION RESOURCES; domain-specific ontologies and domain-independent ontologies; images, data stores, documents, digital media; physical/simulation world
  • 26. ADEPT Information Landscape Concept Prototype (a scenario for Digital Earth: learning in the context of the “El Niño” phenomenon) • Sample Iscape requests: – How does El Niño affect sea animals? Look for broadcast videos of less than 2 minutes. – How are some regions affected by El Niño? Look at information using East/West Pacific regions. – What disasters have been related to El Niño? – What storm occurrences are attributed to El Niño? – Show reports related to El Niño that contain Clinton. • Request information: keywords, domain-specific attributes, domain-independent attributes • TRY ISCAPE CONCEPT DEMO
  • 27. Putting MREFs to work • Diagram components: IQ_Asset ontology + extension ontologies; domain ontologies; MREF Builder; User (construct new MREF); MREF repository; User Agent; Broker Agent; User Profile Manager; profiles
  • 28. Context: the lynchpin of semantics • Cricket • “For instance, if you were to use Yahoo! or Infoseek to search the web for pizza, your results would probably be hundreds of matches for the word pizza. Many of these could be pizza parlors around the world. Yet if you run the same search within NeighborNet, you will allows you to order pizza to be delivered instead of shipped.” From a Press Release of FutureOne, Inc., March 24, 1999 http://home.futureone.com/about/pr/021699.asp
  • 29. Constructing c-contexts from ontological terms • C-CONTEXT: “All documents stored in the database have been published by some agency” => Cdef(DOC) = <(hasOrganization, AgencyConcept)> • DATABASE OBJECTS: AGENCY(RegNo, Name, Affiliation); DOC(Id, Title, Agency) • ONTOLOGICAL TERMS: Agency Concept, Document Concept • C-Context = <(C1, V1) (C2, V2) ... (Ck, Vk)>: a collection of contextual coordinates Ci (roles) and values Vi (concepts/concept descriptions) • Advantages: use of ontologies for an intensional, domain-specific description of data; representation of extra information; relationships between objects not represented in the database schema; using terminological relationships in the ontology
  • 30. Using c-contexts to reason about information in database • EXAMPLE: Cdef(DOC) = <(hasOrganization, AgencyConcept)>; CQ = <(hasOrganization, {“USGS”})>; glb(Cdef(DOC), CQ) = <(self, DocumentConcept), (hasOrganization, {“USGS”})> • Reasoning with c-contexts: glb(Cdef(DOC), CQ) • Ontological inferences: DocumentConcept; (hasOrganization, {“USGS”}) • Challenge 1: use of multiple ontologies • Challenge 2: estimating the loss of information
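One way to picture the glb (greatest lower bound) of two c-contexts is as a coordinate-wise intersection. The sketch below is a simplification under stated assumptions, not the formal definition from the InfoQuilt work: concepts are modeled by hypothetical sets of instances (`ONTOLOGY`), a coordinate present in only one context is carried over unchanged, and the `self` coordinate is written explicitly in `cdef_doc` even though the slide treats it as implicit.

```python
# Hypothetical concept extensions: each concept name maps to the set of
# instances it denotes. Real ontologies would use concept descriptions.
ONTOLOGY = {
    "AgencyConcept": frozenset({"USGS", "NASA", "NOAA"}),
    "DocumentConcept": frozenset({"doc-1", "doc-2"}),
}


def resolve(value):
    """Turn a concept name or an explicit set of instances into a frozenset."""
    if isinstance(value, str):
        return ONTOLOGY[value]
    return frozenset(value)


def glb(c1, c2):
    """Coordinate-wise greatest lower bound of two c-contexts.

    A c-context is a dict mapping a contextual coordinate (role) to a value.
    Shared coordinates are intersected; the others are carried over as-is.
    """
    result = {}
    for role in sorted(set(c1) | set(c2)):
        if role in c1 and role in c2:
            result[role] = resolve(c1[role]) & resolve(c2[role])
        else:
            result[role] = resolve(c1[role] if role in c1 else c2[role])
    return result


# Cdef(DOC) from the slide, with the implicit "self" coordinate made explicit.
cdef_doc = {"self": "DocumentConcept", "hasOrganization": "AgencyConcept"}
# CQ from the slide: documents whose organization is USGS.
cq = {"hasOrganization": {"USGS"}}

if __name__ == "__main__":
    for role, value in glb(cdef_doc, cq).items():
        print(role, sorted(value))
```

Under these assumptions the result keeps the document coordinate and narrows `hasOrganization` from all agencies to the single value USGS, mirroring the shape of the slide's result.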
  • 31. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system • OBSERVER architecture: USER NODE (user query, query processor, ontology server, ontologies, mappings, data repositories); COMPONENT NODEs (each with query processor, ontology server, ontologies, mappings, data repositories); IRM NODE (IRM, inter-ontology terminological relationships) • Eduardo Mena (III’98)
  • 32. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system Query construction - Example “Get title and number of pages of books written by Carl Sagan” User ontology: WN [name pages] for (AND book (FILLS creator “Carl Sagan”)) Target ontology: Stanford-I Integrated ontology WN-Stanford-I [title number-of-pages] for (AND book (FILLS doc-author-name “Carl Sagan”)) Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/ Eduardo Mena (III’98)
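The query rewriting shown on this slide can be sketched as a term-by-term substitution. The term correspondences below (name to title, pages to number-of-pages, creator to doc-author-name) are read off the two queries on the slide; the tuple encoding of the query, the `WN_TO_STANFORD` table, and the `translate` function are hypothetical illustrations, not OBSERVER's actual data structures.

```python
# Term correspondences read off the slide's WN and Stanford-I queries.
WN_TO_STANFORD = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",
}


def translate(expr, mapping):
    """Rewrite the terms of a nested query expression using a term mapping.

    Expressions are tuples such as ("AND", concept, ("FILLS", role, value)).
    The filler value of FILLS is a literal and is never translated.
    """
    if isinstance(expr, tuple):
        op, *args = expr
        if op == "FILLS":
            role, value = args
            return ("FILLS", mapping.get(role, role), value)
        return (op, *(translate(arg, mapping) for arg in args))
    return mapping.get(expr, expr)


# User query over the WN ontology, as on the slide.
wn_projection = ["name", "pages"]
wn_constraint = ("AND", "book", ("FILLS", "creator", "Carl Sagan"))

if __name__ == "__main__":
    print([WN_TO_STANFORD.get(p, p) for p in wn_projection])
    print(translate(wn_constraint, WN_TO_STANFORD))
```

A plain lookup table only covers the case where every term has a synonym in the target ontology; when it does not, a term has to be replaced by a broader or narrower one, which is where loss of information enters.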
  • 33. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system • Re-use of Knowledge: Bibliography Data Ontology Stanford-I • Ontology terms: Biblio-Thing, Document, Conference, Agent, Person, Organization, Author, Book, Technical-Report, Publisher, University, Miscellaneous-Publication, Proceedings, Edited-Book, Thesis, Periodical-Publication, Technical-Manual, Cartographic-Map, Doctoral-Thesis, Computer-Program, Multimedia-Document, Journal, Newspaper, Master-Thesis, Artwork, Magazine • Query construction - Example “Get title and number of pages of books written by Carl Sagan” User ontology: WN [name pages] for (AND book (FILLS creator “Carl Sagan”)) Target ontology: Stanford-I Integrated ontology WN-Stanford-I [title number-of-pages] for (AND book (FILLS doc-author-name “Carl Sagan”)) Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/ Eduardo Mena (III’98)
  • 34. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system • Re-use of Knowledge: a subset of WordNet 1.5 • Ontology terms: Print-Media, Journalism, Press, Publication, Newspaper, Magazine, Periodical, Book, Journals, Pictorial, Series, Trade-Book, Brochure, TextBook, SongBook, Reference-Book, PrayerBook, CookBook, Encyclopedia, WordBook, Instruction-Book, HandBook, Directory, Annual, GuideBook, Manual, Bible, Instructions, Reference-Manual • Query construction - Example “Get title and number of pages of books written by Carl Sagan” User ontology: WN [name pages] for (AND book (FILLS creator “Carl Sagan”)) Target ontology: Stanford-I Integrated ontology WN-Stanford-I [title number-of-pages] for (AND book (FILLS doc-author-name “Carl Sagan”)) Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/ Eduardo Mena (III’98)
  • 35. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system WN ontology and user query Query construction - Example “Get title and number of pages of books written by Carl Sagan” User ontology: WN [name pages] for (AND book (FILLS creator “Carl Sagan”)) Target ontology: Stanford-I Integrated ontology WN-Stanford-I [title number-of-pages] for (AND book (FILLS doc-author-name “Carl Sagan”)) Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/ Eduardo Mena (III’98)
  • 36. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system • Estimating the loss of information: to choose the plan with the least loss; to present a level of confidence in the answer • Based on intensional information (terminological difference) • Based on extensional information (precision and recall) • Plans in the example: User Query: (AND book (FILLS doc-author-name “Carl Sagan”)) Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”)) Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”)) Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”)) Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name “Carl Sagan”)) • Eduardo Mena (III’98)
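The slide says the extensional estimate is based on precision and recall but does not give a formula. As an illustration only, the sketch below assumes the two are combined with a weighted harmonic mean and that loss is one minus that value; the weight `gamma`, the function names, and the sample precision/recall figures are all hypothetical and are not the numbers reported for OBSERVER.

```python
def extensional_loss(precision, recall, gamma=0.5):
    """Loss as 1 minus a weighted harmonic mean of precision and recall.

    gamma weights precision against recall (0.5 treats them equally).
    This combination is an assumption made for illustration.
    """
    if not (0 < precision <= 1 and 0 < recall <= 1):
        raise ValueError("precision and recall must lie in (0, 1]")
    combined = 1.0 / (gamma / precision + (1.0 - gamma) / recall)
    return 1.0 - combined


def choose_plan(plans, gamma=0.5):
    """Return the name of the plan with the smallest estimated loss."""
    return min(plans, key=lambda name: extensional_loss(*plans[name], gamma))


# Hypothetical (precision, recall) estimates for two candidate plans.
PLANS = {
    "broader-term plan": (0.5, 1.0),   # returns extra, irrelevant answers
    "union-of-terms plan": (0.9, 0.8), # misses a few relevant answers
}

if __name__ == "__main__":
    for name, (p, r) in PLANS.items():
        print(name, round(extensional_loss(p, r), 4))
    print("chosen:", choose_plan(PLANS))
```

Replacing a term with a broader one tends to lower precision, while replacing it with narrower terms tends to lower recall, so a single combined figure gives the query processor one number per plan to compare.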
  • 37. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system • Loss of information based on intensional information • User Query: (AND book (FILLS doc-author-name “Carl Sagan”)) • Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”)) • book := (AND publication (AT-LEAST 1 ISBN)) • publication := (AND document (AT-LEAST 1 place-of-publication)) • Loss: “Instead of books written by Carl Sagan, OBSERVER is providing all the documents written by Carl Sagan (even if they do not have an ISBN and place of publication)” • Eduardo Mena (III’98)
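The intensional loss described on this slide can be derived mechanically from the two term definitions: substituting document for book discards every constraint met on the way up the definition chain. The sketch below uses the definitions of book and publication exactly as given on the slide; the dictionary encoding and the `dropped_constraints` function are hypothetical.

```python
# Term definitions from the slide: each term is its parent concept plus
# the extra constraints that distinguish it from the parent.
DEFINITIONS = {
    "book": ("publication", ["AT-LEAST 1 ISBN"]),
    "publication": ("document", ["AT-LEAST 1 place-of-publication"]),
}


def dropped_constraints(term, substitute):
    """List the constraints lost when `term` is replaced by an ancestor.

    Walks the definition chain from `term` up to `substitute`, collecting
    the constraints that the broader substitute no longer enforces.
    """
    dropped = []
    current = term
    while current != substitute:
        if current not in DEFINITIONS:
            raise ValueError(f"{substitute!r} is not an ancestor of {term!r}")
        parent, constraints = DEFINITIONS[current]
        dropped.extend(constraints)
        current = parent
    return dropped


if __name__ == "__main__":
    lost = dropped_constraints("book", "document")
    print("Plan 1 returns documents without enforcing:", lost)
```

The two dropped constraints correspond directly to the slide's explanation that the answer may include documents lacking an ISBN and a place of publication.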
  • 38. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system • Example: loss for the plans • Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”)) [case 2]: 91.57% < (1-Loss) < 91.75% • Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”)) [case 3]: 94.03% < (1-Loss) < 100% • Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”)) [case 3]: 98.56% < (1-Loss) < 100% • Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name “Carl Sagan”)) [case 1]: 0% < (1-Loss) < 7.22% • Eduardo Mena (III’98)
  • 39. Summary • Diagram levels: Semantic / Knowledge / Visual, Scientific/Eng. / Knowledge Mgmt., Information Brokering, Cooperative IS • Structural, Schematic / Metadata / Semi-structured / Mediator, Federated IS • Syntax, System / Data / Text, Structured Databases / Federated DB
  • 40. Agenda for research • Interoperation not at the systems level, but at the informational and possibly knowledge level – traditional database and information retrieval solutions do not suffice – need to understand context; measures of similarity • Need to increase emphasis on semantic-level issues involving terminological and contextual differences, and possibly perceptual or cognitive differences in the future – information systems and humans need to cooperate, possibly involving coordination and collaborative processes
  • 41. Related Reading • Books: Information Brokering for Digital Media, Kashyap and Sheth, Kluwer, 1999 (to appear); Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Sheth and Klas, Eds., McGraw-Hill, 1998; Cooperative Information Systems, Papazoglou and Schlageter, Eds., Academic Press, 1998; Management of Heterogeneous and Autonomous Database Systems, Elmagarmid, Rusinkiewicz, Sheth, Eds., Morgan Kaufmann, 1998 • Special Issues and Proceedings: Formal Ontologies in Information Systems, Guarino, Ed., IOS Press, 1998; Semantic Interoperability in Global Information Systems, Ouksel and Sheth, SIGMOD Record, March 1999 • http://lsdis.cs.uga.edu [See publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt] • Acknowledgements: Tarcisio Lima, Vipul Kashyap • amit@cs.uga.edu