Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote given at IEEE Meta-Data, Bethesda, MD, April 6, 1999.
Embarcadero ER/Studio helps companies document and enhance existing databases, improve data consistency, effectively communicate models across the enterprise, and model more than just data. With many additional features that Sybase PowerDesigner lacks, ER/Studio brings clarity to complex data models.
Presentation given by Chris Welty (IBM Research) at Knoesis. We have Chris Welty's permission to upload this presentation. Event details are at http://j.mp/Welty-at-Knoesis and the associated video is at https://www.youtube.com/watch?v=grDKpicM5y0
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose of finding relevant datasets, thus showing that our approach improves reusability of LOD datasets.
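The pipeline described above can be sketched in a few lines. This is a hedged illustration, not the paper's actual system: `TYPE_INDEX` is a toy stand-in for a Freebase-style type lookup, and a simple majority vote over entity types stands in for the full topic-domain identification method.

```python
from collections import Counter

# Toy stand-in for a Freebase-style entity-type lookup (hypothetical data).
TYPE_INDEX = {
    "Aspirin": "medicine", "Ibuprofen": "medicine",
    "Paris": "location", "Berlin": "location",
}

def topic_domains(entity_labels, top_k=1):
    """Guess a dataset's topic domain(s) by majority vote over the
    knowledge-base types of a sample of its entity labels."""
    votes = Counter(TYPE_INDEX[e] for e in entity_labels if e in TYPE_INDEX)
    return [domain for domain, _ in votes.most_common(top_k)]

print(topic_domains(["Aspirin", "Ibuprofen", "Paris"]))  # ['medicine']
```

In practice the lookup would hit a large knowledge source and the vote would be weighted, but the shape of the computation is the same.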
Talk given by Prof. Amit Sheth at the ICMSE-MGI Digital Data Workshop held at the Kno.e.sis Center, November 13-14, 2013.
Workshop page: http://wiki.knoesis.org/index.php/ICMSE-MGI_Digital_Data_Workshop
Amit Sheth, Pramod Anantharam, Krishnaprasad Thirunarayan, "kHealth: Proactive Personalized Actionable Information for Better Healthcare", Workshop on Personal Data Analytics in the Internet of Things at VLDB2014, Hangzhou, China, September 5, 2014.
Accompanying Video: http://youtu.be/pqcbwGYHPuc
Paper: http://www.knoesis.org/library/resource.php?id=2008
Amit Sheth, "Semantic Computing in Real-World: Vertical and Horizontal Applications, within Enterprise and on the Web," Panel Presentation at the International Conference on Semantic Computing (ICSC2011), Palo Alto, CA, September 20, 2011.
A statistical, schema-independent approach to determining equivalent properties between linked datasets. The approach uses the interlinking between datasets and the extensions of properties to establish their equivalence.
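As an illustrative sketch of the extension-based idea (the specific measure here is an assumption, not necessarily the one the work uses): treat each property's extension as a set of (subject, object) pairs, map subjects across datasets via their interlinks, and score candidate equivalence by set overlap, e.g. Jaccard similarity.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def property_equivalence(ext1, ext2, same_as):
    """Compare two property extensions (sets of (subject, object) pairs)
    after mapping dataset-1 subjects to dataset-2 identifiers via
    sameAs-style interlinks."""
    mapped = {(same_as.get(s, s), o) for s, o in ext1}
    return jaccard(mapped, ext2)

# Hypothetical extensions of two birthplace-like properties.
ext_a = {("db:Einstein", "Ulm"), ("db:Curie", "Warsaw")}
ext_b = {("x:Einstein", "Ulm"), ("x:Curie", "Warsaw"), ("x:Bohr", "Copenhagen")}
links = {"db:Einstein": "x:Einstein", "db:Curie": "x:Curie"}

print(property_equivalence(ext_a, ext_b, links))  # 2 shared of 3 distinct pairs
```

A high overlap score suggests the two properties describe the same relation even though their schemas never mention each other.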
Krishnaprasad Thirunarayan, Pramod Anantharam, Cory Henson, and Amit Sheth, 'Trust Networks', In: 5th Indian International Conference on Artificial Intelligence (IICAI-11), December 14-16, 2011 (invited tutorial).
Krishnaprasad Thirunarayan and Amit Sheth: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications, In: Proceedings of AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013.
With the rapid proliferation of mobile phones, social media, and sensors, it is critical to collect the big data so generated and convert it into actionable information that is relevant for decision making. In this session, we explore challenges and approaches for synthesizing relevant background knowledge and inferences that can enable smart healthcare and ultimately benefit the community at large.
Paper: http://www.knoesis.org/library/resource.php?id=1903
Harshal Patni, "Real Time Semantic Analysis of Streaming Sensor Data," MS Thesis Defense, Kno.e.sis Center, Wright State University, Dayton, OH, March 21, 2011.
More at: http://wiki.knoesis.org/index.php/SSW
Thesis Advisor: Prof. Amit Sheth
Cursing is not uncommon during conversations in the physical world: 0.5% to 0.7% of all the words we speak are curse words; for scale, 1% of all words are first-person plural pronouns (e.g., we, us, our). On social media, people can instantly chat with friends without face-to-face interaction, usually in a more public fashion and broadly disseminated through highly connected social networks. Will these distinctive features of social media lead to a change in people's cursing behavior? In this paper, we examine the characteristics of cursing activity on a popular social media platform, Twitter, involving the analysis of about 51 million tweets and about 14 million users. In particular, we explore a set of questions that prior studies have recognized as crucial for understanding cursing in offline communications, including the ubiquity, utility, and contextual dependencies of cursing.
Original paper: http://knoesis.org/library/resource.php?id=1937
Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth, User Interests Identification on Twitter Using a Hierarchical Knowledge Base, ESWC 2014, May 2014.
Paper at: http://j.mp/user-ig
More at: http://wiki.knoesis.org/index.php/Hierarchical_Interest_Graph
Invited talk presented by Hemant Purohit (http://knoesis.org/researchers/hemant) at the NCSU workshop on IT for sustainable tourism development. The talk presents the application of technology developed for crisis coordination to more general marketplace coordination via social media, helping suppliers (micro-entrepreneurs) and demanders (tourists).
The recent emergence of the "Linked Data" approach for publishing data represents a major step forward in realizing the original vision of a web that can "understand and satisfy the requests of people and machines to use the web content" - i.e. the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 70 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud - as we will illustrate - are too shallow to realize much of the benefits promised. If this limitation is left unaddressed, the LOD Cloud will merely be more data that suffers from the same kinds of problems that plague the Web of Documents, and the vision of the Semantic Web will fall short.
This thesis presents a comprehensive solution to the problem of alignment and relationship identification using a bootstrapping-based approach. By alignment we mean the process of determining correspondences between classes and properties of ontologies. We identify subsumption, equivalence, and part-of relationships between classes, and part-of relationships between instances. Between properties we establish subsumption and equivalence relationships. By bootstrapping we mean utilizing the information contained within the datasets to improve the data within them. The work showcases the use of bootstrapping-based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, provide evidence of the feasibility and applicability of the solution.
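A toy illustration of the extension-based flavor of such alignment (a simplified sketch; BLOOMS itself aligns ontologies via a Wikipedia-derived category hierarchy, not instance sets): infer equivalence or subsumption between two classes from containment of their instance sets. The names and the 0.9 threshold below are hypothetical.

```python
def align(instances_a, instances_b, threshold=0.9):
    """Classify the relation between two classes from instance overlap:
    equivalence if each largely contains the other, subsumption if
    containment holds in one direction only."""
    if not instances_a or not instances_b:
        return "unknown"
    shared = len(instances_a & instances_b)
    a_in_b = shared / len(instances_a)   # fraction of A's instances also in B
    b_in_a = shared / len(instances_b)   # fraction of B's instances also in A
    if a_in_b >= threshold and b_in_a >= threshold:
        return "equivalent"
    if a_in_b >= threshold:
        return "subsumed-by"   # A is a subclass of B
    if b_in_a >= threshold:
        return "subsumes"      # A is a superclass of B
    return "unrelated"

people = {"alice", "bob", "carol", "dave"}
musicians = {"alice", "bob"}
print(align(musicians, people))  # 'subsumed-by'
```

The bootstrapping angle is that relations discovered this way can enrich the datasets, which in turn exposes more overlap for the next alignment pass.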
Semantic Interoperability & Information Brokering in Global Information Systems (Amit Sheth)
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Key coverage:
- Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277)
- InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search
- MREF - putting metadata on HREF, way ahead of its time (see: http://knoesis.org/library/resource.php?id=00294)
- Multi-ontology query processing in the OBSERVER system (http://knoesis.org/library/resource.php?id=00273)
In this presentation we review some of the research problems we address at EPFL in the area of sensor data management. At the infrastructure level we have developed: a middleware to seamlessly integrate, aggregate, and analyze heterogeneous sensor data streams in real time; a wiki-based repository supporting the cooperative management of the metadata associated with sensor deployments; and a cloud-based storage infrastructure. An important problem in managing sensor data is its efficient storage and transmission using compression techniques; to that end we apply model-based compression methods. For analyzing sensor data, we have developed methods to dynamically estimate variability, which can be readily used for outlier detection, and to extract semantic features from GPS sensor data streams. We also investigate techniques for trading off the accuracy of the sensor data obtained against the degree of privacy preservation that can be maintained.
The Sensor Data Management presentation was given by Karl Aberer (Ecole Polytechnique Federale de Lausanne) at the PlanetData project meeting held February 28 - March 4, 2011 in Innsbruck, Austria.
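Model-based compression of the kind mentioned above can be sketched with a piecewise-constant model (a generic illustration, not the EPFL middleware): a segment's model value is kept as long as each new reading stays within an error bound eps, so only (start_index, value) pairs need to be stored or transmitted.

```python
def compress(readings, eps=0.5):
    """Piecewise-constant model compression: emit (start_index, value)
    segments; every reading in a segment is within eps of its model value."""
    segments = []
    for i, x in enumerate(readings):
        if not segments or abs(x - segments[-1][1]) > eps:
            segments.append((i, x))  # start a new segment modeled by x
    return segments

def decompress(segments, n):
    """Reconstruct an approximate stream of length n from the segments."""
    out = []
    for (start, value), nxt in zip(segments, segments[1:] + [(n, None)]):
        out.extend([value] * (nxt[0] - start))
    return out

data = [20.0, 20.2, 20.1, 23.5, 23.4, 23.6]
model = compress(data)                      # two segments instead of six values
approx = decompress(model, len(data))
assert all(abs(a - b) <= 0.5 for a, b in zip(data, approx))
```

Real deployments fit richer models (e.g., linear or regression-based segments), but the storage/accuracy trade-off controlled by eps is the same.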
Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Interoperability (Amit Sheth)
Amit Sheth, Keynote: International Conference on Interoperating Geographic Systems (Interop'97), Santa Barbara, December 3-4, 1997.
Related technical paper: http://knoesis.org/library/resource.php?id=00230
Creating a RAD Authoritative Data Environment (anicewick)
Sharing data in agencies can be a burden: with users placing data in numerous desktop packages, sharing becomes nearly impossible. However, new RAD tools allow quick web applications to be developed that replace Excel, MS Access, and FileMaker data stores with real, controlled, authoritative database integration.
This presentation defines both the problem space, and the proposed solution.
See www.data4USA.com for more information
GridMate - End to end testing is a critical piece to ensure quality and avoid regressions (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced performance (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI with OpenAI's advanced natural language processing capabilities in a test automation solution.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
IEEE Metadata Conference 1999 Keynote - Amit Sheth
1. Bethesda, Maryland, April 6, 1999
Amit Sheth
Large Scale Distributed Information Systems Lab
University of Georgia
http://lsdis.cs.uga.edu
2. Three perspectives to GlobIS
• Information Integration Perspective: distribution, autonomy, heterogeneity (terminological, semantic, contextual)
• Information Brokering Perspective: data, meta-data, knowledge, information
• "Vision" Perspective: connectivity, computing, data
3. Evolving targets and approaches in integrating data and information (a personal perspective)
Vision: a society for ubiquitous exchange of (tradeable) information in all digital forms of representation; information anywhere, anytime, in any form
• Generation III (1997...): ADEPT, DL-II projects, InfoQuilt
• Generation II (1990s): InfoSleuth, KMed, DL-I projects, VisualHarness, Infoscopes, HERMES, SIMS, InfoHarness, Garlic, TSIMMIS, Harvest, RUFUS, ...
• Generation I (1980s): Mermaid, Multibase, MRDSM, ADDS, DDTS, IISS, Omnibase, ...
4. Generation I
•Data recognized as corporate resource — leverage it!
• Data predominantly in structured databases, different data models,
transitioning from network and hierarchical to relational DBMSs
• Heterogeneity (system, modeling and schematic) as well as need to
support autonomy posed main challenges;
major issues were data access and connectivity
• Information integration through Federated architecture
• Support for corporate IS applications as the primary objective,
update often required, data integrity important
5. Generation I
(heterogeneity in FDBMSs)
1980s, Database System level:
• semantic heterogeneity
• differences in DBMS data models (abstractions, constraints, query languages)
• system-level support (concurrency control, commit, recovery)
1970s, Operating System level:
• file system; naming, file types, operations
• transaction support
• IPC
Hardware/System level:
• instruction set
• data representation/coding
• configuration
(Communication issues cut across all of these levels.)
6. Generation I
(Federated Database Systems: Schema Architecture)
Five-level schema architecture, top to bottom:
• External Schemas (one or more)
• Federated Schema (obtained by schema integration of the export schemas)
• Export Schemas (one per component)
• Component Schemas (expressed in a common/canonical data model)
• Local Schemas (mapped to component schemas by schema translation)
• Component DBSs
Key points:
• dimensions for interoperability and integration: distribution, autonomy, and heterogeneity
• model heterogeneity: handled by translating local schemas into the common/canonical data model
• information sharing while preserving autonomy
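The five-level architecture above can be sketched in miniature. This is a hedged illustration only: the relation names, the rename table, and the dictionary-based "data model" are all invented for the example; real FDBSs translate complete schemas between full data models.

```python
# Minimal sketch of the five-level FDBS schema architecture.
# All names here are hypothetical.

# Local schemas, each in its own (grossly simplified) native model
local_a = {"model": "relational", "EMP": ["eno", "name", "dept"]}
local_b = {"model": "hierarchical", "WORKER": ["id", "fullname"]}

def translate(local):
    """Schema translation: local schema -> component schema in a
    common/canonical data model (here: a flat relational dict)."""
    renames = {"WORKER": "EMP", "id": "eno", "fullname": "name"}
    return {renames.get(k, k): [renames.get(a, a) for a in attrs]
            for k, attrs in local.items() if k != "model"}

def export(component, allowed):
    """Export schema: the subset a component chooses to share,
    preserving its autonomy."""
    return {r: attrs for r, attrs in component.items() if r in allowed}

def federate(*exports):
    """Federated schema: integration of the export schemas."""
    fed = {}
    for ex in exports:
        for r, attrs in ex.items():
            fed.setdefault(r, set()).update(attrs)
    return fed

comp_a, comp_b = translate(local_a), translate(local_b)
fed = federate(export(comp_a, {"EMP"}), export(comp_b, {"EMP"}))
print(fed)  # one integrated EMP relation over both components
```

External schemas (per-user views over `fed`) are omitted for brevity.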
7. Generation I
(characterization of schematic conflicts in multidatabase systems)
Schematic conflicts (Sheth & Kashyap, Kim & Seo):
• Domain Definition Incompatibility: naming conflicts, data representation conflicts, data scaling conflicts, data precision conflicts, default value conflicts, attribute integrity constraint conflicts
• Entity Definition Incompatibility: database identifier conflicts, naming conflicts, schema isomorphism conflicts, missing data items conflicts
• Data Value Incompatibility: known inconsistency, temporal inconsistency, acceptable inconsistency
• Abstraction Level Incompatibility: generalization conflicts, aggregation conflicts
• Schematic Discrepancies: data value attribute conflicts, entity attribute conflicts, data value entity conflicts
BUT these techniques for dealing with schematic heterogeneity do not directly map to dealing with a much larger variety of heterogeneous media
8. Generation II
• Significant improvements in computing and connectivity (standardization
of protocol, public network, Internet/Web); remote data access as given;
• Increasing diversity in data formats, with focus on variety of textual data
and semi-structured documents
• Many more data sources, heterogeneous information sources,
but not necessarily better understanding of data
• Use of data beyond traditional business applications:
mining + warehousing, marketing, e-commerce
• Web search engines for keyword based querying against HTML pages;
attribute-based querying available in a few search systems
• Use of metadata for information access; early work on ontology support,
with distribution applied to metadata in some cases
• Mediator architecture for information management
9. Generation II
(limited types of metadata; extractors, mappers, wrappers)
Sources: newswires (UPI, AP, Nexis, ...), documents, data stores, digital videos, digital images, digital audios, digital maps, global/enterprise Web repositories, ...
EXTRACTORS derive METADATA from these heterogeneous sources.
Example query: "Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years" (Junglee, SIGMOD Record, Dec. 1997)
10. Generation II
(a metadata classification: the information pyramid)
From bottom to top (move up the pyramid to tackle information overload!!):
• Data (heterogeneous types/media)
• Content Independent Metadata (creation-date, location, type-of-sensor, ...)
• Content Dependent Metadata (size, max colors, rows, columns, ...)
• Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
• Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure, ...)
• Domain Specific Metadata (area, population (Census); land-cover, relief (GIS); concept descriptions from ontologies)
• Domain Models, Classifications, Ontologies
• User
Metadata standards: general purpose (Dublin Core, MCF); domain/industry specific (Geographic: FGDC, UDK, ...; Library: MARC, ...)
12. What's next (after comprehensive use of metadata)?
Query processing and information requests
NOW:
• traditional queries based on keywords
• attribute-based queries
• content-based queries
NEXT:
• 'high-level' information requests involving ontology-based, iconic, mixed-media, and media-independent information requests
• user-selected ontology, use of profiles
13. GIS Data Representation – Example
Multiple heterogeneous metadata models use different tag names for the same data in the same GIS domain (Kansas State example):
• FGDC "Theme keywords" vs. UDK "Search terms": digital line graph, hydrography, transportation...
• FGDC "Title" vs. UDK "Topic": Dakota Aquifer
• FGDC "Online linkage" vs. UDK "Address Id": http://gisdasc.kgs.ukans.edu/dasc/
• FGDC "Direct Spatial Reference Method" vs. UDK "Measuring Techniques": Vector
• FGDC "Horizontal Coordinate System Definition" vs. UDK "Co-ordinate System": Universal Transverse Mercator
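One way to cope with such tag-name heterogeneity is an explicit mapping between the two metadata models. The sketch below uses only the tag pairs shown above; the function name and record format are our own illustration, not part of any FGDC/UDK tooling.

```python
# Hedged sketch: bridging tag-name heterogeneity between two GIS
# metadata models with an explicit mapping (tag pairs from the slide).
FGDC_TO_UDK = {
    "Theme keywords": "Search terms",
    "Title": "Topic",
    "Online linkage": "Address Id",
    "Direct Spatial Reference Method": "Measuring Techniques",
    "Horizontal Coordinate System Definition": "Co-ordinate System",
}

def to_udk(fgdc_record):
    """Re-tag an FGDC metadata record with UDK tag names;
    unmapped tags are kept as-is."""
    return {FGDC_TO_UDK.get(tag, tag): value
            for tag, value in fgdc_record.items()}

record = {"Title": "Dakota Aquifer",
          "Direct Spatial Reference Method": "Vector"}
print(to_udk(record))
# {'Topic': 'Dakota Aquifer', 'Measuring Techniques': 'Vector'}
```

Such one-to-one mappings only resolve naming heterogeneity; the semantic-level differences discussed later in the talk need more than a lookup table.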
14. Generation III
• Increasing information overload and broader variety of information
content (video content, audio clips etc) with increasing amount of visual
information, scientific/engineering data
• Continued standardization related to Web for representational and metadata
issues (MCF, RDF, XML)
• Changes in Web architecture; distributed computing (CORBA, Java)
• Users demand simplicity, but complexities continue to rise
• Web is no longer just another information source, but decision support through
"data mining and information discovery, information fusion, information
dissemination, knowledge creation and management"; "information management
complemented by cooperation between the information system and humans"
• Information Brokering Architecture proposed for information management
15. Information Brokering: An Enabler for the Infocosm
INFORMATION CONSUMERS: people, programs, corporations, universities, government
INFORMATION PROVIDERS: newswires, corporations, universities, research labs (information systems, data repositories, ...)
Information brokering mediates between consumer queries/information requests and the information/data overload of the providers:
• arbitration between information consumers and providers for resolving information impedance
• dynamic reinterpretation of information requests for determination of relevant information services and products
• dynamic creation and composition of information products
16. Information Brokering: Three Dimensions
Three dimensions:
• participants: consumers, brokers, providers
• content: data, metadata, vocabulary
• levels: system, syntax, structure, semantics
Objective: reduce the problem of knowing the structure and semantics of data in the huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies.
17. What else can Information Brokering do?
WWW:
• a confusing heterogeneity of media and formats (a Tower of Babel)
• information correlation using physical (HREF) links at the extensional data level
• location-dependent browsing of information using physical (HREF) links
• the user has to keep track of information content!!
WWW + Information Brokering:
• domain-specific ontologies as "semantic conceptual views"
• information correlation using concept mappings and links at the intensional concept level
• browsing of information using terminological relationships across ontologies
• a higher level of abstraction, closer to the user's view of information!!
18. Concepts, tools and techniques to support semantics
• context
• semantic proximity
• inter-ontological relations
• media-independent information correlations
• ontologies (esp. domain-specific)
• profiles
• domain-specific metadata
19. Tools to support semantics
• Context, context, context
• Media-independent information correlations
• Multiple ontologies
– Semantic Proximity (relationships between concepts within
and across ontologies) using domain, context,
modeling/abstraction/representation, state
– Characterizing Loss of Information incurred due to
differences in vocabulary
BIG challenge: identifying relationships or similarity between objects of
different media, developed and managed by different persons and systems
20. Heterogeneity... is a Tower of Babel!!
From SEMANTIC HETEROGENEITY to SEMANTIC INTEROPERABILITY via:
• metadata
• ontologies
• contexts
21. The InfoQuilt Project
THE INFOQUILT VISION
Semantic interoperability between systems, sharing knowledge
using multiple ontologies
Logical correlation of information
Media independent information processing
REALIZATION OF THE VISION
fully distributed, adaptable, agent-based system
information/knowledge management supported by collaborative processes
http://lsdis.cs.uga.edu/proj/iq/iq.html
22. InfoQuilt Project: using the Metadata REFerence link (MREF)
MREF complements HREF, creating a "logical web" through media-independent, ontology- and metadata-based correlation. An MREF is a description of the information asset we want to retrieve.
Semantic correlation using MREF:
• an MREF refers to a concept along with constraints, relations, and attributes
• domain ontologies (the IQ_Asset ontology plus extension ontologies) supply the ontological terms and metadata
• a model for logical correlation using ontological terms and metadata
• a framework for representing MREFs in RDF, serialized in XML (one implementation choice)
• keywords and content attributes (color, scene cuts, ...)
http://lsdis.cs.uga.edu/proj/iq/iq.html
23. Domain Specific Correlation – example
Potential locations for a future shopping mall, identified by all regions having a population greater than 5000 and an area greater than 50 sq. ft., with urban land cover and moderate relief:
<A MREF ATTRIBUTES(population > 5000; area > 50; region-type = 'block'; land-cover = 'urban'; relief = 'moderate')>can be viewed here</A>
Domain-specific metadata, with terms chosen from domain-specific ontologies:
• population, area, boundaries, land cover: from structured regional data (Census DB, TIGER/Line DB), accessed via SQL
• relief, boundaries: image features from US Geological Survey data, computed by image-processing routines
=> media-independent relationships between domain-specific metadata: population, area, land cover, relief
=> correlation between image and structured data at a higher, domain-specific level, as opposed to physical "link-chasing" in the WWW
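The correlation behind this MREF can be pictured as a media-independent join over region identifiers. The following sketch is purely illustrative: the data values are invented, and it assumes the image-derived attributes (land cover, relief) have already been extracted by the image-processing routines.

```python
# Hedged sketch of the shopping-mall correlation: structured census
# data and already-extracted image-derived attributes are joined on a
# region id and filtered by the MREF constraints. All data invented.
census = [  # from a structured source (e.g., a census database)
    {"region": "R1", "region_type": "block", "population": 8200, "area": 60},
    {"region": "R2", "region_type": "block", "population": 3100, "area": 75},
]
imagery = {  # attributes derived by image-processing routines
    "R1": {"land_cover": "urban", "relief": "moderate"},
    "R2": {"land_cover": "urban", "relief": "steep"},
}

def candidate_sites():
    """Yield regions satisfying the MREF constraints across both media."""
    for row in census:
        img = imagery.get(row["region"], {})
        if (row["population"] > 5000 and row["area"] > 50
                and row["region_type"] == "block"
                and img.get("land_cover") == "urban"
                and img.get("relief") == "moderate"):
            yield row["region"]

print(list(candidate_sites()))  # ['R1']
```

The point of the slide is precisely that this join happens at the level of domain-specific metadata, not by chasing physical links between documents.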
25. A DL-II approach for Information Brokering
Layered, bottom to top:
• the physical/simulation world: images, data stores, documents, digital media
• discovering collections of heterogeneous information and meta-information resources
• constructing additional meta-information resources, guided by domain-specific and domain-independent ontologies
• constructing appropriate information landscapes (Iscape 1 ... Iscape N)
26. ADEPT Information Landscape Concept Prototype
(a scenario for Digital Earth: learning in the context of the "El Niño" phenomenon)
Request information using keywords, domain-specific attributes, and domain-independent attributes.
Sample Iscape requests:
• How does El Niño affect sea animals? Look for broadcast videos of less than 2 minutes.
• How are some regions affected by El Niño? Look at East/West Pacific regions.
• What disasters have been related to El Niño?
• What storm occurrences are attributed to El Niño?
• Show reports related to El Niño that contain Clinton.
TRY ISCAPE CONCEPT DEMO
27. Putting MREFs to work
• An MREF Builder lets the user construct new MREFs from domain ontologies (the IQ_Asset ontology plus extension ontologies); the results are stored in an MREF repository.
• A Broker Agent answers users (via User Agents) using the MREF repository and user profiles maintained by a Profile Manager.
28. Context: the lynchpin of semantics
Example: "cricket"
"For instance, if you were to use Yahoo! or Infoseek to search the web for pizza, your results would probably be hundreds of matches for the word pizza. Many of these could be pizza parlors around the world. Yet if you run the same search within NeighborNet, it allows you to order pizza to be delivered instead of shipped."
From a press release of FutureOne, Inc., March 24, 1999
http://home.futureone.com/about/pr/021699.asp
29. Constructing c-contexts from ontological terms
A c-context is a collection of contextual coordinates Ci (roles) and values Vi (concepts/concept descriptions):
C-Context = <(C1, V1) (C2, V2) ... (Ck, Vk)>
Example database objects:
AGENCY(RegNo, Name, Affiliation)
DOC(Id, Title, Agency)
"All documents stored in the database have been published by some agency"
=> Cdef(DOC) = <(hasOrganization, AgencyConcept)>
using the ontological terms DocumentConcept and AgencyConcept.
Advantages:
• use of ontologies for an intensional, domain-specific description of data
• representation of extra information: relationships between objects not represented in the database schema
• use of terminological relationships in the ontology
30. Using c-contexts to reason about information in the database
EXAMPLE
Cdef(DOC) = <(hasOrganization, AgencyConcept)>
CQ = <(hasOrganization, {"USGS"})>
glb(Cdef(DOC), CQ) = <(self, DocumentConcept), (hasOrganization, {"USGS"})>
• Reasoning with c-contexts: glb(Cdef(DOC), CQ)
• Ontological inferences: DocumentConcept; (hasOrganization, {"USGS"})
Challenge 1: use of multiple ontologies
Challenge 2: estimating the loss of information
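The glb combination can be approximated in a few lines. This is a deliberately simplified sketch: real c-context reasoning operates over concept descriptions with ontological inference, whereas here a set of instances (such as {"USGS"}) is simply treated as more specific than a named concept.

```python
# Hedged sketch of glb over c-contexts, with a context as a dict from
# contextual coordinates (roles) to values. The "more specific value
# wins" rule below is a stand-in for real terminological reasoning.
def glb(ctx_a, ctx_b):
    out = {}
    for role in set(ctx_a) | set(ctx_b):
        va, vb = ctx_a.get(role), ctx_b.get(role)
        if va is None or vb is None:
            # a coordinate present in only one context is kept as-is
            out[role] = va if vb is None else vb
        else:
            # prefer the extensional (set-of-instances) value, which is
            # more specific than a named concept
            out[role] = va if isinstance(va, set) else vb
    return out

c_def = {"self": "DocumentConcept",
         "hasOrganization": "AgencyConcept"}   # Cdef(DOC)
c_q = {"hasOrganization": {"USGS"}}            # CQ

result = glb(c_def, c_q)
print(result)
# matches the slide: (self, DocumentConcept), (hasOrganization, {'USGS'})
```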
31. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
OBSERVER architecture:
• User node: a query processor accepts the user query, posed against the user's ontology
• Component nodes: each runs an ontology server holding ontologies and mappings to its data repositories, plus a query processor
• IRM node: the Interontology Relationships Manager stores the terminological relationships between ontologies
Eduardo Mena (III'98)
32. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Query construction - Example
“Get title and number of pages of books written by Carl Sagan”
User ontology: WN
[name pages] for
(AND book (FILLS creator “Carl Sagan”))
Target ontology: Stanford-I
Integrated ontology WN-Stanford-I
[title number-of-pages] for
(AND book (FILLS doc-author-name “Carl Sagan”))
Ontology sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
Eduardo Mena (III’98)
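The translation from the WN query to the Stanford-I query amounts to substituting each term with its synonym in the target ontology. The sketch below hard-codes the four mappings visible in the example; in OBSERVER such mappings come from the interontology relationships managed by the IRM.

```python
# Hedged sketch of the query rewriting in the example: terms from the
# user ontology (WN) are replaced by their synonyms in the target
# ontology (Stanford-I). Mapping pairs are taken from the slide.
WN_TO_STANFORD = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",  # shared term, maps to itself
}

def rewrite(projection, role):
    """Rewrite the projection list and the constraint role
    into the target ontology's vocabulary."""
    return [WN_TO_STANFORD[t] for t in projection], WN_TO_STANFORD[role]

proj, role = rewrite(["name", "pages"], "creator")
print(proj, role)  # ['title', 'number-of-pages'] doc-author-name
```

When a term has no exact synonym in the target ontology, OBSERVER must fall back to broader or narrower terms, which is exactly what the loss estimates on the following slides quantify.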
33. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Re-use of knowledge: the target ontology Stanford-I (the Bibliography Data Ontology) defines classes including Biblio-Thing, Document, Book, Edited-Book, Proceedings, Thesis (Doctoral-Thesis, Master-Thesis), Technical-Report, Technical-Manual, Periodical-Publication (Journal, Newspaper, Magazine), Miscellaneous-Publication, Cartographic-Map, Multimedia-Document, Computer-Program, Artwork, Conference, and Agent (Person: Author; Organization: Publisher, University).
(The query construction example is as on slide 32.)
Eduardo Mena (III'98)
34. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Re-use of knowledge: the user ontology WN is a subset of WordNet 1.5, with terms such as Print-Media, Press, Journalism, Publication, Periodical, Newspaper, Magazine, Journals, Series, Book, Trade-Book, TextBook, SongBook, PrayerBook, Reference-Book, CookBook, Encyclopedia, WordBook, Directory, Annual, Instruction-Book, HandBook, GuideBook, Manual, Reference-Manual, Instructions, Brochure, Pictorial, and Bible.
(The query construction example is as on slide 32.)
Eduardo Mena (III'98)
35. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
The WN ontology and the user query: the query construction example of slide 32, viewed against the WN ontology.
Eduardo Mena (III'98)
36. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Estimating the loss of information
To choose the plan with the least loss
To present a level of confidence in the answer
Based on intensional information (terminological difference)
Based on extensional information (precision and recall)
Plans in the example
User Query: (AND book (FILLS doc-author-name "Carl Sagan"))
Plan 1: (AND document (FILLS doc-author-name "Carl Sagan"))
Plan 2: (AND periodical-publication (FILLS doc-author-name "Carl Sagan"))
Plan 3: (AND journal (FILLS doc-author-name "Carl Sagan"))
Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name "Carl Sagan"))
Eduardo Mena (III’98)
37. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Loss of information based on intensional information
User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
Plan 1: (AND document (FILLS doc-author-name "Carl Sagan"))
book := (AND publication (AT-LEAST 1 ISBN))
publication := (AND document (AT-LEAST 1 place-of-publication))
Loss: "Instead of books written by Carl Sagan, OBSERVER is providing all the documents written by Carl Sagan (even if they do not have an ISBN and place of publication)"
Eduardo Mena (III’98)
38. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Example: loss for the plans
Plan 1: (AND document (FILLS doc-author-name "Carl Sagan")) [case 2]: 91.57% < (1-Loss) < 91.75%
Plan 2: (AND periodical-publication (FILLS doc-author-name "Carl Sagan")) [case 3]: 94.03% < (1-Loss) < 100%
Plan 3: (AND journal (FILLS doc-author-name "Carl Sagan")) [case 3]: 98.56% < (1-Loss) < 100%
Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name "Carl Sagan")) [case 1]: 0% < (1-Loss) < 7.22%
Eduardo Mena (III’98)
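Given such intervals, picking "the plan with the least loss" (slide 36) can be done by ranking the (1-Loss) bounds. Ordering by the guaranteed lower bound, tie-broken by the upper bound, is our assumption here, not necessarily OBSERVER's actual policy; the numbers are taken from the slide.

```python
# Hedged sketch: ranking OBSERVER query plans by their (1-Loss)
# intervals. The comparison rule (lower bound first, then upper bound)
# is an assumption for illustration.
plans = {  # plan -> (lower %, upper %) bounds on (1 - Loss)
    "Plan 1 (document)": (91.57, 91.75),
    "Plan 2 (periodical-publication)": (94.03, 100.0),
    "Plan 3 (journal)": (98.56, 100.0),
    "Plan 4 (union of 5 concepts)": (0.0, 7.22),
}

# Lexicographic comparison on (lower, upper) picks the plan whose
# guaranteed (1-Loss) is highest.
best = max(plans, key=lambda p: plans[p])
ranking = sorted(plans, key=lambda p: plans[p], reverse=True)
print(best)  # Plan 3 (journal)
```

This matches the intuition on the slide: querying the narrower concept journal loses the least information, while the union plan, despite covering many concepts, offers almost no guaranteed coverage.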
39. Summary
Evolution across the generations (data type / heterogeneity addressed / architecture):
• structured databases / system / federated database system
• text / syntax, schematic / federated IS
• semi-structured / structural / mediator, metadata
• visual, scientific/engineering / semantic / information brokering, cooperative IS, knowledge management
The focus moves from data toward information and knowledge.
40. Agenda for research
Interoperation not at systems level, but at informational and
possibly knowledge level
– traditional database and information retrieval solutions
do not suffice
– need to understand context; measures of similarities
Need to increase impetus on semantic-level issues involving
terminological and contextual differences, and possibly perceptual
or cognitive differences in the future
– information systems and humans need to cooperate,
possibly involving coordination and collaborative
processes
41. Related Reading
Books:
Information Brokering for Digital Media, Kashyap and Sheth, Kluwer,
1999 (to appear)
Multimedia Data Management: Using Metadata to Integrate and Apply
Digital Media, Sheth and Klas Eds, McGraw-Hill, 1998
Cooperative Information Systems, Papazoglou and Schlageter Eds.,
Academic Press, 1998
Management of Heterogeneous and Autonomous Database Systems,
Elmagarmid, Rusinkiewicz, Sheth Eds, Morgan Kaufmann, 1998.
Special Issues and Proceedings:
Formal Ontologies in Information Systems, Guarino Ed., IOS Press, 1998
Semantic Interoperability in Global Information Systems, Ouksel and
Sheth, SIGMOD Record, March 1999.
http://lsdis.cs.uga.edu [see publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt]
Acknowledgements: Tarcisio Lima, Vipul Kashyap
amit@cs.uga.edu