This is an older slide set on an engineering-based approach to sharing diverse and heterogeneous data. It complements a paper about to be published in a Springer collection by Tansel et al., as well as recent discussions of health care record systems.
1. I3 Master
Integration of Information
from
Heterogeneous Sources
October 2001
Gio Wiederhold
Stanford University
Gio Wiederhold I3 1
2. Change is constant
Changes are imposed by
• Technology advance
• Local government
• Federal rules
• Competition
• Emerging standards
Systems must be designed
and operated to recognize
and adapt to change
3. Information Leverage
Tactical
• Customers
• Inventory
• Suppliers
internal sources
Strategic
• Planning
• Capabilities
• Opportunities
a variety of external and imprecise sources
4. Information overload
Data starvation
• More databases
– public & corporate
• Faster communication
– digital
– packeting: TCP-IP, ATM
• World-wide connectivity
– internet
– world-wide web
• Disintermediation
– ubiquitous publishing
5. Focus on Information Systems
Computing
Systems
Processing Information Real-time
as Systems control of
Analyses (on-line and processes,
Payroll, . . . . distributed, factories, . . .
... )
6. Data and Knowledge
Information is created at the confluence of
data -- the state
&
knowledge -- the ability to select and project the state into the future
[Figure: Knowledge Loop and Data Loop; labels: Education, Selection, Integration, Abstraction, Experience, Storage, Recording, State changes, Decision-making, Action]
8. Transform Data to Information
Application Layer: decision-makers at workstations
Mediation Layer: value-added services
Foundation Layer: data and simulation resources
9. Dealing With Heterogeneity
• Hardware platform: Hidden by operating system
• Operating system: Choices are reducing: NT, UNIX, ...
• Programming language: Fewer choices; irrelevant in remote access
• Database system model: Relational and E-R common
• Database system: Standards, convergence
• Coverage
– Attributes: Source dependent
– Scope: documented, additive; undocumented, intersecting
• Data representation: Conversion problems, nulls
• Data semantics: Requires knowledge
10. Definition*
A mediator is a software module that exploits
encoded knowledge about certain sets or
subsets of data to create information for a
higher layer of applications.
It should be small and simple, so that it can
be maintained by one expert or, at most, a
small and coherent group of experts.
* Wiederhold: IEEE Computer March 1992
11. Flow in mediation
• DELIVERY
• SUMMARIZATION
• INTEGRATION
• ABSTRACTION
• ACCESS
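The five stages can be read as a pipeline. The following is a minimal sketch, assuming the flow runs from ACCESS up to DELIVERY; the clinic records, field names, and function names are all hypothetical and only illustrate the staging, not any actual mediator implementation.

```python
from collections import defaultdict

# Hypothetical source records: two sources with different field names.
CLINIC_A = [{"patient_age": 34, "visits": 2}, {"patient_age": 71, "visits": 5}]
CLINIC_B = [{"age": 68, "n_visits": 4}, {"age": 25, "n_visits": 1}]


def access():
    """ACCESS: fetch raw records from each source."""
    return {"A": CLINIC_A, "B": CLINIC_B}


def abstract(raw):
    """ABSTRACTION: map each source's fields to a shared, coarser form."""
    def band(age):
        return "65+" if age >= 65 else "under 65"

    rows = [(band(r["patient_age"]), r["visits"]) for r in raw["A"]]
    rows += [(band(r["age"]), r["n_visits"]) for r in raw["B"]]
    return rows


def integrate(rows):
    """INTEGRATION: combine rows from all sources by age band."""
    merged = defaultdict(list)
    for age_band, visits in rows:
        merged[age_band].append(visits)
    return dict(merged)


def summarize(merged):
    """SUMMARIZATION: reduce each group to a single figure."""
    return {age_band: sum(visits) for age_band, visits in merged.items()}


def deliver(summary):
    """DELIVERY: format the result for the application layer."""
    return [f"{age_band}: {total} visits" for age_band, total in sorted(summary.items())]


def mediate():
    return deliver(summarize(integrate(abstract(access()))))
```

Each stage only depends on the output of the stage below it, so a change in one source touches `access` and `abstract` but not the stages above.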
13. Example in Health Care
Health Care Planner
Will the Clinic lose Money?
Patient Investment
Care domain domain
Age Profile Service Operations Bond Sales
Patient Volume Growth Loan Interest State Support
Gio Wiederhold. 1995
14. Functional Layer
Human-computer interaction
User interface
Application-specific code
Service interface
MEDIATION: Domain-specific code
Resource access interface
Source-specific code
Real-world interface
15. Function of Mediation
Apply Domain-specific Specialist
Knowledge to add value
• to locate data sources
• to describe data for use
• to convert for consistency
• to abstract for insight / models
• to extrapolate to new situations
• to integrate from diverse sources
• to re-abstract for presentation
INFORMATION
16. Architectures &
Communication
Presen- Printed terminl Mini- Work User
tation reports comptr station Workst.
Appli- Infor-
Infor- Appli- Appli- Infor-
cation mation
mation cation cation mation
Aggre- Compu- Compu- Compu- CORBA
Aggre-
gation tatio tation tation gation
Access, I-O SQL for Select Object SQL, ...
Select code A&S FTP Struct. for A&S
Data Local Data File Server Distr.
Source Storage Base Storage Storage Sources
Function ‘mainframe’ smart file server client server mediated
terminal Gio Wiederhold. 1995
17. Current Methods
• Access: WWW with MOSAIC
– browsing, collection services: Harvest, ALIWEB, Fish
• SQL with Views
– one verb, one database, one datatype
– predefined subsets
• Grouping: Objects with Corba
– predefined aggregation with methods
• View-Objects
– created via extension of relational algebra
• Summarization
– Tables from text documents; Exception search
18. Central Solutions do not Scale
What works with 7 modules and one person in charge
fails when we have 100 and need a committee
Any change in resources affects the central module
19. Evolution of mediation
applications
A2 A3 A4 A5
A1
A6
integrators
a. I1 I2
mediators
network b. c. M1 M2
d. e.
D1 wrappers
W2 W3
D6
W1 D4 D5
D2 D3
datasources
20. Domain-specific Mediation
• User application
– Workstations
• Mediator
– Expert-owned
nodes
• Data sources
– Remote primary
and byproduct
services
22. Allocation Flexibility
User Interfaces
Application C Application B
Application I
M2
Provider
Provider of of medi-
Mediator M
M
ator N
Copy- if high
HPC
intensity of N
interaction with M1
1. Application (M2)
2. Resources (N1,2) N
3. Processing (M1) 1
N
Mediators are DB 2 DB
only code DBS R
P Q
Databases
23. Features of Mediation
• Domain-specific partitioning for Creation and Maintenance
• Network-basing for easy Reconfiguration
• Caching to deal with Asynchronicity
• Replication for Performance
[Figure labels: A, B, C, D, E, A1, A1’]
25. Central Solutions do not Scale
What works with 7 modules and one person in charge
fails when we have 100 and need a committee
Changes in resources affect the intermediary modules
26. Integration at two levels
Application
• Informal, pragmatic
• User-control
Mediation
• Formal service
• Domain-Expert control
Gio Wiederhold. 1995
27. Status of Mediation Technology
Today
• Handcrafted
• Expert consults with programmer
• Programmer codes the knowledge needed
• Resource changes require advice, program update
Future
• Generated from models
• Domain Expert maintains models
• Specification determines functions
• Resource changes trigger regeneration
28. Facilitators
Another Module Type in Information Systems
Facilitators Procure Linkages
• search for suitable resources
• resolve terminological mappings
• build system configurations
• issue subqueries, as needed
• combine results from subqueries
perform these tasks dynamically without human intervention
depend greatly on ontologies
• can call on mediators for value added services
37. KQML APIs
Several suppliers
Multiple platforms
Fat and thin versions
Mainly to Internet (TCP/IP)
Not (yet) shrinkwrapped, require interaction
– Univ. of Maryland, Baltimore County, with UNISYS
– Stanford Design Projects ABSE [Genesereth et al.]
– Crystalliz (Cambridge MA), transmits PDES, SQL on PC
– BBN for planning, rapid assembly of joint task forces
– ISX (Westlake Village, CA) Demonstration tools
– Toronto Univ. Enterprise Integration Laboratory
– EITech Servicemail (uses email to go across firewalls)
38. KIF -- Knowledge Interchange
Transmits among
Expert Systems
• LOOM
• Ontolingua
• others
ANSI X3T2 evaluation
Compatible with Conceptual Graphs
Used by KQML to describe choices
39. Two Design Phases
1. Resource Integration
2. Customer Focusing
Common Model
40. Mediator Design Principle
Transform Data into Information
Match
Customer Model (Hierarchical)
to
Resource Model (General network)
(and maintain models)
41. Fat versus thin mediators
Service scope
• Too thin: insufficient added value
• Too fat: hard to compose
Domain scope
• Too narrow: few customers
• Too broad: hard to maintain, needs a committee
Just right: in between
42. Heterogeneity among Domains
If interoperation involves distinct
domains, mismatch ensues
• Autonomy conflicts with consistency,
– Local Needs have Priority,
– Outside uses are a Byproduct
Heterogeneity must be addressed
• Platform and Operating Systems
• Representation and Access Conventions
• Naming and Ontology
43. Unsolved problem in Interoperation
Common assumption in assembling and integrating
distributed information resources
• The language used by the resources is the same
• Sublanguages used by the resources are subsets of a
globally consistent language
This assumption is provably false.
Working towards the goal of global consistency is
1. naïve -- the goal cannot be achieved
2. inefficient -- languages are efficient in local contexts
44. Ontology: components
We represent the contents and structure of a
language by its ontology:
• a set of well-defined terms,
which delimit the domain of
discourse
• relationships among those terms,
chosen from a limited set
a formalizable subset of expert
knowledge
45. SKC’s grounded definition .
• Ontology:
a set of terms and their relationships
• Term:
a reference to real-world and abstract objects
• Relationship:
a named and typed set of links between objects
• Reference:
a label that names objects
• Real-world object:
an entity instance with a physical manifestation
• Abstract object:
a concept which refers to other objects
46. Where are Ontologies found?
Ontologies allow communication among partners in
enterprises (rarely in machine-readable form)
Relationships determine meaning - parent, school, company
Variable and Class names in Software
Databases use ontologies during design in
their E-R diagrams (implicitly) and to represent
the leaf nodes in their schemas.
Knowledge-bases use term ontologies (often
explicitly), add class definitions (to hold instances),
constraints, and operations among the terms.
47. Establishing Ontologies
Top-down:
–Commonly acceptable UPPER layers
Domain-specific
–Analysis and Sharing tools
–Model and Object-type based
Bottom-up
–Wordlist creation from task-specific
collections
–Database models, schemas, and contents
48. Large Ontologies: good or bad?
Have all the Knowledge together
+ simple for customers of KBs
– hard for owners of KBs, must synchronize with many
others
– in the limit -- everybody must be globally consistent
Large KB will cover multiple / all domains
created by a committee -- slow
maintained by a committee -- costly
Differences in level of abstraction -- efficiency
homeowner: nail
carpenter: sinker, brad, boxnail, . . .
49. Domain ontology assumption .
• a domain will contain known objects
• the object configuration is consistent
• within a domain all terms are consistent
&
• relationships among objects are consistent
• context is implicit in use
No committee is needed to forge compromises * within a domain
* Compromises hide valuable details
• explicit context is needed for external use
[Figure labels: Domain, Ontology]
50. SKC Objective
Provide for Maintainable Ontologies
• devolve maintenance onto many
domain-specific experts / authorities
SKC
• provide an algebra to compute
composed ontologies that are
limited to their articulation terms
• enable interpretation within the
source contexts
51. Conservative assumption !
When dealing with multiple ontologies one can never be
sure that identically or similarly spelled words mean the
same thing,
i.e., refer to exactly the same set of real-world objects
under all current and future conditions
• Common, optimistic assumption: Meaning is identical
– Gets worse when terms are stemmed
• SKC, conservative or pessimistic assumption: Meaning
never matches, unless there is a match rule
– number of matching rules is reduced by focusing on the
articulation
52. An Ontology Algebra
A knowledge-based algebra for ontologies
Intersection create a subset ontology
keep sharable entries
Union create a joint ontology
merge entries
Difference create a distinct ontology
remove shared entries
The Articulation Ontology (AO) consists of
matching rules that link domain ontologies
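The three operations can be sketched over plain term sets. This is a simplified illustration, not the SKC algebra itself: ontologies are reduced to sets of terms (relationships are ignored), articulation rules are assumed to be explicit term pairs, and all names and sample terms are hypothetical.

```python
def intersection(o1, o2, rules):
    """Shared entries: only term pairs linked by an explicit matching rule."""
    return {(a, b) for (a, b) in rules if a in o1 and b in o2}


def difference(o1, o2, rules):
    """Entries of o1 that no rule links to o2 (fully under local control)."""
    shared = {a for (a, _b) in intersection(o1, o2, rules)}
    return o1 - shared


def union(o1, o2, rules, name1="store", name2="factory"):
    """Joint ontology: matched entries merged, the rest kept source-qualified."""
    merged = {f"{a}={b}" for (a, b) in intersection(o1, o2, rules)}
    flipped = {(b, a) for (a, b) in rules}
    only1 = {f"{name1}:{t}" for t in difference(o1, o2, rules)}
    only2 = {f"{name2}:{t}" for t in difference(o2, o1, flipped)}
    return merged | only1 | only2


# Hypothetical sample ontologies and articulation rules.
STORE = {"shoe", "price", "nail"}
FACTORY = {"footwear", "cost", "nail"}
RULES = {("shoe", "footwear"), ("price", "cost")}
```

Note that `nail` occurs in both sample ontologies but has no rule, so it is not treated as shared; identical spelling alone does not create a match in this sketch.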
53. Sample Operation: INTERSECTION
Result contains shared terms: Terms useful for purchasing
Source Domain 1: Owned and maintained by Store
Source Domain 2: Owned and maintained by Factory
54. INTERSECTION support
Articulation ontology: Matching rules that use terms from the 2 source domains
Terms useful for purchasing
Store Ontology
Factory Ontology
56. Other Basic Operations
UNION: merging entire ontologies
DIFFERENCE: material fully under local control
Articulation ontology: typically prior intersections
57. Features of an algebra
Operations can be composed
Operations can be rearranged
Alternate arrangements can be evaluated
Optimization is enabled
The record of past operations can be
kept and reused
58. Knowledge Composition
[Figure: Knowledge resources A, B, C, D, and E, with articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ D), and (C ∩ E). Composed knowledge for applications using A, B, C, E: (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E).]
Legend: ∪ : union, ∩ : intersection
59. Sample Processing in HPKB
• What is the most recent year an OPEC member nation was on the UN security council?
– Related to DARPA HPKB Challenge Problem
– SKC resolves 3 Sources
» CIA Factbook ‘96 (nation)
» OPEC (members, dates)
» UN (SC members, years)
– SKC obtains the Correct Answer
» 1996 (Indonesia)
– Other groups obtained more, but factually wrong answers
– Problems resolved by SKC
* Factbook has out of date OPEC & UN SC lists
• Indonesia not listed
• Gabon (left OPEC 1994)
* different country names
• Gambia => The Gambia
* historical country names
• Yugoslavia
» UN lists future security council members
• Gabon 1999
» intent of original question
• Temporal variants
60. Tools to create articulations
Graph matcher for Articulation-creating Expert
Vehicle ontology
Transport ontology
Suggestions for articulations
61. continue from initial point
Also suggest similar terms
for further articulation:
• by spelling similarity,
• by graph position
• by term match repository
Expert response:
1. Okay
2. False
3. Irrelevant
to this articulation
All results are recorded
Okay’s are converted into articulation rules
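The suggest-and-review loop can be sketched as follows. This covers only the spelling-similarity suggestion and the recording of expert responses; graph position and the term match repository are left out. The function names and the 0.8 threshold are assumptions, and Python's standard `difflib` stands in for whatever matcher the actual tool uses.

```python
from difflib import SequenceMatcher


def suggest(terms1, terms2, threshold=0.8):
    """Propose candidate term pairs whose spelling similarity meets the threshold."""
    pairs = []
    for a in sorted(terms1):
        for b in sorted(terms2):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b))
    return pairs


def review(candidates, expert):
    """Record every expert verdict; only 'okay' verdicts become articulation rules."""
    log = [(a, b, expert(a, b)) for a, b in candidates]
    rules = [(a, b) for a, b, verdict in log if verdict == "okay"]
    return log, rules
```

Here `expert` is any callable returning `"okay"`, `"false"`, or `"irrelevant"` for a pair; keeping the full log lets rejected suggestions be reused later without asking again.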
62. Candidate Match Repository
Term linkages automatically extracted from 1912 Webster’s dictionary *
* free, other sources have been processed.
Based on processing headwords definitions using algebra primitives
Notice presence of 2 domains: chemistry, transport
65. Primitive Operations
Model and Instance
Unary
• Summarize -- structure up
• Glossarize - list terms
• Filter - reduce instances
• Extract - circumscription
Binary
• Match - data corroboration
• Difference - distance measure
• Intersect - schema discovery
• Blend - schema extension
Constructors
• create object
• create set
Connectors
• match object
• match set
Editors
• insert value
• edit value
• move value
• delete value
Converters
• object - value
• object indirection
• reference indirection
66. Future: exploiting the result
Result has links to source
Avoid n^2 problem of interpreter mapping, as stated by Swartout as an issue in HPKB year 1
Processing & query evaluation is best performed within Source Domains & by their engines
67. SKC Synopsis
• Research: Reliable query answers from heterogeneous, imperfect
data sources
• Sources:
– General: CIA World Factbook ‘96, UN www, OPEC www
Webster’s Dictionary, Thesaurus, Oxford English Dictionary
– Topical: OPEC, BattleSpace Sensors, Logistics Servers
• Client: DARPA High Performance Knowledge Base
(HPKB) project
• Theory: Rule-based algebra
– Translation & Composition primitives
68. Innovation in SKC
• No need to harmonize full ontologies
• Focus on what is critical for interoperation
• Rules specific for articulation
• Potentially many sets of articulation rules
• Maintenance is distributed
–to n sources
–to m articulation agents
is m < n^2 ? Depends on architecture density; a research question
69. Mega-programming Process
mega-
program- Mega-program
mer Text
customer
Feedback CHAIMS
Module / platform compiler
descriptions
Wrapper / API
Modules to be / API
Wrapper Mega- Result
composed program GUI
Modules to be API
Wrapper /
composed
Module to be
composed
70. Decomposing CALL statements
CHAIMS progress
decomposes in
Copying scale of
CALL
functions computing
Code sharing
Parameterized computation
Objects with overloaded method names
Remote procedure calls to distributed modules
Constrained (black box) access to encapsulated data
Set Up Estimate Invoke Inspect Extract
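The five steps named above can be illustrated with a small stand-in class. This is a sketch only: the class, its methods, and the cost estimate are hypothetical, the module runs locally rather than remotely, and actual CHAIMS syntax is not shown in these slides.

```python
class DecomposedCall:
    """Hypothetical stand-in for a remote module, run locally for illustration."""

    def __init__(self, func):
        self._func = func
        self._params = {}
        self._result = None
        self._done = False

    def set_up(self, **params):
        """Set Up: bind parameters without starting any computation."""
        self._params = dict(params)

    def estimate(self):
        """Estimate: return a crude cost figure (here, the parameter count)."""
        return len(self._params)

    def invoke(self):
        """Invoke: run the computation (a real module would run asynchronously)."""
        self._result = self._func(**self._params)
        self._done = True

    def inspect(self):
        """Inspect: report progress without retrieving results."""
        return "done" if self._done else "pending"

    def extract(self):
        """Extract: retrieve the result once the computation has finished."""
        if not self._done:
            raise RuntimeError("result not available yet")
        return self._result
```

Splitting one CALL this way lets a client compare estimates from several providers before invoking any of them, and fetch results only when they are needed.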
71. Maintenance is good for you
[Chart: relative annual maintenance cost (0 to 100%) and lifetime (0 to 13 years) for automobile, software, and hardware]
depreciation = 1 / lifetime
72. Growing Systems: n modules
Federated: to deal with many servers and clients
resource reuse
changes are difficult
affect many clients
73. Systems with Mediators
Gio Wiederhold. 1995
Applications . . . .
Mediators . . . . . .
Data Resources . . .
74. Growth through Reuse
Gio Wiederhold. 1995
New Application
Prior & Revised
Mediators
Extended Data
Resources
75. Linear O(n) Cost of Growth
now O(n^2)
• Data changes only affect some 7 ± 2
mediators; only in their domain
• Mediators can
1. supply old information to n-1
prior applications
2. provide better information to the
new application
3. be partially or completely reused
• New applications, using the new
data, can be developed and
inserted dynamically
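A rough count of interfaces shows the difference in growth. The sketch below rests on simplifying assumptions that are not from the slides: without mediation every application is wired to every source, while with mediation each source is attached once and each application attaches to a fixed number of mediators.

```python
def direct_links(apps, sources):
    """Every application is wired to every source."""
    return apps * sources


def mediated_links(apps, sources, mediators_per_app=1):
    """Each source is attached once; each application uses a few mediators."""
    return sources + apps * mediators_per_app
```

With 10 applications and 10 sources this gives 100 direct links against 20 mediated ones; doubling both to 20 gives 400 against 40, so the direct case grows quadratically and the mediated case linearly.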
76. A mediator is not just static software
Application Interface
Software & People
• Models, programs, rules, caches, . . .
• Owner / Creator
• Maintainer
• Lessor - Seller
• Advertiser
Resource Interfaces
Sources of change: Changes of user needs, Domain changes, Resource changes
77. Assigning maintenance responsibility
a. Source data quality – supplier database, files, or web pages
b. Interface to the source – wrapper, supplier or vendor for supplier
c. Source selection – expert specialist in mediator
d. Source quality assessment – customer input to mediator
e. Semantic interoperation – specialist group providing input to the mediator
f. Consistency and metadata information – mediator service operation or warehouse
g. Informal, pragmatic integration – client services with customer input
h. User presentation formats – client services with customer input
[Figure labels: Sources, Services, Customers]
78. Sample projects
• Tsimmis at Stanford
• E-Commerce in Digital Libraries
• INEEL: information integration for environmental
restoration
• MIFT: feedback for training
• Civil Engineering and Architecture
• F-22
• SimQL
• Security
79. Projects at Stanford DB group
Data Mining: MIDAS
Mediator & Wrapper Generation: TSIMMIS
Warehousing: WHIPS
Security Mediators: TIHI
Megaprogramming: CHAIMS
Simulation Access: SimQL
Changes, Consistency, and Configurations: C3
80. The TSIMMIS Project
Ramana Yerneni, Yannis Papakonstantinou, ...
• Objective: Support mediation technology
–integrated access to distributed,
autonomous, heterogeneous data sources,
using object fusion
–wrapper toolkit to rapidly create wrappers,
based on source specification,
a uniform interface to heterogeneous sources
–mediator toolkit to rapidly construct
mediators, based on a mediator specification,
to integrate data from a set of wrappers
81. Investors Need to Fuse Information
from Multiple Sources .
Network
Ticker Tape Personal
database
WWW
• group together information about
the same real-world entity
• remove redundancies
• resolve conflicts
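The three fusion steps can be sketched as below. The record layout, the `ticker` key, and the conflict policy (the most recent value wins) are assumptions chosen for illustration; this is not the TSIMMIS object-fusion mechanism.

```python
def fuse(records):
    """Group records by entity, drop repeated fields, let the newest value win."""
    by_entity = {}
    for rec in records:
        by_entity.setdefault(rec["ticker"], []).append(rec)

    fused = {}
    for ticker, recs in by_entity.items():
        merged = {}
        # Process older records first so that newer values overwrite older ones.
        for rec in sorted(recs, key=lambda r: r["as_of"]):
            for field, value in rec.items():
                if field not in ("ticker", "as_of", "source"):
                    merged[field] = value
        fused[ticker] = merged
    return fused


# Hypothetical records from three sources about the same company.
RECORDS = [
    {"ticker": "XYZ", "price": 10.0, "as_of": 1, "source": "ticker tape"},
    {"ticker": "XYZ", "price": 10.5, "name": "XYZ Corp", "as_of": 2, "source": "www"},
    {"ticker": "XYZ", "name": "XYZ Corp", "shares": 100, "as_of": 0, "source": "personal db"},
]
```

Other conflict policies (trusting one source over another, or keeping both values) would replace only the inner loop.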
82. An Integration Architecture
Client
Application
portfolios for each company
Mediator
stock market prices business reports
Wrapper Wrapper
Ticker
Tape Dialog
83. Additional Challenge: Sources Without a
Well-Structured Schema
• semistructured
– irregular
– deeply nested
• incomplete schema knowledge
– autonomous
– dynamic
Examples
• World Wide Web
• SGML documents
• genome, chemical structures
• bibliographic information
• files
85. E-money
Services must be paid for
• Incentive for creation and improvement
• price proportional to value added, often small
• profit = f (cost, market, price, overhead)
• price low per item, so overhead must be low
Simple payment (no credit accounts, checks)
Enabled through secure signatures
86. E-Commerce in the Digital
Library
Steven Ketchpel & DL Economics Group
Payment: CyberCash, DigiCash, First Virtual, SET
Delivery: Cryptolope, DigiBox, HTTP, E-mail
Major Integration Problem
Shopping Models: Pay-per-view, Subscription,
Session, Shareware, Auctions, Site License,
Gift Certificate, Layaway, Pre-paid vouchers, …
87. Shopping model: merchant-independent
logic controlling flow of business model
Example shopping models: State
Order, Pay, (Deliver 52 times) Information
(1 month; Order, Deliver) Pay
Event Handlers
Event Handlers
Bill
Event Handlers
2 1
Order Merchant
Customer
Complete
3
4 Payment
Start Transfer $
Complete
Event Handlers
Abstract API allows application to interact with many different services in a consistent way
Payment/Delivery/Other Services
Proxy event handlers translate from native applications to shopping model defined protocols
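A shopping model of this kind can be sketched as a small state machine driven by events. The class below is hypothetical and much simpler than the architecture on the slide: it treats a model as a fixed sequence of expected events and rejects anything out of order.

```python
class ShoppingModel:
    """Merchant-independent flow: an ordered list of expected events."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.position = 0

    @property
    def complete(self):
        return self.position == len(self.steps)

    def handle(self, event):
        """Event handler: advance the state only if the event is the expected one."""
        if self.complete:
            raise ValueError("model already complete")
        expected = self.steps[self.position]
        if event != expected:
            raise ValueError(f"expected {expected!r}, got {event!r}")
        self.position += 1


# The first example model from the slide: Order, Pay, (Deliver 52 times).
subscription = ShoppingModel(["order", "pay"] + ["deliver"] * 52)
```

The same class covers a pay-after-delivery model by changing only the step list, which is the sense in which the flow logic is independent of any one merchant.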
88. TSIMMIS Status
• Mediator Specification Interpreter running on
Ultrix, AIX, OSF.
• 9000 lines of C/C++ code
• 4000 C++ lines of Server/Client Support Libraries
• Integration of three disparate bibliographic
sources
– legacy system
– flat BibTeX files
– relational DB
– wwWeb files
89. Mediator Specification Interpreter
Architecture
Result Query
Query Rewriter Mediator
Specification
logical datamerge
program
Cost-Based Optimizer
plan
Datamerge Engine
Queries to
Results
Wrappers
90. Environmental Restoration at
INEL Undoing 50 years of messes
….
MSL [Stanford]
MQL [ISX] OQL [ODMG]
OEM
OEM
OEM QEM QEM
QEM OEM
QEM OEM
other QEM
mediators
mediator CORBA
wrapper QEM
QEM OEM
OEM wrapper
QEM
wrapper
wrapper
Many projects ERIS
IEDMS
many sources ISX - Stanford Univ.
LOCKHEED MARTIN Idaho National
Engineering Laboratory
91. CHAIMS - software composition
Domain expert
IO module Client workstation IO module
C
Computation
Services b e
MEGA modules
a
T d
Sites S c U T
R
Data
Resources
92. Mediation to Implement Feedback in
Training
David Maluf, Priya Panchapagesan, Ted Linden
Another task of mediators, prior to integration
MIFT Abstraction
Abstraction to match levels of granularity
93. Mediation Feedback:
Playback or Graph
User
Interface Commanders Training
Trainees UI in
Developers Analysts Java
Observers
Application
Layer
Standards
Objectives in KQML
Mediation
Tasks Stanford
Layers Mediators with
rules in CLIPS
I.D.A
Wrappers
Wrapped in C/C++
Simulation
Resources Janus SimNet
94. MIFT .
Result .
Analyses:
• Force ratio
• Losses
• Area gain
Exercise
Simulator
Type
96. F-22 IWSDB Phase 6
User Interfaces Integration Services Wrappers Databases
PD
Appli- Change Sy- DS
cation Notification base
Provi- PRIDE Index
Query Re-
sioner formulation WAIS
server
Match Domain Suppliers
Engi- IWSDB maker Model
neer
client S
Domain
Q
GUI Matching
L
97. Current state of DM Support
past now future
time
organized support disjointed support
Data integration
x17 @qbfera
ffga 67 .78 jjkl,a
nsnd nn 23.5a
Intuition +
• Spreadsheets
• Planning of allocations
Databases
• Other simulations
distributed, heterogeneous various point assessments
98. Information Systems should also
Project into the Future
past now future
time
Databases, accessed via SQL or CORBA compliant wrappers
Msg systems, sensors
Simulations, accessed via SimQL and compliant wrappers
99. SimQL: Simulation Access Service
Information Systems should also deal
with the Future
past SQL now SimQL future
time
Decision-making requires dealing with the future, as well as the past
• Databases deal well with the past
• Sensors can provide current status
• Spreadsheets, simulations deal with the likely futures
Information systems should be able to combine all three
101. Enabling Interoperation
Databases
• serve clients via SQL by
Sharing a Model (The Schema)
A query language over the model
• the SQL interface enables independence of
application development
DBMS technology development
reuse of infrastructure
Today
• most new systems use a DBMS for data storage
even with less performance,
inability to handle all problems,
but enough of them well enough.
Simulations should
• serve clients via SimQL by
Sharing a Model (research q.)
A query language over the model
• a SimQL interface will enable independence of
application development
simulation technology development
reuse of infrastructure
Objective
• build information systems combining DBMS, Simulations
even with less performance,
inability to handle all problems,
but enough of them . . .
102. Internet requirements
• Ubiquitous access to simulations
of a wide variety of types
• Rapid response to parameter changes
– often High-Performance computation is
needed
– distributed simulations with synchronization
• Rapid Service Composition
– High bandwidth among simulations
– Access to multiple services in parallel
103. Even the present needs SimQL
last recorded observations
point-in-time for situational assessment
simple simulations to extrapolate data
past now future
time
Not all data are current:
• Is the delivery truck in X?
• Is the right stuff on the truck?
• Will the crew be at X?
• Will the forces be ready to accept delivery?
104. Use of Simulation Results
Simulation results can be composed for
Alternative Courses-of-actions
Composition should be seamless, elegant,
with computation and recomputation of
likelihoods
Results change as now moves forwards and
eliminates earlier alternatives.
105. Types of simulation services
1. Continuously executing: weather prediction
– SimQL result reports best match samples
2. Execution specific to query: what-if assessment
– may require HPC power for adequate response
3. Past simulations collect results in a base: materials
– performs inter- or extra-polations to match query parameters
4. Combinations, i.e., 2. + 3.: top layer simulation using stored
partial lower level results: weapon performance in new setting
5. Human-in-the-loop (mediated by an agent program): SAFs
Note
• A simulation service program can be written in any language
• A simulation service must be compliant to the interface spec.
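Service type 3 (stored results, interpolated to match the query parameters) can be sketched as follows. The class name, the `query` method, and the sample values are hypothetical; SimQL syntax is not shown in these slides, and this sketch handles linear interpolation only, not extrapolation.

```python
from bisect import bisect_left


class StoredResultService:
    """Answers queries from a base of past simulation results."""

    def __init__(self, samples):
        # samples maps an input parameter value to a previously simulated result
        self._points = sorted(samples.items())

    def query(self, x):
        xs = [p for p, _ in self._points]
        if x < xs[0] or x > xs[-1]:
            raise ValueError("extrapolation is not supported in this sketch")
        i = bisect_left(xs, x)
        if xs[i] == x:
            return self._points[i][1]  # exact stored result
        (x0, y0), (x1, y1) = self._points[i - 1], self._points[i]
        # Linear interpolation between the two nearest stored results.
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical stored results: parameter value -> simulated outcome.
service = StoredResultService({0: 0.0, 10: 5.0, 20: 20.0})
```

A client sees only `query`; whether the answer comes from a stored result, interpolation, or a fresh simulation run stays behind the interface, in line with the note that any compliant program can serve.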
106. Tools for Managing Partitioning
Separate internals and interfaces, at many levels
• Object Libraries
• Product Design hierarchical standards (PDES)
• Domain-Specific Systems Analysis (DSSA)
• Ontology documentation (Ontolingua)
• Remote Object Access (CORBA 1.2, 2.0)
• Knowledge Interchange Formalism (KIF)
• Transport in / of heterogeneous situations
(KQML specifies content repr., ontology)
107. Moving to a Service Paradigm
• Server is an independent contractor, defines service
• Client selects service, and specifies parameters
• Server’s success depends on value provided
• Some form of payment received for services
x,y
Databases are a current example.
Simulations have the same potential.
108. New Role for Consultants
Old
• Used at Design Time
and
• To Explain Failures
Future
• Available as a Service
• Responsible for
Knowledge Maintenance
109. Long Range Science Vision
Databases: access, storage, algebras
Systems Engineering: analysis, documentation, costing
Artificial Intelligence: knowledge mgmt, domain expertise, uncertainty
Integration Methods
Integration Science
GIS: Spatial is special.
110. Summary
• Mediation bridges Applications and Sources
• Mediator technology transforms data to information
by applying an expert maintainer’s knowledge
• Abstraction reduces data further for decision making
• Must be integrated with sensors, simulation results
• Mediation permits incremental system growth (n log n)
• Mediators provide a service-model on the networks
New research
Recognition and resolution of semantic differences
Simulation access as a new service
more on http://www-db.stanford.edu/people/gio.html