I3 Master




Integration of Information from Heterogeneous Sources

October 2001
Gio Wiederhold
Stanford University

                             Gio Wiederhold I3 1
Change is constant

Changes are imposed by
• Technology advance
• Local government
• Federal rules
• Competition
• Emerging standards
Systems must be designed
  and operated to recognize
  and adapt to change

Information Leverage

Tactical (a variety of internal sources)
• Customers
• Inventory
• Suppliers

Strategic (external and imprecise sources)
• Planning
• Capabilities
• Opportunities

Information overload
Data starvation
• More databases
   – public & corporate
• Faster communication
   – digital
   – packet switching: TCP/IP, ATM
• World-wide connectivity
   – internet
   – world-wide web
• Disintermediation
   – ubiquitous publishing

Focus on Information Systems

[Diagram: Computing Systems, with Information Systems at the center]
• Processing: analyses, payroll, . . .
• Information Systems (on-line and distributed, . . .)
• Real-time control of processes, factories, . . .

Data and Knowledge

Information is created at the confluence of data (the state) and knowledge (the ability to select and project the state into the future).

[Diagram: two linked loops]
• Knowledge Loop: Education, Selection, Integration, Abstraction, Decision-making, Action, Experience
• Data Loop: Storage, Recording, State changes

Knowledge Manifestations

• Procedural
   – system analysts
   – programmers
• Declarative
   – domain analysts
   – knowledge engineers
   – rule writers

[Diagram: a link relates the list above to the list below]
• Creators: faster
• Maintainers: easier

Transform Data to Information


Application Layer: decision-makers at workstations
Mediation Layer: value-added services
Foundation Layer: data and simulation resources

Dealing With Heterogeneity

• Hardware platform . . . . . Hidden by operating system
• Operating system . . . . . . Choices are reducing: NT, UNIX, ...
• Programming language . . . Fewer choices; irrelevant in remote access
• Database system model . . Relational and E-R common
• Database system . . . . . . . Standards, convergence
• Coverage
     – Attributes . . . . . . . Source dependent
     – Scope . . . . . . . . . . documented, additive; undocumented, intersecting
• Data representation . . . . . Conversion problems, nulls
• Data semantics . . . . . . . . . Requires knowledge

Definition*

A mediator is a software module that exploits
 encoded knowledge about certain sets or
 subsets of data to create information for a
 higher layer of applications.

It should be small and simple, so that it can
   be maintained by one expert or, at most, a
   small and coherent group of experts.

               * Wiederhold: IEEE Computer March 1992

Flow in mediation

• DELIVERY
• SUMMARIZATION
• INTEGRATION
• ABSTRACTION
• ACCESS

[Diagram: arrows link adjacent stages]

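The five stages can be read as a pipeline that runs from ACCESS up to DELIVERY. The sketch below is one possible reading, not the author's implementation: the two sources, their field names, the temperature units, and the fever threshold are all hypothetical and chosen only to make each stage do visible work.

```python
from statistics import mean

# Hypothetical raw sources; names, fields, and units are illustrative only.
SOURCE_A = [{"patient": "p1", "temp_f": 98.6}, {"patient": "p2", "temp_f": 101.3}]
SOURCE_B = [{"id": "p3", "temp_c": 37.0}, {"id": "p4", "temp_c": 39.5}]


def access():
    """ACCESS: fetch records from each source as they are stored."""
    return {"a": SOURCE_A, "b": SOURCE_B}


def abstraction(raw):
    """ABSTRACTION: map each source's representation onto one shared form."""
    a = [{"patient": r["patient"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
         for r in raw["a"]]
    b = [{"patient": r["id"], "temp_c": r["temp_c"]} for r in raw["b"]]
    return a, b


def integration(parts):
    """INTEGRATION: combine the now-consistent records into one collection."""
    merged = []
    for part in parts:
        merged.extend(part)
    return merged


def summarization(records, fever_c=38.0):
    """SUMMARIZATION: reduce the records to what the application needs."""
    return {
        "patients": len(records),
        "mean_temp_c": round(mean(r["temp_c"] for r in records), 2),
        "fever": sorted(r["patient"] for r in records if r["temp_c"] >= fever_c),
    }


def delivery(summary):
    """DELIVERY: hand the result to the application layer as text."""
    fever = ", ".join(summary["fever"]) or "none"
    return f"{summary['patients']} patients, mean {summary['mean_temp_c']} C, fever: {fever}"


def mediate():
    """Run the stages bottom-up, from ACCESS to DELIVERY."""
    return delivery(summarization(integration(abstraction(access()))))
```

Only `abstraction` knows about source-specific fields; the later stages work on the shared form, which is what keeps such a module small enough for one expert to maintain.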
Functions inside Mediation


[Diagram: functions inside a mediator]
• Selection
• Transform
• Summarize
• articulation
• Heterogeneous resources

Example in Health Care

Health Care Planner: Will the Clinic lose money?

[Diagram: the planner's question draws on two domains]
• Patient Care domain
• Investment domain
Inputs shown: Age Profile, Service Operations, Bond Sales, Patient Volume Growth, Loan Interest, State Support

Gio Wiederhold. 1995

Functional Layer

[Diagram: layers of code, top to bottom, separated by interfaces]
• Human-computer interaction
   – User interface
• Application-specific code
   – Service interface
• MEDIATION: Domain-specific code
   – Resource access interface
• Source-specific code
   – Real-world interface

Function of Mediation

Apply Domain-specific Specialist
 Knowledge to add value
•   to locate data sources
•   to describe data for use
•   to convert for consistency
•   to abstract for insight / models
•   to extrapolate to new situations
•   to integrate from diverse sources
•   to re-abstract for presentation

     INFORMATION
Architectures & Communication

Function         ‘mainframe’       smart terminal   file server     client server    mediated
Presentation     Printed reports   Terminal         Mini-computer   Workstation      User workstation
Information      Application       Application      Application     Information      Information
Aggregation      Computation       Computation      Computation     CORBA            Aggregation
Access, Select   I-O code          SQL for A&S      Select, FTP     Object Struct.   SQL, ... for A&S
Data Source      Local Storage     Data Base        File Storage    Server Storage   Distr. Sources

Current Methods

• Access: WWW with MOSAIC
  – browsing, collection services: Harvest, ALIWEB, Fish
• SQL with Views
  – one verb, one database, one datatype
  – predefined subsets
• Grouping: Objects with CORBA
  – predefined aggregation with methods
• View-Objects
  – created via extension of relational algebra
• Summarization
  – Tables from text documents; Exception search


Central Solutions do not Scale



What works with 7 modules and one person in charge fails when we have 100 and need a committee.

Any change in resources affects the central module

Evolution of mediation

[Diagram: applications A1 to A6 at the top reach data sources D1 to D6 at the bottom over a network. Stages a. through e. show access passing through integrators (I1, I2), mediators (M1, M2), and wrappers (W1, W2, W3).]

Domain-specific Mediation

• User application
   – Workstations
• Mediator
   – Expert-owned nodes
• Data sources
   – Remote primary and byproduct services

Mediation for Quality

User Model: f(S,C,T)
   S = source reliability
   C = confidence
   T =
BEST = low cost, rapid response, reliable delivery, trustworthiness

Source   Assessment   Estimates
S1       S1 = .8      C1 = 5 ± 1     T1 = 100 ± 160
S2       S2 = .9      C2 = 8 ± 1     T2 = 70 ± 30
S3       S3 = 8       C3 = 10 ± 1    T3 = 50 ± 80

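The slide names a user model f(S,C,T) but does not define f. The sketch below shows how different user models could pick different sources from the same estimates. Both sample models are assumptions: they treat T as a quantity to minimize, ignore C, and read S3 as .8 where the slide prints 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    value: float
    spread: float  # the "±" part of the slide's estimates

    @property
    def worst(self) -> float:
        """Pessimistic reading of the estimate: value plus spread."""
        return self.value + self.spread


@dataclass(frozen=True)
class Source:
    name: str
    s: float      # S: assessed source reliability
    c: Estimate   # C: carried along, unused by the sample models below
    t: Estimate   # T: assumed here to be a quantity to minimize


SOURCES = [
    Source("S1", 0.8, Estimate(5, 1), Estimate(100, 160)),
    Source("S2", 0.9, Estimate(8, 1), Estimate(70, 30)),
    Source("S3", 0.8, Estimate(10, 1), Estimate(50, 80)),  # assumption: .8, slide prints 8
]


def best(sources, user_model):
    """Return the source that maximizes the user's model f(S, C, T)."""
    return max(sources, key=lambda src: user_model(src.s, src.c, src.t))


def cautious(s, c, t):
    """Hypothetical model: reliability per unit of worst-case T."""
    return s / t.worst


def optimistic(s, c, t):
    """Hypothetical model: reliability per unit of expected T."""
    return s / t.value
```

Under these assumptions a cautious user is steered to S2, whose T estimate has the narrowest spread, while an optimistic user is steered to S3, whose expected T is lowest.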
Allocation Flexibility
[Diagram: user interfaces and Applications B, C, and I sit above mediators M and N, which sit above databases DB P, DB Q, and DBS R. The provider of Mediator M and the provider of mediator N supply copies: M1 next to an HPC resource, M2 next to an application, N1 and N2 next to the databases.]

Copy, if high intensity of interaction with
1. Application (M2)
2. Resources (N1,2)
3. Processing (M1)

Mediators are only code

Features of Mediation

• Domain-specific partitioning for Creation and Maintenance
• Network-basing for easy Reconfiguration
• Caching to deal with Asynchronicity
• Replication for Performance

[Diagram: domain partitions A through E on a network, with A1 replicated as A1’]

Central Solutions do not Scale



What works with 7 modules and one person in charge fails when we have 100 and need a committee.

Changes in resources affect the intermediary modules

Integration at two levels
Application
• Informal, pragmatic
• User-control

Mediation
• Formal service
• Domain-Expert control




Status of Mediation Technology

Today
• Handcrafted
• Expert consults with programmer
• Programmer codes the knowledge needed
• Resource changes require advice, program update

Future
• Generated from models
• Domain Expert maintains models
• Specification determines functions
• Resource changes trigger regeneration

Facilitators
Another Module Type in Information Systems

Facilitators Procure Linkages
• search for suitable resources
• resolve terminological mappings
• build system configurations
• issue subqueries, as needed
• combine results from subqueries
    perform these tasks dynamically
             without human intervention
    depend greatly on ontologies
• can call on mediators for value added services

Facilitators and Mediators




[Diagram labels: dynamic, designed, accessible ontology]

Available Technology/Science

     User Models Domain Ontologies        Geographic Models

      Agents      Deductive Databases    Spatial abstractions

    Object Bases     Temporal Algebras    Uncertainty algebras

   Constraint Management        Circumscription    Security Filters

 Active Databases    Human Lang. Proc.    Case-based Reasoning

DB Views       Wrappers       Distributed Storage Systems Caching

Database Models     Knobots    Simulation Access    High Perf.Comm.



Coverage of Current I3 Efforts
Legend: good progress / active research / related work / poor coverage

[Diagram: each area below carries one of the legend's rating marks]
• Discovery (web, schema searching)
• Abstraction for relevance to customer
• Maintenance (rule technology?)
• Caching / History
• Facilitation (auto linking)
• Mediators for multiple domains
• Integration over sources
• Security for cooperation
• Wrapping (syntactical heterogeneity)
• Databases / Web / Text / Simulation

Building Stovepipes
[Diagram: two stovepipe systems, each built on its own scaffolding]
• Mismatched assumptions
• Similar functions, different assignments to modules

Middleware

Many standards by many vendor groups

• CORBA (Common Object Request Broker Architecture)
    – IBM SOM, DSOM
• DOE (Distributed Objects Everywhere)
    – SunSoft
• DOME
• EZ-bridge
    – System Strategies Inc.
• ILU (InterLanguage Unification) Xerox
• ISIS
• KQML (Knowledge Query & Manipulation Lang.)
• MQM (Message Queuing Middleware)
    – IBM (for mainframe connections)
• OLE (Object Linking and Embedding)
• OpenDoc (Apple)
• PDES (Product Data Exchange using STEP)
• TIB (Teknekron Information Bus)

[Diagram: a brace groups entries under "Shared specification"]

New Tools
From the ARPA-Sponsored Knowledge Sharing Effort

• KQML: Knowledge Query & Manipulation Language
   More Verbs: Performatives
   Multi-source, Multi-mediator, Multi-content

• KIF: Knowledge Interchange Format
    Exchange complex data, rules, . . .
       among Expert Systems and Subsystems

• LOOM: Classification-based Expert System
• Ontolingua: Repository for Domain Terminologies
KQML
KNOWLEDGE QUERY & MANIPULATION LANGUAGE

Each message states:
= Ontology
= Representation

Verbs (performatives):
•   Get,
•   Put,
•   Infer,
•   Subscribe,
•   Advertise,
•   ...

Content: speak KIF, objects, tuples, equations

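A KQML message is an s-expression: a performative followed by keyword parameters, among them `:language` (the representation of the content) and `:ontology`. The sketch below only formats such a message as text. It uses the performative `ask-one` from the KQML specification as a "get"-style query; the agent names, the ontology name, and the content expression are hypothetical.

```python
def kqml(performative, **params):
    """Render a KQML-style message as an s-expression string.

    Python keyword names use underscores; KQML parameter names use hyphens.
    """
    fields = " ".join(
        f":{key.replace('_', '-')} {value}" for key, value in params.items()
    )
    return f"({performative} {fields})"


# Hypothetical query from a planning application to a mediator.
message = kqml(
    "ask-one",
    sender="planner",
    receiver="clinic-mediator",
    language="KIF",
    ontology="health-care",
    reply_with="q1",
    content="(patient-volume-growth clinic ?rate)",
)
```

The receiver does not need to know the sender's internals: the performative says what is being asked, `:language` says how to parse the content, and `:ontology` says which terms the content uses.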
KQML APIs

Several suppliers; multiple platforms
Fat and thin versions; mainly to Internet (TCP/IP)
Not (yet) shrinkwrapped, require interaction
 –   Univ. of Maryland, Baltimore County, with UNISYS
 –   Stanford Design Projects ABSE [Genesereth et al.]
 –   Crystalliz (Cambridge MA), transmits PDES, SQL on PC
 –   BBN for planning, rapid assembly of joint task forces
 –   ISX (Westlake Village, CA) Demonstration tools
 –   Toronto Univ. Enterprise Integration Laboratory
 –   EITech Servicemail (uses email to go across firewalls)

KIF -- Knowledge Interchange

Transmits among
 Expert Systems
• LOOM
• Ontolingua
• others

ANSI X3T2 evaluation
Compatible with Conceptual Graphs
Used by KQML to describe choices

Two Design Phases

1. Resource Integration
2. Customer Focusing

[Diagram: both phases meet at a Common Model]

Mediator Design Principle

Transform Data into Information

Match
 Customer Model (Hierarchical)
     to
 Resource Model (General network)

(and maintain models)

Fat versus thin mediators

[Diagram: mediators placed by service scope (vertical axis) and domain scope (horizontal axis), with "Just right" in the middle]

Service scope
• Too thin: insufficient added value
• Too fat: hard to compose

Domain scope
• Too narrow: few customers
• Too broad: hard to maintain, needs a committee

Heterogeneity among Domains

If interoperation involves distinct
    domains, mismatch ensues
• Autonomy conflicts with consistency,
   – Local Needs have Priority,
   – Outside uses are a Byproduct
Heterogeneity must be addressed
• Platform and Operating Systems
• Representation and Access Conventions
• Naming and Ontology

Unsolved problem in Interoperation

Common assumption in assembling and integrating
  distributed information resources
• The language used by the resources is the same
• Sublanguages used by the resources are subsets of a
  globally consistent language
This assumption is provably false.
Working towards the goal of global consistency is
1. naïve -- the goal cannot be achieved
2. inefficient -- languages are efficient in local contexts

Ontology: components

We represent the contents and structure of a
language by its ontology:
• a set of well-defined terms,
  which delimit the domain of
  discourse
• relationships among those terms,
  chosen from a limited set
An ontology is a formalizable subset of expert
  knowledge

SKC’s grounded definition




• Ontology:
    a set of terms and their relationships
• Term:
    a reference to real-world and abstract objects
• Relationship:
    a named and typed set of links between objects
• Reference:
    a label that names objects
• Real-world object:
    an entity instance with a physical manifestation
• Abstract object:
    a concept which refers to other objects
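
These definitions map onto a small data structure. The sketch below is one possible encoding, not SKC's own: the class and field names, the relationship kind "attribute", and the sample shoe-store terms are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    label: str              # Reference: a label that names objects
    abstract: bool = False  # True for an abstract object, False for a real-world one


@dataclass(frozen=True)
class Relationship:
    name: str         # named ...
    kind: str         # ... and typed ...
    links: frozenset  # ... set of (source label, target label) links


@dataclass
class Ontology:
    """A set of terms and their relationships."""
    name: str
    terms: dict = field(default_factory=dict)
    relationships: dict = field(default_factory=dict)

    def add_term(self, label, abstract=False):
        self.terms[label] = Term(label, abstract)
        return self.terms[label]

    def relate(self, name, kind, source, target):
        """Add one link; both ends must already be terms of this ontology."""
        for label in (source, target):
            if label not in self.terms:
                raise KeyError(f"unknown term: {label}")
        old = self.relationships.get(name)
        links = (old.links if old else frozenset()) | {(source, target)}
        self.relationships[name] = Relationship(name, kind, links)


# Hypothetical example ontology.
store = Ontology("Shoe Store")
store.add_term("Shoes")
store.add_term("Style", abstract=True)
store.relate("has-style", "attribute", "Shoes", "Style")
```

Requiring both ends of a link to be known terms keeps the ontology closed over its own domain of discourse.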


Where are Ontologies found?

Ontologies allow communication among partners in
 enterprises (rarely in machine-readable form)
Relationships determine meaning - parent, school, company

Variable and Class names in Software
Databases use ontologies during design in
 their E-R diagrams (implicitly) and to represent
 the leaf nodes in their schemas.
Knowledge-bases use term ontologies (often
 explicitly), add class definitions (to hold instances),
 constraints, and operations among the terms.

Establishing Ontologies

Top-down:
  –Commonly acceptable UPPER layers
Domain-specific
  –Analysis and Sharing tools
  –Model and Object-type based
Bottom-up
  –Wordlist creation from task-specific
   collections
  –Database models, schemas, and contents

Large Ontologies: good or bad?

 Have all the Knowledge together
  + simple for customers of KBs
  – hard for owners of KBs, must synchronize with many
   others
  – in the limit -- everybody must be globally consistent

 Large KB will cover multiple / all domains
   created by a committee -- slow
   maintained by a committee -- costly
 Differences in level of abstraction -- efficiency
   homeowner: nail
   carpenter: sinker, brad, boxnail, . . .
Domain ontology assumption

• a domain will contain known objects
• the object configuration is consistent
• within a domain all terms are consistent
  &
• relationships among objects are consistent
• context is implicit in use
• explicit context is needed for external use

[Diagram: a Domain Ontology]
No committee is needed to forge compromises * within a domain
* Compromises hide valuable details

SKC Objective

Provide for Maintainable Ontologies
• devolve maintenance onto many
  domain-specific experts / authorities
• provide an algebra to compute
  composed ontologies that are
  limited to their articulation terms
• enable interpretation within the
  source contexts

Conservative assumption !

When dealing with multiple ontologies one can never be
  sure that identically or similarly spelled words mean the
  same thing,
  i.e., refer to exactly the same set of real-world objects
               under all current and future conditions
• Common, optimistic assumption: Meaning is identical
   – Gets worse when terms are stemmed
• SKC, conservative or pessimistic assumption: Meaning
  never matches, unless there is a match rule
   – number of matching rules is reduced by focusing on the
     articulation

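The two assumptions differ in one line of logic. In the sketch below, which is illustrative only, the ontology names "store" and "factory", the rule set, and both function names are hypothetical.

```python
def optimistic_same_meaning(term_a, term_b):
    """Optimistic assumption: identical spelling means identical meaning."""
    return term_a.lower() == term_b.lower()


def same_meaning(term_a, onto_a, term_b, onto_b, rules):
    """Conservative assumption: terms match only if a match rule links them."""
    left, right = (onto_a, term_a), (onto_b, term_b)
    return (left, right) in rules or (right, left) in rules


# Hypothetical articulation: a single match rule between two ontologies.
RULES = {(("store", "size"), ("factory", "size"))}
```

With this rule set, "nail" in the store ontology and "nail" in the factory ontology are spelled alike but are not treated as the same term, while "size" matches in either direction because a rule says so.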
An Ontology Algebra

A knowledge-based algebra for ontologies

  Operation       Result                        Effect
  Intersection    create a subset ontology      keep sharable entries
  Union           create a joint ontology       merge entries
  Difference      create a distinct ontology    remove shared entries

The Articulation Ontology (AO) consists of
  matching rules that link domain ontologies

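The three operations can be sketched over plain sets of term names, with the articulation ontology reduced to a set of (term in A, term in B) pairs. This is a simplification made for illustration: the term sets, the rule set, and the naming scheme used for merged entries are assumptions, and real ontologies also carry relationships.

```python
def intersection(a, b, rules):
    """Subset ontology: keep only the term pairs linked by a matching rule."""
    return {(x, y) for (x, y) in rules if x in a and y in b}


def union(a, b, rules, name_a="A", name_b="B"):
    """Joint ontology: matched pairs merge into one entry;
    all other terms stay qualified by their source."""
    shared = intersection(a, b, rules)
    matched_a = {x for x, _ in shared}
    matched_b = {y for _, y in shared}
    merged = {f"{name_a}.{x}={name_b}.{y}" for x, y in shared}
    only_a = {f"{name_a}.{x}" for x in a - matched_a}
    only_b = {f"{name_b}.{y}" for y in b - matched_b}
    return merged | only_a | only_b


def difference(a, b, rules):
    """Distinct ontology: terms of `a` with the shared entries removed."""
    shared_a = {x for x, _ in intersection(a, b, rules)}
    return a - shared_a


# Hypothetical term sets and matching rules.
STORE = {"size", "color", "style", "nail", "customers"}
FACTORY = {"size", "colcode", "style", "nail", "machinery"}
RULES = {("size", "size"), ("color", "colcode"), ("style", "style")}
```

Because no rule links the two "nail" terms, the union keeps them as separate entries and the difference leaves "nail" under local control.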
Sample Operation: INTERSECTION


[Diagram: two source domains overlap; the overlap is the result]
• Source Domain 1: owned and maintained by Store
• Source Domain 2: owned and maintained by Factory
• Result contains shared terms: terms useful for purchasing

INTERSECTION support

[Diagram: the articulation ontology sits between the Store Ontology and the Factory Ontology]
• Articulation ontology: matching rules that use terms from the 2 source domains
• Result: terms useful for purchasing

Sample Intersections

Articulation ontology matching rules (Shoe Store and Shoe Factory):
  size = size
  color = table(colcode)
  style = style

Shoe Store
• Shoes { . . . }
• Customers { . . . }
• Employees { . . . }

Shoe Factory
• Material inventory { . . . }
• Employees { . . . }
• Machinery { . . . }
• Processes { . . . }
• Shoes { . . . }

[Diagram: neighboring ontologies are Anatomy { . . . }, Department Store, and Hardware. A rule foot = foot links to Anatomy. Employees and Nail (toe, foot), ... appear on one side; Employees and Nail (fastener), ... appear on the other.]

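The three matching rules are not all plain equalities: color = table(colcode) needs a lookup. The sketch below applies the rules to translate a factory record into store terms. The code table, the sample record, and the field `last_id` are hypothetical.

```python
# Hypothetical code table behind the rule color = table(colcode).
COLOR_TABLE = {"01": "black", "02": "brown"}

# One entry per matching rule: store term -> how to derive it from a factory shoe.
RULES = {
    "size": lambda shoe: shoe["size"],                   # size = size
    "color": lambda shoe: COLOR_TABLE[shoe["colcode"]],  # color = table(colcode)
    "style": lambda shoe: shoe["style"],                 # style = style
}


def to_store_terms(factory_shoe):
    """Keep only what the articulation covers, expressed in store terms."""
    return {term: rule(factory_shoe) for term, rule in RULES.items()}


factory_shoe = {"size": 9, "colcode": "02", "style": "loafer", "last_id": "L-17"}
store_view = to_store_terms(factory_shoe)
```

Fields without a matching rule, such as the hypothetical `last_id`, stay inside the factory's own ontology and never reach the store's view.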
Other Basic Operations

UNION: merging entire ontologies
   typically prior intersections
DIFFERENCE: material fully under local control

[Diagram: both operations are shown relative to the articulation ontology]

Features of an algebra


• Operations can be composed
• Operations can be rearranged
• Alternate arrangements can be evaluated
• Optimization is enabled
• The record of past operations can be kept and reused

Knowledge Composition

Legend: ∪ : union, ∩ : intersection

[Diagram: knowledge resources A, B, C, D, and E, linked pairwise by articulation knowledge]
• Articulation knowledge for (A ∩ B)
• (B ∩ C)
• (C ∩ D)
• Articulation knowledge for (C ∩ E)
• Composed knowledge for applications using A, B, C, E:
  articulation knowledge for (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E)

Sample Processing in HPKB
• What is the most recent year an OPEC member nation was on the UN security council?
   – Related to DARPA HPKB Challenge Problem
   – SKC resolves 3 Sources
       » CIA Factbook ‘96 (nation)
       » OPEC (members, dates)
       » UN (SC members, years)
   – SKC obtains the Correct Answer
       » 1996 (Indonesia)
   – Other groups obtained more, but factually wrong answers
   – Problems resolved by SKC
       * Factbook has out of date OPEC & UN SC lists
           • Indonesia not listed
           • Gabon (left OPEC 1994)
       * different country names
           • Gambia => The Gambia
       * historical country names
           • Yugoslavia
       » UN lists future security council members
           • Gabon 1999
       » intent of original question
           • Temporal variants

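The temporal part of this problem can be shown with the few facts the slide states. The sketch below is not SKC's processing: it only contrasts a date-aware join with a naive one. `None` marks a date the slide does not give and is treated as open-ended, which is an assumption, as is the as-of year passed to the query.

```python
# Facts taken from the slide; None marks a date the slide does not give.
OPEC_MEMBERSHIP = {"Indonesia": (None, None), "Gabon": (None, 1994)}  # (joined, left)
UN_SC_YEARS = {"Indonesia": [1996], "Gabon": [1999]}


def was_opec_member(nation, year):
    """True if the nation was an OPEC member in that year (open ends allowed)."""
    if nation not in OPEC_MEMBERSHIP:
        return False
    joined, left = OPEC_MEMBERSHIP[nation]
    return (joined is None or joined <= year) and (left is None or year <= left)


def most_recent_year(as_of):
    """Latest (year, nation) with council service while an OPEC member."""
    hits = [
        (year, nation)
        for nation, years in UN_SC_YEARS.items()
        for year in years
        if year <= as_of and was_opec_member(nation, year)
    ]
    return max(hits, default=None)


def naive_most_recent_year():
    """Join on nation name only, ignoring membership dates and future listings."""
    hits = [
        (year, nation)
        for nation, years in UN_SC_YEARS.items()
        if nation in OPEC_MEMBERSHIP
        for year in years
    ]
    return max(hits, default=None)
```

The naive join reports Gabon for 1999, a later but factually wrong answer, because Gabon had left OPEC by then; the date-aware query returns 1996 (Indonesia), the answer the slide marks as correct.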
Tools to create articulations

[Diagram: a graph matcher for the articulation-creating expert compares the Vehicle ontology with the Transport ontology and produces suggestions for articulations]

continue from initial point

Also suggest similar terms
   for further articulation:
• by spelling similarity,
• by graph position
• by term match repository

Expert response:
1. Okay
2. False
3. Irrelevant
       to this articulation
All results are recorded
Okays are converted into articulation rules

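The suggest-and-review loop can be sketched with the standard library. Only the first suggestion method, spelling similarity, is covered, using `difflib.get_close_matches`; graph position and the term match repository are left out. The function names, the cutoff, and the sample terms are assumptions.

```python
from difflib import get_close_matches

VERDICTS = {"okay", "false", "irrelevant"}


def suggest(terms_a, terms_b, cutoff=0.8):
    """Propose candidate matches by spelling similarity only."""
    candidates = []
    for a in sorted(terms_a):
        for b in get_close_matches(a, sorted(terms_b), n=3, cutoff=cutoff):
            candidates.append((a, b))
    return candidates


def review(candidates, expert):
    """Record every verdict; only 'okay' verdicts become articulation rules."""
    log, rules = [], []
    for pair in candidates:
        verdict = expert(pair)
        if verdict not in VERDICTS:
            raise ValueError(f"unexpected verdict: {verdict}")
        log.append((pair, verdict))
        if verdict == "okay":
            rules.append(pair)
    return log, rules


# Hypothetical terms from a vehicle ontology and a transport ontology.
candidates = suggest({"truck", "car"}, {"trucks", "cargo"})
log, rules = review(candidates, lambda pair: "okay")
```

Keeping the full log, including rejected pairs, lets a later session skip suggestions the expert has already judged.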
Candidate Match Repository

Term linkages automatically extracted from 1912 Webster’s dictionary *
   * free, other sources have been processed.

Based on processing headwords and definitions using algebra primitives

[Figure: extracted term linkages] Notice presence of 2 domains: chemistry, transport

Using the match repository




                             Gio Wiederhold I3 63
Navigating the match repository




                             Gio Wiederhold I3 64
Primitive Operations
Model
Unary
• Summarize -- structure up
• Glossarize -- list terms
• Filter -- reduce instances
• Extract -- circumscription
Binary
• Match -- data corroboration
• Difference -- distance measure
• Intersect -- schema discovery
• Blend -- schema extension

Instance
Constructors
• create object
• create set
Connectors
• match object
• match set
Editors
• insert value
• edit value
• move value
• delete value
Converters
• object - value
• object indirection
• reference indirection
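A rough sketch of a few of the model-level primitives, assuming an ontology is represented as a set of (parent, child) edges. That representation and the simplified semantics are assumptions made here for illustration, not the SKC algebra itself.

```python
def terms(ontology):
    """All terms that appear in any edge of the ontology."""
    return {t for edge in ontology for t in edge}


def glossarize(ontology):
    """Unary: list the terms of one ontology."""
    return sorted(terms(ontology))


def intersect(a, b):
    """Binary: terms shared by both ontologies."""
    return terms(a) & terms(b)


def difference(a, b):
    """Binary: a distance measure, here the Jaccard distance over term sets."""
    union = terms(a) | terms(b)
    if not union:
        return 0.0
    return 1.0 - len(terms(a) & terms(b)) / len(union)


def blend(a, b):
    """Binary: extend one schema with the other, here a plain union of edges."""
    return set(a) | set(b)


# Hypothetical miniature ontologies.
VEHICLE = {("Vehicle", "Car"), ("Vehicle", "Truck")}
TRANSPORT = {("Transport", "Truck"), ("Transport", "Train")}
```

For these two toy ontologies the only shared term is Truck, which is where an articulation would start.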

                                                          Gio Wiederhold I3 65
Future: exploiting the result

Result has links to source

Avoid n^2 problem of interpreter mapping,
as stated by Swartout as an issue in HPKB year 1

Processing & query evaluation is best performed
within Source Domains & by their engines
                                        Gio Wiederhold I3 66
SKC Synopsis

• Research: Reliable query answers from heterogeneous, imperfect
  data sources
• Sources:
   – General: CIA World Factbook ‘96, UN www, OPEC www
       Webster’s Dictionary, Thesaurus, Oxford English Dictionary
   – Topical: OPEC, BattleSpace Sensors, Logistics Servers
• Client: DARPA High Performance Knowledge Base
              (HPKB) project
• Theory: Rule-based algebra
   – Translation & Composition primitives



                                                            Gio Wiederhold I3 67
Innovation in SKC

•   No need to harmonize full ontologies
•   Focus on what is critical for interoperation
•   Rules specific for articulation
•   Potentially many sets of articulation rules
• Maintenance is distributed
   – to n sources
   – to m articulation agents
      m < n^2, depending on architecture;
      density is a research question
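A small worked bound on the number of articulations, assuming each articulation links exactly two of the n sources (an assumption made here; the slide leaves the architecture open). The subscripts "dense" and "sparse" are labels introduced for this illustration.

```latex
\[
m_{\text{dense}} = \binom{n}{2} = \frac{n(n-1)}{2} < n^2,
\qquad
m_{\text{sparse}} \ge n - 1 .
\]
```

For n = 10 sources, articulating every pair needs 45 rule sets, while the sparsest arrangement that still connects all sources needs 9. Where a real system falls between those two depends on how densely the sources must interoperate.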
                                       Gio Wiederhold I3 68
Mega-programming Process


[Diagram. Elements: megaprogrammer; megaprogram text; CHAIMS compiler;
 feedback; module / platform descriptions; generated megaprogram;
 wrappers / APIs around the modules to be composed; result GUI; customer]
                                                Gio Wiederhold I3 69
Decomposing CALL statements

Progress in scale of computing:
• Copying
• Code sharing
• Parameterized computation
• Objects with overloaded method names
• Remote procedure calls to distributed modules
• Constrained (black box) access to encapsulated data

CHAIMS decomposes CALL functions into:
Set Up        Estimate      Invoke        Inspect               Extract
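A minimal sketch of what a decomposed call could look like, assuming a local thread pool stands in for a remote module. `ModuleHandle` and its method names are hypothetical and only mirror the five steps on the slide; this is not CHAIMS syntax.

```python
from concurrent.futures import ThreadPoolExecutor


class ModuleHandle:
    """Hypothetical stand-in for a remote module reached through a wrapper."""

    def __init__(self, func, executor):
        self._func = func
        self._executor = executor
        self._params = {}
        self._future = None

    def setup(self, **params):
        """Set Up: bind parameters without starting any work."""
        self._params.update(params)

    def estimate(self):
        """Estimate: a toy cost figure, assumed to grow with the input size."""
        return len(self._params.get("items", []))

    def invoke(self):
        """Invoke: start the computation asynchronously."""
        self._future = self._executor.submit(self._func, **self._params)

    def inspect(self):
        """Inspect: report progress without blocking."""
        if self._future is None:
            return "NOT_STARTED"
        return "DONE" if self._future.done() else "RUNNING"

    def extract(self):
        """Extract: fetch the result, waiting if it is not ready yet."""
        if self._future is None:
            raise RuntimeError("invoke() must be called before extract()")
        return self._future.result()


def total(items):
    return sum(items)


def run_demo():
    with ThreadPoolExecutor(max_workers=2) as pool:
        handle = ModuleHandle(total, pool)
        handle.setup(items=[1, 2, 3])
        before = handle.inspect()
        cost = handle.estimate()
        handle.invoke()
        result = handle.extract()
        after = handle.inspect()
    return before, cost, result, after
```

Splitting the call this way lets a client compare estimates from several modules before invoking any of them, and lets several invocations run at the same time.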



                                                    Gio Wiederhold I3 70
Maintenance is good for you
[Chart: relative annual maintenance cost (0 to 100%; depreciation = 1 / lifetime)
 and lifetime (1 to 13 years) for automobile, software, and hardware;
 one value is marked "?"]

                                                                    Gio Wiederhold I3 71
Growing Systems: n modules

   Federated: to deal with many servers and clients




resource reuse



                              changes are difficult:
                              they affect many clients

                                           Gio Wiederhold I3 72
Systems with   Mediators
                                  Gio Wiederhold. 1995
Applications . . . .



Mediators . . . . . .



Data Resources . . .




                              Gio Wiederhold I3 73
Growth through   Reuse
                          Gio Wiederhold. 1995
New Application



Prior & Revised
Mediators


Extended Data
Resources



                     Gio Wiederhold I3 74
Linear O(n) Cost of Growth
            now O(n^2)
• Data changes only affect some
  mediators; only in their domain
• Mediators can
  1. supply old information to n-1
  prior applications
  2. provide better information to the
  new application
  3. be partially or completely reused
• New applications, using the new
  data, can be developed and
  inserted dynamically
                                         Gio Wiederhold I3 75
A mediator is not just
         static software

Application Interface: changes of user needs

Software & People: domain changes
• Software: models, programs, rules, caches, . . .
• People: Owner / Creator, Maintainer, Lessor - Seller, Advertiser

Resource Interfaces: resource changes
                                           Gio Wiederhold I3 76
Assigning maintenance responsibility
a. Source data quality –
             supplier database, files, or web pages
b. Interface to the source –
             wrapper, supplier or vendor for supplier
c. Source selection –
             expert specialist in mediator
d. Source quality assessment –
             customer input to mediator
e. Semantic interoperation –
             specialist group providing input to the mediator
f. Consistency and metadata information –
             mediator service operation or warehouse
g. Informal, pragmatic integration –
             client services with customer input
h. User presentation formats –
             client services with customer input
(Figure labels alongside the list, top to bottom: Sources, Services, Customers)
                                                      Gio Wiederhold I3 77
Sample projects

• Tsimmis at Stanford
• E-Commerce in Digital Libraries
• INEEL: information integration for environmental
  restoration
• MIFT: feedback for training
• Civil Engineering and Architecture
• F-22
• SimQL
• Security


                                              Gio Wiederhold I3 78
Projects at Stanford DB group

Topics:
• Data Mining
• Mediator & Wrapper Generation
• Warehousing
• Security Mediators
• Megaprogramming
• Simulation Access
• Changes, Consistency, and Configurations

Projects: MIDAS, WHIPS, TSIMMIS, TIHI, C3, CHAIMS, SimQL

                                             Gio Wiederhold I3 79
The TSIMMIS Project
  Ramana Yerneni, Yannis Papakonstantinou, ...


• Objective: Support mediation technology
  –integrated access to distributed,
   autonomous, heterogeneous data sources,
   using object fusion
  –wrapper toolkit to rapidly create wrappers,
   based on source specification,
   a uniform interface to heterogeneous sources
  –mediator toolkit to rapidly construct
   mediators, based on a mediator specification,
   to integrate data from a set of wrappers


                                        Gio Wiederhold I3 80
Investors Need to Fuse Information
   from Multiple Sources

[Diagram: WWW, ticker tape, and personal database sources, connected through the network]
      • group together information about
                       the same real-world entity
      • remove redundancies
      • resolve conflicts
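The three fusion steps above can be sketched as follows, assuming each source delivers flat records that share an entity key. The field names (`ticker`, `as_of`, `source`) and the rule that the most recent value wins a conflict are assumptions for illustration, not the TSIMMIS fusion rules.

```python
from collections import defaultdict


def fuse(records, key="ticker"):
    """Group records per real-world entity, drop duplicates, resolve conflicts."""
    groups = defaultdict(list)
    seen = set()
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            continue  # remove redundancies: exact duplicate record
        seen.add(fingerprint)
        groups[rec[key]].append(rec)  # group by the same real-world entity

    fused = {}
    for entity, recs in groups.items():
        merged = {"sources": sorted({r["source"] for r in recs})}
        # Resolve conflicts: apply records oldest first so newer values overwrite.
        for rec in sorted(recs, key=lambda r: r["as_of"]):
            for field, value in rec.items():
                if field != "source" and value is not None:
                    merged[field] = value
        fused[entity] = merged
    return fused


# Hypothetical records from three kinds of sources.
RECORDS = [
    {"ticker": "XYZ", "price": 10.0, "as_of": 1, "source": "www"},
    {"ticker": "XYZ", "price": 10.5, "as_of": 2, "source": "ticker tape"},
    {"ticker": "XYZ", "price": 10.5, "as_of": 2, "source": "ticker tape"},
    {"ticker": "XYZ", "shares": 100, "as_of": 1, "source": "personal database"},
]
```

The fused object keeps the list of contributing sources, so a client can still trace each entity back to where its data came from.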
                                       Gio Wiederhold I3 81
An Integration Architecture

[Diagram, top to bottom:
 Client Application
   receives portfolios for each company from the
 Mediator
   which receives stock market prices from a Wrapper over the Ticker Tape
   and business reports from a Wrapper over Dialog]
                                             Gio Wiederhold I3 82
Additional Challenge: Sources Without a
         Well-Structured Schema


• semistructured
  – irregular
  – deeply nested
• incomplete schema knowledge
  – autonomous
  – dynamic

Examples
• World Wide Web
• SGML documents
• genome, chemical structures
• bibliographic information
• files
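A small sketch of querying irregular, deeply nested data without a fixed schema, assuming records are represented as nested dictionaries and lists. The function `find` and the sample bibliographic records are hypothetical.

```python
def find(obj, label):
    """Yield every value stored under `label`, at any nesting depth."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == label:
                yield value
            yield from find(value, label)
    elif isinstance(obj, list):
        for item in obj:
            yield from find(item, label)


# Two bibliographic records with different, irregular structure.
DOCS = [
    {"title": "A", "author": "X"},
    {"title": "B", "authors": [{"name": "Y"}], "venue": {"title": "Conf"}},
]
```

Because the query names a label rather than a path, it still works when a source adds, omits, or nests fields differently from one record to the next.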

                                   Gio Wiederhold I3 83
Wrappers & Mediators from
      High-Level Specifications
[Diagram:
 Client
 Mediator, generated by the Mediator Specification Interpreter
   from a Declarative Mediator Specification
 Wrappers, generated by the Wrapper Specification Interpreter
   from Declarative Source Specifications
 Sources]


                                        Gio Wiederhold I3 84
E-money
Services must be paid for
• Incentive for creation and improvement
• price proportional to value added, often small
• profit = f (cost, market, price, overhead)
• price low per item, so overhead must be low
Simple payment (no credit accounts, checks)




Enabled through secure signatures
                                        yes

                                                   Gio Wiederhold I3 85
E-Commerce in the Digital
                  Library
                      Steven Ketchpel & DL Economics Group


Payment: CyberCash, DigiCash, First Virtual, SET
Delivery: Cryptolope, DigiBox, HTTP, E-mail
Major Integration Problem




 Shopping Models: Pay-per-view, Subscription,
 Session, Shareware, Auctions, Site License,
 Gift Certificate, Layaway, Pre-paid vouchers, … .

                                                 Gio Wiederhold I3 86
Shopping model: merchant-independent
        logic controlling flow of business model

Example shopping models:
• Order, Pay, (Deliver 52 times)
• (1 month; Order, Deliver) Pay

[Diagram: Customer and Merchant, each with event handlers, interact through the
 shopping model, which holds state information and its own event handlers.
 Payment/Delivery/Other Services attach through further event handlers.
 Step labels, numbered 1 to 4: Order, Bill, Start Transfer $, Payment Complete]

Abstract API allows application to interact with many
different services in a consistent way

Proxy event handlers translate from native applications
to shopping model defined protocols
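A shopping model of this kind can be sketched as a small merchant-independent state machine, assuming a model is just an expected sequence of events. The class `ShoppingModel` and the helper `subscription` are hypothetical names, not part of the project described here.

```python
class ShoppingModel:
    """Merchant-independent logic: accepts events only in the model's order."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)
        self.position = 0
        self.state = {}  # state information recorded per event type

    @property
    def done(self):
        return self.position == len(self.steps)

    def handle(self, event, **info):
        """Event handler: record the event if it is the next expected step."""
        if self.done:
            raise ValueError("transaction already complete")
        expected = self.steps[self.position]
        if event != expected:
            raise ValueError(f"expected {expected!r}, got {event!r}")
        self.state.setdefault(event, []).append(info)
        self.position += 1
        return self.done


def subscription(deliveries=52):
    """Order, Pay, (Deliver n times), as in the first example model."""
    return ShoppingModel("subscription", ["order", "pay"] + ["deliver"] * deliveries)
```

A different business model, such as delivering for a month and paying afterwards, would only change the list of steps; the customer and merchant handlers stay the same.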
                                                                                                        Gio Wiederhold I3 87
TSIMMIS Status

• Mediator Specification Interpreter running on
  Ultrix, AIX, OSF.
• 9000 lines of C/C++ code
• 4000 C++ lines of Server/Client Support Libraries
• Integration of three disparate bibliographic
  sources
   – legacy system
   – flat BibTeX files
   – relational DB
   – wwWeb files

                                           Gio Wiederhold I3 88
Mediator Specification Interpreter
           Architecture
[Diagram, top to bottom:
 Query in, Result out
 Query Rewriter, driven by the Mediator Specification
   produces a logical datamerge program
 Cost-Based Optimizer
   produces a plan
 Datamerge Engine
   sends queries to wrappers and receives results]
                                        Gio Wiederhold I3 89
Environmental Restoration at INEL
           Undoing 50 years of messes

Many projects, many sources

[Diagram: a mediator, other mediators, and wrappers (one via CORBA) connected by
 links labeled OEM and QEM; query languages MSL [Stanford], MQL [ISX], OQL [ODMG];
 sources ERIS and IEDMS; participants Lockheed Martin,
 Idaho National Engineering Laboratory, ISX - Stanford Univ.]
                                                                              Gio Wiederhold I3 90
CHAIMS - software composition
[Diagram, top to bottom:
 Domain expert at a client workstation (C), with IO modules on either side
 Computation Services: MEGA modules a, b, c, d, e
 Sites: R, S, T, U
 Data Resources]
                                                   Gio Wiederhold I3 91
Mediation to Implement Feedback in
                       Training
       David Maluf, Priya Panchapagesan, Ted Linden


Another task of mediators, prior to integration



                       MIFT                 Abstraction




Abstraction to match levels of granularity

                                               Gio Wiederhold I3 92
Mediation Feedback:
                      Playback or Graph

• User Interface (UI in Java):
    Trainees, Commanders, Observers, Training Developers, Analysts
• Application Layer (Standards in KQML):
    Objectives
• Mediation Layers (Mediators with rules in CLIPS):
    Tasks; Stanford, I.D.A
• Wrapped Simulation Resources (Wrappers in C/C++):
    Janus, SimNet
                                                                Gio Wiederhold I3 93
MIFT Result


Analyses:
• Force ratio
• Losses
• Area gain

Exercise

Simulator
       Type




                Gio Wiederhold I3 94
Control Valve Sizing,
                        Future
               From Andrew Arnold: Civ. Eng. Qualification Exam




• Interpretation
  – Programmatic
• Analysis
  – Integrated
• Evaluation
  – Integrated
• Transformation
  – Automated

                                                     Gio Wiederhold I3 95
F-22 IWSDB Phase 6
• User Interfaces: Provisioner (PRIDE application), Engineer (IWSDB client GUI)
• Integration Services: Change Notification, Query Reformulation,
    Matchmaker, Domain Model, Domain Matching
• Wrappers: Sybase, WAIS server, SQL
• Databases: PD DS, Index, Suppliers


                                                   Gio Wiederhold I3 96
Current state of DM Support
 past                        now                                        future
                                                            time

Past to now: organized support
• Data integration
• Databases: distributed, heterogeneous

Now to future: disjointed support
• Intuition +
• Spreadsheets
• Planning of allocations
• Other simulations
• various point assessments

                                                                   Gio Wiederhold I3 97
Information Systems should also
            Project into the Future




past             now          future
                                          time

• Past: Databases, accessed via SQL or CORBA compliant wrappers
• Now: Msg systems, sensors
• Future: Simulations, accessed via SimQL and compliant wrappers
                                                 Gio Wiederhold I3 98
SimQL: Simulation Access Service
 Information Systems should also deal
                     with the Future




past          SQL             now               SimQL                future
                                                          time

Decision-making requires dealing with the future, as well as the past
• Databases deal well with the past
• Sensors can provide current status
• Spreadsheets, simulations deal with the likely futures
Information systems should be able to combine all three



                                                         Gio Wiederhold I3 99
Stanford experiment, supported by DARPA & NIST
            Phase 1 Architectures
[Diagram:
 Applications: Logistics Application, Manufacturing Application
 Access paths: three SimQL access links and one SQL access link, each through a wrapper
 Sources: Spreadsheets; Weather (short-, long-term); Test Data; Engineering]
                                                    Gio Wiederhold I3 100
Enabling Interoperation

Databases
• serve clients via SQL by
   Sharing a Model (The Schema)
   A query language over the model
the SQL interface enables
• independence of
   application development
   DBMS technology development
   reuse of infrastructure
Today
• most new systems use a DBMS for data storage
   even with less performance,
   inability to handle all problems,
   but enough of them well enough.

Simulations should
• serve clients via SimQL by
   Sharing a Model (research question)
   A query language over the model
a SimQL interface will enable
• independence of
   application development
   simulation technology development
   reuse of infrastructure
Objective
• build information systems combining DBMS, Simulations
   even with less performance,
   inability to handle all problems,
   but enough of them . . .
                                                            Gio Wiederhold I3 101
Internet requirements

• Ubiquitous access to simulations
               of a wide variety of types
• Rapid response to parameter changes
   – often High-Performance computation is
     needed
   – distributed simulations with synchronization
• Rapid Service Composition
   – High bandwidth among simulations
   – Access to multiple services in parallel

                                            Gio Wiederhold I3 102
Even the present needs SimQL
[Timeline: past, now, future. Last recorded observations lie in the past;
 simple simulations extrapolate the data to the point-in-time
 needed for situational assessment]

Not all data are current:
• Is the delivery truck in X?
• Is the right stuff on the truck?
• Will the crew be at X?
• Will the forces be ready to accept delivery?
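One example of a "simple simulation" that brings stale observations up to the present is linear extrapolation from the last two recorded observations. The sketch below assumes observations arrive as (time, value) pairs; the function name `extrapolate` and the sample numbers are hypothetical.

```python
def extrapolate(observations, now):
    """Estimate the value at time `now` from the last two (time, value) pairs."""
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    (t0, v0), (t1, v1) = sorted(observations)[-2:]
    if t1 == t0:
        raise ValueError("observations must have distinct times")
    rate = (v1 - v0) / (t1 - t0)  # change per unit of time
    return v1 + rate * (now - t1)


# Hypothetical truck positions: (hours since departure, km travelled).
POSITIONS = [(0, 0.0), (2, 100.0)]
```

With the truck last reported at 100 km after 2 hours, `extrapolate(POSITIONS, 3)` places it at 150 km, which is the kind of estimate needed to answer whether the truck is at X now.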


                                                                            Gio Wiederhold I3 103
Use of Simulation Results




Simulation results can be composed for
    alternative courses of action
Composition should be seamless, elegant,
 with computation and recomputation of
 likelihoods
Results change as now moves forwards and
 eliminates earlier alternatives.

                                  Gio Wiederhold I3 104
Types of simulation services
1. Continuously executing: weather prediction
    – SimQL result reports best match samples
2. Execution specific to query: what-if assessment
    – may require HPC power for adequate response
3. Past simulations collect results in a base: materials
    – performs inter- or extra-polations to match query parameters
4. Combinations, e.g., 2. + 3.: top layer simulation using stored
   partial lower level results: weapon performance in new setting
5. Human-in-the-loop (mediated by an agent program): SAFs
Note
• A simulation service program can be written in any language
• A simulation service must be compliant with the interface spec.
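A sketch of service type 3, assuming a one-parameter interface. The abstract class `SimulationService` is a hypothetical stand-in for the interface specification, which the slides do not spell out, and `StoredResultsService` answers queries by linear interpolation or extrapolation over stored results.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left


class SimulationService(ABC):
    """Hypothetical interface: any implementation language could sit behind it."""

    @abstractmethod
    def query(self, parameter: float) -> float:
        """Return the simulated result for one parameter value."""


class StoredResultsService(SimulationService):
    """Past simulation runs kept in a base; queries are matched by interpolation."""

    def __init__(self, results):
        if len(results) < 2:
            raise ValueError("need at least two stored results")
        self._points = sorted(results.items())
        self._xs = [x for x, _ in self._points]

    def query(self, parameter):
        i = bisect_left(self._xs, parameter)
        if i < len(self._xs) and self._xs[i] == parameter:
            return self._points[i][1]  # exact match with a stored run
        # Pick the bracketing segment; outside the stored range, reuse the
        # nearest end segment, which extrapolates linearly.
        i = min(max(i, 1), len(self._xs) - 1)
        (x0, y0), (x1, y1) = self._points[i - 1], self._points[i]
        return y0 + (y1 - y0) * (parameter - x0) / (x1 - x0)
```

A client written against `SimulationService` would not need to know whether an answer came from a stored base, a running simulation, or a fresh what-if execution.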

                                                    Gio Wiederhold I3 105
Tools for Managing Partitioning

Separate internals and interfaces, at many levels
• Object Libraries
• Product Design hierarchical standards (PDES)
• Domain-Specific Systems Analysis (DSSA)
• Ontology documentation (Ontolingua)
• Remote Object Access (CORBA 1.2, 2.0)
• Knowledge Interchange Formalism (KIF)
• Transport in / of heterogeneous situations
      (KQML specifies content repr., ontology)


                                          Gio Wiederhold I3 106
Moving to a Service Paradigm

• Server is an independent contractor, defines service
• Client selects service, and specifies parameters
• Server’s success depends on value provided
• Some form of payment received for services

         x,y




 Databases are a current example.
 Simulations have the same potential.
                                            Gio Wiederhold I3 107
New Role for Consultants

          Old
          • Used at Design Time
            and
          • To Explain Failures

          Future
          • Available as a Service
          • Responsible for
            Knowledge Maintenance

                            Gio Wiederhold I3 108
Long Range Science Vision

• Databases: access, storage, algebras
• Systems Engineering: analysis, documentation, costing
• Artificial Intelligence: knowledge mgmt, domain expertise, uncertainty
• GIS: Spatial is special.

Integration Methods

Integration Science
                                              Gio Wiederhold I3 109
Summary
• Mediation bridges Applications and Sources
• Mediator technology transforms data to information
  by applying an expert maintainer’s knowledge
• Abstraction reduces data further for decision making
• Must be integrated with sensors, simulation results
• Mediation permits incremental system growth (n log n)
• Mediators provide a service-model on the networks
New research
   Recognition and resolution of semantic differences
   Simulation access as a new service
more on http://www-db.stanford.edu/people/gio.html
                                           Gio Wiederhold I3 110

Data Tactics Open Source BriefData Tactics Open Source Brief
Data Tactics Open Source BriefDataTactics
 
(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...
(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...
(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...BIOVIA
 
Application development for the internet of things
Application development for the internet of thingsApplication development for the internet of things
Application development for the internet of thingsPankesh Patel
 
Intel Social Computing & Sustainability Issues
Intel Social Computing & Sustainability IssuesIntel Social Computing & Sustainability Issues
Intel Social Computing & Sustainability IssuesUmair Mohsin
 
Cisco data analytics in ioe_rajiv niles_2015 nov
Cisco data analytics in ioe_rajiv niles_2015 novCisco data analytics in ioe_rajiv niles_2015 nov
Cisco data analytics in ioe_rajiv niles_2015 novCiscoKorea
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntelAPAC
 
Information governance in the Facebook Era
Information governance in the Facebook EraInformation governance in the Facebook Era
Information governance in the Facebook EraJohn Mancini
 
End user computing feri sulianta
End user computing   feri suliantaEnd user computing   feri sulianta
End user computing feri suliantaferisulianta.com
 
Agile BI : meeting the best of both worlds from departmental and enterprise BI
Agile BI : meeting the best of both worlds from departmental and enterprise BIAgile BI : meeting the best of both worlds from departmental and enterprise BI
Agile BI : meeting the best of both worlds from departmental and enterprise BIJean-Michel Franco
 
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldKim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldBigDataViz
 

Similar to I3master (20)

Cloud Computing overview and case study
Cloud Computing overview and case studyCloud Computing overview and case study
Cloud Computing overview and case study
 
Where finance and it meet
Where finance and it meetWhere finance and it meet
Where finance and it meet
 
Scenari evolutivi nello snellimento dei sistemi informativi
Scenari evolutivi nello snellimento dei sistemi informativiScenari evolutivi nello snellimento dei sistemi informativi
Scenari evolutivi nello snellimento dei sistemi informativi
 
Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013
 
Curated Computing
Curated Computing Curated Computing
Curated Computing
 
Hadoop World 2011: Security Considerations for Hadoop Deployments - Jeremy Gl...
Hadoop World 2011: Security Considerations for Hadoop Deployments - Jeremy Gl...Hadoop World 2011: Security Considerations for Hadoop Deployments - Jeremy Gl...
Hadoop World 2011: Security Considerations for Hadoop Deployments - Jeremy Gl...
 
Data Tactics Open Source Brief
Data Tactics Open Source BriefData Tactics Open Source Brief
Data Tactics Open Source Brief
 
(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...
(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...
(ATS4-GS03) Partner Session - Intel Balanced Cloud Solutions for the Healthca...
 
Application development for the internet of things
Application development for the internet of thingsApplication development for the internet of things
Application development for the internet of things
 
Intel Social Computing & Sustainability Issues
Intel Social Computing & Sustainability IssuesIntel Social Computing & Sustainability Issues
Intel Social Computing & Sustainability Issues
 
XEN App
XEN AppXEN App
XEN App
 
Cisco data analytics in ioe_rajiv niles_2015 nov
Cisco data analytics in ioe_rajiv niles_2015 novCisco data analytics in ioe_rajiv niles_2015 nov
Cisco data analytics in ioe_rajiv niles_2015 nov
 
Big Data a big deal?
Big Data a big deal?Big Data a big deal?
Big Data a big deal?
 
International IT Deployment
International IT DeploymentInternational IT Deployment
International IT Deployment
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick Knupffer
 
Smarter Computing Big Data
Smarter Computing Big DataSmarter Computing Big Data
Smarter Computing Big Data
 
Information governance in the Facebook Era
Information governance in the Facebook EraInformation governance in the Facebook Era
Information governance in the Facebook Era
 
End user computing feri sulianta
End user computing   feri suliantaEnd user computing   feri sulianta
End user computing feri sulianta
 
Agile BI : meeting the best of both worlds from departmental and enterprise BI
Agile BI : meeting the best of both worlds from departmental and enterprise BIAgile BI : meeting the best of both worlds from departmental and enterprise BI
Agile BI : meeting the best of both worlds from departmental and enterprise BI
 
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldKim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our World
 

More from Gio Wiederhold (17)

Software economics+ssitc13 tutorial
Software economics+ssitc13 tutorialSoftware economics+ssitc13 tutorial
Software economics+ssitc13 tutorial
 
Software economics+ssitc13 tutorial
Software economics+ssitc13 tutorialSoftware economics+ssitc13 tutorial
Software economics+ssitc13 tutorial
 
Cs207 6
Cs207 6Cs207 6
Cs207 6
 
Cs207 7
Cs207 7Cs207 7
Cs207 7
 
Cs207 8
Cs207 8Cs207 8
Cs207 8
 
Cs207 3
Cs207 3Cs207 3
Cs207 3
 
Cs207 1
Cs207 1Cs207 1
Cs207 1
 
Quantifying the future
Quantifying the futureQuantifying the future
Quantifying the future
 
Quantifying thefuture
Quantifying thefutureQuantifying thefuture
Quantifying thefuture
 
Cs207 8
Cs207 8Cs207 8
Cs207 8
 
Cs207 7
Cs207 7Cs207 7
Cs207 7
 
Cs207 6
Cs207 6Cs207 6
Cs207 6
 
Cs207 5
Cs207 5Cs207 5
Cs207 5
 
Cs207 4
Cs207 4Cs207 4
Cs207 4
 
Cs207 3
Cs207 3Cs207 3
Cs207 3
 
Cs207 2
Cs207 2Cs207 2
Cs207 2
 
Cs207 1
Cs207 1Cs207 1
Cs207 1
 

I3master

  • 1. I3 Master Integration of Information from Heterogeneous Sources October 2001 Gio Wiederhold Stanford University Gio Wiederhold I3 1
  • 2. Change is constant Changes are imposed by • Technology advance • Local government • Federal rules • Competition • Emerging standards Systems must be designed and operated to recognize and adapt to change Gio Wiederhold I3 2
  • 3. Information Leverage Tactical: • Customers • Inventory • Suppliers (a variety of internal sources) Strategic: • Planning • Capabilities • Opportunities (external and imprecise sources) Gio Wiederhold I3 3
  • 4. Information overload Data starvation • More databases – public & corporate • Faster communication – digital – packeting: TCP-IP, ATM • World-wide connectivity – internet – world-wide web • Disintermediation – ubiquitous publishing Gio Wiederhold I3 4
  • 5. Focus on Information Systems Computing Systems: Processing (as Analyses, Payroll, . . .); Information Systems (on-line and distributed, . . .); Real-time control of processes, factories, . . . Gio Wiederhold I3 5
  • 6. Data and Knowledge Information is created at the confluence of data -- the state & knowledge -- the ability to select and project the state into the future [Figure labels: Knowledge Loop, Data Loop, Storage, Education, Selection, Recording, Integration, Experience, Abstraction, State changes, Decision-making, Action] Gio Wiederhold I3 6
  • 7. Knowledge Manifestations • Procedural • Creators • system analysts • faster • programmers }-{ • Maintainers • Declarative • easier • domain analysts • knowledge engineers • rule writers Gio Wiederhold I3 7
  • 8. Transform Data to Information Application Layer: decision-makers at workstations Mediation Layer: value-added services Foundation Layer: data and simulation resources Gio Wiederhold I3 8
  • 9. Dealing With Heterogeneity • Hardware platform: hidden by operating system • Operating system: choices are reducing (NT, UNIX, ...), fewer choices • Programming language: irrelevant in remote access • Database system model: relational and E-R common • Database system: standards, convergence • Coverage: source dependent – Attributes: documented, additive – Scope: undocumented, intersecting • Data representation: conversion problems, nulls • Data semantics: requires knowledge Gio Wiederhold I3 9
  • 10. Definition* A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. It should be small and simple, so that it can be maintained by one expert or, at most, a small and coherent group of experts. * Wiederhold: IEEE Computer March 1992 Gio Wiederhold I3 10
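A minimal sketch of the definition above, not taken from the slides: a mediator pictured as a small module that holds encoded domain knowledge and uses it to turn source data into information for an application. The class name `Mediator`, the `converters` table, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Mediator:
    """Small module: encoded knowledge applied to a subset of the data."""

    # Hypothetical encoded knowledge: how to convert each source to a common unit.
    converters: Dict[str, Callable[[float], float]]

    def summarize(self, sources: Dict[str, Iterable[float]]) -> Dict[str, float]:
        """Convert each source for consistency, then abstract into a summary."""
        values: List[float] = []
        for name, data in sources.items():
            convert = self.converters[name]
            values.extend(convert(v) for v in data)
        return {"count": len(values), "mean": sum(values) / len(values)}


# Illustrative use: two sources report lengths in different units.
mediator = Mediator(
    converters={"metric": lambda cm: cm, "imperial": lambda inch: inch * 2.54}
)
info = mediator.summarize({"metric": [10.0, 20.0], "imperial": [10.0]})
```

The application only sees `info`; the unit knowledge stays inside the mediator, where one expert can maintain it.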
  • 11. Flow in mediation • DELIVERY • SUMMARIZATION • INTEGRATION • ABSTRACTION • ACCESS Gio Wiederhold I3 11
  • 12. Functions inside Mediation Summarize Articulation Transform Selection Heterogeneous resources Gio Wiederhold I3 12
  • 13. Example in Health Care Health Care Planner Will the Clinic lose Money? Patient Investment Care domain domain Age Profile Service Operations Bond Sales Patient Volume Growth Loan Interest State Support Gio Wiederhold. 1995 Gio Wiederhold I3 13
  • 14. Functional Layer Human-computer Interaction / User interface / Application-specific code / Service interface / Domain-specific code (MEDIATION) / Resource access interface / Source-specific code / Real-world interface Gio Wiederhold I3 14
  • 15. Function of Mediation Apply Domain-specific Specialist Knowledge to add value • to locate data sources • to describe data for use • to convert for consistency • to abstract for insight / models • to extrapolate to new situations • to integrate from diverse sources • to re-abstract for presentation → INFORMATION Gio Wiederhold I3 15
  • 16. Architectures & Communication Presen- Printed terminl Mini- Work User tation reports comptr station Workst. Appli- Infor- Infor- Appli- Appli- Infor- cation mation mation cation cation mation Aggre- Compu- Compu- Compu- CORBA Aggre- gation tatio tation tation gation Access, I-O SQL for Select Object SQL, ... Select code A&S FTP Struct. for A&S Data Local Data File Server Distr. Source Storage Base Storage Storage Sources Function ‘mainframe’ smart file server client server mediated terminal Gio Wiederhold. 1995 Gio Wiederhold I3 16
  • 17. Current Methods • Access: WWW with MOSAIC – browsing, collection services: Harvest, ALIWEB, Fish • SQL with Views – one verb, one database, one datatype – predefined subsets • Grouping: Objects with Corba – predefined aggregation with methods • View-Objects – created via extension of relational algebra • Summarization – Tables from text documents; Exception search Gio Wiederhold I3 17
  • 18. Central Solutions do not Scale What works with 7 modules and one person in charge fails when we have 100 and need a committee Any change in resources affects the central module Gio Wiederhold I3 18
  • 19. Evolution of mediation applications A2 A3 A4 A5 A1 A6 integrators a. I1 I2 mediators network b. c. M1 M2 d. e. D1 wrappers W2 W3 D6 W1 D4 D5 D2 D3 datasources Gio Wiederhold I3 19
  • 20. Domain-specific Mediation • User application – Workstations • Mediator – Expert-owned nodes • Data sources – Remote primary and byproduct services Gio Wiederhold I3 20
  • 21. Mediation for Quality User Model BEST= S= source reliability f(S,C,T) low cost C= confidence rapid response Assessments: reliable delivery T= S1=.8 S2=.9 S3=8 trustworthiness Estimates: C1= 5+_1 C2= 8+_1 C3= 10+_1 T1=100+_160 T2=70+_30 T3=50+_80 S1 S2 S3 Gio Wiederhold I3 21
  • 22. Allocation Flexibility User Interfaces Application C Application B Application I M2 Provider Provider of of medi- Mediator M M ator N Copy- if high HPC intensity of N interaction with M1 1. Application (M2) 2. Resources (N1,2) N 3. Processing (M1) 1 N Mediators are DB 2 DB only code DBS R P Q Databases Gio Wiederhold I3 22
  • 23. Features of Mediation • Domain-specific partitioning for Creation and Maintenance • Network-basing for easy Reconfiguration • Caching to deal with Asynchronicity • Replication for Performance Gio Wiederhold I3 23
  • 24. Allocation Flexibility User Interfaces Application C Application B Application I M2 Provider Provider of of medi- Mediator M M ator N Copy- if high HPC intensity of N interaction with M1 1. Application (M2) 2. Resources (N1,2) N 3. Processing (M1) 1 N Mediators are DB 2 DB only code DBS R P Q Databases Gio Wiederhold I3 24
  • 25. Central Solutions do not Scale What works with 7 modules and one person in charge fails when we have 100 and need a committee Changes in resources affect the intermediary modules Gio Wiederhold I3 25
  • 26. Integration at two levels Application • Informal, pragmatic • User-control Mediation • Formal service • Domain-Expert control Gio Wiederhold. 1995 Gio Wiederhold I3 26
  • 27. Status of Mediation Technology Today: • Handcrafted • Expert consults with programmer • Programmer codes the knowledge needed • Resource changes require advice, program update Future: • Generated from models • Domain Expert maintains models • Specification determines functions • Resource changes trigger regeneration Gio Wiederhold I3 27
  • 28. Facilitators: Another Module Type in Information Systems Facilitators Procure Linkages • search for suitable resources • resolve terminological mappings • build system configurations • issue subqueries, as needed • combine results from subqueries perform these tasks dynamically without human intervention depend greatly on ontologies • can call on mediators for value added services Gio Wiederhold I3 28
  • 29. Facilitators and Mediators accessible ontology designed dynamic Gio Wiederhold I3 29
  • 30. Available Technology/Science User Models Domain Ontologies Geographic Models Agents Deductive Databases Spatial abstractions Object Bases Temporal Algebras Uncertainty algebras Constraint Management Circumscription Security Filters Active Databases Human Lang. Proc. Case-based Reasoning DB Views Wrappers Distributed Storage Systems Caching Database Models Knobots Simulation Access High Perf.Comm. Gio Wiederhold I3 30
  • 31. Status of Mediation Technology Today: • Handcrafted • Expert consults with programmer • Programmer codes the knowledge needed • Resource changes require advice, program update Future: • Generated from models • Domain Expert maintains models • Specification determines functions • Resource changes trigger regeneration Gio Wiederhold I3 31
  • 32. Coverage of Current I3 Efforts Good progress / active research / related work / poor coverage ] ) | ( Discovery Abstraction :-) :-[ :-( (web,schema for relevance searching) Maintenance to customer (rule technology?) Caching / :-| Facilitation History :-[ (auto linking) :-( Mediators :-| :-( for multiple domains Integration Security over sources for cooperation :-) :-( :-[ Wrapping (syntactical heterogeneity) Databases / Web / Text / Simulation :-( :-) :-) :-[ Gio Wiederhold I3 32
  • 33. Building Stovepipes Gio Wiederhold. 1995 Mismatched assumptions Similar functions, different assignments to modules Scaffolding Gio Wiederhold I3 33
  • 34. Middleware CORBA (Common Object Request Broker Architecture) Many standards by many vendor groups – IBM SOM, DSOM • DOE (Distributed Objects Everywhere) – SunSoft • DOME • EZ-bridge – System Strategies Inc. [brace label: shared specification] • ILU (InterLanguage Unification) – Xerox • ISIS • KQML (Knowledge Query & Manipulation Language) • MQM (Message Queuing Middleware) – IBM (for mainframe connections) • OLE (Object Linking and Embedding) • OpenDoc (Apple) • PDES (Product Data Interchange using STEP) • TIB (Teknekron Information Bus) Gio Wiederhold I3 34
  • 35. New Tools From the ARPA-Sponsored Knowledge Sharing Effort • KQML: Knowledge Query & Manipulation Language More Verbs: Performatives Multi-source, Multi-mediator, Multi-content • KIF: Knowledge Interchange Format Exchange complex data, rules, . . . among Expert Systems and Subsystems • LOOM: Classification-based Expert System • Ontolingua: Repository for Domain Terminologies Gio Wiederhold I3 35
  • 36. KQML KNOWLEDGE QUERY & MANIPULATION LANGUAGE = Ontology = Representation } • Get, Hq97 • Put, • Infer, • Subscribe, • Advertise, speak KIF, objects, • ... tuples, equations Gio Wiederhold I3 36
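To make the performative idea concrete, here is a small sketch that formats a KQML-style message as an s-expression. The parameter keywords (`:sender`, `:receiver`, `:language`, `:ontology`, `:content`) follow the published KQML drafts; the helper `kqml_message`, the agent names, the ontology name, and the query content are hypothetical illustrations, not taken from the slides.

```python
def kqml_message(performative: str, **params: str) -> str:
    """Format a KQML-style s-expression: (performative :key value ...)."""
    fields = " ".join(
        f":{key.replace('_', '-')} {value}" for key, value in params.items()
    )
    return f"({performative} {fields})"


# Hypothetical query: one agent asks a mediator for an OPEC member.
msg = kqml_message(
    "ask-one",
    sender="planner",              # hypothetical agent name
    receiver="opec-mediator",      # hypothetical agent name
    language="KIF",                # content is expressed in KIF
    ontology="opec-membership",    # hypothetical ontology name
    content="(member ?nation OPEC)",
)
```

The performative carries the intent of the message, while the `:language` and `:ontology` fields tell the receiver how to interpret the `:content` expression.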
  • 37. KQML APIs Several suppliers Multiple platforms Fat and thin versions Mainly to Internet (TCP/IP) Not (yet) shrinkwrapped, require interaction – Univ. of Maryland, Baltimore County, with UNISYS – Stanford Design Projects ABSE [Genesereth et al.] – Crystalliz (Cambridge MA), transmits PDES, SQL on PC – BBN for planning, rapid assembly of joint task forces – ISX (Westlake Village, CA) Demonstration tools – Toronto Univ. Enterprise Integration Laboratory – EITech Servicemail (uses email to go across firewalls) Gio Wiederhold I3 37
  • 38. KIF -- Knowledge Interchange Transmits among Expert Systems • LOOM • Ontolingua • others ANSI X3T2 evaluation Compatible with Conceptual Graphs Used by KQML to describe choices Gio Wiederhold I3 38
  • 39. Two Design Phases 1. Resource Integration 2. Customer Focusing Common Model Gio Wiederhold I3 39
  • 40. Mediator Design Principle Transform Data into Information Match Customer Model Hierarchical to Resource Model General network (and maintain models) Gio Wiederhold I3 40
  • 41. Fat versus thin mediators Service scope: • Too thin: insufficient added value • Too fat: hard to compose Domain scope: • Too narrow: few customers • Too broad: hard to maintain, needs a committee • Just right Gio Wiederhold I3 41
  • 42. Heterogeneity among Domains If interoperation involves distinct domains mismatch ensues • Autonomy conflicts with consistency, – Local Needs have Priority, – Outside uses are a Byproduct Heterogeneity must be addressed • Platform and Operating Systems ✓✓ • Representation and Access Conventions ✓ • Naming and Ontology Gio Wiederhold I3 42
  • 43. Unsolved problem in Interoperation Common assumption in assembling and integrating distributed information resources • The language used by the resources is the same • Sublanguages used by the resources are subsets of a globally consistent language This assumption is provably false. Working towards the goal of global consistency is 1. naïve -- the goal cannot be achieved 2. inefficient -- languages are efficient in local contexts Gio Wiederhold I3 43
  • 44. Ontology: components We represent the contents and structure of a language by its ontology: • a set of well-defined terms, which delimit the domain of discourse • relationships among those terms, chosen from a limited set a formalizable subset of expert knowledge Gio Wiederhold I3 44
  • 45. SKC’s grounded definition . • Ontology: a set of terms and their relationships • Term: a reference to real-world and abstract objects • Relationship: a named and typed set of links between objects • Reference: a label that names objects • Real-world object: an entity instance with a physical manifestation • Abstract object: a concept which refers to other objects Gio Wiederhold I3 45
  • 46. Where are Ontologies found? Ontologies allow communication among partners in enterprises (rarely in machine-readable form) Relationships determine meaning - parent, school, company Variable and Class names in Software Databases use ontologies during design in their E-R diagrams (implicitly) and to represent the leaf nodes in their schemas. Knowledge-bases use term ontologies (often explicitly), add class definition (to hold instances), constraints, and operations among the terms. Gio Wiederhold I3 46
  • 47. Establishing Ontologies Top-down: –Commonly acceptable UPPER layers Domain-specific –Analysis and Sharing tools –Model and Object-type based Bottom-up –Wordlist creation from task-specific collections –Database models, schemas, and contents Gio Wiederhold I3 47
  • 48. Large Ontologies: good or bad? • Have all the Knowledge together + simple for customers of KBs – hard for owners of KBs, must synchronize with many others – in the limit -- everybody must be globally consistent • Large KB will cover multiple / all domains – created by a committee -- slow – maintained by a committee -- costly • Differences in level of abstraction -- efficiency – homeowner: nail – carpenter: sinker, brad, boxnail, . . . Gio Wiederhold I3 48
  • 49. Domain ontology assumption • a domain will contain known objects • the object configuration is consistent • within a domain all terms are consistent & • relationships among objects are consistent No committee is needed to forge compromises* within a domain • context is implicit in use • explicit context is needed for external use * Compromises hide valuable details [Figure labels: Ontology, Domain] Gio Wiederhold I3 49
  • 50. SKC Objective Provide for Maintainable Ontologies • devolve maintenance onto many domain-specific experts / authorities SKC • provide an algebra to compute composed ontologies that are limited to their articulation terms • enable interpretation within the source contexts Gio Wiederhold I3 50
  • 51. Conservative assumption ! When dealing with multiple ontologies one can never be sure that identically or similarly spelled words mean the same thing, i.e., refer to exactly the same set of real-world objects under all current and future conditions • Common, optimistic assumption: Meaning is identical – Gets worse when terms are stemmed • SKC, conservative or pessimistic assumption: Meaning never matches, unless there is a match rule – number of matching rules is reduced by focusing on the articulation Gio Wiederhold I3 51
  • 52. An Ontology Algebra A knowledge-based algebra for ontologies Intersection create a subset ontology keep sharable entries Union create a joint ontology merge entries Difference create a distinct ontology remove shared entries The Articulation Ontology (AO) consists of matching rules that link domain ontologies Gio Wiederhold I3 52
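The three operations can be sketched over plain sets of terms. This is an illustrative simplification, not the SKC implementation: ontologies are reduced to sets of strings, the articulation ontology to a dictionary of matching rules, and the function names and sample terms are assumptions. Following the conservative assumption, identically spelled terms do not match unless a rule says so.

```python
from typing import Dict, Set, Tuple

Ontology = Set[str]
Rules = Dict[str, str]  # matching rule: term in first ontology -> term in second


def intersection(o1: Ontology, o2: Ontology, rules: Rules) -> Set[Tuple[str, str]]:
    """Subset ontology: keep only term pairs linked by an explicit rule."""
    return {(a, b) for a, b in rules.items() if a in o1 and b in o2}


def difference(o1: Ontology, o2: Ontology, rules: Rules) -> Ontology:
    """Distinct ontology: terms of o1 that no rule links to o2."""
    shared = {a for a, _ in intersection(o1, o2, rules)}
    return o1 - shared


def union(o1: Ontology, o2: Ontology, rules: Rules,
          name1: str = "store", name2: str = "factory") -> Set[str]:
    """Joint ontology: matched pairs merge; other terms stay source-qualified."""
    merged = {f"{a}={b}" for a, b in intersection(o1, o2, rules)}
    merged |= {f"{name1}:{t}" for t in difference(o1, o2, rules)}
    inverse = {b: a for a, b in rules.items()}
    merged |= {f"{name2}:{t}" for t in difference(o2, o1, inverse)}
    return merged


# Hypothetical source ontologies and articulation rules.
store = {"size", "color", "customers", "employees"}
factory = {"size", "colcode", "machinery", "employees"}
rules = {"size": "size", "color": "colcode"}

shared = intersection(store, factory, rules)
local = difference(store, factory, rules)
joint = union(store, factory, rules)
```

Note that `employees` appears in both sources but, lacking a matching rule, is not shared: it stays in `local` and appears twice in `joint`, once per source.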
  • 53. Sample Operation: INTERSECTION Result contains Terms useful shared terms for purchasing Source Domain 1: Source Domain 2: Owned and maintained Owned and maintained by Store by Factory Gio Wiederhold I3 53
  • 54. INTERSECTION support Articulation ontology Terms useful for purchasing Matching rules that use terms from the 2 source domains Store Factory Ontology Ontology Gio Wiederhold I3 54
  • 55. Sample Intersections Articulation size = size ontology color =table(colcode) matching rules : style = style Ana- tomy Shoe Factory • Material inventory {...} {. . . } Shoe Store • Employees { . . . } • Shoes { . . . } • Machinery { . . . } Hard- • Customers { . . . } • Processes { . . . } ware • Employees { . . . } • Shoes { . . . } foot = foot Employees Employees Nail (toe, foot) Department Nail (fastener) ... Store ... Gio Wiederhold I3 55
  • 56. Other Basic Operations UNION: merging DIFFERENCE: material entire ontologies fully under local control Arti- culation ontology typically prior intersections Gio Wiederhold I3 56
  • 57. Features of an algebra Operations can be composed Operations can be rearranged Alternate arrangements can be evaluated Optimization is enabled The record of past operations can be kept and reused Gio Wiederhold I3 57
  • 58. Knowledge Composition Composed knowledge for Articulation Legend: applications using A,B,C,E knowledge U : union for U U (A B) U U : intersection (B C) U U Articulation knowledge U (C E) for (C E) Knowledge Articulation resource knowledge E U for (A B) U Knowledge U (B C) resource (C D) C Knowledge Knowledge Knowledge resource resource resource A B D Gio Wiederhold I3 58
  • 59. Sample Processing in HPKB • What is the most recent year an OPEC member nation was on the UN security council? – Related to DARPA HPKB Challenge Problem – SKC resolves 3 Sources » CIA Factbook ‘96 (nation) » OPEC (members, dates) » UN (SC members, years) – SKC obtains the Correct Answer » 1996 (Indonesia) – Other groups obtained more, but factually wrong answers – Problems resolved by SKC * Factbook has out of date OPEC & UN SC lists • Indonesia not listed • Gabon (left OPEC 1994) * different country names • Gambia => The Gambia * historical country names • Yugoslavia » UN lists future security council members • Gabon 1999 » intent of original question • Temporal variants Gio Wiederhold I3 59
  • 60. Tools to create articulations Graph matcher for Articulation- creating Expert Vehicle Transport ontology ontology Suggestions for articulations Gio Wiederhold I3 60
  • 61. continue from initial point Also suggest similar terms for further articulation: • by spelling similarity, • by graph position • by term match repository Expert response: 1. Okay 2. False 3. Irrelevant to this articulation All results are recorded Okay’s are converted into articulation rules Gio Wiederhold I3 61
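A rough sketch of the suggest-and-confirm loop described above, covering only the spelling-similarity heuristic. It uses `difflib.get_close_matches` from the Python standard library as a stand-in similarity measure; the actual SKC matcher is not described in the slides, and the function names, cutoff, and sample terms here are assumptions.

```python
from difflib import get_close_matches
from typing import List, Tuple

Response = Tuple[str, str, str]  # (term, candidate, expert verdict)


def suggest(term: str, candidates: List[str], cutoff: float = 0.7) -> List[str]:
    """Propose candidate matches for a term by spelling similarity."""
    return get_close_matches(term, candidates, n=3, cutoff=cutoff)


def record(responses: List[Response]) -> Tuple[List[Response], List[Tuple[str, str]]]:
    """Keep every expert response; turn 'okay' verdicts into articulation rules."""
    log = list(responses)  # all results are recorded
    rules = [(term, cand) for term, cand, verdict in log if verdict == "okay"]
    return log, rules


# Hypothetical terms from a vehicle ontology and a transport ontology.
suggestions = suggest("truck", ["trucks", "cargo", "track"])
log, rules = record([
    ("truck", "trucks", "okay"),
    ("truck", "track", "false"),
])
```

Rejected suggestions stay in the log, so the same false match need not be proposed to the expert again.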
  • 62. Candidate Match Repository Term linkages automatically extracted from 1912 Webster’s dictionary* Based on processing headwords → definitions using algebra primitives Notice presence of 2 domains: chemistry, transport * free, other sources have been processed Gio Wiederhold I3 62
  • 63. Using the match repository Gio Wiederhold I3 63
  • 64. Navigating the match repository Gio Wiederhold I3 64
  • 65. Primitive Operations Model and Instance: Constructors • create object • create set Connectors • match object • match set Editors • insert value • edit value • move value • delete value Converters • object - value • object indirection • reference indirection Unary: • Summarize -- structure up • Glossarize - list terms • Filter - reduce instances • Extract - circumscription Binary: • Match - data corroboration • Difference - distance measure • Intersect - schema discovery • Blend - schema extension Gio Wiederhold I3 65
  • 66. Future: exploiting the result Avoid n² problem of interpreter mapping, as stated by Swartout as an issue in HPKB year 1 Result has links to source Processing & query evaluation is best performed within Source Domains & by their engines Gio Wiederhold I3 66
  • 67. SKC Synopsis • Research: Reliable query answers from heterogeneous, imperfect data sources • Sources: – General: CIA World Factbook ‘96, UN www, OPEC www Webster’s Dictionary, Thesaurus, Oxford English Dictionary – Topical: OPEC, BattleSpace Sensors, Logistics Servers • Client: DARPA High Performance Knowledge Base (HPKB) project • Theory: Rule-based algebra – Translation & Composition primitives Gio Wiederhold I3 67
  • 68. Innovation in SKC • No need to harmonize full ontologies • Focus on what is critical for interoperation • Rules specific for articulation • Potentially many sets of articulation rules • Maintenance is distributed – to n sources – to m articulation agents; is m < n², depending on architecture density: a research question Gio Wiederhold I3 68
  • 69. Mega-programming Process mega- program- Mega-program mer Text customer Feedback CHAIMS Module / platform compiler descriptions Wrapper / API Modules to be / API Wrapper Mega- Result composed program GUI Modules to be API Wrapper / composed Module to be composed Gio Wiederhold I3 69
  • 70. Decomposing CALL statements Progress in scale of computing: Copying; Code sharing; Parameterized computation; Objects with overloaded method names; Remote procedure calls to distributed modules; Constrained (black box) access to encapsulated data CHAIMS decomposes CALL functions: Set Up, Estimate, Invoke, Inspect, Extract Gio Wiederhold I3 70
  • 71. Maintenance is good for you [Chart: lifetime in years and relative annual maintenance cost (%) for automobile, software, and hardware; depreciation = 1 / lifetime] Gio Wiederhold I3 71
  • 72. Growing Systems: n modules Federated: to deal with many servers and clients resource reuse changes are difficult affect many clients Gio Wiederhold I3 72
  • 73. Systems with Mediators Gio Wiederhold. 1995 Applications . . . . Mediators . . . . . . Data Resources . . . Gio Wiederhold I3 73
  • 74. Growth through Reuse Gio Wiederhold. 1995 New Application Prior & Revised Mediators Extended Data Resources Gio Wiederhold I3 74
  • 75. Linear O(n) Cost of Growth, now O(n²) • Data changes only affect some mediators; only in their domain • Mediators can 1. supply old information to n-1 prior applications 2. provide better information to the new application 3. be partially or completely reused • New applications, using the new data, can be developed and inserted dynamically Gio Wiederhold I3 75
  • 76. A mediator Is not just static software Software & People Application Interface: changes of user needs Domain changes Resource Interfaces: resource changes Owner / Creator, Maintainer, Lessor - Seller, Advertiser Models, programs, rules, caches, . . . Gio Wiederhold I3 76
  • 77. Assigning maintenance responsibility Sources: a. Source data quality – supplier database, files, or web pages b. Interface to the source – wrapper, supplier or vendor for supplier Services: c. Source selection – expert specialist in mediator d. Source quality assessment – customer input to mediator e. Semantic interoperation – specialist group providing input to the mediator f. Consistency and metadata information – mediator service operation or warehouse Customers: g. Informal, pragmatic integration – client services with customer input h. User presentation formats – client services with customer input Gio Wiederhold I3 77
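One way to read the linear-versus-quadratic claim is to count interfaces. The sketch below rests on a simplifying assumption that is not in the slides: without mediation every application is wired to every data source, while with mediation each application and each source maintains a single interface to a mediator layer. The function names are hypothetical.

```python
def direct_links(apps: int, sources: int) -> int:
    """Assumed worst case: every application wired to every source."""
    return apps * sources


def mediated_links(apps: int, sources: int) -> int:
    """Assumed mediated case: one interface per application and per source."""
    return apps + sources


# Cost of adding one new application to a system with 10 apps and 10 sources.
added_direct = direct_links(11, 10) - direct_links(10, 10)
added_mediated = mediated_links(11, 10) - mediated_links(10, 10)
```

Under these assumptions the direct design grows with the product of the module counts, while the mediated design adds a constant amount of interface work per new module.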
  • 78. Sample projects • Tsimmis at Stanford • E-Commerce in Digital Libraries • INEEL: information integration for environmental restoration • MIFT: feedback for training • Civil Engineering and Architecture • F-22 • SimQL • Security Gio Wiederhold I3 78
  • 79. Projects at Stanford DB group Data Mining. Mediator & Wrapper Generation. Warehousing. MIDAS Security Mediators. WHIPS Megaprogramming. TSIMMIS Simulation Access. TIHI Changes, Consistency, and Configurations. C3 CHAIMS SimQL Gio Wiederhold I3 79
  • 80. The TSIMMIS Project Ramana Yerneni, Yannis Papakonstantinou, ... • Objective: Support mediation technology –integrated access to distributed, autonomous, heterogeneous data sources, using object fusion –wrapper toolkit to rapidly create wrappers, based on source specification, a uniform interface to heterogeneous sources –mediator toolkit to rapidly construct mediators, based on a mediator specification, to integrate data from a set of wrappers Gio Wiederhold I3 80
  • 81. Investors Need to Fuse Information from Multiple Sources . Network Ticker Tape Personal database WWW • group together information about the same real-world entity • remove redundancies • resolve conflicts Gio Wiederhold I3 81
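    The three fusion steps listed on this slide (group by entity, remove redundancies, resolve conflicts) can be sketched as follows. The record layout (`ticker`, `source`, `timestamp`) and the rule that the most recent value wins are assumptions made for illustration; the slide does not prescribe a conflict-resolution policy, and this is not the TSIMMIS object-fusion mechanism.

    ```python
    from collections import defaultdict


    def fuse(records, key="ticker"):
        """Fuse dict records that describe the same real-world entity."""
        # Step 1: group together records about the same entity.
        groups = defaultdict(list)
        for rec in records:
            groups[rec[key]].append(rec)

        fused = {}
        for entity, recs in groups.items():
            merged = {}
            # Steps 2 and 3: walk records oldest first, so identical values
            # collapse into one and a newer value overrides an older one.
            for rec in sorted(recs, key=lambda r: r["timestamp"]):
                for attr, value in rec.items():
                    if attr in ("source", "timestamp"):
                        continue  # bookkeeping fields, not entity attributes
                    merged[attr] = value
            fused[entity] = merged
        return fused


    records = [
        {"ticker": "IBM", "price": 100, "source": "ticker tape", "timestamp": 1},
        {"ticker": "IBM", "price": 101, "name": "IBM Corp", "source": "www", "timestamp": 2},
        {"ticker": "IBM", "name": "IBM Corp", "source": "personal db", "timestamp": 0},
    ]
    print(fuse(records))
    ```

    Other policies, such as preferring a trusted source over a recent one, would replace the sort key.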
  • 82. An Integration Architecture Client Application portfolios for each company Mediator stock market prices business reports Wrapper Wrapper Ticker Tape Dialog Gio Wiederhold I3 82
  • 83. Additional Challenge: Sources Without a Well-Structured Schema • semistructured Examples – irregular • World Wide Web – deeply nested • SGML documents • incomplete • genome, chemical schema knowledge structures – autonomous • bibliographic – dynamic information • files Gio Wiederhold I3 83
  • 84. Wrappers & Mediators from High-Level Specifications DeclarativeMediator Client Specification Mediator Specification Mediator Interpreter Wrapper Specification Wrapper Interpreter Wrapper Declarative Source Source Source Specifications Gio Wiederhold I3 84
  • 85. E-money Services must be paid for • Incentive for creation and improvement • price proportional to value added, often small • profit = f(cost, market, price, overhead) • price low per item, so overhead must be low Simple payment (no credit accounts, checks) Enabled through secure signatures yes Gio Wiederhold I3 85
  • 86. E-Commerce in the Digital Library Steven Ketchpel & DL Economics Group Payment Delivery CyberCash Cryptolope DigiCash Major DigiBox First Virtual Integration HTTP SET Problem E-mail Shopping Models: Pay-per-view, Subscription, Session, Shareware, Auctions, Site License, Gift Certificate, Layaway, Pre-paid vouchers, … . Gio Wiederhold I3 86
  • 87. Shopping model: merchant-independent logic controlling flow of business model Example shopping models: State Order, Pay, (Deliver 52 times) Information (1 month; Order, Deliver) Pay Event Handlers Event Handlers Bill Event Handlers 2 1 Order Merchant Customer Complete 3 4 Payment Start Transfer $ Complete Event Handlers Abstract API Proxy event handlers allows application to translate from Payment/Delivery/ interact with many Other Services native applications different services to shopping model in a consistent way defined protocols Gio Wiederhold I3 87
  • 88. TSIMMIS Status • Mediator Specification Interpreter running on Ultrix, AIX, OSF. • 9000 lines of C/C++ code • 4000 C++ lines of Server/Client Support Libraries • Integration of three disparate bibliographic sources – legacy system – flat BibTeX files – relational DB – wwWeb files Gio Wiederhold I3 88
  • 89. Mediator Specification Interpreter Architecture Result Query Query Rewriter Mediator Specification logical datamerge program Cost-Based Optimizer plan Datamerge Engine Queries to Results Wrappers Gio Wiederhold I3 89
  • 90. Environmental Restoration at INEL Undoing 50 years of messes …. MSL [Stanford] MQL [ISX] OQL [ODMG] OEM OEM OEM QEM QEM QEM OEM QEM OEM other QEM mediators mediator CORBA wrapper QEM QEM OEM OEM wrapper QEM wrapper wrapper Many projects ERIS IEDMS many sources ISX - Stanford Univ. LOCKHEED MARTIN Idaho National Engineering Laboratory Gio Wiederhold I3 90
  • 91. CHAIMS - software composition Domain expert IO module Client workstation IO module C Computation Services b e MEGA modules a T d Sites S c U T R Data Resources Gio Wiederhold I3 91
  • 92. Mediation to Implement Feedback in Training David Maluf, Priya Panchapagesan, Ted Linden Another task of mediators, prior to integration MIFT Abstraction Abstraction to match levels of granularity Gio Wiederhold I3 92
  • 93. Mediation Feedback: Playback or Graph User Interface Commanders Training Trainees UI in Developers Analysts Java Observers Application Layer Standards Objectives in KQML Mediation Tasks Stanford Layers Mediators with rules in CLIPS I.D.A Wrappers Wrapped in C/C++ Simulation Resources Janus SimNet Gio Wiederhold I3 93
  • 94. MIFT . Result . Analyses: • Force ratio • Losses • Area gain Exercise Simulator Type Gio Wiederhold I3 94
  • 95. Control Valve Sizing, Future From Andrew Arnold: Civ. Eng. Qualification Exam • Interpretation – Programmatic • Analysis – Integrated • Evaluation – Integrated • Transformation – Automated Gio Wiederhold I3 95
  • 96. F-22 IWSDB Phase 6 User Interfaces Integration Services Wrappers Databases PD Appli- Change Sy- DS cation Notification base Provi- PRIDE Index Query Re- sioner formulation WAIS server Match Domain Suppliers Engi- IWSDB maker Model neer client S Domain Q GUI Matching L Gio Wiederhold I3 96
  • 97. Current state of DM Support past now future time organized support disjointed support Data integration Intuition + • Spreadsheets • Planning of allocations Databases • Other simulations distributed, heterogeneous various point assessments Gio Wiederhold I3 97
  • 98. Information Systems should also Project into the Future past now future time Databases, Simulations, accessed via SQL or accessed via SimQL and CORBA compliant compliant wrappers wrappers Msg systems, sensors Gio Wiederhold I3 98
  • 99. SimQL: Simulation Access Service Information Systems should also deal with the Future past SQL now SimQL future time Decision-making requires dealing with the future, as well as the past • Databases deal well with the past • Sensors can provide current status • Spreadsheets, simulations deal with the likely futures Information systems should be able to combine all three Gio Wiederhold I3 99
  • 100. Stanford experiment, supported by DARPA & NIST Phase 1 Architectures Logistics Application Manufacturing Application SimQL access SimQL access SimQL access SQL access wrapper wrapper wrapper wrapper Weather Test Engineering Spreadsheets Data (short-, long-term) Gio Wiederhold I3 100
  • 101. Enabling Interoperation Databases Simulations should • serve clients via SQL by • serve clients via SimQL by Sharing a Model (The Schema) Sharing a Model (research q.) A query language over the model A query language over the model the SQL interface enables a SimQL interface will enable • independence of • independence of application development application development DBMS technology development simulation technology develop’t reuse of infrastructure reuse of infrastructure Today Objective • most new systems use a • build information systems DBMS for data storage combining DBMS, Simulations even with less performance, even with less performance, inability to handle all problems, inability to handle all problems, but enough of them well enough. but enough of them . . . Gio Wiederhold I3 101
  • 102. Internet requirements • Ubiquitous access to simulations of a wide variety of types • Rapid response to parameter changes – often High-Performance computation is needed – distributed simulations with synchronization • Rapid Service Composition – High bandwidth among simulations – Access to multiple services in parallel Gio Wiederhold I3 102
  • 103. Even the present needs SimQL point-in-time for last recorded observations situational assessment simple simulations to extrapolate data past now future time Is the delivery truck in X? Not all data are current: • Is the right stuff on the truck? • Will the crew be at X? • Will the forces be ready to accept delivery? Gio Wiederhold I3 103
  • 104. Use of Simulation Results Simulation results can be composed for Alternative Courses-of-actions Composition should be seamless, elegant, with computation and recomputation of likelihoods Results change as now moves forwards and eliminates earlier alternatives. Gio Wiederhold I3 104
  • 105. Types of simulation services 1. Continuously executing: weather prediction – SimQL result reports best match samples 2. Execution specific to query: what-if assessment – may require HPC power for adequate response 3. Past simulations collect results in a base: materials – performs inter- or extra-polations to match query parameters 4. Combinations, i.e., 2. + 3.: top layer simulation using stored partial lower level results: weapon performance in new setting 5. Human-in-the-loop (mediated by an agent program): SAFs Note • A simulation service program can be written in any language • A simulation service must be compliant with the interface spec. Gio Wiederhold I3 105
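    Service type 3 above, answering a query from a base of stored simulation results by interpolation or extrapolation, can be sketched as below. The function name `answer_from_base`, the one-parameter base, and the choice of linear interpolation are assumptions for illustration; the slide does not specify the SimQL interface or the interpolation method.

    ```python
    from bisect import bisect_left


    def answer_from_base(base, x):
        """Estimate a result for parameter x from stored (parameter, result) pairs.

        `base` must be sorted by parameter and hold at least two entries.
        """
        xs = [p for p, _ in base]
        i = bisect_left(xs, x)
        # Clamp so that base[i - 1] and base[i] both exist; a query outside
        # the stored range is extrapolated from the nearest end segment.
        i = min(max(i, 1), len(base) - 1)
        (x0, y0), (x1, y1) = base[i - 1], base[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


    # Stored results of past simulation runs (made-up numbers).
    base = [(0, 0.0), (10, 20.0), (20, 30.0)]
    print(answer_from_base(base, 5))   # interpolated between the first two runs
    print(answer_from_base(base, 25))  # extrapolated beyond the last run
    ```

    A type 4 service could call such a lookup for its lower-level partial results and run a fresh top-layer simulation on them.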
  • 106. Tools for Managing Partitioning Separate internals and interfaces, at many levels • Object Libraries • Product Design hierarchical standards (PDES) • Domain-Specific Systems Analysis (DSSA) • Ontology documentation (Ontolingua) • Remote Object Access (CORBA 1.2, 2.0) • Knowledge Interchange Formalism (KIF) • Transport in / of heterogeneous situations (KQML specifies content repr., ontology) Gio Wiederhold I3 106
  • 107. Moving to a Service Paradigm • Server is an independent contractor, defines service • Client selects service, and specifies parameters • Server’s success depends on value provided • Some form of payment received for services x,y Databases are a current example. Simulations have the same potential. Gio Wiederhold I3 107
  • 108. New Role for Consultants Old • Used at Design Time and • To Explain Failures Future • Available as a Service • Responsible for Knowledge Maintenance Gio Wiederhold I3 108
  • 109. Long Range Science Vision Systems Artificial Databases Intelligence Engineering access knowledge mgmt analysis storage domain expertise documentation algebras uncertainty costing Integration Methods GIS Integration Spatial is special. Science Gio Wiederhold I3 109
  • 110. Summary • Mediation bridges Applications and Sources • Mediator technology transforms data to information by applying an expert maintainer’s knowledge • Abstraction reduces data further for decision making • Must be integrated with sensors, simulation results • Mediation permits incremental system growth (n log n) • Mediators provide a service-model on the networks New research: • Recognition and resolution of semantic differences • Simulation access as a new service More at http://www-db.stanford.edu/people/gio.html Gio Wiederhold I3 110