Exchanging more than Complete Data

                                                Marcelo Arenas
                                                     PUC Chile


                Joint work with Jorge Pérez (U. de Chile) and Juan Reutter (U. Edinburgh)



M. Arenas   –    Exchanging more than Complete Data - RR2011                                1 / 68
Outline: First part




            ◮   The data exchange problem
                   ◮   Some fundamental results in relational data exchange


            ◮   The need for a more general data exchange framework
                   ◮   Two important scenarios: Incomplete databases (open-world
                       databases: RDF) and knowledge bases




The problem of data exchange


       Given: A source schema S, a target schema T and a specification
       Σ of the relationship between these schemas


       Data exchange: Problem of materializing an instance of T given
       an instance of S
            ◮   Target instance should reflect the source data as accurately as
                possible, given the constraints imposed by Σ and T
            ◮   It should be efficiently computable
            ◮   It should allow one to evaluate queries on the target in a way
                that is semantically consistent with the source data



Data exchange in a picture

       [Figure: source schema S and target schema T are related by the
       specification Σ; a query Q is posed over schema T]

Data exchange: Some fundamental questions



       What are the challenges in the area?
            ◮   What is a good language for specifying the relationship
                between source and target data?
                   ◮   Expressiveness versus complexity

            ◮   What is a good instance to materialize?
            ◮   What does it mean to answer a query over target data?
            ◮   How do we answer queries over target data? Can we do this
                efficiently?




Exchanging relational data

       The data exchange problem has been extensively studied in the
       relational world.
            ◮   It has also been commercially implemented: IBM Clio


       Relational data exchange setting:
            ◮   Source and target schemas: Relational schemas
            ◮   Relationship between source and target schemas:
                Source-to-target tuple-generating dependencies (st-tgds)


       Semantics of data exchange has been precisely defined.
            ◮   Efficient algorithms for materializing target instances and for
                answering queries over the target schema have been developed


Schema mapping: The key component in relational data
 exchange


       Schema mapping: M = (S, T, Σ)
            ◮   S and T are disjoint relational schemas
            ◮   Σ is a finite set of st-tgds:

                                      ∀x̄ ∀ȳ (ϕ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄))

                       ϕ(x̄, ȳ): conjunction of relational atomic formulas over S
                       ψ(x̄, z̄): conjunction of relational atomic formulas over T




Relational schema mappings: An example

       Example
            ◮   S: Employee(name)

            ◮   T: Dept(name, number)

            ◮   Σ:

                               ∀x Employee(x) → ∃y Dept(x, y )



       Note
       We omit universal quantifiers in st-tgds:
                                Employee(x) → ∃y Dept(x, y )

Relational data exchange problem


       Fixed: M = (S, T, Σ)

       Problem: Given instance I of S, find an instance J of T such that
       (I, J) satisfies Σ
            ◮   (I, J) satisfies ϕ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄) if whenever I satisfies
                ϕ(ā, b̄), there is a tuple c̄ such that J satisfies ψ(ā, c̄)



       Notation
       J is a solution for I under M
            ◮   SolM (I ): Set of solutions for I under M



The notion of solution: Example

       Example
            ◮   S: Employee(name)
            ◮   T: Dept(name, number)
            ◮   Σ: Employee(x) → ∃y Dept(x, y )

       Solutions for I = {Employee(Peter)}:
                J1: {Dept(Peter, 1)}
                J2: {Dept(Peter, 1), Dept(Peter, 2)}
                J3: {Dept(Peter, 1), Dept(John, 1)}
                J4: {Dept(Peter, n1)}
                J5: {Dept(Peter, n1), Dept(Peter, n2)}
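
The candidate instances above can be checked mechanically against the definition of solution. The following Python sketch is only an illustration: the tuple encoding of facts, the '?' prefix for variables, the Null class and the names matches and is_solution are assumptions made here, not notation from the slides.

```python
class Null:
    """A labelled null such as n1: a placeholder value, not a constant."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

def is_var(term):
    # Assumed convention for this sketch: variables are strings starting with '?'.
    return isinstance(term, str) and term.startswith("?")

def matches(atoms, instance, binding):
    """Yield every extension of `binding` that maps all `atoms` to facts of `instance`."""
    if not atoms:
        yield binding
        return
    first, rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(binding)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, instance, new)

def is_solution(I, J, sigma):
    """True if (I, J) satisfies every st-tgd, given as a (body, head) pair of atom lists."""
    for body, head in sigma:
        for b in matches(body, I, {}):
            # Head variables not bound by the body are existential:
            # any value found in J may witness them.
            if next(matches(head, J, b), None) is None:
                return False
    return True

# Employee(x) → ∃y Dept(x, y)
sigma = [([("Employee", "?x")], [("Dept", "?x", "?y")])]
I = {("Employee", "Peter")}
n1, n2 = Null("n1"), Null("n2")
candidates = {
    "J1": {("Dept", "Peter", 1)},
    "J2": {("Dept", "Peter", 1), ("Dept", "Peter", 2)},
    "J3": {("Dept", "Peter", 1), ("Dept", "John", 1)},
    "J4": {("Dept", "Peter", n1)},
    "J5": {("Dept", "Peter", n1), ("Dept", "Peter", n2)},
}
for name, J in candidates.items():
    print(name, is_solution(I, J, sigma))
```

Under these assumptions every candidate is accepted, while an instance with no Dept fact for Peter, such as {Dept(John, 1)}, is rejected.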

Canonical universal solution



       Algorithm (Chase)
            Input         :    M = (S, T, Σ) and an instance I of S
            Output        :    Canonical universal solution J ⋆ for I under M

                 let J ⋆ := empty instance of T
                 for every ϕ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄) in Σ do
                      for every ā, b̄ such that I satisfies ϕ(ā, b̄) do
                           create a fresh tuple n̄ of pairwise distinct null values
                           insert ψ(ā, n̄) into J ⋆




Canonical universal solution: Example

       Example
       Consider mapping M specified by dependency:
                                   Employee(x)           → ∃y Dept(x, y )

       Canonical universal solution for
       I = {Employee(Peter), Employee(John)}:
            ◮   For a = Peter do
                   ◮   Create a fresh null value n1
                   ◮   Insert Dept(Peter, n1 ) into J ⋆
            ◮   For a = John do
                   ◮   Create a fresh null value n2
                   ◮   Insert Dept(John, n2 ) into J ⋆

       Result: J ⋆ = {Dept(Peter, n1 ), Dept(John, n2 )}
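
The chase steps of this example can be sketched in Python. This is a minimal illustration for st-tgds only, not the implementation used in any system: the tuple encoding of facts, the '?' prefix for variables, the Null class and the function names are assumptions introduced here. The source instance is a list so that facts are processed in the order shown on the slide.

```python
from itertools import count

class Null:
    """A labelled null value (n1, n2, ...)."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

def is_var(term):
    # Assumed convention for this sketch: variables are strings starting with '?'.
    return isinstance(term, str) and term.startswith("?")

def matches(atoms, instance, binding):
    """Yield every extension of `binding` that maps all `atoms` to facts of `instance`."""
    if not atoms:
        yield binding
        return
    first, rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(binding)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, instance, new)

def chase(I, sigma):
    """Build the canonical universal solution for I under a list of (body, head) st-tgds."""
    fresh = count(1)
    J = []
    for body, head in sigma:
        for b in matches(body, I, {}):
            env = dict(b)
            for atom in head:
                fact = [atom[0]]
                for term in atom[1:]:
                    if is_var(term):
                        if term not in env:
                            # Existential variable: create a fresh null for this match.
                            env[term] = Null(f"n{next(fresh)}")
                        fact.append(env[term])
                    else:
                        fact.append(term)
                J.append(tuple(fact))
    return J

# Employee(x) → ∃y Dept(x, y)
sigma = [([("Employee", "?x")], [("Dept", "?x", "?y")])]
I = [("Employee", "Peter"), ("Employee", "John")]
J_star = chase(I, sigma)
print(J_star)
```

With the facts listed in this order, the sketch produces Dept(Peter, n1) and Dept(John, n2), matching the result above.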

Query answering in data exchange



       Given: Mapping M, source instance I and query Q over the target
       schema
            ◮   What does it mean to answer Q?



       Definition (Certain answers)
                     certainM(Q, I) = ⋂ { Q(J) | J is a solution for I under M }




Certain answers: Example



       Example
       Consider mapping M specified by:
                                 Employee(x) → ∃y Dept(x, y )


       Given instance I = {Employee(Peter)}:

                            certainM (∃y Dept(x, y ), I )     =   {Peter}
                            certainM (Dept(x, y ), I )        =   ∅
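
To see where these two values come from, one can intersect the query answers over a handful of solutions. The Python sketch below does this for five sample solutions for I = {Employee(Peter)}. It is only an illustration: the set of solutions is infinite, so intersecting over a finite sample can only over-approximate the certain answers, and the strings "n1" and "n2" merely stand in for null values.

```python
# Sample solutions, written as sets of (name, number) pairs of Dept.
solutions = [
    {("Peter", 1)},
    {("Peter", 1), ("Peter", 2)},
    {("Peter", 1), ("John", 1)},
    {("Peter", "n1")},
    {("Peter", "n1"), ("Peter", "n2")},
]

def q_exists(J):
    """∃y Dept(x, y): the names that have some department number."""
    return {(x,) for (x, _y) in J}

def q_all(J):
    """Dept(x, y): every (name, number) pair."""
    return set(J)

def intersect_answers(query, sols):
    """Intersection of query(J) over the given solutions."""
    return set.intersection(*(query(J) for J in sols))

print(intersect_answers(q_exists, solutions))
print(intersect_answers(q_all, solutions))
```

On this sample the first intersection contains only Peter and the second is empty, because no (name, number) pair appears in every solution.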




Query rewriting: An approach for answering queries


       How can we compute certain answers?
            ◮   Naïve algorithm does not work: infinitely many solutions


       Approach proposed in [FKMP03]: Query Rewriting
                Given a mapping M and a target query Q, compute a query
                Q ⋆ such that for every source instance I with canonical
                universal solution J ⋆ :

                                      certainM (Q, I ) = Q ⋆ (J ⋆ )




Query rewriting over the canonical universal solution

       Theorem (FKMP03)
       Given a mapping M specified by st-tgds and a union of
       conjunctive queries Q, there exists a query Q ⋆ such that for every
       source instance I with canonical universal solution J ⋆ :

                                      certainM (Q, I ) = Q ⋆ (J ⋆ )



       Proof idea: Assume that C(a) holds whenever a is a constant.

       Then:
                Q ⋆ (x1 , . . . , xm ) = C(x1 ) ∧ · · · ∧ C(xm ) ∧ Q(x1 , . . . , xm )
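
The rewriting in the proof idea amounts to evaluating Q on J ⋆ with nulls treated like ordinary values, then discarding every answer tuple that mentions a null. A minimal Python sketch, assuming a hypothetical Null class for nulls and queries written as plain Python functions over a set of Dept pairs:

```python
class Null:
    """A labelled null value; anything that is not a Null counts as a constant."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

def rewritten_answers(query, J_star):
    """Evaluate the query on J⋆, then keep only tuples made of constants.

    The filter plays the role of C(x1) ∧ ... ∧ C(xm) in the rewriting.
    """
    return {t for t in query(J_star) if not any(isinstance(v, Null) for v in t)}

def q_exists(J):
    """∃y Dept(x, y)."""
    return {(x,) for (x, _y) in J}

def q_all(J):
    """Dept(x, y)."""
    return set(J)

# Canonical universal solution for I = {Employee(Peter)}: Dept(Peter, n1).
J_star = {("Peter", Null("n1"))}

print(rewritten_answers(q_exists, J_star))
print(rewritten_answers(q_all, J_star))
```

For this instance the sketch returns Peter for ∃y Dept(x, y) and the empty set for Dept(x, y), in agreement with the certain answers computed in the earlier example.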



Computing certain answers: Complexity



       Data complexity: Data exchange setting and query are considered
       to be fixed.


       Corollary (FKMP03)
       For mappings given by st-tgds, certain answers for UCQ can be
       computed in polynomial time (data complexity)




Relational data exchange: Some lessons learned

       Key steps in the development of the area:
            ◮   Definition of schema mappings: Precise syntax and semantics
                   ◮   Definition of the notion of solution

            ◮   Identification of good solutions
            ◮   Polynomial time algorithms for materializing good solutions
            ◮   Definition of target queries: Precise semantics
            ◮   Polynomial time algorithms for computing certain answers for
                UCQ


       Creating schema mappings is a time-consuming and expensive
       process
            ◮   Manual or semi-automatic process in general

Ongoing project: Reusing schema mappings

       [Figure: schemas S, T and U, with mapping ΣST between S and T and
       mapping ΣTU between T and U. The mapping ΣSU between S and U is
       obtained by composition: ΣSU = ΣST ◦ ΣTU]

       We need some operators for schema mappings
            ◮   Composition in the above case
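
Under the usual reading in which a mapping is a set of pairs (source instance, target instance), composition is ordinary composition of binary relations. The Python sketch below illustrates this on finite fragments only. Real mappings contain infinitely many pairs, and the instance labels I, J1, J2, K1, K2 are hypothetical placeholders.

```python
def compose(m_st, m_tu):
    """Relational composition: (a, c) is kept when some b links a to c."""
    return {(a, c) for (a, b) in m_st for (b2, c) in m_tu if b == b2}

# Finite fragments of two mappings, as sets of (source, target) instance labels.
m_st = {("I", "J1"), ("I", "J2")}
m_tu = {("J1", "K1"), ("J2", "K1"), ("J2", "K2")}

m_su = compose(m_st, m_tu)
print(sorted(m_su))
```

Here I is related to K1 and K2, since each is reachable through some intermediate instance over T.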




Metadata management



       Contributions mentioned in the previous slides are just a first step
       towards the development of a general framework for data exchange.


       In fact, as pointed out in [B03],
            many information system problems involve not only the design
            and integration of complex application artifacts, but also their
            subsequent manipulation.




Metadata management



       This has motivated the need for the development of a general
       infrastructure for managing schema mappings.

       The problem of managing schema mappings is called metadata
       management.

       High-level algebraic operators, such as compose, are used to
       manipulate mappings.
            ◮   What other operators are needed?




An inverse operator is also needed

       [Figure: schemas S, T, U and V, with mappings ΣST (S to T), ΣTU (T to U)
       and ΣSV (S to V). A mapping ΣVS from V to S is obtained as the inverse
       ΣVS = ΣSV⁻¹. Composing it gives ΣSV⁻¹ ◦ ΣST from V to T and
       (ΣSV⁻¹ ◦ ΣST) ◦ ΣTU from V to U]

       Composition and inverse operators have to be combined




Metadata management: A more general data exchange
 framework is needed

       Composition and inverse operators have been extensively studied in the
       relational world.
            ◮   Semantics, computation, . . .


       Combining these operators is an open issue.
            ◮   Key observation: A target instance of a mapping can be the source
                instance of another mapping
            ◮   Source instances may contain null values


       There is a need for a data exchange framework that can handle databases
       with incomplete information.


Data exchange in the RDF world

       There is an increasing interest in publishing relational data as RDF
            ◮   Resulted in the creation of the W3C RDB2RDF Working Group


       The problem of translating relational data into RDF can be seen as a
       data exchange problem
            ◮   Schema mappings can be used to describe how the relational data is
                to be mapped into RDF


       But there is a mismatch here: A relational database under a closed-world
       semantics is to be translated into an RDF graph under an open-world
       semantics
            ◮   There is a need for a data exchange framework that can handle
                both databases with complete and incomplete information


Data exchange in the RDF world


       An issue discussed at the W3C RDB2RDF Working Group: Is a
       mapping information preserving?
            ◮   In particular: For the default mapping defined by this group


       How can we address this issue?
            ◮   Metadata management can help us


       Question to answer: Is a mapping invertible?
            ◮   This time an RDF graph is to be translated into a relational
                database!
            ◮   We want to have a unifying framework for all these cases



But these are not the only reasons . . .

       Nowadays several applications use knowledge bases to represent data.
            ◮   A knowledge base contains not only data but also rules that
                allow new data to be inferred
            ◮   In the Semantic Web: RDFS and OWL ontologies


       In a data exchange application over the Semantic Web:
                The input is a mapping and a source specification including data
                and rules, and the output is a target specification also including
                data and rules


       There is a need for a data exchange framework that can handle
       knowledge bases.


Knowledge exchange: A more general data exchange
 framework is needed

       Example
       Assume we are given the following source knowledge base:

       Data:
                                   Father                          Mother
                               Andy     Bob                    Carrie   Bob
                               Bob      Danny
                               Danny    Eddie

       Rules:
                                           Father(x, y ) → Parent(x, y )
                                           Mother(x, y ) → Parent(x, y )
                     Parent(x, y ) ∧ Parent(y , z)            → Grandparent(x, z)
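
The data implied by these rules can be computed directly. The Python sketch below encodes each relation as a set of pairs (an encoding chosen here for illustration) and applies the three rules once, which is enough because none of them is recursive.

```python
# Source data from the example.
father = {("Andy", "Bob"), ("Bob", "Danny"), ("Danny", "Eddie")}
mother = {("Carrie", "Bob")}

# Father(x, y) → Parent(x, y) and Mother(x, y) → Parent(x, y)
parent = father | mother

# Parent(x, y) ∧ Parent(y, z) → Grandparent(x, z)
grandparent = {(x, z) for (x, y) in parent for (y2, z) in parent if y == y2}

print(sorted(parent))
print(sorted(grandparent))
```

The derived Grandparent relation contains (Andy, Danny), (Bob, Eddie) and (Carrie, Danny).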


Knowledge exchange: A more general data exchange
 framework is needed

       Example (cont’d)
       Given a mapping:
                                         Father(x, y ) → F(x, y )
                                    Grandparent(x, y ) → G(x, y )


       What is a good translation of the initial knowledge base?

       Data:
                                       F                               G
                             Andy          Bob                Andy         Danny
                             Bob           Danny              Carrie       Danny
                             Danny         Eddie              Bob          Eddie

       Rules: ∅
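
       The materialized tables above can be reproduced with a small sketch: close the source data under the three source rules, then copy Father into F and Grandparent into G. The function names and the set-of-pairs encoding are assumptions made for illustration only; they are not part of the talk.

```python
# Toy evaluation of the source knowledge base followed by the mapping
# Father -> F, Grandparent -> G. Relations are sets of (x, y) pairs.

def close_under_rules(father, mother):
    """Derive Parent and Grandparent from the explicit Father/Mother facts."""
    # Father(x, y) -> Parent(x, y) and Mother(x, y) -> Parent(x, y)
    parent = set(father) | set(mother)
    # Parent(x, y) AND Parent(y, z) -> Grandparent(x, z)
    grandparent = {(x, z)
                   for (x, y1) in parent
                   for (y2, z) in parent
                   if y1 == y2}
    return parent, grandparent


def translate(father, mother):
    """Materialize the smallest F and G that satisfy the mapping."""
    _, grandparent = close_under_rules(father, mother)
    return set(father), grandparent


FATHER = {("Andy", "Bob"), ("Bob", "Danny"), ("Danny", "Eddie")}
MOTHER = {("Carrie", "Bob")}

F, G = translate(FATHER, MOTHER)
print(sorted(F))
print(sorted(G))
```

       A single pass is enough here because the source rules are not recursive; a recursive rule set would need a fixpoint loop.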

M. Arenas   –   Exchanging more than Complete Data - RR2011                        30 / 68
Knowledge exchange: A more general data exchange
 framework is needed

       Example (cont’d)
       Our first alternative does not include any translation of the source rules:
                                          Father(x, y )       → Parent(x, y )
                                    Mother(x, y )             → Parent(x, y )
                    Parent(x, y ) ∧ Parent(y , z)             → Grandparent(x, z)




M. Arenas   –   Exchanging more than Complete Data - RR2011                         31 / 68
Knowledge exchange: A more general data exchange
 framework is needed

       Example (cont’d)
       A second alternative translates the source rules into a rule over the target schema:



                                      F(x, y ) ∧ F(y , z)     → G(x, z)


       What data should we materialize?
                                      F                                  G
                            Andy          Bob
                            Bob           Danny                 Carrie       Danny
                            Danny         Eddie

       Is this a good translation? Why?
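
       One way to probe the question is to check that the smaller G table, together with the target rule, still yields every G fact of the fully materialized translation. The sketch below does that under the assumption that relations are encoded as sets of pairs; the helper name is hypothetical.

```python
# Check that storing only G(Carrie, Danny) loses nothing once the target
# rule F(x, y) AND F(y, z) -> G(x, z) is applied.

def apply_target_rule(f, g):
    """Return G extended with every pair derivable from two chained F facts."""
    derived = {(x, z) for (x, y1) in f for (y2, z) in f if y1 == y2}
    return set(g) | derived


F = {("Andy", "Bob"), ("Bob", "Danny"), ("Danny", "Eddie")}
G_STORED = {("Carrie", "Danny")}
G_FULL = {("Andy", "Danny"), ("Carrie", "Danny"), ("Bob", "Eddie")}

print(apply_target_rule(F, G_STORED) == G_FULL)
```

       The pair (Carrie, Danny) has to stay in the data because it comes from a Mother fact, and Mother is not exported by the mapping.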

M. Arenas   –   Exchanging more than Complete Data - RR2011                          31 / 68
One can exchange more than complete data


            ◮   In data exchange one starts with a database instance (with
                complete information).

            ◮   What if we have an initial object that has several
                interpretations?
                   ◮   A representation of a set of possible instances


            ◮   We propose a new general formalism to exchange
                representations of possible instances
                   ◮   We apply it to the problems of exchanging instances with
                       incomplete information and exchanging knowledge bases




M. Arenas   –   Exchanging more than Complete Data - RR2011                       32 / 68
Outline: Second part



            ◮   Formalism for exchanging representation systems

            ◮   Applications to incomplete instances

            ◮   Applications to knowledge bases

            ◮   Concluding remarks




M. Arenas   –   Exchanging more than Complete Data - RR2011        33 / 68
Representation systems

       A representation system R = (W, rep) consists of:
            ◮   a set W of representatives
            ◮   a function rep that assigns a set of instances to every element
                in W
                                rep(V) = {I1 , I2 , I3 , . . .} for every V ∈ W


       Uniformity assumption: For every V ∈ W, there exists a relational
       schema S (the type of V) such that rep(V) ⊆ Inst(S)



       Incomplete instances and knowledge bases are representation
       systems


M. Arenas   –   Exchanging more than Complete Data - RR2011                       35 / 68
In classical data exchange we consider only complete data


       Recall that given M = (S, T, Σ), I ∈ Inst(S) and J ∈ Inst(T): J is
       a solution for I under M if (I , J) |= Σ


                                                  J ∈ SolM (I )



       This can be extended to sets of instances. Given X ⊆ Inst(S):


                      $\mathrm{Sol}_M(X) = \bigcup_{I \in X} \mathrm{Sol}_M(I)$




M. Arenas   –   Exchanging more than Complete Data - RR2011                      36 / 68
Extending the definition to representation systems

       Given:
            ◮   a mapping M = (S, T, Σ)
            ◮   a representation system R = (W, rep)
            ◮   U, V ∈ W of types S and T, respectively



       Definition (APR11)
       V is an R-solution of U under M if
                                       rep(V) ⊆ SolM (rep(U))



       Or equivalently: V is an R-solution of U if for every J ∈ rep(V),
       there exists I ∈ rep(U) such that J ∈ SolM (I ).
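
       The equivalent formulation translates directly into a membership test. The sketch below assumes that rep(U) and rep(V) are given as finite lists of instances (in general they are infinite) and that the mapping is supplied as a predicate deciding (I, J) |= Σ; the st-tgd P(x) → Q(x) and all names are hypothetical.

```python
# R-solution test: every J in rep(V) must be a solution for some I in rep(U).
# Instances are frozensets of facts; a fact is (relation_name, tuple_of_values).

def is_r_solution(rep_u, rep_v, is_solution):
    """True if for every J in rep_v there is an I in rep_u with (I, J) |= Sigma."""
    return all(any(is_solution(i, j) for i in rep_u) for j in rep_v)


def copy_mapping(i, j):
    """(I, J) |= Sigma for the single st-tgd P(x) -> Q(x)."""
    return all(("Q", args) in j for (rel, args) in i if rel == "P")


REP_U = [frozenset({("P", (1,))}),
         frozenset({("P", (1,)), ("P", (2,))})]
REP_V_GOOD = [frozenset({("Q", (1,))}),
              frozenset({("Q", (1,)), ("Q", (2,))})]
REP_V_BAD = [frozenset()]  # the empty target instance satisfies no source above

print(is_r_solution(REP_U, REP_V_GOOD, copy_mapping))
print(is_r_solution(REP_U, REP_V_BAD, copy_mapping))
```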

M. Arenas   –   Exchanging more than Complete Data - RR2011                37 / 68
Universal solutions




       What is a good solution in this framework?



       Definition (APR11)
       V is a universal R-solution of U under M if
                                       rep(V) = SolM (rep(U))




M. Arenas   –   Exchanging more than Complete Data - RR2011     38 / 68
Strong representation systems

       Let C be a class of mappings.


       Definition (APR11)
       R = (W, rep) is a strong representation system for C if for every
       M ∈ C from S to T, and for every U ∈ W of type S, there exists a
       V ∈ W of type T such that:

                                       rep(V) = SolM (rep(U))



       If R = (W, rep) is a strong representation system, then the
       universal solutions for the representatives in W can be represented
       in the same system.


M. Arenas   –   Exchanging more than Complete Data - RR2011                  39 / 68
Motivating questions



       What is a strong representation system for the class of mappings
       specified by st-tgds?
            ◮   Are instances including nulls enough?


       Can the fundamental data exchange problems be solved in
       polynomial time in this setting?
            ◮   Computing (universal) solutions
            ◮   Computing certain answers




M. Arenas   –   Exchanging more than Complete Data - RR2011               41 / 68
Naive instances

       We have already considered naive instances: Instances with null values
            ◮   Example: Canonical universal solution


       A naive instance I has labeled nulls:
                                                     R(1, n1 )
                                                     R(n1 , 2)
                                                     R(1, n2 )


       The interpretations of I are constructed by replacing nulls by constants:

                        rep(I)      =     {K | µ(I) ⊆ K for some valuation µ}
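
       Membership in rep(I) can be tested by brute force: try every valuation of the nulls into the values that occur in K and check whether the image of I is contained in K. This is only a sketch; encoding nulls as strings starting with "n" and constants as integers is an assumption made for the example.

```python
from itertools import product

# A fact is (relation_name, tuple_of_values). Nulls are strings such as "n1";
# every other value is treated as a constant.

def is_null(value):
    return isinstance(value, str) and value.startswith("n")


def in_rep(naive, k):
    """True if some valuation mu of the nulls satisfies mu(naive) <= k."""
    nulls = sorted({v for (_, args) in naive for v in args if is_null(v)})
    # Each null occurs in some fact, so its value must already occur in k.
    domain = list({v for (_, args) in k for v in args})
    for values in product(domain, repeat=len(nulls)):
        mu = dict(zip(nulls, values))
        image = {(rel, tuple(mu.get(v, v) for v in args))
                 for (rel, args) in naive}
        if image <= k:
            return True
    return False


NAIVE = {("R", (1, "n1")), ("R", ("n1", 2)), ("R", (1, "n2"))}
K_YES = {("R", (1, 5)), ("R", (5, 2))}   # take mu(n1) = mu(n2) = 5
K_NO = {("R", (1, 5)), ("R", (6, 2))}    # n1 would have to be both 5 and 6

print(in_rep(NAIVE, K_YES), in_rep(NAIVE, K_NO))
```

       The search is exponential in the number of nulls, which is acceptable for a toy check but not for real instances.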




M. Arenas   –   Exchanging more than Complete Data - RR2011                        42 / 68
Are naive instances expressive enough?


       Naive instances have been extensively used in data exchange:


       Proposition (FKMP03)
       Let M = (S, T, Σ), where Σ is a set of st-tgds. Then for every
       instance I of S, there exists a naive instance J of T such that:

                                           rep(J ) = SolM (I )


       In fact, the canonical universal solution satisfies the property
       mentioned above.




M. Arenas   –   Exchanging more than Complete Data - RR2011               43 / 68
Are naive instances expressive enough?



       But naive instances are not expressive enough to deal with
       incomplete information in the source instances:



       Proposition (APR11)
       Naive instances are not a strong representation system for the class
       of mappings specified by st-tgds




M. Arenas   –   Exchanging more than Complete Data - RR2011                   44 / 68
Are naive instances expressive enough?


       Example
       Consider a mapping M specified by:
                                Manager(x, y ) → Reports(x, y )
                                Manager(x, x) → SelfManager(x)


       The canonical universal solution for I = {Manager(n, Peter)} under M:
                                       J     = {Reports(n, Peter)}


       But J is not a good solution for I.
            ◮   It cannot represent the fact that if n is given value Peter, then
                SelfManager(Peter) should hold in the target.



M. Arenas   –   Exchanging more than Complete Data - RR2011                         45 / 68
Conditional instances


       What should be added to naive instances to obtain a strong
       representation system?
            ◮   Answer from database theory: Conditions on the nulls


       Conditional instances: Naive instances plus tuple conditions

       A tuple condition is a positive Boolean combination of:
            ◮   equalities and inequalities between nulls, and between nulls
                and constants




M. Arenas   –   Exchanging more than Complete Data - RR2011                    46 / 68
Conditional instances

       Example
                                      R(1, n1 )        n1 = n2
                                      R(n1 , n2 )      n1 ≠ n2 ∨ n2 = 2


       Semantics:
            µ(n1 ) = µ(n2 ) = 2               µ(n1 ) = µ(n2 ) = 3     µ(n1 ) = 2, µ(n2 ) = 3
                   R(1, 2)                           R(1, 3)                   R(2, 3)
                   R(2, 2)


       Interpretations of a conditional instance I:

                        rep(I)      =     {K | µ(I) ⊆ K for some valuation µ}
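
       The three valuations in the example can be replayed mechanically. The sketch assumes that the second tuple condition reads n1 ≠ n2 ∨ n2 = 2, which is the reading consistent with the three results listed on the slide; conditions are encoded as Python predicates over the valuation, an encoding chosen only for illustration.

```python
# A conditional instance is a list of (relation, args, condition) triples.
# mu(I) keeps the tuples whose condition holds under mu and replaces the nulls.

def evaluate(cond_instance, mu):
    """Return mu(I) for a conditional instance I and a valuation mu."""
    return {(rel, tuple(mu.get(v, v) for v in args))
            for (rel, args, cond) in cond_instance
            if cond(mu)}


COND_INSTANCE = [
    ("R", (1, "n1"), lambda mu: mu["n1"] == mu["n2"]),
    ("R", ("n1", "n2"), lambda mu: mu["n1"] != mu["n2"] or mu["n2"] == 2),
]

for mu in ({"n1": 2, "n2": 2}, {"n1": 3, "n2": 3}, {"n1": 2, "n2": 3}):
    print(mu, sorted(evaluate(COND_INSTANCE, mu)))
```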



M. Arenas   –   Exchanging more than Complete Data - RR2011                                    47 / 68
Positive conditional instances




       Many problems are intractable over conditional instances.
            ◮   We also consider a restricted class of conditional instances


       Positive conditional instances: Conditional instances without
       inequalities




M. Arenas   –   Exchanging more than Complete Data - RR2011                    48 / 68
(Positive) conditional instances are enough

       Theorem (APR11)
       Both conditional instances and positive conditional instances are strong
       representation systems for the class of mappings specified by st-tgds.


       Example
       Consider again the mapping M specified by:
                                Manager(x, y ) → Reports(x, y )
                                Manager(x, x) → SelfManager(x)


       The following is a universal solution for I = {Manager(n, Peter)}
                                    Reports(n, Peter)         true
                                    SelfManager(Peter)        n = Peter
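
       A quick sanity check of this example: for a few sample values of n, the instance obtained from the conditional table is a solution for the corresponding source instance, while the naive table {Reports(n, Peter)} fails when n is Peter. The helper names and the sample value "Mary" are hypothetical.

```python
# Facts are (relation_name, tuple_of_values).

def is_solution(source, target):
    """(I, J) |= Sigma for Manager(x, y) -> Reports(x, y)
    and Manager(x, x) -> SelfManager(x)."""
    for (rel, args) in source:
        if rel != "Manager":
            continue
        x, y = args
        if ("Reports", (x, y)) not in target:
            return False
        if x == y and ("SelfManager", (x,)) not in target:
            return False
    return True


def worlds_for(n_value):
    """Source instance, conditional-table target and naive target for n := n_value."""
    source = {("Manager", (n_value, "Peter"))}
    naive_target = {("Reports", (n_value, "Peter"))}
    conditional_target = set(naive_target)
    if n_value == "Peter":  # tuple condition n = Peter
        conditional_target.add(("SelfManager", ("Peter",)))
    return source, conditional_target, naive_target


for value in ("Mary", "Peter"):
    src, cond_tgt, naive_tgt = worlds_for(value)
    print(value, is_solution(src, cond_tgt), is_solution(src, naive_tgt))
```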


M. Arenas   –   Exchanging more than Complete Data - RR2011                       49 / 68
Positive conditional instances are exactly the needed
 representation system

       Positive conditional instances are minimal:

       Theorem (APR11)
       All the following are needed to obtain a strong representation
       system for the class of mappings specified by st-tgds:
            ◮   equalities between nulls
            ◮   equalities between constants and nulls
            ◮   conjunctions and disjunctions



       Conditional instances are enough but not minimal.


M. Arenas   –   Exchanging more than Complete Data - RR2011             50 / 68
Positive conditional instances can be used in practice!

       Let M = (S, T, Σ), where Σ is a set of st-tgds.

       Theorem (APR11)
       There exists a polynomial time algorithm that, given a positive
       conditional instance I over S, computes a positive conditional instance
       J over T that is a universal solution for I under M.



       Let Q be a union of conjunctive queries over T.

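                      $Q(\mathcal{J}) = \bigcap_{J \in \mathrm{rep}(\mathcal{J})} Q(J)$

                      $\mathrm{certain}_M(Q, \mathcal{I}) = \bigcap_{\mathcal{J} \text{ is a solution for } \mathcal{I} \text{ under } M} Q(\mathcal{J})$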
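
       Both definitions are intersections of query answers over a family of instances. The sketch below computes such an intersection over a small, hand-picked list of instances; this is only an approximation of the definition, since rep(·) is infinite in general. The two queries and the sample instances are assumptions for illustration.

```python
# Certain answers over a finite sample of possible instances.

def certain_answers(query, worlds):
    """Intersect the answers of `query` over a non-empty list of instances."""
    results = [set(query(world)) for world in worlds]
    if not results:
        raise ValueError("need at least one instance")
    return set.intersection(*results)


def bosses(instance):
    """Q(y) :- Reports(x, y)"""
    return {args[1] for (rel, args) in instance if rel == "Reports"}


def reporters(instance):
    """Q(x) :- Reports(x, y)"""
    return {args[0] for (rel, args) in instance if rel == "Reports"}


WORLDS = [
    {("Reports", ("Mary", "Peter"))},
    {("Reports", ("Peter", "Peter")), ("SelfManager", ("Peter",))},
]

print(certain_answers(bosses, WORLDS))
print(certain_answers(reporters, WORLDS))
```

       Peter is a certain answer to the first query because he appears in every sampled instance; the second query has no certain answer because the reporter depends on the value given to the null.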




M. Arenas   –   Exchanging more than Complete Data - RR2011                                   51 / 68
Positive conditional instances can be used in practice!



       Theorem (APR11)
       There exists a polynomial time algorithm that, given a positive
       conditional instance I over S, computes certainM (Q, I).



       The same result holds for the class of unions of conjunctive queries with
       at most one inequality per disjunct.
            ◮   The other important class of queries in the data exchange area for
                which certain answers can be computed in polynomial time




M. Arenas   –   Exchanging more than Complete Data - RR2011                          52 / 68
The semantics of knowledge bases is given by sets of
 instances



       Knowledge base over S: (I , Γ) such that
            ◮   I ∈ Inst(S)
            ◮   Γ a set of rules over S


       Semantics: finite models

                     Mod(I , Γ) = {K ∈ Inst(S) | I ⊆ K and K |= Γ}
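
       Checking whether a given instance K belongs to Mod(I, Γ) amounts to two containment tests. The sketch hard-codes the family rules from the earlier example as a function returning the facts they force; this encoding and the names are assumptions, not the talk's formalism.

```python
# K is in Mod(I, Gamma) iff I <= K and K is closed under the rules in Gamma.
# Facts are (relation_name, (x, y)).

def forced_by_family_rules(k):
    """Facts that Father->Parent, Mother->Parent and the Grandparent rule
    require to be present, given the facts already in k."""
    father = {a for (r, a) in k if r == "Father"}
    mother = {a for (r, a) in k if r == "Mother"}
    parent = {a for (r, a) in k if r == "Parent"}
    forced = {("Parent", a) for a in father | mother}
    forced |= {("Grandparent", (x, z))
               for (x, y1) in parent
               for (y2, z) in parent
               if y1 == y2}
    return forced


def is_model(k, i, forced_by_rules):
    """True if I is contained in K and K satisfies every rule."""
    return i <= k and forced_by_rules(k) <= k


I = {("Father", ("Andy", "Bob")), ("Father", ("Bob", "Danny"))}
K = I | {("Parent", ("Andy", "Bob")), ("Parent", ("Bob", "Danny")),
         ("Grandparent", ("Andy", "Danny"))}

print(is_model(K, I, forced_by_family_rules))
print(is_model(I, I, forced_by_family_rules))
```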




M. Arenas   –   Exchanging more than Complete Data - RR2011          54 / 68
We can apply our formalism to knowledge bases


       (I2 , Γ2 ) is a KB-solution for (I1 , Γ1 ) under M if:


                                Mod(I2 , Γ2 ) ⊆ SolM (Mod(I1 , Γ1 ))



       (I2 , Γ2 ) is a universal KB-solution for (I1 , Γ1 ) under M if:


                                Mod(I2 , Γ2 ) = SolM (Mod(I1 , Γ1 ))




M. Arenas   –   Exchanging more than Complete Data - RR2011               55 / 68
Motivating questions




       Same as for the case of instances with incomplete information.
            ◮   Constructing universal KB-solutions
            ◮   Answering target queries


       New fundamental problem: Construct solutions including as much
       implicit knowledge as possible.




M. Arenas   –   Exchanging more than Complete Data - RR2011             56 / 68
What are good knowledge-base solutions?

       First alternative: universal KB-solutions

       But there exist some other KB-solutions desirable to materialize
            ◮   Minimality comes into play


       Given sets X , Y of instances:
            ◮   X ≡min Y if X and Y coincide in the minimal instances under ⊆



       Definition
       (I2 , Γ2 ) is a minimal KB-solution of (I1 , Γ1 ) under M if:

                               Mod(I2 , Γ2 )       ≡min       SolM (Mod(I1 , Γ1 ))
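
       For finite families of instances, the relation ≡min can be computed directly: keep the ⊆-minimal members of each family and compare. The sketch assumes instances are frozensets and that both families are finite, which is not the case for Mod and Sol in general.

```python
# X ==min Y: both families have the same minimal instances under inclusion.

def minimal_instances(family):
    """Members of `family` that have no proper subset in `family`."""
    family = set(family)
    return {x for x in family if not any(y < x for y in family)}


def equiv_min(xs, ys):
    return minimal_instances(xs) == minimal_instances(ys)


A, B, C = ("G", ("a",)), ("G", ("b",)), ("G", ("c",))
X = {frozenset({A}), frozenset({A, B})}
Y = {frozenset({A}), frozenset({A, C}), frozenset({A, B, C})}
Z = {frozenset({B})}

print(equiv_min(X, Y), equiv_min(X, Z))
```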



M. Arenas   –   Exchanging more than Complete Data - RR2011                          57 / 68
Two requirements to construct minimal knowledge-base
 solutions

       Given (I1 , Γ1 ) and M, when constructing a minimal KB-solution
       (I2 , Γ2 ) we would like:


            1. Γ2 to only depend on Γ1 and M:

                                             Γ2 is safe for Γ1 and M


       Definition
       Γ2 is safe for Γ1 and M if for every I1 there exists I2 such that:

                   (I2 , Γ2 ) is a minimal KB-solution of (I1 , Γ1 ) under M


M. Arenas   –   Exchanging more than Complete Data - RR2011                    58 / 68
Two requirements to construct minimal knowledge-base
 solutions



            2. Γ2 to be as informative as possible (thus minimizing the size
               of I2 ):


       Definition
       Γ2 is optimal-safe if for every other safe set Γ′ :

                                                     Γ2 |= Γ′




M. Arenas   –   Exchanging more than Complete Data - RR2011                    59 / 68
Computing minimal KB-solutions

       To obtain algorithms for computing minimal KB-solutions, we need
       to specify the language used in knowledge bases.
            ◮   Full st-tgd:

                      $\forall \bar{x}\, \forall \bar{y}\, (\varphi(\bar{x}, \bar{y}) \rightarrow \psi(\bar{x}))$




       Theorem (APR11)
       There exists a polynomial-time algorithm that, given
       M = (S, T, Σ), where Σ is a set of full st-tgds, and given a set Γ1
       of full tgds over S, computes a set Γ2 of second-order logic
       sentences over T that is optimal-safe for Γ1 and M.


M. Arenas   –   Exchanging more than Complete Data - RR2011                  60 / 68
Computing minimal KB-solutions

       Unfortunately, first-order logic is not expressive enough.


       Theorem (APR11)
       There exist M = (S, T, Σ), where Σ is a set of full st-tgds, and a
       set Γ1 of full tgds over S such that:

                        no FO-sentence is optimal-safe for Γ1 and M.



       How can we deal with these problems in practice?
            ◮   We need to restrict the language used to specify knowledge
                bases: Description logics [ABC11]



M. Arenas   –   Exchanging more than Complete Data - RR2011                  61 / 68
We can exchange more than complete data

       We propose a general formalism to exchange representation
       systems
            ◮   Applications to incomplete instances
            ◮   Applications to knowledge bases


       Next step: Apply our general setting to the Semantic Web
            ◮   Semantic Web data has nulls (blank nodes)
            ◮   Semantic Web specifications have rules (RDFS, OWL)


       Lots of interesting problems to solve if knowledge bases are
       specified by means of description logics.
            ◮   Better results can be obtained


M. Arenas   –   Exchanging more than Complete Data - RR2011           63 / 68
Thank you!




Bibliography



            [ABC11]         M. Arenas, E. Botoeva, D. Calvanese. Knowledge Base Exchange.
                            DL 2011.

            [APR11]         M. Arenas, J. Pérez, J. Reutter. Data Exchange beyond Complete
                            Data. PODS 2011.

            [B03]           P. A. Bernstein. Applying Model Management to Classical Meta
                            Data Problems. CIDR 2003.

            [FKMP03]        R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa. Data Exchange:
                            Semantics and Query Answering. ICDT 2003.




Bonus track: Computation of solutions and its associated decision problem


       Decision problem: Check-KB-Sol
            Input:         M = (S, T, Σ), where Σ is a set of st-tgds
                           (I1, Γ1): a KB over S, with Γ1 a set of tgds
                           (I2, Γ2): a KB over T, with Γ2 a set of tgds

            Output:        Is (I2, Γ2) a KB-solution of (I1, Γ1) under M?


       Theorem [APR11]
                   Check-KB-Sol is undecidable (even for a fixed M).


Bonus track: Computation of solutions and its associated decision problem


       Undecidability is a consequence of allowing existential quantifiers (∃)
       in the tgds that specify knowledge bases.
            ◮   We need to restrict the input


       Check-Full-KB-Sol: Γ1 and Γ2 are assumed to be sets of full tgds
       (tgds without existential quantifiers)
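
       A minimal sketch of why full tgds are easier to handle: since they
       introduce no fresh null values, repeatedly applying them to a finite
       instance reaches a fixpoint over the values already present. This is
       only an illustration, not the decision procedure of [APR11]. The
       encoding (facts as tuples, variables as strings starting with "?")
       and the relation name Sub are assumptions made for the example.

```python
def is_var(term):
    # Convention of this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")


def match(atom, fact, binding):
    """Extend `binding` so that `atom` maps onto `fact`; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def homomorphisms(body, facts, binding=None):
    """Yield every variable binding that maps all body atoms into `facts`."""
    binding = {} if binding is None else binding
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from homomorphisms(body[1:], facts, extended)


def saturate(instance, full_tgds):
    """Close `instance` under full tgds by applying them until nothing changes."""
    facts = set(instance)
    changed = True
    while changed:
        changed = False
        for body, head in full_tgds:
            # Materialize the bindings first: `facts` is mutated below.
            for binding in list(homomorphisms(body, facts)):
                for atom in head:
                    # In a full tgd every head variable occurs in the body,
                    # so the lookup in `binding` always succeeds.
                    fact = (atom[0],) + tuple(
                        binding[t] if is_var(t) else t for t in atom[1:]
                    )
                    if fact not in facts:
                        facts.add(fact)
                        changed = True
    return facts


# Hypothetical example: Sub(x, y) ∧ Sub(y, z) → Sub(x, z)
transitivity = ([("Sub", "?x", "?y"), ("Sub", "?y", "?z")], [("Sub", "?x", "?z")])
source = {("Sub", "a", "b"), ("Sub", "b", "c"), ("Sub", "c", "d")}
closure = saturate(source, [transitivity])
```

       The loop terminates because only finitely many facts can be built
       from the values occurring in the input. With existential quantifiers,
       each rule application could create a new null, and no such bound holds.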




Bonus track: Computation of solutions and its associated decision problem


       Theorem [APR11]
                   Check-Full-KB-Sol is EXPTIME-complete.


       Theorem [APR11]
       If M = (S, T, Σ) is fixed:

                   Check-Full-KB-Sol is Δ^P_2[O(log n)]-complete.


            Δ^P_2[O(log n)]:   P^NP with a logarithmic number of
                               calls to the NP oracle.
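
       To illustrate what "a logarithmic number of calls to the NP oracle"
       means, here is a sketch on a textbook problem unrelated to data
       exchange: finding the size of a largest clique by binary search.
       It is not the algorithm for Check-Full-KB-Sol. The function names
       are hypothetical, and has_clique is a brute-force stand-in for the
       oracle.

```python
from itertools import combinations


def has_clique(edges, nodes, k):
    """Stand-in for an NP oracle: does the graph contain a clique of size k?"""
    return any(
        all(frozenset(pair) in edges for pair in combinations(group, 2))
        for group in combinations(nodes, k)
    )


def max_clique_size(nodes, edge_list):
    """Binary search on the clique size; returns (size, number of oracle calls)."""
    edges = {frozenset(e) for e in edge_list}
    calls = 0
    low, high = 0, len(nodes)  # invariant: a clique of size `low` exists
    while low < high:
        mid = (low + high + 1) // 2
        calls += 1
        if has_clique(edges, nodes, mid):
            low = mid
        else:
            high = mid - 1
    return low, calls


# Triangle 1-2-3 plus a pendant edge 3-4.
result = max_clique_size([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (3, 4)])
```

       Outside the oracle, the procedure does only polynomial work, and the
       number of oracle questions grows logarithmically with the number of
       nodes, which is the pattern the class Δ^P_2[O(log n)] captures.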


Exchanging more than Complete Data

  • 1. Exchanging more than Complete Data Marcelo Arenas PUC Chile Joint work with Jorge P´rez (U. de Chile) and Juan Reutter (U. Edinburgh) e M. Arenas – Exchanging more than Complete Data - RR2011 1 / 68
  • 2. Outline: First part ◮ The data exchange problem ◮ Some fundamental results in relational data exchange ◮ The need for a more general data exchange framework ◮ Two important scenarios: Incomplete databases (open-world databases: RDF) and knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 2 / 68
  • 3. Outline: First part ◮ The data exchange problem ◮ Some fundamental results in relational data exchange ◮ The need for a more general data exchange framework ◮ Two important scenarios: Incomplete databases (open-world databases: RDF) and knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 3 / 68
  • 4. The problem of data exchange Given: A source schema S, a target schema T and a specification Σ of the relationship between these schemas Data exchange: Problem of materializing an instance of T given an instance of S ◮ Target instance should reflect the source data as accurately as possible, given the constraints imposed by Σ and T ◮ It should be efficiently computable ◮ It should allow one to evaluate queries on the target in a way that is semantically consistent with the source data M. Arenas – Exchanging more than Complete Data - RR2011 4 / 68
  • 5. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 6. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 7. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 8. Data exchange in a picture Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 9. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 10. Data exchange: Some fundamental questions What are the challenges in the area? ◮ What is a good language for specifying the relationship between source and target data? ◮ Expressiveness versus complexity ◮ What is a good instance to materialize? ◮ What does it mean to answer a query over target data? ◮ How do we answer queries over target data? Can we do this efficiently? M. Arenas – Exchanging more than Complete Data - RR2011 6 / 68
  • 11. Exchanging relational data The data exchange problem has been extensively studied in the relational world. ◮ It has also been commercially implemented: IBM Clio Relational data exchange setting: ◮ Source and target schemas: Relational schemas ◮ Relationship between source and target schemas: Source-to-target tuple-generating dependencies (st-tgds) Semantics of data exchange has been precisely defined. ◮ Efficient algorithms for materializing target instances and for answering queries over the target schema have been developed M. Arenas – Exchanging more than Complete Data - RR2011 7 / 68
  • 12. Schema mapping: The key component in relational data exchange Schema mapping: M = (S, T, Σ) ◮ S and T are disjoint relational schemas ◮ Σ is a finite set of st-tgds: ∀¯∀¯ (ϕ(¯ , y ) → ∃¯ ψ(¯ , z )) x y x ¯ z x ¯ ϕ(¯, y ): conjunction of relational atomic formulas over S x ¯ ψ(¯, z ): conjunction of relational atomic formulas over T x ¯ M. Arenas – Exchanging more than Complete Data - RR2011 8 / 68
  • 13. Relational schema mappings: An example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: ∀x Employee(x) → ∃y Dept(x, y ) M. Arenas – Exchanging more than Complete Data - RR2011 9 / 68
  • 14. Relational schema mappings: An example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: ∀x Employee(x) → ∃y Dept(x, y ) Note We omit universal quantifiers in st-tgds: Employee(x) → ∃y Dept(x, y ) M. Arenas – Exchanging more than Complete Data - RR2011 9 / 68
  • 15. Relational data exchange problem Fixed: M = (S, T, Σ) Problem: Given instance I of S, find an instance J of T such that (I , J) satisfies Σ ◮ (I , J) satisfies ϕ(¯ , y ) → ∃¯ ψ(¯ , z ) if whenever I satisfies x ¯ z x ¯ a ¯ ϕ(¯, b), there is a tuple c such that J satisfies ψ(¯, c ) ¯ a ¯ M. Arenas – Exchanging more than Complete Data - RR2011 10 / 68
  • 16. Relational data exchange problem Fixed: M = (S, T, Σ) Problem: Given instance I of S, find an instance J of T such that (I , J) satisfies Σ ◮ (I , J) satisfies ϕ(¯ , y ) → ∃¯ ψ(¯ , z ) if whenever I satisfies x ¯ z x ¯ a ¯ ϕ(¯, b), there is a tuple c such that J satisfies ψ(¯, c ) ¯ a ¯ Notation J is a solution for I under M ◮ SolM (I ): Set of solutions for I under M M. Arenas – Exchanging more than Complete Data - RR2011 10 / 68
  • 17. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 18. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 19. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 20. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} J3 : {Dept(Peter,1), Dept(John,1)} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 21. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} J3 : {Dept(Peter,1), Dept(John,1)} J4 : {Dept(Peter,n1 )} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 22. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} J3 : {Dept(Peter,1), Dept(John,1)} J4 : {Dept(Peter,n1 )} J5 : {Dept(Peter,n1 ), Dept(Peter,n2 )} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 23. Canonical universal solution Algorithm (Chase) Input : M = (S, T, Σ) and an instance I of S Output : Canonical universal solution J ⋆ for I under M let J ⋆ := empty instance of T for every ϕ(¯ , y ) → ∃¯ ψ(¯ , z ) in Σ do x ¯ z x ¯ a ¯ a ¯ for every ¯, b such that I satisfies ϕ(¯, b) do create a fresh tuple n of pairwise distinct null values ¯ insert ψ(¯, n) into J a ¯ ⋆ M. Arenas – Exchanging more than Complete Data - RR2011 12 / 68
  • 24. Canonical universal solution: Example Example Consider mapping M specified by dependency: Employee(x) → ∃y Dept(x, y ) Canonical universal solution for I = {Employee(Peter), Employee(John)}: ◮ For a = Peter do ◮ Create a fresh null value n1 ◮ Insert Dept(Peter, n1 ) into J ⋆ ◮ For a = John do ◮ Create a fresh null value n2 ◮ Insert Dept(John, n2 ) into J ⋆ Result: J ⋆ = {Dept(Peter, n1 ), Dept(John, n2 )} M. Arenas – Exchanging more than Complete Data - RR2011 13 / 68
  • 25. Query answering in data exchange Given: Mapping M, source instance I and query Q over the target schema ◮ What does it mean to answer Q? M. Arenas – Exchanging more than Complete Data - RR2011 14 / 68
  • 26. Query answering in data exchange Given: Mapping M, source instance I and query Q over the target schema ◮ What does it mean to answer Q? Definition (Certain answers) certainM (Q, I ) = Q(J) J is a solution for I under M M. Arenas – Exchanging more than Complete Data - RR2011 14 / 68
  • 27. Certain answers: Example Example Consider mapping M specified by: Employee(x) → ∃y Dept(x, y ) Given instance I = {Employee(Peter)}: certainM (∃y Dept(x, y ), I ) = {Peter} certainM (Dept(x, y ), I ) = ∅ M. Arenas – Exchanging more than Complete Data - RR2011 15 / 68
  • 28. Query rewriting: An approach for answering queries How can we compute certain answers? ◮ Na¨ algorithm does not work: infinitely many solutions ıve M. Arenas – Exchanging more than Complete Data - RR2011 16 / 68
  • 29. Query rewriting: An approach for answering queries How can we compute certain answers? ◮ Na¨ algorithm does not work: infinitely many solutions ıve Approach proposed in [FKMP03]: Query Rewriting Given a mapping M and a target query Q, compute a query Q ⋆ such that for every source instance I with canonical universal solution J ⋆ : certainM (Q, I ) = Q ⋆ (J ⋆ ) M. Arenas – Exchanging more than Complete Data - RR2011 16 / 68
  • 30. Query rewriting over the canonical universal solution Theorem (FKMP03) Given a mapping M specified by st-tgds and a union of conjunctive queries Q, there exists a query Q ⋆ such that for every source instance I with canonical universal solution J ⋆ : certainM (Q, I ) = Q ⋆ (J ⋆ ) M. Arenas – Exchanging more than Complete Data - RR2011 17 / 68
  • 31. Query rewriting over the canonical universal solution Theorem (FKMP03) Given a mapping M specified by st-tgds and a union of conjunctive queries Q, there exists a query Q ⋆ such that for every source instance I with canonical universal solution J ⋆ : certainM (Q, I ) = Q ⋆ (J ⋆ ) Proof idea: Assume that C(a) holds whenever a is a constant. Then: Q ⋆ (x1 , . . . , xm ) = C(x1 ) ∧ · · · ∧ C(xm ) ∧ Q(x1 , . . . , xm ) M. Arenas – Exchanging more than Complete Data - RR2011 17 / 68
  • 32. Computing certain answers: Complexity Data complexity: Data exchange setting and query are considered to be fixed. Corollary (FKMP03) For mappings given by st-tgds, certain answers for UCQ can be computed in polynomial time (data complexity) M. Arenas – Exchanging more than Complete Data - RR2011 18 / 68
  • 33. Relational data exchange: Some lessons learned Key steps in the development of the area: ◮ Definition of schema mappings: Precise syntax and semantics ◮ Definition of the notion of solution ◮ Identification of good solutions ◮ Polynomial time algorithms for materializing good solutions ◮ Definition of target queries: Precise semantics ◮ Polynomial time algorithms for computing certain answers for UCQ M. Arenas – Exchanging more than Complete Data - RR2011 19 / 68
  • 34. Relational data exchange: Some lessons learned Key steps in the development of the area: ◮ Definition of schema mappings: Precise syntax and semantics ◮ Definition of the notion of solution ◮ Identification of good solutions ◮ Polynomial time algorithms for materializing good solutions ◮ Definition of target queries: Precise semantics ◮ Polynomial time algorithms for computing certain answers for UCQ Creating schema mappings is a time consuming and expensive process ◮ Manual or semi-automatic process in general M. Arenas – Exchanging more than Complete Data - RR2011 19 / 68
  • 35. Outline: First part ◮ The data exchange problem ◮ Some fundamental results in relational data exchange ◮ The need for a more general data exchange framework ◮ Two important scenarios: Incomplete databases (open-world databases: RDF) and knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 20 / 68
  • 36. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 37. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 38. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU ? M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 39. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU ? We need some operators for schema mappings M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 40. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU = ΣST ◦ ΣTU We need some operators for schema mappings ◮ Composition in the above case M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 41. Metadata management Contributions mentioned in the previous slides are just a first step towards the development of a general framework for data exchange. In fact, as pointed in [B03], many information system problems involve not only the design and integration of complex application artifacts, but also their subsequent manipulation. M. Arenas – Exchanging more than Complete Data - RR2011 22 / 68
  • 42. Metadata management This has motivated the need for the development of a general infrastructure for managing schema mappings. The problem of managing schema mappings is called metadata management. High-level algebraic operators, such as compose, are used to manipulate mappings. ◮ What other operators are needed? M. Arenas – Exchanging more than Complete Data - RR2011 23 / 68
  • 43. An inverse operator is also needed [Diagram: schemas S, T, U, V with mappings Σ_ST (S to T), Σ_TU (T to U) and Σ_SV (S to V). Question: Σ_VS = ? Shown: Σ_SV^{-1}; Σ_SV^{-1} ◦ Σ_ST; (Σ_SV^{-1} ◦ Σ_ST) ◦ Σ_TU]
  • 44. An inverse operator is also needed [Diagram: schemas S, T, U, V with mappings Σ_ST (S to T), Σ_TU (T to U) and Σ_SV (S to V). Question: Σ_VS = ? Shown: Σ_SV^{-1}; Σ_SV^{-1} ◦ Σ_ST; (Σ_SV^{-1} ◦ Σ_ST) ◦ Σ_TU]
  • 45. An inverse operator is also needed [Diagram: schemas S, T, U, V with mappings Σ_ST (S to T), Σ_TU (T to U) and Σ_SV (S to V). Σ_VS = Σ_SV^{-1}; Σ_SV^{-1} ◦ Σ_ST; (Σ_SV^{-1} ◦ Σ_ST) ◦ Σ_TU]
  • 46. An inverse operator is also needed [Diagram: schemas S, T, U, V with mappings Σ_ST (S to T), Σ_TU (T to U) and Σ_SV (S to V). Σ_VS = Σ_SV^{-1}; Σ_SV^{-1} ◦ Σ_ST; (Σ_SV^{-1} ◦ Σ_ST) ◦ Σ_TU]
  • 47. An inverse operator is also needed [Diagram: schemas S, T, U, V with mappings Σ_ST (S to T), Σ_TU (T to U) and Σ_SV (S to V). Σ_VS = Σ_SV^{-1}; Σ_SV^{-1} ◦ Σ_ST; (Σ_SV^{-1} ◦ Σ_ST) ◦ Σ_TU] Composition and inverse operators have to be combined
  • 48. An inverse operator is also needed [Diagram: schemas S, T, U, V with mappings Σ_ST (S to T), Σ_TU (T to U) and Σ_SV (S to V). Σ_VS = Σ_SV^{-1}; Σ_SV^{-1} ◦ Σ_ST; (Σ_SV^{-1} ◦ Σ_ST) ◦ Σ_TU] Composition and inverse operators have to be combined
  • 49. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68
  • 50. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. ◮ Key observation: A target instance of a mapping can be the source instance of another mapping M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68
  • 51. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. ◮ Key observation: A target instance of a mapping can be the source instance of another mapping ◮ Source instances may contain null values
  • 52. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. ◮ Key observation: A target instance of a mapping can be the source instance of another mapping ◮ Source instances may contain null values There is a need for a data exchange framework that can handle databases with incomplete information.
  • 53. Data exchange in the RDF world There is an increasing interest in publishing relational data as RDF ◮ Resulted in the creation of the W3C RDB2RDF Working Group The problem of translating relational data into RDF can be seen as a data exchange problem ◮ Schema mappings can be used to describe how the relational data is to be mapped into RDF M. Arenas – Exchanging more than Complete Data - RR2011 26 / 68
  • 54. Data exchange in the RDF world There is an increasing interest in publishing relational data as RDF ◮ Resulted in the creation of the W3C RDB2RDF Working Group The problem of translating relational data into RDF can be seen as a data exchange problem ◮ Schema mappings can be used to describe how the relational data is to be mapped into RDF But there is a mismatch here: A relational database under a closed-world semantics is to be translated into an RDF graph under an open-world semantics ◮ There is a need for a data exchange framework that can handle both databases with complete and incomplete information M. Arenas – Exchanging more than Complete Data - RR2011 26 / 68
  • 55. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 56. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us Question to answer: Is a mapping invertible? M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 57. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us Question to answer: Is a mapping invertible? ◮ This time an RDF graph is to be translated into a relational database! M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 58. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us Question to answer: Is a mapping invertible? ◮ This time an RDF graph is to be translated into a relational database! ◮ We want to have a unifying framework for all these cases M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 59. But these are not the only reasons . . . Nowadays several applications use knowledge bases to represent data. ◮ A knowledge base has not only data but also rules that allow one to infer new data ◮ In the Semantic Web: RDFS and OWL ontologies
  • 60. But these are not the only reasons . . . Nowadays several applications use knowledge bases to represent data. ◮ A knowledge base has not only data but also rules that allow one to infer new data ◮ In the Semantic Web: RDFS and OWL ontologies In a data exchange application over the Semantic Web: The input is a mapping and a source specification including data and rules, and the output is a target specification also including data and rules
  • 61. But these are not the only reasons . . . Nowadays several applications use knowledge bases to represent data. ◮ A knowledge base has not only data but also rules that allow one to infer new data ◮ In the Semantic Web: RDFS and OWL ontologies In a data exchange application over the Semantic Web: The input is a mapping and a source specification including data and rules, and the output is a target specification also including data and rules There is a need for a data exchange framework that can handle knowledge bases.
  • 62. Knowledge exchange: A more general data exchange framework is needed Example Assume given the following source knowledge base: Data: Father = {(Andy, Bob), (Bob, Danny), (Danny, Eddie)}, Mother = {(Carrie, Bob)} Rules: Father(x, y) → Parent(x, y); Mother(x, y) → Parent(x, y); Parent(x, y) ∧ Parent(y, z) → Grandparent(x, z)
  • 63. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Given a mapping: Father(x, y ) → F(x, y ) Grandparent(x, y ) → G(x, y ) What is a good translation of the initial knowledge base? M. Arenas – Exchanging more than Complete Data - RR2011 30 / 68
  • 64. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Given a mapping: Father(x, y) → F(x, y); Grandparent(x, y) → G(x, y) What is a good translation of the initial knowledge base? Data: F = {(Andy, Bob), (Bob, Danny), (Danny, Eddie)}, G = {(Andy, Danny), (Carrie, Danny), (Bob, Eddie)} Rules: ∅
  • 65. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: Father(x, y ) → Parent(x, y ) Mother(x, y ) → Parent(x, y ) Parent(x, y ) ∧ Parent(y , z) → Grandparent(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 66. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: Mother(x, y ) → Parent(x, y ) Parent(x, y ) ∧ Parent(y , z) → Grandparent(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 67. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: Parent(x, y ) ∧ Parent(y , z) → Grandparent(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 68. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y ) ∧ F(y , z) → G(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 69. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y ) ∧ F(y , z) → G(x, z) What data should we materialize? M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 70. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y) ∧ F(y, z) → G(x, z) What data should we materialize? F = {(Andy, Bob), (Bob, Danny), (Danny, Eddie)}, G = {(Andy, Danny), (Carrie, Danny), (Bob, Eddie)}
  • 71. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y) ∧ F(y, z) → G(x, z) What data should we materialize? F = {(Andy, Bob), (Bob, Danny), (Danny, Eddie)}, G = {(Carrie, Danny)}
  • 72. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y) ∧ F(y, z) → G(x, z) What data should we materialize? F = {(Andy, Bob), (Bob, Danny), (Danny, Eddie)}, G = {(Carrie, Danny)} Is this a good translation? Why?
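The example above can be replayed in a few lines of Python. This is only an illustrative sketch: the set-of-pairs encoding and the helper `join` are assumptions of this sketch, not part of the slides. It derives Grandparent from the source rules, applies the mapping, and then checks which G-facts the target rule F(x, y) ∧ F(y, z) → G(x, z) cannot re-derive.

```python
# Source data from the example (binary relations as sets of pairs).
father = {("Andy", "Bob"), ("Bob", "Danny"), ("Danny", "Eddie")}
mother = {("Carrie", "Bob")}


def join(r, s):
    """Return {(x, z) | (x, y) in r and (y, z) in s} (hypothetical helper)."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


# Source rules: Father -> Parent, Mother -> Parent,
# Parent(x, y) and Parent(y, z) -> Grandparent(x, z).
parent = father | mother
grandparent = join(parent, parent)

# Mapping: Father(x, y) -> F(x, y), Grandparent(x, y) -> G(x, y).
f_full = set(father)
g_full = set(grandparent)

# Target rule: F(x, y) and F(y, z) -> G(x, z).
g_derivable = join(f_full, f_full)

# Only the G-facts that the target rule cannot derive must be stored.
g_stored = g_full - g_derivable
print(sorted(g_full))
print(sorted(g_stored))
```

Under this encoding, the only G-fact that has to be materialized is the one that depends on Mother, which the mapping does not transfer.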
  • 73. One can exchange more than complete data ◮ In data exchange one starts with a database instance (with complete information). ◮ What if we have an initial object that has several interpretations? ◮ A representation of a set of possible instances ◮ We propose a new general formalism to exchange representations of possible instances ◮ We apply it to the problems of exchanging instances with incomplete information and exchanging knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 32 / 68
  • 74. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 75. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 76. Representation systems A representation system R = (W, rep) consists of: ◮ a set W of representatives ◮ a function rep that assigns a set of instances to every element in W rep(V) = {I1 , I2 , I3 , . . .} for every V ∈ W Uniformity assumption: For every V ∈ W, there exists a relational schema S (the type of V) such that rep(V) ⊆ Inst(S) M. Arenas – Exchanging more than Complete Data - RR2011 35 / 68
  • 77. Representation systems A representation system R = (W, rep) consists of: ◮ a set W of representatives ◮ a function rep that assigns a set of instances to every element in W rep(V) = {I1 , I2 , I3 , . . .} for every V ∈ W Uniformity assumption: For every V ∈ W, there exists a relational schema S (the type of V) such that rep(V) ⊆ Inst(S) Incomplete instances and knowledge bases are representation systems M. Arenas – Exchanging more than Complete Data - RR2011 35 / 68
  • 78. In classical data exchange we consider only complete data Recall that given M = (S, T, Σ), I ∈ Inst(S) and J ∈ Inst(T): J is a solution for I under M if (I, J) ⊨ Σ, written J ∈ Sol_M(I)
  • 79. In classical data exchange we consider only complete data Recall that given M = (S, T, Σ), I ∈ Inst(S) and J ∈ Inst(T): J is a solution for I under M if (I, J) ⊨ Σ, written J ∈ Sol_M(I) This can be extended to sets of instances. Given X ⊆ Inst(S): Sol_M(X) = ⋃_{I ∈ X} Sol_M(I)
  • 80. Extending the definition to representation systems Given: ◮ a mapping M = (S, T, Σ) ◮ a representation system R = (W, rep) ◮ U, V ∈ W of types S and T, respectively M. Arenas – Exchanging more than Complete Data - RR2011 37 / 68
  • 81. Extending the definition to representation systems Given: ◮ a mapping M = (S, T, Σ) ◮ a representation system R = (W, rep) ◮ U, V ∈ W of types S and T, respectively Definition (APR11) V is an R-solution of U under M if rep(V) ⊆ SolM (rep(U)) M. Arenas – Exchanging more than Complete Data - RR2011 37 / 68
  • 82. Extending the definition to representation systems Given: ◮ a mapping M = (S, T, Σ) ◮ a representation system R = (W, rep) ◮ U, V ∈ W of types S and T, respectively Definition (APR11) V is an R-solution of U under M if rep(V) ⊆ SolM (rep(U)) Or equivalently: V is an R-solution of U if for every J ∈ rep(V), there exists I ∈ rep(U) such that J ∈ SolM (I ). M. Arenas – Exchanging more than Complete Data - RR2011 37 / 68
  • 83. Universal solutions What is a good solution in this framework? M. Arenas – Exchanging more than Complete Data - RR2011 38 / 68
  • 84. Universal solutions What is a good solution in this framework? Definition (APR11) V is a universal R-solution of U under M if rep(V) = Sol_M(rep(U))
  • 85. Strong representation systems Let C be a class of mappings. M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68
  • 86. Strong representation systems Let C be a class of mappings. Definition (APR11) R = (W, rep) is a strong representation system for C if for every M∈C and for every U ∈ W , there exists a V ∈W : rep(V) = SolM (rep(U)) M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68
  • 87. Strong representation systems Let C be a class of mappings. Definition (APR11) R = (W, rep) is a strong representation system for C if for every M ∈ C from S to T, and for every U ∈ W , there exists a V ∈W : rep(V) = SolM (rep(U)) M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68
  • 88. Strong representation systems Let C be a class of mappings. Definition (APR11) R = (W, rep) is a strong representation system for C if for every M ∈ C from S to T, and for every U ∈ W of type S, there exists a V ∈W : rep(V) = SolM (rep(U)) M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68
  • 89. Strong representation systems Let C be a class of mappings. Definition (APR11) R = (W, rep) is a strong representation system for C if for every M ∈ C from S to T, and for every U ∈ W of type S, there exists a V ∈ W of type T: rep(V) = SolM (rep(U)) M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68
  • 90. Strong representation systems Let C be a class of mappings. Definition (APR11) R = (W, rep) is a strong representation system for C if for every M ∈ C from S to T, and for every U ∈ W of type S, there exists a V ∈ W of type T: rep(V) = SolM (rep(U)) If R = (W, rep) is a strong representation system, then the universal solutions for the representatives in W can be represented in the same system. M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68
  • 91. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 92. Motivating questions What is a strong representation system for the class of mappings specified by st-tgds? ◮ Are instances including nulls enough? Can the fundamental data exchange problems be solved in polynomial time in this setting? ◮ Computing (universal) solutions ◮ Computing certain answers M. Arenas – Exchanging more than Complete Data - RR2011 41 / 68
  • 93. Naive instances We have already considered naive instances: Instances with null values ◮ Example: Canonical universal solution A naive instance I has labeled nulls: R(1, n1 ) R(n1 , 2) R(1, n2 ) M. Arenas – Exchanging more than Complete Data - RR2011 42 / 68
  • 94. Naive instances We have already considered naive instances: Instances with null values ◮ Example: Canonical universal solution A naive instance I has labeled nulls: R(1, n1 ) R(n1 , 2) R(1, n2 ) The interpretations of I are constructed by replacing nulls by constants: rep(I) = {K | µ(I) ⊆ K for some valuation µ} M. Arenas – Exchanging more than Complete Data - RR2011 42 / 68
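The definition of rep(I) for naive instances can be checked by brute force on small examples. The sketch below is an illustration under stated assumptions: facts are encoded as `(relation, tuple)` pairs, nulls are strings starting with `"n"`, and the function names `is_null` and `in_rep` are hypothetical. It searches for a valuation µ with µ(I) ⊆ K, trying only constants that occur in K, which suffices because every null appears in some fact of I.

```python
from itertools import product


def is_null(value):
    # Assumption of this sketch: nulls are strings such as "n1", "n2".
    return isinstance(value, str) and value.startswith("n")


def in_rep(naive, complete):
    """Return True if some valuation mu satisfies mu(naive) <= complete."""
    nulls = sorted({v for (_, args) in naive for v in args if is_null(v)})
    constants = {v for (_, args) in complete for v in args}
    for values in product(constants, repeat=len(nulls)):
        mu = dict(zip(nulls, values))
        image = {(rel, tuple(mu.get(v, v) for v in args))
                 for (rel, args) in naive}
        if image <= complete:
            return True
    return False


# The naive instance from the slide: R(1, n1), R(n1, 2), R(1, n2).
naive_instance = {("R", (1, "n1")), ("R", ("n1", 2)), ("R", (1, "n2"))}

k_yes = {("R", (1, 3)), ("R", (3, 2))}   # mu(n1) = mu(n2) = 3 works
k_no = {("R", (1, 3)), ("R", (4, 2))}    # no value for n1 fits both facts
print(in_rep(naive_instance, k_yes), in_rep(naive_instance, k_no))
```

The search is exponential in the number of nulls, so it is meant only to make the semantics concrete, not as a practical algorithm.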
  • 95. Are naive instances expressive enough? Naive instances have been extensively used in data exchange: Proposition (FKMP03) Let M = (S, T, Σ), where Σ is a set of st-tgds. Then for every instance I of S, there exists a naive instance J of T such that: rep(J ) = SolM (I ) In fact, the canonical universal solution satisfies the property mentioned above. M. Arenas – Exchanging more than Complete Data - RR2011 43 / 68
  • 96. Are naive instances expressive enough? But naive instances are not expressive enough to deal with incomplete information in the source instances: Proposition (APR11) Naive instances are not a strong representation system for the class of mappings specified by st-tgds M. Arenas – Exchanging more than Complete Data - RR2011 44 / 68
  • 97. Are naive instances expressive enough? Example Consider a mapping M specified by: Manager(x, y ) → Reports(x, y ) Manager(x, x) → SelfManager(x) The canonical universal solution for I = {Manager(n, Peter)} under M: J = {Reports(n, Peter)} But J is not a good solution for I. ◮ It cannot represent the fact that if n is given value Peter, then SelfManager(Peter) should hold in the target. M. Arenas – Exchanging more than Complete Data - RR2011 45 / 68
  • 98. Conditional instances What should be added to naive instances to obtain a strong representation system? M. Arenas – Exchanging more than Complete Data - RR2011 46 / 68
  • 99. Conditional instances What should be added to naive instances to obtain a strong representation system? ◮ Answer from database theory: Conditions on the nulls M. Arenas – Exchanging more than Complete Data - RR2011 46 / 68
  • 100. Conditional instances What should be added to naive instances to obtain a strong representation system? ◮ Answer from database theory: Conditions on the nulls Conditional instances: Naive instances plus tuple conditions A tuple condition is a positive Boolean combination of: ◮ equalities and inequalities between nulls, and between nulls and constants
  • 101. Conditional instances Example: R(1, n1) with condition n1 = n2; R(n1, n2) with condition n1 ≠ n2 ∨ n2 = 2
  • 102. Conditional instances Example: R(1, n1) with condition n1 = n2; R(n1, n2) with condition n1 ≠ n2 ∨ n2 = 2 Semantics:
  • 103. Conditional instances Example: R(1, n1) with condition n1 = n2; R(n1, n2) with condition n1 ≠ n2 ∨ n2 = 2 Semantics: valuations µ(n1) = µ(n2) = 2; µ(n1) = µ(n2) = 3; µ(n1) = 2, µ(n2) = 3
  • 104. Conditional instances Example: R(1, n1) with condition n1 = n2; R(n1, n2) with condition n1 ≠ n2 ∨ n2 = 2 Semantics: µ(n1) = µ(n2) = 2 gives {R(1, 2), R(2, 2)}; µ(n1) = µ(n2) = 3; µ(n1) = 2, µ(n2) = 3
  • 105. Conditional instances Example: R(1, n1) with condition n1 = n2; R(n1, n2) with condition n1 ≠ n2 ∨ n2 = 2 Semantics: µ(n1) = µ(n2) = 2 gives {R(1, 2), R(2, 2)}; µ(n1) = µ(n2) = 3 gives {R(1, 3)}; µ(n1) = 2, µ(n2) = 3
  • 106. Conditional instances Example: R(1, n1) with condition n1 = n2; R(n1, n2) with condition n1 ≠ n2 ∨ n2 = 2 Semantics: µ(n1) = µ(n2) = 2 gives {R(1, 2), R(2, 2)}; µ(n1) = µ(n2) = 3 gives {R(1, 3)}; µ(n1) = 2, µ(n2) = 3 gives {R(2, 3)}
  • 107. Conditional instances Example: R(1, n1) with condition n1 = n2; R(n1, n2) with condition n1 ≠ n2 ∨ n2 = 2 Semantics: µ(n1) = µ(n2) = 2 gives {R(1, 2), R(2, 2)}; µ(n1) = µ(n2) = 3 gives {R(1, 3)}; µ(n1) = 2, µ(n2) = 3 gives {R(2, 3)} Interpretations of a conditional instance I: rep(I) = {K | µ(I) ⊆ K for some valuation µ}
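The semantics of the conditional instance above can be reproduced with a short Python sketch. The encoding (conditions as Python predicates over a valuation dictionary) and the name `apply_valuation` are assumptions made for illustration. The second condition is read as n1 ≠ n2 ∨ n2 = 2, which is the reading consistent with the three outputs listed on the slide.

```python
# Each entry pairs a tuple with its condition, a predicate over a valuation mu.
cond_instance = [
    (("R", (1, "n1")), lambda mu: mu["n1"] == mu["n2"]),
    (("R", ("n1", "n2")), lambda mu: mu["n1"] != mu["n2"] or mu["n2"] == 2),
]


def apply_valuation(instance, mu):
    """Keep the tuples whose condition holds under mu and replace nulls."""
    return {(rel, tuple(mu.get(v, v) for v in args))
            for ((rel, args), cond) in instance if cond(mu)}


# The three valuations shown on the slide.
for mu in ({"n1": 2, "n2": 2}, {"n1": 3, "n2": 3}, {"n1": 2, "n2": 3}):
    print(mu, sorted(apply_valuation(cond_instance, mu)))
```

Each resulting set is µ(I) for one valuation; rep(I) then contains every complete instance that includes one of these sets.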
  • 108. Positive conditional instances Many problems are intractable over conditional instances. ◮ We also consider a restricted class of conditional instances Positive conditional instances: Conditional instances without inequalities M. Arenas – Exchanging more than Complete Data - RR2011 48 / 68
  • 109. (Positive) conditional instances are enough Theorem (APR11) Both conditional instances and positive conditional instances are strong representation systems for the class of mappings specified by st-tgds. Example Consider again the mapping M specified by: Manager(x, y) → Reports(x, y); Manager(x, x) → SelfManager(x) The following is a universal solution for I = {Manager(n, Peter)}: Reports(n, Peter) with condition true; SelfManager(Peter) with condition n = Peter
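A small Python sketch can illustrate why the condition n = Peter is what the naive solution was missing. Everything here is an assumption for illustration: the encoding of facts, the function names `exchange` and `conditional_solution`, and the constant "John", which does not appear in the slides. For each valuation of n, the sketch compares the conditional solution with the minimal target instance obtained by applying the two st-tgds directly to the valuated source.

```python
def exchange(source):
    """Minimal target instance for Manager(x, y) -> Reports(x, y) and
    Manager(x, x) -> SelfManager(x), applied to a complete source."""
    target = set()
    for (rel, (x, y)) in source:
        if rel == "Manager":
            target.add(("Reports", (x, y)))
            if x == y:
                target.add(("SelfManager", (x,)))
    return target


def conditional_solution(mu):
    """Evaluate the conditional instance of the slide under valuation mu."""
    n = mu["n"]
    result = {("Reports", (n, "Peter"))}           # condition: true
    if n == "Peter":                               # condition: n = Peter
        result.add(("SelfManager", ("Peter",)))
    return result


for value in ("Peter", "John"):                    # "John" is hypothetical
    mu = {"n": value}
    source = {("Manager", (mu["n"], "Peter"))}
    print(value, conditional_solution(mu) == exchange(source))
```

Dropping the second tuple, as the naive instance J = {Reports(n, Peter)} does, makes the comparison fail for the valuation that sends n to Peter.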
  • 110. Positive conditional instances are exactly the needed representation system Positive conditional instances are minimal: Theorem (APR11) All the following are needed to obtain a strong representation system for the class of mappings specified by st-tgds: ◮ equalities between nulls ◮ equalities between constants and nulls ◮ conjunctions and disjunctions Conditional instances are enough but not minimal.
  • 111. Positive conditional instances can be used in practice! Let M = (S, T, Σ), where Σ is a set of st-tgds.
  • 112. Positive conditional instances can be used in practice! Let M = (S, T, Σ), where Σ is a set of st-tgds. Theorem (APR11) There exists a polynomial time algorithm that, given a positive conditional instance ℐ over S, computes a positive conditional instance 𝒥 over T that is a universal solution for ℐ under M.
  • 113. Positive conditional instances can be used in practice! Let M = (S, T, Σ), where Σ is a set of st-tgds. Theorem (APR11) There exists a polynomial time algorithm that, given a positive conditional instance ℐ over S, computes a positive conditional instance 𝒥 over T that is a universal solution for ℐ under M. Let Q be a union of conjunctive queries over T. Q(𝒥) = ⋂_{J ∈ rep(𝒥)} Q(J) and certain_M(Q, ℐ) = ⋂_{𝒥 is a solution for ℐ under M} Q(𝒥)
  • 114. Positive conditional instances can be used in practice! Theorem (APR11) There exists a polynomial time algorithm that, given a positive conditional instance ℐ over S, computes certain_M(Q, ℐ).
  • 115. Positive conditional instances can be used in practice! Theorem (APR11) There exists a polynomial time algorithm that, given a positive conditional instance ℐ over S, computes certain_M(Q, ℐ). The same result holds for the class of unions of conjunctive queries with at most one inequality per disjunct. ◮ The other important class of queries in the data exchange area for which certain answers can be computed in polynomial time
  • 116. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 117. The semantics of knowledge bases is given by sets of instances Knowledge base over S: (I , Γ) such that ◮ I ∈ Inst(S) ◮ Γ a set of rules over S Semantics: finite models Mod(I , Γ) = {K ∈ Inst(S) | I ⊆ K and K |= Γ} M. Arenas – Exchanging more than Complete Data - RR2011 54 / 68
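The definition of Mod(I, Γ) can be made concrete for rules without existential quantifiers. In the sketch below, which reuses the family example from the first part, each rule is encoded as a Python function returning the facts it forces; this encoding and the names `rel`, `is_model` and `least_model` are assumptions of the sketch rather than anything defined in the slides.

```python
def rel(instance, name):
    """Tuples of relation `name` in an instance of (name, tuple) facts."""
    return {args for (n, args) in instance if n == name}


# Each rule maps an instance K to the set of facts it requires K to contain.
rules = [
    lambda K: {("Parent", t) for t in rel(K, "Father")},
    lambda K: {("Parent", t) for t in rel(K, "Mother")},
    lambda K: {("Grandparent", (x, z))
               for (x, y) in rel(K, "Parent")
               for (y2, z) in rel(K, "Parent") if y == y2},
]


def is_model(I, gamma, K):
    """K is in Mod(I, gamma) iff I is contained in K and K satisfies gamma."""
    return I <= K and all(rule(K) <= K for rule in gamma)


def least_model(I, gamma):
    """Smallest element of Mod(I, gamma), computed by a fixpoint iteration."""
    K = set(I)
    while True:
        new = set().union(*(rule(K) for rule in gamma)) - K
        if not new:
            return K
        K |= new


I = {("Father", ("Andy", "Bob")), ("Father", ("Bob", "Danny")),
     ("Father", ("Danny", "Eddie")), ("Mother", ("Carrie", "Bob"))}
K = least_model(I, rules)
print(is_model(I, rules, K), is_model(I, rules, I), len(K))
```

The source instance alone is not a model because the Parent and Grandparent facts are missing, while the fixpoint adds exactly those facts.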
  • 118. We can apply our formalism to knowledge bases (I2 , Γ2 ) is a KB-solution for (I1 , Γ1 ) under M if: Mod(I2 , Γ2 ) ⊆ SolM (Mod(I1 , Γ1 )) (I2 , Γ2 ) is a universal KB-solution for (I1 , Γ1 ) under M if: Mod(I2 , Γ2 ) = SolM (Mod(I1 , Γ1 )) M. Arenas – Exchanging more than Complete Data - RR2011 55 / 68
  • 119. Motivating questions Same as for the case of instances with incomplete information. ◮ Constructing universal KB-solutions ◮ Answering target queries New fundamental problem: Construct solutions including as much implicit knowledge as possible. M. Arenas – Exchanging more than Complete Data - RR2011 56 / 68
  • 120. What are good knowledge-base solutions? First alternative: universal KB-solutions But there exist some other KB-solutions desirable to materialize ◮ Minimality comes into play M. Arenas – Exchanging more than Complete Data - RR2011 57 / 68
  • 121. What are good knowledge-base solutions? First alternative: universal KB-solutions But there exist some other KB-solutions desirable to materialize ◮ Minimality comes into play Given sets X , Y of instances: ◮ X ≡min Y if X and Y coincide in the minimal instances under ⊆ Definition (I2 , Γ2 ) is a minimal KB-solution of (I1 , Γ1 ) under M if: Mod(I2 , Γ2 ) ≡min SolM (Mod(I1 , Γ1 )) M. Arenas – Exchanging more than Complete Data - RR2011 57 / 68
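For finite collections of instances, the relation ≡min can be computed directly. The following sketch is only illustrative, since Mod and Sol_M are in general infinite sets; the function names `minimal` and `equiv_min` and the toy facts `a`, `b`, `c` are hypothetical.

```python
def minimal(instances):
    """Instances of the collection that have no proper subset in it."""
    frozen = {frozenset(i) for i in instances}
    return {i for i in frozen if not any(j < i for j in frozen)}


def equiv_min(X, Y):
    """X and Y coincide on their minimal instances under set inclusion."""
    return minimal(X) == minimal(Y)


X = [{"a"}, {"a", "b"}]
Y = [{"a"}, {"a", "c"}, {"a", "b", "c"}]
Z = [{"a", "b"}]
print(equiv_min(X, Y), equiv_min(X, Z))
```

Here X and Y differ as sets of instances but share the single minimal instance {a}, so a minimal KB-solution may have fewer models than a universal one while agreeing with it on the minimal ones.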
  • 122. Two requirements to construct minimal knowledge-base solutions Given (I1 , Γ1 ) and M, when constructing a minimal KB-solution (I2 , Γ2 ) we would like: M. Arenas – Exchanging more than Complete Data - RR2011 58 / 68
  • 123. Two requirements to construct minimal knowledge-base solutions Given (I1 , Γ1 ) and M, when constructing a minimal KB-solution (I2 , Γ2 ) we would like: 1. Γ2 to only depend on Γ1 and M: Γ2 is safe for Γ1 and M M. Arenas – Exchanging more than Complete Data - RR2011 58 / 68
  • 124. Two requirements to construct minimal knowledge-base solutions Given (I1 , Γ1 ) and M, when constructing a minimal KB-solution (I2 , Γ2 ) we would like: 1. Γ2 to only depend on Γ1 and M: Γ2 is safe for Γ1 and M Definition Γ2 is safe for Γ1 and M, if for every I1 there exists I2 : (I2 , Γ2 ) is a minimal KB-solution of (I1 , Γ1 ) under M M. Arenas – Exchanging more than Complete Data - RR2011 58 / 68
  • 125. Two requirements to construct minimal knowledge-base solutions 2. Γ2 to be as informative as possible (thus minimizing the size of I2 ): M. Arenas – Exchanging more than Complete Data - RR2011 59 / 68
  • 126. Two requirements to construct minimal knowledge-base solutions 2. Γ2 to be as informative as possible (thus minimizing the size of I2 ): Definition Γ2 is optimal-safe if for every other safe set Γ′ : Γ2 |= Γ′ M. Arenas – Exchanging more than Complete Data - RR2011 59 / 68
  • 127. Computing minimal KB-solutions To obtain algorithms for computing minimal KB-solutions, we need to specify the language used in knowledge bases. ◮ Full st-tgd: ∀x̄ ∀ȳ (ϕ(x̄, ȳ) → ψ(x̄))
  • 128. Computing minimal KB-solutions To obtain algorithms for computing minimal KB-solutions, we need to specify the language used in knowledge bases. ◮ Full st-tgd: ∀x̄ ∀ȳ (ϕ(x̄, ȳ) → ψ(x̄)) Theorem (APR11) There exists a polynomial-time algorithm that, given M = (S, T, Σ), where Σ is a set of full st-tgds, and given a set Γ1 of full tgds over S, computes a set Γ2 of second-order logic sentences over T that is optimal-safe for Γ1 and M.
  • 129. Computing minimal KB-solutions Unfortunately, first-order logic is not expressive enough. Theorem (APR11) There exist M = (S, T, Σ), where Σ is a set of full st-tgds, and a set Γ1 of full tgds over S such that: no FO-sentence is optimal-safe for Γ1 and M.
  • 130. Computing minimal KB-solutions Unfortunately, first-order logic is not expressive enough. Theorem (APR11) There exist M = (S, T, Σ), where Σ is a set of full st-tgds, and a set Γ1 of full tgds over S such that: no FO-sentence is optimal-safe for Γ1 and M. How can we deal with these problems in practice?
  • 131. Computing minimal KB-solutions Unfortunately, first-order logic is not expressive enough. Theorem (APR11) There exist M = (S, T, Σ), where Σ is a set of full st-tgds, and a set Γ1 of full tgds over S such that: no FO-sentence is optimal-safe for Γ1 and M. How can we deal with these problems in practice? ◮ We need to restrict the language used to specify knowledge bases: Description logics [ABC11]
  • 132. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 133. We can exchange more than complete data We propose a general formalism to exchange representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases Next step: Apply our general setting to the Semantic Web ◮ Semantic Web data has nulls (blank nodes) ◮ Semantic Web specifications have rules (RDFS, OWL) Lots of interesting problems to solve if knowledge bases are specified by means of description logics. ◮ Better results can be obtained M. Arenas – Exchanging more than Complete Data - RR2011 63 / 68
  • 136. Thank you! M. Arenas – Exchanging more than Complete Data - RR2011 64 / 68
  • 137. Bibliography [ABC11] M. Arenas, E. Botoeva, D. Calvanese. Knowledge Base Exchange. DL 2011. [APR11] M. Arenas, J. Pérez, J. Reutter. Data Exchange beyond Complete Data. PODS 2011. [B03] P. A. Bernstein. Applying Model Management to Classical Meta Data Problems. CIDR 2003. [FKMP03] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa. Data Exchange: Semantics and Query Answering. ICDT 2003.
  • 138. Bonus track: Computation of solutions and its associated decision problem Decision problem: Check-KB-Sol Input: M = (S, T, Σ), where Σ is a set of st-tgds (I1 , Γ1 ) KB over S with Γ1 a set of tgds (I2 , Γ2 ) KB over T with Γ2 a set of tgds Output: Is (I2 , Γ2 ) a KB-solution of (I1 , Γ1 ) under M? M. Arenas – Exchanging more than Complete Data - RR2011 66 / 68
  • 139. Bonus track: Computation of solutions and its associated decision problem Decision problem: Check-KB-Sol Input: M = (S, T, Σ), where Σ is a set of st-tgds (I1 , Γ1 ) KB over S with Γ1 a set of tgds (I2 , Γ2 ) KB over T with Γ2 a set of tgds Output: Is (I2 , Γ2 ) a KB-solution of (I1 , Γ1 ) under M? Theorem (APR11) Check-KB-Sol is undecidable (even for a fixed M). M. Arenas – Exchanging more than Complete Data - RR2011 66 / 68
  • 140. Bonus track: Computation of solutions and its associated decision problem Undecidability is a consequence of using ∃ in knowledge bases. ◮ We need to restrict the input Check-Full-KB-Sol: Γ1 , Γ2 are assumed to be sets of full tgds M. Arenas – Exchanging more than Complete Data - RR2011 67 / 68
  • 141. Bonus track: Computation of solutions and its associated decision problem Theorem (APR11) Check-Full-KB-Sol is EXPTIME-complete. M. Arenas – Exchanging more than Complete Data - RR2011 68 / 68
  • 142. Bonus track: Computation of solutions and its associated decision problem Theorem (APR11) Check-Full-KB-Sol is EXPTIME-complete. Theorem (APR11) If M = (S, T, Σ) is fixed: Check-Full-KB-Sol is Δ_2^P[O(log n)]-complete. Δ_2^P[O(log n)]: P^NP with a logarithmic number of calls to the NP oracle.