20100810


  1. A Survey of Approaches to Automatic Schema Matching<br />Erhard Rahm<br />Philip A. Bernstein<br />VLDB 2001
  2. Introduction<br />A schema is a formal description of the structure of data.<br />Schema matching is a basic problem in many database application domains.<br />We present a taxonomy that covers many existing schema matching approaches.
  3. Match<br />Match is an operation that takes two schemas as input and produces a mapping between those elements of the two schemas that correspond semantically to each other.
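The Match operation on slide 3 can be pictured as a function from two schemas to a list of mapping elements. This is a minimal sketch, assuming schemas are flat lists of element names and that the matcher is a pluggable similarity function with a cut-off; `match`, `MappingElement`, `case_insensitive_equality`, and the 0.8 threshold are hypothetical names chosen for illustration, not an API defined by the survey.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MappingElement:
    """One correspondence between a source and a target schema element."""
    source: str
    target: str
    similarity: float  # matcher score in [0, 1]


def match(schema1: List[str], schema2: List[str],
          similarity: Callable[[str, str], float],
          threshold: float = 0.8) -> List[MappingElement]:
    """Compare every element pair and keep those scoring at least `threshold`."""
    mapping = []
    for source in schema1:
        for target in schema2:
            score = similarity(source, target)
            if score >= threshold:
                mapping.append(MappingElement(source, target, score))
    return mapping


def case_insensitive_equality(a: str, b: str) -> float:
    """Trivial matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


print(match(["City", "Phone"], ["city", "Zip"], case_insensitive_equality))
# [MappingElement(source='City', target='city', similarity=1.0)]
```

Keeping the similarity function as a parameter separates the Match interface from any particular matching technique.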
  4. Mapping (cont.)<br />A mapping element relates Cust.C# to Customer.CustID; its expression is “Cust.C# = Customer.CustID”.<br />A mapping element can also relate several elements, e.g. “Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact”.
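The mapping expressions on slide 4 describe how target values are computed from source values. A minimal sketch, assuming each mapping element is represented as a Python function from a source record to one target value; `MAPPING`, `apply_mapping`, and the `concatenate` helper are hypothetical, and joining with a single space is an assumption because the slide does not define Concatenate's exact behavior.

```python
from typing import Callable, Dict

Record = Dict[str, str]


def concatenate(*parts: str) -> str:
    """Hypothetical Concatenate: joins the parts with a single space."""
    return " ".join(parts)


# One expression per target element, evaluated over a source (Cust) record.
MAPPING: Dict[str, Callable[[Record], str]] = {
    "Customer.CustID": lambda cust: cust["Cust.C#"],
    "Customer.Contact": lambda cust: concatenate(cust["Cust.FirstName"],
                                                 cust["Cust.LastName"]),
}


def apply_mapping(record: Record) -> Record:
    """Translate one source record into a target record."""
    return {target: expr(record) for target, expr in MAPPING.items()}


source = {"Cust.C#": "42", "Cust.FirstName": "Ada", "Cust.LastName": "Lovelace"}
print(apply_mapping(source))
# {'Customer.CustID': '42', 'Customer.Contact': 'Ada Lovelace'}
```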
  5. Application Domains<br />Schema integration.<br />Data warehouses.<br />E-commerce.<br />Semantic query processing.
  6. Architecture for Generic Match (cont.)
  7. Classification of Schema Matching Approaches Overview
  8. Classification of Schema Matching Approaches<br />For individual matchers, we consider the following largely orthogonal classification criteria:<br />1. Instance vs. schema: whether matching uses instance data or only schema information.<br />2. Element vs. structure: whether matching is performed for individual schema elements, such as attributes, or for combinations of elements, such as complex schema structures.
  9. Classification of Schema Matching Approaches (cont.)<br />3. Language vs. constraint: a linguistic approach is based on names and textual descriptions;<br />a constraint-based approach is based on keys and relationships.<br />4. Match cardinality: each mapping element may interrelate one or more elements of the two schemas.<br />5. Auxiliary information: such as dictionaries, global schemas, previous matching decisions, and user input.
  10. Classification of Schema Matching Approaches Overview
  11. Schema-Level Matchers<br />Schema-level matchers consider only schema information, not instance data, such as:<br />- Name<br />- Description<br />- Data type<br />- Relationship types (part-of, is-a, etc.)<br />- Constraints<br />- Schema structure
  12. Classification of Schema Matching Approaches Overview
  13. Granularity of Match<br />Element-level vs. structure-level.<br />Element-level: match elements at the atomic level, such as attributes in an XML schema.<br />Structure-level: match combinations of elements that appear together in a structure.
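The difference between the two granularities on slide 13 can be sketched in a few lines. This assumes a hypothetical rule, not one prescribed by the survey: an element-level matcher compares single names ignoring case, and two structures are as similar as the share of their child elements that find an element-level match on the other side.

```python
from typing import List


def element_sim(a: str, b: str) -> bool:
    """Element-level matcher (assumed): names equal ignoring case."""
    return a.lower() == b.lower()


def structure_sim(children1: List[str], children2: List[str]) -> float:
    """Structure-level matcher (assumed): fraction of all child elements, counted
    over both structures, that have an element-level match on the other side."""
    if not children1 or not children2:
        return 0.0
    matched1 = sum(any(element_sim(c, d) for d in children2) for c in children1)
    matched2 = sum(any(element_sim(c, d) for c in children1) for d in children2)
    return (matched1 + matched2) / (len(children1) + len(children2))


address1 = ["Street", "City", "Zip"]         # e.g. an Address structure in S1
address2 = ["street", "city", "PostalCode"]  # e.g. an Address structure in S2
print(round(structure_sim(address1, address2), 3))  # 0.667
```

Zip and PostalCode go unmatched here; a richer element-level matcher (for example one backed by a thesaurus) would raise the structure score.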
  14. Match Cardinality
  15. Classification of Schema Matching Approaches Overview
  16. Linguistic Approaches<br />Language-based, or linguistic, matchers use names and text to find semantically similar schema elements.<br />We discuss two schema-level approaches:<br />- Name matching<br />- Description matching
  17. Name Matching<br />Name-based matching matches schema elements with equal or similar names.<br />Similarity of names can be defined and measured in various ways:<br />1. Equality of names. Beware of homonyms: equal names can denote different things, e.g. “line” of business vs. “line” of an order.<br />2. Equality of canonical names, e.g. CName -> customer name, EmpNO -> employee number.<br />3. Equality of synonyms, e.g. car ∼ automobile, mark ∼ brand.
  18. Name Matching (cont.)<br />4. Equality of hypernyms: book is-a publication and article is-a publication imply<br />book ∼ publication, article ∼ publication, and book ∼ article.<br />5. Similarity of names based on pronunciation, e.g. ShipTo ∼ Ship2.<br />6. User-provided name matches, e.g. reportsTo ∼ manager, issue ∼ bug.
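The six name-matching criteria on slides 17 and 18 can be combined into one predicate. This is a sketch under loud assumptions: the abbreviation, synonym, hypernym, and user-match tables are tiny hypothetical stand-ins for real dictionaries and thesauri, and pronunciation similarity (criterion 5) is approximated by spelling out the digits 2 and 4 as "to" and "for", which is not real phonetic matching.

```python
ABBREVIATIONS = {"cname": "customer name", "empno": "employee number"}
SYNONYMS = [{"car", "automobile"}, {"mark", "brand"}]
HYPERNYMS = {"book": "publication", "article": "publication"}  # child -> is-a parent
SOUND_ALIKE = {"2": "to", "4": "for"}  # crude stand-in for pronunciation
USER_MATCHES = {frozenset({"reportsto", "manager"}), frozenset({"issue", "bug"})}


def canonical(name: str) -> str:
    """Lower-case a name, expand known abbreviations, spell out sound-alike digits."""
    n = ABBREVIATIONS.get(name.lower(), name.lower())
    for digit, word in SOUND_ALIKE.items():
        n = n.replace(digit, word)
    return n


def names_match(a: str, b: str) -> bool:
    ca, cb = canonical(a), canonical(b)
    # Criteria 1, 2 and 5: equal canonical names. Homonyms such as "line"
    # also pass this test, so equal names are candidates, not certainties.
    if ca == cb:
        return True
    # Criterion 3: synonyms.
    if any({ca, cb} <= group for group in SYNONYMS):
        return True
    # Criterion 4: one name is the other's hypernym, or both share a hypernym.
    ha, hb = HYPERNYMS.get(ca), HYPERNYMS.get(cb)
    if ca == hb or cb == ha or (ha is not None and ha == hb):
        return True
    # Criterion 6: matches supplied by the user.
    return frozenset({ca, cb}) in USER_MATCHES


for pair in [("CName", "customer name"), ("car", "automobile"),
             ("book", "article"), ("ShipTo", "Ship2"), ("reportsTo", "manager")]:
    print(pair, names_match(*pair))  # each prints True
```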
  19. Description Matching<br />Descriptions, such as comments, express the intended semantics of schema elements. Comparing them can reveal matches that dissimilar names hide, e.g.:<br />S1: empn // employee name<br />S2: name // name of employee
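In the slide 19 example, the names empn and name look different, but their descriptions share the same content words. The survey does not fix a similarity measure for descriptions; the sketch below swaps in Jaccard similarity over lower-cased word tokens, with a small stop-word list that is an assumption.

```python
import re

STOP_WORDS = {"of", "the", "a", "an"}  # assumed minimal stop-word list


def tokens(description: str) -> set:
    """Lower-case word tokens of a description, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", description.lower()) if w not in STOP_WORDS}


def description_sim(d1: str, d2: str) -> float:
    """Jaccard similarity of the two token sets (0.0 if both are empty)."""
    t1, t2 = tokens(d1), tokens(d2)
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0


print(description_sim("employee name", "name of employee"))  # 1.0
```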
  20. Classification of Schema Matching Approaches Overview
  21. Constraint-based Approaches<br />If input schemas contain constraints, a matcher can use them to determine the similarity of schema elements.<br />Schemas often contain constraints to define:<br />- data types<br />- value ranges<br />- uniqueness<br />- optionality<br />- relationship types, and so on
  22. Constraint-based Approaches (cont.)<br />Type and key information suggest that Born matches Birthdate and that Pno matches either EmpNo or DeptNo.
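The inference on slide 22 can be reproduced by comparing data types and key status. The schemas the slide refers to are not reproduced here, so the types and key flags below are assumptions chosen to fit the slide's conclusion; `Attribute` and `constraint_candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str
    is_key: bool  # primary key or unique constraint


# Assumed constraints for the elements named on the slide.
S1 = [Attribute("Birthdate", "date", False), Attribute("EmpNo", "int", True),
      Attribute("DeptNo", "int", True), Attribute("EmpName", "varchar", False)]
S2 = [Attribute("Born", "date", False), Attribute("Pno", "int", True)]


def constraint_candidates(source: List[Attribute],
                          target: List[Attribute]) -> Dict[str, List[str]]:
    """For each source attribute, list target attributes with the same
    data type and the same key status."""
    return {a.name: [b.name for b in target
                     if a.data_type == b.data_type and a.is_key == b.is_key]
            for a in source}


print(constraint_candidates(S2, S1))
# {'Born': ['Birthdate'], 'Pno': ['EmpNo', 'DeptNo']}
```

Constraints alone leave Pno with two candidates; the instance-level comparison described on slide 26 is one way to choose between them.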
  23. Auxiliary Information<br />Auxiliary information, such as:<br />1. Dictionaries<br />2. Thesauri<br />3. User-provided information<br />can improve the matching process.<br />Previously matched schemas can also be reused.
  24. Reusing Schema and Mapping Information (cont.)
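Slides 23 and 24 mention reusing previous match results. One simple form of reuse, assumed here for illustration, is transitive composition: if S1 has been matched to S2 and S2 to S3, chaining the two mappings proposes S1 to S3 candidates that a user can then confirm. `compose` and the Client.ID element are hypothetical.

```python
from typing import Dict, Set, Tuple

Mapping = Set[Tuple[str, str]]  # (source element, target element) pairs


def compose(m12: Mapping, m23: Mapping) -> Mapping:
    """Derive S1 -> S3 candidates by chaining S1 -> S2 and S2 -> S3 matches."""
    by_source: Dict[str, Set[str]] = {}
    for s2, s3 in m23:
        by_source.setdefault(s2, set()).add(s3)
    return {(s1, s3) for s1, s2 in m12 for s3 in by_source.get(s2, set())}


m12 = {("Cust.C#", "Customer.CustID"), ("Cust.City", "Customer.City")}
m23 = {("Customer.CustID", "Client.ID")}
print(compose(m12, m23))  # {('Cust.C#', 'Client.ID')}
```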
  25. Instance-Level Approaches<br />Instance-level data can be used in two ways:<br />1. To enhance the effectiveness of schema-level matching.<br />2. To perform instance-level matching on its own.<br />Most of the approaches discussed previously for schema-level matching can also be applied to instance-level matching.
  26. Instance-Level Approaches (cont.)<br />Judging by instance values, DeptName is a better match candidate for Dept than EmpName.<br />Take EmpNo, DeptNo, and Pno as an example: based on similar value ranges, we match Pno to EmpNo rather than DeptNo.
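The value-range argument on slide 26 can be sketched by comparing the numeric ranges of sample values. The sample values are hypothetical, and scoring by the overlap of [min, max] intervals is an assumed measure; the survey does not specify one.

```python
from typing import Dict, List


def range_overlap(a: List[int], b: List[int]) -> float:
    """Length of the overlap of the two [min, max] value ranges, divided by
    the length of their combined range (0.0 when the ranges are disjoint)."""
    lo, hi = max(min(a), min(b)), min(max(a), max(b))
    if hi < lo:
        return 0.0
    span = max(max(a), max(b)) - min(min(a), min(b))
    return (hi - lo) / span if span else 1.0


# Hypothetical sample values drawn from each column.
samples: Dict[str, List[int]] = {"EmpNo": [1001, 1400, 1999], "DeptNo": [1, 7, 20]}
pno = [1003, 1250, 1850]

best = max(samples, key=lambda name: range_overlap(pno, samples[name]))
print(best)  # EmpNo
```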
  27. Combining Different Matchers<br />A matcher that uses just one approach is unlikely to find as many good match candidates as one that combines several approaches.<br />Combination can be done in two ways:<br />1. Hybrid matchers integrate multiple matching criteria.<br />2. Composite matchers combine the results of independently executed matchers.
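A composite matcher, the second option on slide 27, runs matchers independently and then merges their scores. The sketch assumes a weighted average as the merge step, uses `difflib.SequenceMatcher` as a stand-in linguistic matcher, and a hypothetical table of declared types for the constraint matcher; the equal weights are illustrative.

```python
import difflib
from typing import Callable, List, Tuple

Matcher = Callable[[str, str], float]

TYPES = {"EmpNo": "int", "EmpName": "varchar", "Pno": "int"}  # hypothetical declared types


def name_matcher(a: str, b: str) -> float:
    """Linguistic matcher: character-sequence similarity of the names."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_matcher(a: str, b: str) -> float:
    """Constraint matcher: 1.0 if the declared data types agree, else 0.0."""
    return 1.0 if TYPES[a] == TYPES[b] else 0.0


def composite(matchers: List[Tuple[Matcher, float]]) -> Matcher:
    """Run each matcher independently and return the weighted average of the scores."""
    total = sum(weight for _, weight in matchers)

    def combined(a: str, b: str) -> float:
        return sum(weight * matcher(a, b) for matcher, weight in matchers) / total

    return combined


combined = composite([(name_matcher, 0.5), (type_matcher, 0.5)])
print(combined("Pno", "EmpNo"), combined("Pno", "EmpName"))  # 0.875 0.2
```

A hybrid matcher would instead evaluate both criteria inside a single matching function rather than merging separate results.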
  28. Sample Approaches From the Literature<br />LSD<br />SKAT<br />TransScm<br />ARTEMIS
  29. Learning Source Descriptions (LSD)
  30. Semantic Knowledge Articulation Tool (SKAT)<br />A rule-based approach to semi-automatically determine matches between schemas.<br />Rules are formulated in first-order logic to express match and mismatch relationships.<br />The user initially provides match and mismatch relationships, then approves or rejects the generated matches.<br />Schemas are transformed into a graph-based object-oriented database model.
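The SKAT workflow on slide 30 (user-supplied match and mismatch rules, generated matches, user approval) can be sketched as follows. SKAT states its rules in first-order logic; the Python predicates here are a swapped-in stand-in, the rules themselves are hypothetical, and the `approve` callback simulates the user.

```python
from typing import Callable, List, Tuple

Rule = Callable[[str, str], bool]
Pair = Tuple[str, str]


def generate(s1: List[str], s2: List[str],
             match_rules: List[Rule], mismatch_rules: List[Rule]) -> List[Pair]:
    """Propose pairs that some match rule accepts and no mismatch rule rejects."""
    return [(a, b) for a in s1 for b in s2
            if any(rule(a, b) for rule in match_rules)
            and not any(rule(a, b) for rule in mismatch_rules)]


def articulate(s1: List[str], s2: List[str], match_rules: List[Rule],
               mismatch_rules: List[Rule],
               approve: Callable[[Pair], bool]) -> List[Pair]:
    """Keep only the generated matches that the user approves."""
    return [pair for pair in generate(s1, s2, match_rules, mismatch_rules)
            if approve(pair)]


# Hypothetical rules a user might supply up front.
match_rules: List[Rule] = [
    lambda a, b: a.lower() == b.lower(),           # equal names match
    lambda a, b: {a, b} == {"Car", "Automobile"},  # declared synonyms match
]
mismatch_rules: List[Rule] = [
    lambda a, b: a.lower() == b.lower() == "line",  # homonym: never match "Line"
]

accepted = articulate(["Name", "Car", "Line"], ["name", "Automobile", "line"],
                      match_rules, mismatch_rules, approve=lambda pair: True)
print(accepted)  # [('Name', 'name'), ('Car', 'Automobile')]
```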
  31. TransScm<br />Input schemas are transformed into labeled graphs.<br />Edges in the schema graphs represent component relationships.<br />Matching is performed node by node (element-level, 1:1).<br />Several matchers are checked in a fixed order.<br />If no match is found, or if a matcher determines multiple match candidates, user intervention is required (to provide a rule or select a match candidate).
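The TransScm control flow on slide 31 (matchers tried in a fixed order, user intervention on zero or multiple candidates) can be sketched over node labels alone. Building the labeled graphs is omitted, the three matchers and their order are hypothetical, and `pick_first` simulates the user's choice.

```python
from typing import Callable, List, Optional

Matcher = Callable[[str, List[str]], List[str]]  # node label -> candidate labels


def exact(node: str, targets: List[str]) -> List[str]:
    return [t for t in targets if t == node]


def ignore_case(node: str, targets: List[str]) -> List[str]:
    return [t for t in targets if t.lower() == node.lower()]


def same_prefix(node: str, targets: List[str]) -> List[str]:
    return [t for t in targets if t.lower()[:3] == node.lower()[:3]]


MATCHERS: List[Matcher] = [exact, ignore_case, same_prefix]  # fixed order


def match_node(node: str, targets: List[str],
               ask_user: Callable[[str, List[str]], Optional[str]]) -> Optional[str]:
    """Try the matchers in order; accept a unique candidate, else ask the user."""
    for matcher in MATCHERS:
        candidates = matcher(node, targets)
        if len(candidates) == 1:
            return candidates[0]
        if len(candidates) > 1:
            return ask_user(node, candidates)  # user selects a candidate
    return ask_user(node, [])  # no match: user may provide one, or None


targets = ["Name", "Address", "AddrLine2", "Phone"]
pick_first = lambda node, cands: cands[0] if cands else None  # simulated user
print(match_node("name", targets, pick_first))  # Name
print(match_node("Addr", targets, pick_first))  # Address (chosen by the user)
print(match_node("Fax", targets, pick_first))   # None
```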
  32. ARTEMIS<br />ARTEMIS first computes “affinities” in the range 0 to 1 between attributes:<br />1. Name affinity<br />2. Data type affinity<br />3. Structural affinity<br />It then completes the schema integration by clustering attributes based on those affinities and constructing views based on the clusters.
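The two ARTEMIS phases on slide 32 can be sketched as a weighted affinity in [0, 1] followed by clustering. Everything concrete here is an assumption: the attribute descriptors, the weights, `difflib.SequenceMatcher` as the name affinity, the entity-level structural affinity table, and threshold-based single-link clustering as the clustering step. View construction is omitted.

```python
import difflib
from itertools import combinations
from typing import List

# Hypothetical attribute descriptors: (name, data type, owning entity).
ATTRS = {
    "S1.EmpNo": ("empno", "int", "employee"),
    "S1.EmpName": ("empname", "string", "employee"),
    "S2.Pno": ("pno", "int", "personnel"),
    "S2.Pname": ("pname", "string", "personnel"),
}
STRUCT_AFFINITY = {frozenset({"employee", "personnel"}): 0.8}  # assumed
WEIGHTS = (0.4, 0.3, 0.3)  # name, data type, structure (assumed; sum to 1)


def struct_affinity(e1: str, e2: str) -> float:
    return 1.0 if e1 == e2 else STRUCT_AFFINITY.get(frozenset({e1, e2}), 0.0)


def affinity(a: str, b: str) -> float:
    """Weighted combination of name, data type and structural affinity."""
    (n1, t1, e1), (n2, t2, e2) = ATTRS[a], ATTRS[b]
    name_aff = difflib.SequenceMatcher(None, n1, n2).ratio()
    type_aff = 1.0 if t1 == t2 else 0.0
    w_n, w_t, w_s = WEIGHTS
    return w_n * name_aff + w_t * type_aff + w_s * struct_affinity(e1, e2)


def cluster(attrs: List[str], threshold: float) -> List[List[str]]:
    """Single-link clustering: link every pair whose affinity reaches threshold."""
    parent = {a: a for a in attrs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(attrs, 2):
        if affinity(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for a in attrs:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())


print(cluster(list(ATTRS), 0.7))
# [['S1.EmpName', 'S2.Pname'], ['S1.EmpNo', 'S2.Pno']]
```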
  33. Characteristics of Proposed Schema Match Approaches
  34. Characteristics of Proposed Schema Match Approaches (cont.)
  35. Characteristics of Proposed Schema Match Approaches (cont.)
  36. Characteristics of Proposed Schema Match Approaches (cont.)
  37. Conclusion<br />We used the taxonomy to characterize and compare a variety of previous match implementations.<br />We hope that the taxonomy will be useful to programmers who need to implement a match algorithm.
