Semantic RDF based integration framework for heterogeneous XML data sources

825 views

Published on

An approach brings a semantic RDF-based solution to XML integration problem...

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
825
On SlideShare
0
From Embeds
0
Number of Embeds
30
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Semantic RDF based integration framework for heterogeneous XML data sources

  1. 1. “ Semantic RDF Based Integration Framework for Heterogeneous XML Data Sources ” Deniz KILINÇ [email_address]
  2. 2. Agenda <ul><li>Introduction </li></ul><ul><li>Integration </li></ul><ul><li>Related Works </li></ul><ul><li>Description of Method </li></ul><ul><li>A Real Life Scenario </li></ul><ul><li>Scenario Design Using Our Approach </li></ul><ul><li>Conclusion </li></ul><ul><li>References </li></ul>
  3. 3. Introduction (I) <ul><li>Different database servers like MSSQL, Oracle and DB2-400 can store and manipulate data </li></ul><ul><li>When data is spread across various databases then there is some integration to perform. </li></ul><ul><li>For integration, a common standard is required. </li></ul><ul><li>World Wide Web Consortium (W3C) announced Extensible Markup Language (XML) technology as a universal data exchange format </li></ul>
  4. 4. Introduction (II) <ul><li>XML </li></ul><ul><ul><li>easy to learn, implement, read, and test </li></ul></ul><ul><ul><li>effective, portable and easily customized data format that can easily sent over through any protocol </li></ul></ul><ul><li>As a point of Business-to Business (B2B) view XML is a shortened product development time for most XML-related and data exchange. </li></ul><ul><li>In B2B, if you exchange data , you are probably using some form of Electronic Data Interchange (EDI) which is costly. </li></ul>
  5. 5. Integration (I) <ul><li>Difficulty of integration depends on being homogenous or heterogeneous of XML data. </li></ul><ul><li>Integrating data which is homogenous is simple. </li></ul><ul><ul><li>E ach document’s schema matches each other </li></ul></ul><ul><li>I n real world, different XML source elements and attributes may have different names, orders and types in their schemas, because of changes in requirements. </li></ul><ul><li>T he integration will not be as simple as like in homogenous data integration. </li></ul>
  6. 6. Integration (II) <ul><li>To solve integration problem , we offer </li></ul><ul><ul><li>To construct Semantic RDF (Resource Description Framework) based “ regular expressions ” which define the whole structure semantically instead of using XPath (XML Path Language) and XSLT (Extensible Stylesheet Language) </li></ul></ul><ul><ul><li>easier and more effective compared to other works in use because it is related logical vocabulary instead of interfering XML document directly . </li></ul></ul>
  7. 7. Related Work (I) <ul><li>Researchers try to solve integration problem, using a predefined global schema for the heterogeneous XML data sources which have their own local schemas. </li></ul><ul><li>Their solutions require much complicated rules for mapping from local schemas to agree upon a global schema. </li></ul><ul><li>Mappings are really too complex for IT administrators and they have to know XPath and XSLT in detail. </li></ul>
  8. 8. Related Work (II) <ul><li>Mappings may not be one-to-one match from local schema to global-one, because an element in local schema may not exist in global-one, vice-versa. </li></ul><ul><li>They solve this problem by adding empty XML elements into local data sources that changes the original structure. </li></ul><ul><li>Hence too much redundant storage space is spent. </li></ul>
  9. 9. Description of Method (I) <ul><li>Our approach brings a semantic RDF-based solution to XML integration problem </li></ul><ul><li>RDF is acronym of Resource Description Framework </li></ul><ul><li>An RDF document abstracts the details of document by presenting a semantic layer </li></ul><ul><li>An RDF document has three parts </li></ul><ul><ul><li>identifier </li></ul></ul><ul><ul><li>property </li></ul></ul><ul><ul><li>of property </li></ul></ul>
  10. 10. Description of Method (II) <ul><li><rdf: RDF xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;> </li></ul><ul><li><rdf: Description about=&quot; http://www.cs.deu.edu.tr &quot;> </li></ul><ul><li>< Author > Sibel KILINÇ </ Author > </li></ul><ul><li></rdf: Description> </li></ul><ul><li></rdf: RDF> </li></ul><ul><li>Identifier , “http://www.cs.deu.edu.tr” </li></ul><ul><li>property, “Author” </li></ul><ul><li>object which is the value of property, “Sibel KILINÇ” </li></ul>
  11. 11. Description of Method (III) <ul><li>Sample Regular Expression </li></ul><ul><li>STOCK-NAME (xS, xN) </li></ul><ul><li>:= </li></ul><ul><li>Source/STOCK xT, xT/ID xS, xT/NAME xN </li></ul><ul><li>Source is the name of local schema </li></ul><ul><li>Each STOCK element of Source is assigned to a temporal variable, xT. </li></ul><ul><li>Each xT has two properties named ID and NAME. </li></ul><ul><li>xS is an identifier and holds the ID of STOCK. </li></ul><ul><li>xN is the name of property and value of xN holds NAME of STOCK </li></ul>
  12. 12. Description of Method (IV) <ul><li>Semantic RDF-based Integration Framework has two major components; </li></ul><ul><ul><li>Regular Expression Generator Tool (REGT) </li></ul></ul><ul><ul><li>Integrator Tool Box (ITB) </li></ul></ul>
  13. 13. Semantic RDF-based Integration Framework
  14. 14. Description of Method (V) <ul><li>Regular Expression Generator Tool (REGT) , </li></ul><ul><ul><li>takes N local XML data sources and Global Semantic Vocabulary that consists of common XML elements in all local XML data sources with an identifier and a property like in RDF. The left part of “STOCK-NAME” description is an example element of vocabulary. </li></ul></ul><ul><ul><li>produces N Regular Expression Sets with respect to local data sources. The right part “STOCK-NAME” description is an example output of REGT. The upper part of framework presents the processes done at local. </li></ul></ul>
  15. 15. Description of Method (VI) <ul><li>Integrator Tool Box (ITB) </li></ul><ul><ul><li>combines right sides of the regular expression set by the guide of vocabulary elements using some iteration expressions. </li></ul></ul><ul><ul><li>Finally, Global XML Data source has produced without changing the structure of local XML data sources, unlike related works. </li></ul></ul>
  16. 16. A Real Life Scenario (I) <ul><li>Scenario </li></ul><ul><ul><li>t here is a corporation which has a branch office and a center office. </li></ul></ul><ul><ul><li>c enter office is in Ankara </li></ul></ul><ul><ul><li>branch office is in İzmir </li></ul></ul><ul><ul><li>e ach office holds different XML schemas for daily orders </li></ul></ul><ul><ul><li>manager of the corporation wants to see some audit reports, so that all orders must be brought together </li></ul></ul><ul><ul><li>a t this point, an integration process is needed </li></ul></ul>
  17. 17. A Real Life Scenario (II) <ul><li>General solution steps to XML integration problem </li></ul><ul><ul><li>A global definition must be agreed on. </li></ul></ul><ul><ul><li>For each source, different mappings must be used to transform local schemas into global-one. </li></ul></ul><ul><ul><li>All transformed sources must be joined. </li></ul></ul><ul><li>Related Works in use are concentrated on XML based technologies. </li></ul><ul><ul><li>A global XML Schema is defined. </li></ul></ul><ul><ul><li>XPath or XQuery mappings are used for transformation step by changing the structure of original local schemas. </li></ul></ul><ul><ul><li>Complex and redundant XSL-Transformations are use for joining step. </li></ul></ul>
  18. 18. A Real Life Scenario (III) <ul><li>Although integration problem is solved by using these steps, new two problems arise. </li></ul><ul><ul><li>First, since XML elements in local data sources may not exist in global-one, matching becomes a problem. Related Works solve this problem by adding empty XML elements into local data sources that changes the original structure. As a result too much redundant storage space is spent. </li></ul></ul><ul><ul><li>Second, to apply transformation and joining steps, IT administrators of each branch must know complex XPath mappings and XSLT transformations. </li></ul></ul>
  19. 19. A Real Life Scenario (IV) <ul><li>In our framework, these problems are solved. </li></ul><ul><ul><li>A common vocabulary is agreed on. </li></ul></ul><ul><ul><li>Instead of adding new elements to schema structures, RDF-based regular expressions are generated from local data sources by using REGT that facilitates the transformation process instead of formulating the complex XPath queries. </li></ul></ul><ul><li>Regular expressions are integrated and processed by using ITB that facilitates the joining process . Thanks to ITB and REGT, IT administrators do not need to know XSLT and XPath technology. </li></ul>
  20. 20. Scenario Design Using Our Approach (I) <ul><li>The predicate rules used in this scenario are; </li></ul><ul><ul><li>STOCK-ORDER(xS,xO), </li></ul></ul><ul><ul><li>STOCK-NAME(xS,xN), </li></ul></ul><ul><ul><li>STOCK-QUANTITY(xS,xQ), </li></ul></ul><ul><ul><li>ORDER-DATE(xO,xD) </li></ul></ul><ul><ul><li>ORDER-CUST(xO,xC) </li></ul></ul><ul><li>These predicate rules are the elements of common vocabulary which is also known as global definition. </li></ul>
  21. 21. Scenario Design Using Our Approach (II) Local schema and a sample XML document of center office, Ankara <schema> <element name=&quot;ORDERLIST&quot;> <complexType> <sequence> <elemen tname=&quot;ORDER“ type=&quot;typeOrder&quot;/> </sequence> </complexType> <complexType name=&quot;typeOrder&quot;> <element name=&quot;NUMBER&quot; type=&quot;string&quot;/> <element name=&quot;DATE&quot; type=&quot;DateTime&quot;/> <element name=&quot;CUSTID“ type=&quot;string&quot;/> <element name=&quot;STOCK&quot; type=&quot;typeStock&quot;/> </complexType> <complexType name=&quot;typeStock&quot;> <element name=&quot;ID&quot; type=&quot;string&quot;/> <element name=&quot;NAME&quot; type=&quot;string&quot;/> <element name=&quot;QUANTITY&quot; type=&quot;integer&quot;/> </complexType> </element> </schema> <ORDERLIST> <ORDER> <NUMBER>AN001</NUMBER> <DATE>01/03/2004</DATE> <CUSTID>0001</CUSTID> <STOCK> <ID>S0001</ID> <NAME>CANON</NAME> <QUANTITY> 3</QUANTITY> </STOCK> </ORDER> <ORDER> <NUMBER>AN002</NUMBER> <DATE>01/03/2004</DATE> <CUSTID>0002</CUSTID> <STOCK> <ID>S0005</ID> <NAME>PANASONIC</NAME> <QUANTITY>10</QUANTITY> </STOCK> </ORDER> </ORDERLIST>
  22. 22. Scenario Design Using Our Approach (III) <ul><li>REGT takes common semantic vocabulary and local schema of Ankara as input and produces the following regular expression set 1. </li></ul><ul><ul><li>* STOCK-ORDER(xS,xO) ::= </li></ul></ul><ul><ul><li>Ankara/ORDERLIST/ORDER xT, xT/STOCK/ID xS, xT/NUMBER xO </li></ul></ul><ul><ul><li>* STOCK-NAME(xS,xN) ::= </li></ul></ul><ul><ul><li>Ankara/ORDERLIST/ORDER/STOCK xT, xT/ID xS, xT/NAME xN </li></ul></ul><ul><ul><li>* STOCK-QUANTITY(xS,xQ) ::= </li></ul></ul><ul><ul><li>Ankara/ORDERLIST/ORDER/STOCK xT, xT/ID xS, xT/QUANTITY xQ </li></ul></ul><ul><ul><li>* ORDER-DATE (xO,xD)::= </li></ul></ul><ul><ul><li>Ankara/ORDERLIST/ORDER xT, xT/NUMBER xO, xT/DATE xD </li></ul></ul><ul><ul><li>* ORDER-CUST(xO,xC)::= </li></ul></ul><ul><ul><li>Ankara/ORDERLIST/ORDER xT, xT/NUMBER xO, xT/CUSTID xC </li></ul></ul>
  23. 23. Scenario Design Using Our Approach (IV) Local schema and a sample XML document of branch office, İzmir <schema> <element name=&quot;ORDLIST&quot;> <complexType> <sequence> <element name=&quot;ORD&quot; type=&quot;typeOrd&quot;/> </sequence> </complexType> <complexType name=&quot;typeOrd&quot;> <element name=&quot;ORDID&quot; type=&quot;string&quot;/> <element name=&quot;CUSTID&quot; type=&quot;string&quot;/> <element name=&quot;DATE&quot; type=&quot;DateTime&quot;/> <element name=&quot;STKLIST&quot; type=&quot;typeStList&quot;/> </complexType> <complexType name=&quot;typeStList&quot;> <element name=&quot;STK&quot; type=&quot;typeSt&quot;/> </complexType> <complexType name=&quot;typeSt&quot;> <element name=&quot;ID&quot; type=&quot;string&quot;/> <element name=&quot;QUANT&quot; type=&quot;integer&quot;/> <element name=&quot;NAME&quot; type=&quot;string&quot;/> </complexType></element></schema> <ORDLIST> <ORD> <ORDID>IZ001</ORDID> <CUSTID>0005</CUSTID> <DATE>01/03/2004</DATE> <STKLIST> <STK> <ID>S0006</ID> <QUANT>2</QUANT> <NAME>NIKON</NAME> </STK> </STKLIST> </ORD> <ORD> <ORDID>IZ005</ORDID> <CUSTID>0008</CUSTID> <DATE>01/03/2004</DATE> <STKLIST> <STK><ID>S0005</ID> <QUANT>5</QUANT> <NAME>PANASONIC</NAME> </STK> </STKLIST> </ORD></ORDERLIST>
  24. 24. Scenario Design Using Our Approach (V) <ul><li>REGT takes common semantic vocabulary and local schema of İzmir as input and produces the following regular expression set 2. </li></ul><ul><ul><li>* STOCK-ORDER(xS,xO) ::= </li></ul></ul><ul><ul><li>İzmir/ORDLIST/ORD xT, xT/STKLIST/STK/ID xS, xT/ORDID xO </li></ul></ul><ul><ul><li>* STOCK-NAME(xS,xN) ::= </li></ul></ul><ul><ul><li>İzmir /ORDLIST/ORD/STKLIST/STK xT, xT/ID xS, xT/NAME xN </li></ul></ul><ul><ul><li>* STOCK-QUANTITY(xS,xQ) ::= </li></ul></ul><ul><ul><li>İzmir/ORDLIST/ORD/STKLIST/STK xT, xT/ID xS, xT/QUANT xQ </li></ul></ul><ul><ul><li>* ORDER-DATE(xO,xD)::= </li></ul></ul><ul><ul><li>İzmir /ORDLIST/ORD xT, xT /ORDID xO, xT/DATE xD </li></ul></ul><ul><ul><li>* ORDER-CUST(xO,xC)::= </li></ul></ul><ul><ul><li>İzmir /ORDLIST/ORD xT, xT/ORDID xO, xT/CUSTID xC </li></ul></ul>
  25. 25. Scenario Design Using Our Approach (VI) <ul><li>Regular expression set 1 and 2 have been prepared to be joined and processed by ITB. </li></ul><ul><li>Manager wants to report the orders for ‘PANASONIC’ stock in each office. </li></ul><ul><li>In report stock name, quantity and order id will be displayed. </li></ul><ul><li>Using ITB, the following code section is produced and run. </li></ul>
  26. 26. Scenario Design Using Our Approach (VII) OutXML(<ORDERLIST>); For $xO in each (STOCK-ORDER(xS,xO)) If ($xS = STOCK-NAME(xS,xN)) And ($xN = ‘PANASONIC’) And ($xS = STOCK-QUANTITY(xS,xQ)) Then OutXML (<ORDEREDSTOCK> <ORDID>$xO</ORDID> <NAME>$xN</NAME> <QUANT>$xQ</QUANT> </ORDEREDSTOCK>); End If Next OutXML(</ORDERLIST>); <ORDERLIST> <ORDEREDSTOCK> <ORDID>AN002</ORDID> <NAME> PANASONIC </NAME> <QUANT>10</QUANT> </ORDEREDSTOCK> <ORDEREDSTOCK> <ORDID>IZ005</ORDID> <NAME> PANASONIC </NAME> <QUANT>5</QUANT> </ORDEREDSTOCK> </ORDERLIST>
  27. 27. Conclusion <ul><li>A new approach to integrate XML data sources which have different structure is presented </li></ul><ul><li>New tools for schema transformation and regular expression integration are also suggested </li></ul><ul><li>Has advantage of RDF based logical semantic layer with preserving original data sources during integration process </li></ul><ul><li>This system guarantees that the global data source includes all the elements and attributes which have defined as a parameter in global semantic vocabulary. This vocabulary can be changed according to requests of corporation. </li></ul>
  28. 28. References (I) <ul><li>Bray, T., Paoli, J., Spenger-McQueen, C.M.: Extensible Markup Language (XML) 1.0. W3C Recommendation, February 1998 </li></ul><ul><li>Clarke, J., XSL Transformations (XSLT) version 1.0. W3C Recommendation, November 1999 </li></ul><ul><li>Halevy, A., Etzioni, O., Doan, A., Ives, Z., Madhavan, J., McDowell, L., Tatarinov, I.: Crossing the Structure Chasm. In CIDT’03 (2003) </li></ul><ul><li>Harold, E.R., XML Bible, IDG Books Worldwide Inc., United States of America, 1999 </li></ul><ul><li>Kılınc, D., Kut, A.: Xml Teknolojisine Gerçekçi Yaklaşım. Inet-tr’09 (2003) </li></ul>
  29. 29. References (II) <ul><li>Kılınc, D., Kut, A.: XSL Dönüşümleri ve Biçimleme. AB’03 (2004) </li></ul><ul><li>Popa, L., Velegrakis, Y., Miller, R.J., Hernandez, M.A., Fagin, R. : Translating Web Data. Proceedings of the 28th VLDB Conference (2002) 598-609 </li></ul><ul><li>XML schema. Available at http://www.w3.org/XML/Schema . </li></ul><ul><li>XPath 2.0. W3C Working Draft, November 2002 </li></ul><ul><li>XQuery 1.0: An XML Query Language. W3C Working Draft, November 2002 </li></ul>
  30. 30. <ul><li><TheEnd> </li></ul><ul><li><Thanks>ToAll</Thanks> </li></ul><ul><li></TheEnd> </li></ul>

×