10/29

223
-1

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
223
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
2
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

10/29

  1. 1. Views and Storing XML in Relational Databases Susan B. Davidson University of Pennsylvania CIS330 – Database Management Systems Most slide content courtesy of Zack Ives.
  2. 2. What are views? <ul><li>We frequently want to reference data in a way that differs from the way it’s stored </li></ul><ul><ul><li>XML data  HTML, text, etc. </li></ul></ul><ul><ul><li>Relational data  XML data </li></ul></ul><ul><ul><li>Relational data  Different relational representation </li></ul></ul><ul><ul><li>XML data  Different XML representation </li></ul></ul><ul><li>Generally, these can all be thought of as different views over the data </li></ul><ul><ul><li>A view is a named query </li></ul></ul><ul><ul><li>It is the DB analog of methods with arguments in PL </li></ul></ul><ul><ul><li>We will start with views in the same model (relational, XML), and then address moving XML data to a relational format. </li></ul></ul>
  3. 3. Views in SQL <ul><li>A view is first created (CREATE VIEW). </li></ul><ul><li>We then use the name of the view to invoke the query (treating it as if it were the relation it returns) </li></ul>Creating a view: CREATE VIEW V(A,B,C) AS SELECT A,B,C FROM R WHERE R.A = “123”; This expands in the query processor to: SELECT * FROM (SELECT A,B,C FROM R WHERE R.A = “123”) AS V , R WHERE V .B = 5 AND V .C = R.C Using the view: SELECT * FROM V , R WHERE V .B=5 AND V .C=R.C
  4. 4. Views in XQuery <ul><li>In XQuery, views are defined as functions. </li></ul>Creating a view: declare function V() as element(content)* { for $r in doc(“R”)/root/tree, $a in $r/a, $b in $r/b, $c in $r/c where $a = “123” return <content>{$a, $b, $c}</content> } Using a view: for $v in V()/content, $r in doc(“R”)/root/tree where $v/b = $r/b return $v
  5. 5. Why are views useful? <ul><li>In addition to describing transformations from one schema to another, views have several common uses: </li></ul><ul><li>Providing security/access control </li></ul><ul><ul><li>We can assign users permissions on different views </li></ul></ul><ul><ul><li>Can select rows or project out columns so we only reveal what we want! </li></ul></ul><ul><li>Can be used as relations in other queries </li></ul><ul><ul><li>Allows the user to query things that make more sense </li></ul></ul>
  6. 6. Virtual vs. Materialized Views <ul><li>A virtual view is a named query that is actually re-computed every time – as shown on the previous slide </li></ul><ul><ul><li>CREATE VIEW V(A,B,C) AS </li></ul></ul><ul><li>SELECT A,B,C FROM R WHERE R.A = “123” </li></ul><ul><li>A materialized view is one that is computed once and its results are stored as a table </li></ul><ul><ul><li>Think of this as a cached answer </li></ul></ul><ul><ul><li>These are incredibly useful! </li></ul></ul><ul><ul><li>Techniques exist for using materialized views to answer other queries </li></ul></ul>SELECT * FROM V , R WHERE V .B = 5 AND V .C = R.C
  7. 7. Example <ul><li>Suppose V is defined as “ select * from R where R.A= “123 ”. If R is initially: </li></ul><ul><li>Then the result of evaluating “select * from V” is </li></ul><ul><li>Suppose we delete (123, 8, World) from R. Then the result of “ select * from V ” should be </li></ul>A B C 123 5 Atlas 456 10 Guide 123 8 World A B C 123 5 Atlas 123 8 World A B C 123 5 Atlas
  8. 8. Views Should Stay Fresh <ul><li>Views (sometimes called intensional relations ) behave, from the perspective of a query language, exactly like base relations ( extensional relations) </li></ul><ul><li>But there’s an association that should be maintained: </li></ul><ul><ul><li>If tuples change in the base relation, they should change in the view (whether it’s materialized or not) </li></ul></ul><ul><ul><li>If tuples change in the view, that should reflect in the base relation(s) </li></ul></ul>
  9. 9. View Maintenance and the View Update Problem <ul><li>There exist algorithms to incrementally recompute a materialized view when the base relations change </li></ul><ul><li>We can try to propagate view changes to the base relations </li></ul><ul><ul><li>This is not hard for the previous example. </li></ul></ul><ul><ul><li>However, there are lots of views that aren’t easily updatable: </li></ul></ul><ul><ul><li>We can ensure views are updatable by enforcing certain constraints (e.g., no aggregation), but this limits the kinds of views we can have </li></ul></ul>R S R ⋈ S A B 1 2 2 2 B C 2 4 2 3 A B C 1 2 4 1 2 3 2 2 4 2 2 3 delete?
  10. 10. “ Publishing” XML Views of Relational Data <ul><li>It can be done with SQL/XML, an extension of the SQL standard (see http://sqlxml.org ) </li></ul><ul><li>(Don’t confuse with old and lame SQLXML for SQL Server.) </li></ul><ul><li>select xmlelement(name “Customer”, </li></ul><ul><li>xmlelement(name “CustID”, c.CustId), </li></ul><ul><li>xmlelement(name “CustName”, c.CustName), </li></ul><ul><li>xmlelement(name “City”, c.City) ) </li></ul><ul><li>from customers c </li></ul><ul><li>where c.Status = “preferred” </li></ul><ul><li>This is a very valuable tool for B2B (business-to-business) data exchange. Available in Oracle, DB2, SQL Server. </li></ul>
  11. 11. Embedding XML in a Relational Database <ul><li>Straightforward solution: add attributes of type “XML ”. Promoted by the same SQL/XML standard. </li></ul><ul><li>create table clients( </li></ul><ul><li>id int primary key not null, </li></ul><ul><li>name varchar(50), </li></ul><ul><li>status varchar(10), </li></ul><ul><li>contactinfo xml ) </li></ul><ul><li>Available in Oracle, DB2, SQL Server. Syntax may vary. Above syntax is from DB2. </li></ul>
  12. 12. Querying Relationally Embedded XML with SQL/XML <ul><li>SQL/XML standard specifies xmlexists and xmlquery for embedding XPath and XQuery into SQL. DB2 syntax . </li></ul><ul><li>select name from clients </li></ul><ul><li>where xmlexists('$c/Client/Address[zip=&quot;95116&quot;]' </li></ul><ul><li>passing clients.contactinfo as &quot;c&quot;) </li></ul><ul><li>select name, </li></ul><ul><li>xmlquery('for $e in $c/Client/email return $e' </li></ul><ul><li>passing contactinfo as &quot;c&quot;) </li></ul><ul><li>from clients </li></ul><ul><li>where status = 'Gold’ </li></ul>
  13. 13. SQL Extensions for XQuery (DB2) <ul><li>for $y in </li></ul><ul><li>db2-fn:xmlcolumn('CLIENTS.CONTACTINFO')/Client/Address </li></ul><ul><li>return $y </li></ul><ul><li>for $y in db2-fn:sqlquery('select contactinfo </li></ul><ul><li> from clients </li></ul><ul><li> where status=''Gold'' ’ </li></ul><ul><li>)/Client </li></ul><ul><li>where $y/Address/city=&quot;San Jose&quot; </li></ul><ul><li>return ( </li></ul><ul><li>if ($y/email) then <emailList>{$y/email} </li></ul><ul><li> </emailList> </li></ul><ul><li>else $y/Address ) </li></ul>
  14. 14. More Uniform Ways for Storing XML in an RDBMS: Mapping Relational  XML <ul><li>We know the following: </li></ul><ul><ul><li>XML data is tree-like </li></ul></ul><ul><ul><li>XML is SEMI-structured </li></ul></ul><ul><ul><ul><li>There’s some structured “stuff”, especially if it follows a DTD </li></ul></ul></ul><ul><ul><ul><li>There is some unstructured “stuff”, eg. text </li></ul></ul></ul><ul><li>Issues relate to describing XML structure, particularly parent/child in a relational encoding </li></ul><ul><ul><li>Relations are flat </li></ul></ul><ul><ul><li>Tuples can be “connected” via foreign-key/primary-key links </li></ul></ul>
  15. 15. The Simplest Way to Encode a Tree <ul><li>Suppose we had: <tree id=“0”> <content id=“1”> <sub-content>XYZ </sub-content> <i-content>14 </i-content> </content> </tree> </li></ul><ul><li>If we have no IDs, we CREATE values… </li></ul><ul><li>BinaryLikeEdge(key, label, type, value, parent) </li></ul>What are shortcomings here? key label type value parent 0 tree ref - - 1 content ref - 0 2 sub-content ref - 1 3 i-content ref - 1 4 - str XYZ 2 5 - int 14 3
  16. 16. Florescu/Kossmann Improved Edge Approach <ul><li>Consider order, typing; separate the values </li></ul><ul><li>Edge(parent, ordinal, label, flag, target) </li></ul><ul><li>Vint(vid, value) </li></ul><ul><li>Vstring(vid, value) </li></ul>parent ord label flag target - 1 tree ref 0 0 1 content ref 1 1 1 sub-content str v2 1 1 i-content int v3 vid value v3 14 vid value v2 XYZ
  17. 17. How Do You Compute the XML? <ul><li>Assume we know the structure of the XML tree (we’ll see how to avoid this later) </li></ul><ul><li>We can compute an “XML-like” SQL relation using “ outer unions ” </li></ul><ul><ul><li>Idea: if we take two non-union-compatible expressions, pad each with NULLs, we can UNION them together </li></ul></ul><ul><ul><li>Let’s see how this works… </li></ul></ul>
  18. 18. A Relation that Mirrors the XML Hierarchy <ul><li><tree id=“0”> <content id=“1”> <sub-content> XYZ </sub-content> <i-content>14</i-content> </content> </tree> </li></ul><ul><li>Write an SQL query whose output looks like: </li></ul>rLabel rid rOrd clabel cid cOrd sLabel sid sOrd str int tree 0 1 - - - - - - - - - 0 1 content 1 1 - - - - - - 0 1 - 1 1 sub-content 2 1 - - - 0 1 - 1 1 - 2 1 XYZ - - 0 1 - 1 2 i-content 3 1 - - - 0 1 - 1 2 - 3 1 - 14
  19. 19. A Relation that Mirrors the XML Hierarchy <ul><li>Output relation would look like: </li></ul>rLabel rid rOrd clabel cid cOrd sLabel sid sOrd str int tree 0 1 - - - - - - - - - 0 1 content 1 1 - - - - - - 0 1 - 1 1 sub-content 2 1 - - - 0 1 - 1 1 - 2 1 XYZ - - 0 1 - 1 2 i-content 3 1 - - - 0 1 - 1 2 - 3 1 - 14
  20. 20. A Relation that Mirrors the XML Hierarchy <ul><li>Output relation would look like: </li></ul>Colors are representative of separate SQL queries… rLabel rid rOrd clabel cid cOrd sLabel sid sOrd str int tree 0 1 - - - - - - - - - 0 1 content 1 1 - - - - - - 0 1 - 1 1 sub-content 2 1 - - - 0 1 - 1 1 - 2 1 XYZ - - 0 1 - 1 2 i-content 3 1 - - - 0 1 - 1 2 - 3 1 - 14
  21. 21. SQL for Outputting XML <ul><li>For each sub-portion we preserve the keys (target, ord) of the ancestors </li></ul><ul><li>Root: </li></ul><ul><ul><li>select E.label AS rLabel, E.target AS rid, E.ord AS rOrd , null AS cLabel, null AS cid, null AS cOrd, null AS subOrd, null AS sid, null AS str, null AS int from Edge E where parent IS NULL </li></ul></ul><ul><li>First-level children: </li></ul><ul><ul><li>select null AS rLabel, E.target AS rid, E.ord AS rOrd , E1.label AS cLabel, E1.target AS cid, E1.ord AS cOrd , null AS … from Edge E, Edge E1 where E.parent IS NULL AND E.target = E1.parent </li></ul></ul>
  22. 22. The Rest of the Queries <ul><li>Grandchild: </li></ul><ul><ul><li>select null as rLabel, E.target AS rid, E.ord AS rOrd , null AS cLabel, E1.target AS cid, E1.ord AS cOrd , E2.label as sLabel, E2.target as sid, E2.ord AS sOrd , null as … from Edge E, Edge E1, Edge E2 where E.parent IS NULL AND E.target = E1.parent AND E1.target = E2.parent </li></ul></ul><ul><li>Strings: </li></ul><ul><ul><li>select null as rLabel, E.target AS rid, E.ord AS rOrd , null AS cLabel, E1.target AS cid, E1.ord AS cOrd , null as sLabel, E2.target as sid, E2.ord AS sOrd , Vi.val AS str , null as int from Edge E, Edge E1, Edge E2, Vint Vi where E.parent IS NULL AND E.target = E1.parent AND E1.target = E2.parent AND Vi.vid = E2.target </li></ul></ul><ul><li>How would we do integers? </li></ul>
  23. 23. Finally… <ul><li>Union them all together: </li></ul><ul><ul><li>( select E.label as rLabel, E.target AS rid, E.ord AS rOrd , … from Edge E where parent IS NULL) UNION ( select null as rLabel, E.target AS rid, E.ord AS rOrd , E1.label AS cLabel, E1.target AS cid, E1.ord AS cOrd , null as … from Edge E, Edge E1 where E.parent IS NULL AND E.target = E1.parent ) UNION ( </li></ul></ul><ul><ul><li> . </li></ul></ul><ul><ul><li> : ) UNION ( . : </li></ul></ul><ul><ul><li>) </li></ul></ul><ul><li>Then another module will add the XML tags, and we’re done! </li></ul>
  24. 24. “ Inlining” Techniques <ul><li>Folks at Wisconsin noted we can exploit the “structured” aspects of semi-structured XML </li></ul><ul><ul><li>If we’re given a DTD, often the DTD has a lot of required (and often singleton) child elements </li></ul></ul><ul><ul><ul><li>Book(title, author*, publisher) </li></ul></ul></ul><ul><ul><li>Recall how normalization worked: </li></ul></ul><ul><ul><ul><li>Decompose until we have everything in a relation determined by the keys </li></ul></ul></ul><ul><ul><ul><li>… But don’t decompose any further than that </li></ul></ul></ul><ul><ul><li>Shanmugasundaram et al. try not to decompose XML beyond the point of singleton children </li></ul></ul>
  25. 25. Inlining Techniques <ul><li>Start with DTD, build a graph representing structure </li></ul>tree content sub-content i-content * * * <ul><li>The edges are annotated with ?, * indicating repetition, optionality of children </li></ul><ul><li>They simplify the DTD to figure this out </li></ul>@id @id ?
  26. 26. Building Schemas <ul><li>Now, they tried several alternatives that differ in how they handle elements w/multiple ancestors </li></ul><ul><ul><li>Can create a separate relation for each path </li></ul></ul><ul><ul><li>Can create a single relation for each element </li></ul></ul><ul><ul><li>Can try to inline these </li></ul></ul><ul><li>For tree examples, these are basically the same </li></ul><ul><ul><li>Combine non-set-valued things with parent </li></ul></ul><ul><ul><li>Add separate relation for set-valued child elements </li></ul></ul><ul><ul><li>Create new keys as needed </li></ul></ul>name book author
  27. 27. Schemas for Our Example <ul><li>TheRoot(rootID) </li></ul><ul><li>Content(parentID, id, @id) </li></ul><ul><li>Sub-content(parentID, varchar) </li></ul><ul><li>I-content(parentID, int) </li></ul><ul><li>If we suddenly changed DTD to <!ELEMENT content(sub-content*, i-content?) what would happen? </li></ul>
  28. 28. XQuery to SQL <ul><li>Inlining method needs external knowledge about the schema </li></ul><ul><ul><li>Needs to supply the tags and info not stored in the tables </li></ul></ul><ul><li>We can actually directly translate simple XQuery into SQL over the relations – not simply reconstruct the XML </li></ul>
  29. 29. An Example <ul><li>for $X in document(“mydoc”)/tree/content where $X/sub-content = “XYZ” return $X </li></ul><ul><li>The steps of the path expression are generally joins </li></ul><ul><ul><li>… Except that some steps are eliminated by the fact we’ve inlined subelements </li></ul></ul><ul><ul><li>Let’s try it over the schema: </li></ul></ul><ul><ul><ul><li>TheRoot(rootID) </li></ul></ul></ul><ul><ul><ul><li>Content(parentID, id, @id) </li></ul></ul></ul><ul><ul><ul><li>Sub-content(parentID, varchar) </li></ul></ul></ul><ul><ul><ul><li>I-content(parentID, int) </li></ul></ul></ul>
  30. 30. XML Views of Relations <ul><li>We’ve seen that views are useful things </li></ul><ul><li>Allow us to store and refer to the results of a query </li></ul><ul><li>We’ve seen an example of a view that changes from XML to relations – and we’ve even seen how such a view can be posed in XQuery and “unfolded” into SQL </li></ul>
  31. 31. An Important Set of Questions <ul><li>Views are incredibly powerful formalisms for describing how data relates: fn: rel  …  rel  rel </li></ul><ul><li>Can I define a view recursively ? </li></ul><ul><ul><li>Why might this be useful in the XML construction case? When should the recursion stop? </li></ul></ul><ul><li>Suppose we have two views, v 1 and v 2 </li></ul><ul><ul><li>How do I know whether they represent the same data? </li></ul></ul><ul><ul><li>If v 1 is materialized, can we use it to compute v 2 ? </li></ul></ul><ul><ul><ul><li>This is fundamental to query optimization and data integration, as we’ll see later </li></ul></ul></ul>
  32. 32. Summary <ul><li>We’ve seen that views are useful things, and are heavily used within database applications as well as for data exchange. </li></ul><ul><li>Views allow us to store and refer to the results of a query ( materialized vs. virtual views) </li></ul><ul><li>We’ve seen an example of a view that changes from XML to relations – and we’ve even seen how such a view can be posed in XQuery and “unfolded” into SQL </li></ul><ul><li>We have also seen how to define XML views of relational data </li></ul>
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×