The Magic's in the Glue: Daniela Florescu Presentation on XQuery and the Cloud

Presentation by Daniela Florescu of Oracle on XQuery and cloud computing. Original URL is: http://isg.ics.uci.edu/slides/FlorescuIrvine.ppt

  1. The magic is in the glue: XQuery+Cloud
     Daniela Florescu, Oracle
  2. My personal history
     PhD in object-oriented query processing/optimization
     Loved the database theory and practice (relational, object-oriented, semi-structured)
     Got really interested in it, and thought it was important…
     …then I joined Oracle.
  3. … after 4 years in Oracle
     Applications are the really important issue
       How to develop, deploy, maintain, evolve, customize
     Databases are a side effect
       Customers are educated to think they need them
       DBs are only useful as part of a general application architecture
     The customer is king
       If they don't make $$$, you don't either
     Customers are in pain building apps right now
  4. Agenda
     Current pain in building apps
     What can XQuery do for customers?
     What can the Cloud do for customers?
     How do we put them together?
     How do XQuery+Cloud solve the problem?
     Some open research problems
  5. Imagine I am a customer and I need to build a new app.
     1. How much does it cost?
        Cost of developing the app (salaries)
        Cost of deploying the app
          Hardware, software licenses, maintenance
        Loss of income because of mis-provisioning
        Do I have to pay up front?
        Is the cost proportional to the income?
  6. Other questions?
     2. How fast can I deliver the app?
        Quicker to market than my competitors?
     3. How good is the application?
        More customers for the app => more income
        Acceptable operational characteristics?
     4. Can I adapt if something changes?
        Operational characteristics
        Functionality
     5. Can I customize the same app for a different vertical / different set of customers?
     6. Is there a risk in the technology?
  7. Customers' concerns
     Cost
     Time to market
     Flexibility
     Customizability
     Sustainability
     Risk
     Often a tradeoff
  8. Different classes of customers
     Enterprise (e.g. Bank of America)
       Cost, Sustainability, Risk, Customizability, Flexibility, Time to market
     Government agency (e.g. DoD)
       Sustainability, Cost, Time to market (?), Flexibility (?), Customizability, Risk
     Consumer (e.g. Craigslist)
       Time to market, Cost, Flexibility, Customizability, Sustainability, Risk
  9. Typical enterprise app stack
     Communication (XML, REST, WS)
     Application logic (Java, C#)
     Database (SQL)
     Oracle, IBM, SAP, Microsoft
  10. Cost? $$$$!
      Cost of developing the app
      Cost of deploying the app (hardware, software licenses, maintenance)
      Loss of income because of mis-provisioning
      Do I have to pay up front?
      Is the cost proportional to the income?
      [Stack: Communication (XML, REST, WS) / Application logic (Java, C#) / Database (SQL)]
  11. Time to market? Years!
      2. How fast can I deliver the app?
      [Stack: Communication (XML, REST, WS) / Application logic (Java, C#) / Database (SQL)]
  12. Flexibility? Customizability? Hardly any!
      Can I adapt if something changes?
        Operational characteristics
        Functionality
      Can I customize it for a different vertical?
      Oracle experience: for every $1M in Oracle app licenses, customers pay $2M to customize it.
        (SAP experience even worse :-)
      [Stack: Communication (XML, REST, WS) / Application logic (Java, C#) / Database (SQL)]
  13. Two major evil points
      1. Multi-layer infrastructure
      2. Schemas a prerequisite
      [Diagram: Communication / Application logic (schema-less) / put, get / Persistent (key, value) store (schema-less)]
      New apps: even the Oracle apps!
      New platforms: Salesforce, GoogleApps, Facebook
      XQuery a possible solution.
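The put/get interface on slide 13 can be sketched in XQuery 3.1 itself. This is a minimal illustration and does not show how any of the named platforms work. An immutable XQuery map stands in for the persistent (key, value) store, and the helper names local:put and local:get are hypothetical. The point is that two values with unrelated shapes sit side by side and no schema is declared first.

```xquery
xquery version "3.1";

(: Hypothetical put/get over an in-memory XQuery map standing in for a
   persistent (key, value) store. Maps are immutable, so put returns a new map. :)
declare function local:put($store as map(xs:string, element()),
                           $key as xs:string,
                           $value as element()) as map(xs:string, element()) {
  map:put($store, $key, $value)
};

declare function local:get($store as map(xs:string, element()),
                           $key as xs:string) as element()? {
  $store($key)
};

(: Values with unrelated shapes; no schema is declared anywhere. :)
let $store := map {}
  => local:put("cust-1", <customer><name>Ada</name></customer>)
  => local:put("order-7", <order id="7"><line sku="A1" qty="2"/></order>)
return (local:get($store, "order-7")/line/@qty/string(), map:size($store))
```

The final expression returns the string "2" (the stored order's quantity) and the number of keys, 2.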
  14. Another evil point
      Lack of cost elasticity
        Cost proportional to income
      Lack of elasticity in performance
        Response time independent of # clients
      The Cloud is the beginning of a solution.
  15. Agenda
      Current pain in building apps
      What can XQuery do for customers?
      What can the Cloud do for customers?
      How do we put them together?
      How do XQuery+Cloud solve the problem?
      Some open research problems
  16. Why XML?
      Covers the whole spectrum from structured data to textual information
      Schema independent
      Platform independent
      Continuity with the basic Internet infrastructure (URI, HTML, HTTP)
  17. What is XQuery?
      A programming language for XML processing
      Functional in style
      Turing complete
      Contains:
        Navigation
        Declarative query and aggregation (FLWOR)
        Search (full text)
        Declarative updates
        Transforms
        Scripting
        Streaming and windowing
        Error handling and second-order expressions
        Packaging (modules)
      Has limitations (more on that later)
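Slide 17 lists error handling and second-order expressions. The following is a hedged illustration in XQuery 3.0/3.1 syntax using try/catch, inline function items and the standard fn:for-each. The helper local:safe-divide is hypothetical. Dividing an xs:decimal by zero raises the standard error err:FOAR0001, and the catch clause turns that error into an empty result.

```xquery
xquery version "3.1";

(: Error handling: turn the standard division-by-zero error (err:FOAR0001)
   into an empty result instead of aborting the whole query. :)
declare function local:safe-divide($a as xs:decimal, $b as xs:decimal) as xs:decimal? {
  try {
    $a div $b
  } catch err:FOAR0001 {
    ()
  }
};

(: Second-order expression: fn:for-each applies an inline function item
   to every [numerator, denominator] array in the sequence. :)
for-each(([10, 2], [1, 0], [9, 3]),
         function($pair) { local:safe-divide($pair(1), $pair(2)) })
(: returns (5, 3) :)
```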
  18. History and status
      Standard of the W3C
        Good and bad
      10 years old
      40 existing implementations
      Implemented in major databases
      Best implementations in open source
      If you have XML data, it is hard to avoid.
  19. Navigation
      fn:doc("catalog.xml")/items/item
      fn:doc("catalog.xml")/items//item
      fn:doc("catalog.xml")/items//*
      fn:doc("catalog.xml")/items/@item
      fn:doc("parts.xml")/parts/part[partno = $i/partno]
      $x/items/item
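The next sketch makes the difference between the navigation paths concrete. It evaluates three of them against a small inline document rather than catalog.xml. Both the document's shape and the helper name local:counts are assumptions made for illustration.

```xquery
xquery version "3.1";

(: Counts what each path selects in a given catalog document. :)
declare function local:counts($catalog as document-node()) as xs:integer+ {
  count($catalog/items/item),    (: item children of items only :)
  count($catalog/items//item),   (: item elements at any depth below items :)
  count($catalog/items//*)       (: every element below items :)
};

(: Hypothetical stand-in for catalog.xml: one item is nested inside a group. :)
local:counts(document {
  <items>
    <item><partno>1</partno></item>
    <group>
      <item><partno>2</partno></item>
    </group>
  </items>
})
(: returns (1, 2, 5) :)
```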
  20. FLWOR
      for $i in fn:doc("catalog.xml")/items/item,
          $p in fn:doc("parts.xml")/parts/part[partno = $i/partno],
          $s in fn:doc("suppliers.xml")/suppliers/supplier[suppno = $i/suppno]
      order by $p/description, $s/suppname
      return $s
      Group by, having, outer joins, etc.
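Slide 20 mentions group by and having. Here is a sketch using the XQuery 3.0 group by clause, with a where clause placed after the grouping to play the role of SQL HAVING. The inline data and the function name local:suppliers-over are hypothetical; the element names follow the slide's catalog.xml.

```xquery
xquery version "3.1";

(: Groups items by supplier and keeps only suppliers whose total price
   exceeds $min; the where clause after group by acts like HAVING. :)
declare function local:suppliers-over($items as element(items),
                                      $min as xs:double) as element(supplier)* {
  for $i in $items/item
  group by $supp := string($i/suppno)
  let $total := sum($i/price)
  where $total > $min
  order by $supp
  return <supplier id="{$supp}" items="{count($i)}" total="{$total}"/>
};

(: Hypothetical inline data using the slide's element names. :)
local:suppliers-over(
  <items>
    <item><partno>P1</partno><suppno>S1</suppno><price>10</price></item>
    <item><partno>P2</partno><suppno>S1</suppno><price>30</price></item>
    <item><partno>P3</partno><suppno>S2</suppno><price>5</price></item>
  </items>,
  20)
(: returns <supplier id="S1" items="2" total="40"/> :)
```

After group by, $i is bound to all items of the group, which is why count($i) and sum($i/price) aggregate per supplier.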
  21. Creation of new information
      <descriptive-catalog>
        { for $i in fn:doc("catalog.xml")/items/item,
              $p in fn:doc("parts.xml")/parts/part[partno = $i/partno],
              $s in fn:doc("suppliers.xml")/suppliers/supplier[suppno = $i/suppno]
          order by $p/description, $s/suppname
          return
            <item>
              { $p/description, $s/suppname, $i/price }
            </item>
        }
      </descriptive-catalog>
  22. Textual search
      $doc ftcontains ( ( "mustang" ftand
                          ({("great", "excellent")} any word occurs at least 2 times) )
                        window 11 words
                        ftand ftnot "rust" ) same paragraph
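The text-search slide uses the draft syntax of its time. In the final W3C XQuery and XPath Full Text 1.0 Recommendation the keyword ftcontains became contains text. Under the final grammar, position filters such as window and same paragraph follow a complete selection, so this sketch adds one level of parentheses before ftand ftnot "rust". It is a hedged rewrite that assumes a processor implementing the Full Text extension, and local:mentions-mustang is a hypothetical name. Paragraph detection for same paragraph is implementation-defined, so no particular result is claimed for the sample call.

```xquery
xquery version "3.1";

(: Requires an implementation of XQuery and XPath Full Text 1.0.
   True if "mustang" and at least two occurrences of "great"/"excellent"
   fall within an 11-word window, "rust" does not occur, and the match
   stays within one paragraph. :)
declare function local:mentions-mustang($doc as node()) as xs:boolean {
  $doc contains text (
    (
      ( "mustang" ftand ({ ("great", "excellent") } any word occurs at least 2 times) )
      window 11 words
    )
    ftand ftnot "rust"
  ) same paragraph
};

local:mentions-mustang(<review><p>This Mustang is great, really great.</p></review>)
```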
  23. Declarative updates
      for $p in /inventory/part
      let $deltap := $changes/part[partno eq $p/partno]
      where exists($deltap)
      return
        replace value of node $p/quantity
        with $p/quantity + $deltap/quantity
  24. Transforms
      let $oldx := /a/b/x
      return
        copy $newx := $oldx
        modify (rename node $newx as "newx",
                replace value of node $newx with $newx * 2)
        return ($oldx, $newx)
  25. Streams and windowing
      for sliding window $w in (2, 4, 6, 8, 10, 12, 14)
        start at $s when fn:true()
        only end at $e when $e - $s eq 2
      return <window>{ $w }</window>
      Result of the above query:
      <window>2 4 6</window>
      <window>4 6 8</window>
      <window>6 8 10</window>
      <window>8 10 12</window>
      <window>10 12 14</window>
  26. Scripting expressions
      block
      {
        declare $a as xs:integer := 0;
        declare $b as xs:integer := 1;
        declare $c as xs:integer := $a + $b;
        declare $fibseq as xs:integer* := ($a, $b);
        while ($c < 100) {
          set $fibseq := ($fibseq, $c);
          set $a := $b;
          set $b := $c;
          set $c := $a + $b;
        };
        $fibseq;
      }
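For comparison with the scripting block, plain functional XQuery can produce the same sequence by using recursion instead of reassignment. This is a sketch, and local:fib-upto is a hypothetical name. For a limit of 100 it yields exactly the numbers the while loop collects.

```xquery
xquery version "3.1";

(: Fibonacci numbers below $limit, computed by recursion instead of
   reassignment: each call receives the next pair ($a, $b). :)
declare function local:fib-upto($a as xs:integer, $b as xs:integer,
                                $limit as xs:integer) as xs:integer* {
  if ($a >= $limit) then ()
  else ($a, local:fib-upto($b, $a + $b, $limit))
};

local:fib-upto(0, 1, 100)
(: returns (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89) :)
```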
  27. Where can it be used in today’s architectures?
      Databases
      Middle tiers
        Information dispatch
        Transformation
        Data integration
      Browsers (see XQIB demo, WWW’09 paper)
      Mobile devices (XQuery on iPhone, anyone?)
  28. XQuery’s real potential
      Standalone programming language for information-intensive applications
      Can build extremely rich applications
      [Diagram: Application logic (XQuery) surrounded by XML]
  29. Why XQuery?
      [Customer concerns: 1. Cost, 2. Time to market, 3. Flexibility, 4. Customizability, 5. Sustainability, 6. Risk]
      Because of XML
        Schema independent
        Continuity with basic Internet infrastructure
        Continuity structured data <--> textual information
      XQuery’s own advantages
        Declarative
        Single-layer code
        Open-source friendly
      Extra goodies
        Opportunity to rethink ACID transactions
        Unique opportunities for introspection
        Code and data migration
  30. Declarativity
      Small number of lines of code
        Development cost
        Time to market
        # bugs
      Easy to optimize automatically
      Easy to parallelize automatically
        Especially important in the cloud
        Easier to achieve elasticity in performance
      Easier to generate automatically
        Important for smart/non-developer UIs
  31. Declarativity, negative side
      1. Fewer developers capable of writing such code
      2. Easy to write, harder to read
      3. Tools harder to make (e.g. debuggers)
      4. Performance can be unstable
      Despite that, in the history of CS we have evolved in the direction of declarativity
        Assembly, C, C++, Java, Haskell
        Cobol, SQL
  32. Rethink transactions and data consistency
      XQuery is silent as far as ACID transactions go
        On purpose!
      Are ACID transactions really needed?
      Are they really enforced in Web apps? No.
      Open research field
        Interaction of programming languages with new transactional models and new data consistency models
  33. SIGMOD’08
      Data consistency is something to optimize, not an absolute requirement
      Data consistency models [Tanenbaum]
        Shared-Disk (naïve approach)
          No concurrency control at all
        Eventual Consistency (basic protocol)
          Updates become visible any time and will persist
          No lost update on page level
        Atomicity
          All or no updates of a transaction become visible
        Monotonic reads, Read your writes, Monotonic writes, ...
        Strong Consistency
          Database-style consistency (ACID) via OCC
      Data consistency à la carte
  34. Introspection opportunities
      Closed world
      Everything is (or will be) XML
        Data, schemas, code, PULs (pending update lists), metadata, configs, runtime information
      Unique opportunity to:
        introspect all of them at runtime
        reason about them
        change them dynamically (not only data, but schemas, code and configuration)
      Open research field: consequences on programming
  35. Why NOT XQuery
      XML is complicated
      XML Schema is hard/impossible to understand
      XQuery is complicated
      XQuery is incomplete (maybe research opport.?)
        Missing a standard persistent data model
        Missing DDL functionality (indexes, integrity constraints)
        Missing basic functionalities (e.g. eval, function overloading)
        Missing basic data modeling functionality (n:m relationships)
      XQuery lacks a standard environment (e.g. J2EE) (maybe research opport.?)
      No tools (debuggers, profilers) (maybe research opport.?)
      Performance is not clear yet (certainly research opport.!)
      There are few XQuery developers (teaching opport.)
  36. Agenda
      Current pain in building apps
      What can XQuery do for customers?
      What can the Cloud do for customers?
      How do we put them together?
      How do XQuery+Cloud solve the problem?
      Some open research problems
  37. What is Cloud Computing?
      The "rental cars" paradigm for computing
      Commoditization of (certain aspects of) computing
        CPU, storage, and network
      Goal 1: Reduction of cost
        Principle: fine-grained renting of resources
        "Pay as you go" (elasticity of cost)
      Goal 2: Simplification of management
        Potentially infinite/unbreakable computing resources
        Potentially no administration
      Goal 3: Elasticity of performance
        Same response time regardless of workload
        Note: does not work yet for DBs or apps
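The "pay as you go" goal and the earlier question about mis-provisioning can be made concrete with a simple illustrative model. The model is an assumption and not part of the talk. Let n(t) be the number of servers the workload needs at time t over a billing period T, and let p be the price per server per unit time, assumed equal in both cases. Provisioning up front for the peak means paying for the maximum all the time, while elastic renting pays only for what is used:

```latex
\begin{aligned}
C_{\text{fixed}}   &= p \, T \max_{0 \le t \le T} n(t) \\
C_{\text{elastic}} &= p \int_{0}^{T} n(t) \, dt \;\le\; C_{\text{fixed}}
\end{aligned}
```

Because n(t) never exceeds its maximum, the elastic cost is never higher under this model, and the two costs are equal only for a flat workload. Real offerings bill in coarse increments and may price on-demand capacity differently from reserved capacity, which this sketch ignores.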
  38. Case Study: Amazon AWS
      EC2: scalable virtual private servers using Xen
      S3: WS-based storage for applications
      SQS: hosted message queue for web applications
      SimpleDB: the core functionality of a database
      Hadoop-based functionality
      Similar providers: IBM Blue Cloud, Microsoft Azure, (Google App Engine)
  39. The limits of the (Amazon) Cloud
      Cloud Computing a great starting point
      Unfortunately, only a fraction of the stack
      [Stack: Customization, Training, ... / Application / Application Server / DBMS / Hardware]
  40. Making use of the Cloud
      Solution 1 (conservative)
        Take an existing application (Java+SQL, etc.) and try to make it run on the cloud (e.g. make Oracle run on AWS)
      Solution 2 (reactionary)
        Create a fresh new infrastructure, specially designed for Web app requirements, to be deployed in the cloud
      [Chart: risk vs. benefit of each solution]
  41. Solution 1 (conservative)
      Take a traditional DBMS (e.g., Oracle, MySQL, ...)
        install it on an EC2 instance
        use S3 or EBS as a persistent store
      Advantages
        Traditional databases are available
        Proven to work well; many tools
        People trained and confident with them
      Disadvantages
        Traditional DBMSs solve the wrong problem anyway (e.g. focus on consistency)
        Traditional DBMSs make the wrong assumptions (DB optimizers fail on virtualized hardware)
  42. Solution 2 (reactionary)
      Rethink the whole system architecture
        Do NOT use a traditional DBMS and app server
        Create a new breed of application server (with DB)
        Run the application server on n EC2 instances
        Use S3 + distributed consistency protocols
      Advantages and disadvantages
        Requires a new breed of (immature) systems + tools
        Solves the right problem and gets it right
      Examples:
        GoogleApps (Python in the cloud)
        Sausalito (www.28msec.com) (XQuery in the cloud)
  43. Agenda
      Current pain in building apps
      What can XQuery do for customers?
      What can the Cloud do for customers?
      How do we put them together?
      How do XQuery+Cloud solve the problem?
      Some open research problems
  44. XQuery + AWS Cloud
      Cookbook:
        Take an existing XQuery processor
        Partition the XML data on S3
        Map REST calls to XQuery programs
        Run the XQuery programs on EC2
        Use SQS for (asynchronous) updates
        Voilà.
      The magic is in the glue (XQuery proc. + AWS)
        Application Server + Web Server + Database
        Integrated XQuery-based application stack for Web-based apps
        Fully SOA enabled
        All pre-configured and lean (ZERO admin)
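The cookbook step "Map REST calls to XQuery programs" can be sketched as a routing table that maps an HTTP method and path to an XQuery function item. This is a hypothetical illustration and not Sausalito's actual interface. The route keys, the handlers, the error namespace urn:example:errors and local:dispatch are all made up. A real deployment would receive the method and path from the web server rather than as literals.

```xquery
xquery version "3.1";

(: Hypothetical routing table: "METHOD first-path-segment" -> handler.
   Each handler receives the remaining path segments as its arguments. :)
declare variable $local:routes as map(xs:string, function(xs:string*) as item()*) := map {
  "GET items"  : function($args) { <items>{ for $a in $args return <item id="{$a}"/> }</items> },
  "POST items" : function($args) { <created count="{count($args)}"/> }
};

(: Maps a REST call (method + path) to the XQuery function that handles it. :)
declare function local:dispatch($method as xs:string, $path as xs:string) as item()* {
  let $segments := tokenize($path, "/")[. ne ""]
  let $key := upper-case($method) || " " || head($segments)
  return
    if (map:contains($local:routes, $key))
    then $local:routes($key)(tail($segments))
    else error(QName("urn:example:errors", "NOTFOUND"), "No handler for " || $key)
};

local:dispatch("get", "/items/42")
```

The sample call returns <items><item id="42"/></items>. Keeping routes as data (a map of function items) means new REST endpoints are added without touching the dispatcher.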
  45. XQuery in the Cloud (connected)
  46. Customers' concerns
      Cost
      Time to market
      Flexibility
      Customizability
      Sustainability
  47. XQuery in the Cloud (no server)
  48. XQuery in the Cloud (offline)
  49. Demo at www.28msec.com!
      Look at www.programmableweb.com for use cases (consumer and enterprise mashups)
  50. Competitors: Internet
      Web 2.0 development frameworks
        E.g., Ruby on Rails, PHP / LAMP, ...
        Deployment in the cloud still problematic
      Google App Engine, Facebook Apps
        Proprietary programming model (Python-based)
        Limited functionality
        Vendor lock-in, privacy issues
      Oracle on AWS, do-it-yourself on AWS
        Limited functionality and/or scalability
  51. Competitors: Enterprise
      Salesforce AppExchange
        Proprietary programming model
        Limited applications domain (CRM)
      Microsoft Azure
        .Net programming model
        Manual configuration needed
        (Recent offering, market adoption unclear)
      Virtualization companies (e.g., VMWare)
        No offerings / expertise for data management
      Oracle (Grid, RAC)
        Limited scalability, cost prohibitive
  52. Web 2.0 Support vs. Cloud Support
      [Chart axes: Deployment; Development (Proprietary to Standard). Plotted: AWS; Google App Engine, Facebook; XQuery+AWS Cloud; Salesforce, Workday; Azure; VMWare Cloud, Citrix; Oracle Trad.; Ruby on Rails]
  53. Agenda
      Current pain in building apps
      What can XQuery do for customers?
      What can the Cloud do for customers?
      How do we put them together?
      How do XQuery+Cloud solve the problem?
      Some open research problems
  54. Versions and variations
      The human mind does not like agreements
        We like our differences (for a good reason)
      Different ways to see:
        Data
        Schemas
        Code
      The current stack imposes agreement, contrary to our own nature
      We have to come up with solutions that allow, welcome and exploit variations
        A Darwinian, evolutionary approach to data, schema and code mutations
  55. Versions and variations
      Research problems:
        What is a (data, schema, code) variation?
        What does it mean to run an app in the presence of variations?
        How do you store (index, etc.) variations?
        How do you re-integrate them back into the mainstream app (e.g. community voting?)
        What is the correct lifecycle for data, schema, code that allows and maximally exploits variations?
      Note: I have an easier time thinking of a solution if the app is in XML/XQuery rather than in Java+SQL (or even Python)
  56. Conclusion
      XQuery in the cloud a serious alternative for some (large # and large $$) customers
      Nothing equivalent in the competition:
        How “solid” (standard, tested) this is
        Richness of applications
        Potential for optimization and parallelization
        Ease of porting to the cloud
  57. My advice
      Keep an eye on the apps, not the db
      Keep the customer in mind
      Rethink the entire stack
      Don't be afraid to shake up existing ideas about how applications are supposed to work
      Thank you!
