Database as a Service Seminar, ICDE 2010, Long Beach, March 04<br />Wolfgang Lehner | Dresden University of Technology, Ger...
Database Replication<br />store the same data on multiple nodes in order to improve reliability, accessibility, fault-toleranc...
Allows replicas to diverge; requires conflict resolution
Allows data to be accessed without a-priori synchronization
Updates are propagated in the background
Occasional conflicts are fixed after they happen
Improved availability, flexibility, scalability, but see CAP theorem</li></ul>59<br />
Optimistic Replication: Elements<br />1<br />2<br />2<br />2<br />2<br />1<br />1<br />1<br />1<br />2<br />2<br />2<br />1...
Conflict Resolution & Update Propagation<br />Single master<br />Thomas write rule<br />Dividing objects, ...<br />Vector cl...
Updates pass through the system like infectious diseases
Pairwise communication: a site contacts others (randomly chosen) and sends its information, e.g. about updates
All sites process messages in the same way
Proactive behaviour: no failure recovery necessary!
Basic approaches: anti-entropy, rumor mongering, ...</li></li></ul><li>Outline<br />Query & Programming Model<br />Logical ...
The Notion of QoS and Predictability<br />Service Level Agreement<br />legal part<br />technical part<br />Service Level O...
Deadline constraints
Percentile constraints
 fees, penalties, ...</li></ul>Common understanding about services, guarantees, responsibilities<br />63<br />Application Se...
Techniques for QoS in Data Management<br />64<br />Provide sufficient resources<br />Capacity planning: „How many boxes for custome...
Workload Management<br />Purpose:<br />achieve performance goals for classes of requests (queries, transactions)<br />Resource...
Workload Characteristics<br />Functional<br />I/O requirements (volume, bandwidth)<br />CPU<br />Degree of parallelism<br ...
WLM: Model<br />classes<br />workload classification<br />MPL<br />result<br />admission control & scheduling<br />transact...
Utility Functions<br />Utility function = preference specification<br />map possible system states (e.g. resource provisioning ...
Explore space of alternative mappings (search problem)
Runtime monitoring and control</li></ul>utility<br />response time<br />68<br />Kephart, Das: Achieving self-management via ...
Workload Modeling & Prediction<br />Goal: predict resource requirements for a given workload, <br />i.e., find correlation betwe...
Calculate job coordinates in query plan projection based on job feature vector
Infer job's coordinates on the performance projection</li></ul>69<br />
  • SAP Business Objects: Business Objects BI On-Demand
  • Database as a Service - Tutorial @ICDE 2010

    1. 1. Database as a Service Seminar, ICDE 2010, Long Beach, March 04<br />Wolfgang Lehner | Dresden University of Technology, Germany<br />Kai-Uwe Sattler | Ilmenau University of Technology, Germany <br />1<br />
    2. 2. Introduction<br />Motivation<br />SaaS<br />Cloud Computing<br />Use Cases<br />2<br />
    3. 3. Software as a Service (SaaS)<br />Traditional Software<br />On-Demand Utility<br />Plug In, Subscribe<br />Pay-per-Use<br />Build Your Own <br />3<br />
    4. 4. Comparison of business model<br />4<br />
    5. 5. Avoid hidden cost of traditional SW<br />Traditional Software<br />SaaS<br />SW Licenses<br />Subscription Fee<br />Training<br />Training<br />Customization<br />Hardware<br />IT Staff<br />Maintenance<br />Customization<br />5<br />
    6. 6. The Long Tail<br />Dozens of markets of millions or millions of markets of dozens?<br />Your Large Customers<br />$ / Customer<br />What if you lower your cost of sale (i.e. lower barrier to entry) and you also lower cost of operations<br />Your Typical Customers<br />New addressable market >> current market<br />(Currently) “non addressable” Customers<br /># of Customers<br />6<br />
    7. 7. Acquisition Model<br />Service<br />Business Model<br />Pay for usage<br />Access Model<br />Internet<br />Technical Model<br />Scalable, elastic, shareable<br />EC2 & S3<br />"All that matters is results — <br />I don't care how it is done"<br />Cloud Computing:<br />A style of computing where massively scalable, IT-enabled capabilities are provided "as a service" across the Internet to multiple external customers.<br />"I don't want to own assets — I want<br />to pay for elastic usage, like a utility"<br />"I want accessibility from anywhere from any device"<br />"It's about economies of scale, with effective and dynamic sharing"<br />What is Cloud? – Gartner’s Definition<br />7<br />
    8. 8. To Qualify as a Cloud<br />Common, Location-independent, Online Utility on Demand*<br />Common implies multi-tenancy, not single or isolated tenancy <br />Utility implies pay-for-use pricing<br />On Demand implies ~infinite, ~immediate, ~invisible scalability<br /> Alternatively, a “Zero-One-Infinity” definition:**<br />0: On-premise infrastructure, acquisition cost, adoption cost, support cost<br />1: Coherent and resilient environment – not a brittle “software stack”<br />∞: Scalability in response to changing need, Integratability/Interoperability with legacy assets and other services, Customizability/Programmability from data, through logic, up into the user interface without compromising robust multi-tenancy <br />* Joe Weinman, Vice President of Solutions Sales, AT&T, 3 Nov. 2008<br />** From The Jargon File: “Allow none of foo, one of foo, or any number of foo”<br />8<br />
    9. 9. Cloud Differentials: Service Models<br />9<br />Cloud Software as a Service (SaaS)<br />Use provider’s applications over a network <br />Cloud Platform as a Service (PaaS)<br />Deploy customer-created applications to a cloud <br />Cloud Infrastructure as a Service (IaaS)<br />Rent processing, storage, network capacity, and other fundamental computing resources<br />
    10. 10. Cloud Differentials: Characteristics<br />10<br />Platform<br />Physical – Virtual<br />Homogeneous – Heterogeneous<br />Design Paradigms<br />Storage<br />CPU<br />Bandwidth<br />Usage Model<br />Exclusive<br />Shared<br />Pseudo-Shared<br />Size/Location<br />Large Scale (AWS, Google, IBM/Google), <br />Small Scale (SMB, Academia)<br />Purpose<br />General Purpose<br />Special Purpose (e.g., DB-Cloud)<br />Administration/Jurisdiction<br />Public<br />Private<br />
    11. 11. Use Cases: Large-Scale Data Analytics<br />Outsource your data and use cloud resources for analysis<br />Historical and mostly non-critical data<br />Parallelizable, read-mostly workload, highly variable workloads<br />Relaxed ACID guarantees<br />Examples (Hadoop PoweredBy):<br />Yahoo!: research for ad systems and Web search<br />Facebook: reporting and analytics<br />Netseer.com: crawling and log analysis<br />Journey Dynamics: traffic speed forecasting<br />11<br />
    12. 12. Use Cases: Database Hosting<br />Public datasets<br />Biological databases: a single repository instead of > 700 separate databases<br />Semantic Web Data, Linked data, ...<br />Sloan Digital Sky Survey<br />TwitterCache<br />Already on Amazon AWS: <br />annotated human genome data, <br />US census, <br />Freebase, ...<br />Archiving, Metadata Indexing, ...<br />12<br />
    13. 13. Use Cases: Service Hosting<br />Data management for SaaS solutions<br />Run the services near the data<br />= ASP<br />Already many existing applications<br />CRM, e.g. Salesforce, SugarCRM<br />Web Analytics<br />Supply Chain Management<br />HelpDesk Management<br />Enterprise Resource Planning, e.g. SAP Business ByDesign<br />...<br />13<br />
    14. 14. Foundations & Architectures<br />Virtualization<br />Programming models<br />Consistency models & replication<br />SLAs & Workload management<br />Security<br />14<br />
    15. 15. Topics covered in this Seminar<br />Query & Programming Model<br />Logical Data Model<br />Virtualization<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />Distributed Storage<br />Replication<br />Security<br />15<br />
    16. 16. Current Solutions<br />user perspective<br />one DB for all clients<br />one DB per client<br />Virtualization<br />Replication<br />16<br />Distributed Storage<br />physical perspective<br />
    17. 17. ... it's simple!<br />17<br />
    18. 18. Virtualization<br />Separating the abstract view of computing resources from the implementation of these resources<br />adds flexibility and agility to the computing infrastructure<br />softens problems related to provisioning, manageability, …<br />lowers TCO: fewer computing resources<br />Classical driving factor: server consolidation<br />18<br />E-mail server<br />Web server<br />Database server<br />E-mail server<br />Database server<br />Linux<br />Linux<br />Linux<br />Linux<br />Linux<br />EDBT2008 Tutorial (Aboulnaga et al.)<br />Web server<br />Linux<br />Virtualization<br />Consolidate<br /> Improved utilization using consolidation<br />
    19. 19. What can be virtualized – the big four.<br />19<br />
    20. 20. Different Types of Virtualization<br />20<br />APP 1<br />APP 4<br />APP 2<br />APP 3<br />APP 5<br />OPERATING SYSTEM<br />OPERATING SYSTEM<br />VIRTUAL MACHINE 1<br />VIRTUAL MACHINE 2<br />CPU<br />CPU<br />CPU<br />MEM<br />MEM<br />NET<br />VIRTUAL MACHINE MONITOR (VMM)<br />PHYSICAL STORAGE<br />PHYSICAL MACHINE<br />CPU<br />MEM<br />NET<br />CPU<br />CPU<br />
    21. 21. Virtual Machines<br />21<br />Technique with long history (since the 1960's)<br />Prominent since IBM 370 mainframe series<br />Today<br />large scale<br />commodity hardware and operating systems<br />Virtual Machine Monitor (Hypervisor)<br />strong isolation between virtual machines (security, privacy, fault tolerance)<br />flexible mapping between virtual machines and physical resources<br />classical operations: pause, resume, checkpoint, migrate (admin / load balancing)<br />Software deployment<br />Preconfigured virtual appliances<br />Repositories of virtual appliances on the web<br />
    22. 22. DBMS on top of Virtual Machines<br />... yet another application?<br />... Overhead?<br />SQL Server within VMware<br />22<br />
    23. 23. Virtualization Design Advisor<br />What fraction of node resources goes to what DBMS?<br />Configuring VM parameters<br />What parameter settings are best for a given resource configuration<br />Configuring the DBMS parameters<br />Example<br />Workload 1: TPC-H (10 GByte)<br />Workload 2: TPC-H (10 GByte) only Q18 (132 copies)<br />Virtualization design advisor<br />20% of CPU to Workload 1<br />80% of CPU to Workload 2<br />23<br />
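The allocation question on this slide can be illustrated with a minimal sketch. It assumes a deliberately simple, hypothetical cost model (estimated cost = CPU work divided by CPU share) and made-up work figures chosen so that the search lands on a 20/80 split; the advisor presented in the tutorial relies on DBMS optimizer cost estimates, which are not reproduced here.

```python
def estimated_cost(cpu_work, cpu_share_percent):
    """Hypothetical cost model: run time grows inversely with the CPU share."""
    return cpu_work / (cpu_share_percent / 100.0)


def advise_cpu_split(work_a, work_b, step=5):
    """Grid-search the CPU share of workload A that minimizes the summed cost."""
    best_share, best_cost = None, float("inf")
    for share_a in range(step, 100, step):
        total = estimated_cost(work_a, share_a) + estimated_cost(work_b, 100 - share_a)
        if total < best_cost:
            best_share, best_cost = share_a, total
    return best_share, 100 - best_share


# Illustrative work units only (assumption): workload 2 is far more CPU-hungry.
split = advise_cpu_split(work_a=1.0, work_b=16.0)
print(split)
```

A real advisor would replace `estimated_cost` with calibrated estimates per DBMS instance and would search over memory as well as CPU.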
    24. 24. Some Experiments<br />Workload Definition based on TPC-H<br />Q18 is one of the most CPU intensive queries<br />Q21 is one of the least CPU intensive queries<br />Workload Units<br />C: 25x Q18<br />I: 1x Q21<br />Experiment: Sensitivity to Workload Resource Needs<br />W1 = 5C + 5I<br />W2 = kC + (10-k)I (increase of k -> more CPU intensive)<br />Postgres<br />DB2<br />24<br />
    25. 25. Some Experiments (2)<br />Workload Settings<br />W3 = 1C<br />W4 = kC<br />Workload Settings<br />W5 = 1C<br />W6 = kI<br />25<br />
    26. 26. Virtualization in DBaaS environments<br />DB Layer<br />DB Server<br />DB Server<br />DB Server<br />DB<br />DB<br />DB<br />DB<br />DB<br />Instance <br />Layer<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />DB Server <br />Layer<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM Layer<br />HW Layer<br />26<br />
    27. 27. Existing Tools for Node Virtualization<br />DB Server<br />DB Layer<br />DB<br />DB<br />DB<br />DB<br />DB<br />DB Advisor<br /><ul><li>Indexes
    28. 28. MQTs
    29. 29. MDC
    30. 30. Redistribution of Tables</li></ul>DB Workload Manager<br />Instance <br />Layer<br />Instance<br />Instance<br />DB Server <br />Layer<br />Static Environment Assumptions<br /><ul><li> Advisor expects static hardware environment
    31. 31. VM expects static (peak) resource requirements
    32. 32. Interactions between layers can improve performance/utilization</li></ul>Node<br />Resource Model<br />VM<br />VM<br />VM<br />VM Layer<br />VM Configuration<br /><ul><li>Monitoring
    33. 33. Resources Configuration
    34. 34. (manual) Migration</li></ul>HW Layer<br />27<br />
    35. 35. Layer Interactions (2)<br />Experiment<br />DB2 on Linux<br />TPC-H workload on 1GB database<br />Ranges for resource grants<br />Main memory (BP) – 50 MB to 1GB<br />Additional storage (Indexes) – 5% to 30% DB size<br />Varying advisor output (17-26 indexes)<br />Different possible improvement<br />Different expected Performance after improvement<br />[Figure: two DB Advisor panels, Expected Performance and Possible Improvement; each plots index storage (5% to 25%) against buffer pool size (BP, 200 MB to 1 GB) under the VM configuration; labelled values: 35%, 90%, <1%, <3%]<br />28<br />
    36. 36. Storage Virtualization<br />General Goal<br />provide a layer of indirection to allow the definition of virtual storage devices<br />minimize/avoid downtime (local and remote mirroring)<br />improve performance (distribution/balancing – provisioning – control placement)<br />reduce cost of storage administration<br />Operations<br />create, destroy, grow, shrink virtual devices<br />change size, performance, reliability, ...<br />workload fluctuations<br />hierarchical storage management<br />versioning, snapshots, point-in-time copies<br />backup, checkpoints<br />exploit CPU and memory in the storage system<br />caching<br />execute low-level DBMS functions<br />29<br />
    37. 37. Virtualization in DBaaS Environments (2)<br />DB Layer<br />DB Server<br />DB Server<br />DB Server<br />DB<br />DB<br />DB<br />DB<br />DB<br />Instance <br />Layer<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />DB Server <br />Layer<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM Layer<br />Shared Disk<br />HW Layer<br />Storage Layer<br />30<br />Local Disk<br />
    38. 38. Virtualization in DBaaS Environments (2)<br />DB Layer<br />DB<br />DB<br />DB<br />DB<br />DB<br />DB Server<br />Instance <br />Layer<br />Instance<br />Instance<br />DB Server <br />Layer<br />VM<br />VM<br />VM<br />VM Layer<br />HW Layer<br />Storage Layer<br />31<br />DB Advisor<br /><ul><li>Indexes
    39. 39. MQTs
    40. 40. MDC
    41. 41. Redistribution of Tables</li></ul>DB Workload Manager<br />Storage Resource Model<br />Storage Configuration<br /><ul><li>Device Bundling
    42. 42. Replication
    43. 43. Archiving</li></ul>Shared Disk<br />Local Disk<br />
    44. 44. One way to go? Paravirtualization<br />CPU and Memory Paravirtualization<br />extends the guest to allow direct interaction with the underlying hypervisor<br />reduces the monitor cost including memory and system call operations.<br />gains from paravirtualization are workload specific<br />Device Paravirtualization<br />places a high performance virtualization-aware device driver into the guest<br />paravirtualized drivers are more CPU efficient (less CPU overhead for virtualization)<br />Paravirtualized drivers can also take advantage of HW features, like partial offload<br />
    45. 45. Outline<br />Query & Programming Model<br />Logical Data Model<br />Virtualization<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />Distributed Storage<br />Replication<br />Security<br />33<br />
    46. 46. Multi Tenancy<br />Goal: consolidate multiple customers onto the same operational system<br />best resource utilization<br />flexible, but limited scalability<br />separate DB per tenant<br />shared DB, shared schema<br />shared DB, separate schema<br /><ul><li>Requirements:
    47. 47. Extensibility: customer-specific schema changes
    48. 48. Security: preventing unauthorized data accesses by other tenants
    49. 49. Performance/scalability: scale-up & scale-out
    50. 50. Maintenance: on tenant level instead of on database level</li></ul>34<br />
    51. 51. Flexible Schema Approaches<br />Goal: allow tenant-specific schema additions (columns)<br />Universal Table<br />Extension Table<br />Pivot Table<br />35<br />
    52. 52. Flexible Schema Approaches: Comparison<br />Best performance<br />Flexible schema evolution<br />Pivot table<br />Extension table<br />Chunk folding<br />Private tables<br />Application owns the schema<br />Database owns the schema<br />Universal table<br />XML columns<br />Universal table: <br />requires techniques for handling sparse data<br />Fine-grained index support not possible<br />Pivot table:<br />Requires joins for reconstructing logical tuples<br />Chunk folding: similar to pivot tables<br />Groups of columns are combined in a chunk and mapped into a chunk table<br />Requires complex query transformation<br />36<br />
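To make the remark about joins concrete, the following sketch stores two logical columns of a tenant table in a pivot table and reconstructs the logical tuples with a self-join. The table layout (`tenant`, `tbl`, `row_id`, `col`, `val`) and the sample data are assumptions for illustration, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One pivot row per (tenant, logical table, logical row, logical column).
conn.execute(
    "CREATE TABLE pivot (tenant INTEGER, tbl TEXT, row_id INTEGER, col TEXT, val TEXT)"
)
conn.executemany(
    "INSERT INTO pivot VALUES (?, ?, ?, ?, ?)",
    [
        (42, "account", 1, "name", "Acme"),
        (42, "account", 1, "beds", "135"),
        (42, "account", 2, "name", "Gump"),
        (42, "account", 2, "beds", "1042"),
        (17, "account", 1, "name", "Ball"),
    ],
)


def logical_rows(conn, tenant):
    """Rebuild (row_id, name, beds) tuples: one self-join per extra column."""
    sql = """
        SELECT n.row_id, n.val AS name, b.val AS beds
        FROM pivot AS n
        JOIN pivot AS b
          ON b.tenant = n.tenant AND b.tbl = n.tbl AND b.row_id = n.row_id
        WHERE n.tenant = ? AND n.tbl = 'account'
          AND n.col = 'name' AND b.col = 'beds'
        ORDER BY n.row_id
    """
    return conn.execute(sql, (tenant,)).fetchall()


rows = logical_rows(conn, 42)
print(rows)
```

Each additional logical column adds another join against the pivot table, which is the cost the comparison refers to.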
    53. 53. Access Control in Multi-Tenant DB<br />Shared DB approaches require row-level access control<br />Query transformation: ... where TenantID = 42 ...<br />Potential security risks<br />DBMS-level control, e.g. IBM DB2 LBAC<br />Label-based access control<br />Controls read/write access to individual rows and columns<br />Security labels with policies<br />Requires separate account for each tenant<br />37<br />
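A minimal sketch of the query-transformation approach: every tenant query goes through a helper that always appends the tenant predicate as a bound parameter. The `orders` table, the `ALLOWED` whitelist, and the helper name are hypothetical; the sketch does not show DB2 LBAC.

```python
import sqlite3

# Hypothetical whitelist of tables and columns that tenants may query.
ALLOWED = {"orders": {"id", "item"}}


def select_for_tenant(conn, tenant_id, table, columns):
    """Run a SELECT that is always restricted to the rows of one tenant."""
    if table not in ALLOWED or not set(columns) <= ALLOWED[table]:
        raise ValueError("unknown table or column")
    # Identifiers come from the whitelist; the tenant id is a bound parameter.
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE tenant_id = ? ORDER BY id"
    return conn.execute(sql, (tenant_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, tenant_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 42, "disk"), (2, 17, "ram"), (3, 42, "cpu")],
)

visible = select_for_tenant(conn, 42, "orders", ["id", "item"])
print(visible)
```

The security risk noted on the slide remains: any code path that reaches the shared table without going through such a helper bypasses the filter, which is why DBMS-level control is listed as the alternative.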
    54. 54. In a Nutshell<br />How shall virtualization be handled on<br />Machine level (VM to HW)<br />DBMS level (database to instance to database server)<br />Schema level (multi tenancy)<br />... using …<br />Allocation between layers<br />Configuration inside layers<br />Flexible schemas<br />… when …<br />Characteristics of the workloads are known<br />Virtual machines are transparent<br />Tenant-specific schema extensions<br />… demanding that …<br />SLAs and security are respected<br />Each node’s utilization is maximized<br />Number of nodes is minimized<br />38<br />
    55. 55. Outline<br />Query & Programming Model<br />Logical Data Model<br />Virtualization<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />Distributed Storage<br />Replication<br />Security<br />39<br />
    56. 56. MapReduce Background<br />40<br />Programming model and an associated implementation for large-scale data processing<br />Google and related approaches: Apache Hadoop and Microsoft Dryad<br />User-defined map & reduce functions<br />Infrastructure<br />hides details of parallelization<br />provides fault-tolerance, data distribution, I/O scheduling, load balancing, ...<br />map (in_key, in_value) -> (out_key, intermediate_value) list<br />reduce (out_key,intermediate_value list) -> out_value list<br />M<br />{ (key,value) }<br />R<br />M<br />R<br />M<br />
    57. 57. Logic Flow of WordCount<br />Mapper<br />Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner…<br />1  Hadoop Map/Reduce is a<br />Hadoop 1<br />Map  1<br />17  software framework for<br />Reduce  1<br />is  1<br />45  easily writing applications<br />a  1<br />…<br />…<br />Sort/Shuffle<br />Reducer<br />Hadoop [1, 1, 1, …,1]<br />Hadoop 5<br />Map  [1, 1, 1, …, 1]<br />Map  12<br />Reduce  [1, 1, 1, …, 1]<br />Reduce  12<br />is  [1, 1, 1, …, 1]<br />is  42<br />a  [1, 1, 1, …, 1]<br />a  23<br />
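The WordCount flow can be mimicked in a single process to show how the map and reduce signatures fit together. This is only a sketch: `run_mapreduce` is a hypothetical helper that replaces Hadoop's distributed sort/shuffle with an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(offset, line):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    return [(word, 1) for word in line.split()]


def reduce_fn(word, counts):
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(counts)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the map, sort/shuffle, and reduce phases."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, intermediate in map_fn(key, value):
            groups[out_key].append(intermediate)  # shuffle: group by key
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


lines = [(1, "Hadoop Map Reduce is a"), (17, "software framework for Map Reduce")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

The user supplies only `map_fn` and `reduce_fn`; parallelization, fault tolerance, and data distribution are left to the infrastructure.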
    58. 58. MapReduce Disadvantages<br />Extremely rigid data flow<br />Common operations must be coded by hand<br />join, filter, split, projection, aggregates, sorting, distinct<br />User plans may be suboptimal and lead to performance degradation<br />Semantics hidden inside map-reduce functions<br />Inflexible, difficult to maintain, extend and optimize<br />Combination of high-level declarative querying and low-level programming with MapReduce<br /> Dataflow Programming Languages<br />Hive, JAQL and Pig<br />M<br />R<br />42<br />
    59. 59. PigLatin<br />PigLatin<br />On top of map-reduce/ Hadoop<br />Mix of declarative style of SQL and procedural style of map-reduce<br />Consists of two parts<br />PigLatin: A Data Processing Language<br />Pig Infrastructure: An Evaluator for PigLatin<br /> programs<br />Pig compiles Pig Latin into physical plans <br />Plans are to be executed over Hadoop<br />30% of all queries at Yahoo! in Pig-Latin<br />Open-source, http://incubator.apache.org/pig<br />43<br />
    60. 60. Example<br /><ul><li>Task: Determine the most visited websites in each category.</li></ul>URL Info<br />Visits<br />44<br />
    61. 61. Implementation in MapReduce<br />45<br />
    62. 62. Example Workflow in Pig-Latin<br />load URL Info<br />load Visits<br />visits = load '/data/visits' as (user, url, time);<br />gVisits = group visits by url;<br />visitCounts = foreach gVisits generate url, count(visits);<br />urlInfo = load '/data/urlInfo' as (url, category, pRank);<br />visitCounts = join visitCounts by url, urlInfo by url;<br />gCategories = group visitCounts by category;<br />topUrls = foreach gCategories generate top(visitCounts,10);<br />store topUrls into '/data/topURLs';<br />Operate directly over files.<br />group by url<br />foreach url<br />generate count<br />Schemas optional. Can be assigned dynamically.<br />join on url<br />User-defined functions (UDFs) can be used in every construct<br /><ul><li> load, store
    63. 63. group, filter, foreach</li></ul>group by category<br />foreach category<br />generate top10 URLs<br />46<br />
    64. 64. Compilation in MapReduce<br />Every group or join operation forms a map-reduce boundary<br />Other operations pipelined into map and reduce phases<br />load URL Info<br />load Visits<br />Map1<br />Map2<br />group by url<br />Reduce1<br />foreach url<br />generate count<br />join on url<br />Reduce2<br />Map3<br />group by category<br />Reduce3<br />foreach category<br />generate top10 URLs<br />47<br />
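For readers without a Hadoop cluster at hand, the same dataflow can be traced in plain Python. The sketch below is an in-memory stand-in for the three map-reduce stages (group by url, join on url, group by category); the sample tuples and the function name `top_urls` are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up sample data with the schemas used in the example workflow.
visits = [
    ("amy", "cnn.com", 8),
    ("amy", "bbc.com", 10),
    ("bob", "cnn.com", 11),
    ("bob", "espn.com", 12),
]
url_info = [
    ("cnn.com", "news", 0.9),
    ("bbc.com", "news", 0.8),
    ("espn.com", "sports", 0.7),
]


def top_urls(visits, url_info, n=10):
    """Most visited URLs per category, following the three stages of the plan."""
    # Stage 1: group by url, generate count.
    visit_counts = Counter(url for _user, url, _time in visits)
    # Stage 2: join visit counts with URL info on url.
    category_of = {url: category for url, category, _rank in url_info}
    by_category = defaultdict(list)
    for url, count in visit_counts.items():
        if url in category_of:
            by_category[category_of[url]].append((url, count))
    # Stage 3: group by category, keep the n most visited URLs.
    return {
        category: sorted(pairs, key=lambda p: (-p[1], p[0]))[:n]
        for category, pairs in by_category.items()
    }


result = top_urls(visits, url_info, n=1)
print(result)
```

Each numbered stage corresponds to one group or join operation, which is where the compiled plan places a map-reduce boundary.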
    65. 65. Data warehouse infrastructure built on top of Hadoop, providing:<br />Data Summarization<br />Ad hoc querying<br />Simple query language: Hive QL (based on SQL)<br />Extendable via custom mappers and reducers<br />Subproject of Hadoop<br />No "Hive format"<br />http://hadoop.apache.org/hive/<br />Hive<br />48<br />
    66. 66. Hive - Example<br />LOAD DATA INPATH '/data/visits' INTO TABLE visits;<br />INSERT OVERWRITE TABLE visitCounts<br />SELECT url, category, count(*)<br />FROM visits<br />GROUP BY url, category;<br />LOAD DATA INPATH '/data/urlInfo' INTO TABLE urlInfo;<br />INSERT OVERWRITE TABLE visitCounts<br />SELECT vc.*, ui.*<br />FROM visitCounts vc JOIN urlInfo ui ON (vc.url = ui.url);<br />INSERT OVERWRITE TABLE gCategories<br />SELECT category, count(*)<br />FROM visitCounts<br />GROUP BY category;<br />INSERT OVERWRITE TABLE topUrls<br />SELECT TRANSFORM (visitCounts) USING 'top10';<br />49<br />
    67. 67. Higher level query language for JSON documents<br />Developed at IBM's Almaden research center<br />Supports several operations known from SQL<br />Grouping, Joining, Sorting<br />Built-in support for<br />Loops, Conditionals, Recursion<br />Custom Java methods extend JAQL<br />JAQL scripts are compiled to MapReduce jobs<br />Various I/O<br />Local FS, HDFS, HBase, Custom I/O adapters<br />http://www.jaql.org/<br />JAQL<br />50<br />
    68. 68. JAQL - Example<br />registerFunction("top", "de.tuberlin.cs.dima.jaqlextensions.top10");<br />$visits = hdfsRead("/data/visits");<br />$visitCounts =<br />$visits<br />-> group by $url = $<br />into { $url, num: count($)};<br />$urlInfo = hdfsRead("data/urlInfo");<br />$visitCounts =<br />join $visitCounts, $urlInfo<br />where $visitCounts.url == $urlInfo.url;<br />$gCategories =<br />$visitCounts<br />-> group by $category = $<br /> into {$category, num: count($)};<br />$topUrls = top10($gCategories);<br />hdfsWrite("/data/topUrls", $topUrls);<br />51<br />
    69. 69. Outline<br />Query & Programming Model<br />Logical Data Model<br />Virtuali-zation<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />DistributedStorage<br />Replication<br />Security<br />52<br />
    70. 70. ACID vs. BASE<br />Traditional distributed data management: ACID<br />Strong consistency<br />Isolation<br />Focus on "commit"<br />Availability?<br />Pessimistic<br />Difficult evolution (e.g. schema)<br />Web-scale data management: BASE (Basically Available, Soft-state, Eventually consistent)<br />Weak consistency<br />Availability first<br />Best effort<br />Optimistic (aggressive)<br />Fast and simple<br />Easier evolution<br />
    71. 71. CAP Theorem [Brewer 2000]<br />Consistency: all clients have the same view, even in case of updates<br />Availability: all clients find a replica of the data, even in the presence of failures<br />Tolerance to network partitions: system properties hold even when the network (system) is partitioned<br />You can have at most two of these properties for any shared-data system.<br />
    72. 72. CAP Theorem<br />Availability + partition tolerance: no consistency guarantees ➟ updates with conflict resolution<br />Consistency + partition tolerance: on a partition event, simply wait until data is consistent again ➟ pessimistic locking<br />Consistency + availability: all nodes are in contact with each other, or put everything in a single box ➟ 2-phase commit<br />
    73. 73. CAP: Explanations<br />1. PA := update(o)<br />2. Message M from PA to PB<br />3. PB := read(o)<br />Network partitions ➫ M is not delivered<br />Solutions?<br />Synchronous message: <PA, M> is atomic<br />Possible latency problems (availability)<br />Transaction <PA, M, PB>: requires control over when PB happens<br />Impacts partition tolerance or availability<br />
    74. 74. Consistency Models [Vogels 2008]<br />Setting: processes A, B, C access a distributed storage system holding D0; update: D0 -> D1, followed by read(D)<br />Strong consistency:<br />after the update completes, any subsequent access from A, B, C will return D1<br />Weak consistency:<br />does not guarantee that subsequent accesses will return D1 -> a number of conditions need to be met before D1 is returned<br />Eventual consistency:<br />Special form of weak consistency<br />Guarantees that if no new updates are made, eventually all accesses will return D1<br />
    75. 75. Variations of Eventual Consistency<br />Causal consistency:<br />If A notifies B about the update, B will read D1 (but not C!)<br />Read-your-writes:<br />A will always read D1 after its own update<br />Session consistency:<br />Read-your-writes inside a session<br />Monotonic reads:<br />If a process has seen Dk, any subsequent access will never return any Di with i < k<br />Monotonic writes:<br />guarantees to serialize the writes of the same process<br />
    76. 76. Database Replication<br />Store the same data on multiple nodes in order to improve reliability, accessibility, fault tolerance<br />Single master<br />Multimaster<br />Optimistic replication<br />relaxed consistency<br />1-copy consistency<br /><ul><li>Optimistic strategies = lazy replication
    77. 77. Allow replicas to diverge; require conflict resolution
    78. 78. Allow data to be accessed without a-priori synchronization
    79. 79. Updates are propagated in the background
    80. 80. Occasional conflicts are fixed after they happen
    81. 81. Improved availability, flexibility, scalability, but see CAP theorem</li></ul>
    82. 82. Optimistic Replication: Elements<br />1. operation submission<br />2. propagation<br />3. scheduling<br />4. conflict resolution<br />5. commitment<br />Y. Saito, M. Shapiro: Optimistic Replication, ACM Computing Surveys, 37(1):42-81, 2005<br />
    83. 83. Conflict Resolution & Update Propagation<br />Prohibit: single master<br />Ignore: Thomas write rule<br />Reduce: dividing objects, ...<br />Detect & repair, syntactic: vector clocks<br />Detect & repair, semantic: app-specific ordering or preconditions<br /><ul><li>Epidemic information dissemination
    84. 84. Updates pass through the system like infectious diseases
    85. 85. Pairwise communication: a site contacts others (randomly chosen) and sends its information, e.g. about updates
    86. 86. All sites process messages in the same way
    87. 87. Proactive behaviour: no failure recovery necessary!
    88. 88. Basic approaches: anti-entropy, rumor mongering, ...</li></li></ul><li>Outline<br />Query & Programming Model<br />Logical Data Model<br />Virtualization<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />Distributed Storage<br />Replication<br />Security<br />
    89. 89. The Notion of QoS and Predictability<br />Service Level Agreement<br />legal part<br />technical part<br />Service Level Objectives<br /><ul><li>Specific measurable characteristics; e.g. importance, performance goals
    90. 90. Deadline constraints
    91. 91. Percentile constraints
    92. 92. fees, penalties, ...</li></ul>Common understanding about services, guarantees, responsibilities<br />Application Server / middleware<br />DBMS<br />OS / Hardware<br />
    93. 93. Techniques for QoS in Data Management<br />Provide sufficient resources<br />Capacity planning: "How many boxes for customer X?"<br />Cost vs. performance tradeoff<br />Shielding<br />Dedicated (virtual) system for customers<br />Scalability? Cost efficiency?<br />Scheduling<br />Ordering requests by priority<br />At which level?<br />
    94. 94. Workload Management<br />Purpose:<br />achieve performance goals for classes of requests (queries, transactions)<br />Resource provisioning<br />Aspects:<br />Specification of service-level objectives<br />Workload classification and modeling<br />Admission control & scheduling<br />Static prioritization: DB2 Query Patroller, Oracle Resource Manager, ...<br />Goal-oriented approaches<br />Economic approaches<br />Utility-based approaches<br />
    95. 95. Workload Characteristics<br />Functional<br />I/O requirements (volume, bandwidth)<br />CPU<br />Degree of parallelism<br />Response times?<br />Throughput?<br />…<br />Non-Functional<br />Availability<br />Reliability<br />Durability<br />Scalability<br />…<br />66<br />
    96. 96. WLM: Model<br />Flow: transaction → workload classification (classes) → admission control & scheduling (MPL) → result; measured: response time<br />Admission control: limit the number of simultaneously executing requests (multiprogramming level = MPL)<br />Scheduling: ordering requests by priority<br />
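The admission-control and scheduling steps on this slide can be sketched in a few lines. This is a minimal illustration, not any product's implementation: the class name `WorkloadManager`, the integer priorities, and the rule "higher number runs first" are all assumptions made for the example.

```python
import heapq
import itertools


class WorkloadManager:
    """Toy admission controller: at most `mpl` requests run at once."""

    def __init__(self, mpl):
        self.mpl = mpl                    # multiprogramming level (MPL)
        self.running = set()              # requests currently executing
        self._queue = []                  # waiting requests, ordered by priority
        self._seq = itertools.count()     # tie-breaker: FIFO within a priority

    def submit(self, request_id, priority):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._queue, (-priority, next(self._seq), request_id))
        return self._admit()

    def finish(self, request_id):
        self.running.discard(request_id)
        return self._admit()

    def _admit(self):
        # Admit waiting requests while the MPL limit leaves room.
        admitted = []
        while self._queue and len(self.running) < self.mpl:
            _, _, request_id = heapq.heappop(self._queue)
            self.running.add(request_id)
            admitted.append(request_id)
        return admitted
```

With `mpl=2`, a third request waits in the queue until one of the running requests finishes; a later request with higher priority overtakes it.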
    97. 97. Utility Functions<br />Utility function = preference specification<br />map possible system states (e.g. resource provisioning to jobs) to a real scalar value<br />Represents performance feature (response time, throughput, ...) and/or economic value<br /><ul><li>Goal: determine the most valuable feasible state, i.e. maximize utility
    98. 98. Explore space of alternative mappings (search problem)
    99. 99. Runtime monitoring and control</li></ul>Plot: utility over response time<br />Kephart, Das: Achieving self-management via utility functions. IEEE Internet Computing 2007<br />
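The search for the most valuable feasible state can be illustrated with a toy example. Everything specific here is an assumption for illustration only: the response-time model (`work / units`), the shape of the utility curve (full value up to the goal, linear decay to zero at twice the goal), and the restriction to two workload classes.

```python
def utility(response_time, goal, value):
    """Full value while the goal is met, then linear decay to 0 at 2 * goal."""
    if response_time <= goal:
        return value
    return max(0.0, value * (2 - response_time / goal))


def class_utility(workload, units):
    # Hypothetical performance model: response time shrinks with more units.
    response_time = workload["work"] / units
    return utility(response_time, workload["goal"], workload["value"])


def best_split(class_a, class_b, total_units):
    """Exhaustively search all splits of total_units between two classes."""
    best = None
    for units_a in range(1, total_units):
        units_b = total_units - units_a
        total = class_utility(class_a, units_a) + class_utility(class_b, units_b)
        if best is None or total > best[0]:
            best = (total, units_a, units_b)
    return best


gold = {"work": 10, "goal": 2, "value": 100}   # needs 5 units to meet its goal
bronze = {"work": 6, "goal": 3, "value": 10}   # needs 2 units to meet its goal
print(best_split(gold, bronze, total_units=8))
```

Exhaustive enumeration only works for tiny state spaces; the slide's "search problem" remark is the reason real systems need heuristics plus runtime monitoring.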
    100. 100. Workload Modeling & Prediction<br />Goal: predict resource requirements for a given workload,<br />i.e., find correlation between query features and performance features<br />Approaches: regression, correlation analysis, Kernel Canonical Correlation Analysis (KCCA)<br />Pipeline: query plans / job descriptions → job feature matrix; performance statistics → performance feature matrix; KCCA yields a query plan projection and a performance projection<br />Ganapathi et al.: Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. ICDE 2009<br /><ul><li>Prediction:
    101. 101. Calculate job coordinates in query plan projection based on job feature vector
    102. 102. Infer job's coordinates on the performance projection</li></ul>
    104. 104. Overview and Challenges<br />Figure: the Data Owner outsources data via a Data Pre-processor to the (untrusted) Service Provider, which runs the Query Engine; the User sends queries and receives query results via a Query Pre/Post-processor<br />Challenges:<br />Data confidentiality / privacy<br />Private information retrieval / access privacy<br />Completeness and correctness<br />
    105. 105. Challenges I – Data Confidentiality / Privacy<br />Need to store data in the cloud<br />But we do not trust the service providers with sensitive information<br />encrypt the data and store it<br />but still be able to run queries over the encrypted data<br />do most of the work at the server<br />Two issues<br />Privacy during transmission (well studied, e.g. through SSL/TLS)<br />Privacy of stored data<br />Querying over encrypted data is challenging<br />needs to maintain content information on the server side, e.g. range queries require order-preserving data encryption mechanisms<br />privacy-performance tradeoff<br />
    106. 106. Query Processing on Encrypted Data<br />Figure: at the client site, the Query Translator uses metadata to split the user's original query into a server-side query and a client-side query; the Query Engine at the (untrusted) service provider returns encrypted results as a temporary result; the Query Executor applies the client-side query and returns the result to the user<br />
    107. 107. Executing SQL over Encrypted Data<br />Hacigumus et al. (SIGMOD 2002)<br />Main Steps<br />Partition sensitive domains<br />Order preserving: supports comparison<br />Random: query rewriting becomes hard<br />Rewrite queries to target partitions<br />Execute queries and return results<br />Prune/post-process results on client<br />Privacy-Precision Trade-off<br />Larger segments/partitions lead to:<br />increased privacy<br />decreased precision<br />increased overheads in query processing<br />
    108. 108. Relational Encryption<br />Service Provider Site<br />arbitrary encryption function,<br />e.g. AES, RSA, Blowfish, DES, …<br />Bucket Ids<br /><ul><li>Store an etuple for each tuple in the original table
    109. 109. Create a coarse index for each (or selected) attribute(s) in the original table</li></ul>75<br />
    110. 110. Index and Identification Functions<br /><ul><li>Partition function divides domain values into partitions (buckets)</li></ul> Partition (R.A) = { [0,200], (200,400], (400,600], (600,800], (800,1000] }<br /><ul><li>Partitioning function has an impact on performance as well as privacy
    111. 111. Identification function assigns a partition id to each partition of attribute A</li></ul>identR.A( (200,400] ) = 7<br /><ul><li>Any function can be used as identification function, e.g., hash functions</li></ul>Metadata = partition (bucket) ids 2, 7, 5, 1, 4 assigned to the partitions bounded by the domain values 0, 200, 400, 600, 800, 1000<br />
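The bucket scheme above can be sketched as follows, using the slide's partition boundaries and bucket ids. The helper names (`ident`, `rewrite_range`, `range_query`) are hypothetical, and `seal`/`unseal` are placeholders that only mark where a real cipher such as AES would be applied; they provide no confidentiality.

```python
import bisect

UPPER_BOUNDS = [200, 400, 600, 800, 1000]   # [0,200], (200,400], ..., (800,1000]
BUCKET_IDS = [2, 7, 5, 1, 4]                # identification function from the slide


def ident(value):
    """Map a plaintext value of attribute A to its bucket id."""
    if not 0 <= value <= 1000:
        raise ValueError("value outside the partitioned domain")
    return BUCKET_IDS[bisect.bisect_left(UPPER_BOUNDS, value)]


def rewrite_range(lo, hi):
    """Rewrite 'A BETWEEN lo AND hi' into the set of bucket ids to fetch."""
    first = bisect.bisect_left(UPPER_BOUNDS, lo)
    last = bisect.bisect_left(UPPER_BOUNDS, hi)
    return set(BUCKET_IDS[first:last + 1])


def seal(value):
    return repr(value).encode()      # placeholder, NOT encryption


def unseal(blob):
    return int(blob.decode())        # placeholder, NOT decryption


def range_query(server_rows, lo, hi):
    bucket_ids = rewrite_range(lo, hi)
    # Server side: filter on bucket ids only, never on plaintext.
    candidates = [etuple for etuple, bucket in server_rows if bucket in bucket_ids]
    # Client side: decrypt and prune false positives.
    return sorted(v for v in map(unseal, candidates) if lo <= v <= hi)


rows = [(seal(v), ident(v)) for v in [150, 250, 300, 450, 500, 900]]
print(range_query(rows, 250, 450))
```

The server returns the superset of tuples in buckets 7 and 5 (here including 500), which shows the privacy-precision trade-off: wider buckets hide more but force the client to discard more.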
    112. 112. Challenges II – Private Information Retrieval (PIR)<br />User queries should be invisible to the service provider<br />More formally<br />database is modeled as a string x = x1, x2, …, xn of length n stored at a remote server<br />user wants to retrieve the bit xi for some i<br />without disclosing any information about i to the server<br />Paradox<br />imagine buying in a store without the seller knowing what you buy<br />
    113. 113. Information-Theoretic 2-server PIR<br />Database x = x1, …, xn (example bits: 0 0 1 1 0 0 1 1 1 0 0 0), replicated at Service Provider 1 and Service Provider 2<br />User picks a random subset $Q_1 \subseteq \{1,\dots,n\}$ and sets $Q_2 = Q_1 \oplus \{i\}$<br />Service Provider 1 returns $a_1 = \bigoplus_{l \in Q_1} x_l$<br />Service Provider 2 returns $a_2 = \bigoplus_{l \in Q_2} x_l$<br />User computes $x_i = a_1 \oplus a_2$<br />
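The 2-server scheme can be sketched directly from these formulas. The function names are illustrative, indices are 0-based here, and both "servers" are simulated in one process; the privacy argument assumes the two providers do not collude.

```python
import secrets


def server_answer(database_bits, query_set):
    """What each provider computes: XOR of the requested bit positions."""
    answer = 0
    for index in query_set:
        answer ^= database_bits[index]
    return answer


def pir_read(database_bits, i):
    n = len(database_bits)
    # Q1: uniformly random subset of the index range.
    q1 = {l for l in range(n) if secrets.randbits(1)}
    # Q2: Q1 with index i toggled (symmetric difference with {i}).
    q2 = q1 ^ {i}
    a1 = server_answer(database_bits, q1)   # sent to provider 1
    a2 = server_answer(database_bits, q2)   # sent to provider 2
    # All positions except i appear in both sets and cancel out.
    return a1 ^ a2


bits = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]
print([pir_read(bits, i) for i in range(len(bits))])
```

Each provider sees only a uniformly random subset, so neither learns anything about `i` on its own; the cost is that each query set has linear size in `n`.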
    114. 114. Conclusion & Outlook<br />Current Infrastructures<br />MS Azure<br />Amazon RDS + SimpleDB<br />Amazon Dynamo<br />Google BigTable<br />Yahoo! PNUTS<br />Conclusion<br />Challenges & Trends<br />
    115. 115. Current Solutions<br />one DB for all clients<br />one DB per client<br />Amazon SimpleDB / Dynamo<br />Amazon RDS<br />Yahoo! PNUTS<br />Google Bigtable, Cassandra, Voldemort<br />Amazon S3<br />Microsoft SQL Azure<br />Virtualization<br />Replication<br />Distributed Storage<br />
    116. 116. Microsoft SQL Azure<br />Cloud database service for the Azure platform<br />Allows creating a SQL server = group of databases spread across multiple physical machines (incl. geo-location)<br />Supports relational model and T-SQL (tables, views, indices, triggers, stored procedures)<br />Deployment and administration using SQL Server Management Studio<br />Current limitations<br />Individual database size = max. 10 GB<br />No support for CLR, distributed queries & transactions, spatial data<br />
    117. 117. Microsoft SQL Azure: Details<br />Databases<br />implemented as replicated data partitions<br />Across multiple physical nodes<br />Provide load balancing and failover<br />API<br />SQL, ADO.NET, ODBC<br />Tabular Data Streams<br />SQL Server Authentication<br />Sync Framework<br />Prices<br />1 GB database: $9.99/month, 10 GB: $99.99/month + data transfer<br />SLA: 99.9% availability<br />
    118. 118. Microsoft Azure: Other Services<br />Azure Blob<br />Blob storage; PUT/GET interface via REST<br />Azure Table<br />Structured storage; LINQ, ADO.NET interface<br />Hierarchy: Storage Account → Table (e.g. Customer, Order) → Entity (e.g. Customer #1, Customer #2) → Property (e.g. Name, Address)<br /><ul><li>Properties can be defined per entity; max size of entity: 1 MB
    119. 119. Partition key: used for assigning entities to partitions; row key: unique ID within a partition
    120. 120. Sort order: single index per table
    121. 121. Atomic transactions within a partition</li></ul>
    122. 122. Amazon RDS<br />Amazon Relational Database Service<br />Web service to set up and operate a MySQL database<br />Full-featured MySQL 5.1<br />Automated database backup<br />Java-based command line tools and Web Service API for instance administration<br />Native DB access<br />Prices:<br />Small DB instance (1.7 GB memory, 1 ECU): $0.11/hour<br />Largest DB instance (68 GB, 26 ECU): $3.10/hour<br />+ $0.10 per GB-month storage<br />+ data transfer<br />
    123. 123. Amazon Data Services<br />Amazon Simple Storage Service (S3)<br />Distributed blob storage for objects (1 byte ... 5 GB data)<br />REST-based interface to read, write, and delete objects identified by unique, user-defined key<br />Atomic single-key updates; no locking<br />Eventual consistency (partially read-after-write)<br />Aug 2009: more than 64 billion objects<br />Amazon SimpleDB (= Amazon Dynamo???)<br />Distributed structured storage<br />Web Service API for access<br />Eventual consistency<br />
    124. 124. Amazon SimpleDB<br />Data model<br />Relational-like data model: domain = collection of items described by key-value pairs; max size 10 GB<br />Attributes can be added to certain records (256 per record)<br />Hierarchy: Storage Account → Domain (e.g. Customer, Order) → Item (e.g. Customer #1, Customer #2) → Attribute: Value (e.g. Name: Wolfgang, City: Dresden)<br /><ul><li>Queries
    125. 125. Restricted to a single domain
    126. 126. SFW syntax + count() + multi-attribute predicates
    127. 127. Only string-valued data: lexicographical comparisons</li></ul>
    128. 128. Amazon Dynamo<br />Highly available and scalable key-value data store for the Amazon platform<br />Manages the state of Amazon services<br />Providing bestseller lists, shopping carts, customer preferences, product catalogs -> require only primary-key access (e.g. product id, customer id)<br />Completely decentralized, minimal need for manual administration (e.g. partitioning, redistribution)<br />Assumptions:<br />Simple query model: put/get operations on keys, small objects (< 1 MB)<br />Weaker consistency but high availability ("always writable" data store), no isolation guarantees<br />Efficiency: running on commodity hardware, guaranteed latency = SLAs, e.g. 300 ms response time for 99.9% of requests, peak load of 500 requests/sec.<br />
    129. 129. Dynamo: Partitioning and Replication<br />Partitioning scheme<br />based on consistent hashing<br />Virtual nodes: each physical node is responsible for more than one virtual node<br />Replication<br />Each data item is replicated at n nodes<br />Figure: key space = ring with nodes A, B, C, D, E; shows the responsibility of node C and the replicas of keys from range (B, C)<br />
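A minimal sketch of consistent hashing with virtual nodes and an n-node replica list follows. It is not Dynamo's code: the class name `Ring`, the `node#v` labelling of virtual nodes, and the default of 8 virtual nodes per physical node are assumptions chosen for the example; MD5 is used only to place labels on the ring.

```python
import bisect
import hashlib


def ring_position(label):
    """Place a label on the ring as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Every physical node owns several positions (virtual nodes).
        self._points = sorted(
            (ring_position(f"{node}#{v}"), node)
            for node in nodes
            for v in range(vnodes_per_node)
        )
        self._positions = [position for position, _ in self._points]

    def preference_list(self, key, n):
        """First n distinct physical nodes clockwise from the key's position."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        result = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


ring = Ring(["A", "B", "C", "D", "E"])
print(ring.preference_list("cart:42", n=3))
```

Skipping repeated physical nodes while walking the ring matters: with virtual nodes, consecutive ring positions may belong to the same machine, which would otherwise defeat replication.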
    130. 130. Dynamo: Data Versioning<br />Provides eventual consistency -> asynchronous propagation of updates<br />Updates result in a new version of the data<br />Vector clocks for capturing causalities between different versions of the same object<br />Vector clock = list of (node, counter)<br />Determine causal ordering / parallel branches of versions<br />Update requests have to specify which version is to be updated<br />Reconciliation during client reads!<br />Example version history:<br />write(D)@NA -> D1([NA,1])<br />write(D)@NA -> D2([NA,2])<br />write(D)@NB on D2 -> D3([NA,2],[NB,1])<br />write(D)@NC on D2 -> D4([NA,2],[NC,1])<br />reconcile(D)@NA of D3 and D4 -> D5([NA,3],[NB,1],[NC,1])<br />
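The version history on this slide can be replayed with a small vector-clock sketch. Clocks are plain dictionaries from node name to counter; the function names are illustrative and not taken from the Dynamo paper.

```python
def increment(clock, node):
    """Return a copy of the clock with the writing node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if version a is equal to or causally newer than version b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def concurrent(a, b):
    """Parallel branches: neither version descends from the other."""
    return not descends(a, b) and not descends(b, a)


def reconcile(a, b, node):
    """Merge two branches (element-wise max), then record the write at node."""
    merged = {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}
    return increment(merged, node)


d1 = increment({}, "NA")          # D1([NA,1])
d2 = increment(d1, "NA")          # D2([NA,2])
d3 = increment(d2, "NB")          # D3([NA,2],[NB,1])
d4 = increment(d2, "NC")          # D4([NA,2],[NC,1])
d5 = reconcile(d3, d4, "NA")      # D5([NA,3],[NB,1],[NC,1])
print(concurrent(d3, d4), d5)
```

D3 and D4 are detected as concurrent because each carries a counter the other lacks, which is exactly the case where a client read has to reconcile.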
    131. 131. Dynamo: Replica Maintenance<br />Consistency among replicas:<br />Quorum protocol: R nodes must participate in a read, W nodes in a write; R + W > N<br />Sloppy quorum:<br />Reads/writes are performed on the first N healthy nodes<br />Preference list: list of nodes which are responsible for storing a given key<br />For highest availability: W = 1<br />Replica synchronization<br />Anti-entropy:<br />Merkle trees:<br />hash trees where leaves are hashes of keys, non-leaves are hashes of children<br />If hash values of two nodes are equal, no need to check children<br />
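The Merkle-tree comparison used for anti-entropy can be sketched as below. The sketch assumes both replicas cover the same key range with the same number of leaves, represents the tree as a list of levels, and uses SHA-256; all of these are simplifications for illustration.

```python
import hashlib


def _hash(data):
    return hashlib.sha256(data).digest()


def build(leaves):
    """Return the tree as a list of levels: level 0 = leaf hashes, last = [root]."""
    level = [_hash(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Parent hash = hash of the concatenated child hashes.
        level = [
            _hash(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_leaves(a, b, level=None, index=0):
    """Descend only into subtrees whose hashes differ."""
    if level is None:
        level = len(a) - 1                 # start at the root
    if a[level][index] == b[level][index]:
        return []                          # equal hash: skip the whole subtree
    if level == 0:
        return [index]
    result = []
    for child in (2 * index, 2 * index + 1):
        if child < len(a[level - 1]):
            result += differing_leaves(a, b, level - 1, child)
    return result


replica_1 = [b"k0=a", b"k1=b", b"k2=c", b"k3=d", b"k4=e"]
replica_2 = [b"k0=a", b"k1=b", b"k2=c", b"k3=X", b"k4=e"]
print(differing_leaves(build(replica_1), build(replica_2)))
```

Only the path from the root to the divergent leaf is visited, so two replicas that are almost in sync exchange a logarithmic number of hashes rather than all keys.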
    132. 132. Google BigTable<br />Fast and large-scale DBMS for Google applications and services<br />Designed to scale into the PB range<br />Uses the distributed Google File System (GFS) for storing data and log files<br />Depends on a cluster management system for managing resources, monitoring states, scheduling, ...<br />Can be used as input source and output target for MapReduce programs<br />
    133. 133. BigTable: Data Model<br />Bigtable = sparse, distributed, multi-dimensional sorted map<br />Indexed by row key, column key, timestamp; value = array of bytes<br />Row keys up to 64 KB; column keys grouped in column families<br />Timestamp (64-bit int) used for versioning<br />Data is maintained in lexicographic order by row keys<br />Row range is dynamically partitioned ➪ tablet = unit of distribution and load balancing<br />Read/write ops under a single row key are atomic<br />Figure: a cell addressed by row key and column key holds values at timestamps t1, t2<br />
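The "(row key, column key, timestamp) → bytes" view of the data model can be mimicked with an in-memory toy. The class `SortedCellMap` and its methods are hypothetical names; nothing here reflects how Bigtable stores or distributes data.

```python
class SortedCellMap:
    """Toy map: (row key, column key, timestamp) -> bytes."""

    def __init__(self):
        self._cells = {}   # (row_key, column_key) -> {timestamp: value}

    def put(self, row_key, column_key, timestamp, value):
        self._cells.setdefault((row_key, column_key), {})[timestamp] = value

    def get(self, row_key, column_key, timestamp=None):
        """Newest version, or the newest one not younger than `timestamp`."""
        versions = self._cells.get((row_key, column_key), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield the latest cells in lexicographic row-key order."""
        for row_key, column_key in sorted(self._cells):
            if start_row <= row_key < end_row:
                yield row_key, column_key, self.get(row_key, column_key)


table = SortedCellMap()
table.put("com.cnn.www", "contents:", 1, b"v1")
table.put("com.cnn.www", "contents:", 2, b"v2")
table.put("com.bbc.www", "contents:", 1, b"x")
print(list(table.scan("com.a", "com.d")))
```

Keeping rows in lexicographic order is what makes contiguous row ranges (tablets) a natural unit of partitioning.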
    134. 134. BigTable: System Architecture<br />Single-master distributed storage system<br />Master server responsible for<br />Assigning tablets to tablet servers<br />Load balancing on tablet servers<br />Detecting addition and expiration of tablet servers<br />Garbage collection of GFS files<br />Tablet servers<br />Manage sets of tablets (10...1000 tablets per server, 100...200 MB per tablet)<br />Handle read/write requests<br />Split tablets<br />Distributed, persistent lock/name service Chubby<br />uses Paxos for replica consistency (5 replicas)<br />Provides namespace consisting of directories and files; allows discovery of tablet servers<br />
    135. 135. BigTable: Tablets<br />Internally stored in SSTables<br />Immutable, sorted file of key-value pairs; organized in 64 KB blocks + index (block ranges)<br />Tablet Location<br />Chubby contains location of root tablet<br />Root tablet contains location of all tablets in a METADATA table<br />METADATA tablet contains location of user tablets + end row key (sparse index)<br />Three-level scheme addresses 2^34 tablets<br />Cached by client library<br />Hierarchy: Chubby file → root tablet → METADATA tablets → user tables<br />
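The figure of 2^34 addressable tablets follows from two numbers that are not on the slide but are reported in Chang et al. (OSDI 2006): METADATA tablets are limited to 128 MB, and each METADATA row takes roughly 1 KB. Under those assumptions:

```latex
\begin{align*}
\text{rows per METADATA tablet} &= \frac{128\ \text{MB}}{1\ \text{KB}}
  = \frac{2^{27}}{2^{10}} = 2^{17} \\
\text{addressable user tablets} &=
  \underbrace{2^{17}}_{\text{METADATA tablets listed in the root tablet}}
  \times
  \underbrace{2^{17}}_{\text{user tablets per METADATA tablet}}
  = 2^{34}
\end{align*}
```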
    136. 136. BigTable: Tablets /2<br />Tablet Assignment<br />Starting tablet servers acquire an exclusive lock in Chubby -> allows discovery of tablet servers<br />Periodic checks by the master on the lock status of tablet servers<br />Replication of data performed by GFS<br />Tablet Serving<br />Updates (mutations) are logged and then applied to an in-memory version (memtable)<br />Compactions<br />Convert memtable into SSTable<br />Merge SSTables<br />
    137. 137. Yahoo! PNUTS<br />Yahoo!'s data serving platform<br />Data & query model:<br />Simple relational model: tables of records with attributes (incl. blob types)<br />Flexible schema evolution by adding attributes at any time<br />Queries: single-table selection & projection<br />Updates & deletions based on primary-key access<br />Storage model:<br />Records as parsed JSON objects<br />Filesystem-based hash tables or MySQL InnoDB engine<br />
    138. 138. PNUTS Architecture<br />Components: clients (REST API), routers, tablet controller, message broker, storage units<br />
    139. 139. PNUTS: Consistency & Replication<br />Consistency model:<br />Per-record timeline consistency: all replicas apply all updates in the same order<br />User-specific guarantees: read-any, read-latest, read-newer-than, writes, write-after-version<br />Partitioning and replication:<br />Tables horizontally partitioned into tablets (100 MB ... 10 GB)<br />Each server is responsible for 100+ tablets<br />Asynchronous replication by using message broker (publish/subscribe)<br />Guarantees delivery of messages (incl. logging)<br />Provides partial ordering of messages<br />Record-level membership + mastership-migration protocol<br />
    140. 140. Comparison<br />99<br />
    141. 141. Conclusion<br />DBaaS = outsourcing databases to reduce TCO<br />Reduce operational / administration costs<br />Pay-as-you-go model<br />Wide spectrum of solutions<br />"rent a database"<br />Cloud databases<br />Use cases<br />Database hosting<br />Hosted services<br />Large-scale data analytics<br />
    142. 142. Challenges & Trends<br />Expressiveness:<br /><ul><li>Limiting functionality: SQL vs. put/get vs. MR</li></ul>Service-level agreements:<br /><ul><li>Shielding: one (virtual) box per client
    143. 143. Limiting functionality: SQL vs. put/get operations
    144. 144. Workload management</li></ul>Resource provisioning:<br /><ul><li>Virtualization on system and database level</li></ul>Confidentiality and trust:<br /><ul><li>Data encryption
    145. 145. Information distribution</li></ul>Scalability and availability:<br /><ul><li>Through redundancy and partitioning
    146. 146. But may affect consistency model</li></li></ul><li>References<br />F. Chang et al.: Bigtable: A Distributed Storage System for Structured Data, OSDI 2006.<br />B.F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, R. Yerneni: PNUTS: Yahoo!'s hosted data serving platform, Proceedings of the VLDB Endowment, v.1 n.2, August 2008<br />R. Baldoni, M. Raynal: Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems, IEEE Distributed Systems Online, 2002<br />E. Brewer: Towards Robust Distributed Systems, PODC 2000<br />S. Gilbert, N. Lynch: Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, ACM SIGACT News, 2002<br />W. Vogels: Eventually Consistent – Revisited, ACM Queue 6(6), 2008<br />D. Karger et al.: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, STOC '97<br />Y. Saito, M. Shapiro: Optimistic Replication, ACM Computing Surveys, 37(1):42-81, 2005<br />S. Aulbach, T. Grust, D. Jacobs, A. Kemper, J. Rittinger: Multi-tenant databases for software as a service: schema-mapping techniques. SIGMOD Conference 2008: 1195-1206<br />
    147. 147. References<br />G. DeCandia et al.: Dynamo: Amazon's Highly Available Key-value Store, SOSP'07<br />P. Bernstein et al.: Data Management Issues in Supporting Large-scale Web Services, IEEE Data Engineering Bulletin, Dec. 2006<br />M. Brantner et al.: Building a Database on S3, SIGMOD'08<br />A. Aboulnaga, C. Amza, K. Salem: Virtualization and databases: state of the art and research challenges. EDBT 2008: 746-747<br />A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, S. Kamath: Automatic virtual machine configuration for database workloads. SIGMOD Conference 2008: 953-966<br />C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins: Pig Latin: a not-so-foreign language for data processing, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada<br />R. Pike, S. Dorward, R. Griesemer, S. Quinlan: Interpreting the data: Parallel analysis with Sawzall, Scientific Programming, v.13 n.4, p.277-298, October 2005<br />
    148. 148. References<br />R. Chaiken, B. Jenkins, P. Larson, B. Ramsey, D. Shakib, S. Weaver, J. Zhou: SCOPE: easy and efficient parallel processing of massive data sets, Proceedings of the VLDB Endowment, v.1 n.2, August 2008<br />B. Hore, S. Mehrotra, G. Tsudik: A privacy-preserving index for range queries, Proceedings of the Thirtieth international conference on Very large data bases, p.720-731, August 31-September 03, 2004, Toronto, Canada<br />H. Hacigümüş, B. Iyer, C. Li, S. Mehrotra: Executing SQL over encrypted data in the database-service-provider model, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin<br />D. Agrawal, A. El Abbadi, F. Emekçi, A. Metwally: Database Management as a Service: Challenges and Opportunities. ICDE 2009: 1709-1716<br />A. Shamir: How to share a secret, Communications of the ACM, v.22 n.11, p.612-613, Nov. 1979<br />F. Kerschbaum, J. Vayssière: Privacy-preserving data analytics as an outsourced service, Proceedings of the 2008 ACM workshop on Secure web services, October 31-31, 2008, Alexandria, Virginia, USA<br />B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan: Private information retrieval, Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS'95), p.41, October 23-25, 1995<br />
    149. 149. Who has the first question?<br />wolfgang.lehner@tu-dresden.de<br />kus@tu-ilmenau.de<br />
