SlideShare a Scribd company logo
1 of 102
Database as a ServiceSeminar, ICDE 2010, Long Beach, March 04 Wolfgang Lehner | Dresden University of Technology, Germany Kai-Uwe Sattler | Ilmenau University of Technology, Germany  1
Introduction Motivation SaaS Cloud Computing UseCases 2
Software as a Service (SaaS) Traditional Software On-DemandUtility Plug In, SubscribePay-per-Use Build Your Own  3
Comparison of business model 4
Avoidhiddencostof traditional SW Traditional Software SaaS SW Licenses Subscription Fee Training Training Customization Hardware IT Staff Maintenance Customization 5
The Long Tail Dozens of markets of millions or millions of markets of dozens? Your Large Customers $ / Customer What if you lower your cost of sale (i.e. lower barrier to entry) and you also lower cost of operations Your Typical Customers New addressable market >> current market (Currently) “non addressable” Customers # of Customers 6
Acquisition Model Service Business Model Pay for usage Access ModelInternet Technical ModelScalable, elastic, shareable EC2 & S3 "All that matters is results —  I don't care how it is done" Cloud Computing: A style of computing where massively scalable, IT-enabled capabilities are provided "as a service" across the Internet to multiple external customers. "I don't want to own assets — I want to pay for elastic usage, like a utility" "I want accessibility from anywhere from any device" "It's about economies of scale, with effective and dynamic sharing" What is Cloud? – Gartner’s Definition 7
To Qualify as a Cloud Common, Location-independent, Online Utility on Demand* Common implies multi-tenancy, not single or isolated tenancy  Utility implies pay-for-use pricing onDemandimplies ~infinite, ~immediate, ~invisible scalability  Alternatively, a “Zero-One-Infinity” definition:** 0On-premise infrastructure, acquisition cost, adoption cost,	support cost 1Coherent and resilient environment – not a brittle “software  	stack” Scalability in response to changing need, Integratability/           	Interoperability with legacy assets and other services	Customizability/Programmability from data, through logic, 	up into the user interface without compromising robust  	multi-tenancy  * Joe Weinman, Vice President of Solutions Sales, AT&T, 3 Nov. 2008 ** From The Jargon File: “Allow none of foo, one of foo, or any number of foo” 8
Cloud Differentials: Service Models 9 Cloud Software as a Service (SaaS) Use provider’s applications over a network  Cloud Platform as a Service (PaaS) Deploy customer-created applications to a cloud  Cloud Infrastructure as a Service (IaaS) Rent processing, storage, network capacity, and other fundamental computing resources
Cloud Differentials: Characteristics 10 Platform Physical – Virtual Homogenous – Heterogeneous Design Paradigms Storage CPU Bandwidth Usage Model Exclusive Shared Pseudo-Shared Size/Location Large Scale(AWS, Google, BM/Google),  Small Scale(SMB, Academia) Purpose General Purpose Special Purpose (e.g., DB-Cloud) Administration/Jurisdiction Public Private
UseCases: Large-Scale Data Analytics Outsourceyourdata and usecloudresourcesforanalysis Historical and mostlynon-criticaldata Parallelizable, read-mostlyworkload, high variantworkloads Relaxed ACID guarantees Examples (HadoopPoweredBy): Yahoo!: researchfor ad systems and Web search Facebook: reporting and analytics Netseer.com: crawling and log analysis Journey Dynamics: trafficspeedforecasting 11
UseCases: Database Hosting Public datasets Biologicaldatabases: a singlerepositoryinstead of > 700 separate databases Semantic Web Data, Linkeddata, ... Sloan Digital Sky Survey TwitterCache Already on Amazon AWS:   annotated human genomedata,  US census,  Freebase, ... Archiving, Metadata Indexing, ... 12
UseCases: Service Hosting Data managementforSaaSsolutions Run theservicesnearthedata = ASP Alreadymanyexistingapplications CRM, e.g. Salesforce, SugarCRM Web Analytics Supply Chain Management HelpDesk Management Enterprise ResourcePlanning, e.g. SAP Business ByDesign ... 13
Foundations & Architectures Virtualization Programmingmodels Consistencymodels & replication SLAs & Workloadmanagement Security 14
Topics covered in this Seminar Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 15
Current Solutions userperspective one DB for all clients one DB per client Virtualization Replication 16 DistributedStorage physicalperspective
... it‘s simple! 17
Virtualization Separating the abstract view of computing resources from the implementation of these resources addsflexibility and agility to the computing infrastructure soften problems related to provisioning, manageability, … lowers TCO: fewercomputingresources Classicaldrivingfactor: serverconsolidation 18 E-mail server Web server Database server E-mail server Database server Linux Linux Linux Linux Linux EDBT2008 Tutorial (Aboulnaga e.a.) Web server Linux Virtualization Consolidate  Improved utilization using consolidation
Whatcanbevirtualized – thebigfour. 19
Different TypesofVirtualization 20 APP 1 APP 4 APP 2 APP 3 APP 5 OPERATING SYSTEM OPERATING SYSTEM VIRTUAL MACHINE 1 VIRTUAL MACHINE 2 CPU CPU CPU MEM MEM NET VIRTUAL MACHINE MONITOR  (VMM) PHYSICAL STORAGE PHYSICAL MACHINE CPU MEM NET CPU CPU
Virtual Machines 21 Technique with long history (since the 1960's) Prominent since IBM 370 mainframeseries Today large scale commodity hardware and operating systems Virtual Machine Monitor (Hypervisor) strong isolation between virtual machines (security, privacy, fault tolerance) flexible mapping between virtual machines and physical resources classical operationspause, resume, checkpoint, migrate (admin / load balancing) Software deployment Preconfigured virtual appliances Repositories of virtual appliances on the web
DBMS on top of Virtual Machines ... yetanotherapplication? ... Overhead? SQL Server withinVMware 22
Virtualization Design Advisor What fraction of node resources goes to what DBMS? Configuring VM parameters What parameter settings are best for a given resource configuration Configuringthe DBMS parameters Example Workload 1: TPC-H (10GByte) Workload 2: TPC-H (10GByte) only Q18 (132 copies) Virtualization design advisor 20% of CPU to Workload 1 80% of CPU to Workload 2 23
Some Experiments Workload Definition based on TPC-H Q18 isoneofthemost CPU intensive queries Q21 isoneofthe least CPU intensive queries Workload Units C: 25x Q18 I: 1x Q21 Experiment: Sensitivity to workloadResource Needs W1 = 5C + 5I W2 = kC + (10-k)I (increaseof k -> more CPU intensive) Postgres DB2 24
Some Experiments (2) Workload Settings W3 = 1C W4 = kC Workload Settings W5 = 1C W6 = kI 25
Virtualization in DBaaS environments DB Layer DB Server DB Server DB Server DB DB DB DB DB Instance  Layer Instance Instance Instance Instance Instance Instance DB Server  Layer VM VM VM VM VM VM VM Layer HW Layer 26
Existing Tools for Node Virtualization DB Server DB Layer DB DB DB DB DB DB Ad2visor ,[object Object]
MQTs
MDC
Redistribution of TablesDB Workload Manager Instance  Layer Instance Instance DB Server  Layer Static Environment Assumptions ,[object Object]
 VM expects static (peak) resource requirements
 Interactions between layers can improve performance/utilizationNode Ressource Model VM VM VM VM Layer VM Configuration ,[object Object]
Resources Configuration
(manual) MigrationHW Layer 27
Layer Interactions (2) Experiment DB2 on Linux TPC-H workload on 1GB database Ranges for resource grants Main memory (BP) – 50 MB to 1GB Additional storage (Indexes) – 5% to 30% DB size Varying advisor output (17-26 indexes) Different possible improvement Different expected Performance after improvement DB Advisor Expected Performance Possible Improvement Index  Storage Index  Storage 35% 90% 25% 25% 20% 20% 15% 15% <1% <3% 10% 10% VM Configuration 5% 5% 200 MB 400 MB 600 MB 800 MB 1 GB 200 MB 400 MB 600 MB 800 MB 1 GB BP BP 28
Storage Virtualization General Goal provide a layerofindircetiontoallowthedefinitionofvirtualstoragedevices minimize/avoiddowntime (local and remote mirroring) improveperformance (distribution/balancing – provisioning  - controlplacement) reducecostofstorageadministration Operations create, destroy, grow, shrinkvirtualdevices changesize, performance, reliability, ... workloadfluctuations hierarchicalstoragemanagement versioning, snapshots, point-in-time copies backup, checkpoints exploit CPU and memory in the storage system caching executelow-level DBMS functions 29
Virtualization in DBaaS Environments (2) DB Layer DB Server DB Server DB Server DB DB DB DB DB Instance  Layer Instance Instance Instance Instance Instance Instance DB Server  Layer VM VM VM VM VM VM VM Layer Shared Disk HW Layer Storage Layer 30 Local Disk
Virtualization in DBaaS Environments (2) DB Layer DB DB DB DB DB DB Server Instance  Layer Instance Instance DB Server  Layer VM VM VM VM Layer HW Layer Storage Layer 31 DB Advisor ,[object Object]
MQTs
MDC
Redistribution of TablesDB Workload Manager StorageRessource Model Storage Configuration ,[object Object]
Replication
ArchivingShared Disk Local Disk
Onewaytogo? Paravirtualization CPU and Memory Paravirtualization extendstheguest to allow direct interaction withtheunderlyinghypervisor reducesthemonitorcostincludingmemoryand System calloperations. gainsfromparavirtualizationareworkloadspecific Device Paravirtualization places a highperformancevirtualization-aware device driver into the guest paravirtualizeddriversaremoreCPU efficient (less CPU overhead forvirtualization) Paravirtualizeddriverscanalso take advantage of HW features, like partial offload
Outline Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 33
Multi Tenancy Goal: consolidate multiple customersontothesame operational system best resourceutilization flexible,butlimitedscalability separate DBper tenant shared DBsharedschema shared DBseparate schema ,[object Object]
Extensibility: customer-specificschemachanges
Security: preventingunauthorizeddataaccessesbyothertenants
Performance/scalability: scale-up & scale-out
Maintenance: on tenantlevelinstead of on databaselevel34
Flexible Schema Approaches Goal: allowtenant-specificschemaadditions (columns) Universal Table Extension Table PivotTable 35
Flexible Schema Approaches: Comparison Best performance Flexible schemaevolution Pivottable Extension table Chunkfolding Private tables Applicationownstheschema Database ownstheschema Universal table XML columns Universal table:  requirestechniquesforhandlingsparsedata Fine-grainedindexsupportnotpossible Pivottable: Requiresjoinsforreconstructinglogicaltuples Chunkfolding: similar to pivottables Group of columnsarecombined in a chunk and mappedinto a chunktable Requirescomplexquerytransformation 36
Access Control in Multi-Tenant DB Shared DB approachesrequirerow-levelaccesscontrol Query transformation.... whereTenantID = 42 ... Potential securityrisks DBMS-levelcontrol, e.g. IBM DB2 LBAC Label-based Access control Controls read/writeaccess to individualrows and columns Securitylabelswithpolicies Requires separate accountforeachtenant 37
In a Nutshell How shall virtualization be handled on Machine level (VM to HW) DBMS level (database to instance to database server) Schema level (multi tenancy) ... using … Allocation between layers Configuration inside layers Flexible schemas … when … Characteristics of the workloads are known Virtual machines are transparent Tenant-specific schema extensions … demanding that … SLAs and security are respected Each node’s utilization is maximized Number of nodes is minimized 38
Outline Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 39
MapReduce Background 40 Programming model and an associated implementation for large-scale data processing Google and related approaches: Apache Hadoop and Microsoft Dryad User-defined map & reduce functions Infrastructure hides details of  parallelization provides fault-tolerance, data distribution, I/O scheduling, load balancing, ... map  (in_key, in_value) ->	  (out_key, intermediate_value) list reduce (out_key,intermediate_value list) ->		out_value list M { (key,value) } R M R M
Logic Flow of WordCount Mapper Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner… 1  Hadoop Map/Reduce is a Hadoop 1 Map   1 17  software framework for Reduce   1 is   1 45  easily writing applications a   1 … … Sort/Shuffle Reducer Hadoop [1, 1, 1, …,1] Hadoop 5 Map   [1, 1, 1, …, 1] Map   12 Reduce   [1, 1, 1, …, 1] Reduce   12 is   [1, 1, 1, …, 1] is   42 a   [1, 1, 1, …, 1] a   23
MapRecude Disadvantages Extremely rigid data flow Common operations must be coded by hand join, filter, split, projection, aggregates, sorting, distinct User plans may be suboptimal and lead to performance degradation Semantics hidden inside map-reduce functions Inflexible, difficult to maintain, extend and optimize Combination of high-level declarative querying and low-level programming with MapReduce  Dataflow Programming Languages Hive, JAQL and Pig M R 42
PigLatin PigLatin On top of map-reduce/ Hadoop Mix of declarative style of SQL and procedural style of map-reduce Consists of two parts PigLatin: A Data Processing Language Pig Infrastructure: An Evaluator for PigLatin 	programs Pig compiles Pig Latin into physical plans  Plans are to be executed over Hadoop 30% of all queriesat Yahoo! in Pig-Latin Open-source, http://incubator.apache.org/pig 43
Example ,[object Object],URL Info Visits 44
Implementation in MapReduce 45
ExampleWorkflow in Pig-Latin load URL Info load Visits visits = load ‘/data/visits’ as (user, url, time); gVisits  = group visits byurl; visitCounts  = foreachgVisitsgenerateurl, count(visits); urlInfo = load ‘/data/urlInfo’ as (url, category, pRank); visitCounts  = joinvisitCountsbyurl, urlInfobyurl; gCategories = groupvisitCountsby category; topUrls = foreachgCategoriesgenerate top(visitCounts,10); store topUrls into ‘/data/topURLs’; Operatedirectly over files. group by url foreachurl generate count Schemas optional. Can be assigned dynamically. join on url User-defined functions (UDFs) can be used in every construct ,[object Object]
 group, filter, foreachgroup by category foreachcategory generate top10 URLs 46
Compilation in MapReduce Every group or join operation forms a map-reduce boundary Other operations pipelined into map and reduce phases load URL Info load Visits Map1 Map2 group by url Reduce1 foreachurl generate count join on url Reduce2 Map3 group by category Reduce3 foreachcategory generate top10 URLs 47
Data warehouse infrastructure built on top of Hadoop, providing: Data Summarization Ad hoc querying Simple query language: Hive QL (based on SQL) Extendable via custom mappers and reducers Subproject of Hadoop No „Hive format“ http://hadoop.apache.org/hive/ Hive 48
Hive - Example LOAD DATA INPATH `/data/visits` INTO TABLE visits INSERT OVERWRITE TABLE visitCounts SELECT url, category, count(*) FROM visits GROUP BY url, category; LOAD DATA INPATH ‘/data/urlInfo’ INTO TABLE urlInfo INSERT OVERWRITE TABLE visitCounts SELECT vc.*, ui.* FROM visitCountsvc JOIN urlInfoui ON (vc.url = ui.url); INSERT OVERWRITE TABLE gCategories SELECT category, count(*) FROM visitCounts GROUP BY category; INSERT OVERWRITE TABLE topUrls SELECT TRANSFORM (visitCounts) USING ‘top10’; 49
Higher level query language for JSON documents Developed at IBM‘s Almaden research center Supports several operations known from SQL Grouping,  Joining, Sorting Built-in support for Loops, Conditionals, Recursion Custom Java methods extend JAQL JAQL scripts are compiled to MapReduce jobs Various I/O Local FS, HDFS, Hbase, Custom I/O adapters http://www.jaql.org/ JAQL 50
JAQL - Example registerFunction(„top“, „de.tuberlin.cs.dima.jaqlextensions.top10“); $visits= hdfsRead(„/data/visits“); $visitCounts= $visits -> groupby $url = $ into { $url, num: count($)}; $urlInfo= hdfsRead(„data/urlInfo“); $visitCounts= join $visitCounts, $urlInfo where $visitCounts.url == $urlInfo.url; $gCategories= $visitCounts -> group by $category = $ 	into {$category, num: count($)}; $topUrls= top10($gCategories); hdfsWrite(“/data/topUrls”, $topUrls); 51
Outline Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 52
ACID vs. BASE Traditional distributeddatamanagement Web-scaledatamanagement ACID BasicallyAvailableSoft-stateEventualconsistent Strongconsistency Isolation Focus on „commit“ Availability? Pessimistic Difficultevolution (e.g. schema) Weakconsistency Availabilityfirst Best effort Optimistic (aggressive) Fast and simple Easierevolution 53
CAP Theorem [Brewer 2000] Consistency: all clientshavethesameview, even in case of updates Availability: all clients find a replica of data, even in thepresence of failures Tolerance to networkpartitions: systemproperties hold evenwhenthenetwork (system) ispartitioned Youcanhave at mosttwoof thesepropertiesforanyshared-data system. 54
CAP Theorem No consistencyguarantees➟ updateswithconflictresolution On a partitionevent, simplywaituntildataisconsistentagain➟ pessimisticlocking All nodesare in contactwitheachotherorputeverything in a single box➟ 2 phasecommit 55
CAP: Explanations PA :=update(o) PB:=read(o) 1. 3. 2. M Networkpartitions ➫ M isnotdelivered Solutions? Synchronousmessage: <PA,M> isatomic Possiblelatencyproblems (availability) Transaction <PA, M, PB>: requires to controlwhen PBhappens Impacts partitiontoleranceoravailability 56
Consistency Models [Vogels 2008] A B C update: D0->D1 read(D) D0 Distributedstoragesystem Strongconsistency:  afterthe update completes, anysubsequentaccessfrom A, B, C will return D1 Weakconsistency:  doesnotguaranteethatsubsequentaccesses will returnD1 -> a number of conditionsneed to bemetbeforeD1 isreturned Eventualconsistency:  Special form of weakconsistency Guaranteesthatif no newupdatesaremade, eventually all accesses will returnD1 57
Variations of EventualConsistency Causalconsistency: If A notifies B aboutthe update, B will read D1 (butnot C!) Read-your-writes: A will alwaysread D1afteritsown update Session consistency: Read-your-writesinside a session Monotonicreads: If a process has seenDk, anysubsequentaccess will neverreturnany Diwith i < k Monotonicwrites:  guarantees to serializethewrites of thesameprocess 58
Database Replication storethesamedata on multiple nodes in order to improvereliability, accessibility, fault-tolerance Single master Multimaster Optimisticreplication relaxedconsistency 1-copy consistency ,[object Object]
Allowsreplicas to diverge; requiresconflictresolution
Allowdatabeaccessedwithouta-priorisynchronization
Updates arepropagated in thebackground
Occasionalconflictsarefixedaftertheyhappen
Improvedavailability, flexibility, scalabability, butsee CAP theorem59
OptimisticReplication: Elements 1 2 2 2 2 1 1 1 1 2 2 2 1 1 1. operationsubmission 3. scheduling 2. propagation 1+2 1+2 1+2 4. conflictresolution 5. commitment 60 Y. Saito, M. Shapiro: OptimisticReplication, ACM ComputingSurveys, 5(3):1-44, 2005
Conflict Resolution & Update Propagation Single master Thomas writerule Dividingobjects, ... Vector clocks App-specificorderingorpreconditions Prohibit Ignore Reduce Syntactic Semantic Detect & repair 61 ,[object Object]
Updates pass throughthesystemlikeinfectiousdiseases
Pairwisecommunication: a sitecontactsothers (randomlychosen) and sends ist information, e.g. aboutupdates
All sitesprocessmessages in thesame way
Proactivebehaviour: no failurerecoverynecessary!
Basic approaches:anti-entropy, rumor mongering, ...,[object Object]
The Notion of QoS and Predictability Service Level Agreement legal part technical part Service Level Objectives ,[object Object]
Deadlineconstraints
Percentileconstraints
 fees, penalties, ...Common understandingaboutservices, guarantees, responsibilities 63 Application Server / middleware DBMS OS / Hardware
TechniquesforQoS in Data Management 64 Providesufficientresources Capacityplanning: „Howmuchboxesforcustomer X?“ Cost vs. Performance tradeoff Shielding Dedicated (virtual) systemforcustomers Scalability? Costefficiency? Scheduling Orderingrequests on priority At whichlevel?
Workload Management Purpose: achieveperformancegoalsforclasses of requests (queries, transactions) Resourceprovisioning Aspects: Specification of service-levelobjectives Workloadclassification and modeling Admissioncontrol & scheduling Staticpriorization: DB2 Query Patroller, Oracle Resource Manager, ... Goal-orientedapproaches Economicapproaches Utility-basedapproaches 65
Workload Characteristics Functional I/O requirements (volume, bandwidth) CPU Degree of parallelism Response times? Throughput? … Non-Functional Availability Reliability Durability Scalability … 66
WLM: Model classes workload classification MPL result admission control &scheduling transaction response time Admission control: limit the number of simultanously executing requests (multiprogramming level = MPL) Scheduling: ordering requests by priority 67
Utility Functions Utility function = preferencespecification mappossiblesystemstates (e.g. resourceprovisioning to jobs) to a real scalarvalue Representsperformancefeature (response time, throughput, ...) and/oreconomicvalue ,[object Object]
Explorespace of alternative mappings (searchproblem)
Runtimemonitoring and controlutility response time 68 Kephart, Das: Achievingself-management via utilityfunctions. IEEE Internet Computing 2007
WorkloadModeling & Prediction Goal: predictresourcerequirementsfor a givenworkload,  i.e., find correlationbetweenqueryfeatures and performancefeatures Approaches: regression, correlationanalysis, KernelCanonical CA queryplans/job descr. jobfeaturematrix query planprojection KCCA performancestatistics performancefeaturematrix performanceprojection Ganapathi et al.: Predicting Multiple MetricsforQueries: BetterDecisionsEnabledbyMachineLearning. ICDE 2009 ,[object Object]
Calculate job coordinates in query plan projectionbased on job featurevector
Inferjob‘scoordinates on theperformanceprojection69

More Related Content

What's hot

Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4DataWorks Summit
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFTDataWorks Summit
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerDataWorks Summit
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...DataWorks Summit
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
Running a container cloud on YARN
Running a container cloud on YARNRunning a container cloud on YARN
Running a container cloud on YARNDataWorks Summit
 
HBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, SolutionsHBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, SolutionsDataWorks Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016alanfgates
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
Hive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHortonworks
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxDataWorks Summit
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDataWorks Summit
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache HiveDataWorks Summit
 

What's hot (20)

Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Running a container cloud on YARN
Running a container cloud on YARNRunning a container cloud on YARN
Running a container cloud on YARN
 
HBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, SolutionsHBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, Solutions
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
Hive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHive - 1455: Cloud Storage
Hive - 1455: Cloud Storage
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
 

Viewers also liked

Viewers also liked (20)

Bobina de tesla
Bobina de teslaBobina de tesla
Bobina de tesla
 
Cartilha idoso inss
Cartilha idoso inssCartilha idoso inss
Cartilha idoso inss
 
Linguagem java
Linguagem javaLinguagem java
Linguagem java
 
SIE- Definicion de Conceptos
SIE- Definicion de ConceptosSIE- Definicion de Conceptos
SIE- Definicion de Conceptos
 
Sistemas biológicos
Sistemas  biológicosSistemas  biológicos
Sistemas biológicos
 
Aneel
AneelAneel
Aneel
 
Folha 115
Folha 115Folha 115
Folha 115
 
Contabilidad Pdf
Contabilidad PdfContabilidad Pdf
Contabilidad Pdf
 
Palestra Mandic no EPICENTRO
Palestra Mandic no EPICENTROPalestra Mandic no EPICENTRO
Palestra Mandic no EPICENTRO
 
Ge capital conf bologna [read only]
Ge capital conf bologna [read only]Ge capital conf bologna [read only]
Ge capital conf bologna [read only]
 
Notícias 03 Banco do Nordeste - 2011
Notícias 03  Banco do Nordeste - 2011Notícias 03  Banco do Nordeste - 2011
Notícias 03 Banco do Nordeste - 2011
 
Prancha colunistas casa_park
Prancha colunistas casa_parkPrancha colunistas casa_park
Prancha colunistas casa_park
 
Unit 04 becoming a world power
Unit 04   becoming a world powerUnit 04   becoming a world power
Unit 04 becoming a world power
 
Jcla11 ws research
Jcla11 ws researchJcla11 ws research
Jcla11 ws research
 
AngloGold Ashanti Informe gri 2008
AngloGold Ashanti Informe gri 2008AngloGold Ashanti Informe gri 2008
AngloGold Ashanti Informe gri 2008
 
Atencion al cliente
Atencion al clienteAtencion al cliente
Atencion al cliente
 
Defensa final
Defensa finalDefensa final
Defensa final
 
Algorithms
AlgorithmsAlgorithms
Algorithms
 
Secretarias municipais as (1)
Secretarias municipais as (1)Secretarias municipais as (1)
Secretarias municipais as (1)
 
Apostila licitações
Apostila licitaçõesApostila licitações
Apostila licitações
 

Similar to Database as a Service - Tutorial @ICDE 2010

Microsoft SQL Server - Reduce Your Cost and Improve your Agility Presentation
Microsoft SQL Server - Reduce Your Cost and Improve your Agility PresentationMicrosoft SQL Server - Reduce Your Cost and Improve your Agility Presentation
Microsoft SQL Server - Reduce Your Cost and Improve your Agility PresentationMicrosoft Private Cloud
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Cscorajramab
 
Microsoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics TutorialMicrosoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics TutorialIIMSE Edu
 
Windowsazureplatform Overviewlatest
Windowsazureplatform OverviewlatestWindowsazureplatform Overviewlatest
Windowsazureplatform Overviewlatestrajramab
 
An Introduction to Cloud Computing (2009)
An Introduction to Cloud Computing (2009)An Introduction to Cloud Computing (2009)
An Introduction to Cloud Computing (2009)Robert Grossman
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data SharingSurekha Parekh
 
Cloud computing - dien toan dam may
Cloud computing - dien toan dam mayCloud computing - dien toan dam may
Cloud computing - dien toan dam mayNguyen Duong
 
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Manoj Kumar
 
Cloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano AmmirabileCloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano AmmirabileManuela Moroncini
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBMongoDB
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloudJames Serra
 
Technology Overview
Technology OverviewTechnology Overview
Technology OverviewLiran Zelkha
 
Emerging Technology in the Cloud! Real Life Examples. Pol Mac Aonghusa
Emerging Technology in the Cloud! Real Life Examples.  Pol Mac AonghusaEmerging Technology in the Cloud! Real Life Examples.  Pol Mac Aonghusa
Emerging Technology in the Cloud! Real Life Examples. Pol Mac Aonghusacatherinewall
 
Private cloud with z enterprise
Private cloud with z enterprisePrivate cloud with z enterprise
Private cloud with z enterpriseJim Porell
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsMatei Zaharia
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalAvere Systems
 
Cloud architecture
Cloud architectureCloud architecture
Cloud architectureAdeel Javaid
 
Microsoft Cloud Database & Cloud BI
Microsoft Cloud Database & Cloud BIMicrosoft Cloud Database & Cloud BI
Microsoft Cloud Database & Cloud BIMark Kromer
 

Similar to Database as a Service - Tutorial @ICDE 2010 (20)

Microsoft SQL Server - Reduce Your Cost and Improve your Agility Presentation
Microsoft SQL Server - Reduce Your Cost and Improve your Agility PresentationMicrosoft SQL Server - Reduce Your Cost and Improve your Agility Presentation
Microsoft SQL Server - Reduce Your Cost and Improve your Agility Presentation
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Csco
 
Microsoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics TutorialMicrosoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics Tutorial
 
Windowsazureplatform Overviewlatest
Windowsazureplatform OverviewlatestWindowsazureplatform Overviewlatest
Windowsazureplatform Overviewlatest
 
An Introduction to Cloud Computing (2009)
An Introduction to Cloud Computing (2009)An Introduction to Cloud Computing (2009)
An Introduction to Cloud Computing (2009)
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data Sharing
 
Cloud computing - dien toan dam may
Cloud computing - dien toan dam mayCloud computing - dien toan dam may
Cloud computing - dien toan dam may
 
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
 
Cloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano AmmirabileCloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano Ammirabile
 
IBM Cloud Journey v10
IBM Cloud Journey v10IBM Cloud Journey v10
IBM Cloud Journey v10
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDB
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Technology Overview
Technology OverviewTechnology Overview
Technology Overview
 
Emerging Technology in the Cloud! Real Life Examples. Pol Mac Aonghusa
Emerging Technology in the Cloud! Real Life Examples.  Pol Mac AonghusaEmerging Technology in the Cloud! Real Life Examples.  Pol Mac Aonghusa
Emerging Technology in the Cloud! Real Life Examples. Pol Mac Aonghusa
 
Private cloud with z enterprise
Private cloud with z enterprisePrivate cloud with z enterprise
Private cloud with z enterprise
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute final
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
 
Cloud architecture
Cloud architectureCloud architecture
Cloud architecture
 
Microsoft Cloud Database & Cloud BI
Microsoft Cloud Database & Cloud BIMicrosoft Cloud Database & Cloud BI
Microsoft Cloud Database & Cloud BI
 

Recently uploaded

ppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyesppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyesashishpaul799
 
An Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxAn Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxCeline George
 
Navigating the Misinformation Minefield: The Role of Higher Education in the ...
Navigating the Misinformation Minefield: The Role of Higher Education in the ...Navigating the Misinformation Minefield: The Role of Higher Education in the ...
Navigating the Misinformation Minefield: The Role of Higher Education in the ...Mark Carrigan
 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryEugene Lysak
 
Essential Safety precautions during monsoon season
Essential Safety precautions during monsoon seasonEssential Safety precautions during monsoon season
Essential Safety precautions during monsoon seasonMayur Khatri
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17Celine George
 
Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17Celine George
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...Nguyen Thanh Tu Collection
 
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdfDanh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdfQucHHunhnh
 
Capitol Tech Univ Doctoral Presentation -May 2024
Capitol Tech Univ Doctoral Presentation -May 2024Capitol Tech Univ Doctoral Presentation -May 2024
Capitol Tech Univ Doctoral Presentation -May 2024CapitolTechU
 
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
 Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatmentsaipooja36
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the lifeNitinDeodare
 
Championnat de France de Tennis de table/
Championnat de France de Tennis de table/Championnat de France de Tennis de table/
Championnat de France de Tennis de table/siemaillard
 
....................Muslim-Law notes.pdf
....................Muslim-Law notes.pdf....................Muslim-Law notes.pdf
....................Muslim-Law notes.pdfVikramadityaRaj
 
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45MysoreMuleSoftMeetup
 
REPRODUCTIVE TOXICITY STUDIE OF MALE AND FEMALEpptx
REPRODUCTIVE TOXICITY  STUDIE OF MALE AND FEMALEpptxREPRODUCTIVE TOXICITY  STUDIE OF MALE AND FEMALEpptx
REPRODUCTIVE TOXICITY STUDIE OF MALE AND FEMALEpptxmanishaJyala2
 
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdfFinancial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdfMinawBelay
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...Nguyen Thanh Tu Collection
 

Recently uploaded (20)

ppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyesppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyes
 
An Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxAn Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptx
 
Navigating the Misinformation Minefield: The Role of Higher Education in the ...
Navigating the Misinformation Minefield: The Role of Higher Education in the ...Navigating the Misinformation Minefield: The Role of Higher Education in the ...
Navigating the Misinformation Minefield: The Role of Higher Education in the ...
 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. Henry
 
Essential Safety precautions during monsoon season
Essential Safety precautions during monsoon seasonEssential Safety precautions during monsoon season
Essential Safety precautions during monsoon season
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17
 
Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
 
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdfDanh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
 
Capitol Tech Univ Doctoral Presentation -May 2024
Capitol Tech Univ Doctoral Presentation -May 2024Capitol Tech Univ Doctoral Presentation -May 2024
Capitol Tech Univ Doctoral Presentation -May 2024
 
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
 Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the life
 
“O BEIJO” EM ARTE .
“O BEIJO” EM ARTE                       .“O BEIJO” EM ARTE                       .
“O BEIJO” EM ARTE .
 
Championnat de France de Tennis de table/
Championnat de France de Tennis de table/Championnat de France de Tennis de table/
Championnat de France de Tennis de table/
 
Post Exam Fun(da) Intra UEM General Quiz - Finals.pdf
Post Exam Fun(da) Intra UEM General Quiz - Finals.pdfPost Exam Fun(da) Intra UEM General Quiz - Finals.pdf
Post Exam Fun(da) Intra UEM General Quiz - Finals.pdf
 
....................Muslim-Law notes.pdf
....................Muslim-Law notes.pdf....................Muslim-Law notes.pdf
....................Muslim-Law notes.pdf
 
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
 
REPRODUCTIVE TOXICITY STUDIE OF MALE AND FEMALEpptx
REPRODUCTIVE TOXICITY  STUDIE OF MALE AND FEMALEpptxREPRODUCTIVE TOXICITY  STUDIE OF MALE AND FEMALEpptx
REPRODUCTIVE TOXICITY STUDIE OF MALE AND FEMALEpptx
 
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdfFinancial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 

Database as a Service - Tutorial @ICDE 2010

  • 1. Database as a ServiceSeminar, ICDE 2010, Long Beach, March 04 Wolfgang Lehner | Dresden University of Technology, Germany Kai-Uwe Sattler | Ilmenau University of Technology, Germany 1
  • 2. Introduction Motivation SaaS Cloud Computing UseCases 2
  • 3. Software as a Service (SaaS) Traditional Software On-DemandUtility Plug In, SubscribePay-per-Use Build Your Own 3
  • 5. Avoidhiddencostof traditional SW Traditional Software SaaS SW Licenses Subscription Fee Training Training Customization Hardware IT Staff Maintenance Customization 5
  • 6. The Long Tail Dozens of markets of millions or millions of markets of dozens? Your Large Customers $ / Customer What if you lower your cost of sale (i.e. lower barrier to entry) and you also lower cost of operations Your Typical Customers New addressable market >> current market (Currently) “non addressable” Customers # of Customers 6
  • 7. Acquisition Model Service Business Model Pay for usage Access ModelInternet Technical ModelScalable, elastic, shareable EC2 & S3 "All that matters is results — I don't care how it is done" Cloud Computing: A style of computing where massively scalable, IT-enabled capabilities are provided "as a service" across the Internet to multiple external customers. "I don't want to own assets — I want to pay for elastic usage, like a utility" "I want accessibility from anywhere from any device" "It's about economies of scale, with effective and dynamic sharing" What is Cloud? – Gartner’s Definition 7
  • 8. To Qualify as a Cloud Common, Location-independent, Online Utility on Demand* Common implies multi-tenancy, not single or isolated tenancy Utility implies pay-for-use pricing onDemandimplies ~infinite, ~immediate, ~invisible scalability Alternatively, a “Zero-One-Infinity” definition:** 0On-premise infrastructure, acquisition cost, adoption cost, support cost 1Coherent and resilient environment – not a brittle “software stack” Scalability in response to changing need, Integratability/ Interoperability with legacy assets and other services Customizability/Programmability from data, through logic, up into the user interface without compromising robust multi-tenancy * Joe Weinman, Vice President of Solutions Sales, AT&T, 3 Nov. 2008 ** From The Jargon File: “Allow none of foo, one of foo, or any number of foo” 8
  • 9. Cloud Differentials: Service Models 9 Cloud Software as a Service (SaaS) Use provider’s applications over a network Cloud Platform as a Service (PaaS) Deploy customer-created applications to a cloud Cloud Infrastructure as a Service (IaaS) Rent processing, storage, network capacity, and other fundamental computing resources
  • 10. Cloud Differentials: Characteristics 10 Platform Physical – Virtual Homogenous – Heterogeneous Design Paradigms Storage CPU Bandwidth Usage Model Exclusive Shared Pseudo-Shared Size/Location Large Scale(AWS, Google, BM/Google), Small Scale(SMB, Academia) Purpose General Purpose Special Purpose (e.g., DB-Cloud) Administration/Jurisdiction Public Private
  • 11. UseCases: Large-Scale Data Analytics Outsourceyourdata and usecloudresourcesforanalysis Historical and mostlynon-criticaldata Parallelizable, read-mostlyworkload, high variantworkloads Relaxed ACID guarantees Examples (HadoopPoweredBy): Yahoo!: researchfor ad systems and Web search Facebook: reporting and analytics Netseer.com: crawling and log analysis Journey Dynamics: trafficspeedforecasting 11
  • 12. UseCases: Database Hosting Public datasets Biologicaldatabases: a singlerepositoryinstead of > 700 separate databases Semantic Web Data, Linkeddata, ... Sloan Digital Sky Survey TwitterCache Already on Amazon AWS: annotated human genomedata, US census, Freebase, ... Archiving, Metadata Indexing, ... 12
  • 13. UseCases: Service Hosting Data managementforSaaSsolutions Run theservicesnearthedata = ASP Alreadymanyexistingapplications CRM, e.g. Salesforce, SugarCRM Web Analytics Supply Chain Management HelpDesk Management Enterprise ResourcePlanning, e.g. SAP Business ByDesign ... 13
  • 14. Foundations & Architectures Virtualization Programmingmodels Consistencymodels & replication SLAs & Workloadmanagement Security 14
  • 15. Topics covered in this Seminar Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 15
  • 16. Current Solutions userperspective one DB for all clients one DB per client Virtualization Replication 16 DistributedStorage physicalperspective
  • 18. Virtualization Separating the abstract view of computing resources from the implementation of these resources addsflexibility and agility to the computing infrastructure soften problems related to provisioning, manageability, … lowers TCO: fewercomputingresources Classicaldrivingfactor: serverconsolidation 18 E-mail server Web server Database server E-mail server Database server Linux Linux Linux Linux Linux EDBT2008 Tutorial (Aboulnaga e.a.) Web server Linux Virtualization Consolidate  Improved utilization using consolidation
  • 20. Different TypesofVirtualization 20 APP 1 APP 4 APP 2 APP 3 APP 5 OPERATING SYSTEM OPERATING SYSTEM VIRTUAL MACHINE 1 VIRTUAL MACHINE 2 CPU CPU CPU MEM MEM NET VIRTUAL MACHINE MONITOR (VMM) PHYSICAL STORAGE PHYSICAL MACHINE CPU MEM NET CPU CPU
  • 21. Virtual Machines 21 Technique with long history (since the 1960's) Prominent since IBM 370 mainframeseries Today large scale commodity hardware and operating systems Virtual Machine Monitor (Hypervisor) strong isolation between virtual machines (security, privacy, fault tolerance) flexible mapping between virtual machines and physical resources classical operationspause, resume, checkpoint, migrate (admin / load balancing) Software deployment Preconfigured virtual appliances Repositories of virtual appliances on the web
  • 22. DBMS on top of Virtual Machines ... yetanotherapplication? ... Overhead? SQL Server withinVMware 22
  • 23. Virtualization Design Advisor What fraction of node resources goes to what DBMS? Configuring VM parameters What parameter settings are best for a given resource configuration Configuringthe DBMS parameters Example Workload 1: TPC-H (10GByte) Workload 2: TPC-H (10GByte) only Q18 (132 copies) Virtualization design advisor 20% of CPU to Workload 1 80% of CPU to Workload 2 23
  • 24. Some Experiments Workload Definition based on TPC-H Q18 isoneofthemost CPU intensive queries Q21 isoneofthe least CPU intensive queries Workload Units C: 25x Q18 I: 1x Q21 Experiment: Sensitivity to workloadResource Needs W1 = 5C + 5I W2 = kC + (10-k)I (increaseof k -> more CPU intensive) Postgres DB2 24
  • 25. Some Experiments (2) Workload Settings W3 = 1C W4 = kC Workload Settings W5 = 1C W6 = kI 25
  • 26. Virtualization in DBaaS environments DB Layer DB Server DB Server DB Server DB DB DB DB DB Instance Layer Instance Instance Instance Instance Instance Instance DB Server Layer VM VM VM VM VM VM VM Layer HW Layer 26
  • 27.
  • 28. MQTs
  • 29. MDC
  • 30.
  • 31. VM expects static (peak) resource requirements
  • 32.
  • 35. Layer Interactions (2) Experiment DB2 on Linux TPC-H workload on 1GB database Ranges for resource grants Main memory (BP) – 50 MB to 1GB Additional storage (Indexes) – 5% to 30% DB size Varying advisor output (17-26 indexes) Different possible improvement Different expected Performance after improvement DB Advisor Expected Performance Possible Improvement Index Storage Index Storage 35% 90% 25% 25% 20% 20% 15% 15% <1% <3% 10% 10% VM Configuration 5% 5% 200 MB 400 MB 600 MB 800 MB 1 GB 200 MB 400 MB 600 MB 800 MB 1 GB BP BP 28
  • 36. Storage Virtualization General Goal provide a layerofindircetiontoallowthedefinitionofvirtualstoragedevices minimize/avoiddowntime (local and remote mirroring) improveperformance (distribution/balancing – provisioning - controlplacement) reducecostofstorageadministration Operations create, destroy, grow, shrinkvirtualdevices changesize, performance, reliability, ... workloadfluctuations hierarchicalstoragemanagement versioning, snapshots, point-in-time copies backup, checkpoints exploit CPU and memory in the storage system caching executelow-level DBMS functions 29
  • 37. Virtualization in DBaaS Environments (2) DB Layer DB Server DB Server DB Server DB DB DB DB DB Instance Layer Instance Instance Instance Instance Instance Instance DB Server Layer VM VM VM VM VM VM VM Layer Shared Disk HW Layer Storage Layer 30 Local Disk
  • 38.
  • 39. MQTs
  • 40. MDC
  • 41.
  • 44. Onewaytogo? Paravirtualization CPU and Memory Paravirtualization extendstheguest to allow direct interaction withtheunderlyinghypervisor reducesthemonitorcostincludingmemoryand System calloperations. gainsfromparavirtualizationareworkloadspecific Device Paravirtualization places a highperformancevirtualization-aware device driver into the guest paravirtualizeddriversaremoreCPU efficient (less CPU overhead forvirtualization) Paravirtualizeddriverscanalso take advantage of HW features, like partial offload
  • 45. Outline Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 33
  • 46.
  • 50. Maintenance: on tenantlevelinstead of on databaselevel34
  • 51. Flexible Schema Approaches Goal: allowtenant-specificschemaadditions (columns) Universal Table Extension Table PivotTable 35
  • 52. Flexible Schema Approaches: Comparison Best performance Flexible schemaevolution Pivottable Extension table Chunkfolding Private tables Applicationownstheschema Database ownstheschema Universal table XML columns Universal table: requirestechniquesforhandlingsparsedata Fine-grainedindexsupportnotpossible Pivottable: Requiresjoinsforreconstructinglogicaltuples Chunkfolding: similar to pivottables Group of columnsarecombined in a chunk and mappedinto a chunktable Requirescomplexquerytransformation 36
  • 53. Access Control in Multi-Tenant DB Shared DB approachesrequirerow-levelaccesscontrol Query transformation.... whereTenantID = 42 ... Potential securityrisks DBMS-levelcontrol, e.g. IBM DB2 LBAC Label-based Access control Controls read/writeaccess to individualrows and columns Securitylabelswithpolicies Requires separate accountforeachtenant 37
  • 54. In a Nutshell How shall virtualization be handled on Machine level (VM to HW) DBMS level (database to instance to database server) Schema level (multi tenancy) ... using … Allocation between layers Configuration inside layers Flexible schemas … when … Characteristics of the workloads are known Virtual machines are transparent Tenant-specific schema extensions … demanding that … SLAs and security are respected Each node’s utilization is maximized Number of nodes is minimized 38
  • 55. Outline Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 39
  • 56. MapReduce Background 40 Programming model and an associated implementation for large-scale data processing Google and related approaches: Apache Hadoop and Microsoft Dryad User-defined map & reduce functions Infrastructure hides details of parallelization provides fault-tolerance, data distribution, I/O scheduling, load balancing, ... map (in_key, in_value) -> (out_key, intermediate_value) list reduce (out_key,intermediate_value list) -> out_value list M { (key,value) } R M R M
  • 57. Logic Flow of WordCount Mapper Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner… 1  Hadoop Map/Reduce is a Hadoop 1 Map  1 17  software framework for Reduce  1 is  1 45  easily writing applications a  1 … … Sort/Shuffle Reducer Hadoop [1, 1, 1, …,1] Hadoop 5 Map  [1, 1, 1, …, 1] Map  12 Reduce  [1, 1, 1, …, 1] Reduce  12 is  [1, 1, 1, …, 1] is  42 a  [1, 1, 1, …, 1] a  23
  • 58. MapRecude Disadvantages Extremely rigid data flow Common operations must be coded by hand join, filter, split, projection, aggregates, sorting, distinct User plans may be suboptimal and lead to performance degradation Semantics hidden inside map-reduce functions Inflexible, difficult to maintain, extend and optimize Combination of high-level declarative querying and low-level programming with MapReduce  Dataflow Programming Languages Hive, JAQL and Pig M R 42
  • 59. PigLatin PigLatin On top of map-reduce/ Hadoop Mix of declarative style of SQL and procedural style of map-reduce Consists of two parts PigLatin: A Data Processing Language Pig Infrastructure: An Evaluator for PigLatin programs Pig compiles Pig Latin into physical plans Plans are to be executed over Hadoop 30% of all queriesat Yahoo! in Pig-Latin Open-source, http://incubator.apache.org/pig 43
  • 60.
  • 62.
  • 63. group, filter, foreachgroup by category foreachcategory generate top10 URLs 46
  • 64. Compilation in MapReduce Every group or join operation forms a map-reduce boundary Other operations pipelined into map and reduce phases load URL Info load Visits Map1 Map2 group by url Reduce1 foreachurl generate count join on url Reduce2 Map3 group by category Reduce3 foreachcategory generate top10 URLs 47
  • 65. Data warehouse infrastructure built on top of Hadoop, providing: Data Summarization Ad hoc querying Simple query language: Hive QL (based on SQL) Extendable via custom mappers and reducers Subproject of Hadoop No „Hive format“ http://hadoop.apache.org/hive/ Hive 48
  • 66. Hive - Example LOAD DATA INPATH `/data/visits` INTO TABLE visits INSERT OVERWRITE TABLE visitCounts SELECT url, category, count(*) FROM visits GROUP BY url, category; LOAD DATA INPATH ‘/data/urlInfo’ INTO TABLE urlInfo INSERT OVERWRITE TABLE visitCounts SELECT vc.*, ui.* FROM visitCountsvc JOIN urlInfoui ON (vc.url = ui.url); INSERT OVERWRITE TABLE gCategories SELECT category, count(*) FROM visitCounts GROUP BY category; INSERT OVERWRITE TABLE topUrls SELECT TRANSFORM (visitCounts) USING ‘top10’; 49
  • 67. Higher level query language for JSON documents Developed at IBM‘s Almaden research center Supports several operations known from SQL Grouping, Joining, Sorting Built-in support for Loops, Conditionals, Recursion Custom Java methods extend JAQL JAQL scripts are compiled to MapReduce jobs Various I/O Local FS, HDFS, Hbase, Custom I/O adapters http://www.jaql.org/ JAQL 50
  • 68. JAQL - Example registerFunction(„top“, „de.tuberlin.cs.dima.jaqlextensions.top10“); $visits= hdfsRead(„/data/visits“); $visitCounts= $visits -> groupby $url = $ into { $url, num: count($)}; $urlInfo= hdfsRead(„data/urlInfo“); $visitCounts= join $visitCounts, $urlInfo where $visitCounts.url == $urlInfo.url; $gCategories= $visitCounts -> group by $category = $ into {$category, num: count($)}; $topUrls= top10($gCategories); hdfsWrite(“/data/topUrls”, $topUrls); 51
  • 69. Outline Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 52
  • 70. ACID vs. BASE Traditional distributeddatamanagement Web-scaledatamanagement ACID BasicallyAvailableSoft-stateEventualconsistent Strongconsistency Isolation Focus on „commit“ Availability? Pessimistic Difficultevolution (e.g. schema) Weakconsistency Availabilityfirst Best effort Optimistic (aggressive) Fast and simple Easierevolution 53
  • 71. CAP Theorem [Brewer 2000] Consistency: all clientshavethesameview, even in case of updates Availability: all clients find a replica of data, even in thepresence of failures Tolerance to networkpartitions: systemproperties hold evenwhenthenetwork (system) ispartitioned Youcanhave at mosttwoof thesepropertiesforanyshared-data system. 54
  • 72. CAP Theorem No consistencyguarantees➟ updateswithconflictresolution On a partitionevent, simplywaituntildataisconsistentagain➟ pessimisticlocking All nodesare in contactwitheachotherorputeverything in a single box➟ 2 phasecommit 55
  • 73. CAP: Explanations PA :=update(o) PB:=read(o) 1. 3. 2. M Networkpartitions ➫ M isnotdelivered Solutions? Synchronousmessage: <PA,M> isatomic Possiblelatencyproblems (availability) Transaction <PA, M, PB>: requires to controlwhen PBhappens Impacts partitiontoleranceoravailability 56
  • 74. Consistency Models [Vogels 2008] A B C update: D0->D1 read(D) D0 Distributedstoragesystem Strongconsistency: afterthe update completes, anysubsequentaccessfrom A, B, C will return D1 Weakconsistency: doesnotguaranteethatsubsequentaccesses will returnD1 -> a number of conditionsneed to bemetbeforeD1 isreturned Eventualconsistency: Special form of weakconsistency Guaranteesthatif no newupdatesaremade, eventually all accesses will returnD1 57
  • 75. Variations of EventualConsistency Causalconsistency: If A notifies B aboutthe update, B will read D1 (butnot C!) Read-your-writes: A will alwaysread D1afteritsown update Session consistency: Read-your-writesinside a session Monotonicreads: If a process has seenDk, anysubsequentaccess will neverreturnany Diwith i < k Monotonicwrites: guarantees to serializethewrites of thesameprocess 58
  • 76.
  • 77. Allowsreplicas to diverge; requiresconflictresolution
  • 79. Updates arepropagated in thebackground
  • 82. OptimisticReplication: Elements 1 2 2 2 2 1 1 1 1 2 2 2 1 1 1. operationsubmission 3. scheduling 2. propagation 1+2 1+2 1+2 4. conflictresolution 5. commitment 60 Y. Saito, M. Shapiro: OptimisticReplication, ACM ComputingSurveys, 5(3):1-44, 2005
  • 83.
  • 85. Pairwisecommunication: a sitecontactsothers (randomlychosen) and sends ist information, e.g. aboutupdates
  • 88.
  • 89.
  • 92. fees, penalties, ...Common understandingaboutservices, guarantees, responsibilities 63 Application Server / middleware DBMS OS / Hardware
  • 93. TechniquesforQoS in Data Management 64 Providesufficientresources Capacityplanning: „Howmuchboxesforcustomer X?“ Cost vs. Performance tradeoff Shielding Dedicated (virtual) systemforcustomers Scalability? Costefficiency? Scheduling Orderingrequests on priority At whichlevel?
  • 94. Workload Management Purpose: achieveperformancegoalsforclasses of requests (queries, transactions) Resourceprovisioning Aspects: Specification of service-levelobjectives Workloadclassification and modeling Admissioncontrol & scheduling Staticpriorization: DB2 Query Patroller, Oracle Resource Manager, ... Goal-orientedapproaches Economicapproaches Utility-basedapproaches 65
  • 95. Workload Characteristics Functional I/O requirements (volume, bandwidth) CPU Degree of parallelism Response times? Throughput? … Non-Functional Availability Reliability Durability Scalability … 66
  • 96. WLM: Model classes workload classification MPL result admission control &scheduling transaction response time Admission control: limit the number of simultanously executing requests (multiprogramming level = MPL) Scheduling: ordering requests by priority 67
  • 97.
  • 98. Explorespace of alternative mappings (searchproblem)
  • 99. Runtimemonitoring and controlutility response time 68 Kephart, Das: Achievingself-management via utilityfunctions. IEEE Internet Computing 2007
  • 100.
  • 101. Calculate job coordinates in query plan projectionbased on job featurevector
  • 103. Outline Query & Programming Model Logical Data Model Virtuali-zation Multi-Tenancy Service Level Agreements Storage Model DistributedStorage Replication Security 70
  • 104. OverviewandChallenges outsourcing Data Pre- processor Private informationretrieval / Access privacy Data Owner queries Data confiden- tiality/ privacy Query Engine Query Pre/Post- processor queryresults User Completenessandcorrectness Service Provider (un-trusted) 71
  • 105. Challenges I – Data Confidentiality/ Privacy Need to store data in the cloud But we do not trust the service providers for sensitive information encrypt the data and store it but still be able to run queries over the encrypted data do most of the work at the server Two issues Privacy during transmission (wells studied, e.g. through SSL/TLS) Privacy of stored data Querying over encrypted data is challenging needs to maintain content information on the server side, e.g. rangequeriesrequire order preserving data encryption mechanisms privacyperformancetradeoff 72
  • 106. Query Processing on Encrypted Data Metadata server-side query Query Translator original query Query Engine Temporary Result client-side query encrypted results result Query Executor User Service Provider (un-trusted) Client Site 73
  • 107. Executing SQL over Encrypted Data Hacigumus et al., (SIGMOD 2002) Main Steps Partition sensitive domains Order preserving: supportscomparison Random: query rewriting becomes hard Rewritequeries to targetpartitions Execute queries and return results Prune/post-processresults on client Privacy-Precision Trade-off Larger segments/partitions  increasedprivacy  decreasedprecision  increasedoverheads in query processing 74
  • 108.
  • 109. Create a coarse index for each (or selected) attribute(s) in the original table75
  • 110.
  • 111.
  • 112. Challenges II – Private Information Retrieval (PIR) User queries should be invisible to service provider More formal database is modeled as a string x of length N stored at remote server user wants to retrieve the bit xi for some i without disclosing any information about i to the server Paradox imagine buying in a store without the seller knowing what you buy X i x1, x2, …, xn xi User 77
  • 113. Information-Theoretic 2-server PIR a1 = xl l ϵQ1 Q1∈{1,…,n} i n Service Provider 1 0 0 1 1 0 0 1 1 1 0 0 0 Q2=Q1 i i l ϵQ2 xi = a1 a2 Service Provider 2 User + + + + a2 = xl 78
  • 114. Conclusion & Outlook CurrentInfrastructures MS Azure Amazon RDS + SimpleDB Amazon Dynamo Google BigTable Yahoo! PNUTS Conclusion Challenges & Trends 79
  • 115. Current Solutions one DB for all clients one DB per client AmazonSimpleDB / Dynamo Amazon RDS Yahoo! PNUTS Google Bigtable,Cassandra, Voldemort Amazon S3 Microsoft SQL Azure Virtualization Replication DistributedStorage 80
  • 116. Microsoft SQL Azure Cloud databaseserviceforAzureplatform Allows to create SQL server = group of databasesspreadacross multiple physicalmachines (incl. geo-location) Supports relational model and T-SQL (tables, views, indices, triggers, storedprocedures) Deployment and administrationusing SQL Server Management Studio Currentlimitations Individualdatabasesize = max. 10 GB No supportfor CLR, distributedqueries & transactions, spatialdata 81
  • 117. Microsoft SQL Azure: Details Databases implemented as replicateddatapartitions Across multiple physicalnodes Provideloalbalancing and failover API SQL, ADO.NET, ODBC Tabular Data Streams SQL Server Authentication Sync Framework Prices 1 GB database: $9.99/month, 10 GB: $99.99/month + datatransfer SLA: 99.9% availability 82
  • 118.
  • 119. Partition key: usedforassigningentities to partitions; Rowkey: unique ID within a partition
  • 122. Amazon RDS Amazon Relational Database Services Web Service to set up and operate a MySQLdatabase Full-featuredMySQL 5.1 Automateddatabasebackup Java-basedcommandlinetools and Web Service API forinstanceadministration Native DB access Prices: Small DB instance (1.7 GB memory, 1 ECU): $0.11/hour Largest DB instance (68 GB, 26 ECU): $3.10/hour + $0.10 GB-monthstorage + datatransfer 84
  • 123. Amazon Data Services Amazon Simple Storage Service (S3) DistributedBlobstorageforobjects (1 Byte ... 5 GB data) REST-basedinterface to read, write, and deleteobjectsidentifiedbyunique, user-definedkey Atomicsingle-keyupdates; no locking Eventualconsistency (partiallyread-after-write) Aug 2009: morethan 64 billionobjects AmazonSimpleDB (= Amazon Dynamo???) Distributedstructuredstorage Web Service API foraccess Eventualconsistency 85
  • 124.
  • 125. Restricted to a singledomain
  • 126. SFWsyntax + count() + multi-attributepredicates
  • 128. Amazon Dynamo Highlyavailable and scalablekey-valuedatastorefortheAmazonplatform Managesthestate of Amazonservices Providingbestsellerlists, shoppingcarts, customerpreferences, productcatalogs -> requireonlyprimary-keyaccess (e.g. productid, customerid) Completelydecentralized, minimal needformanualadministration (e.g. partitioning, redistribution) Assumptions: Simple querymodel: put/getoperations on keys, smallobjects (< 1MB) Weakerconsistencybut high availability („alwayswritable“ datastore), no isolationguarantees Efficiency: running on commodityhardware, guaranteedlatency = SLAs, e.g. 300 ms response time for 99.9% of requests, peakload of 500 requests/sec. 87
  • 129. Dynamo: Partitioning and Replication Partitioningscheme based on consistenthashing Virtualnodes: eachphysicalnodeisresponsibleformorethanonevirtualnode Replication Eachdataitemisreplicated at n nodes A Key space = ring B E Responsibility ofnode C C Replicas of keys Fromrange (B,C) D 88
  • 130. Dynamo: Data Versioning Provideseventualconsistency -> asynchronouspropagation of updates Updates result in a newversion of thedata Vector clocksforcapturingcausalitiesbetween different versions of thesameobject Vector clock = list of (node, counter) Determinecausalordering/parallelbranches of versions Update requestshave to specifywhichversionis to beupdated Reconciliationduringclientreads! reconcile(D)@NA write(D)@NB write(D)@NA write(D)@NA D3([NA,2],[NB,1]) D1([NA,1]) D2([NA,2]) D5([NA,3],[NB,1],[NC,1]) write(D)@NC D4([NA,2],[NC,1]) 89
  • 131. Dynamo: Replicamaintenance Consistencyamongreplicas: Quorum protocol: R nodesmustparticipate in a read, W nodes in a write; R + W > N Sloppyquorum: Read/writesareperformed on thefirst N healthynodes Preference list: list of nodeswhichareresponsibleforstoring a givenkey For highestavailability: W=1 Replicasynchronization Anti-entropy: Merkle trees: hashtreeswhereleavesarehashes of keys, non-leavesarehashes of children Ifhashvalues of twonodesareequal, no need to check children 90
  • 132. Google BigTable Fast and large-scale DBMS for Google applications and services Designed to scaleinto PB range Usesdistributed Google File System (GFS) forstoringdata and log files Depends on a clustermanagementsystemformanagingresource, monitoringstates, scheduling, .... Canbeused as inputsource and outputtargetforMapReduceprograms 91
  • 133. BigTable: Data Model Bigtable = sparse, distributed, multi-dimensional sortedmap Indexedbyrowkey, columnkey, timestamp; value = array of bytes Rowkeys up to 64 KB; columnkeysgrouped in columnfamilies Timestamp (64 bitint) usedforversioning Data ismaintained in lexicographic order byrowkeys Rowrangeisdynamicallypartitioned ➪ tablet = unit of distribution and loadbalancing Read/writeopsunder a singlerowkeyareatomic value columnkey rowkey t1 t2 92
  • 134. BigTable: System Architecture Single-masterdistributedstoragesystem masterserverresponsiblefor Assigningtablets to tabletservers Loadbalancing on tabletservers Detectingaddition and expiration of tabletservers Garbagecollection of GFS files Tabletservers Manage sets of tablets (10...1000 tablets per server, 100..200 MB per tablet) Handle read/writerequests Split tables Distributed, persistentlock/nameserviceChubby usesPaxosforreplicaconsistency (5 replicas) Providesnamespaceconsisting of directories and files; allowsdiscovering of tabletservers 93
  • 135. BigTable: Tablets Internallystored in SSTables Immutable, sortedfile of key-valuepairs; organized in 64KB blocks + index (block ranges) TabletLocation Chubbycontainslocation of roottablet Roottabletcontainslocation of all tablets in a METADATA table METADATA tabletcontainslocation of usertablets + end keyrow (sparseindex) Three-levelschemeaddresses 234tablets Cachedbyclientlibrary User tables METADATA tablet Roottablet Chubbyfile 94
  • 136. BigTable: Tablets /2 TabletAssignment Startingtabletserversacquire an exclusive lock in Chubby -> allowsdiscovery of tabletservers Periodicallychecksbythemaster on the lock status of tabletservers Replication of dataperformedby GFS TabletServing Updates (mutations) arelogged and thenapplied to an in-memoryversion (memtable) Compactions ConvertmemtableintoSSTable MergeSSTables 95
  • 137. Yahoo! PNUTS Yahoo!‘sdataservingplatform Data & querymodel: Simple relational model: tables of recordswithattributes (incl. Blobtypes) Flexible schemaevolutionbyaddingattributes at any time Queries: single-tableselection & projection Updates & deletionsbased on primary-keyaccess Storagemodel: Records as parsed JSON objects Filesystem-basedhashtablesorMySQLInnoDBengine 96
  • 138. PNUTS Architecture Clients REST API Tablet controller Routers Message Broker Storage units 97
  • 139. PNUTS: Consistency & Replication Consistencymodel: Per-recordtimelineconsistency: all replicasapply all updates in thesame order User-specificguarantees: ready-any, read-latest, read-newer-than, writes, write-after-version Partitioning and replication: Tableshorizontallypartionedintotablets (100 MB ...10 GB) Eachserverisresponsiblefor 100+ tables Asynchronousreplicationbyusingmessagebroker (publish/subscribe) Guarantees delivery of messages (incl. Logging) Provides partial ordering of messages Record-levelmembership + mastership-migrationprotocol 98
  • 141. Conclusion DBaaS = outsourcingdatabases to reduce TCO Reduce operational / administrationcosts Pay as yougomodel Widespectrum of solutions „rent a database“ Cloud databases  Usecases Database hosting Hostedservices Large-scaledataanalytics 100
  • 142.
  • 143. Limitingfunctionality: SQL vs. put/getoperations
  • 144.
  • 145.
  • 146.
  • 147. References 103 G. DeCandia et al.: Dynamo: Amazon‘sHighlyAvailableKey-value Store, SOSP’07 P. Bernstein et al.: Data Management Issues in Supporting Large-scale Web Services, IEEE Data Engineering Bulletin, Dec. 2006 M. Brantner et al.: Building a Database on S3, SIGMOD’08 A. Aboulnaga, C. Amza, K. Salem: Virtualization and databases: state of the art and research challenges. EDBT 2008: 746-747 A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, S. Kamath: Automatic virtualmachineconfigurationfordatabaseworkloads. SIGMOD Conference 2008: 953-966 C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Piglatin: a not-so-foreignlanguagefordataprocessing, Proceedingsofthe 2008 ACM SIGMOD international conference on Management ofdata, June 09-12, 2008, Vancouver, Canada R. Pike, S. Dorward, R. Griesemer, Se. Quinlan, Interpretingthedata: Parallel analysiswithSawzall, Scientific Programming, v.13 n.4, p.277-298, October 2005
  • 148. References 104 R. Chaiken, B. Jenkins , P Larson, B. Ramsey, D. Shakib, S. Weaver, J. Zhou, SCOPE: easy and efficient parallel processing of massive datasets, Proceedings of the VLDB Endowment, v.1 n.2, August 2008 B. Hore, S. Mehrotra, G. Tsudik, A privacy-preservingindexforrangequeries, Proceedings of theThirtieth international conference on Very large databases, p.720-731, August 31-September 03, 2004, Toronto, Canada H. Hacigümüş, B. Iyer, C. Li, S. Mehrotra, Executing SQL overencrypteddata in thedatabase-service-providermodel, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin D. Agrawal, A. El Abbadi, F. Emekçi, A. Metwally: Database Management as a Service: Challenges and Opportunities. ICDE 2009: 1709-1716 A. Shamir, How to share a secret, Communications of the ACM, v.22 n.11, p.612-613, Nov. 1979 F. Kerschbaum, J. Vayssière, Privacy-preservingdataanalytics as an outsourcedservice, Proceedings of the 2008 ACM workshop on Secure web services, October 31-31, 2008, Alexandria, Virginia, USA B. Chor, O. Goldreich, E. Kushilevitz , M. Sudan, Private informationretrieval, Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS'95), p.41, October 23-25, 1995
  • 149. Who has thefirstquestion? 105 ? wolfgang.lehner@tu-dresden.dekus@tu-ilmenau.de

Editor's Notes

  1. SAP Business Objects: Business Objects BI On-Demand