Database as a Service
Seminar, ICDE 2010, Long Beach, March 04
Wolfgang Lehner | Dresden University of Technology, Germany
Kai-Uwe Sattler | Ilmenau University of Technology, Germany

Introduction
- Motivation
- SaaS
- Cloud Computing
- Use Cases

Software as a Service (SaaS)
- Traditional Software: Build Your Own
- On-Demand Utility: Plug In, Subscribe, Pay-per-Use

Comparison of business models

Avoid hidden cost of traditional SW
- Traditional Software: SW Licenses, Training, Customization, Hardware, IT Staff, Maintenance
- SaaS: Subscription Fee, Training, Customization

The Long Tail
- Dozens of markets of millions or millions of markets of dozens?
- What if you lower your cost of sale (i.e. lower barrier to entry) and you also lower cost of operations?
- New addressable market >> current market
- Figure: $ / Customer plotted over # of Customers, from Your Large Customers through Your Typical Customers to (currently) "non addressable" Customers

What is Cloud? – Gartner's Definition
- Cloud Computing: A style of computing where massively scalable, IT-enabled capabilities are provided "as a service" across the Internet to multiple external customers.
- Acquisition Model: Service. "All that matters is results; I don't care how it is done"
- Business Model: Pay for usage. "I don't want to own assets; I want to pay for elastic usage, like a utility"
- Access Model: Internet. "I want accessibility from anywhere from any device"
- Technical Model: Scalable, elastic, shareable. "It's about economies of scale, with effective and dynamic sharing"
- Example: EC2 & S3

To Qualify as a Cloud
- Common, Location-independent, Online Utility on Demand*
  - Common implies multi-tenancy, not single or isolated tenancy
  - Utility implies pay-for-use pricing
  - on Demand implies ~infinite, ~immediate, ~invisible scalability
- Alternatively, a "Zero-One-Infinity" definition:**
  - 0: On-premise infrastructure, acquisition cost, adoption cost, support cost
  - 1: Coherent and resilient environment, not a brittle "software stack"
  - Infinity: Scalability in response to changing need, Integratability/Interoperability with legacy assets and other services, Customizability/Programmability from data, through logic, up into the user interface without compromising robust multi-tenancy
* Joe Weinman, Vice President of Solutions Sales, AT&T, 3 Nov. 2008
** From The Jargon File: "Allow none of foo, one of foo, or any number of foo"

Cloud Differentials: Service Models
- Cloud Software as a Service (SaaS): Use provider's applications over a network
- Cloud Platform as a Service (PaaS): Deploy customer-created applications to a cloud
- Cloud Infrastructure as a Service (IaaS): Rent processing, storage, network capacity, and other fundamental computing resources

Cloud Differentials: Characteristics
- Platform
  - Physical – Virtual
  - Homogenous – Heterogeneous
- Design Paradigms
  - Storage
  - CPU
  - Bandwidth
- Usage Model
  - Exclusive
  - Shared
  - Pseudo-Shared
- Size/Location
  - Large Scale (AWS, Google, BM/Google), Small Scale (SMB, Academia)
- Purpose
  - General Purpose
  - Special Purpose (e.g., DB-Cloud)
- Administration/Jurisdiction
  - Public
  - Private

Use Cases: Large-Scale Data Analytics
- Outsource your data and use cloud resources for analysis
  - Historical and mostly non-critical data
  - Parallelizable, read-mostly workload, highly variant workloads
  - Relaxed ACID guarantees
- Examples (Hadoop PoweredBy):
  - Yahoo!: research for ad systems and Web search
  - Facebook: reporting and analytics
  - Netseer.com: crawling and log analysis
  - Journey Dynamics: traffic speed forecasting

Use Cases: Database Hosting
- Public datasets
  - Biological databases: a single repository instead of > 700 separate databases
  - Semantic Web Data, Linked data, ...
  - Sloan Digital Sky Survey
  - TwitterCache
- Already on Amazon AWS: annotated human genome data, US census, Freebase, ...
- Archiving, Metadata Indexing, ...

Use Cases: Service Hosting
- Data management for SaaS solutions
  - Run the services near the data = ASP
- Already many existing applications
  - CRM, e.g. Salesforce, SugarCRM
  - Web Analytics
  - Supply Chain Management
  - Help Desk Management
  - Enterprise Resource Planning, e.g. SAP Business ByDesign
  - ...

Foundations & Architectures
- Virtualization
- Programming models
- Consistency models & replication
- SLAs & Workload management
- Security

Topics covered in this Seminar
- Query & Programming Model
- Logical Data Model
- Virtualization
- Multi-Tenancy
- Service Level Agreements
- Storage Model
- Distributed Storage
- Replication
- Security

Current Solutions
- user perspective: from one DB for all clients to one DB per client
- physical perspective: Virtualization, Replication, Distributed Storage

... it's simple!

Virtualization
- Separating the abstract view of computing resources from the implementation of these resources
  - adds flexibility and agility to the computing infrastructure
  - softens problems related to provisioning, manageability, …
  - lowers TCO: fewer computing resources
- Classical driving factor: server consolidation
- Figure (EDBT 2008 Tutorial, Aboulnaga et al.): an e-mail server, a Web server and a database server, each running on its own Linux machine, are consolidated via virtualization; improved utilization using consolidation

What can be virtualized – the big four.

Different Types of Virtualization
- Figure: applications (APP 1 to APP 5) run on operating systems inside VIRTUAL MACHINE 1 and VIRTUAL MACHINE 2, each with virtual CPU, MEM and NET; the VIRTUAL MACHINE MONITOR (VMM) maps them onto the PHYSICAL MACHINE (CPU, MEM, NET) and PHYSICAL STORAGE

Virtual Machines
- Technique with long history (since the 1960's)
  - Prominent since IBM 370 mainframe series
  - Today: large-scale commodity hardware and operating systems
- Virtual Machine Monitor (Hypervisor)
  - strong isolation between virtual machines (security, privacy, fault tolerance)
  - flexible mapping between virtual machines and physical resources
  - classical operations: pause, resume, checkpoint, migrate (admin / load balancing)
- Software deployment
  - Preconfigured virtual appliances
  - Repositories of virtual appliances on the web

DBMS on top of Virtual Machines
- ... yet another application?
- ... Overhead?
- Figure: SQL Server within VMware

Virtualization Design Advisor
- What fraction of node resources goes to what DBMS?
  - Configuring VM parameters
- What parameter settings are best for a given resource configuration?
  - Configuring the DBMS parameters
- Example
  - Workload 1: TPC-H (10 GByte)
  - Workload 2: TPC-H (10 GByte), only Q18 (132 copies)
- Virtualization design advisor
  - 20% of CPU to Workload 1
  - 80% of CPU to Workload 2

Some Experiments
- Workload definition based on TPC-H
  - Q18 is one of the most CPU intensive queries
  - Q21 is one of the least CPU intensive queries
- Workload units
  - C: 25x Q18
  - I: 1x Q21
- Experiment: Sensitivity to workload resource needs
  - W1 = 5C + 5I
  - W2 = kC + (10-k)I (increase of k -> more CPU intensive)
- Figure: results for Postgres and DB2

Some Experiments (2)
- Workload settings: W3 = 1C, W4 = kC
- Workload settings: W5 = 1C, W6 = kI

Virtualization in DBaaS environments
- Figure: layer stack, top to bottom: DB Layer (DBs), Instance Layer (instances), DB Server Layer (DB servers), VM Layer (VMs), HW Layer

Existing Tools for Node Virtualization
- DB Layer: DB Advisor
  - Indexes
  - MQTs
  - MDC
  - Redistribution of Tables
- Instance Layer: DB Workload Manager
- VM Layer: VM Configuration
  - Monitoring
  - Resources Configuration
  - (manual) Migration
- Node Resource Model
- Static Environment Assumptions
  - Advisor expects static hardware environment
  - VM expects static (peak) resource requirements
  - Interactions between layers can improve performance/utilization

Layer Interactions (2)
- Experiment
  - DB2 on Linux
  - TPC-H workload on 1 GB database
- Ranges for resource grants
  - Main memory (BP): 50 MB to 1 GB
  - Additional storage (Indexes): 5% to 30% DB size
- Varying advisor output (17-26 indexes)
  - Different possible improvement
  - Different expected performance after improvement
- Figure: two panels, Expected Performance and Possible Improvement, each plotted over Index Storage (DB Advisor) and BP from 200 MB to 1 GB (VM Configuration); plot annotations: 35%, 90%, <1%, <3%

Storage Virtualization
- General goal
  - provide a layer of indirection to allow the definition of virtual storage devices
  - minimize/avoid downtime (local and remote mirroring)
  - improve performance (distribution/balancing, provisioning, control placement)
  - reduce cost of storage administration
- Operations
  - create, destroy, grow, shrink virtual devices
  - change size, performance, reliability, ...
  - workload fluctuations
  - hierarchical storage management
  - versioning, snapshots, point-in-time copies
  - backup, checkpoints
  - exploit CPU and memory in the storage system
    - caching
    - execute low-level DBMS functions

Virtualization in DBaaS Environments (2)
- Figure: layer stack, top to bottom: DB Layer (DBs), Instance Layer (instances), DB Server Layer (DB servers), VM Layer (VMs), HW Layer, Storage Layer (Shared Disk, Local Disk)

Virtualization in DBaaS Environments (2)
- DB Layer: DB Advisor
  - Indexes
  - MQTs
  - MDC
  - Redistribution of Tables
- Instance Layer: DB Workload Manager
- Storage Layer: Storage Configuration
  - Device Bundling
  - Replication
  - Archiving
- Storage Resource Model
- Storage devices: Shared Disk, Local Disk

One way to go? Paravirtualization
- CPU and Memory Paravirtualization
  - extends the guest to allow direct interaction with the underlying hypervisor
  - reduces the monitor cost including memory and system call operations
  - gains from paravirtualization are workload specific
- Device Paravirtualization
  - places a high performance virtualization-aware device driver into the guest
  - paravirtualized drivers are more CPU efficient (less CPU overhead for virtualization)
  - paravirtualized drivers can also take advantage of HW features, like partial offload

Multi Tenancy
- Goal: consolidate multiple customers onto the same operational system
- Approaches, ranging from "flexible, but limited scalability" to "best resource utilization":
  - separate DB per tenant
  - shared DB, separate schema
  - shared DB, shared schema
- Requirements:
  - Extensibility: customer-specific schema changes
  - Security: preventing unauthorized data accesses by other tenants
  - Performance/scalability: scale-up & scale-out
  - Maintenance: on tenant level instead of on database level

Flexible Schema Approaches
- Goal: allow tenant-specific schema additions (columns)
- Universal Table
- Extension Table
- Pivot Table

Flexible Schema Approaches: Comparison
- Figure: approaches arranged between "Best performance" and "Flexible schema evolution", and between "Application owns the schema" and "Database owns the schema": Pivot table, Extension table, Chunk folding, Private tables, Universal table, XML columns
- Universal table:
  - Requires techniques for handling sparse data
  - Fine-grained index support not possible
- Pivot table:
  - Requires joins for reconstructing logical tuples
- Chunk folding: similar to pivot tables
  - Groups of columns are combined in a chunk and mapped into a chunk table
  - Requires complex query transformation

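The join cost of the pivot table approach can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table layout (pivot with tenant_id, row_id, col, val), the sample data and the helper name logical_rows are illustrative assumptions, not taken from the seminar. The helper assumes at least one requested column.

```python
import sqlite3

# Hypothetical pivot-table layout: one physical row per (tenant, logical row, column).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pivot (tenant_id INTEGER, row_id INTEGER, col TEXT, val TEXT)"
)
conn.executemany(
    "INSERT INTO pivot VALUES (?, ?, ?, ?)",
    [
        (42, 1, "name", "Acme"),
        (42, 1, "region", "EU"),   # tenant-specific extension column
        (42, 2, "name", "Globex"),
        (7, 1, "name", "Initech"),
    ],
)


def logical_rows(conn, tenant_id, columns):
    """Rebuild logical tuples for one tenant: one self-join per requested column."""
    select = ", ".join(f"c{i}.val" for i in range(len(columns)))
    joins = " ".join(
        f"LEFT JOIN pivot c{i} ON c{i}.tenant_id = r.tenant_id "
        f"AND c{i}.row_id = r.row_id AND c{i}.col = ?"
        for i in range(len(columns))
    )
    sql = (
        f"SELECT r.row_id, {select} FROM "
        "(SELECT DISTINCT tenant_id, row_id FROM pivot WHERE tenant_id = ?) r "
        f"{joins} ORDER BY r.row_id"
    )
    # Parameter order follows placeholder order: tenant filter first, then columns.
    return conn.execute(sql, [tenant_id, *columns]).fetchall()


rows = logical_rows(conn, 42, ["name", "region"])
```

Each additional logical column adds one more join, which is the reconstruction overhead the comparison refers to; missing values simply come back as NULL (None).
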
Access Control in Multi-Tenant DB
- Shared DB approaches require row-level access control
- Query transformation
  - ... where TenantID = 42 ...
  - Potential security risks
- DBMS-level control, e.g. IBM DB2 LBAC
  - Label-based access control
  - Controls read/write access to individual rows and columns
  - Security labels with policies
  - Requires separate account for each tenant

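Query transformation in a middleware layer might look roughly like the following sketch. The orders table, its contents and the helper tenant_query are hypothetical; only the TenantID predicate comes from the slide. The sketch assumes that table, columns and where are supplied by trusted application code, which is exactly where the potential security risks arise if that assumption does not hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (TenantID INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(42, "disk"), (42, "cpu"), (7, "ram")],
)


def tenant_query(conn, tenant_id, table, columns, where="1=1", params=()):
    """Run a SELECT with the tenant predicate appended by the middleware."""
    # The tenant id is bound as a parameter, never concatenated into the SQL text.
    # `table`, `columns` and `where` are assumed to come from trusted code.
    sql = (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE ({where}) AND TenantID = ?"
    )
    return conn.execute(sql, [*params, tenant_id]).fetchall()


items_of_42 = sorted(tenant_query(conn, 42, "orders", ["item"]))
```

Because every statement passes through the same wrapper, a tenant cannot see rows of another tenant as long as no code path bypasses it; DBMS-level mechanisms such as LBAC move this check into the database instead.
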
In a Nutshell
- How shall virtualization be handled on
  - Machine level (VM to HW)
  - DBMS level (database to instance to database server)
  - Schema level (multi tenancy)
- ... using …
  - Allocation between layers
  - Configuration inside layers
  - Flexible schemas
- … when …
  - Characteristics of the workloads are known
  - Virtual machines are transparent
  - Tenant-specific schema extensions
- … demanding that …
  - SLAs and security are respected
  - Each node's utilization is maximized
  - Number of nodes is minimized

MapReduce Background
- Programming model and an associated implementation for large-scale data processing
- Google and related approaches: Apache Hadoop and Microsoft Dryad
- User-defined map & reduce functions
  - map (in_key, in_value) -> (out_key, intermediate_value) list
  - reduce (out_key, intermediate_value list) -> out_value list
- Infrastructure
  - hides details of parallelization
  - provides fault-tolerance, data distribution, I/O scheduling, load balancing, ...
- Figure: several map tasks (M) emit { (key, value) } pairs that are routed to reduce tasks (R)

Logic Flow of WordCount
- Input text: "Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner…"
- Mapper input (offset, line): (1, "Hadoop Map/Reduce is a"), (17, "software framework for"), (45, "easily writing applications"), …
- Mapper output: (Hadoop, 1), (Map, 1), (Reduce, 1), (is, 1), (a, 1), …
- Sort/Shuffle
- Reducer input: (Hadoop, [1, 1, 1, …, 1]), (Map, [1, 1, 1, …, 1]), (Reduce, [1, 1, 1, …, 1]), (is, [1, 1, 1, …, 1]), (a, [1, 1, 1, …, 1])
- Reducer output: (Hadoop, 5), (Map, 12), (Reduce, 12), (is, 42), (a, 23)

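The WordCount flow above can be sketched in a few lines of single-process Python. This is only an illustration of the map, sort/shuffle and reduce steps, not Hadoop's API; the function names (map_fn, reduce_fn, run_mapreduce) and the third input line are assumptions made for the example.

```python
import re
from collections import defaultdict


def map_fn(offset, line):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    # Split on non-letters so that "Map/Reduce" yields "Map" and "Reduce".
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", line)]


def reduce_fn(word, counts):
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(counts)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the parallel infrastructure."""
    # Map phase: apply map_fn to every input record.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    # Sort/shuffle phase: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reduce_fn call per distinct key, in sorted key order.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


records = [
    (1, "Hadoop Map/Reduce is a"),
    (17, "software framework for"),
    (45, "Hadoop is a framework"),  # assumed extra line so some counts exceed 1
]
counts = run_mapreduce(records, map_fn, reduce_fn)
```

In a real deployment the three phases run on different machines and the infrastructure handles partitioning and failures; the user still only writes the two small functions.
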
MapReduce Disadvantages
- Extremely rigid data flow
- Common operations must be coded by hand
  - join, filter, split, projection, aggregates, sorting, distinct
- User plans may be suboptimal and lead to performance degradation
- Semantics hidden inside map-reduce functions
  - Inflexible, difficult to maintain, extend and optimize
- Combination of high-level declarative querying and low-level programming with MapReduce
  - Dataflow programming languages
  - Hive, JAQL and Pig

PigLatin
- On top of map-reduce/Hadoop
- Mix of declarative style of SQL and procedural style of map-reduce
- Consists of two parts
  - PigLatin: A data processing language
  - Pig Infrastructure: An evaluator for PigLatin programs
- Pig compiles Pig Latin into physical plans
- Plans are to be executed over Hadoop
- 30% of all queries at Yahoo! in Pig-Latin
- Open-source, http://incubator.apache.org/pig

Example
- Task: Determine the most visited websites in each category.
- Input tables: URL Info, Visits

Implementation in MapReduce

Example Workflow in Pig-Latin

    visits = load '/data/visits' as (user, url, time);
    gVisits = group visits by url;
    visitCounts = foreach gVisits generate url, count(visits);
    urlInfo = load '/data/urlInfo' as (url, category, pRank);
    visitCounts = join visitCounts by url, urlInfo by url;
    gCategories = group visitCounts by category;
    topUrls = foreach gCategories generate top(visitCounts,10);
    store topUrls into '/data/topURLs';

- Workflow: load Visits; group by url; foreach url generate count; load URL Info; join on url; group by category; foreach category generate top10 URLs
- Operate directly over files.
- Schemas optional. Can be assigned dynamically.
- User-defined functions (UDFs) can be used in every construct: load, store, group, filter, foreach

Compilation in MapReduce
- Every group or join operation forms a map-reduce boundary
- Other operations pipelined into map and reduce phases
- Figure: the workflow split into three jobs
  - Map1: load Visits; boundary: group by url; Reduce1: foreach url generate count
  - Map2: load URL Info; boundary: join on url; Reduce2
  - Map3; boundary: group by category; Reduce3: foreach category generate top10 URLs

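To make the three map-reduce boundaries concrete, here is a small in-memory sketch that chains three jobs for the "top URLs per category" task. The helper map_reduce, the sample data and the record layouts are assumptions for illustration; this is not how Pig actually generates its plans, only the same job structure written by hand.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Minimal in-memory job: map, group by key, reduce in sorted key order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Assumed sample data: visits(user, url, time), url_info(url, category, pRank).
visits = [("amy", "cnn.com", 1), ("bob", "cnn.com", 2),
          ("amy", "bbc.co.uk", 3), ("eve", "espn.com", 4)]
url_info = [("cnn.com", "news", 0.9), ("bbc.co.uk", "news", 0.8),
            ("espn.com", "sports", 0.7)]

# Job 1: group by url, then count the visits per url.
visit_counts = map_reduce(
    visits,
    lambda v: [(v[1], 1)],
    lambda url, ones: [("count", url, sum(ones))],
)


# Job 2: reduce-side join on url; both inputs are tagged with their origin.
def join_reducer(url, tagged):
    counts = [payload for tag, payload in tagged if tag == "count"]
    categories = [payload for tag, payload in tagged if tag == "info"]
    return [(cat, url, n) for n in counts for cat in categories]


job2_input = visit_counts + [("info", url, cat) for url, cat, _rank in url_info]
joined = map_reduce(job2_input, lambda r: [(r[1], (r[0], r[2]))], join_reducer)


# Job 3: group by category and keep the k most visited URLs.
def top_reducer(k):
    def reducer(category, url_counts):
        best = sorted(url_counts, key=lambda uc: (-uc[1], uc[0]))[:k]
        return [(category, best)]
    return reducer


top_urls = map_reduce(joined, lambda r: [(r[0], (r[1], r[2]))], top_reducer(10))
```

Each call to map_reduce corresponds to one boundary in the figure; the per-record work (counting, tagging, top-k selection) is what gets pipelined into the map and reduce phases.
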
Hive
- Data warehouse infrastructure built on top of Hadoop, providing:
  - Data summarization
  - Ad hoc querying
- Simple query language: Hive QL (based on SQL)
- Extendable via custom mappers and reducers
- Subproject of Hadoop
- No "Hive format"
- http://hadoop.apache.org/hive/

Hive - Example

    LOAD DATA INPATH '/data/visits' INTO TABLE visits;
    INSERT OVERWRITE TABLE visitCounts
      SELECT url, category, count(*)
      FROM visits
      GROUP BY url, category;
    LOAD DATA INPATH '/data/urlInfo' INTO TABLE urlInfo;
    INSERT OVERWRITE TABLE visitCounts
      SELECT vc.*, ui.*
      FROM visitCounts vc JOIN urlInfo ui ON (vc.url = ui.url);
    INSERT OVERWRITE TABLE gCategories
      SELECT category, count(*)
      FROM visitCounts
      GROUP BY category;
    INSERT OVERWRITE TABLE topUrls
      SELECT TRANSFORM (visitCounts) USING 'top10';

JAQL
- Higher level query language for JSON documents
- Developed at IBM's Almaden research center
- Supports several operations known from SQL
  - Grouping, Joining, Sorting
- Built-in support for
  - Loops, Conditionals, Recursion
- Custom Java methods extend JAQL
- JAQL scripts are compiled to MapReduce jobs
- Various I/O
  - Local FS, HDFS, HBase, Custom I/O adapters
- http://www.jaql.org/

JAQL - Example

    registerFunction("top", "de.tuberlin.cs.dima.jaqlextensions.top10");
    $visits = hdfsRead("/data/visits");
    $visitCounts = $visits -> group by $url = $ into { $url, num: count($) };
    $urlInfo = hdfsRead("data/urlInfo");
    $visitCounts = join $visitCounts, $urlInfo where $visitCounts.url == $urlInfo.url;
    $gCategories = $visitCounts -> group by $category = $ into { $category, num: count($) };
    $topUrls = top10($gCategories);
    hdfsWrite("/data/topUrls", $topUrls);

ACID vs. BASE
- Traditional distributed data management: ACID
  - Strong consistency
  - Isolation
  - Focus on "commit"
  - Availability?
  - Pessimistic
  - Difficult evolution (e.g. schema)
- Web-scale data management: BASE (Basically Available, Soft-state, Eventually consistent)
  - Weak consistency
  - Availability first
  - Best effort
  - Optimistic (aggressive)
  - Fast and simple
  - Easier evolution

CAP Theorem [Brewer 2000]
- Consistency: all clients have the same view, even in case of updates
- Availability: all clients find a replica of data, even in the presence of failures
- Tolerance to network partitions: system properties hold even when the network (system) is partitioned
- You can have at most two of these properties for any shared-data system.

CAP Theorem
- No consistency guarantees -> updates with conflict resolution
- On a partition event, simply wait until data is consistent again -> pessimistic locking
- All nodes are in contact with each other or put everything in a single box -> 2 phase commit

CAP: Explanations
- Figure: process PA := update(o) sends message M to process PB := read(o) (steps 1 to 3)
- Network partitions -> M is not delivered
- Solutions?
  - Synchronous message: <PA, M> is atomic
    - Possible latency problems (availability)
  - Transaction <PA, M, PB>: requires to control when PB happens
    - Impacts partition tolerance or availability

Consistency Models [Vogels 2008]
- Figure: processes A, B and C access a distributed storage system; A performs update: D0 -> D1, the others perform read(D)
- Strong consistency: after the update completes, any subsequent access from A, B, C will return D1
- Weak consistency: does not guarantee that subsequent accesses will return D1 -> a number of conditions need to be met before D1 is returned
- Eventual consistency: special form of weak consistency
  - Guarantees that if no new updates are made, eventually all accesses will return D1

Variations of Eventual Consistency
- Causal consistency: If A notifies B about the update, B will read D1 (but not C!)
- Read-your-writes: A will always read D1 after its own update
- Session consistency: Read-your-writes inside a session
- Monotonic reads: If a process has seen Dk, any subsequent access will never return any Di with i < k
- Monotonic writes: guarantees to serialize the writes of the same process

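One common way to obtain read-your-writes and monotonic reads on top of lazily updated replicas is to let the client session remember the newest version it has written or seen. The sketch below assumes a single versioned object and hypothetical Replica and Session classes; it illustrates the guarantees and is not a description of any particular system.

```python
class Replica:
    """A replica holding one versioned value D_k (k is the version number)."""

    def __init__(self):
        self.version, self.value = 0, "D0"

    def apply(self, version, value):
        # Lazy propagation may deliver updates late; never go back in time.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Client session enforcing read-your-writes and monotonic reads."""

    def __init__(self):
        self.min_version = 0  # newest version this session wrote or has seen

    def write(self, replica, value):
        version = replica.version + 1
        replica.apply(version, value)
        self.min_version = max(self.min_version, version)
        return version

    def read(self, replicas):
        # Accept the first replica that is at least as fresh as the session.
        for replica in replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no sufficiently fresh replica reachable")


a, b = Replica(), Replica()
writer, other = Session(), Session()
writer.write(a, "D1")                # update D0 -> D1 lands on replica a only
own_read = writer.read([b, a])       # skips stale b: read-your-writes
stale_read = other.read([b, a])      # another session may still see D0
b.apply(1, "D1")                     # background propagation reaches b
later_read = other.read([b, a])      # eventually everyone sees D1
```

The trade-off is visible in the RuntimeError branch: when only stale replicas are reachable, the session has to wait or fail, which is the availability cost of the stronger guarantee.
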
Database Replication
- Store the same data on multiple nodes in order to improve reliability, accessibility, fault-tolerance
- Figure labels: Single master, Multimaster, Optimistic replication, relaxed consistency, 1-copy consistency
- Optimistic strategies = lazy replication
  - Allow replicas to diverge; require conflict resolution
  - Allow data to be accessed without a-priori synchronization
  - Updates are propagated in the background
  - Occasional conflicts are fixed after they happen
  - Improved availability, flexibility, scalability, but see CAP theorem

Optimistic Replication: Elements
1. operation submission
2. propagation
3. scheduling
4. conflict resolution
5. commitment
Y. Saito, M. Shapiro: Optimistic Replication, ACM Computing Surveys, 37(1):42-81, 2005

Conflict Resolution & Update Propagation
- Conflict handling
  - Prohibit: Single master
  - Ignore: Thomas write rule
  - Reduce: Dividing objects, ...
  - Detect & repair
    - Syntactic: Vector clocks
    - Semantic: App-specific ordering or preconditions
- Epidemic information dissemination
  - Updates pass through the system like infectious diseases
  - Pairwise communication: a site contacts others (randomly chosen) and sends its information, e.g. about updates
  - All sites process messages in the same way
  - Proactive behaviour: no failure recovery necessary!
  - Basic approaches: anti-entropy, rumor mongering, ...

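A rough sketch of anti-entropy combined with the Thomas write rule: every site repeatedly picks a random peer, both exchange their state, and per key the write with the newest timestamp wins. The data layout (key -> (timestamp, value)), the function names and the assumption of globally unique timestamps are illustrative choices, not part of the seminar material.

```python
import random


def merge(local, remote):
    """Thomas write rule: per key, keep the write with the newest timestamp."""
    for key, (ts, value) in remote.items():
        if key not in local or ts > local[key][0]:
            local[key] = (ts, value)


def anti_entropy_round(replicas, rng):
    """Every site contacts one randomly chosen peer and both exchange state."""
    for i, site in enumerate(replicas):
        others = [j for j in range(len(replicas)) if j != i]
        peer = replicas[rng.choice(others)]
        merge(peer, site)  # push local state to the peer
        merge(site, peer)  # pull the peer's state


def converged(replicas):
    return all(replica == replicas[0] for replica in replicas)


rng = random.Random(0)
replicas = [{} for _ in range(5)]
replicas[0]["x"] = (1, "a")
replicas[3]["x"] = (2, "b")  # later, conflicting write to the same key
replicas[4]["y"] = (1, "c")

rounds = 0
while not converged(replicas):
    anti_entropy_round(replicas, rng)
    rounds += 1
```

All sites run the same code and no site plays a special role, so a lost exchange is simply repaired by a later one. The older write to x is silently dropped, which is the "ignore" strategy; detecting such conflicts would require something like vector clocks instead of plain timestamps.
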
The Notion of QoS and Predictability
- Service Level Agreement
  - Common understanding about services, guarantees, responsibilities
  - Legal part: fees, penalties, ...
  - Technical part: Service Level Objectives
    - Specific measurable characteristics; e.g. importance, performance goals
    - Deadline constraints
    - Percentile constraints
- Layers involved: Application Server / middleware, DBMS, OS / Hardware

Techniques for QoS in Data Management
- Provide sufficient resources
  - Capacity planning: "How many boxes for customer X?"
  - Cost vs. performance tradeoff
- Shielding
  - Dedicated (virtual) system for customers
  - Scalability? Cost efficiency?
- Scheduling
  - Ordering requests on priority
  - At which level?

Workload Management
- Purpose:
  - Achieve performance goals for classes of requests (queries, transactions)
  - Resource provisioning
- Aspects:
  - Specification of service-level objectives
  - Workload classification and modeling
  - Admission control & scheduling
    - Static prioritization: DB2 Query Patroller, Oracle Resource Manager, ...
    - Goal-oriented approaches
    - Economic approaches
    - Utility-based approaches

Workload Characteristics
- Functional
  - I/O requirements (volume, bandwidth)
  - CPU
  - Degree of parallelism
  - Response times?
  - Throughput?
  - …
- Non-Functional
  - Availability
  - Reliability
  - Durability
  - Scalability
  - …

WLM: Model
- Figure: transactions pass through workload classification into classes, then through admission control & scheduling (bounded by the MPL) to execution; result and response time are observed
- Admission control: limit the number of simultaneously executing requests (multiprogramming level = MPL)
- Scheduling: ordering requests by priority

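Admission control and priority scheduling from the model above can be sketched with a bounded running set and a priority queue. The class AdmissionController, its method names and the convention that a lower number means higher priority are assumptions for this example.

```python
import heapq
import itertools


class AdmissionController:
    """Admit at most `mpl` requests at once; queue the rest by priority."""

    def __init__(self, mpl):
        self.mpl = mpl                  # multiprogramming level
        self.running = set()
        self._queue = []                # heap of (priority, arrival, request_id)
        self._arrival = itertools.count()

    def submit(self, request_id, priority):
        """Enqueue a request (lower number = higher priority); return admissions."""
        heapq.heappush(self._queue, (priority, next(self._arrival), request_id))
        return self._admit()

    def finish(self, request_id):
        """Mark a request as done and admit waiting requests into the free slot."""
        self.running.discard(request_id)
        return self._admit()

    def _admit(self):
        admitted = []
        while self._queue and len(self.running) < self.mpl:
            _priority, _arrival, request_id = heapq.heappop(self._queue)
            self.running.add(request_id)
            admitted.append(request_id)
        return admitted


ac = AdmissionController(mpl=2)
first = ac.submit("q1", priority=5)     # admitted immediately
second = ac.submit("q2", priority=5)    # admitted immediately
third = ac.submit("q3", priority=5)     # MPL reached: waits
urgent = ac.submit("q4", priority=1)    # waits, but ahead of q3
after_q1 = ac.finish("q1")              # frees a slot for the urgent request
```

The arrival counter breaks ties so that requests of equal priority are served first-come, first-served; goal-oriented or utility-based schemes would replace the static priority with a value computed at run time.
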
Utility Functions
- Utility function = preference specification
  - Map possible system states (e.g. resource provisioning to jobs) to a real scalar value
  - Represents performance feature (response time, throughput, ...) and/or economic value
- Goal: determine the most valuable feasible state, i.e. maximize utility
  - Explore space of alternative mappings (search problem)
  - Runtime monitoring and control
- Figure: utility plotted over response time
- Kephart, Das: Achieving self-management via utility functions. IEEE Internet Computing 2007

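As a toy illustration of the search problem, the sketch below enumerates CPU splits between workload classes and keeps the split with the highest total utility. The performance model (response time = work / CPU share), the piecewise-linear utility shape and all numbers are assumptions chosen for the example; real systems would plug in measured or learned models.

```python
from itertools import product


def response_time(work, cpu_share):
    """Hypothetical performance model: time shrinks linearly with the CPU share."""
    return work / cpu_share


def utility(rt, goal):
    """1.0 while the response-time goal is met, falling linearly to 0 at 2x goal."""
    if rt <= goal:
        return 1.0
    return max(0.0, 2.0 - rt / goal)


def best_allocation(classes, step=10):
    """Enumerate CPU splits in `step` percent units; return (utility, split)."""
    best = None
    for split in product(range(step, 100, step), repeat=len(classes)):
        if sum(split) != 100:
            continue  # only feasible states: the whole CPU is handed out
        total = sum(
            c["weight"] * utility(response_time(c["work"], s / 100), c["goal"])
            for c, s in zip(classes, split)
        )
        if best is None or total > best[0]:
            best = (total, split)
    return best


classes = [
    {"work": 2.0, "goal": 4.0, "weight": 1.0},   # needs 50% CPU to meet its goal
    {"work": 8.0, "goal": 10.0, "weight": 1.0},  # needs 80% CPU to meet its goal
]
total_utility, split = best_allocation(classes)
```

Both goals cannot be met at once here, so the search settles on a compromise in which both classes miss their goals slightly rather than one class being starved. Exhaustive enumeration only works for tiny state spaces, which is why the slide frames this as a search problem with runtime monitoring and control.
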
Workload Modeling & Prediction
- Goal: predict resource requirements for a given workload, i.e., find correlation between query features and performance features
- Approaches: regression, correlation analysis, Kernel Canonical Correlation Analysis (KCCA)
- Figure: query plans/job descriptions yield a job feature matrix and performance statistics yield a performance feature matrix; KCCA maps them to a query plan projection and a performance projection
- Prediction:
  - Calculate job coordinates in query plan projection based on job feature vector
  - Infer job's coordinates on the performance projection
- Ganapathi et al.: Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. ICDE 2009

Database as a Service - Tutorial @ICDE 2010

  • 1.
    Database as aServiceSeminar, ICDE 2010, Long Beach, March 04Wolfgang Lehner | Dresden University of Technology, Germany Kai-Uwe Sattler | Ilmenau University of Technology, Germany 1
  • 2.
  • 3.
    Software as aService (SaaS)Traditional SoftwareOn-DemandUtilityPlug In, SubscribePay-per-UseBuild Your Own 3
  • 4.
  • 5.
    Avoidhiddencostof traditional SWTraditionalSoftwareSaaSSW LicensesSubscription FeeTrainingTrainingCustomizationHardwareIT StaffMaintenanceCustomization5
  • 6.
    The Long TailDozensof markets of millions or millions of markets of dozens?Your Large Customers$ / CustomerWhat if you lower your cost of sale (i.e. lower barrier to entry) and you also lower cost of operationsYour Typical CustomersNew addressable market >> current market(Currently) “non addressable” Customers# of Customers6
  • 7.
    Acquisition ModelServiceBusiness ModelPayfor usageAccess ModelInternetTechnical ModelScalable, elastic, shareableEC2 & S3"All that matters is results — I don't care how it is done"Cloud Computing:A style of computing where massively scalable, IT-enabled capabilities are provided "as a service" across the Internet to multiple external customers."I don't want to own assets — I wantto pay for elastic usage, like a utility""I want accessibility from anywhere from any device""It's about economies of scale, with effective and dynamic sharing"What is Cloud? – Gartner’s Definition7
  • 8.
    To Qualify asa CloudCommon, Location-independent, Online Utility on Demand*Common implies multi-tenancy, not single or isolated tenancy Utility implies pay-for-use pricingonDemandimplies ~infinite, ~immediate, ~invisible scalability Alternatively, a “Zero-One-Infinity” definition:**0On-premise infrastructure, acquisition cost, adoption cost, support cost1Coherent and resilient environment – not a brittle “software stack”Scalability in response to changing need, Integratability/ Interoperability with legacy assets and other services Customizability/Programmability from data, through logic, up into the user interface without compromising robust multi-tenancy * Joe Weinman, Vice President of Solutions Sales, AT&T, 3 Nov. 2008** From The Jargon File: “Allow none of foo, one of foo, or any number of foo”8
  • 9.
    Cloud Differentials: ServiceModels9Cloud Software as a Service (SaaS)Use provider’s applications over a network Cloud Platform as a Service (PaaS)Deploy customer-created applications to a cloud Cloud Infrastructure as a Service (IaaS)Rent processing, storage, network capacity, and other fundamental computing resources
  • 10.
    Cloud Differentials: Characteristics10PlatformPhysical– VirtualHomogenous – HeterogeneousDesign ParadigmsStorageCPUBandwidthUsage ModelExclusiveSharedPseudo-SharedSize/LocationLarge Scale(AWS, Google, BM/Google), Small Scale(SMB, Academia)PurposeGeneral PurposeSpecial Purpose (e.g., DB-Cloud)Administration/JurisdictionPublicPrivate
  • 11.
    UseCases: Large-Scale DataAnalyticsOutsourceyourdata and usecloudresourcesforanalysisHistorical and mostlynon-criticaldataParallelizable, read-mostlyworkload, high variantworkloadsRelaxed ACID guaranteesExamples (HadoopPoweredBy):Yahoo!: researchfor ad systems and Web searchFacebook: reporting and analyticsNetseer.com: crawling and log analysisJourney Dynamics: trafficspeedforecasting11
  • 12.
    UseCases: Database HostingPublicdatasetsBiologicaldatabases: a singlerepositoryinstead of > 700 separate databasesSemantic Web Data, Linkeddata, ...Sloan Digital Sky SurveyTwitterCacheAlready on Amazon AWS: annotated human genomedata, US census, Freebase, ...Archiving, Metadata Indexing, ...12
  • 13.
    UseCases: Service HostingDatamanagementforSaaSsolutionsRun theservicesnearthedata= ASPAlreadymanyexistingapplicationsCRM, e.g. Salesforce, SugarCRMWeb AnalyticsSupply Chain ManagementHelpDesk ManagementEnterprise ResourcePlanning, e.g. SAP Business ByDesign...13
  • 14.
  • 15.
    Topics covered inthis SeminarQuery & Programming ModelLogical Data ModelVirtuali-zationMulti-TenancyService Level AgreementsStorage ModelDistributedStorageReplicationSecurity15
  • 16.
    Current Solutionsuserperspectiveone DBfor all clientsone DB per clientVirtualizationReplication16DistributedStoragephysicalperspective
  • 17.
  • 18.
    VirtualizationSeparating the abstractview of computing resources from the implementation of these resourcesaddsflexibility and agility to the computing infrastructuresoften problems related to provisioning, manageability, …lowers TCO: fewercomputingresourcesClassicaldrivingfactor: serverconsolidation18E-mail serverWeb serverDatabase serverE-mail serverDatabase serverLinuxLinuxLinuxLinuxLinuxEDBT2008 Tutorial (Aboulnaga e.a.)Web serverLinuxVirtualizationConsolidate Improved utilization using consolidation
  • 19.
  • 20.
    Different TypesofVirtualization20APP 1APP4APP 2APP 3APP 5OPERATING SYSTEMOPERATING SYSTEMVIRTUAL MACHINE 1VIRTUAL MACHINE 2CPUCPUCPUMEMMEMNETVIRTUAL MACHINE MONITOR (VMM)PHYSICAL STORAGEPHYSICAL MACHINECPUMEMNETCPUCPU
  • 21.
    Virtual Machines21Technique withlong history (since the 1960's)Prominent since IBM 370 mainframeseriesTodaylarge scalecommodity hardware and operating systemsVirtual Machine Monitor (Hypervisor)strong isolation between virtual machines (security, privacy, fault tolerance)flexible mapping between virtual machines and physical resourcesclassical operationspause, resume, checkpoint, migrate (admin / load balancing)Software deploymentPreconfigured virtual appliancesRepositories of virtual appliances on the web
  • 22.
    DBMS on topof Virtual Machines... yetanotherapplication?... Overhead?SQL Server withinVMware22
  • 23.
    Virtualization Design AdvisorWhatfraction of node resources goes to what DBMS?Configuring VM parametersWhat parameter settings are best for a given resource configurationConfiguringthe DBMS parametersExampleWorkload 1: TPC-H (10GByte)Workload 2: TPC-H (10GByte) only Q18 (132 copies)Virtualization design advisor20% of CPU to Workload 180% of CPU to Workload 223
  • 24.
    Some ExperimentsWorkload Definitionbased on TPC-HQ18 isoneofthemost CPU intensive queriesQ21 isoneofthe least CPU intensive queriesWorkload UnitsC: 25x Q18I: 1x Q21Experiment: Sensitivity to workloadResource NeedsW1 = 5C + 5IW2 = kC + (10-k)I (increaseof k -> more CPU intensive)PostgresDB224
  • 25.
    Some Experiments (2)WorkloadSettingsW3 = 1CW4 = kCWorkload SettingsW5 = 1CW6 = kI25
  • 26.
    Virtualization in DBaaSenvironmentsDB LayerDB ServerDB ServerDB ServerDBDBDBDBDBInstance LayerInstanceInstanceInstanceInstanceInstanceInstanceDB Server LayerVMVMVMVMVMVMVM LayerHW Layer26
  • 27.
    Existing Tools forNode VirtualizationDB ServerDB LayerDBDBDBDBDBDB Ad2visorIndexes
  • 28.
  • 29.
  • 30.
    Redistribution of TablesDBWorkload ManagerInstance LayerInstanceInstanceDB Server LayerStatic Environment Assumptions Advisor expects static hardware environment
  • 31.
    VM expectsstatic (peak) resource requirements
  • 32.
    Interactions betweenlayers can improve performance/utilizationNodeRessource ModelVMVMVMVM LayerVM ConfigurationMonitoring
  • 33.
  • 34.
  • 35.
    Layer Interactions (2)ExperimentDB2on LinuxTPC-H workload on 1GB databaseRanges for resource grantsMain memory (BP) – 50 MB to 1GBAdditional storage (Indexes) – 5% to 30% DB sizeVarying advisor output (17-26 indexes)Different possible improvementDifferent expected Performance after improvementDB AdvisorExpected PerformancePossible ImprovementIndex StorageIndex Storage35%90%25%25%20%20%15%15%<1%<3%10%10%VM Configuration5%5%200MB400MB600MB800MB1GB200MB400MB600MB800MB1GBBPBP28
  • 36.
    Storage VirtualizationGeneral Goalprovidea layerofindircetiontoallowthedefinitionofvirtualstoragedevicesminimize/avoiddowntime (local and remote mirroring)improveperformance (distribution/balancing – provisioning - controlplacement)reducecostofstorageadministrationOperationscreate, destroy, grow, shrinkvirtualdeviceschangesize, performance, reliability, ...workloadfluctuationshierarchicalstoragemanagementversioning, snapshots, point-in-time copiesbackup, checkpointsexploit CPU and memory in the storage systemcachingexecutelow-level DBMS functions29
  • 37.
    Virtualization in DBaaSEnvironments (2)DB LayerDB ServerDB ServerDB ServerDBDBDBDBDBInstance LayerInstanceInstanceInstanceInstanceInstanceInstanceDB Server LayerVMVMVMVMVMVMVM LayerShared DiskHW LayerStorage Layer30Local Disk
  • 38.
    Virtualization in DBaaSEnvironments (2)DB LayerDBDBDBDBDBDB ServerInstance LayerInstanceInstanceDB Server LayerVMVMVMVM LayerHW LayerStorage Layer31DB AdvisorIndexes
  • 39.
  • 40.
  • 41.
    Redistribution of TablesDBWorkload ManagerStorageRessource ModelStorage ConfigurationDevice Bundling
  • 42.
  • 43.
  • 44.
    Onewaytogo? ParavirtualizationCPU andMemory Paravirtualizationextendstheguest to allow direct interaction withtheunderlyinghypervisorreducesthemonitorcostincludingmemoryand System calloperations.gainsfromparavirtualizationareworkloadspecificDevice Paravirtualizationplaces a highperformancevirtualization-aware device driver into the guestparavirtualizeddriversaremoreCPU efficient (less CPU overhead forvirtualization)Paravirtualizeddriverscanalso take advantage of HW features, like partial offload
  • 45.
    OutlineQuery & ProgrammingModelLogical Data ModelVirtuali-zationMulti-TenancyService Level AgreementsStorage ModelDistributedStorageReplicationSecurity33
  • 46.
    Multi TenancyGoal: consolidatemultiple customersontothesame operational systembest resourceutilizationflexible,butlimitedscalabilityseparate DBper tenantshared DBsharedschemashared DBseparate schemaRequirements:
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
    Flexible Schema ApproachesGoal:allowtenant-specificschemaadditions (columns)Universal TableExtension TablePivotTable35
  • 52.
    Flexible Schema Approaches:ComparisonBest performanceFlexible schemaevolutionPivottableExtension tableChunkfoldingPrivate tablesApplicationownstheschemaDatabase ownstheschemaUniversal tableXML columnsUniversal table: requirestechniquesforhandlingsparsedataFine-grainedindexsupportnotpossiblePivottable:RequiresjoinsforreconstructinglogicaltuplesChunkfolding: similar to pivottablesGroup of columnsarecombined in a chunk and mappedinto a chunktableRequirescomplexquerytransformation36
  • 53.
    Access Control in Multi-Tenant DB
    Shared DB approaches require row-level access control
    Query transformation: ... where TenantID = 42 ...; potential security risks
    DBMS-level control, e.g. IBM DB2 LBAC (label-based access control): controls read/write access to individual rows and columns; security labels with policies; requires a separate account for each tenant
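    The query-transformation approach can be sketched as follows. This is a minimal illustration, not a hardened implementation: the table `account`, the allow-list `ALLOWED`, and both function names are assumptions introduced here, and SQLite merely stands in for the shared database.

```python
import sqlite3

# Hypothetical shared table and the columns tenants may select from it.
ALLOWED = {"account": {"id", "name"}}

def select_for_tenant(conn, tenant_id, table, columns):
    """Run a SELECT on a shared table, always appending the tenant predicate."""
    if table not in ALLOWED or not set(columns) <= ALLOWED[table]:
        raise ValueError("unknown table or column")
    # Identifiers come from the allow-list; the tenant id is a bound parameter.
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE TenantID = ?"
    return conn.execute(sql, (tenant_id,)).fetchall()

def demo_connection():
    """Build a small in-memory shared table holding rows of two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (TenantID INTEGER, id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(42, 1, "Acme"), (42, 2, "Globex"), (17, 1, "Initech")],
    )
    return conn
```

    The security risk named on the slide is visible here: isolation depends entirely on every query passing through the rewriting layer, whereas DBMS-level control enforces it inside the database.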
  • 54.
    In a Nutshell
    How shall virtualization be handled on: machine level (VM to HW); DBMS level (database to instance to database server); schema level (multi tenancy)
    ... using …: allocation between layers; configuration inside layers; flexible schemas
    … when …: characteristics of the workloads are known; virtual machines are transparent; tenant-specific schema extensions
    … demanding that …: SLAs and security are respected; each node's utilization is maximized; number of nodes is minimized
  • 55.
    Outline: Query & Programming Model; Logical Data Model; Virtualization; Multi-Tenancy; Service Level Agreements; Storage Model; Distributed Storage; Replication; Security
  • 56.
    MapReduce Background
    Programming model and an associated implementation for large-scale data processing
    Google and related approaches: Apache Hadoop and Microsoft Dryad
    User-defined map & reduce functions
    Infrastructure: hides details of parallelization; provides fault-tolerance, data distribution, I/O scheduling, load balancing, ...
      map (in_key, in_value) -> (out_key, intermediate_value) list
      reduce (out_key, intermediate_value list) -> out_value list
    [Figure: { (key, value) } pairs flowing through map (M) and reduce (R) tasks]
  • 57.
    Logic Flow of WordCount
    Input text: "Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner…"
    Mapper input (offset, line): (1, "Hadoop Map/Reduce is a"), (17, "software framework for"), (45, "easily writing applications"), …
    Mapper output: (Hadoop, 1), (Map, 1), (Reduce, 1), (is, 1), (a, 1), …
    Sort/Shuffle: Hadoop [1, 1, 1, …, 1]; Map [1, 1, 1, …, 1]; Reduce [1, 1, 1, …, 1]; is [1, 1, 1, …, 1]; a [1, 1, 1, …, 1]
    Reducer output: (Hadoop, 5), (Map, 12), (Reduce, 12), (is, 42), (a, 23)
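    The WordCount flow above (map, sort/shuffle, reduce) can be imitated in a single process. This is only a sketch of the programming model: the function names are chosen here for illustration, nothing is distributed, and splitting on whitespace is a simplifying assumption about tokenization.

```python
from collections import defaultdict

def map_fn(offset, line):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group all intermediate values by key, as the sort/shuffle phase does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(word, counts):
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(counts)]

def word_count(lines):
    """Run map over every line, shuffle, then reduce each key."""
    pairs = [pair
             for offset, line in enumerate(lines)
             for pair in map_fn(offset, line)]
    return {word: reduce_fn(word, counts)[0]
            for word, counts in sorted(shuffle(pairs).items())}
```

    In a real deployment the infrastructure, not user code, performs the shuffle and runs many map and reduce tasks in parallel; only `map_fn` and `reduce_fn` correspond to what the user writes.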
  • 58.
    MapReduce Disadvantages
    Extremely rigid data flow
    Common operations must be coded by hand: join, filter, split, projection, aggregates, sorting, distinct
    User plans may be suboptimal and lead to performance degradation
    Semantics hidden inside map-reduce functions: inflexible, difficult to maintain, extend and optimize
    Way out: combination of high-level declarative querying and low-level programming with MapReduce
    Dataflow programming languages: Hive, JAQL and Pig
  • 59.
    Pig Latin
    On top of map-reduce / Hadoop
    Mix of declarative style of SQL and procedural style of map-reduce
    Consists of two parts: Pig Latin, a data processing language; Pig infrastructure, an evaluator for Pig Latin programs
    Pig compiles Pig Latin into physical plans; plans are to be executed over Hadoop
    30% of all queries at Yahoo! in Pig Latin
    Open source, http://incubator.apache.org/pig
  • 60.
    Example
    Task: determine the most visited websites in each category.
    [Tables: URL Info, Visits]
  • 61.
  • 62.
    Example Workflow in Pig Latin
      visits = load '/data/visits' as (user, url, time);
      gVisits = group visits by url;
      visitCounts = foreach gVisits generate url, count(visits);
      urlInfo = load '/data/urlInfo' as (url, category, pRank);
      visitCounts = join visitCounts by url, urlInfo by url;
      gCategories = group visitCounts by category;
      topUrls = foreach gCategories generate top(visitCounts,10);
      store topUrls into '/data/topURLs';
    Operates directly over files.
    Schemas optional; can be assigned dynamically.
    User-defined functions (UDFs) can be used in every construct: load, store, group, filter, foreach.
    [Figure, dataflow: load Visits; group by url; foreach url generate count; load URL Info; join on url; group by category; foreach category generate top10 URLs]
  • 64.
    Compilation in MapReduce
    Every group or join operation forms a map-reduce boundary
    Other operations are pipelined into map and reduce phases
    [Figure labels, in order: load URL Info, load Visits, Map1, Map2, group by url, Reduce1, foreach url generate count, join on url, Reduce2, Map3, group by category, Reduce3, foreach category generate top10 URLs]
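    The boundary rule can be illustrated with a deliberately simplified sketch. It assumes a linear list of operator strings (a real Pig plan is a graph with two load branches feeding the join), and `split_into_jobs` is a hypothetical helper, not part of Pig.

```python
BOUNDARY_OPS = {"group", "join"}  # operators that force a map-reduce boundary

def split_into_jobs(ops):
    """Cut a linear operator list into jobs of (map_ops, boundary_op, reduce_ops)."""
    jobs = []
    map_ops, boundary, reduce_ops = [], None, []
    for op in ops:
        kind = op.split()[0]
        if kind in BOUNDARY_OPS:
            if boundary is not None:
                # A second boundary closes the current job and opens a new one.
                jobs.append((map_ops, boundary, reduce_ops))
                map_ops, reduce_ops = [], []
            boundary = op
        elif boundary is None:
            map_ops.append(op)      # pipelined into the map phase
        else:
            reduce_ops.append(op)   # pipelined into the reduce phase
    if boundary is not None or map_ops:
        jobs.append((map_ops, boundary, reduce_ops))
    return jobs

EXAMPLE_OPS = [
    "load visits",
    "group by url",
    "foreach url generate count",
    "join on url",
    "group by category",
    "foreach category generate top10",
]
```

    Applied to `EXAMPLE_OPS`, the sketch yields three jobs, one per group or join operator, which matches the three map/reduce pairs labeled on the slide.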
  • 65.
    Hive
    Data warehouse infrastructure built on top of Hadoop, providing: data summarization; ad hoc querying
    Simple query language: Hive QL (based on SQL)
    Extendable via custom mappers and reducers
    Subproject of Hadoop
    No "Hive format"
    http://hadoop.apache.org/hive/
  • 66.
    Hive - Example
      LOAD DATA INPATH '/data/visits' INTO TABLE visits;
      INSERT OVERWRITE TABLE visitCounts
        SELECT url, category, count(*)
        FROM visits
        GROUP BY url, category;
      LOAD DATA INPATH '/data/urlInfo' INTO TABLE urlInfo;
      INSERT OVERWRITE TABLE visitCounts
        SELECT vc.*, ui.*
        FROM visitCounts vc JOIN urlInfo ui ON (vc.url = ui.url);
      INSERT OVERWRITE TABLE gCategories
        SELECT category, count(*)
        FROM visitCounts
        GROUP BY category;
      INSERT OVERWRITE TABLE topUrls
        SELECT TRANSFORM (visitCounts) USING 'top10';
  • 67.
    JAQL
    Higher-level query language for JSON documents
    Developed at IBM's Almaden Research Center
    Supports several operations known from SQL: grouping, joining, sorting
    Built-in support for loops, conditionals, recursion
    Custom Java methods extend JAQL
    JAQL scripts are compiled to MapReduce jobs
    Various I/O: local FS, HDFS, HBase, custom I/O adapters
    http://www.jaql.org/
  • 68.
    JAQL - Example
      registerFunction("top", "de.tuberlin.cs.dima.jaqlextensions.top10");
      $visits = hdfsRead("/data/visits");
      $visitCounts = $visits -> group by $url = $ into { $url, num: count($) };
      $urlInfo = hdfsRead("data/urlInfo");
      $visitCounts = join $visitCounts, $urlInfo where $visitCounts.url == $urlInfo.url;
      $gCategories = $visitCounts -> group by $category = $ into { $category, num: count($) };
      $topUrls = top10($gCategories);
      hdfsWrite("/data/topUrls", $topUrls);
  • 69.
    Outline: Query & Programming Model; Logical Data Model; Virtualization; Multi-Tenancy; Service Level Agreements; Storage Model; Distributed Storage; Replication; Security
  • 70.
    ACID vs. BASE
    Traditional distributed data management (ACID): strong consistency; isolation; focus on "commit"; availability?; pessimistic; difficult evolution (e.g. schema)
    Web-scale data management (BASE: Basically Available, Soft-state, Eventually consistent): weak consistency; availability first; best effort; optimistic (aggressive); fast and simple; easier evolution
  • 71.
    CAP Theorem [Brewer2000]
    Consistency: all clients have the same view, even in case of updates
    Availability: all clients find a replica of data, even in the presence of failures
    Tolerance to network partitions: system properties hold even when the network (system) is partitioned
    You can have at most two of these properties for any shared-data system.
  • 72.
    CAP Theorem
    No consistency guarantees ➟ updates with conflict resolution
    On a partition event, simply wait until data is consistent again ➟ pessimistic locking
    All nodes are in contact with each other, or put everything in a single box ➟ 2-phase commit
  • 73.
    CAP: Explanations
    PA := update(o); PB := read(o)
    [Figure: steps 1., 2., 3. between the two processes, with a message M]
    Network partitions ➫ M is not delivered
    Solutions?
    Synchronous message: <PA, M> is atomic; possible latency problems (availability)
    Transaction <PA, M, PB>: requires controlling when PB happens; impacts partition tolerance or availability
  • 74.
    Consistency Models [Vogels2008]
    [Figure: processes A, B, C access a distributed storage system; update: D0 -> D1; read(D)]
    Strong consistency: after the update completes, any subsequent access from A, B, C will return D1
    Weak consistency: does not guarantee that subsequent accesses will return D1 -> a number of conditions need to be met before D1 is returned
    Eventual consistency: special form of weak consistency; guarantees that if no new updates are made, eventually all accesses will return D1
  • 75.
    Variations of Eventual Consistency
    Causal consistency: if A notifies B about the update, B will read D1 (but not C!)
    Read-your-writes: A will always read D1 after its own update
    Session consistency: read-your-writes inside a session
    Monotonic reads: if a process has seen Dk, any subsequent access will never return any Di with i < k
    Monotonic writes: guarantees to serialize the writes of the same process
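    Of the variations above, monotonic reads is easy to sketch on the client side. The sketch assumes each replica exposes its state as a (version, value) pair, with version k standing for Dk; the class name and the choice to raise an error for a stale replica are illustrative assumptions, since a real client might instead retry on another replica.

```python
class MonotonicReader:
    """Client-side guard: once Dk has been seen, never accept Di with i < k."""

    def __init__(self):
        self.last_seen = -1  # highest version this process has observed so far

    def read(self, replica_state):
        """Return the value if the replica is not older than what was seen."""
        version, value = replica_state
        if version < self.last_seen:
            raise LookupError("replica is stale for this reader")
        self.last_seen = version
        return value
```

    Session consistency can be built the same way by keeping such a high-water mark per session rather than per process.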
  • 76.
    Database Replication
    Store the same data on multiple nodes in order to improve reliability, accessibility, fault-tolerance
    [Figure labels: single master; multi-master; optimistic replication; relaxed consistency; 1-copy consistency]
    Optimistic strategies = lazy replication
  • 77.
    Allows replicas to diverge; requires conflict resolution
  • 78.
  • 79.
  • 80.
  • 81.
  • 82.
    Optimistic Replication: Elements
    1. operation submission; 2. propagation; 3. scheduling; 4. conflict resolution; 5. commitment
    [Figure: operations 1 and 2 submitted at different replicas, propagated, and merged as 1+2 at every replica]
    Y. Saito, M. Shapiro: Optimistic Replication, ACM Computing Surveys, 37(1):42-81, 2005
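    Conflict resolution (step 4) presupposes that concurrent operations can be detected. One common technique for this is a vector clock; the sketch below is an illustration chosen here, not the survey's algorithm, and it assumes clocks are dicts mapping a site name to a counter.

```python
def compare(a, b):
    """Order two vector clocks: 'equal', 'before', 'after' or 'concurrent'."""
    sites = set(a) | set(b)
    a_le_b = all(a.get(s, 0) <= b.get(s, 0) for s in sites)
    b_le_a = all(b.get(s, 0) <= a.get(s, 0) for s in sites)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # a happened before b: b can safely overwrite a
    if b_le_a:
        return "after"
    return "concurrent"   # neither dominates: a conflict to be resolved

def merge(a, b):
    """Clock of the resolved version: the element-wise maximum."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}
```

    Only the "concurrent" outcome needs a resolution policy; the other outcomes can be scheduled without application knowledge.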
  • 83.
    Conflict Resolution & Update Propagation
    Conflict handling: prohibit (single master); ignore (Thomas write rule); reduce (dividing objects, ...); detect & repair, either syntactic (vector clocks) or semantic (app-specific ordering or preconditions)
    Epidemic information dissemination
  • 84.
  • 85.
    Pairwise communication: a site contacts others (randomly chosen) and sends its information, e.g. about updates
  • 86.
  • 87.
  • 88.
    Basic approaches: anti-entropy, rumor mongering, ...
    Outline: Query & Programming Model; Logical Data Model; Virtualization; Multi-Tenancy; Service Level Agreements; Storage Model; Distributed Storage; Replication; Security
  • 89.
    The Notion of QoS and Predictability
    Service Level Agreement: legal part; technical part
    Service Level Objectives: specific measurable characteristics; e.g. importance, performance goals
  • 90.
  • 91.
  • 92.
    fees, penalties, ...
    Common understanding about services, guarantees, responsibilities
    [Figure: application server / middleware; DBMS; OS / hardware]
  • 93.
    Techniques for QoS in Data Management
    Provide sufficient resources: capacity planning ("How many boxes for customer X?"); cost vs. performance tradeoff
    Shielding: dedicated (virtual) system for customers; scalability? cost efficiency?
    Scheduling: ordering requests on priority; at which level?
  • 94.
    Workload Management
    Purpose: achieve performance goals for classes of requests (queries, transactions); resource provisioning
    Aspects: specification of service-level objectives; workload classification and modeling; admission control & scheduling
    Admission control & scheduling approaches: static prioritization (DB2 Query Patroller, Oracle Resource Manager, ...); goal-oriented approaches; economic approaches; utility-based approaches
  • 95.
    Workload Characteristics
    Functional: I/O requirements (volume, bandwidth); CPU; degree of parallelism; response times?; throughput?; …
    Non-functional: availability; reliability; durability; scalability; …
  • 96.
    WLM: Model
    [Figure: transaction -> workload classification (classes) -> admission control & scheduling (MPL) -> result; measured quantity: response time]
    Admission control: limit the number of simultaneously executing requests (multiprogramming level = MPL)
    Scheduling: ordering requests by priority
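    Admission control and scheduling as described above can be combined in a few lines. The sketch assumes that a larger number means higher priority and that requests are identified by hashable names; the class `WorkloadManager` and its methods are hypothetical and do not model any particular DBMS feature.

```python
import heapq
import itertools

class WorkloadManager:
    """Admit at most `mpl` requests at once; queue the rest by priority."""

    def __init__(self, mpl):
        self.mpl = mpl                 # multiprogramming level
        self.running = set()
        self._queue = []               # heap of (-priority, arrival_no, request)
        self._arrivals = itertools.count()

    def submit(self, request, priority):
        """Enqueue a request; it starts at once if a slot is free."""
        heapq.heappush(self._queue, (-priority, next(self._arrivals), request))
        self._admit()

    def finish(self, request):
        """Mark a request as done and hand its slot to the next in line."""
        self.running.discard(request)
        self._admit()

    def _admit(self):
        # Scheduling: highest priority first, FIFO among equal priorities.
        while self._queue and len(self.running) < self.mpl:
            _, _, request = heapq.heappop(self._queue)
            self.running.add(request)
```

    With an MPL of 2, a third and fourth request wait in the queue; when a slot frees up, the waiting request with the highest priority is admitted regardless of arrival order.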
  • 97.
    Utility Functions
    Utility function = preference specification: maps possible system states (e.g. resource provisioning to jobs) to a real scalar value
    Represents performance feature (response time, throughput, ...) and/or economic value
    Goal: determine the most valuable feasible state, i.e. maximize utility
  • 98.
    Explore space of alternative mappings (search problem)
  • 99.
    Runtime monitoring and control
    [Figure: utility plotted over response time]
    Kephart, Das: Achieving self-management via utility functions. IEEE Internet Computing 2007
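    The search for the most valuable feasible state can be made concrete with a toy model. Everything specific here is an assumption made for illustration: response time is modeled as work divided by allocated CPUs, utility is 1.0 while the response-time goal is met and falls linearly to 0.0 at twice the goal, and the search is brute force over all allocations.

```python
from itertools import product

def response_time(work, cpus):
    """Toy performance model: work units divided by allocated CPUs."""
    return work / cpus

def utility(rt, goal):
    """1.0 while the goal is met, falling linearly to 0.0 at twice the goal."""
    return max(0.0, min(1.0, 2.0 - rt / goal))

def best_allocation(jobs, total_cpus):
    """jobs: list of (work, goal). Return (allocation, utility) or None."""
    best = None
    for alloc in product(range(1, total_cpus + 1), repeat=len(jobs)):
        if sum(alloc) > total_cpus:
            continue  # infeasible state: more CPUs than available
        total = sum(utility(response_time(work, cpus), goal)
                    for (work, goal), cpus in zip(jobs, alloc))
        if best is None or total > best[1]:
            best = (alloc, total)
    return best
```

    For two jobs `[(8.0, 2.0), (2.0, 2.0)]` and five CPUs, the search settles on giving four CPUs to the first job and one to the second, the only allocation under this model in which both goals are met. Exhaustive search does not scale, which is why the slide frames this as a search problem with runtime monitoring and control.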
  • 100.
    Workload Modeling & Prediction
    Goal: predict resource requirements for a given workload, i.e., find correlation between query features and performance features
    Approaches: regression, correlation analysis, Kernel Canonical Correlation Analysis (KCCA)
    [Figure: query plans / job descriptions -> job feature matrix -> query plan projection; performance statistics -> performance feature matrix -> performance projection; both projections computed by KCCA]
    Ganapathi et al.: Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. ICDE 2009
    Prediction:
  • 101.
    Calculate job coordinates in query plan projection based on job feature vector
  • 102.

Editor's Notes

  • #10 SAP Business Objects: Business Objects BI On-Demand