Database as a Service - Tutorial @ICDE 2010

Transcript

  • 1. Database as a Service
    Seminar, ICDE 2010, Long Beach, March 04
    Wolfgang Lehner | Dresden University of Technology, Germany
    Kai-Uwe Sattler | Ilmenau University of Technology, Germany
    1
  • 2. Introduction
    Motivation
    SaaS
    Cloud Computing
    Use Cases
    2
  • 3. Software as a Service (SaaS)
    Traditional Software
    On-Demand Utility
    Plug In, Subscribe, Pay-per-Use
    Build Your Own
    3
  • 4. Comparison of business model
    4
  • 5. Avoid hidden cost of traditional SW
    Traditional Software
    SaaS
    SW Licenses
    Subscription Fee
    Training
    Training
    Customization
    Hardware
    IT Staff
    Maintenance
    Customization
    5
  • 6. The Long Tail
    Dozens of markets of millions or millions of markets of dozens?
    Your Large Customers
    $ / Customer
    What if you lower your cost of sale (i.e., lower the barrier to entry) and also lower your cost of operations?
    Your Typical Customers
    New addressable market >> current market
    (Currently) “non addressable” Customers
    # of Customers
    6
  • 7. Acquisition Model: Service
    Business Model: Pay for usage
    Access Model: Internet
    Technical Model: Scalable, elastic, shareable
    EC2 & S3
    "All that matters is results —
    I don't care how it is done"
    Cloud Computing:
    A style of computing where massively scalable, IT-enabled capabilities are provided "as a service" across the Internet to multiple external customers.
    "I don't want to own assets — I want
    to pay for elastic usage, like a utility"
    "I want accessibility from anywhere from any device"
    "It's about economies of scale, with effective and dynamic sharing"
    What is Cloud? – Gartner’s Definition
    7
  • 8. To Qualify as a Cloud
    Common, Location-independent, Online Utility on Demand*
    Common implies multi-tenancy, not single or isolated tenancy
    Utility implies pay-for-use pricing
    on Demand implies ~infinite, ~immediate, ~invisible scalability
    Alternatively, a “Zero-One-Infinity” definition:**
    0: On-premise infrastructure, acquisition cost, adoption cost, support cost
    1: Coherent and resilient environment – not a brittle “software stack”
    ∞: Scalability in response to changing need; Integratability/Interoperability with legacy assets and other services; Customizability/Programmability from data, through logic, up into the user interface without compromising robust multi-tenancy
    * Joe Weinman, Vice President of Solutions Sales, AT&T, 3 Nov. 2008
    ** From The Jargon File: “Allow none of foo, one of foo, or any number of foo”
    8
  • 9. Cloud Differentials: Service Models
    9
    Cloud Software as a Service (SaaS)
    Use provider’s applications over a network
    Cloud Platform as a Service (PaaS)
    Deploy customer-created applications to a cloud
    Cloud Infrastructure as a Service (IaaS)
    Rent processing, storage, network capacity, and other fundamental computing resources
  • 10. Cloud Differentials: Characteristics
    10
    Platform
    Physical – Virtual
    Homogenous – Heterogeneous
    Design Paradigms
    Storage
    CPU
    Bandwidth
    Usage Model
    Exclusive
    Shared
    Pseudo-Shared
    Size/Location
    Large Scale (AWS, Google, IBM/Google)
    Small Scale (SMB, Academia)
    Purpose
    General Purpose
    Special Purpose (e.g., DB-Cloud)
    Administration/Jurisdiction
    Public
    Private
  • 11. Use Cases: Large-Scale Data Analytics
    Outsource your data and use cloud resources for analysis
    Historical and mostly non-critical data
    Parallelizable, read-mostly workload, highly variant workloads
    Relaxed ACID guarantees
    Examples (Hadoop PoweredBy):
    Yahoo!: research for ad systems and Web search
    Facebook: reporting and analytics
    Netseer.com: crawling and log analysis
    Journey Dynamics: traffic speed forecasting
    11
  • 12. Use Cases: Database Hosting
    Public datasets
    Biological databases: a single repository instead of > 700 separate databases
    Semantic Web data, Linked Data, ...
    Sloan Digital Sky Survey
    Twitter cache
    Already on Amazon AWS:
    annotated human genome data,
    US census,
    Freebase, ...
    Archiving, metadata indexing, ...
    12
  • 13. Use Cases: Service Hosting
    Data management for SaaS solutions
    Run the services near the data
    = ASP
    Already many existing applications
    CRM, e.g. Salesforce, SugarCRM
    Web Analytics
    Supply Chain Management
    HelpDesk Management
    Enterprise Resource Planning, e.g. SAP Business ByDesign
    ...
    13
  • 14. Foundations & Architectures
    Virtualization
    Programming models
    Consistency models & replication
    SLAs & workload management
    Security
    14
  • 15. Topics covered in this Seminar
    Query & Programming Model
    Logical Data Model
    Virtualization
    Multi-Tenancy
    Service Level Agreements
    Storage Model
    Distributed Storage
    Replication
    Security
    15
  • 16. Current Solutions
    user perspective
    one DB for all clients
    one DB per client
    Virtualization
    Replication
    16
    Distributed Storage
    physical perspective
  • 17. ... it‘s simple!
    17
  • 18. Virtualization
    Separating the abstract view of computing resources from the implementation of these resources
    adds flexibility and agility to the computing infrastructure
    softens problems related to provisioning, manageability, …
    lowers TCO: fewer computing resources
    Classical driving factor: server consolidation
    18
    [Diagram: e-mail, web, and database servers, each on its own Linux machine, consolidated via virtualization onto fewer physical machines; improved utilization using consolidation. After EDBT 2008 tutorial (Aboulnaga et al.)]
  • 19. What can be virtualized – the big four.
    19
  • 20. Different TypesofVirtualization
    20
    [Diagram: applications running on guest operating systems inside two virtual machines; the Virtual Machine Monitor (VMM) maps virtual CPU, memory, and network resources onto the physical machine and physical storage]
  • 21. Virtual Machines
    21
    Technique with long history (since the 1960's)
    Prominent since the IBM 370 mainframe series
    Today
    large scale
    commodity hardware and operating systems
    Virtual Machine Monitor (Hypervisor)
    strong isolation between virtual machines (security, privacy, fault tolerance)
    flexible mapping between virtual machines and physical resources
    classical operations: pause, resume, checkpoint, migrate (admin / load balancing)
    Software deployment
    Preconfigured virtual appliances
    Repositories of virtual appliances on the web
  • 22. DBMS on top of Virtual Machines
    ... yet another application?
    ... overhead?
    SQL Server within VMware
    22
  • 23. Virtualization Design Advisor
    What fraction of node resources goes to what DBMS?
    Configuring VM parameters
    What parameter settings are best for a given resource configuration
    Configuring the DBMS parameters
    Example
    Workload 1: TPC-H (10GByte)
    Workload 2: TPC-H (10GByte) only Q18 (132 copies)
    Virtualization design advisor
    20% of CPU to Workload 1
    80% of CPU to Workload 2
    23
  • 24. Some Experiments
    Workload Definition based on TPC-H
    Q18 is one of the most CPU-intensive queries
    Q21 is one of the least CPU-intensive queries
    Workload Units
    C: 25x Q18
    I: 1x Q21
    Experiment: sensitivity to workload resource needs
    W1 = 5C + 5I
    W2 = kC + (10-k)I (increase of k -> more CPU-intensive)
    Postgres
    DB2
    24
  • 25. Some Experiments (2)
    Workload Settings
    W3 = 1C
    W4 = kC
    Workload Settings
    W5 = 1C
    W6 = kI
    25
  • 26. Virtualization in DBaaS environments
    DB Layer
    DB Server
    DB Server
    DB Server
    DB
    DB
    DB
    DB
    DB
    Instance
    Layer
    Instance
    Instance
    Instance
    Instance
    Instance
    Instance
    DB Server
    Layer
    VM
    VM
    VM
    VM
    VM
    VM
    VM Layer
    HW Layer
    26
  • 27. Existing Tools for Node Virtualization
    DB Server
    DB Layer
    DB
    DB
    DB
    DB
    DB
    DB Advisor
    DB Workload Manager
    Instance
    Layer
    Instance
    Instance
    DB Server
    Layer
    Static Environment Assumptions
    • Advisor expects static hardware environment
    • VM expects static (peak) resource requirements
    • Interactions between layers can improve performance/utilization
    Node
    Resource Model
    VM
    VM
    VM
    VM Layer
    VM Configuration
    • Monitoring
    • Resources Configuration
    • (manual) Migration
    HW Layer
    27
  • 35. Layer Interactions (2)
    Experiment
    DB2 on Linux
    TPC-H workload on 1GB database
    Ranges for resource grants
    Main memory (BP) – 50 MB to 1GB
    Additional storage (Indexes) – 5% to 30% DB size
    Varying advisor output (17-26 indexes)
    Different possible improvement
    Different expected Performance after improvement
    [Charts: DB Advisor output showing possible improvement and expected performance after improvement, plotted over buffer pool size (200 MB to 1 GB) and index storage (5% to 35%); improvements range from <1% to 90% depending on the VM configuration]
    28
  • 36. Storage Virtualization
    General Goal
    provide a layer of indirection to allow the definition of virtual storage devices
    minimize/avoid downtime (local and remote mirroring)
    improve performance (distribution/balancing, provisioning, control placement)
    reduce cost of storage administration
    Operations
    create, destroy, grow, shrink virtual devices
    change size, performance, reliability, ...
    workload fluctuations
    hierarchical storage management
    versioning, snapshots, point-in-time copies
    backup, checkpoints
    exploit CPU and memory in the storage system
    caching
    execute low-level DBMS functions
    29
  • 37. Virtualization in DBaaS Environments (2)
    DB Layer
    DB Server
    DB Server
    DB Server
    DB
    DB
    DB
    DB
    DB
    Instance
    Layer
    Instance
    Instance
    Instance
    Instance
    Instance
    Instance
    DB Server
    Layer
    VM
    VM
    VM
    VM
    VM
    VM
    VM Layer
    Shared Disk
    HW Layer
    Storage Layer
    30
    Local Disk
  • 38. Virtualization in DBaaS Environments (2)
    DB Layer
    DB
    DB
    DB
    DB
    DB
    DB Server
    Instance
    Layer
    Instance
    Instance
    DB Server
    Layer
    VM
    VM
    VM
    VM Layer
    HW Layer
    Storage Layer
    31
    DB Advisor
    DB Workload Manager
    Storage Resource Model
    Storage Configuration
    Shared Disk
    Local Disk
  • 44. One way to go? Paravirtualization
    CPU and Memory Paravirtualization
    extends the guest to allow direct interaction with the underlying hypervisor
    reduces the monitor cost, including memory and system call operations
    gains from paravirtualization are workload-specific
    Device Paravirtualization
    places a high-performance virtualization-aware device driver into the guest
    paravirtualized drivers are more CPU-efficient (less CPU overhead for virtualization)
    paravirtualized drivers can also take advantage of HW features, like partial offload
  • 45. Outline
    Query & Programming Model
    Logical Data Model
    Virtualization
    Multi-Tenancy
    Service Level Agreements
    Storage Model
    Distributed Storage
    Replication
    Security
    33
  • 46. Multi Tenancy
    Goal: consolidate multiple customers onto the same operational system
    best resource utilization
    flexible, but limited scalability
    separate DB per tenant
    shared DB, shared schema
    shared DB, separate schema
    • Requirements:
    • Extensibility: customer-specific schema changes
    • Security: preventing unauthorized data accesses by other tenants
    • Performance/scalability: scale-up & scale-out
    • Maintenance: on tenant level instead of on database level
    34
  • 51. Flexible Schema Approaches
    Goal: allow tenant-specific schema additions (columns)
    Universal Table
    Extension Table
    Pivot Table
    35
  • 52. Flexible Schema Approaches: Comparison
    Best performance
    Flexible schema evolution
    Pivot table
    Extension table
    Chunk folding
    Private tables
    Application owns the schema
    Database owns the schema
    Universal table
    XML columns
    Universal table:
    requires techniques for handling sparse data
    fine-grained index support not possible
    Pivot table:
    requires joins for reconstructing logical tuples
    Chunk folding: similar to pivot tables
    groups of columns are combined in a chunk and mapped into a chunk table
    requires complex query transformation
    36
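The pivot-table mapping and the tuple reconstruction it requires can be sketched in a few lines. This is an illustrative Python model, not code from any concrete DBMS: logical tuples are flattened into (tenant, row, column, value) entries, and rebuilding a tenant's logical tuples corresponds to the joins the slide mentions.

```python
# Hypothetical sketch of a pivot table: every logical attribute value
# becomes one (tenant, row, col, value) entry; all names are invented.

def to_pivot(tenant, rows):
    """Flatten logical tuples into pivot entries."""
    entries = []
    for row_id, tup in enumerate(rows):
        for col, val in tup.items():
            entries.append((tenant, row_id, col, val))
    return entries

def reconstruct(entries, tenant):
    """Rebuild logical tuples for one tenant (the per-row 'join')."""
    rows = {}
    for t, row_id, col, val in entries:
        if t == tenant:
            rows.setdefault(row_id, {})[col] = val
    return [rows[k] for k in sorted(rows)]

pivot = to_pivot(42, [{"name": "Acme", "plan": "gold"},
                      {"name": "Beta"}])   # second tuple lacks 'plan': sparse data is free
print(reconstruct(pivot, 42)[0]["name"])   # Acme
```

Note how the sparse second tuple costs nothing to store, while every read must regroup entries by row, which is exactly the reconstruction overhead the comparison slide attributes to pivot tables.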
  • 53. Access Control in Multi-Tenant DB
    Shared DB approaches require row-level access control
    Query transformation: ... WHERE TenantID = 42 ...
    Potential security risks
    DBMS-level control, e.g. IBM DB2 LBAC
    Label-Based Access Control
    controls read/write access to individual rows and columns
    security labels with policies
    requires separate account for each tenant
    37
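The query-transformation step for shared-schema multi-tenancy can be sketched as follows. This is a deliberately naive, string-based illustration (a real system would rewrite the query's parse tree, which is also where the security risks mentioned above come from):

```python
# Illustrative sketch: every incoming query gets a tenant predicate
# appended before it reaches the shared database. Table and column
# names are invented for the example.

def add_tenant_filter(sql: str, tenant_id: int) -> str:
    clause = f"TenantID = {tenant_id}"
    if " where " in sql.lower():
        return sql + f" AND {clause}"
    return sql + f" WHERE {clause}"

q = add_tenant_filter("SELECT * FROM accounts", 42)
print(q)  # SELECT * FROM accounts WHERE TenantID = 42
```

A single missed rewrite path leaks other tenants' rows, which is why the slide points to DBMS-level mechanisms such as label-based access control as the more robust alternative.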
  • 54. In a Nutshell
    How shall virtualization be handled on
    Machine level (VM to HW)
    DBMS level (database to instance to database server)
    Schema level (multi tenancy)
    ... using …
    Allocation between layers
    Configuration inside layers
    Flexible schemas
    … when …
    Characteristics of the workloads are known
    Virtual machines are transparent
    Tenant-specific schema extensions
    … demanding that …
    SLAs and security are respected
    Each node’s utilization is maximized
    Number of nodes is minimized
    38
  • 55. Outline
    Query & Programming Model
    Logical Data Model
    Virtualization
    Multi-Tenancy
    Service Level Agreements
    Storage Model
    Distributed Storage
    Replication
    Security
    39
  • 56. MapReduce Background
    40
    Programming model and an associated implementation for large-scale data processing
    Google and related approaches: Apache Hadoop and Microsoft Dryad
    User-defined map & reduce functions
    Infrastructure
    hides details of parallelization
    provides fault-tolerance, data distribution, I/O scheduling, load balancing, ...
    map (in_key, in_value) -> (out_key, intermediate_value) list
    reduce (out_key,intermediate_value list) -> out_value list
    [Diagram: (key, value) pairs flowing through parallel map (M) and reduce (R) tasks]
  • 57. Logic Flow of WordCount
    Mapper input:
    Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner…
    Mapper output: (Hadoop, 1), (Map, 1), (Reduce, 1), (is, 1), (a, 1), ...
    Sort/Shuffle: (Hadoop, [1, 1, 1, …, 1]), (Map, [1, 1, 1, …, 1]), (Reduce, [1, 1, 1, …, 1]), (is, [1, 1, 1, …, 1]), (a, [1, 1, 1, …, 1])
    Reducer output: (Hadoop, 5), (Map, 12), (Reduce, 12), (is, 42), (a, 23)
  • 58. MapReduce Disadvantages
    Extremely rigid data flow
    Common operations must be coded by hand
    join, filter, split, projection, aggregates, sorting, distinct
    User plans may be suboptimal and lead to performance degradation
    Semantics hidden inside map-reduce functions
    Inflexible, difficult to maintain, extend and optimize
    Combination of high-level declarative querying and low-level programming with MapReduce
     Dataflow Programming Languages
    Hive, JAQL and Pig
    42
  • 59. Pig Latin
    Pig Latin
    On top of MapReduce / Hadoop
    Mix of the declarative style of SQL and the procedural style of MapReduce
    Consists of two parts
    Pig Latin: a data processing language
    Pig infrastructure: an evaluator for Pig Latin programs
    Pig compiles Pig Latin into physical plans
    Plans are executed over Hadoop
    30% of all queries at Yahoo! are in Pig Latin
    Open-source, http://incubator.apache.org/pig
    43
  • 60. Example
    • Task: Determine the most visited websites in each category.
    URL Info
    Visits
    44
  • 61. Implementation in MapReduce
    45
  • 62. Example Workflow in Pig Latin
    load URL Info
    load Visits
    visits = load ‘/data/visits’ as (user, url, time);
    gVisits = group visits by url;
    visitCounts = foreach gVisits generate url, count(visits);
    urlInfo = load ‘/data/urlInfo’ as (url, category, pRank);
    visitCounts = join visitCounts by url, urlInfo by url;
    gCategories = group visitCounts by category;
    topUrls = foreach gCategories generate top(visitCounts, 10);
    store topUrls into ‘/data/topURLs’;
    Operate directly over files.
    group by url
    foreach url
    generate count
    Schemas optional. Can be assigned dynamically.
    join on url
    User-defined functions (UDFs) can be used in every construct
    • load, store
    • group, filter, foreach
    group by category
    foreach category
    generate top10 URLs
    46
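To make each Pig operator's effect concrete, the same dataflow can be replayed with plain Python collections: group visits by URL and count, join with the URL info, group by category, and take the top URLs per category. The sample data is invented for illustration.

```python
from collections import Counter, defaultdict

# In-memory replay of the Pig Latin script's dataflow (toy data).
visits = [("alice", "a.com", 1), ("bob", "a.com", 2), ("bob", "b.org", 3)]
url_info = {"a.com": "news", "b.org": "blogs"}            # url -> category

visit_counts = Counter(url for _, url, _ in visits)        # group by url, count
by_category = defaultdict(list)                            # join + group by category
for url, n in visit_counts.items():
    by_category[url_info[url]].append((url, n))

top_urls = {cat: sorted(urls, key=lambda x: -x[1])[:10]    # top 10 per category
            for cat, urls in by_category.items()}
print(top_urls["news"])  # [('a.com', 2)]
```

In Pig, each `group` and `join` above becomes a MapReduce boundary, which is exactly the compilation shown on the next slide.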
  • 64. Compilation in MapReduce
    Every group or join operation forms a map-reduce boundary
    Other operations pipelined into map and reduce phases
    load URL Info
    load Visits
    Map1
    Map2
    group by url
    Reduce1
    foreach url
    generate count
    join on url
    Reduce2
    Map3
    group by category
    Reduce3
    foreach category
    generate top10 URLs
    47
  • 65. Data warehouse infrastructure built on top of Hadoop, providing:
    Data Summarization
    Ad hoc querying
    Simple query language: Hive QL (based on SQL)
    Extendable via custom mappers and reducers
    Subproject of Hadoop
    No “Hive format”
    http://hadoop.apache.org/hive/
    Hive
    48
  • 66. Hive - Example
    LOAD DATA INPATH ‘/data/visits’ INTO TABLE visits;
    INSERT OVERWRITE TABLE visitCounts
    SELECT url, category, count(*)
    FROM visits
    GROUP BY url, category;
    LOAD DATA INPATH ‘/data/urlInfo’ INTO TABLE urlInfo;
    INSERT OVERWRITE TABLE visitCounts
    SELECT vc.*, ui.*
    FROM visitCounts vc JOIN urlInfo ui ON (vc.url = ui.url);
    INSERT OVERWRITE TABLE gCategories
    SELECT category, count(*)
    FROM visitCounts
    GROUP BY category;
    INSERT OVERWRITE TABLE topUrls
    SELECT TRANSFORM (visitCounts) USING ‘top10’;
    49
  • 67. Higher level query language for JSON documents
    Developed at IBM‘s Almaden research center
    Supports several operations known from SQL
    Grouping, Joining, Sorting
    Built-in support for
    Loops, Conditionals, Recursion
    Custom Java methods extend JAQL
    JAQL scripts are compiled to MapReduce jobs
    Various I/O
    Local FS, HDFS, Hbase, Custom I/O adapters
    http://www.jaql.org/
    JAQL
    50
  • 68. JAQL - Example
    registerFunction("top10", "de.tuberlin.cs.dima.jaqlextensions.top10");
    $visits = hdfsRead("/data/visits");
    $visitCounts =
    $visits
    -> group by $url = $
    into { $url, num: count($) };
    $urlInfo = hdfsRead("/data/urlInfo");
    $visitCounts =
    join $visitCounts, $urlInfo
    where $visitCounts.url == $urlInfo.url;
    $gCategories =
    $visitCounts
    -> group by $category = $
    into { $category, num: count($) };
    $topUrls = top10($gCategories);
    hdfsWrite("/data/topUrls", $topUrls);
    51
  • 69. Outline
    Query & Programming Model
    Logical Data Model
    Virtualization
    Multi-Tenancy
    Service Level Agreements
    Storage Model
    Distributed Storage
    Replication
    Security
    52
  • 70. ACID vs. BASE
    Traditional distributed data management
    Web-scale data management
    ACID
    Basically Available, Soft-state, Eventually consistent
    Strong consistency
    Isolation
    Focus on “commit”
    Availability?
    Pessimistic
    Difficult evolution (e.g. schema)
    Weak consistency
    Availability first
    Best effort
    Optimistic (aggressive)
    Fast and simple
    Easier evolution
    53
  • 71. CAP Theorem [Brewer 2000]
    Consistency: all clients have the same view, even in case of updates
    Availability: all clients find a replica of the data, even in the presence of failures
    Tolerance to network partitions: system properties hold even when the network (system) is partitioned
    You can have at most two of these properties for any shared-data system.
    54
  • 72. CAP Theorem
    No consistency guarantees ➟ updates with conflict resolution
    On a partition event, simply wait until data is consistent again ➟ pessimistic locking
    All nodes are in contact with each other, or put everything in a single box ➟ 2-phase commit
    55
  • 73. CAP: Explanations
    PA := update(o)
    PB := read(o)
    [Diagram: PA updates object o on one node; message M propagates the update; PB reads o on another node]
    Network partitions ➫ M is not delivered
    Solutions?
    Synchronous message: <PA, M> is atomic
    Possible latency problems (availability)
    Transaction <PA, M, PB>: requires control over when PB happens
    Impacts partition tolerance or availability
    56
  • 74. Consistency Models [Vogels 2008]
    [Diagram: processes A, B, C accessing a distributed storage system; A performs update: D0 -> D1, B and C issue read(D)]
    Strong consistency:
    after the update completes, any subsequent access from A, B, C will return D1
    Weak consistency:
    does not guarantee that subsequent accesses will return D1 -> a number of conditions need to be met before D1 is returned
    Eventual consistency:
    special form of weak consistency
    guarantees that, if no new updates are made, eventually all accesses will return D1
    57
  • 75. Variations of Eventual Consistency
    Causal consistency:
    if A notifies B about the update, B will read D1 (but not C!)
    Read-your-writes:
    A will always read D1 after its own update
    Session consistency:
    read-your-writes inside a session
    Monotonic reads:
    if a process has seen Dk, any subsequent access will never return any Di with i < k
    Monotonic writes:
    guarantees to serialize the writes of the same process
    58
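The monotonic-reads guarantee above can be illustrated with a toy client that remembers the highest version it has seen and rejects any replica answer that would move it backwards. Version numbers and the replica interface are invented for the sketch.

```python
# Toy illustration of monotonic reads: once a client has observed
# version k, no later read may return a version i < k.

class MonotonicClient:
    def __init__(self):
        self.seen = 0          # highest version observed so far

    def read(self, replica_version, value):
        if replica_version < self.seen:
            raise RuntimeError("stale replica violates monotonic reads")
        self.seen = replica_version
        return value

c = MonotonicClient()
c.read(3, "D3")                # fresh replica answers: fine
try:
    c.read(2, "D2")            # an older replica answers
except RuntimeError:
    print("rejected")          # rejected
```

In practice a system provides this either by pinning a session to one replica or by carrying such version metadata with each request.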
  • 76. Database Replication
    store the same data on multiple nodes in order to improve reliability, accessibility, fault tolerance
    Single master
    Multi-master
    Optimistic replication
    relaxed consistency
    1-copy consistency
    • Optimistic strategies = lazy replication
    • Allows replicas to diverge; requires conflict resolution
    • Allows data to be accessed without a-priori synchronization
    • Updates are propagated in the background
    • Occasional conflicts are fixed after they happen
    • Improved availability, flexibility, scalability, but see CAP theorem
    59
  • 82. OptimisticReplication: Elements
    [Diagram: sites exchanging operations 1 and 2 through the replication pipeline]
    1. operation submission
    2. propagation
    3. scheduling
    4. conflict resolution
    5. commitment
    60
    Y. Saito, M. Shapiro: Optimistic Replication, ACM Computing Surveys, 5(3):1-44, 2005
  • 83. Conflict Resolution & Update Propagation
    Single master
    Thomas write rule
    Dividing objects, ...
    Vector clocks
    App-specific ordering or preconditions
    Prohibit
    Ignore
    Reduce
    Syntactic
    Semantic
    Detect & repair
    61
    • Epidemic information dissemination
    • Updates pass through the system like infectious diseases
    • Pairwise communication: a site contacts others (randomly chosen) and sends its information, e.g. about updates
    • All sites process messages in the same way
    • Proactive behaviour: no failure recovery necessary!
    • Basic approaches: anti-entropy, rumor mongering, ...
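The vector-clock comparison used for syntactic conflict detection can be sketched compactly: if neither clock dominates the other, the two updates are concurrent and need (semantic or manual) resolution. Site names below are invented.

```python
# Vector clocks for syntactic conflict detection in optimistic
# replication: each site counts its own updates; clock a "dominates" b
# if it is at least as recent at every site.

def dominates(a, b):
    keys = set(a) | set(b)
    return all(a.get(k, 0) >= b.get(k, 0) for k in keys)

def compare(a, b):
    if a == b:
        return "equal"
    if dominates(a, b):
        return "a happened after b"
    if dominates(b, a):
        return "b happened after a"
    return "concurrent (conflict)"

print(compare({"s1": 2, "s2": 1}, {"s1": 1, "s2": 1}))  # a happened after b
print(compare({"s1": 2}, {"s2": 1}))                    # concurrent (conflict)
```

The "concurrent" outcome is exactly the case the slide's detect-and-repair strategies must handle; the Thomas write rule, by contrast, simply keeps the write with the larger timestamp.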
  • Outline
    Query & Programming Model
    Logical Data Model
    Virtualization
    Multi-Tenancy
    Service Level Agreements
    Storage Model
    Distributed Storage
    Replication
    Security
    62
  • 89. The Notion of QoS and Predictability
    Service Level Agreement
    legal part
    technical part
    Service Level Objectives
    • Specific measurable characteristics, e.g. importance, performance goals
    • Deadline constraints
    • Percentile constraints
    • fees, penalties, ...
    Common understanding about services, guarantees, responsibilities
    63
    Application Server / middleware
    DBMS
    OS / Hardware
  • 93. Techniques for QoS in Data Management
    64
    Provide sufficient resources
    Capacity planning: “How many boxes for customer X?”
    Cost vs. performance tradeoff
    Shielding
    Dedicated (virtual) system for customers
    Scalability? Cost efficiency?
    Scheduling
    Ordering requests by priority
    At which level?
  • 94. Workload Management
    Purpose:
    achieve performance goals for classes of requests (queries, transactions)
    Resource provisioning
    Aspects:
    Specification of service-level objectives
    Workload classification and modeling
    Admission control & scheduling
    Static prioritization: DB2 Query Patroller, Oracle Resource Manager, ...
    Goal-oriented approaches
    Economic approaches
    Utility-based approaches
    65
  • 95. Workload Characteristics
    Functional
    I/O requirements (volume, bandwidth)
    CPU
    Degree of parallelism
    Response times?
    Throughput?

    Non-Functional
    Availability
    Reliability
    Durability
    Scalability

    66
  • 96. WLM: Model
    classes
    workload classification
    MPL
    result
    admission control &scheduling
    transaction
    response time
    Admission control: limit the number of simultaneously executing requests (multiprogramming level = MPL)
    Scheduling: ordering requests by priority
    67
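The admission-control-plus-scheduling model above can be sketched as a small priority queue: requests carry a priority, at most MPL of them run at once, and the scheduler always admits the highest-priority waiting request. The class and its interface are invented for illustration.

```python
import heapq

# Minimal sketch of MPL-bounded admission control with priority
# scheduling, as in the WLM model on the slide.

class WorkloadManager:
    def __init__(self, mpl):
        self.mpl, self.running, self.queue = mpl, 0, []

    def submit(self, priority, name):
        heapq.heappush(self.queue, (-priority, name))  # max-priority first
        return self._admit()

    def finish(self):
        self.running -= 1
        return self._admit()

    def _admit(self):
        admitted = []
        while self.queue and self.running < self.mpl:
            _, name = heapq.heappop(self.queue)
            self.running += 1
            admitted.append(name)
        return admitted

wm = WorkloadManager(mpl=1)
wm.submit(1, "report")         # admitted immediately, MPL now reached
wm.submit(5, "oltp")           # must wait despite higher priority
print(wm.finish())             # ['oltp']
```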
  • 97. Utility Functions
    Utility function = preference specification
    maps possible system states (e.g. resource provisioning to jobs) to a real scalar value
    represents a performance feature (response time, throughput, ...) and/or economic value
    • Goal: determine the most valuable feasible state, i.e. maximize utility
    • Explore the space of alternative mappings (search problem)
    • Runtime monitoring and control
    [Chart: utility as a function of response time]
    68
    Kephart, Das: Achieving self-management via utility functions. IEEE Internet Computing, 2007
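A minimal sketch of utility-driven provisioning, under invented assumptions: each candidate resource split is scored by a utility function of the predicted response time, and the controller picks the highest-utility state. Both the utility shape and the response-time model below are illustrative, not from the cited paper.

```python
# Utility as a function of response time: full utility up to a
# deadline, decaying beyond it (one common shape; assumed here).

def utility(response_time, deadline=1.5):
    return 1.0 if response_time <= deadline else deadline / response_time

def predicted_rt(cpu_share):
    return 1.0 / cpu_share            # toy model: more CPU -> faster

candidates = [0.2, 0.5, 0.8]          # possible CPU shares for a workload
best = max(candidates, key=lambda s: utility(predicted_rt(s)))
print(best)  # 0.8
```

The search over `candidates` stands in for the "explore space of alternative mappings" step; a real controller would re-run it as monitoring updates the response-time model.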
  • 100. Workload Modeling & Prediction
    Goal: predict resource requirements for a given workload,
    i.e., find correlation between query features and performance features
    Approaches: regression, correlation analysis, Kernel Canonical Correlation Analysis (KCCA)
    query plans / job descriptions
    job feature matrix
    query plan projection
    KCCA
    performance statistics
    performance feature matrix
    performance projection
    Ganapathi et al.: Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. ICDE 2009
    • Prediction:
    • Calculate job coordinates in the query plan projection based on the job feature vector
    • Infer the job's coordinates on the performance projection
    69
  • 103. Outline
    Query & Programming Model
    Logical Data Model
    Virtualization
    Multi-Tenancy
    Service Level Agreements
    Storage Model
    Distributed Storage
    Replication
    Security
    70
  • 104. Overview and Challenges
    outsourcing
    Data Pre-processor
    Private information retrieval / access privacy
    Data Owner
    queries
    Data confidentiality / privacy
    Query Engine
    Query Pre/Post-processor
    query results
    User
    Completeness and correctness
    Service Provider (un-trusted)
    71
  • 105. Challenges I – Data Confidentiality/ Privacy
    Need to store data in the cloud
    But we do not trust the service providers for sensitive information
    encrypt the data and store it
    but still be able to run queries over the encrypted data
    do most of the work at the server
    Two issues
    Privacy during transmission (well studied, e.g. through SSL/TLS)
    Privacy of stored data
    Querying over encrypted data is challenging
    needs to maintain content information on the server side, e.g. range queries require order-preserving data encryption mechanisms
    privacy/performance tradeoff
    72
  • 106. Query Processing on Encrypted Data
    Metadata
    server-side
    query
    Query
    Translator
    original
    query
    Query
    Engine
    Temporary
    Result
    client-side
    query
    encrypted
    results
    result
    Query
    Executor
    User
    Service Provider
    (un-trusted)
    Client Site
    73
  • 107. Executing SQL over Encrypted Data (Hacigumus et al., SIGMOD 2002)
    Main Steps
    Partition sensitive domains
    Order-preserving: supports comparison
    Random: query rewriting becomes hard
    Rewrite queries to target partitions
    Execute queries and return results
    Prune/post-process results on client
    Privacy-Precision Trade-off
    Larger segments/partitions
     increased privacy
     decreased precision
     increased overheads in query processing
    74
  • 108. Relational Encryption
    Service Provider Site
    arbitrary encryption function,
    e.g. AES, RSA, Blowfish, DES, …
    Bucket Ids
    • Store an etuple for each tuple in the original table
    • Create a coarse index for each (or selected) attribute(s) in the original table
    75
  • 110. Index and Identification Functions
    • Partition function divides domain values into partitions (buckets)
    Partition(R.A) = { [0,200], (200,400], (400,600], (600,800], (800,1000] }
    • the partitioning function has an impact on performance as well as privacy
    • Identification function assigns a partition id to each partition of attribute A
    ident_R.A( (200,400] ) = 7
    • Any function can be used as identification function, e.g., hash functions
    [Diagram: domain values 0 to 1000 split into five buckets with ids 2, 7, 5, 1, 4]
    76
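The bucketization scheme on this slide can be sketched directly: the domain [0, 1000] is split into five partitions, each mapped to an opaque bucket id (following the slide's example, (200, 400] maps to 7), and the server-side query only ever sees bucket ids. The predicate name `A_id` is invented for the sketch.

```python
# Partition and identification functions from the slide, plus the
# server-side rewriting of an equality predicate on attribute R.A.

PARTITIONS = [((0, 200), 2), ((200, 400), 7), ((400, 600), 5),
              ((600, 800), 1), ((800, 1000), 4)]

def ident(value):
    """Identification function: map a domain value to its bucket id."""
    for (lo, hi), bucket in PARTITIONS:
        if lo < value <= hi or (value == 0 and lo == 0):
            return bucket
    raise ValueError("value outside domain")

def rewrite_equality(value):
    """Rewrite 'R.A = value' into a coarse server-side bucket predicate."""
    return f"A_id = {ident(value)}"

print(rewrite_equality(300))  # A_id = 7
```

Because a bucket predicate matches every value in the partition, the client must prune false positives after decryption: the precision cost of the privacy gained from coarser buckets.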
  • 112. Challenges II – Private Information Retrieval (PIR)
    User queries should be invisible to the service provider
    More formally
    the database is modeled as a string x of length N stored at a remote server
    the user wants to retrieve the bit xi for some i
    without disclosing any information about i to the server
    Paradox
    imagine buying in a store without the seller knowing what you buy
    [Diagram: user sends index i to the server holding x1, x2, …, xn and obtains xi]
    77
  • 113. Information-Theoretic 2-server PIR
    The user picks a random subset Q1 ⊆ {1,…,n}, sends Q1 to server 1 and Q2 = Q1 Δ {i} to server 2.
    Server 1 answers a1 = XOR of xl for l ∈ Q1; server 2 answers a2 = XOR of xl for l ∈ Q2.
    The user recovers xi = a1 XOR a2.
    [Diagram: user querying two non-colluding servers, each holding the full bit string x]
    78
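The 2-server scheme is short enough to implement end to end: the user sends a random subset Q1 to server 1 and Q1 with position i flipped to server 2; each server XORs the bits at its queried positions, and XORing the two one-bit answers yields exactly x_i. Since each query alone is a uniformly random subset, neither server learns anything about i (assuming they do not collude).

```python
import random

# Information-theoretic 2-server PIR with XOR answers, as on the slide.

def answer(x, query):
    """Server side: XOR of the database bits at the queried positions."""
    bit = 0
    for pos in query:
        bit ^= x[pos]
    return bit

def pir_read(x, i):
    """User side: query both servers and combine their answers."""
    n = len(x)
    q1 = {p for p in range(n) if random.random() < 0.5}
    q2 = q1 ^ {i}                      # symmetric difference flips position i
    return answer(x, q1) ^ answer(x, q2)

x = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]
assert all(pir_read(x, i) == x[i] for i in range(len(x)))
print("recovered all bits")            # recovered all bits
```

Correctness follows because XOR over Q1 and over Q2 cancels on every position the two sets share, leaving only x_i; the price is that each server touches (on average) half the database per query.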
  • 114. Conclusion & Outlook
    Current Infrastructures
    MS Azure
    Amazon RDS + SimpleDB
    Amazon Dynamo
    Google BigTable
    Yahoo! PNUTS
    Conclusion
    Challenges & Trends
    79
  • 115. Current Solutions
    one DB for all clients
    one DB per client
    Amazon SimpleDB / Dynamo
    Amazon RDS
    Yahoo! PNUTS
    Google Bigtable, Cassandra, Voldemort
    Amazon S3
    Microsoft SQL Azure
    Virtualization
    Replication
    Distributed Storage
    80
  • 116. Microsoft SQL Azure
    Cloud database service for the Azure platform
    Allows to create a SQL server = group of databases spread across multiple physical machines (incl. geo-location)
    Supports relational model and T-SQL (tables, views, indices, triggers, stored procedures)
    Deployment and administration using SQL Server Management Studio
    Current limitations
    Individual database size = max. 10 GB
    No support for CLR, distributed queries & transactions, spatial data
    81
  • 117. Microsoft SQL Azure: Details
    Databases
    implemented as replicated data partitions
    across multiple physical nodes
    provide load balancing and failover
    API
    SQL, ADO.NET, ODBC
    Tabular Data Streams
    SQL Server Authentication
    Sync Framework
    Prices
    1 GB database: $9.99/month; 10 GB: $99.99/month + data transfer
    SLA: 99.9% availability
  • 118. Microsoft Azure: Other Services
    Azure Blob
    Blob storage; PUT/GET interface via REST
    Azure Table
    Structured storage; LINQ, ADO.NET interface
    [Figure: a storage account contains tables (Customer, Order); a table contains entities (Customer #1, Customer #2); an entity contains properties (Name, Address)]
    • Properties can be defined per entity; max. size of entity: 1 MB
    • 119. Partition key: used for assigning entities to partitions; Row key: unique ID within a partition
    • 120. Sort order: single index per table
    • 121. Atomic transactions within a partition
  • 122. Amazon RDS
    Amazon Relational Database Service
    Web Service to set up and operate a MySQL database
    Full-featured MySQL 5.1
    Automated database backup
    Java-based command line tools and Web Service API for instance administration
    Native DB access
    Prices:
    Small DB instance (1.7 GB memory, 1 ECU): $0.11/hour
    Largest DB instance (68 GB, 26 ECU): $3.10/hour
    + $0.10 per GB-month storage
    + data transfer
  • 123. Amazon Data Services
    Amazon Simple Storage Service (S3)
    Distributed Blob storage for objects (1 Byte ... 5 GB data)
    REST-based interface to read, write, and delete objects identified by a unique, user-defined key
    Atomic single-key updates; no locking
    Eventual consistency (partially read-after-write)
    Aug 2009: more than 64 billion objects
    Amazon SimpleDB (= Amazon Dynamo???)
    Distributed structured storage
    Web Service API for access
    Eventual consistency
  • 124. AmazonSimpleDB
    Data model
    Relational-like data model: domain = collection of items described by key-value pairs; max. size 10 GB
    Attributes can be added to certain records (256 per record)
    [Figure: a storage account contains domains (Customer, Order); a domain contains items (Customer #1, Customer #2); an item contains attribute-value pairs (Name: Wolfgang, City: Dresden)]
    • Queries
    • 125. Restricted to a single domain
    • 126. SFW syntax + count() + multi-attribute predicates
    • 127. Only string-valued data: lexicographical comparisons
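    Because SimpleDB compares values only as strings, numeric attributes must be encoded so that lexicographic order matches numeric order. A common workaround (our example, not from the slides) is zero-padding to a fixed width:

```python
# Naive string encoding breaks numeric comparisons:
assert "9" > "10"                      # lexicographically '9' sorts after '1'
# Zero-padding to a fixed width restores the expected order:
assert "009" < "010"
assert sorted(["100", "020", "003"]) == ["003", "020", "100"]
# Dates work the same way when stored in ISO 8601 form, e.g. "2010-03-04".
```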
  • 128. Amazon Dynamo
    Highly available and scalable key-value data store for the Amazon platform
    Manages the state of Amazon services
    providing bestseller lists, shopping carts, customer preferences, product catalogs -> require only primary-key access (e.g., product id, customer id)
    Completely decentralized, minimal need for manual administration (e.g., partitioning, redistribution)
    Assumptions:
    Simple query model: put/get operations on keys, small objects (< 1 MB)
    Weaker consistency but high availability ("always writable" data store), no isolation guarantees
    Efficiency: running on commodity hardware, guaranteed latency = SLAs, e.g., 300 ms response time for 99.9% of requests, peak load of 500 requests/sec.
  • 129. Dynamo: Partitioning and Replication
    Partitioning scheme
    based on consistent hashing
    Virtual nodes: each physical node is responsible for more than one virtual node
    Replication
    Each data item is replicated at N nodes
    [Figure: key space as a ring of nodes A–E; node C is responsible for the key range (B,C]; replicas of keys from that range are stored at the following nodes on the ring]
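    The ring with virtual nodes can be sketched as follows. This is a minimal illustration, with MD5 as a stand-in hash and made-up node names; Dynamo's actual hash function and placement strategies differ.

```python
import bisect
import hashlib

def _h(key):
    """Hash a key onto the ring (MD5 here is only a stand-in)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=8, replicas=3):
        self.replicas = replicas
        # each physical node owns several positions (virtual nodes) on the ring
        self.ring = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

    def preference_list(self, key):
        """First `replicas` distinct physical nodes clockwise from the key."""
        idx = bisect.bisect(self.ring, (_h(key),))
        picked = []
        for _, node in self.ring[idx:] + self.ring[:idx]:  # walk the ring, wrapping
            if node not in picked:
                picked.append(node)
            if len(picked) == self.replicas:
                break
        return picked

ring = Ring(["A", "B", "C", "D", "E"])
assert len(ring.preference_list("cart:42")) == 3
```

    Adding or removing a physical node moves only the keys adjacent to its virtual positions, which is why consistent hashing needs no global redistribution.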
  • 130. Dynamo: Data Versioning
    Provides eventual consistency -> asynchronous propagation of updates
    Updates result in a new version of the data
    Vector clocks for capturing causalities between different versions of the same object
    Vector clock = list of (node, counter)
    Determine causal ordering / parallel branches of versions
    Update requests have to specify which version is to be updated
    Reconciliation during client reads!
    Example: write(D)@NA yields D1([NA,1]); a second write(D)@NA yields D2([NA,2]); parallel writes at NB and NC yield D3([NA,2],[NB,1]) and D4([NA,2],[NC,1]); reconcile(D)@NA merges both into D5([NA,3],[NB,1],[NC,1])
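    The D1..D5 example above can be replayed with a minimal vector-clock comparison; the helper names are ours.

```python
def descends(a, b):
    """True if version a causally follows (or equals) version b."""
    return all(a.get(n, 0) >= c for n, c in b.items())

d2 = {"NA": 2}
d3 = {"NA": 2, "NB": 1}
d4 = {"NA": 2, "NC": 1}
assert descends(d3, d2)                               # D3 supersedes D2
assert not descends(d3, d4) and not descends(d4, d3)  # parallel branches

# reconcile(D)@NA: element-wise max of the clocks, then NA bumps its counter
merged = {n: max(d3.get(n, 0), d4.get(n, 0)) for n in d3.keys() | d4.keys()}
merged["NA"] += 1
assert merged == {"NA": 3, "NB": 1, "NC": 1}          # = D5 on the slide
```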
  • 131. Dynamo: Replica Maintenance
    Consistency among replicas:
    Quorum protocol: R nodes must participate in a read, W nodes in a write; R + W > N
    Sloppy quorum:
    Reads/writes are performed on the first N healthy nodes
    Preference list: list of nodes which are responsible for storing a given key
    For highest availability: W = 1
    Replica synchronization
    Anti-entropy:
    Merkle trees:
    hash trees where leaves are hashes of keys, non-leaves are hashes of their children
    If the hash values of two nodes are equal, there is no need to check their children
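    Merkle-tree comparison for anti-entropy can be sketched as below. The tree structure follows the slide; the key-value strings and the choice of SHA-256 are our assumptions.

```python
import hashlib

def h(s):
    return hashlib.sha256(s.encode()).hexdigest()

def merkle_levels(leaves):
    """Build the tree bottom-up: levels[0] = leaf hashes, levels[-1] = [root]."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        # each parent hashes the concatenation of its (up to two) children
        level = [h("".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

a = merkle_levels(["k1=v1", "k2=v2", "k3=v3", "k4=v4"])
b = merkle_levels(["k1=v1", "k2=XX", "k3=v3", "k4=v4"])
assert a[-1] != b[-1]        # roots differ: replicas are out of sync
assert a[1][1] == b[1][1]    # right subtree hashes match: skip it entirely
assert a[0][1] != b[0][1]    # descend left: only k2 needs to be transferred
```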
  • 132. Google BigTable
    Fast and large-scale DBMS for Google applications and services
    Designed to scale into the PB range
    Uses the distributed Google File System (GFS) for storing data and log files
    Depends on a cluster management system for managing resources, monitoring state, scheduling, ...
    Can be used as input source and output target for MapReduce programs
  • 133. BigTable: Data Model
    Bigtable = sparse, distributed, multi-dimensional sorted map
    Indexed by row key, column key, timestamp; value = array of bytes
    Row keys up to 64 KB; column keys grouped in column families
    Timestamp (64-bit int) used for versioning
    Data is maintained in lexicographic order by row keys
    Row range is dynamically partitioned ➪ tablet = unit of distribution and load balancing
    Read/write ops under a single row key are atomic
    [Figure: each (row key, column key) cell holds timestamped versions t1, t2 of its value]
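    The sorted map can be mimicked with a plain dict keyed by (row key, column key, timestamp); the keys and values here are illustrative (the row key style follows the Bigtable paper's webtable example).

```python
table = {}  # (row_key, column_key, timestamp) -> bytes

def put(row, col, ts, value):
    table[(row, col, ts)] = value

def read_latest(row, col):
    """Cells are versioned; the largest timestamp is the newest version."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if (r, c) == (row, col)]
    return max(versions)[1] if versions else None

put("com.cnn.www", "contents:", 1, b"<html>v1")
put("com.cnn.www", "contents:", 2, b"<html>v2")
assert read_latest("com.cnn.www", "contents:") == b"<html>v2"
```

    A real tablet additionally keeps rows physically sorted by row key, so range scans over adjacent row keys stay on few servers; a flat dict cannot show that locality.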
  • 134. BigTable: System Architecture
    Single-master distributed storage system
    Master server responsible for
    assigning tablets to tablet servers
    load balancing on tablet servers
    detecting addition and expiration of tablet servers
    garbage collection of GFS files
    Tablet servers
    manage sets of tablets (10...1000 tablets per server, 100...200 MB per tablet)
    handle read/write requests
    split tablets
    Distributed, persistent lock/name service Chubby
    uses Paxos for replica consistency (5 replicas)
    provides namespace consisting of directories and files; allows discovery of tablet servers
  • 135. BigTable: Tablets
    Internally stored in SSTables
    immutable, sorted file of key-value pairs; organized in 64 KB blocks + index (block ranges)
    Tablet Location
    Chubby contains the location of the root tablet
    Root tablet contains the locations of all tablets of a METADATA table
    METADATA tablets contain the locations of user tablets + end row key (sparse index)
    Three-level scheme addresses 2^34 tablets
    Cached by client library
    [Figure: Chubby file -> root tablet -> METADATA tablets -> user tables]
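    The 2^34 figure follows from two sizing assumptions in the Bigtable paper (roughly 1 KB per METADATA row, 128 MB METADATA tablets); the arithmetic is easy to check:

```python
# Worked sizing for the three-level location scheme.
rows_per_metadata_tablet = (128 * 2**20) // 2**10    # = 2**17 tablet locations
addressable_tablets = rows_per_metadata_tablet ** 2  # root -> METADATA -> user
assert addressable_tablets == 2**34                  # about 17 billion tablets
```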
  • 136. BigTable: Tablets /2
    Tablet Assignment
    Starting tablet servers acquire an exclusive lock in Chubby -> allows discovery of tablet servers
    The master periodically checks the lock status of tablet servers
    Replication of data is performed by GFS
    Tablet Serving
    Updates (mutations) are logged and then applied to an in-memory version (memtable)
    Compactions
    convert the memtable into an SSTable
    merge SSTables
  • 137. Yahoo! PNUTS
    Yahoo!'s data serving platform
    Data & query model:
    Simple relational model: tables of records with attributes (incl. blob types)
    Flexible schema evolution by adding attributes at any time
    Queries: single-table selection & projection
    Updates & deletions based on primary-key access
    Storage model:
    Records as parsed JSON objects
    Filesystem-based hash tables or MySQL InnoDB engine
  • 138. PNUTS Architecture
    [Figure: clients issue requests through a REST API; routers locate records with mappings maintained by the tablet controller and forward requests to storage units; a message broker propagates updates between replicas]
  • 139. PNUTS: Consistency & Replication
    Consistency model:
    Per-record timeline consistency: all replicas apply all updates in the same order
    User-specific guarantees: read-any, read-latest, read-newer-than, writes, write-after-version
    Partitioning and replication:
    Tables horizontally partitioned into tablets (100 MB ... 10 GB)
    Each server is responsible for 100+ tablets
    Asynchronous replication using a message broker (publish/subscribe)
    guarantees delivery of messages (incl. logging)
    provides partial ordering of messages
    Record-level membership + mastership-migration protocol
  • 140. Comparison
    [Table: feature comparison of the presented systems]
  • 141. Conclusion
    DBaaS = outsourcing databases to reduce TCO
    Reduce operational / administration costs
    Pay-as-you-go model
    Wide spectrum of solutions
    "rent a database"
    Cloud databases
    Use cases
    Database hosting
    Hosted services
    Large-scale data analytics
  • 142. Challenges & Trends
    [Figure: system layers (Query & Programming Model, Logical Data Model, Storage Model, Distributed Storage) with Virtualization and Service Level Agreements as cross-cutting concerns]
    Expressiveness:
    • Limiting functionality: SQL vs. put/get vs. MR
    Service-level agreements:
    • Shielding: one (virtual) box per client
    • Limiting functionality: SQL vs. put/get operations
    • Workload management
    Resource provisioning:
    • Virtualization on system and database level
    Confidentiality and trust:
    • Data encryption
    • Information distribution
    Scalability and availability:
    • Through redundancy and partitioning
    • But may affect consistency model
  • References
    F. Chang et al.: Bigtable: A Distributed Storage System for Structured Data, OSDI 2006
    B.F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, R. Yerneni: PNUTS: Yahoo!'s Hosted Data Serving Platform, PVLDB 1(2), 2008
    R. Baldoni, M. Raynal: Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems, IEEE Distributed Systems Online, 2002
    E. Brewer: Towards Robust Distributed Systems, PODC 2000
    S. Gilbert, N. Lynch: Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, ACM SIGACT News, 2002
    W. Vogels: Eventually Consistent – Revisited, ACM Queue 6(6), 2008
    D. Karger et al.: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, STOC 1997
    Y. Saito, M. Shapiro: Optimistic Replication, ACM Computing Surveys, 2005
    S. Aulbach, T. Grust, D. Jacobs, A. Kemper, J. Rittinger: Multi-Tenant Databases for Software as a Service: Schema-Mapping Techniques, SIGMOD 2008
  • 147. References
    G. DeCandia et al.: Dynamo: Amazon's Highly Available Key-value Store, SOSP 2007
    P. Bernstein et al.: Data Management Issues in Supporting Large-Scale Web Services, IEEE Data Engineering Bulletin, Dec. 2006
    M. Brantner et al.: Building a Database on S3, SIGMOD 2008
    A. Aboulnaga, C. Amza, K. Salem: Virtualization and Databases: State of the Art and Research Challenges, EDBT 2008
    A.A. Soror, U.F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, S. Kamath: Automatic Virtual Machine Configuration for Database Workloads, SIGMOD 2008
    C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins: Pig Latin: A Not-So-Foreign Language for Data Processing, SIGMOD 2008
    R. Pike, S. Dorward, R. Griesemer, S. Quinlan: Interpreting the Data: Parallel Analysis with Sawzall, Scientific Programming 13(4), 2005
  • 148. References
    R. Chaiken, B. Jenkins, P.-Å. Larson, B. Ramsey, D. Shakib, S. Weaver, J. Zhou: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets, PVLDB 1(2), 2008
    B. Hore, S. Mehrotra, G. Tsudik: A Privacy-Preserving Index for Range Queries, VLDB 2004
    H. Hacigümüş, B. Iyer, C. Li, S. Mehrotra: Executing SQL over Encrypted Data in the Database-Service-Provider Model, SIGMOD 2002
    D. Agrawal, A. El Abbadi, F. Emekçi, A. Metwally: Database Management as a Service: Challenges and Opportunities, ICDE 2009
    A. Shamir: How to Share a Secret, Communications of the ACM 22(11), 1979
    F. Kerschbaum, J. Vayssière: Privacy-Preserving Data Analytics as an Outsourced Service, ACM Workshop on Secure Web Services, 2008
    B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan: Private Information Retrieval, FOCS 1995
  • 149. Who has the first question?
    wolfgang.lehner@tu-dresden.de | kus@tu-ilmenau.de