Cloud Databases in Research and Practice
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

Cloud Databases in Research and Practice

  • 745 views
Uploaded on

The combination of database systems and cloud computing is extremely attractive: unlimited storage capacities, elastic scalability and as-a-Service models seem to be within reach. This talk will......

The combination of database systems and cloud computing is extremely attractive: unlimited storage capacities, elastic scalability and as-a-Service models seem to be within reach. This talk will give an in-depth survey of existing solutions for cloud databases that evolved in the last years and provide classification and comparison. This includes real-world systems (e.g. Azure Tables, DynamoDB and Parse) as well as research approaches (e.g. RelationalCloud and ElasTras). In practice however, there are some unsolved problems. Network latency, scalable transactions, SLAs, multi-tenancy, abstract data modelling, elastic scalability and polyglot persistence pose daunting tasks for many scenarios. Therefore, we conclude with „Orestes“ a research approach based on well-known techniques such as web caching, Bloom filters and optimistic concurrency control that demonstrates how existing cloud databases can be enhanced to suit specific applications.

More in: Science , Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
745
On Slideshare
742
From Embeds
3
Number of Embeds
1

Actions

Shares
Downloads
121
Comments
0
Likes
1

Embeds 3

http://www.slideee.com 3

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Felix Gessert (29.04.2014) Slides: baqend.com/nosql.pdf
  • 2. About me  PhD student (database group university of hamburg)
  • 3. About me  PhD student (database group university of hamburg) Research Project for PhD
  • 4. About me  PhD student (database group university of hamburg) Research Project for PhD Cloud Database Startup
  • 5. Outline • Categories of Cloud Databases • Properties What are Cloud Databases? Cloud Databases in the wild Research Perspectives Wrap-up and literature
  • 6. 2003: GFS (Google) 2006: BigTable (Google) 2007: Dynamo (Amazon) 2007: S3 (Amazon) 2008: SimpleDB (Amazon) 2010: SQL Azure 2011: Parse 2012: DynamoDB (Amazon) 2013: Redshift (Amazon)
  • 7. Context: What kinds of Cloud databases are there?
  • 8. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB BigQuery EMR GCS S3
  • 9. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed RDBMSs BigQuery EMR GCS S3
  • 10. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed RDBMSs Cloud-Deployment of DBMSs BigQuery EMR GCS S3
  • 11. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs BigQuery EMR GCS S3
  • 12. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR GCS S3
  • 13. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3
  • 14. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3 Object Stores
  • 15. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Storage APIs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3 Object Stores
  • 16. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Backend-as-a- Service Storage APIs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3 Object Stores
  • 17. Cloud-deployed database Data-Analytics- as-a-Service Database-as-a- Service SQL Managed RDBMS Managed DWH NoSQL Managed NoSQL DB Backend-as-a- Service Proprietary or Polyglot Files Object Store Cloud Databases
  • 18. Cloud-deployed database Data-Analytics- as-a-Service Database-as-a- Service SQL Managed RDBMS Managed DWH NoSQL Managed NoSQL DB Backend-as-a- Service Proprietary or Polyglot Files Object Store Cloud Databases Standard Interface
  • 19. Cloud-deployed database Data-Analytics- as-a-Service Database-as-a- Service SQL Managed RDBMS Managed DWH NoSQL Managed NoSQL DB Backend-as-a- Service Proprietary or Polyglot Files Object Store Cloud Databases Standard Interface Vendor- specific Interface
  • 20. Application Architecture <-> Cloud Database Category Architectures Applications Data Warehouse Operative Database Reporting Data MiningAnalytics DataManagementDataAnalytics
  • 21. Application Architecture <-> Cloud Database Category Architectures Applications Data Warehouse Operative Database Reporting Data MiningAnalytics DataManagementDataAnalytics DBaaS
  • 22. Application Architecture <-> Cloud Database Category Architectures Applications Data Warehouse Operative Database Reporting Data MiningAnalytics DataManagementDataAnalytics DBaaSProbaby not.
  • 23. Cloud-Deployed Database IaaS-Cloud
  • 24. Cloud-Deployed Database IaaS-Cloud Cloud-deploy your favourite database system
  • 25. Cloud-Deployed Database IaaS-Cloud Cloud-deploy your favourite database system Does not solve: Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...
  • 26. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud DBaaS-Provider
  • 27. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...
  • 28. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider
  • 29. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider SQL Azure Google Cloud SQL RDBMS
  • 30. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider SQL Azure Google Cloud SQL RDBMSNoSQLDB
  • 31. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider Amazon Redshift SQL Azure Google Cloud SQL RDBMSNoSQLDBDWH
  • 32. Proprietary DB/Object Store Cloud Black-Box Database or file system Managed by Cloud Provider Provider‘sAPI
  • 33. Proprietary DB/Object Store Cloud Black-Box Database or file system Managed by Cloud Provider Provider‘sAPI Amazon SimpleDB Google Cloud Datastore Azure Tables Database.com BigTable, Megastore, Spanner, F1, Dynamo, PNuts, Relational Cloud, … ProprietaryDB
  • 34. Proprietary DB/Object Store Cloud Black-Box Database or file system Managed by Cloud Provider Provider‘sAPI Amazon SimpleDB Google Cloud Storage Azure Blob Storage Google Cloud Datastore Azure Tables Openstack Swift Database.com BigTable, Megastore, Spanner, F1, Dynamo, PNuts, Relational Cloud, … ProprietaryDBObjectStore
  • 35. Backend-as-a-Service, Polyglot Persistence Service IaaS-Cloud Backend API Service-Layer Data API
  • 36. Backend-as-a-Service, Polyglot Persistence Service IaaS-Cloud Backend API Service-Layer Data API Authentication, Users, Validation,etc. Maps to (different) databases
  • 37. Backend-as-a-Service, Polyglot Persistence Service IaaS-Cloud Backend API Service-Layer Data API Realtime BaaS
  • 38. Backend-as-a-Service, Polyglot Persistence Service IaaS-Cloud Backend API Service-Layer Data API Realtime BaaS BaaS AppCelerator Cloud
  • 39. Analytics-as-a-Service Cloud Analytics Cluster Provisioning, Data Ingest
  • 40. Analytics-as-a-Service Cloud Analytics Cluster Provisioning, Data Ingest Azure HDInsight Amazon Elastic MapReduce Hadoop
  • 41. Analytics-as-a-Service Cloud Analytics Cluster Provisioning, Data Ingest Azure HDInsight Google BigQuery Google Prediction API Amazon Elastic MapReduce HadoopCustom
  • 42. DBaaS: Common Aspects #1 Metric: Total Cost Daniela Florescu and Donald Kossmann “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.
  • 43. DBaaS: Common Aspects #1 Metric: Total Cost Maximum utilization of available hardware: Multi-Tenancy Daniela Florescu and Donald Kossmann “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.
  • 44. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS Private Process/DB Private Schema Shared Schema
  • 45. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema Shared Schema e.g. Amazon RDS
  • 46. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema VM Hardware Resources Database Process Database Schema Shared Schema e.g. Amazon RDS e.g. MongoHQ
  • 47. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema VM Hardware Resources Database Process Database Schema VM Hardware Resources Database Process Database Schema Shared Schema e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore
  • 48. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema VM Hardware Resources Database Process Database Schema VM Hardware Resources Database Process Database Schema Shared Schema VM Hardware Resources Database Process Database Schema Virtual Schema e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore Most SaaS Apps
  • 49. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 Private OS Private Process/DB Private Schema Shared Schema App. indep. Isolation Ressource Util. Maintenance, Provisioning
  • 50.  Billing Models: DBaaS: Common Aspects Usage Account
  • 51.  Billing Models: DBaaS: Common Aspects Usage Account Pay-per-use Parameters: Network, Bandwidth, Storage, CPU, Requests, etc. Payment: Pre-Paid, Post-Paid Variants: On-Demand, Auction, Reserved e.g. DynamoDB
  • 52.  Billing Models: DBaaS: Common Aspects Usage Account End of month Plan-based Parameters: Allocated Plan (e.g. 2 instances + X GB storage) e.g. MongoHQ
  • 53.  Billing Models: DBaaS: Common Aspects Usage Account End of month Plan-based Parameters: Allocated Plan (e.g. 2 instances + X GB storage) Free Tier: free plan or free initial account credit e.g. MongoHQ
  • 54. DBaaS: Common Aspects Database-a- a-Service Authentication Authorization API
  • 55. DBaaS: Common Aspects Database-a- a-Service Authentication Authorization API Authenticate
  • 56. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Database-a- a-Service Authentication Authorization API Authenticate
  • 57. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively Database-a- a-Service Authentication Authorization API Authenticate
  • 58. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively Database-a- a-Service Authentication Authorization API Authenticate Token
  • 59. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request
  • 60. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request
  • 61. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request Response
  • 62. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request Response
  • 63. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request Response Federated ACLs M. Decat, B. Lagaisse, et al. “Toward efficient and confidentiality-aware federation of access control policies “, DOA Trusted Cloud 2013 • Imagine ACL: „Patient data can only be accessed by treating physician“ • Idea: decompose policy and evaluate parts near data owner
  • 64.  Service Level Agreements DBaaS: Common Aspects SLA
  • 65.  Service Level Agreements DBaaS: Common Aspects SLA Legal Part 1. Fees 2. Penalties Technical Part 1. SLO 2. SLO 3. SLO
  • 66.  Service Level Agreements DBaaS: Common Aspects SLA Legal Part 1. Fees 2. Penalties Technical Part 1. SLO 2. SLO 3. SLO Service Level Objectives: • Availability • Durability • Consistency/Staleness • Query Response Time
  • 67.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 68.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 69.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 70.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 71.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 Maximize:
  • 72.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 73.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 QOS for NoSQL DBs Y. Zhu et al. “Scheduling with Freshness and Performance Guarantees for Web Applications in the Cloud“, CRPIT Old: Workload management in RDBMs (DB2 and Oracle) New: Use well-known scheduling algorithms for queries in replicated DBs
  • 74.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time
  • 75.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Expected Load
  • 76.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Expected Load Provisioned Resources: • #No of Shard- or Replica servers • Computing, Storage, Network Capacities
  • 77.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load
  • 78.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load Overprovisioning: • SLAs met • Excess Capacities
  • 79.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load Overprovisioning: • SLAs met • Excess Capacities Underprovisioning: • SLAs violated • Usage maximized
  • 80.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load Overprovisioning: • SLAs met • Excess Capacities Underprovisioning: • SLAs violated • Usage maximized SmartSLA P. Xiong: “Intelligent management of virtualized resources for database systems in cloud environment”, ICDE 2011 Solution: machine learning (regression + boosting) for prediction  choose allocation that minimizes SLA penalties Resource allocation Database performance Learn Mapping
  • 81. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations
  • 82. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations
  • 83. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations
  • 84. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations aaS
  • 85. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations aaS Questions to ask: • Which requirements are met by the DB? • Which are met by the provider?  SLAs
  • 86. Outline Examples of different cloud database systems: • Cloud-deployed • Managed DBMS • SQL • NoSQL • Proprietary • BaaS What are Cloud Databases? Cloud Databases in the wild Research Perspectives Wrap-up and literature
  • 87.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces
  • 88.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces 1. Provision VM(s)
  • 89.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces 1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
  • 90.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces > whirr launch-cluster --config hbase.properties Login, cluster-size etc. Amazon EC2 1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
  • 91.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces > whirr launch-cluster --config hbase.properties Login, cluster-size etc. Amazon EC2 1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
  • 92.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 93.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 94.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 95.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 96.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 97.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 98.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 99.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific Bad: • No Clusters • Not managed (automatic Updates, Snapshots, etc.) • Private OS Multi-Tenancy  Bad Resource Usage Good: • Easy to get started
  • 100. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 101. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 102. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 103. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Data Source and Sink Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 104. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Submits Hadoop Jobs as: • JAR • Streaming • Cascading • Pig • Hive • Impala Data Source and Sink Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 105. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Submits Hadoop Jobs as: • JAR • Streaming • Cascading • Pig • Hive • Impala Data Source and Sink Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 • No data locality with S3 • AWS Import/Export: send your HDD • HBase Integration • Compatible with Spot and Reserved Instances • Similar: Azure HDInsight
  • 106.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery
  • 107.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery
  • 108.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery Dremel Melnik et al. “Dremel: Interactive analysis of web-scale datasets”, VLDB 2010 Idea: Multi-Level execution tree on nested columnar data format (≥100 nodes)
  • 109.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery Dremel Melnik et al. “Dremel: Interactive analysis of web-scale datasets”, VLDB 2010 Idea: Multi-Level execution tree on nested columnar data format (≥100 nodes) • SLA: 99.9% uptime / month • Fundamentally different from relational DWHs and MapReduce • Design copied by Apache Drill, Impala, Shark
  • 110.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 111.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 112.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific • Synchronous Replication • Automatic Failover
  • 113.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific • Synchronous Replication • Automatic Failover 99,95% uptime SLA
  • 114.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific • Synchronous Replication • Automatic Failover 99,95% uptime SLA Provisioned IOPS: access to EBS volumes network- optimized (up to 4000 IOPS)
  • 115.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 116.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific EC2 instances: Up to 32 Cores, 244 GB RAM, 10 GbE
  • 117.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific EC2 instances: Up to 32 Cores, 244 GB RAM, 10 GbE Minor Version Upgrades are performed without downtime
  • 118.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 119.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific Backups are automated and scheduled
  • 120.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific Backups are automated and scheduled • Support for (asynchronous) Read Replicas • Administration: Web-based or SDKs • Only RDBMSs • “Analytic Brother“ of RDS: RedShift (PDWH)
  • 121.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 122.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 123.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 124.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 125.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 126.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 127.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 128.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Consistency unit (ACID boundary) Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 129.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Consistency unit (ACID boundary) Automatic Partitioning for Keyed Table Groups Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 130.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Consistency unit (ACID boundary) Automatic Partitioning for Keyed Table Groups Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication • SLA: 99.9% uptime / month • Usually Cheaper than RDS (Multi-Tenancy) • Smaller Databases (max. 150 GB) • Rich MSSQL server tooling • Keyed Table Group internal feature only
  • 131.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month       Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL
  • 132.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month  Postgres for Heroku PaaS  Hosted on EC2  No SLAs    Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL Heroku Postgres Pricing: Plan based Underlying DB: Postgres
  • 133.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month  Postgres for Heroku PaaS  Hosted on EC2  No SLAs  MySQL for OpenStack (Icehouse)  Under development (HP driven)  VM ⇔ Tenant Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL Heroku Postgres Pricing: Plan based Underlying DB: Postgres Trove Pricing: Own Hardware Underlying DB: MySQL Trove
  • 134.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month  Postgres for Heroku PaaS  Hosted on EC2  No SLAs  MySQL for OpenStack (Icehouse)  Under development (HP driven)  VM ⇔ Tenant Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL Heroku Postgres Pricing: Plan based Underlying DB: Postgres Trove Pricing: Own Hardware Underlying DB: MySQL Trove Evaluation of Cloud RDBMSs D. Kossmann,T. Kraska: An evaluation of alternative architectures for transaction processing in the cloud”, Sigmod 2010 TPC-W Benchmark (Online Shop), 2010, Emulated Browsers / RPS:
  • 135. HBase Wide- Column CP Over Row Key ~700 1/4 Apache (EMR) MongoDB Doc- ument CP yes >100 <500 4/4 GPL Riak Key- Value AP ~60 3/4 Apache (Softlayer) Cassandra Wide- Column AP With Comp. Index >300 <1000 2/4 Apache Redis Key- Value CA Through Lists, etc. manual N/A 4/4 BSD Managed NoSQL services Model CAP Scans Sec. Indices Largest Cluster Lic. Lear- ning DBaaS
  • 136. HBase Wide- Column CP Over Row Key ~700 1/4 Apache (EMR) MongoDB Doc- ument CP yes >100 <500 4/4 GPL Riak Key- Value AP ~60 3/4 Apache (Softlayer) Cassandra Wide- Column AP With Comp. Index >300 <1000 2/4 Apache Redis Key- Value CA Through Lists, etc. manual N/A 4/4 BSD Managed NoSQL services Model CAP Scans Sec. Indices Largest Cluster Lic. Lear- ning DBaaS And there are many more: • CouchDB (e.g. Cloudant) • CouchBase (e.g. KuroBase Beta) • ElasticSearch(e.g. Bonsai) • Solr (e.g. WebSolr) • …
  • 137. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 138. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 139. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 140. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 141. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 142. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce  Private Process Multi- Tenancy Scale-Up Strategy: RAM, IOPs and CPU increased at runtime Maximum Size: 1TB Free Tier
  • 143. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce  Private Process Multi- Tenancy Scale-Up Strategy: RAM, IOPs and CPU increased at runtime Maximum Size: 1TB Free Tier VM-Deployment (EC2): M1.large on EC2: $128.10 on EC2 with 1-yr-Res.: $30 With MongoHQ: $637
  • 144. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce  #1 thing that should never happen:
  • 145. MongoHQ  Problem: Scalability Client Client configconfigconfig mongos Replica Set Master Slave Slave mongos
  • 146. MongoHQ  Problem: Scalability Client Client configconfigconfig mongos Replica Set Master Slave Slave mongos What if Writes / second or data volume become bottleneck?
  • 147. MongoHQ  Problem: Scalability Client Client configconfigconfig mongos Replica Set Replica Set Master Slave Slave Master Slave Slave mongos Sharding (Scale Out) • Dynamic Scaling: Tenant adds Replica Set • Elastic Scaling: Provider adds/removes Replica Set MongoHQ: Only manual sharding on “contact us” basis
  • 148. MongoHQ  Problem: Scalability Client Client configconfigconfig mongos Replica Set Replica Set Master Slave Slave Master Slave Slave mongos Sharding (Scale Out) • Dynamic Scaling: Tenant adds Replica Set • Elastic Scaling: Provider adds/removes Replica Set MongoHQ: Only manual sharding on “contact us” basis • Bad: no SLAs, no horizontal scaling • Good: Solid Dashboard, Backups, Replication, integration with Heroku • Competitors: MongoLab, ObjectRocket, MongoSoup
  • 149. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“
  • 150. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“ Memcache or RedisMemcache can be run as a cluster (client- side sharding) Limited choice of instance types
  • 151. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“ Announced last Saturday: Snapshots
  • 152. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“
  • 153. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“ $ elasticache-modify-cache-parameter-group xy AWS CLI tools and SDKs offer a superset of Dashboard functions
  • 154.  Many Hosted NoSQL DbaaS Providers represented  Heroku DBaaS Addons
  • 155.  Many Hosted NoSQL DbaaS Providers represented  And Search Heroku DBaaS Addons
  • 156. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App:
  • 157. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon:
  • 158. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon: Use Connection URL (environment variable):
  • 159. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon: Use Connection URL (environment variable): Deploy:
  • 160. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon: Use Connection URL (environment variable): Deploy: • Very simple • Only suited for small to medium applications (no SLAs, limited control)
  • 161. SimpleDB Table- Store CP Yes (as queries) Auto- matic SQL-like (no joins, groups, …) REST + SDKs Dynamo- DB Table- Store CP By range key / index Local Sec. Global Sec. Key+Cond. On Range Key(s) REST + SDKs Automatic over Prim. Key Azure Tables Table- Store CP By range key Key+Cond. On Range Key REST + SDKs Automatic over Part. Key 99.9% uptime AE/Cloud DataStore Entity- Group CP Yes (as queries) Auto- matic Conjunct. of Eq. Predicates REST/ SDK, JDO,JPA Automatic over Entity Groups S3, Az. Blob, GCS Blob- Store AP REST + SDKs Automatic over key 99.9% uptime (S3) Proprietary Database services Model CAP Scans Sec. Indices Queries API SLA Scale- out
  • 162. SimpleDB Table- Store CP Yes (as queries) Auto- matic SQL-like (no joins, groups, …) REST + SDKs Dynamo- DB Table- Store CP By range key / index Local Sec. Global Sec. Key+Cond. On Range Key(s) REST + SDKs Automatic over Prim. Key Azure Tables Table- Store CP By range key Key+Cond. On Range Key REST + SDKs Automatic over Part. Key 99.9% uptime AE/Cloud DataStore Entity- Group CP Yes (as queries) Auto- matic Conjunct. of Eq. Predicates REST/ SDK, JDO,JPA Automatic over Entity Groups S3, Az. Blob, GCS Blob- Store AP REST + SDKs Automatic over key 99.9% uptime (S3) Proprietary Database services Model CAP Scans Sec. Indices Queries API SLA Scale- out There are many more object stores (HP, Rackspace, etc.) …but no comparable Table Stores
  • 163. Azure Storage Load-Balancing-System Application HTTP/HTTPS Worker-Role Web-Role IIS-Webserver Virtual Machines Call Other Services Not Allowed Virtual Machines
  • 164. Azure Storage Load-Balancing-System Application HTTP/HTTPS Worker-Role Web-Role IIS-Webserver Virtual Machines Call Other Services Not Allowed Virtual Machines Tables Blobs Queues
  • 165.  Table Service example: Azure Tables Partition Key Row Key (sortiert) Timestamp (autom.) Property1 Propertyn intro.pdf v1.1 14/6/2013 … … intro.pdf v1.2 15/6/2013 … präs.pptx v0.0 11/6/2013 … Partition Partition RESTAPI
  • 166.  Table Service example: Azure Tables Partition Key Row Key (sortiert) Timestamp (autom.) Property1 Propertyn intro.pdf v1.1 14/6/2013 … … intro.pdf v1.2 15/6/2013 … präs.pptx v0.0 11/6/2013 … Partition Partition RESTAPI SparseHash-distributed to parition servers No Index: Lookup only (!) by full table scan Atomic "Entity- Group Batch Transaction" possible
  • 167.  Similar to Amazon SimpleDB and DynamoDB Table Service example: Azure Tables Partition Key Row Key (sortiert) Timestamp (autom.) Property1 Propertyn intro.pdf v1.1 14/6/2013 … … intro.pdf v1.2 15/6/2013 … präs.pptx v0.0 11/6/2013 … Partition Partition RESTAPI • Indexes all attributes • Rich(er) queries • Many Limits (size, RPS, etc.) • Provisioned Throughput • On SSDs („single digit latency“) • Optional Indexes
  • 168. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries       Azure Tables
  • 169. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries       Azure Tables
  • 170. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica    Azure Tables
  • 171. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica Very good: SLA (99.9% uptime) Internal architecture published Azure Tables
  • 172. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica Very good: SLA (99.9% uptime) Internal architecture published Azure Tables Windows Azure Storage B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011 Idea: • Layered storage infrastructure for Blobs and Tables • Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding)
  • 173. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica Very good: SLA (99.9% uptime) Internal architecture published Azure Tables Windows Azure Storage B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011 Idea: • Layered storage infrastructure for Blobs and Tables • Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding) Think: GFS/HDFS Think: BigTable/HBase Think: Chubby/ZooKeeper
  • 174. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB Limitations: • Slow (~50-100 RPS) • 10 GB per Domain • Query result max. 2500 records or 1 MB • Max. 1K-sized attributes
  • 175. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB dom:com.cnn content : "<html>…" Primary Key Attribute (scalar or set) page:index Range Key Item:
  • 176. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB dom:com.cnn content : "<html>…" Primary Key Attribute (scalar or set) Querying Options: • GetItem: Key Lookup • Query: Primary Key + Condition on Range Key • Scan: Full Table Scan with filter • EMR: Hive queries (for analytics) page:index Range Key Item:
  • 177. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB  Consistency:  Strongly (2x price) or Eventually Consistent Reads  Atomic (Conditional) Updates per Item  Indexing Options: ◦ Local Sec. Index: consistent additional Range Key ◦ Global Sec. Index: eventually consistent index-table (Primary Key) dom:com.cnn content : "<html>…" Primary Key Attribute (scalar or set) page:index Range Key Item:
  • 178. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB
  • 179. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB
  • 180. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB
  • 181. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB
  • 182. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB Unit of Billing
  • 183. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB Unit of Billing Good: • Low Latency (SSD) • Data partitioning and AZ-replication Bad: • Scaling not elastic (Capacity Units) • No SLAs, no internals published (≠ Dynamo!) • No built-in backups ( AWS data pipeline) • Vendor Lock-in
  • 184. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus 
  • 185. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus 
  • 186. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus 
  • 187. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus  Schemafree Entity Group (EG) data model: User ID Name Photo ID User URL Root Table Child Table 1 n
  • 188. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus  Schemafree Entity Group (EG) data model: User ID Name Photo ID User URL Root Table Child Table 1 n EG: User + n Photos • Unit of ACID transactions/ consistency • Fields autoindexed (eventually consistent)
  • 189. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus  Schemafree Entity Group (EG) data model: User ID Name Photo ID User URL Root Table Child Table 1 n EG: User + n Photos • Unit of ACID transactions/ consistency • Fields autoindexed (eventually consistent) SELECT * FROM photos WHERE ANCESTOR IS :34 AND name = „sunset“ ORDER BY date ASC LIMIT 10 OFFSET 10
  • 190. AE/Cloud DataStore  Internally: Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable
  • 191. AE/Cloud DataStore  Internally: Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable
  • 192. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable
  • 193. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable Spanner J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013 Idea: • Autosharded Entity Groups • Not based on BigTable Implementation: • TrueTime API (GPS + atomic clocks)  commit timestamps of 2PL-SI transactions • Paxos-replication per Shard
  • 194. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable Spanner J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013 Idea: • Autosharded Entity Groups • Not based on BigTable Implementation: • TrueTime API (GPS + atomic clocks)  commit timestamps of 2PL-SI transactions • Paxos-replication per Shard F1 J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB 2013 Idea: • Full SQL relational database built on Spanner, powers AdWords Implementation: • 5-way replication • Data Model: relational + hierarchy (customercampaignAdGroup) • Transactions: Snapshot-Read-Only, Pessimistic (Spanner), optimistic • Distributed SQL Engine
  • 195. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable Spanner J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013 Idea: • Autosharded Entity Groups • Not based on BigTable Implementation: • TrueTime API (GPS + atomic clocks)  commit timestamps of 2PL-SI transactions • Paxos-replication per Shard F1 J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB 2013 Idea: • Full SQL relational database built on Spanner, powers AdWords Implementation: • 5-way replication • Data Model: relational + hierarchy (customercampaignAdGroup) • Transactions: Snapshot-Read-Only, Pessimistic (Spanner), optimistic • Distributed SQL Engine Good: • Transactions (though limited) • Good scalability of data volume Bad: • Entity Groups hard to define • Bad Scale-Out for write and reads (Kossmann et al.)  no advantage over RDBMS for small data volumes A Spanner/F1 based DBaaS?
  • 196.  Idea: Blobs with RESTful CRUD Interface  Distribution: 𝐵𝑢𝑐𝑘𝑒𝑡, 𝐾𝑒𝑦 → 𝑆𝑒𝑟𝑣𝑒𝑟 + 𝑅𝑒𝑝𝑙𝑖𝑐𝑎𝑠  CDN Integration (Azure CDN, Cloudfront)  S3: > 2 Trillion (2 ⋅ 1012) objects Azure Blobs, Amazon S3, Google Cloud Storage Container Blobcontains Block 1 (up to 4 MB) Block 2 Block n REST-Request Azure Blobs: Block Blob: 4MB blocks  large files Page Blob: 512B blocks  random IO
  • 197.  Automatic Versioning  Pricing: per request (10.000 ~ 1c) and storage (1GB/month ~ 3c) and network (1GB ~ 10c) Azure Blobs, Amazon S3, Google Cloud Storage S3 Blob DELETE /puppy.jpg HTTP/1.1 Host: mybucket.s3.amazonaws.com Authorization: AWS AKIAIO... AWS Ireland DC Replicas Reduced Redundancy: only 1 replica Glacier: tape-disk archivalAmazon S3:
  • 198.  Automatic Versioning  Pricing: per request (10.000 ~ 1c) and storage (1GB/month ~ 3c) and network (1GB ~ 10c) Azure Blobs, Amazon S3, Google Cloud Storage S3 Blob DELETE /puppy.jpg HTTP/1.1 Host: mybucket.s3.amazonaws.com Authorization: AWS AKIAIO... AWS Ireland DC Replicas Reduced Redundancy: only 1 replica Glacier: tape-disk archivalAmazon S3: S3 Consistency D. Bermbach,, S. Tai. "Eventual consistency: How soon is eventual?”, MW4SOC 11 Findings: • Inconsistency window varies from 2-11 seconds • Monotonic Read Consistency is often violated
  • 199.  Automatic Versioning  Pricing: per request (10.000 ~ 1c) and storage (1GB/month ~ 3c) and network (1GB ~ 10c) Azure Blobs, Amazon S3, Google Cloud Storage S3 Blob DELETE /puppy.jpg HTTP/1.1 Host: mybucket.s3.amazonaws.com Authorization: AWS AKIAIO... AWS Ireland DC Replicas Reduced Redundancy: only 1 replica Glacier: tape-disk archivalAmazon S3: S3 Consistency D. Bermbach,, S. Tai. "Eventual consistency: How soon is eventual?”, MW4SOC 11 Findings: • Inconsistency window varies from 2-11 seconds • Monotonic Read Consistency is often violated Building a database on S3 M. Brantner, et al. "Building a database on S3." Sigmod 2008 Idea: • Use S3 as the persistent storage of a database Implementation: • Buffer Pool and Log Manager on S3 • No transaction or query support
  • 200.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST
  • 201.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST Parse Core
  • 202.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST Parse Core Parse Analytics
  • 203.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST Parse Core Parse Analytics Parse Push
  • 204. Parse - MBaaS
  • 205. Authentication User + Password OAuth: Facebook, Twitter Parse - MBaaS
  • 206. Authentication User + Password OAuth: Facebook, Twitter Parse - MBaaS Query Cinemas (new Parse.Query('Cinemas')) .withinKilometers(...) .fetch()
  • 207. Authentication User + Password OAuth: Facebook, Twitter Parse - MBaaS Query Cinemas (new Parse.Query('Cinemas')) .withinKilometers(...) .fetch() Query Movies (new Parse.Query('Movies')) .greaterThan('startAt‘, now) .notEqualTo('cinemas', cId) .fetch()
  • 208. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 209. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 210. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 211. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… if (Meteor.isClient) { Template.leaderboard.players = function () { return Players.find({}, {sort: {score: -1, name: 1}}); }; <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 212. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… if (Meteor.isClient) { Template.leaderboard.players = function () { return Players.find({}, {sort: {score: -1, name: 1}}); }; <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket $ meteor deploy abc.meteor.com
  • 213. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… if (Meteor.isClient) { Template.leaderboard.players = function () { return Players.find({}, {sort: {score: -1, name: 1}}); }; <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket $ meteor deploy abc.meteor.com Very productive for very small projects Fundamentally limited scalability: • Server tails Mongo‘s oplog • And holds the fetched data state of every client
  • 214. Outline • Hot Topics • Orestes: a scalable, low- latency architecture • Baqend: putting it into practice What are Cloud Databases? Cloud Databases in the wild Research Perspectives Wrap-up and literature
  • 215.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries
  • 216.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries
  • 217.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries Relational Cloud C. Curino, et al. "Relational cloud: A database-as-a-service for the cloud.“, CIDR 2011 DBaaS Architecture: • Encrypted with CryptDB • Multi-Tenancy through live migration • Workload-aware partitioning (graph-based)
  • 218.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries Relational Cloud C. Curino, et al. "Relational cloud: A database-as-a-service for the cloud.“, CIDR 2011 DBaaS Architecture: • Encrypted with CryptDB • Multi-Tenancy through live migration • Workload-aware partitioning (graph-based) • Early approach • Not adopted in practice, yet Dream solution: Full Homorphic Encryption
  • 219. Transactions/Consistency: Research Dynamo Eventual None 1 RT - Yahoo PNuts Timeline per key Single Key 1 RT possible COPS Causality Multi-Record 1 RT possible MySQL (async) Serializable Static Partition 1 RT possible Megastore Serializable Static Partition 2 RT - Spanner/F1 Snapshot Isolation Partition 2 RT - MDCC Read-Commited Multi-Record 1 RT - Consistency Transactional Unit Commit Latency Data Loss?
  • 220. Transactions/Consistency: Research Dynamo Eventual None 1 RT - Yahoo PNuts Timeline per key Single Key 1 RT possible COPS Causality Multi-Record 1 RT possible MySQL (async) Serializable Static Partition 1 RT possible Megastore Serializable Static Partition 2 RT - Spanner/F1 Snapshot Isolation Partition 2 RT - MDCC Read-Commited Multi-Record 1 RT - Consistency Transactional Unit Commit Latency Data Loss? Multi-Data Center Consistency T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013. Idea: • Multi-Data center commit protocol with single round-trip Implementation: • Optimistic Commit Protocol • Fast, Generalized Multi-Paxos Result: almost as fast as Dynamo-style
  • 221. Transactions/Consistency: Research Dynamo Eventual None 1 RT - Yahoo PNuts Timeline per key Single Key 1 RT possible COPS Causality Multi-Record 1 RT possible MySQL (async) Serializable Static Partition 1 RT possible Megastore Serializable Static Partition 2 RT - Spanner/F1 Snapshot Isolation Partition 2 RT - MDCC Read-Commited Multi-Record 1 RT - Consistency Transactional Unit Commit Latency Data Loss? Multi-Data Center Consistency T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013. Idea: • Multi-Data center commit protocol with single round-trip Implementation: • Optimistic Commit Protocol • Fast, Generalized Multi-Paxos Result: almost as fast as Dynamo-style Currently no NoSQL DB implements consistent Multi-DC replication
  • 222.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Data Store
  • 223.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Data Store Threads Stats
  • 224.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Workload: 1. Operation Mix 2. Record Size 3. Popularity Distribution Runtime Parameters: DB host name, threads, etc. Data Store Threads Stats
  • 225.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Workload: 1. Operation Mix 2. Record Size 3. Popularity Distribution Runtime Parameters: DB host name, threads, etc. Read() Insert() Update() Delete() Scan() Data Store Threads Stats DB protocol
  • 226.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Workload: 1. Operation Mix 2. Record Size 3. Popularity Distribution Runtime Parameters: DB host name, threads, etc. Read() Insert() Update() Delete() Scan() Data Store Threads Stats DB protocol Workload Operation Mix Distribution Example A – Update Heavy Read: 50% Update: 50% Zipfian Session Store B – Read Heavy Read: 95% Update: 5% Zipfian Photo Tagging C – Read Only Read: 100% Zipfian User Profile Cache D – Read Latest Read: 95% Insert: 5% Latest User Status Updates E – Short Ranges Scan: 95% Insert: 5% Zipfian/ Uniform Threaded Conversations
  • 227.  Example Result (Read Heavy): Benchmarking: Research
  • 228.  Example Result (Read Heavy): Benchmarking: Research Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement
  • 229.  Example Result (Read Heavy): Benchmarking: Research YCSB++ S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011 • Clients coordinate through Zookeeper • Simple Read-After-Write Checks • Evaluation: Hbase & Accumulo Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement
  • 230.  Example Result (Read Heavy): Benchmarking: Research YCSB++ S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011 • Clients coordinate through Zookeeper • Simple Read-After-Write Checks • Evaluation: Hbase & Accumulo Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement • No Transaction Support YCSB+T A. Dey et al. “YCSB+T: Benchmarking Web-Scale Transactional Databases”, CloudDB 2014 • New workload: Transactional Bank Account • Simple anomaly detection for Lost Updates • No comparison of systems
  • 231.  Example Result (Read Heavy): Benchmarking: Research YCSB++ S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011 • Clients coordinate through Zookeeper • Simple Read-After-Write Checks • Evaluation: Hbase & Accumulo Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement • No Transaction Support YCSB+T A. Dey et al. “YCSB+T: Benchmarking Web-Scale Transactional Databases”, CloudDB 2014 • New workload: Transactional Bank Account • Simple anomaly detection for Lost Updates • No comparison of systems No specific application CloudStone, CARE, TPC extensions?
  • 232. Benchmarking: state of the art
  • 233. Benchmarking: state of the art Tests latency and throughput of IaaS-Providers, CDNs and Object Stores
  • 234. Benchmarking: state of the art Tests latency and throughput of IaaS-Providers, CDNs and Object Stores Does not test: Cloud Databases
  • 235. Vision YCSB Harmony Idea:
  • 236. Vision YCSB Harmony YCSB Client YCSB Client YCSB Client YCSB Client Idea: 1. Periodically launch clients
  • 237. Vision YCSB Harmony YCSB Client YCSB Client YCSB Client YCSB Client Idea: 1. Periodically launch clients 2. Benchmark Cloud DB 3. Publish results
  • 238. Vision YCSB Harmony System Availablity Reads/s Writes/s Avg. Latency 95th perc. Latency Plot DynamoDB 99.9% 23411 34534 3.2 ms 9 ms RDS 99.8% 2342 2455 30 ms 80 ms Azure Table 99.5% 22343 23442 12 ms 20 ms Google DataStore 99.5% 3000 2000 30 ms 300 ms
  • 239. Database research at the University of Hamburg
  • 240. Motivation Classic 3-Tier-ArchitectureThree-Tier Architecture
  • 241. Motivation ClientApplicationDatabase Web Server Web Server Classic 3-Tier-ArchitectureThree-Tier Architecture
  • 242. Motivation ClientApplicationDatabase Web Server Web Server Classic 3-Tier-ArchitectureThree-Tier Architecture
  • 243. Motivation ClientApplicationDatabase Web Server Web Server High Latency Classic 3-Tier-ArchitectureThree-Tier Architecture
  • 244. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier Architecture
  • 245. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%. Study by Amazon
  • 246. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%. Study by Amazon When increasing load time of search results by 500ms, traffic decreases by 20%. Study by Google
  • 247. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%. Study by Amazon When increasing load time of search results by 500ms, traffic decreases by 20%. Study by Google
  • 248. SPAs Rich ClientApplicationDatabase Web Server Web Server Single-Page Applications High Latency (Rich Client, Smart Client) Data (e.g. JSON) (AngularJS, Backbone, Ember.JS, etc.)
  • 249. ORESTES Rich ClientOrestesDB-Cluster REST Server Low Latency Cloud REST Server
  • 250. ORESTES Rich ClientOrestesDB-Cluster REST Server Web- Caches Low Latency Cloud REST Server Cacheable Database Objects
  • 251. ORESTES Rich ClientOrestesDB-Cluster REST Server Web- Caches Low Latency Scalable NoSQL DBs Cloud REST Server Cacheable Database Objects
  • 252. ORESTES Rich ClientOrestesDB-Cluster REST Server Web- Caches Low Latency Scalable NoSQL DBs Cloud Exposes the DB via REST API and handles: • Scaling • Cache Consistency • Transactions • Schema REST Server Cacheable Database Objects
  • 253.  Middleware: ◦ Scalable REST API for aggregate-oriented persistence, queries, schema management and transactions.  Caching: ◦ Consistent web caching of database objects for read scalability and latency reduction.  Transactions: ◦ Optimistic cache-aware transaction model. ORESTES: Components
  • 254. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store
  • 255. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server
  • 256. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store ISP Forward-Proxy Caches ISP Caches Reverse-Proxy Caches and Load Balancers CDN Caches Content Delivery Networks Purge Scale HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server
  • 257. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store ISP Forward-Proxy Caches ISP Caches Reverse-Proxy Caches and Load Balancers CDN Caches Content Delivery Networks Purge Scale HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server Redis (Replicated) 10201040 10101010Counting Bloom Filter add delete Node.JS (local to Server) Stored Procedures Custom Validation
  • 258. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store ISP Forward-Proxy Caches ISP Caches Reverse-Proxy Caches and Load Balancers CDN Caches Content Delivery Networks Purge Scale HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server Redis (Replicated) 10201040 10101010Counting Bloom Filter add delete Node.JS (local to Server) Stored Procedures Custom Validation GET /db/{bucket}/{class}/{id} 200 OK Cache-Control: public, max-age=6000 ETag: "3" JSON Object
  • 259. Caching Orestes Client- (Browser-) Cache Proxy Caches ISP Caches CDN Caches Reverse- Proxy Caches Miss Hit Miss Miss Miss Miss 100% 50% 0% P(Cache-Hit) 0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
  • 260. Caching Orestes Client- (Browser-) Cache Proxy Caches ISP Caches CDN Caches Reverse- Proxy Caches Miss Hit Miss Miss Miss Miss 100% 50% 0% P(Cache-Hit) 0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms em.find(id) JavaScript
  • 261. Caching Orestes Client- (Browser-) Cache Proxy Caches ISP Caches CDN Caches Reverse- Proxy Caches Miss Hit Miss Miss Miss Miss 100% 50% 0% P(Cache-Hit) 0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms GET /db/posts/{id} HTTP
  • 262. Caching Orestes Client- (Browser-) Cache Proxy Caches ISP Caches CDN Caches Reverse- Proxy Caches Miss Hit Miss Miss Miss Miss 100% 50% 0% P(Cache-Hit) 0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms Cache-Hit: Return Object Cache-Miss: Forward Request
  • 263. Caching Orestes Client- (Browser-) Cache Proxy Caches ISP Caches CDN Caches Reverse- Proxy Caches Miss Hit Miss Miss Miss Miss 100% 50% 0% P(Cache-Hit) 0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms Fetch object from DB and return it with caching information
  • 264. Caching Orestes Client- (Browser-) Cache Proxy Caches ISP Caches CDN Caches Reverse- Proxy Caches Miss Hit Miss Miss Miss Miss 100% 50% 0% P(Cache-Hit) 0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms Scalability and Cache-Hits
  • 265. Caching Orestes Client- (Browser-) Cache Proxy Caches ISP Caches CDN Caches Reverse- Proxy Caches Miss Hit Miss Miss Miss Miss 100% 50% 0% P(Cache-Hit) 0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms Scalability and Cache-Hits Latency Benefit
  • 266. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 267. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 How to prevent stale reads (inconsistency)?
  • 268. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 269. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 270. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 271. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) hashB(oid)hashA(oid) 13
  • 272. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110 Flat(Counting Bloomfilter)
  • 273. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110 hashB(oid)hashA(oid)
  • 274. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110 hashB(oid)hashA(oid)
  • 275. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110
  • 276. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 hashB(oid)hashA(oid) 1 1 110
  • 277. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 hashB(oid)hashA(oid) 1 1 110 𝑓 ≈ 1 − 𝑒− 𝑘𝑛 𝑚 𝑘 𝑘 = ln 2 ⋅ ( 𝑛 𝑚 ) False-Positive Rate: Hash- Functions: With 10.000 Updates per 10 minutes and 1% error rate: 12 KByte
  • 278. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 hashB(oid)hashA(oid) 1 1 110
  • 279. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client
  • 280. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1
  • 281. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Writes (Hidden)
  • 282. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden)
  • 283. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden) validation 4 E.g. Redis or ZooKeeper prevent conflicting validations
  • 284. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden) validation 4 E.g. Redis or ZooKeeper 5Writes (Public) Read all prevent conflicting validations
  • 285. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden) validation 4 E.g. Redis or ZooKeeper 5Writes (Public) Read all prevent conflicting validations Caching → Shorter transaction duration → less aborts
  • 286. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator
  • 287. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator
  • 288. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator
  • 289. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator Results: Article-Objects with Impression Count Article ID Title … Imp. Imp. ID MongoDB Redis Sorted Set Speedup with PPM: • 50-1000% • 66% performance of Varnish
  • 290. Cloud Evaluation of ORESTES Client Machine 50 ... Web Cache Orestes Server Versant DB Amazon EC2 Ireland EC2 USA165 ms Client Machine Client Machine
  • 291. Cloud Evaluation of ORESTES Client Machine 50 ... Web Cache Orestes Server Versant DB Amazon EC2 Ireland EC2 USA165 ms Client Machine Client Machine 30 000 Objekte 500 Anfragen/ Client 30 000 Objects 500 Req./Clients 10/1 Read/Write
  • 292. Cloud Evaluation of ORESTES Client Machine 50 ... Web Cache Orestes Server Versant DB Amazon EC2 Ireland EC2 USA165 ms Client Machine Client Machine
  • 293.  Stateless Scale-out REST middleware for DBaaS  Generic Schema, Authentication, Multi-Tenancy, etc.  SCOT (Scalable Optimistic Cache-Aware Transactions): ◦ Optimistic Database-indepdent ACID transactions  BFB (Bloom Filter Bounded Staleness): ◦ Allows static caching with tunable consistency guarantee  PPM (Polyglot Persistence Mediator): ◦ SLAs  appropriate database Orestes: Summary
  • 294. From Research to Practice
  • 295.  Orestes as a startup Baqend
  • 296.  Orestes as a startup Baqend Internet Seoxy REST-API Transactions Schema Management Cache Consistency Auto-Scaling Multi-Tenancy Security and Access Control Provisioning
  • 297.  Orestes as a startup Baqend
  • 298.  Orestes as a startup Baqend
  • 299.  Orestes as a startup Baqend
  • 300.  Orestes as a startup Baqend
  • 301.  Orestes as a startup Baqend
  • 302. Baqend in Action
  • 303. Baqend in Action GET /app.html
  • 304. Baqend in Action GET /app.html db.find(Menu, 'main') .done(...); db.find(Page, 'hero') .done(...); db.query(Page, 'top3') .done(...);
  • 305. Baqend in Action GET /app.html db.find(Menu, 'main') .done(...); db.find(Page, 'hero') .done(...); db.query(Page, 'top3') .done(...); GET /img/pic005.jpg GET /img/pic017.jpg GET /img/pic022.jpg
  • 306. More: Baqend.com
  • 307. Wrap-up & book recommendations
  • 308. How to choose a cloud database: Wrap-up Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store
  • 309. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store
  • 310. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store 1. Underyling DB 2. Docs & books 3. Your own tests Evaluate by:
  • 311. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store 1. Underyling DB 2. Docs & books 3. Your own tests 4. SLAs 5. Docs & books 6. Experience Reports Evaluate by:
  • 312. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store 1. Underyling DB 2. Docs & books 3. Your own tests 4. SLAs 5. Docs & books 6. Experience Reports Evaluate by: Try it
  • 313. Book recommendations
  • 314. Book recommendations • (Non-scientific) literature is rare • Some books cover specific DBaaS and cloud platforms
  • 315. Blogs http://nosql.mypopescu.com/ http://www.dzone.com/mz/nosql http://www.dbms2.com/ http://hackingdistributed.com/ http://highscalability.com/ http://www.nosqlweekly.com/
  • 316. VLDB (Very Large Databases) SIGMOD (Special Interest Group on Management of Data) ICDE (International Conference on Data Engineering) CIDR (Conference on Innovative Data Systems Research) SOCC (Symposium on Cloud Computing) OSDI/SOSP (Operating Systems Design and Implementation/ Symposium on Operating System Principles) EuroSys Top Scientific Conferences Database Research Distributed Systems Research
  • 317. VLDB (Very Large Databases) SIGMOD (Special Interest Group on Management of Data) ICDE (International Conference on Data Engineering) CIDR (Conference on Innovative Data Systems Research) SOCC (Symposium on Cloud Computing) OSDI/SOSP (Operating Systems Design and Implementation/ Symposium on Operating System Principles) EuroSys Top Scientific Conferences Database Research Distributed Systems Research This year probably in Washington D.C. Learn more: scdm2013.com
  • 318. Thank you. Queries? Contact: felix.gessert@baqend.com http://baqend.com http://orestes.info http://scdm2013.com