Felix Gessert is a PhD student in the database group at the University of Hamburg. His PhD research focuses on cloud database startups. In the presentation, Gessert categorizes different types of cloud databases, including database-as-a-service, infrastructure-as-a-service, platform-as-a-service, managed RDBMS/DWH/NoSQL databases, proprietary databases/object stores, backend-as-a-service, analytics-as-a-service, and cloud-deployed databases. He then discusses common aspects of database-as-a-service, including multi-tenancy approaches, billing models, authentication, authorization, and service level agreements.
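One multi-tenancy approach commonly used in database-as-a-service is a shared database with a shared schema: every row carries a tenant identifier, and every read and write is scoped to it. The sketch below is a minimal illustration of that approach only, using Python's built-in sqlite3 as a stand-in database; the table name `documents` and the class `TenantScopedStore` are hypothetical and not taken from the presentation.

```python
import sqlite3


class TenantScopedStore:
    """Hypothetical shared-table store: all tenants share one table,
    and every query filters on tenant_id."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        # The composite primary key keeps tenants apart even if doc_ids collide.
        self.conn.execute(
            "CREATE TABLE documents ("
            " tenant_id TEXT NOT NULL,"
            " doc_id TEXT NOT NULL,"
            " body TEXT NOT NULL,"
            " PRIMARY KEY (tenant_id, doc_id))"
        )

    def put(self, tenant_id: str, doc_id: str, body: str) -> None:
        # Replaces only the row matching this tenant's (tenant_id, doc_id).
        self.conn.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
            (tenant_id, doc_id, body),
        )

    def get(self, tenant_id: str, doc_id: str):
        row = self.conn.execute(
            "SELECT body FROM documents WHERE tenant_id = ? AND doc_id = ?",
            (tenant_id, doc_id),
        ).fetchone()
        return row[0] if row else None

    def list_ids(self, tenant_id: str) -> list:
        rows = self.conn.execute(
            "SELECT doc_id FROM documents WHERE tenant_id = ? ORDER BY doc_id",
            (tenant_id,),
        ).fetchall()
        return [r[0] for r in rows]


store = TenantScopedStore()
store.put("acme", "invoice-1", "Acme invoice")
store.put("globex", "invoice-1", "Globex invoice")
print(store.get("acme", "invoice-1"))  # Acme invoice
print(store.list_ids("globex"))        # ['invoice-1']
print(store.get("acme", "missing"))    # None
```

The trade-off behind this choice is density versus isolation: a shared table packs many tenants onto few resources, while a separate database or process per tenant isolates them more strongly at a higher cost per tenant.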
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve... (Felix Gessert)
The unprecedented scale at which data is consumed and generated today has revealed a large demand for scalable data management and given rise to non-relational, distributed "NoSQL" database systems. Two central problems triggered this process: 1) vast amounts of user-generated content in modern applications and the resulting request loads and data volumes, and 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer number of these systems, now commonly referred to as NoSQL data stores, make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases to provide a comparative classification and highlight open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling, and querying characteristics. We present how each system's design is governed by a central set of trade-offs over irreconcilable system properties. We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems could already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges.
If you'd like to use these slides, e.g., for teaching, contact us at gessert at informatik.uni-hamburg.de and we'll send you the PowerPoint.
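The tutorial abstract above says each system's design is governed by trade-offs between irreconcilable properties. One well-known example in Dynamo-style replicated stores (chosen here as an illustration, not taken from the tutorial) is the quorum condition: with N replicas, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to share at least one replica whenever R + W > N. Raising R or W strengthens the guarantee that reads see the latest write, at the price of latency and availability. The small brute-force check below confirms the overlap rule for small N; the function name is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every read set of size r intersect
    every write set of size w among n replicas?"""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            if not set(write_set) & set(read_set):
                return False  # this read could miss the latest write
    return True


# Compare the brute-force result with the closed-form rule R + W > N.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3), (5, 2, 3), (5, 3, 3)]:
    print(n, r, w, quorums_always_overlap(n, r, w), r + w > n)
```

Both columns agree for every configuration: for example, N = 3 with R = W = 2 always overlaps, while N = 5 with R = 2 and W = 3 does not.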
Web Performance: The Most Effective Techniques from Practice (Felix Gessert)
An average web page loads 2299 KB of data and makes 100 HTTP requests to do so. Nobody these days doubts that load times have an immense impact on user satisfaction and business metrics. But opinions differ widely on which techniques effectively minimize load times. This talk gives a detailed overview of the most important web performance optimization techniques, from the Critical Rendering Path to distributed caching infrastructures, using a real-world example.
Cache Sketches: Using Bloom Filters and Web Caching Against Slow Load Times (Felix Gessert)
As of April 2016, an average web page loads 2299 KB of data and makes 100 HTTP requests to do so. Nobody these days doubts that load times have an immense impact on user satisfaction and business metrics. But opinions differ widely on which techniques can effectively minimize load times. We would like to present a completely new approach developed over 5 years of research at the Department of Informatics at the University of Hamburg. The idea behind it is arguably the oldest performance optimization in computer science: caching. What is new about the method is that a few algorithmic tricks enable all kinds of existing web caches, from the browser to the CDN, to always deliver up-to-date data instead of distributing long-outdated content based on rule-of-thumb TTLs. We would like to discuss the "Cache Sketch" approach, which is based on Bloom filters, real-time query matching, and machine learning, in detail and show how it can drastically speed up modern web applications.
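The Cache Sketch approach described above rests on Bloom filters: compact bit arrays that answer either "possibly in the set" or "definitely not in the set". A minimal sketch of how such a filter could flag potentially stale cached resources is shown below, assuming that the server adds the key (here a URL) of every resource updated while cached copies may still be live, and that the client revalidates any URL the filter reports. This illustrates the general idea only and is not Baqend's implementation; the class name, filter size, hashing scheme, and `fetch_strategy` helper are hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per key.
    False positives are possible; false negatives are not."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step odd
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Hypothetical server side: record URLs updated while caches may hold old copies.
stale_sketch = BloomFilter()
stale_sketch.add("/api/products/42")


# Hypothetical client side: use the cache unless the sketch flags the URL.
def fetch_strategy(url: str) -> str:
    return "revalidate" if stale_sketch.might_contain(url) else "use cache"


print(fetch_strategy("/api/products/42"))  # revalidate
print(fetch_strategy("/api/products/7"))   # very likely "use cache"
```

A false positive only costs one unnecessary revalidation, while a URL that was added to the filter is never missed, so errors in this design fall on the side of freshness.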
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons (DataStax)
We'll be covering some aspects of our architecture, highlighting differences between MongoDB and Cassandra. We'll go in depth to explain why Cassandra is a better choice for our general-purpose Application Platform (SHIFT) as well as our Media Buying Analytics tool (the SHIFT Media Manager). We'll go over common design patterns that people coming from a MongoDB background might be familiar with and highlight how Cassandra can be used as a better alternative. We'll also touch on cqlengine, which is nearing feature completeness as the Cassandra object mapper for Python.
Slides from workshop held on 12/14 in Asbury Park, NJ
http://www.meetup.com/Jersey-Shore-Tech/events/148118762/?gj=ro2_e&a=ro2_gnl&rv=ro2_e&_af_eid=148118762&_af=event
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A... (MongoDB)
MongoDB introduces new capabilities that change the way microservices interact with the database, capabilities that are either absent or only partially present in high-end commercial databases such as Oracle. In this session I will share my experiences building a cloud-based, multi-tenant SaaS application with extreme security requirements. We will cover topics including considerations for storing multi-tenant data in the database, best practices for authentication and authorization, and performance considerations specific to security in MongoDB.
Apache Druid ingests and enables instant queries on many billions of events in real time. But how? In this talk, each of the components of an Apache Druid cluster is described, along with the data and query optimisations at its core that unlock fresh, fast data for all.
Bio: Peter Marshall (https://linkedin.com/in/amillionbytes/) leads outreach and engineering across Europe for Imply (http://imply.io/), a company founded by the original developers of Apache Druid. He has 20 years of architecture experience in CRM, EDRM, ERP, EIP, Digital Services, Security, BI, Analytics, and MDM. He is TOGAF certified and has a BA (Hons) degree in Theology and Computer Studies from the University of Birmingham in the United Kingdom.
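One of the ingestion-time data optimisations Druid supports is rollup: rows that share the same truncated timestamp and the same dimension values are pre-aggregated into a single row, which can shrink stored data considerably. The plain-Python sketch below only illustrates that idea on hypothetical click events truncated to the hour; it is not Druid's ingestion code, and the field names (`country`, `device`, `bytes`) are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def rollup_hourly(events):
    """Pre-aggregate events that share the same hour and dimension values,
    keeping a row count and a summed metric (similar in spirit to Druid rollup)."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in events:
        # Truncate the timestamp to the hour (the granularity in this sketch).
        hour = datetime.fromisoformat(event["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        key = (hour.isoformat(), event["country"], event["device"])
        totals[key]["count"] += 1
        totals[key]["bytes"] += event["bytes"]
    return [
        {"timestamp": ts, "country": country, "device": device, **metrics}
        for (ts, country, device), metrics in sorted(totals.items())
    ]


events = [
    {"timestamp": "2020-07-16T10:05:00", "country": "DE", "device": "ios", "bytes": 100},
    {"timestamp": "2020-07-16T10:40:00", "country": "DE", "device": "ios", "bytes": 250},
    {"timestamp": "2020-07-16T10:59:59", "country": "US", "device": "web", "bytes": 80},
    {"timestamp": "2020-07-16T11:01:00", "country": "DE", "device": "ios", "bytes": 50},
]
for row in rollup_hourly(events):
    print(row)
```

Four raw events collapse into three stored rows; the two DE/ios events in the 10:00 hour become one row with a count of 2. The cost is that individual events inside a rolled-up row can no longer be queried.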
MongoDB was conceived for the cloud age. Making sure that MongoDB is compatible and performant across cloud providers is essential to achieve complete integration with platforms and systems. Azure is one of the biggest IaaS platforms available and is very popular among developers who work on the Microsoft stack.
NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko (Data Con LA)
Abstract: Should you use a SQL or a NoSQL engine? With MySQL Document Store, you can do both. In this talk we will introduce MySQL Document Store and discuss its advantages and downsides compared to purpose-built document store database engines such as MongoDB.
Building a Real-Time Gaming Analytics Service with Apache Druid (Imply)
At GameAnalytics we receive and process real-time behavioural data from more than 100 million daily active users, helping thousands of game studios and developers understand user behaviour and improve their games. In this talk, you will learn how we migrated our legacy backend from an in-house streaming analytics service to Apache Druid, and the lessons learned along the way. By adopting Druid, we have been able to reduce development costs, increase the reliability of our systems, and implement new features that would not have been possible with our old stack. We will provide an overview of our approach to schema design, segment optimisation, the creation of our query layer, caching, and datasource optimisation, which can help you better understand how to successfully use Druid as a key component of your data processing and reporting infrastructure.
Matt Sarrel of Imply draws on his work benchmarking Apache Druid with the Star Schema Benchmark (SSB) and shows how you can performance test Druid with your workload. Virtual meetup of July 16, 2020.
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018 (Charles Allen)
Charles Allen covers data processing, analytics, and insights systems at Snap. The strengths of Druid for these use cases are called out, as are differences among some of the processing systems used.
These are the slides from the second talk at:
https://www.meetup.com/druidio-la/events/254080924/
Webinar: High Performance MongoDB Applications with IBM POWER8 (MongoDB)
Innovative companies are building Internet of Things, mobile, content management, single view, and big data apps on top of MongoDB. In this session, we'll explore how the IBM POWER8 platform brings new levels of performance and ease of configuration to these solutions which already benefit from easier and faster design and development using MongoDB.
Dyn delivers exceptional Internet performance. Enabling high-quality services requires data centers around the globe. To manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple data centers to enable sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has held up through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements that led them to choose DSE as their go-to Big Data solution, the path that led to Spark, and the lessons they have learned in the process.
Apache Druid®: A Dance of Distributed Processes (Imply)
Apache Druid® is an open source analytics database powering fresh, fast analytics in companies from Airbnb to Zeotap on clickstream, telemetry, financial transactions, applications, and more. In this talk, we open the box on the three distributed processes in Druid, led by the coordinator, overlord, and broker, and the ways these come together to deliver reliable, performant query, ingestion, and management services.
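Of the processes named above, the broker is the one that answers queries: it fans a query out to the data processes serving the relevant segments and merges their partial results. The Python sketch below illustrates that scatter-gather pattern with a thread pool and in-memory "segments"; the node names, the segment layout, and the `query_segment` and `broker_query` helpers are hypothetical simplifications, not Druid's actual broker code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data held by different data processes: (page, views) rows.
SEGMENTS = {
    "historical-1": [("home", 10), ("search", 4)],
    "historical-2": [("home", 3), ("checkout", 2)],
    "realtime-1": [("search", 1), ("home", 1)],
}


def query_segment(node: str) -> Counter:
    """Scatter step: each node aggregates only the rows it holds."""
    partial = Counter()
    for page, views in SEGMENTS[node]:
        partial[page] += views
    return partial


def broker_query(nodes) -> Counter:
    """Gather step: query the nodes in parallel and merge the partial sums."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(query_segment, nodes))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts
    return merged


print(broker_query(SEGMENTS).most_common())
```

Merging works here because sums are associative; an aggregate such as an average would need each node to return a (sum, count) pair so that the broker can combine them correctly.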
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M... (MongoDB)
Speaker: Joseph Fluckiger, Senior Software Architect, Thermo Fisher Scientific
Level: 200 (Intermediate)
Track: Atlas
Mass spectrometry is the gold standard for determining chemical compositions, with spectrometers often measuring the mass of a compound down to a single electron. This level of granularity produces an enormous amount of hierarchical data that doesn't fit well into rows and columns. In this talk, learn how Thermo Fisher is using MongoDB Atlas on AWS to allow their users to get near real-time insights from mass spectrometry experiments – a process that used to take days. We also share how the underlying database service used by Thermo Fisher was built on AWS.
What You Will Learn:
- How we modeled mass spectrometry data to write and read an enormous amount of experimental data efficiently.
- The best MongoDB tools and patterns for .NET applications.
- A live demo of scaling a MongoDB Atlas cluster with zero downtime and visualizing live data from a million-dollar mass spectrometer stored in MongoDB.
MongoDB 2.6 is the biggest MongoDB release ever. In this presentation you are going to explore which features, improvements and capabilities were added to the latest version and how you can smoothly upgrade your deployments.
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo... (Prasoon Kumar)
MongoDB is a leading NoSQL database. It is a horizontally scalable document datastore. In this introduction, given at the Dr Dobbs Conference in Bangalore and Pune in April 2014, I show schema design with an example blog application and Python code snippets. I delivered the same talk at the first MongoDB Evenings events in Delhi and Gurgaon in May 2014.
When constructing a data model for a CMS's MongoDB collection, there are various options to choose from, each with its own strengths and weaknesses. The three basic patterns are:
1. Store each comment in its own document.
2. Embed all comments in the “parent” document.
3. Use a hybrid design: store comments separately from the “parent,” but aggregate them into a small number of documents, each containing many comments.
A code sample and wiki documentation are available at https://github.com/prasoonk/mycms_mongodb/wiki.
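The three comment-storage patterns listed above can be sketched as plain Python dictionaries shaped like MongoDB documents. The field names (`post_id`, `comments`, `page`, `count`), the helper `bucket_comments`, and the bucket size of 3 are hypothetical choices for illustration and are not taken from the linked repository.

```python
from itertools import islice

comments = [{"author": f"user{i}", "text": f"comment {i}"} for i in range(7)]

# Pattern 1: one document per comment, each referencing its parent post.
one_per_comment = [{"post_id": "post-1", **c} for c in comments]

# Pattern 2: all comments embedded in the parent post document.
embedded_post = {"_id": "post-1", "title": "Hello", "comments": list(comments)}


# Pattern 3 (hybrid): comments live outside the post, grouped into
# "bucket" documents that each hold at most bucket_size comments.
def bucket_comments(post_id, items, bucket_size=3):
    it = iter(items)
    buckets, page = [], 0
    while chunk := list(islice(it, bucket_size)):
        buckets.append(
            {"post_id": post_id, "page": page, "count": len(chunk), "comments": chunk}
        )
        page += 1
    return buckets


hybrid = bucket_comments("post-1", comments)
print(len(one_per_comment), len(embedded_post["comments"]), [b["count"] for b in hybrid])
# 7 7 [3, 3, 1]
```

Embedding lets one read fetch a post with all its comments, but the document grows without bound (MongoDB caps a single document at 16 MB); one document per comment scales freely but needs many reads to render a page; the hybrid bounds both by reading a few fixed-size buckets.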
MoPub, a Twitter company, provides monetization solutions for mobile app publishers and developers around the globe. MoPub receives over 33 billion ad requests per day, generating over 200TB of raw logs every day. We built MoPub Analytics as the analytics platform, using Druid + Imply, for our end users: publishers, demand-side partners, and internal users.
We will talk about the architecture of the analytics platform, our Druid cluster setup, hardware choices, monitoring, use cases, limiting factors, challenges with lookups and solutions we used.
Watch the video: https://imply.io/virtual-druid-summit/analytics-over-terabytes-of-data-at-twitter-apache-druid
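For a sense of scale, the daily figures quoted in the MoPub abstract can be converted into rough per-second and per-request averages. This is back-of-the-envelope arithmetic only, assuming decimal units (1 TB = 10^12 bytes) and traffic spread evenly over 24 hours, which real ad traffic is not, so peak rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
ad_requests_per_day = 33e9        # "over 33 billion" ad requests per day
raw_log_bytes_per_day = 200e12    # "over 200TB" per day, decimal TB assumed

avg_requests_per_second = ad_requests_per_day / SECONDS_PER_DAY
avg_log_bytes_per_request = raw_log_bytes_per_day / ad_requests_per_day

print(f"{avg_requests_per_second:,.0f} requests/s on average")
print(f"{avg_log_bytes_per_request:,.0f} bytes of raw log per request on average")
# 381,944 requests/s on average
# 6,061 bytes of raw log per request on average
```

So the platform absorbs on the order of 380,000 requests per second and roughly 6 KB of raw log per request before any rollup or compression.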
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi... (Imply)
Ensuring a consistently great Netflix experience while continuously pushing innovative technology updates is no easy feat.
We'll look at how Netflix turns log streams into real-time metrics to provide visibility into how devices are performing in the field, including some of the lessons learned around optimizing Druid to handle our load.
A Presentation on MongoDB Introduction - Habilelabs (Habilelabs)
MongoDB is a scalable, high-performance, open-source, document-oriented database.
Built for speed: the performance of traditional key-value stores while maintaining the functionality of a traditional RDBMS.
ScyllaDB recently announced Project Alternator, a new open source project that will enable Amazon DynamoDB users to easily migrate to an open-source database that runs anywhere (on most cloud platforms, on-premises, on bare metal, on virtual machines, or via Kubernetes) while preserving their investments in their existing application code.
Project Alternator will help DynamoDB users achieve much better and more reliable performance, reduce database costs by 80-90%, support large items (tens of MBs) and large partitions (multiple GBs), control the number of replicas, balance cost vs. redundancy, and much more.
Join ScyllaDB founders Avi Kivity and Dor Laor and lead engineer Nadav Har’El for a live webinar on September 25th, where they will share an overview of Project Alternator, including:
- Alternator’s design, implementation, and goals
- How to configure Alternator (OK, add alternator_port: 8000 to your scylla.yaml)
- A demo of how to easily run it from Docker/RPM
- Several examples:
  - A tic-tac-toe DynamoDB example running on Alternator
  - How to benchmark Scylla Alternator with YCSB, and considerations around it
  - How to run a serverless application along with Alternator
  - How to migrate DynamoDB data to Alternator using the Spark migrator
- The current limitations of Alternator
We will also describe the different consistency levels and the active-active vs. leader model, share the project roadmap, and answer your questions at the end.
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson... (Felix Gessert)
In this talk we share the lessons learned while building out the Baqend Cloud platform on AWS and Docker. Baqend’s AWS-hosted architecture consists of a caching CDN layer, global and local load balancing, a group of REST and Node.js servers, and a database cluster with Redis and MongoDB. As each customer has their own set of containerized REST and Node servers, we needed a cluster that is, on the one hand, horizontally scalable and, on the other hand, easily manageable and fault-tolerant from an operational perspective. Today there are at least four popular systems that claim to support this:
- Kubernetes
- Apache Mesos
- Docker Swarm
- AWS Elastic Container Service (ECS)
Thinking that ECS would certainly be the easiest option on AWS, we started building our cluster on it. We quickly came to realize that while ECS was astoundingly stable and easy to use, there were inherent limitations that could not be worked around. An old Docker version, missing network isolation, no means of parameterizing tasks, and forced memory constraints are the major ECS limitations we will talk about. Seeing the daunting operational overhead of running Kubernetes or Mesos in practice, we turned to Docker’s native clustering solution, Swarm. We will present how Swarm works with both Docker and AWS and highlight its advantages and downsides compared to Amazon’s ECS.
MongoDB has been conceived for the cloud age. Making sure that MongoDB is compatible and performant around cloud providers is mandatory to achieve complete integration with platforms and systems. Azure is one of biggest IaaS platforms available and very popular amongst developers that work on Microsoft Stack.
NoSQL on MySQL - MySQL Document Store by Vadim TkachenkoData Con LA
Abstract:- Should you use SQL on NoSQL Engine ? With MySQL Document Store you can do both. In this talk we will introduce MySQL Document Store and discuss its advantages and downsides compared to purpose build Document Store database engines such as MongoDB
Eine durchschnittliche Webseite lädt 2299KB an Daten und macht dafür 100 HTTP Anfragen. Dass Ladezeiten einen immensen Einfluss auf User-Zufriedenheit und Business-Metriken haben, bezweifelt dieser Tage niemand mehr. Aber die Meinungen darüber, welche Techniken Ladezeiten effektiv minimieren, gehen weit auseinander. Dieser Vortrag gibt einen detaillierten Überblick zu den wichtigsten Techniken der Web Performance Optimierung vom Critical Rendering Path bis zu verteilten Caching-Infrastrukturen an einem Beispiel aus der Praxis.
Building a Real-Time Gaming Analytics Service with Apache DruidImply
At GameAnalytics we receive and process real time behavioural data from more than 100 million daily active users, helping thousands of game studios and developers understand user behaviour and improve their games. In this talk, you will learn how we managed to migrate our legacy backend system from using an in-house built streaming analytics service to Apache Druid, and the lessons learned along the way. By adopting Druid, we have been able to reduce development costs, increase reliability of our systems and implement new features that would have not been possible with our old stack. We will provide an overview of our approach to schema design, segments optimization, creation of our query layer, caching and datasources optimisation, which can help you better understand how you can successfully use Druid as a key component on your data processing and reporting infrastructure.
Matt Sarrel of Imply draws on his work benchmarking Apache Druid with the Star Schema Benchmark (SSB) and shows how you can performance test Druid with your workload. Virtual meetup of July 16, 2020.
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Charles Allen
Charles Allen covers data processing, analytics, and insights systems at Snap. Strength points for Druid use cases are called out as are differences in some of the processing systems used.
This is the slide collection from the second talk from:
https://www.meetup.com/druidio-la/events/254080924/
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
Innovative companies are building Internet of Things, mobile, content management, single view, and big data apps on top of MongoDB. In this session, we'll explore how the IBM POWER8 platform brings new levels of performance and ease of configuration to these solutions which already benefit from easier and faster design and development using MongoDB.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
Apache Druid®: A Dance of Distributed ProcessesImply
Apache Druid® is an open source analytics database powering fresh, fast analytics in companies from AirBnB to Zeotap on clickstream, telemetry, financial transactions, applications and more. In this talk, we open the box on the three distributed processes in Druid led by the coordinator, overlord, and broker, and the ways that these come together to deliver reliable, performant query, ingestion, and management services.
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...MongoDB
Speaker: Joseph Fluckiger, Senior Software Architect, ThermoFisher Scientific
Level: 200 (Intermediate)
Track: Atlas
Mass spectrometry is the gold standard for determining chemical compositions, with spectrometers often measuring the mass of a compound down to a single electron. This level of granularity produces an enormous amount of hierarchical data that doesn't fit well into rows and columns. In this talk, learn how Thermo Fisher is using MongoDB Atlas on AWS to allow their users to get near real-time insights from mass spectrometry experiments – a process that used to take days. We also share how the underlying database service used by Thermo Fisher was built on AWS.
What You Will Learn:
- How we modeled mass spectrometry data to enable us to write and read an enormous about of experimental data efficiently.
- Learn about the best MongoDB tools and patterns for .NET applications.
- Live demo of scaling a MongoDB Atlas cluster with zero down time and visualizing live data from a million dollar Mass Spectrometer stored in MongoDB.
MongoDB 2.6 is the biggest MongoDB release ever. In this presentation you are going to explore which features, improvements and capabilities were added to the latest version and how you can smoothly upgrade your deployments.
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...Prasoon Kumar
MongoDB is a leading nosql database. It is horizonatally scalable, document datastore. In this introduction given at Dr Dobbs Conference, Bangalore and Pune in April 2014, I show schema design with an example blog application and Python code snippets. I delivered the same in the maiden MongoDB Evening event at Delhi and Gurgaon in May 2014.
When constructing a data model for your MongoDB collection for CMS, there are various options you can choose from, each of which has its strengths and weaknesses. The three basic patterns are:
1.Store each comment in its own document.
2.Embed all comments in the “parent” document.
3.A hybrid design, stores comments separately from the “parent,” but aggregates comments into a small number of documents, where each contains many comments.
Code sample and wiki documentation is available on https://github.com/prasoonk/mycms_mongodb/wiki.
MoPub, a Twitter company, provides monetization solutions for mobile app publishers and developers around the globe. MoPub receives over 33 Billion ad requests per day generating over 200TB of raw logs every day. We built MoPub Analytics as the analytics platform, using Druid + Imply for our end users who are Publishers, Demand side partners and Internal users.
We will talk about the architecture of the analytics platform, our Druid cluster setup, hardware choices, monitoring, use cases, limiting factors, challenges with lookups and solutions we used.
Watch video:https://imply.io/virtual-druid-summit/analytics-over-terabytes-of-data-at-twitter-apache-druid
Apache Druid ingests and enables instant query on many billions of events in real-time. But how? In this talk, each of the components of an Apache Druid cluster is described – along with the data and query optimisations at its core – that unlock fresh, fast data for all.
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...Imply
Ensuring a consistently great Netflix experience while continuously pushing innovative technology updates is no easy feat.
We'll look at how Netflix turns log streams into real-time metrics to provide visibility into how devices are performing in the field. Including sharing some of the lessons learned around optimizing Druid to handle our load.
A Presentation on MongoDB Introduction - Habilelabs (Habilelabs)
It is a scalable, high-performance, open-source, document-oriented database.
Built for speed: the performance of traditional key-value stores while maintaining the functionality of a traditional RDBMS.
ScyllaDB recently announced Project Alternator, a new open source project that will enable Amazon DynamoDB users to easily migrate to an open-source database that runs anywhere — on most cloud platforms, on-premises, on bare-metal, virtual machines or via Kubernetes — all while preserving their investments in their existing application code.
Project Alternator will help DynamoDB users achieve much better and more reliable performance, reduce database costs by 80-90%, support large items (tens of MBs) and large partitions (multiple GBs), control the number of replicas, balance cost against redundancy, and much more.
Join ScyllaDB founders Avi Kivity and Dor Laor and lead engineer Nadav Har’El for a live webinar on September 25th, where they will share an overview of Project Alternator, including:
Alternator’s design, implementation, and goals
How to configure Alternator (ok, add alternator_port: 8000 to your scylla.yaml)
A demo of how to easily run it from Docker/RPM
Run several examples:
Tic-tac-toe based DynamoDB example with Alternator
How to benchmark Scylla Alternator with YCSB and considerations around it
How to run a serverless application along with Alternator
How to migrate DynamoDB data to Alternator using the Spark migrator
Discuss the current limitations of Alternator
Plus we will discuss current limitations of Alternator, describe different consistencies and active-active vs leader model, share the project roadmap, and answer your questions at the end.
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson... (Felix Gessert)
In this talk we share the lessons learned while building out the Baqend Cloud platform on AWS and Docker. Baqend’s AWS-hosted architecture consists of a caching CDN-Layer, global and local load balancing, a group of REST and Node.js servers and a database cluster with Redis and MongoDB. As customers have their own set of containerized REST and Node servers, we needed a cluster that on the one hand is horizontally scalable and on the other hand easily manageable and fault-tolerant from an operational perspective. Today there are at least 4 popular systems that claim to support this:
- Kubernetes
- Apache Mesos
- Docker Swarm
- AWS Elastic Container Service (ECS)
Thinking that ECS would certainly be the easiest option on AWS, we started building our cluster on it. We quickly realized that while ECS was astoundingly stable and easy to use, it had inherent limitations that could not be worked around. An old Docker version, missing network isolation, no means of parameterizing tasks, and forced memory constraints are the major ECS limitations we will talk about. Seeing the daunting operational overhead of running Kubernetes or Mesos in practice, we turned to Docker’s native clustering solution, Swarm. We will present how Swarm works with both Docker and AWS and highlight its advantages and downsides compared to Amazon’s ECS.
This talk demonstrates how to develop single page apps with the new Angular2 framework and TypeScript. We show the new concepts of Angular2 not only in theory, but using a real application. To this end, we develop a real-time Angular2 website on which users can ask and upvote questions during a talk identified by a hashtag. The session chair can then ask the most popular questions at the end of the talk.
This talk shows how to develop fast single page apps with the new Angular2 framework and TypeScript. We show the new concepts of Angular2 not only in theory, but in practice: we develop, live, a real-time Angular2 app with which listeners can ask questions during a talk (identified by a hashtag) and upvote each other's questions. This way, the session chair can put the top-rated questions to the speaker at the end of the talk.
1- Introduction to the Database Mirroring Concept
2- References (8 blogs)
3- Note
4- Database mirroring operating modes
5- Database Mirroring Requirements
6- Advantages of Database Mirroring
7- Disadvantages of Database Mirroring
8- Database Mirroring Enhancements in SQL Server 2008
9- Database Mirroring Installation Step by Step
10- High Availability Mode [Automatic Failover]
11- High Availability Mode [Manual Failover]
12- High Safety Mode Without Witness Server [Manual Failover]
13- Standard listener port in database mirroring
14- Check SQL Server mirroring availability
15- Add or replace a witness server for an existing mirrored database
16- How to monitor Database Mirroring
17- Mirroring in a workgroup, not in a DC (Domain Controller)
Organizations looking to the cloud now have more vendor offerings and architecture choices available to them than ever before. To select and implement the most appropriate cloud-based DBMS architecture for their shops, technology pros must carry out a well-thought-out, detailed analysis of the competing offerings.
In addition, they must consider the impact that cloud-based DBMSs, like any new architecture, will have on their support environment. Changes to policies and procedures, security controls, staff roles and responsibilities, change management processes, and support documentation must be evaluated.
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013 (Amazon Web Services)
(Presented by Intel) This is the best of times and the worst of times for cloud services developers. At no other time in history has open access to data, open interfaces to data analytics, and open licensing of source code come together with scalable, cost-effective, cloud infrastructures. This is the good news.
The bad news is that enterprises are being left behind. Stymied by concerns of data protection and data governance, enterprises need proof that the services and solutions built on a cloud infrastructure comply with policies and practices they’ve come to learn (not necessarily love). At its heart is the root of trust issue – how far down can I trust the cloud service, its infrastructure software, and the data that it analyzes? And how do I know my keys are safe? Join this session to learn how Intel has been enabling trusted analytics with cloud services secured top to bottom – from Apache Hadoop to Java, Xen, and Linux – without compromising security.
Reliability-based design optimization for cloud migration (Nishmitha B)
Reliability-based design optimization for cloud migration is an application designed to manage applications, more precisely legacy applications, whose extraction and management are crucial and troublesome.
Innovation with Open Source: The New South Wales Judicial Commission experience (Linuxmalaysia Malaysia)
Innovation with Open Source: The New South Wales Judicial Commission experience. MyGOSSCON 2008. Mr. Murali Sagi, Director, Information Management & Corporate Services, Judicial Commission of NSW, Sydney, Australia
Database Architecture & Scaling Strategies, in the Cloud & on the Rack (Clustrix)
Watch the recording here: https://www.youtube.com/watch?v=ZwERp38ynxQ&feature=youtu.be
In this webinar, Robbie Mihayli, VP of Engineering at Clustrix explores how to set up a SQL RDBMS architecture that scales out and is both elastic and consistent, while simultaneously delivering fault tolerance and ACID compliance.
He also covers how data gets distributed in this architecture, how the query processor works, how rebalancing happens and other architectural elements. Examples cited include cloud deployments and e-commerce use-cases.
In this webinar, you will learn:
1. Five RDBMS scaling strategies along with their trade-offs
2. The importance of having no single point of failure for OLTP (fault tolerance)
3. The vagaries of the cloud and how they impact using an RDBMS in the cloud
Who should watch?
1. People interested in high-performance, real-time database solutions
2. Companies who have MySQL in their infrastructure and are concerned that their growth will soon overwhelm MySQL’s single-box design
3. DBAs who implement ‘read slaves’, ‘multiple masters’ and ‘sharding’ for MySQL databases and want to learn about better ways to scale
Understand how to architect an infrastructure to handle going from zero to millions of users. From leveraging highly scalable AWS services to making smart decisions on building out your application, you'll learn a number of best practices for scaling your infrastructure in the cloud.
Relational databases are used extensively in many applications and systems, but they are not always the best data store solution to the problem at hand. In this session we discuss the limitations of RDBMS and show which NoSQL solutions can be used to overcome these limitations. We also cover migration topics, such as how to add NoSQL databases without adding complexity to your development and operations.
Microsoft Data Platform - What's included (James Serra)
The pace of Microsoft product innovation is so fast that even though I spend half my days learning, I struggle to keep up. And as I work with customers, I find they are often in the dark about many of our products, since they are focused on keeping what they have running and putting out fires. So let me cover the products you might have missed in the Microsoft data platform world. Be prepared to discover the various Microsoft technologies and products for collecting, transforming, storing, and visualizing data. My goal is to help you understand not only each product but also how they all fit together and their proper use cases, so that you can build a solution that can incorporate any data in the future, no matter its size, frequency, or type. Along the way we will touch on technologies covering NoSQL, Hadoop, and open source.
Join AWS at this session to understand how to architect an infrastructure to handle going from zero to millions of users. From leveraging highly scalable AWS services to making smart decisions on building out your application, you'll learn a number of best practices for scaling your infrastructure in the cloud.
Speakers:
Andreas Chatzakis, AWS Solutions Architect
Pete Mounce, Senior Developer, JustEat
Scaling the Platform for Your Startup - Startup Talks June 2015 (Amazon Web Services)
Join AWS at this session to understand how to architect an infrastructure to handle going from zero to millions of users. From leveraging highly scalable AWS services to making smart decisions on building out your application, you'll learn a number of best practices for scaling your infrastructure in the cloud.
Storage options for analytics are not one size fits all. To deliver the best solution, you need to understand the use case, performance requirements, and users of the system. This session breaks down the options you have in Azure to build a data analytics ecosystem, and explains why everyone's talking about data lakes and where best to build your data warehouse.
Samedi SQL Québec - The Azure data platform (MSDEVMTL)
June 6, 2015
Samedi SQL in Québec
Session 3 - Data (SQL Azure, Table and Blob Storage) (Eric Moreau)
SQL Azure is a relational database as a service. Azure Storage lets you store and retrieve large volumes of unstructured data (for example, documents and media files) with Azure blobs, structured NoSQL data with Azure tables, and reliable messages with Azure queues.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Enterprises have been using both Big Data and Cloud Computing technologies for years. Until recently, the two have not been combined. Now the agility and efficiency benefits of self-service elastic infrastructure are being extended to Big Data initiatives – whether on-premises or in the public cloud.
This session at Hadoop Summit in San Jose, California (June 2016) discusses the emerging category of Big-Data-as-a-Service (BDaaS) - representing the intersection of Big Data and Cloud Computing.
In this session, Kris Applegate (Cloud and Big Data Solution Architect at Dell) and Thomas Phelan (Co-Founder and Chief Architect at BlueData) outlined the following:
- Innovations that paved the way for Big-Data-as-a-Service
- Definition and categories of Big-Data-as-a-Service
- Key considerations for Big-Data-as-a-Service in the enterprise, including public cloud or on-premises deployment options
A video replay can also be found here: https://youtu.be/_ucPoTKuj8Q
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their ability to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V... (Wasswaderrick3)
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity and from it derive the Poiseuille flow equation, the transition flow equation, and the turbulent flow equation. Where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our energy conservation techniques to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk, Journées Nationales du GDR GPL 2024
What are greenhouse gases and how many gases are there that affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
4. About me
PhD student (database group, University of Hamburg)
Research project for PhD
Cloud database startup
5. Outline
What are Cloud Databases?
• Categories of Cloud Databases
• Properties
Cloud Databases in the wild
Research Perspectives
Wrap-up and literature
43. DBaaS: Common Aspects
#1 Metric: Total Cost
Maximum utilization of available hardware: Multi-Tenancy
Daniela Florescu and Donald Kossmann “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.
48. DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas”, UCC, 2011
[Figure: each approach drawn as a stack of Hardware Resources, VM, Database Process, Database, and Schema, with a different part of the stack private to each tenant]
• Private OS, e.g. Amazon RDS
• Private Process/DB, e.g. MongoHQ
• Private Schema, e.g. Google DataStore
• Shared Schema (Virtual Schema), e.g. most SaaS apps
49. DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”, Springer, 2013
[Table: Private OS, Private Process/DB, Private Schema, and Shared Schema compared by application independence, isolation, resource utilization, and maintenance/provisioning]
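To make the Shared Schema approach concrete: all tenants share one table and are distinguished only by a tenant column, so resource utilization is maximal but isolation must be enforced by every query. A minimal sketch using Python's built-in `sqlite3` as a stand-in database; the `orders` table, the `tenant_id` column, and the helper functions are hypothetical.

```python
import sqlite3

# Shared schema: every tenant's rows live in the same table, told apart
# only by tenant_id. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (tenant_id TEXT NOT NULL, id INTEGER, item TEXT, "
    "PRIMARY KEY (tenant_id, id))"
)


def insert_order(tenant_id, order_id, item):
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (tenant_id, order_id, item))


def orders_for(tenant_id):
    # Isolation is the application's job: every query must filter on tenant_id.
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    return rows.fetchall()


insert_order("acme", 1, "disk")
insert_order("acme", 2, "cpu")
insert_order("globex", 1, "ram")  # same id as acme's order 1, but no conflict

print(orders_for("acme"))    # [(1, 'disk'), (2, 'cpu')]
print(orders_for("globex"))  # [(1, 'ram')]
```

The composite primary key lets tenants reuse their own ids. Forgetting the `tenant_id` filter in a single query would leak data across tenants, which is the isolation cost that the private approaches avoid by giving each tenant its own schema, process, or VM.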
51. Billing Models:
DBaaS: Common Aspects
[Figure: Usage → Account]
Pay-per-use
Parameters: Network, Bandwidth, Storage, CPU, Requests, etc.
Payment: Pre-Paid, Post-Paid
Variants: On-Demand, Auction, Reserved
e.g. DynamoDB
53. Billing Models:
DBaaS: Common Aspects
[Figure: Usage → Account, billed at the end of the month]
Plan-based
Parameters: Allocated Plan (e.g. 2 instances + X GB storage)
Free Tier: free plan or free initial account credit
e.g. MongoHQ
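The difference between pay-per-use and plan-based billing is easiest to see with numbers. A minimal sketch; all prices, quotas, and function names are hypothetical and only meant to show how the two models charge the same usage differently.

```python
# Hypothetical prices, chosen only to make the arithmetic easy to follow.
PRICE_PER_MILLION_REQUESTS = 1.25  # pay-per-use: per 1 million requests
PRICE_PER_GB = 0.25                # pay-per-use: per GB stored for the month
PLAN_PRICE = 50.0                  # plan-based: flat fee at the end of the month
PLAN_MAX_REQUESTS = 50_000_000     # requests covered by the plan
PLAN_MAX_GB = 50                   # storage covered by the plan


def pay_per_use_bill(requests, storage_gb):
    """Bill exactly what was consumed."""
    return requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS + storage_gb * PRICE_PER_GB


def plan_bill(requests, storage_gb):
    """Bill the allocated plan, or None if the usage does not fit into it."""
    if requests > PLAN_MAX_REQUESTS or storage_gb > PLAN_MAX_GB:
        return None  # the customer would have to move to a larger plan
    return PLAN_PRICE


for requests, storage_gb in [(2_000_000, 4), (40_000_000, 30)]:
    print(requests, storage_gb, pay_per_use_bill(requests, storage_gb), plan_bill(requests, storage_gb))
```

With these made-up numbers, the light workload costs 3.5 under pay-per-use versus 50.0 on the plan, while the heavy workload costs 57.5 versus 50.0. Pay-per-use favors small or bursty usage, whereas a plan caps cost for steady, heavier usage.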
63. DBaaS: Common Aspects
Authentication:
• Internal Schemes, e.g. Amazon IAM (used extensively)
• External Identity Provider, e.g. OpenID
• Federated Identity (SSO), e.g. SAML
Authorization:
• User-based Access Control, e.g. Amazon S3 ACLs
• Role-based Access Control, e.g. Amazon IAM
• Policies, e.g. XACML
[Figure: a Database-as-a-Service with Authentication, Authorization, and API components; the client authenticates, receives a token, sends authenticated requests, and receives responses]
Federated ACLs
M. Decat, B. Lagaisse, et al. “Toward efficient and confidentiality-aware federation of access control policies”, DOA Trusted Cloud 2013
• Imagine ACL: “Patient data can only be accessed by treating physician”
• Idea: decompose policy and evaluate parts near data owner
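The Authenticate, Token, Authenticated Request, Response flow combined with role-based access control can be sketched with Python's standard library. A minimal sketch of an internal scheme, assuming an HMAC-signed token; the key, token format, role names, and `ROLE_PERMISSIONS` are hypothetical. Real services rely on established mechanisms (Amazon IAM, OpenID, SAML, or token formats such as JWT) rather than a hand-rolled format like this.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical key known only to the service

# Role-based access control: each role maps to the (action, resource) pairs it may use.
ROLE_PERMISSIONS = {
    "reader": {("read", "patients")},
    "physician": {("read", "patients"), ("write", "patients")},
}


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue_token(user: str, role: str, ttl_s: float = 3600) -> str:
    """Authenticate -> Token: called once the client's credentials were checked."""
    payload = json.dumps({"user": user, "role": role, "exp": time.time() + ttl_s}).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def verify_token(token: str) -> dict:
    """Check signature and expiry of the token sent with each request."""
    encoded, signature = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded.encode())
    if not hmac.compare_digest(signature, _sign(payload)):
        raise PermissionError("invalid signature")
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    return claims


def handle_request(token: str, action: str, resource: str) -> str:
    """Authenticated Request -> Response: authorize by role, then answer."""
    claims = verify_token(token)
    if (action, resource) not in ROLE_PERMISSIONS.get(claims["role"], set()):
        raise PermissionError(f"role {claims['role']!r} may not {action} {resource}")
    return f"{action} {resource}: ok for {claims['user']}"


token = issue_token("alice", "reader")
print(handle_request(token, "read", "patients"))  # read patients: ok for alice
```

The signature check is what lets the service trust the role inside the token without looking the user up again; a role-based table like this cannot express a rule such as "only the treating physician", which is where policy languages like XACML come in.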
66. Service Level Agreements
DBaaS: Common Aspects
SLA
• Legal Part: 1. Fees, 2. Penalties
• Technical Part: 1. SLO, 2. SLO, 3. SLO
Service Level Objectives:
• Availability
• Durability
• Consistency/Staleness
• Query Response Time
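How the technical part of an SLA feeds the legal part can be sketched by checking measured values against SLOs and deriving a penalty. A minimal sketch covering only the Availability and Query Response Time objectives (Durability and Consistency/Staleness need other measurements); the thresholds, the nearest-rank p99 metric, and the 10% refund per violated SLO are hypothetical.

```python
import math

# Hypothetical SLOs and penalty rule for one billing period.
SLO_AVAILABILITY = 0.999    # minimum fraction of successful requests
SLO_P99_MS = 50.0           # 99th-percentile query response time bound
PENALTY_PCT_PER_MISS = 10   # percent of the fee refunded per violated SLO


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def evaluate_sla(latencies_ms, failed_requests, monthly_fee):
    """Check the measured period against the SLOs and compute the penalty."""
    availability = len(latencies_ms) / (len(latencies_ms) + failed_requests)
    p99 = percentile(latencies_ms, 99)
    violated = []
    if availability < SLO_AVAILABILITY:
        violated.append("availability")
    if p99 > SLO_P99_MS:
        violated.append("query response time")
    penalty = monthly_fee * PENALTY_PCT_PER_MISS * len(violated) / 100
    return availability, p99, violated, penalty


latencies = [10.0] * 990 + [80.0] * 10
print(evaluate_sla(latencies, failed_requests=2, monthly_fee=200.0))
```

In this example, 2 failed requests out of 1,002 push availability below 99.9%, while the p99 latency of 10 ms stays within its bound, so one SLO is violated and the penalty is 20.0.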
73. SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”, Springer, 2013
QoS for NoSQL DBs
Y. Zhu et al. “Scheduling with Freshness and Performance Guarantees for Web Applications in the Cloud”, CRPIT
Old: workload management in RDBMSs (DB2 and Oracle)
New: use well-known scheduling algorithms for queries in replicated DBs
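As an example of a well-known scheduling algorithm applied to queries, earliest-deadline-first (EDF) runs the query whose response-time deadline is closest. A minimal sketch assuming a single replica, all queries queued at time 0, and known execution costs; it does not model the freshness guarantees of Zhu et al., and EDF is only one illustrative choice, not necessarily the algorithm they use.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Query:
    deadline_ms: float                      # response-time bound, e.g. from an SLA
    name: str = field(compare=False)
    cost_ms: float = field(compare=False)   # estimated execution time


def edf_schedule(queries):
    """Earliest-deadline-first on one replica; all queries are queued at t=0.

    Returns the execution order and the names of queries that miss their deadline.
    """
    heap = list(queries)
    heapq.heapify(heap)  # ordered by deadline_ms only
    clock, order, missed = 0.0, [], []
    while heap:
        query = heapq.heappop(heap)
        clock += query.cost_ms
        order.append(query.name)
        if clock > query.deadline_ms:
            missed.append(query.name)
    return order, missed


queue = [Query(30, "report", 20), Query(5, "lookup", 2), Query(12, "scan", 8)]
print(edf_schedule(queue))  # (['lookup', 'scan', 'report'], [])
```

Running the short, urgent lookup first lets all three queries meet their deadlines; a first-come-first-served order starting with the 20 ms report would already miss the lookup's 5 ms bound.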
76. Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Expected
Load
Provisioned Resources:
• #No of Shard- or Replica
servers
• Computing, Storage,
Network Capacities
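Threshold-based rules are one family of auto-scaling techniques. The minimal sketch below assumes a fixed per-server capacity and a target utilization (both hypothetical parameters). It computes how many shard or replica servers to provision for an expected load, and how many to add or remove.

```python
import math


def servers_needed(expected_rps: float, capacity_per_server_rps: float,
                   target_utilization: float = 0.7, min_servers: int = 1) -> int:
    """Servers required so that each runs at or below target_utilization."""
    if capacity_per_server_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity or utilization")
    usable = capacity_per_server_rps * target_utilization
    return max(min_servers, math.ceil(expected_rps / usable))


def scaling_decision(current: int, expected_rps: float,
                     capacity_per_server_rps: float) -> int:
    """Positive: add servers; negative: remove servers; 0: keep the cluster."""
    return servers_needed(expected_rps, capacity_per_server_rps) - current
```

Lowering target_utilization buys headroom against load spikes at the price of excess capacity. Raising it pushes the cluster towards underprovisioning.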
Figure: resources over time (provisioned resources vs. actual load)
Overprovisioning:
• SLAs met
• Excess capacities
Underprovisioning:
• SLAs violated
• Usage maximized
SmartSLA
P. Xiong: "Intelligent management of virtualized resources for database systems in cloud environment", ICDE 2011
Solution: use machine learning (regression + boosting) to learn the mapping from resource allocation to database performance. Then use the predictions to choose the allocation that minimizes SLA penalties.
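Here is a sketch of the SmartSLA idea: learn how resources map to performance, then pick the allocation with the lowest expected penalty plus resource cost. Note that it swaps in a plain one-feature least-squares fit for the regression + boosting used in the paper. The feature 1/cpu, the training data, the penalty rate, and the resource price are all made-up assumptions.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def choose_allocation(observations, candidates, slo_ms, penalty_per_ms,
                      cost_per_unit):
    """Pick the CPU allocation minimizing predicted SLA penalty + resource cost.

    observations: list of (cpu_units, observed_latency_ms). The feature
    1/cpu_units assumes latency shrinks inversely with allocated resources.
    """
    a, b = fit_linear([1.0 / c for c, _ in observations],
                      [lat for _, lat in observations])

    def total_cost(cpu):
        predicted = a + b / cpu                    # learned mapping
        violation = max(0.0, predicted - slo_ms)   # ms above the SLO
        return violation * penalty_per_ms + cpu * cost_per_unit

    return min(candidates, key=total_cost)
```

With a cheap penalty the optimizer tolerates a small SLO violation. With an expensive one it buys enough resources to avoid it.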
81. DBaaS: General Considerations
Functional requirements:
• Scan queries
• Conditional updates
• Transactions
• Query by example
• Joins
• Analytics
Non-functional requirements:
• Elasticity
• Consistency
• Read latency
• Write latency
• Write throughput
• Scalability of data volume
• Read scalability
• Write scalability
• Read availability
• Write availability
• Durability
Questions to ask:
• Which requirements are met by the DB?
• Which are met by the provider (through SLAs)?
86. Outline
• What are Cloud Databases?
• Cloud Databases in the wild: examples of different cloud database systems
  • Cloud-deployed
  • Managed DBMS (SQL, NoSQL)
  • Proprietary
  • BaaS
• Research Perspectives
• Wrap-up and literature
91. Cloud-Deployed DB
Idea: run a (mostly) unmodified DB on IaaS
Method I: DIY
1. Provision VM(s)
2. Install the DBMS (manually, via script, Chef, or Puppet)
Method II: Deployment tools, e.g.:
> whirr launch-cluster --config hbase.properties
(the properties file sets login, cluster size, etc.; the cluster runs on Amazon EC2)
Method III: Marketplaces
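For Method II, the cluster is described by the hbase.properties file passed to whirr launch-cluster. The Python helper below renders such a file. The property keys and role names follow Apache Whirr's HBase recipe as we recall it, which is an assumption to check against the Whirr documentation for your version. The cluster name and node count are hypothetical.

```python
def whirr_hbase_properties(cluster_name: str, region_servers: int) -> str:
    """Render a Whirr properties file for an HBase cluster on Amazon EC2.

    Assumption: keys and role names as in Whirr's HBase recipe; credentials
    are left to Whirr's ${env:...} substitution, not embedded in the file.
    """
    if region_servers < 1:
        raise ValueError("need at least one region server")
    master = "zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master"
    worker = "hadoop-datanode+hadoop-tasktracker+hbase-regionserver"
    lines = [
        f"whirr.cluster-name={cluster_name}",
        f"whirr.instance-templates=1 {master},{region_servers} {worker}",
        "whirr.provider=aws-ec2",
        "whirr.identity=${env:AWS_ACCESS_KEY_ID}",
        "whirr.credential=${env:AWS_SECRET_ACCESS_KEY}",
    ]
    return "\n".join(lines) + "\n"
```

To start the cluster, write the returned string to hbase.properties and run the whirr launch-cluster command shown above.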
99. AWS Marketplace
Idea: run a preconfigured DB on IaaS
Model: Cloud-Deployed
Pricing: Instance + Volume + License
Underlying DB: Choosable
API: DB-specific
Bad:
• No clusters
• Not managed (no automatic updates, snapshots, etc.)
• Private OS per instance → multi-tenancy only at the VM level → poor resource usage
Good:
• Easy to get started
105. Amazon Elastic MapReduce (EMR)
Model: Analytics-aaS
Pricing: Infrastructure
API: Hadoop
EMR provisions a Hadoop cluster with three kinds of nodes: a Job Tracker, nodes running Task Tracker + HDFS Data Node, and nodes running only a Task Tracker. Jobs read from a data source and write to a data sink.
Submits Hadoop jobs as:
• JAR
• Streaming
• Cascading
• Pig
• Hive
• Impala
W. Lehner, U. Sattler, "Web-scale Data Management for the Cloud", Springer, 2013
• No data locality with S3
• AWS Import/Export: send your HDD
• HBase integration
• Compatible with Spot and Reserved Instances
• Similar: Azure HDInsight
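Streaming is the easiest of these submission options to illustrate. Hadoop Streaming runs arbitrary executables as mapper and reducer, exchanging tab-separated key/value lines over stdin/stdout. The framework sorts mapper output by key before the reducer sees it. The minimal word-count sketch below is written in Python. Its run_locally helper only simulates the sort between the phases on one machine and is not part of Hadoop.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word (Streaming mapper contract)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum counts per word; input must be sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(text_lines):
    """Simulate map -> shuffle/sort -> reduce on one machine (for testing)."""
    return list(reducer(sorted(mapper(text_lines))))
```

On EMR, each function would be wrapped in a small script that reads sys.stdin and prints every yielded line. The two scripts are then passed as -mapper and -reducer of a Streaming step.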
109. Google BigQuery
Idea: web-scale analysis of nested data
Model: Analytics-aaS
Pricing: Storage + GBs processed
API: REST
Dremel
Melnik et al., "Dremel: Interactive analysis of web-scale datasets", VLDB 2010
Idea: multi-level execution tree on a nested columnar data format (≥100 nodes)
• SLA: 99.9% uptime / month
• Fundamentally different from relational DWHs and MapReduce
• Design copied by Apache Drill, Impala, Shark
120. Amazon RDS (Relational Database Service)
Model: Managed RDBMS
Pricing: Instance + Volume + License
Underlying DB: MySQL, Postgres, MSSQL, Oracle
API: DB-specific
• Backups are automated and scheduled
• Support for (asynchronous) read replicas
• Administration: web-based or via SDKs
• Only RDBMSs
• "Analytic brother" of RDS: Redshift (parallel DWH)
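With asynchronous read replicas, a read served by a replica may return stale data. The minimal client-side routing sketch below sends writes to the primary and spreads reads round-robin over the replicas, unless the caller needs read-your-writes. The endpoint names and the explicit replicate() step are a simplified in-memory model, not the RDS API.

```python
from itertools import cycle


class ReplicatedDb:
    """Toy model of a primary with asynchronously updated read replicas."""

    def __init__(self, replica_names):
        self.primary = {}
        self.replicas = {name: {} for name in replica_names}
        self._next = cycle(replica_names)      # round-robin read routing

    def write(self, key, value):
        self.primary[key] = value              # acknowledged by the primary only

    def replicate(self):
        """Ship the primary's state to all replicas (happens asynchronously)."""
        for data in self.replicas.values():
            data.clear()
            data.update(self.primary)

    def read(self, key, read_your_writes=False):
        if read_your_writes or not self.replicas:
            return self.primary.get(key)       # always up to date
        return self.replicas[next(self._next)].get(key)  # possibly stale
```

Until replicate() runs, replica reads miss the latest write. This is the staleness an application accepts in exchange for read scalability.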
130. Microsoft SQL Azure
Similar to RDS
Model: Managed RDBMS
Pricing: Database size
Underlying DB: MSSQL Server
API: T-SQL/TDS
• Keyless table group: regular database
• Keyed table group: partitioned by row key; the rows sharing a row key form the consistency unit (ACID boundary)
• Automatic partitioning for keyed table groups
Cloud SQL Server
P. Bernstein et al., "Adapting Microsoft SQL server for cloud computing", ICDE 2011
• Multi-tenant MSSQL
• Paxos-like commit protocol for consistent replication
• SLA: 99.9% uptime / month
• Usually cheaper than RDS (multi-tenancy)
• Smaller databases (max. 150 GB)
• Rich MSSQL Server tooling
• Keyed table groups: internal feature only
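One way to picture automatic partitioning of keyed table groups is range partitioning on the row key. A partition is split once it holds too many keys, and all rows of one row-key value always stay in the same partition, so the consistency unit is never split. In this minimal sketch, the size threshold and the split-at-median policy are assumptions and not the Cloud SQL Server implementation.

```python
import bisect


class RangePartitioner:
    """Range-partitions rows by row key and splits oversized partitions."""

    def __init__(self, max_keys_per_partition: int):
        if max_keys_per_partition < 1:
            raise ValueError("need at least one key per partition")
        self.max_keys = max_keys_per_partition
        self.bounds = []        # sorted split points between partitions
        self.partitions = [{}]  # partition i: keys in [bounds[i-1], bounds[i])

    def partition_of(self, row_key) -> int:
        return bisect.bisect_right(self.bounds, row_key)

    def insert(self, row_key, row):
        i = self.partition_of(row_key)
        part = self.partitions[i]
        part.setdefault(row_key, []).append(row)  # rows of one key stay together
        if len(part) > self.max_keys:
            self._split(i)

    def _split(self, i):
        """Split partition i at its median key; keys >= median move right."""
        keys = sorted(self.partitions[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.partitions[i].items() if k < mid}
        right = {k: v for k, v in self.partitions[i].items() if k >= mid}
        self.partitions[i:i + 1] = [left, right]
        self.bounds.insert(i, mid)
```

A transaction confined to one row key always touches exactly one partition, however many splits have happened.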
134. Other RDBMS services
Google Cloud SQL
• MySQL for Google App Engine (PaaS)
• Pricing: Database size; Underlying DB: MySQL
• Support for patching, replication, backup
• SLA: 99.95% uptime / month
Heroku Postgres
• Postgres for Heroku (PaaS)
• Pricing: Plan-based; Underlying DB: Postgres
• Hosted on EC2
• No SLAs
OpenStack Trove
• MySQL for OpenStack (Icehouse)
• Pricing: Own hardware; Underlying DB: MySQL
• Under development (HP-driven)
• One VM per tenant (VM ⇔ Tenant)
Evaluation of Cloud RDBMSs
D. Kossmann, T. Kraska: "An evaluation of alternative architectures for transaction processing in the cloud", SIGMOD 2010
Figure: TPC-W benchmark (online shop), 2010, emulated browsers / RPS
136. Managed NoSQL services
System | Model | CAP | Scans | Sec. Indices | Largest Cluster | Learning | Lic. | DBaaS
HBase | Wide-Column | CP | Over row key | | ~700 | 1/4 | Apache | (EMR)
MongoDB | Document | CP | yes | | >100, <500 | 4/4 | GPL |
Riak | Key-Value | AP | | | ~60 | 3/4 | Apache | (Softlayer)
Cassandra | Wide-Column | AP | With comp. index | | >300, <1000 | 2/4 | Apache |
Redis | Key-Value | CA | Through lists, etc. | manual | N/A | 4/4 | BSD |
And there are many more:
• CouchDB (e.g. Cloudant)
• Couchbase (e.g. KuroBase Beta)
• Elasticsearch (e.g. Bonsai)
• Solr (e.g. WebSolr)
• …
160. Heroku Redis2Go example
Model: Managed NoSQL
Pricing: Plan-based
Underlying DB: Redis
API: Redis
Steps:
1. Create a Heroku app
2. Add the Redis2Go add-on
3. Use the connection URL (environment variable)
4. Deploy
• Very simple
• Only suited for small to medium applications (no SLAs, limited control)
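Step 3 depends on the add-on exposing its connection string as an environment variable. The minimal sketch below splits such a URL into host, port, and password with Python's standard library. The variable name REDISTOGO_URL and the example URL are assumptions for illustration.

```python
import os
from urllib.parse import urlparse


def redis_settings(env_var: str = "REDISTOGO_URL",
                   default: str = "redis://localhost:6379") -> dict:
    """Parse a redis://[user]:password@host:port URL taken from the environment."""
    url = urlparse(os.environ.get(env_var, default))
    if url.scheme != "redis":
        raise ValueError(f"unexpected scheme: {url.scheme!r}")
    return {
        "host": url.hostname,
        "port": url.port or 6379,   # Redis default port if none is given
        "password": url.password,   # None when the URL carries no credentials
    }
```

The resulting dictionary can be passed to whichever Redis client library the app uses. Credentials therefore never appear in the code that gets deployed.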
162. Proprietary Database services
System | Model | CAP | Scans | Sec. Indices | Queries | API | Scale-out | SLA
SimpleDB | Table-Store | CP | Yes (as queries) | Automatic | SQL-like (no joins, groups, …) | REST + SDKs | |
DynamoDB | Table-Store | CP | By range key / index | Local sec., global sec. | Key + cond. on range key(s) | REST + SDKs | Automatic over prim. key |
Azure Tables | Table-Store | CP | By range key | | Key + cond. on range key | REST + SDKs | Automatic over part. key | 99.9% uptime
AE/Cloud DataStore | Entity-Group | CP | Yes (as queries) | Automatic | Conjunct. of eq. predicates | REST/SDK, JDO, JPA | Automatic over entity groups |
S3, Az. Blob, GCS | Blob-Store | AP | | | | REST + SDKs | Automatic over key | 99.9% uptime (S3)
There are many more object stores (HP, Rackspace, etc.)
…but no comparable table stores
166. Table Service example: Azure Tables
Data model: Partition Key | Row Key (sorted) | Timestamp (automatic) | Property1 … Propertyn
intro.pdf | v1.1 | 14/6/2013 | … | …
intro.pdf | v1.2 | 15/6/2013 | …
präs.pptx | v0.0 | 11/6/2013 | …
(the two intro.pdf rows form one partition, the präs.pptx row another)
• REST API
• Sparse; hash-distributed to partition servers
• No index: lookups by non-key properties only (!) by full table scan
• Atomic "entity group batch transactions" possible (within one partition)
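The Azure Tables data model can be mimicked in a few lines. Entities are addressed by (partition key, row key), and rows within a partition stay sorted by row key so that range queries work. A filter on any other property degenerates into a full scan, and a batch is only allowed when all entities share one partition. This is an in-memory sketch for intuition, not the Azure SDK. The class, its method names, and the Size property are hypothetical.

```python
import bisect
from datetime import datetime, timezone


class ToyTable:
    def __init__(self):
        self.partitions = {}   # partition key -> sorted list of row keys
        self.entities = {}     # (partition key, row key) -> properties

    def upsert(self, pk, rk, **props):
        rows = self.partitions.setdefault(pk, [])
        if (pk, rk) not in self.entities:
            bisect.insort(rows, rk)                       # keep row keys sorted
        props["Timestamp"] = datetime.now(timezone.utc)   # set automatically
        self.entities[(pk, rk)] = props

    def get(self, pk, rk):
        return self.entities.get((pk, rk))                # point lookup by key

    def query_range(self, pk, rk_from, rk_to):
        """Rows of one partition with rk_from <= row key < rk_to, in order."""
        rows = self.partitions.get(pk, [])
        lo = bisect.bisect_left(rows, rk_from)
        hi = bisect.bisect_left(rows, rk_to)
        return [(rk, self.entities[(pk, rk)]) for rk in rows[lo:hi]]

    def scan(self, predicate):
        """No secondary index: filtering on other properties scans everything."""
        return [key for key, props in self.entities.items() if predicate(props)]

    def batch(self, ops):
        """Batch of (pk, rk, props); only the single-partition rule is modelled,
        the toy applies the operations one after another."""
        if len({pk for pk, _, _ in ops}) != 1:
            raise ValueError("batch spans multiple partitions")
        for pk, rk, props in ops:
            self.upsert(pk, rk, **props)
```

Good partition key design means the frequent queries become point lookups or single-partition range queries rather than scans.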
167. Similar to Amazon SimpleDB and DynamoDB
SimpleDB:
• Indexes all attributes
• Rich(er) queries
• Many limits (size, RPS, etc.)
DynamoDB:
• Provisioned throughput
• On SSDs ("single-digit latency")
• Optional indexes
173. Azure Table Storage
Model: Proprietary
Pricing: Requests + Storage + Network
Underlying DB: Custom System
API: REST
Challenges:
• Single partition and range key (modelling)
• Very "basic" queries
Good:
• Automatic distribution
• Replicated 3x locally + 1x asynchronous geo-replica
Very good:
• SLA (99.9% uptime)
• Internal architecture published
Windows Azure Storage
B. Calder et al., "Windows Azure Storage: a highly available cloud storage service with strong consistency", SOSP 2011
Idea:
• Layered storage infrastructure for blobs and tables
• Use research results (GFS, BigTable, Paxos, LSM, erasure coding)
Layers: stream layer (think: GFS/HDFS), partition layer (think: BigTable/HBase), lock service (think: Chubby/ZooKeeper)
176. DynamoDB
Model: Proprietary
Pricing: Provisioned Throughput + Network
Underlying DB: Custom System
API: REST
Successor to SimpleDB
Item (example): primary key attribute dom: "com.cnn", range key page: "index", attribute content: "<html>…" (attribute values are scalars or sets)
Querying options:
• GetItem: key lookup
• Query: primary key + condition on range key
• Scan: full table scan with filter
• EMR: Hive queries (for analytics)
177. DynamoDB
Consistency:
• Strongly consistent (2x price) or eventually consistent reads
• Atomic (conditional) updates per item
Indexing options:
• Local secondary index: consistent, additional range key
• Global secondary index: eventually consistent index table (with its own primary key)
183. DynamoDB
Good:
• Low latency (SSD)
• Data partitioning and replication across availability zones (AZs)
Bad:
• Scaling not elastic (provisioned capacity units, which are also the unit of billing)
• No SLAs, no internals published (≠ Dynamo!)
• No built-in backups (→ AWS Data Pipeline)
• Vendor lock-in
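Provisioned capacity units are the unit of billing, so it helps to estimate them up front. The sketch follows DynamoDB's documented sizing rules as we understand them, which you should verify against current AWS documentation. One read capacity unit covers one strongly consistent read per second of an item up to 4 KB. An eventually consistent read costs half of that. One write capacity unit covers one write per second of up to 1 KB. The workload numbers in the tests are hypothetical.

```python
import math


def read_capacity_units(item_kb: float, reads_per_s: float,
                        strongly_consistent: bool = True) -> int:
    units_per_read = math.ceil(item_kb / 4)   # 4 KB blocks, rounded up
    if not strongly_consistent:
        units_per_read /= 2                   # eventual reads: half the price
    return math.ceil(units_per_read * reads_per_s)


def write_capacity_units(item_kb: float, writes_per_s: float) -> int:
    return math.ceil(math.ceil(item_kb) * writes_per_s)  # 1 KB blocks
```

A 7 KB item read 100 times per second thus needs 200 units with strong consistency but only 100 with eventual consistency. This is the "2x price" noted above.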
189. AE/Cloud DataStore
Model: Proprietary
Pricing: CPU + Storage + Network
Underlying DB: Megastore
API: SDK, JPA, JDO
Google Cloud Datastore: structured storage system for App Engine
Based on: Megastore → BigTable → Colossus
Schema-free entity group (EG) data model, e.g. root table User (ID, Name) with child table Photo (ID, User, URL) in a 1:n relationship
EG: User + n Photos
• Unit of ACID transactions/consistency
• Fields auto-indexed (eventually consistent)
SELECT * FROM photos
WHERE ANCESTOR IS :34 AND name = 'sunset'
ORDER BY date ASC
LIMIT 10
OFFSET 10
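The example query combines an ancestor filter, which restricts it to one entity group, with an equality filter, a sort order, a limit, and an offset. The sketch evaluates the same query over an in-memory list of entities. The entity layout (a parent user id stored per photo, plus name and date fields) is a hypothetical stand-in for real Datastore keys.

```python
from datetime import date


def ancestor_query(entities, ancestor_id, name, limit=10, offset=10):
    """Evaluate, in memory:
    SELECT * FROM photos WHERE ANCESTOR IS :ancestor_id AND name = :name
    ORDER BY date ASC LIMIT :limit OFFSET :offset
    """
    group = [e for e in entities if e["parent"] == ancestor_id]  # one EG only
    matches = [e for e in group if e["name"] == name]            # equality filter
    matches.sort(key=lambda e: e["date"])                        # ORDER BY date ASC
    return matches[offset:offset + limit]                        # OFFSET, LIMIT
```

The ancestor filter keeps the query inside a single entity group, the unit of ACID transactions and consistency described above.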
195. AE/Cloud DataStore internally
• Entity groups define partitions
• Synchronous Paxos-based replication
• ACID per EG; maximum of 1 write/s to an EG
• Eventual consistency across groups
• Stored in BigTable
Megastore
J. Baker et al., "Megastore: Providing Scalable, Highly Available Storage for Interactive Services", CIDR 2011
• Paxos-based replication and transactions
• Used by 100 Google applications
• Problems: slow writes, predefined entity groups
Spanner
J. Corbett et al., "Spanner: Google's globally distributed database", TOCS 2013
Idea:
• Autosharded entity groups
• Not based on BigTable
Implementation:
• TrueTime API (GPS + atomic clocks) → commit timestamps of 2PL-SI transactions
• Paxos replication per shard
F1
J. Shute et al., "F1: A distributed SQL database that scales", VLDB 2013
Idea:
• Full SQL relational database built on Spanner; powers AdWords
Implementation:
• 5-way replication
• Data model: relational + hierarchy (Customer → Campaign → AdGroup)
• Transactions: snapshot read-only, pessimistic (Spanner), optimistic
• Distributed SQL engine
Good:
• Transactions (though limited)
• Good scalability of data volume
Bad:
• Entity groups hard to define
• Bad scale-out for writes and reads (Kossmann et al.) → no advantage over an RDBMS for small data volumes
A Spanner/F1-based DBaaS?
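TrueTime reports the current time as an interval [earliest, latest] whose width reflects clock uncertainty. Spanner picks a commit timestamp no smaller than TT.now().latest. It then waits until TT.now().earliest has passed that timestamp ("commit wait"), so that commit timestamps respect real-time order. The simulated sketch below illustrates this. The uncertainty bound epsilon and the wall-clock-based fake TrueTime are assumptions, not Google's API.

```python
import time
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """now() returns an interval assumed to contain the true absolute time."""

    def __init__(self, epsilon_s: float = 0.005):
        self.epsilon = epsilon_s

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, ts: float) -> bool:
        """True once ts has definitely passed."""
        return self.now().earliest > ts


def commit(tt: SimulatedTrueTime) -> float:
    """Assign a commit timestamp, then perform commit wait before releasing."""
    s = tt.now().latest            # s is not earlier than the true time now
    while not tt.after(s):         # commit wait: roughly 2 * epsilon
        time.sleep(tt.epsilon / 10)
    return s                       # now safe to make the commit visible
```

The wait is proportional to the clock uncertainty. This is why Spanner invests in GPS and atomic clocks to keep epsilon small.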
199. Azure Blobs, Amazon S3, Google Cloud Storage
Amazon S3:
• Automatic versioning
• Pricing: per request (10,000 ~ 1c), storage (1 GB/month ~ 3c), and network (1 GB ~ 10c)
• Replicas within a data center (e.g. AWS Ireland DC)
• Reduced Redundancy: only 1 replica
• Glacier: tape-disk archival
Example request (deleting an S3 blob):
DELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: AWS AKIAIO...
S3 Consistency
D. Bermbach, S. Tai, "Eventual consistency: How soon is eventual?", MW4SOC 2011
Findings:
• Inconsistency window varies from 2 to 11 seconds
• Monotonic read consistency is often violated
Building a database on S3
M. Brantner et al., "Building a database on S3", SIGMOD 2008
Idea:
• Use S3 as the persistent storage of a database
Implementation:
• Buffer pool and log manager on S3
• No transaction or query support
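Both findings can be checked from the client side. Monotonic read consistency means that once a client has seen a version of a key, later reads never return an older one. The inconsistency window is the time from a write until reads reliably return it. This sketch is in the spirit of such measurements, not the authors' tool. It assumes the writer embeds an increasing version number in every stored value, which is a hypothetical convention and not something S3 provides.

```python
class MonotonicReadChecker:
    """Tracks the highest version each client has seen per key."""

    def __init__(self):
        self.seen = {}          # (client, key) -> highest version read so far
        self.violations = []    # (client, key, seen version, read version, time)

    def record_read(self, client, key, version, t):
        prev = self.seen.get((client, key), -1)
        if version < prev:
            self.violations.append((client, key, prev, version, t))
        else:
            self.seen[(client, key)] = version


def inconsistency_window(write_time, reads, new_version):
    """Time from the write until reads consistently return new_version or later.

    reads: (timestamp, version) pairs observed after the write.
    Returns None if no read after the last stale read saw the new version.
    """
    reads = sorted(reads)
    stale = [t for t, v in reads if v < new_version]
    last_stale = stale[-1] if stale else float("-inf")
    fresh = [t for t, v in reads if v >= new_version and t > last_stale]
    return fresh[0] - write_time if fresh else None
```

Note that a single fresh read is not enough to close the window. A later stale read reopens it, which is exactly the monotonic-read violation reported above.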
202. Parse (MBaaS)
Model: Backend-aaS
Pricing: Plan-based (Free, Pro ($199), Enterprise)
Underlying DB: Mainly MongoDB
API: SDKs, REST
• Founded June 2011
• Acquired by Facebook April 2013
• Parse Core, Parse Analytics
207. Parse (MBaaS)
Authentication:
• User + password
• OAuth: Facebook, Twitter
Query cinemas:
(new Parse.Query('Cinemas'))
  .withinKilometers(...)
  .find()
Query movies:
(new Parse.Query('Movies'))
  .greaterThan('startAt', now)
  .notEqualTo('cinemas', cId)
  .find()
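withinKilometers(key, point, maxDistance) restricts results to objects whose geo point lies within a radius of the given location. The Python sketch below shows these semantics without the Parse SDK by applying the same filter locally with the haversine great-circle distance. The cinema records, their coordinates, and the mean Earth radius of 6371 km are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_kilometers(records, key, lat, lon, max_km):
    """Local equivalent of query.withinKilometers(key, point, max_km)."""
    return [r for r in records if haversine_km(lat, lon, *r[key]) <= max_km]
```

A backend would answer this with a geospatial index instead of scanning every record, but the result set is the same.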
213. Meteor
Model: Backend-aaS
Pricing: Not yet revealed
Underlying DB: MongoDB
API: WebSockets
Idea: full-stack JavaScript with Node.js, MongoDB, and WebSockets
Architecture: web browser ⇄ WebSocket ⇄ Node.js + MongoDB (single server)
Players = new Meteor.Collection("players");
if (Meteor.isServer) {
  //…
}
if (Meteor.isClient) {
  Template.leaderboard.players = function () {
    return Players.find({}, {sort: {score: -1, name: 1}});
  };
}
<div class="player {{selected}}">
  <span class="name">{{name}}</span>
  <span class="score">{{score}}</span>
</div>
$ meteor deploy abc.meteor.com
• Very productive for very small projects
• Fundamentally limited scalability:
  • The server tails MongoDB's oplog
  • and holds the fetched data state of every client
214. Outline
What are Cloud Databases?
Cloud Databases in the wild
Research Perspectives
• Hot Topics
• Orestes: a scalable, low-latency architecture
• Baqend: putting it into practice
Wrap-up and literature
219. Example: CryptDB
Encrypted Databases: Research
Idea: Only decrypt as much as necessary
Architecture: an SQL proxy in front of the RDBMS encrypts and decrypts data and rewrites queries
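The proxy idea can be sketched in a few lines. CryptDB wraps each value in layers ("onions") and peels off an outer randomized layer only when a query needs a weaker, computable one, such as deterministic encryption for equality. This minimal sketch assumes two fixed layers per column and hypothetical names throughout (`SqlProxy`, `name_det`, `name_rnd`); HMAC-SHA256 stands in for CryptDB's deterministic encryption and AES-256-GCM for its randomized layer, which is not what the real system uses.

```javascript
// Minimal sketch of a CryptDB-style trusted proxy (hypothetical names and schemes).
const crypto = require("crypto");

const macKey = crypto.randomBytes(32); // known only to the proxy
const encKey = crypto.randomBytes(32); // known only to the proxy

// Deterministic token: equal plaintexts -> equal tokens, so the DBMS can test equality.
function detToken(value) {
  return crypto.createHmac("sha256", macKey).update(String(value)).digest("hex");
}

// Randomized encryption: reveals nothing, but the DBMS cannot compute on it.
function encryptRnd(plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", encKey, iv);
  const ct = Buffer.concat([cipher.update(String(plaintext), "utf8"), cipher.final()]);
  return { iv, ct, tag: cipher.getAuthTag() };
}

function decryptRnd({ iv, ct, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", encKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}

// Untrusted DBMS: only ever sees tokens and ciphertexts.
class UntrustedDB {
  constructor() { this.rows = []; }
  insert(row) { this.rows.push(row); }
  selectWhereEquals(column, token) { return this.rows.filter((r) => r[column] === token); }
}

// Trusted SQL proxy: encrypts on insert, rewrites predicates, decrypts results.
class SqlProxy {
  constructor(db) { this.db = db; }
  insert({ name, salary }) {
    this.db.insert({
      name_det: detToken(name),       // enables WHERE name = ?
      name_rnd: encryptRnd(name),     // used to return the value to the client
      salary_rnd: encryptRnd(salary), // never queried, so it stays randomized
    });
  }
  // "SELECT * FROM emp WHERE name = ?" becomes a comparison on name_det.
  findByName(name) {
    return this.db
      .selectWhereEquals("name_det", detToken(name))
      .map((r) => ({ name: decryptRnd(r.name_rnd), salary: Number(decryptRnd(r.salary_rnd)) }));
  }
}

const db = new UntrustedDB();
const proxy = new SqlProxy(db);
proxy.insert({ name: "alice", salary: 5000 });
proxy.insert({ name: "bob", salary: 6000 });
console.log(proxy.findByName("alice")); // [ { name: 'alice', salary: 5000 } ]
```

The DBMS learns which rows share a name, but not the names or the salaries themselves.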
Relational Cloud
C. Curino et al. "Relational Cloud: A Database-as-a-Service for the Cloud", CIDR 2011
DBaaS architecture:
• Encrypted with CryptDB
• Multi-tenancy through live migration
• Workload-aware partitioning (graph-based)
• Early approach
• Not adopted in practice, yet
Dream solution: Fully Homomorphic Encryption
222. Transactions/Consistency: Research
| System | Consistency | Transactional Unit | Commit Latency | Data Loss? |
|---|---|---|---|---|
| Dynamo | Eventual | None | 1 RT | - |
| Yahoo PNUTS | Timeline per key | Single Key | 1 RT | possible |
| COPS | Causality | Multi-Record | 1 RT | possible |
| MySQL (async) | Serializable | Static Partition | 1 RT | possible |
| Megastore | Serializable | Static Partition | 2 RT | - |
| Spanner/F1 | Snapshot Isolation | Partition | 2 RT | - |
| MDCC | Read-Committed | Multi-Record | 1 RT | - |
(RT = round trip)
Multi-Data Center Consistency
T. Kraska et al. "MDCC: Multi-Data Center Consistency", EuroSys 2013
Idea:
• Multi-data center commit protocol with a single round trip
Implementation:
• Optimistic commit protocol
• Fast, Generalized Multi-Paxos
Result: almost as fast as Dynamo-style (eventually consistent) replication
Currently, no NoSQL database implements consistent multi-data center replication
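The single-round-trip idea can be illustrated with a strongly simplified sketch of an optimistic fast path. Assumptions (all names hypothetical): each data center is one replica, a write option carries the version the client read, and a replica votes yes if that version is still current and no other transaction holds the record. The transaction commits once every option gathers a fast quorum; the size follows the Fast Paxos rule that any two fast quorums and one classic quorum must intersect (for 5 data centers: classic 3, fast 4). MDCC's classic-round fallback, collision recovery and commutative updates are not modelled; here a failed fast path simply aborts.

```javascript
// Simplified, hypothetical sketch of an optimistic one-round-trip commit.
function quorumSizes(n) {
  const classic = Math.floor(n / 2) + 1;
  let fast = classic;
  // Any two fast quorums and one classic quorum must share a replica.
  while (2 * fast + classic <= 2 * n) fast++;
  return { classic, fast };
}

class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map();    // key -> { value, version }
    this.pending = new Map(); // key -> txId holding an accepted, undecided option
    this.up = true;
  }
  // Vote on an option: accept if the read version is current and the key is free.
  propose(txId, key, readVersion) {
    if (!this.up) return false;
    const current = this.data.get(key)?.version ?? 0;
    const busy = this.pending.has(key) && this.pending.get(key) !== txId;
    if (current !== readVersion || busy) return false;
    this.pending.set(key, txId);
    return true;
  }
  // Asynchronous visibility message once the outcome is known.
  learn(txId, key, value, commit) {
    if (this.pending.get(key) !== txId) return;
    this.pending.delete(key);
    if (commit) {
      const version = (this.data.get(key)?.version ?? 0) + 1;
      this.data.set(key, { value, version });
    }
  }
}

function commit(replicas, txId, writes) {
  const { fast } = quorumSizes(replicas.length);
  // One parallel round trip: every option goes to every data center.
  const accepted = writes.every(({ key, readVersion }) =>
    replicas.filter((r) => r.propose(txId, key, readVersion)).length >= fast);
  // The client knows the outcome now; applying it happens off the critical path.
  for (const r of replicas) {
    for (const { key, value } of writes) r.learn(txId, key, value, accepted);
  }
  return accepted;
}

const dcs = ["dc1", "dc2", "dc3", "dc4", "dc5"].map((n) => new Replica(n));
console.log(quorumSizes(5)); // { classic: 3, fast: 4 }
dcs[4].up = false;           // one data center unreachable
console.log(commit(dcs, "t1", [{ key: "x", value: 10, readVersion: 0 }])); // true
console.log(commit(dcs, "t2", [{ key: "x", value: 20, readVersion: 0 }])); // false (stale read)
```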
227. YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client: Workload Generator, Pluggable DB Interface, Threads, Stats
Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution
Runtime Parameters: DB host name, threads, etc.
Pluggable DB Interface: Read(), Insert(), Update(), Delete(), Scan() → DB protocol → Data Store
| Workload | Operation Mix | Distribution | Example |
|---|---|---|---|
| A – Update Heavy | Read: 50%, Update: 50% | Zipfian | Session Store |
| B – Read Heavy | Read: 95%, Update: 5% | Zipfian | Photo Tagging |
| C – Read Only | Read: 100% | Zipfian | User Profile Cache |
| D – Read Latest | Read: 95%, Insert: 5% | Latest | User Status Updates |
| E – Short Ranges | Scan: 95%, Insert: 5% | Zipfian/Uniform | Threaded Conversations |
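How a workload generator turns an operation mix and a popularity distribution into a request stream can be sketched as follows, using workload B (95% reads, 5% updates, Zipfian keys). This is a minimal sketch with hypothetical names: YCSB itself is written in Java and uses a different, constant-time Zipfian algorithm; only the skew constant 0.99 matches its default. A seeded PRNG makes runs reproducible, and keys are drawn by binary search over a precomputed cumulative distribution.

```javascript
// Minimal, hypothetical YCSB-style workload generator.

// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Zipfian sampler: item i is drawn with probability proportional to 1 / (i + 1)^theta.
function zipfianSampler(itemCount, theta, rand) {
  const cdf = new Float64Array(itemCount);
  let sum = 0;
  for (let i = 0; i < itemCount; i++) {
    sum += 1 / Math.pow(i + 1, theta);
    cdf[i] = sum;
  }
  return function next() {
    const u = rand() * sum;
    let lo = 0, hi = itemCount - 1; // find the first index with cdf[i] > u
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (cdf[mid] > u) hi = mid; else lo = mid + 1;
    }
    return lo;
  };
}

// Combines an operation mix (e.g. { read: 0.95, update: 0.05 }) with key popularity.
function makeWorkload({ mix, itemCount, theta = 0.99, seed = 42 }) {
  const rand = mulberry32(seed);
  const nextKey = zipfianSampler(itemCount, theta, rand);
  const ops = Object.entries(mix);
  return function nextOperation() {
    let u = rand();
    let op = ops[ops.length - 1][0]; // fallback guards against rounding
    for (const [name, p] of ops) {
      if (u < p) { op = name; break; }
      u -= p;
    }
    return { op, key: `user${nextKey()}` };
  };
}

const workloadB = makeWorkload({ mix: { read: 0.95, update: 0.05 }, itemCount: 1000 });
const counts = { read: 0, update: 0 };
const hits = new Map();
for (let i = 0; i < 100000; i++) {
  const { op, key } = workloadB();
  counts[op]++;
  hits.set(key, (hits.get(key) ?? 0) + 1);
}
console.log(counts); // roughly 95,000 reads and 5,000 updates
```

The same generator covers workloads A and C by changing `mix`; D and E would additionally need an insert-aware "latest" distribution and scans.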
232. Example Result (Read Heavy):
Benchmarking: Research
YCSB++
S. Patil, M. Polte, et al. "YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores", SOCC 2011
• Clients coordinate through ZooKeeper
• Simple read-after-write checks
• Evaluation: HBase & Accumulo
Weaknesses:
• A single client can be a bottleneck
• No consistency & availability measurement
• No transaction support
YCSB+T
A. Dey et al. "YCSB+T: Benchmarking Web-Scale Transactional Databases", CloudDB 2014
• New workload: transactional bank account
• Simple anomaly detection for lost updates
• No comparison of systems
No specific application
CloudStone, CARE, TPC extensions?
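Lost-update detection in a bank-account workload rests on an invariant: transfers move money between accounts but never create or destroy it, so the total balance must stay constant. A minimal sketch of that idea, assuming a hypothetical in-memory store without transactions and a transfer split into a read step and a write step (generators let two clients interleave). It mirrors the idea only and is not YCSB+T's code.

```javascript
// Hypothetical sketch: bank-account transfers plus a total-balance invariant check.
class Store {
  constructor() { this.accounts = new Map(); }
  read(id) { return this.accounts.get(id); }
  write(id, value) { this.accounts.set(id, value); }
}

function initAccounts(store, n, balance) {
  for (let i = 0; i < n; i++) store.write(`acc${i}`, balance);
  return n * balance; // expected total, unchanged by any correct transfer
}

// Non-atomic read-modify-write transfer; `yield` marks where another client may run.
function* transfer(store, from, to, amount) {
  const a = store.read(from);
  const b = store.read(to);
  yield;
  store.write(from, a - amount);
  store.write(to, b + amount);
}

function runSerial(...txs) {
  for (const t of txs) { t.next(); t.next(); } // each transfer runs to completion
}

function runInterleaved(...txs) {
  txs.forEach((t) => t.next()); // all transfers read first ...
  txs.forEach((t) => t.next()); // ... then all of them write
}

function detectLostUpdates(store, expectedTotal) {
  let actual = 0;
  for (const v of store.accounts.values()) actual += v;
  return { expectedTotal, actual, anomaly: actual !== expectedTotal };
}

const s1 = new Store();
const total1 = initAccounts(s1, 3, 100);
runSerial(transfer(s1, "acc0", "acc1", 10), transfer(s1, "acc1", "acc2", 5));
console.log(detectLostUpdates(s1, total1).anomaly); // false

const s2 = new Store();
const total2 = initAccounts(s2, 3, 100);
runInterleaved(transfer(s2, "acc0", "acc1", 10), transfer(s2, "acc1", "acc2", 5));
console.log(detectLostUpdates(s2, total2)); // { expectedTotal: 300, actual: 290, anomaly: true }
```

In the interleaved run, the second transfer overwrites `acc1` with a value computed from a stale read, so the first transfer's credit of 10 is lost.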
239. Vision
YCSB Harmony
| System | Availability | Reads/s | Writes/s | Avg. Latency | 95th perc. Latency |
|---|---|---|---|---|---|
| DynamoDB | 99.9% | 23411 | 34534 | 3.2 ms | 9 ms |
| RDS | 99.8% | 2342 | 2455 | 30 ms | 80 ms |
| Azure Table | 99.5% | 22343 | 23442 | 12 ms | 20 ms |
| Google DataStore | 99.5% | 3000 | 2000 | 30 ms | 300 ms |
247. Motivation
Classic Three-Tier Architecture: Client → Application (Web Servers) → Database
High latency: on average (2014), 90 HTTP requests per page load
Study by Amazon: with every 100 ms of additional page load time, revenue decreases by 1%.
Study by Google: when the load time of search results increases by 500 ms, traffic decreases by 20%.
259. Persistence APIs: Java/JDO (persist, find, createQuery), JavaScript/JPA Port, REST/HTTP API, others
Layers: Application Layer (Browser or Mobile Device, Application Server), Persistence API, Data Store
Web caches: Forward-Proxy Caches and ISP Caches, Content Delivery Networks (CDN Caches), Reverse-Proxy Caches and Load Balancers (Purge, Scale)
HTTP Server: Transactions, Queries, Object Persistence, Schema, Key-Value, Documents
DBaaS & BaaS Layer, database-independent concerns: Transaction Validation, Configuration, Partial Updates, Access Control, Multi-Tenancy, Schema Management, Workload Management, Cache Coherence, Autoscaling, SLAs
Database-specific Wrappers
Redis (Replicated): Counting Bloom Filter (add, delete); counters 1 0 2 0 1 0 4 0 → bits 1 0 1 0 1 0 1 0
Node.js (local to Server): Stored Procedures, Custom Validation
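A counting Bloom filter keeps a small counter per slot instead of a single bit, so keys can be deleted as well as added, and it can be flattened into an ordinary Bloom filter (bit set where the counter is non-zero). A minimal in-memory sketch follows; the class is hypothetical, Redis replication is not modelled, and the record paths used as keys are illustrative assumptions modelled on `/db/{bucket}/{class}/{id}`. Positions come from double hashing over one SHA-256 digest.

```javascript
// Minimal in-memory counting Bloom filter (hypothetical sketch).
const crypto = require("crypto");

class CountingBloomFilter {
  constructor(m, k) {
    this.m = m;                         // number of counters
    this.k = k;                         // number of hash functions
    this.counters = new Uint32Array(m);
  }

  // Double hashing: k positions derived from two 32-bit words of a SHA-256 digest.
  positions(key) {
    const d = crypto.createHash("sha256").update(key).digest();
    const h1 = d.readUInt32BE(0);
    const h2 = (d.readUInt32BE(4) | 1) >>> 0; // odd, non-negative step
    const pos = [];
    for (let i = 0; i < this.k; i++) pos.push((h1 + i * h2) % this.m);
    return pos;
  }

  add(key) {
    for (const p of this.positions(key)) this.counters[p]++;
  }

  // Only delete keys that were added; the guard merely prevents counter underflow.
  delete(key) {
    for (const p of this.positions(key)) if (this.counters[p] > 0) this.counters[p]--;
  }

  // False positives are possible; added keys are never reported missing.
  contains(key) {
    return this.positions(key).every((p) => this.counters[p] > 0);
  }

  // Flat Bloom filter: bit i is 1 iff counter i is non-zero.
  toBits() {
    return Array.from(this.counters, (c) => (c > 0 ? 1 : 0));
  }
}

const cbf = new CountingBloomFilter(1024, 4);
cbf.add("/db/app/Page/hero");
cbf.add("/db/app/Menu/main");
console.log(cbf.contains("/db/app/Page/hero")); // true
cbf.delete("/db/app/Page/hero");

// The slide's example: counters 1 0 2 0 1 0 4 0 flatten to bits 1 0 1 0 1 0 1 0.
const example = new CountingBloomFilter(8, 3);
example.counters.set([1, 0, 2, 0, 1, 0, 4, 0]);
console.log(example.toBits().join(" ")); // 1 0 1 0 1 0 1 0
```

The counters let the server delete entries; the flat bit vector is the compact form, e.g. for shipping to clients.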
GET /db/{bucket}/{class}/{id}
200 OK
Cache-Control: public, max-age=6000
ETag: "3"
JSON Object
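The response above can be cached by any shared cache (`public`) for 6000 seconds (`max-age` is in seconds), and the `ETag` lets a cache or client revalidate cheaply. A minimal in-memory sketch of the server side, assuming the ETag is the record version (as `"3"` suggests) and a hypothetical `handleGet` function instead of a real HTTP server. It handles only a single strong validator in `If-None-Match`.

```javascript
// Hypothetical in-memory records: path -> { version, data }
const records = new Map([
  ["/db/app/Page/hero", { version: 3, data: { title: "Hello" } }],
]);

// Answers GET /db/{bucket}/{class}/{id}, optionally conditional on If-None-Match.
function handleGet(path, headers = {}) {
  const rec = records.get(path);
  if (!rec) return { status: 404, headers: {}, body: null };
  const etag = `"${rec.version}"`; // strong ETag derived from the record version
  const cacheHeaders = { "Cache-Control": "public, max-age=6000", ETag: etag };
  if (headers["if-none-match"] === etag) {
    return { status: 304, headers: cacheHeaders, body: null }; // cached copy still current
  }
  return { status: 200, headers: cacheHeaders, body: JSON.stringify(rec.data) };
}

const first = handleGet("/db/app/Page/hero");
console.log(first.status, first.headers.ETag); // 200 "3"
const again = handleGet("/db/app/Page/hero", { "if-none-match": first.headers.ETag });
console.log(again.status); // 304
records.set("/db/app/Page/hero", { version: 4, data: { title: "Updated" } });
console.log(handleGet("/db/app/Page/hero", { "if-none-match": '"3"' }).status); // 200
```

Revalidation only happens once `max-age` has expired; until then a cache may serve the old version without asking, so updates within that window must be handled by purging caches or by a mechanism like the cache coherence component listed above.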
290. Polyglot Persistence
Application → REST/HTTP protocol → Orestes servers (Polyglot Persistence Mediator) → Redis, MongoDB, db4o
Polyglot Persistence Mediator:
• Metadata contains the SLA
• Parses the SLA and routes data
• Manages materialisation
• Resolves the mapping
Results: Article objects with impression count
• MongoDB: Article (ID, Title, …, Imp.)
• Redis Sorted Set: (Imp., ID)
Speedup with PPM:
• 50-1000%
• 66% performance of Varnish
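The mediator's routing can be sketched with in-memory stand-ins for MongoDB and a Redis sorted set. Assumptions, all hypothetical: per-field metadata (standing in for the parsed SLA) names the target store, the impression counter lives only in the sorted set, and objects are reassembled on load ("resolve mapping"). The slide also lists `Imp.` in the MongoDB document, so the real mediator may materialise the counter in both stores.

```javascript
// Hypothetical per-field metadata derived from the SLA.
const schema = {
  Article: {
    title: { store: "document" },
    impressions: { store: "sortedSet" }, // fast increments and ranking queries
  },
};

class DocumentStore { // stands in for MongoDB
  constructor() { this.docs = new Map(); }
  put(id, fields) { this.docs.set(id, { ...(this.docs.get(id) ?? {}), ...fields }); }
  get(id) { return this.docs.get(id) ?? {}; }
}

class SortedSet { // stands in for a Redis sorted set (no skip list here)
  constructor() { this.scores = new Map(); }
  set(member, score) { this.scores.set(member, score); }
  incrBy(member, by) { this.scores.set(member, this.score(member) + by); }
  score(member) { return this.scores.get(member) ?? 0; }
  topN(n) {
    return [...this.scores.entries()].sort((a, b) => b[1] - a[1]).slice(0, n).map(([m]) => m);
  }
}

class PolyglotMediator {
  constructor(schema) {
    this.schema = schema;
    this.documents = new DocumentStore();
    this.sortedSets = new Map(); // "Class.field" -> SortedSet
  }
  sortedSet(cls, field) {
    const name = `${cls}.${field}`;
    if (!this.sortedSets.has(name)) this.sortedSets.set(name, new SortedSet());
    return this.sortedSets.get(name);
  }
  // Route each field to the store its metadata asks for.
  save(cls, id, obj) {
    const docFields = {};
    for (const [field, value] of Object.entries(obj)) {
      if (this.schema[cls][field]?.store === "sortedSet") this.sortedSet(cls, field).set(id, value);
      else docFields[field] = value;
    }
    this.documents.put(`${cls}/${id}`, docFields);
  }
  increment(cls, id, field, by = 1) { this.sortedSet(cls, field).incrBy(id, by); }
  // Resolve the mapping: reassemble one object from both stores.
  load(cls, id) {
    const obj = { id, ...this.documents.get(`${cls}/${id}`) };
    for (const [field, spec] of Object.entries(this.schema[cls])) {
      if (spec.store === "sortedSet") obj[field] = this.sortedSet(cls, field).score(id);
    }
    return obj;
  }
  // Ranking query answered by the sorted set, then completed from the document store.
  top(cls, field, n) { return this.sortedSet(cls, field).topN(n).map((id) => this.load(cls, id)); }
}

const ppm = new PolyglotMediator(schema);
ppm.save("Article", "a1", { title: "NoSQL", impressions: 0 });
ppm.save("Article", "a2", { title: "Caching", impressions: 0 });
for (let i = 0; i < 3; i++) ppm.increment("Article", "a2", "impressions");
ppm.increment("Article", "a1", "impressions");
console.log(ppm.top("Article", "impressions", 1)); // [ { id: 'a2', title: 'Caching', impressions: 3 } ]
```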
292. Cloud Evaluation of ORESTES
Setup: 50 client machines, a web cache, the Orestes server and a Versant DB on Amazon EC2 Ireland and EC2 USA (165 ms latency between the two regions)
Workload: 30 000 objects, 500 requests per client, 10/1 read/write ratio
297. Orestes as a startup
Baqend
Internet, Seoxy
REST API, Transactions, Schema Management, Cache Consistency, Auto-Scaling, Multi-Tenancy, Security and Access Control, Provisioning
306. Baqend in Action
GET /app.html
db.find(Menu, 'main').done(...);
db.find(Page, 'hero').done(...);
db.query(Page, 'top3').done(...);
GET /img/pic005.jpg
GET /img/pic017.jpg
GET /img/pic022.jpg
313. How to choose a cloud database: Wrap-up
Define your functional requirements
Define your non-functional requirements
Managed RDBMS | Managed DWH | Managed NoSQL DB | Backend-as-a-Service | Proprietary Service | Object Store
Evaluate by:
1. Underlying DB
2. Docs & books
3. Your own tests
4. SLAs
5. Docs & books
6. Experience Reports
Try it
318. Top Scientific Conferences
Database Research:
VLDB (Very Large Data Bases)
SIGMOD (Special Interest Group on Management of Data)
ICDE (International Conference on Data Engineering)
CIDR (Conference on Innovative Data Systems Research)
Distributed Systems Research:
SOCC (Symposium on Cloud Computing)
OSDI/SOSP (Operating Systems Design and Implementation / Symposium on Operating Systems Principles)
EuroSys
This year probably in Washington D.C.
Learn more: scdm2013.com