What Every Developer Should Know About Database Scalability

•

85 likes•31,767 views

Replication. Partitioning. Relational databases. Bigtable. Dynamo. There is no one-size-fits-all approach to scaling your database, and the CAP theorem proved that there never will be. This talk will explain the advantages and limits of the approaches to scaling traditional relational databases, as well as the tradeoffs made by the designers of newer distributed systems like Cassandra. These slides are from Jonathan Ellis's OSCON 09 talk: http://en.oreilly.com/oscon2009/public/schedule/detail/7955

Technology Business

Disk is the new tape*
• ~8ms to seek
– ~4ms on expensive 15k rpm disks

Two kinds of operations
• Reads

• Writes

Caching
• Memcached

• Ehcache

• etc cache

DB

Cache invalidation
• Implicit

• Explicit

Cache set invalidation
get_cached_cart(cart=13, offset=10,
limit=10)

get('cart:13:10:10')

?

Set invalidation 2
prefix = get('cart_prefix:13')

get(prefix + ':10:10')

del('cart_prefix:13')

http://www.aminus.org/blogs/index.php/2007/1

Types of replication
• Master → slave
– Master → slave → other slaves

• Master ↔ master
– multi-master

Types of replication 2
• Synchronous

• Asynchronous

Synchronous
• Synchronous = slow(er)

• Complexity (e.g. 2pc)

• PGCluster

• Oracle

Asynchronous master/slave
• Easiest

• Failover

• MySQL replication

• Slony, Londiste, WAL shipping

• Tungsten

Asynchronous multi-master
• Conflict resolution
– O(N3) or O(N2) as you add nodes
– http://research.microsoft.com/~gray/replicas.ps

• Bucardo

• MySQL Cluster

Achtung!
• Asynchronous replication can lose data if
the master fails

“Architecture”
• Primarily about how you cope with failure
scenarios

Scaling writes
• Partitioning aka sharding
– Key / horizontal
– Vertical
– Directed

Key based partitioning
• PK of “root” table controls destination
– e.g. user id

• Retains referential integrity

Example: blogger.com

Users

Blogs

Comments

Example: blogger.com

Users

Blogs Comments'

Comments

Vertical partitioning
• Tables on separate nodes

• Often a table that is too big to keep with
the other tables, gets too big for a single
node

Directed partitioning
• Central db that knows what server owns a
key

• Makes adding machines easier

• Single point of failure

What these have in common
• Ad hoc

• Error-prone

• Manpower-intensive

To summarize
• Scaling reads sucks

• Scaling writes sucks more

*
Distributed databases
• Data is automatically partitioned

• Transparent to application

• Add capacity without downtime

• Failure tolerant

*Like Bigtable, not Lotus Notes

Two famous papers
• Bigtable: A distributed storage system for
structured data, 2006

• Dynamo: amazon's highly available key-
value store, 2007

The world doesn't need another
half-assed key/value store

(See also Olin Shivers' 100% and 80%
solutions)

Two approaches
• Bigtable: “How can we build a distributed
database on top of GFS?”

• Dynamo: “How can we build a distributed
hash table appropriate for the data
center?”

Eventually consistent
• Amazon:
http://www.allthingsdistributed.com/2008/12

• eBay:
http://queue.acm.org/detail.cfm?id=139412

Consistency in a BASE world
• If W + R > N, you are 100% consistent

• W=1, R=N

• W=N, R=1

• W=Q, R=Q where Q = N / 2 + 1

ColumnFamilies

keyA column1 column2 column3
keyC column1 column7 column11

Column
Byte[] Name
Byte[] Value
I64 timestamp

LSM write properties
• No reads

• No seeks

• Fast

• Atomic within ColumnFamily

vs MySQL with 50GB of data
• MySQL
– ~300ms write
– ~350ms read

• Cassandra
– ~0.12ms write
– ~15ms read

• Achtung!

What's hot

Megastore: Providing scalable and highly available storageNiels Claeys

An Overview to MySQL SYS Schema Mydbops

Unified Stream and Batch Processing with Apache FlinkDataWorks Summit/Hadoop Summit

Running MariaDB in multiple data centersMariaDB plc

Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent

Vitess VReplication: Standing on the Shoulders of a MySQL GiantMatt Lord

Stability Patterns for Microservicespflueras

Sql Server Performance TuningBala Subra

HBase in Practicelarsgeorge

How to Build a Scylla Database Cluster that Fits Your NeedsScyllaDB

Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang

Introduction to Vertica (Architecture & More)LivePerson

M|18 Deep Dive: InnoDB Transactions and Write PathsMariaDB plc

Using all of the high availability options in MariaDBMariaDB plc

Apache Spark ArchitectureAlexey Grishchenko

MariaDB MaxScaleMariaDB plc

Redis OverviewHoang Long

MAA Best Practices for Oracle Database 19cMarkus Michalewicz

Scalability, Availability & Stability PatternsJonas Bonér

What's hot (20)

Megastore: Providing scalable and highly available storage

An Overview to MySQL SYS Schema

Unified Stream and Batch Processing with Apache Flink

Running MariaDB in multiple data centers

Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...

Vitess VReplication: Standing on the Shoulders of a MySQL Giant

Stability Patterns for Microservices

Sql Server Performance Tuning

HBase in Practice

How to Build a Scylla Database Cluster that Fits Your Needs

Efficient Data Storage for Analytics with Apache Parquet 2.0

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark

Introduction to Vertica (Architecture & More)

M|18 Deep Dive: InnoDB Transactions and Write Paths

Using all of the high availability options in MariaDB

Apache Spark Architecture

MariaDB MaxScale

Redis Overview

MAA Best Practices for Oracle Database 19c

Scalability, Availability & Stability Patterns

Viewers also liked

Banco de Dados Não Relacionais vs Banco de Dados Relacionaisalexculpado

Hadoop and Cassandra at RackspaceStu Hood

Cassandra @Formspringmartincozzi

From 100s to 100s of MillionsErik Onnen

Cassandra by example - the path of read and write requestsgrro

Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft

BI, Reporting and Analytics on Apache CassandraVictor Coustenoble

Management ConsultingAlexandros Chatzopoulos

Oprah WinfreyUtkarsh Haldia

An Overview of Apache CassandraDataStax

Reverse Engineeringdswanson

ChessChuck Vohs

Lionel MessiNaliKardan

Lionel messiDipanker Singh

Jeff jonas big data new physicsMIT Forum of Israel

Unit Test JavaScriptDan Vitoriano

Growth Hacking Centre for Digital Marketing & Communication

WorkshopBeth Kanter

DatomicJordan Leigh

DatomicChristophe Marchal

Viewers also liked (20)

Banco de Dados Não Relacionais vs Banco de Dados Relacionais

Hadoop and Cassandra at Rackspace

Cassandra @Formspring

From 100s to 100s of Millions

Cassandra by example - the path of read and write requests

Migrating Netflix from Datacenter Oracle to Global Cassandra

BI, Reporting and Analytics on Apache Cassandra

Management Consulting

Oprah Winfrey

An Overview of Apache Cassandra

Reverse Engineering

Chess

Lionel Messi

Lionel messi

Jeff jonas big data new physics

Unit Test JavaScript

Growth Hacking

Workshop

Datomic

Similar to What Every Developer Should Know About Database Scalability

What every developer should know about database scalability, PyCon 2010jbellis

Intro to CassandraJon Haddad

Cassandra Core ConceptsDataStax Academy

CPU Cachesshinolajla

Scaling with sync_replication using Galera and EC2Marco Tusa

Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services

Scaling the Web: Databases & NoSQLRichard Schneeman

MyRocks Deep DiveYoshinori Matsunobu

Is NoSQL The Future of Data Storage?Saltmarch Media

Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services

Cpu Cachesshinolajla

CPU Caches - Jamie Allenjaxconf

L6.sp17.pptxSudheerKumar499932

High-Performance Storage Services with HailDB and Javasunnygleason

Big Data (NJ SQL Server User Group)Don Demcsak

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn

Cassandra for mission critical dataOleksandr Semenov

ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati

Intro to Big Data and NoSQLDon Demcsak

Databases which, why and usage tipsavnerner

Similar to What Every Developer Should Know About Database Scalability (20)

What every developer should know about database scalability, PyCon 2010

Intro to Cassandra

Cassandra Core Concepts

CPU Caches

Scaling with sync_replication using Galera and EC2

Best Practices for Migrating Your Data Warehouse to Amazon Redshift

Scaling the Web: Databases & NoSQL

MyRocks Deep Dive

Is NoSQL The Future of Data Storage?

Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...

Cpu Caches

CPU Caches - Jamie Allen

L6.sp17.pptx

High-Performance Storage Services with HailDB and Java

Big Data (NJ SQL Server User Group)

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn

Cassandra for mission critical data

ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)

Intro to Big Data and NoSQL

Databases which, why and usage tips

Recently uploaded

Modernizing Legacy Systems Using BallerinaWSO2

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh

Simplifying Mobile A11y Presentation.pptxMarkSteadman7

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra

Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services

Why Teams call analytics are critical to your entire businesspanagenda

Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya

API Governance and Monetization - The evolution of API governanceWSO2

Understanding the FAA Part 107 License ..Christopher Logan Kennedy

AI in Action: Real World Use Cases by AnitarajAnitaRaj43

Quantum Leap in Next-Generation ComputingWSO2

Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2

WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2

WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2

Elevate Developer Efficiency & build GenAI Application with Amazon QBhuvaneswari Subramani

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software

JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37

Exploring Multimodal Embeddings with MilvusZilliz

Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE

Recently uploaded (20)

Modernizing Legacy Systems Using Ballerina

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model

Simplifying Mobile A11y Presentation.pptx

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving

Vector Search -An Introduction in Oracle Database 23ai.pptx

Why Teams call analytics are critical to your entire business

Artificial Intelligence Chap.5 : Uncertainty

API Governance and Monetization - The evolution of API governance

Understanding the FAA Part 107 License ..

AI in Action: Real World Use Cases by Anitaraj

Quantum Leap in Next-Generation Computing

Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform

WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...

WSO2's API Vision: Unifying Control, Empowering Developers

Elevate Developer Efficiency & build GenAI Application with Amazon Q

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

JohnPollard-hybrid-app-RailsConf2024.pptx

Exploring Multimodal Embeddings with Milvus

Decarbonising Commercial Real Estate: The Role of Operational Performance

What Every Developer Should Know About Database Scalability

1. Database scalability Jonathan Ellis

2. Classic RDBMS persistence Index Data

3. Disk is the new tape* • ~8ms to seek – ~4ms on expensive 15k rpm disks

5. performance What scaling means money

6. Performance • Latency • Throughput

7. Two kinds of operations • Reads • Writes

8. Caching • Memcached • Ehcache • etc cache DB

9. Cache invalidation • Implicit • Explicit

10. Cache set invalidation get_cached_cart(cart=13, offset=10, limit=10) get('cart:13:10:10') ?

11. Set invalidation 2 prefix = get('cart_prefix:13') get(prefix + ':10:10') del('cart_prefix:13') http://www.aminus.org/blogs/index.php/2007/1

12. Replication

13. Types of replication • Master → slave – Master → slave → other slaves • Master ↔ master – multi-master

14. Types of replication 2 • Synchronous • Asynchronous

15. Synchronous • Synchronous = slow(er) • Complexity (e.g. 2pc) • PGCluster • Oracle

16. Asynchronous master/slave • Easiest • Failover • MySQL replication • Slony, Londiste, WAL shipping • Tungsten

17. Asynchronous multi-master • Conflict resolution – O(N3) or O(N2) as you add nodes – http://research.microsoft.com/~gray/replicas.ps • Bucardo • MySQL Cluster

18. Achtung! • Asynchronous replication can lose data if the master fails

19. “Architecture” • Primarily about how you cope with failure scenarios

20. Replication does not scale writes

21. Scaling writes • Partitioning aka sharding – Key / horizontal – Vertical – Directed

22. Partitioning

23. Key based partitioning • PK of “root” table controls destination – e.g. user id • Retains referential integrity

24. Example: blogger.com Users Blogs Comments

25. Example: blogger.com Users Blogs Comments' Comments

26. Vertical partitioning • Tables on separate nodes • Often a table that is too big to keep with the other tables, gets too big for a single node

27. Growing is hard

28. Directed partitioning • Central db that knows what server owns a key • Makes adding machines easier • Single point of failure

29. Partitioning

30. Partitioning with replication

31. What these have in common • Ad hoc • Error-prone • Manpower-intensive

32. To summarize • Scaling reads sucks • Scaling writes sucks more

33. * Distributed databases • Data is automatically partitioned • Transparent to application • Add capacity without downtime • Failure tolerant *Like Bigtable, not Lotus Notes

34. Two famous papers • Bigtable: A distributed storage system for structured data, 2006 • Dynamo: amazon's highly available key- value store, 2007

35. The world doesn't need another half-assed key/value store (See also Olin Shivers' 100% and 80% solutions)

36. Two approaches • Bigtable: “How can we build a distributed database on top of GFS?” • Dynamo: “How can we build a distributed hash table appropriate for the data center?”

37. Bigtable architecture

38.

39. Lookup in Bigtable

40. Dynamo

41. Eventually consistent • Amazon: http://www.allthingsdistributed.com/2008/12 • eBay: http://queue.acm.org/detail.cfm?id=139412

42. Consistency in a BASE world • If W + R > N, you are 100% consistent • W=1, R=N • W=N, R=1 • W=Q, R=Q where Q = N / 2 + 1

43. Cassandra

44. Memtable / SSTable Disk Commit log

45. ColumnFamilies keyA column1 column2 column3 keyC column1 column7 column11 Column Byte[] Name Byte[] Value I64 timestamp

46. LSM write properties • No reads • No seeks • Fast • Atomic within ColumnFamily

47. vs MySQL with 50GB of data • MySQL – ~300ms write – ~350ms read • Cassandra – ~0.12ms write – ~15ms read • Achtung!

48. Classic RDBMS persistence Index Data

49. Questions

What Every Developer Should Know About Database Scalability

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to What Every Developer Should Know About Database Scalability

Similar to What Every Developer Should Know About Database Scalability (20)

More from jbellis

More from jbellis (20)

Recently uploaded

Recently uploaded (20)

What Every Developer Should Know About Database Scalability