Designing Data-Intensive Applications
Oleg Mürk, Senior Systems Architect
January 2019
YOUR FIRST DATA ARCHITECTURE
• Microservice(s) (Python, node.js)
• Cache (Redis, Memcache)
• Database (MySQL, Postgres)
• Message Queue(s) (RabbitMQ)
• Deployment (Ansible, Docker)
WITH GREAT SUCCESS COME … GREAT CHALLENGES
Requirements:
• Scalability
  • Request Throughput & Latency
  • Data Volume & Throughput & Latency
• Reliability (MTBF)
• Availability (Uptime)
• Maintainability
  • Operability
  • Evolvability
Challenges:
• DB partitioning for scalability
  • More than 10K transactions/second
  • More data than fits on a large DB node
• DB replication for reliability
• DB fail-over for availability
• Data queries (joins) take too long
  • Data joins across multiple DBs
• Process 100K events/second in 100ms
• Historical data approaches a PB / year
  • Need to run large analytical queries
  • Need to reprocess historical events
DATA ARCHITECTURE 5 YEARS LATER
• Reactive Microservice(s)
• Message Queue(s) & Event Topic(s)
• Message / Event Formats
• In-Memory Store
• Operational Store
• Search Store
• Serving Store
• Analytical Store
• Object Store
• Batch Processing
• Stream Processing
• Workflow Processing
• Resource Scheduling & Monitoring & Logging
MESSAGES & EVENTS & FORMATS
• Message Broker
  • Workers consume from a shared queue
  • Each message is processed by one worker
  • Example: RabbitMQ (10K msgs/sec)
• Event Log
  • Partitioned event topics
  • Each consumer maintains its own offsets (see the sketch below)
  • Example: Kafka (1M events/sec)
• Formats & Schema Evolution
  • JSON & XML
  • Protocol Buffers & Thrift & Avro
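Below is a minimal sketch of the event-log model using the kafka-python client: unlike a broker queue, the log is not consumed destructively; each consumer group tracks and commits its own offsets. The topic name, broker address, and group id are illustrative.

```python
from kafka import KafkaConsumer

# Event-log consumption sketch (kafka-python). Topic, broker, and group id are
# illustrative. Records are not deleted when read; each consumer group simply
# advances its own offset per partition.
consumer = KafkaConsumer(
    "orders",                         # partitioned event topic
    bootstrap_servers="broker:9092",
    group_id="billing-service",
    enable_auto_commit=False,         # commit offsets explicitly below
    auto_offset_reset="earliest",     # a new group replays the log from the start
)

for record in consumer:
    print(record.partition, record.offset, record.value)
    consumer.commit()                 # persist this group's position
```

Committing after every record is shown only for clarity; in practice offsets are committed in batches.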
WORKFLOW PROCESSING
• Scheduling tasks with ~1 minute granularity
• Directed Acyclic Graph (DAG)
  • Tasks
  • Dependencies
  • External & time triggers
• Use cases
  • Extract-Transform-Load jobs
  • Aggregations, Reporting
  • Scheduling Batch Processing jobs
• Examples
  • Airflow, Luigi (see the sketch below)
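A minimal Airflow sketch of the DAG model described above: three dependent tasks scheduled daily. The DAG id, commands, and start date are illustrative, and the import path follows the Airflow 1.x layout.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

# Workflow-processing sketch: a daily ETL pipeline as a DAG of three tasks.
# DAG id, schedule, and bash commands are illustrative.
dag = DAG(
    dag_id="daily_etl",
    schedule_interval="@daily",
    start_date=datetime(2019, 1, 1),
    catchup=False,
)

extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
transform = BashOperator(task_id="transform", bash_command="echo transform", dag=dag)
load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

# Dependencies form the directed acyclic graph: extract -> transform -> load.
extract >> transform >> load
```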
BATCH PROCESSING
• Executing distributed computation jobs
• Main abstractions
  • Distributed partitioned dataset (e.g. Spark RDD; see the sketch below)
  • Data dependencies between dataset partitions (Narrow, Wide)
• Use cases
  • Extract-Transform-Load
  • Analytical queries (SQL)
  • Training/updating ML models
• Examples
  • Hadoop Map/Reduce
  • Spark
  • Flink
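A minimal PySpark sketch of the abstractions above: a distributed partitioned dataset (RDD), a narrow dependency (map), and a wide dependency (reduceByKey, which shuffles data across partitions). The input/output paths and record format are illustrative.

```python
from pyspark.sql import SparkSession

# Batch-processing sketch with the RDD abstraction; paths and record format are
# illustrative (comma-separated lines whose first field is a user id).
spark = SparkSession.builder.appName("batch-etl").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///events/2019-01/*.log")   # distributed, partitioned dataset

# Narrow dependency: each output partition depends on a single input partition.
pairs = lines.map(lambda line: (line.split(",")[0], 1))

# Wide dependency: reduceByKey shuffles records between partitions by key.
counts = pairs.reduceByKey(lambda a, b: a + b)

counts.saveAsTextFile("hdfs:///reports/events-per-user")
spark.stop()
```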
STREAM PROCESSING
• Processing events
  • Less than 100ms-1sec latency
  • More than 10K-100K events/sec
• Equivalent of SQL on event streams
  • Filter, Map, Join, Group, Aggregate (see the sketch below)
• Use cases
  • Extract-Transform-Load
  • Data enrichment
  • Event detection
  • Session analysis
• Examples
  • Spark Streaming, Flink, Kafka Streams & KSQL
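A minimal Spark Structured Streaming sketch of "SQL on an event stream": a windowed group-and-count over events read from Kafka. The broker address, topic name, and window size are illustrative, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

# Stream-processing sketch: group/aggregate over an unbounded event stream.
# Broker, topic, and window size are illustrative; requires the
# spark-sql-kafka connector package.
spark = SparkSession.builder.appName("stream-agg").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "page_views")
          .load())

# Kafka values arrive as bytes; treat the payload as the page id for this sketch.
views = events.selectExpr("CAST(value AS STRING) AS page", "timestamp")

# Streaming equivalent of GROUP BY: count views per page in 1-minute windows.
counts = views.groupBy(window(col("timestamp"), "1 minute"), col("page")).count()

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```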
IMMUTABLE EVENT LOG & DENORMALIZATION OF STATE
• Event Sourcing (see the sketch below)
• Command Query Responsibility Segregation (CQRS)
• Lambda & Kappa Architecture
• Change Data Capture
• Unbundled Database
[Diagram: a traditional database, where writes go to the leader and a replication stream feeds the followers, contrasted with an unbundled database, where events flow through streaming transforms into transformed events, and streaming joins and aggregations build materialized views.]
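A toy event-sourcing sketch (the event types and fields are invented for illustration): the immutable event log is the source of truth, and a denormalized read model is materialized by folding over it, the same shape as the streaming aggregation into a materialized view shown above.

```python
from collections import defaultdict

# Event-sourcing sketch: state is never updated in place; it is derived by
# replaying an immutable log. Event types and fields are illustrative.
event_log = [
    {"type": "account_opened",  "account": "a1"},
    {"type": "money_deposited", "account": "a1", "amount": 100},
    {"type": "money_withdrawn", "account": "a1", "amount": 30},
]

def materialize_balances(events):
    """Denormalized read model (CQRS query side): balance per account."""
    balances = defaultdict(int)
    for event in events:
        if event["type"] == "money_deposited":
            balances[event["account"]] += event["amount"]
        elif event["type"] == "money_withdrawn":
            balances[event["account"]] -= event["amount"]
    return dict(balances)

print(materialize_balances(event_log))   # {'a1': 70}
```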
TRADITIONAL DATABASE USE CASE
[Diagram: a single Operational Store at the center; a Command Microservice issues writes, while the Query Microservice, Report Job, and Alert Job all issue queries against it; an Event Microservice feeds events through a Message Broker, a Workflow Scheduler triggers the Report Job and Alert Job, and the Alert Job produces notifications/alerts.]
UNBUNDLED DATABASE
[Diagram: the Command Microservice writes to the Operational Store; Change Data Capture and external connectors turn state changes into event streams on a Message Broker; streaming transforms, joins, and aggregations derive further event streams and populate a Serving Store that the Query Microservice reads; an Event Microservice publishes additional events, and notifications/alerts are produced off the streams.]
CHOOSING DATA STORE(S)
• Supported Workloads
  • Cache, Transactional, Search, Serving, Analytical, Objects, etc.
• Data Structures & Indexes
  • Read- vs Write-Optimized
  • Query types
• Replication
• Partitioning
• Transactions
• Consistency vs Availability
DATA STORE ZOO
• In-Memory Cache / Data Grid
  • Memcache, Redis, Ignite, Hazelcast
• SQL
  • OLTP: MySQL, Postgres
  • OLAP: Redshift, Vertica, Hive
• NoSQL = "Not yet SQL"
  • Key-Value Stores: Redis, Riak
  • Wide Column Stores: Cassandra, HBase, RocksDB
  • Document Stores: MongoDB, Elastic
  • Specialized: Time-Series, Graph, RDF, Object-Oriented
• Object Stores
  • S3, HDFS
DATA STRUCTURES (FOR RANDOM READS/WRITES)
• B+ Trees
  • Optimized for random reads
  • Balanced tree with ~4KB block size
  • Random writes are less efficient
  • SQL OLTP DBs (MySQL, Postgres)
• Log-Structured Merge Trees (see the sketch below)
  • Optimized for random writes
  • Hierarchical compaction scheme
  • Random reads are less efficient
  • NoSQL stores (Cassandra, HBase)
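A toy log-structured merge sketch (purely illustrative: no write-ahead log, bloom filters, or compaction): writes go to an in-memory memtable that is flushed as immutable sorted runs, so random writes are cheap while a read may have to check several runs.

```python
import bisect

# Toy LSM-tree: memtable for writes, immutable sorted runs ("SSTables") on flush.
# Real stores (Cassandra, HBase, RocksDB) add WALs, bloom filters, and compaction.
class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []                         # sorted [(key, value), ...] runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Random writes are cheap: an in-memory insert, occasionally a sequential flush.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Random reads may have to consult several runs, newest first.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

db = ToyLSM()
for i in range(10):
    db.put(f"user:{i}", {"id": i})
print(db.get("user:7"))                        # {'id': 7}
```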
COLUMNAR STORAGE (FOR ANALYTICS)
• Columnar representation (see the sketch below)
  • Each column is compressed separately
  • Each column chunk has metadata (e.g. min/max values)
  • Can read only the column chunks that match a filter
• Formats: Parquet, ORC
• SQL OLAP DBs: Redshift, Vertica, Hive
[Figure: the same table (Last Name, First Name, E-mail, Phone #, Street Address) laid out as row storage vs columnar storage.]
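A minimal sketch of columnar storage in practice, writing and reading a Parquet file with pandas (a pyarrow or fastparquet engine is assumed; the file name and columns are illustrative): reading back a single column touches only that column's chunks.

```python
import pandas as pd

# Columnar-storage sketch (requires pyarrow or fastparquet).
# File name and columns are illustrative.
df = pd.DataFrame({
    "last_name":  ["Smith", "Jones"],
    "first_name": ["Ann", "Bob"],
    "email":      ["ann@example.com", "bob@example.com"],
})

df.to_parquet("contacts.parquet")   # each column is encoded and compressed separately

# Column pruning: only the requested column chunks are read from disk.
emails = pd.read_parquet("contacts.parquet", columns=["email"])
print(emails)
```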
AND NOW, LADIES AND GENTLEMEN…
• Last 50 years of Database Research in 3 slides!
• Last 40 years of Distributed Systems Research in 1 slide!
[Photos: Michael Stonebraker and Leslie Lamport]
REPLICATION
• Single-Leader Replication
  • Synchronous vs Asynchronous
  • Failover & Fencing (epoch numbers, STONITH)
• Multi-Leader Replication
  • Write conflict resolution (OT in Google Docs)
• Leaderless Replication
  • Quorums for reading and writing: w + r > n (Cassandra; see the sketch below)
  • CRDTs: Convergent Replicated Data Types (Akka Cluster, Riak)
• Consistency of Reads
  • Read-after-Write Consistency (S3, sometimes)
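A toy sketch of leaderless quorum reads and writes (Dynamo/Cassandra style; node count and versioning are simplified): with n replicas, writing to w and reading from r nodes such that w + r > n, every read quorum overlaps the latest write quorum, so the reader can pick the newest version.

```python
import random

# Toy quorum replication: n replicas, write to w, read from r, with w + r > n.
N, W, R = 3, 2, 2
assert W + R > N, "read and write quorums must overlap"

replicas = [dict() for _ in range(N)]   # per-replica map: key -> (version, value)
version = 0

def write(key, value):
    global version
    version += 1
    # The write succeeds once w replicas acknowledge (here: any w of the n).
    for node in random.sample(range(N), W):
        replicas[node][key] = (version, value)

def read(key):
    # Read from r replicas and return the value with the highest version seen.
    answers = [replicas[node].get(key) for node in random.sample(range(N), R)]
    answers = [a for a in answers if a is not None]
    return max(answers)[1] if answers else None

write("user:1", "v1")
write("user:1", "v2")
print(read("user:1"))   # always "v2": the read quorum overlaps the latest write quorum
```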
PARTITIONING
• Partitioning of Key-Value Data (see the sketch below)
  • By hash of key (Cassandra, Elastic)
  • By key range (HBase)
• Skewed Workloads & Hot Spots
  • Salting (HBase, S3)
• Partitioning & Indexes
  • By document (Elastic, Cassandra)
  • By term (Cassandra Materialized Views)
• Re-partitioning / Over-partitioning
• Partitions & Replication
  • Consistent Hashing (Elastic, Cassandra)
  • Partition Leaders & Followers (Kafka)
  • Region Servers (HBase)
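A minimal sketch of partitioning key-value data by hash of key, plus salting a hot key to spread a skewed workload; the partition count, key format, and salt-bucket count are illustrative.

```python
import hashlib
import random

# Partitioning sketch: route each key to one of a fixed number of partitions.
NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    # A stable hash (MD5 here) keeps the key-to-partition mapping consistent
    # across processes and restarts, unlike Python's salted built-in hash().
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def salted_partition(hot_key: str, salt_buckets: int = 4) -> int:
    # Salting a hot key spreads its writes over several partitions; readers
    # must then query all salt buckets and merge the results.
    return partition_for(f"{random.randrange(salt_buckets)}:{hot_key}")

print(partition_for("user:42"), salted_partition("celebrity:1"))
```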
TRANSACTIONS
• Consistency
  • ACID: Atomicity, Consistency, Isolation, Durability
  • BASE: Basically Available, Soft state, Eventual consistency
• Single- vs Multi-Object Transactions
  • Check-and-Put (HBase, Cassandra LWT)
• Isolation Levels
  • Read Committed (RW locks, MVCC)
  • Repeatable Read (Snapshot Isolation)
  • Serializable (2PL, Serializable Snapshot Isolation; see the sketch below)
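A hedged sketch of the Serializable isolation level with psycopg2 against Postgres: two writes run in one multi-object transaction, and the transaction is retried if the database aborts it with a serialization failure. The connection string and table/column names are illustrative.

```python
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE, TransactionRollbackError

# Multi-object transaction sketch at SERIALIZABLE isolation, with retry.
# Connection string and table/column names are illustrative.
conn = psycopg2.connect("dbname=app")
conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)

def transfer(src, dst, amount, retries=3):
    for _ in range(retries):
        try:
            with conn:                           # commit on success, roll back on error
                with conn.cursor() as cur:
                    cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                                (amount, src))
                    cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                                (amount, dst))
            return True
        except TransactionRollbackError:
            continue                             # a concurrent transaction won; retry
    return False
```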
CONSISTENCY VS AVAILABILITY
• Distributed Consensus
  • FLP: impossible to guarantee in asynchronous networks (with even one faulty process)
  • ZooKeeper, etcd, Bitcoin: … can do in practice!
  • Atomic Broadcast aka Distributed Log
• On Network Partition
  • CAP: pick either Consistency or Availability
  • AP: Elastic, Cassandra, Akka Cluster
  • CP: Kafka, HBase, Cassandra LWT
  • HAT: think through consistency/latency requirements case-by-case
• NoSQL = "Not Only SQL"
  • Google Spanner, CockroachDB, FaunaDB, FoundationDB
DESIGNING DATA-INTENSIVE APPLICATIONS
BUILDING TRUST FOR THE CONNECTED WORLD
