SlideShare a Scribd company logo
1 of 30
Download to read offline
Horizontally Scalable Relational
Databases with Spark
Cody Koeninger
Kixer
What is Citus?
• Standard Postgres
• Sharded across multiple nodes
• CREATE EXTENSION citus; -- not a fork
• Good for live analytics, multi-tenant
• Open source, commercial support
2
Where does Citus fit with Spark?
1. Shove your data into Kafka
2. Munge it with Spark
3. ???
4. Profit!
3
Where does Citus fit with Spark?
1. Shove your data into Kafka
2. Munge it with Spark
3. Serve live traffic using
• ML models or key-value stores
• ? Spark SQL ?
• ? Relational database ?
4. Profit!
4
Spark SQL + HDFS Pain Points
• Multi-user
• Query latency
• Mutable rows
• Co-locating related writes for joins
5
Relational Database Pain Points
• “Schemaless” data
• Scaling out, without giving up
• Aggregations
• Joins
• Transactions
6
“Schemaless” Data
master# create table no_schema (
data JSONB);
master# create index on no_schema using gin(data);
master# insert into no_schema values
('{"user":"cody","cart":["apples","peanut_butter"]}'),
('{"user":"jake","cart":["whiskey"],"drunk":true}'),
('{"user":"omar","cart":["fireball"],"drunk":true}');
master# select data->'user' from no_schema
where data->'drunk' = ‘true';
?column?
----------
"jake"
"omar"
(2 rows)
7
Scaling Out
+--------------+
SELECT FROM table_1001 | |
+-------------------------> | Worker 1 |
| | |
| | table_1001 |
| SELECT FROM table_1003 | table_1003 |
+------------+ +-------------------------> | |
| | | | |
| Master | | +--------------+
SELECT FROM table | | |
+-------------------> | table | ---+
| metadata | | +--------------+
| | | SELECT FROM table_1002 | |
| | +-------------------------> | Worker 2 |
+------------+ | | |
| | table_1002 |
| SELECT FROM table_1004 | table_1004 |
+-------------------------> | |
| |
+--------------+
8
Choosing a Distribution Key
• Commonly queried column (e.g. customer id)
• 1 master query : 1 worker query
• easy join on distribution key
• possible hot spots if load is skewed
• Evenly distributed column (e.g. event GUID)
• 1 master query : N shards worker queries
• hard to join
• no hot spots 9
Creating Distributed Tables
master# create table impressions(
date date,
ad_id integer,
site_id integer,
total integer);
master# set citus.shard_replication_factor = 1;
master# set citus.shard_count = 4;
master# select create_distributed_table('impressions', 'ad_id');
10
• Metadata in master tables:
master# select * from pg_dist_shard;
logicalrelid | shardid | shardminvalue | shardmaxvalue
--------------+---------+---------------+---------------
impressions | 102008 | -2147483648 | -1073741825
impressions | 102009 | -1073741824 | -1
impressions | 102010 | 0 | 1073741823
impressions | 102011 | 1073741824 | 2147483647
• Data in worker shard tables:
worker1# d
List of relations
Schema | Name | Type | Owner
--------+--------------------+-------+-------
public | impressions_102008 | table | cody
public | impressions_102010 | table | cody 11
Writing Data
master# insert into impressions values (now(), 23, 42, 1337);
master# update impressions set total = total + 1
where ad_id = 23 and site_id = 42;
master# copy impressions from ‘/var/tmp/bogus_impressions';
• Can also write directly to worker shards
• as long as you hash correctly (at your own risk)
12
Aggregations
• Commutative and associative operations "just work":
master# select site_id, avg(total)
from impressions group by 1 order by 2 desc limit 1;
site_id | avg
---------+---------------------
5225 | 790503.061538461538
(1 row)
13
• In worker1’s log:
COPY (SELECT site_id, sum(total) AS avg, count(total) AS avg
FROM impressions_102010 impressions WHERE true GROUP BY site_id)
TO STDOUT
COPY (SELECT site_id, sum(total) AS avg, count(total) AS avg
FROM impressions_102008 impressions WHERE true GROUP BY site_id)
TO STDOUT
• For non-associative operations
• query a subset of data to temp table on master
• then use arbitrary SQL
14
Joins: Distribution key = fast
master# create table clicks(
date date,
ad_id integer,
site_id integer,
price numeric,
total integer);
master# select create_distributed_table('clicks', ‘ad_id');
master# select i.ad_id, sum(c.total) / sum(i.total)::float as
clickthrough from impressions i
inner join clicks c on i.ad_id = c.ad_id and i.date = c.date
and i.site_id = c.site_id group by 1 order by 2 desc limit 1;
ad_id | clickthrough
-------+----------
814 | 0.1
15
Joins: Non-distribution key = slow
master# create table discounts(
date date,
amt numeric);
master# select create_distributed_table('discounts', ‘date');
master# select c.ad_id, max(c.price * d.amt) from clicks c
inner join discounts d on c.date = d.date group by 1 order by 2 desc limit 1;
ERROR: cannot use real time executor with repartition jobs
HINT: Set citus.task_executor_type to "task-tracker".
master# SET citus.task_executor_type = 'task-tracker';
master# select c.ad_id, max(c.price * d.amt) from clicks c
inner join discounts d on c.date = d.date group by 1 order by 2 desc limit 1;
ad_id | max
-------+-----------------------------------
587 | 67.887149187736200000000000000000
(1 row)
Time: 3464.598 ms 16
Joins: Table on all workers = fast
-- Replication factor equal to number of nodes
master# SET citus.shard_replication_factor = 2;
master# select create_reference_table('discounts2');
master# select c.ad_id, max(c.price * d.amt) from clicks c
inner join discounts2 d on c.date = d.date group by 1 order by 2
desc limit 1;
ad_id | max
-------+-----------------------------------
587 | 67.887149187736200000000000000000
(1 row)
Time: 31.237 ms
17
Transactions
• No global transactions (Postgres-XL)
• Individual worker transactions still very useful
• Spark output actions aren’t exactly-once
• Transactions allow exactly-once semantics
• Even for non-idempotent updates
• If sharded by user, each sees consistent world
18
Transactional writes: Spark to 1 DB
19
• Table of offset ranges
• foreachPartition, transaction on DB
• insert or update results
• update offsets table rows
• roll back if offsets weren't as expected
• On failure
• begin from minimum offset range
• recalculate all results for that range (due to shuffle)
Transactional writes: Spark to Citus
20
• Partition Spark results to match Citus shards
• Table of offset ranges on each worker
• foreachPartition, transaction on corresponding worker
• insert or update results
• update offsets table rows for that shard
• roll back if offsets weren't as expected
• On failure
• begin from minimum offset range of all workers
• recalculate all results for that range (due to shuffle)
• skip writes for shards that already have that range
Spark Custom Partitioner
/** same number of Spark partitions as Citus shards */
override def numPartitions: Int
/** given a key from data, which Spark partition */
override def getPartition(key: Any): Int
/** given a Spark partition, which worker shard */
def shardPlacement(partitionId: Int): ShardPlacement
21
• Citus uses Postgres hash (hashfunc.c) based on
Jenkins’ 2006 hash
• Min / Max hash values for given shard are in
pg_dist_shard table
• Worker nodes for a given shard are in
pg_dist_shard_placement table
• Query once at partitioner creation time, build a
lookup array
22
select
(ds.logicalrelid::regclass)::varchar as tablename,
ds.shardmaxvalue::integer as hmax,
ds.shardid::integer,
p.nodename::varchar,
p.nodeport::integer
from pg_dist_shard ds
left join pg_dist_shard_placement p on ds.shardid = p.shardid
where (ds.logicalrelid::regclass)::varchar in ('impressions')
order by tablename, hmax asc;
tablename | hmax | shardid | nodename | nodeport
-------------+-------------+---------+-----------+----------
impressions | -1073741825 | 102008 | host1 | 9701
impressions | -1 | 102009 | host2 | 9702
impressions | 1073741823 | 102010 | host1 | 9701
impressions | 2147483647 | 102011 | host2 | 9702
-- key with hash of -2 would go into shard 102009
23
• To find Spark partition
• hash key using Jenkins
• walk array until hmax >= hashed value
• To find worker shard, index directly by partition #
• See github link at end of slides for working code
24
Offsets Table
app | topic | part | shard_table_name | off
-------+-------------+------+--------------------+---------
myapp | impressions | 0 | impressions_102008 | 20000
myapp | impressions | 0 | impressions_102009 | 20000
myapp | impressions | 0 | impressions_102010 | 19000 -- behind
myapp | impressions | 0 | impressions_102011 | 20000
myapp | impressions | 1 | impressions_102008 | 20001
myapp | impressions | 1 | impressions_102009 | 20001
myapp | impressions | 1 | impressions_102010 | 18000 -- behind
myapp | impressions | 1 | impressions_102011 | 20001
25
• In this case, failure occurred before writes to
impressions_102010 finished
• Restart Spark app from Kafka offset ranges
• partition 0 offsets 19000 -> 20000
• partition 1 offsets 18000 -> 20001
• Writes to impressions_102010 succeed, other
shards are skipped
26
Lies, Damn Lies, and Bar Charts
27
Lies, Damn Lies, and Bar Charts
28
Lessons Learned
• When moving to sharding, avoid other changes
• PgBouncer is necessary in front of workers too
• Some cognitive overhead about what SQL works
• still better than “SQL” on different data model
• Want hash distribution on top of date partitions
• can be done manually, easy way on roadmap
29
Questions?
https://github.com/koeninger/spark-citus
cody@koeninger.org

More Related Content

What's hot

Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Databricks
 

What's hot (20)

Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Automated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureAutomated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative Infrastructure
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDs
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 
DataSource V2 and Cassandra – A Whole New World
DataSource V2 and Cassandra – A Whole New WorldDataSource V2 and Cassandra – A Whole New World
DataSource V2 and Cassandra – A Whole New World
 
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials DayAnalytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 

Viewers also liked

Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Spark Summit
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Spark Summit
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Spark Summit
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Spark Summit
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Spark Summit
 
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark Summit
 

Viewers also liked (20)

Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
 
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas PatilCustom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
 

Similar to Horizontally Scalable Relational Databases with Spark: Spark Summit East talk by Cody Koeninger

Similar to Horizontally Scalable Relational Databases with Spark: Spark Summit East talk by Cody Koeninger (20)

Part5 sql tune
Part5 sql tunePart5 sql tune
Part5 sql tune
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Let's scale-out PostgreSQL using Citus (English)
Let's scale-out PostgreSQL using Citus (English)Let's scale-out PostgreSQL using Citus (English)
Let's scale-out PostgreSQL using Citus (English)
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
MySQL performance monitoring using Statsd and Graphite (PLUK2013)
MySQL performance monitoring using Statsd and Graphite (PLUK2013)MySQL performance monitoring using Statsd and Graphite (PLUK2013)
MySQL performance monitoring using Statsd and Graphite (PLUK2013)
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
MySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and GraphiteMySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and Graphite
 
Around the world with extensions | PostgreSQL Conference Europe 2018 | Craig ...
Around the world with extensions | PostgreSQL Conference Europe 2018 | Craig ...Around the world with extensions | PostgreSQL Conference Europe 2018 | Craig ...
Around the world with extensions | PostgreSQL Conference Europe 2018 | Craig ...
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
 
Explain this!
Explain this!Explain this!
Explain this!
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance Tuning
 
Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQL
 

More from Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 

Recently uploaded (20)

怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 

Horizontally Scalable Relational Databases with Spark: Spark Summit East talk by Cody Koeninger

  • 1. Horizontally Scalable Relational Databases with Spark Cody Koeninger Kixer
  • 2. What is Citus? • Standard Postgres • Sharded across multiple nodes • CREATE EXTENSION citus; -- not a fork • Good for live analytics, multi-tenant • Open source, commercial support 2
  • 3. Where does Citus fit with Spark? 1. Shove your data into Kafka 2. Munge it with Spark 3. ??? 4. Profit! 3
  • 4. Where does Citus fit with Spark? 1. Shove your data into Kafka 2. Munge it with Spark 3. Serve live traffic using • ML models or key-value stores • ? Spark SQL ? • ? Relational database ? 4. Profit! 4
  • 5. Spark SQL + HDFS Pain Points • Multi-user • Query latency • Mutable rows • Co-locating related writes for joins 5
  • 6. Relational Database Pain Points • “Schemaless” data • Scaling out, without giving up • Aggregations • Joins • Transactions 6
  • 7. “Schemaless” Data master# create table no_schema ( data JSONB); master# create index on no_schema using gin(data); master# insert into no_schema values ('{"user":"cody","cart":["apples","peanut_butter"]}'), ('{"user":"jake","cart":["whiskey"],"drunk":true}'), ('{"user":"omar","cart":["fireball"],"drunk":true}'); master# select data->'user' from no_schema where data->'drunk' = ‘true'; ?column? ---------- "jake" "omar" (2 rows) 7
  • 8. Scaling Out +--------------+ SELECT FROM table_1001 | | +-------------------------> | Worker 1 | | | | | | table_1001 | | SELECT FROM table_1003 | table_1003 | +------------+ +-------------------------> | | | | | | | | Master | | +--------------+ SELECT FROM table | | | +-------------------> | table | ---+ | metadata | | +--------------+ | | | SELECT FROM table_1002 | | | | +-------------------------> | Worker 2 | +------------+ | | | | | table_1002 | | SELECT FROM table_1004 | table_1004 | +-------------------------> | | | | +--------------+ 8
  • 9. Choosing a Distribution Key • Commonly queried column (e.g. customer id) • 1 master query : 1 worker query • easy join on distribution key • possible hot spots if load is skewed • Evenly distributed column (e.g. event GUID) • 1 master query : N shards worker queries • hard to join • no hot spots 9
  • 10. Creating Distributed Tables master# create table impressions( date date, ad_id integer, site_id integer, total integer); master# set citus.shard_replication_factor = 1; master# set citus.shard_count = 4; master# select create_distributed_table('impressions', 'ad_id'); 10
  • 11. • Metadata in master tables: master# select * from pg_dist_shard; logicalrelid | shardid | shardminvalue | shardmaxvalue --------------+---------+---------------+--------------- impressions | 102008 | -2147483648 | -1073741825 impressions | 102009 | -1073741824 | -1 impressions | 102010 | 0 | 1073741823 impressions | 102011 | 1073741824 | 2147483647 • Data in worker shard tables: worker1# d List of relations Schema | Name | Type | Owner --------+--------------------+-------+------- public | impressions_102008 | table | cody public | impressions_102010 | table | cody 11
  • 12. Writing Data master# insert into impressions values (now(), 23, 42, 1337); master# update impressions set total = total + 1 where ad_id = 23 and site_id = 42; master# copy impressions from ‘/var/tmp/bogus_impressions'; • Can also write directly to worker shards • as long as you hash correctly (at your own risk) 12
  • 13. Aggregations • Commutative and associative operations "just work": master# select site_id, avg(total) from impressions group by 1 order by 2 desc limit 1; site_id | avg ---------+--------------------- 5225 | 790503.061538461538 (1 row) 13
  • 14. • In worker1’s log: COPY (SELECT site_id, sum(total) AS avg, count(total) AS avg FROM impressions_102010 impressions WHERE true GROUP BY site_id) TO STDOUT COPY (SELECT site_id, sum(total) AS avg, count(total) AS avg FROM impressions_102008 impressions WHERE true GROUP BY site_id) TO STDOUT • For non-associative operations • query a subset of data to temp table on master • then use arbitrary SQL 14
  • 15. Joins: Distribution key = fast master# create table clicks( date date, ad_id integer, site_id integer, price numeric, total integer); master# select create_distributed_table('clicks', ‘ad_id'); master# select i.ad_id, sum(c.total) / sum(i.total)::float as clickthrough from impressions i inner join clicks c on i.ad_id = c.ad_id and i.date = c.date and i.site_id = c.site_id group by 1 order by 2 desc limit 1; ad_id | clickthrough -------+---------- 814 | 0.1 15
  • 16. Joins: Non-distribution key = slow master# create table discounts( date date, amt numeric); master# select create_distributed_table('discounts', ‘date'); master# select c.ad_id, max(c.price * d.amt) from clicks c inner join discounts d on c.date = d.date group by 1 order by 2 desc limit 1; ERROR: cannot use real time executor with repartition jobs HINT: Set citus.task_executor_type to "task-tracker". master# SET citus.task_executor_type = 'task-tracker'; master# select c.ad_id, max(c.price * d.amt) from clicks c inner join discounts d on c.date = d.date group by 1 order by 2 desc limit 1; ad_id | max -------+----------------------------------- 587 | 67.887149187736200000000000000000 (1 row) Time: 3464.598 ms 16
  • 17. Joins: Table on all workers = fast -- Replication factor equal to number of nodes master# SET citus.shard_replication_factor = 2; master# select create_reference_table('discounts2'); master# select c.ad_id, max(c.price * d.amt) from clicks c inner join discounts2 d on c.date = d.date group by 1 order by 2 desc limit 1; ad_id | max -------+----------------------------------- 587 | 67.887149187736200000000000000000 (1 row) Time: 31.237 ms 17
  • 18. Transactions • No global transactions (Postgres-XL) • Individual worker transactions still very useful • Spark output actions aren’t exactly-once • Transactions allow exactly-once semantics • Even for non-idempotent updates • If sharded by user, each sees consistent world 18
  • 19. Transactional writes: Spark to 1 DB 19 • Table of offset ranges • foreachPartition, transaction on DB • insert or update results • update offsets table rows • roll back if offsets weren't as expected • On failure • begin from minimum offset range • recalculate all results for that range (due to shuffle)
  • 20. Transactional writes: Spark to Citus 20 • Partition Spark results to match Citus shards • Table of offset ranges on each worker • foreachPartition, transaction on corresponding worker • insert or update results • update offsets table rows for that shard • roll back if offsets weren't as expected • On failure • begin from minimum offset range of all workers • recalculate all results for that range (due to shuffle) • skip writes for shards that already have that range
  • 21. Spark Custom Partitioner /** same number of Spark partitions as Citus shards */ override def numPartitions: Int /** given a key from data, which Spark partition */ override def getPartition(key: Any): Int /** given a Spark partition, which worker shard */ def shardPlacement(partitionId: Int): ShardPlacement 21
  • 22. • Citus uses Postgres hash (hashfunc.c) based on Jenkins’ 2006 hash • Min / Max hash values for given shard are in pg_dist_shard table • Worker nodes for a given shard are in pg_dist_shard_placement table • Query once at partitioner creation time, build a lookup array 22
  • 23. select (ds.logicalrelid::regclass)::varchar as tablename, ds.shardmaxvalue::integer as hmax, ds.shardid::integer, p.nodename::varchar, p.nodeport::integer from pg_dist_shard ds left join pg_dist_shard_placement p on ds.shardid = p.shardid where (ds.logicalrelid::regclass)::varchar in ('impressions') order by tablename, hmax asc; tablename | hmax | shardid | nodename | nodeport -------------+-------------+---------+-----------+---------- impressions | -1073741825 | 102008 | host1 | 9701 impressions | -1 | 102009 | host2 | 9702 impressions | 1073741823 | 102010 | host1 | 9701 impressions | 2147483647 | 102011 | host2 | 9702 -- key with hash of -2 would go into shard 102009 23
  • 24. • To find Spark partition • hash key using Jenkins • walk array until hmax >= hashed value • To find worker shard, index directly by partition # • See github link at end of slides for working code 24
  • 25. Offsets Table app | topic | part | shard_table_name | off -------+-------------+------+--------------------+--------- myapp | impressions | 0 | impressions_102008 | 20000 myapp | impressions | 0 | impressions_102009 | 20000 myapp | impressions | 0 | impressions_102010 | 19000 -- behind myapp | impressions | 0 | impressions_102011 | 20000 myapp | impressions | 1 | impressions_102008 | 20001 myapp | impressions | 1 | impressions_102009 | 20001 myapp | impressions | 1 | impressions_102010 | 18000 -- behind myapp | impressions | 1 | impressions_102011 | 20001 25
  • 26. • In this case, failure occurred before writes to impressions_102010 finished • Restart Spark app from Kafka offset ranges • partition 0 offsets 19000 -> 20000 • partition 1 offsets 18000 -> 20001 • Writes to impressions_102010 succeed, other shards are skipped 26
  • 27. Lies, Damn Lies, and Bar Charts 27
  • 28. Lies, Damn Lies, and Bar Charts 28
  • 29. Lessons Learned • When moving to sharding, avoid other changes • PgBouncer is necessary in front of workers too • Some cognitive overhead about what SQL works • still better than “SQL” on different data model • Want hash distribution on top of date partitions • can be done manually, easy way on roadmap 29