SlideShare a Scribd company logo
1 of 58
Download to read offline
Five data models for sharding
Craig Kerstiens, Head of Cloud at Citus
Who am I?
Craig Kerstiens
http://www.craigkerstiens.com
@craigkerstiens
Postgres weekly
Run Citus Cloud
Co-chair PostgresOpen
What is sharding
Practice of separating a large database into smaller, faster, more easily
managed parts called data shards.
Source: the internet
LogicalPhysical
What is a shard?
A table
A schema
A PG database
A node
An instance
A PG cluster
Is it a shard?
NO YES
2 Nodes - 32 shards
4 nodes - still 32 shards
Five models
• Geography
• Multi-tenant
• Entity id
• Graph model
• Time series
But first, two approaches
RangeHash
Hash - The steps
1. Hash your id
2. Define a shard range x shards, and each contain some range of hash
values. Route all inserts/updates/deletes to the shard
3. Profit
More details
• Hash based on some id
• Postgres internal hash can work fine, or so can your own
• Define your number of shards up front, make this larger than you expect
to grow to in terms of nodes
• (2 is bad)
• (2 million is also bad)
• Factors of 2 are nice, but not actually required
Don’t just route values
• 1-10 -> shard 1
• 2-20 -> shard 2
Create range of hash values
• hash 1 = 46154
• hash 2 = 27193
• Shard 13 = ranges 26624 to 28672
Range - The steps
1. Ensure you’ve created your new destination for your range
2. Route your range to the right bucket
3. Profit
@citusdatawww.citusdata.com
Thanks
Craig Kerstiens
@craigkerstiens
Five models
• Geography
• Multi-tenant
• Entity id
• Graph model
• Time series
Click to edit master tile stylegeography
Shard by Geography
• Is there a clear line I can draw for a geographical boundary
• Good examples: income by state, healthcare, etc.
• Bad examples:
• Text messages: 256 sends to 510, both want a copy of this data…
Will geography sharding work for you?
• Do you join across geographies?
• Does data easily cross boundaries?
• Is data queries across boundaries or a different access frequently?
More specifics
• Granular vs. broad
• State vs. zip code
• (California and texas are bad)
• Zip codes might work, but does that work for your app?
Common use cases
• If your go to market is geography focused
• Instacart/Shipt
• Uber/Lyft
Real world application
• Range sharding makes moving things around harder here
• Combining the geography and giving each and id, then hashing (but using
smaller set of shards) can give better balance to your data
multi-tenant
Sharding by tenant
• Is each customer’s data their own?
• What’s your data’s distribution?
• (If one tenant/customer is 50% of your data tenant sharding won’t help)
• If it’s 10% of your data you may be okay
Common use cases
• Saas/B2B
• Salesforce
• Marketing automation
• Any online SaaS
Guidelines for multi-tenant sharding
• Put your tenant_id on every table that’s relevant
• Yes, denormalize
• Ensure primary keys and foreign keys are composite ones (with
tenant_id)
• Enforce your tenant_id is on all queries so things are appropriately
scoped
Salesforce schema
CREATE TABLE leads (
id serial primary key,
first_name text,
last_name text,
email text
);
CREATE TABLE accounts (
id serial primary key,
name text,
state varchar(2),
size int
);
CREATE TABLE opportunity (
id serial primary key,
name text,
amount int
);
Salesforce schema - with orgs
CREATE TABLE leads (
id serial primary key,
first_name text,
last_name text,
email text,
org_id int
);
CREATE TABLE accounts (
id serial primary key,
name text,
state varchar(2),
size int
org_id int
);
CREATE TABLE opportunity (
id serial primary key,
name text,
amount int
org_id int
);
Salesforce schema - with orgs
CREATE TABLE leads (
id serial primary key,
first_name text,
last_name text,
email text,
org_id int
);
CREATE TABLE accounts (
id serial primary key,
name text,
state varchar(2),
size int
org_id int
);
CREATE TABLE opportunity (
id serial primary key,
name text,
amount int
org_id int
);
Salesforce schema - with keys
CREATE TABLE leads (
id serial,
first_name text,
last_name text,
email text,
org_id int,
primary key (org_id, id)
);
CREATE TABLE accounts (
id serial,
name text,
state varchar(2),
size int,
org_id int,
primary key (org_id, id)
);
CREATE TABLE opportunity (
id serial,
name text,
amount int,
Salesforce schema - with keys
CREATE TABLE leads (
id serial,
first_name text,
last_name text,
email text,
org_id int,
primary key (org_id, id)
);
CREATE TABLE accounts (
id serial,
name text,
state varchar(2),
size int,
org_id int,
primary key (org_id, id)
);
CREATE TABLE opportunity (
id serial,
name text,
amount int,
Warnings about multi-tenant implementations
• Danger ahead if using schemas on older PG versions
• Have to reinvent the wheel for even the basics
• Schema migrations
• Connection limits
• Think twice before using a schema or database per tenant
Click to edit master tile styleentity_id
Entity id
• What’s an entity id?
• Something granular
• Want to join where you can though
• Optimizing for parallelism and less for data in memory
Examples tell it best
• Web analytics
• Shard by visitor_id
• Shard both sessions and views
• Key is to co-locate things you’ll join on
Key considerations
• SQL will be more limited OR slow
• Think in terms of map reduce
Map reduce examples
• Count (*)
• SUM of 32 smaller count (*)
• Average
• SUM of 32 smaller SUM(foo) / SUM of 32 smaller count(*)
• Median
• uh….
But I like medians and more
• Count distinct
• HyperLogLog
• Ordered list approximation
• Top-n
• Median
• T-digest or HDR
Graph modelgraph model
When you use a graph database
• You’ll know, really you will
Very different approach
Craig
posted
photo
Daniel
liked
Will
posted
comment
But what about sharding?
• Within a graph model you’re going to duplicate your data
• Shard based on both:
• The objects themselves
• The objects subscribed to other objects
Read this
https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf
time series
Time series: It’s obvious right?
• Well it depends
Querying long ranges
Not removing data
Always querying time
Querying a subset
Remove old data
Good Okay/Bad
Time series
• Range partitioning
• 2016 in a bucket, 2017 in a bucket
• 2016-01-01 in a bucket, 2016-01-02 in a bucket…
• Key steps
• Determine your ranges
• Make sure you setup enough in advance, or automate creating new ones
• Delete
Sensor data
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
Sensor data - initial partition
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
Sensor data - initial partition
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
Sensor data - setting up partitions
CREATE TABLE measurement_y2017m10 PARTITION OF measurement
FOR VALUES FROM ('2017-10-01') TO ('2017-10-31');
CREATE TABLE measurement_y2017m11 PARTITION OF measurement
FOR VALUES FROM ('2017-11-01') TO ('2017-11-30');
Sensor data - indexing
CREATE TABLE measurement_y2017m10 PARTITION OF measurement
FOR VALUES FROM ('2017-10-01') TO ('2017-10-31');
CREATE TABLE measurement_y2017m11 PARTITION OF measurement
FOR VALUES FROM ('2017-11-01') TO ('2017-11-30');
CREATE INDEX ON measurement_y2017m10 (logdate);
CREATE INDEX ON measurement_y2017m11 (logdate);
Sensor data - inserting
CREATE TRIGGER insert_measurement_trigger
BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
Sensor data - inserting
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.logdate >= DATE '2017-02-01' AND
NEW.logdate < DATE '2017-03-01' ) THEN
INSERT INTO measurement_y2017m02 VALUES (NEW.*);
ELSIF ( NEW.logdate >= DATE '2017-03-01' AND
NEW.logdate < DATE '2017-04-01' ) THEN
INSERT INTO measurement_y2017m03 VALUES (NEW.*);
...
ELSIF ( NEW.logdate >= DATE '2018-01-01' AND
NEW.logdate < DATE '2018-02-01' ) THEN
INSERT INTO measurement_y2018m01 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
Five models
• Geography
• Multi-tenant
• Entity id
• Graph model
• Time series
Recap
• Not sharding is always easier than sharding
• Identify your sharding approach/key early, denormalize it even when
you’re small
• Don’t force it into one model. No model is perfect, but disqualify where
you can
• Sharding used to be much more painful, it’s not quite a party yet, but it’s
@citusdatawww.citusdata.com
Thanks
Craig Kerstiens
@craigkerstiens
https://2018.nordicpgday.org/feedback

More Related Content

What's hot

My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionSteve Loughran
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Citus Data
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis PatternsMikio L. Braun
 
Big Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionBig Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionMurtaza Doctor
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Citus Data
 
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Citus Data
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia Bharat Kalia
 
Graph computation
Graph computationGraph computation
Graph computationSigmoid
 
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)Denny Lee
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Mark Tabladillo
 
Patterns for Deploying Analytics in the Real World
Patterns for Deploying Analytics in the Real WorldPatterns for Deploying Analytics in the Real World
Patterns for Deploying Analytics in the Real WorldSriskandarajah Suhothayan
 
vJUG - Introduction to data streaming
vJUG - Introduction to data streamingvJUG - Introduction to data streaming
vJUG - Introduction to data streamingNicolas Fränkel
 
Data analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableauData analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableaupkale1708
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged ApplicationsMapR Technologies
 
Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processingYogi Devendra Vyavahare
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsTokyo University of Science
 
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooQuerying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooAll Things Open
 
Project A Data Modelling Best Practices Part I: How to model data in a data w...
Project A Data Modelling Best Practices Part I: How to model data in a data w...Project A Data Modelling Best Practices Part I: How to model data in a data w...
Project A Data Modelling Best Practices Part I: How to model data in a data w...Martin Loetzsch
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALMark Tabladillo
 

What's hot (20)

My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
 
Big Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionBig Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to Action
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
 
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia
 
Graph computation
Graph computationGraph computation
Graph computation
 
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
 
Patterns for Deploying Analytics in the Real World
Patterns for Deploying Analytics in the Real WorldPatterns for Deploying Analytics in the Real World
Patterns for Deploying Analytics in the Real World
 
vJUG - Introduction to data streaming
vJUG - Introduction to data streamingvJUG - Introduction to data streaming
vJUG - Introduction to data streaming
 
Data analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableauData analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableau
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged Applications
 
Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processing
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooQuerying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
 
Project A Data Modelling Best Practices Part I: How to model data in a data w...
Project A Data Modelling Best Practices Part I: How to model data in a data w...Project A Data Modelling Best Practices Part I: How to model data in a data w...
Project A Data Modelling Best Practices Part I: How to model data in a data w...
 
Big Data Landscape 2016
Big Data Landscape 2016Big Data Landscape 2016
Big Data Landscape 2016
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
 

Similar to Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens

Postgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data ModelsPostgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data ModelsEDB
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfjill734733
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really DoingDave Stokes
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsGeorge Stathis
 
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Citus Data
 
High-Performance Advanced Analytics with Spark-Alchemy
High-Performance Advanced Analytics with Spark-AlchemyHigh-Performance Advanced Analytics with Spark-Alchemy
High-Performance Advanced Analytics with Spark-AlchemyDatabricks
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server DatabasesColdFusionConference
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for PythonWes McKinney
 
Dimensional Modelling Session 2
Dimensional Modelling Session 2Dimensional Modelling Session 2
Dimensional Modelling Session 2akitda
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra QUONTRASOLUTIONS
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use CasesMax De Marzi
 
Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015StampedeCon
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
History of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature setHistory of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature setSoner Altin
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analyticsAnirudh
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...Facultad de Informática UCM
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTORiccardo Zamana
 

Similar to Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens (20)

Postgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data ModelsPostgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data Models
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdf
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really Doing
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
 
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
 
High-Performance Advanced Analytics with Spark-Alchemy
High-Performance Advanced Analytics with Spark-AlchemyHigh-Performance Advanced Analytics with Spark-Alchemy
High-Performance Advanced Analytics with Spark-Alchemy
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
 
Dimensional Modelling Session 2
Dimensional Modelling Session 2Dimensional Modelling Session 2
Dimensional Modelling Session 2
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
History of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature setHistory of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature set
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 

More from Citus Data

Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Citus Data
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Citus Data
 
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...Citus Data
 
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Citus Data
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
 
When it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberWhen it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberCitus Data
 
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncCitus Data
 
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...Citus Data
 
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisDeep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisCitus Data
 
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Citus Data
 
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncCitus Data
 
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Citus Data
 
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineCitus Data
 
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberWhen it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberCitus Data
 
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineCitus Data
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineCitus Data
 
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberWhen it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberCitus Data
 
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoWhy PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoCitus Data
 
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Citus Data
 
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...Citus Data
 

More from Citus Data (20)

Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
 
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
 
When it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberWhen it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will Leinweber
 
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
 
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
 
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisDeep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
 
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
 
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
 
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
 
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
 
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberWhen it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
 
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
 
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberWhen it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
 
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoWhy PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
 
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
 
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
 

Recently uploaded

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 

Recently uploaded (17)

Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 

Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens

  • 1. Five data models for sharding Craig Kerstiens, Head of Cloud at Citus
  • 2. Who am I? Craig Kerstiens http://www.craigkerstiens.com @craigkerstiens Postgres weekly Run Citus Cloud Co-chair PostgresOpen
  • 3. What is sharding Practice of separating a large database into smaller, faster, more easily managed parts called data shards. Source: the internet
  • 5. A table A schema A PG database A node An instance A PG cluster Is it a shard? NO YES
  • 6. 2 Nodes - 32 shards
  • 7. 4 nodes - still 32 shards
  • 8. Five models • Geography • Multi-tenant • Entity id • Graph model • Time series
  • 9. But first, two approaches
  • 11. Hash - The steps 1. Hash your id 2. Define a shard range x shards, and each contain some range of hash values. Route all inserts/updates/deletes to the shard 3. Profit
  • 12. More details • Hash based on some id • Postgres internal hash can work fine, or so can your own • Define your number of shards up front, make this larger than you expect to grow to in terms of nodes • (2 is bad) • (2 million is also bad) • Factors of 2 are nice, but not actually required
  • 13. Don’t just route values • 1-10 -> shard 1 • 2-20 -> shard 2
  • 14. Create range of hash values • hash 1 = 46154 • hash 2 = 27193 • Shard 13 = ranges 26624 to 28672
  • 15. Range - The steps 1. Ensure you’ve created your new destination for your range 2. Route your range to the right bucket 3. Profit
  • 17. Five models • Geography • Multi-tenant • Entity id • Graph model • Time series
  • 18. Click to edit master tile stylegeography
  • 19. Shard by Geography • Is there a clear line I can draw for a geographical boundary • Good examples: income by state, healthcare, etc. • Bad examples: • Text messages: 256 sends to 510, both want a copy of this data…
  • 20. Will geography sharding work for you? • Do you join across geographies? • Does data easily cross boundaries? • Is data queries across boundaries or a different access frequently?
  • 21. More specifics • Granular vs. broad • State vs. zip code • (California and texas are bad) • Zip codes might work, but does that work for your app?
  • 22. Common use cases • If your go to market is geography focused • Instacart/Shipt • Uber/Lyft
  • 23. Real world application • Range sharding makes moving things around harder here • Combining the geography and giving each and id, then hashing (but using smaller set of shards) can give better balance to your data
  • 25. Sharding by tenant • Is each customer’s data their own? • What’s your data’s distribution? • (If one tenant/customer is 50% of your data tenant sharding won’t help) • If it’s 10% of your data you may be okay
  • 26. Common use cases • Saas/B2B • Salesforce • Marketing automation • Any online SaaS
  • 27. Guidelines for multi-tenant sharding • Put your tenant_id on every table that’s relevant • Yes, denormalize • Ensure primary keys and foreign keys are composite ones (with tenant_id) • Enforce your tenant_id is on all queries so things are appropriately scoped
  • 28. Salesforce schema CREATE TABLE leads ( id serial primary key, first_name text, last_name text, email text ); CREATE TABLE accounts ( id serial primary key, name text, state varchar(2), size int ); CREATE TABLE opportunity ( id serial primary key, name text, amount int );
  • 29. Salesforce schema - with orgs CREATE TABLE leads ( id serial primary key, first_name text, last_name text, email text, org_id int ); CREATE TABLE accounts ( id serial primary key, name text, state varchar(2), size int org_id int ); CREATE TABLE opportunity ( id serial primary key, name text, amount int org_id int );
  • 30. Salesforce schema - with orgs CREATE TABLE leads ( id serial primary key, first_name text, last_name text, email text, org_id int ); CREATE TABLE accounts ( id serial primary key, name text, state varchar(2), size int org_id int ); CREATE TABLE opportunity ( id serial primary key, name text, amount int org_id int );
  • 31. Salesforce schema - with keys CREATE TABLE leads ( id serial, first_name text, last_name text, email text, org_id int, primary key (org_id, id) ); CREATE TABLE accounts ( id serial, name text, state varchar(2), size int, org_id int, primary key (org_id, id) ); CREATE TABLE opportunity ( id serial, name text, amount int,
  • 32. Salesforce schema - with keys CREATE TABLE leads ( id serial, first_name text, last_name text, email text, org_id int, primary key (org_id, id) ); CREATE TABLE accounts ( id serial, name text, state varchar(2), size int, org_id int, primary key (org_id, id) ); CREATE TABLE opportunity ( id serial, name text, amount int,
  • 33. Warnings about multi-tenant implementations • Danger ahead if using schemas on older PG versions • Have to reinvent the wheel for even the basics • Schema migrations • Connection limits • Think twice before using a schema or database per tenant
  • 34. Click to edit master tile styleentity_id
  • 35. Entity id • What’s an entity id? • Something granular • Want to join where you can though • Optimizing for parallelism and less for data in memory
  • 36. Examples tell it best • Web analytics • Shard by visitor_id • Shard both sessions and views • Key is to co-locate things you’ll join on
  • 37. Key considerations • SQL will be more limited OR slow • Think in terms of map reduce
  • 38. Map reduce examples • Count (*) • SUM of 32 smaller count (*) • Average • SUM of 32 smaller SUM(foo) / SUM of 32 smaller count(*) • Median • uh….
  • 39. But I like medians and more • Count distinct • HyperLogLog • Ordered list approximation • Top-n • Median • T-digest or HDR
  • 41. When you use a graph database • You’ll know, really you will
  • 43. But what about sharding? • Within a graph model you’re going to duplicate your data • Shard based on both: • The objects themselves • The objects subscribed to other objects
  • 46. Time series: It’s obvious right? • Well it depends
  • 47. Querying long ranges Not removing data Always querying time Querying a subset Remove old data Good Okay/Bad
  • 48. Time series • Range partitioning • 2016 in a bucket, 2017 in a bucket • 2016-01-01 in a bucket, 2016-01-02 in a bucket… • Key steps • Determine your ranges • Make sure you setup enough in advance, or automate creating new ones • Delete
  • 49. Sensor data CREATE TABLE measurement ( city_id int not null, logdate date not null, peaktemp int, unitsales int );
  • 50. Sensor data - initial partition CREATE TABLE measurement ( city_id int not null, logdate date not null, peaktemp int, unitsales int ) PARTITION BY RANGE (logdate);
  • 51. Sensor data - initial partition CREATE TABLE measurement ( city_id int not null, logdate date not null, peaktemp int, unitsales int ) PARTITION BY RANGE (logdate);
  • 52. Sensor data - setting up partitions CREATE TABLE measurement_y2017m10 PARTITION OF measurement FOR VALUES FROM ('2017-10-01') TO ('2017-10-31'); CREATE TABLE measurement_y2017m11 PARTITION OF measurement FOR VALUES FROM ('2017-11-01') TO ('2017-11-30');
  • 53. Sensor data - indexing CREATE TABLE measurement_y2017m10 PARTITION OF measurement FOR VALUES FROM ('2017-10-01') TO ('2017-10-31'); CREATE TABLE measurement_y2017m11 PARTITION OF measurement FOR VALUES FROM ('2017-11-01') TO ('2017-11-30'); CREATE INDEX ON measurement_y2017m10 (logdate); CREATE INDEX ON measurement_y2017m11 (logdate);
  • 54. Sensor data - inserting CREATE TRIGGER insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
  • 55. Sensor data - inserting CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( NEW.logdate >= DATE '2017-02-01' AND NEW.logdate < DATE '2017-03-01' ) THEN INSERT INTO measurement_y2017m02 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2017-03-01' AND NEW.logdate < DATE '2017-04-01' ) THEN INSERT INTO measurement_y2017m03 VALUES (NEW.*); ... ELSIF ( NEW.logdate >= DATE '2018-01-01' AND NEW.logdate < DATE '2018-02-01' ) THEN INSERT INTO measurement_y2018m01 VALUES (NEW.*); ELSE RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql;
  • 56. Five models • Geography • Multi-tenant • Entity id • Graph model • Time series
  • 57. Recap • Not sharding is always easier than sharding • Identify your sharding approach/key early, denormalize it even when you’re small • Don’t force it into one model. No model is perfect, but disqualify where you can • Sharding used to be much more painful, it’s not quite a party yet, but it’s