SlideShare a Scribd company logo
1 of 45
Download to read offline
Scaling
Marty Weiner
Grayskull, Eternia
Yashh Nelapati
Gotham City
Pinterest is . . .
An online pinboard to organize and
share what inspires you.
Scaling Pinterest
Relationships
Scaling Pinterest
Marty Weiner
Grayskull, Eternia
Yashh Nelapati
Gotham City
Mar 2010 Jan 2011 Jan 2012
Scaling Pinterest
Mar 2010 Jan 2011 Jan 2012
Scaling Pinterest
· RackSpace
· 1 small Web Engine
· 1 small MySQL DB
Mar 2010 Jan 2011 Jan 2012
Scaling Pinterest
· Amazon EC2 + S3 + CloudFront
· 1 NGinX, 4 Web Engines
· 1 MySQL DB + 1 Read Slave
· 1 Task Queue + 2 Task Processors
· 1 MongoDB
Mar 2010 Jan 2011 Jan 2012
Scaling Pinterest
Mar 2010 Jan 2011 Jan 2012
Scaling Pinterest
· Amazon EC2 + S3 + CloudFront
· 2 NGinX, 16 Web Engines + 2 API Engines
· 5 Functionally Sharded MySQL DB + 9 read slaves
· 4 Cassandra Nodes
· 15 Membase Nodes (3 separate clusters)
· 8 Memcache Nodes
· 10 Redis Nodes
· 3 Task Routers + 4 Task Processors
· 4 Elastic Search Nodes
· 3 Mongo Clusters
Lesson Learned #1
It will fail. Keep it simple.
Scaling Pinterest
Mar 2010 Jan 2011 Jan 2012
Scaling Pinterest
· Amazon EC2 + S3 + ELB, Akamai
· 90+ Web Engines + 50 API Engines
· 66 MySQL DBs (m1.xlarge) + 1 slave each
· 59 Redis Instances
· 51 Memcache Instances
· 1 Redis Task Manager + 25 Task Processors
· Sharded Solr
Why Amazon EC2/S3?
· Very good reliability, reporting, and support
· Very good peripherals, such as managed cache,
DB, load balancing, DNS, map reduce, and more...
· New instances ready in seconds
Scaling Pinterest
· Con: Limited choice
· Pro: Limited choice
· Extremely mature
· Well known and well liked
· Rarely catastrophic loss of data
· Response time to request rate increases linearly
· Very good software support - XtraBackup, Innotop, Maatkit
· Solid active community
· Very good support from Percona
· Free
Scaling Pinterest
Why MySQL?
Why Memcache?
· Extremely mature
· Very good performance
· Well known and well liked
· Never crashes, and few failure modes
· Free
Scaling Pinterest
Why Redis?
· Variety of convenient data structures
· Has persistence and replication
· Well known and well liked
· Atomic operations
· Consistently good performance
· Free
Scaling Pinterest
What are the cons?
· They don’t do everything for you
· Out of the box, they wont scale past 1 server,
won’t have high availability, won’t bring you a
drink.
Scaling Pinterest
Clustering
vs
Sharding
Scaling Pinterest
Scaling Pinterest
Distributes data across
nodes automatically
Distributes data across
nodes manually
Data can move Data does not move
Rebalances to distribute load Split data to distribute load
Nodes communicate with
each other (gossip)
Nodes are not aware
of each other
Clustering vs Sharding
Why Clustering?
· Examples: Cassandra, MemBase, HBase, Riak
· Automatically scale your datastore
· Easy to set up
· Spatially distribute and colocate your data
· High availability
· Load balancing
· No single point of failure
Scaling Pinterest
Scaling Pinterest
What could possibly go wrong?
source: thereifixedit.com
Why Not Clustering?
· Still fairly young
· Less community support
· Fewer engineers with working knowledge
· Difficult and scary upgrade mechanisms
· And, yes, there is a single point of failure. A BIG one.
Scaling Pinterest
Cluster
Management
Algorithm
Scaling Pinterest
Clustering Single Point of Failure
Cluster Manager
· Same complex code replicated over all nodes
· Failure modes:
· Data rebalance breaks
· Data corruption across all nodes
· Improper balancing that cannot be fixed (easily)
· Data authority failure
Scaling Pinterest
Lesson Learned #2
Clustering is scary.
Scaling Pinterest
Why Sharding?
· Can split your databases to add more capacity
· Spatially distribute and colocate your data
· High availability
· Load balancing
· Algorithm for placing data is very simple
· ID generation is simplistic
Scaling Pinterest
When to shard?
· Sharding makes schema design harder
· Solidify site design and backend architecture
· Remove all joins and complex queries, add cache
· Functionally shard as much as possible
· Still growing? Shard.
Scaling Pinterest
Our Transition
1 DB + Foreign Keys + Joins
1 DB + Denormalized + Cache
Several functionally sharded DBs + Read slaves + Cache
1 DB + Read slaves + Cache
ID sharded DBs + Backup slaves + Cache
Scaling Pinterest
Watch out for...
Scaling Pinterest
· Lost the ability to perform simple JOINS
· No transaction capabilities
· Extra effort to maintain unique constraints
· Schema changes requires more planning
· Single report requires running same query on all
shards
How we sharded
Scaling Pinterest
Sharded Server Topology
Initially, 8 physical servers, each with 512 DBs
Scaling Pinterest
db00001
db00002
.......
db00512
db00513
db00514
.......
db01024
db03584
db03585
.......
db04096
db03072
db03073
.......
db03583
High Availability
Multi Master replication
Scaling Pinterest
db00001
db00002
.......
db00512
db00513
db00514
.......
db01024
db03584
db03585
.......
db04096
db03072
db03073
.......
db03583
Increased load on DB?
To increase capacity, a server is replicated and the
new replica becomes responsible for some DBs
Scaling Pinterest
db00001
db00002
.......
db00512
db00001
db00002
.......
db00256
db00257
db00258
.......
db00512
ID Structure
· A lookup data structure has physical server to shard
ID range ( cached by each app server process)
· Shard ID denotes which shard
· Type denotes object type (e.g., pins)
· Local ID denotes position in table
Shard ID Local ID
64 bits
Scaling Pinterest
Type
Why not a ID service?
· It is a single point of failure
· Extra look up to compute a UUID
Scaling Pinterest
Lookup Structure
Scaling Pinterest
sharddb003a
{“sharddb001a”: ( 1, 512),
“sharddb002b”: ( 513, 1024),
“sharddb003a”: (1025, 1536),
...
“sharddb008b”: (3585, 4096)}
DB01025 users
users
user_has_boards
boards
1 ser-data
2 ser-data
3 ser-data
· New users are randomly distributed across shards
· Boards, pins, etc. try to be collocated with user
· Local ID’s are assigned by auto-increment
· Enough ID space for 65536 shards, but only first
4096 opened initially. Can expand horizontally.
Scaling Pinterest
ID Structure
Objects and Mappings
· Object tables (e.g., pin, board, user, comment)
· Local ID MySQL blob (JSON / Serialized thrift)
· Mapping tables (e.g., user has boards, pin has likes)
· Full ID Full ID (+ timestamp)
· Naming schema is noun_verb_noun
· Queries are PK or index lookups (no joins)
· Data DOES NOT MOVE
· All tables exist on all shards
· No schema changes required (index = new table)
Scaling Pinterest
Scaling Pinterest
def create_new_pin(board_id, data):
shard_id, type, local_board_id = decompose_id(board_id)
local_id = write_pin_to_shard(shard_id, PIN_TYPE, data)
pin_id = compose_id(shard_id, PIN_TYPE, local_id)
return pin_id
Loading a Page
· Rendering user profile
· Most of these calls will be a cache hit
· Omitting offset/limits and mapping sequence id sort
SELECT body FROM users WHERE id=<local_user_id>
SELECT board_id FROM user_has_boards WHERE user_id=<user_id>
SELECT body FROM boards WHERE id IN (<board_ids>)
SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>
SELECT body FROM pins WHERE id IN (pin_ids)
Scaling Pinterest
Scripting
· Must get old data into your shiny new shard
· 500M pins, 1.6B follower rows, etc
· Build a scripting farm
· Spawn more workers and complete the task faster
· Pyres - based on Github’s Resque queue
Scaling Pinterest
Future problems
· Connection limits
· Isolation of functionality
Scaling Pinterest
· Use read slaves for read only as a temporary measure
· Lag can cause difficult to catch bugs
· Use Memcache/Redis as a feed!
· Append new values. If the key does not exist,
append will fail so no worries over partial lists
· Split background tasks by priority
· Write a custom ORM tailored to your sharding
Scaling Pinterest
Interesting Tidbits
Lesson Learned #3
Working at Pinterest is AWESOME
Scaling Pinterest
We are Hiring!
jobs@pinterest.com
Scaling Pinterest
Questions?
Scaling Pinterest
marty@pinterest.com yashh@pinterest.com

More Related Content

What's hot

A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0Krishna Sankar
 
Converting from MySQL to PostgreSQL
Converting from MySQL to PostgreSQLConverting from MySQL to PostgreSQL
Converting from MySQL to PostgreSQLJohn Ashmead
 
Meetup Google BigQuery powered by ai
Meetup Google BigQuery powered by aiMeetup Google BigQuery powered by ai
Meetup Google BigQuery powered by aiIdo Volff
 
Capacity planning for your data stores
Capacity planning for your data storesCapacity planning for your data stores
Capacity planning for your data storesColin Charles
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Clustermysqlops
 
introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)Farzin Bagheri
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
Share point 2013 on azure
Share point 2013 on azureShare point 2013 on azure
Share point 2013 on azurePrabath Fonseka
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011Gavin Heavyside
 
AWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesAWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesDoiT International
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToSATOSHI TAGOMORI
 
Azure DocumentDB 101
Azure DocumentDB 101Azure DocumentDB 101
Azure DocumentDB 101Ike Ellis
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Josh Carlisle
 
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoDJamey Hanson
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation Ericsson Labs
 

What's hot (20)

A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0A Hitchhiker's Guide to NOSQL v1.0
A Hitchhiker's Guide to NOSQL v1.0
 
Converting from MySQL to PostgreSQL
Converting from MySQL to PostgreSQLConverting from MySQL to PostgreSQL
Converting from MySQL to PostgreSQL
 
Meetup Google BigQuery powered by ai
Meetup Google BigQuery powered by aiMeetup Google BigQuery powered by ai
Meetup Google BigQuery powered by ai
 
Capacity planning for your data stores
Capacity planning for your data storesCapacity planning for your data stores
Capacity planning for your data stores
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Share point 2013 on azure
Share point 2013 on azureShare point 2013 on azure
Share point 2013 on azure
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011
 
AWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesAWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL Queries
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
Azure DocumentDB 101
Azure DocumentDB 101Azure DocumentDB 101
Azure DocumentDB 101
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018
 
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation
 
Presto
PrestoPresto
Presto
 

Similar to Plmce2012 scaling pinterest

Pinterest的数据库分片架构
Pinterest的数据库分片架构Pinterest的数据库分片架构
Pinterest的数据库分片架构Tommy Chiu
 
Pinterest arch summit august 2012 - scaling pinterest
Pinterest arch summit   august 2012 - scaling pinterestPinterest arch summit   august 2012 - scaling pinterest
Pinterest arch summit august 2012 - scaling pinterestdrewz lin
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Javasunnygleason
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistTony Rogerson
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsMars Lan
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Data Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlData Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlHAFIZ Islam
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]Huy Do
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at nightMichael Yarichuk
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureChristos Charmatzis
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsGeorge Stathis
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Saltmarch Media
 
Your backend architecture is what matters slideshare
Your backend architecture is what matters slideshareYour backend architecture is what matters slideshare
Your backend architecture is what matters slideshareColin Charles
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
 

Similar to Plmce2012 scaling pinterest (20)

Pinterest的数据库分片架构
Pinterest的数据库分片架构Pinterest的数据库分片架构
Pinterest的数据库分片架构
 
Pinterest arch summit august 2012 - scaling pinterest
Pinterest arch summit   august 2012 - scaling pinterestPinterest arch summit   august 2012 - scaling pinterest
Pinterest arch summit august 2012 - scaling pinterest
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/Specialist
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Data Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlData Warehouse Logical Design using Mysql
Data Warehouse Logical Design using Mysql
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 
NOsql Presentation.pdf
NOsql Presentation.pdfNOsql Presentation.pdf
NOsql Presentation.pdf
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?
 
Your backend architecture is what matters slideshare
Your backend architecture is what matters slideshareYour backend architecture is what matters slideshare
Your backend architecture is what matters slideshare
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Plmce2012 scaling pinterest

  • 2. Pinterest is . . . An online pinboard to organize and share what inspires you. Scaling Pinterest
  • 3.
  • 4.
  • 5.
  • 6. Relationships Scaling Pinterest Marty Weiner Grayskull, Eternia Yashh Nelapati Gotham City
  • 7. Mar 2010 Jan 2011 Jan 2012 Scaling Pinterest Mar 2010 Jan 2011 Jan 2012 Scaling Pinterest · RackSpace · 1 small Web Engine · 1 small MySQL DB
  • 8. Mar 2010 Jan 2011 Jan 2012 Scaling Pinterest · Amazon EC2 + S3 + CloudFront · 1 NGinX, 4 Web Engines · 1 MySQL DB + 1 Read Slave · 1 Task Queue + 2 Task Processors · 1 MongoDB
  • 9. Mar 2010 Jan 2011 Jan 2012 Scaling Pinterest Mar 2010 Jan 2011 Jan 2012 Scaling Pinterest · Amazon EC2 + S3 + CloudFront · 2 NGinX, 16 Web Engines + 2 API Engines · 5 Functionally Sharded MySQL DB + 9 read slaves · 4 Cassandra Nodes · 15 Membase Nodes (3 separate clusters) · 8 Memcache Nodes · 10 Redis Nodes · 3 Task Routers + 4 Task Processors · 4 Elastic Search Nodes · 3 Mongo Clusters
  • 10. Lesson Learned #1 It will fail. Keep it simple. Scaling Pinterest
  • 11. Mar 2010 Jan 2011 Jan 2012 Scaling Pinterest · Amazon EC2 + S3 + ELB, Akamai · 90+ Web Engines + 50 API Engines · 66 MySQL DBs (m1.xlarge) + 1 slave each · 59 Redis Instances · 51 Memcache Instances · 1 Redis Task Manager + 25 Task Processors · Sharded Solr
  • 12. Why Amazon EC2/S3? · Very good reliability, reporting, and support · Very good peripherals, such as managed cache, DB, load balancing, DNS, map reduce, and more... · New instances ready in seconds Scaling Pinterest · Con: Limited choice · Pro: Limited choice
  • 13. · Extremely mature · Well known and well liked · Rarely catastrophic loss of data · Response time to request rate increases linearly · Very good software support - XtraBackup, Innotop, Maatkit · Solid active community · Very good support from Percona · Free Scaling Pinterest Why MySQL?
  • 14. Why Memcache? · Extremely mature · Very good performance · Well known and well liked · Never crashes, and few failure modes · Free Scaling Pinterest
  • 15. Why Redis? · Variety of convenient data structures · Has persistence and replication · Well known and well liked · Atomic operations · Consistently good performance · Free Scaling Pinterest
  • 16. What are the cons? · They don’t do everything for you · Out of the box, they wont scale past 1 server, won’t have high availability, won’t bring you a drink. Scaling Pinterest
  • 18. Scaling Pinterest Distributes data across nodes automatically Distributes data across nodes manually Data can move Data does not move Rebalances to distribute load Split data to distribute load Nodes communicate with each other (gossip) Nodes are not aware of each other Clustering vs Sharding
  • 19. Why Clustering? · Examples: Cassandra, MemBase, HBase, Riak · Automatically scale your datastore · Easy to set up · Spatially distribute and colocate your data · High availability · Load balancing · No single point of failure Scaling Pinterest
  • 20. Scaling Pinterest What could possibly go wrong? source: thereifixedit.com
  • 21. Why Not Clustering? · Still fairly young · Less community support · Fewer engineers with working knowledge · Difficult and scary upgrade mechanisms · And, yes, there is a single point of failure. A BIG one. Scaling Pinterest
  • 23. Cluster Manager · Same complex code replicated over all nodes · Failure modes: · Data rebalance breaks · Data corruption across all nodes · Improper balancing that cannot be fixed (easily) · Data authority failure Scaling Pinterest
  • 24. Lesson Learned #2 Clustering is scary. Scaling Pinterest
  • 25. Why Sharding? · Can split your databases to add more capacity · Spatially distribute and colocate your data · High availability · Load balancing · Algorithm for placing data is very simple · ID generation is simplistic Scaling Pinterest
  • 26. When to shard? · Sharding makes schema design harder · Solidify site design and backend architecture · Remove all joins and complex queries, add cache · Functionally shard as much as possible · Still growing? Shard. Scaling Pinterest
  • 27. Our Transition 1 DB + Foreign Keys + Joins 1 DB + Denormalized + Cache Several functionally sharded DBs + Read slaves + Cache 1 DB + Read slaves + Cache ID sharded DBs + Backup slaves + Cache Scaling Pinterest
  • 28. Watch out for... Scaling Pinterest · Lost the ability to perform simple JOINS · No transaction capabilities · Extra effort to maintain unique constraints · Schema changes requires more planning · Single report requires running same query on all shards
  • 30. Sharded Server Topology Initially, 8 physical servers, each with 512 DBs Scaling Pinterest db00001 db00002 ....... db00512 db00513 db00514 ....... db01024 db03584 db03585 ....... db04096 db03072 db03073 ....... db03583
  • 31. High Availability Multi Master replication Scaling Pinterest db00001 db00002 ....... db00512 db00513 db00514 ....... db01024 db03584 db03585 ....... db04096 db03072 db03073 ....... db03583
  • 32. Increased load on DB? To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest db00001 db00002 ....... db00512 db00001 db00002 ....... db00256 db00257 db00258 ....... db00512
  • 33. ID Structure · A lookup data structure has physical server to shard ID range ( cached by each app server process) · Shard ID denotes which shard · Type denotes object type (e.g., pins) · Local ID denotes position in table Shard ID Local ID 64 bits Scaling Pinterest Type
  • 34. Why not a ID service? · It is a single point of failure · Extra look up to compute a UUID Scaling Pinterest
  • 35. Lookup Structure Scaling Pinterest sharddb003a {“sharddb001a”: ( 1, 512), “sharddb002b”: ( 513, 1024), “sharddb003a”: (1025, 1536), ... “sharddb008b”: (3585, 4096)} DB01025 users users user_has_boards boards 1 ser-data 2 ser-data 3 ser-data
  • 36. · New users are randomly distributed across shards · Boards, pins, etc. try to be collocated with user · Local ID’s are assigned by auto-increment · Enough ID space for 65536 shards, but only first 4096 opened initially. Can expand horizontally. Scaling Pinterest ID Structure
  • 37. Objects and Mappings · Object tables (e.g., pin, board, user, comment) · Local ID MySQL blob (JSON / Serialized thrift) · Mapping tables (e.g., user has boards, pin has likes) · Full ID Full ID (+ timestamp) · Naming schema is noun_verb_noun · Queries are PK or index lookups (no joins) · Data DOES NOT MOVE · All tables exist on all shards · No schema changes required (index = new table) Scaling Pinterest
  • 38. Scaling Pinterest def create_new_pin(board_id, data): shard_id, type, local_board_id = decompose_id(board_id) local_id = write_pin_to_shard(shard_id, PIN_TYPE, data) pin_id = compose_id(shard_id, PIN_TYPE, local_id) return pin_id
  • 39. Loading a Page · Rendering user profile · Most of these calls will be a cache hit · Omitting offset/limits and mapping sequence id sort SELECT body FROM users WHERE id=<local_user_id> SELECT board_id FROM user_has_boards WHERE user_id=<user_id> SELECT body FROM boards WHERE id IN (<board_ids>) SELECT pin_id FROM board_has_pins WHERE board_id=<board_id> SELECT body FROM pins WHERE id IN (pin_ids) Scaling Pinterest
  • 40. Scripting · Must get old data into your shiny new shard · 500M pins, 1.6B follower rows, etc · Build a scripting farm · Spawn more workers and complete the task faster · Pyres - based on Github’s Resque queue Scaling Pinterest
  • 41. Future problems · Connection limits · Isolation of functionality Scaling Pinterest
  • 42. · Use read slaves for read only as a temporary measure · Lag can cause difficult to catch bugs · Use Memcache/Redis as a feed! · Append new values. If the key does not exist, append will fail so no worries over partial lists · Split background tasks by priority · Write a custom ORM tailored to your sharding Scaling Pinterest Interesting Tidbits
  • 43. Lesson Learned #3 Working at Pinterest is AWESOME Scaling Pinterest