Moving the needle of the Pin:
Streaming 100 TB of pins from MySQL to S3/Hadoop continuously @ Pinterest
Henry Cai, Oct 2018
www.linkedin.com/in/hecai
Pinterest is the visual discovery engine.
Mission: Help people discover and do what they love.
>250M monthly active users
100B Pins and 2B Boards
80%
75%
of signups are from outside the U.S.
of Pinners use Pinterest from mobile
Data-driven products
• Personalized recommendation
• Spam Control
• Search Quality
• A/B Experiments
• Related Pins
• …
Data Pipeline stats
• >1 PB data/day
• >10M messages/second
• >800B messages/day
• >2,000 Kafka brokers
• >50,000 client hosts
Data ingestion types
• Online logging
• Database snapshots
2016 pipeline
Data ingestion @ Pinterest, 2016
[Diagram: Pinterest Services, Singer, Kafka (events), Real-time consumers, Merced, Databases, Logical backup, Tracker]
DB ingestion @ Pinterest: Version 1
[Diagram: Databases. Shard 1 (Master, Slave, DR Slave): Mysqldump, Hadoop Streaming Mapper 1. Shard 2 (Master, Slave, DR Slave): Mysqldump, Hadoop Streaming Mapper 2]
DB ingestion @ Pinterest: Version 2
[Diagram: Databases, logical CSV backup, Tracker, shown alongside the Version 1 diagram]
Pain points
Constraints
• Reliability problems caused by MySQL host hiccups
• Pulling over 100 TB of data daily, but only a few TB change every day
• Long latency: > 24 hours
Future: DB Change Streams
• Truly captures DB transactions
• Cross-region cache invalidation
• Real-time search index building
• Real-time Recommendation Engine
The new pipeline
Data ingestion @ Pinterest, now
[Diagram: Pinterest Services, Singer, Kafka (events), Real-time consumers, Databases, DB/Kafka Bridge, Merced, Watermill]
DB/Kafka Bridge (Maxwell)
[Diagram: Replica-set node]
MySQL processes and schemas: User Tables (Shard1, Shard2, Shard3), Maxwell Tables (Maxwell_position, Maxwell_schema), Binlog File
Processes co-located with the MySQL process: BinLog Tailer Thread, In-Memory Queue, Async Kafka Producer Thread
Output: Kafka User Topic, Kafka Pin Topic
• Based on Maxwell/Binlog-Connector
• Add GTID support
• Add handling for retry/out-of-order messages
• Co-locate with MySQL
• Listens on master/slave
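The tailer thread, in-memory queue and producer thread from the bridge diagram can be sketched as a small producer/consumer pair. This is only an illustration of the pattern, not Maxwell's code: `run_bridge`, `binlog_events` and `send` are hypothetical names, the binlog is replaced by a plain iterable, and the Kafka send is replaced by any callable.

```python
import queue
import threading


def run_bridge(binlog_events, send, max_queue=1000):
    """Tail events on one thread and publish them on another, in order."""
    # Bounded FIFO queue: a slow producer back-pressures the tailer.
    q = queue.Queue(maxsize=max_queue)
    done = object()  # sentinel marking the end of the stream

    def tailer():
        for event in binlog_events:  # stand-in for reading the binlog
            q.put(event)
        q.put(done)

    def producer():
        while True:
            event = q.get()
            if event is done:
                break
            send(event)  # stand-in for an asynchronous Kafka send

    threads = [threading.Thread(target=tailer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


events = [{"seq": i, "table": "pins", "op": "UPDATE"} for i in range(5)]
sent = []
run_bridge(events, sent.append)
```

With a single FIFO queue and a single producer thread, events leave in the order they were read. Retries after a failed send would need extra care to keep that order, which is what the bullets above call out.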
Watermill compaction
Compaction for one shard
• Hash join between snapshot and delta
• Delta is loaded into memory first as a side lookup table
• Base snapshot is piped through the mapper node and compared against the lookup table
  - Lookup fails: the snapshot record is emitted to the output
  - Lookup succeeds, but the snapshot record is older: skip the snapshot record
  - Lookup succeeds, but the snapshot record is newer: remove the lookup record
• At the end, append the remaining lookup records to the output
[Diagram: Delta Shard 1 and Old Snapshot Shard 1 feed the Compactor, which writes New Snapshot Shard 1]
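The per-shard compaction steps above can be sketched in a few lines. This is a minimal illustration under assumptions, not Watermill's implementation: records are hypothetical `(key, version, value)` tuples where a higher `version` means newer, deletes are ignored, and when the snapshot record is newer it is assumed to be kept in the output.

```python
def compact(snapshot, delta):
    """Merge one shard's base snapshot with its delta (hash join)."""
    # Load the delta into memory as a side lookup, newest record per key.
    lookup = {}
    for key, version, value in delta:
        if key not in lookup or version >= lookup[key][0]:
            lookup[key] = (version, value)

    output = []
    # Stream the base snapshot past the lookup table.
    for key, version, value in snapshot:
        hit = lookup.get(key)
        if hit is None:
            # Lookup fails: emit the snapshot record.
            output.append((key, version, value))
        elif version < hit[0]:
            # Snapshot record is older: skip it, the delta record wins later.
            continue
        else:
            # Snapshot record is newer (or the same version): drop the
            # lookup record and keep the snapshot record (assumption).
            del lookup[key]
            output.append((key, version, value))

    # Append whatever is left in the lookup table.
    for key, (version, value) in lookup.items():
        output.append((key, version, value))
    return output


old_snapshot = [(1, 1, "a"), (2, 1, "b"), (3, 5, "c")]
delta = [(2, 2, "b2"), (3, 4, "c-stale"), (4, 1, "d")]
new_snapshot = compact(old_snapshot, delta)
```

Only the delta has to fit in memory; the much larger snapshot is streamed once, which matches the "only a few TB changed every day" observation earlier in the deck.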
Incremental DB ingestion sequence
[Diagram: MySQL, Maxwell, Kafka, Merced, Delta, Periodic Compaction, Snapshot1, Snapshot2, Tracker, Batch Backup, Backup Snapshot, Bootstrapper, Differ, Periodic File GC, Custom Input Format, query "SELECT FROM rt_users"]
Data Lifecycle and Timeline Management
[Timeline diagram with ticks at 11:30, 11:45, 11:55, 12:01, 12:10, 12:20]
• Daily Dump 11:30, Bootstrap Snapshot 11:55
• Merced Delta 12:01 (Kafka)
• Compaction Snapshot 12:10 AM
• Select 12:15
• Daily Dump 11:45, Bootstrap Snapshot 12:20
• Select 12:25
• Markers: ProcessedUpTo, CurrentSnapshot, Next Compaction, Periodic GC, Possible Rewind
Consistency
• MySQL Master/Slave Failover, Shard Migration
• MySQL Transactions:
  • Split between tables, split between Kafka messages
• Ordering
  • Between INSERT and UPDATE
  • Between UPDATE and DELETE
• Soft DELETE vs. Hard DELETE
• Consistency between multiple bootstrap and incremental streams
• Duplicate Records
Scalability
• Partitioning
  • Sharded MySQL
    - Shard-based DB snapshot and delta files
    - Two-level sharding in case the original shards are not balanced
  • Unsharded dataset
    - Use hash + mod to partition the data in both the snapshot and delta files
• File filtering using predicate pushdown:
  • On shard/partition level
  • On S3 directory, file and record level
10X
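The hash + mod partitioning for unsharded datasets can be illustrated as follows. This is a sketch under assumptions: the function names, the `id` key field and the choice of CRC32 as the hash are all hypothetical. The point is only that the snapshot and the delta must use the same stable function so matching keys land in the same partition.

```python
import zlib


def partition_for(primary_key, num_partitions):
    """Map a primary key to a partition with a stable hash + mod."""
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(str(primary_key).encode("utf-8")) % num_partitions


def split(records, num_partitions, key="id"):
    """Group records into num_partitions lists by hashed primary key."""
    parts = [[] for _ in range(num_partitions)]
    for record in records:
        parts[partition_for(record[key], num_partitions)].append(record)
    return parts


snapshot_parts = split([{"id": i, "v": "old"} for i in range(100)], 4)
delta_parts = split([{"id": i, "v": "new"} for i in (3, 42, 99)], 4)
```

Because both sides agree on the partition of every key, each partition can be compacted independently of the others.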
Kafka Nuances
• Message Ordering:
  • Async producer, but still need to maintain message order
  • Maintain order between S3 files and within an S3 file
• At-least-once delivery
  • Duplicate messages
  • MySQL GTID not always increasing
• Deal with Kafka cluster hiccup:
  • producer acks=2
  • clean leader election
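At-least-once delivery means a consumer can see the same change more than once. One way to tolerate that is sketched below, assuming each message carries a unique identifier. The `msg_id` field is hypothetical; the slides do not say which field the pipeline uses, only that the GTID alone cannot be relied on to increase.

```python
def dedupe(messages, id_field="msg_id"):
    """Drop redelivered messages, keeping first-arrival order."""
    seen = set()
    unique = []
    for message in messages:
        if message[id_field] in seen:
            continue  # already processed: a redelivery
        seen.add(message[id_field])
        unique.append(message)
    return unique


delivered = [
    {"msg_id": "a", "op": "INSERT"},
    {"msg_id": "b", "op": "UPDATE"},
    {"msg_id": "a", "op": "INSERT"},  # redelivered after a retry
    {"msg_id": "c", "op": "DELETE"},
]
unique_messages = dedupe(delivered)
```

A long-running consumer would bound the `seen` set, for example by time window, rather than let it grow forever.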
S3 Nuances
● Eventual Consistency
● Read-after-write is OK, but not PUT followed by LIST
● Directory listing is slow
● Shorter SLA -> more, smaller files
● In early iterations, directory listing >> file content reading
● Rate Limit:
● Launching thousands of mappers would quickly hit the S3 rate limit
PII Processing
• username, email address, etc. need to be filtered out
• IP address needs to be filtered out
john.doe@abc.com
Justin Bieber
192.168.0.1
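Filtering PII columns out of each record before it lands in S3 might look like the sketch below. The column names (`username`, `email`, `ip_address`, `pin_count`) and the `strip_pii` helper are assumptions for illustration; the slide only names the kinds of fields that must be removed.

```python
# Hypothetical column names; the real list would come from the table schema.
PII_COLUMNS = frozenset({"username", "email", "ip_address"})


def strip_pii(row, pii_columns=PII_COLUMNS):
    """Return a copy of the row without the PII columns."""
    return {name: value for name, value in row.items()
            if name not in pii_columns}


row = {
    "id": 1,
    "username": "Justin Bieber",
    "email": "john.doe@abc.com",
    "ip_address": "192.168.0.1",
    "pin_count": 42,
}
clean_row = strip_pii(row)
```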
Columnar Layout and Incremental Processing
• Use Parquet format to support fast queries on a subset of columns
• ingest_time as a new column to get the incremental result since the last processing
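The `ingest_time` column lets a downstream job read only what arrived since its last run. A minimal sketch of that idea, using plain dictionaries instead of Parquet files and a hypothetical `process_increment` helper:

```python
def process_increment(rows, last_processed):
    """Return rows ingested after last_processed and the new watermark."""
    fresh = [r for r in rows if r["ingest_time"] > last_processed]
    # If nothing new arrived, the watermark stays where it was.
    watermark = max((r["ingest_time"] for r in fresh), default=last_processed)
    return fresh, watermark


rows = [
    {"id": 1, "ingest_time": 100},
    {"id": 2, "ingest_time": 205},
    {"id": 3, "ingest_time": 310},
]
fresh, watermark = process_increment(rows, last_processed=200)
```

With a columnar reader the same filter could be pushed down as a predicate on `ingest_time`, so that unchanged data is skipped rather than read and discarded.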
Operation
Bootstrap, synchronize & rewind
[Diagram: MySQL, Maxwell, Kafka, Merced, Delta, Periodic Compaction, Snapshot1, Snapshot2, Tracker, Batch Backup, Backup Snapshot, Bootstrapper]
Bootstrap, synchronize & rewind (cont)
• We have the ability to synchronize and rewind
• In case of software bugs or network glitches
• Snapshot(s) onto Bootstrap to synchronize
• Ability to rewind via the Snapshots/Bootstrap mechanism
Schema Management and Schema Change
• Schema is used to
  • Identify the primary key of the row
  • Drive the Parquet file generation
• Dealing with schema change
  • Will issue a new bootstrap on offline table schema
  • Compaction will still use the snapshot schema (which might be old)
[Diagram: table dbname.table_name with columns ID, C1, C2 and rows 123, 124, 125, 126, gaining a new_column]
Validation
• Validation
  • Creating compaction based on from and to GTID range
  • Compaction output vs. batch backup output
• Monitoring
  • Error, failure, stall
  • Latency on compaction
[Diagram: Backup Snapshot, Bootstrapper, Periodic Compaction, Snapshot1, Snapshot2, Differ]
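Comparing the compaction output against the batch backup output comes down to a keyed diff. The sketch below assumes both sides are already loaded as dictionaries from primary key to row. `diff_snapshots` is a hypothetical name and not the Differ component itself.

```python
def diff_snapshots(compacted, backup):
    """Report keys that differ between two {primary_key: row} snapshots."""
    missing = sorted(k for k in backup if k not in compacted)
    extra = sorted(k for k in compacted if k not in backup)
    mismatched = sorted(k for k in compacted
                        if k in backup and compacted[k] != backup[k])
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


compacted = {1: {"v": "a"}, 2: {"v": "b-stale"}, 4: {"v": "d"}}
backup = {1: {"v": "a"}, 2: {"v": "b"}, 3: {"v": "c"}}
report = diff_snapshots(compacted, backup)
```

An empty report on all three lists means the incremental path and the batch path agree for that range.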
Summary
Comparison to other technologies
• Uber Hudi (Hoodie)
  • Not supporting S3, only supports Java 8+, Avro
• Kafka Connect/Debezium
  • Only ingestion, no compaction or synchronization between bootstrap/incremental
• Apache Sqoop
  • Based on batch mode
Takeaway
• Scalability
  • Supports 100 TB of database data
  • E2E latency of 15 minutes
• Reliability
  • Strong database consistency on global transactions, message ordering, duplicate message handling
  • Validation and Monitoring
• Operability
  • Bootstrap, re-synchronize
  • Schema management
Future work
• Adopting the Kafka Exactly-Once Processing Model
• Kafka as the database change stream
  • Cache invalidation across data centers
  • Building Materialized Views for MySQL
  • Generating Incremental Recommendation Signals
• Open Source
Acknowledgement
• Joint work from many engineers, including Yu Yang, Chunyan Wang, Indy Prentice, Shawn Nguyen, Yinian Qi, and many others
Thanks!
© Copyright, All Rights Reserved, Pinterest Inc. 2018

Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3/Hadoop Continuously