Ian White
CTO and Co-Founder, Sailthru
@eonwhite
ian@sailthru.com
www.sailthru.com
App Sharding to Autosharding
Sailthru
• Every user is unique
• Email, onsite, mobile, social, offline personalization on an individual level
• Optimizes conversion and drives retention for eCommerce and media
• Founded in 2008 by three engineers
• 170 employees in NYC, SF, LA, London
Using MongoDB
• Primary datastore since 2009
• 120 replica set nodes on metal infrastructure
• 40 TB of data
• 25,000 writes/second
Basic Sailthru Objects
• 850 Million User Profiles
• 75 Million Content Documents
• 2.5 Billion Messages Per Month
The Challenge
• Some apps are read-heavy
• Some apps are write-heavy
• Sailthru is both
Why Shard?
• Using MongoDB since 2009
• No autosharding capabilities at the time
• Too much data for a single node
Application Sharding?
• Application-level sharding
• Partition data by client
• Db class examines each query and routes it to the appropriate replica set and collection
Application Sharding
Query as written by the application:
db['profile'].find(
{"client_id":450,
"email":"ian@sailthru.com"})
Query as routed by the Db class:
db['profile.450'].find(
{"email":"ian@sailthru.com"})
Shard Map Config File
{"profile":
{"shard_key":"client_id","shards":
{"450":"profile1",
"766":"profile2"}
}
}
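The routing step described on these slides can be sketched in Python. This is only an illustration, not Sailthru's actual Db class: `route_query` and `SHARD_MAP` are hypothetical names, and the sketch assumes that the values in the shard map ("profile1", "profile2") name replica sets and that the per-client collection is called `<collection>.<client_id>`.

```python
# Shard map mirroring the config file on the slide:
# collection -> shard key -> shard key value -> replica set (assumed meaning).
SHARD_MAP = {
    "profile": {
        "shard_key": "client_id",
        "shards": {"450": "profile1", "766": "profile2"},
    }
}


def route_query(collection, query, shard_map=SHARD_MAP):
    """Return (replica_set, physical_collection, rewritten_query).

    Hypothetical helper: reads the shard key value from the query, picks the
    replica set from the map, and drops the shard key from the query because
    the per-client collection name already encodes it.
    """
    config = shard_map.get(collection)
    if config is None:
        # Collection is not in the shard map: pass the query through unchanged.
        return None, collection, dict(query)

    shard_key = config["shard_key"]
    if shard_key not in query:
        raise ValueError(f"query on {collection!r} must include {shard_key!r}")

    key_value = str(query[shard_key])
    if key_value not in config["shards"]:
        raise KeyError(f"no shard assigned for {shard_key}={key_value}")

    replica_set = config["shards"][key_value]
    rewritten = {k: v for k, v in query.items() if k != shard_key}
    return replica_set, f"{collection}.{key_value}", rewritten


replica_set, physical, rewritten = route_query(
    "profile", {"client_id": 450, "email": "ian@sailthru.com"}
)
print(replica_set, physical, rewritten)
```

Every query has to carry the shard key, which is why the sketch raises an error when `client_id` is missing instead of fanning out to all replica sets.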
App Sharding: Advantages
• Smaller indexes due to collection partitioning
• Ability to add specific indexes per client (not done much in practice)
App Sharding: Problems
• Uneven load distribution
• Writes bottlenecked by the capacity of a single server
• Manual rebalancing and allocation mean lots of work for the DB team
Solution: Autosharding (since MongoDB 1.6)
Selecting a Shard Key
• Individual reads
• Individual writes
• Cursored reads
Shard Key Options
• client_id? Uneven distribution
• email? Hard to handle null bucket
• _id? Uneven time-based distribution
Best Option
sh.shardCollection( "<database>.profile", { _id: "hashed" } )
• Hash of _id as the shard key
• Hashed shard keys available since MongoDB 2.4
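Why a hashed _id avoids the "uneven time-based distribution" problem can be illustrated with a small simulation. It is a sketch under stated assumptions: four shards, integer ids standing in for time-ordered _id values, md5 as a stand-in hash (not MongoDB's exact hash function), and a simple modulo in place of MongoDB's chunk ranges over hash values.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def range_shard(seq, total, num_shards=NUM_SHARDS):
    """Range partitioning: each shard owns one contiguous slice of the ids."""
    return min(seq * num_shards // total, num_shards - 1)


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Hash partitioning; md5 and modulo are simplifications for this sketch."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


total = 10_000
# Increasing ids imitate time-based _id values; the newest 1,000 ids
# represent the documents that are being written right now.
recent = range(total - 1000, total)

range_load = Counter(range_shard(i, total) for i in recent)
hashed_load = Counter(hashed_shard(i) for i in recent)

print("range :", dict(range_load))   # all recent writes land on one shard
print("hashed:", dict(hashed_load))  # recent writes spread across the shards
```

The trade-off is that range queries on _id no longer map to a single shard, which is acceptable when the workload is dominated by individual reads and writes.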
What about lookups by email?
Don’t want to hit every shard on every lookup
Solution: key collection
{"_id":"<client> <keytype> <sha256_of_value>",
"sid":<mongoid>}
[Diagram: the profile and profile.key collections, each keyed by _id]
• Two quick lookups to individual shards are more scalable than hitting every shard.
• And autoshard the key collection too!
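The key-collection pattern can be sketched with plain dictionaries standing in for the two autosharded collections. The `_id` format follows the slide; everything else is an assumption for illustration: the helper names, the key type string "email", and the placeholder profile id "p-1" in place of a real Mongo id.

```python
import hashlib


def make_key_id(client_id, key_type, value):
    """Build '<client> <keytype> <sha256_of_value>' as shown on the slide."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"{client_id} {key_type} {digest}"


# In-memory dicts stand in for the two collections (hypothetical stand-ins).
profile = {}      # profile _id -> profile document
profile_key = {}  # key _id -> {"sid": profile _id}


def save_profile(profile_id, client_id, email):
    """Write the profile and its email key document."""
    profile[profile_id] = {"_id": profile_id, "client_id": client_id, "email": email}
    key_id = make_key_id(client_id, "email", email)
    profile_key[key_id] = {"_id": key_id, "sid": profile_id}


def find_by_email(client_id, email):
    """Two targeted lookups: the key collection first, then profile by _id."""
    key_doc = profile_key.get(make_key_id(client_id, "email", email))
    if key_doc is None:
        return None
    return profile.get(key_doc["sid"])


save_profile("p-1", 450, "ian@sailthru.com")
found = find_by_email(450, "ian@sailthru.com")
print(found["_id"])
```

Both lookups are by `_id`, so with a hashed `_id` shard key on each collection every lookup can be routed to exactly one shard.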
How We Did the Move
Uptime is critical: we cannot bring the service down for infrastructure changes.
Solution: Mongo-Connector
Created by MongoDB interns two summers ago.
The Swiss army knife of moving data from set to set.
Solution: Mongo-Connector
• Tail the oplog in the legacy replica set
• Pipe data into the autoshard cluster with mongo-connector
• Repoint the app to read/write the autoshard cluster
• Zero downtime
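The idea behind tailing the oplog can be sketched as replaying a stream of operations against a target store. This is a simplified model, not mongo-connector's code: the entries use the oplog's `ts`, `op`, `ns`, `o`, and `o2` fields, but timestamps are plain integers, only `$set` and full-replacement updates are handled, the database name "sailthru" is hypothetical, and nested dictionaries stand in for the target cluster.

```python
def apply_oplog_entry(target, entry):
    """Apply one simplified oplog entry to `target` (ns -> {_id -> doc})."""
    collection = target.setdefault(entry["ns"], {})
    op = entry["op"]
    if op == "i":  # insert: 'o' is the full document
        collection[entry["o"]["_id"]] = dict(entry["o"])
    elif op == "u":  # update: 'o2' identifies the document
        doc_id = entry["o2"]["_id"]
        if "$set" in entry["o"]:
            collection.setdefault(doc_id, {"_id": doc_id}).update(entry["o"]["$set"])
        else:  # full-document replacement
            collection[doc_id] = {"_id": doc_id, **entry["o"]}
    elif op == "d":  # delete: 'o' holds the _id
        collection.pop(entry["o"]["_id"], None)
    # Other op types (no-ops, commands) are ignored in this sketch.


def replay(oplog, target, after_ts=0):
    """Replay entries newer than `after_ts`; return the last timestamp applied."""
    last_ts = after_ts
    for entry in sorted(oplog, key=lambda e: e["ts"]):
        if entry["ts"] > after_ts:
            apply_oplog_entry(target, entry)
            last_ts = entry["ts"]
    return last_ts


NS = "sailthru.profile.450"  # hypothetical namespace
oplog = [
    {"ts": 1, "op": "i", "ns": NS, "o": {"_id": "a", "email": "ian@sailthru.com"}},
    {"ts": 2, "op": "u", "ns": NS, "o2": {"_id": "a"}, "o": {"$set": {"name": "Ian"}}},
    {"ts": 3, "op": "i", "ns": NS, "o": {"_id": "b"}},
    {"ts": 4, "op": "d", "ns": NS, "o": {"_id": "b"}},
]
target = {}
checkpoint = replay(oplog, target)
print(checkpoint, target)
```

Keeping the returned timestamp as a checkpoint lets the replay resume where it stopped, which is what allows the new cluster to catch up while the legacy set keeps serving traffic.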
Solution: Mongo-Connector
• Our fork contains some improvements
• ts (timestamp) and ns (namespace) are stored in a separate collection instead of being added to the target document
https://github.com/sailthru/mongo-connector
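The difference between the two bookkeeping approaches can be sketched as follows. This is an illustration of the idea only, not code from the fork: the function names are hypothetical, the field names `ts` and `ns` follow the slide, and dictionaries stand in for the target and metadata collections.

```python
def upsert_inline(target, doc, ts, ns):
    """Bookkeeping fields are written into the replicated document itself."""
    target[doc["_id"]] = {**doc, "ts": ts, "ns": ns}


def upsert_with_side_collection(target, meta, doc, ts, ns):
    """Document is stored untouched; ts/ns go to a separate collection."""
    target[doc["_id"]] = dict(doc)
    meta[doc["_id"]] = {"_id": doc["_id"], "ts": ts, "ns": ns}


doc = {"_id": "a", "email": "ian@sailthru.com"}

inline_target = {}
upsert_inline(inline_target, doc, ts=1, ns="sailthru.profile")

clean_target, meta = {}, {}
upsert_with_side_collection(clean_target, meta, doc, ts=1, ns="sailthru.profile")

print(inline_target["a"])
print(clean_target["a"], meta["a"])
```

With the side collection, the documents in the new cluster stay identical to the source documents, so the application does not see extra fields after it is repointed.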
But Wait! There’s More
Mongo-Connector can also be used to:
• Pipe data into alternate data stores (Hadoop, Solr, etc.)
• Change autoshard keys if you made a mistake
In Conclusion
• Autosharding is helpful
• Think about shard key early
• Start by writing to a mongos, even when it's just one set
Q&A
www.sailthru.com
sales@sailthru.com
817.812.8689
@sailthru
NYC HQ
160 Varick St., 12th Floor
New York, NY 10013
San Francisco
25 Taylor St., Room 724
San Francisco, CA 94102
London
18 Soho Square
London, UK, W1D 3QL
Los Angeles
7083 Hollywood Blvd
Los Angeles, CA 90028