Ian White
CTO and Co-Founder, Sailthru
@eonwhite
ian@sailthru.com
www.sailthru.com ian@sailthru.com
App Sharding to Autosh...
• Every user is unique
• Email, onsite, mobile, social, offline personalization on an individual level
• Optimizes convers...
Sailthru
Using MongoDB 120 40 TB
Since 2009
primary datastore
Replicaset nodes on
metal infrastructure
25,000
writes/second
Basic Sailthru Objects
850 Million User Profiles
75 Million Content Documents
2.5 Billion Messages Per Month
The Challenge
• Sailthru is both
• Some apps are read-heavy
• Some apps are write-heavy
Why Shard?
• Using MongoDB since 2009
• No autosharding capabilities at the time
• Too much data for a single node
Application Sharding?
• Application-level sharding
• Partition data by client
• Db class examines query
and routes to an a...
Application Sharding
Query
db['profile'].find(
{"client_id": 450,
"email": "ian@sailthru.com"})
Query
db['profile.450'].find(...
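The routing step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Sailthru's actual Db class: the `route` function, the `"default"` replica-set name, and the error handling are assumptions, while the shard map mirrors the config file shown on slide 8 of the transcript.

```python
# Shard map copied from the "Shard Map Config File" on slide 8.
SHARD_MAP = {
    "profile": {
        "shard_key": "client_id",
        "shards": {"450": "profile1", "766": "profile2"},
    }
}


def route(collection, query, shard_map=SHARD_MAP):
    """Return (replica_set, collection_name, rewritten_query) for a query.

    Hypothetical helper: the real Db class is not shown in the slides.
    """
    config = shard_map.get(collection)
    if config is None:
        # Assumption: unsharded collections live on one default replica set.
        return ("default", collection, dict(query))

    key = config["shard_key"]
    if key not in query:
        raise ValueError(f"query on {collection!r} must include {key!r}")

    key_value = str(query[key])
    if key_value not in config["shards"]:
        raise LookupError(f"no shard assigned for {key}={key_value}")

    # The shard key moves into the collection name, so drop it from the query.
    rewritten = {k: v for k, v in query.items() if k != key}
    return (config["shards"][key_value], f"{collection}.{key_value}", rewritten)


replica_set, name, rewritten = route(
    "profile", {"client_id": 450, "email": "ian@sailthru.com"}
)
```

Because every query has to carry `client_id`, one large client still lands on a single replica set, which is the uneven-load problem described on the slides that follow.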
App Sharding: Advantages
• Smaller indexes due to collection partitioning
• Ability to add specific indices per client
(no...
App Sharding: Problems
• Uneven load distribution
• Writes bottlenecked by capacity of
single server
• Manual rebalancing ...
Solution:
Autosharding
(Since MongoDB 1.6)
Selecting a Shard Key
• Individual reads
• Individual writes
• Cursored reads
Shard Key Options
• client_id? Uneven distribution
• email? Hard to handle null bucket
• _id? Uneven time-based distributi...
Best Option
sh.shardCollection( "profile", { _id: "hashed" } )
• hash of _id
• Available since
MongoDB 2.4
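A small simulation may help show why a hashed `_id` spreads writes where a plain time-ordered `_id` does not. Everything here is an assumption for illustration: integer ids stand in for time-ordered ObjectIds, MD5 modulo the shard count stands in for MongoDB's hashed index (which actually splits ranges of hash values into chunks), and the shard boundaries are invented.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def range_shard(doc_id, boundaries):
    """Range sharding: pick the first shard whose upper bound exceeds doc_id."""
    for shard, upper in enumerate(boundaries):
        if doc_id < upper:
            return shard
    return len(boundaries)  # the last shard holds everything above the top bound


def hashed_shard(doc_id, num_shards=NUM_SHARDS):
    """Hashed sharding stand-in: hash the id, then map it onto a shard."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Existing data occupies ids below 3000; new writes get increasing ids.
boundaries = [1000, 2000, 3000]
new_ids = range(3000, 4000)

range_counts = Counter(range_shard(i, boundaries) for i in new_ids)
hashed_counts = Counter(hashed_shard(i) for i in new_ids)
```

In this sketch every new write under range sharding lands on the last shard, while the hashed variant touches all four shards.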
What about lookups by email?
Don’t want to hit every shard on every lookup
Solution: key collection
{'_id': '<client> <keytype> <sha256_of_value>',
'sid': <mongoid>}
[Diagram: the profile and profile.key collections, each keyed by _id]
• Two...
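The two-step lookup through the key collection can be sketched as follows. This is a hedged illustration using in-memory dicts in place of the sharded `profile` and `profile.key` collections; the function names are hypothetical, and only the `_id` format and the `sid` field come from the slide.

```python
import hashlib

# In-memory stand-ins for the two sharded collections (assumption).
profile = {}
profile_key = {}


def make_key_id(client_id, key_type, value):
    """Build the key document _id: '<client> <keytype> <sha256_of_value>'."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"{client_id} {key_type} {digest}"


def insert_profile(profile_id, client_id, email):
    """Store the profile plus a key document pointing back at it via 'sid'."""
    profile[profile_id] = {"_id": profile_id, "client_id": client_id, "email": email}
    key_id = make_key_id(client_id, "email", email)
    profile_key[key_id] = {"_id": key_id, "sid": profile_id}


def find_by_email(client_id, email):
    """Lookup 1: key collection by _id. Lookup 2: profile by _id."""
    key_doc = profile_key.get(make_key_id(client_id, "email", email))
    if key_doc is None:
        return None
    return profile.get(key_doc["sid"])


insert_profile("p1", 450, "ian@sailthru.com")
found = find_by_email(450, "ian@sailthru.com")
```

Both lookups are by `_id`, so on a cluster sharded on hashed `_id` each one can be routed to a single shard instead of being broadcast to all of them.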
How We Did
The Move.
Uptime is critical: cannot bring
service down for infrastructure
changes
Solution:
Mongo-Connector
Created by MongoDB interns
two summers ago.
The Swiss army knife of
moving data from set to set.
Solution: Mongo-Connector
• Tail oplog in legacy replica set
• Pipe data into autoshard cluster
with mongo-connector
• Rep...
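The idea of tailing an oplog and replaying it into a second cluster can be illustrated with a simplified simulation. This is not mongo-connector's code: the functions are hypothetical, plain dicts stand in for the target cluster, integers stand in for BSON timestamps, and only insert, `$set`/replacement update, and delete entries are handled. The field names `ts`, `ns`, `op`, `o`, and `o2` follow the MongoDB oplog entry format.

```python
def apply_oplog_entry(target, entry):
    """Apply one oplog-style entry to the target store (dict of collections)."""
    collection = target.setdefault(entry["ns"], {})
    op = entry["op"]
    if op == "i":  # insert: 'o' is the full document
        collection[entry["o"]["_id"]] = dict(entry["o"])
    elif op == "u":  # update: 'o2' identifies the document, 'o' is the change
        doc_id = entry["o2"]["_id"]
        change = entry["o"]
        if "$set" in change:
            collection.setdefault(doc_id, {"_id": doc_id}).update(change["$set"])
        else:
            collection[doc_id] = {"_id": doc_id, **change}
    elif op == "d":  # delete: 'o' carries the _id
        collection.pop(entry["o"]["_id"], None)
    # Other entry types (no-ops, commands) are ignored in this sketch.


def replay(target, oplog, after_ts=0):
    """Replay entries newer than after_ts; return the last timestamp applied."""
    last_ts = after_ts
    for entry in oplog:  # assumed to arrive in timestamp order
        if entry["ts"] <= after_ts:
            continue
        apply_oplog_entry(target, entry)
        last_ts = entry["ts"]
    return last_ts


legacy_oplog = [
    {"ts": 1, "ns": "sailthru.profile", "op": "i", "o": {"_id": "p1", "email": "a@example.com"}},
    {"ts": 2, "ns": "sailthru.profile", "op": "u", "o2": {"_id": "p1"}, "o": {"$set": {"optout": True}}},
    {"ts": 3, "ns": "sailthru.profile", "op": "i", "o": {"_id": "p2", "email": "b@example.com"}},
    {"ts": 4, "ns": "sailthru.profile", "op": "d", "o": {"_id": "p2"}},
]

autoshard_cluster = {}
checkpoint = replay(autoshard_cluster, legacy_oplog)
```

Keeping the returned checkpoint lets a restarted replay resume from the last applied timestamp rather than starting over.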
Solution: Mongo-Connector
• Our fork contains some improvements
• ts(timestamp) and ns(namespace)
get added in separate co...
But Wait! There’s More
• Mongo-Connector can also be used to
• Pipe data into alternate data stores
(Hadoop, Solr, etc)
• ...
In Conclusion
• Autosharding is helpful
• Think about shard key early
• Start by writing to a mongos,
even when it's just o...
Q&A
www.sailthru.com
sales@sailthru.com
817.812.8689
@sailthru
NYC HQ
160 Varick St., 12th Floor
New York, NY 10013
San Fr...
App Sharding to Autosharding at Sailthru
Published in: Technology
  • 2/3+ of people under 45 are always addressable
  • Only 16% of companies place primary focus here though (vs. acquisition)
  • Mention engagement as the end-all metric again

    1. Ian White, CTO and Co-Founder, Sailthru
       @eonwhite, ian@sailthru.com, www.sailthru.com
       App Sharding to Autosharding
    2. • Every user is unique
       • Email, onsite, mobile, social, offline personalization on an individual level
       • Optimizes conversion and drives retention for eCommerce and media
       • Founded in 2008 by three engineers
       • 170 employees in NYC, SF, LA, London
    3. Sailthru Using MongoDB 120 40 TB Since 2009 primary datastore Replicaset nodes on metal infrastructure 25,000 writes/second
    4. Basic Sailthru Objects
       • 850 Million User Profiles
       • 75 Million Content Documents
       • 2.5 Billion Messages Per Month
    5. The Challenge
       • Sailthru is both
       • Some apps are read-heavy
       • Some apps are write-heavy
    6. Why Shard?
       • Using MongoDB since 2009
       • No autosharding capabilities at the time
       • Too much data for a single node
    7. Application Sharding?
       • Application-level sharding
       • Partition data by client
       • Db class examines query and routes to an appropriate replica set and collection
    8. Application Sharding
       Query
       db['profile'].find({"client_id": 450, "email": "ian@sailthru.com"})
       Query
       db['profile.450'].find({"email": "ian@sailthru.com"})
       Shard Map Config File
       {"profile": {"shard_key": "client_id", "shards": {"450": "profile1", "766": "profile2"}}}
    9. App Sharding: Advantages
       • Smaller indexes due to collection partitioning
       • Ability to add specific indices per client (not done much in practice)
    10. App Sharding: Problems
       • Uneven load distribution
       • Writes bottlenecked by capacity of single server
       • Manual rebalancing and allocation = lots of work for DB team
    11. Solution: Autosharding (Since MongoDB 1.6)
    12. Selecting a Shard Key
       • Individual reads
       • Individual writes
       • Cursored reads
    13. Shard Key Options
       • client_id? Uneven distribution
       • email? Hard to handle null bucket
       • _id? Uneven time-based distribution
    14. Best Option
       sh.shardCollection( "profile", { _id: "hashed" } )
       • hash of _id
       • Available since MongoDB 2.4
    15. What about lookups by email?
       Don't want to hit every shard on every lookup
    16. Solution: key collection
       {'_id': '<client> <keytype> <sha256_of_value>', 'sid': <mongoid>}
       [Diagram: the profile and profile.key collections, each keyed by _id]
       • Two quick lookups to individual shards is more scalable than hitting all.
       • And autoshard that!
    17. How We Did The Move.
       Uptime is critical: cannot bring service down for infrastructure changes
    18. Solution: Mongo-Connector
       Created by MongoDB interns two summers ago.
       The Swiss army knife of moving data from set to set.
    19. Solution: Mongo-Connector
       • Tail oplog in legacy replica set
       • Pipe data into autoshard cluster with mongo-connector
       • Repoint app to read/write autoshard
       • Zero downtime
    20. Solution: Mongo-Connector
       • Our fork contains some improvements
       • ts (timestamp) and ns (namespace) get added in separate collection instead of the target document
       https://github.com/sailthru/mongo-connector
    21. But Wait! There's More
       • Mongo-Connector can also be used to
       • Pipe data into alternate data stores (Hadoop, Solr, etc)
       • Change autoshard keys if you made a mistake
    22. In Conclusion
       • Autosharding is helpful
       • Think about shard key early
       • Start by writing to a mongos, even when it's just one set
    23. Q&A
       www.sailthru.com, sales@sailthru.com, 817.812.8689, @sailthru
       NYC HQ: 160 Varick St., 12th Floor, New York, NY 10013
       San Francisco: 25 Taylor St., Room 724, San Francisco, CA 94102
       London: 18 Soho Square, London, UK, W1D 3QL
       Los Angeles: 7083 Hollywood Blvd, Los Angeles, CA 90028
       Ian White, CTO and Co-Founder, Sailthru, @eonwhite, ian@sailthru.com