MongoDB
@ShopWiki.com
  our swiss-army datastore
Overview

Introductions
Uses at ShopWiki
Benefits and Tradeoffs
Gotchas
ShopWiki - what we do
MongoDB Founder
   Pedigree
Uses at ShopWiki

Site Visit Analytics
Datafeeds
Site Browsers
Image/Thumbnail Server
One-offs of All Kinds
Visit Analytics - contents
Data Size
   Total On Disk: 869GB
   Largest collection:
       count : 88729347 items
       size : 165GB
       totalIndexSize : 18GB
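
As a rough back-of-the-envelope check on the collection statistics reported for this slide (count 88729347, size 165GB, totalIndexSize 18GB), the sketch below works out the average document and index footprint. Treating "GB" as decimal gigabytes is an assumption; with binary gigabytes the figures come out a few percent higher.

```python
# Figures as reported for the largest collection.
# ASSUMPTION: "GB" means 10**9 bytes; the slide does not say.
GB = 10**9
count = 88_729_347
data_bytes = 165 * GB
index_bytes = 18 * GB

# Average footprint per stored document.
avg_doc_bytes = data_bytes / count
avg_index_bytes = index_bytes / count

# Index size relative to data size.
index_ratio = index_bytes / data_bytes

print(f"average document size: {avg_doc_bytes:.0f} bytes")
print(f"average index cost per document: {avg_index_bytes:.0f} bytes")
print(f"index size as a share of data: {index_ratio:.1%}")
```

Under that assumption each visit record averages a little under 2 KB, and the indexes add roughly a tenth on top of the data.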
Visit Analytics - usage
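Typical
 inserts/s   query/s   update/s   delete/s   getmore/s   locked %   conn
   222         133       284         0           2          11%       738

Use spike
 inserts/s   query/s   update/s   delete/s   getmore/s   locked %   conn
   710         420       654         0           9          10%       650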
Datafeeds
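{
    ProductID : 2309,
    Title : "Elephant Leash",
    Brand : "Acme",
    Price : 49.99,
    Breadcrumbs : [ "Pets", "Exotic", "Accessories" ],
    Description : "Horton will love this stylish and functional leash, and you won't violate any local statutes when you walk around with the Acme Elephant Leash!"
}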
Site-Browsing Datastore
Image/Thumbnail Server


Before: custom append-only datastore
After: MongoDB all the way!
Benefits

Prototype to Production, always extensible
JSON objects > ORM
No joins in code
One-Button Replication
Tradeoffs

No “DESCRIBE” (use indices instead)
Denormalization: Storage and Replication
Date handling
Typos mean schema corruption
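
Because there is no fixed schema to reject a misspelled field, one way to catch typos in tests is to compare each document's keys against a known list before it is written. This is only a sketch: `ALLOWED_FIELDS` and `unknown_fields` are hypothetical names, and the field list is borrowed from the datafeed example document in this deck.

```python
# Hypothetical whitelist of top-level fields, taken from the datafeed example.
ALLOWED_FIELDS = {
    "ProductID", "Title", "Brand", "Price", "Breadcrumbs", "Description",
}


def unknown_fields(doc, allowed=ALLOWED_FIELDS):
    """Return the sorted top-level keys of doc that are not in allowed."""
    return sorted(set(doc) - set(allowed))


good = {"ProductID": 2309, "Title": "Elephant Leash", "Price": 49.99}
typo = {"ProductID": 2310, "Titel": "Elephant Brush", "Price": 12.5}

print(unknown_fields(good))  # []
print(unknown_fields(typo))  # ['Titel']
```

A check like this can run in a unit test or just before an insert; it only looks at top-level keys, so nested documents would need their own lists.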
Many-to-many
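NodeID    Color    Shape                ProductID    Feel       Temp
 890      Purple   Round                   98        Soft        50
 1039     Brown    Square                  202       Hard        98
 6029     Brown    Triangle                451       Squishy     102

NodeID    ProductID
 890         202
 890         98
 6029        451
 1039        451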
Inverted List Pairs
{ NodeID : 890, Products : [ 202, 98 ], Color : "Purple",
Shape : "Round" },
{ NodeID : 6029, Products : [ 451 ], Color : "Brown",
Shape : "Triangle" },
etc...

YOUR CODE HERE

{ ProductID : 451, BrowseNodes : [ 6029, 1039 ], Feel : "Squishy",
Temp : 102 },
{ ProductID : 202, BrowseNodes : [ 890 ], Feel : "Hard",
Temp : 98 },
etc...
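
The application code that sits between the two lists could look something like the sketch below, which derives both sets of documents from the many-to-many link rows. It assumes the relational slide is read as a node table, a product table, and a link table; `build_inverted_pairs` is a hypothetical helper, not code from the talk.

```python
from collections import defaultdict

# Rows from the many-to-many slide, read as three tables (an assumption).
nodes = {
    890: {"Color": "Purple", "Shape": "Round"},
    1039: {"Color": "Brown", "Shape": "Square"},
    6029: {"Color": "Brown", "Shape": "Triangle"},
}
products = {
    98: {"Feel": "Soft", "Temp": 50},
    202: {"Feel": "Hard", "Temp": 98},
    451: {"Feel": "Squishy", "Temp": 102},
}
links = [(890, 202), (890, 98), (6029, 451), (1039, 451)]  # (NodeID, ProductID)


def build_inverted_pairs(nodes, products, links):
    """Build node documents listing their products, and product documents
    listing their browse nodes, from the link rows."""
    node_products = defaultdict(list)
    product_nodes = defaultdict(list)
    for node_id, product_id in links:
        node_products[node_id].append(product_id)
        product_nodes[product_id].append(node_id)

    node_docs = [
        {"NodeID": nid, "Products": list(node_products[nid]), **attrs}
        for nid, attrs in nodes.items()
    ]
    product_docs = [
        {"ProductID": pid, "BrowseNodes": list(product_nodes[pid]), **attrs}
        for pid, attrs in products.items()
    ]
    return node_docs, product_docs


node_docs, product_docs = build_inverted_pairs(nodes, products, links)
for doc in node_docs + product_docs:
    print(doc)
```

Both lists come from the same link rows, so rebuilding them together keeps the two directions consistent; the cost is the duplicated storage noted under Tradeoffs.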
Datafeed alerting, RDB

     No “INTERVAL 1 DAY”
Feed status
         feed        offer_count        site_id    date

Alerts
         feed        offer_count        site_id    date    ack    fix_target
Datafeed alerting, Mongo
  No joins, selects... index sub-objects
{
 feed,
 offer_count,
 site_id,
 date,
 alert : [ status, time, etc...],
 last_good_count
}
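
Since there is no "INTERVAL 1 DAY", the date arithmetic moves into application code, and the alert state can be matched directly on the embedded sub-object. The sketch below is one way that might look. The `"open"` status value, the 50% drop threshold, and the helper names are all assumptions for illustration; `$gte` and dotted field paths are standard MongoDB query syntax.

```python
from datetime import datetime, timedelta, timezone


def one_day_ago(now):
    """Application-side replacement for SQL's INTERVAL 1 DAY."""
    return now - timedelta(days=1)


def open_alert_filter(now):
    """Query filter for feed documents dated within the last day that carry
    an alert whose status is "open" (hypothetical status value)."""
    return {
        "date": {"$gte": one_day_ago(now)},
        "alert.status": "open",  # dotted path reaches into the sub-object
    }


def needs_alert(feed_doc, drop_ratio=0.5):
    """True when offer_count fell below drop_ratio of last_good_count.
    The threshold is an illustrative assumption."""
    return feed_doc["offer_count"] < feed_doc["last_good_count"] * drop_ratio


# Arbitrary example timestamp, not a date from the talk.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
query = open_alert_filter(now)
print(query)
print(needs_alert({"offer_count": 10, "last_good_count": 100}))
```

Keeping `last_good_count` on the same document means the comparison needs no second lookup, which is the point the slide makes about not joining against a separate alerts table.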
Gotchas

Prototype to Production: ensureIndex() is cheap
ext3 -- banished from the land
oplog size for replication
{number_of_times_the_user_clicked : 1}
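
Field names are stored inside every document, so a long label like the one above is paid for on each record. A thin mapping layer that shortens labels on write and expands them on read is one way around that. This is a minimal sketch: the short key `"c"` and the helper names are hypothetical.

```python
# Hypothetical mapping from readable field names to short stored keys.
SHORT_KEYS = {"number_of_times_the_user_clicked": "c"}
LONG_KEYS = {short: long for long, short in SHORT_KEYS.items()}


def shorten(doc):
    """Rename known long keys to their short form before writing."""
    return {SHORT_KEYS.get(key, key): value for key, value in doc.items()}


def expand(doc):
    """Restore readable key names after reading."""
    return {LONG_KEYS.get(key, key): value for key, value in doc.items()}


def bytes_saved_per_doc():
    """Bytes of key text saved per document when every mapped key is present."""
    return sum(
        len(long.encode("utf-8")) - len(short.encode("utf-8"))
        for long, short in SHORT_KEYS.items()
    )


original = {"number_of_times_the_user_clicked": 1, "site_id": 7}
stored = shorten(original)
print(stored)
print(expand(stored) == original)
print(bytes_saved_per_doc())
```

The saving per document is small, but it is multiplied by the document count, which matters for a collection of tens of millions of records. The tradeoff is that raw documents become harder to read in the shell.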
AFTER PARTY @SLATE
 SPECIAL THANKS TO GILT FOR SPONSORING
           54 WEST 21st STREET
Using Mongo At Shopwiki

Presentation by Avery Rosen, CTO of ShopWiki.com, on how MongoDB is being used all over their enterprise.


  • Shopping search engine; crawl the web using AI to aggregate; add data feeds; in-memory search; web front-end
  • Relationship with founders, opportunity. Eliot: final project together; I was playing with QT, he wrote a DB and network protocol. Dwight wrote the adserver, code I became highly familiar with on the adserver team.





  • Largest, write-only
  • highly utilized
  • perfect for document oriented architecture, same format as we use to eventually index
  • browse structure for consumers and SEO, daily updates, live access, cached in front-end
  • historical note: doubleclick’s imageserver. no brainer to convert backend to avoid maintenance overhead
  • Prototype: schema extensible, no need for table alters, as in visit table; JSON instead of ORM; joins can be ugly and unpredictable
  • Many-to-many joins are missing, but you might not miss them. Storage is cheap, although denormalization has consequences for replication; catch typos with testing

  • denormalization, but storage is cheap
  • date functions missing
  • with document, can key on alerting, no hunting for last_good_count
  • it’s easy to roll out code without indices; ext3 is just terrible; big data, 10% of empty too much, custom oplog size, too small; some people using false-ORM to minify attribute labels

  • Using Mongo At Shopwiki

    1. MongoDB @ShopWiki.com our swiss-army datastore
    2. Overview Introductions Uses at ShopWiki Benefits and Tradeoffs Gotchas
    3. ShopWiki - what we do
    4. MongoDB Founder Pedigree
    5. Uses at ShopWiki
    6. Uses at ShopWiki Site Visit Analytics
    7. Uses at ShopWiki Site Visit Analytics Datafeeds
    8. Uses at ShopWiki Site Visit Analytics Datafeeds Site Browsers
    9. Uses at ShopWiki Site Visit Analytics Datafeeds Site Browsers Image/Thumbnail Server
    10. Uses at ShopWiki Site Visit Analytics Datafeeds Site Browsers Image/Thumbnail Server One-offs of All Kinds
    11. Visit Analytics - contents Data Size Total On Disk: 869GB Largest collection: count : 88729347 items size : 165GB totalIndexSize : 18GB
    12. Visit Analytics - usage Typical inserts/s query/s update/s delete/s getmore/s locked % conn 222 133 284 0 2 11% 738 Use spike inserts/s query/s update/s delete/s getmore/s locked % conn 710 420 654 0 9 10% 650
    13. Datafeeds { ProductID : 2309, Title : “Elephant Leash”, Brand : “Acme”, Price : 49.99, Breadcrumbs : [ “Pets”, “Exotic”, “Accessories” ], Description : “Horton will love this stylish and functional leash, and you won’t violate any local statutes when you walk around with the Acme Elephant Leash!” }
    14. Site-Browsing Datastore
    15. Image/Thumbnail Server Before: custom append-only datastore After: MongoDB all the way!
    16. Benefits
    17. Benefits Prototype to Production, always extensible
    18. Benefits Prototype to Production, always extensible JSON objects > ORM
    19. Benefits Prototype to Production, always extensible JSON objects > ORM No joins in code
    20. Benefits Prototype to Production, always extensible JSON objects > ORM No joins in code One-Button Replication
    21. Tradeoffs No “DESCRIBE” (use indices instead) Denormalization: Storage and Replication Date handling Typos mean schema corruption
    22. Many-to-many NodeID Color Shape ProductID Feel Temp 890 Purple Round 98 Soft 50 1039 Brown Square 202 Hard 98 6029 Brown Triangle 451 Squishy 102 NodeID ProductID 890 202 890 98 6029 451 1039 451
    23. Inverted List Pairs { NodeID : 890, Products : [ 202, 98 ], Color : "Purple", Shape : "Round" }, { NodeID : 6029, Products : [ 451 ], Color : "Brown", Shape : "Triangle" }, etc...
    24. Inverted List Pairs { NodeID : 890, Products : [ 202, 98 ], Color : "Purple", Shape : "Round" }, { NodeID : 6029, Products : [ 451 ], Color : "Brown", Shape : "Triangle" }, etc... YOUR CODE HERE
    25. Inverted List Pairs { NodeID : 890, Products : [ 202, 98 ], Color : "Purple", Shape : "Round" }, { NodeID : 6029, Products : [ 451 ], Color : "Brown", Shape : "Triangle" }, etc... YOUR CODE HERE { ProductID : 451, BrowseNodes : [ 6029, 1039 ], Feel : "Squishy", Temp : 102 }, { ProductID : 202, BrowseNodes : [ 890 ], Feel : "Hard", Temp : 98 }, etc...
    26. Datafeed alerting, RDB No “INTERVAL 1 DAY” Feed status feed offer_count site_id date Alerts feed offer_count site_id date ack fix_target
    27. Datafeed alerting, Mongo No joins, selects... index sub-objects { feed, offer_count, site_id, date, alert : [ status, time, etc...], last_good_count }
    28. Gotchas
    29. Gotchas Prototype to Production: ensureIndex() is cheap
    30. Gotchas Prototype to Production: ensureIndex() is cheap ext3 -- banished from the land
    31. Gotchas Prototype to Production: ensureIndex() is cheap ext3 -- banished from the land oplog size for replication
    32. Gotchas Prototype to Production: ensureIndex() is cheap ext3 -- banished from the land oplog size for replication {number_of_times_the_user_clicked : 1}
    33. AFTER PARTY @SLATE SPECIAL THANKS TO GILT FOR SPONSORING 54 WEST 21st STREET