Data Modeling for Performance

My talk for Mongo Boulder on data modeling.


  1. 1. Data Modeling for Performance | Mongo Boulder | Michael Dwan | January 21, 2010 | Snapjoy
  2. 2. i’m michael dwan @michaeldwan on the twitter
  3. 3. the project Company X
  4. 4. • find business details (web + api)
        • search by category/keyword + geo (web + api)
        • update (api)
        application spec
  5. 5. 100,000 geo areas
        30,000 tags
        100,000,000 partners
        2,300 categories
        15,000,000 businesses
        2,000,000 requests daily
        24,000,000 urls in sitemap
        why is this interesting?
  6. 6. • infrequent changes
        • monthly updates w/ 12M monthly changes
        • “zero downtime” updates
  7. 7. the problem mo’ data, mo’ problems
  8. 8. complexity
  9. 9. providers, mappings, phone_numbers, zips, assets, businesses_phone_numbers, cities, categorizations, businesses, states, categories, businesses_neighborhoods, taggings, users, tags, neighborhoods
  10. 10. xxx x architecture
  11. 11. read performance
  12. 12. downtime solr
  13. 13. solr getting fussy
  14. 14. downtime migrations
  15. 15. the solution
  16. 16. > gem install acts_as_web_scale
  17. 17. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz",} a business...
  18. 18. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz",} a business... has many phone numbers
  19. 19. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ]} a business... has many phone numbers
  20. 20. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ]} a business... has coordinates
  21. 21. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ]} a business... has coordinates
  22. 22. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ]} a business... has many tags
  23. 23. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ]} a business... has many tags
  24. 24. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ]} a business... has an address
  25. 25. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" }} a business... has an address
  26. 26. belongs to?
  27. 27. { "_id" : ObjectId("4ce82937961552247900000f"), "name" : "Illinois", "slug" : "il", ...} a state
  28. 28. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" }} a business... belongs to a state
  29. 29. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" }} a business... belongs to a state
  30. 30. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } }} a business... belongs to a state
  31. 31. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } }} a business... belongs to a city
  32. 32. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, }} a business... belongs to a city
  33. 33. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, }} a business... belongs to a zip code
  34. 34. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } }} a business... belongs to a zip code
  35. 35. many-to-many?
  36. 36. { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "name" : "Auto Glass", "slug" : "3063-auto-glass", "tags" : [ "windshields" ], ...} a category
  37. 37. "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } }} a business... belongs to a zip code
  38. 38. } }} a business... belongs to many categories
  39. 39. } }, "categories" : [ { "_id" : ObjectId("4ce82e50d3dfaa16360004f2"), "meta" : { "slug" : "282-glass", "tags" : [ "windows" ], }, "display_name" : "Glass" }, { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "meta" : { "slug" : "3063-auto-glass", "tags" : [ "windshields" ], }, "display_name" : "Auto Glass" } ]} a business... belongs to many categories
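
The slides above build one business document by embedding a trimmed copy of each related record (state, city, categories) instead of storing a foreign key. A minimal sketch of that denormalization step in plain JavaScript follows. The helper names `summarize` and `buildBusiness` are hypothetical, strings stand in for ObjectIds, and the "City, ST" display format is inferred from the "Portland, OR" value on the slide.

```javascript
// Hypothetical helpers, not from the talk: they only illustrate the
// embedded-summary shape shown on the slides.

// Keep just the fields the slides embed: _id, meta (slug, optional tags), display_name.
function summarize(record, displayName) {
  const meta = { slug: record.slug };
  if (record.tags) meta.tags = record.tags;
  return { _id: record._id, meta, display_name: displayName || record.name };
}

// Copy the summaries into the business document so one read returns everything.
function buildBusiness(business, state, city, categories) {
  return {
    ...business,
    location: {
      ...business.location,
      state: summarize(state),
      city: summarize(city, `${city.name}, ${state.slug.toUpperCase()}`),
    },
    categories: categories.map((c) => summarize(c)),
  };
}

const state = { _id: "4ce829379615522479000026", name: "Oregon", slug: "or" };
const city = { _id: "4ce82abdd3dfaa10f8006faa", name: "Portland", slug: "portland" };
const category = {
  _id: "4ce82e64d3dfaa16360014eb",
  name: "Auto Glass",
  slug: "3063-auto-glass",
  tags: ["windshields"],
};

const doc = buildBusiness(
  { name: "Acme Glass Co", location: { street_address: "2035 NE Alberta St" } },
  state,
  city,
  [category]
);
console.log(JSON.stringify(doc, null, 2));
```

The cost of this shape is that renaming a state or category means rewriting every business that embeds it, which fits a data set with infrequent changes and bulk monthly updates.
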
  40. 40. queries & indexes know what you want
  41. 41. #1 find a business I want *that* one
  42. 42. // single business
          db.businesses.findOne({ _id: ObjectId("4ce838ef4a882579960001b9") })
          find a business
  43. 43. #2 find by location Businesses in San Francisco, CA
  44. 44. // find all within state
          db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
          find businesses by state/city/zip
  45. 45. // find all within state
          db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
          // find all within city
          db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") })
          find businesses by state/city/zip
  46. 46. // find all within state
          db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
          // find all within city
          db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") })
          // find all within zip
          db.businesses.find({ "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0") })
          find businesses by state/city/zip
  47. 47. // the indexes
          db.businesses.ensureIndex({ "location.city._id": 1 })
          db.businesses.ensureIndex({ "location.zip._id": 1 })
          1.5GB each
          skip “location.state._id” -- only 51 possibilities
          indexes
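
To put numbers on why the state index was skipped, here is a small worked calculation. It uses the 15,000,000 businesses and 51 states quoted in the slides, plus the 35,000 cities figure from the later compound-index slide. The assumption that businesses are spread evenly across states and cities is mine and is only meant to show the order of magnitude.

```javascript
// Figures quoted in the slides; the even spread across values is an assumption.
const businesses = 15000000;
const states = 51;    // "only 51 possibilities"
const cities = 35000; // from the compound-index slide

// Average number of documents a single index value would point at.
const perState = Math.round(businesses / states);
const perCity = Math.round(businesses / cities);

console.log(`per state: ${perState}, per city: ${perCity}`);
```

Under that assumption a state lookup still has to walk roughly 294,000 documents, while a city lookup narrows to about 430, so the state index would cost a similar amount of space for far less filtering.
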
  48. 48. #3 find by category Businesses in the Auto Repair category
  49. 49. // find by category id
          db.businesses.find({ "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") })
          // the index
          db.businesses.ensureIndex({ "categories._id": 1 })
          businesses by category
  50. 50. #4 - find by category + location Businesses in the Plumbing category in Chicago, IL
  51. 51. // find by city id and category id
          db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"), "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") })
          businesses by category + city
  52. 52. // city id
          { "location.city._id": 1 }
          ~ or ~
          // category id
          { "categories._id": 1 }
          answer: both suck, we need a compound index
          which index should we use?
  53. 53. db.businesses.ensureIndex({ "location.city._id": 1, "categories._id": 1 })
          ~ or ~
          db.businesses.ensureIndex({ "categories._id": 1, "location.city._id": 1 })
          35,000 cities & 2,500 categories
          answer: cities → categories
          create one for zip codes and categories too!
          which order?
  54. 54. { "location.city._id": 1 }
          { "location.city._id": 1, "categories._id": 1 }
          answer: yes
          db.businesses.dropIndex("location.city._id_1")
          don’t we have 2 indexes on city id?
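
The last two slides rest on one property of compound indexes: entries are sorted by the first field, then the second, so any query on a leading prefix of the fields maps to one contiguous range. The sketch below simulates that with a sorted array in plain JavaScript. It is not how MongoDB stores indexes internally, and the city and category names and the helper functions are made up for illustration.

```javascript
// Compare two keys field by field, over the fields both keys have.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Simulated { city: 1, category: 1 } index: [city, category, businessId] keys
// kept in sorted order (sample data is hypothetical).
const entries = [
  ["chicago", "pizza", 1],
  ["chicago", "plumbing", 2],
  ["boulder", "plumbing", 3],
  ["chicago", "plumbing", 4],
  ["boulder", "glass", 5],
].sort(compareKeys);

// Positions of the entries that satisfy a predicate.
function positionsWhere(predicate) {
  return entries.map((key, i) => (predicate(key) ? i : -1)).filter((i) => i >= 0);
}

// True when the positions form one unbroken range, i.e. a single index scan.
function isContiguous(positions) {
  return positions.every((p, i) => i === 0 || p === positions[i - 1] + 1);
}

const byCity = positionsWhere((k) => k[0] === "chicago");
const byCityAndCategory = positionsWhere((k) => k[0] === "chicago" && k[1] === "plumbing");
const byCategoryOnly = positionsWhere((k) => k[1] === "plumbing");

console.log(isContiguous(byCity));            // city alone: one range
console.log(isContiguous(byCityAndCategory)); // city + category: one range
console.log(isContiguous(byCategoryOnly));    // category alone: scattered
```

City-only and city-plus-category lookups each land on a single range, which is why the standalone city index becomes redundant once the compound index exists. A category-only lookup is scattered across the index, so the separate `categories._id` index is still needed.
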
  55. 55. #5 - find by keyword “something awesome” in Boulder, CO
  56. 56. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "keywords" : [ "glass", "repair", "acme", ... ] }
          db.businesses.ensureIndex({ "location.city._id": 1, "keywords": 1 })
          db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"), "keywords": /glass/i })
          find businesses in city by keyword
  57. 57. me: we’re switching from postgres+solr to mongo
          kyle: oh wow, you can replace solr with mongo?
          me: with some creativity
          kyle: seems like it’d still be hard to get just right
          me: it works well
          kyle: gotcha
          chat with Kyle Banker
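
The slide does not say how the `keywords` array was produced. One plausible approach, sketched here as an assumption rather than the author's method, is to tokenize the text fields when the document is written. `extractKeywords` and `matchesKeyword` are hypothetical helpers.

```javascript
// Hypothetical tokenizer: lowercases, splits on non-alphanumerics,
// drops one-character tokens, and de-duplicates.
function extractKeywords(...fields) {
  const words = fields
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 1);
  return [...new Set(words)];
}

// Exact, case-normalized membership test against the stored keywords.
function matchesKeyword(doc, term) {
  return doc.keywords.includes(term.toLowerCase());
}

const business = {
  name: "Acme Glass Co",
  keywords: extractKeywords("Acme Glass Co", "Glass repair..."),
};

console.log(business.keywords);
console.log(matchesKeyword(business, "GLASS"));
```

Because the stored keywords are already lowercased, an exact match on a lowercased term is the kind of query an index on `keywords` can serve directly. A case-insensitive, unanchored regular expression such as `/glass/i` generally cannot use the index as efficiently, which may be part of why this approach was hard to get right.
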
  58. 58. i was wrong, kyle was right
  59. 59. I’ll never leave you again... until MongoDB supports full text later this year :)
  60. 60. aggregation map/reduce to the rescue
  61. 61. sitemaps big list of every url
  62. 62. • xml files containing each unique url ~ 24M
          • 50,000 urls per file, about 500 files
          • urls are generated from live data
          • http://companyx.com/sitemaps/1.xml
          sitemaps
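
The file count follows from the figures on the slide. The short calculation below makes the arithmetic explicit. The 50,000 figure matches the per-file URL limit in the sitemap protocol, and `sitemapUrl` is a hypothetical helper built from the URL pattern shown above.

```javascript
const totalUrls = 24000000;   // ~24M unique urls
const maxUrlsPerFile = 50000; // urls per file, from the slide

// Fewest files that could hold every url if each file were completely full.
const minFiles = Math.ceil(totalUrls / maxUrlsPerFile);

// Average urls per file when rounding up to 500 files.
const avgPerFileAt500 = totalUrls / 500;

// Hypothetical helper following the slide's URL pattern.
function sitemapUrl(n) {
  return `http://companyx.com/sitemaps/${n}.xml`;
}

console.log(minFiles, avgPerFileAt500, sitemapUrl(1));
```

Perfectly full files would need 480 of them. Rounding up to about 500 leaves an average of 48,000 urls per file, which gives some headroom when urls are not distributed perfectly evenly.
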
  63. 63. >> "hello!".hash % 6 #=> 5
          >> "/ny/new-york/c/apartments".hash % 6 #=> 5
          returns an integer from 0 up to (but not including) the number specified
          partition by consistent hash
  64. 64. 1. map each url in the site to a partition
          2. reduce each partition to a single document containing all urls in that partition
          3. save to a permanent collection
          map/reduce
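
For the partitioning to be repeatable, the hash has to return the same value for the same url on every run and every machine. Ruby's `String#hash` is not guaranteed to be stable across invocations, so a fixed algorithm is a safer choice. The sketch below uses 32-bit FNV-1a, which is a substitution of mine and not the hash used in the talk. `partitionFor` is a hypothetical helper.

```javascript
// 32-bit FNV-1a over UTF-16 code units (fine for ASCII url paths).
function fnv1a(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return h >>> 0;
}

// Map a url to a partition in the range 0 .. partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

const partition = partitionFor("/ny/new-york/c/apartments", 6);
console.log(partition);
```

The same url always lands in the same partition, so a url stays in the same sitemap file from one rebuild to the next as long as the partition count does not change.
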
  65. 65. url → partition
          /il/chicago/c/pizza → 4
          /ny/new-york/c/apartments → 1
          /nd/rugby/c/apartments → 6
          /14076500-bayside-marina → 2
          /13401000-comtrak-logistics-inc → 3
          /12347500-allstate-auto-insurance → 1
          /il/downers-grove/c/computer-web-design → 6
          /1009500-heidelberg-lodges → 5
          /mn/redwood-falls/c/food-service → 4
          /14077000-bank-of-america → 5
          /mn/savage/c/audio-visual-equipment → 1
          ...
          partitions: 1 2 3 4 5 6
          map
  66. 66. { "total" : 2, "urls" : [ "/12347500-allstate-auto-insurance", "/ny/new-york/c/apartments" ] }
          { "total" : 1, "urls" : [ "/mn/savage/c/audio-visual-equipment" ] }
          { "_id" : 1, "value" : { "total" : 3, "urls" : [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] } }
          reduce
  67. 67. db.sitemaps.findOne({_id:1}).value.urls
          [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ]
          usage
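
The map and reduce steps can be sketched in plain JavaScript without a database. This is an in-memory illustration of the idea, not the MongoDB `mapReduce` call from the talk. The partition numbers are copied from the map slide rather than recomputed, and the `map` and `reduce` functions here are my own stand-ins.

```javascript
// url -> partition pairs taken from the map slide (a subset).
const mapped = [
  ["/il/chicago/c/pizza", 4],
  ["/ny/new-york/c/apartments", 1],
  ["/12347500-allstate-auto-insurance", 1],
  ["/mn/savage/c/audio-visual-equipment", 1],
  ["/14077000-bank-of-america", 5],
];

// map: emit (partition, value). The value has the same shape reduce returns,
// so reduce can safely be applied to its own output.
function map(url, partition) {
  return { key: partition, value: { total: 1, urls: [url] } };
}

// reduce: merge any number of values that share one key.
function reduce(key, values) {
  return values.reduce(
    (acc, v) => ({ total: acc.total + v.total, urls: acc.urls.concat(v.urls) }),
    { total: 0, urls: [] }
  );
}

// Group emitted values by key, then reduce each group to one document.
const grouped = new Map();
for (const [url, partition] of mapped) {
  const { key, value } = map(url, partition);
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}
const sitemaps = [...grouped].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));

const partitionOne = sitemaps.find((d) => d._id === 1);
console.log(JSON.stringify(partitionOne));
```

Each partition ends up as one document whose `total` equals the number of urls it holds, three for partition 1 in this sample. Keeping the emitted value and the reduced value in the same shape matters because a reduce step may be run again on partial results.
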
  68. 68. wrap up
  69. 69. 115ms average response times 2 months later
  70. 70. thank you @michaeldwan
