Modeling for Performance
 


Mongo Boulder talk by Michael Dwan

    Presentation Transcript

    • Data Modeling for Performance · Mongo Boulder · Michael Dwan · January 21, 2010 · Snapjoy
    • i’m michael dwan @michaeldwan on the twitter
    • the project: Company X
    • application spec
      • find business details (web + api)
      • search by category/keyword + geo (web + api)
      • update (api)
    • why is this interesting?
      • 100,000 / 30,000 / 100,000,000: geo areas / tags / partners
      • 2,300 / 15,000,000: categories / businesses
      • 2,000,000 requests daily
      • 24,000,000 urls in sitemap
    • infrequent changes; monthly updates w/ 12M monthly changes; “zero downtime” updates
    • the problem: mo’ data, mo’ problems
    • complexity
    • providers, mappings, phone_numbers, zips, assets, businesses_phone_numbers, cities, categorizations, businesses, states, categories, businesses_neighborhoods, taggings, users, tags, neighborhoods
    • architecture
    • read performance
    • downtime: solr
    • solr getting fussy
    • downtime: migrations
    • the solution
    • > gem install acts_as_web_scale
    • a business...
        {
          "_id" : ObjectId("4ce838ef4a882579960001b9"),
          "name" : "Acme Glass Co",
          "tagline" : "Your trusty glass hole",
          "description" : "Glass repair...",
          "hours" : "Mon-Fri 8-5",
          "url" : "http://acmeglasshole.biz"
        }
    • a business... has many phone numbers
        {
          ...
          "url" : "http://acmeglasshole.biz",
          "phone_numbers" : [ "5035550091", "8005555456" ]
        }
    • a business... has coordinates
        {
          ...
          "phone_numbers" : [ "5035550091", "8005555456" ],
          "coordinates" : [ 45.559294, -122.644053 ]
        }
    • a business... has many tags
        {
          ...
          "coordinates" : [ 45.559294, -122.644053 ],
          "tags" : [ "glass", "mirrors", "flat glass" ]
        }
    • a business... has an address
        {
          ...
          "tags" : [ "glass", "mirrors", "flat glass" ],
          "location" : {
            "street_address" : "2035 NE Alberta St"
          }
        }
    • belongs to?
    • a state
        {
          "_id" : ObjectId("4ce82937961552247900000f"),
          "name" : "Illinois",
          "slug" : "il",
          ...
        }
    • a business... belongs to a state
        {
          ...
          "location" : {
            "street_address" : "2035 NE Alberta St",
            "state" : {
              "_id" : ObjectId("4ce829379615522479000026"),
              "meta" : { "slug" : "or" },
              "display_name" : "Oregon"
            }
          }
        }
    • a business... belongs to a city
        {
          ...
          "location" : {
            ...
            "state" : { ... },
            "city" : {
              "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
              "meta" : { "slug" : "portland" },
              "display_name" : "Portland, OR"
            }
          }
        }
    • a business... belongs to a zip code
        {
          ...
          "location" : {
            ...
            "city" : { ... },
            "zip" : {
              "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
              "display_name" : "97211"
            }
          }
        }
    • many-to-many?
    • a category
        {
          "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
          "name" : "Auto Glass",
          "slug" : "3063-auto-glass",
          "tags" : [ "windshields" ],
          ...
        }
    • a business... belongs to many categories
        {
          ...
          "categories" : [
            {
              "_id" : ObjectId("4ce82e50d3dfaa16360004f2"),
              "meta" : { "slug" : "282-glass", "tags" : [ "windows" ] },
              "display_name" : "Glass"
            },
            {
              "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
              "meta" : { "slug" : "3063-auto-glass", "tags" : [ "windshields" ] },
              "display_name" : "Auto Glass"
            }
          ]
        }
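The state, city, zip and category slides all embed a small copy of the parent (`_id`, `meta.slug`, `display_name`) inside the business instead of storing only a foreign key. A minimal sketch of that pattern follows. The helper name `embedRef` is hypothetical, plain strings stand in for `ObjectId` values, and the Oregon state document is assumed by analogy with the Illinois one on the slide.

```javascript
// Plain strings stand in for ObjectId values so this runs without a driver.
// embedRef is a hypothetical helper name; the slides do not show one.
function embedRef(parent, displayName) {
  const meta = { slug: parent.slug };
  if (Array.isArray(parent.tags)) {
    meta.tags = parent.tags.slice(); // categories carry their tags along
  }
  return {
    _id: parent._id,
    meta: meta,
    display_name: displayName === undefined ? parent.name : displayName,
  };
}

// Parent documents, shaped like the state and category slides.
// The Oregon state document is assumed by analogy with the Illinois one.
const oregon = { _id: "4ce829379615522479000026", name: "Oregon", slug: "or" };
const autoGlass = {
  _id: "4ce82e64d3dfaa16360014eb",
  name: "Auto Glass",
  slug: "3063-auto-glass",
  tags: ["windshields"],
};

// The business carries everything a page render needs; no joins at read time.
const business = {
  name: "Acme Glass Co",
  location: {
    street_address: "2035 NE Alberta St",
    state: embedRef(oregon),
  },
  categories: [embedRef(autoGlass)],
};

console.log(JSON.stringify(business, null, 2));
```

The likely trade-off: reads need no second lookup, but renaming a state or category means touching every business that embeds it, which probably suits a data set with infrequent changes.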
    • queries & indexes: know what you want
    • #1 find a business: I want *that* one
    • find a business
        // single business
        db.businesses.findOne({
          _id: ObjectId("4ce838ef4a882579960001b9")
        })
    • #2 find by location: Businesses in San Francisco, CA
    • find businesses by state/city/zip
        // find all within state
        db.businesses.find({
          "location.state._id": ObjectId("4ce82937961552247900000f")
        })
        // find all within city
        db.businesses.find({
          "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
        })
        // find all within zip
        db.businesses.find({
          "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0")
        })
    • indexes
        // the indexes
        db.businesses.ensureIndex({"location.city._id": 1})
        db.businesses.ensureIndex({"location.zip._id": 1})
      • 1.5GB each
      • skip “location.state._id” -- only 51 possibilities
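One way to read the "only 51 possibilities" remark is as back-of-the-envelope arithmetic on how many documents sit behind each distinct key. The sketch below uses the 15,000,000 businesses and 35,000 cities quoted elsewhere in the deck and assumes an even spread, which real data will not have.

```javascript
// Rough average number of documents behind each distinct index key,
// assuming (unrealistically) that businesses are spread evenly across keys.
function avgDocsPerKey(totalDocs, distinctKeys) {
  return Math.round(totalDocs / distinctKeys);
}

const totalBusinesses = 15000000; // figure quoted in the deck

const perState = avgDocsPerKey(totalBusinesses, 51);   // 51 possible states
const perCity = avgDocsPerKey(totalBusinesses, 35000); // 35,000 cities

console.log({ perState: perState, perCity: perCity });
```

Under that assumption a state key still leaves roughly 294,000 documents to walk, while a city key leaves a few hundred, which may be why the state index was not judged worth another 1.5GB.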
    • #3 find by category: Businesses in the Auto Repair category
    • businesses by category
        // find by category id
        db.businesses.find({
          "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
        })
        // the index
        db.businesses.ensureIndex({ "categories._id": 1 })
    • #4 - find by category + location: Businesses in the Plumbing category in Chicago, IL
    • businesses by category + city
        // find by city id and category id
        db.businesses.find({
          "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
          "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
        })
    • which index should we use?
        // city id
        {"location.city._id": 1}
        ~ or ~
        // category id
        {"categories._id": 1}
      • answer: both suck, we need a compound index
    • which order?
        db.businesses.ensureIndex({ "location.city._id" : 1, "categories._id" : 1 })
        ~ or ~
        db.businesses.ensureIndex({ "categories._id" : 1, "location.city._id" : 1 })
      • 35,000 cities & 2,500 categories
      • answer: cities → categories
      • create one for zip codes and categories too!
    • don’t we have 2 indexes on city id?
        {"location.city._id" : 1}
        {"location.city._id" : 1, "categories._id" : 1}
      • answer: yes. the compound index starts with city id, so it also serves city-only queries; drop the single-field one
        db.businesses.dropIndex("location.city._id_1")
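Why the city-first compound index can stand in for the single-field city index is easier to see with a toy model: a compound index behaves like a list of entries sorted by city, then by category, so all entries for one city are contiguous. The sketch below is an in-memory imitation, not MongoDB's actual B-tree, and the short ids and city names are made up for illustration.

```javascript
// Toy compound index on (city, category). Short strings stand in for ObjectIds.
function compareKeys(aCity, aCat, bCity, bCat) {
  if (aCity !== bCity) return aCity < bCity ? -1 : 1;
  if (aCat !== bCat) return aCat < bCat ? -1 : 1;
  return 0;
}

function buildIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    // An array field yields one index entry per element.
    for (const category of doc.categories) {
      entries.push({ city: doc.city, category: category, id: doc.id });
    }
  }
  entries.sort((a, b) => compareKeys(a.city, a.category, b.city, b.category));
  return entries;
}

// Binary search: position of the first entry whose key is >= (city, category).
function lowerBound(entries, city, category) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const e = entries[mid];
    if (compareKeys(e.city, e.category, city, category) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Leave category out to run a prefix query on city alone.
function scan(entries, city, category) {
  const start = lowerBound(entries, city, category === undefined ? "" : category);
  const ids = new Set(); // a business with several categories appears once
  for (let i = start; i < entries.length; i++) {
    const e = entries[i];
    if (e.city !== city) break;
    if (category !== undefined && e.category !== category) break;
    ids.add(e.id);
  }
  return Array.from(ids);
}

const index = buildIndex([
  { id: "b1", city: "chicago", categories: ["plumbing", "glass"] },
  { id: "b2", city: "chicago", categories: ["pizza"] },
  { id: "b3", city: "portland", categories: ["glass"] },
  { id: "b4", city: "boulder", categories: ["plumbing"] },
]);

console.log(scan(index, "chicago", "plumbing")); // city + category
console.log(scan(index, "chicago"));             // city only, same index
```

A category-only lookup gets no such help from this ordering, since entries for one category are scattered across every city; that is presumably why the separate `categories._id` index stays.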
    • #5 - find by keyword: “something awesome” in Boulder, CO
    • find businesses in city by keyword
        {
          "_id" : ObjectId("4ce838ef4a882579960001b9"),
          "name" : "Acme Glass Co",
          "keywords" : [ "glass", "repair", "acme", ... ]
        }
        db.businesses.ensureIndex({ "location.city._id": 1, "keywords": 1 })
        db.businesses.find({
          "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
          "keywords": /glass/i
        })
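The slide shows a `keywords` array but not how it is filled. A minimal sketch of one way to derive it is below; which fields feed the list and how text is split are assumptions, and `extractKeywords` and `hasKeyword` are hypothetical names.

```javascript
// Derive a lowercase, de-duplicated keyword list from a few text fields.
// The choice of fields and the splitting rule are assumptions, not the talk's.
function extractKeywords(doc) {
  const parts = [doc.name || "", doc.description || ""].concat(doc.tags || []);
  const words = parts
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9]+/)           // split on anything that is not a letter or digit
    .filter((w) => w.length > 1);  // drop empties and single characters
  return Array.from(new Set(words));
}

// Exact match against the stored lowercase keywords.
function hasKeyword(doc, term) {
  return doc.keywords.includes(term.toLowerCase());
}

const acme = {
  name: "Acme Glass Co",
  description: "Glass repair...",
  tags: ["glass", "mirrors", "flat glass"],
};
acme.keywords = extractKeywords(acme);

console.log(acme.keywords);
console.log(hasKeyword(acme, "Glass"));
```

Storing keywords already lowercased means a search term can be lowercased and matched exactly. This matters because a case-insensitive, unanchored pattern such as `/glass/i` generally cannot use an index efficiently in MongoDB, even when the field is indexed.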
    • chat with Kyle Banker
        me: we’re switching from postgres+solr to mongo
        kyle: oh wow, you can replace solr with mongo?
        me: with some creativity
        kyle: seems like it’d still be hard to get just right
        me: it works well
        kyle: gotcha
    • i was wrong, kyle was right
    • I’ll never leave you again... until MongoDB supports full text later this year :)
    • aggregation: map/reduce to the rescue
    • sitemaps: big list of every url
    • sitemaps
      • xml files containing each unique url (~24M)
      • 50,000 urls per file, about 500 files
      • urls are generated from live data
      • http://companyx.com/sitemaps/1.xml
    • partition by consistent hash
        >> "hello!".hash % 6
        #=> 5
        >> "/ny/new-york/c/apartments".hash % 6
        #=> 5
      • returns an integer from 0 up to, but not including, the number specified
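The same hash-then-modulo idea can be sketched in JavaScript. FNV-1a stands in for Ruby's `String#hash` here only because it is short and gives the same answer in any runtime; the talk does not say which hash ran in production, and `partitionFor` is a hypothetical name.

```javascript
// FNV-1a, 32-bit. Works on UTF-16 code units, which equals bytes for ASCII urls.
// This is a stand-in for Ruby's String#hash, not the hash used in the talk.
function fnv1a(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return h >>> 0;
}

// Maps a url to a partition number in the range 0 .. partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

const urls = [
  "/il/chicago/c/pizza",
  "/ny/new-york/c/apartments",
  "/12347500-allstate-auto-insurance",
];
for (const url of urls) {
  console.log(url, partitionFor(url, 6));
}
```

Because the partition depends only on the url text, the same url lands in the same sitemap file on every rebuild as long as the number of partitions stays fixed.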
    • map/reduce
      1. map each url in the site to a partition
      2. reduce each partition to a single document containing all urls in that partition
      3. save to a permanent collection
    • map (url → partition, partitions 1 to 6)
        /il/chicago/c/pizza                        4
        /ny/new-york/c/apartments                  1
        /nd/rugby/c/apartments                     6
        /14076500-bayside-marina                   2
        /13401000-comtrak-logistics-inc            3
        /12347500-allstate-auto-insurance          1
        /il/downers-grove/c/computer-web-design    6
        /1009500-heidelberg-lodges                 5
        /mn/redwood-falls/c/food-service           4
        /14077000-bank-of-america                  5
        /mn/savage/c/audio-visual-equipment        1
        ...
    • reduce
        {
          "total" : 2,
          "urls" : [
            "/12347500-allstate-auto-insurance",
            "/ny/new-york/c/apartments"
          ]
        }
        {
          "total" : 1,
          "urls" : [
            "/mn/savage/c/audio-visual-equipment"
          ]
        }
      • reduces to
        {
          "_id" : 1,
          "value" : {
            "total" : 3,
            "urls" : [
              "/12347500-allstate-auto-insurance",
              "/mn/savage/c/audio-visual-equipment",
              "/ny/new-york/c/apartments"
            ]
          }
        }
    • usage
        db.sitemaps.findOne({_id:1}).value.urls
        [
          "/12347500-allstate-auto-insurance",
          "/mn/savage/c/audio-visual-equipment",
          "/ny/new-york/c/apartments"
        ]
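The map, reduce and save steps can be imitated in memory to show the shape of the data at each stage. This is a sketch in plain JavaScript with no MongoDB involved; the function names are hypothetical, and a fixed lookup table taken from the map slide replaces the hash so the result is easy to follow.

```javascript
// Step 1: map a url to (partition, value). The value has the same shape the
// reduce step returns, so partial results can be reduced again.
function mapUrl(url, partitionOf) {
  return { key: partitionOf(url), value: { total: 1, urls: [url] } };
}

// Step 2: merge any number of values for one partition into one value.
function reduceValues(values) {
  const out = { total: 0, urls: [] };
  for (const v of values) {
    out.total += v.total;
    out.urls.push(...v.urls);
  }
  out.urls.sort();
  return out;
}

// Step 3: one document per partition, shaped like the sitemaps collection.
function buildSitemaps(urls, partitionOf) {
  const groups = new Map();
  for (const url of urls) {
    const emitted = mapUrl(url, partitionOf);
    if (!groups.has(emitted.key)) groups.set(emitted.key, []);
    groups.get(emitted.key).push(emitted.value);
  }
  const collection = [];
  for (const [key, values] of groups) {
    collection.push({ _id: key, value: reduceValues(values) });
  }
  return collection;
}

// Partition assignments copied from the map slide (a lookup instead of a hash).
const slidePartitions = {
  "/il/chicago/c/pizza": 4,
  "/ny/new-york/c/apartments": 1,
  "/12347500-allstate-auto-insurance": 1,
  "/mn/savage/c/audio-visual-equipment": 1,
};

const sitemaps = buildSitemaps(
  Object.keys(slidePartitions),
  (url) => slidePartitions[url]
);
const first = sitemaps.find((doc) => doc._id === 1);
console.log(JSON.stringify(first, null, 2));
```

Keeping the emitted value and the reduced value in the same shape matters in real MongoDB map/reduce too, because the server may call the reduce function more than once for the same key and feed earlier results back in.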
    • wrap up
    • 2 months later: 115ms average response times
    • thank you @michaeldwan