Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
MongoDB VersatilityScaling the MapMyFitness Platform          Sept 14, 2012
Introductionl     MapMyFitness	  founded	  in	  2007l     Offices	  in	  Denver,	  CO	  &	  AusRn,	  T X     (w/	  associa...
Platform Overview and Background•      Origins	  in	  the	  L AMP	  stack       (Linux-­‐Apache-­‐MySQL-­‐PHP)•      Scale...
Functional Scaling•   IdenRfy	  high-­‐growth	  /	  large-­‐data	  collecRons•   Must	  be	  able	  to	  live	  outside	  ...
Use Case 1: Route Data Store•   Geo-­‐locaRon	  data	  stored	  in	  json	  blocks•   MySQL	  →	  S3	  →	  File	  Server	 ...
Route presentation example (Lost in Seattle)
Route Data Example{    id:          "e4da3b7fbbce2345d7772b06          74a318d5",    updated_date: "2005-07-23       15:47...
Solution SummaryMigraRon	  PaSern:•   RESTful	  A PI	  modified	  to	  use	  Mongo	  P HP	  driver	  •   Implemented	  a	  ...
SAN storage and MongoDBl     Needed	  to	  quickly	  expand	  available	  diskl     Implemented	  high-­‐end	  SAN	  sub...
“Gotchas”            a.k.a. Lessons Learned•   Pay	  aSenRon	  to	  potenRal	  document	  size    (URlize	  GridFS	  for	 ...
Use Case 2: Django Session Store• Django	  sessions	  not	  scaling	  in	  MySQL• Modified	  core	  methods	  to	  use	  Mo...
Use Case 3: Athletic Live Tracking• Beta	  feature	  uRlized	  T T	  +	  MySQL   (did	  not	  scale	  for	  large	  events...
Use Case 3: Athletic Live Tracking•   RS	  Cloud,	  3+n	  MongoDB	  replica	  set	  •   Quickly	  scalable	  via	  MongoDB...
As a DBA: Ease of Administration•     ReplicaRon	  made	  elegant      (as	  compared	  with	  MySQL)•     Ridiculously	  ...
Use Case 4: Micro-Messaging Framework  •   IniRal	  use	  case	  providing	  micro-­‐goals	        (user-­‐defined	  stats	...
Indexing Patterns or “Know Your App”•   Proper	  indexing	  criRcal	  to	  performance	  at	  scale•   MongoDB	  is	  ulRm...
Use Case 5: API Logging DB•   MongoDB	  is	  great	  for	  logging	      (especially	  if	  you	  log	  in	  json	  format...
Capped Collections• Used	  for	  retaining	  a	  fixed	  amount	  of	  data  (based	  on	  data	  size,	  not	  number	  of...
Monitoring MongoDB at MMF•   Monitor	  for	  real-­‐Rme	  system	  events    (Faster	  response	  Rme	  =	  less	  impact)...
Conclusion•   MongoDB	  is	  extremely	  versaRle,	  and	  can	  help	      your	  applicaRon	  scale,	  even	  if	  you	 ...
Were	  Hiring!hSp://www.mapmyfitness.com/careers  mongo-­‐sea4le@mapmyfitness.com
Upcoming SlideShare
Loading in …5
×

MongoDB Versatility: Scaling the MapMyFitness Platform

3,743 views

Published on

Chris Merz, Manager of Operations, MapMyFitness

The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploration into NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, and application access patterns, monitoring, and more.

Published in: Technology

MongoDB Versatility: Scaling the MapMyFitness Platform

  1. 1. MongoDB VersatilityScaling the MapMyFitness Platform Sept 14, 2012
  2. 2. Introductionl MapMyFitness  founded  in  2007l Offices  in  Denver,  CO  &  AusRn,  T X (w/  associates  in  S F,  Boston,  New  York,  L A,  and  Chicago)l Over  11  million  registered  usersl ~60  million  geo-­‐data  routes   (runs,  rides,  walks,  hikes,  etc)l Core  sites,  mobile  apps,  A PI,  white-­‐label (MapMyRun,  MapMyRide,  MapMyWalk,  MapMyTri,  MapMyHike,   MapMyFitness,  MapMyRace)
  3. 3. Platform Overview and Background• Origins  in  the  L AMP  stack (Linux-­‐Apache-­‐MySQL-­‐PHP)• Scaled  well  to  ~2  million  users• Redesigned  in  Python/Django• MySQL  backend  not  sufficient“How  to  scale  from  2.5  to  6  million  users?”
  4. 4. Functional Scaling• IdenRfy  high-­‐growth  /  large-­‐data  collecRons• Must  be  able  to  live  outside  the  exisRng   relaRonal  schema• Integrate  via  remote  resource  mapping  tables   in  the  R DBMS• FuncRonal  Scaling  can  facilitate  movement   towards  a  Service  Oriented  Architecture
  5. 5. Use Case 1: Route Data Store• Geo-­‐locaRon  data  stored  in  json  blocks• MySQL  →  S3  →  File  Server  →  MongoDB• IniRal  size  of  ~500GB,  ~18  million  objects• 3  member  replica  set• Dedicated  iron  servers  with  24GB  R AM
  6. 6. Route presentation example (Lost in Seattle)
  7. 7. Route Data Example{ id: "e4da3b7fbbce2345d7772b06 74a318d5", updated_date: "2005-07-23 15:47:31", city: "San Diego", user_id: "4", created_date: "2005-07-23 15:47:31", route_name: "balboa park", state: "CA",
  8. 8. Solution SummaryMigraRon  PaSern:• RESTful  A PI  modified  to  use  Mongo  P HP  driver  • Implemented  a  pass  thru  migraRon  funcRon• Batch  backfill  migraRons  via  pass-­‐thru• Data  transform  handled  in  P HP  code
  9. 9. SAN storage and MongoDBl Needed  to  quickly  expand  available  diskl Implemented  high-­‐end  SAN  subsysteml Impressive  i/o  performance  with  MongoDBl MigraRon  to  SAN  painless  thanks  to  OpLogl Easily  expandable  due  to  the  use  of  X FSl Over  100  million  objects,  ~7TB  of  data  
  10. 10. “Gotchas” a.k.a. Lessons Learned• Pay  aSenRon  to  potenRal  document  size (URlize  GridFS  for  larger  objects)• Allocate  enough  R AM  for  indexes!   (Especially  important  for  Large  data   collecRons)• File  dump  backups  may  not  scale  for  T B+   size  datasets. (URlize  delayed  and  hidden  member  for   DR)• Evaluate  filesystem  choice  carefully   (hint:  xfs)
  11. 11. Use Case 2: Django Session Store• Django  sessions  not  scaling  in  MySQL• Modified  core  methods  to  use  MongoDB• Cutover  of  new  data   (Test  for  Mongo  data,  fallback  to  MySQL)• MigraRon  of  data  via  export/import (Simple  python  transform  script  using  pymongo)
  12. 12. Use Case 3: Athletic Live Tracking• Beta  feature  uRlized  T T  +  MySQL (did  not  scale  for  large  events)• Required  to  be  “burstable”  for  Live  Events (deployable  in  The  Cloud)• Data  size  relaRvely  small   (compared  to  Routes  D B)• “Live”  data,  no  archiving  required  
  13. 13. Use Case 3: Athletic Live Tracking• RS  Cloud,  3+n  MongoDB  replica  set  • Quickly  scalable  via  MongoDB  replicaRon• Highly  opRmized,  indexes  for  every  query• Low  administraRon  overhead  (vs  MySQL) “Gotchas” l Know  your  applicaRon   (tune  indexes  and  find()  ops  accordingly) l Know  your  driver (python  pooling  driver  defaults  way  too  
  14. 14. As a DBA: Ease of Administration• ReplicaRon  made  elegant (as  compared  with  MySQL)• Ridiculously  simple  to  add  addl  members• Be  sure  to  run  IniRalSync  from  a  secondary rs.add(  “host”  :  “livetrack_db09”,  “iniRalSync”  :   {  “state”  :  2  }  )
  15. 15. Use Case 4: Micro-Messaging Framework • IniRal  use  case  providing  micro-­‐goals   (user-­‐defined  stats  aggregaRon) • MongoDB  for  persistence  of  aggregates • Python  server  +  RabbitMQ  (AMQP) • Implemented  between  Django  and  MySQL (service  subscribes  to  interesRng  stats) • Horizontally  scalable  into  the  cloud,  with  base   capacity  on  dedicated  iron • Messaging  system  expanded  to  handle  real-­‐Rme   course  analysis  and  push  noRficaRons  
  16. 16. Indexing Patterns or “Know Your App”• Proper  indexing  criRcal  to  performance  at  scale• MongoDB  is  ulRmately  flexible,  being  schemaless (mongo  gives  you  enough  rope  to  hang  yourself)• Avoid  un-­‐indexed  queries  at  all  costs (no.    really.    quickest  way  to  crater  your  app)• Onus  on  DevOps  to  match  applicaRon  to  indexes (know  your  query  profile,  never  assume)• Shoot  for  covered  queries  wherever  possible (answer  can  be  obtained  from  indexes  only)
  17. 17. Use Case 5: API Logging DB• MongoDB  is  great  for  logging   (especially  if  you  log  in  json  format!)• Good  applicaRon  for  capped  collecRons (cap  by  data  size,  or  T TL)• Running  with  safe  mode  off  for  speed (fire-­‐n-­‐forget  logging  can  reduce  latency)• Cloud  servers  are  a  good  fit  for  logging  apps
  18. 18. Capped Collections• Used  for  retaining  a  fixed  amount  of  data (based  on  data  size,  not  number  of  rows)• URlizes  F IFO  method  for  pruning  collecRon (Especially  useful  for  data  that  devalues  with  age)• TTL  CollecRons  (2.2)  age  out  data  based  on  a   retenRon  date  limit  (useful  for  a  variety  of  data   types)Gotcha! Explicitly  create  the  capped  collecRon before  any  data  is  put  into  the  system   to  avoid  auto-­‐creaRon  of  collecRon
  19. 19. Monitoring MongoDB at MMF• Monitor  for  real-­‐Rme  system  events (Faster  response  Rme  =  less  impact)• Track  historical  performance  data  trends (Useful  for  predicRve  failure  analysis  and  scaling  need   projecRons)• MMS  –  MongoDB  Monitoring  Service    (Now  our  default  visual  metrics  system)• Zabbix  open  source  monitoring  • Makoomi  Zabbix  plugins  for  MongoDB• Mongostat  –  realRme  troubleshooRng  godsend  
  20. 20. Conclusion• MongoDB  is  extremely  versaRle,  and  can  help   your  applicaRon  scale,  even  if  you  dont  design   your  app  with  MongoDB  from  the  start.• MongoDB  fits  well  into  both  dedicated  and  virtual   architecture  environments.• Low  maintenance  overhead  compared  to   tradiRonal  R DMBS.• Provides  the  horizontal  scaling  path  required  for   Internet  Sized  applicaRons.
  21. 21. Were  Hiring!hSp://www.mapmyfitness.com/careers mongo-­‐sea4le@mapmyfitness.com

×