MongoDB Versatility: Scaling the MapMyFitness Platform

Chris Merz, Manager of Operations, MapMyFitness

The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploration into NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, application access patterns, monitoring, and more.

Transcript

  • 1. MongoDB Versatility: Scaling the MapMyFitness Platform (Sept 14, 2012)
  • 2. Introduction
    • MapMyFitness founded in 2007
    • Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
    • Over 11 million registered users
    • ~60 million geo-data routes (runs, rides, walks, hikes, etc.)
    • Core sites, mobile apps, API, white-label (MapMyRun, MapMyRide, MapMyWalk, MapMyTri, MapMyHike, MapMyFitness, MapMyRace)
  • 3. Platform Overview and Background
    • Origins in the LAMP stack (Linux-Apache-MySQL-PHP)
    • Scaled well to ~2 million users
    • Redesigned in Python/Django
    • MySQL backend not sufficient
    "How to scale from 2.5 to 6 million users?"
  • 4. Functional Scaling
    • Identify high-growth / large-data collections
    • Must be able to live outside the existing relational schema
    • Integrate via remote resource mapping tables in the RDBMS
    • Functional scaling can facilitate movement towards a Service Oriented Architecture
  • 5. Use Case 1: Route Data Store
    • Geo-location data stored in JSON blocks
    • MySQL → S3 → File Server → MongoDB
    • Initial size of ~500GB, ~18 million objects
    • 3-member replica set
    • Dedicated iron servers with 24GB RAM
  • 6. Route presentation example (Lost in Seattle)
  • 7. Route Data Example
      {
        id: "e4da3b7fbbce2345d7772b0674a318d5",
        updated_date: "2005-07-23 15:47:31",
        city: "San Diego",
        user_id: "4",
        created_date: "2005-07-23 15:47:31",
        route_name: "balboa park",
        state: "CA",
        …
  • 8. Solution Summary
    Migration Pattern:
    • RESTful API modified to use Mongo PHP driver
    • Implemented a pass-thru migration function
    • Batch backfill migrations via pass-thru
    • Data transform handled in PHP code
  • 9. SAN storage and MongoDB
    • Needed to quickly expand available disk
    • Implemented high-end SAN subsystem
    • Impressive I/O performance with MongoDB
    • Migration to SAN painless thanks to oplog
    • Easily expandable due to the use of XFS
    • Over 100 million objects, ~7TB of data
  • 10. “Gotchas” a.k.a. Lessons Learned
    • Pay attention to potential document size (utilize GridFS for larger objects)
    • Allocate enough RAM for indexes! (especially important for large data collections)
    • File dump backups may not scale for TB+ size datasets (utilize a delayed and hidden member for DR)
    • Evaluate filesystem choice carefully (hint: XFS)
  • 11. Use Case 2: Django Session Store
    • Django sessions not scaling in MySQL
    • Modified core methods to use MongoDB
    • Cutover of new data (test for Mongo data, fall back to MySQL)
    • Migration of data via export/import (simple Python transform script using pymongo)
  • 12. Use Case 3: Athletic Live Tracking
    • Beta feature utilized TT + MySQL (did not scale for large events)
    • Required to be “burstable” for live events (deployable in The Cloud)
    • Data size relatively small (compared to routes DB)
    • “Live” data, no archiving required
  • 13. Use Case 3: Athletic Live Tracking
    • RS Cloud, 3+n MongoDB replica set
    • Quickly scalable via MongoDB replication
    • Highly optimized, indexes for every query
    • Low administration overhead (vs MySQL)
    “Gotchas”:
    • Know your application (tune indexes and find() ops accordingly)
    • Know your driver (Python pooling driver defaults way too …
  • 14. As a DBA: Ease of Administration
    • Replication made elegant (as compared with MySQL)
    • Ridiculously simple to add additional members
    • Be sure to run initialSync from a secondary:
        rs.add({ "host" : "livetrack_db09", "initialSync" : { "state" : 2 } })
  • 15. Use Case 4: Micro-Messaging Framework
    • Initial use case providing micro-goals (user-defined stats aggregation)
    • MongoDB for persistence of aggregates
    • Python server + RabbitMQ (AMQP)
    • Implemented between Django and MySQL (service subscribes to interesting stats)
    • Horizontally scalable into the cloud, with base capacity on dedicated iron
    • Messaging system expanded to handle real-time course analysis and push notifications
  • 16. Indexing Patterns or “Know Your App”
    • Proper indexing critical to performance at scale
    • MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself)
    • Avoid un-indexed queries at all costs (no. really. quickest way to crater your app)
    • Onus on DevOps to match application to indexes (know your query profile, never assume)
    • Shoot for covered queries wherever possible (answer can be obtained from indexes only)
  • 17. Use Case 5: API Logging DB
    • MongoDB is great for logging (especially if you log in JSON format!)
    • Good application for capped collections (cap by data size, or TTL)
    • Running with safe mode off for speed (fire-n-forget logging can reduce latency)
    • Cloud servers are a good fit for logging apps
  • 18. Capped Collections
    • Used for retaining a fixed amount of data (based on data size, not number of rows)
    • Utilizes FIFO method for pruning collection (especially useful for data that devalues with age)
    • TTL collections (2.2) age out data based on a retention date limit (useful for a variety of data types)
    Gotcha! Explicitly create the capped collection before any data is put into the system to avoid auto-creation of the collection
  • 19. Monitoring MongoDB at MMF
    • Monitor for real-time system events (faster response time = less impact)
    • Track historical performance data trends (useful for predictive failure analysis and scaling-need projections)
    • MMS – MongoDB Monitoring Service (now our default visual metrics system)
    • Zabbix open source monitoring
    • Mikoomi Zabbix plugins for MongoDB
    • Mongostat – realtime troubleshooting godsend
  • 20. Conclusion
    • MongoDB is extremely versatile, and can help your application scale, even if you don't design your app with MongoDB from the start.
    • MongoDB fits well into both dedicated and virtual architecture environments.
    • Low maintenance overhead compared to traditional RDBMS.
    • Provides the horizontal scaling path required for Internet-sized applications.
  • 21. We're Hiring!
    http://www.mapmyfitness.com/careers
    mongo-seattle@mapmyfitness.com
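Slide 4 mentions integrating externalized collections back into the relational schema through remote resource mapping tables. Below is a minimal sketch of that idea. It assumes a mapping table that stores only a pointer (a document key) per relational row. sqlite3 and a dict stand in for MySQL and MongoDB, and the table, column, and function names (`user_route_map`, `save_route`, `routes_for_user`) are hypothetical, not MMF's actual schema.

```python
import sqlite3

# Hypothetical stand-ins: sqlite3 plays the relational database (MySQL at MMF)
# and a dict plays the external document store (MongoDB at MMF).
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
# Mapping table: each row points from a relational user to a document that
# lives outside the relational schema, identified by its key in that store.
rdbms.execute(
    "CREATE TABLE user_route_map ("
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " route_key TEXT NOT NULL)"
)
document_store = {}


def save_route(user_id, route_key, route_doc):
    """Store the large document externally; keep only a pointer row in the RDBMS."""
    document_store[route_key] = route_doc
    with rdbms:  # commits the mapping row on success
        rdbms.execute(
            "INSERT INTO user_route_map (user_id, route_key) VALUES (?, ?)",
            (user_id, route_key),
        )


def routes_for_user(user_id):
    """Resolve the user's route keys relationally, then fetch each document."""
    rows = rdbms.execute(
        "SELECT route_key FROM user_route_map WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    ).fetchall()
    return [document_store[key] for (key,) in rows]


with rdbms:
    rdbms.execute("INSERT INTO users (id, name) VALUES (4, 'demo')")
save_route(4, "e4da3b7fbbce2345d7772b0674a318d5",
           {"route_name": "balboa park", "city": "San Diego"})
print(routes_for_user(4))
```

The RDBMS keeps only keys, so the large collection can move to another store (or behind a service) without changing the relational lookups that locate it.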
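The pass-thru migration pattern on slide 8 can be sketched as a read path. It checks the new store first. On a miss, it loads the legacy record, transforms it, and writes it forward. The batch backfill then drives every legacy id through that same function. MMF's version lived in PHP behind their RESTful API. This Python sketch uses dicts as stand-ins for both stores, and `transform`, `get_route`, `backfill`, and the sample records are hypothetical.

```python
import json

# Hypothetical stand-ins: `legacy` plays the old route store (MySQL, S3, or the
# file server) and `mongo` plays the new MongoDB collection, both keyed by route id.
legacy = {
    "r1": {"route_name": "balboa park", "city": "San Diego", "pts": "[[32.73, -117.15]]"},
    "r2": {"route_name": "green lake", "city": "Seattle", "pts": "[]"},
}
mongo = {}


def transform(old_doc):
    """Hypothetical transform done in application code: decode the JSON blob."""
    new_doc = dict(old_doc)
    new_doc["points"] = json.loads(new_doc.pop("pts"))
    return new_doc


def get_route(route_id):
    """Pass-through read: serve from the new store, migrating the record on a miss."""
    doc = mongo.get(route_id)
    if doc is None:
        old_doc = legacy.get(route_id)
        if old_doc is None:
            return None  # not in either store
        doc = transform(old_doc)
        mongo[route_id] = doc  # later reads are served by the new store
    return doc


def backfill(batch_size=100):
    """Batch backfill: push every legacy id through the same pass-through path."""
    ids = list(legacy)
    for start in range(0, len(ids), batch_size):
        for route_id in ids[start:start + batch_size]:
            get_route(route_id)
    return len(ids)
```

Reads and the backfill share one code path, so the transform logic exists in only one place. The legacy store stays untouched until the cutover is complete.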
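Slide 10's warning about document size refers to MongoDB's maximum BSON document size of 16 MB (16 MiB). Anything that may grow past it belongs in GridFS, which stores a file as a series of fixed-size chunks. A rough pre-check is sketched below. It assumes JSON-encoded length is a usable proxy for BSON size, which is not exact; that is why the sketch applies a hypothetical 80% safety margin. `needs_gridfs` and `chunk_bytes` are illustrative helpers, not driver APIs. The 255 KiB chunk size matches the GridFS default used by current drivers.

```python
import json

# MongoDB's documented maximum BSON document size.
MAX_BSON_BYTES = 16 * 1024 * 1024
# Assumption: JSON length only approximates BSON size, so leave headroom.
SAFETY_MARGIN = 0.8


def needs_gridfs(doc):
    """Return True if the document is too close to the size limit to store inline."""
    approx_size = len(json.dumps(doc).encode("utf-8"))
    return approx_size > MAX_BSON_BYTES * SAFETY_MARGIN


def chunk_bytes(payload, chunk_size=255 * 1024):
    """Split a large payload into fixed-size pieces, as GridFS does with its chunks."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
```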
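The micro-goals service on slide 15 consumes stats events from RabbitMQ and persists running aggregates in MongoDB. The sketch below models only the consumer logic, assuming a hypothetical message shape and subscription set. A nested dict stands in for the aggregates collection; in MongoDB the write would be an upsert using `$inc`. No AMQP client is involved, and `handle_message`, `goal_reached`, and `INTERESTING_STATS` are illustrative names.

```python
from collections import defaultdict

# Hypothetical message shape: {"user_id": ..., "stat": ..., "value": ...}, as it
# might arrive from an AMQP queue. Only subscribed ("interesting") stats are kept.
INTERESTING_STATS = {"distance_km", "workouts"}
# Stand-in for the MongoDB aggregates collection: user_id -> stat -> running total.
aggregates = defaultdict(lambda: defaultdict(float))


def handle_message(msg):
    """Consume one stats event: ignore unsubscribed stats, else accumulate it."""
    if msg["stat"] not in INTERESTING_STATS:
        return False
    aggregates[msg["user_id"]][msg["stat"]] += msg["value"]
    return True


def goal_reached(user_id, stat, target):
    """Micro-goal check against the running aggregate."""
    return aggregates[user_id][stat] >= target
```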
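Slide 16 recommends covered queries, where MongoDB answers from the index alone. The rule of thumb can be written as a small checker with two conditions. First, every filtered field and every returned field must be in the index. Second, `_id`, which is returned by default, must be excluded unless the index contains it. This is a simplified sketch that ignores multikey (array) indexes and other edge cases. `is_covered` is a hypothetical helper; the authoritative answer comes from the query plan reported by `explain()`.

```python
def is_covered(index_fields, filter_fields, projection):
    """Simplified covered-query check (assumes plain scalar fields, no arrays).

    index_fields:  fields in the compound index, e.g. ["user_id", "created_date"]
    filter_fields: fields used in the query filter
    projection:    MongoDB-style projection dict, e.g. {"created_date": 1, "_id": 0}
    """
    index = set(index_fields)
    included = {field for field, keep in projection.items() if keep}
    if not included:
        return False  # no inclusion projection: whole documents are returned
    if projection.get("_id", 1) and "_id" not in index:
        return False  # _id is returned by default but is not in this index
    return set(filter_fields) <= index and included <= index
```

For example, take an index on `user_id` and `created_date`. A query filtering on `user_id` and projecting `{created_date: 1, _id: 0}` passes this check. The same query without `_id: 0` does not.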
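The two retention styles on slides 17 and 18 behave differently. A capped collection evicts its oldest documents once a size cap is reached. A TTL collection (MongoDB 2.2+) deletes documents whose date field is older than a retention limit. The in-memory models below illustrate only those semantics, and `CappedLog` and `expire` are hypothetical names. Real capped collections are created with an explicit size (in the shell, `db.createCollection(name, { capped: true, size: <bytes> })`). That is why slide 18 says to create them before any data arrives: an insert into a missing collection auto-creates an ordinary, uncapped one.

```python
from collections import deque


class CappedLog:
    """In-memory model of a size-capped collection: documents stay in insertion
    order and the oldest are evicted first (FIFO) once max_bytes is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.entries = deque()
        self.total = 0

    def insert(self, entry: bytes):
        if len(entry) > self.max_bytes:
            raise ValueError("entry larger than the whole cap")  # reject outright
        self.entries.append(entry)
        self.total += len(entry)
        while self.total > self.max_bytes:  # prune oldest until back under the cap
            self.total -= len(self.entries.popleft())


def expire(docs, now, ttl_seconds, field="created_at"):
    """TTL model: keep only documents younger than ttl_seconds. MongoDB's TTL
    monitor does this in a periodic background pass, so expiry is not instantaneous."""
    return [d for d in docs if now - d[field] < ttl_seconds]
```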
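Slide 19 mentions using historical performance trends for scaling-need projections. One simple way to turn a series of storage samples into a projection is to fit an ordinary least-squares line and extrapolate it to the capacity limit. The linear-growth assumption, the `days_until_full` helper, and the sample numbers in the check are all hypothetical. Real growth is often not linear, so treat the result as a rough estimate.

```python
def days_until_full(samples, capacity):
    """Fit used = intercept + slope * day by ordinary least squares and return the
    number of days after the last sample until the fitted line reaches capacity.

    samples: (day, used) pairs sorted by day, e.g. daily disk usage in GB.
    Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return day_full - samples[-1][0]
```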