MongoDB at MMF
From a DevOps Perspective

      Jan 24, 2013
Introduction

• MapMyFitness was founded in 2007
• Offices in Denver, CO & Austin, TX
  (w/ associates in SF, Boston, New York, LA, and Chicago)
• Over 13 million registered users
• ~80 million geo-data routes
  (runs, rides, walks, hikes, etc)
• Core sites, mobile apps, API, white-label
  (MapMyRun, MapMyRide, MapMyFitness)
MMF Platform Overview

• Python (Django) & PHP (legacy API)

• Although MySQL is the core backing db for Django, the majority of
  MMF data lives in various MongoDB datastores.

• Routes datastore has ~120 million objects, currently 7TB+ of data
  (3 member replica set backed by an EMC SAN, 48GB RAM each)

• Django sessions converted to using MongoDB
  (functional scaling example, 600M sessions stored)

• Live Tracking system utilizes elastic replica set membership to
  handle load scaling for events

• Granular API access/error logging via JSON to MongoDB
Route & Elevation data example
   (Lost on the way to MongoSeattle)
Implementation Patterns

• Standard Datastore - 3 member replica set
    (small to med implementations)

• Big Data implementation - sharded cluster (TB+)

• Buffering Layer - high memory
    (load all data and index files into RAM)

• Write Heavy - utilize sharding to optimize for writes

• Read Heavy - 3+n replica set configuration for rapid read scaling
    (up to 12 nodes)
Implementation Patterns

• In the cloud, tune the instance type to the mongo implementation

• On iron, plan carefully and dedicate servers completely to mongo to
  avoid memory map contention

• For DR, spin up a delayed, hidden replica node (preferably in a
  different datacenter)

• Aggregation framework can be used in myriad ways, including
  bridging the gap to SQL data warehousing via ETL.

• Automate install patterns for rapid development, prototyping, and
  infrastructure scaling.
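
The DR bullet above maps to a single replica set member document. A minimal sketch in Python, assuming a 2.x-era configuration where the delay field is named `slaveDelay` (later releases renamed it). The function name, host name, and one-hour delay are hypothetical and not taken from the MMF setup.

```python
def delayed_hidden_member(member_id, host, delay_seconds=3600):
    """Build a replica set member document for a delayed, hidden DR node.

    priority 0 keeps the node from ever becoming primary, hidden keeps
    clients from reading stale data off it, and slaveDelay (the 2.x-era
    field name) holds it a fixed interval behind the primary.
    """
    if delay_seconds <= 0:
        raise ValueError("a delayed member needs a positive delay")
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,
        "hidden": True,
        "slaveDelay": delay_seconds,
    }


# Hypothetical host in a second datacenter.
member = delayed_hidden_member(4, "dr-db01.example.com:27017")
print(member)
```

The delay is what makes this useful for DR: an accidental drop or bad write on the primary does not reach the delayed node until the interval has passed.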
Operational Automation
( example of automated mongodb install via puppet )
Replica Set Expansion


•   MongoDB is “replication made elegant”
•   Ridiculously simple to add additional members
•   Be sure to run initialSync from a secondary!
    rs.add( { "host" : "livetrack_db09", "initialSync" : { "state" : 2 } } )
•   Both rs.add() and rs.remove() can be scripted and connected to
    monitoring systems for autoscaling
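
Scripting membership changes comes down to producing a new replica set config with one member added or removed and the version bumped. A rough sketch in Python of that step only, working on plain dictionaries. The function names and the `livetrack` set name are hypothetical, and the 12-member cap is the figure quoted earlier in these slides. Submitting the result to the server is left out.

```python
import copy

MAX_MEMBERS = 12  # replica set size limit cited in the slides


def add_member(config, host):
    """Return a new config with `host` appended and the version bumped."""
    if len(config["members"]) >= MAX_MEMBERS:
        raise ValueError("replica set is already at its member limit")
    new_config = copy.deepcopy(config)
    next_id = max(m["_id"] for m in new_config["members"]) + 1
    new_config["members"].append({"_id": next_id, "host": host})
    new_config["version"] += 1
    return new_config


def remove_member(config, host):
    """Return a new config without `host` and with the version bumped."""
    new_config = copy.deepcopy(config)
    remaining = [m for m in new_config["members"] if m["host"] != host]
    if len(remaining) == len(new_config["members"]):
        raise ValueError("host is not a member: %s" % host)
    new_config["members"] = remaining
    new_config["version"] += 1
    return new_config


# Hypothetical starting config for an event-day scale-up.
config = {
    "_id": "livetrack",
    "version": 3,
    "members": [{"_id": 0, "host": "livetrack_db01:27017"}],
}
scaled_up = add_member(config, "livetrack_db09:27017")
print(scaled_up["version"], len(scaled_up["members"]))
```

Both helpers copy the config rather than mutate it, so a monitoring trigger can compute the change, log it, and only then decide whether to apply it.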
Monitoring and Introspection

• MMS, 10gen's cloud-based monitoring service (best available)

• Supported by Zabbix, Nagios, Munin, Server Density, etc

• mongostat, mongotop, REST interface, database profiler

• Monitoring system triggers can initiate node additions,
  removals, service restarts, etc

• In addition to service-level monitoring, use more advanced
  tests to check for and alert on query latency spikes
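
One way to approach the latency-spike test from the last bullet is to compare each new probe timing against a rolling baseline. A small sketch in Python, assuming timings are collected elsewhere (for example by timing a known query). The function name, the 3x factor, and the sample values are illustrative assumptions, not MMF's actual thresholds.

```python
from statistics import median


def is_latency_spike(history_ms, latest_ms, factor=3.0, min_samples=5):
    """Flag `latest_ms` when it exceeds `factor` times the recent median.

    Returns False until `min_samples` observations exist, so a cold start
    does not page anyone.
    """
    if len(history_ms) < min_samples:
        return False
    baseline = median(history_ms)
    return latest_ms > factor * baseline


recent = [12.0, 15.0, 11.0, 14.0, 13.0]   # hypothetical probe timings in ms
print(is_latency_spike(recent, 16.0))     # close to the baseline
print(is_latency_spike(recent, 95.0))     # far above 3x the median
```

A median is used rather than a mean so that one earlier outlier in the history does not inflate the baseline and hide the next spike.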
10gen's MMS
(the one-stop shop for mongodb metrics)
Mongo in Zabbix
( Mikoomi Plugins: http://code.google.com/p/mikoomi )
mongostat
( Very useful for real-time troubleshooting )
Operational Automation
( example of automated mongodb restart action )
Security Considerations

• MongoDB provides authentication support and basic permissions

• Auth is turned off by default to allow for optimal performance

• Always run databases in a trusted network environment

• Lock down host-based firewalls to limit access to required clients

• Automate iptables with puppet or chef; in EC2 use security groups
Network Security Automation

## Puppet Pattern for Mongodb network security


class iptables::public {

      iptables::add_rule { '001 MongoDB established':
          rule => '-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT'
      }

      iptables::add_rule { '002 MongoDB':
          rule => '-A RH-Firewall-1-INPUT -i eth1 -p tcp -m tcp --dport 27017 -j ACCEPT'
      }

      iptables::add_rule { '003 MongoDB MMF Phase II Network':
          rule => '-A RH-Firewall-1-INPUT -i eth0 -s 172.16.16.0/20 -p tcp -m tcp --dport 27017 -j ACCEPT'
      }

      iptables::add_rule { '004 MongoDB MMF Cloud Network':
          rule => '-A RH-Firewall-1-INPUT -i eth0 -s 10.178.52.0/24 -p tcp -m tcp --dport 27017 -j ACCEPT'
      }

  }
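
The rule strings in the Puppet pattern above differ only in the source network, so they can be generated from a list of allowed networks. A sketch in Python, assuming the same chain name, interface, and port as the rules shown. The function name is hypothetical, and the output is just the rule text, to be fed into whatever configuration management is in use.

```python
import ipaddress


def mongo_accept_rules(sources, port=27017, chain="RH-Firewall-1-INPUT",
                       interface="eth0"):
    """Build one iptables ACCEPT rule per allowed source network."""
    rules = []
    for cidr in sources:
        # Raises ValueError on a malformed address or one with host bits set.
        network = ipaddress.ip_network(cidr)
        rules.append(
            "-A %s -i %s -s %s -p tcp -m tcp --dport %d -j ACCEPT"
            % (chain, interface, network, port)
        )
    return rules


for rule in mongo_accept_rules(["172.16.16.0/20", "10.178.52.0/24"]):
    print(rule)
```

Validating each network before emitting a rule keeps a typo in the allow list from silently producing a rule that iptables rejects at load time.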
Security Considerations

• Use the rule of least-privilege to allow access to environments

• Data sensitivity should determine the extent of security measures

• For non-sensitive data, good network security can be sufficient

• In open environments, be sure experience matches access level

• Lack of granular perms allows for full admin access, use discretion
Maintenance

• Far less maintenance required than traditional RDBMS systems

• Regularly perform query profile analysis and index auditing

• Rebuild databases to reclaim space lost due to fragmentation

• Automate checks of log files for known red-flags

• Regularly review data throughput rate, storage growth rate, and
  overall business growth graphs to inform capacity planning.

• For HA testing, periodically step-down the primary to force failover
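
The automated log check could be as simple as a set of regular expressions run over new log lines. A sketch in Python. The patterns, category names, and sample log lines are hypothetical and should be tuned to the messages your own mongod version actually writes. The slow-query pattern assumes the operation's duration is logged as a trailing `NNNms`.

```python
import re

# Hypothetical red-flag patterns; adjust to the messages seen in your logs.
RED_FLAGS = {
    "replication": re.compile(r"replSet.*(error|too stale)", re.IGNORECASE),
    "assertion": re.compile(r"assertion", re.IGNORECASE),
    "slow_query": re.compile(r"\b(\d+)ms$"),
}


def scan_log(lines, slow_ms=1000):
    """Return (category, line) pairs for every line matching a red flag."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        for category, pattern in RED_FLAGS.items():
            match = pattern.search(line)
            if not match:
                continue
            # Only report operations at or above the slow threshold.
            if category == "slow_query" and int(match.group(1)) < slow_ms:
                continue
            hits.append((category, line))
    return hits


sample = [
    "[conn1] query mmf.routes ntoreturn:1 nscanned:900000 1532ms",
    "[conn2] query mmf.routes ntoreturn:1 12ms",
    "[rsSync] replSet error RS102 too stale to catch up",
]
for category, line in scan_log(sample):
    print(category, "|", line)
```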
Indexing Patterns or “Know Your App”

•   Proper indexing critical to performance at scale
    (monitor slow queries to catch non-performant requests)
•   MongoDB is ultimately flexible, being schemaless
    (mongo gives you enough rope to hang yourself, choose wisely)
•   Avoid un-indexed queries at all costs
    (it's the quickest way to crater your app... consider --notablescan)
•   Onus on DevOps to match application to indexes
    (know your query profile, never assume)
•   Shoot for 'covered queries' wherever possible
    (answer can be obtained from indexes only)
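
Whether a query is covered can be reasoned about without touching the database: every filtered field and every returned field has to be in the index, and `_id` comes back unless it is explicitly excluded. A simplified sketch in Python of that check. The function and field names are hypothetical, and it ignores cases such as multikey indexes, so treat it as an aid to thinking rather than a replacement for explain().

```python
def is_covered(index_keys, query_fields, projection):
    """Return True when an index alone can answer the query.

    index_keys:   field names in the index, e.g. ["user_id", "created"]
    query_fields: field names used in the query filter
    projection:   dict of field -> 1/0 as passed to find()
    """
    indexed = set(index_keys)
    # Every filtered field must be in the index.
    if not set(query_fields) <= indexed:
        return False
    # With no included fields, whole documents come back: never covered.
    if not any(projection.values()):
        return False
    returned = {field for field, include in projection.items() if include}
    # _id is returned by default unless the projection mentions it.
    if "_id" not in projection:
        returned.add("_id")
    return returned <= indexed


index = ["user_id", "created"]
print(is_covered(index, ["user_id"], {"user_id": 1, "created": 1, "_id": 0}))
print(is_covered(index, ["user_id"], {"user_id": 1}))
```

The second call fails the check only because `_id` is implicitly returned, which is the most common reason an otherwise well-indexed query ends up fetching documents.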
Capped Collections

• Use standard capped collections for retaining a fixed amount
  of data.  Uses a FIFO strategy for pruning.
    (based on data size, not number of rows)

• TTL Collections (2.2) age out data based on a retention time
  configuration.
    (great for data retention requirements of all types)

    Gotcha!
    Explicitly create the capped collection before any data is put
    into the system to avoid auto-creation of a regular (uncapped)
    collection
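
To make the gotcha concrete, the capped collection has to be created by an explicit command before the first insert. A sketch in Python that only builds the documents involved: the `create` command for a capped collection and the key and option for a TTL index. The function names, collection name, and sizes are hypothetical. Sending these to the server through a driver is not shown.

```python
def capped_create_command(name, size_bytes, max_docs=None):
    """Build the `create` command document for a capped collection."""
    if size_bytes <= 0:
        raise ValueError("capped collections need a positive size in bytes")
    # The command name goes first; dicts keep insertion order.
    command = {"create": name, "capped": True, "size": size_bytes}
    if max_docs is not None:
        command["max"] = max_docs
    return command


def ttl_index_spec(field, retention_seconds):
    """Build the key and option for a TTL index on a date field."""
    return {"key": {field: 1}, "expireAfterSeconds": retention_seconds}


# Hypothetical: a 1 GB API log and a 30-day session retention.
print(capped_create_command("api_log", 1024 ** 3))
print(ttl_index_spec("last_seen", 30 * 24 * 3600))
```

Running the create step from the same automation that installs the node is one way to guarantee it happens before any application write.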
Lessons Learned

• A Mongo 2.2 upgrade hit a capped collection created in 1.8.4, which severely impacted
  replication (RC: no "_id" index, FIX: add "_id" index)

• Never start mongo when a mount point is missing or incorrectly configured. Mongo may
  decide to take matters into its own hands and resync itself with the replica set. Make
  sure your devops and your hosting provider admins are aware of this

• Some drivers that use connection pooling can freak the freaky freak when the primary
  member changes (older pymongo). Kicking the application can fix, also: upgrade drivers

• High locked % is a big red-flag, and can be caused by a large number of simultaneous DML
  actions (high insert rate, high update rate). Consider this in the design phase.

• Be wary of automation that can change the state of a node during maintenance mode.
  Disable automation agents for reduced risk during critical administrative operations
  (filesystem maint, etc)
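
The mount-point lesson can be enforced with a guard in the start script. A sketch in Python, assuming the data volume has a known mount point and that the database path is expected to live under it. The function name and paths are hypothetical, and the mount check is injectable so the logic can be exercised without real mounts.

```python
import os


def safe_to_start(mount_point, dbpath, ismount=os.path.ismount):
    """Return True only when the data volume is mounted and holds dbpath."""
    if not ismount(mount_point):
        # Volume missing: mongod would see an empty directory and resync.
        return False
    mount_point = os.path.abspath(mount_point)
    dbpath = os.path.abspath(dbpath)
    # dbpath must live on the mounted volume, not beside it.
    return os.path.commonpath([mount_point, dbpath]) == mount_point


# Hypothetical paths; a wrapper script would refuse to start mongod on False.
print(safe_to_start("/data", "/data/mongo"))
```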
Thank you!
chris@mapmyfitness.com

MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

MongoDB at MapMyFitness from a DevOps Perspective

  • 1. MongoDB at MMF From a DevOps Perspective Jan 24, 2013
  • 2. Introduction
      • MapMyFitness was founded in 2007
      • Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
      • Over 13 million registered users
      • ~80 million geo-data routes (runs, rides, walks, hikes, etc.)
      • Core sites, mobile apps, API, white-label (MapMyRun, MapMyRide, MapMyFitness)
  • 3. MMF Platform Overview
      • Python (Django) & PHP (legacy API)
      • Although MySQL is the core backing db for Django, the majority of MMF data lives in various MongoDB datastores.
      • Routes datastore has ~120 million objects, currently 7TB+ of data (3 member replica set backed by an EMC SAN, 48GB RAM each)
      • Django sessions converted to using MongoDB (functional scaling example, 600M sessions stored)
      • Live Tracking system utilizes elastic replica set membership to handle load scaling for events
      • Granular API access/error logging via JSON to MongoDB
  • 4. Route & Elevation data example (Lost on the way to MongoSeattle)
  • 5. Implementation Patterns
      • Standard Datastore - 3 member replica set (small to med implementations)
      • Big Data implementation - sharded cluster (TB+)
      • Buffering Layer - high memory (load all data and index files into RAM)
      • Write Heavy - utilize sharding to optimize for writes
      • Read Heavy - 3+n replica set configuration for rapid read scaling (up to 12 nodes)
  • 6. Implementation Patterns
      • In the cloud, tune the instance type to the mongo implementation
      • On iron, plan carefully and dedicate servers completely to mongo to avoid memory map contention
      • For DR, spin up a delayed, hidden replica node (preferably in a different datacenter)
      • Aggregation framework can be used in myriad ways, including bridging the gap to SQL data warehousing via ETL.
      • Automate install patterns for rapid development, prototyping, and infrastructure scaling.
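The delayed, hidden DR node mentioned above can be sketched as a replica set member document. This is a minimal illustration, not the configuration MMF used: the host name and the four-hour delay are made up, and the helper name `delayed_hidden_member` is hypothetical. The delay field is called `slaveDelay` on servers older than MongoDB 5.0 and `secondaryDelaySecs` from 5.0 onward.

```python
def delayed_hidden_member(member_id, host, delay_seconds=3600, legacy_field=True):
    """Build a replica set member document for a delayed, hidden DR node."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    # "slaveDelay" before MongoDB 5.0, "secondaryDelaySecs" from 5.0 onward.
    delay_field = "slaveDelay" if legacy_field else "secondaryDelaySecs"
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,   # a delayed member must never be elected primary
        "hidden": True,  # keep application reads away from deliberately stale data
        delay_field: delay_seconds,
    }


# Hypothetical DR node in a second datacenter, four hours behind the primary.
member = delayed_hidden_member(3, "dr-db01.example.net:27017", delay_seconds=4 * 3600)
print(member)
```

The document would be appended to the `members` array of the replica set configuration before a reconfig. The delay is the window in which an operator can stop the node before a bad write or accidental drop replicates to it.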
  • 7. Operational Automation ( example of automated mongodb install via puppet )
  • 8. Replica Set Expansion
      • MongoDB is "replication made elegant"
      • Ridiculously simple to add additional members
      • Be sure to run the initial sync from a secondary!
        rs.add( { "host" : "livetrack_db09", "initialSync" : { "state" : 2 } } )
      • Both rs.add() and rs.remove() can be scripted and connected to monitoring systems for autoscaling
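One way to script membership changes is to edit the replica set configuration document and send it back with the `replSetReconfig` command, which is roughly what the shell helpers do. The sketch below covers only the document manipulation, so it runs without a database. The function names and the sample config are hypothetical, and only `livetrack_db09` comes from the slide.

```python
import copy


def config_with_new_member(config, host):
    """Return a copy of a replica set config with `host` appended as a member."""
    if any(m["host"] == host for m in config["members"]):
        raise ValueError(f"{host} is already a member")
    new_config = copy.deepcopy(config)
    next_id = max((m["_id"] for m in config["members"]), default=-1) + 1
    new_config["members"].append({"_id": next_id, "host": host})
    new_config["version"] = config["version"] + 1  # a reconfig needs a higher version
    return new_config


def config_without_member(config, host):
    """Return a copy of a replica set config with `host` removed."""
    new_config = copy.deepcopy(config)
    new_config["members"] = [m for m in config["members"] if m["host"] != host]
    if len(new_config["members"]) == len(config["members"]):
        raise ValueError(f"{host} is not a member")
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical current config, shaped like the output of rs.conf().
current = {
    "_id": "livetrack",
    "version": 4,
    "members": [
        {"_id": 0, "host": "livetrack_db01:27017"},
        {"_id": 1, "host": "livetrack_db02:27017"},
        {"_id": 2, "host": "livetrack_db03:27017"},
    ],
}
grown = config_with_new_member(current, "livetrack_db09:27017")
shrunk = config_without_member(grown, "livetrack_db09:27017")
```

A monitoring trigger could call the first function when load rises and the second when it falls, passing the result to the primary as the argument of `replSetReconfig`.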
  • 9. Monitoring and Introspection
      • MMS, 10gen's cloud-based monitoring service (best available)
      • Supported by Zabbix, Nagios, Munin, Server Density, etc.
      • mongostat, mongotop, REST interface, database profiler
      • Monitoring system triggers can initiate node additions, removals, service restarts, etc.
      • In addition to service-level monitoring, use more advanced tests to check for and alert on query latency spikes
  • 10. 10gen's MMS (the one-stop shop for mongodb metrics)
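A latency-spike test of the kind described above can be as simple as comparing recent samples with a longer baseline. The sketch assumes the samples are round-trip times in milliseconds from a periodically timed canary query. The function name, window size, multiplier and floor are illustrative choices, not values from the talk.

```python
from statistics import median


def latency_spike(samples_ms, window=5, factor=3.0, floor_ms=50.0):
    """Return True when recent median latency is a spike over the baseline.

    samples_ms: chronological latency samples in milliseconds.
    window:     number of most recent samples treated as "now".
    factor:     how many times the baseline counts as a spike.
    floor_ms:   ignore spikes below this absolute latency.
    """
    if len(samples_ms) <= window:
        return False  # not enough history to form a baseline
    baseline = median(samples_ms[:-window])
    recent = median(samples_ms[-window:])
    return recent >= floor_ms and recent > factor * baseline


steady = [10.0] * 20
spiking = [10.0] * 20 + [200.0] * 5
print(latency_spike(steady), latency_spike(spiking))
```

Medians are used so that a single slow sample does not trigger an alert, and the absolute floor avoids paging anyone when latency triples from 1 ms to 3 ms.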
  • 11. Mongo in Zabbix ( Mikoomi Plugins: http://code.google.com/p/mikoomi )
  • 12. mongostat ( Very useful for real-time troubleshooting )
  • 13. Operational Automation ( example of automated mongodb restart action )
  • 14. Security Considerations
      • MongoDB provides authentication support and basic permissions
      • Auth is turned off by default to allow for optimal performance
      • Always run databases in a trusted network environment
      • Lock down host based firewalls to limit access to required clients
      • Automate iptables with puppet or chef, in EC2 use security groups
  • 15. Network Security Automation
        ## Puppet Pattern for Mongodb network security
        class iptables::public {
          iptables::add_rule { '001 MongoDB established':
            rule => '-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT'
          }
          iptables::add_rule { '002 MongoDB':
            rule => '-A RH-Firewall-1-INPUT -i eth1 -p tcp -m tcp --dport 27017 -j ACCEPT'
          }
          iptables::add_rule { '003 MongoDB MMF Phase II Network':
            rule => '-A RH-Firewall-1-INPUT -i eth0 -s 172.16.16.0/20 -p tcp -m tcp --dport 27017 -j ACCEPT'
          }
          iptables::add_rule { '004 MongoDB MMF Cloud Network':
            rule => '-A RH-Firewall-1-INPUT -i eth0 -s 10.178.52.0/24 -p tcp -m tcp --dport 27017 -j ACCEPT'
          }
        }
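The per-network rules in the Puppet pattern above follow one template, so they can be generated from a list of allowed sources. The sketch below is an assumption about how that could be done in Python, not part of the original Puppet module. The helper name `mongo_accept_rules` is hypothetical, while the chain name, interfaces and networks are copied from the slide.

```python
import ipaddress


def mongo_accept_rules(sources, chain="RH-Firewall-1-INPUT", port=27017):
    """Build iptables ACCEPT rules for (interface, CIDR) pairs allowed to reach MongoDB."""
    rules = []
    for interface, cidr in sources:
        # ip_network() raises ValueError on a malformed CIDR or one with host bits set.
        network = ipaddress.ip_network(cidr)
        rules.append(
            f"-A {chain} -i {interface} -s {network} -p tcp -m tcp --dport {port} -j ACCEPT"
        )
    return rules


allowed = [("eth0", "172.16.16.0/20"), ("eth0", "10.178.52.0/24")]
rules = mongo_accept_rules(allowed)
for rule in rules:
    print(rule)
```

Validating each CIDR before emitting a rule catches typos before they reach a firewall, where a rejected rule could leave a host either unreachable or wide open.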
  • 16. Security Considerations
      • Use the rule of least-privilege to allow access to environments
      • Data sensitivity should determine the extent of security measures
      • For non-sensitive data, good network security can be sufficient
      • In open environments, be sure experience matches access level
      • Lack of granular perms allows for full admin access, use discretion
  • 17. Maintenance
      • Far less maintenance required than traditional RDBMS systems
      • Regularly perform query profile analysis and index auditing
      • Rebuild databases to reclaim space lost due to fragmentation
      • Automate checks of log files for known red-flags
      • Regularly review data throughput rate, storage growth rate, and overall business growth graphs to inform capacity planning.
      • For HA testing, periodically step-down the primary to force failover
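The automated log check mentioned above can start as a small pattern scan. Everything specific in this sketch is an assumption: the red-flag patterns, the 1000 ms slow-operation threshold, and the sample lines, which only approximate the shape of mongod log output and are not copied from a real log.

```python
import re

# Hypothetical red-flag patterns; tune these to what your own logs contain.
RED_FLAGS = {
    "assertion": re.compile(r"assertion", re.IGNORECASE),
    "error": re.compile(r"\berror\b", re.IGNORECASE),
    "stale": re.compile(r"too stale", re.IGNORECASE),
}
# Operation duration at the end of a line, e.g. "... 1534ms".
DURATION = re.compile(r"\b(\d+)ms$")


def scan_log(lines, slow_ms=1000):
    """Return (line_number, label) pairs for every red flag found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip()
        for label, pattern in RED_FLAGS.items():
            if pattern.search(line):
                hits.append((number, label))
        match = DURATION.search(line)
        if match and int(match.group(1)) >= slow_ms:
            hits.append((number, "slow_operation"))
    return hits


# Illustrative lines only.
sample = [
    "Thu Jan 24 10:00:01 [conn12] query routes.route nscanned:120000 1534ms",
    "Thu Jan 24 10:00:05 [rsSync] replSet error RS102 too stale to catch up",
    "Thu Jan 24 10:00:09 [conn13] insert sessions.session 2ms",
]
hits = scan_log(sample)
print(hits)
```

Run from cron or a monitoring agent against the tail of the log, a scan like this turns known failure signatures into alerts without anyone reading the file by hand.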
  • 18. Indexing Patterns or "Know Your App"
      • Proper indexing is critical to performance at scale (monitor slow queries to catch non-performant requests)
      • MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself, choose wisely)
      • Avoid un-indexed queries at all costs (it's the quickest way to crater your app... consider --notablescan)
      • Onus on DevOps to match application to indexes (know your query profile, never assume)
      • Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
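Whether a query can be covered comes down to a field check: every field in the filter and every field returned must be in the index, and `_id` is returned unless the projection switches it off. The helper below is a simplified sketch of that rule with hypothetical field names. It ignores cases such as multikey (array) indexes, which generally prevent coverage, so the authoritative answer is still the query's explain output.

```python
def is_covered(filter_fields, projection, index_fields):
    """Return True when filter and projection can be answered from the index alone.

    filter_fields: field names used in the query filter.
    projection:    dict of field name -> 1/0, as passed to find().
    index_fields:  field names in the candidate index.
    """
    index = set(index_fields)
    returned = {field for field, include in projection.items() if include}
    if not returned:
        return False  # empty or exclusion-only projections return whole documents
    if projection.get("_id", 1):
        returned.add("_id")  # _id comes back by default
    return set(filter_fields) <= index and returned <= index


index = ["user_id", "created"]
with_id = is_covered(["user_id"], {"created": 1}, index)
without_id = is_covered(["user_id"], {"created": 1, "_id": 0}, index)
print(with_id, without_id)
```

The two calls differ only in `"_id": 0`, which is the detail most often missed when a query that looks covered still touches documents.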
  • 19. Capped Collections
      • Use standard capped collections for retaining a fixed amount of data. Uses a FIFO strategy for pruning. (based on data size, not number of rows)
      • TTL Collections (2.2) age out data based on a retention time configuration. (great for data retention requirements of all types)
      • Gotcha! Explicitly create the capped collection before any data is put into the system, to avoid auto-creation of the collection (which would not be capped)
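The gotcha above can be guarded against at application start-up. The sketch assumes `db` behaves like a PyMongo `Database`, using its `list_collection_names()` and `create_collection(name, capped=True, size=...)` calls. The helper name, the collection name and the size are hypothetical, and `FakeDatabase` is a stand-in so the example runs without a server.

```python
def ensure_capped(db, name, size_bytes):
    """Create `name` as a capped collection unless it already exists.

    Returns True if the collection was created, False if it was already there.
    An existing uncapped collection is left as-is, not converted.
    """
    if name in db.list_collection_names():
        return False
    db.create_collection(name, capped=True, size=size_bytes)
    return True


class FakeDatabase:
    """Minimal stand-in for a PyMongo Database, for demonstration only."""

    def __init__(self):
        self.created = {}

    def list_collection_names(self):
        return list(self.created)

    def create_collection(self, name, **options):
        self.created[name] = options


db = FakeDatabase()
first = ensure_capped(db, "api_log", 512 * 1024 * 1024)   # hypothetical 512 MB cap
second = ensure_capped(db, "api_log", 512 * 1024 * 1024)  # no-op on the second call
```

Calling this before the first insert means the application never relies on implicit collection creation, which would produce an ordinary collection that grows without bound.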
  • 20. Lessons Learned
      • Upgraded to Mongo 2.2 with a capped collection that had been created in 1.8.4. This severely impacted replication (RC: no "_id" index, FIX: add "_id" index)
      • Never start mongo when a mount point is missing or incorrectly configured. Mongo may decide to take matters into its own hands and resync itself with the replica set. Make sure your devops and your hosting provider admins are aware of this
      • Some drivers that use connection pooling can freak the freaky freak when the primary member changes (older pymongo). Kicking the application can fix, also: upgrade drivers
      • High locked % is a big red-flag, and can be caused by a large number of simultaneous DML actions (high insert rate, high update rate). Consider this in the design phase.
      • Be wary of automation that can change the state of a node during maintenance mode. Disable automation agents for reduced risk during critical administrative operations (filesystem maint, etc.)
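The mount-point lesson above lends itself to a pre-start guard in whatever wrapper launches mongod. The sketch assumes a POSIX host and uses only the standard library. The function name and the example paths are hypothetical.

```python
import os


def safe_to_start(mount_point, dbpath):
    """Check that the data volume is mounted and contains the dbpath.

    Returns (ok, reason). Intended to run before mongod is started.
    """
    mount_point = os.path.realpath(mount_point)
    dbpath = os.path.realpath(dbpath)
    if not os.path.ismount(mount_point):
        return False, f"{mount_point} is not a mounted filesystem"
    if os.path.commonpath([mount_point, dbpath]) != mount_point:
        return False, f"{dbpath} is not under {mount_point}"
    if not os.path.isdir(dbpath):
        return False, f"{dbpath} does not exist"
    return True, "ok"


# Hypothetical layout: data volume mounted at /data, dbpath at /data/mongodb.
ok, reason = safe_to_start("/data", "/data/mongodb")
if not ok:
    print(f"refusing to start mongod: {reason}")
```

If the volume is missing, the dbpath directory on the root filesystem looks empty, and a mongod started there would behave like a fresh member and begin a full resync. Refusing to start is the cheaper failure.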