MongoFr : MongoDB as a log Collector


MongoDB can serve as a simple log collector, for example by writing to a capped collection. Fotopedia runs such a system and uses it for quick introspection and real-time analysis.

Talk given on March 23, 2011 at the MongoFR days at La Cantine, Paris, by Pierre Baillet and Mathieu Poumeyrol.

Published in: Technology

  1. MONGODB AS A LOG COLLECTOR
     • photo by Jean-Michel BAUD
     • Pierre Baillet & Mathieu Poumeyrol
     • oct & kali @
  2. DB.SLIDES.FIND({‘TYPE’:‘TITLE’})
     • Fotopedia: who we are, what we do, how we do it
     • MongoDB at Fotopedia: current state of our art
     • Logging: the answer to life, the universe and everything
     • How we fulfilled this need
     • Log usage on a daily basis
     • Future work
  3. FOTOPEDIA «Family photos»
  4. FOTOPEDIA: WHO ARE WE?
     • Company created in 2006
     • Located in Paris, near the Opéra
     • 17 people, including 8 regular MongoDB users (aka developers)
     • We’re hiring
  5. FOTOPEDIA: WHAT DO WE DO?
     • Images for Humanity
     • Open to anyone, amateur or professional
     • Creative Commons aware
     • Beautiful Wikipedia ( tablebooks (iPhone too): Heritage, National Parks and Memory of Color
  6. INFRASTRUCTURE
     • Based on Amazon Web Services
     • Around 20 servers located in the US datacenters
     • Centralized deployment procedure (Chef)
     • Deploy at least once a week with no downtime
  7. KEY TECHNOLOGIES
     • Ruby on Rails (with REE)
     • Unicorn
     • Varnish
     • HAProxy
     • NGinx
     • Lackr (in-house Java proxy)
     • Sinatra
     • Redis and Resque
     • MySQL
     • MongoDB
  8. MONGODB AT FOTOPEDIA «C:\Utilisateurs\fotopedia\Mes Documents»
  9. CURRENT STATE OF OUR ART
     • Last year's talk covered our MongoDB-powered metacache
     • Complete Wikipedia data stored in more than 10 languages
     • Since spring 2010, all new database-centric features have been developed with MongoDB
     • Our goal: slowly migrate all DB features to MongoDB whenever possible
  10. MYSQL MIGRATIONS
     • [Chart: "Alter table" count per quarter, 08/Q3 to 2011; y-axis from 0 to 30]
  11. OUR SETUP
     • 4 clusters (business data, log and reporting, Wikipedia, and one more)
     • 3 EC2 XL virtual machines hosting 5 replica sets
     • At the current time, one machine is master on all replica sets
     • Each of the 5 replica sets is allocated to one of the clusters
     • Every instance holds the 4 mongos
  12. SOME FIGURES
     • In production since September 2009
     • Wikipedia data: wikipedia/en: 5 GB, 8M documents (and about 10 other languages), batch load: 17k inserts/s
     • Webcache: 2 GB, 11M records, avg 60 op/s, peak 300 op/s
     • Overall, average 250 op/s
  13. LOGGING «the lynx's eye»
     • jm3
  14. ORIGINAL PHILOSOPHY
     • Log everything, don’t delete
     • Collected by Scribe
     • Comprehensive daily log stored in AWS S3
     • Hadoop jobs to generate statistics
     • grep and its merry friends for investigating issues
     • Quite efficient, but cumbersome and slow
  15. WHY IMPROVE
     • Issue analysis in real time (debugging)
     • Real-time activity analysis
       • Traffic spikes
       • Misbehaving crawlers and other suspicious activity
  17. HOW WE SOLVED THIS ISSUE «demons and wonders»
     • Stefano Constanzo
  18. NORMALIZED LOG FORMAT
      {
        "_id" : ObjectId("4d7e11cc7ea68d34fb01f2ac2"),
        "facility" : "varnish",
        "instance" : "a01",
        "date" : NumberLong("1300107724534"),
        "http_host" : "",
        "method" : "GET",
        "http_version" : "HTTP/1.1",
        "path" : "/albums/fotopedia-fr-Cath%C3%A9drale_m%C3%A9tropolitaine_de_Buenos_Aires",
        "status" : "404",
        "size" : 13,
        "elapsed" : 0.00007748600182821974
      }
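To make the shape of such a document concrete, here is a minimal Python sketch that builds one entry with the same field names as the slide. The helper name `make_log_entry` and its parameters are hypothetical. Treating `date` as milliseconds since the Unix epoch is an assumption based on the sample value, and `status` is kept as a string only because the sample stores it that way.

```python
import time


def make_log_entry(facility, instance, method, path, status, size, elapsed,
                   http_host="", http_version="HTTP/1.1", now_ms=None):
    """Build one log document using the field names shown on the slide.

    make_log_entry is a hypothetical helper, not part of the original system.
    """
    if now_ms is None:
        # Assumption: "date" holds milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)
    return {
        "facility": facility,          # which daemon produced the line
        "instance": instance,          # which server it ran on
        "date": now_ms,
        "http_host": http_host,
        "method": method,
        "http_version": http_version,
        "path": path,
        "status": str(status),         # the sample document stores a string
        "size": int(size),
        "elapsed": float(elapsed),
    }
```

Because every facility would emit the same keys, a single query can follow one request across Varnish, NGinx and the application servers.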
  19. LOG COLLECTING
     • File logging daemons (NGinx, HAProxy)
       • Ruby tailer script
     • Memory logging daemons (Varnish)
       • Dedicated binary that streams Varnish SHM into MongoDB
     • Other daemons (Lackr, Picor)
       • Extended logging system to store data in MongoDB
       • Also log Ruby exceptions into MongoDB
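The tailer approach for file-based daemons can be sketched as follows. The original script was written in Ruby and is not shown in the slides; this is a Python illustration only. The line format `METHOD PATH STATUS SIZE ELAPSED` and the names `follow`, `parse_line` and `ship` are assumptions made for the example.

```python
import os
import time


def follow(handle, poll_seconds=0.5):
    """Yield lines appended to an open file, like `tail -f` (runs forever)."""
    handle.seek(0, os.SEEK_END)
    while True:
        line = handle.readline()
        if line:
            yield line
        else:
            time.sleep(poll_seconds)  # nothing new yet, wait and retry


def parse_line(line, facility, instance):
    """Parse 'METHOD PATH STATUS SIZE ELAPSED' (hypothetical format)."""
    method, path, status, size, elapsed = line.split()
    return {
        "facility": facility,
        "instance": instance,
        "method": method,
        "path": path,
        "status": status,
        "size": int(size),
        "elapsed": float(elapsed),
    }


def ship(lines, insert, facility, instance):
    """Pass each parseable line to `insert`; return how many were shipped."""
    shipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = parse_line(line, facility, instance)
        except ValueError:
            continue  # skip malformed lines rather than stop the tailer
        insert(doc)
        shipped += 1
    return shipped
```

With PyMongo, `insert` could be a collection's `insert_one` method and `lines` the generator returned by `follow`. A production tailer would also need to handle log rotation and partially written lines, which this sketch ignores.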
  20. MONGO SHARDING
     • All servers host the «logs» mongos on port 27002.
     • All daemons push their logs to «localhost:27002».
     • The actual storage is a capped collection in a non-sharded database.
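A capped collection has to be created explicitly with a fixed size. The sketch below shows one way to do that from Python, assuming a PyMongo-style database object that exposes `list_collection_names()` and `create_collection()`. The helper names, the collection name and the sizes are hypothetical; the slides do not give the actual values.

```python
def capped_options(size_bytes, max_docs=None):
    """Return the options for creating a capped collection."""
    if size_bytes <= 0:
        raise ValueError("a capped collection needs a positive size in bytes")
    options = {"capped": True, "size": size_bytes}
    if max_docs is not None:
        options["max"] = max_docs  # optional cap on the number of documents
    return options


def ensure_capped(db, name, size_bytes, max_docs=None):
    """Create the capped collection if it is missing, otherwise reuse it.

    `db` is assumed to behave like a PyMongo Database object.
    """
    if name in db.list_collection_names():
        return db[name]
    return db.create_collection(name, **capped_options(size_bytes, max_docs))
```

A daemon could call `ensure_capped(client["logs"], "entries", 1024 ** 3)` once at startup, where `client` would be connected to `localhost:27002` as on the slide; the database and collection names here are placeholders.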
  22. LOG USAGE ON A DAILY BASIS «the needle in the fir stack»
     • Jesús García Ferrer
  23. SAPIN: EXCEPTION LOGGING
     • View latest errors
  24. SAPIN: EXCEPTION LOGGING
     Useful information:
     • Source URL and parameters
     • Date and time
     • Browser identifiers (IP, cookie values, User-Agent)
     • Full stack dump
     • Full headers dump
     • Full user model dump
  25. SAPIN: EXCEPTION LOGGING
     • Searching in exceptions
  26. RAMPLR: SAMPLING ANALYSIS
     • Sample analysis
  27. SAPIN: REALTIME LOGGING
     • jQuery UI based interface
     • Sinatra-backed
     • Filter by facility
     • Search criteria: IP address, follow Operation-ID
     • Display HTTP execution timeline
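The filters listed above map naturally onto a MongoDB query document. As a minimal sketch, assuming hypothetical field names `ip` and `operation_id` (only `facility` appears in the sample log document), a backend could build the filter like this:

```python
def build_filter(facility=None, ip=None, operation_id=None):
    """Build a MongoDB query document from the optional UI filters."""
    query = {}
    if facility:
        query["facility"] = facility          # field shown in the sample document
    if ip:
        query["ip"] = ip                      # hypothetical field name
    if operation_id:
        query["operation_id"] = operation_id  # hypothetical field name
    return query
```

The result could then be passed to a PyMongo `find()` call. Since a capped collection keeps documents in insertion order, sorting on `$natural` in descending order would return the most recent entries first.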
  28. SAPIN: REALTIME LOGGING
     • Facility filtering
  29. SAPIN: REALTIME LOGGING
     • URL filtering
  30. SAPIN: REALTIME LOGGING
     • IP address filtering
  31. SAPIN: REALTIME LOGGING
     • Operation ID filtering
  32. SAPIN: REALTIME LOGGING
     • Timeline display
  33. ISSUE WITH MONGODB
     • Scalability of using a capped collection
       • Official doc says no indices
     • Size limit vs index efficiency (400 000 lines cover less than 2 hours of log): our plan is to have 2 days' worth of logs.
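The gap between the current capacity and the 2-day goal is simple arithmetic. The sketch below uses only the figures from the slide; the function name is hypothetical, and because 400 000 lines cover less than 2 hours, the result is a lower bound.

```python
import math


def lines_needed(observed_lines, observed_hours, target_hours):
    """Scale an observed line count linearly to a longer retention window."""
    return math.ceil(observed_lines * target_hours / observed_hours)


# 400 000 lines per 2 hours, scaled to 48 hours (2 days)
print(lines_needed(400_000, 2, 48))  # prints 9600000
```

Keeping 2 days of logs would therefore mean holding at least 24 times the current volume, which is why the trade-off between collection size and index efficiency matters.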
  34. FUTURE WORK «to infinity and beyond»
     • The Library of Congress
  35. FUTURE WORK
     • Leaner interface
       • The current one is ugly and jQuery UI based; should switch to the Sencha framework
     • Keep more log
       • Abandon capped collections
       • Keep logs longer, one collection per day(?)
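The "one collection per day" idea is only floated in the slides, so the following is a sketch of how it might work rather than a description of what was built. It assumes the `date` field holds milliseconds since the Unix epoch and uses a hypothetical `logs_YYYYMMDD` naming scheme in UTC.

```python
from datetime import datetime, timezone

DAY_MS = 86_400_000  # milliseconds in one day


def daily_collection_name(date_ms, prefix="logs_"):
    """Map a millisecond timestamp to a per-day collection name (UTC)."""
    day = datetime.fromtimestamp(date_ms / 1000, tz=timezone.utc)
    return prefix + day.strftime("%Y%m%d")


def collections_to_drop(names, now_ms, keep_days=2, prefix="logs_"):
    """List the daily collections that fall outside the retention window."""
    keep = {daily_collection_name(now_ms - d * DAY_MS, prefix)
            for d in range(keep_days)}
    return sorted(n for n in names if n.startswith(prefix) and n not in keep)


# The sample log entry's date falls on 14 March 2011 (UTC)
print(daily_collection_name(1300107724534))  # prints logs_20110314
```

Dropping a whole collection is a cheap way to expire old logs, and regular collections can carry indices, which addresses the limitation noted for capped collections.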
  36. QUESTIONS? «I bid you: farewell.»
     • Great Beyond