MONGODB AS A LOG COLLECTOR



                                     photo by Jean-Michel BAUD




   Pierre Baillet & Mathieu Poumeyrol
        oct & kali @ fotopedia.com
db.slides.find({'type':'title'})


Fotopedia: who we are, what we do, how we do it

MongoDB at Fotopedia, current state of our art

Logging, the answer to life, the universe and everything

How we fulfilled this need

Log usage on a daily basis

Future work
FOTOPEDIA
«Family photos»
FOTOPEDIA
                WHO ARE WE ?

Company created in 2006

Located in Paris, near the Opéra

17 people, including 8 regular MongoDB users (aka developers)

we’re hiring
FOTOPEDIA
             WHAT DO WE DO ?
Images for Humanity

Open to anyone, amateur or professional

Creative Commons aware

Beautiful Wikipedia (http://www.fotopedia.com)

iPad tablebooks (iPhone too): Heritage, National Parks and
Memory of Color
INFRASTRUCTURE


Based on Amazon Web Services

Around 20 servers located in US datacenters

Use centralized deployment procedure (Chef)

Deploy at least once a week with no downtime
KEY TECHNOLOGIES

Ruby on Rails (with REE)   Lackr (in-house Java proxy)


Unicorn                    Sinatra


Varnish                    Redis and Resque


HAProxy                    MySQL


NGinx                      MongoDB
MONGODB AT FOTOPEDIA
«C:\Users\fotopedia\My Documents»
CURRENT STATE OF OUR ART



Last year's talk was about our MongoDB-powered metacache

It stores complete Wikipedia data in more than 10 languages

Since spring 2010, all new database-centric features have
been developed with MongoDB

Our goal: slowly migrate all DB features to MongoDB
whenever possible
MYSQL MIGRATIONS
[Chart: "Alter table" series per quarter, 08/Q3 to 2011; vertical axis from 0 to 30]
OUR SETUP

4 clusters (business data, log and reporting, Wikipedia, and
one more)

3 EC2 XL virtual machines hosting 5 replica sets

Currently, one machine is master for all replica sets

Each of the 5 replica sets is allocated to one of the clusters

Every instance runs the 4 mongos processes
SOME FIGURES


In production since September 2009

Wikipedia data: wikipedia/en is 5 GB and 8M documents (plus
about 10 other languages); batch load: 17k inserts/s

Webcache: 2 GB, 11M records, average 60 op/s, peak 300 op/s

Overall average: 250 op/s
jm3




LOGGING
 «the eye of the lynx»
ORIGINAL PHILOSOPHY

 Log everything, don’t delete

 Collected by Scribe

 Comprehensive daily log stored in AWS S3

 Hadoop jobs to generate statistics

 grep and its merry friends for investigating issues

Quite efficient, but cumbersome and slow
WHY IMPROVE


Issue analysis in realtime (debugging)

Realtime activity analysis

  Traffic spikes

  Misbehaving crawlers and other suspicious activity
ORIGINAL STACK LAYOUT
Stefano Constanzo




HOW WE SOLVED THIS ISSUE
      «demons and wonders»
NORMALIZED LOG FORMAT

{ "_id" : ObjectId("4d7e11cc7ea68d34fb01f2ac2"),
  "facility" : "varnish",
  "instance" : "a01",
  "date" : NumberLong("1300107724534"),
  "http_host" : "www.fotopedia.com",
  "method" : "GET",
  "http_version" : "HTTP/1.1",
  "path" : "/albums/fotopedia-fr-Cath%C3%A9drale_m%C3%A9tropolitaine_de_Buenos_Aires",
  "status" : "404",
  "size" : 13,
  "elapsed" : 0.00007748600182821974 }
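
A minimal sketch, in Python, of how a raw access-log line might be turned into a document of this shape. The input line layout, the regular expression, and the `normalize` function are assumptions made for illustration; the slides do not show the actual parsers. The field names and types follow the sample above (note that `status` is a string there, while `size` and `elapsed` are numbers).

```python
import re
from typing import Optional

# Hypothetical input layout (an assumption, not Fotopedia's real format):
#   <host> "<METHOD> <path> <HTTP/x.y>" <status> <size> <elapsed-seconds>
LINE_RE = re.compile(
    r'(?P<http_host>\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<http_version>HTTP/\d\.\d)" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<elapsed>\d+(?:\.\d+)?)$'
)


def normalize(line: str, facility: str, instance: str, now_ms: int) -> Optional[dict]:
    """Return a normalized log document, or None if the line does not parse."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "facility": facility,            # which daemon produced the line
        "instance": instance,            # which server it ran on
        "date": now_ms,                  # milliseconds since the Unix epoch
        "http_host": fields["http_host"],
        "method": fields["method"],
        "http_version": fields["http_version"],
        "path": fields["path"],
        "status": fields["status"],      # kept as a string, as in the sample
        "size": int(fields["size"]),
        "elapsed": float(fields["elapsed"]),
    }


if __name__ == "__main__":
    raw = 'www.fotopedia.com "GET /albums/example HTTP/1.1" 404 13 0.000077'
    print(normalize(raw, "varnish", "a01", 1300107724534))
```

Using one shape for every facility is what lets a single viewer filter across Varnish, HAProxy and application logs with the same queries.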
LOG COLLECTING

File logging daemons (NGinx, HAProxy)

  Ruby tailer script

Memory logging daemons (Varnish)

  Dedicated binary that streams the Varnish shared-memory log (SHM) into MongoDB

Other daemons (Lackr, Picor)

  Extended logging system to store data in MongoDB

  Also logs Ruby exceptions into MongoDB
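
The file-tailing approach can be sketched as follows. This is an illustration in Python rather than the Ruby script mentioned above, and the `follow` function, its parameters, and the polling strategy are all assumptions. It does not handle log rotation, which a production tailer would need to.

```python
import os
import time
from typing import Iterator, Optional


def follow(path: str,
           poll_interval: float = 0.2,
           max_idle_polls: Optional[int] = None,
           from_start: bool = False) -> Iterator[str]:
    """Yield complete lines appended to `path`, similar to `tail -f`.

    max_idle_polls: stop after this many consecutive empty reads
                    (None means follow forever).
    from_start:     read existing content too instead of skipping to the end.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        if not from_start:
            handle.seek(0, os.SEEK_END)   # only report lines written from now on
        idle = 0
        pending = ""                      # holds a partially written line
        while max_idle_polls is None or idle < max_idle_polls:
            chunk = handle.readline()
            if not chunk:
                idle += 1
                time.sleep(poll_interval)
                continue
            idle = 0
            pending += chunk
            if pending.endswith("\n"):
                yield pending.rstrip("\n")
                pending = ""


if __name__ == "__main__":
    import tempfile
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("first\nsecond\n")
    for line in follow(tmp.name, poll_interval=0, max_idle_polls=1, from_start=True):
        print(line)                       # each line would be parsed and inserted
    os.remove(tmp.name)
```

Each yielded line would then be parsed into the normalized format and pushed to the local mongos.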
MONGO SHARDING


All servers host the «logs» mongos on port 27002.

All daemons push their logs to «localhost:27002»

The actual storage is a capped collection in a non-sharded
database.
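
To illustrate why a capped collection suits a rolling log, here is a small in-memory model of its behavior in Python. It is only an analogy: `CappedLog` is a hypothetical class, and it caps by document count, whereas a real MongoDB capped collection is created with a fixed size in bytes (in the mongo shell, `db.createCollection(name, {capped: true, size: <bytes>})`). What it shares with the real thing is insertion order and silent eviction of the oldest entries.

```python
from collections import deque


class CappedLog:
    """In-memory stand-in for a capped collection (illustration only)."""

    def __init__(self, max_docs: int) -> None:
        # A deque with maxlen drops the oldest item when a new one overflows it.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def tail(self, count: int) -> list:
        """Return the `count` most recent documents, oldest first."""
        if count <= 0:
            return []
        return list(self._docs)[-count:]

    def __len__(self) -> int:
        return len(self._docs)


if __name__ == "__main__":
    log = CappedLog(max_docs=3)
    for n in range(5):
        log.insert({"n": n})
    print(log.tail(3))   # the two oldest documents have been evicted
```

The cost of this design is visible in the model: retention is bounded by capacity, not by time, so a traffic spike shortens the window of history that is kept.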
CURRENT STACK LAYOUT
Jesús García Ferrer




LOG USAGE ON A DAILY BASIS
    «the needle in the fir stack»
SAPIN: EXCEPTION LOGGING

        View Latest Errors
SAPIN: EXCEPTION LOGGING

                     Useful information:

                 • Source URL and parameters
                 • Date and time
                 • Browser identifiers (IP, cookie values, User-Agent)
                 • Full stack dump
                 • Full headers dump
                 • Full user model dump
SAPIN: EXCEPTION LOGGING

       Searching in Exceptions
RAMPLR: SAMPLING ANALYSIS




Sample analysis
SAPIN: REALTIME LOGGING


jQuery UI based interface

Sinatra backend

Filter by facility

Search criteria: IP address, operation ID (to follow a request)

Displays the HTTP execution timeline
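
The filters listed above map naturally onto a MongoDB query document. The sketch below, in Python, builds such a document from optional criteria. The `build_filter` function is hypothetical, and so are the `ip` and `op_id` field names, since the sample log document shown earlier does not include them; `facility` and `path` do come from that sample.

```python
import re
from typing import Optional


def build_filter(facility: Optional[str] = None,
                 ip: Optional[str] = None,
                 op_id: Optional[str] = None,
                 path_prefix: Optional[str] = None) -> dict:
    """Build a MongoDB query document from whichever criteria are set."""
    query: dict = {}
    if facility is not None:
        query["facility"] = facility
    if ip is not None:
        query["ip"] = ip                  # hypothetical field name
    if op_id is not None:
        query["op_id"] = op_id            # hypothetical field name
    if path_prefix is not None:
        # Anchored regex so that only paths starting with the prefix match.
        query["path"] = {"$regex": "^" + re.escape(path_prefix)}
    return query


if __name__ == "__main__":
    print(build_filter(facility="varnish", ip="192.0.2.10"))
    print(build_filter(path_prefix="/albums/"))
```

With a driver such as PyMongo, the resulting dictionary could be passed as the filter argument of `find` on the log collection.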
SAPIN: REALTIME LOGGING

        Facility Filtering
SAPIN: REALTIME LOGGING

         URL Filtering
SAPIN: REALTIME LOGGING

       IP Address Filtering
SAPIN: REALTIME LOGGING

       Operation ID Filtering
SAPIN: REALTIME LOGGING

        Timeline display
ISSUES WITH MONGODB


Scalability of using a capped collection

  Official documentation says no indices

Size limit vs. index efficiency: 400,000 lines hold less than 2 hours of
logs, and our plan is to keep 2 days' worth of logs.
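
The gap between the current capacity and the 2-day goal can be worked out directly from these figures. The helper below is a hypothetical Python sketch; since 400,000 lines cover less than 2 hours, the result is a lower bound.

```python
import math


def lines_needed(observed_lines: int, observed_hours: float, target_hours: float) -> int:
    """Scale an observed line count linearly to a longer retention window."""
    return math.ceil(observed_lines * target_hours / observed_hours)


if __name__ == "__main__":
    # 400,000 lines per (at most) 2 hours, scaled to 2 days = 48 hours.
    print(lines_needed(400_000, 2, 48))   # 9600000
```

In other words, keeping 2 days of logs means holding at least 24 times as many lines as the current collection does.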
The Library of Congress




FUTURE WORK
 «to infinity and beyond»
FUTURE WORK

Leaner interface

  The current one is ugly and jQuery UI based; we should switch to the
  Sencha framework

Keep more logs

  Abandon capped collections

  Keep logs longer, one collection per day (?)
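
A sketch of what "one collection per day" could look like, in Python. The `logs_` prefix, the `YYYYMMDD` naming scheme, and both functions are assumptions for illustration; the slides only raise the idea. The collection is derived from the document's `date` field (milliseconds since the epoch, in UTC), and retention becomes a matter of dropping whole collections that fall outside the window.

```python
from datetime import datetime, timezone
from typing import Iterable, List

DAY_MS = 24 * 60 * 60 * 1000


def daily_collection_name(date_ms: int, prefix: str = "logs_") -> str:
    """Name of the collection that should hold a document with this `date`."""
    day = datetime.fromtimestamp(date_ms / 1000, tz=timezone.utc)
    return prefix + day.strftime("%Y%m%d")


def collections_to_drop(existing: Iterable[str], today_ms: int,
                        keep_days: int = 2, prefix: str = "logs_") -> List[str]:
    """Daily collections older than the retention window, oldest first."""
    oldest_kept = daily_collection_name(today_ms - (keep_days - 1) * DAY_MS, prefix)
    # YYYYMMDD names sort chronologically, so string comparison is enough.
    return sorted(name for name in existing
                  if name.startswith(prefix) and name < oldest_kept)


if __name__ == "__main__":
    sample_date = 1300107724534   # the `date` value from the sample document
    print(daily_collection_name(sample_date))   # logs_20110314
    names = ["logs_20110312", "logs_20110313", "logs_20110314"]
    print(collections_to_drop(names, sample_date, keep_days=2))
```

Unlike a capped collection, ordinary daily collections can carry whatever indexes the queries need, and dropping a collection is cheaper than deleting its documents one by one.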
Great Beyond




QUESTIONS?
 «I say to you: goodbye.»

MongoFr : MongoDB as a log Collector


Editor's Notes

  • #3 Pierre Baillet, server architect; Mathieu Poumeyrol, director of cloud engineering
  • #6 Next slide is what we do
  • #7 Next slide is about how we do it
  • #8 Next slide is about key technologies
  • #9 Last slide of the section
  • #12 Last slide of the section
  • #16 Next slide is why we should improve
  • #17 Next slide shows the original logging layout
  • #18 Last slide of the section
  • #23 Last slide of the section
  • #25 Details on next slide
  • #26 Searching in exceptions on next slide
  • #27 Next slide is about sampling and Ramplr
  • #28 Next slide is about technologies used in Sapin
  • #29 Next slide is about facility filtering
  • #30 Describe Sapin facility: column selection, reloading, list of facilities. Next slide is about URL filtering
  • #31 Next slide is about URL filtering
  • #32 Next slide details an op-id session
  • #33 Next slide shows a timeline
  • #34 Next slide is about current issues
  • #35 Last slide of the section
  • #37 Last slide of the presentation and of the section before the questions