MongoFr : MongoDB as a log Collector

MONGODB AS A LOG COLLECTOR

photo by Jean-Michel BAUD

Pierre Baillet & Mathieu Poumeyrol
oct & kali @ fotopedia.com

db.slides.find({'type':'title'})

Fotopedia: who we are, what we do, how we do it

MongoDB at Fotopedia, current state of our art

Logging, the answer to life, the universe and everything

How we fulfilled this need

Log usage on a daily basis

Future work

FOTOPEDIA
«Family photos»

FOTOPEDIA
WHO ARE WE ?

Company created in 2006

Located in Paris, near the Opéra

17 people, including 8 regular MongoDB users (aka developers)

we’re hiring

FOTOPEDIA
WHAT DO WE DO ?
Images for Humanity

Open to anyone, amateur or professional

Creative Commons aware

Beautiful Wikipedia (http://www.fotopedia.com)

iPad tablebooks (iPhone too): Heritage, National Parks and
Memory of Color

INFRASTRUCTURE

Based on Amazon Web Services

Around 20 servers located in the US datacenters

Use centralized deployment procedure (Chef)

Deploy at least once a week with no downtime

KEY TECHNOLOGIES

Ruby on Rails (with REE)

Unicorn

Varnish

HAProxy

NGinx

Lackr (in-house Java proxy)

Sinatra

Redis and Resque

MySQL

MongoDB

MONGODB AT FOTOPEDIA
«C:\Utilisateurs\fotopedia\Mes Documents»

CURRENT STATE OF OUR ART

Last year's talk covered our MongoDB-powered metacache

It stores complete Wikipedia data in more than 10 languages

Since spring 2010, all new database-centric features have
been developed with MongoDB

Our goal: slowly migrate all DB features to MongoDB
whenever possible

MYSQL MIGRATIONS
[Chart: number of MySQL "alter table" migrations per quarter, from 08/Q3 to 2011; vertical axis from 0 to 30]

OUR SETUP

4 clusters (business data, log and reporting, wikipedia, and
one more)

3 EC2 XL virtual machines hosting 5 replica sets

Currently, one machine is master for all replica sets

Each of the 5 replica sets is allocated to one of the clusters

Every instance runs the 4 mongos

SOME FIGURES

In production since September 2009

Wikipedia data: wikipedia/en is 5GB and 8M documents (plus
about 10 other languages); batch load: 17k inserts/s

Webcache: 2GB, 11M records, avg 60 op/s, peak 300 op/s

Overall average: 250 op/s

jm3

LOGGING
«The eye of the lynx»

ORIGINAL PHILOSOPHY

Log everything, don’t delete

Collected by Scribe

Comprehensive daily log stored in AWS S3

Hadoop jobs to generate statistics

grep and its merry friends for investigating issues

Quite efficient, but cumbersome and slow

WHY IMPROVE

Issue analysis in realtime (debugging)

Realtime activity analysis

Traffic spikes

Misbehaving crawlers and other suspicious activity

Stefano Constanzo

HOW WE SOLVED THIS ISSUE
«Demons and wonders»

NORMALIZED LOG FORMAT

{ "_id" : ObjectId("4d7e11cc7ea68d34fb01f2ac2"),
  "facility" : "varnish",
  "instance" : "a01",
  "date" : NumberLong("1300107724534"),
  "http_host" : "www.fotopedia.com",
  "method" : "GET",
  "http_version" : "HTTP/1.1",
  "path" : "/albums/fotopedia-fr-Cath%C3%A9drale_m%C3%A9tropolitaine_de_Buenos_Aires",
  "status" : "404",
  "size" : 13,
  "elapsed" : 0.00007748600182821974 }
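A minimal sketch of how a collector might build a document in this format, written in Python for illustration only (the deck does not show the collectors' code). The function name `make_log_document` and its parameters are assumptions; the field names, the millisecond `date` and the string `status` follow the sample above.

```python
import time


def make_log_document(facility, instance, http_host, method, http_version,
                      path, status, size, elapsed, timestamp=None):
    """Build a log entry shaped like the normalized format shown above.

    `timestamp` is seconds since the epoch; it defaults to the current time.
    """
    if timestamp is None:
        timestamp = time.time()
    return {
        "facility": facility,
        "instance": instance,
        # The sample stores the date as milliseconds since the epoch.
        "date": int(round(timestamp * 1000)),
        "http_host": http_host,
        "method": method,
        "http_version": http_version,
        "path": path,
        # The sample keeps the HTTP status as a string.
        "status": str(status),
        "size": int(size),
        "elapsed": float(elapsed),
    }


doc = make_log_document(
    facility="varnish", instance="a01", http_host="www.fotopedia.com",
    method="GET", http_version="HTTP/1.1", path="/albums/example",
    status=404, size=13, elapsed=0.000077, timestamp=1300107724.534,
)
```

Using the same field names for every facility is what lets a single query cover Varnish, NGinx and HAProxy entries alike.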

LOG COLLECTING

File logging daemons (NGinx, HAProxy): Ruby tailer script

Memory logging daemons (Varnish): dedicated binary that streams the Varnish SHM into MongoDB

Other daemons (Lackr, Picor): extended logging system that stores data in MongoDB

Ruby exceptions are also logged into MongoDB
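The tailer idea for file logging daemons can be sketched as follows. The original is a Ruby script that is not shown in the deck, so this is a Python rendering under assumptions: `read_new_lines`, `ship` and the `raw` field are hypothetical names, and `sink` stands in for whatever performs the MongoDB insert.

```python
import os
import tempfile


def read_new_lines(handle):
    """Return the complete lines appended to the file since the last call."""
    lines = []
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            break  # nothing more to read for now
        if not line.endswith("\n"):
            # Partial line: the daemon is still writing it, so rewind
            # and pick it up on the next call.
            handle.seek(position)
            break
        lines.append(line.rstrip("\n"))
    return lines


def ship(handle, facility, instance, sink):
    """Wrap each new line in a document and hand it to `sink`."""
    for line in read_new_lines(handle):
        sink({"facility": facility, "instance": instance, "raw": line})


# Small demonstration with a temporary file and a list as the sink.
collected = []
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "access.log")
    with open(log_path, "w") as writer, open(log_path) as reader:
        writer.write("GET /a 200\n")
        writer.flush()
        ship(reader, "nginx", "a01", collected.append)
        writer.write("GET /b 404\nGET /c")  # last line is still incomplete
        writer.flush()
        ship(reader, "nginx", "a01", collected.append)
```

A real tailer would call `ship` in a loop with a short sleep and would parse each raw line into the normalized fields before inserting it.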

MONGO SHARDING

All servers host the «logs» mongos on port 27002.

All daemons push their logs to «localhost:27002»

The actual storage is a capped collection in a non-sharded
database.
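A hedged sketch of the storage side: creating the capped collection once and reusing it afterwards. The helper name `ensure_capped_collection`, the collection name `logs` and the size are assumptions; `db` is assumed to behave like a pymongo `Database`, whose `list_collection_names` and `create_collection(name, capped=True, size=...)` calls are the only ones used.

```python
def ensure_capped_collection(db, name="logs", size_bytes=512 * 1024 * 1024):
    """Create the capped collection if it is missing and return a handle.

    `db` is assumed to behave like a pymongo Database object.
    """
    if name not in db.list_collection_names():
        # A capped collection has a fixed size; the oldest documents are
        # overwritten once the size limit is reached.
        db.create_collection(name, capped=True, size=size_bytes)
    return db[name]


# Assumed usage with pymongo against the local «logs» mongos (not run here;
# the database name is hypothetical):
#   from pymongo import MongoClient
#   db = MongoClient("localhost", 27002)["logs"]
#   ensure_capped_collection(db).insert_one({"facility": "varnish"})
```

Because every server runs its own mongos on port 27002, daemons need no configuration beyond that local address.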

Jesús García Ferrer

LOG USAGE ON A DAILY BASIS
«The needle in the fir stack»

SAPIN: EXCEPTION LOGGING

View Latest Errors


Useful information:

• Source URL and parameters

• Date and time

• Browser identifiers (IP, cookie values, User-Agent)

• Full stack dump

• Full headers dump

• Full user model dump


Searching in Exceptions

RAMPLR: SAMPLING ANALYSIS

Sample analysis

SAPIN: REALTIME LOGGING

jQuery UI based interface

Sinatra backend

Filter by facility

Searchable criteria: IP address, operation ID (to follow one operation)

Displays the HTTP execution timeline
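The filtering criteria listed above could translate into a MongoDB query document along these lines. This is a sketch, not Sapin's actual code: `build_log_filter` is a hypothetical helper, `facility` and `path` are fields from the normalized log format, and the `ip` and `operation_id` field names are assumptions since the sample document does not show them.

```python
import re


def build_log_filter(facility=None, path_prefix=None, ip=None, operation_id=None):
    """Build a MongoDB query document from the optional search criteria."""
    query = {}
    if facility:
        query["facility"] = facility
    if path_prefix:
        # Anchored prefix match on the URL path.
        query["path"] = {"$regex": "^" + re.escape(path_prefix)}
    if ip:
        query["ip"] = ip  # hypothetical field name
    if operation_id:
        query["operation_id"] = operation_id  # hypothetical field name
    return query


query = build_log_filter(facility="varnish", path_prefix="/albums/", ip="10.0.0.1")
```

The resulting dictionary can be passed as the filter argument of a `find` call on the log collection.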


Facility Filtering


URL Filtering


IP Address Filtering


Operation ID Filtering


Timeline display

ISSUE WITH MONGODB

Scalability of using a capped collection

Official doc says no indices

Size limit vs. index efficiency: 400,000 lines hold less than 2 hours of
logs, and our plan is to keep 2 days' worth of logs.
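The gap between the current capacity and the 2-day target can be worked out directly. The sketch below only restates the figures from this slide; since 400,000 lines cover less than 2 hours, the result is a lower bound. The function name is an assumption.

```python
import math


def lines_for_retention(observed_lines, observed_hours, target_hours):
    """Scale an observed line count to a longer retention window."""
    return math.ceil(observed_lines * target_hours / observed_hours)


# 400,000 lines for (at most) 2 hours, scaled to 2 days = 48 hours.
needed = lines_for_retention(400_000, 2, 48)
growth_factor = needed / 400_000
```

With these inputs the collection would have to hold at least 24 times as many lines as it does today.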

The Library of Congress

FUTURE WORK
«To infinity and beyond»

FUTURE WORK

Leaner interface

The current one is ugly and jQuery UI based; it should switch to the
Sencha framework

Keep more logs

Abandon capped collections

Keep logs longer, one collection per day (?)

Great Beyond
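One way the "one collection per day" idea could work is to derive the collection name from each entry's millisecond `date` field. This is only a sketch of a possible scheme, not a decided design: the `logs_` prefix, the UTC day boundary and the function name are all assumptions.

```python
from datetime import datetime, timezone


def daily_collection_name(date_ms, prefix="logs_"):
    """Map a millisecond timestamp to a per-day collection name (UTC)."""
    day = datetime.fromtimestamp(date_ms / 1000, tz=timezone.utc)
    return prefix + day.strftime("%Y%m%d")


# The date from the sample log document shown earlier in the deck.
name = daily_collection_name(1300107724534)
```

With such a scheme, expiring old logs amounts to dropping whole collections, and indices stay bounded by one day of traffic.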

QUESTIONS ?
«I bid you: goodbye.»
