SlideShare a Scribd company logo
MongoDB at fotopedia
         Timeline storage
   Our Swiss Army Knife Database
MongoDB at fotopedia

•   Context

•   Wikipedia data storage

•   Metacache
Fotopedia
•   Fotonauts, an American/French company

•   Photo — Encyclopedia

•   Heavily interconnected system : flickr,
    facebook, wikipedia, picassa, twitter…

•   MongoDB in production since last october

•   main store lives in MySQL… for now
First contact


•   Wikipedia imported data
Wikipedia queries

•   wikilinks from one article

•   links to one article

•   geo coordinates

•   redirect

•   why not use wikipedia API ?
Download
~ 5.7GB gzip
XML



        Geo
        Redirect
        Backlink
        Related
    ~12GB tabular data
Problem


Load ~12GB into a K/V store
CouchDB 0.9 attempt


•   CouchDB had no dedicated import tool

•   need to go through HTTP / Rest API
“DATA LOADING”



   LOADING!




   (obviously hijacked from xkcd.com)
Problem, rephrased


  Load ~12GB into any K/V store
        in hours, not days
Hadoop HBase ?

•   as we were already using Hadoop Map/
    Reduce for preparation

•   bulk load was just emerging at that time,
    requiring to code against HBase private APIs,
    generate the data in an ad-hoc binary
    format, ...
photo by neural.it on Flickr
Problem, rerephrased
      Load ~12GB into any K/V store
            in hours, not days
  without wasting a week on development
       and another week on setup
      and several months on tuning
                please ?
MongoDB attempt
•   Transforming the tabular data into a JSON
    form : about half an hour or code, 45 minutes
    of hadoop parallel processing

•   setup mongo server : 15 minutes

•   mongoimport : 3 minutes to start it, 90 minutes
    to run

•   plug RoR app on mongo : minutes

•   prototype was done in a day
Batch   Synchronous


  Download
  ~ 5.7GB gzip



    Geo
    Redirect
    Backlink      Ruby on Rails
    Related
~12GB, 12M docs
Hot swap ?
•   Indexing was locking everything.

•   Just run two instances of MongoDB.
    •   One instance is servicing the web app

    •   One instance is asleep or loading data

•   One third instance knows the status of the two
    instances.
We loved:
•   JSON import format

•   efficiency of mongoimport

•   simple and flexible installation
    •   just one cumbersome dependency

    •   easy to start (we use runit)

    •   easy to have several instances on one box
Second contact

•   itʼs just all about graphes, anyway.
    •   wikilinks

    •   people following people

    •   related community albums

    •   and soon, interlanguage links
all about graphes...

•   ... and itʼs also all about cache.

•   The application needs to “feel” faster, letʼs
    cache more.

•   The application needs to “feel” right, so letʼs
    cache less.

•   or — big sigh — invalidate.
Page fragment caching
                                                 photo by Mykl Roventine on Flickr


    Nginx SSI


                     photo by Aires Dos Santos




Varnish HTTP cache



  RoR application                                  photo by Leslie Chatfield on Flickr
Haiku ?

There are only two hard things
in Computer Science:
cache invalidation and naming things.



                                        Phil Karlton
Naming things


•   REST have been a strong design principle in
    fotopedia since the early days, and the efforts
    are paying.
/en/2nd_arrondissement_of_Paris




  /en/Paris/fragment/left_col

 /users/john/fragment/contrib




  /en/Paris/fragment/related
Invalidating


•   Rest allows us to invalidate by URL prefix.

•   When the Paris album changes, we have to
    invalidate /en/Paris.*
Varnish invalidation

•   Varnish built-in regexp based invalidation is
    not designed for intensive, fine grained
    invalidation.

•   We need to invalidate URLs individually.
/en/Paris.*

/en/Paris

/en/Paris/fragment/left_col

/en/Paris/photos.json?skip=0&number=20

/en/Paris/photos.json?skip=13&number=27
Metacache workflow
                                           /en/Paris.*


                    invalidation worker

    Nginx SSI
                      /en/Paris
                      /en/Paris/fragment/left_col
                      /en/Paris/photos.json?skip=0&number=20
                      /en/Paris/photos.json?skip=13&number=27
Varnish HTTP cache

             varnish log
                                 /en/Paris/fragment/left_col

  RoR application
                         metacache feeder
Waw.

•   This time we are actually using MongoDB as a
    BTree. Impressive.

•   The metacache has been running fine for
    several months, and we want to go further.
Invalidate less
•   We need to be more specific as to what we
    invalidate.

•   Today, if somebody votes on a photo in the
    Paris album, we invalidate all /en/Paris prefix,
    and most of it is unchanged.

•   We will move towards a more clever
    metacache.
Metacache reloaded
•   Pub/Sub metacache

•   Have the backend send a specific header to
    be caught by the metacache-feeder, conaining
    “subscribe” message.

•   This header will be a JSON document, to be
    pushed to the metacache.

•   The purge commands will be mongo search
    queries.
/en/Paris

      /en/Paris/fragment/left_col

      /en/Paris/photos.json?skip=0&number=20

      /en/Paris/photos.json?skip=13&number=27



{url:/en/Paris, observe:[summary,links]}

{url:/en/Paris/fragment/left_col, observe: [cover]}

{url:/en/Paris/photos.json?skip=0&number=20,
observe:[photos]}

{url:/en/Paris/photos.json?skip=0&number=20,
observe:[photos]}
when somebody votes
       { url:/en/Paris.*, observe:photos }

               when the summary changes
       { url:/en/Paris.*, observe:summary }

              when the a new link is created
        { url:/en/Paris.*, observe:links }


{url:/en/Paris, observe:[summary,links]}

{url:/en/Paris/fragment/left_col, observe: [cover]}

{url:/en/Paris/photos.json?skip=0&number=20,
observe:[photos]}

{url:/en/Paris/photos.json?skip=0&number=20,
observe:[photos]}
Other uses cases

•   Timeline activities storage: just one more BTree
    usage.

•   Moderation workflow data: tiny dataset, but more
    complex queries, map/reduce.

•   Suspended experimentation around log
    collection and analysis
Current situation

•   Mysql: main data store

•   CouchDB: old timelines (+ chef)

•   MongoDB: metacache, wikipedia, moderation,
    new timelines

•   Redis: raw data cache for counters, recent
    activity (+ resque)
What about the main store ?


  •   albums are good fit for documents

  •   votes and score may be more tricky

  •   recent introduction of resque
In short
•   Simple, fast.

•   Hackable: in a language most can read.

•   Clear roadmap.

•   Very helpful and efficient team.

•   Designed with application developer needs in
    mind.

More Related Content

What's hot

The Future of Bundled Bundler
The Future of Bundled BundlerThe Future of Bundled Bundler
The Future of Bundled Bundler
Hiroshi SHIBATA
 
Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
SATOSHI TAGOMORI
 
Code4 lib 20141129 python
Code4 lib 20141129 pythonCode4 lib 20141129 python
Code4 lib 20141129 python
tdsmithCapU
 
Gems on Ruby
Gems on RubyGems on Ruby
Gems on Ruby
Hiroshi SHIBATA
 
The Future of Dependency Management for Ruby
The Future of Dependency Management for RubyThe Future of Dependency Management for Ruby
The Future of Dependency Management for Ruby
Hiroshi SHIBATA
 
Roadmap for RubyGems 4 and Bundler 3
Roadmap for RubyGems 4 and Bundler 3Roadmap for RubyGems 4 and Bundler 3
Roadmap for RubyGems 4 and Bundler 3
Hiroshi SHIBATA
 
Productive web applications that run only on the frontend
Productive web applications that run only on the frontendProductive web applications that run only on the frontend
Productive web applications that run only on the frontend
Stefan Adolf
 
Catmandu presentation at SWIB 2013
Catmandu presentation at SWIB 2013Catmandu presentation at SWIB 2013
Catmandu presentation at SWIB 2013nicsteenlant
 
Apache Camel K - Fredericia
Apache Camel K - FredericiaApache Camel K - Fredericia
Apache Camel K - Fredericia
Claus Ibsen
 
Gems on Ruby
Gems on RubyGems on Ruby
Gems on Ruby
Hiroshi SHIBATA
 
Developing OpenResty Framework
Developing OpenResty FrameworkDeveloping OpenResty Framework
Developing OpenResty Framework
OpenRestyCon
 
Dependency Resolution with Standard Libraries
Dependency Resolution with Standard LibrariesDependency Resolution with Standard Libraries
Dependency Resolution with Standard Libraries
Hiroshi SHIBATA
 
Be a microservices hero
Be a microservices heroBe a microservices hero
Be a microservices hero
OpenRestyCon
 
Not only SQL
Not only SQL Not only SQL
Not only SQL
Niklas Gustavsson
 
The Future of library dependency management of Ruby
 The Future of library dependency management of Ruby The Future of library dependency management of Ruby
The Future of library dependency management of Ruby
Hiroshi SHIBATA
 
How to distribute Ruby to the world
How to distribute Ruby to the worldHow to distribute Ruby to the world
How to distribute Ruby to the world
Hiroshi SHIBATA
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2
Claus Ibsen
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
SATOSHI TAGOMORI
 
How to distribute Ruby to the world
How to distribute Ruby to the worldHow to distribute Ruby to the world
How to distribute Ruby to the world
Hiroshi SHIBATA
 
OSS Security the hard way
OSS Security the hard wayOSS Security the hard way
OSS Security the hard way
Hiroshi SHIBATA
 

What's hot (20)

The Future of Bundled Bundler
The Future of Bundled BundlerThe Future of Bundled Bundler
The Future of Bundled Bundler
 
Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Code4 lib 20141129 python
Code4 lib 20141129 pythonCode4 lib 20141129 python
Code4 lib 20141129 python
 
Gems on Ruby
Gems on RubyGems on Ruby
Gems on Ruby
 
The Future of Dependency Management for Ruby
The Future of Dependency Management for RubyThe Future of Dependency Management for Ruby
The Future of Dependency Management for Ruby
 
Roadmap for RubyGems 4 and Bundler 3
Roadmap for RubyGems 4 and Bundler 3Roadmap for RubyGems 4 and Bundler 3
Roadmap for RubyGems 4 and Bundler 3
 
Productive web applications that run only on the frontend
Productive web applications that run only on the frontendProductive web applications that run only on the frontend
Productive web applications that run only on the frontend
 
Catmandu presentation at SWIB 2013
Catmandu presentation at SWIB 2013Catmandu presentation at SWIB 2013
Catmandu presentation at SWIB 2013
 
Apache Camel K - Fredericia
Apache Camel K - FredericiaApache Camel K - Fredericia
Apache Camel K - Fredericia
 
Gems on Ruby
Gems on RubyGems on Ruby
Gems on Ruby
 
Developing OpenResty Framework
Developing OpenResty FrameworkDeveloping OpenResty Framework
Developing OpenResty Framework
 
Dependency Resolution with Standard Libraries
Dependency Resolution with Standard LibrariesDependency Resolution with Standard Libraries
Dependency Resolution with Standard Libraries
 
Be a microservices hero
Be a microservices heroBe a microservices hero
Be a microservices hero
 
Not only SQL
Not only SQL Not only SQL
Not only SQL
 
The Future of library dependency management of Ruby
 The Future of library dependency management of Ruby The Future of library dependency management of Ruby
The Future of library dependency management of Ruby
 
How to distribute Ruby to the world
How to distribute Ruby to the worldHow to distribute Ruby to the world
How to distribute Ruby to the world
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
How to distribute Ruby to the world
How to distribute Ruby to the worldHow to distribute Ruby to the world
How to distribute Ruby to the world
 
OSS Security the hard way
OSS Security the hard wayOSS Security the hard way
OSS Security the hard way
 

Similar to Mongodb, our Swiss Army Knife Database

VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline
VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline
VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline
Jan Jongboom
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
Divante
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
Konstantin Gredeskoul
 
Practical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.jsPractical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.js
async_io
 
Optimization of modern web applications
Optimization of modern web applicationsOptimization of modern web applications
Optimization of modern web applications
Eugene Lazutkin
 
A Tale of 2 Systems
A Tale of 2 SystemsA Tale of 2 Systems
A Tale of 2 Systems
David Newman
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017
ElifTech
 
Storm crawler apachecon_na_2015
Storm crawler apachecon_na_2015Storm crawler apachecon_na_2015
Storm crawler apachecon_na_2015
ontopic
 
Isomorphic JavaScript with Node, WebPack, and React
Isomorphic JavaScript with Node, WebPack, and ReactIsomorphic JavaScript with Node, WebPack, and React
Isomorphic JavaScript with Node, WebPack, and React
Tyler Peterson
 
Introduce flux & react in practice
Introduce flux & react in practiceIntroduce flux & react in practice
Introduce flux & react in practice
Hsuan Fu Lien
 
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Rhio Kim
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - Linagora
LINAGORA
 
They why behind php frameworks
They why behind php frameworksThey why behind php frameworks
They why behind php frameworks
Kirk Madera
 
Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)lauraxthomson
 
Crash reports pycodeconf
Crash reports pycodeconfCrash reports pycodeconf
Crash reports pycodeconflauraxthomson
 
bol.com Dutch Container Day presentation
bol.com Dutch Container Day presentationbol.com Dutch Container Day presentation
bol.com Dutch Container Day presentation
Maarten Dirkse
 
Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18Pierre Joye
 
Client Side Performance for Back End Developers - Cambridge .NET User Group -...
Client Side Performance for Back End Developers - Cambridge .NET User Group -...Client Side Performance for Back End Developers - Cambridge .NET User Group -...
Client Side Performance for Back End Developers - Cambridge .NET User Group -...
Bart Read
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
Chris Purrington
 

Similar to Mongodb, our Swiss Army Knife Database (20)

VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline
VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline
VelocityConf EU 2013 - Turbocharge your mobile web apps by using offline
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Practical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.jsPractical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.js
 
Optimization of modern web applications
Optimization of modern web applicationsOptimization of modern web applications
Optimization of modern web applications
 
A Tale of 2 Systems
A Tale of 2 SystemsA Tale of 2 Systems
A Tale of 2 Systems
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017
 
Storm crawler apachecon_na_2015
Storm crawler apachecon_na_2015Storm crawler apachecon_na_2015
Storm crawler apachecon_na_2015
 
Isomorphic JavaScript with Node, WebPack, and React
Isomorphic JavaScript with Node, WebPack, and ReactIsomorphic JavaScript with Node, WebPack, and React
Isomorphic JavaScript with Node, WebPack, and React
 
Introduce flux & react in practice
Introduce flux & react in practiceIntroduce flux & react in practice
Introduce flux & react in practice
 
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - Linagora
 
They why behind php frameworks
They why behind php frameworksThey why behind php frameworks
They why behind php frameworks
 
Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)Firefox Crash Reporting (@ Open Source Bridge)
Firefox Crash Reporting (@ Open Source Bridge)
 
Crash reports pycodeconf
Crash reports pycodeconfCrash reports pycodeconf
Crash reports pycodeconf
 
bol.com Dutch Container Day presentation
bol.com Dutch Container Day presentationbol.com Dutch Container Day presentation
bol.com Dutch Container Day presentation
 
Velocity - Edge UG
Velocity - Edge UGVelocity - Edge UG
Velocity - Edge UG
 
Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18
 
Client Side Performance for Back End Developers - Cambridge .NET User Group -...
Client Side Performance for Back End Developers - Cambridge .NET User Group -...Client Side Performance for Back End Developers - Cambridge .NET User Group -...
Client Side Performance for Back End Developers - Cambridge .NET User Group -...
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 

Mongodb, our Swiss Army Knife Database

  • 1. MongoDB at fotopedia Timeline storage Our Swiss Army Knife Database
  • 2. MongoDB at fotopedia • Context • Wikipedia data storage • Metacache
  • 3. Fotopedia • Fotonauts, an American/French company • Photo — Encyclopedia • Heavily interconnected system : flickr, facebook, wikipedia, picassa, twitter… • MongoDB in production since last october • main store lives in MySQL… for now
  • 4. First contact • Wikipedia imported data
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. Wikipedia queries • wikilinks from one article • links to one article • geo coordinates • redirect • why not use wikipedia API ?
  • 10. Download ~ 5.7GB gzip XML Geo Redirect Backlink Related ~12GB tabular data
  • 11. Problem Load ~12GB into a K/V store
  • 12. CouchDB 0.9 attempt • CouchDB had no dedicated import tool • need to go through HTTP / Rest API
  • 13. “DATA LOADING” LOADING! (obviously hijacked from xkcd.com)
  • 14. Problem, rephrased Load ~12GB into any K/V store in hours, not days
  • 15. Hadoop HBase ? • as we were already using Hadoop Map/ Reduce for preparation • bulk load was just emerging at that time, requiring to code against HBase private APIs, generate the data in an ad-hoc binary format, ...
  • 16. photo by neural.it on Flickr
  • 17. Problem, rerephrased Load ~12GB into any K/V store in hours, not days without wasting a week on development and another week on setup and several months on tuning please ?
  • 18. MongoDB attempt • Transforming the tabular data into a JSON form : about half an hour or code, 45 minutes of hadoop parallel processing • setup mongo server : 15 minutes • mongoimport : 3 minutes to start it, 90 minutes to run • plug RoR app on mongo : minutes • prototype was done in a day
  • 19. Batch Synchronous Download ~ 5.7GB gzip Geo Redirect Backlink Ruby on Rails Related ~12GB, 12M docs
  • 20. Hot swap ? • Indexing was locking everything. • Just run two instances of MongoDB. • One instance is servicing the web app • One instance is asleep or loading data • One third instance knows the status of the two instances.
  • 21. We loved: • JSON import format • efficiency of mongoimport • simple and flexible installation • just one cumbersome dependency • easy to start (we use runit) • easy to have several instances on one box
  • 22. Second contact • itʼs just all about graphes, anyway. • wikilinks • people following people • related community albums • and soon, interlanguage links
  • 23.
  • 24. all about graphes... • ... and itʼs also all about cache. • The application needs to “feel” faster, letʼs cache more. • The application needs to “feel” right, so letʼs cache less. • or — big sigh — invalidate.
  • 25.
  • 26. Page fragment caching photo by Mykl Roventine on Flickr Nginx SSI photo by Aires Dos Santos Varnish HTTP cache RoR application photo by Leslie Chatfield on Flickr
  • 27. Haiku ? There are only two hard things in Computer Science: cache invalidation and naming things. Phil Karlton
  • 28. Naming things • REST have been a strong design principle in fotopedia since the early days, and the efforts are paying.
  • 29. /en/2nd_arrondissement_of_Paris /en/Paris/fragment/left_col /users/john/fragment/contrib /en/Paris/fragment/related
  • 30. Invalidating • Rest allows us to invalidate by URL prefix. • When the Paris album changes, we have to invalidate /en/Paris.*
  • 31. Varnish invalidation • Varnish built-in regexp based invalidation is not designed for intensive, fine grained invalidation. • We need to invalidate URLs individually.
  • 33. Metacache workflow /en/Paris.* invalidation worker Nginx SSI /en/Paris /en/Paris/fragment/left_col /en/Paris/photos.json?skip=0&number=20 /en/Paris/photos.json?skip=13&number=27 Varnish HTTP cache varnish log /en/Paris/fragment/left_col RoR application metacache feeder
  • 34. Waw. • This time we are actually using MongoDB as a BTree. Impressive. • The metacache has been running fine for several months, and we want to go further.
  • 35. Invalidate less • We need to be more specific as to what we invalidate. • Today, if somebody votes on a photo in the Paris album, we invalidate all /en/Paris prefix, and most of it is unchanged. • We will move towards a more clever metacache.
  • 36. Metacache reloaded • Pub/Sub metacache • Have the backend send a specific header to be caught by the metacache-feeder, conaining “subscribe” message. • This header will be a JSON document, to be pushed to the metacache. • The purge commands will be mongo search queries.
  • 37. /en/Paris /en/Paris/fragment/left_col /en/Paris/photos.json?skip=0&number=20 /en/Paris/photos.json?skip=13&number=27 {url:/en/Paris, observe:[summary,links]} {url:/en/Paris/fragment/left_col, observe: [cover]} {url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]} {url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]}
  • 38. when somebody votes { url:/en/Paris.*, observe:photos } when the summary changes { url:/en/Paris.*, observe:summary } when the a new link is created { url:/en/Paris.*, observe:links } {url:/en/Paris, observe:[summary,links]} {url:/en/Paris/fragment/left_col, observe: [cover]} {url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]} {url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]}
  • 39. Other uses cases • Timeline activities storage: just one more BTree usage. • Moderation workflow data: tiny dataset, but more complex queries, map/reduce. • Suspended experimentation around log collection and analysis
  • 40. Current situation • Mysql: main data store • CouchDB: old timelines (+ chef) • MongoDB: metacache, wikipedia, moderation, new timelines • Redis: raw data cache for counters, recent activity (+ resque)
  • 41. What about the main store ? • albums are good fit for documents • votes and score may be more tricky • recent introduction of resque
  • 42. In short • Simple, fast. • Hackable: in a language most can read. • Clear roadmap. • Very helpful and efficient team. • Designed with application developer needs in mind.