MongoATL: How SourceForge is Using MongoDB
Published in Technology
  • 1. How SourceForge is Using MongoDB Rick Copeland @rick446 [email_address]
  • 2. “BlackOps”
    • User Editable!
    • Web 2.0! (ish)
    • Not Ugly!
  • 3. Moving to NoSQL
    • used CouchDB (NoSQL)
    • “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
    • Scaling up to the required level needs research
      • CouchDB
      • MongoDB
      • Tokyo Cabinet/Tyrant
      • Cassandra... and others
  • 4. Rewriting “Consume”
    • Most traffic hits 3 types of pages:
      • Project Summary
      • File Browser
      • Download
    • Pages are read-mostly, with infrequent updates from the “Develop” side
    • Original goal was 1 MongoDB document per project
      • Later split release data because some projects have lots of releases
    • Periodic updates via RSS and AMQP from “Develop”
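
The move from one document per project to separate release data can be sketched with plain Python dicts. This is only an illustration under assumptions: the field names (`releases`, `project_id`, `release_count`) and the helper `split_releases` are hypothetical and are not taken from the SourceForge schema.

```python
from copy import deepcopy


def split_releases(project_doc):
    """Split one large project document into a small summary document
    plus one document per release (hypothetical field names)."""
    summary = deepcopy(project_doc)          # leave the caller's document untouched
    releases = summary.pop("releases", [])   # remove the potentially huge list
    release_docs = [
        {"project_id": summary["_id"], **release} for release in releases
    ]
    summary["release_count"] = len(release_docs)
    return summary, release_docs


# Hypothetical project with embedded release data
project = {
    "_id": "example-project",
    "name": "Example Project",
    "summary": "A hypothetical project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}

summary_doc, release_docs = split_releases(project)
```

The summary document stays small enough to serve the read-mostly pages, while the release documents can be loaded only when a page such as the file browser needs them.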
  • 5. Deployment Architecture
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server: MongoDB Master
    • 4 web servers, each running Apache mod_wsgi / TG 2.0 with a MongoDB Slave
  • 6. Deployment Architecture (revised)
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server: MongoDB Master
    • 4 web servers, each running Apache mod_wsgi / TG 2.0
    • Scalability is good
    • Single-node performance is good, too
  • 7.
    • Allow projects to use SourceForge mirror network
    • Stats calculated in Hadoop and stored/served from MongoDB
    • Same deployment architecture as Consume (4 web, 1 db)
  • 8. Allura (“beta” devtools)
    • Rewrite developer tools with new architecture
    • Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
    • Single MongoDB replica set manually sharded by project
    • Release early & often
  • 9. What We Liked
    • Performance, performance, performance – Easily handle 90% of traffic from 1 DB server, 4 web servers
    • Schemaless server allows fast schema evolution in development, making many migrations unnecessary
    • Replication is easy, making scalability and backups easy
      • Keep a “backup slave” running
      • Kill backup slave, copy off database, bring back up the slave
      • Automatic re-sync with master
    • Query Language
      • You mean I can have performance without map-reduce?
    • GridFS
  • 10. Pitfalls
    • Too-large documents
      • Store less per document
      • Return only a few fields
    • Ignoring indexing
      • Watch your server log; bad queries show up there
    • Ignoring your data’s schema
    • Using many databases when one will do
    • Using too many queries
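
Two of the fixes above, returning only a few fields and avoiding many small queries, can be combined in one call. The sketch below assumes a collection object whose `find(filter, projection)` method behaves like the one in PyMongo. The field names, the `load_project_summaries` helper, and the `FakeCollection` stand-in are hypothetical and exist only so the example runs without a database.

```python
def load_project_summaries(collection, shortnames):
    """Fetch many projects with ONE query and only the fields a page needs."""
    query = {"shortname": {"$in": list(shortnames)}}        # one round trip, not N
    projection = {"shortname": 1, "name": 1, "summary": 1}  # skip the large fields
    return {doc["shortname"]: doc for doc in collection.find(query, projection)}


class FakeCollection:
    """Tiny in-memory stand-in that supports just enough of find() for the demo."""

    def __init__(self, docs):
        self.docs = docs
        self.calls = 0  # counts how many queries were issued

    def find(self, query, projection):
        self.calls += 1
        wanted = set(query["shortname"]["$in"])
        return [
            {key: value for key, value in doc.items() if key in projection}
            for doc in self.docs
            if doc["shortname"] in wanted
        ]


collection = FakeCollection([
    {"shortname": "alpha", "name": "Alpha", "summary": "First", "releases": list(range(1000))},
    {"shortname": "beta", "name": "Beta", "summary": "Second", "releases": list(range(1000))},
    {"shortname": "gamma", "name": "Gamma", "summary": "Third", "releases": []},
])

summaries = load_project_summaries(collection, ["alpha", "beta"])
```

Against a real collection the same function would issue a single query, and the large `releases` field would never leave the server.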
  • 11. Ming – an “Object-Document Mapper?”
    • Your data has a schema
      • Your database can define and enforce it
      • It can live in your application (as with MongoDB)
      • Nice to have the schema defined in one place in the code
    • Sometimes you need a “migration”
      • Changing the structure/meaning of fields
      • Adding indexes
      • Sometimes lazy, sometimes eager
    • Queuing up all your updates can be handy
    • Python dicts are nice; objects are nicer
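
The difference between a lazy and an eager migration can be sketched with plain dicts. This is not Ming's migration API. The `schema_version` field, the `author` to `authors` change, and all function names are assumptions made up for the illustration.

```python
CURRENT_VERSION = 2


def migrate_v1_to_v2(doc):
    """Hypothetical change: a single 'author' string becomes an 'authors' list."""
    doc = dict(doc)  # work on a copy
    doc["authors"] = [doc.pop("author")] if "author" in doc else []
    doc["schema_version"] = 2
    return doc


# Maps a document's version to the function that upgrades it by one step
MIGRATIONS = {1: migrate_v1_to_v2}


def upgrade(doc):
    """Lazy migration: bring one document up to date when it is loaded."""
    doc = dict(doc)
    version = doc.get("schema_version", 1)  # documents without a version count as v1
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schema_version"]
    return doc


def migrate_all(docs):
    """Eager migration: upgrade every document in one pass."""
    return [upgrade(doc) for doc in docs]


old_page = {"_id": 1, "title": "Home", "author": "rick"}
new_page = upgrade(old_page)
```

A lazy migration pays the cost only for documents that are actually read, while an eager pass is useful when a new index or query depends on every document having the new shape.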
  • 12. Ming Concepts
    • Inspired by SQLAlchemy
    • Group of classes to which you map your collections
    • Each class defines its schema, including indexes
    • Convenience methods for loading/saving objects and ensuring indexes are created
    • Migrations
    • Unit of Work – great for web applications
    • MIM – “Mongo in Memory” nice for unit tests
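
The unit-of-work idea, queuing changes during a request and writing them in one step at the end, can be sketched in a few lines. This is a simplified illustration and not Ming's implementation. The `UnitOfWork` class and the `save` callback are hypothetical.

```python
class UnitOfWork:
    """Collects modified documents and persists them together on flush()."""

    def __init__(self, save):
        self._save = save    # callable that persists a single document
        self._pending = {}   # keyed by _id, so repeated edits collapse into one write

    def register(self, doc):
        """Mark a document as needing to be saved."""
        self._pending[doc["_id"]] = doc

    def flush(self):
        """Write every pending document and return how many were written."""
        written = 0
        for doc in self._pending.values():
            self._save(doc)
            written += 1
        self._pending.clear()
        return written

    def clear(self):
        """Discard pending changes, e.g. when a web request fails."""
        self._pending.clear()


# An in-memory dict stands in for the database in this demo
store = {}
uow = UnitOfWork(lambda doc: store.__setitem__(doc["_id"], dict(doc)))

page = {"_id": "home", "title": "Home"}
uow.register(page)
page["title"] = "Welcome"   # a second edit within the same request
uow.register(page)

written = uow.flush()       # one write, carrying the final state
```

In a web application the flush would typically run once at the end of a successful request, and `clear()` would run if the request raised an error, so a failed request leaves nothing half-written.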
  • 13. Ming Example

      from ming import schema
      from ming.orm import MappedClass
      from ming.orm import (FieldProperty, ForeignIdProperty,
                            RelationProperty)

      # `session` is an ORM session assumed to be created elsewhere

      class WikiPage(MappedClass):

          class __mongometa__:
              session = session
              name = 'wiki_page'

          _id = FieldProperty(schema.ObjectId)
          title = FieldProperty(str)
          text = FieldProperty(str)
          comments = RelationProperty('WikiComment')

      MappedClass.compile_all()  # Lets ming know about the mapping

  • 14. Open Source
    • Ming: MIT License
    • Allura: Apache License
  • 15. Future Work
    • mongos
    • New Allura Tools
    • Migrating legacy projects to Allura
    • Stats all in MongoDB rather than Hadoop?
    • Better APIs to access your project data
  • 16. Questions?
  • 17. Rick Copeland @rick446 [email_address]