MongoATL: How SourceForge is Using MongoDB

How SourceForge is Using MongoDB
Rick Copeland (@rick446)

SF.net “BlackOps”: FossFor.us
- User Editable!
- Web 2.0! (ish)
- Not Ugly!

Moving to NoSQL
- FossFor.us used CouchDB (NoSQL)
- “Just adding new fields was trivial, and was happening all the time” (Mark Ramm)
- Scaling up to the level of SF.net needs research
  - CouchDB
  - MongoDB
  - Tokyo Cabinet/Tyrant
  - Cassandra
  - ... and others

Rewriting “Consume”
- Most traffic on SF.net hits 3 types of pages:
  - Project Summary
  - File Browser
  - Download
- Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
- Original goal is 1 MongoDB document per project
  - Later split release data because some projects have lots of releases
- Periodic updates via RSS and AMQP from “Develop”
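The move from one document per project to separately stored release data can be sketched with plain Python dicts. This is only an illustration: the field names (`shortname`, `summary`, `releases`, `project_id`) and the idea of a second collection keyed by project are assumptions, since the slides do not show the actual Consume schema.

```python
from copy import deepcopy


def split_releases(project_doc):
    """Split one large project document into a summary doc and release docs.

    All field names are hypothetical; they are not taken from SF.net.
    """
    summary = deepcopy(project_doc)          # leave the caller's dict untouched
    releases = summary.pop("releases", [])   # the summary doc no longer embeds releases
    release_docs = [
        {"project_id": summary["_id"], **release} for release in releases
    ]
    return summary, release_docs


project = {
    "_id": "example-project",
    "shortname": "example-project",
    "summary": "An example project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}

summary_doc, release_docs = split_releases(project)
```

The summary document stays small no matter how many releases a project has, and each release document points back to its project through `project_id`.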

Deployment Architecture
- Load Balancer / Proxy
- Gobble Server
- Develop
- Master DB Server: MongoDB Master
- Web server (x4): Apache mod_wsgi / TG 2.0, MongoDB Slave

Deployment Architecture (revised)
- Load Balancer / Proxy
- Gobble Server
- Develop
- Master DB Server: MongoDB Master
- Web server (x4): Apache mod_wsgi / TG 2.0
- Scalability is good
- Single-node performance is good, too

SF.net Downloads
- Allow non-sf.net projects to use SourceForge mirror network
- Stats calculated in Hadoop and stored/served from MongoDB
- Same deployment architecture as Consume (4 web, 1 db)

Allura (SF.net “beta” devtools)
- Rewrite developer tools with new architecture
- Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
- Single MongoDB replica set manually sharded by project
- Release early & often
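One possible reading of "manually sharded by project" is that the application, not MongoDB, decides which database holds a given project's data. The sketch below shows that idea with a stable hash. The function name `database_for`, the `projects_N` naming scheme, and the database count are all hypothetical; the slides do not say how Allura assigns projects.

```python
import hashlib

NUM_DATABASES = 4  # hypothetical number of per-project databases


def database_for(project_shortname, num_databases=NUM_DATABASES):
    """Map a project to a database name using a hash that is stable across runs."""
    digest = hashlib.sha256(project_shortname.encode("utf-8")).hexdigest()
    return "projects_%d" % (int(digest, 16) % num_databases)
```

A cryptographic digest is used instead of the built-in `hash()` because Python randomizes string hashes per process, which would route the same project to different databases on different web servers.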

What We Liked
- Performance, performance, performance
  - Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
- Schemaless server allows fast schema evolution in development, making many migrations unnecessary
- Replication is easy, making scalability and backups easy
  - Keep a “backup slave” running
  - Kill backup slave, copy off database, bring back up the slave
  - Automatic re-sync with master
- Query Language
  - You mean I can have performance without map-reduce?
- GridFS

Pitfalls
- Too-large documents
  - Store less per document
  - Return only a few fields
- Ignoring indexing
  - Watch your server log; bad queries show up there
- Ignoring your data’s schema
- Using many databases when one will do
- Using too many queries
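Two of these pitfalls, returning whole documents and ignoring indexes, have direct counterparts in the driver API. The sketch below assumes a current pymongo `Collection` object is passed in; the helper names and the fields `shortname`, `name`, and `summary` are hypothetical.

```python
def ensure_indexes(collection):
    """Index the field used for lookups so queries do not scan the collection."""
    # create_index is a pymongo Collection method; "shortname" is a made-up field.
    return collection.create_index("shortname")


def load_summary(collection, shortname):
    """Fetch only the fields a summary page needs, not the whole document."""
    # The second argument to find_one is the projection.
    return collection.find_one(
        {"shortname": shortname},
        {"name": 1, "summary": 1},
    )
```

Because both helpers take the collection as a parameter, they can be exercised against a stub object in tests without a running server.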

Ming: an “Object-Document Mapper?”
- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code
- Sometimes you need a “migration”
  - Changing the structure/meaning of fields
  - Adding indexes
  - Sometimes lazy, sometimes eager
- Queuing up all your updates can be handy
- Python dicts are nice; objects are nicer
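The difference between a lazy and an eager migration can be shown without any database. The sketch below is not Ming's migration API. It is a plain-Python illustration in which the `schema_version` field and the `author` to `authors` change are invented for the example.

```python
CURRENT_VERSION = 2


def migrate_v1_to_v2(doc):
    """Hypothetical change: v1 stored one 'author' string, v2 stores a list."""
    doc = dict(doc)  # work on a copy
    author = doc.pop("author", None)
    doc["authors"] = [author] if author is not None else []
    doc["schema_version"] = 2
    return doc


MIGRATIONS = {1: migrate_v1_to_v2}


def upgrade(doc):
    """Lazy migration: bring one document up to date when it is loaded."""
    version = doc.get("schema_version", 1)  # unversioned docs are treated as v1
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schema_version"]
    return doc


def upgrade_all(docs):
    """Eager migration: upgrade every document in one pass."""
    return [upgrade(doc) for doc in docs]
```

A lazy migration spreads the cost across normal reads, while an eager one pays it up front and guarantees that no old-format documents remain.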

Ming Concepts
- Inspired by SQLAlchemy
- Group of classes to which you map your collections
- Each class defines its schema, including indexes
- Convenience methods for loading/saving objects and ensuring indexes are created
- Migrations
- Unit of Work: great for web applications
- MIM (“Mongo in Memory”): nice for unit tests
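The Unit of Work idea is to collect changes in memory during a request and write them together at the end. The class below is a minimal stand-alone sketch of that pattern, not Ming's implementation; the class name, method names, and the `save` callable are assumptions made for illustration.

```python
class UnitOfWork:
    """Collects pending documents and persists them in a single flush()."""

    def __init__(self, save):
        self._save = save    # callable that persists one document
        self._pending = {}   # key -> latest version of the document

    def register(self, key, doc):
        """Record a new or changed document; later changes replace earlier ones."""
        self._pending[key] = doc

    def flush(self):
        """Persist every pending document and return how many were written."""
        for doc in self._pending.values():
            self._save(doc)
        count = len(self._pending)
        self._pending.clear()
        return count

    def clear(self):
        """Discard pending changes, for example when a request fails."""
        self._pending.clear()
```

In a web application, a request handler would call `register` as objects change, `flush` once when the request succeeds, and `clear` if it raises, so a failed request writes nothing.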

Ming Example

```python
from ming import schema
from ming.orm import MappedClass
from ming.orm import FieldProperty, ForeignIdProperty, RelationProperty

# `session` is assumed to be a Ming ORM session configured elsewhere.


class WikiPage(MappedClass):
    class __mongometa__:
        session = session
        name = 'wiki_page'

    _id = FieldProperty(schema.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('WikiComment')  # WikiComment is mapped elsewhere


MappedClass.compile_all()  # Lets ming know about the mapping
```

Open Source
- Ming: http://sf.net/projects/merciless/ (MIT License)
- Allura: http://sf.net/p/allura/ (Apache License)

Future Work
- mongos
- New Allura Tools
- Migrating legacy SF.net projects to Allura
- Stats all in MongoDB rather than Hadoop?
- Better APIs to access your project data

Rick Copeland (@rick446)
