
MongoATL: How Sourceforge is Using MongoDB

Published in: Technology

Transcript of "MongoATL: How Sourceforge is Using MongoDB"

  1. How SourceForge is Using MongoDB
     - Rick Copeland, @rick446 [email_address]
  2. SF.net “BlackOps”: FossFor.us
     - User Editable!
     - Web 2.0! (ish)
     - Not Ugly!
  3. Moving to NoSQL
     - FossFor.us used CouchDB (NoSQL)
     - “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
     - Scaling up to the level of SF.net needs research
       - CouchDB
       - MongoDB
       - Tokyo Cabinet/Tyrant
       - Cassandra... and others
  4. Rewriting “Consume”
     - Most traffic on SF.net hits 3 types of pages:
       - Project Summary
       - File Browser
       - Download
     - Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
     - Original goal is 1 MongoDB document per project
       - Later split release data because some projects have lots of releases
     - Periodic updates via RSS and AMQP from “Develop”
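The split of release data out of the per-project document could look roughly like the sketch below. It uses plain Python dicts only; the field names (`name`, `summary`, `releases`, `project_id`) and the helper `split_project` are hypothetical and are not taken from the talk or from SourceForge's actual schema.

```python
# Hypothetical document shapes; field names are illustrative only.

def split_project(doc):
    """Split an all-in-one project document into a small project
    document plus one document per release."""
    # Everything except the (potentially huge) release list stays together.
    project = {k: v for k, v in doc.items() if k != "releases"}
    # Each release becomes its own document that points back at the project.
    releases = [
        dict(release, project_id=doc["_id"])
        for release in doc.get("releases", [])
    ]
    return project, releases


combined = {
    "_id": "demo",
    "name": "Demo Project",
    "summary": "An example project",
    "releases": [
        {"version": "1.0", "files": ["demo-1.0.tar.gz"]},
        {"version": "1.1", "files": ["demo-1.1.tar.gz"]},
    ],
}

project, releases = split_project(combined)
print(project)
print(releases)
```

With this shape, a summary page only has to load the small project document, and a project with many releases no longer inflates every read.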
  5. Deployment Architecture (diagram)
     - Load Balancer / Proxy
     - Gobble Server
     - Develop
     - Master DB Server: MongoDB Master
     - 4 web servers, each: Apache mod_wsgi / TG 2.0 with a MongoDB Slave
  6. Deployment Architecture (revised) (diagram)
     - Load Balancer / Proxy
     - Gobble Server
     - Develop
     - Master DB Server: MongoDB Master
     - 4 web servers, each: Apache mod_wsgi / TG 2.0
     - Scalability is good
     - Single-node performance is good, too
  7. SF.net Downloads
     - Allow non-sf.net projects to use SourceForge mirror network
     - Stats calculated in Hadoop and stored/served from MongoDB
     - Same deployment architecture as Consume (4 web, 1 db)
  8. Allura (SF.net “beta” devtools)
     - Rewrite developer tools with new architecture
     - Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
     - Single MongoDB replica set manually sharded by project
     - Release early & often
  9. What We Liked
     - Performance, performance, performance – Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
     - Schemaless server allows fast schema evolution in development, making many migrations unnecessary
     - Replication is easy, making scalability and backups easy
       - Keep a “backup slave” running
       - Kill the backup slave, copy off the database, bring the slave back up
       - Automatic re-sync with master
     - Query Language
       - You mean I can have performance without map-reduce?
     - GridFS
  10. Pitfalls
     - Too-large documents
       - Store less per document
       - Return only a few fields
     - Ignoring indexing
       - Watch your server log; bad queries show up there
     - Ignoring your data’s schema
     - Using many databases when one will do
     - Using too many queries
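As an illustration of "return only a few fields", the sketch below builds a filter and an inclusion projection and applies the projection to a plain dict, so it runs without a database. The field names and both helper functions are hypothetical; `apply_projection` is only a simplified stand-in for what the server does, not MongoDB's actual implementation.

```python
def summary_query(shortname):
    """Build the filter and projection for a (hypothetical) summary page."""
    query = {"shortname": shortname}
    # Inclusion projection: ask for two fields and suppress _id.
    projection = {"name": 1, "summary": 1, "_id": 0}
    return query, projection


def apply_projection(doc, projection):
    """Simplified local stand-in for a server-side inclusion projection."""
    wanted = {k for k, v in projection.items() if v and k != "_id"}
    out = {k: v for k, v in doc.items() if k in wanted}
    # _id is returned unless the projection explicitly turns it off.
    if projection.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    return out


stored = {
    "_id": 1,
    "shortname": "demo",
    "name": "Demo Project",
    "summary": "An example project",
    "releases": [{"version": "1.0"}, {"version": "1.1"}],
}

query, projection = summary_query("demo")
print(apply_projection(stored, projection))

# With current PyMongo and a collection handle (an assumption; not run here):
#   collection.create_index("shortname")        # avoid unindexed lookups
#   collection.find_one(query, projection)      # fetch only the listed fields
```

The large `releases` list never leaves the server when the projection omits it, which addresses the too-large-documents pitfall on the read side.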
  11. Ming – an “Object-Document Mapper?”
     - Your data has a schema
       - Your database can define and enforce it
       - It can live in your application (as with MongoDB)
       - Nice to have the schema defined in one place in the code
     - Sometimes you need a “migration”
       - Changing the structure/meaning of fields
       - Adding indexes
       - Sometimes lazy, sometimes eager
     - Queuing up all your updates can be handy
     - Python dicts are nice; objects are nicer
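The difference between a lazy and an eager migration can be sketched in plain Python. This is not Ming's migration API; the `schema_version` field, the `author`/`authors` change, and all function names are assumptions made up for the example.

```python
CURRENT_VERSION = 2  # hypothetical schema version of the application


def migrate(doc):
    """Return a copy of doc upgraded to CURRENT_VERSION."""
    doc = dict(doc)
    version = doc.get("schema_version", 1)
    if version < 2:
        # v1 stored one 'author' string; v2 stores a list of 'authors'.
        author = doc.pop("author", None)
        doc["authors"] = [author] if author is not None else []
        version = 2
    doc["schema_version"] = version
    return doc


def load_lazy(raw_doc):
    """Lazy: upgrade a document only when it is actually read."""
    return migrate(raw_doc)


def migrate_eager(raw_docs):
    """Eager: upgrade every document up front, in one pass."""
    return [migrate(d) for d in raw_docs]


old = {"_id": 1, "author": "rick"}
print(load_lazy(old))
print(migrate_eager([old, {"_id": 2, "authors": ["mark"], "schema_version": 2}]))
```

A lazy migration spreads the cost over normal reads and tolerates mixed versions in the collection; an eager one pays once and lets the old code path be deleted afterwards.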
  12. Ming Concepts
     - Inspired by SQLAlchemy
     - Group of classes to which you map your collections
     - Each class defines its schema, including indexes
     - Convenience methods for loading/saving objects and ensuring indexes are created
     - Migrations
     - Unit of Work – great for web applications
     - MIM – “Mongo in Memory” nice for unit tests
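The Unit of Work idea (queue up changes during a request, write them all at the end) can be sketched without any database. The class below is a minimal illustration and not Ming's session API; the class name, its methods, and the dict used as a stand-in for a collection are all hypothetical.

```python
class UnitOfWork:
    """Collects changed documents and writes them out in one flush."""

    def __init__(self, store):
        self.store = store   # a dict standing in for a collection
        self._dirty = {}     # pending writes, keyed by _id

    def save(self, doc):
        """Queue a document; nothing is written yet."""
        self._dirty[doc["_id"]] = doc

    def flush(self):
        """Write all queued documents and return how many were written."""
        count = len(self._dirty)
        for _id, doc in self._dirty.items():
            self.store[_id] = dict(doc)
        self._dirty.clear()
        return count

    def clear(self):
        """Discard queued changes, e.g. when a request fails."""
        self._dirty.clear()


store = {}
uow = UnitOfWork(store)
uow.save({"_id": 1, "title": "Home"})
uow.save({"_id": 1, "title": "Home (edited)"})  # replaces the queued copy
print(store)        # still empty: nothing flushed yet
print(uow.flush())  # number of documents written
print(store)
```

In a web application the flush would typically happen once at the end of a successful request, and `clear` on an error, so a failed request leaves no partial writes behind.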
  13. Ming Example

        from ming import schema
        from ming.orm import MappedClass
        from ming.orm import (FieldProperty, ForeignIdProperty,
                              RelationProperty)

        class WikiPage(MappedClass):
            class __mongometa__:
                session = session
                name = 'wiki_page'

            _id = FieldProperty(schema.ObjectId)
            title = FieldProperty(str)
            text = FieldProperty(str)
            comments = RelationProperty('WikiComment')

        MappedClass.compile_all()  # Lets ming know about the mapping
  14. Open Source
     - Ming
       - http://sf.net/projects/merciless/
       - MIT License
     - Allura
       - http://sf.net/p/allura/
       - Apache License
  15. Future Work
     - mongos
     - New Allura Tools
     - Migrating legacy SF.net projects to Allura
     - Stats all in MongoDB rather than Hadoop?
     - Better APIs to access your project data
  16. Questions?
  17. Rick Copeland, @rick446 [email_address]