MongoATL: How Sourceforge is Using MongoDB
Published in: Technology
Transcript

  • 1. How SourceForge is Using MongoDB
      Rick Copeland, @rick446, [email_address]
  • 2. SF.net “BlackOps”: FossFor.us
      • User Editable!
      • Web 2.0! (ish)
      • Not Ugly!
  • 3. Moving to NoSQL
      • FossFor.us used CouchDB (NoSQL)
      • “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
      • Scaling up to the level of SF.net needs research
          • CouchDB
          • MongoDB
          • Tokyo Cabinet/Tyrant
          • Cassandra... and others
  • 4. Rewriting “Consume”
      • Most traffic on SF.net hits 3 types of pages:
          • Project Summary
          • File Browser
          • Download
      • Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
      • Original goal is 1 MongoDB document per project
          • Later split release data because some projects have lots of releases
      • Periodic updates via RSS and AMQP from “Develop”
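
As a rough illustration of the one-document-per-project design and the later split of release data, the sketch below uses plain Python dictionaries. All field names (`name`, `summary`, `releases`, `project_id`) and the sample values are hypothetical; the slides do not show the actual Consume schema.

```python
# Hypothetical document shapes; the real Consume schema is not in the slides.

# Original goal: everything a page needs lives in one project document.
project_embedded = {
    "_id": "exampleproject",
    "name": "Example Project",
    "summary": "A made-up project used for illustration.",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}


def split_releases(project):
    """Return (project_doc, release_docs) with releases moved out of the project.

    Each release document carries a project_id so it can be looked up
    separately, which keeps the project document small even when a
    project has many releases.
    """
    project_doc = {k: v for k, v in project.items() if k != "releases"}
    release_docs = [
        dict(release, project_id=project["_id"])
        for release in project.get("releases", [])
    ]
    return project_doc, release_docs


project_doc, release_docs = split_releases(project_embedded)
```

The trade-off is one extra lookup for release data in exchange for a project document whose size no longer grows with the number of releases.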
  • 5. Deployment Architecture (diagram)
      • Load Balancer / Proxy
      • Gobble Server
      • Develop
      • Master DB Server (MongoDB Master)
      • 4 web servers, each running Apache mod_wsgi / TG 2.0 with a MongoDB Slave
  • 6. Deployment Architecture (revised) (diagram)
      • Load Balancer / Proxy
      • Gobble Server
      • Develop
      • Master DB Server (MongoDB Master)
      • 4 web servers, each running Apache mod_wsgi / TG 2.0 (no MongoDB slaves)
      • Callouts: “Scalability is good”; “Single-node performance is good, too”
  • 7. SF.net Downloads
      • Allow non-sf.net projects to use SourceForge mirror network
      • Stats calculated in Hadoop and stored/served from MongoDB
      • Same deployment architecture as Consume (4 web, 1 db)
  • 8. Allura (SF.net “beta” devtools)
      • Rewrite developer tools with new architecture
      • Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
      • Single MongoDB replica set manually sharded by project
      • Release early & often
  • 9. What We Liked
      • Performance, performance, performance – Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
      • Schemaless server allows fast schema evolution in development, making many migrations unnecessary
      • Replication is easy, making scalability and backups easy
          • Keep a “backup slave” running
          • Kill backup slave, copy off database, bring back up the slave
          • Automatic re-sync with master
      • Query Language
          • You mean I can have performance without map-reduce?
      • GridFS
  • 10. Pitfalls
      • Too-large documents
          • Store less per document
          • Return only a few fields
      • Ignoring indexing
          • Watch your server log; bad queries show up there
      • Ignoring your data’s schema
      • Using many databases when one will do
      • Using too many queries
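
A minimal sketch of how three of these pitfalls might be avoided from Python. It assumes `projects` is a pymongo `Collection` (or any object exposing the same `create_index`, `find_one`, and `find` methods); the field names `shortname`, `name`, and `summary` are hypothetical and do not come from the slides.

```python
# Sketch only: `projects` is assumed to be a pymongo Collection (or anything
# with the same create_index/find_one/find methods). Field names are made up.


def ensure_indexes(projects):
    # Index the field used in lookups so queries do not scan the collection.
    projects.create_index("shortname")


def load_summary(projects, shortname):
    # The second argument is a projection: only the listed fields (plus _id)
    # come back, which avoids transferring a large document.
    return projects.find_one(
        {"shortname": shortname},
        {"name": 1, "summary": 1},
    )


def load_many(projects, shortnames):
    # One query with $in instead of one query per project.
    cursor = projects.find(
        {"shortname": {"$in": list(shortnames)}},
        {"name": 1, "shortname": 1},
    )
    return list(cursor)
```

Passing the collection in as an argument keeps these helpers easy to exercise with a stand-in object in unit tests.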
  • 11. Ming – an “Object-Document Mapper?”
      • Your data has a schema
          • Your database can define and enforce it
          • It can live in your application (as with MongoDB)
          • Nice to have the schema defined in one place in the code
      • Sometimes you need a “migration”
          • Changing the structure/meaning of fields
          • Adding indexes
          • Sometimes lazy, sometimes eager
      • Queuing up all your updates can be handy
      • Python dicts are nice; objects are nicer
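
To make the "sometimes lazy" migration idea concrete, here is a small sketch in plain Python. It is not Ming's migration API; the `version` field and the `body`/`text` field names are hypothetical, chosen only to show a field being renamed when a document is loaded.

```python
# Illustrative only; this is not Ming's migration API. The `version` field
# and the old/new field names are hypothetical.

CURRENT_VERSION = 2


def migrate_lazily(doc):
    """Upgrade a document to the current schema version when it is loaded.

    A lazy migration fixes documents one at a time as they are read;
    an eager migration would loop over the whole collection up front.
    """
    doc = dict(doc)  # work on a copy so the caller's dict is untouched
    if doc.get("version", 1) < CURRENT_VERSION:
        # Version 1 stored the page content under "body"; version 2 uses "text".
        doc["text"] = doc.pop("body", "")
        doc["version"] = CURRENT_VERSION
    return doc


old_page = {"title": "Home", "body": "Welcome"}
new_page = migrate_lazily(old_page)
```

Running the function on an already-migrated document leaves it unchanged, so it is safe to apply on every load.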
  • 12. Ming Concepts
      • Inspired by SQLAlchemy
      • Group of classes to which you map your collections
      • Each class defines its schema, including indexes
      • Convenience methods for loading/saving objects and ensuring indexes are created
      • Migrations
      • Unit of Work – great for web applications
      • MIM – “Mongo in Memory” nice for unit tests
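
The Unit of Work pattern mentioned above can be sketched in a few lines of plain Python. This is an illustration of the general pattern, not Ming's session implementation; the `UnitOfWork` class and the `save` callable (standing in for whatever writes a document to MongoDB) are assumptions made for the example.

```python
# Illustrative only; this is not Ming's session implementation. The `save`
# callable stands in for whatever actually writes a document to MongoDB.


class UnitOfWork:
    """Collect changed objects during a request and write them in one flush."""

    def __init__(self, save):
        self._save = save
        self._dirty = []

    def mark_dirty(self, doc):
        # Queue each object at most once, however many times it changes.
        if not any(doc is queued for queued in self._dirty):
            self._dirty.append(doc)

    def flush(self):
        # Write everything that changed, then start with a clean slate.
        for doc in self._dirty:
            self._save(doc)
        self._dirty.clear()

    def clear(self):
        # Drop pending changes, for example when the request fails.
        self._dirty.clear()


written = []
uow = UnitOfWork(save=written.append)
page = {"title": "Home", "text": "Welcome"}
page["text"] = "Welcome back"
uow.mark_dirty(page)
uow.mark_dirty(page)  # second call is a no-op
uow.flush()
```

In a web application the flush would typically happen once at the end of a successful request and the clear on an error, which is one reason the pattern suits request/response code.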
  • 13. Ming Example

        from ming import schema
        from ming.orm import MappedClass
        from ming.orm import (FieldProperty, ForeignIdProperty,
                              RelationProperty)

        class WikiPage(MappedClass):
            class __mongometa__:
                # `session` is assumed to be a Ming ORM session created elsewhere
                session = session
                name = 'wiki_page'

            _id = FieldProperty(schema.ObjectId)
            title = FieldProperty(str)
            text = FieldProperty(str)
            comments = RelationProperty('WikiComment')

        MappedClass.compile_all()  # Lets ming know about the mapping
  • 14. Open Source
      • Ming
          • http://sf.net/projects/merciless/
          • MIT License
      • Allura
          • http://sf.net/p/allura/
          • Apache License
  • 15. Future Work
      • mongos
      • New Allura Tools
      • Migrating legacy SF.net projects to Allura
      • Stats all in MongoDB rather than Hadoop?
      • Better APIs to access your project data
  • 16. Questions?
  • 17. Rick Copeland @rick446 [email_address]
