MongoATL: How Sourceforge is Using MongoDB
Published in: Technology

Transcript

  • 1. How SourceForge is Using MongoDB – Rick Copeland, @rick446 [email_address]
  • 2. SF.net “BlackOps”: FossFor.us
    • User Editable!
    • Web 2.0! (ish)
    • Not Ugly!
  • 3. Moving to NoSQL
    • FossFor.us used CouchDB (NoSQL)
    • “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
    • Scaling up to the level of SF.net needs research
      • CouchDB
      • MongoDB
      • Tokyo Cabinet/Tyrant
      • Cassandra... and others
  • 4. Rewriting “Consume”
    • Most traffic on SF.net hits 3 types of pages:
      • Project Summary
      • File Browser
      • Download
    • Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
    • Original goal was one MongoDB document per project
      • Later split release data because some projects have lots of releases
    • Periodic updates via RSS and AMQP from “Develop”
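
The move from one document per project to separately stored release data can be pictured with plain Python dicts. This is only a sketch: the field names (`shortname`, `summary`, `releases`, `project_id`) and the `split_releases` helper are assumptions for illustration, since the slides do not show the real schema.

```python
from copy import deepcopy


def split_releases(project_doc):
    """Return (slim_project_doc, release_docs) without mutating the input.

    All field names here are hypothetical.
    """
    project = deepcopy(project_doc)
    # Pull the potentially large list out of the project document.
    releases = project.pop("releases", [])
    # Each release becomes its own document that points back at the project.
    release_docs = [dict(release, project_id=project["_id"]) for release in releases]
    return project, release_docs


original = {
    "_id": "proj-1",
    "shortname": "example",
    "summary": "An example project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}

project, release_docs = split_releases(original)
```

With this shape, a summary page would read only the slim project document, and a file browser page would load the release documents for that project.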
  • 5. Deployment Architecture (diagram)
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server: MongoDB Master
    • Apache mod_wsgi / TG 2.0 + MongoDB Slave (×4)
  • 6. Deployment Architecture (revised) (diagram)
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server: MongoDB Master
    • Apache mod_wsgi / TG 2.0 (×4)
    • Scalability is good
    • Single-node performance is good, too
  • 7. SF.net Downloads
    • Allow non-sf.net projects to use SourceForge mirror network
    • Stats calculated in Hadoop and stored/served from MongoDB
    • Same deployment architecture as Consume (4 web, 1 db)
  • 8. Allura (SF.net “beta” devtools)
    • Rewrite developer tools with new architecture
    • Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
    • Single MongoDB replica set manually sharded by project
    • Release early & often
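
One way to picture "manually sharded by project" is a routing function in the application that decides where each project's data lives. The sketch below is an assumption about how such routing might look, not Allura's actual scheme: the `project_database` helper, the `projects_N` naming, and the shard count are all hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of logical shards


def project_database(shortname, num_shards=NUM_SHARDS):
    """Map a project shortname to a database name.

    A stable digest is used instead of Python's built-in hash(), which
    can differ between processes and would scatter a project's data.
    """
    digest = hashlib.md5(shortname.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return "projects_%d" % shard
```

Every web server then computes the same database name for a given project without any shared lookup table.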
  • 9. What We Liked
    • Performance, performance, performance – Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
    • Schemaless server allows fast schema evolution in development, making many migrations unnecessary
    • Replication is easy, making scalability and backups easy
      • Keep a “backup slave” running
      • Kill the backup slave, copy off the database, bring the slave back up
      • Automatic re-sync with master
    • Query Language
      • You mean I can have performance without map-reduce?
    • GridFS
  • 10. Pitfalls
    • Too-large documents
      • Store less per document
      • Return only a few fields
    • Ignoring indexing
      • Watch your server log; bad queries show up there
    • Ignoring your data’s schema
    • Using many databases when one will do
    • Using too many queries
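
Several of these pitfalls can be addressed in one place. The sketch below returns only a few fields, batches lookups into a single query with `$in`, and indexes the queried field. It assumes `collection` behaves like a pymongo `Collection`; the field names and helper functions are hypothetical.

```python
def summary_query(shortnames):
    """Build the (filter, projection) arguments for a single find() call."""
    # One query for many projects instead of one query per project.
    query = {"shortname": {"$in": list(shortnames)}}
    # Ask the server for two small fields rather than whole documents.
    projection = {"shortname": 1, "summary": 1}
    return query, projection


# Index specification for the field used in the filter above.
SHORTNAME_INDEX = [("shortname", 1)]


def project_summaries(collection, shortnames):
    """Fetch summaries from a pymongo-like collection in one round trip."""
    query, projection = summary_query(shortnames)
    return list(collection.find(query, projection))


def ensure_indexes(collection):
    """Create the index so the query above does not scan the collection."""
    return collection.create_index(SHORTNAME_INDEX)
```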
  • 11. Ming – an “Object-Document Mapper?”
    • Your data has a schema
      • Your database can define and enforce it
      • It can live in your application (as with MongoDB)
      • Nice to have the schema defined in one place in the code
    • Sometimes you need a “migration”
      • Changing the structure/meaning of fields
      • Adding indexes
      • Sometimes lazy, sometimes eager
    • Queuing up all your updates can be handy
    • Python dicts are nice; objects are nicer
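
The idea of a schema that lives in the application, combined with a lazy migration, can be sketched in plain Python. This is not Ming's API: the schema dict, the `validate` and `migrate_lazily` helpers, and the rename of a hypothetical `body` field to `text` are all assumptions for illustration.

```python
# The schema is declared once, in code, rather than in the database.
WIKI_PAGE_SCHEMA = {"title": str, "text": str, "version": int}


def validate(doc, schema=WIKI_PAGE_SCHEMA):
    """Raise ValueError if a field is missing or has the wrong type."""
    for field, expected in schema.items():
        if field not in doc:
            raise ValueError("missing field: %s" % field)
        if not isinstance(doc[field], expected):
            raise ValueError("bad type for field: %s" % field)
    return doc


def migrate_lazily(doc):
    """Upgrade an old-style document at load time (a "lazy" migration).

    Hypothetical change: version 1 stored the page body under "body";
    version 2 renames that field to "text".
    """
    if doc.get("version", 1) < 2:
        doc = dict(doc)  # work on a copy, leave the loaded dict alone
        doc["text"] = doc.pop("body", "")
        doc["version"] = 2
    return doc


old = {"title": "Home", "body": "Welcome"}
page = validate(migrate_lazily(old))
```

An eager migration would instead run `migrate_lazily` over every stored document in one batch and write the results back.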
  • 12. Ming Concepts
    • Inspired by SQLAlchemy
    • Group of classes to which you map your collections
    • Each class defines its schema, including indexes
    • Convenience methods for loading/saving objects and ensuring indexes are created
    • Migrations
    • Unit of Work – great for web applications
    • MIM – “Mongo in Memory” nice for unit tests
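
The Unit of Work idea, queuing changes and writing them in one step at the end of a request, can be sketched as follows. This is a minimal illustration and not Ming's implementation: the `UnitOfWork` class and its `save` callable are hypothetical.

```python
class UnitOfWork:
    """Collect changed documents and persist them together on flush()."""

    def __init__(self, save):
        self._save = save  # callable that persists one document
        self._dirty = []

    def register(self, doc):
        # Track each object once, by identity, however often it changes.
        if not any(doc is tracked for tracked in self._dirty):
            self._dirty.append(doc)

    def flush(self):
        """Persist every tracked document and return how many were written."""
        count = 0
        while self._dirty:
            self._save(self._dirty.pop(0))
            count += 1
        return count

    def clear(self):
        """Drop pending changes, e.g. when a request fails."""
        self._dirty = []


saved = []
uow = UnitOfWork(saved.append)
page = {"title": "Home", "text": "Welcome"}
uow.register(page)
page["text"] = "Welcome back"
uow.register(page)  # same object, still tracked only once
written = uow.flush()
```

In a web application, `flush()` would typically run once when a request succeeds and `clear()` when it fails, so a failed request writes nothing.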
  • 13. Ming Example

        from ming import schema
        from ming.orm import MappedClass
        from ming.orm import (FieldProperty, ForeignIdProperty,
                              RelationProperty)

        class WikiPage(MappedClass):
            class __mongometa__:
                session = session
                name = 'wiki_page'

            _id = FieldProperty(schema.ObjectId)
            title = FieldProperty(str)
            text = FieldProperty(str)
            comments = RelationProperty('WikiComment')

        MappedClass.compile_all()  # Lets ming know about the mapping

  • 14. Open Source
    • Ming
      • http://sf.net/projects/merciless/
      • MIT License
    • Allura
      • http://sf.net/p/allura/
      • Apache License
  • 15. Future Work
    • mongos
    • New Allura Tools
    • Migrating legacy SF.net projects to Allura
    • Stats all in MongoDB rather than Hadoop?
    • Better APIs to access your project data
  • 16. Questions?
  • 17. Rick Copeland @rick446 [email_address]
