MongoATL: How SourceForge is Using MongoDB
Transcript

  • 1. How SourceForge is Using MongoDB
    • Rick Copeland @rick446 [email_address]
  • 2. SF.net “BlackOps”: FossFor.us
    • User Editable!
    • Web 2.0! (ish)
    • Not Ugly!
  • 3. Moving to NoSQL
    • FossFor.us used CouchDB (NoSQL)
    • “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
    • Scaling up to the level of SF.net needs research
      • CouchDB
      • MongoDB
      • Tokyo Cabinet/Tyrant
      • Cassandra... and others
  • 4. Rewriting “Consume”
    • Most traffic on SF.net hits 3 types of pages:
      • Project Summary
      • File Browser
      • Download
    • Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
    • Original goal is 1 MongoDB document per project
      • Later split release data because some projects have lots of releases
    • Periodic updates via RSS and AMQP from “Develop”
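The move from one document per project to separate release documents can be sketched roughly as follows. This is only an illustration using plain Python dicts: the field names (`releases`, `release_count`, `project_id`) and the helper `split_releases` are hypothetical and are not taken from the SourceForge code.

```python
from copy import deepcopy


def split_releases(project_doc):
    """Split one large project document into a summary document plus
    one document per release. All field names here are hypothetical."""
    summary = deepcopy(project_doc)          # leave the caller's document untouched
    releases = summary.pop("releases", [])   # release data moves out of the summary
    summary["release_count"] = len(releases)
    # Each release becomes its own document that points back at the project
    release_docs = [{"project_id": summary["_id"], **rel} for rel in releases]
    return summary, release_docs


project = {
    "_id": "exampleproject",
    "name": "Example Project",
    "summary": "A sample project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}

summary, release_docs = split_releases(project)
```

With this shape, a Project Summary page would only need the small summary document, while the File Browser could load release documents for the project separately.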
  • 5. Deployment Architecture
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server: MongoDB Master
    • Apache mod_wsgi / TG 2.0 + MongoDB Slave (×4)
  • 6. Deployment Architecture (revised)
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server: MongoDB Master
    • Apache mod_wsgi / TG 2.0 (×4)
    • Scalability is good
    • Single-node performance is good, too
  • 7. SF.net Downloads
    • Allow non-sf.net projects to use SourceForge mirror network
    • Stats calculated in Hadoop and stored/served from MongoDB
    • Same deployment architecture as Consume (4 web, 1 db)
  • 8. Allura (SF.net “beta” devtools)
    • Rewrite developer tools with new architecture
    • Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
    • Single MongoDB replica set manually sharded by project
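One possible reading of "manually sharded by project" is that the application itself decides which database holds a given project's data. The sketch below shows that idea only; the database names and the `database_for` helper are assumptions made for illustration, and the slides do not say how Allura actually routes projects.

```python
import zlib

# Hypothetical database names; the slides do not list the real ones
DATABASES = ["projects-0", "projects-1", "projects-2", "projects-3"]


def database_for(project_shortname, databases=DATABASES):
    """Pick a database for a project in a deterministic way."""
    # crc32 is stable across processes, unlike Python's built-in hash()
    index = zlib.crc32(project_shortname.encode("utf-8")) % len(databases)
    return databases[index]


chosen = database_for("allura")
```

An explicit lookup table (project name to database name) would work just as well and makes it easier to move a single project by hand.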
    • Release early & often
  • 9. What We Liked
    • Performance, performance, performance – Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
    • Schemaless server allows fast schema evolution in development, making many migrations unnecessary
    • Replication is easy, making scalability and backups easy
      • Keep a “backup slave” running
      • Kill backup slave, copy off database, bring back up the slave
      • Automatic re-sync with master
    • Query Language
      • You mean I can have performance without map-reduce?
    • GridFS
  • 10. Pitfalls
    • Too-large documents
      • Store less per document
      • Return only a few fields
    • Ignoring indexing
      • Watch your server log; bad queries show up there
    • Ignoring your data’s schema
    • Using many databases when one will do
    • Using too many queries
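Three of the pitfalls above (returning whole documents, missing indexes, and issuing too many queries) can be illustrated with a short sketch. It assumes `collection` behaves like a PyMongo collection, using its `find(filter, projection)` and `create_index(keys)` methods; the field names and both helper functions are hypothetical.

```python
def ensure_indexes(collection):
    """Create the index that the lookup below relies on."""
    collection.create_index("shortname")


def load_summaries(collection, shortnames):
    """Fetch several projects in one round trip, returning only a few fields."""
    cursor = collection.find(
        # One query with $in instead of one query per project
        {"shortname": {"$in": list(shortnames)}},
        # Projection: only these fields (plus _id) come back from the server
        {"shortname": 1, "name": 1, "summary": 1},
    )
    return list(cursor)
```

Passing the collection in as an argument keeps the functions easy to exercise in tests with a stand-in object.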
  • 11. Ming – an “Object-Document Mapper?”
    • Your data has a schema
      • Your database can define and enforce it
      • It can live in your application (as with MongoDB)
      • Nice to have the schema defined in one place in the code
    • Sometimes you need a “migration”
      • Changing the structure/meaning of fields
      • Adding indexes
      • Sometimes lazy, sometimes eager
    • Queuing up all your updates can be handy
    • Python dicts are nice; objects are nicer
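A "lazy" migration, as mentioned above, upgrades each document the first time it is loaded instead of rewriting the whole collection up front. The following is a minimal sketch of that idea in plain Python, not Ming's migration mechanism; the `schema_version`, `author`, and `authors` fields are invented for the example.

```python
CURRENT_VERSION = 2


def migrate_lazily(doc):
    """Upgrade a document to the current schema when it is loaded."""
    version = doc.get("schema_version", 1)   # documents without a version are v1
    if version < 2:
        # v1 stored a single 'author' string; v2 stores a list of 'authors'
        doc["authors"] = [doc.pop("author")] if "author" in doc else []
        version = 2
    doc["schema_version"] = version
    return doc


old = {"_id": 1, "title": "Home", "author": "rick"}
new = migrate_lazily(old)
```

An eager migration would apply the same function to every document in one pass, which is the better fit when a new index or query depends on the new field being present everywhere.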
  • 12. Ming Concepts
    • Inspired by SQLAlchemy
    • Group of classes to which you map your collections
    • Each class defines its schema, including indexes
    • Convenience methods for loading/saving objects and ensuring indexes are created
    • Migrations
    • Unit of Work – great for web applications
    • MIM – “Mongo in Memory” nice for unit tests
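The Unit of Work idea listed above can be sketched in a few lines: changed objects are queued during a request and written out together at the end. This is a generic illustration of the pattern, not Ming's implementation; the `UnitOfWork` class and its method names are hypothetical.

```python
class UnitOfWork:
    """Collect modified documents and write them out in one flush."""

    def __init__(self, save):
        self._save = save    # callable that persists a single document
        self._dirty = {}     # pending documents, keyed by _id

    def mark_dirty(self, doc):
        # Marking the same document twice still results in one write
        self._dirty[doc["_id"]] = doc

    def flush(self):
        """Write all pending documents and return how many were written."""
        for doc in self._dirty.values():
            self._save(doc)
        count = len(self._dirty)
        self._dirty.clear()
        return count

    def clear(self):
        """Discard pending changes, e.g. when a request fails."""
        self._dirty.clear()


written = []
uow = UnitOfWork(written.append)
page = {"_id": 1, "title": "Home"}
uow.mark_dirty(page)
page["title"] = "Home (edited)"
uow.mark_dirty(page)
flushed = uow.flush()
```

In a web application, `flush` would typically run once at the end of a successful request and `clear` after a failed one, which is why the pattern suits that setting.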
  • 13. Ming Example
        from ming import schema
        from ming.orm import MappedClass
        from ming.orm import (FieldProperty, ForeignIdProperty,
                              RelationProperty)

        # `session` is assumed to be a Ming ORM session created elsewhere
        class WikiPage(MappedClass):
            class __mongometa__:
                session = session
                name = 'wiki_page'

            _id = FieldProperty(schema.ObjectId)
            title = FieldProperty(str)
            text = FieldProperty(str)
            comments = RelationProperty('WikiComment')

        MappedClass.compile_all()  # Lets ming know about the mapping
  • 14. Open Source
    • Ming
      • http://sf.net/projects/merciless/
      • MIT License
    • Allura
      • http://sf.net/p/allura/
      • Apache License
  • 15. Future Work
    • mongos
    • New Allura Tools
    • Migrating legacy SF.net projects to Allura
    • Stats all in MongoDB rather than Hadoop?
    • Better APIs to access your project data
  • 16. Questions?
  • 17. Rick Copeland @rick446 [email_address]