MongoATL: How Sourceforge is Using MongoDB

Presentation Transcript

  • How SourceForge is Using MongoDB Rick Copeland @rick446 [email_address]
  • SF.net “BlackOps”: FossFor.us User Editable! Web 2.0! (ish) Not Ugly!
  • Moving to NoSQL
    • FossFor.us used CouchDB (NoSQL)
    • “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
    • Scaling up to the level of SF.net needs research
      • CouchDB
      • MongoDB
      • Tokyo Cabinet/Tyrant
      • Cassandra... and others
  • Rewriting “Consume”
    • Most traffic on SF.net hits 3 types of pages:
      • Project Summary
      • File Browser
      • Download
    • Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
    • Original goal was 1 MongoDB document per project
      • Later split release data because some projects have lots of releases
    • Periodic updates via RSS and AMQP from “Develop”
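
As a rough illustration of the one-document-per-project design and the later split of release data, the sketch below uses plain Python dicts. The field names (`shortname`, `summary`, `releases`, `project_id`) and the helper `split_releases` are hypothetical and are not taken from the actual SF.net schema.

```python
from copy import deepcopy

# Hypothetical "one document per project" shape; all field names are assumptions.
project_doc = {
    "_id": "proj-1",
    "shortname": "example",
    "summary": "An example project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}


def split_releases(doc):
    """Return (project document without releases, list of release documents)."""
    project = deepcopy(doc)  # leave the caller's document untouched
    releases = project.pop("releases", [])
    # Each release becomes its own document that points back at its project.
    release_docs = [dict(release, project_id=project["_id"]) for release in releases]
    return project, release_docs


project, release_docs = split_releases(project_doc)
```

Keeping releases in their own documents means a project with many releases no longer inflates the document read on every summary page view.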
  • Deployment Architecture
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server
      • MongoDB Master
    • 4 web servers, each running:
      • Apache mod_wsgi / TG 2.0
      • MongoDB Slave
  • Deployment Architecture (revised)
    • Load Balancer / Proxy
    • Gobble Server
    • Develop
    • Master DB Server
      • MongoDB Master
    • 4 web servers, each running:
      • Apache mod_wsgi / TG 2.0
    • Scalability is good
    • Single-node performance is good, too
  • SF.net Downloads
    • Allow non-sf.net projects to use SourceForge mirror network
    • Stats calculated in Hadoop and stored/served from MongoDB
    • Same deployment architecture as Consume (4 web, 1 db)
  • Allura (SF.net “beta” devtools)
    • Rewrite developer tools with new architecture
    • Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
    • Single MongoDB replica set manually sharded by project
    • Release early & often
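
The slides do not say how the manual sharding by project is done. One possible reading, sketched below, is an explicit lookup from project to database within the same replica set. The table, the database names, and `database_for_project` are all hypothetical.

```python
# Hypothetical manual assignment of projects to databases in one replica set.
PROJECT_DATABASES = {
    "allura": "projects-a",
    "ming": "projects-b",
}
DEFAULT_DATABASE = "projects-default"


def database_for_project(shortname):
    """Return the database name that holds the given project's data."""
    return PROJECT_DATABASES.get(shortname, DEFAULT_DATABASE)
```

Because the mapping is explicit, a busy project can be moved by copying its data and changing one table entry, at the cost of maintaining the table by hand.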
  • What We Liked
    • Performance, performance, performance – Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
    • Schemaless server allows fast schema evolution in development, making many migrations unnecessary
    • Replication is easy, making scalability and backups easy
      • Keep a “backup slave” running
      • Kill backup slave, copy off database, bring the slave back up
      • Automatic re-sync with master
    • Query Language
      • You mean I can have performance without map-reduce?
    • GridFS
  • Pitfalls
    • Too-large documents
      • Store less per document
      • Return only a few fields
    • Ignoring indexing
      • Watch your server log; bad queries show up there
    • Ignoring your data’s schema
    • Using many databases when one will do
    • Using too many queries
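
Three of the pitfalls above (returning whole documents, skipping indexes, and issuing one query per item) can be addressed together. The sketch below only builds the filter and projection as plain dicts, so it runs without a server. The field names and the `summary_query` helper are assumptions made for illustration.

```python
def summary_query(shortnames):
    """Build one filter and one projection that cover many projects at once."""
    # A single $in query replaces one query per project.
    query_filter = {"shortname": {"$in": list(shortnames)}}
    # Ask only for the fields the page needs; _id is still returned by default.
    projection = {"shortname": 1, "summary": 1}
    return query_filter, projection


query_filter, projection = summary_query(["allura", "ming"])

# With PyMongo and a running server, these would be used roughly as:
#   collection.create_index("shortname")   # so the filter does not scan the collection
#   docs = list(collection.find(query_filter, projection))
```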
  • Ming – an “Object-Document Mapper?”
    • Your data has a schema
      • Your database can define and enforce it
      • It can live in your application (as with MongoDB)
      • Nice to have the schema defined in one place in the code
    • Sometimes you need a “migration”
      • Changing the structure/meaning of fields
      • Adding indexes
      • Sometimes lazy, sometimes eager
    • Queuing up all your updates can be handy
    • Python dicts are nice; objects are nicer
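
A lazy migration can be pictured as upgrading each document when it is loaded, instead of rewriting the whole collection up front. The sketch below is plain Python and does not use Ming's migration API. The `schema_version` field and the `author` to `authors` change are hypothetical.

```python
SCHEMA_VERSION = 2


def migrate_lazily(doc):
    """Return the document in the current schema, upgrading it if it is old."""
    if doc.get("schema_version", 1) >= SCHEMA_VERSION:
        return doc  # already current, nothing to do
    upgraded = dict(doc)
    # Hypothetical change: a single "author" string becomes an "authors" list.
    if "author" in upgraded:
        upgraded["authors"] = [upgraded.pop("author")]
    upgraded["schema_version"] = SCHEMA_VERSION
    return upgraded


old = {"_id": 1, "title": "Home", "author": "rick"}
new = migrate_lazily(old)
```

An eager migration would apply the same function to every document in one batch and save the results, which costs more up front but lets the upgrade code be deleted afterwards.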
  • Ming Concepts
    • Inspired by SQLAlchemy
    • Group of classes to which you map your collections
    • Each class defines its schema, including indexes
    • Convenience methods for loading/saving objects and ensuring indexes are created
    • Migrations
    • Unit of Work – great for web applications
    • MIM – “Mongo in Memory” nice for unit tests
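
The unit-of-work idea can be shown with a toy session that queues changed documents and writes them in a single flush, for example at the end of a web request. This is not Ming's API. The `UnitOfWork` class and its `write` callback are hypothetical stand-ins.

```python
class UnitOfWork:
    """Toy session that queues changed documents and writes them in one flush."""

    def __init__(self, write):
        self._write = write  # callable that persists one document
        self._dirty = {}     # documents waiting to be written, keyed by _id

    def save(self, doc):
        # Saving the same document twice still results in a single write.
        self._dirty[doc["_id"]] = doc

    def flush(self):
        """Write every queued document and return how many were written."""
        count = len(self._dirty)
        for doc in self._dirty.values():
            self._write(doc)
        self._dirty.clear()
        return count


written = []                        # stands in for the database
uow = UnitOfWork(written.append)
page = {"_id": 1, "title": "Home"}
uow.save(page)
page["title"] = "Home (edited)"
uow.save(page)
flushed = uow.flush()
```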
  • Ming Example
      from ming import schema
      from ming.orm import MappedClass
      from ming.orm import (FieldProperty, ForeignIdProperty,
                            RelationProperty)

      class WikiPage(MappedClass):
          class __mongometa__:
              session = session  # an ORM session created elsewhere
              name = 'wiki_page'

          _id = FieldProperty(schema.ObjectId)
          title = FieldProperty(str)
          text = FieldProperty(str)
          comments = RelationProperty('WikiComment')

      MappedClass.compile_all()  # Lets ming know about the mapping
  • Open Source
    • Ming
    • http://sf.net/projects/merciless/
    • MIT License
    • Allura
    • http://sf.net/p/allura/
    • Apache License
  • Future Work
    • mongos
    • New Allura Tools
    • Migrating legacy SF.net projects to Allura
    • Stats all in MongoDB rather than Hadoop?
    • Better APIs to access your project data
  • Questions?
  • Rick Copeland @rick446 [email_address]