MongoATL: How SourceForge is Using MongoDB
 

    Presentation Transcript

    • How SourceForge is Using MongoDB Rick Copeland @rick446 [email_address]
    • SF.net “BlackOps”: FossFor.us
      • User Editable!
      • Web 2.0! (ish)
      • Not Ugly!
    • Moving to NoSQL
      • FossFor.us used CouchDB (NoSQL)
      • “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
      • Scaling up to the level of SF.net needs research
        • CouchDB
        • MongoDB
        • Tokyo Cabinet/Tyrant
        • Cassandra... and others
    • Rewriting “Consume”
      • Most traffic on SF.net hits 3 types of pages:
        • Project Summary
        • File Browser
        • Download
      • Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net
      • Original goal is 1 MongoDB document per project
        • Later split release data because some projects have lots of releases
      • Periodic updates via RSS and AMQP from “Develop”
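The document-per-project model described on this slide, with release data later split out, could look roughly like the sketch below. The field names (`summary`, `releases`, `version`, `files`, `project_id`) and the helper `split_releases` are hypothetical and chosen only for illustration; the talk does not show the actual Consume schema.

```python
from copy import deepcopy


def split_releases(project_doc):
    """Split an embedded 'releases' list out of a project document.

    Returns (project, releases): the project document without its
    releases, and one standalone document per release that points back
    to the project through a hypothetical 'project_id' field.
    """
    project = deepcopy(project_doc)
    embedded = project.pop("releases", [])
    releases = [
        dict(release, project_id=project["_id"]) for release in embedded
    ]
    return project, releases


# Hypothetical "one document per project" shape.
original = {
    "_id": "exampleproj",
    "summary": "An example project",
    "releases": [
        {"version": "1.0", "files": ["example-1.0.tar.gz"]},
        {"version": "1.1", "files": ["example-1.1.tar.gz"]},
    ],
}

project, releases = split_releases(original)
```

Keeping releases in their own documents means a project with many releases no longer inflates the document that the summary page has to load.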
    • Deployment Architecture (diagram)
      • Load Balancer / Proxy
      • Apache mod_wsgi / TG 2.0 + MongoDB Slave (×4)
      • Master DB Server: MongoDB Master
      • Gobble Server
      • Develop
    • Deployment Architecture (revised) (diagram)
      • Load Balancer / Proxy
      • Apache mod_wsgi / TG 2.0 (×4)
      • Master DB Server: MongoDB Master
      • Gobble Server
      • Develop
      • Scalability is good
      • Single-node performance is good, too
    • SF.net Downloads
      • Allow non-sf.net projects to use SourceForge mirror network
      • Stats calculated in Hadoop and stored/served from MongoDB
      • Same deployment architecture as Consume (4 web, 1 db)
    • Allura (SF.net “beta” devtools)
      • Rewrite developer tools with new architecture
      • Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
      • Single MongoDB replica set manually sharded by project
      • Release early & often
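The talk does not say how the manual sharding by project is routed. As one possible illustration only, an application could map each project name to a partition deterministically, as in the sketch below. The function `shard_for_project` and the partition names are assumptions, not Allura's actual mechanism.

```python
import hashlib


def shard_for_project(shortname, shard_names):
    """Pick a partition for a project deterministically from its name."""
    digest = hashlib.md5(shortname.encode("utf-8")).hexdigest()
    return shard_names[int(digest, 16) % len(shard_names)]


# Hypothetical partition names; a real deployment would define its own.
SHARDS = ["projects-0", "projects-1", "projects-2"]

chosen = shard_for_project("allura", SHARDS)
```

Because the mapping depends only on the project name and the partition list, every web server reaches the same answer without a lookup table. The trade-off is that changing the number of partitions moves most projects.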
    • What We Liked
      • Performance, performance, performance – Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
      • Schemaless server allows fast schema evolution in development, making many migrations unnecessary
      • Replication is easy, making scalability and backups easy
        • Keep a “backup slave” running
        • Kill the backup slave, copy off the database, bring the slave back up
        • Automatic re-sync with master
      • Query Language
        • You mean I can have performance without map-reduce?
      • GridFS
    • Pitfalls
      • Too-large documents
        • Store less per document
        • Return only a few fields
      • Ignoring indexing
        • Watch your server log; bad queries show up there
      • Ignoring your data’s schema
      • Using many databases when one will do
      • Using too many queries
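Two of the pitfalls above, returning too much data and issuing too many queries, can be illustrated with plain MongoDB query documents. The sketch below only builds those documents; the field names and helper functions are hypothetical, and the comment showing how they would be passed to a driver assumes a PyMongo-style `find(filter, projection)` call.

```python
def summary_projection():
    """Projection document: ask the server for only these fields."""
    return {"name": 1, "summary": 1}


def batched_lookup(project_ids):
    """One query for many projects instead of one query per project."""
    return {"_id": {"$in": list(project_ids)}}


query = batched_lookup(["proj-a", "proj-b", "proj-c"])
fields = summary_projection()

# Assumed usage with a PyMongo-style collection object (not run here):
#     for doc in collection.find(query, fields):
#         ...
```

The `$in` operator replaces three round trips with one, and the projection keeps large fields out of the response.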
    • Ming – an “Object-Document Mapper?”
      • Your data has a schema
        • Your database can define and enforce it
        • It can live in your application (as with MongoDB)
        • Nice to have the schema defined in one place in the code
      • Sometimes you need a “migration”
        • Changing the structure/meaning of fields
        • Adding indexes
        • Sometimes lazy, sometimes eager
      • Queuing up all your updates can be handy
      • Python dicts are nice; objects are nicer
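A "lazy" migration, as mentioned above, upgrades a document when it is loaded rather than rewriting the whole collection up front. The sketch below shows the idea in plain Python. It is not Ming's migration API; the `schema_version` field and the `body` to `text` rename are invented for illustration.

```python
def migrate_v1_to_v2(doc):
    """Hypothetical change: rename the 'body' field to 'text'."""
    doc = dict(doc)
    doc["text"] = doc.pop("body", "")
    doc["schema_version"] = 2
    return doc


MIGRATIONS = {1: migrate_v1_to_v2}
CURRENT_VERSION = 2


def upgrade(doc):
    """Bring a loaded document up to the current schema version."""
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schema_version"]
    return doc


old_page = {"_id": 1, "title": "Home", "body": "Welcome"}
new_page = upgrade(old_page)
```

An eager migration would run the same `upgrade` function over every document in one pass and save the results.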
    • Ming Concepts
      • Inspired by SQLAlchemy
      • Group of classes to which you map your collections
      • Each class defines its schema, including indexes
      • Convenience methods for loading/saving objects and ensuring indexes are created
      • Migrations
      • Unit of Work – great for web applications
      • MIM – “Mongo in Memory” nice for unit tests
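The Unit of Work idea listed above can be sketched without any database: changed objects are registered with a session-like object, and the writes happen together in a single flush. The class below is a simplified illustration, not Ming's implementation; `UnitOfWork`, `register`, and `flush` are names chosen for this example.

```python
class UnitOfWork:
    """Track changed documents and write them out in one flush."""

    def __init__(self, save):
        self._save = save   # callable that persists one document
        self._dirty = {}    # _id -> document awaiting a write

    def register(self, doc):
        """Mark a document as changed; repeated changes collapse."""
        self._dirty[doc["_id"]] = doc

    def flush(self):
        """Persist every pending document once, then clear the queue."""
        pending, self._dirty = self._dirty, {}
        for doc in pending.values():
            self._save(doc)
        return len(pending)


saved = []                      # stands in for the database
uow = UnitOfWork(saved.append)

page = {"_id": 1, "title": "Home"}
uow.register(page)
page["title"] = "Home (edited)"
uow.register(page)              # second change to the same document
written = uow.flush()
```

In a web application the flush would typically happen once at the end of a request, so a request that fails partway through writes nothing.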
    • Ming Example

        from ming import schema
        from ming.orm import MappedClass
        from ming.orm import (FieldProperty, ForeignIdProperty,
                              RelationProperty)

        # `session` is assumed to be an ORM session created elsewhere;
        # the slide does not show its setup.
        class WikiPage(MappedClass):
            class __mongometa__:
                session = session
                name = 'wiki_page'   # MongoDB collection name

            _id = FieldProperty(schema.ObjectId)
            title = FieldProperty(str)
            text = FieldProperty(str)
            # 'WikiComment' is another mapped class, not shown on the slide
            comments = RelationProperty('WikiComment')

        MappedClass.compile_all()  # Lets ming know about the mapping
    • Open Source
      • Ming
        • http://sf.net/projects/merciless/
        • MIT License
      • Allura
        • http://sf.net/p/allura/
        • Apache License
    • Future Work
      • mongos
      • New Allura Tools
      • Migrating legacy SF.net projects to Allura
      • Stats all in MongoDB rather than Hadoop?
      • Better APIs to access your project data
    • Questions?
    • Rick Copeland @rick446 [email_address]