In 2009, SourceForge embarked on a quest to modernize our websites, converting a PHP site backed by a hodge-podge of relational databases into a MongoDB- and Python-powered site, with a small development team and a tight deadline. We have now completely rewritten both the consumer and producer parts of the site, with better usability, more functionality, and better performance. This talk focuses on how we're using MongoDB, the pymongo driver, and Ming, an ORM-like library implemented at SourceForge, to continually improve and expand our offerings, with a special focus on how anyone can quickly become productive with Ming and pymongo without having to apologize for poor performance.
Repository Cache Lessons Learned
Using MongoDB to represent graph structures (commit graph, commit trees) requires careful query planning. Pointer-chasing is no fun!
Sometimes Ming validation and ORM overhead can be prohibitively expensive – time to drop down a layer.
Benchmarking and profiling are your friends, as are queries like {'_id': {'$in': […]}} for returning multiple objects.
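As a rough illustration of the {'_id': {'$in': […]}} pattern, here is a minimal sketch of fetching many objects in a few round trips instead of chasing pointers one query at a time. The helper names `in_queries` and `fetch_many` and the batch size are hypothetical, not part of Ming or the SourceForge code; `collection` is assumed to be a pymongo collection (anything with a compatible `find()` works).

```python
from itertools import islice


def in_queries(ids, batch_size=1000):
    """Yield MongoDB filter documents that select `ids` in batches via $in.

    Hypothetical helper: batching keeps any single query document small.
    """
    it = iter(ids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield {'_id': {'$in': batch}}


def fetch_many(collection, ids, batch_size=1000):
    """Return {_id: document} for every id found, one find() per batch.

    $in makes no promise about result order, so results are keyed by _id
    and the caller re-orders them as needed.
    """
    found = {}
    for query in in_queries(ids, batch_size):
        for doc in collection.find(query):
            found[doc['_id']] = doc
    return found
```

When walking a commit graph, for example, all parent ids of the current frontier could be passed to `fetch_many` at once, so the number of queries grows with the depth of the walk rather than with the number of commits.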
Open Source
Ming: http://sf.net/projects/merciless/ (MIT License)
Allura: http://sf.net/p/allura/ (Apache License)
Zarkov: http://sf.net/p/zarkov/ (Apache License)
The site used to be PHP + MySQL + Postgres + … How did we get started down the NoSQL path?
New team hired, which had some new ideas. Crazy short timeline.
FossFor.us was a technical success, if a market failure. Can we bring the same technical success to sf.net?
Content begins life in the “Develop” world: the old PHP/relational codebase. Gobble runs off existing APIs and new AMQP feeds to populate the master MongoDB server. Webheads run off local MongoDB slaves.
Questionable architectural decision, but it made sense at the time. All the while, our data model evolved. Factored out a lot of our custom MongoDB code into…
index=True (also has a unique=True arg). ForeignIdProperty is like a foreign key: it hints to the mapper how to actually infer RelationProperties. Empty classes can have properties added by their mappers.
Several interacting apps, with MongoDB and SOLR at the center. Used to have RabbitMQ, but we wanted more visibility into our queues. Now I’ll show some of the model classes we use.
Efficient, threaded messages. Short “slugs” for linking to a particular message (and for URLs). Easy sorting. Note the ‘if_missing’: basically default values that are created by Ming (in this case, Python functions that generate the values).
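To make the slug idea concrete, here is a minimal sketch of one way short slugs and easy sorting can fit together. It is an assumption about the general technique, not the actual model code: the field names `slug` and `full_slug`, the timestamp format, and the helpers `make_token` and `make_slugs` are all hypothetical. Functions like these are the kind of thing that could serve as default-value generators.

```python
import os
from binascii import hexlify
from datetime import datetime, timezone


def make_token(nbytes=2):
    """Return a short random hex token, e.g. 4 characters for 2 bytes."""
    return hexlify(os.urandom(nbytes)).decode('ascii')


def make_slugs(parent=None, now=None, token=None):
    """Return (slug, full_slug) for a new message.

    slug:      '/'-joined path of short tokens, suitable for links and URLs.
    full_slug: the same path with a timestamp in front of each token, so a
               plain string sort lists each message right after its parent,
               with siblings in posting order.
    """
    now = now or datetime.now(timezone.utc)
    token = token or make_token()
    part = now.strftime('%Y%m%d%H%M%S') + ':' + token
    if parent is None:
        return token, part
    return parent['slug'] + '/' + token, parent['full_slug'] + '/' + part
```

With this layout, a whole thread can be read in display order with a single query sorted on `full_slug`, and the nesting depth of a message is the number of '/' characters in its `slug`.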
Async queuing. Some perf instrumentation. Can use findAndModify to grab and lock tasks for processing. Uses both slow polling and AMQP notification (but only to say POLL NOW).
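The grab-and-lock step can be sketched with pymongo's `find_one_and_update`, which issues a findAndModify on the server. This is a minimal sketch under stated assumptions, not the queue's real schema: the field names `state`, `worker`, `time_queue`, and `time_start`, the state values, and the function `claim_next_task` are hypothetical.

```python
from datetime import datetime, timezone


def claim_next_task(collection, worker_id, now=None):
    """Atomically claim the oldest ready task, or return None if there is none.

    The match and the update happen in one server-side operation, so two
    workers polling at the same time cannot both claim the same task.
    """
    now = now or datetime.now(timezone.utc)
    return collection.find_one_and_update(
        {'state': 'ready'},                   # only unclaimed tasks
        {'$set': {'state': 'busy',            # mark the task as locked
                  'worker': worker_id,
                  'time_start': now}},
        sort=[('time_queue', 1)],             # oldest queued task first
    )
```

By default the call returns the document as it was before the update, which is enough for the worker to start processing. A worker would call this on each slow poll and again whenever a notification tells it to poll now.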
UI-centered design: make sure it doesn’t take a lot of queries to build a page.
Still alpha-quality (OSS because that’s the way we roll)
Can record many more than 4k events per second, which works out to over 345M events per day (4,000 × 86,400 seconds), on a single thread in a VM on a laptop. We get a lot of traffic, but not that much. Map-reduce (MR) makes this much lower if calculated continuously, but it is still hundreds of events per second even with MR locking. We will probably use a slave for MR processing, and may end up using Hadoop or Disco if we need to.