How Business Insider Uses MongoDB

    1. How Business Insider Uses MongoDB: The NoSQL Approach in Practice
       Ian White, GilbaneSF, May 20th, 2010
    2. Business Insider (www.businessinsider.com)
       • Multiple business verticals (tech, markets)
       • Dedicated full-time editorial staff
       • 3k original stories and 50k pieces of aggregated content per month
       • More than 1.2M pageviews per day
       • Custom CMS powered by LAMP
    3. LAMP: Linux (OS), Apache (web server), MySQL (database), PHP (application)
    4. LAMP at Business Insider: Linux (OS), Apache (web server), MongoDB (database), PHP (application)
    5. What's MongoDB?
       • NoSQL database
       • Open source (supported by 10gen)
       • Document-oriented
       • Schema-free
       • Highly horizontally scalable
       • Dynamically queryable
    6. NoSQL means non-relational, next-generation operational datastores and databases
       • Document-oriented: CouchDB, MongoDB
       • Graph: Neo4j
       • Key/value: Redis
       • Column-family (wide-column): Cassandra
    7. NoSQL: no joins + no complex transactions → horizontally scalable architectures
    8. [Chart: scalability & performance vs. depth of functionality, comparing memcached, key/value stores, and RDBMS]
    9. Simplified Blog: a SQL approach
       Three tables of rows:
         posts (post_id, name, content)
         posts_comments (post_id, content)
         posts_tags (post_id, name)
         INSERT INTO posts (post_id, name, content) VALUES (42, 'Lost Series Finale Approaching!', '<p>It''s going to be pretty exciting.</p>');
         INSERT INTO posts_comments (post_id, content) VALUES (42, 'Cool!');
         INSERT INTO posts_comments (post_id, content) VALUES (42, 'Awesome!');
         INSERT INTO posts_tags (post_id, name) VALUES (42, 'Lost');
         INSERT INTO posts_tags (post_id, name) VALUES (42, 'Television');
    10. Simplified Blog: a MongoDB approach
        One collection of documents:
          posts (_id, name, content, comments, tags)
          db.posts.insert({
            _id: 42,
            name: "Lost Series Finale Approaching!",
            content: "<p>It's going to be pretty exciting.</p>",
            comments: [ { content: "Cool!" }, { content: "Awesome!" } ],
            tags: ["Lost", "Television"]
          });
    11. Simplified Blog: Queries
        Get a post and all its tags and comments
        SQL:
          SELECT * FROM posts WHERE post_id = 42;
          SELECT * FROM posts_comments WHERE post_id = 42;
          SELECT * FROM posts_tags WHERE post_id = 42;
        MongoDB:
          db.posts.findOne({ _id: 42 });
    12. Simplified Blog: Queries
        Get all of the posts tagged a certain way
        SQL:
          SELECT posts.* FROM posts JOIN posts_tags USING (post_id) WHERE posts_tags.name = 'Television';
        MongoDB:
          db.posts.find({ tags: "Television" });
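Slides 10 through 12 rest on two behaviours of the document model: a single lookup by `_id` returns the post with its embedded comments and tags, and a query such as `{ tags: "Television" }` matches any document whose `tags` array contains that value. The sketch below imitates just those two behaviours over a plain JavaScript array so they can be run without a server. The `find`/`findOne` helpers and the second post are hypothetical; this is not the MongoDB driver API.

```javascript
// In-memory imitation of two MongoDB query behaviours (NOT the driver API):
// equality on a field, and "array contains" when the field holds an array.
const posts = [
  {
    _id: 42,
    name: "Lost Series Finale Approaching!",
    content: "<p>It's going to be pretty exciting.</p>",
    comments: [{ content: "Cool!" }, { content: "Awesome!" }],
    tags: ["Lost", "Television"],
  },
  {
    // Hypothetical second post, added so the tag query has something to skip.
    _id: 43,
    name: "Hypothetical Markets Post",
    content: "<p>Placeholder.</p>",
    comments: [],
    tags: ["Markets"],
  },
];

// True when every field in the query matches. An array field matches a
// scalar value if any element equals it, as with { tags: "Television" }.
function matches(doc, query) {
  return Object.entries(query).every(([field, wanted]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Hypothetical helpers named after the shell methods on the slides.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

function findOne(collection, query) {
  // Array.prototype.find returns undefined when nothing matches; map that to null.
  const doc = collection.find((d) => matches(d, query));
  return doc === undefined ? null : doc;
}

// One lookup returns the post with its comments and tags embedded.
const post = findOne(posts, { _id: 42 });
console.log(post.comments.length, post.tags); // 2 [ 'Lost', 'Television' ]

// The tag query needs no join table.
console.log(find(posts, { tags: "Television" }).map((p) => p._id)); // [ 42 ]
```

The SQL version needs three queries or a join for the same results because comments and tags live in separate tables.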
    13. Easier Development
        • Complex data structures (hashes, arrays) stored directly in their natural form
        • No need for Object-Relational Mapping
        • No need to worry about SQL injection
        • Fewer collections/tables in the system
        • Easy for new developers to pick up
    14. Easier Deployment
        • ALTERs are a pain and require downtime on large datasets
        • You don't need ALTERs in MongoDB!
        • Though migration scripts are occasionally still needed
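Slide 14 notes that migration scripts are still occasionally needed. One common shape for such a script is to rewrite only the documents that lack a new field; in the mongo shell that would look roughly like `db.posts.updateMany({ status: { $exists: false } }, { $set: { status: "published" } })`. The sketch below performs the same filter-and-set over an in-memory array. The `status` field and its default value are hypothetical.

```javascript
// Sketch of a one-off migration over schema-free documents.
// Hypothetical scenario: posts gain a `status` field, and older posts that
// lack it should default to "published". No ALTER TABLE is involved; only
// documents missing the field are rewritten.
const posts = [
  { _id: 41, name: "Older post" },
  { _id: 42, name: "Newer post", status: "draft" },
];

// Returns how many documents were modified.
function migrateAddStatus(collection, defaultStatus) {
  let modified = 0;
  for (const doc of collection) {
    // Same idea as the filter { status: { $exists: false } }.
    if (!Object.prototype.hasOwnProperty.call(doc, "status")) {
      doc.status = defaultStatus; // like { $set: { status: defaultStatus } }
      modified += 1;
    }
  }
  return modified;
}

console.log(migrateAddStatus(posts, "published")); // 1
console.log(posts.map((p) => p.status)); // [ 'published', 'draft' ]
// A second run finds nothing to change, so the script is safe to re-run.
console.log(migrateAddStatus(posts, "published")); // 0
```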
    15. Horizontal Scaling
        • Having no JOINs encourages scalable practices
        • Denormalization and scalable design get baked in early
        • Hard for a sane design NOT to scale
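As an illustration of the denormalization mentioned on slide 15, the sketch below embeds a copy of the author's name in each post, so displaying a post needs no join. The trade-off shows up on writes: renaming the author has to update every embedded copy. The collections, field names, and people are hypothetical, and plain arrays stand in for MongoDB collections.

```javascript
// Hypothetical denormalized layout: each post embeds a copy of the author's
// name, so displaying a post never needs a second lookup or a join.
const authors = [{ _id: "jd", name: "Jane Doe" }];
const posts = [
  { _id: 42, title: "Post A", author: { _id: "jd", name: "Jane Doe" } },
  { _id: 43, title: "Post B", author: { _id: "jd", name: "Jane Doe" } },
];

// Read path: everything needed is inside the one document.
function byline(post) {
  return `${post.title} by ${post.author.name}`;
}

// Write path: a rename must fan out to every embedded copy.
// Returns the number of posts that were updated.
function renameAuthor(authorList, postList, authorId, newName) {
  const author = authorList.find((a) => a._id === authorId);
  if (author) author.name = newName;
  let touched = 0;
  for (const post of postList) {
    if (post.author._id === authorId) {
      post.author.name = newName;
      touched += 1;
    }
  }
  return touched;
}

console.log(byline(posts[0])); // Post A by Jane Doe
console.log(renameAuthor(authors, posts, "jd", "J. Doe")); // 2
console.log(byline(posts[1])); // Post B by J. Doe
```

Reads like this are frequent and renames are rare, which is the usual argument for accepting the fan-out cost.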
    16. Replication [diagram: example master/slave replication topologies]
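Slide 16 only shows topologies. The general mechanism behind MongoDB's master/slave mode is log shipping: the master records each write in an operation log (the oplog), and each slave repeatedly fetches and applies the entries newer than the last one it applied. The sketch below models that loop in memory. The `Master`/`Slave` classes and the `{ ts, id, doc }` entry format are hypothetical simplifications; real oplog entries carry more fields and live in a size-limited capped collection. The sketch also shows why a read from a slave can lag until its next sync.

```javascript
// Simplified log-shipping replication in the spirit of master/slave mode.
// Master, Slave, and the { ts, id, doc } entry shape are hypothetical.
class Master {
  constructor() {
    this.data = new Map();
    this.oplog = []; // append-only list of writes, in order
  }

  // Apply a write locally and record it in the oplog.
  write(id, doc) {
    this.data.set(id, doc);
    this.oplog.push({ ts: this.oplog.length + 1, id, doc });
  }
}

class Slave {
  constructor() {
    this.data = new Map();
    this.lastTs = 0; // position of the last oplog entry applied
  }

  // Fetch entries newer than lastTs, replay them in order,
  // and return how many were applied.
  sync(master) {
    const pending = master.oplog.filter((entry) => entry.ts > this.lastTs);
    for (const entry of pending) {
      this.data.set(entry.id, entry.doc);
      this.lastTs = entry.ts;
    }
    return pending.length;
  }
}

const master = new Master();
const slave = new Slave();
master.write(42, { name: "Post A" });
master.write(43, { name: "Post B" });
console.log(slave.sync(master)); // 2
master.write(42, { name: "Post A (edited)" });
console.log(slave.data.get(42).name); // prints Post A, stale until the next sync
console.log(slave.sync(master)); // 1
console.log(slave.data.get(42).name); // Post A (edited)
```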
    17. Auto-sharding [diagram: clients connect to mongos routers, which use the config servers (mongod processes) to route requests to the shards (mongod processes)]
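In the architecture on slide 17, each sharded collection is split into chunks, contiguous ranges of shard-key values, and the config servers record which shard owns each chunk; the mongos routers consult that table to send each request only to the shards that can hold matching documents. The sketch below shows that lookup for half-open `[min, max)` ranges on a numeric key. The chunk boundaries and shard names are made up, and a real mongos also caches and refreshes this metadata and supports compound and non-numeric keys.

```javascript
// Simplified range-based routing, as a mongos does with chunk metadata.
// Hypothetical chunk table: each chunk owns shard-key values in [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// A query on an exact shard-key value goes to exactly one shard.
function routeByKey(chunkTable, key) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

// A range query [lo, hi) goes only to shards whose chunks overlap it.
function routeRange(chunkTable, lo, hi) {
  const shards = chunkTable
    .filter((c) => c.min < hi && lo < c.max)
    .map((c) => c.shard);
  return [...new Set(shards)]; // de-duplicate shards that own several chunks
}

console.log(routeByKey(chunks, 42)); // shard0
console.log(routeByKey(chunks, 5000)); // shard2
console.log(routeRange(chunks, 900, 1200)); // [ 'shard0', 'shard1' ]
```

A query that does not include the shard key cannot be narrowed this way and has to be sent to every shard.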
    18. High Performance
        • As fast as a cache when retrieving individual documents
        • Limited use of caching (posts are pulled live from MongoDB)
        • Every pageview on Business Insider does multiple writes
        • Just a simple master/slave setup, running at about 5% of capacity
    19. Realtime Analytics
    20. Realtime Analytics
        • MongoDB is highly optimized for realtime updates
        • Up-to-the-second data on pageviews, referrers, and click tracking
        • Minimal development time, huge value to editorial
        • Could be bolted onto a SQL-based website or traditional CMS
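A typical way to keep up-to-the-second counters like those on slide 20 is an upsert with the `$inc` operator, one counter document per URL and time bucket; in the shell, something like `db.pageviews.updateOne({ url: u, minute: m }, { $inc: { views: 1 } }, { upsert: true })`. The sketch below emulates that behaviour in memory. The collection name and the per-minute bucket schema are assumptions, not a description of Business Insider's system. On a real server the upsert and increment happen as one atomic operation; in this synchronous sketch JavaScript's single thread gives the same effect.

```javascript
// In-memory emulation of "upsert + $inc" counters. The per-URL, per-minute
// document layout is a hypothetical schema chosen for illustration.
function minuteBucket(date) {
  // ISO timestamp truncated to the minute (UTC), e.g. "2010-05-20T10:15".
  return date.toISOString().slice(0, 16);
}

// Emulates updateOne({ url, minute }, { $inc: { views: 1 } }, { upsert: true })
// against a Map, and returns the updated view count.
function recordPageview(store, url, when) {
  const minute = minuteBucket(when);
  const key = `${url}|${minute}`;
  if (!store.has(key)) {
    // Upsert: no matching document yet, so insert one (views starts at 0).
    store.set(key, { url, minute, views: 0 });
  }
  const doc = store.get(key);
  doc.views += 1; // the $inc step
  return doc.views;
}

const counters = new Map();
const t = new Date(Date.UTC(2010, 4, 20, 10, 15, 30));
recordPageview(counters, "/lost-finale", t);
recordPageview(counters, "/lost-finale", new Date(t.getTime() + 10000));
console.log(recordPageview(counters, "/lost-finale", t)); // 3
console.log(counters.size); // 1
console.log(minuteBucket(t)); // 2010-05-20T10:15
```

Because each pageview only increments a small document in place, the write stays cheap even at the per-pageview rates mentioned on slide 18.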
    21. Database File Storage (GridFS)
        • Every image is stored in and served from the database
        • Eliminates the awkwardness of duplicate systems for replication, backups, test datasets, etc.
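GridFS, from slide 21, stores each file as one metadata document in `fs.files` (with fields such as `length`, `chunkSize`, and `uploadDate`) plus a series of documents in `fs.chunks`, each holding a `files_id`, a chunk index `n`, and a slice of the bytes in `data`. The sketch below builds those documents in memory and reassembles the file, only to show the layout; an application would normally use its driver's GridFS API instead. The 255 KiB default matches current drivers, chunk `_id`s are omitted for brevity, and the file id, name, and contents are hypothetical.

```javascript
// Sketch of the GridFS layout: one fs.files document plus N fs.chunks
// documents. Built in memory only; this is not a driver API.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

function toGridFsDocs(fileId, filename, bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const fileDoc = {
    _id: fileId,
    filename,
    length: bytes.length,
    chunkSize,
    uploadDate: new Date(),
  };
  const chunkDocs = [];
  // Chunk n holds bytes [n * chunkSize, (n + 1) * chunkSize); the last may be shorter.
  for (let n = 0; n * chunkSize < bytes.length; n++) {
    chunkDocs.push({
      files_id: fileId,
      n,
      data: bytes.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
  return { fileDoc, chunkDocs };
}

// Reassemble by ordering chunks on n and concatenating their data.
function fromGridFsDocs(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const image = Buffer.alloc(600 * 1024, 7); // hypothetical 600 KiB image
const { fileDoc, chunkDocs } = toGridFsDocs("img-1", "chart.png", image);
console.log(fileDoc.length, chunkDocs.length); // 614400 3
console.log(fromGridFsDocs(chunkDocs).equals(image)); // true
```

Since files are ordinary documents, they replicate and back up along with the rest of the data, which is the point the slide makes.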
    22. Other Cool Stuff
        • Map/Reduce
        • Capped Collections
        • Geospatial Indexes
        • ... but we don't use them (yet!)
    23. Folks Using MongoDB
    24. The Future (IMHO)
        • NoSQL adoption grows rapidly
        • Some sites use it for specific systems, some go all NoSQL
        • Open-source CMSs support NoSQL (already happening)
        • More diversity in datastores
        • RDBMS still useful but no longer the only option
    25. Questions?
        Ian White, iwhite@businessinsider.com, twitter.com/eonwhite
