Scalable File System
In 14 Days
Jeff Hoffer, Software Architect
Alex Zherdev, Sr. Software Engineer
Our Background
In the beginning...
“YouTube” for Documents
Today
“Make every small business better”
Professional Documents...
Our Product
www.docstoc.com
Initial Approach
Pros:
• Existing libraries used
• Reliable storage
• Replication
Cons:
• Hard to scale out
• Replication ...
IIS HTTP Based Solution
Pros:
• HTTP GET
• IIS Static Content Cache
• 5TB = Years of Growth
• Easy Setup & Deploy
Cons:
• ...
Importance of Performance
• IIS Source Failed early 2013
• Page speed heavily influenced our traffic and SEO
• MongoDB sol...
Requirements
✓ Sharded – horizontal scale out of reads and writes
✓ Replication – no single point of failure for core busi...
MongoDB FTW
Test Setup
{
  id: {document_id}
  body: {text_content}
  created: {date_time}
}
• Simple Structure
• Object Size 50KB
• Shard on hashed i...
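As a rough illustration of the structure above, the following Python sketch builds a document of that shape and routes it to a shard by hashing its id. The shard count, the `make_doc`, `shard_for`, and `distribution` helpers, the example id, and the use of MD5 are all assumptions made for this example; in a real cluster MongoDB computes the hashed shard key and does the routing itself.

```python
import hashlib
from datetime import datetime, timezone

NUM_SHARDS = 3  # assumption: the slides do not state the shard count


def make_doc(document_id: int, text_content: str) -> dict:
    """Build a document with the three fields shown on the slide."""
    return {
        "id": document_id,
        "body": text_content,
        "created": datetime.now(timezone.utc),
    }


def shard_for(document_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Illustrative routing only: hash the id, then map the hash to a shard."""
    digest = hashlib.md5(str(document_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(ids, num_shards: int = NUM_SHARDS) -> list:
    """Count how many ids land on each shard."""
    counts = [0] * num_shards
    for document_id in ids:
        counts[shard_for(document_id, num_shards)] += 1
    return counts


if __name__ == "__main__":
    doc = make_doc(8675309, "x" * 50 * 1024)  # roughly the 50KB test object size
    print(shard_for(doc["id"]), distribution(range(100_000)))
```

Hashing spreads sequential ids roughly evenly across shards, which is the usual reason to hash a key that would otherwise grow monotonically.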
Tests
Client | Server | MongoDB | Duration | Reads (100/sec) | Writes (100/sec) | Read+Writes (200/sec)**
JMeter | Ruby REST Server | Empt...
Production
In Conclusion…
It’s Good Enough, It’s Fast Enough, and Doggone It, Developers Like It!
• Fast Prototype
• Low Maintenance
...
Scalable Text File Service with MongoDB (Intuit)


Docstoc.com (founded in 2007, acquired by Intuit in 2013) is one of the largest online repositories of documents. A critical component of our product is our text file service, which delivers text documents to both humans and crawlers. In early 2013 this service, which was file system based, became a prohibitive bottleneck. To meet our scaling needs, we replaced it with one backed by a sharded MongoDB cluster. This talk will cover:

• Our traffic load (5:1 bots:humans ratio)
• How we implemented the system in our SOA environment
• How MongoDB fit our use case out of the box
• How we load tested peak time traffic before hardware purchase
• How we loaded the system and how we rolled it out live
• Performance metrics and gains in stability and reliability

Published in: Technology

Scalable Text File Service with MongoDB (Intuit)

  1. Scalable File System In 14 Days
     Jeff Hoffer, Software Architect
     Alex Zherdev, Sr. Software Engineer
  2. Our Background
     In the beginning... “YouTube” for Documents
     Today “Make every small business better”
     • Professional Documents
     • Custom Documents
     • Business Licenses
     Jason Nazar
     Alon Shwartz
     The Team
  3. Our Product
     www.docstoc.com
  4. Initial Approach
     Pros:
     • Existing libraries used
     • Reliable storage
     • Replication
     Cons:
     • Hard to scale out
     • Replication can’t keep up
     • Taxed all data
     SELECT `text_data` FROM `documents` WHERE `doc_id` = 8675309;
  5. IIS HTTP Based Solution
     Pros:
     • HTTP GET
     • IIS Static Content Cache
     • 5TB = Years of Growth
     • Easy Setup & Deploy
     Cons:
     • Not scalable
     • NTFS & 30M small files
     • Replication In-House
     HTTP GET http://docs.api/text/160717/8675309.txt
  6. Importance of Performance
     • IIS Source Failed early 2013
     • Page speed heavily influenced our traffic and SEO
     • MongoDB solution implemented within 2 weeks and results immediately felt
     [Chart: Speed; Views]
  7. Requirements
     ✓ Sharded – horizontal scale out of reads and writes
     ✓ Replication – no single point of failure for core business data
     ✓ Doc Page Peak Read Load of 200 / second < 4s
     ✓ REST Interface – switch only requires changing URL
     ✓ Easy to Maintain – maintenance cost of no more than 1 FTE / day / month
     ✓ 99.9% uptime
     ✓ Can handle our current set of text files (43M)
     ✓ Production Rollout within 3 weeks
  11. MongoDB FTW
  12. Test Setup
  13. Mongo Collection Structure
      {
        id: {document_id}
        body: {text_content}
        created: {date_time}
      }
      • Simple Structure
      • Object Size 50KB
      • Shard on hashed id
      • Rarely modified
      • Heavy Reads
  14. Tests
      Client | Server | MongoDB | Duration | Reads (100/sec) | Writes (100/sec) | Read+Writes (200/sec)**
      JMeter | Ruby REST Server | Empty Collection | 20 min (3x)
      **10x peak load
  16. Test Setup
  17. Tests
      Client | Server | MongoDB | Duration | Reads (100/sec) | Writes (100/sec) | Read+Writes (200/sec)**
      JMeter | Ruby REST Server | Empty Collection | 20 min (3x)
      JMeter | ASP.NET REST Server* | Empty Collection | 20 min (3x)
      *ASP.NET MVC 4 Web API
      **10x peak load
  18. Tests
      Client | Server | MongoDB | Duration | Reads (100/sec) | Writes (100/sec) | Read+Writes (200/sec)**
      JMeter | Ruby REST Server | Empty Collection | 20 min (3x)
      JMeter | ASP.NET REST Server* | Empty Collection | 20 min (3x)
      JMeter | ASP.NET REST Server* | Seeded Collection 2M | 30 min (3x)
      *ASP.NET MVC 4 Web API
      **10x peak load
  19. Test Setup
  20. Tests
      Client | Server | MongoDB | Duration | Reads (100/sec) | Writes (100/sec) | Read+Writes (200/sec)**
      JMeter | Ruby REST Server | Empty Collection | 20 min (3x)
      JMeter | ASP.NET REST Server* | Empty Collection | 20 min (3x)
      JMeter | ASP.NET REST Server* | Seeded Collection 2M | 30 min (3x)
      .NET Console Loader | ASP.NET REST Server* | Seeded Collection 2M | 1 hour (3x)
      *ASP.NET MVC 4 Web API
      **10x peak load
  21. Tests
      Client | Server | MongoDB | Duration | Reads (100/sec) | Writes (100/sec) | Read+Writes (200/sec)**
      JMeter | Ruby REST Server | Empty Collection | 20 min (3x)
      JMeter | ASP.NET REST Server* | Empty Collection | 20 min (3x)
      JMeter | ASP.NET REST Server* | Seeded Collection 2M | 30 min (3x)
      .NET Console Loader | ASP.NET REST Server* | Seeded Collection 2M | 1 hour (3x)
      .NET Console Loader | ASP.NET REST Server* | Seeded Collection 6M | Overnight (10 hrs)
      *ASP.NET MVC 4 Web API
      **10x peak load
  22. Production
  23. In Conclusion…
      It’s Good Enough, It’s Fast Enough, and Doggone It, Developers Like It!
      • Fast Prototype
      • Low Maintenance
      • Quick Deployment
      • Scale Out
      • Stable
      • Linux, Windows, Mac
      • Excellent Support
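
The figures on the Requirements and Tests slides translate into concrete budgets with a little arithmetic. The Python sketch below computes the downtime that a 99.9% uptime target allows and the number of requests a constant-rate test issues. The helper names and the 30-day and 365-day periods are assumptions chosen for illustration; the slides do not say how the team did these calculations.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted in a period at the given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


def total_requests(rate_per_sec: int, duration_min: int) -> int:
    """Requests issued by a constant-rate load test of the given duration."""
    return rate_per_sec * duration_min * 60


if __name__ == "__main__":
    # 99.9% uptime over an assumed 30-day month and 365-day year
    print(round(allowed_downtime_minutes(99.9, 30), 1))        # 43.2 (minutes)
    print(round(allowed_downtime_minutes(99.9, 365) / 60, 2))  # 8.76 (hours)
    # One 20-minute run at the combined 200/sec rate from the Tests slides
    print(total_requests(200, 20))                             # 240000
```

At the combined 200/sec rate, a single 20-minute run issues 240,000 requests.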