Building a Scalable and Modern Infrastructure at CARFAX

The CARFAX vehicle history database contains over twelve billion documents in a twelve-shard cluster that replicates to multiple data centers. This is a step-by-step walkthrough of how we deploy our servers, manage high-volume reads and writes, and configure the cluster for high availability. By automating everything from the operating system install up, we are able to deploy complete replica clusters quickly and efficiently. Using distributed processing and message queuing, we load millions of new documents each day, with projected growth of over a billion records per year. Through the use of tagging, server configuration, and read settings, we deliver content with high consistency and availability.
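
The deck does not show the actual tag names or read settings, so the following is only a minimal sketch of how tag-based read routing behaves in general. It models the documented MongoDB rule that a read preference carries an ordered list of tag sets, the first tag set matching at least one member wins, and an empty tag set matches any member. The host names, the `dc` tag key, and its values are hypothetical, and the sketch ignores read preference mode and latency-based selection.

```python
from typing import Any, Dict, List

# Hypothetical replica set members; hosts and tag values are illustrative only.
MEMBERS: List[Dict[str, Any]] = [
    {"host": "db-east-1", "tags": {"dc": "east"}},
    {"host": "db-east-2", "tags": {"dc": "east"}},
    {"host": "db-west-1", "tags": {"dc": "west"}},
]


def matches(member_tags: Dict[str, str], tag_set: Dict[str, str]) -> bool:
    """A member matches when it carries every key/value pair in the tag set.

    An empty tag set has no pairs to check, so it matches every member.
    """
    return all(member_tags.get(key) == value for key, value in tag_set.items())


def select_members(
    members: List[Dict[str, Any]], tag_sets: List[Dict[str, str]]
) -> List[Dict[str, Any]]:
    """Return the members eligible under the first tag set that matches anyone."""
    for tag_set in tag_sets:
        eligible = [m for m in members if matches(m["tags"], tag_set)]
        if eligible:
            return eligible
    return []  # no tag set matched any member


if __name__ == "__main__":
    # Prefer the local data center, then fall back to any member.
    for member in select_members(MEMBERS, [{"dc": "east"}, {}]):
        print(member["host"])
```

Ending the list with an empty tag set is a common way to keep reads available when the preferred data center has no eligible member; whether CARFAX does this is not stated in the slides.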

Published in: Technology

  1. A Scalable and Modern Infrastructure at CARFAX
  2. About Me: Jai Hirsch, Senior Systems Architect, Data Technologies at CARFAX • Long-time Java and Database Developer • Data and Distributed Processing Enthusiast • GitHub: https://github.com/JaiHirsch • Twitter: @JaiHirsch https://twitter.com/JaiHirsch • LinkedIn: http://www.linkedin.com/pub/jai-hirsch/8/a89/335
  3. “CARFAX helps millions of people buy and sell used cars with more confidence”
  4. CARFAX Vehicle History Report
  5. Documents on the Report
  6. NoSQL Before it Was Cool: Proprietary Key Value Store on OpenVMS, Developed by CARFAX in 1984
  7. Never mind that sh*t! Here comes Mongo!
  8. Why MongoDB? Legacy structures mapped to documents • High availability using replica sets • Platform Independence • Support
  9. MongoDB at CARFAX: Our Production Environment • The Legacy Database and High Volume Loads • High Availability Reads
  10. Our Production Environment
  11. Server Deployment: AUTOMATE AUTOMATE AUTOMATE AUTOMATE
  12. Server Configuration: 12 Shards with two spare servers racked for failover • OS: Linux • MongoDB 2.4.9 • 128 GB of RAM • 1.8 TB of Drive Space • 10K RPM SAS Drives
  13. The Future
  14. Extract, Transform, Load
  15. Loading Millions to Billions of Records per Day: AUTOMATE AUTOMATE AUTOMATE AUTOMATE
  16. First Attempt To Load Was Completely CPU Bound
  17. Not Acceptable! 45 Days to Backload the Legacy Database
  18. Distributed Processing
  19. Acceptable! Billion+ inserts per Day! 9 Days to Backload
  20. The MongoDB Implementation: 13 billion+ documents • 1.5 billion+ new documents per year • Document size: ~795 Bytes • A Vehicle History Report (VHR) uses 200+ documents
  21. High Availability Reads
  22. Millions of Reports per Day: AUTOMATE AUTOMATE AUTOMATE
  23. Read Scalability With Tagging
  24. Each Data Center is Tagged • Each Replica Set is Tagged
  25. 5X More Reports per Second
  26. But we can do More!
  27. Let's Wrap It Up: Don’t buy a used car without a CARFAX report ➢ Grok your data and working set ➢ Architect for your load volume ➢ Scale your reads to meet demand
  28. Keys To Success ➢ AUTOMATE EVERYTHING ➢ Test Many Configurations ➢ Grid Computing is Awesome ➢ Shard Early, Shard Often
  29. And Remember
  30. Friends Don’t Let Friends Use Default Ulimits!
  31. Thank You! The migration was a success due to the incredible teams at CARFAX and MongoDB. We are always looking for great people to join us. www.carfax.com/careers
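
As a rough sanity check on the load figures in the slides, the sketch below works through the arithmetic behind the 45-day and 9-day backload times and the approximate raw data volume. It assumes the 13 billion document count applies to the backload, that documents are spread evenly across the 12 shards, and that the ~795 byte average holds; it ignores indexes, storage overhead, and replication, none of which the slides quantify. Terabytes here are decimal (10^12 bytes).

```python
# Figures taken from the slides; everything derived from them is an estimate.
DOCUMENTS = 13_000_000_000   # "13 billion+ documents"
DOC_SIZE_BYTES = 795         # "Document size: ~795 Bytes"
SHARDS = 12                  # "12 Shards"
FIRST_ATTEMPT_DAYS = 45      # CPU-bound first attempt
DISTRIBUTED_DAYS = 9         # with distributed processing


def docs_per_day(total_docs: int, days: int) -> float:
    """Average insert rate needed to load total_docs in the given number of days."""
    return total_docs / days


first_rate = docs_per_day(DOCUMENTS, FIRST_ATTEMPT_DAYS)
distributed_rate = docs_per_day(DOCUMENTS, DISTRIBUTED_DAYS)

raw_bytes = DOCUMENTS * DOC_SIZE_BYTES   # document payload only
raw_tb = raw_bytes / 1e12
raw_tb_per_shard = raw_tb / SHARDS       # assumes an even shard distribution

if __name__ == "__main__":
    print(f"First attempt:   {first_rate:,.0f} docs/day")
    print(f"Distributed:     {distributed_rate:,.0f} docs/day")
    print(f"Raw data volume: {raw_tb:.3f} TB ({raw_tb_per_shard:.3f} TB per shard)")
```

Under these assumptions the distributed load averages roughly 1.4 billion inserts per day, which is consistent with the "Billion+ inserts per Day" slide, and the raw payload comes to a little over 10 TB, or under 1 TB per shard before indexes and overhead.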