Most people have probably used IMDb, but they probably won’t use it if they have to pay.
What we do: Display, video, mobile, social
Loop 1-6 is 200ms, 3-4 is 100ms for RF
We do this 45B times a day
Real Time Auction
Selecting the right ad for each auction
Automatically learning from every response & getting better
Nobody else is doing this as fast, precisely, consistently for our customers
We are now in 8 data centers in the world
We have optimized the design of our data centers as well.
We custom-design our racks and get the servers assembled, racked, and tested in a California facility, then ship them to the data center.
We do this not just for US data centers but also for data centers in Europe and Asia. Each rack can weigh 1,500 lb or more, and many racks are sent by air for the initial install.
Now, let’s look at the two kinds of racks shown above:
Hadoop servers (the full racks): Data (Hadoop) servers are bigger because they have 12 x 3TB drives each; 20 servers fill the whole rack.
Bidders: Bidders have a lot of cores but take less space because they have only two 2.5” drives each. 40 servers fill up half the rack, but at that point we run out of switch ports.
And, this is 5% of Rocket Fuel
Just say “We have amazing scale” – let the numbers speak for themselves.
Managing a Hadoop cluster is not easy
Start early
We are heavy users of puppet
Infradb is similar to Puppet Hiera, but infradb was written in-house 4 years ago.
-> puppet and infradb are tightly integrated
We use puppet and infradb to make maintenance easy
Infradb helps us populate Hadoop property values based on the hardware config we have.
For example: our fairshare slot distribution is automatically handled by infradb whenever we add new nodes.
-> here is an example: we define the formula that decides the number of MR slots per server based on memory, CPU, and disks
-> we always want homogeneous hardware for easy maintenance and planning, but that is impossible because “need changes with time”
-> automation like this lets you stop worrying about having heterogeneous servers.
-> not just configuration: we use infradb to define alerts once, and all newly added hosts and clusters are automatically monitored by our Nagios.
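To make the slot formula idea concrete, here is a minimal sketch in Python. It is not infradb's actual formula, which these notes do not give. The NodeSpec record, the per-slot ratios, and the 70/30 map/reduce split are all assumptions chosen for illustration; only the two MRv1 TaskTracker property names are real Hadoop settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hardware facts for one worker node (hypothetical record shape)."""
    mem_gb: int
    cpu_cores: int
    data_disks: int


def mr_slots(node, gb_per_slot=4, reserved_gb=8,
             slots_per_core=1, slots_per_disk=2, map_percent=70):
    """Return (map_slots, reduce_slots) for one node.

    All ratios are illustrative assumptions. The total is capped by
    whichever resource (memory, CPU, disks) runs out first.
    """
    by_mem = max(node.mem_gb - reserved_gb, 0) // gb_per_slot
    by_cpu = node.cpu_cores * slots_per_core
    by_disk = node.data_disks * slots_per_disk
    total = max(min(by_mem, by_cpu, by_disk), 2)  # at least 1 map + 1 reduce
    map_slots = max(total * map_percent // 100, 1)
    reduce_slots = max(total - map_slots, 1)
    return map_slots, reduce_slots


def tasktracker_properties(node):
    """Render the slot counts as MRv1 TaskTracker properties."""
    map_slots, reduce_slots = mr_slots(node)
    return {
        "mapred.tasktracker.map.tasks.maximum": str(map_slots),
        "mapred.tasktracker.reduce.tasks.maximum": str(reduce_slots),
    }


if __name__ == "__main__":
    # Two different hardware generations get different slot counts
    # from the same formula, with no per-host hand editing.
    for spec in (NodeSpec(64, 16, 12), NodeSpec(24, 8, 4)):
        print(spec, tasktracker_properties(spec))
```

The point of the design is that the formula is defined once and evaluated per host, so a new hardware generation only needs its facts recorded, not a new hand-written config.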
A typical Hadoop problem: you start with a small cluster and want to grow.
Hadoop's default properties work well on small clusters; the problems start when your cluster grows.
The problem will be big when it happens on a large cluster
Aren’t we supposed to get better performance after adding nodes?
-> too many properties to change
-> be careful when making any tuning changes
-> have metrics to compare pre and post changes.
-> MAPREDUCE-2026: JobTracker.getJobCounter() locks the JobTracker and calls JobInProgress.getCounters(). JobInProgress.getCounters() can be very expensive because it aggregates all the task counters. We found from the JobTracker jstacks that this method is one of the bottlenecks in JobTracker performance.
-> any HDFS-heavy job can impact your HDFS performance; you will not realize it unless you monitor the metrics
-> don’t let any single engineer impact your cluster.
-> monitoring your applications is not enough when you are running at scale in multiple data centers across the world
-> we should monitor the network mesh
-> find out bad queries immediately and kill them before they impact cluster
-> don’t lose your capacity to mass TaskTracker blacklisting caused by a single job.
-> long-running jobs should be killed. There is no point in letting them run.
-> understand the workload on your cluster to better tune the scheduler properties
-> whenever you add more nodes, the new slots should be automatically distributed to the different queues.
-> no scheduler is perfect unless you understand and tune it
-> have ACLs in place; don’t let any one engineer impact your MR workload.
-> have proper accounting for teams that use more MR capacity.
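The point above about redistributing slots to queues whenever nodes are added can be sketched as a proportional split. This is an illustration only, not the scheduler's or infradb's own logic; the queue names and integer weights are hypothetical, and rounding is handled with a largest-remainder rule so the shares always add up to the cluster total.

```python
def distribute_slots(total_slots, queue_weights):
    """Split total_slots across queues in proportion to integer weights.

    Illustrative only: queue names and weights are assumptions.
    Slots left over after rounding down go to the queues with the
    largest fractional remainder (ties broken by queue name).
    """
    weight_sum = sum(queue_weights.values())
    if weight_sum <= 0:
        raise ValueError("queue weights must sum to a positive number")

    # Integer floor of each queue's proportional share.
    shares = {q: total_slots * w // weight_sum
              for q, w in queue_weights.items()}
    leftover = total_slots - sum(shares.values())

    # Rank queues by how much the floor operation cut off.
    by_remainder = sorted(
        queue_weights,
        key=lambda q: (-(total_slots * queue_weights[q] % weight_sum), q),
    )
    for q in by_remainder[:leftover]:
        shares[q] += 1
    return shares


if __name__ == "__main__":
    weights = {"etl": 5, "adhoc": 3, "reporting": 2}
    # Re-run the same function after the cluster grows; no manual edits.
    print(distribute_slots(100, weights))
    print(distribute_slots(145, weights))
```

Re-running this whenever the slot total changes keeps every queue's share in step with cluster growth, which is the behaviour the note asks for.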
How we operated for the initial few years
Recently added DATA BCP (Business Continuity Plan)
Latency-critical and important data goes to both places
Other data is copied after processing
Make use of the BCP cluster to do meaningful work until a disaster happens