Be the first to like this
Getafix is a data replication scheme that cuts memory costs in interactive analytics engines. It tracks data segment popularity in a cluster and runs a modified best-fit algorithm to find a memory efficient allocation. It is built upon our optimal solution to the static version of the scheduling problem, which yields a solution that has the least makespan and replication factor. In practice, it can cut cluster memory costs by 2X without sacrificing query performance and save firms as much as $10 million annually. It also automatically tiers popular data to powerful nodes in a heterogeneous cluster and greatly simplifies sys admin work.
Source Code: http://dprg.cs.uiuc.edu/downloads#l_getafix