The problem of big data is not only that it is capacious; it is also heterogeneous, dirty, and growing faster than disk capacity is improving. The challenge, then, is to derive value, by answering ad hoc questions in a timely fashion, that justifies preserving big data at all. A group of us from databases, machine learning, networking, and systems recently started a new lab at the University of California, Berkeley, to tackle this challenge. The AMPLab works at the intersection of three trends: statistical machine learning (Algorithms), cloud computing (Machines), and crowdsourcing (People).

One of the driving applications for the AMPLab is cancer genomics. Over the next several years, gene-sequencing technologies will begin to make their way into medicine, offering the most complex tests available. This advance brings a new type of data with tremendous promise to help elucidate physiological and pathological functions within the body, as well as to enable more informed decisions about patient care. The cost of genome sequencing is projected to fall within the range where it may be used for diagnostic and treatment purposes within the next two years. The amount of information these tests return is too overwhelming for direct human interpretation, so interpretation will have to be guided by computational methods and visualization. The use of sequencing information has already made its debut in cancer care.

A provocative hypothesis is that the massive growth of online digital descriptions of tumor cell genomes will enable computer scientists to help make breakthroughs in cancer treatment, perhaps even within the next few years. Learn about the frightening fraction of people affected by cancer, the dramatic speedups in genomic data processing made possible by cloud computing, and the blurring line between opportunity and obligation when dealing with a problem that affects the lives of millions.