Vice President of Communications, Deborah Robison | 6400 Sanger Road | Orlando, Florida 32827 | 407.745.2073 | www.sanfordburnham.org

Big Data and medical research

When scientists first sequenced the human genome nearly a decade ago, it was hailed as an achievement that would transform biology and the way scientists tackle new problems. President Bill Clinton said it would “revolutionize the diagnosis, prevention, and treatment of most, if not all, human diseases.”

While the Human Genome Project has yet to fulfill Clinton’s prophecy, the sequencing of the human genome is helping scientists better understand the health and medical needs of people based on their individual genetic blueprint and design new and effective treatments for disease. It has also spurred the growth of bioinformatics and systems biology, and led to the creation of vast information sets that have come to be known as “Big Data.”

Bits and bytes

Big Data refers to our ability to collect and analyze the massive amounts of data we now generate.

“The goal for medical research,” says Ranjan Perera, Ph.D., scientific director of Analytical Genomics and Bioinformatics at Sanford-Burnham Medical Research Institute at Lake Nona, “is to use that information to take a macroscopic view of health, including the ability to recognize patterns or clues to disease genesis and development.”

These new scientific endeavors are generating immense amounts of data, from terabytes and petabytes to exabytes and zettabytes. According to global market intelligence provider IDC, the overall amount of digital information being processed exceeded one zettabyte (that’s a one with 21 zeroes after
it) for the first time in 2010 and approached two zettabytes in 2011, or nearly as many bytes as there are stars in our universe.

The million-dollar question is what this means for science. What are the new technologies that are creating such massive data, and how will the data advance medical research and lead to the development of novel therapeutics?

The “omics”

For years, the study of biology was driven by a reductionist approach, looking at one gene or protein at a time to determine its role in health and disease. Today, systems biology has taken over. This biomedical research approach looks at the larger picture of biology in a more holistic fashion. “The reductionist approach is like being in the middle of a forest looking at the trees; systems biology is a high-level view of the entire forest,” says Thomas “T.C.” Chung, Ph.D., director of outreach and NIH MLPCN project manager in Sanford-Burnham’s Conrad Prebys Center for Chemical Genomics. The Prebys Center includes two world-class robotic ultrahigh-throughput screening and automated microscope systems to identify chemical compounds that have the potential to become future medicines.

At Sanford-Burnham, researchers use a variety of scientific technology platforms, called the “omics,” that are generating expansive amounts of data. This data is interrelated, sometimes through simple connections but more often through complex networks. Discerning and understanding the underlying complexity of these networks often enables new hypotheses that advance medical science and treatments. The most commonly studied platforms are:

Genomics – the study of genes and their function
Proteomics – the study of an organism’s complete complement of proteins
Metabolomics – the study of the relative differences between biological samples based on their metabolite profile
Lipidomics – the study of global lipid profiling in an organism

The numbers behind these fields are staggering: approximately six billion base pairs (the building blocks of the DNA double helix; C-G and A-T) in the genome, approximately 20,300 protein-coding genes, thousands of RNA molecules, and at least 2,900 metabolites. And that’s just for one person! Rather than examining data from just one of these platforms as before, scientists now look at multiple platforms, producing the sort of data that only massive supercomputers can handle.

Disease profiling

The ability of scientists to analyze this data means a better understanding of disease processes and potentially more solutions for treating myriad conditions. Because information is collected on multiple levels (proteins, metabolites, genes, RNA), scientists gain a broad snapshot of what is going on in a particular disease. This data will be useful for diagnostics, especially early cancer detection. Profiling a person’s proteome or metabolome can help clinicians see global changes in the body that may predict cancer far in advance of symptoms. The more “omics” data reveals about disease development and the body’s response to treatment, the more therapies can be tailored to an individual’s particular set of circumstances.

Big Data will also have a major impact on personalized medicine. Until recently, disease treatment was often a one-size-fits-all approach.
“As we learn more about how our genes drive response to treatment, therapies can be tailored to an individual’s disease based on their genetic profile,” says Adam Godzik, Ph.D., director of Bioinformatics and Systems Biology at Sanford-Burnham. Because everyone’s disease is different, understanding what happens on a molecular level can determine the most appropriate treatment for a given patient.

Putting data to work

While it has become an indispensable tool for researchers, Big Data carries challenges for the scientific community. The challenge is not so much collecting data, but figuring out what information is
important and, more critically, how to use it. “You have to sift through the data and figure out what’s disease-causing and what’s normal background variation, or noise, for large-scale datasets,” says Sumit Chanda, Ph.D., of Sanford-Burnham’s Infectious and Inflammatory Disease Center.

In his laboratory, Chanda studies cellular proteins involved in influenza A and retrovirus/HIV infection. He uses a series of systems-level approaches that produce huge amounts of data to understand the molecular strategies adapted by these viruses as countermeasures to innate immune responses.

These and other tools will help build a comprehensive cellular “roadmap” exploited by viruses to enable their propagation within cells. His work is expected to provide unprecedented insight into the molecular circuitry commandeered by pathogens to establish infection and will offer new opportunities for the development of next-generation host factor and immune-mediated retroviral drugs. None of this would be possible without Big Data.

Evolution vs. revolution

Sanford-Burnham scientists and IT professionals have been facing Big Data questions for years, says IT director Eric Hicks. With its 10.4-petabyte capability, the Institute’s storage system can “grow with us,” allowing scientists to sift through an ever-growing trove of data. In addition, a tape library, with even higher scalability but lower costs, is replicated bi-coastally (in California and Florida) every night, so there is no exposure to tape failure. File sharing on National LambdaRail, a high-speed computer network that links the U.S. research and education communities, connects scientists at Sanford-Burnham’s two primary research facilities, with speeds about 1,000 times faster than a home broadband connection.

Hicks says Sanford-Burnham looks at Big Data from two perspectives.
The evolutionary perspective means faster bandwidth, a larger pipe, and more storage, which are more straightforward problems. The revolutionary viewpoint is more difficult: How do scientists use technology they’ve never
used before? “Existing applications can run more quickly, with bigger systems and more sophisticated algorithms,” says Hicks. “The revolutionary problem is much harder to solve.”

As forecast, Big Data is transforming the life sciences. It not only provides deeper and broader insight into human biology but also helps scientists with the most practical applications of their research: understanding disease processes and developing new treatments for human diseases.