Sequence Alignment and Hadoop. Hadoop World 2009, New York, Oct 2, 2009. Paul Brown, Associate, Booz Allen Hamilton Inc., 134 National Business Parkway, Annapolis Junction, MD 20701. Tel (301) 543-4665. [email_address]
The Impact of Hadoop
Booz Allen Hamilton, a leading strategy and technology consulting firm, works with clients to deliver results that endure. Every day, government agencies, corporations, institutions, and not-for-profit organizations rely on Booz Allen’s expertise and objectivity, and on the combined capabilities and dedication of our exceptional people to find solutions and seize opportunities.
Hadoop dramatically lowers the cost of entry to distributed computing and opens up a wide range of computational applications that were previously out of reach for those who need them.
These applications are frequently faster, cheaper, and more accurate than their predecessors, which completely changes the playing field.
The intention of this work, and of this talk, is to explore how technology like Hadoop can have a game-changing impact on bioinformatics.
Biological Information
Biology + Computer Science = Bioinformatics (http://bioinformatics.ubc.ca/about/what_is_bioinformatics)
Bioinformatics: The Pain
We are obtaining biological data at a steadily increasing rate: an exponential curve tilting steeply toward vertical.
Converting that data into usable information is proceeding, though it is not quite keeping pace with acquisition.
Leveraging all that information to create knowledge is an open challenge, lagging far behind our rate of data collection.
Unique opportunities abound in creating an environment that enables true understanding of this rich sea of data: Hadoop promises to be such an environment.
Hadoop combines a scalable distributed file system for data storage (HDFS) with an easily accessible analytic framework (MapReduce).
Used “on demand” with a cloud provider, it makes the most of available resources.
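To make the "analytic framework" concrete, here is a minimal in-process sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce model distributes across a cluster. It is plain Python, not Hadoop API code. The function names and the toy k-mer counting task are illustrative assumptions, not part of the talk.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(sequence, k=3):
    """Map step: emit (k-mer, 1) for every length-k window of one sequence."""
    return [(sequence[i:i + k], 1) for i in range(len(sequence) - k + 1)]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one k-mer."""
    return key, sum(values)


def count_kmers(sequences, k=3):
    """Run map, shuffle, and reduce in a single process (toy stand-in for a cluster)."""
    mapped = chain.from_iterable(map_kmers(s, k) for s in sequences)
    return dict(reduce_counts(key, vals) for key, vals in shuffle(mapped).items())


# Toy input sequences, invented for illustration.
counts = count_kmers(["AYNAR", "NARNA"], k=3)
```

Because each map call sees only one sequence and each reduce call sees only one key, the same logic can in principle be spread over as many machines as the data requires.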
So What? Querying a database of sequences for similar sequences
“One to Many” comparisons
~58,000 proteins in the Protein Data Bank (PDB).
Protein alignment is frequently used in the development of medicines.
Looking for a given sequence across species helps indicate function.
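A "one to many" query can be sketched as scoring one query sequence against every database record independently and then keeping the best hits. The sketch below assumes a Smith-Waterman local alignment score with a simple match/mismatch/linear-gap scheme. The scoring values, function names, and toy database are illustrative assumptions, not the method or data used in this work.

```python
import heapq


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def map_score(query, record):
    """Map step: score one database record against the query."""
    seq_id, sequence = record
    return seq_id, local_alignment_score(query, sequence)


def top_hits(query, database, n=2):
    """Reduce step: keep the n highest-scoring records."""
    scored = (map_score(query, record) for record in database)
    return heapq.nlargest(n, scored, key=lambda hit: hit[1])


# Hypothetical toy database; real entries would come from a source such as the PDB.
database = [("seq1", "AYNARNA"), ("seq2", "NRNYAYN"), ("seq3", "RRRR")]
hits = top_hits("AYNAR", database, n=2)
```

Each comparison depends only on the query and one record, so the scoring step parallelizes naturally across a large sequence database.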