Genomics Center Compares 100s of Computations Simultaneously with Panasas


The Center for Integrative Genomics brings together researchers from traditionally separated fields of study to analyze and compare the genome sequences of a broad spectrum of organisms in order to determine the mechanisms responsible for evolutionary diversity among animals, plants and microbes. To pursue its mission of researching and understanding gene regulation, the Center needed to run comparative analyses across hundreds of computations quickly and easily. This case study describes how an integrated software/hardware solution, which includes the Panasas Operating Environment and the PanFS™ parallel file system with the Panasas DirectFLOW® protocol, helped the Center achieve exceptional performance.

Published in: Technology
Customer Success Story: UC Berkeley

SUMMARY

Industry: Life Sciences

THE CHALLENGE
To quickly and easily conduct comparative analyses of hundreds of computations required to accomplish their mission to research and understand gene regulation.

THE SOLUTION
The fully integrated software/hardware solution included the Panasas® Operating Environment and the PanFS™ parallel file system with the Panasas DirectFLOW® protocol.

THE RESULT
  • Parallel access to compute cluster for exceptional I/O performance
  • Maximized cluster utilization
  • Flexibility in running data set comparisons
  • Maximized ROI from clustered computing environment

UC Berkeley Center for Integrative Genomics

The Center for Integrative Genomics brings together researchers from traditionally separated fields of study to analyze and compare the genome sequences of a broad spectrum of organisms in order to determine the mechanisms responsible for evolutionary diversity among animals, plants and microbes. Faculty at the Center are drawn from distinct academic departments at UC Berkeley and Lawrence Berkeley National Laboratory, including molecular and cellular biology, integrative biology, computer science, bioengineering, plant and microbial biology, mathematics, and public health. One of the Center’s primary objectives is to decode regulatory DNA - the regions of the genome that control gene expression - and the protein factors that bind to them. By understanding how a common set of genes is deployed differently in various organisms, they hope to reveal the mechanisms of evolution.

The Challenge

When the Center opened in 2002, project leaders Michael Levine PhD, Gene Myers PhD, and Lior Pachter PhD realized that to conduct comparative analysis at the scale necessary to be successful, they needed to build an IT environment that could process, store and retrieve multi-gigabyte files containing the DNA sequences of various organisms. Most comparative analyses involve two such large data sets, for example, the genome of a human vs. the genome of a mouse. The undistilled results of such comparisons are datasets typically of the same order of size as the input, and a given multi-way analysis involves performing tens to hundreds of such computations. The Center wanted a solution that delivered the performance and flexibility to conduct many such analyses on a weekly basis. While compute power was an issue, the more important consideration was to have a single conceptual data system that contained all the data and results, and that could deliver the inputs to the CPUs and receive the results from them in a highly efficient, and thus necessarily parallel, manner. “The ability to quickly and easily conduct comparative analyses from a single, easily maintained file system is primary to our mission,” said Gene Myers, from the Department of Computer Science, University of California, Berkeley. “It is only through extensive testing that we will be able to understand gene regulation.”

On the compute side of the solution, the Center deployed a Linux cluster with 36 CPUs to meet their processing needs. But key to the overall solution was the deployment of a high-performance storage system. Specifically, the Center needed a storage solution that could deliver exceptional I/O performance, scale seamlessly to handle a growing number of data sets, and be flexible enough to handle a quick change of one, or both, of the data sets being compared. The team knew that a distributed file system could offer many of the things they were looking for, but they needed to find a solution that was easy to manage and one that delivered an exceptional price/performance ratio.

The Solution

The team at the Center looked at several different solutions. Most standard storage products had the architectural limitation of delivering data through one, or possibly two, Ethernet ports, a choke point that immediately eliminated them from consideration. Several solutions involved purchasing commodity components and then adding a software layer that provided a distributed file system, but the price/performance ratio on these approaches was poor. Only the Panasas® Storage solution, an integrated software/hardware distributed file system, provided the desired price/performance ratio for an academic budget and delivered the integration needed to simplify installation and management as well as reduce recurring administration costs. “After conversations with several storage vendors, it was clear that Panasas was the only company that really understood what we needed,” said Myers.

A three-shelf, multi-TB Panasas Storage Cluster was deployed to support the 36-CPU cluster, configured in a 6 x 6 matrix. The head node of each row in the cluster matrix has two Gigabit Ethernet connections to the Panasas system. Data is pipelined through the configuration, and all CPUs in the system are able to receive and push data simultaneously at the peak capacity of the network.

The Result

Once moved into production, the Panasas Storage Cluster enabled the Center to realize the benefits of a comprehensive cluster architecture. “Every processor in our architecture is now running at peak performance,” said Myers. “The resulting parallelism is delivering a 6X performance improvement in our environment.” The Panasas system is designed from the ground up to deliver exceptional random I/O and data access throughput, breaking the traditional storage bottleneck and allowing direct disk-to-cluster node access.

A further benefit for the Center was the flexibility and ease of management offered by a single, global namespace. “With other solutions, we were forced to trade off manageability and flexibility for greater performance,” commented Myers. “With Panasas, we can get it all.” Finally, the lab is confident that as the system grows in size it will remain flexible and easy to manage, and that performance will scale linearly with capacity.

About Panasas

Panasas, Inc., the leader in high-performance scale-out NAS storage solutions, enables enterprise customers to rapidly solve complex computing problems, speed innovation and bring new products to market faster. All Panasas solutions leverage the patented PanFS™ storage operating system to deliver exceptional performance, scalability and manageability.

PW-10-21500 | Phone: 1-888-PANASAS | © 2010 Panasas Incorporated. All rights reserved. Panasas is a trademark of Panasas, Inc. in the United States and other countries.