The Andrej Sali Lab Processes Millions of Small Files with Panasas



The University of California, San Francisco (UCSF) needed a computing solution that could process millions of small files as quickly as possible for researchers identifying the structural similarities of protein models. They needed to eliminate poor storage system performance and significantly decrease administrator management time, all within a limited budget. Here is how the fully integrated software/hardware solution, including the Panasas Operating Environment and the PanFS parallel file system with the Panasas DirectFLOW protocol, helped UCSF overcome these challenges.

Published in: Technology

Customer Success Story: University of California, San Francisco

The Andrej Sali Laboratory at the University of California, San Francisco uses computation grounded in the laws of physics and evolution to study the structure and function of proteins. The Sali Laboratory strives to improve and apply methods for predicting the structures of proteins, determining the structures of proteins and macromolecular assemblies, and annotating the functions of proteins using their structures. By contributing to structure-based functional annotation of proteins, this research enhances the impact of genome sequencing, structural genomics, and functional genomics on biology and medicine. To meet their scientific goals, the Sali Laboratory leverages advanced computing solutions for high-throughput structural and functional studies of proteins.

SUMMARY

Industry: Life Sciences

THE CHALLENGE
This customer needed a computing solution that could process millions of small files as quickly as possible for researchers identifying the structural similarities of protein models. They needed to eliminate poor storage system performance and significantly decrease administrator management time, all within a limited budget.

THE SOLUTION
The fully integrated software/hardware solution included the Panasas® Operating Environment and the PanFS™ parallel file system with the Panasas DirectFLOW® protocol.

THE RESULT
  • Up to 5X Performance Improvement
  • A single namespace for simplified cluster management
  • Maximized ROI from clustered computing environment

The Challenge

In order to find the structural similarities of protein models, the team at Sali Laboratory knew they needed a computing solution that could handle a massive I/O workload. The environment is responsible for analyzing 2,000,000 protein sequences against 30,000 experimental structures in an effort to predict 3D structures. “Our system has extreme I/O requirements,” said Dr. Ursula Pieper, Assistant Researcher at Sali Laboratory. “We constantly process millions of small files and need an IT solution to return results to our researchers as quickly as possible.”

For the compute side of the solution, the Sali Laboratory grew a 370 dual-processor Linux cluster. While this solution met their computational needs, storage system performance and administrator management were major concerns. The Sali Laboratory was looking for a solution that could deliver exceptional I/O and had the ability to scale capacity. Key to capacity scaling was finding a single, global namespace to manage all of the data sets in a single system image. Using direct attached storage in the past had delivered inconsistent performance results, and the Sali Laboratory knew that management problems were on the horizon as the system scaled in size. Finally, because they were working with a constricted academic budget, price/performance and overall system value were key requirements. “Certain financial constraints are common in academia, so we had to find a solution with the best price/performance,” said Dr. Pieper.

The Solution

An extensive evaluation process included detailed reviews of several high-performance and next-generation network attached storage solutions. Ultimately, the Panasas® Storage system was selected for its random I/O performance capabilities, ease of management through a seamless global namespace, and extreme value. The multi-TB solution was connected to the Linux cluster for storing and retrieving computational data sets. “After evaluating several solutions,” said Dr. Pieper, “it was clear that Panasas had the most comprehensive offering.”

1-888-panasas
The Panasas Storage Cluster leverages a distributed file system to provide direct disk-to-client access via the Panasas DirectFLOW® protocol. Further, the system uses finely tuned hardware components (the Panasas StorageBlade® module and the Panasas DirectorBlade® module) to support record-setting random I/O. The Panasas Operating Environment is built on an object-based architecture, which allows seamless scaling in capacity with a unified, global namespace. Finally, by leveraging industry standard components, Panasas is able to offer this solution at extremely competitive price points.

The Result

Upon moving the Panasas Storage Cluster into production, the Sali Laboratory immediately experienced a performance improvement. “The Panasas I/O performance is extremely high,” said Dr. Pieper. “We’ve eliminated our I/O bottleneck to complete our computations more quickly. In fact, the Panasas system has cut the job times for the computation of protein interactions, one of our most I/O intensive projects, from 40 hours to less than 8 hours.” Even more impressive is the fact that as the number of clients accessing the system increases, performance remains the same.

Further, the integrated software/hardware solution and single unified namespace streamlined system installation and management. “The system is easy to manage,” said Dr. Pieper. “We don’t want to worry about IT issues, and with the Panasas solution we don’t have to.” The single namespace offered by the Panasas Storage Cluster allows significant capacity growth with no worries about re-partitioning or managing discrete data islands.

Finally, Panasas has provided exceptional service and support. “The entire team at Panasas has been great in helping us every step of the way,” said Dr. Pieper.

About Panasas

Panasas, Inc., the leader in high-performance scale-out NAS storage solutions, enables enterprise customers to rapidly solve complex computing problems, speed innovation and bring new products to market faster. All Panasas solutions leverage the patented PanFS™ storage operating system to deliver exceptional performance, scalability and manageability.

PW-10-21600 | Phone: 1-888-PANASAS | © 2010 Panasas Incorporated. All rights reserved. Panasas is a trademark of Panasas, Inc. in the United States and other countries.