IBM Systems and Technology Education
Case Study

Swiss National Supercomputing Center gains low-latency, high-bandwidth storage with IBM General Parallel File System

Overview

The need
With data volumes doubling each year, researchers at the Swiss National Supercomputing Center (CSCS) needed a centralized storage solution offering low latency, high bandwidth and extreme scalability.

The solution
CSCS engaged IBM to build a centralized storage solution based on IBM® General Parallel File System (GPFS), IBM System x® and IBM System Storage® hardware, and InfiniBand networking technology.

The benefit
The solution supports massively parallel read/write operations and provides a single namespace for all systems. It offers extremely high availability and nondisruptive online scaling of the file system.

Founded in 1991, CSCS, the Swiss National Supercomputing Center, develops and promotes technical and scientific services for the Swiss research community in the field of high-performance computing (HPC). CSCS enables world-class scientific research by pioneering, operating and supporting leading-edge supercomputing technologies. Located near Lugano, in the south of Switzerland, CSCS is an autonomous unit of the Swiss Federal Institute of Technology in Zurich (ETH Zurich).

CSCS serves dozens of different research institutions, supporting a broad range of computational projects across theoretical chemistry, material sciences, biological sciences and climate science. Simulations and other computational projects running on the organization's compute clusters process many terabytes of data and generate large sets of intermediate results ready for further computation. During the actual simulation run time, all of this data resides on the "scratch" storage systems that are directly attached to each cluster.

In the past, research teams would store intermediate simulation results on tape, but the limited bandwidth of the tape library made this impractical as data volumes grew rapidly.
To analyze their results fully, users would have needed to transfer the data back to their own institutions, which was not feasible because of the low transfer speeds. "Even over high-speed leased lines, copying data back to a university network could take weeks, so we wanted to give our users the possibility of storing their data locally at CSCS for the duration of their projects," comments Dominik Ulmer, CSCS general manager. "Equally, the typical HPC workflow has become more sophisticated: Instead of simply running a simulation on an input data set, we now often run
multiple simulations in series, using the output data from one as the input data for the next. This tendency to reuse data was another reason for creating a permanent, centralized data storage solution at CSCS."

"We selected IBM GPFS as it offered the best combination of high scalability, compatibility with our distributed operating systems, parallelism of access, and failover between nodes."
—Hussein Harake, HPC systems engineer, CSCS

Choosing the best solution
The amount of data handled in the HPC environment at CSCS roughly doubles each year, making it imperative to select a highly scalable architecture for the proposed centralized storage solution. It was also critical to choose a file system that could be mounted on multiple different HPC systems simultaneously, and that would offer both performance and reliability.

"We tested a number of file systems and narrowed our choice down to Oracle Lustre and IBM General Parallel File System [GPFS]," says Hussein Harake, CSCS HPC systems engineer. "We selected IBM GPFS, as it offered the best combination of high scalability, compatibility with our distributed operating systems, parallelism of access and failover between nodes. Our data can be very long-lived, so it was also important to choose a solution that would offer longevity, both in terms of the reliability of long-term data storage and in terms of the vendor support and roadmap. Selecting GPFS from IBM enabled us to meet these requirements."

Managing rapid growth
GPFS supports single-cluster file systems of multiple petabytes and runs at I/O rates of more than 100 gigabytes per second. Individual clusters may be cross-connected to provide parallel access to data even across large geographic distances. At CSCS, GPFS offers both low latency (needed for high-speed access to small files) and high bandwidth (vital for delivering very large files to compute clusters).
"Our GPFS-based central file store is becoming a really important resource for us," says Harake. "Users really appreciate the option to store their data locally rather than needing to copy it back to their own institution. They are requesting more capacity than we originally anticipated, so the environment is growing faster than expected."

Ulmer adds, "The rapid rate of growth in data volumes is partly a consequence of researchers being able to run more complex simulations on the newer HPC clusters. So, to an extent, they are catching up on
projects that couldn't be done before. GPFS gave us an infrastructure that would grow with user demand but in a way that was predictable in budgetary terms."

Solution components:

Hardware
● IBM® System x® 3650 M2
● IBM System Storage® DS5100
● IBM System Storage DS5300
● IBM System Storage EXP5000
● IBM System Storage EXP5060

Software
● IBM General Parallel File System

Services
● IBM Global Technology Services

A key decision factor for GPFS was its support for nondisruptive migrations and upgrades. Since first implementing the IBM file system, CSCS has upgraded through three phases of different storage arrays and network switches, all without loss of data or service interruption. Today, the centralized storage solution is based around three IBM System Storage DS5100 controllers with eight IBM System Storage EXP5060 Storage Expansion Enclosures (containing high-capacity SATA disks) and four IBM System Storage EXP5000 Storage Expansion Enclosures (containing high-performance Fibre Channel disks). IBM System x 3650 M2 servers running GPFS act as the file servers; Mellanox gateways and switches provide high-speed InfiniBand networking.

Says Harake, "We have upgraded the file system several times, changed the disk controllers and even changed the disks themselves, all without taking the solution down. We will soon upgrade the controllers from DS5100 to DS5300 and add four more expansion enclosures, which will expand our total capacity to 2 PB without any interruption to service."

Holistic approach
IBM is responsible for supplying and supporting every element in the centralized storage environment, from the disks up to the network infrastructure. "We wanted IBM to take ownership of the core network, so that we have a single point of support for the whole environment," says Ulmer.
"This holistic approach helps us minimize risk and delays in support."

He adds, "We consider HPC technology know-how to be our core competence, and we want to find external partners that are willing to tackle the really cutting-edge stuff and learn alongside us. Our relationship with IBM is very good, and we see a lot of value in our shared workshops. With the GPFS-based centralized storage solution, we feel that we have the ideal building block for the coming years. The IBM solution will enable us to expand our capacity enormously without disruption and without loss of performance."
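The nondisruptive online expansion described in this case study can be sketched with standard GPFS administration commands. This is an illustrative outline only, not the actual procedure CSCS used: the file system name (gpfs0), device paths, NSD names and stanza-file details are hypothetical, the stanza syntax shown is that of later GPFS releases, and the commands must run on a cluster node with GPFS administrative privileges.

```shell
# Hypothetical sketch: adding disks to a mounted GPFS file system
# without taking it offline. All names below are illustrative.

# 1. Describe the new disks in an NSD stanza file.
cat > /tmp/newdisks <<'EOF'
%nsd: device=/dev/sdx nsd=nsd_new1 usage=dataAndMetadata failureGroup=1
%nsd: device=/dev/sdy nsd=nsd_new2 usage=dataAndMetadata failureGroup=2
EOF

# 2. Create the Network Shared Disks (NSDs) from the stanzas.
mmcrnsd -F /tmp/newdisks

# 3. Add the new NSDs to the mounted file system -- no unmount needed.
mmadddisk gpfs0 -F /tmp/newdisks

# 4. Optionally rebalance existing data across all disks.
mmrestripefs gpfs0 -b

# 5. Verify the expanded capacity.
mmdf gpfs0
```

Because the file system stays mounted throughout, running jobs continue to read and write data while capacity grows, which is the property the CSCS team highlights above.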