HPC lab projects

Presentation at Emory regarding current projects and possibilities for interaction


  1. HPC Lab
     David A. Bader, E. Jason Riedy, Henning Meyerhenke, (horde of students...)
  2. HPC Lab Projects
     • UHPC (DARPA)
       – Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
       – CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
     • GTFOLD (NIH): Combinatorial and Computational Methods for the Analysis, Prediction, and Design of Viral RNA Structures
     • PETA-APPS (NSF): Petascale Simulation for Understanding Whole-Genome Evolution
     • Graph500 (Sandia): Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms
     • STING (Intel): An open-source dynamic graph package for Intel platforms
     • CASS-MT (DoD): Graph Analytics for Streaming Data on Emerging Platforms
     • GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
  3. HPC Lab Projects (and yet more...)
     • Burton (NSF): Develop software and algorithmic infrastructure for massively multithreaded architectures
     • Dynamic Graph Data Structures in X10 (IBM): Develop and evaluate graph data structures in X10
     • I/UCRC Center for Hybrid and Multicore Productivity Research, CHMPR (NSF)
  4. Ubiquitous High Performance Computing (DARPA): Echelon
     Overall goal: develop highly parallel, security-enabled, power-efficient processing systems that support ease of programming, with resilient execution through all failure modes and intrusion attacks.
     Architectural drivers: energy efficiency, security and dependability, programmability.
     Program objectives:
     • One PFLOPS in a single cabinet, including self-contained cooling
     • 50 GFLOPS/W (equivalent to 20 pJ/FLOP)
     • Total cabinet power budget of 57 kW, including processing resources, storage, and cooling
     • Security embedded at all system levels
     • Parallel, efficient execution models
     • Highly programmable parallel systems
     • Scalable systems, from terascale to petascale
     David A. Bader (CSE), Echelon Leadership Team
     Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
     Press: "NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems" (MarketWatch)
  5. Ubiquitous High Performance Computing (DARPA): CHASM
     Overall goal: develop highly parallel, security-enabled, power-efficient processing systems that support ease of programming, with resilient execution through all failure modes and intrusion attacks.
     Architectural drivers:
     • New architectures require new benchmarks
     • Evaluating usability requires applications
     • Existing metrics do not encompass all UHPC goals
     Program objectives:
     • Develop applications, benchmarks, and metrics
     • Drive UHPC development
     • Support performance analysis of UHPC systems
     Dan Campbell (GTRI), co-PI
     CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
  6. GTFold (NIH): RNA Secondary Structure Prediction
     Program goals: accurate structure of large viruses such as influenza, HIV, polio, tobacco mosaic, and hantavirus.
     Faculty: Christine Heitsch (Mathematics), David A. Bader (CSE), Steve Harvey (Biology)
  7. PetaApps (NSF): Phylogenetics Research on IBM Blue Waters
     As part of the IBM PERCS team, we designed the IBM Blue Waters supercomputer, which will sustain petascale performance on our applications under the DARPA High Productivity Computing Systems program.
     • GRAPPA: Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms
       – Freely available, open source (GNU GPL)
       – Already used by other computational phylogeny groups: Caprara, Pevzner, LANL, FBI, Smithsonian Institution, Aventis, GlaxoSmithKline, and other pharmaceutical companies
     • Gene-order phylogeny reconstruction: breakpoint median, inversion median
     • Over a one-billion-fold speedup over previous codes
     • Parallelism scales linearly with the number of processors
     Faculty: David A. Bader (CSE)
     www.phylo.org
  8. Graph500 (SNL): Exploration of shared-memory graph benchmarks
     • Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms
     • NOT LINPACK!
     • Specification and reference implementations at http://graph500.org
     • Ranking debuted at SC10
     • Press: IEEE Spectrum, Computerworld, HPCWire, MIT Tech. Review, EE Times, Slashdot, etc.
     Problem classes:
       Class        Problem size
       Toy (10)     17 GiB
       Mini (11)    140 GiB
       Small (12)   1.1 TiB
       Medium (13)  18 TiB
       Large (14)   140 TiB
       Huge (15)    1.1 PiB
     Image sources: Nexus (Facebook application); Giot et al., "A Protein Interaction Map of Drosophila melanogaster", Science 302, 1722-1736, 2003
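The data-intensive kernel that Graph500 times is a breadth-first search over a large synthetic graph, with the result reported as a parent tree. A minimal sequential Python sketch of that kernel (illustrative only; the reference implementations at graph500.org are parallel codes operating on graphs far too large for this approach):

```python
from collections import deque

def bfs_parents(adj, source):
    """Level-order BFS over adjacency lists; returns a parent array
    in the style of the Graph500 kernel (parent[source] == source,
    -1 for unreached vertices)."""
    parent = [-1] * len(adj)
    parent[source] = source
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if parent[v] == -1:   # first visit claims the parent slot
                parent[v] = u
                q.append(v)
    return parent

# Tiny undirected example graph as adjacency lists.
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
print(bfs_parents(adj, 0))  # → [0, 0, 0, 1, 3]
```

Unlike LINPACK, almost no arithmetic happens here; performance is dominated by irregular memory access, which is exactly what the benchmark is designed to stress.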
  9. STING (Intel): Spatio-Temporal Interaction Networks and Graphs
     An open-source dynamic graph package for Intel platforms (Intel program: Parallel Algorithms in Non-Numeric Computing)
     • Develop and tune the STING package to analyze streaming, graph-structured data on Intel multi- and manycore platforms
     • Support platforms from server farms (NYSE, Facebook) to hand-held devices
     • Span update scales from terabytes per day to human entry rates
     • Basis for algorithmic and performance work
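The core operation such a package must support is applying a stream of timestamped edge insertions and deletions while keeping the graph queryable. A toy Python sketch of that idea (this is not STING's actual API or data layout, which is a tuned parallel implementation; names here are illustrative):

```python
from collections import defaultdict

class DynamicGraph:
    """Toy undirected dynamic graph: edges carry the timestamp of
    their most recent insertion and can be deleted later."""
    def __init__(self):
        self.adj = defaultdict(dict)  # u -> {v: timestamp}

    def insert(self, u, v, t):
        self.adj[u][v] = t
        self.adj[v][u] = t

    def delete(self, u, v):
        self.adj[u].pop(v, None)
        self.adj[v].pop(u, None)

    def degree(self, u):
        return len(self.adj[u])

# Apply a small edge stream, then one deletion.
g = DynamicGraph()
for u, v, t in [(0, 1, 1), (1, 2, 2), (0, 2, 3)]:
    g.insert(u, v, t)
g.delete(0, 1)
print(g.degree(0))  # → 1
```

The interesting engineering is in what this sketch omits: batching updates, avoiding locks on hot vertices, and keeping analytics (connected components, clustering coefficients) incrementally up to date as the stream arrives.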
  10. CASS-MT: Center for Adaptive Supercomputing Software
      • DoD-sponsored, launched July 2008
      • Led by Pacific Northwest National Laboratory with Georgia Tech, Sandia, Washington State, and Delaware
      • The newest breed of supercomputers has hardware set up not just for speed, but also to better tackle large networks of seemingly random data. A multi-institutional group of researchers has been awarded more than $12M to develop software for these supercomputers. Applications include anywhere complex webs of information can be found: from internet security and power grid stability to complex biological networks.
  11. Example: Mining Twitter for Social Good
      ICPP 2010
      Image credit: bioethicsinstitute.org
  12. GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
      Parallel genome sequence assembly:
      • Next-generation sequencing experiments produce a large number of small base-pair strings (reads)
      • Task: assemble (concatenate) reads appropriately into larger substrings (contigs)
      • Two main assembly approaches, both graph-based (de Bruijn vs. overlap/string graph)
      Objectives: improve running time and, ultimately, assembly accuracy.
      Assembly approach:
      • Use an overlap/string graph for higher accuracy
      • Parallelism to reduce running time
      • Compression to reduce memory consumption
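To make the graph-based assembly idea concrete, here is a toy sketch of the de Bruijn model, one of the two approaches named on the slide (the project itself favors the overlap/string graph, and real assemblers must handle errors, branches, and cycles that this sketch ignores): nodes are (k-1)-mers, edges are k-mers from the reads, and a contig is read off an unbranched path.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a toy de Bruijn graph: each k-mer in a read adds an edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_contig(graph, start):
    """Greedily extend a contig while the path is unbranched."""
    contig, node = start, start
    while len(graph[node]) == 1:
        node = next(iter(graph[node]))
        contig += node[-1]   # each step appends one new base
    return contig

# Two overlapping reads assemble into one contig.
reads = ["ACGTC", "GTCAG"]
g = de_bruijn(reads, 3)
print(walk_contig(g, "AC"))  # → ACGTCAG
```

The reads overlap in "GTC", so the walk stitches them into the single contig ACGTCAG, which is exactly the concatenation task the slide describes.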
  13. Pasqual: a new memory-efficient, fast parallel sequence assembler
      Experimental results (memory usage and running time):
      • Pasqual is our parallel (shared-memory, OpenMP) sequence assembler
      • Run on a commodity server (8 cores, 16 hyperthreads)
      • Memory usage reduced to ca. 50% for large data sets
      • Running time compared to sequential assemblers: 24 to 325 times faster
      • Biologists can assemble larger data sets faster
