Transcript of "BEACON's Cyberinfrastructure Needs"
BEACON Center for the Study of Evolution in Action
An NSF Science and Technology Center, headquartered at Michigan State University
• Funded in 2010 at $25 million for the first five years
  - Celebrating our second anniversary in August; funding expected through 2020
• 5 partner universities, 131 faculty members
  - 42 postdocs, 180 graduate and 47 undergraduate students; 445 people total
• Diverse research
  - Microbiology, robotic swarms, genetic algorithms, zoology, computational evolution, plant biology, and many other areas
MISSION: Illuminating and harnessing the power of evolution in action to advance science and technology and benefit society
Three cross-cutting themes of BEACON research:
• Biological Evolution
• Digital Evolution
• Evolutionary Applications
Extreme compute: Avida
• Avida: digital model for evolution
  - Self-replicating computer programs
• 100k CPU days per PhD thesis
• ~100 GB of data per run to analyze
  - Much less to preserve/archive
• Low memory: < 1 GB of RAM/run
• GPGPU not useful
• Data archiving: community & university standards
• Bottom line: Fairly traditional compute use (more cores, mo’ better!)
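To put the "100k CPU days per PhD thesis" figure in perspective, here is a minimal back-of-the-envelope sketch. It assumes the Avida runs are independent and scale perfectly across cores, which the slide suggests ("more cores, mo’ better") but does not state outright. The core counts and the function name are hypothetical, chosen only for illustration.

```python
def wall_clock_days(cpu_days: float, cores: int) -> float:
    """Wall-clock days to finish `cpu_days` of work on `cores` cores.

    Assumes perfect scaling (independent runs, no queueing or I/O waits).
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_days / cores


THESIS_CPU_DAYS = 100_000  # from the slide: 100k CPU days per PhD thesis

# Hypothetical allocations, not figures from the talk.
for cores in (100, 1_000, 10_000):
    days = wall_clock_days(THESIS_CPU_DAYS, cores)
    print(f"{cores:>6} cores -> {days:,.0f} wall-clock days")
```

Under these assumptions, a thesis that had to fit in roughly 100 days of wall-clock time would need on the order of 1,000 dedicated cores.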
Extreme data: NGS data
• Sequencing non-model organisms & communities
  - Soil metagenomics
  - Non-model animal transcriptomics
• $10k sequencing/week => 1 TB of data
  - Assembly requires 2x bigmem machine-weeks (512+ GB of RAM)
• RAM and I/O limited.
• “Big graph” problem with no locality
  - Not easily distributable.
• Estimate 5-50 Tbp of sequence needed/sample
  - ~1m genomes/sample
  - …multiple samples/thesis.
• Community & university data archiving standards.
Growth in sequencing capacity is outpacing Moore’s Law; new algorithmic approaches needed.
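The mismatch between sequencing and assembly rates can be made concrete with a rough sketch. It reads the slide's "2x bigmem machine-weeks" as two machine-weeks of assembly per 1 TB (one week of sequencing), and it assumes assembly cost grows linearly with data volume. Both are simplifying assumptions rather than claims from the talk, and the function name and machine counts are hypothetical.

```python
def assembly_backlog(weeks: int, bigmem_machines: int,
                     tb_per_week: float = 1.0,
                     machine_weeks_per_tb: float = 2.0) -> float:
    """Unfinished assembly work, in machine-weeks, after `weeks` of sequencing.

    Assumes a steady sequencing rate and assembly cost that is linear
    in data volume (a simplification; real assembly may scale worse).
    """
    demand = weeks * tb_per_week * machine_weeks_per_tb  # work generated
    capacity = weeks * bigmem_machines                   # work completed
    return max(0.0, demand - capacity)


# Hypothetical machine counts over ten weeks of sequencing.
for machines in (1, 2):
    backlog = assembly_backlog(weeks=10, bigmem_machines=machines)
    print(f"{machines} bigmem machine(s): backlog of {backlog:.0f} machine-weeks")
```

Under these assumptions, a single big-memory machine falls one machine-week further behind for every week of sequencing; keeping pace takes two such machines per sequencing stream.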
Our efforts
1. Training
 - Biology has become data-intensive quite quickly!
 - Most biologists are not trained in effective use of computation.
 - Grad students are extremely motivated!
 - We run intro courses & many focused workshops: Intro grad course; Software Carpentry (Sloan); Analyzing Next-Gen Sequencing Data (NIH); metagenomics.
2. Well-integrated layer of “cyberinfrastructure research”
 - Faculty research programs, labs incorporate development of robust community of software for modeling, simulation, data analysis.
 - Algorithm research is tightly integrated with biological research programs; e.g. novel compression approaches provide significant leverage on next-gen sequencing problems.
 - Exploration & adaptation to loosely coupled, poor I/O platforms (i.e. the Amazon cloud) to enable flexible extension of compute capacity.
 - …underappreciated, underfunded.