
BEACON's Cyberinfrastructure Needs


Our slides from an NSF meeting on computing needs in biology.

Published in: Education, Technology

  1. BEACON Center for the Study of Evolution in Action
     • An NSF Science and Technology Center headquartered at Michigan State University
     • Funded in 2010 at $25 million for the first five years; celebrating our second anniversary in August; funding expected through 2020
     • 5 partner universities, 131 faculty members, 42 postdocs, 180 graduate and 47 undergraduate students; 445 people total
     • Diverse research: microbiology, robotic swarms, genetic algorithms, zoology, computational evolution, plant biology, and many other areas
  2. MISSION: Illuminating and harnessing the power of evolution in action to advance science and technology and benefit society
     Three cross-cutting themes of BEACON research:
     • Biological Evolution
     • Digital Evolution
     • Evolutionary Applications
  3. Extreme compute: Avida
     • Avida: a digital model for evolution, built on self-replicating computer programs
     • 100k CPU days per PhD thesis
     • ~100 GB of data per run to analyze; much less to preserve/archive
     • Low memory: < 1 GB of RAM per run
     • GPGPU not useful
     • Data archiving: community & university standards
     • Bottom line: fairly traditional compute use (more cores, mo’ better!)
  4. Extreme data: NGS data
     • Sequencing non-model organisms & communities
       – Soil metagenomics
       – Non-model animal transcriptomics
     • $10k of sequencing per week => 1 TB of data; assembly requires 2x bigmem machine-weeks (512+ GB of RAM)
     • RAM and I/O limited
     • “Big graph” problem with no locality; not easily distributable
     • Estimate 5-50 Tbp of sequence needed per sample; ~1m genomes/sample; …multiple samples/thesis
     • Community & university data archiving standards
     Growth in sequencing capacity is outpacing Moore’s Law; new algorithmic approaches are needed.
  5. Our efforts
     1. Training
        - Biology has become data-intensive quite quickly!
        - Most biologists are not trained in effective use of computation.
        - Grad students are extremely motivated!
        - We run intro courses & many focused workshops: Intro grad course; Software Carpentry (Sloan); Analyzing Next-Gen Sequencing Data (NIH); metagenomics.
     2. Well-integrated layer of “cyberinfrastructure research”
        - Faculty research programs and labs incorporate development of a robust community of software for modeling, simulation, and data analysis.
        - Algorithm research is tightly integrated with biological research programs; e.g. novel compression approaches provide significant leverage on next-gen sequencing problems.
        - Exploration of & adaptation to loosely coupled, poor I/O platforms (i.e. the Amazon cloud) to enable flexible extension of compute capacity.
        - …underappreciated, underfunded.
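
To put the "100k CPU days per PhD thesis" figure from the Avida slide in perspective, here is a minimal back-of-the-envelope sketch. It assumes the work splits into independent runs that scale perfectly across cores, which the slide's "more cores, mo' better" remark suggests but does not state outright. The function name `wall_clock_days` and the core counts are illustrative choices, not figures from the slides.

```python
# Rough wall-clock estimate for a fixed CPU-day budget.
# Assumption (not from the slides): runs are independent and scale
# perfectly, so wall-clock time = CPU-days / cores.

THESIS_CPU_DAYS = 100_000  # "100k CPU days per PhD thesis" (Avida slide)


def wall_clock_days(cpu_days: float, cores: int) -> float:
    """Return wall-clock days needed to burn `cpu_days` on `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_days / cores


if __name__ == "__main__":
    # Hypothetical allocation sizes, chosen only for illustration.
    for cores in (100, 1_000, 10_000):
        days = wall_clock_days(THESIS_CPU_DAYS, cores)
        print(f"{cores:>6} cores: {days:8.1f} days ({days / 365:.2f} years)")
```

Under these assumptions, a dedicated 1,000-core allocation would take about 100 days to complete one thesis worth of runs, while 100 cores would take 1,000 days, close to three years.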