1. The iPlant Collaborative
Matt Vaughn
Director, Life Sciences Computing
Texas Advanced Computing Center
vaughn@tacc.utexas.edu
www.iplantc.org
Biology Cyberinfrastructure to Meet the Challenges of Large Datasets
2. What is iPlant?
The iPlant Collaborative is an NSF-sponsored, community-driven organization that builds, operates, and supports extensible and powerful cyberinfrastructure for the life sciences.
4. iPlant Software Products
• DNA Subway: Educational interface to genomics topics
– Target audience: Beginning users and educators
• Discovery Environment: User-friendly, petascale graphical workbench
– Target audience: Command-line naïve users who need to do scalable bioinformatics
• Atmosphere: User-friendly, on-demand cloud computing and persistent services
– Target audience: Users with desktop use cases or complex software environments
• Bisque: Platform to facilitate cloud-based exchange and exploration of biological images
– Target audience: Command-line naïve users who need to do image analysis
• Spatial Data Infrastructure*: Platform for developing geospatial information systems & deploying spatial data infrastructures
– Target audience: Command-line naïve users who need to work with GIS data
• iPlant Science APIs: RESTful interface to all iPlant capabilities
– Target audience: Advanced users, developers, 3rd party infrastructure or service providers
• iPlant Data Store: Capacious, scalable, shareable storage
– Target audience: Shared and used by all iPlant users
5. iPlant Services
• Education, outreach, and training
• Real-time user support
• Hackathons & workshops
• Extended Collaborative Support
• Powered by iPlant Program
6. 2014-2015 Highlights: Thousand Plant Transcriptomes
• Marker paper [1] published, along with several coordinated manuscripts
• 100-fold increase in green plant gene coverage in GenBank
• Key insights into relationships between land plants and green algae
• Original sequence reads, assemblies, and downstream analyses, plus data access APIs and workflows, available via iPlant [2]
1. Wickett & Mirarab et al. Proc Natl Acad Sci U S A. 2014 Nov 11;111(45):E4859-68. doi: 10.1073/pnas.1323926111
2. Matasci et al. GigaScience. 2014;3:17. doi: 10.1186/2047-217X-3-17
7. 2014-2015 Highlights: iMicrobe
iMicrobe Data Commons
• Hurwitz Lab, University of Arizona
• Funded by Gordon & Betty Moore Foundation
• Aim: Make the high-value CAMERA microbial datasets available through an interactive data commons
• Required just two months of development thanks to iPlant cyberinfrastructure
– http://data.imicrobe.us/
• Being replicated to power a viral genomics platform
iPlant offers a powerful toolbox for rapidly developing next-generation community resources
8. 2014-2015 Highlights: iPlant’s Broadening Impact
• Powered by iPlant Program
• Foundation for other life sciences projects
• Adoption outside the life sciences
JETSTREAM
9. 2014-2015 Highlights: Jetstream
• iPlant Atmosphere demonstrated the value of a user-provisioned cloud
• Partnership: Indiana University, TACC, University of Arizona, University of Chicago, UTSA, Johns Hopkins & Penn State
• NSF ACI #1445604
• January 2016 via XSEDE
• ~50x the capacity of iPlant Atmosphere. Same great UI. Innovative new capabilities.
A national science and engineering cloud
10. What’s Coming Next?
• New high performance tools and workflows
– MAKER-P and a host of assembly and expression workflows
• iPlant Data Commons
– Discoverability, persistence, provenance
• Expanded support for pro users and developers
– APIs, workshops, tutorials, and more
• New capabilities to support Science Communities
– Expanding participation and fostering cooperation
12. New and Continuing Peer Collaborations
• CoGe – Comparative genomics
• EPIC – CoGe extension to support epigenetics
• iAnimal – Two USDA AFRI grants for CI
• Galaxy – Hosting usegalaxy.org
• BioExtract Server
• IBP – Led by GCP
• IRRI/CAS – Resequenced rice varieties
• KBase – DOE's CI for bioenergy
• transPLANT – Elixir's CI for plants
• TAIR – Hosting for sustainability
13. Scientific Achievements through iPlant's Open Infrastructure
1KP – 1000 Plant Transcriptome Project
• Stored tens of millions of sequence reads and all assemblies with iPlant, plus data access APIs exposing 3+ million compute hours of downstream analysis
• Demonstrates TNRS, tree creation, ortholog clustering, etc.
• Claimed to create a 100-fold increase in plant genes in GenBank
• Dozens of papers published or in preparation
14. Success Stories from our Users
Presenter – Talk title
• David Horvath – Progress in Sequencing the Genome of an Invasive Polyploid Weed (Leafy Spurge)
• Joshua Der – A Global Gene Family Classification Resource for Plants and Its Utility for Comparative Genomics, Genome Annotation, and Gene Family Studies
• Kranthi Mandadi – Transcriptomic Analyses and Alternative Splicing Landscapes of Brachypodium Infected with Panicum Mosaic Virus
• John Duvick – Genome Annotation in the Cloud through XGDBvm Virtual Server Instances Deployed at iPlant
• Dong Xu – Soybean Knowledge Base (SoyKB): A Web Resource for Integration of Soybean Translational Genomics and Molecular Breeding
• Bonnie Hurwitz – iMicrobe: Advancing Clinical and Environmental Microbial Research using the iPlant Cyberinfrastructure
Editor's Notes
To get the ball rolling, we ran a 'proposal process' to select two biology communities with thorny Grand Challenges that could inform the development of a national CI.