Bringing a culture of computing to the Plant Sciences.
The state of the art today. On the left are icons representing SOME of the ways we work with data. Tools are separated from one another by compute platform, data format, integration issues, and programming model. Often a mixture of desktop, command-line, database, and web-page-based analyses. Labor-intensive, fragile solutions devised to reach scientific objectives. Little ability to share results or analytical methods, or to work collaboratively. We can INVERT the language of the COMPLAINTS to form DESIGN PRINCIPLES. Going to focus on a couple of NGS cases in my talk.
Left tree: Maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC.
Our understanding of the phylogeny of the half million known species of green plants has expanded dramatically over the past two decades. The task of assembling a comprehensive "tree of life" for them presents a Grand Challenge. Also part of the Grand Challenge is developing the infrastructure needed to view and use the tree of life, and to put it into the hands of plant biologists.
Public archives: MAT = Mean Annual Temperature. Stephen Smith: iPlant-supported postdoc, now Assistant Professor at the University of Michigan. Published in PNAS last year.
Left tree: Maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC. New sequencing technologies, computational power, and simplified access to computational resources allow us to move from local to global scale. Climate change and nutrition are global-scale problems.
Highest level of abstraction. Just as we can embed recent tweets in a web page, portal builders can add tools and services to their portals, e.g. BioExtract and CIPRES.
From the Apps catalog in the DE, select Create -> New App.
This opens the Tool Integration interface.
Select Create New -> Request Tool Installation.
Fill out the form and submit it.
It takes 2-5 business days to deploy the tool.
The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences
The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences
Naim Matasci, BIO5 / The iPlant Collaborative
Problem 1: Data Volume
• Cost of analysis follows Moore's Law:
  – 1 student with 1 computer to analyze 1 Mb of data produced in 2001
  – 200 students and 200 computers to analyze all data produced for the same cost today (10 Gb)
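One way to read the numbers on this slide, offered as a rough sketch rather than the speaker's own calculation: if the same sequencing budget now yields 10 Gb instead of 1 Mb, and 200 student-plus-computer pairs are needed to keep up, then each computer is implicitly handling about 50 times more data than in 2001. The variable names and the reading of "Mb" and "Gb" as megabases and gigabases are assumptions; only the ratio between them matters here.

```python
import math

# Figures taken from the slide. Reading "Mb"/"Gb" as megabases and
# gigabases is an assumption; only their ratio is used below.
DATA_2001 = 1_000_000            # 1 Mb produced in 2001
DATA_TODAY = 10_000_000_000      # 10 Gb produced today for the same cost
WORKERS_TODAY = 200              # students, each with one computer


def data_growth(before: int, after: int) -> float:
    """Factor by which data volume per unit cost has grown."""
    return after / before


def implied_per_computer_speedup(growth: float, workers: int) -> float:
    """Speedup each computer must supply if `workers` machines keep pace."""
    return growth / workers


def implied_doublings(speedup: float) -> float:
    """Number of Moore's-Law-style doublings equivalent to `speedup`."""
    return math.log2(speedup)


if __name__ == "__main__":
    growth = data_growth(DATA_2001, DATA_TODAY)
    speedup = implied_per_computer_speedup(growth, WORKERS_TODAY)
    print(f"Data growth: {growth:,.0f}x")
    print(f"Implied per-computer speedup: {speedup:,.0f}x")
    print(f"Equivalent doublings: {implied_doublings(speedup):.2f}")
```

Under this reading, data volume has outgrown per-computer capacity by the factor of 200 that the slide expresses as extra students and computers.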
Problem 2: Fragmented Analytical Landscape
1. Tools separated by compute platform, data format, integration issues, and programming model
2. Mixture of desktop, command line, database, and web-based tools
3. Labor intensive, fragile solutions devised to reach scientific objectives
4. Little ability to share results, analytical methods
5. Lack of reproducibility
Scalability
ABI 3730 DNA Analyzer, Illumina Genome Analyzer, Joe Felsenstein ca. 1980, Ranger cluster at TACC
Major Ways to Access iPlant
• Storing and sharing data large and small: iPlant Data Storage
• Integrated web-based analysis: The Discovery Environment
• Cloud computing: Atmosphere
• Applications: TNRS, TreeViewer, PhytoBisque, etc.
• Scientific networking, knowledgebase and information exchange: My-Plant.org
• Educational tools: DNASubway
• Embedding iPlant CI capabilities into software: The Foundation API
• High Performance Computing for experts: TeraGrid/XSEDE
Why is the tree of life important?
“Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”
Nothing in biology makes sense except in the light of evolution. T. G. Dobzhansky
"We combined geospatial and molecular sequence data from two public archives to produce a 1,230-taxon phylogeny of the grasses with accompanying climate data for all species, extracted from more than 1.1 million herbarium specimens." Edwards and Smith, 2010
"Here we show that grasses are ancestrally a warm-adapted clade and that C4 evolution was not correlated with shifts between temperate and tropical biomes. Instead, 18 of 20 inferred C4 origins were correlated with marked reductions in mean annual precipitation."
iPlant's APIs – The Foundation API

Endpoint   Service role
IO         File storage, retrieval and management. Database interoperability
DATA       File format conversion
APPS       Registration and discovery of HPC applications
JOB        Submission and management of compute jobs
SYSTEMS    Availability and info about XSEDE hosts
PROFILE    User profile discovery
AUTH       Token-based secure authentication
POSTIT     URL shortener
iPlant Data Store
Dramatization: Not the actual iPlant Data Store
Overview of the iPlant Data Store
Some important items we won’t see in the demo

Source           Destination   Copy method   Time (seconds)
CD               My Computer   cp            320
Berkeley Server  My Computer   scp           150
External Drive   My Computer   cp            36
USB 2.0 Flash    My Computer   cp            30
iDS              My Computer   iget          18
My Computer      My Computer   cp            15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley.
100 GB: 29m15s (1 GB / 17.5 seconds)
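The per-gigabyte figure on this slide follows directly from the 100 GB timing, and the table rows can be compared as ratios. The sketch below only redoes that arithmetic. The function names are illustrative, and the decimal convention of 1 GB = 1000 MB is an assumption, since the slide does not say which unit definition it uses.

```python
# Timing quoted on the slide: 100 GB transferred in 29m15s.
TOTAL_GB = 100
ELAPSED_SECONDS = 29 * 60 + 15  # 1755 seconds

# Times from the table, in seconds, keyed by source (copy method).
TRANSFER_SECONDS = {
    "CD (cp)": 320,
    "Berkeley Server (scp)": 150,
    "External Drive (cp)": 36,
    "USB 2.0 Flash (cp)": 30,
    "iDS (iget)": 18,
    "My Computer (cp)": 15,
}


def seconds_per_gb(total_gb: float, elapsed_seconds: float) -> float:
    """Average time needed to move one gigabyte."""
    return elapsed_seconds / total_gb


def throughput_mb_per_s(total_gb: float, elapsed_seconds: float,
                        mb_per_gb: int = 1000) -> float:
    """Average throughput; assumes decimal units (1 GB = 1000 MB)."""
    return total_gb * mb_per_gb / elapsed_seconds


def slowdown_relative_to(baseline: str, times: dict) -> dict:
    """How many times slower each row is than the chosen baseline row."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


if __name__ == "__main__":
    print(f"{seconds_per_gb(TOTAL_GB, ELAPSED_SECONDS):.2f} s per GB")
    print(f"{throughput_mb_per_s(TOTAL_GB, ELAPSED_SECONDS):.1f} MB/s")
    for name, ratio in slowdown_relative_to("iDS (iget)", TRANSFER_SECONDS).items():
        print(f"{name}: {ratio:.2f}x the iget time")
```

The ratios do not depend on the size of the test file, which the slide does not state, so they are the safest way to compare rows.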
Tree Visualization
• > 500K taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata-driven display of information
iPlant Tree Viewer
http://portnoy.iplantcollaborative.org/