The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences

iPlant presentation given at NESCent in June 2012 for the participants of Phylotastic.

  1. The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences. Naim Matasci, BIO5 / The iPlant Collaborative
  2. What is iPlant?
  3. http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
  4. Problem 1: Data volume
      • The cost of analysis follows Moore's Law:
        – 2001: 1 student with 1 computer to analyze the 1 Mb of data produced
        – Today: 200 students and 200 computers to analyze all the data produced for the same cost (10 Gb)
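The figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 Gb = 1,000 Mb) and that the two scenarios differ only in data volume and number of computers; the variable names are illustrative, not from the talk.

```python
import math

# Figures from the slide (decimal units assumed: 1 Gb = 1,000 Mb).
data_2001_mb = 1            # analyzed by 1 student with 1 computer
data_today_mb = 10 * 1000   # 10 Gb produced today for the same cost
computers_today = 200       # computers needed to analyze it

data_growth = data_today_mb // data_2001_mb          # 10,000x more data
per_computer_mb = data_today_mb // computers_today   # 50 Mb per computer
compute_gain = per_computer_mb // data_2001_mb       # each computer handles 50x more
doublings = math.log2(compute_gain)                  # about 5.6 capacity doublings

print(f"data grew {data_growth}x; per-computer capacity grew {compute_gain}x "
      f"({doublings:.1f} doublings); gap = {data_growth // compute_gain} computers")
```

The ratio between data growth (10,000x) and per-computer gain (50x) is exactly the 200 computers on the slide: analysis capacity that improves at a Moore's Law pace falls far behind sequencing output.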
  5. Problem 2: Fragmented analytical landscape
      • Tools separated by compute platform, data format, integration issues, and programming model
      • Mixture of desktop, command-line, database, and web-based tools
      • Labor-intensive, fragile solutions devised to reach scientific objectives
      • Little ability to share results and analytical methods
      • Lack of reproducibility
  6. Scalability (images: ABI 3730 DNA Analyzer; Illumina Genome Analyzer; Joe Felsenstein, ca. 1980; Ranger cluster at TACC)
  7. Major ways to access iPlant
      • Storing and sharing data large and small: iPlant Data Store
      • Integrated web-based analysis: the Discovery Environment
      • Cloud computing: Atmosphere
      • Applications: TNRS, Tree Viewer, PhytoBisque, etc.
      • Scientific networking, knowledge base, and information exchange: My-Plant.org
      • Educational tools: DNA Subway
      • Embedding iPlant CI capabilities into software: the Foundation API
      • High-performance computing for experts: TeraGrid/XSEDE
  8. Why is the tree of life important? “Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”
  9. "Nothing in biology makes sense except in the light of evolution." (T. G. Dobzhansky)
  10. C3 to C4 photosynthesis (Xin-Guang et al. 2008)
  11. "We combined geospatial and molecular sequence data from two public archives to produce a 1,230-taxon phylogeny of the grasses with accompanying climate data for all species, extracted from more than 1.1 million herbarium specimens." (Edwards and Smith, 2010)
  12. "Here we show that grasses are ancestrally a warm-adapted clade and that C4 evolution was not correlated with shifts between temperate and tropical biomes. Instead, 18 of 20 inferred C4 origins were correlated with marked reductions in mean annual precipitation."
  13. New possibilities
      Taxa: Acer glabrum, Acer saccharinum, Acer rubrum, Acer distylum, Acer macrophyllum, Acer nipponicum, Acer spicatum, Acer carpinifolium, Acer diabolicum, Acer circinatum, Acer sieboldianum, Acer palmatum, Acer saccharum, Acer tschonoskii, Acer rufinerve, Acer pensylvanicum, Acer crataegifolium, Acer mono
      Images: Illumina Genome Analyzer; Ranger cluster at TACC; Acer phylogeny (Ackerly 2009); Green Plant ToL
  14. Just Ask
  15. Atmosphere
  16. iPlant APIs: The Foundation API (service: role)
      IO: File storage, retrieval and management; database interoperability
      DATA: File format conversion
      APPS: Registration and discovery of HPC applications
      JOB: Submission and management of compute jobs
      SYSTEMS: Availability and info about XSEDE hosts
      PROFILE: User profile discovery
      AUTH: Token-based secure authentication
      POSTIT: URL shortener
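The slide describes the Foundation API as a set of separately named services, with AUTH issuing tokens for the others. The sketch below only illustrates that layout from a client's point of view: the base URL, the one-path-per-service scheme, the `listings/demo-user` path, and the "Bearer" header are all hypothetical assumptions, not the documented Foundation API. It builds requests without sending them.

```python
import urllib.request

# Hypothetical base URL and path layout: the slide names the services
# but does not give their endpoints.
BASE_URL = "https://api.example.org"

# Service names and roles as listed on the slide.
SERVICES = {
    "IO": "File storage, retrieval and management; database interoperability",
    "DATA": "File format conversion",
    "APPS": "Registration and discovery of HPC applications",
    "JOB": "Submission and management of compute jobs",
    "SYSTEMS": "Availability and info about XSEDE hosts",
    "PROFILE": "User profile discovery",
    "AUTH": "Token-based secure authentication",
    "POSTIT": "URL shortener",
}


def build_request(service: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to one service.

    Assumes each service is a separate REST resource under BASE_URL and
    that the token issued by AUTH travels in an Authorization header
    (the "Bearer" scheme is an assumption).
    """
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    url = f"{BASE_URL}/{service.lower()}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_request("IO", "/listings/demo-user", token="example-token")
print(req.get_method(), req.full_url)
```

Keeping the service table in one place means a client rejects unknown services before any network call is made.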
  17. Consumer Applications
  18. iPlant Data Store. Dramatization: not the actual iPlant Data Store
  19. Overview of the iPlant Data Store (some important items we won't see in the demo)
      Source            Destination   Copy method   Time (s)
      CD                My Computer   cp                 320
      Berkeley server   My Computer   scp                150
      External drive    My Computer   cp                  36
      USB 2.0 flash     My Computer   cp                  30
      iDS               My Computer   iget                18
      My Computer       My Computer   cp                  15
      Close to optimum conditions; transfer between the University of Arizona and UC Berkeley: 100 GB in 29 m 15 s (1 GB per 17.5 seconds)
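The slide's timings become easier to compare as ratios and rates. A short sketch, assuming decimal units (1 GB = 1,000 MB, 8 bits per byte) and using the local disk-to-disk copy as the baseline; `iget` is the iRODS command-line download command used to pull files from the Data Store. The row labels are just the table rows.

```python
# Copy times from the table above (seconds). The benchmark file size is
# not stated on the slide, so only ratios between rows are meaningful.
copy_times_s = {
    "CD (cp)": 320,
    "Berkeley server (scp)": 150,
    "External drive (cp)": 36,
    "USB 2.0 flash (cp)": 30,
    "iDS (iget)": 18,
    "My Computer (cp)": 15,
}
baseline = copy_times_s["My Computer (cp)"]
for source, seconds in copy_times_s.items():
    print(f"{source:<22} {seconds / baseline:5.1f}x the local copy time")

# University of Arizona -> UC Berkeley transfer, near-optimal conditions.
size_gb = 100
elapsed_s = 29 * 60 + 15                      # 29 m 15 s = 1755 s
seconds_per_gb = elapsed_s / size_gb          # 17.55 s; the slide rounds to 17.5
throughput_mb_s = size_gb * 1000 / elapsed_s  # megabytes per second
throughput_mbit_s = throughput_mb_s * 8       # megabits per second

print(f"{seconds_per_gb:.2f} s per GB, {throughput_mb_s:.1f} MB/s, "
      f"about {throughput_mbit_s:.0f} Mbit/s")
```

In this benchmark `iget` from the Data Store takes 1.2x the time of a local copy, and the sustained Arizona to Berkeley transfer works out to roughly 57 MB/s.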
  20. Tree visualization
      • > 500K taxa
      • Fast
      • Web-based, platform independent
      • Semantic zooming
      • Metadata-driven display of information
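Semantic zooming means the viewer changes *what* it draws, not just its scale, as the user zooms. A minimal sketch of one possible level-of-detail rule, assuming each leaf gets a fixed number of pixels at a given zoom and that a clade is expanded only when every child is tall enough to label; the rule, the `min_px` threshold, and the toy tree are hypothetical, not the iPlant Tree Viewer's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def leaf_count(self) -> int:
        """Number of taxa (leaves) in the clade rooted here."""
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)


def visible_labels(node: Node, px_per_leaf: float, min_px: float = 12.0) -> List[str]:
    """Labels to draw at the current zoom level.

    Each leaf gets px_per_leaf pixels of height. A clade is expanded only
    if every child is at least min_px tall on screen; otherwise it is
    drawn collapsed, as a single label carrying its taxon count.
    """
    if not node.children:
        return [node.name]
    if all(child.leaf_count() * px_per_leaf >= min_px for child in node.children):
        labels: List[str] = []
        for child in node.children:
            labels.extend(visible_labels(child, px_per_leaf, min_px))
        return labels
    return [f"{node.name} ({node.leaf_count()} taxa)"]


# Toy topology for illustration only; it does not reflect real Acer relationships.
tree = Node("Acer", [
    Node("clade A", [Node("Acer rubrum"), Node("Acer saccharinum")]),
    Node("clade B", [Node("Acer palmatum"), Node("Acer circinatum")]),
])
for zoom in (4, 8, 20):
    print(zoom, visible_labels(tree, px_per_leaf=zoom))
```

At 500K taxa, recomputing `leaf_count` on every call would be too slow; a real viewer would cache clade sizes once when the tree is loaded.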
  21. iPlant Tree Viewer: http://portnoy.iplantcollaborative.org/
  22. LIVE TREE VIEW DEMO
  23. Obstacles: number of taxa; taxa names
      Example taxa: Lobelia_kauaensis, Lobelia_villosa, Lobelia_gloria-montis, Trematolobelia_kauaiensis, Trematolobelia_macrostachys, Lobelia_hypoleuca, Lobelia_yuccoides, Lobelia_niihauensis, Brighamia_insignis, Brighamia_rockii, Delissea_rhytidosperma, Delissea_subcordata, Cyanea_pilosa, Cyanea_acuminata, Cyanea_hirtella, Cyanea_coriacea, Cyanea_leptostegia, Clermontia_kakeana, Clermontia_parviflora, Clermontia_arborescens, Clermontia_fauriei
  24. Taxonomic uncertainty
      1. Non-existent names
         • Misspellings
         • Contamination
         • Annotations
         • Morphospecies
         • Digitization issues (frame shifts, character encoding)
         • Lexical variants (digitization conventions)
      2. Synonymy
         • Nomenclatural synonyms
         • Taxonomic synonyms / concepts
      3. Misidentifications, incomplete identifications
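Misspellings and lexical variants (the first group of problems in this list) are the part that plain string processing can address. A minimal sketch, assuming a small hypothetical reference list and Python's standard `difflib` for approximate matching; it is not the algorithm used by TNRS, the `cutoff` of 0.85 is an arbitrary choice, and it does nothing for synonymy or misidentification.

```python
import difflib
from typing import List, Optional


def normalize(name: str) -> str:
    """Undo common lexical variants: underscores, extra spaces, letter case."""
    words = name.replace("_", " ").split()
    if not words:
        return ""
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


def resolve(name: str, reference: List[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the reference name that best matches `name`, or None."""
    query = normalize(name)
    if query in reference:
        return query  # exact match after normalization
    # Fall back to approximate string matching to catch misspellings.
    matches = difflib.get_close_matches(query, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical reference list, built from names on the "Obstacles" slide.
REFERENCE = [
    "Lobelia kauaensis",
    "Lobelia villosa",
    "Trematolobelia kauaiensis",
    "Brighamia insignis",
    "Brighamia rockii",
]

for raw in ("Lobelia_kauaensis", "lobelia  kauaiensis", "Brighamia rocki", "Cyanea pilosa"):
    print(f"{raw!r} -> {resolve(raw, REFERENCE)}")
```

A name with no close match, such as `Cyanea pilosa` against this short list, comes back as `None` rather than being forced onto the nearest entry.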
  25. a) Centaurium curvistamineum (Wittr.) Abrams (1951)
      b) Centaurium minimum (Howell) Piper (1915)
      c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)
      d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)
      e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)
      f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)
      g) Erythraea curvistaminea Wittr. (1886)
      h) Erythraea minima Howell (1901)
      i) Erythraea muhlenbergii Griseb. (1839)
      Image: Gordon Leppig & Andrea J. Pickart
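Names like these differ in genus, epithet, rank, and authorship, so exact string comparison cannot link them. One first step a resolver might take is to reduce each full name to a canonical form without authors and years, then look that up in a synonymy table. A minimal sketch, assuming names follow the "Genus epithet [rank epithet] authors (year)" pattern seen on this slide; the `CONCEPT_OF` table and the `concept-1` ID are hypothetical.

```python
from typing import Dict

# Rank markers that introduce an infraspecific epithet. "f." is left out
# on purpose: in author strings it usually means "filius", not "forma".
RANK_MARKERS = ("subsp.", "var.", "forma")


def canonical_name(full_name: str) -> str:
    """Strip authors and years, keeping genus, epithet and any infraspecific part."""
    tokens = full_name.split()
    canon = tokens[:2]  # genus + specific epithet
    for i in range(2, len(tokens) - 1):
        if tokens[i] in RANK_MARKERS:
            canon += [tokens[i], tokens[i + 1]]
    return " ".join(canon)


# Hypothetical synonymy table: canonical names a curator has linked
# to the same taxon concept. The concept ID is made up for illustration.
CONCEPT_OF: Dict[str, str] = {
    "Centaurium muhlenbergii": "concept-1",
    "Centaurodes muhlenbergii": "concept-1",
    "Erythraea muhlenbergii": "concept-1",
}

for name in (
    "Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)",
    "Centaurodes muhlenbergii (Griseb.) Kuntze (1891)",
    "Erythraea muhlenbergii Griseb. (1839)",
):
    canon = canonical_name(name)
    print(f"{canon:<26} -> {CONCEPT_OF.get(canon)}")
```

The three names in the demo share the epithet *muhlenbergii* with Grisebach as the original author (shown in parentheses when a name is moved to another genus), which is what makes them nomenclatural synonyms. Taxonomic synonyms, such as names based on different types that a treatment lumps together, cannot be found this way and still need a curated table.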
  26. Request tool installation
      Apps -> Create -> New App
      Create New -> Request Tool Installation
      Fill out the forms and submit. Expect a response in 2-5 days.
