Zero to a Bioinformatics
Analysis Platform in Four Minutes
    Enis Afgan, Brad Chapman, Konstantinos Krampis, James Taylor
                                                         BOSC 2012
                                                     Long Beach, CA
Australian National Research Cloud



Provide computational infrastructure to support researchers' needs

    Compute and Storage
    (~25,000 cores + ? PB)
What’s required for genomics?
✔ •    Compute
✔ •    Storage
  •    Data resources
       o  Ensembl, dbSNP, etc.

  •    Tools
  •    Visualisation
  •    Protocols
  •    Expertise
  •    Community!
Genomics Virtual Lab
Compute + Storage =   IaaS
shell vs. IDE

   We want it now
What’s required for genomics?
•    Compute
•    Storage
•    Data resources
     o  Ensembl, dbSNP, etc.

•    Tools
•    Visualisation
•    Protocols
•    Expertise
•    Community!
[Diagram: Galaxy, CloudMan, CloudBioLinux, BioCloudCentral.org]
Playing together
•    CloudBioLinux
     o  Quickly build-your-own tool suite / ready to roll
     o  Graphical & command line access

•    CloudMan
     o  Create a scalable and shareable processing platform

•    Galaxy
     o  Do exploratory analysis

•    BioCloudCentral.org
     o  Get started easily
•    Bundle infrastructure with an analysis tool suite, quickly
     o  Validate our approach
     o  Easier to maintain and replicate

•    Expose it all via a variety of interfaces
     o  Support meta-analysis workflow

•    Move forward
     o  Add new features
     o  Start using it
And one new thing…

blend
 o    A Python library for interacting with Galaxy’s API
 o    And CloudMan
 o    And BioCloudCentral
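
To make "interacting with Galaxy's API" concrete, here is a minimal sketch of the kind of REST request a client library like blend would wrap. It does not use blend itself and is not taken from the slides. The helper name `galaxy_api_url`, the server address, and the key value are hypothetical; the sketch assumes the Galaxy convention of passing the API key as a `key` query parameter to resources under `/api/`.

```python
from urllib.parse import urlencode, urljoin


def galaxy_api_url(base_url, resource, api_key):
    """Build the URL for a GET request against a Galaxy REST resource.

    Hypothetical helper for illustration only; a client library would
    normally hide this step behind its own objects and methods.
    """
    # Normalise the base so urljoin appends to it instead of replacing
    # the last path segment.
    base = base_url.rstrip("/") + "/"
    path = "api/" + resource.strip("/")
    # The API key travels as a query parameter (assumed convention).
    return urljoin(base, path) + "?" + urlencode({"key": api_key})


# Example values are placeholders, not a real server or key.
url = galaxy_api_url("https://galaxy.example.org", "histories", "HYPOTHETICAL_KEY")
print(url)
```

Sending a GET request to a URL of this shape would list the user's histories on a real server. The point of a library such as blend is that this plumbing, plus the equivalent calls for CloudMan and BioCloudCentral, sits behind one Python interface.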
Request compute infrastructure

Manipulate compute infrastructure

Upload data and run analyses

Docs and examples

Test
Automate repetitive tasks
Distribute

Docs and examples included
http://blend.readthedocs.org/
Playing together
•    CloudBioLinux
     o  Build-your-own tool suite / ready to roll
     o  Graphical & command line access

•    CloudMan
     o  Create a scalable and shareable processing platform

•    BioCloudCentral.org
     o  Get started easily

•    Galaxy
     o  Do exploratory analysis

•    Blend library
     o  Automate repetitive tasks: analysis AND infrastructure
Questions?
  cloudbiolinux.org
  usecloudman.org
  usegalaxy.org
  biocloudcentral.org
  blend.readthedocs.org

  Visit the poster session (poster #10)
