SlideShare a Scribd company logo
Files, Tools, and Bioinformatics in the Cloud


   Thomas Keane

   Vertebrate Resequencing Informatics
   WTSI
   thomas.keane@sanger.ac.uk




Vertebrate Resequencing Informatics   17th November, 2009
DATA is the problem!

 NGS means large volumes of raw data
     Previously SRF (~8-10bytes per bp), now BAM (~1.6bytes per bp)
 How much data can a sequencing machine produce?
     20Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11Tbp/year
     Small sequencing center: 4 machines?
     44Tbp per year!
 Raw data in BAM: 70Tbytes                             SV Calling: SVMerge
 Processed calls much smaller
     1000G pilot VCF < 1Gbyte

        Alignment + BAM improvement




Vertebrate Resequencing Informatics   17th November, 2009
Simplistic Model: Cloud as compute resource




                                                                               Processes
                                                                  1. Align


                            SRF/Fastq/BAM
                            (2Mbps/sec)                           Variant calling (n x SNP callers, n indel
                                                                  callers, SV callers)
Sequencing Center/Institute                                   BAM + VCF
                                                              (2Mbps/sec)


                                           BAM                                      3,240 days
                                           VCF                                      to upload!


  Vertebrate Resequencing Informatics   17th November, 2009
Move the raw data generation to the compute




                                                                Variant calling (n x SNP callers, n indel
                                                                callers, SV callers)



Sequencing Center/Institute
                                            VCF
                         BAM
                         VCF

    Vertebrate Resequencing Informatics   17th November, 2009
Large Collaborative Projects: Cloud centric model




                                                            VCF

                                               Analysis groups
Vertebrate Resequencing Informatics   17th November, 2009

More Related Content

Viewers also liked

Cloud Computing And Mobility
Cloud Computing And MobilityCloud Computing And Mobility
Cloud Computing And Mobilitygowense
 
Cloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the FutureCloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the FutureAppistry
 
Trend and Future of Cloud Computing
Trend and Future of Cloud ComputingTrend and Future of Cloud Computing
Trend and Future of Cloud Computinghybrid cloud
 
2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study2015 Future of Cloud Computing Study
2015 Future of Cloud Computing StudyNorth Bridge
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computingHaslina
 
2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study2016 Future of Cloud Computing Study
2016 Future of Cloud Computing StudyNorth Bridge
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple pptAgarwaljay
 
Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computingRkrishna Mishra
 

Viewers also liked (10)

Cloud Computing And Mobility
Cloud Computing And MobilityCloud Computing And Mobility
Cloud Computing And Mobility
 
Cloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the FutureCloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the Future
 
Trend and Future of Cloud Computing
Trend and Future of Cloud ComputingTrend and Future of Cloud Computing
Trend and Future of Cloud Computing
 
2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing ppt
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computing
 
2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple ppt
 
Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computing
 
cloud computing ppt
cloud computing pptcloud computing ppt
cloud computing ppt
 

Similar to Next generation sequencing in cloud computing era

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataThomas Keane
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data PreprocessingcursoNGS
 
Large Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesLarge Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesThomas Keane
 
ULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORKULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORKaravind m t
 
Creating a SNP calling pipeline
Creating a SNP calling pipelineCreating a SNP calling pipeline
Creating a SNP calling pipelineDan Bolser
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialThomas Keane
 
Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas
 
PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!Kara Van Malssen
 
IPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH SwitzerlandIPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH SwitzerlandSwiss IPv6 Council
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataThomas Keane
 
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...Academia Sinica
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace VisualizationPTIHPA
 
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time SchedulingProviding Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time Schedulingtcucinotta
 
ProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.netProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.netProcessOne
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyDaniel Bimschas
 
Distributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldDistributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldSATOSHI TAGOMORI
 
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationCambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationAdvantec Distribution
 

Similar to Next generation sequencing in cloud computing era (20)

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data Preprocessing
 
Large Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesLarge Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and Challenges
 
ULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORKULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORK
 
Creating a SNP calling pipeline
Creating a SNP calling pipelineCreating a SNP calling pipeline
Creating a SNP calling pipeline
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing Tutorial
 
Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)
 
PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!
 
IPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH SwitzerlandIPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH Switzerland
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
Cambium networks ugps_specification
Cambium networks ugps_specificationCambium networks ugps_specification
Cambium networks ugps_specification
 
DAC 2012
DAC 2012DAC 2012
DAC 2012
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace Visualization
 
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time SchedulingProviding Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
 
ProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.netProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.net
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with Netty
 
Distributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldDistributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] world
 
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationCambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
 

More from Thomas Keane

Multiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotationsMultiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotationsThomas Keane
 
Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)Thomas Keane
 
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2Thomas Keane
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Thomas Keane
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingThomas Keane
 
Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Thomas Keane
 
Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...Thomas Keane
 
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...Thomas Keane
 
Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Thomas Keane
 
Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Thomas Keane
 

More from Thomas Keane (10)

Multiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotationsMultiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotations
 
Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)
 
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-Editing
 
Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...
 
Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...
 
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
 
Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010
 
Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010
 

Recently uploaded

In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Thierry Lestable
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...Elena Simperl
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 

Recently uploaded (20)

In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 

Next generation sequencing in cloud computing era

  • 1. Files, Tools, and Bioinformatics in the Cloud Thomas Keane Vertebrate Resequencing Informatics WTSI thomas.keane@sanger.ac.uk Vertebrate Resequencing Informatics 17th November, 2009
  • 2. DATA is the problem! NGS means large volumes of raw data   Previously SRF (~8-10bytes per bp), now BAM (~1.6bytes per bp) How much data can a sequencing machine produce?   20Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11Tbp/year   Small sequencing center: 4 machines?   44Tbp per year! Raw data in BAM: 70Tbytes SV Calling: SVMerge Processed calls much smaller   1000G pilot VCF < 1Gbyte Alignment + BAM improvement Vertebrate Resequencing Informatics 17th November, 2009
  • 3. Simplistic Model: Cloud as compute resource Processes 1. Align SRF/Fastq/BAM (2Mbps/sec) Variant calling (n x SNP callers, n indel callers, SV callers) Sequencing Center/Institute BAM + VCF (2Mbps/sec) BAM 3,240 days VCF to upload! Vertebrate Resequencing Informatics 17th November, 2009
  • 4. Move the raw data generation to the compute Variant calling (n x SNP callers, n indel callers, SV callers) Sequencing Center/Institute VCF BAM VCF Vertebrate Resequencing Informatics 17th November, 2009
  • 5. Large Collaborative Projects: Cloud centric model VCF Analysis groups Vertebrate Resequencing Informatics 17th November, 2009