SlideShare a Scribd company logo
1 of 25
Download to read offline
Computational infrastructure for
    NGS data analysis


                        Pablo Escobar
                     pescobar@cipf.es
Index
   Introduction to supercomputing (HPC, high
    performance computing)
   NGS computational infrastructures
   Sequencing Centers
       Medical Genome Project MGP - Sevilla
       Centro Nacional Análisis genómico CNAG - Barcelona
       Beijing Genomics Institute BGI - China
   Alternatives:
       Cloud computing
       Grid computing
Computational infrastructure for NGS




   In NGS we have to process really big amounts
    of data, which is not trivial in computing terms.

   Big NGS projects require supercomputing
    infrastructures
Computational infrastructure for NGS


These infrastructures are expensive and not trivial to use, we require:
       Acondicionated data center (servers room). This is
        expensive
       Computing cluster:
                  Many computing nodes (servers)
                  High performance storage (hard disks)


                  Fast networks (10Gb ethernet, infiniband...)


       Skilled people in computing ( sysadmins and developers).
                    In CNAG currently 30 staff - >50% informatics
Big infrastructure cluster

   Distributed memory cluster
        Starting at 20 computing nodes
                  160 to 240 cores
                  


        amd64 (x86_64) is the most
         used cpu architecture
        At least 48GB ram per node
   Fast networks
        10Gbit
        Infiniband
   Batch queue system (sge, condor, pbs, slurm)
   Optional MPI and GPUs environment depending on project
    requirements
Big infrastructure storage

   Distributed filesystem for high performance
    storage (starting at 100TB)
       Lustre
       GPFS
       Ibrix
       parallel nfs
       glusterFS
   Storage is the most expensive
   NFS is not a good option for supercomputing
Big infrastructure storage
Big infrastructure
Big infrastructure

   Starting at 200.000€
       200.000€ is just the hardware
       Plus data center (computers room)
       Plus informatics salary
   Not every partner knows about supercomputing.
       SGI
       Bull
       IBM
       HP
Middle-size infrastructure

   ”Small” distributed filesystem ( around 50TB).

   ”Small” cluster (around 10 nodes, 80 to 120 cores).

   At least gigabit ethernet network.

   Price range: 50.000 – 100.000 € (just hardware)
       plus data center and informatics salary
Small infrastructure


   Recommended at least 2 machines
       8 or 12 cores each machine.
       48Gb ram minimum each machine.
       BIG local disk. At least 4TB each machine
                   As much local disks as we can afford


   Price range: starting at 8.000€ - 10.000€ (two
    machines)
Sequencing centers in Spain


                   Medical Genome Project
   Sequencing Instruments
       7 GS-FLX (Roche)
       4 SolidTM 4 (Applied Biosystems)


   Informatics infrastructure
       300 core cluster
       0,5 petabyte hard disks
Medical genome project
Storage racks
                    IBRIX filesystem
                          front-ends
MGP raw data generation

   a solid sequencer run
        7 days running
        Generates around 4TB
   Only the four solid sequencers working
    full time can generate around 12TB
    each week.
   12TB just of raw data. After running
    bioinformatics analysis more data is
    generated
   Raw data size grows really fast
        New sequencer models
        New reagents
MGP raw data generation
Sequencing centers in Spain


                                  CNAG
   Sequencing Instruments
       8 Illumina Genome Analyzer Iix
       6 Illumina HiSeq2000
       4 Illumina cBots
   Informatics infrastructure
       850 core cluster
       1.2 petabyte hard disks
       10 x 10 Gb/s link with marenostrum (Barcelona Super
        Computer 10,240 cores)
CNAG
Largest sequencing center in the
             world
   Beijing Genomics Institute (BGI)
Largest sequencing center in the
             world
   Beijing Genomics Institute (BGI)




                          Hardware resources
         Source: http://www.genomics.cn/en/platform.php?id=249
Sequencing center resources
Most used operating system is GNU/LINUX




       Source: http://www.top500.org/stats/list/36/osfam
Alternatives – cloud computing
   Pros
       Flexibility.
       You pay what you use.
       Don´t need to maintain a data center.
   Cons
       Transfer big datasets over internet is slow.
       You pay for consumed bandwidth. That is a problem
        with big datasets.
       Lower performance, specially in disk read/write.
       Privacy/security concerns.
       More expensive for big and long term projects.
Alternatives – grid computing
Alternatives – grid computing

   Pros
       Cheaper.
       More resources available.
   Cons
       Too heterogeneous environment.
       Slow connectivity (specially in Spain).
       Requires a lot of time to find good resources in the
        grid.
M4NY TH4NKS
     :)

More Related Content

What's hot

2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngsDin Apellidos
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyQIAGEN
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...QIAGEN
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
 
Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...Integrated DNA Technologies
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)LOGESWARAN KA
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Prof. Wim Van Criekinge
 
Exploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencingExploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencingQIAGEN
 
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...Jan Aerts
 
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...VHIR Vall d’Hebron Institut de Recerca
 
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...Andor Kiss
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment DesignYaoyu Wang
 
ECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics ToolsECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics ToolsNick Loman
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansGenomeInABottle
 
rnaseq_from_babelomics
rnaseq_from_babelomicsrnaseq_from_babelomics
rnaseq_from_babelomicsFrancisco Garc
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...Laura Berry
 

What's hot (20)

2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) Technology
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
 
Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 
Exploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencingExploring new frontiers with next-generation sequencing
Exploring new frontiers with next-generation sequencing
 
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
 
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...
 
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
New Technologies at the Center for Bioinformatics & Functional Genomics at Mi...
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment Design
 
ECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics ToolsECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics Tools
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plans
 
rnaseq_from_babelomics
rnaseq_from_babelomicsrnaseq_from_babelomics
rnaseq_from_babelomics
 
Nextera Overview Feb 2010
Nextera Overview Feb 2010Nextera Overview Feb 2010
Nextera Overview Feb 2010
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...
 

Viewers also liked

Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGScursoNGS
 
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...QIAGEN
 
Next Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewNext Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewDominic Suciu
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applicationsAGRF_Ltd
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAGEN
 
Next Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisNext Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisBastian Greshake
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataThomas Keane
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Maté Ongenaert
 
Making your science powerful : an introduction to NGS experimental design
Making your science powerful : an introduction to NGS experimental designMaking your science powerful : an introduction to NGS experimental design
Making your science powerful : an introduction to NGS experimental designjelena121
 
NGS in cancer treatment
NGS in cancer treatmentNGS in cancer treatment
NGS in cancer treatmentNur Suhaida
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGBilal Nizami
 
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth Alira Health
 
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...QIAGEN
 
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1QIAGEN
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingDayananda Salam
 
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”Vinay Shiva Prasad
 

Viewers also liked (20)

Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGS
 
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
 
Basic Steps of the NGS Method
Basic Steps of the NGS MethodBasic Steps of the NGS Method
Basic Steps of the NGS Method
 
Next Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewNext Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology Overview
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
 
Ngs ppt
Ngs pptNgs ppt
Ngs ppt
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Next Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisNext Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome Analysis
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
 
Making your science powerful : an introduction to NGS experimental design
Making your science powerful : an introduction to NGS experimental designMaking your science powerful : an introduction to NGS experimental design
Making your science powerful : an introduction to NGS experimental design
 
NGS in cancer treatment
NGS in cancer treatmentNGS in cancer treatment
NGS in cancer treatment
 
High throughput sequencing
High throughput sequencingHigh throughput sequencing
High throughput sequencing
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING
 
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
 
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
 
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”
 

Similar to Computational infrastructure for NGS data analysis

BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataZhong Wang
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataHitoshi Sato
 
From the Archives: Future of Supercomputing at Altparty 2009
From the Archives: Future of Supercomputing at Altparty 2009From the Archives: Future of Supercomputing at Altparty 2009
From the Archives: Future of Supercomputing at Altparty 2009Olli-Pekka Lehto
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PC Cluster Consortium
 
Sanger HPC infrastructure Report (2007)
Sanger HPC infrastructure  Report (2007)Sanger HPC infrastructure  Report (2007)
Sanger HPC infrastructure Report (2007)Guy Coates
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreDataWorks Summit
 
HPC_June2011
HPC_June2011HPC_June2011
HPC_June2011cfloare
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerFörderverein Technische Fakultät
 
Exadata x2 ext
Exadata x2 extExadata x2 ext
Exadata x2 extyangjx
 
Python for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPython for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPyData
 
2011 11-10 e-bio-grid_workshop introduction
2011 11-10 e-bio-grid_workshop introduction2011 11-10 e-bio-grid_workshop introduction
2011 11-10 e-bio-grid_workshop introductionINooren
 
Intel new processors
Intel new processorsIntel new processors
Intel new processorszaid_b
 

Similar to Computational infrastructure for NGS data analysis (20)

BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing data
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
 
From the Archives: Future of Supercomputing at Altparty 2009
From the Archives: Future of Supercomputing at Altparty 2009From the Archives: Future of Supercomputing at Altparty 2009
From the Archives: Future of Supercomputing at Altparty 2009
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
 
Sanger HPC infrastructure Report (2007)
Sanger HPC infrastructure  Report (2007)Sanger HPC infrastructure  Report (2007)
Sanger HPC infrastructure Report (2007)
 
NSCC Training - Introductory Class
NSCC Training - Introductory ClassNSCC Training - Introductory Class
NSCC Training - Introductory Class
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
 
NSCC Training Introductory Class
NSCC Training Introductory Class NSCC Training Introductory Class
NSCC Training Introductory Class
 
HPC_June2011
HPC_June2011HPC_June2011
HPC_June2011
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
 
Exadata x2 ext
Exadata x2 extExadata x2 ext
Exadata x2 ext
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Available HPC Resources at CSUC
Available HPC Resources at CSUCAvailable HPC Resources at CSUC
Available HPC Resources at CSUC
 
Python for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPython for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark Basham
 
2011 11-10 e-bio-grid_workshop introduction
2011 11-10 e-bio-grid_workshop introduction2011 11-10 e-bio-grid_workshop introduction
2011 11-10 e-bio-grid_workshop introduction
 
Intel new processors
Intel new processorsIntel new processors
Intel new processors
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 

More from cursoNGS

Towards an understanding of diversity in biological and biomedical systems
Towards an understanding of diversity in biological and biomedical systemsTowards an understanding of diversity in biological and biomedical systems
Towards an understanding of diversity in biological and biomedical systemscursoNGS
 
Utilidad de la genómica en la salud humana
Utilidad de la genómica en la salud humanaUtilidad de la genómica en la salud humana
Utilidad de la genómica en la salud humanacursoNGS
 
NGS analysis of micro-RNA
NGS analysis of micro-RNANGS analysis of micro-RNA
NGS analysis of micro-RNAcursoNGS
 
Differential expression in RNA-Seq
Differential expression in RNA-SeqDifferential expression in RNA-Seq
Differential expression in RNA-SeqcursoNGS
 
Discovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGSDiscovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGScursoNGS
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data PreprocessingcursoNGS
 
Linux for bioinformatics
Linux for bioinformaticsLinux for bioinformatics
Linux for bioinformaticscursoNGS
 
Introduccion a la bioinformatica
Introduccion a la bioinformaticaIntroduccion a la bioinformatica
Introduccion a la bioinformaticacursoNGS
 

More from cursoNGS (8)

Towards an understanding of diversity in biological and biomedical systems
Towards an understanding of diversity in biological and biomedical systemsTowards an understanding of diversity in biological and biomedical systems
Towards an understanding of diversity in biological and biomedical systems
 
Utilidad de la genómica en la salud humana
Utilidad de la genómica en la salud humanaUtilidad de la genómica en la salud humana
Utilidad de la genómica en la salud humana
 
NGS analysis of micro-RNA
NGS analysis of micro-RNANGS analysis of micro-RNA
NGS analysis of micro-RNA
 
Differential expression in RNA-Seq
Differential expression in RNA-SeqDifferential expression in RNA-Seq
Differential expression in RNA-Seq
 
Discovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGSDiscovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGS
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data Preprocessing
 
Linux for bioinformatics
Linux for bioinformaticsLinux for bioinformatics
Linux for bioinformatics
 
Introduccion a la bioinformatica
Introduccion a la bioinformaticaIntroduccion a la bioinformatica
Introduccion a la bioinformatica
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Computational infrastructure for NGS data analysis

  • 1. Computational infrastructure for NGS data analysis Pablo Escobar pescobar@cipf.es
  • 2. Index  Introduction to supercomputing (HPC, high performance computing)  NGS computational infrastructures  Sequencing Centers  Medical Genome Project MGP - Sevilla  Centro Nacional Análisis genómico CNAG - Barcelona  Beijing Genomics Institute BGI - China  Alternatives:  Cloud computing  Grid computing
  • 3. Computational infrastructure for NGS  In NGS we have to process really big amounts of data, which is not trivial in computing terms.  Big NGS projects require supercomputing infrastructures
  • 4. Computational infrastructure for NGS These infrastructures are expensive and not trivial to use, we require:  Acondicionated data center (servers room). This is expensive  Computing cluster:  Many computing nodes (servers)  High performance storage (hard disks)  Fast networks (10Gb ethernet, infiniband...)  Skilled people in computing ( sysadmins and developers).  In CNAG currently 30 staff - >50% informatics
  • 5. Big infrastructure cluster  Distributed memory cluster  Starting at 20 computing nodes 160 to 240 cores   amd64 (x86_64) is the most used cpu architecture  At least 48GB ram per node  Fast networks  10Gbit  Infiniband  Batch queue system (sge, condor, pbs, slurm)  Optional MPI and GPUs environment depending on project requirements
  • 6. Big infrastructure storage  Distributed filesystem for high performance storage (starting at 100TB)  Lustre  GPFS  Ibrix  parallel nfs  glusterFS  Storage is the most expensive  NFS is not a good option for supercomputing
  • 9. Big infrastructure  Starting at 200.000€  200.000€ is just the hardware  Plus data center (computers room)  Plus informatics salary  Not every partner knows about supercomputing.  SGI  Bull  IBM  HP
  • 10. Middle-size infrastructure  ”Small” distributed filesystem ( around 50TB).  ”Small” cluster (around 10 nodes, 80 to 120 cores).  At least gigabit ethernet network.  Price range: 50.000 – 100.000 € (just hardware)  plus data center and informatics salary
  • 11. Small infrastructure  Recommended at least 2 machines  8 or 12 cores each machine.  48Gb ram minimum each machine.  BIG local disk. At least 4TB each machine  As much local disks as we can afford  Price range: starting at 8.000€ - 10.000€ (two machines)
  • 12. Sequencing centers in Spain Medical Genome Project  Sequencing Instruments  7 GS-FLX (Roche)  4 SolidTM 4 (Applied Biosystems)  Informatics infrastructure  300 core cluster  0,5 petabyte hard disks
  • 13. Medical genome project Storage racks IBRIX filesystem front-ends
  • 14. MGP raw data generation  a solid sequencer run  7 days running  Generates around 4TB  Only the four solid sequencers working full time can generate around 12TB each week.  12TB just of raw data. After running bioinformatics analysis more data is generated  Raw data size grows really fast  New sequencer models  New reagents
  • 15. MGP raw data generation
  • 16. Sequencing centers in Spain CNAG  Sequencing Instruments  8 Illumina Genome Analyzer Iix  6 Illumina HiSeq2000  4 Illumina cBots  Informatics infrastructure  850 core cluster  1.2 petabyte hard disks  10 x 10 Gb/s link with marenostrum (Barcelona Super Computer 10,240 cores)
  • 17. CNAG
  • 18. Largest sequencing center in the world  Beijing Genomics Institute (BGI)
  • 19. Largest sequencing center in the world  Beijing Genomics Institute (BGI) Hardware resources Source: http://www.genomics.cn/en/platform.php?id=249
  • 21. Most used operating system is GNU/LINUX Source: http://www.top500.org/stats/list/36/osfam
  • 22. Alternatives – cloud computing  Pros  Flexibility.  You pay what you use.  Don´t need to maintain a data center.  Cons  Transfer big datasets over internet is slow.  You pay for consumed bandwidth. That is a problem with big datasets.  Lower performance, specially in disk read/write.  Privacy/security concerns.  More expensive for big and long term projects.
  • 24. Alternatives – grid computing  Pros  Cheaper.  More resources available.  Cons  Too heterogeneous environment.  Slow connectivity (specially in Spain).  Requires a lot of time to find good resources in the grid.