WHOLE EXOME SEQUENCING (WES)
4-11-2016
WHAT IS WES?
Sequencing of the whole exome (the protein-coding regions of the genome).
Rabbani et al. report that about 85% of the disease-causing mutations in Mendelian disorders lie in exonic regions.
WES therefore has great clinical utility.
Local connection: In 2010, Dr. Elizabeth Worthey of the Medical College of Wisconsin sequenced the exome of a child with severe inflammatory bowel disease.
WES DATA
NGS assays share common data formats for their output: FASTQ for raw reads and BAM for aligned reads.
In RNA-Seq, we were interested in gene counts.
In WES, we are interested in the differences between the human reference sequence and the sample's sequence.
We will annotate these differences to judge whether they are likely to be deleterious.
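To make those differences concrete: each data line of a VCF file records a chromosome, a position, a reference (REF) allele and one or more alternate (ALT) alleles. The shell function below is a minimal sketch, not part of SeqMule or wANNOVAR. The name `classify_variants` is hypothetical, and it assumes a plain-text, tab-delimited VCF. It labels each record by comparing allele lengths only.

```shell
# classify_variants (hypothetical helper): label each VCF record as SNV, MNV,
# insertion or deletion by comparing REF (column 4) and ALT (column 5) lengths.
# Multi-allelic ALT fields are split on ","; symbolic alleles such as <DEL>,
# "*" and "." are reported as "other".
classify_variants() {
  awk -F'\t' '
    /^#/ { next }                                # skip header lines
    {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        ref = $4; alt = alts[i]
        if (alt ~ /^</ || alt == "*" || alt == ".") type = "other"
        else if (length(ref) == 1 && length(alt) == 1) type = "SNV"
        else if (length(ref) == length(alt)) type = "MNV"
        else if (length(alt) > length(ref)) type = "insertion"
        else type = "deletion"
        print $1 ":" $2 "\t" ref ">" alt "\t" type
      }
    }' "${1:-/dev/stdin}"
}
```

For example, `classify_variants consensus.vcf | cut -f3 | sort | uniq -c` (where consensus.vcf is a placeholder name) tallies how many variants of each type a sample carries.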
WES DATA
• Sequencing keeps getting cheaper.
• The SRA (NCBI Sequence Read Archive) holds trillions of base pairs of sequencing data.
WHOLE EXOME PIPELINE
• We will use a program called SeqMule to automate the analysis of our whole-exome data.
PAIRED-END SEQUENCING
• Exome data are usually generated as paired-end reads: each DNA fragment is sequenced from both ends, so a single run produces two files (forward reads and reverse reads).
• For more information on the concept, see http://goo.gl/7FKH6j.
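Because the two files of a pair must stay in step (read N in one file is the mate of read N in the other), it is worth checking them before a long pipeline run. This is a minimal POSIX-shell sketch. The function name is hypothetical, and it assumes four-line FASTQ records whose headers start with the read name, optionally ending in /1 or /2. `gzip -cdf` lets it read both plain and gzip-compressed files.

```shell
# check_fastq_pair (hypothetical helper): verify that two paired-end FASTQ
# files hold the same number of reads with matching read names, in order.
check_fastq_pair() {
  names1=$(mktemp) && names2=$(mktemp) || return 1
  # Header lines are lines 1, 5, 9, ...; keep the first word and drop a
  # trailing /1 or /2 mate suffix so the two files can be compared directly.
  gzip -cdf "$1" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' > "$names1"
  gzip -cdf "$2" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' > "$names2"
  n1=$(( $(wc -l < "$names1") ))
  n2=$(( $(wc -l < "$names2") ))
  status=0
  if [ "$n1" -ne "$n2" ]; then
    echo "read count mismatch: $n1 vs $n2" >&2; status=1
  elif ! cmp -s "$names1" "$names2"; then
    echo "read names differ between the two files" >&2; status=1
  else
    echo "OK: $n1 read pairs"
  fi
  rm -f "$names1" "$names2"
  return "$status"
}
```

For example, `check_fastq_pair SRR766039_1.fastq SRR766039_2.fastq` prints the number of read pairs, or an error and a non-zero exit status if the files are out of step.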
STEP 1: DOWNLOAD DATA FROM SRA
• The International HapMap Project collected samples from many populations, including Utah residents of Northern and Western European ancestry (CEU).
• One of these individuals, the daughter in a parent-child trio known only by her sample ID NA12878, is probably the most sequenced individual on Earth. You will download and analyze this individual yourself.
• For demonstration, we will download another individual from the same cohort, NA07000.
STEP 1: DOWNLOAD DATA FROM SRA
• Go to the SRA-DNAnexus website and search for SRR766039.
• Find the SRA file and download it.
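STEP 1: DOWNLOAD DATA FROM SRA
• Create a new folder in Linux, download the SRA file into it, and convert it to FASTQ.
• Commands:
mkdir -p NA07000 && cd NA07000
wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR766/SRR766039/SRR766039.sra
fastq-dump --split-3 SRR766039.sra
• fastq-dump (from the SRA Toolkit) with --split-3 writes the two mates to SRR766039_1.fastq and SRR766039_2.fastq, and any unpaired reads to SRR766039.fastq.
• If the FTP path above no longer resolves, the SRA Toolkit can fetch the run by accession instead: prefetch SRR766039, then fastq-dump --split-3 SRR766039.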
STEP 2: RUN FASTQC
• While still in the NA07000 folder, run FastQC on both FASTQ files to get quality metrics:
fastqc SRR766039_1.fastq SRR766039_2.fastq
STEP 2: RUN FASTQC: FORWARD READ
• 101 bp reads with good quality throughout.
STEP 2: RUN FASTQC: REVERSE READ
• Illumina reads typically lose quality toward the 3′ end, and the drop is usually more pronounced in the reverse reads.
• This pair of FASTQ files does not need trimming.
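FastQC's per-base sequence quality plot summarizes the Phred scores observed at each read position. The sketch below is not FastQC itself, and its function name is hypothetical. It computes only the mean score at each position, assuming Phred+33 quality encoding and unwrapped four-line FASTQ records. Running it on the reverse-read file shows whether, and where, quality falls off toward the 3′ end.

```shell
# per_position_mean_quality (hypothetical helper): mean Phred score at each
# read position, printed as "position<TAB>mean". Assumes Phred+33 encoding.
per_position_mean_quality() {
  gzip -cdf "$1" | awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # char to ASCII code
    NR % 4 == 0 {                                # 4th line of each record = qualities
      for (p = 1; p <= length($0); p++) {
        sum[p] += ord[substr($0, p, 1)] - 33     # Phred+33 offset
        cnt[p]++
        if (p > maxp) maxp = p
      }
    }
    END { for (p = 1; p <= maxp; p++) printf "%d\t%.1f\n", p, sum[p] / cnt[p] }'
}
```

For example, `per_position_mean_quality SRR766039_2.fastq | tail` shows the mean quality over the last positions of the reverse reads. FastQC draws box plots rather than plain means, so treat this as a rough cross-check.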
STEP 3: UNDERSTAND SEQMULE
• Once we have the FASTQ files, we will use a program called SeqMule to:
1. Align the reads to the reference genome.
2. Mark or remove PCR duplicates in the alignment.
3. Realign the reads around insertions and deletions.
4. Call variants and create a VCF of consensus calls.
5. Produce coverage plots.
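Step 4 combines the output of several variant callers into one consensus VCF. The idea can be sketched as a vote: keep a variant if at least MIN callers report it. This is a simplified illustration, not SeqMule's own consensus code. The function name is hypothetical, it prints sites (CHROM, POS, REF, ALT) rather than full VCF records, and it assumes the input VCFs are already normalized (left-aligned, one ALT allele per line) so that every caller writes the same variant the same way.

```shell
# consensus_sites (hypothetical helper): print CHROM, POS, REF, ALT of every
# variant found in at least MIN of the given VCF files (plain or gzipped).
# Usage: consensus_sites MIN file1.vcf file2.vcf ...
consensus_sites() {
  min=$1; shift
  for vcf in "$@"; do
    # One key per variant per file, so a file cannot vote twice for a site.
    gzip -cdf "$vcf" | awk -F'\t' '!/^#/ { print $1 "\t" $2 "\t" $4 "\t" $5 }' | sort -u
  done | sort | uniq -c |
    awk -v min="$min" '$1 >= min { print $2 "\t" $3 "\t" $4 "\t" $5 }' |
    sort -k1,1 -k2,2n                            # order by chromosome, then position
}
```

With three callers, `consensus_sites 2 ...` keeps majority calls, while `consensus_sites 3 ...` keeps only variants all callers agree on, trading sensitivity for specificity.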
STEP 3: UNDERSTAND SEQMULE
• Type less ~/NGSTools/SeqMule/advanced_config to view the config file.
• In this file, the following lines are set to 1 (run = true):
• 2P_bwamem=1 #BWA-MEM alignment
• 3p_samtools_rmdup=1 #use MarkDuplicates from Picard tools to mark duplicates
• 4p_samtools_filter=1 #use 'samtools view' command to filter reads under 30 MAPQ
• 6px_gatklite_realign=1 #use GenomeAnalysisTKLite from GATK to generate GATK intervals and then do realignment
• 8p_gatk_HaplotypeCaller=1
• 8p_samtools_mpileup=1
• 8p_freebayes=1
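To see which steps a config enables without paging through it in less, you can print only the entries set to 1. This is a minimal sketch with a hypothetical function name. It assumes the `name=value #comment` layout shown on this slide and defaults to the ~/NGSTools/SeqMule/advanced_config path used in this tutorial.

```shell
# enabled_steps (hypothetical helper): print the names of config entries whose
# value is 1, ignoring anything after a "#" comment marker.
enabled_steps() {
  awk '
    {
      line = $0
      sub(/#.*/, "", line)                       # drop comments
      eq = index(line, "=")
      if (eq == 0) next                          # not a name=value line
      name = substr(line, 1, eq - 1)
      value = substr(line, eq + 1)
      gsub(/[ \t]+/, "", name); gsub(/[ \t]+/, "", value)
      if (value == "1") print name
    }' "${1:-$HOME/NGSTools/SeqMule/advanced_config}"
}
```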
STEP 4: RUN SEQMULE
• While in the NA07000 folder, run this command:
• seqmule pipeline -a SRR766039_1.fastq -b SRR766039_2.fastq -e -prefix NA07000 -threads 7 -capture default
• -a: forward reads
• -b: reverse reads
• -e: exome data
• -prefix: the name to give the sample
• -threads: number of CPU cores to use; 7 is sufficient.
• -capture: capture regions; "default" uses the default exome regions
• SeqMule should start running and not exit immediately.
• The run takes about 4 hours.
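Because a run takes hours, a small wrapper that checks its inputs before starting can save time. A common slip is passing the forward file to both -a and -b. This is a minimal sketch: the function name is hypothetical, and it uses only the seqmule flags shown on this slide.

```shell
# run_seqmule_exome (hypothetical helper): check inputs, then launch the same
# seqmule command used on this slide.
# Usage: run_seqmule_exome R1 R2 PREFIX [THREADS]
run_seqmule_exome() {
  r1=$1; r2=$2; prefix=$3; threads=${4:-7}
  for f in "$r1" "$r2"; do
    if [ ! -s "$f" ]; then                       # file missing or zero bytes
      echo "missing or empty input: $f" >&2
      return 1
    fi
  done
  if [ "$r1" = "$r2" ]; then                     # compares the paths as given
    echo "forward (-a) and reverse (-b) reads must be different files" >&2
    return 1
  fi
  seqmule pipeline -a "$r1" -b "$r2" -e -prefix "$prefix" \
    -threads "$threads" -capture default
}
```

For example, `run_seqmule_exome SRR766039_1.fastq SRR766039_2.fastq NA07000 7` runs the same command as above once both inputs pass the checks.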
STEP 5: EXAMINE OUTPUT
• After completion, open the NA07000_report folder.
• Open summary.html to review the results of the SeqMule run.
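Beyond the plots in summary.html, a quick numeric check of coverage can be useful. This is a minimal sketch, assuming you have produced a per-base depth table with samtools depth (three tab-separated columns: chromosome, position, depth), for example `samtools depth -a -b capture.bed realigned.bam > depth.txt`, where capture.bed and realigned.bam are placeholder names for your capture regions and SeqMule's realigned BAM. The -a option keeps zero-depth positions, which would otherwise be left out and inflate the mean. The function name and the 20x default threshold are assumptions.

```shell
# depth_summary (hypothetical helper): number of positions, mean depth and the
# percentage of positions covered by at least MIN reads (default 20).
# Usage: depth_summary DEPTH_FILE [MIN]
depth_summary() {
  awk -F'\t' -v min="${2:-20}" '
    { n++; total += $3; if ($3 >= min) ok++ }    # column 3 = depth
    END {
      if (n == 0) { print "no positions"; exit 1 }
      printf "positions=%d mean_depth=%.1f pct_ge_%dx=%.1f\n", n, total / n, min, 100 * ok / n
    }' "$1"
}
```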
STEP 6: ANNOTATE CONSENSUS VCF
• Go to http://wannovar.usc.edu/ to use wANNOVAR, a web tool for annotating genomic variants, and upload the consensus VCF. I used Custom Filtering with a 5% population-frequency cutoff so that only rare variants (found in less than 5% of the population) are kept. Press Submit when ready.
STEP 7: DOWNLOAD CSV FILE AND FILTER
• When wANNOVAR finishes, you have the following options:
1. Download the full annotation in CSV or TXT format and open it in Excel to filter and sort.
2. Download the Step 3 VCF (if you used Custom Filtering) and annotate it a second time, so that only your filtered variants are annotated.
3. Open the realigned BAM file in IGV to confirm the read depth supporting each variant.
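If you prefer the command line to Excel, the TXT (tab-delimited) download can be filtered directly. This is a minimal sketch with a hypothetical function name. It keeps the header plus rows whose population-frequency value is below a cutoff, treating a missing value ("." or empty) as rare because the variant was not seen in that database. The frequency column name is an assumption: check the header of your own file for the database column you want (for example a 1000 Genomes or ExAC frequency column).

```shell
# filter_rare (hypothetical helper): keep the header and every row whose value
# in COLUMN is below MAX (default 0.05) or missing.
# Usage: filter_rare ANNOTATION.txt COLUMN [MAX]
filter_rare() {
  awk -F'\t' -v col="$2" -v max="${3:-0.05}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i   # locate the column by name
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print; next
    }
    $c == "." || $c == "" || $c + 0 < max + 0' "$1"
}
```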
NOW IT’S YOUR TURN!
• You will run sample NA12878 through our whole-exome pipeline.
1. Create the sample folder.
2. Download a high-quality exome run for NA12878 using these commands:
1. wget https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_1.fastq.gz
2. wget https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_2.fastq.gz
3. Run FastQC on the reads.
4. Run SeqMule: seqmule pipeline -a NA12878-NGv3-LAB1360-A_1.fastq.gz -b NA12878-NGv3-LAB1360-A_2.fastq.gz -e -prefix NA12878 -capture default
5. Upload the consensus VCF to wANNOVAR, open the realigned BAM in IGV, and explore the most sequenced genome in the world!
HAPPY VARIANT HUNTING!
