WHOLE EXOME SEQUENCING (WES)
4-11-2016
WHAT IS WES?
Sequencing of the whole exome (protein coding regions of the
genome)
Rabbani et al. report that 85% of Mendelian disorders are linked to
mutations in exonic regions.
  WES can therefore have great clinical utility.
Local connection: In 2010, Dr. Elizabeth Worthey of the Medical College
of Wisconsin sequenced the exome of a child with severe ulcerative
colitis.
WES DATA
Most NGS assays store their output in the same file formats:
  FASTQ (raw reads) and BAM (aligned reads).
However, in RNA-Seq, we were interested in gene counts.
In WES data, we are interested in the differences between the human
reference sequence and the sample data.
We will annotate these differences to see if they are deleterious or
not.
WES DATA
• Genome sequencing keeps getting
cheaper.
• The SRA (NCBI Sequence Read
Archive) holds trillions of base
pairs' worth of data.
WHOLE EXOME PIPELINE
• We will be using a program called
SeqMule to automate the analysis of
our whole exome data.
PAIRED END SEQUENCING
• NGS data is almost always paired-end,
which means that each run comes as two files:
one of forward reads and one of reverse reads.
• For more information on the concept, see
http://goo.gl/7FKH6j.
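A quick sanity check on a paired-end run is that both files list the same reads in the same order. The following is a minimal sketch of that check, assuming 4-line FASTQ records and read names that may end in /1 or /2; the function names and the toy reads are made up for illustration.

```python
from itertools import zip_longest

def read_names(fastq_lines):
    """Yield the read name from every 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of the record
            name = line.strip().split()[0].lstrip("@")
            # Drop an optional /1 or /2 mate suffix.
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name

def pairs_in_sync(forward_lines, reverse_lines):
    """Return True if both files list the same reads in the same order."""
    names = zip_longest(read_names(forward_lines), read_names(reverse_lines))
    return all(fwd == rev for fwd, rev in names)

# Toy records (made up), two reads per file.
forward = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTGA", "+", "IIII"]
reverse = ["@read1/2", "TGCA", "+", "IIII", "@read2/2", "CCAT", "+", "IIII"]
print(pairs_in_sync(forward, reverse))
```

For real files, pass open file handles instead of lists; a missing or reordered read in either file makes the check return False.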
STEP 1: DOWNLOAD DATA FROM
SRA
• The HapMap project sequenced many
populations, including individuals of European
ancestry from Utah.
• One of these individuals, a child known only
by the sample accession number NA12878, is
probably the most sequenced individual on
Earth. You will download and analyze this
individual yourself.
• For demonstration, we will download
another individual from the same cohort,
NA07000.
STEP 1: DOWNLOAD DATA FROM
SRA
• Go to the SRA-DNAnexus website and enter
SRR766039.
• Find the SRA file and download it.
STEP 1: DOWNLOAD DATA FROM
SRA
• Create a new folder in Linux and download the
SRA file into the folder.
• Commands:
    mkdir NA07000; cd NA07000
    wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR766/SRR766039/SRR766039.sra
    fastq-dump --split-3 SRR766039.sra
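The download URL nests the run accession inside directories named after its first three and first six characters. The sketch below rebuilds the URL from an accession, assuming that this layout (inferred only from the single URL above) holds for other runs as well; sra_url is a hypothetical helper, not part of any SRA tool.

```python
def sra_url(run):
    """Build a download URL for an SRA run, following the layout of the URL above."""
    base = "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra"
    prefix = run[:3]  # e.g. "SRR"
    group = run[:6]   # e.g. "SRR766"
    return f"{base}/{prefix}/{group}/{run}/{run}.sra"

print(sra_url("SRR766039"))
```

After fastq-dump --split-3 runs on a paired-end file, the reads land in SRR766039_1.fastq and SRR766039_2.fastq, which are the file names used in Step 4.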
STEP 2: RUN FASTQC
• While still in the NA07000 folder, run FastQC
to get quality metrics.
STEP 2: RUN FASTQC: FORWARD
READ
• 101 bp sequences, good quality
throughout.
STEP 2: RUN FASTQC: REVERSE
READ
• Illumina instruments typically show
quality degradation at the 3’ end of
reverse reads.
• This pair of FASTQ files does not need
trimming.
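The per-position quality that these two slides describe can be approximated by hand. This is a minimal sketch, assuming Phred+33 quality encoding and 4-line FASTQ records; it is not how FastQC itself is implemented, and the toy reads are made up.

```python
def mean_quality_per_position(fastq_lines, offset=33):
    """Return the mean Phred score at each read position."""
    totals, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, char in enumerate(line.strip()):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

records = [
    "@r1", "ACGT", "+", "IIII",  # 'I' encodes Phred 40
    "@r2", "ACGT", "+", "II5#",  # '5' encodes Phred 20, '#' encodes Phred 2
]
print(mean_quality_per_position(records))  # [40.0, 40.0, 30.0, 21.0]
```

A drop in the mean towards the end of the list corresponds to the 3’ degradation seen in the reverse read.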
STEP 3: UNDERSTAND SEQMULE
• Once we have the FASTQ files, we will then use
a program called SeqMule to:
1. Align the reads to the reference genome.
2. De-duplicate the alignment to remove PCR duplicates.
3. Re-align the reads around insertions and deletions.
4. Call variants and create a VCF of consensus calls.
5. Produce plots of coverage.
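Item 4 mentions consensus calls. One possible reading is "keep a variant when several callers agree on it". The sketch below implements that rule under the assumption that a variant is identified by chromosome, position, reference and alternate allele; the two-caller threshold and the variants are made up, and SeqMule's own consensus logic may differ.

```python
from collections import Counter

def consensus_calls(callsets, min_callers=2):
    """Keep variants reported by at least `min_callers` of the call sets."""
    # Count each variant once per caller, even if a caller lists it twice.
    support = Counter(v for calls in callsets for v in set(calls))
    return sorted(v for v, n in support.items() if n >= min_callers)

# Variants as (chromosome, position, ref, alt); the values are made up.
caller_a = [("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")]
caller_b = [("chr1", 100, "A", "G"), ("chr3", 42, "G", "A")]
caller_c = [("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")]
print(consensus_calls([caller_a, caller_b, caller_c]))
```

Raising min_callers to the number of callers gives a strict intersection; setting it to 1 gives the union.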
STEP 3: UNDERSTAND SEQMULE
• Type: less ~/NGSTools/SeqMule/advanced_config to
see the config file.
• In this file, the following steps are set to 1
(run = true):
• 2p_bwamem=1 #BWA-MEM alignment
• 3p_samtools_rmdup=1 #use MarkDuplicates from Picard tools to mark duplicates
• 4p_samtools_filter=1 #use 'samtools view' command to filter reads under 30 MAPQ
• 6px_gatklite_realign=1 #use GenomeAnalysisTKLite from GATK to generate GATK intervals and then do realignment
• 8p_gatk_HaplotypeCaller=1
• 8p_samtools_mpileup=1
• 8p_freebayes=1
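To see at a glance which steps are switched on, the key=value lines can be parsed. This is a minimal sketch, assuming every setting has the form name=value with an optional # comment, as in the lines quoted above; the disabled step in the example is hypothetical.

```python
def enabled_steps(config_text):
    """Return the names of the steps whose value is 1."""
    steps = []
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip() == "1":
            steps.append(key.strip())
    return steps

example = """\
4p_samtools_filter=1 #use 'samtools view' command to filter reads under 30 MAPQ
8p_gatk_HaplotypeCaller=1
8p_freebayes=1
8p_example_caller=0 #hypothetical disabled step
"""
print(enabled_steps(example))
```

On the real file, enabled_steps(open(path).read()) would list the steps that the pipeline will run, provided the whole file follows this format.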
STEP 4: RUN SEQMULE
• While in the NA07000 folder, run this command:
    seqmule pipeline -a SRR766039_1.fastq -b SRR766039_2.fastq -e -prefix NA07000 -threads 7 -capture default
• -a: forward read
• -b: reverse read
• -e: exome data
• -prefix: what you want to name the sample
• -threads: how many cores to use for alignment; 7 is enough.
• -capture: default exome
• SeqMule should start and keep running rather than
exit immediately.
• Expect the run to take about 4 hours.
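When several samples have to be processed, it can help to assemble the command in a script. The sketch below only builds the argument list and uses only the flags listed above; seqmule_command is a hypothetical helper, and the check for identical file names is an added safeguard, not something SeqMule is known to do.

```python
import shlex

def seqmule_command(forward, reverse, prefix, threads=7):
    """Assemble the pipeline command from the flags described above."""
    if forward == reverse:
        raise ValueError("forward and reverse reads must be different files")
    return [
        "seqmule", "pipeline",
        "-a", forward, "-b", reverse,
        "-e", "-prefix", prefix,
        "-threads", str(threads),
        "-capture", "default",
    ]

cmd = seqmule_command("SRR766039_1.fastq", "SRR766039_2.fastq", "NA07000")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) on a machine where seqmule is installed.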
STEP 5: EXAMINE OUTPUT
• Open the NA07000_report folder after completion.
• Open the summary.html file to observe the results of the
SeqMule run.
STEP 6: ANNOTATE CONSENSUS
VCF
• Go to http://wannovar.usc.edu/ to use
wANNOVAR, a web tool to annotate genomic
variants. I have used custom filtering to keep only
variants which are found in less than 5% of the
population, filtering out the common ones. Press Submit when ready.
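The frequency filter can be pictured as a simple cutoff. This is a minimal sketch, assuming the goal is to keep rare variants (below 5%) and that each variant carries a population allele frequency; the field names and values are made up and do not reflect wANNOVAR's actual output columns.

```python
def rare_variants(variants, max_frequency=0.05):
    """Keep variants whose population allele frequency is below the cutoff."""
    kept = []
    for variant in variants:
        freq = variant.get("frequency")
        # Variants absent from the population database have no frequency;
        # this sketch treats them as rare.
        if freq is None or freq < max_frequency:
            kept.append(variant)
    return kept

# Made-up variants for illustration.
variants = [
    {"id": "var1", "frequency": 0.30},
    {"id": "var2", "frequency": 0.01},
    {"id": "var3", "frequency": None},
]
print([v["id"] for v in rare_variants(variants)])  # ['var2', 'var3']
```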
STEP 7: DOWNLOAD CSV FILE AND
FILTER
• When wANNOVAR is complete, you have three
options.
1. Download the full annotation in CSV or TXT format and
open it in Excel for manipulation.
2. Download the Step 3 VCF (if you used Custom Filtering)
and annotate it a second time, so that only
your filtered variants are annotated.
3. Use IGV to confirm variant depth by opening the realigned
BAM file.
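Before opening IGV, it can be useful to list the variants whose reported depth is low, since those are the ones worth inspecting first. The sketch below reads the DP key from the INFO column of a VCF, assuming the caller wrote one; the example lines and the cutoff of 10 reads are made up.

```python
def variant_depths(vcf_lines):
    """Map (chrom, pos) to the DP value in the INFO column, when present."""
    depths = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        for entry in info.split(";"):
            if entry.startswith("DP="):
                depths[(chrom, pos)] = int(entry[3:])
    return depths

vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=35;AF=0.5",
    "chr2\t500\t.\tC\tT\t12\tPASS\tDP=4",
]
# The cutoff of 10 reads is an arbitrary choice for this example.
low = {site: dp for site, dp in variant_depths(vcf).items() if dp < 10}
print(low)  # {('chr2', 500): 4}
```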
NOW IT’S YOUR TURN!
• You will run sample NA12878 through our whole-exome pipeline.
1. Create the sample folder.
2. Download a high-quality exome run for NA12878 using these
commands:
    wget https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_1.fastq.gz
    wget https://s3.amazonaws.com/bcbio_nextgen/NA12878-NGv3-LAB1360-A_2.fastq.gz
3. Run FastQC on the reads.
4. Run SeqMule:
    seqmule pipeline -a NA12878-NGv3-LAB1360-A_1.fastq.gz -b NA12878-NGv3-LAB1360-A_2.fastq.gz -e -prefix NA12878 -capture default
5. Upload the consensus VCF to wANNOVAR, open the realigned BAM in IGV,
and explore the most sequenced genome in the world!
HAPPY VARIANT HUNTING!
