4. Steps followed for variant calling
Mapping to the Reference
The STAR aligner is used to map RNA-seq reads to the reference. We recommend STAR
because it offers increased sensitivity compared to TopHat, especially for indels.
Data Cleanup
Sort reads and run MarkDuplicates: MarkDuplicates locates and tags duplicate reads
in a BAM or SAM file, where duplicate reads are defined as originating from a
single fragment of DNA.
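The duplicate-marking idea can be illustrated with a simplified Python sketch. It assumes single-end reads whose 5' positions are already known, treats reads that share chromosome, 5' position and strand as copies of one fragment, and keeps the copy with the highest summed base quality. The `Read` class and `mark_duplicates` function are hypothetical; Picard/GATK MarkDuplicates also handles read pairs, clipping, libraries and optical duplicates, which this sketch ignores.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Read:
    """Hypothetical minimal single-end read record."""
    name: str
    chrom: str
    pos: int                 # 5' end of the read on the reference
    reverse: bool            # True if aligned to the reverse strand
    quals: list = field(default_factory=list)  # per-base Phred qualities
    duplicate: bool = False


def mark_duplicates(reads):
    """Flag all but one read per (chrom, 5' position, strand) group.

    The read with the highest summed base quality is kept as the
    representative; the others get duplicate=True. Nothing is removed.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.reverse)].append(read)
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.quals))
        for read in group:
            read.duplicate = read is not best
    return reads


reads = [
    Read("r1", "chr1", 100, False, [30, 30, 30]),
    Read("r2", "chr1", 100, False, [20, 20, 20]),  # same start and strand as r1
    Read("r3", "chr1", 100, True, [25, 25, 25]),   # same start, other strand
]
for r in mark_duplicates(reads):
    print(r.name, r.duplicate)  # r1 False / r2 True / r3 False
```

Reads are only tagged, not deleted, which matches the "locates and tags" behaviour described above.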
Variant calling
The Genome Analysis Toolkit 4 (GATK4) is used to perform variant calling, following the best practices for
variant discovery analysis outlined by the Broad Institute.
HaplotypeCaller: the program traverses the sequencing data to identify regions of the genome in which
the samples being analyzed show substantial evidence of variation relative to the reference.
Also try VarScan as an alternative variant caller.
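The region-finding step of HaplotypeCaller can be pictured with a deliberately simplified sketch: it scans a pileup (the read bases observed at each reference position) and reports runs of positions where the fraction of non-reference bases passes a threshold. The `active_regions` function, its `min_alt_frac` and `min_depth` parameters, and the pileup representation are assumptions for illustration; HaplotypeCaller uses its own likelihood-based activity score and then reassembles haplotypes within each region, which this sketch does not attempt.

```python
def active_regions(ref, pileup, min_alt_frac=0.2, min_depth=5):
    """Return half-open (start, end) runs of positions that look variable.

    ref:    reference sequence, one base per position
    pileup: for each position, a string of the read bases observed there
    A position is 'active' when it has at least min_depth reads and at least
    min_alt_frac of them differ from the reference base.
    """
    flags = []
    for ref_base, bases in zip(ref, pileup):
        depth = len(bases)
        alt = sum(1 for b in bases if b != ref_base)
        flags.append(depth > 0 and depth >= min_depth and alt / depth >= min_alt_frac)

    regions, start = [], None
    for i, active in enumerate(flags):
        if active and start is None:
            start = i                      # a new run begins
        elif not active and start is not None:
            regions.append((start, i))     # the run ended before position i
            start = None
    if start is not None:                  # run reaches the last position
        regions.append((start, len(flags)))
    return regions


ref = "ACGTA"
pileup = ["AAAAA", "CCTTT", "GGGTT", "TTTTT", "AAAAA"]
print(active_regions(ref, pileup))  # [(1, 3)]
```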
5. Base recalibration and variant filtering
Base Quality Score Recalibration (BQSR) is an important step for accurate
variant detection: it aims to minimize the effect of systematic technical
errors on base quality scores (measured as Phred scores).
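# Build the recalibration model; known variant sites are masked so real variants are not counted as errors
gatk BaseRecalibrator \
  -R ref.fa \
  -I sorted_dedup_reads.bam \
  --known-sites bqsr_snps.vcf \
  --known-sites bqsr_indels.vcf \
  -O recal_data.table
# Apply the recalibration table to the reads (output file name is illustrative)
gatk ApplyBQSR -R ref.fa -I sorted_dedup_reads.bam \
  --bqsr-recal-file recal_data.table -O recal_reads.bam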
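Phred scores encode an error probability as Q = -10 log10(P), so Q30 means a 1-in-1000 chance that the base call is wrong. The sketch below shows that conversion and the core recalibration idea: within one bin of bases (for example, bases sharing a read group, reported quality and machine cycle), the observed mismatch rate against the reference, with known variant sites excluded, gives an empirical quality to compare with the reported one. The function names and the +1/+2 pseudocount are assumptions; GATK's actual recalibration model is more elaborate.

```python
import math


def phred_to_error(q):
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Inverse conversion: error probability to Phred quality."""
    return -10 * math.log10(p)


def empirical_quality(mismatches, observations):
    """Empirical Phred quality of one bin of bases (e.g. same read group,
    reported quality and cycle). Mismatches are counted against the reference
    with known variant sites already excluded; the +1/+2 pseudocount is an
    assumption that keeps sparse bins finite."""
    return error_to_phred((mismatches + 1) / (observations + 2))


print(phred_to_error(30))          # about 0.001 (1 error in 1000 calls)
print(empirical_quality(9, 998))   # about 20.0
```

With this pseudocount, a bin that reports Q30 but shows 9 mismatches in 998 bases gets an empirical quality of about 20, so its bases would be downgraded.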
Variant filtering: variants are filtered based on multiple parameters.
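A hard-filtering pass can be sketched as follows: each record's INFO annotations are checked against thresholds, and the FILTER value becomes either `PASS` or the names of the failed filters. The thresholds (FS > 30.0, QD < 2.0) are assumed values modelled on GATK's suggested RNA-seq hard filters and should be tuned for real data; in practice this is done with `gatk VariantFiltration` rather than custom code, and the helper names here are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO column like 'DP=20;FS=1.2;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value          # flags such as 'DB' map to ''
    return fields


# Assumed thresholds (modelled on GATK's RNA-seq hard-filter suggestions).
FILTERS = {
    "FS": lambda v: v > 30.0,  # Fisher strand bias: fail if too high
    "QD": lambda v: v < 2.0,   # quality by depth: fail if too low
}


def filter_variant(vcf_line):
    """Return 'PASS' or the ';'-joined names of failed filters for one
    tab-separated VCF data line. Missing annotations are not evaluated."""
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])   # the 8th VCF column is INFO
    failed = [name for name, fails in FILTERS.items()
              if info.get(name) and fails(float(info[name]))]
    return ";".join(failed) if failed else "PASS"


line = "chr1\t1000\t.\tA\tG\t50\t.\tDP=20;FS=45.2;QD=1.5"
print(filter_variant(line))  # FS;QD
```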
7. SNP effect predictors
VEP (Ensembl Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions,
CNVs or structural variants) on genes, transcripts, and protein sequence, as well as on regulatory regions. It reports:
• Location of the variants (e.g. upstream of a transcript, in coding sequence, in non-coding RNA, in
regulatory regions)
• Consequence of your variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift)
• SIFT and PolyPhen-2 scores for changes to protein sequence
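When VEP is run with VCF output, its annotations are stored in a `CSQ` INFO field: one comma-separated entry per allele and transcript, with `|`-separated fields whose names are listed in the VCF header. The minimal parser below assumes that layout; the helper names, the header line and the shortened field list in the example are hypothetical (real VEP headers list many more fields).

```python
def csq_format_from_header(header_line):
    """Extract the '|'-separated field names from the ##INFO=<ID=CSQ,...> line."""
    return header_line.split("Format: ", 1)[1].rstrip('">')


def parse_csq(csq_value, csq_format):
    """Return one dict per annotation entry (one per allele/transcript)."""
    keys = csq_format.split("|")
    return [dict(zip(keys, entry.split("|"))) for entry in csq_value.split(",")]


# Hypothetical, shortened header and value for illustration only.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: '
          'Allele|Consequence|IMPACT|SYMBOL|Feature|SIFT|PolyPhen">')
value = ("G|missense_variant|MODERATE|GENE1|TX1|deleterious(0.01)|probably_damaging(0.98),"
         "G|upstream_gene_variant|MODIFIER|GENE1|TX2||")

for ann in parse_csq(value, csq_format_from_header(header)):
    print(ann["Feature"], ann["Consequence"], ann["SIFT"] or "-")
```

Reading the field names from the header rather than hard-coding them keeps the parser working when VEP is run with different output options.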
SnpEff: a genetic variant annotation and functional effect prediction toolbox. It annotates and predicts the
effects of genetic variants on genes and proteins (such as amino acid changes). Features:
• Supports over 38,000 genomes
• Standard ANN annotation format
• Cancer variant analysis
• GATK compatible (-o gatk)
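SnpEff writes its predictions to the standard `ANN` INFO field: one comma-separated entry per allele and feature, with `|`-separated sub-fields that begin `Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p`, and several effect terms for one entry joined with `&`. The sketch below parses only those leading sub-fields; the helper names and the example value are hypothetical.

```python
# Leading sub-fields of the SnpEff ANN format (later sub-fields are ignored).
ANN_FIELDS = ["Allele", "Annotation", "Annotation_Impact", "Gene_Name",
              "Gene_ID", "Feature_Type", "Feature_ID", "Transcript_BioType",
              "Rank", "HGVS.c", "HGVS.p"]


def parse_ann(ann_value):
    """Split an ANN INFO value into dicts; 'effects' lists the '&'-joined terms."""
    records = []
    for entry in ann_value.split(","):
        rec = dict(zip(ANN_FIELDS, entry.split("|")))
        rec["effects"] = rec.get("Annotation", "").split("&")
        records.append(rec)
    return records


def genes_with_impact(ann_value, impact="HIGH"):
    """Sorted gene names having at least one annotation of the given impact."""
    return sorted({r["Gene_Name"] for r in parse_ann(ann_value)
                   if r.get("Annotation_Impact") == impact})


# Hypothetical ANN value with two entries.
ann = ("T|stop_gained|HIGH|GENE1|GENE1|transcript|TX1|protein_coding|3/10|"
       "c.100C>T|p.Gln34*|100/2000|100/1500|34/499||,"
       "T|splice_region_variant&intron_variant|LOW|GENE2|GENE2|transcript|TX2|"
       "protein_coding|2/4|c.50+5C>T||||||")
print(genes_with_impact(ann))        # ['GENE1']
print(parse_ann(ann)[1]["effects"])  # ['splice_region_variant', 'intron_variant']
```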
9. MAFtools
Mutation Annotation Format (MAF) files are tab-delimited files that contain
somatic and/or germline mutation annotations.
To convert a VCF into a MAF, each variant must be mapped to only one of
all the gene transcripts/isoforms that it might affect. vcf2maf.pl
depends heavily on VEP for variant annotation.
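The one-transcript-per-variant rule can be illustrated with a simplified selector that ranks each transcript's consequence by severity and, on ties, prefers a canonical transcript. The severity list is an assumed, abbreviated subset that keeps the relative order of Ensembl's consequence terms, and `pick_transcript` is hypothetical; vcf2maf.pl applies its own, more detailed prioritization.

```python
# Assumed, abbreviated severity order (most severe first); it keeps the
# relative order of these terms in Ensembl's consequence list.
SEVERITY = [
    "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
    "inframe_insertion", "inframe_deletion", "missense_variant",
    "splice_region_variant", "synonymous_variant", "5_prime_UTR_variant",
    "3_prime_UTR_variant", "intron_variant", "upstream_gene_variant",
    "downstream_gene_variant", "intergenic_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY)}


def worst_rank(consequence):
    """Rank of the most severe '&'-joined term; unknown terms sort last."""
    return min(RANK.get(term, len(RANK)) for term in consequence.split("&"))


def pick_transcript(annotations, canonical=frozenset()):
    """Choose one annotation per variant: most severe consequence first,
    then prefer transcripts listed in `canonical` (hypothetical tie-breaker)."""
    return min(annotations,
               key=lambda a: (worst_rank(a["Consequence"]), a["Feature"] not in canonical))


anns = [
    {"Feature": "TX1", "Consequence": "intron_variant"},
    {"Feature": "TX2", "Consequence": "missense_variant&splice_region_variant"},
    {"Feature": "TX3", "Consequence": "missense_variant"},
]
print(pick_transcript(anns)["Feature"])                     # TX2
print(pick_transcript(anns, canonical={"TX3"})["Feature"])  # TX3
```

Without a canonical hint, ties keep the first transcript listed, which is why TX2 wins in the first call.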
MAFtools (the maftools R package) analyzes and visualizes Mutation Annotation
Format (MAF) files from large-scale sequencing studies. This package
provides various functions to perform the most commonly used
analyses in cancer genomics and to create feature-rich,
customizable visualizations with minimal effort.
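A typical first summary of a MAF is how many samples carry a non-synonymous mutation in each gene. The sketch below reads a tab-delimited MAF using the standard `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode` columns; the set of non-synonymous classes is an assumption modelled on maftools' defaults, and `samples_mutated_per_gene` is a hypothetical Python illustration, not part of the maftools (R) API.

```python
import csv
import io
from collections import defaultdict

# Assumed non-synonymous classes, modelled on maftools' defaults.
NON_SYNONYMOUS = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation",
}


def samples_mutated_per_gene(maf_text):
    """Return (gene, n_samples) pairs sorted by n_samples (desc), then gene name.

    maf_text is a tab-delimited MAF; '#' comment lines such as '#version 2.4'
    are skipped before the column header is read.
    """
    lines = (line for line in io.StringIO(maf_text) if not line.startswith("#"))
    samples = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["Variant_Classification"] in NON_SYNONYMOUS:
            samples[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return sorted(((g, len(s)) for g, s in samples.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical three-column MAF fragment (real MAFs have many more columns).
maf = ("#version 2.4\n"
       "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
       "GENE_A\tMissense_Mutation\tS1\n"
       "GENE_A\tNonsense_Mutation\tS2\n"
       "GENE_B\tMissense_Mutation\tS1\n"
       "GENE_B\tSilent\tS2\n")
print(samples_mutated_per_gene(maf))  # [('GENE_A', 2), ('GENE_B', 1)]
```

Counting distinct samples rather than mutations keeps a gene that is hit twice in one sample from looking more frequently mutated than it is.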