Bioinformatics Primary Analysis Tutorial (deprecated)
Phil Richmond, PRA
Dowell Lab
University of Colorado, BioFrontiers Institute
  
	
  
Outline
•  Intro
   – Things that will be covered
   – Things that won't be covered
•  Workflow
•  Mapping with Bowtie
•  File Conversion with Samtools
•  Visualization with IGV
•  Extras
  
Sequencing
•  There are many different types of sequencing, including 454, Illumina, SOLiD, Ion Torrent, and more.
•  If you are interested in each type of sequencing…
  
Things that will be covered
•  The primary analysis I will walk through is a "bare bones" analysis, meant to take your reads from the Illumina sequencer to the visualizer, along with some organizational practices:
   – Mapping (Bowtie/BWA)
   – File format conversion
   – Visualization
  
Things that won't be covered
•  Pre- and post-processing steps that I'm leaving out include:
   – FastX analysis of raw reads, adapter clipping, etc.
   – PCR duplicate marking (Illumina) on raw reads
   – Base Quality Score Recalibration (GATK) on mapped reads
   – Local realignment around indels on mapped reads
•  Any secondary or tertiary analysis or scripting techniques
   – Secondary analysis: by personal appointment
   – Scripting techniques: join Dave Knox's Python class
  
Login to Tuxedo
•  Log in with the -X option to enable X11 forwarding, which graphical viewers such as IGV need.
•  On a PC… see me for separate instructions on piping the visualization.
•  ssh -X richmonp@tuxedo.colorado.edu (use your own username)
  
Working Directory
•  We will be working in /data/Tutorial/<Student>
   – cd /data/Tutorial/Phil/
•  The necessary files for the tutorial are in /data/Tutorial/Files/
   – Parent113010.fa is the reference (E. coli) genome
   – Parent120710.gff is the annotation file
   – Sample1_single.fastq is the reads file we are working with
  
Organization
•  In your own directory (/data/Tutorial/<Student>/), create the following subdirectories:
   – Genome/
      •  Keep the FASTA and GFF files here
   – Bowtie/
      •  Keep the Bowtie alignments and any post-processing of them here
   – Fastq/
      •  Keep the raw FASTQ files here
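The directory layout above can be created in one step. The following is a minimal Python sketch. The helper name `make_tutorial_dirs` is hypothetical. The temporary directory in the demo stands in for your own `/data/Tutorial/<Student>/` directory.

```python
import tempfile
from pathlib import Path
from typing import List

# Subdirectories recommended on the Organization slide.
SUBDIRS = ("Genome", "Bowtie", "Fastq")


def make_tutorial_dirs(student_dir: str) -> List[Path]:
    """Create Genome/, Bowtie/ and Fastq/ under student_dir (hypothetical helper).

    exist_ok=True makes the call safe to re-run on an existing layout.
    """
    created = []
    for name in SUBDIRS:
        path = Path(student_dir) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; on Tuxedo you would pass /data/Tutorial/<Student>/.
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_tutorial_dirs(tmp):
            print(path.name, path.is_dir())
```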
  
Workflow
•  Raw Reads (Fastq) → Mapping (Bowtie) → Mapped Reads (SAM)
•  Mapped Reads (SAM) → File Conversion (SAMTOOLS) → Binary Mapped Reads (SORTED.BAM)
•  Binary Mapped Reads (SORTED.BAM) → Visualization (IGV)
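The sketch below shows how the stages chain together. It uses a hypothetical Python helper that only builds and prints the commands for one sample; it does not run anything. It follows the Bowtie 1 and older samtools syntax used later in these slides. The index prefix and file paths in the demo are assumptions based on the Organization layout.

```python
import shlex
from typing import List


def primary_analysis_plan(index_prefix: str, fastq: str, sample: str) -> List[List[str]]:
    """Return the workflow commands, in order, for one sample (dry run only)."""
    sam = f"{sample}_bowtie.sam"
    bam = f"{sample}_bowtie.bam"
    sorted_prefix = f"{sample}_bowtie.sorted"  # older samtools sort appends ".bam"
    return [
        # Mapping (Bowtie): FASTQ -> SAM (stderr is redirected separately with 2>)
        ["bowtie", index_prefix, "-q", fastq, "-S", sam],
        # File conversion (samtools): SAM -> BAM
        ["samtools", "view", "-bS", sam, "-o", bam],
        # Sort alignments by coordinate into <prefix>.bam
        ["samtools", "sort", bam, sorted_prefix],
        # Index the sorted BAM so IGV can jump to regions
        ["samtools", "index", sorted_prefix + ".bam"],
    ]


if __name__ == "__main__":
    # Hypothetical paths following the Genome/, Bowtie/, Fastq/ layout.
    plan = primary_analysis_plan("Genome/Parent113010_bowtie", "Fastq/Sample1_single.fastq", "Sample1")
    for argv in plan:
        print(shlex.join(argv))
```

Keeping each command as an argument list means it could later be passed to `subprocess.run` unchanged.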
  
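Fastq file
•  File extension .fastq or .fq
•  Example (each read takes four lines):
   @Read_identifier_and_flowcell_info   (Read ID)
   ACGTCCGGTTNNN…                       (Read sequence)
   +                                    (Read QV ID)
   B$!?NP[%&C…                          (Read QV sequence)
•  For more information on how QV (quality value) scores are encoded as ASCII characters, see the Wikipedia article on the FASTQ format.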
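Here is a minimal Python reader for the four-line layout shown above. It makes two assumptions. First, each record takes exactly four lines, with no wrapped sequences. Second, quality characters use the Phred+33 offset of current Illumina pipelines; some older Illumina data used an offset of 64 instead. The record and function names are illustrative and do not come from any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # header line without the leading "@"
    sequence: str  # bases, e.g. ACGTN
    quality: str   # one ASCII quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file laid out as four lines per read."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:].rstrip("\r\n"), sequence, quality)


def phred33_scores(quality: str) -> List[int]:
    """Decode Phred+33 quality characters: score = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


if __name__ == "__main__":
    demo = io.StringIO("@read1 flowcell_info\nACGTN\n+\nII#!5\n")
    for record in read_fastq(demo):
        print(record.read_id, record.sequence, phred33_scores(record.quality))
```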
  
Mapping the Short Reads
•  Mapping means taking each read and finding where it aligns to a reference genome
   – Bowtie

Reference: TGCATGCATGCATGCATGCATGCATGCATGCATGCAAAAAGCATGCATGCA
Read:      TGCATGAATGCAAAAAGCATGCA
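The Python sketch below brute-forces the slide's example to make the idea concrete. It slides the read along the reference and keeps the offset with the fewest mismatches. It only illustrates what "mapping" means. Bowtie itself does not scan the genome this way, because it searches a precomputed index. The toy also ignores the reverse strand, gaps, and quality values. It assumes the longer string on the slide is the reference.

```python
from typing import Tuple

# The reference and read from the slide above.
REFERENCE = "TGCATGCATGCATGCATGCATGCATGCATGCATGCAAAAAGCATGCATGCA"
READ = "TGCATGAATGCAAAAAGCATGCA"


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_hit(read: str, reference: str) -> Tuple[int, int]:
    """Scan every offset; return (0-based position, mismatch count) of the best hit.

    Ties go to the leftmost position. Forward strand only, no gaps.
    """
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = mismatches(read, reference[pos:pos + len(read)])
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


if __name__ == "__main__":
    pos, mm = best_ungapped_hit(READ, REFERENCE)
    print(f"best hit at offset {pos} with {mm} mismatch(es)")
    print(REFERENCE)
    print(" " * pos + READ)
```

On the slide's strings, this finds a single best placement near the end of the reference with one mismatch: the read has an A where the reference has a C.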
  
Bowtie-Build Command
•  To map the reads to a genome, you must first obtain the genome in FASTA (.fasta/.fa) format and then index it.
•  bowtie-build -f <in.fasta> <out_prefix>
   – $ bowtie-build SGDv4.fasta SGDv4_bowtie
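Indexing is what makes mapping fast. Bowtie's index is based on the Burrows-Wheeler transform (BWT). The toy Python sketch below builds a BWT by sorting rotations, then counts exact matches with FM-index backward search. It only illustrates the idea. The real bowtie-build uses memory-efficient construction, stores extra sampled data to locate hits, and writes its own binary files. All names here are hypothetical.

```python
from typing import Dict, List, Tuple


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; toy inputs only)."""
    text += "$"  # unique end marker; "$" sorts before A, C, G and T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str: str) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """Return (first_row, occ) tables for backward search.

    first_row[c]: number of characters smaller than c, i.e. the first sorted
                  rotation that starts with c.
    occ[c][i]:    number of c's in bwt_str[:i].
    """
    totals: Dict[str, int] = {}
    for ch in bwt_str:
        totals[ch] = totals.get(ch, 0) + 1
    first_row: Dict[str, int] = {}
    running = 0
    for ch in sorted(totals):
        first_row[ch] = running
        running += totals[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in totals}
    for i, b in enumerate(bwt_str):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return first_row, occ


def backward_search(pattern: str, first_row: Dict[str, int],
                    occ: Dict[str, List[int]], n_rows: int) -> int:
    """Count occurrences of pattern, consuming it from right to left."""
    lo, hi = 0, n_rows  # half-open range of sorted rotations that match so far
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


def count_occurrences(text: str, pattern: str) -> int:
    transformed = bwt(text)
    first_row, occ = build_fm_index(transformed)
    return backward_search(pattern, first_row, occ, len(transformed))


if __name__ == "__main__":
    genome = "TGCATGCAAAAAG"
    for query in ("TGCA", "AAAA", "GAA"):
        print(query, count_occurrences(genome, query))
```

The index is built once per genome, which is why bowtie-build is a separate step from mapping. Each lookup then costs time proportional to the pattern length rather than the genome length.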
  
	
  
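Bowtie Command
•  Now we map the reads back to the reference we just indexed.
•  bowtie <reference_index_prefix> -q <in.fastq> -S <out.sam> 2> <out.stderr>
   – -q: the reads are in FASTQ format; -S: write the alignments in SAM format; 2> saves Bowtie's summary messages (stderr) to a file
   – $ bowtie /data/Tutorial/Phil/Genome/Bowtie_index/SGDv3_bowtie -q Sample1.fastq -S Sample1_bowtie.sam 2> Sample1_bowtie.stderr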
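Before mapping, check that the index prefix you pass to bowtie points at a complete index. The Python sketch below assumes a standard Bowtie 1 index, which bowtie-build writes as six files ending in .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. Other index variants are not handled. The helper names are hypothetical. `run_bowtie` mirrors the slide's command, including sending stderr to its own file.

```python
import subprocess
from pathlib import Path
from typing import List

# Files written by bowtie-build for a standard Bowtie 1 index (assumed layout).
BOWTIE1_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt")


def missing_index_files(prefix: str) -> List[str]:
    """Return the expected index files that do not exist for this prefix."""
    return [prefix + suffix for suffix in BOWTIE1_SUFFIXES if not Path(prefix + suffix).exists()]


def run_bowtie(prefix: str, fastq: str, sam: str, stderr_path: str) -> None:
    """Run: bowtie <prefix> -q <fastq> -S <sam> 2> <stderr_path>."""
    missing = missing_index_files(prefix)
    if missing:
        raise FileNotFoundError(f"incomplete Bowtie index, missing: {missing}")
    with open(stderr_path, "w") as err:
        # check=True raises CalledProcessError if bowtie exits with an error status
        subprocess.run(["bowtie", prefix, "-q", fastq, "-S", sam], stderr=err, check=True)
```

Failing early with the list of missing files is usually easier to debug than reading bowtie's own error after a long run.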
  
SAM File
•  Tab-delimited text
•  http://genome.sph.umich.edu/wiki/SAM
•  Open the example SAM file
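Each alignment line in a SAM file has 11 mandatory tab-separated fields, optionally followed by tags. Header lines start with @. The minimal Python sketch below splits one alignment line into those fields. It also reads two common FLAG bits: 0x4 (unmapped) and 0x10 (reverse strand). The class and function names are illustrative, and the example lines are made up.

```python
from typing import NamedTuple


class SamAlignment(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise FLAG
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # CIGAR string, e.g. 23M
    rnext: str  # mate's reference name ("=" same, "*" none)
    pnext: int  # mate's position
    tlen: int   # observed template length
    seq: str    # read sequence
    qual: str   # base qualities as ASCII (Phred+33)


def parse_sam_line(line: str) -> SamAlignment:
    """Parse the 11 mandatory fields of one SAM alignment line (optional tags are ignored)."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment")
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamAlignment(qname, int(flag), rname, int(pos), int(mapq), cigar,
                        rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(aln: SamAlignment) -> bool:
    return bool(aln.flag & 0x4)


def is_reverse_strand(aln: SamAlignment) -> bool:
    return bool(aln.flag & 0x10)


if __name__ == "__main__":
    example = "read1\t16\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
    aln = parse_sam_line(example)
    print(aln.rname, aln.pos, aln.cigar, is_unmapped(aln), is_reverse_strand(aln))
```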
  
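Samtools Commands
•  samtools view -bS <in.sam> -o <out.bam>
   – -b: write BAM output; -S: the input is SAM
   – $ samtools view -bS Sample1_bowtie.sam -o Sample1_bowtie.bam
•  samtools sort <in.bam> <out.sorted>
   – <out.sorted> is an output prefix: samtools appends .bam, so this produces Sample1_bowtie.sorted.bam (newer samtools releases use samtools sort -o <out.bam> <in.bam> instead)
   – $ samtools sort Sample1_bowtie.bam Sample1_bowtie.sorted
•  samtools index <in.sorted.bam>
   – $ samtools index Sample1_bowtie.sorted.bam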
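Sorting puts alignments in reference-then-position order. The index then lets a viewer such as IGV jump straight to the reads in a region instead of reading the whole file. The Python sketch below is only a conceptual stand-in. It sorts (reference, position, read) tuples and answers region queries by binary search on start positions. The real BAM index (.bai) uses a binning scheme and also finds reads that start before a region but overlap it; this toy does not. All names are hypothetical.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

Alignment = Tuple[str, int, str]  # (reference name, 1-based start, read name)
StartIndex = Dict[str, Tuple[List[int], List[int]]]


def coordinate_sort(alignments: Sequence[Alignment], reference_order: Sequence[str]) -> List[Alignment]:
    """Sort by reference (in header order, not alphabetically), then by start position."""
    rank = {name: i for i, name in enumerate(reference_order)}
    return sorted(alignments, key=lambda aln: (rank[aln[0]], aln[1]))


def build_start_index(sorted_alns: Sequence[Alignment]) -> StartIndex:
    """Map each reference to (sorted start positions, matching row numbers)."""
    index: StartIndex = {}
    for row, (ref, start, _name) in enumerate(sorted_alns):
        starts, rows = index.setdefault(ref, ([], []))
        starts.append(start)
        rows.append(row)
    return index


def region_query(sorted_alns: Sequence[Alignment], index: StartIndex,
                 ref: str, begin: int, end: int) -> List[Alignment]:
    """Return alignments on ref whose start lies in [begin, end] (inclusive)."""
    if ref not in index:
        return []
    starts, rows = index[ref]
    lo = bisect.bisect_left(starts, begin)
    hi = bisect.bisect_right(starts, end)
    return [sorted_alns[row] for row in rows[lo:hi]]
```

The binary search only works because the alignments were sorted first, which is why `samtools index` expects a sorted BAM.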
  
IGV
•  Located at /data2/IGV/
•  Several versions are available; I recommend either:
   •  /data2/IGV/IGV_2.1.19/igv.jar
   •  /data2/IGV/IGV_1.5.64/igv.jar
•  To run IGV:
   – java -Xmx5g -jar <igv.jar>
      •  $ java -Xmx5g -jar /data2/IGV/IGV_1.5.64/igv.jar &
   – -Xmx5g gives Java up to 5 GB of memory; the trailing & runs IGV in the background so your terminal stays usable
  
IGV: Creating a genome
•  Instructions for creating the reference genome are on the sheet.
  
Bowtie and Bfast in IGV
[IGV screenshot: Bowtie, Bfast, and Gene tracks]
  
Advantages of Bfast Gapped Mapping
[IGV screenshot: Bowtie, Bfast, and Gene tracks]
  
Bfast Mapping Loosely
[IGV screenshot: Bowtie, Bfast, and Gene tracks]
  
If you are getting the hang of it quickly…
•  Try going through the next few commands.
  
BWA Paired End
•  /usr/local/src/bwa-0.6.2/bwa index -a is -f <in.fasta>
   – -a is selects the indexing algorithm intended for small genomes
•  Map each read in the pair independently (run once for <in_1.fq> and again for <in_2.fq>):
•  /usr/local/src/bwa-0.6.2/bwa aln <reference.prefix> <in_1.fq> > <out.sai>
•  Finalize the mapping by combining the .sai and .fq files for both reads into a single paired-end SAM alignment:
•  /usr/local/src/bwa-0.6.2/bwa sampe <reference.prefix> <in_1.sai> <in_2.sai> <in_1.fq> <in_2.fq> > <out_paired.sam>
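Because bwa aln handles one read file at a time, the paired-end run is three commands: aln for each mate, then sampe. The Python sketch below lays out that order and shows where each command's standard output goes. The function names are assumptions, as is the .sai naming convention (same basename as the FASTQ). The index step is assumed to have been run already.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

Command = Tuple[List[str], str]  # (argv, file that receives stdout)


def bwa_paired_end_commands(bwa: str, ref_prefix: str, fq1: str, fq2: str,
                            out_sam: str) -> List[Command]:
    """Build the aln / aln / sampe sequence from the slide (index assumed to exist)."""
    sai1 = str(Path(fq1).with_suffix(".sai"))  # e.g. in_1.fq -> in_1.sai
    sai2 = str(Path(fq2).with_suffix(".sai"))
    return [
        ([bwa, "aln", ref_prefix, fq1], sai1),  # map mate 1 on its own
        ([bwa, "aln", ref_prefix, fq2], sai2),  # map mate 2 on its own
        ([bwa, "sampe", ref_prefix, sai1, sai2, fq1, fq2], out_sam),  # pair them up
    ]


def run_commands(commands: List[Command]) -> None:
    """Run each command, redirecting stdout to its file (the shell's ">")."""
    for argv, stdout_path in commands:
        with open(stdout_path, "wb") as out:  # .sai output is binary
            subprocess.run(argv, stdout=out, check=True)
```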
  	
  
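Bowtie Unique Mapping
•  Investigate the different Bowtie options:
   – Look at -m (suppress all alignments for a read that has more than this many reportable alignments; -m 1 keeps only uniquely mapping reads) and -v (maximum number of mismatches allowed over the whole alignment; mismatches in the seed are set separately with -n)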
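The toy Python model below shows how -v and -m interact. The names are hypothetical; it checks the forward strand only and uses a brute-force scan rather than Bowtie's index search. It first collects every placement with at most v mismatches. Then, like -m, it suppresses the read entirely if there are more than m placements.

```python
from typing import List


def placements(read: str, reference: str, max_mismatches: int) -> List[int]:
    """0-based offsets where read aligns end-to-end with <= max_mismatches (like -v)."""
    n = len(read)
    return [
        pos
        for pos in range(len(reference) - n + 1)
        if sum(a != b for a, b in zip(read, reference[pos:pos + n])) <= max_mismatches
    ]


def report(read: str, reference: str, v: int, m: int) -> List[int]:
    """Apply a -m style filter: drop all hits if more than m exist."""
    hits = placements(read, reference, v)
    return [] if len(hits) > m else hits


if __name__ == "__main__":
    reference = "TGCATGCATGCAAAAAG"
    print(report("TGCA", reference, v=0, m=1))   # repeat: 3 exact hits, so suppressed
    print(report("AAAAG", reference, v=0, m=1))  # one exact hit at offset 12
    print(report("AAAAG", reference, v=1, m=1))  # a 1-mismatch hit at 11 appears, so suppressed
```

The last call shows the trade-off: allowing more mismatches can make a read that was unique no longer unique, so -m 1 then discards it.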
  
TopHat Spliced Mapping
•  /usr/local/src/tophat-2.0.4.Linux_x86_64/tophat -G <in.gff> -o <output_directory> <bowtie_index> <in.fastq>
   – -G: supply known gene annotations; -o: directory for the output files
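In spliced mapping, a read may be split across an intron, so its two pieces land far apart on the genome. The Python toy below shows only that idea. It tries every split point, looks for exact matches of both halves, and requires a minimum gap between them. This is not TopHat's algorithm; TopHat builds on Bowtie alignments and can use annotations such as the -G file. The names, thresholds, and synthetic sequences are all hypothetical.

```python
from typing import Iterator, List, Tuple

Splice = Tuple[int, int, int]  # (exon 1 start, intron start, exon 2 start), 0-based


def find_all(text: str, sub: str, start: int = 0) -> Iterator[int]:
    """Yield every (possibly overlapping) index of sub in text at or after start."""
    i = text.find(sub, start)
    while i != -1:
        yield i
        i = text.find(sub, i + 1)


def naive_spliced_hits(read: str, reference: str, min_anchor: int = 4,
                       min_intron: int = 10) -> List[Splice]:
    """Exact-match split alignments: read[:k] ... gap >= min_intron ... read[k:]."""
    hits: List[Splice] = []
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        for exon1_start in find_all(reference, left):
            intron_start = exon1_start + k
            for exon2_start in find_all(reference, right, intron_start + min_intron):
                hits.append((exon1_start, intron_start, exon2_start))
    return hits


if __name__ == "__main__":
    exon1, intron, exon2 = "ACGTTACA", "GT" + "C" * 12 + "AG", "TTCATCAT"
    reference = "GGGG" + exon1 + intron + exon2 + "GGGG"
    read = exon1 + exon2  # a read spanning the exon-exon junction
    print(read in reference)  # False: no contiguous (unspliced) match exists
    print(naive_spliced_hits(read, reference))
```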
  	
  
The end… for now.
  
