wings2014 Workshop 1 Design, sequence, align, count, visualize

Slides from Workshop 1 of wings 2014

Published in: Data & Analytics, Technology

  1. 1. Workshops in next-generation science at UNC Charlotte 2014. Workshop 1 - Design, sequence, align, count, visualize
  2. 2. Workshop Locations • Section 1 - Room 801 – Ann Loraine, UNC Charlotte – Naim Matasci, University of Arizona, iPlant • Section 2 - Room 802 – Ivory Clabaugh Blakley, UNC Charlotte – Xiangqin Cui, University of Alabama Birmingham • Please stay in your section – Both sections cover the same material, but timing may vary
  3. 3. Meet your TAs • Graduate students from UNCC Dept of Bioinformatics and Genomics – 801: Roshonda Barner, Ibro Mujacic, Chi-Yu "Jack" Yen, Warren (G.) Cole, Tony Dao, Greg Linchango, Sushma Madamanchi, Anuja Jain – 802: Richard Linchangco, Fred Lin, Chris Ball, Lu Tian, Shawn Chaffin, Natascha Moestl, Walter Clemens, Adriano Schneider • Loraine Lab members – 801: Kyle Suttlemyre (IGB support), April Estrada (Research Specialist, Expert IGB User) – 802: David Norris (IGB Developer)
  4. 4. Schedule • Workshop 1 - planning an experiment, data processing, visualization – 9:00 to 11:30, then Lunch • Workshop 2 - introduction to R & RStudio for data analysis, differential expression – 12:30 to 2:30, then a 30' Break • Workshop 3 - biological interpretation using pathway tools, Gene Ontology, the Web – 3:00 to 5:00, then Done
  5. 5. Using RNA-Seq data set for WiNGS2014 • Sponsored by Pollen Research Coordination Network in Integrative Pollen Biology (annual meeting starts tonight) • Visit Web site for more info: pollennetwork.org
  6. 6. RNA-Seq data set for the workshop • Goal: Provide resources for pollen biology – Example RNA-Seq data analysis – Catalog of genes expressed in pollen – Highlight important area of pollen research • Problem: Pollen in some plant species is vulnerable to heat stress, which reduces yields – Exposure to mild heat stress (acclimation) can protect against more severe stress later; this is called acquired thermotolerance (Firon 2012) • To learn more, we sequenced RNA extracted from pollen undergoing a mild heat stress – Same temperature that can establish thermotolerance
  7. 7. Samples from the lab of Nurit Firon, Volcani Institute, Israel • Firon lab studies effects of heat stress on tomato pollen • Showed (along with others) that high temp. reduces pollen viability, sugar content • Studying a heat-tolerant tomato cultivar: Hazera 3042 – Pollen is sensitive to heat stress but not as much as other varieties
  8. 8. Nurit's experiment: RNA-Seq of heat-tolerant tomato cultivar Hazera 3042 • Collected pollen from plants growing in temperature-controlled greenhouses – Control 25/18 °C optimal temperature – Treatment 32/26 °C mild chronic heat stress • Collected batches of pollen from ~10 plants during Sep. & Oct. 2013 – One treatment, one control per collection – Made RNA from five collections: 5 treatment, 5 control "batches" – Sequenced at UCLA (69 base, PE)
  9. 9. Arabidopsis cold stress RNA-Seq • Simpler data set with one treatment & control – Using data from part of chr1, treatment sample, to illustrate data processing, visualization, effects of parameter settings on results (maximum intron size in tophat spliced alignment program) • For details, see: – experiment record at the Short Read Archive http://www.ncbi.nlm.nih.gov/sra/SRP029896 – sample http://www.ncbi.nlm.nih.gov/sra/SRX348640 • Published in Methods in Molecular Biology http://www.ncbi.nlm.nih.gov/pubmed/24792048
  10. 10. Workshop 1: RNA-Seq: Design, sequence, align, count, visualize (wings 2014)
  11. 11. Goals • Learn the basics (20') – Plan an experiment – Library prep for RNA-Seq – Illumina sequencing • Practice: Quality analysis using FastQC (30') • Practice: Data processing (30') – Align reads (make BAM files and junction files) – Make counts files for statistical analysis – Merge reads into transcript models w/ Cufflinks • Practice: Visualize results in IGB (60') – Compare to data set in Galaxy, TAIR10 gene models
  12. 12. Overview (workflow diagram covering Workshop 1 and Workshop 2) • Diagram labels: Sequencing Strategy; FASTQ files (WildType1a.fastq); FastQC; Alignment onto Genome ($Command Line…, WildType1a.bam); Visualization using IGB; Generation of Counts Data (Counts.txt)
  13. 13. RNA-seq: ultra-high throughput cDNA sequencing • Several papers published in 2008, first in May • Figure labels: Ecker lab, Snyder lab; 999 cites, 1,076 cites • Source: http://blog.sbgenomics.com/rna-seq-the-first-wave/
  14. 14. Mortazavi 2008 "Mapping and quantifying mammalian transcriptomes by RNA-Seq" Nature Methods • Published later in 2008, but > 3000 citations (Google Scholar) • Why? Maybe because it emphasized RNA-Seq as a replacement for expression DNA microarrays • Comment in same issue: "Beginning of the end for microarrays?"
  15. 15. RNA-Seq Overview - Illumina (workflow diagram) • 1. Design: plan experiment – Biological replication – Sequencing strategy – Data analysis strategy • 2. Making Libraries: collect samples; extract RNA, purify polyA+; fragment; synthesize cDNA (random hexamers); repair ends; add "A" bases to 3' ends; ligate adapters; amplify – library reflects RNA from original sample • 3. Sequencing: prepare flowcell; sequence by synthesis – Data: fastq sequence files, millions of reads per library; quality assessment • 4. Data Analysis: map to genome (alignments); count reads per gene; identify differentially expressed genes; improve gene models; analyze splicing; and much more
  16. 16. Five steps for design 1. Articulate your questions or hypothesis 2. Define your unit of biological replication 3. Write up your sample collection protocol in detail – Does the protocol allow you to test your hypothesis? 4. Define library synthesis & sequencing strategy – Read lengths, paired end vs. single end, depth, barcoding 5. Ask an experienced data analyst to review your plan, revise as needed
  17. 17. Library synthesis • Fork or "Y" adapters • Y adapters contain indexes, allow multiplexing • Size selection • Image: David C Corney Ph.D. http://www.labome.com/method/RNA-seq-Using-Next-Generation-Sequencing.html
  18. 18. Example library molecule • Diagram labels: unknown sequence, Rd1, Rd2, barcode, universal adapter, index primer, P5, P7 • Rd1 & Rd2 are from reverse complements, might overlap. • Ref: http://nextgen.mgh.harvard.edu/IlluminaChemistry.html
  19. 19. Flow cell preparation & sequencing by synthesis • https://www.youtube.com/watch?v=HMyCqWhwB8E
  20. 20. Review: Paired End vs Single End • Single End (SE): cheaper • Paired End (PE): more expensive – two reads per fragment – counting fragments, not reads – call normalized counts FPKM, not RPKM • Diagram labels: sequenced in SE, sequenced in PE, indexed adapter
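The FPKM normalization named on this slide can be sketched as follows. This is a minimal illustration that assumes the usual definition (fragments per kilobase of transcript per million mapped fragments). The function name and the numbers in the example are hypothetical and not taken from the workshop data.

```python
def fpkm(fragments_in_gene: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments_in_gene / kilobases / millions


# Hypothetical numbers: 500 fragments on a 2,000-base gene,
# in a library of 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))
```

With paired-end data each fragment contributes one count even though it yields two reads, which is why the slide prefers "fragments" (FPKM) over "reads" (RPKM).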
  21. 21. Get the reads in a FASTQ file • File contains millions of records – Each record has four lines, represents ONE sequence • Line 1 – the name, starts with @ • Line 2 – the sequence, starts at new line • Line 3 – starts with +, optionally followed by more text • Line 4 – the quality scores, one character per base, starts at new line • Example record:
      @SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
      CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
      +
      BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII
      Example: base = T, quality character = F, score = 37
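A minimal sketch of reading one four-line FASTQ record and decoding its quality characters is shown here. It assumes the Phred+33 encoding, under which the character F decodes to 37 as on the slide. The names `FastqRecord`, `parse_record` and `phred_scores` are hypothetical helpers written for this illustration.

```python
from typing import List, NamedTuple, Sequence


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    comment: str
    quality: str


def parse_record(lines: Sequence[str]) -> FastqRecord:
    """Turn the four lines of one FASTQ record into a FastqRecord."""
    name, sequence, comment, quality = [line.rstrip("\n") for line in lines]
    if not name.startswith("@") or not comment.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lines differ in length")
    return FastqRecord(name[1:], sequence, comment[1:], quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters; offset 33 is an assumption (Phred+33)."""
    return [ord(character) - offset for character in quality]


# The example record from the slide.
record = parse_record([
    "@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12",
    "CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA",
    "+",
    "BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII",
])
scores = phred_scores(record.quality)
print(record.name)
print(min(scores), max(scores))
```

For real files an established FASTQ parser is usually a better choice; the sketch only makes the four-line layout concrete.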
  22. 22. Phred Quality score Q • Describes how exponentially unlikely it is that a given base call is wrong • Q = -10 log10(p_e), where p_e is the probability that the base call is wrong • http://en.wikipedia.org/wiki/FASTQ_format
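The relationship between Q and the error probability can be worked through with a short sketch. The function names are hypothetical; the formula is the one on the slide and its inverse.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p_e) to get p_e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p_e: float) -> float:
    """Apply Q = -10 * log10(p_e) for an error probability in (0, 1]."""
    if not 0 < p_e <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_e)


# Each step of 10 in Q makes a wrong base call ten times less likely.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```

So Q = 20 corresponds to a 1 in 100 chance of a wrong call, and Q = 30 to 1 in 1,000.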
  23. 23. Different Illumina data processing pipelines used different score encodings • http://drive5.com/usearch/manual/quality_score.html
  24. 24. Get two files - Read1 & Read2 - from paired end sequencing • Read1 and Read2 have the same read identifier and are read from opposite strands of the same fragment • Example is from processing pipeline CASAVA 1.8; older versions used different naming conventions • R1:
      @SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
      CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
      +
      BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII
      R2:
      @SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12
      CATTTTCGACGTTGTTAATAAGCTCTGCGTACTTGCAAGCTATCTGCGCGAACG
      +
      BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIFFF
  25. 25. Sequence identifier line in CASAVA 1.8
      @SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
      Fields before the space: machine : run# : flow-cell-id : lane : tile : x-pos : y-pos
      Fields after the space: read# : is-filtered : control : index (barcode)
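The identifier layout can be made concrete with a small parser. This is a sketch that assumes the CASAVA 1.8 field order (seven colon-separated fields, a space, then read number, filter flag, control number and index). `ReadId`, `parse_identifier` and `same_fragment` are hypothetical names introduced for the example.

```python
from typing import NamedTuple


class ReadId(NamedTuple):
    machine: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int          # 1 or 2 for paired-end data
    is_filtered: bool  # "Y" in the file means the read was filtered out
    control: int
    index: str         # barcode / index field


def parse_identifier(line: str) -> ReadId:
    """Split one FASTQ name line into its labeled fields."""
    if not line.startswith("@"):
        raise ValueError("identifier line must start with '@'")
    location, _, description = line[1:].strip().partition(" ")
    machine, run, flowcell, lane, tile, x, y = location.split(":")
    read, is_filtered, control, index = description.split(":")
    return ReadId(machine, int(run), flowcell, int(lane), int(tile),
                  int(x), int(y), int(read), is_filtered == "Y",
                  int(control), index)


def same_fragment(a: ReadId, b: ReadId) -> bool:
    """True when two identifiers are Read1 and Read2 of one fragment."""
    return a[:7] == b[:7] and {a.read, b.read} == {1, 2}


r1 = parse_identifier("@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12")
r2 = parse_identifier("@SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12")
print(r1)
print(same_fragment(r1, r2))
```

Comparing only the part before the space is what lets Read1 and Read2 records be paired across the two files.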
  26. 26. FastQC • Many groups use FastQC as a first-pass quality assessment • Free from Babraham: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ • Run interactively (point-and-click) or from the command line (won't cover this)
  27. 27. Practice: Using FastQC • Go to Conference DropBox link: – http://bitly.com/rnaseq2014 • Note two folders: FastQC and FastQC-Examples – FastQC-Examples has FastQC reports from different species, sample types (next slide) • From the FastQC folder, download – Example.fastq – FastQC_Manual.pdf • Start FastQC, open Example.fastq
  28. 28. Practice: Watch FastQC video • https://www.youtube.com/watch?v=bz93ReOv87Y (start around 34 sec) • Take-home #1: FastQC assesses whether your data files are typical • Take-home #2: A "bad result" from FastQC doesn't always mean your data are not useful or valuable • Explore on your own! (~15 minutes)
  29. 29. Practice: View reports in FastQC-Examples (~15 min) • Blueberry – OnealRipe_1 – OzarkblueGreen_1 • Tomato pollen – T2_1 – C2_1 • Rice – Control2h-R2 • Per read %GC
  30. 30. Practice: Data processing • Double-click "Alignment.tar.gz" on your Desktop to unpack it • Also available from http://bitly.com/rnaseq2014
  31. 31. Practice: Look at "align.sh" • Open Alignment folder • Right-click "align.sh" • Select "open with text editor" • This is a shell script – Commands executed in sequence – Very useful for automating tasks • First line is the "shebang" line – tells Terminal it's a shell script • All other lines starting with # are comments (not run) • Recommended book: "Learning the bash shell", a great guide to writing shell scripts
  32. 32. align.sh - simple pipeline for RNA-Seq data processing • Aligns a sample fastq file to genome – tophat2, bowtie2 – fastq file is from Arabidopsis cold stress experiment (Short Read Archive SRX348640) – file ColdTreatment-little.fastq.gz (gzip-compressed, .gz) • Counts reads that align to TAIR10 genes – featureCounts – only counting reads that uniquely align • Merges alignments into transcript models – cufflinks
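The order of the pipeline steps can be sketched by assembling the command lines without running them. The contents of align.sh are not shown in the slides, so everything here is an assumption: the Bowtie2 index name, the annotation file name, the counts file name and the intron limit are hypothetical placeholders, and only commonly documented options (`-o`, `-I`, `-a`) are used.

```python
import shlex
from typing import List, Optional


def build_pipeline(reads: str, bowtie2_index: str, annotation_gtf: str,
                   max_intron: Optional[int] = None,
                   tophat_dir: str = "tophat_out") -> List[List[str]]:
    """Return the commands of an align / index / count / assemble pipeline."""
    tophat = ["tophat2", "-o", tophat_dir]
    if max_intron is not None:
        # -I sets TopHat's maximum intron length.
        tophat += ["-I", str(max_intron)]
    tophat += [bowtie2_index, reads]
    bam = f"{tophat_dir}/accepted_hits.bam"
    return [
        tophat,
        ["samtools", "index", bam],                    # writes the .bai index
        ["featureCounts", "-a", annotation_gtf, "-o", "counts.txt", bam],
        ["cufflinks", "-o", "cufflinks_cold", bam],    # writes transcripts.gtf
    ]


# Hypothetical index and annotation names; the fastq name is from the slide.
commands = build_pipeline("ColdTreatment-little.fastq.gz", "A_thaliana_index",
                          "TAIR10.gtf", max_intron=2000)
for command in commands:
    print(shlex.join(command))
```

Printing the commands first, rather than executing them, is a cheap way to review parameter choices such as the maximum intron size before committing compute time.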
  33. 33. Practice: Intro to Terminal • Double-click Terminal shortcut on desktop – Program for entering commands or running scripts – Also called a "shell" or "Unix shell" – Can open multiple Terminal windows • Each window called a "shell" or "Unix shell" • Terminal shows hierarchical view of file system – An upside-down tree, where every folder is inside another folder – Folders are also called "directories" – The top folder (that contains everything else) is called "root" directory - / (forward slash)
  34. 34. Practice: Open Terminal, try these commands • cd - change directory – by itself means "go to user home directory" – with an argument means: go there – with ".." means go up one level • pwd - "print the current working directory" & find out where you are
  35. 35. Practice: Try these commands • ls - lists files and directories in the current directory
  36. 36. Practice: Try these commands • ls -l "list long" – reports more information about files – "d" means it's a directory (folder)
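For readers who later script these steps, the same ideas (home directory, current directory, parent directory, listing with a directory flag) can be sketched in Python. This is only an illustration of the concepts behind `cd`, `pwd`, `ls` and `ls -l`, and `list_long` is a hypothetical helper, not a replacement for the shell commands.

```python
from pathlib import Path
from typing import List


def list_long(directory: Path) -> List[str]:
    """Rough analogue of `ls -l`: a 'd' flag for directories, size, name."""
    rows = []
    for entry in sorted(directory.iterdir()):
        kind = "d" if entry.is_dir() else "-"
        rows.append(f"{kind} {entry.lstat().st_size:>10} {entry.name}")
    return rows


home = Path.home()     # where `cd` with no argument takes you
here = Path.cwd()      # what `pwd` prints
parent = here.parent   # where `cd ..` takes you

print(here)
for row in list_long(here):
    print(row)
```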
  37. 37. Practice: Run align.sh in Terminal • Go to home directory • Go to Desktop • Go to Alignment • Run align.sh
  38. 38. Now Running: tophat2, a spliced alignment tool • Figure 1 from "TopHat: discovering splice junctions with RNA-Seq", Cole Trapnell, Lior Pachter and Steven L. Salzberg
  39. 39. Tophat Output - we'll open in IGB • Creates new folder with files, including... • accepted_hits.bam - "binary alignments" file, contains read alignments – BAM - compressed binary version of SAM ("sequence alignment/map"), needs index ".bai" file (made using samtools) • junctions.bed - reports boundaries of introns, called "junction" features – BED format, tab-delimited plain text file – one junction feature per line – fifth field is score, the number of spliced reads aligned across the junction – see: http://genome.ucsc.edu/FAQ/FAQformat.html#format1
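Reading the score from a junction line can be sketched as follows. The sketch assumes the 12-column BED layout in which each junction feature carries two blocks that flank the intron, so the intron lies between the end of the first block and the start of the last. The example line is made up for illustration and is not taken from the workshop output.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int   # 0-based, first intron base
    intron_end: int     # 0-based, one past the last intron base
    strand: str
    spliced_reads: int  # the BED score field


def parse_junction(line: str) -> Junction:
    """Parse one tab-delimited BED12 junction line."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    score, strand = int(fields[4]), fields[5]
    block_sizes = [int(size) for size in fields[10].rstrip(",").split(",")]
    return Junction(chrom, start + block_sizes[0], end - block_sizes[-1],
                    strand, score)


# Hypothetical junction: blocks of 50 and 60 bases flanking a 290-base intron.
example = "\t".join(["chr1", "1000", "1400", "JUNC00000001", "12", "+",
                     "1000", "1400", "255,0,0", "2", "50,60", "0,340"])
junction = parse_junction(example)
print(junction)
print(junction.intron_end - junction.intron_start)
```

Intron lengths computed this way are a quick check on whether an aligner's maximum intron size setting suits the genome being studied.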
  40. 40. Practice: Start IGB while script runs • Double-click IGB desktop icon • Click Arabidopsis flower on start screen
  41. 41. Practice: How to get IGB if you're using your own computer • Go to http://bioviz.org • Follow Download link • Choose Medium Memory option (typical)
  42. 42. TAIR10 annotations, June 2009 Columbia-0 genome release • TAIR10 protein-coding gene models loaded automatically from IGB data server • Forward & reverse strand in separate tracks (labeled Forward and Reverse)
  43. 43. RNA-Seq, ChIP-Seq, other data sets available in Data Access tab • IGB data servers, can set up your own
  44. 44. Arabidopsis pollen data sets • Read alignments, coverage graphs, junction files • From 2013 Plant Phys. Pollen RNA-Seq paper
  45. 45. Practice: Combine Plus & Minus Tracks • Click "+/-" to combine tracks • Use Data Management Table to change track color, name, visibility, load options, strand options
  46. 46. Summary of moving and zooming • Animated zooming – click to position zoom stripe, sets zoom focus – horizontal zoom & vertical stretch • Moving from side to side (panning) – arrows in toolbar – hand icon - the move tool • Jump-zooming – Click-drag coordinate axis with arrow tool – Double-click to zoom in on a feature – Search by name
  47. 47. Practice: Zoom in on a feature • Zoom in on alt-spliced gene models (marked * in the screenshot) on chr1 • This is animated zooming 1. Click to set zoom focus 2. Drag slider to zoom in
  48. 48. Practice: Click move arrows to reposition during zoom • Click data display to re-focus zoom on target location
  49. 49. Practice: Or use move tool (hand) to reposition during zoom • Click display to focus zoom on target 1. Select move tool (hand) 2. Click-drag to move
  50. 50. Practice: Click-drag sequence axis to jump-zoom to a region 1. Select pointer tool 2. Click number line 3. Drag 4. Release • Highlighted region becomes new view
  51. 51. Practice: Jump-zoom to gene model • Double-click label, space a little above exon blocks, or intron to jump-zoom to a gene model – Also selects it, selected items outlined in red 1. Select pointer tool 2. Double-click label or intron
  52. 52. After jump-zoom, gene model is selected • Arrows indicate direction of transcription • Selected gene model outlined in red
  53. 53. Practice: Gene model close-up • Use vertical slider to make gene models taller (drag slider to stretch vertically) • Increase window size to make more room
  54. 54. Practice: Interact with data using pointer. Select pointer (arrow) in toolbar • Click intron, label, or region above blocks to select whole gene model • Click blocks to select parts of a gene model • SHIFT-click to multi-select • Click-drag to select & count everything in a region • Selection Info, top right, reports counts – "i" button shows info if one item selected
  55. 55. Practice: View edge matching • Edges that match selected item edges are highlighted in red • To change edge-match color choose File > Preferences > Other Options • To turn off or on, see View > Edge Matching
  56. 56. Practice: To work with sequence data, click Load Sequence • Sequence appears in Coordinates track
  57. 57. Practice: Zoom in to see amino acids • Note: Must load genomic sequence first
  58. 58. Practice: Zoom in on end of translation • Click the "thick end" and then zoom in • Note: Variants encode same C-term amino acids
  59. 59. Practice: Select genomic sequence 1. Choose pointer tool in toolbar 2. Click-drag genomic sequence to select a region 3. CTRL-click to copy • Length of selected region reported in Selection Info box (top right) • Useful for designing primers, measuring regions
  60. 60. Practice: Right-click (or CTRL-click) gene model • Shows options to run a Web search, BLAST search, view sequence
  61. 61. Practice: Quick Search • Enter search text, select option (choose At-SR30) • Jump-zoom to selected gene
  62. 62. Zoomed to At-SR30, RNA-binding protein involved in splicing
  63. 63. Looking ahead to Workshop 3 • Some genes that were highly expressed in tomato pollen are annotated as "Unknown" proteins & have no counterpart in Arabidopsis. • You can use IGB to quickly find those genes and then run BLASTX or BLASTP searches at NCBI to find out... – Are they unique to tomato? – Could they be non-coding?
  64. 64. Practice: Open files from align.sh • Zoom out to show more of At-SR30 region • Choose File > Open – Select "accepted_hits.bam" & "junctions.bed" • A new empty track appears for each file • Click Load Data to load reads and junctions
  65. 65. Screenshot: read alignments stack • Reads at top of stack not being shown (too many to fit)
  66. 66. Screenshot: junction features, summarizing spliced reads
  67. 67. Practice: Configure view - Load Sequence • Click Load Sequence to load genomic bases for this region
  68. 68. Practice: Configure view - Lock mRNA track height 1. Click TAIR10 mRNA track label to select it 2. Open Annotation tab 3. Select Lock Track Height, enter 170, click Apply
  69. 69. Practice: Configure view - configure junction track 1. Click junctions track label to select junctions track 2. Open Annotation tab 3. Select score in Label Field 4. Select +/- in Strand
  70. 70. Practice: Configure view - lock junction track height 1. Click junctions track label to select it 2. Open Annotation tab 3. Select Lock Track Height, enter 120, click Apply
  71. 71. Practice: Change read stack height to see more reads 1. CTRL-click (or right-click) accepted_hits.bam track label 2. Choose Set Stack Height...
  72. 72. Practice: Change read stack height to see more reads 3. Enter 50
  73. 73. Practice: Set mRNA stack height 1. Right-click TAIR10 mRNA track label, choose Set Stack Height 2. Enter 3 - tallest stack has 3 models • Note: Tabs are minimized to make more space
  74. 74. Practice: Note read support for alternative splicing • Take-home: Many spliced reads support both variants, but there are also many reads inside the introns, indicating failure to splice. This may be typical of alt-spliced introns?
  75. 75. Practice: Use junction track to quantify support for splice variants 1. Click-drag to genes track 2. Scores are number of spliced reads supporting each junction.
  76. 76. Practice: Compare Cufflinks GTF file to gene models • Open Alignments > cufflinks_cold > transcripts.gtf
  77. 77. Practice: View Cufflinks gene models 1. Click Load Data to see Cufflinks models 2. Click-drag new track next to gene models 3. Use vertical slider to make more room • Take-home: Cufflinks annotations close, but incomplete.
  78. 78. Practice: Load data from Galaxy 1. Go to usegalaxy.org 2. Open Shared Data 3. Choose Published Histories
  79. 79. Practice: Load data from Galaxy 1. Search for Cold 3. Select Cold stress in Arabidopsis (with default maximum intron size)
  80. 80. Practice: Load data from Galaxy • Illustrates results when tophat is run with default settings: – default maximum intron size is 500,000 bases • Tophat was developed with human data in mind, where large introns are common • Select Import History
  81. 81. Practice: Select start using this history
  82. 82. 1. Select Treatment junctions 2. Select display in IGB View
  83. 83. New tab opens. Select Click to go to IGB
  84. 84. New track 1. Click Load Data
  85. 85. Practice: Remove reads - don't need them now 1. Right-click accepted_hits.bam 2. Choose Delete Track
  86. 86. 1. Zoom out all the way 2. Click Load Data • Your data are here
  87. 87. Take-home: Tophat run with default parameters predicts enormous introns. It is important to understand parameter settings; defaults are not always best.
  88. 88. Now you can • Describe Illumina library synthesis, sequencing • Evaluate data quality using FastQC • Run a data processing pipeline (shell script) • View and explore data in a genome browser – and load data sets from Galaxy, local files • Thank you for your attention!
