Gene expression introduction


Microarrays, one of the more recent biomedical technologies, produce high-dimensional data, which makes statistical analysis challenging. In this discussion I presented an overview of microarray analysis, focusing on its use in gene expression profiling.

Published in: Education, Technology

  1. 1. Analysis of Gene Expression: An Overview
       Setia Pramana
  2. 2. Outline
       • Biological background
         – Central Dogma
         – DNA
         – Genes
       • Genomics
       • Microarrays
       • Gene expression data analysis pipeline
       • What's next?
       (Slide footer: Gene expression analysis)
  3. 3. Central Dogma
  4. 4. Deoxyribonucleic Acid (DNA)
       • DNA is the organic molecule that carries the information used by a cell to build the proteins that carry out most of the biological processes in a cell.
       • Double helix
       • Base pairs: G ≡ C, A = T
       • Example sequence: ATGCTGATCGATGCAGAATCGATC
       • The length of human DNA is about 3 × 10^9 base pairs (bp).
       • Any two humans share about 99.9% of their DNA.
       • Human DNA is about 99% identical to chimpanzee DNA.
       (Image: Wikipedia)
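The pairing rule on this slide (G with C, A with T) means that one strand determines the other. A minimal Python sketch, using the slide's example sequence; the function name `reverse_complement` is illustrative and not part of the original deck.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

example = "ATGCTGATCGATGCAGAATCGATC"  # example sequence from the slide
print(reverse_complement(example))
```

Applying the function twice returns the original sequence, which is a quick sanity check on the pairing table.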
  5. 5. Gene
       • The full DNA sequence of an organism is called its genome.
       • A gene is a segment of DNA that specifies the sequence of a protein.
       • Length: 1000-3000 bases
       • The human genome contains approximately 20,000-25,000 genes.
       Source: http://www.dna-sequencing-…-sequencing/gene-dna/
  6. 6. Genetic Code
       • The nucleotide sequence of an mRNA is translated into the amino acid sequence of the corresponding protein.
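As a rough illustration of translation, the sketch below builds the standard genetic code table and translates an mRNA codon by codon. It assumes the example DNA sequence from the DNA slide is a coding strand (so the mRNA is the same sequence with T replaced by U); that assumption, and the name `translate`, are not from the original slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate an mRNA string from its first base, stopping at a stop codon."""
    protein = []
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Assumption: the slide's example DNA sequence is treated as the coding strand.
mrna = "ATGCTGATCGATGCAGAATCGATC".replace("T", "U")
print(translate(mrna))  # one-letter amino acid codes
```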
  7. 7. Genomics
       • Genomics is the study of all the genes of a cell or tissue at:
         – the DNA level (genotype), e.g., GWAS, SNP, CNV, etc.
         – the mRNA level (transcriptomics): gene expression
         – the protein level (proteomics)
       • Functional genomics: the study of the functionality of specific genes, their relations to diseases, their associated proteins, and their participation in biological processes.
  8. 8. Gene Expression
       • Different tissues in the same human may express different genes, according to their role in the body.
       • The same cell may express different genes under different circumstances (stress, nutrition, etc.).
       • Cells express different genes over a lifetime (for instance, embryonic gene expression differs from adult gene expression).
       • Technologies for measuring mRNA assume:
         – The level of mRNA in the cell is an indication of the protein level in the cell, since most regulation occurs at the transcription step rather than the translation step.
         – Genes are expressed only when needed.
  9. 9. Microarrays
  10. 10. Microarray Technologies
       • Two types of microarray technologies:
         – Single channel
         – Dual channel
       • Platforms:
         – Affymetrix
         – Illumina
         – Agilent
  11. 11. Microarray Applications
       • Gene expression profiling (our focus)
       • SNP arrays for studying single nucleotide polymorphisms (SNP) and copy number variations (CNV) such as deletions or insertions.
       • Others:
         – ChIP-on-chip for investigating protein binding site occupancy
         – Exon arrays to search for alternative splicing events
         – Tiling arrays for identifying novel transcripts that are either coding or non-coding
  12. 12. Microarray Applications: MammaPrint
       • The MammaPrint test can determine the likelihood of breast cancer returning within 10 years after treatment.
       • First FDA-approved molecular test based on microarray technology.
       • Predicts whether an existing cancer will metastasize.
       • Investigates the patterns and behavior of large numbers of genes.
       • The recurrence of cancer depends partly on the activation and suppression of certain genes located in the tumor.
       • MammaPrint measures the activity of those genes and uses it to predict a patient's odds of the cancer spreading.
  13. 13. The Pipeline
       • Experiment design → Lab work → Image processing
       • → Background correction
       • → Normalization
       • → Signal summarization (GCRMA, FARMS) (for the Affymetrix platform)
       • → Data analysis:
         – Differentially expressed genes
         – Clustering
         – Classification
         – Etc.
       • → Network / pathway analysis (GSEA, etc.)
       • → Biological interpretation
  14. 14. Image Processing
  15. 15. Log2 Intensity
       • Response: log2 intensity. Why?
       • Statistics: log-transforming the data makes the intensity distribution more symmetric and bell-shaped, i.e., closer to a normal distribution.
       • Biology: the biological processes in whole individuals presumably act in a multiplicative way. The log transformation turns these multiplicative effects into additive ones, so a fold change in intensity becomes a difference in log2 intensity.
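A small numeric sketch of why the log2 scale is convenient: a two-fold increase and a two-fold decrease are asymmetric as ratios (2.0 versus 0.5) but symmetric as log2 differences (+1 versus -1). The intensity values below are made up for illustration.

```python
import math

# Hypothetical raw intensities for one probe.
control = 200.0
doubled = 400.0  # two-fold up
halved = 100.0   # two-fold down

# On the raw scale the two changes look asymmetric.
ratio_up = doubled / control
ratio_down = halved / control

# On the log2 scale a fold change becomes a difference, symmetric around 0.
log_diff_up = math.log2(doubled) - math.log2(control)
log_diff_down = math.log2(halved) - math.log2(control)

print(ratio_up, ratio_down)
print(log_diff_up, log_diff_down)
```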
  16. 16. Normalization
       • A process to remove systematic errors, which can cause considerable biases.
       • Systematic errors are due to:
         – Different incorporation efficiencies of dyes.
         – Different amounts of mRNA in the tested samples, causing different expression levels.
         – Differences in experimenter or protocol (if data were gathered in different labs).
         – Different scanning parameters.
         – Differences between chips created in different production batches.
       • Example: quantile normalization
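Quantile normalization, named on this slide as an example, forces every array to share the same intensity distribution. The sketch below is a minimal pure-Python version under simplifying assumptions: each array is a list of equal length, ties are broken by position rather than averaged, and the function name `quantile_normalize` and the toy values are illustrative only.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    arrays: list of equal-length lists, one per chip/sample.
    Ties within an array are broken by position (a simplification).
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average value at each rank across arrays.
    reference = [
        sum(a[rank] for a in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]  # replace value by its rank's mean
        normalized.append(out)
    return normalized

# Toy data: three arrays, four probes each (made-up values).
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
result = quantile_normalize(arrays)
for column in result:
    print(column)
```

After normalization the sorted values of every array are identical, while each probe keeps its rank within its own array.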
  17. 17. Normalization
  18. 18. Microarrays, Data Structure
  19. 19. Microarrays, Applications
       • Identify disease-related genes
       • Classification, for example MammaPrint
       • Cluster genes
       • Cluster samples (disease stages, tissues): class discovery
       • Cluster genes and samples together
       • Pharmacogenomics:
         – Personalized medicine: individualize therapies
         – Target-based medicine: drugs that are more effective with fewer side effects
  20. 20. Data Analysis Challenges
       • The curse of high dimensionality: with far more genes than samples, it is an obstacle to solving classification and clustering problems.
       • The multiple testing problem: the number of false positive results increases because many hypotheses (one per gene) are tested simultaneously.
       • Multiple testing correction:
         – FWER: Bonferroni, Holm
         – FDR: BH, BY
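The corrections listed on this slide can be sketched as adjusted p-values. The code below is a minimal illustration of Bonferroni, Holm, and Benjamini-Hochberg (BH) adjustment; the function names and the four example p-values are made up, and BY is omitted for brevity.

```python
def bonferroni(pvals):
    """FWER control: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """FWER control, step-down: the i-th smallest p-value is scaled by (m - i)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max  # enforce monotone adjusted values
    return adjusted

def benjamini_hochberg(pvals):
    """FDR control, step-up: the i-th smallest p-value is scaled by m / i."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        running_min = min(running_min, pvals[idx] * m / (rank + 1))
        adjusted[idx] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
print(bonferroni(pvals))
print(holm(pvals))
print(benjamini_hochberg(pvals))
```

On the same input, the FDR-adjusted values are never larger than the FWER-adjusted ones, which reflects the less conservative error criterion.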
  21. 21. Identification of Differential Genes
       • Discover genes with different expression in two or more different tissues/conditions.
       • Fold change
       • t-type tests:
         – t-test
         – Modified t-tests: Significance Analysis of Microarrays (SAM), the LIMMA moderated t
       • Linear Models for Microarray Data (LIMMA)
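For a single gene, the two simplest statistics on this slide can be sketched as follows. This is an ordinary Welch t-statistic, not the SAM or LIMMA moderated statistic; the function names and the log2 intensities are hypothetical, and the p-value (which needs the t distribution) is not computed here.

```python
import math
from statistics import mean, variance

def log2_fold_change(control, treated):
    """Difference of mean log2 intensities (treated minus control)."""
    return mean(treated) - mean(control)

def welch_t(control, treated):
    """Welch two-sample t-statistic (unequal variances allowed)."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error

# Hypothetical log2 intensities of one gene in three arrays per condition.
control = [7.0, 7.2, 6.8]
treated = [9.0, 9.3, 8.7]

print(log2_fold_change(control, treated))  # log2 scale: 1 unit = two-fold
print(welch_t(control, treated))
```

With only a few arrays per condition the per-gene variance estimate is unstable, which is the motivation for the modified t-tests named on the slide.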
  22. 22. Clustering
       • Clustering genes, conditions, or both.
       • Deducing functions of unknown genes from known genes with similar expression patterns.
       • Identifying disease profiles: tissues with similar pathology should yield similar expression profiles.
       • Co-expression of genes may imply co-regulation.
       • Classification of biological conditions.
       • Drug development
  23. 23. Clustering
       Statistical methods: hierarchical clustering, K-means, CLICK (CLuster Identification via Connectivity Kernels), biclustering, etc.
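Of the methods listed, K-means is the easiest to sketch. The version below is a bare-bones illustration in pure Python: it uses squared Euclidean distance and takes the first k profiles as starting centroids, which is a simplification (real analyses typically use random restarts). The gene profiles are made-up values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iterations=20):
    """Very small K-means; initial centroids are the first k profiles (assumption)."""
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Hypothetical log2 expression of four genes across three conditions.
profiles = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [1.2, 0.9, 1.0],
    [4.8, 5.1, 5.0],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```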
  24. 24. Classification
       • Classification of tumor malignancies into known classes: supervised learning.
       • Identification of marker genes that characterize the different tumor classes: feature selection.
       Figure: genes distinguishing ALL from AML (two types of leukemia).
  25. 25. Classification
       • Methods:
         – Discriminant analysis: LDA, K nearest neighbor
         – Classification tree
         – Logistic regression, penalized LR: LASSO
         – Neural network
         – Support vector machines (SVM)
         – Random forest, etc.
       A survey of these methods: http://… and http://…dudoit.pdf
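As one concrete instance of the listed methods, a K-nearest-neighbor classifier can be sketched in a few lines. The two "marker gene" values per sample and the ALL/AML labels below are invented toy data, and `knn_predict` is an illustrative name; this is a sketch of the technique rather than a recommended pipeline.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority class among the k training samples closest to query."""
    neighbors = sorted(
        zip(train_x, train_y), key=lambda pair: sq_dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set: log2 expression of two hypothetical marker genes per sample.
train_x = [
    [8.1, 2.0], [7.9, 2.3], [8.4, 1.8],  # labelled ALL
    [2.2, 7.7], [1.9, 8.1], [2.5, 7.9],  # labelled AML
]
train_y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

print(knn_predict(train_x, train_y, [7.5, 2.5]))
print(knn_predict(train_x, train_y, [2.0, 8.0]))
```

In real microarray data the feature-selection step from the previous slide usually comes first, since distances computed over thousands of uninformative genes are dominated by noise.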
  26. 26. Pathway Analysis
       • We have discovered DE genes; what's next?
       • Identify which pathway terms (e.g., GO, KEGG) are most commonly associated with the DE genes.
       • Methods: GEA, GSEA, NEA, etc.
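One simple way to ask whether a pathway is over-represented among the DE genes is a hypergeometric (one-sided Fisher) test. This is a basic over-representation test, not GSEA itself, and the gene counts below are hypothetical; `enrichment_pvalue` is an illustrative name.

```python
from math import comb

def enrichment_pvalue(total_genes, pathway_genes, de_genes, overlap):
    """P(at least `overlap` pathway genes among the DE genes) under random draws.

    Hypergeometric upper tail: de_genes are drawn without replacement from
    total_genes, of which pathway_genes belong to the pathway.
    """
    max_overlap = min(pathway_genes, de_genes)
    tail = sum(
        comb(pathway_genes, x) * comb(total_genes - pathway_genes, de_genes - x)
        for x in range(overlap, max_overlap + 1)
    )
    return tail / comb(total_genes, de_genes)

# Hypothetical counts: 20,000 genes on the array, 100 in the pathway,
# 200 called DE, 10 of which fall in the pathway (1 expected by chance).
p_value = enrichment_pvalue(20000, 100, 200, 10)
print(p_value)
```

Because one such test is run per pathway term, the resulting p-values need the same multiple testing correction discussed for gene-level tests.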
  27. 27. What's Next
       • Next-generation sequencing
         + No need to know the transcript sequence in advance.
         + No artifacts due to cross-hybridization.
         + Better quantitation of low-abundance transcripts.
         - New data types and huge data volumes.
         - Quality
       • Epigenetics
         – The study of heritable changes in genome function that occur without a change in DNA sequence (http://…,1,0).
         – DNA methylation
  28. 28. Reference
       • Göhlmann H, Talloen W: Gene Expression Studies Using Affymetrix Microarrays. Chapman & Hall/CRC Mathematical & Computational Biology, 2009.
       Other useful books:
       • Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S, editors: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer Science, New York, 2005.
       • Amaratunga D, Cabrera J: Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley-Interscience, 2004.