Introduction to CSBB Lab
Computational Biology Laboratory Chuan Yi Tang CS Department, NTHU [email_address]
Our Aims
Coregulated genes: Gene 1, Gene 2, Gene 3; Transcription factor
Sequences S1 S2 S3 S4 (motif occurrences set off by spaces):
atgaccgggatactgattaat a caa g gt tgggtataatggagtacgataa attgaga t caa t gt acggcgggtgctctcccgattggaag a caa c gt ggg gcaatcgggatc a caa c gt agaattggatgtcaaaataatggagtggcac gtcaatcgaaaaaacggtggtgagc g caa a gt aaagggattggaccgctt
Sp1 count matrix (rows: letters; columns: positions 1 to 8)
     1  2  3  4  5  6  7  8
a    0  0  0  0  0  0  0  0
t    5  0  0  0  3  0  0  0
c    4  9  9  0  6  9  9  4
g    0  0  0  9  0  0  0  5
IUPAC code for the Sp1 binding site: YCCGYCCS
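The step from the Sp1 counts to the IUPAC code can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes the slide's numbers are read as a letter-by-position count matrix over nine sites, and the names `IUPAC`, `SP1_COUNTS`, and `iupac_consensus` are hypothetical.

```python
# IUPAC nucleotide codes keyed by the set of letters observed at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Counts per position (1..8), as read from the Sp1 slide (assumed layout).
SP1_COUNTS = {
    "A": [0, 0, 0, 0, 0, 0, 0, 0],
    "T": [5, 0, 0, 0, 3, 0, 0, 0],
    "C": [4, 9, 9, 0, 6, 9, 9, 4],
    "G": [0, 0, 0, 9, 0, 0, 0, 5],
}


def iupac_consensus(counts, min_count=1):
    """Return the IUPAC string covering every letter seen at least min_count times."""
    length = len(next(iter(counts.values())))
    symbols = []
    for pos in range(length):
        letters = frozenset(b for b, row in counts.items() if row[pos] >= min_count)
        symbols.append(IUPAC[letters])
    return "".join(symbols)


print(iupac_consensus(SP1_COUNTS))  # YCCGYCCS
```

The `min_count` threshold is an assumption; raising it would drop rarely observed letters from the consensus.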
ARRTTYYRSA: high motif degeneracy, weak motif
AAGTTYYRCA: low motif degeneracy, strong motif
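One way to make "degeneracy" concrete is to count how many plain DNA words an IUPAC motif matches. The sketch below does only that; the slides do not say how degeneracy is quantified, so treating it as this count is an assumption, and `IUPAC_SETS`, `degeneracy`, and `expand` are illustrative names.

```python
from itertools import product

# Letters allowed by each IUPAC nucleotide symbol.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(motif):
    """Number of concrete DNA words matched by an IUPAC motif."""
    count = 1
    for symbol in motif:
        count *= len(IUPAC_SETS[symbol])
    return count


def expand(motif):
    """List every concrete DNA word matched by an IUPAC motif."""
    return ["".join(word) for word in product(*(IUPAC_SETS[s] for s in motif))]


print(degeneracy("ARRTTYYRSA"))  # 64 (six two-letter symbols)
print(degeneracy("AAGTTYYRCA"))  # 8 (three two-letter symbols)
```

Under this reading, the weak motif matches eight times as many words as the strong one, which is why its occurrences are harder to separate from background.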
METHODS
Sequences S1 S2 S3 S4:
atgaccgggatactgattaat a caa g gt tgggtataatggagtacgataa attgaga t caa t gt acggcgggtgctctcccgattggaag a caa c gt ggg gcaatcgggatc a caa c gt agaattggatgtcaaaataatggagtggcac gtcaatcgaaaaaacggtggtgagc g caa a gt aaagggattggaccgctt
e.g. l = 3, d = 1, k = 4, W_ij = ATA
All possible sets of degenerate positions: {p1}, {p2}, {p3}, giving the patterns _TA, A_A, AT_
For each possible set X = {p1, …, pd} of degenerate positions, all W_pq with V(W_ij, W_pq) ⊆ X are collected.

Pattern   Collected words (sequence)                                      K
_TA       ATA (S1), CTA (S2), ATA (S3), CTA (S3), TTA (S4)                4
A_A       ATA (S1), ATA (S2), ATA (S3), ACA (S4), AAA (S4), ACA (S5)      5
AT_       ATC (S2), ATT (S3), ATA (S3), ATA (S3), AAA (S3)                2
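The collection step can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: it assumes V(W_ij, W_pq) is the set of positions where two words differ, and that K counts the distinct sequences contributing at least one collected word. The function name `collect` and the toy sequences are hypothetical.

```python
from itertools import combinations


def collect(seqs, word, d):
    """For each set X of d degenerate positions, gather words differing from
    `word` only inside X, plus the number of distinct sequences they come from."""
    l = len(word)
    result = {}
    for X in combinations(range(l), d):
        allowed = set(X)
        hits = []
        for name, seq in seqs.items():
            for start in range(len(seq) - l + 1):
                cand = seq[start:start + l]
                mismatches = {i for i in range(l) if cand[i] != word[i]}
                if mismatches <= allowed:  # differences confined to X
                    hits.append((cand, name))
        pattern = "".join("_" if i in allowed else c for i, c in enumerate(word))
        support = len({name for _, name in hits})
        result[pattern] = (hits, support)
    return result


# Toy input (not from the slides).
toy = {"S1": "GATAC", "S2": "CCTAG", "S3": "ATTGG", "S4": "TTTTT"}
for pattern, (hits, support) in collect(toy, "ATA", d=1).items():
    print(pattern, hits, support)
```

A pattern would then be kept only if its support reaches the threshold k, which matches the slide's example if K is read this way.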
Pseudo occurrence elimination
Background letter probabilities are P_A = 0.22, P_T = 0.22, P_C = 0.28, and P_G = 0.28.
L_pq = log[(observed probability of p at position q in G(W_ij | X)) / P_p]
A negative (p, q)-entry means that the letter p at position q is weakly conserved in G(W_ij | X).
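The log-ratio entries can be computed directly from a set of collected words. The sketch below assumes the natural logarithm, which is consistent with the values on the scoring slide (log(1/0.22) ≈ 1.51), and simply omits letters that never occur at a position, since their log ratio is undefined. The name `log_odds` and the sample words are hypothetical.

```python
import math

# Background letter probabilities from the slide.
BACKGROUND = {"A": 0.22, "T": 0.22, "C": 0.28, "G": 0.28}


def log_odds(words):
    """Return one dict per position mapping each observed letter p to
    log(observed frequency of p at that position / P_p)."""
    n = len(words)
    matrix = []
    for q in range(len(words[0])):
        column = {}
        for p, background in BACKGROUND.items():
            freq = sum(w[q] == p for w in words) / n
            if freq > 0:  # skip unobserved letters (log of zero)
                column[p] = math.log(freq / background)
        matrix.append(column)
    return matrix


# Toy group of equal-length words (not from the slides).
for q, column in enumerate(log_odds(["ATA", "CTA", "ATA", "TTA"]), start=1):
    print(q, {p: round(v, 2) for p, v in column.items()})
```

In this toy group, C at position 1 has frequency 0.25, below its background of 0.28, so its entry is negative.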
Motif scoring methods
s_1 = ( Σ_j ( Σ_i L_ij / p_j ) ) / l
This fact is used to measure the conservation and the significance of each reported motif.
Example: (1.51 + 1.51 + 1.51 + 1.51 + (0.31 + 0.31)/2 + 1.51 + (0.31 + 0.82)/2)
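The arithmetic in the example can be checked with a short script. It assumes each position contributes the average of its listed L entries and that l = 7 because the example has seven terms; neither is stated explicitly on the slide. `motif_score` and `SLIDE_COLUMNS` are illustrative names.

```python
def motif_score(columns):
    """Average the entries within each position, sum over positions,
    and return (sum, sum divided by the number of positions)."""
    per_position = [sum(col) / len(col) for col in columns]
    total = sum(per_position)
    return total, total / len(columns)


# L entries per position, copied from the slide's worked example.
SLIDE_COLUMNS = [
    [1.51], [1.51], [1.51], [1.51],
    [0.31, 0.31],
    [1.51],
    [0.31, 0.82],
]

total, s1 = motif_score(SLIDE_COLUMNS)
print(round(total, 3), round(s1, 3))  # 8.425 1.204
```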
Evaluation of performance on synthetic data
The measure used for comparison is the performance coefficient |K ∩ P| / |K ∪ P| (Pevzner, P. A. and Sze, S. H. (2000) Combinatorial approaches to finding subtle signals in DNA sequences. Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), 269-278).
K is the set of positions of the known motif occurrences in the input sequences.
P is the set of predicted positions.
The best performance coefficients among the top ten motifs found by these tools are compared.
Example, sequences S1 S2 S3 (red words mark the known positions K and the predicted positions P):
atgaccgggatactgattaat a caa g gt tgggtataatggagtacgataa attgaga t caa t gt acggcgggtgctctcccgattggaag a caa c gt ggg gcaatcgggatc a caa c gt agaattggatgtcaaccaaagtggagtggcac
|K ∩ P| = 21, |K ∪ P| = 35, so |K ∩ P| / |K ∪ P| = 21/35 = 0.6
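The performance coefficient is a set overlap ratio and can be sketched with Python sets. The position sets below are invented so that the sizes match the slide's example (21 shared positions, 35 in the union); they are not the actual positions from the figure, and `performance_coefficient` is an illustrative name.

```python
def performance_coefficient(known, predicted):
    """|K ∩ P| / |K ∪ P| for two collections of motif positions."""
    known, predicted = set(known), set(predicted)
    union = known | predicted
    return len(known & predicted) / len(union) if union else 0.0


# Hypothetical positions: 28 known, 28 predicted, 21 shared.
K = set(range(0, 28))
P = set(range(7, 35))

print(len(K & P), len(K | P), performance_coefficient(K, P))  # 21 35 0.6
```

A coefficient of 1 means the prediction covers exactly the known positions; it falls toward 0 as either missed or spurious positions accumulate.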
Evaluation of performance on synthetic data
MotifSeeker
Specificity: |K ∩ P| / |P| (false positives)
Sensitivity: |K ∩ P| / |K| (false negatives)
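Both ratios follow directly from the same two sets. The sketch below uses the slide's definitions as written; note that the slide's "specificity", |K ∩ P| / |P|, is the quantity often called precision or positive predictive value elsewhere. The function names and position sets are hypothetical.

```python
def sensitivity(known, predicted):
    """|K ∩ P| / |K|: share of known positions that were predicted."""
    return len(known & predicted) / len(known)


def specificity(known, predicted):
    """|K ∩ P| / |P|, as defined on the slide: share of predictions that are known."""
    return len(known & predicted) / len(predicted)


# Hypothetical positions: 20 known, 15 predicted, 10 shared.
K = set(range(0, 20))
P = set(range(10, 25))

print(sensitivity(K, P))            # 0.5
print(round(specificity(K, P), 3))  # 0.667
```

Missed positions (false negatives) lower the sensitivity, while spurious predictions (false positives) lower the specificity.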
The best performance coefficient among the  top ten motifs selected.
Evaluation of performance on tissue-specific regulatory elements
 
Problems facing the breeding of Taiwan native chickens
Flock breeding: breeding program, selection, genotype, phenotype
Using serum proteins as selection markers
Studying the proteome for flock selection
Biological pathway analysis of avian egg production. 科學農業 (2004), October
 
Serum protein marker: Exp I, Exp II
Stage selection
Exp (I) Fig. 1. Egg production rate of TRFCC (n=157). (A) Total egg number of all hens, (B) hens in four groups
Fig. 3. Association of relative protein levels with total egg number. (A) Vitellogenin (B) Apo A-I (C) Ovotransferrin (D) X protein
 
Exp II. Selection strategy
Fig. 1. Egg production rate of batch A (n=77) and batch B (n=78) of TRFCC.
Code-selection (diagram labels: score, rank, Batch A, Batch B, transformation, regional codes, code)
Code-selection Step 1: select the 20% of birds with the lowest egg numbers in batch B of TRFCC
Step 2: Transform codes in batch A of birds
Conclusions
Acknowledgments: collaborators and researchers participating in the native chicken project
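Research and development of blade servers for advanced scientific computing (Quanta industry-academia collaboration)
Subproject 2: Building a cluster-computing environment for theoretical physics and bioinformatics
National Applied Research Laboratories: President 莊哲男
National Center for High-performance Computing: Dr. 張西亞
National Center for Theoretical Sciences: Director 張圖南
Department of Computer Science, National Tsing Hua University: Prof. Chuan Yi Tang (唐傳義)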
Performance Comparison between IB and GE on Quanta Blade Server
R&D of bioinformatics-related applications (1)
R&D of bioinformatics-related applications (2)
R&D of methods (3)
Second-year research plan (2006/11~2007/7)
Future directions of the laboratory
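Medical-history mining of a nuclear medicine image bank and its application to cancer diagnosis
Chuan Yi Tang (唐傳義), 閻紫宸 (Director of Nuclear Medicine, Chang Gung), 王速貞 (FDA, USA)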
Background
Which information is valuable?
Nasopharyngeal Carcinoma (NPC)
Genome-wide Interpretation: Informatics of Immune Responses - The Concept of Immunometer
Ching-Tai Huang (黃景泰), M.D., Ph.D., Infectious Diseases, Medicine, Chang Gung Memorial Hospital, Linkou
Immune Tolerance & Immune Activation - Balance between Physiology & Pathology
Diagram labels: self antigens, tumors, infectious microorganisms, environmental antigens, transplanted organs; Tolerance, Activation
Transgenic Mouse Model - Adoptive Transfer System
Donors: HA-specific TCR transgenic mice; a) CD4+: 6.5 (I-E^d HA 110-120), b) CD8+: clone 4 (K^d HA 518-526)
Transfer: pooled splenocytes & lymph node cells
Recipients: HA-expressing transgenic mice; groups shown: C3-HA Low, C3-HA High, Non-Tg
Immune Tolerance & Immune Activation - in CD4+ T Cells
Diagram labels: Naive; Tolerance, Memory; Anergic/Regulatory, Activated/Memory
Immune Tolerance & Immune Activation - Dynamic genomic approach (with Affymetrix Gene Chips)
Diagram labels: Naive; Day 2, Day 3, Day 4; Tolerance, Memory; Anergic/Regulatory, Activated/Memory; RNA (seven samples)
 
Our Aims
NF-κB pathway model
Diagram labels: NF-κB, IKK
NFKB ICAM1
 
