ppt

Gene Interaction Analysis Using k-way Interaction Loglinear Model: A Case Study on Yeast Data
Xintao Wu, UNC Charlotte
Daniel Barbara, George Mason Univ.
Liying Zhang, Memorial Sloan Kettering Cancer Center
Yong Ye, UNC Charlotte

Microarray data
          Market basket data        Microarray
row       10^3-10^4 items           10^3-10^4 genes
column    10^6-10^9 transactions    10^1-10^2 samples
data      0-1                       continuous

Background -- Clustering

Background -- Interaction analysis

Background -- Association Rule
[Figure: Venn diagram with regions "Customer buys X", "Customer buys Y", and "Customer buys both"]

Criticism of Support and Confidence

Criticism of Support and Confidence

Criticism of lift

Criticism of lift
Predicted count of the all-two-factor model, based on two-way distributions
Shrinkage estimates (or we can use the raw count)
An estimate of the number of transactions containing the item set over and above those that can be explained by the pairwise associations of the items

Motivation

Saturated log-linear model
main effect
1-factor effect
2-factor effect, which shows the dependency within the distributions of A, B

Computing λ-term
Loglinear parameters sum to 0 over all indices
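As a minimal sketch of the sum-to-zero constraint, assume the usual two-variable saturated form log m_ij = λ + λ_i^A + λ_j^B + λ_ij^AB (the slide's own formula did not survive, so this form is an assumption). Under that constraint each term is a difference of means of the log counts. The function name and the 2x2 counts below are hypothetical, and all counts are assumed to be positive.

```python
import math

def saturated_lambda_terms(counts):
    """Return (lam, lam_a, lam_b, lam_ab) for a two-way table of positive counts.

    Each parameter family sums to zero over each of its indices.
    """
    log_m = [[math.log(c) for c in row] for row in counts]
    n_rows, n_cols = len(log_m), len(log_m[0])

    # Overall term: grand mean of the log counts.
    lam = sum(sum(row) for row in log_m) / (n_rows * n_cols)
    # 1-factor terms: row / column means minus the grand mean.
    lam_a = [sum(row) / n_cols - lam for row in log_m]
    lam_b = [sum(log_m[i][j] for i in range(n_rows)) / n_rows - lam
             for j in range(n_cols)]
    # 2-factor term: what is left once the lower-order terms are removed.
    lam_ab = [[log_m[i][j] - lam - lam_a[i] - lam_b[j] for j in range(n_cols)]
              for i in range(n_rows)]
    return lam, lam_a, lam_b, lam_ab

# Hypothetical 2x2 table of counts for variables A (rows) and B (columns).
example_counts = [[40, 10], [20, 30]]
lam, lam_a, lam_b, lam_ab = saturated_lambda_terms(example_counts)
```

Because the model is saturated, exponentiating the sum of the four terms for any cell gives back the observed count for that cell.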

k-way loglinear model
Independence model
Pairwise model
3-way model
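The three models named on this slide differ in which margins of the contingency table they reproduce: one-way margins for the independence model, two-way margins for the pairwise model, and the full table for the 3-way model on three variables. One common way to obtain the expected counts is iterative proportional fitting; the slides do not say which fitting procedure was used, so treat the sketch below as an assumption. The function name `fit_ipf` and the 2x2x2 counts are hypothetical.

```python
from itertools import product

def fit_ipf(observed, margins, shape, iterations=100):
    """Fit expected counts whose listed margins match the observed margins.

    observed: dict mapping cell index tuples to counts
    margins:  list of tuples of axis indices, e.g. [(0, 1), (0, 2), (1, 2)]
    shape:    number of categories per variable, e.g. (2, 2, 2)
    """
    cells = list(product(*(range(n) for n in shape)))
    fitted = {cell: 1.0 for cell in cells}
    for _ in range(iterations):
        for axes in margins:
            obs_margin, fit_margin = {}, {}
            for cell in cells:
                key = tuple(cell[a] for a in axes)
                obs_margin[key] = obs_margin.get(key, 0.0) + observed[cell]
                fit_margin[key] = fit_margin.get(key, 0.0) + fitted[cell]
            # Rescale every cell so this margin agrees with the data.
            for cell in cells:
                key = tuple(cell[a] for a in axes)
                if fit_margin[key] > 0:
                    fitted[cell] *= obs_margin[key] / fit_margin[key]
    return fitted

# Hypothetical counts for three binary variables (A, B, C).
observed = {
    (0, 0, 0): 30, (0, 0, 1): 10, (0, 1, 0): 10, (0, 1, 1): 20,
    (1, 0, 0): 10, (1, 0, 1): 20, (1, 1, 0): 20, (1, 1, 1): 80,
}
shape = (2, 2, 2)
independence = fit_ipf(observed, [(0,), (1,), (2,)], shape)
pairwise = fit_ipf(observed, [(0, 1), (0, 2), (1, 2)], shape)
three_way = fit_ipf(observed, [(0, 1, 2)], shape)
```

Comparing an observed cell count with its fitted value under each model shows how much of that count the lower-order interactions already account for.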

Our Method

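Preprocessing
Expression values:
       A       B       C
s1     0.23    0.1    -0.24
s2     0.6     0.1     0.5
s3     0.3     0.13    0.28
s4     0.15    0.3    -0.25
s5     0.8     0.08    0.30
s6    -0.2     0.5     0.25

[Table: discretized version of the data above, with two indicator columns per gene (A, A, B, B, C, C). Rows s1 to s5 each contain two 1 entries and row s6 contains three; the cell positions and the marks distinguishing the paired columns were lost.]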
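The discretization step can be sketched as follows, using the six sample rows from this slide. The cutoff of 0.2 and the "+" / "-" item labels are assumptions made for illustration; the slides do not state the rule that was actually used. With this cutoff the number of items per sample happens to equal the number of 1 entries per row in the slide's second table.

```python
# Expression values for genes A, B, C in samples s1..s6 (from the slide).
samples = {
    "s1": {"A": 0.23, "B": 0.1, "C": -0.24},
    "s2": {"A": 0.6, "B": 0.1, "C": 0.5},
    "s3": {"A": 0.3, "B": 0.13, "C": 0.28},
    "s4": {"A": 0.15, "B": 0.3, "C": -0.25},
    "s5": {"A": 0.8, "B": 0.08, "C": 0.30},
    "s6": {"A": -0.2, "B": 0.5, "C": 0.25},
}

THRESHOLD = 0.2  # hypothetical cutoff, not taken from the slides

def discretize(row, threshold=THRESHOLD):
    """Turn one sample's expression values into a list of items.

    A gene contributes "G+" when its value is at or above the threshold,
    "G-" when it is at or below the negative threshold, and nothing otherwise.
    """
    items = []
    for gene, value in row.items():
        if value >= threshold:
            items.append(gene + "+")
        elif value <= -threshold:
            items.append(gene + "-")
    return items

transactions = {name: discretize(row) for name, row in samples.items()}
```

Each resulting item list plays the role of one market-basket transaction, so frequent item set mining can be run on the samples directly.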

Contingency table
[Columns: three B categories, each split into three A categories. Rows: three D categories, each split into three C categories. The marks distinguishing the categories were lost.]

D  C     5    5    0    4    9    0    0    0    0
   C     0    0    0    0    7    0    0    0    0
   C     0    0    0    0    0    0    0    0    0
D  C     1    0    0    3    7    0    0    0    0
   C     0    0    0    4  130    7    0    1    0
   C     0    0    1    0    7    2    0    1    2
D  C     0    0    0    0    0    0    0    0    0
   C     0    0    0    1   11    0    0    0    1
   C     0    0    0    0   15    3    0   19   54   (frequent set)

Examine residuals

Experimental Results
The size of frequent item sets from Apriori, and the size of item sets which cannot be interpreted by the k-way model

Support(%)
14          2735    2500    2253    1931
15          1134    1084     852     691
18            39      39      19       8
20             8       8       4       1

Experimental Results
The frequencies and estimates from all k-way interactions

Gene Set                               Frequency   1-way   2-way   3-way
YHR029C, YMR094W, YMR096W, YMR095C     56          0       15      26
YJR109C, YGL117W, YMR096W, YMR095C     54          0       15      23
YJR109C, YMR094W, YMR096W, YMR095C     56          0       17      32
YGL117W, YER175C, YMR096W, YMR095C     54          0       24      28

ORF naming (example: YHR029C)
Y      Y         for Yeast
H      A-P       for the chromosome upon which the ORF resides (16)
R      L or R    for the left or right arm
029    3-digit   the order of the open reading frame on the chromosome arm, starting from the centromere and counting out to the telomere
C      W or C    whether the open reading frame is on the Watson or Crick strand
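The ORF naming scheme on this slide can be sketched as a small parser. The function name `parse_orf` and the returned field names are hypothetical, and mapping the chromosome letters A-P to the numbers 1-16 in alphabetical order is an assumption based on the "(16)" noted on the slide.

```python
import re

# Y, chromosome letter A-P, arm L/R, 3-digit order, strand W/C.
ORF_PATTERN = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])$")

def parse_orf(name):
    """Split a systematic yeast ORF name into its components."""
    match = ORF_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a systematic ORF name: {name!r}")
    chrom_letter, arm, order, strand = match.groups()
    return {
        "chromosome": ord(chrom_letter) - ord("A") + 1,  # A -> 1, ..., P -> 16
        "arm": "left" if arm == "L" else "right",
        "order": int(order),  # position counted from the centromere
        "strand": "Watson" if strand == "W" else "Crick",
    }

example = parse_orf("YHR029C")
```

Applied to the gene sets in the table, this shows that YMR094W, YMR095C, and YMR096W are neighbors on the right arm of the same chromosome.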

Experimental results

Lattice for λ-term of saturated model (2-category case)
All (4.560)
A (0.284)   B (1.407)   C (1.493)   D (-0.144)
AB (-0.044)   AC (0.681)   AD (-0.006)   BC (-0.765)   BD (-0.296)   CD (0.245)
ABC (0.233)   ABD (-0.185)   ACD (-0.118)   BCD (-0.093)
ABCD (0.038)

Two-category vs. multi-category

Preprocessing

Interaction Modeling

Graphical Gaussian modeling

Loglinear modeling
