Data Analytics for Personalized Medicine by Aryya Gangopadhyay, PhD
Presented at the 3rd International Conference on Personalized Medicine, June 26-29, 2014. Dr. Gangopadhyay is Chair of the Department of Information Systems at the University of Maryland, Baltimore County (UMBC).

Published in: Healthcare, Technology

  1. Data Analytics for Personalized Medicine
     Aryya Gangopadhyay, UMBC
     Presented at the 3rd International Conference on Personalized Medicine, June 26-29, 2014
  2. Scope
     •  Big data promise (Pentland et al. 2013)
        –  US healthcare industry can save $200 billion per year
     •  Need complete picture
        –  Reality mining (MIT Tech. Review 2008)
        –  Socio-demographics
        –  EMRs
        –  Biological data
     •  Interactions in the network
        –  Topology-based analysis
        –  Centrality-based analysis
        –  Perturbations (diseases as network perturbations: del Sol et al. 2010)
     •  Network partitioning
     •  Visualization
  3. Big data in healthcare
     •  “Within 10 years every healthcare consumer will be surrounded by a virtual cloud of billions of data points” [Hood et al. 2013]
  4. Interconnections
     –  Biological processes are interconnected systems
     –  Analyze interactions
     –  Resilient against random perturbations
     –  Vulnerable to targeted attacks
     CIDeR: Large, multi-dimensional, multimodal, dynamic
  5. Extensions to our previous work
     –  Updated the network
        •  Nodes: 5168 to 9767
        •  Edges: 14410 to 27744
     –  Previous analysis
        •  Network characteristics: CC, diameter, path lengths, etc.
        •  Node-based analysis
           –  Developed a new method for identifying effectors and receptors
        •  Perturbation analysis
     –  Extensions
        •  How do we partition the network?
        •  What criteria to use and why?
        •  What are the effects of such partitioning?
  6. Network extracted from CIDeR: 2014
     •  Nodes: 9767
     •  Edges: 27744
     •  Diameter: 15
     •  # CC: 89
     •  Avg. PL: 4.7
     •  Avg. degree: 2.8
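As a quick consistency check on slide 6, the reported average degree of 2.8 matches the edge count divided by the node count, which is how an average in- or out-degree would be computed for a directed graph. That reading is an assumption: the slides do not say whether the network is treated as directed. A minimal sketch of the arithmetic (the function names are illustrative only):

```python
# Node and edge counts as reported on slides 5 and 6.
nodes_2014, edges_2014 = 9767, 27744
nodes_prev, edges_prev = 5168, 14410


def edges_per_node(n_nodes: int, n_edges: int) -> float:
    """Edges divided by nodes (average in- or out-degree if the graph is directed)."""
    return n_edges / n_nodes


def undirected_avg_degree(n_nodes: int, n_edges: int) -> float:
    """Average degree if every edge is undirected and counted at both endpoints."""
    return 2 * n_edges / n_nodes


ratio_2014 = edges_per_node(nodes_2014, edges_2014)
ratio_prev = edges_per_node(nodes_prev, edges_prev)

print(round(ratio_2014, 1))  # 2.8, the value reported on slide 6
print(round(ratio_prev, 1))  # same ratio for the earlier, smaller network
print(round(undirected_avg_degree(nodes_2014, edges_2014), 1))
```

Under the undirected convention the value would be twice as large, so the 2.8 on the slide is more consistent with the edges-per-node convention.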
  7. Node Centrality measures: correlations
     •  x = Authority, y = Betweenness Centrality: correlation 0.8
     •  x = Clustering Coefficient, y = Betweenness Centrality: correlation -0.02
     •  x = Hub, y = Authority: correlation 0.88
     •  x = PageRank, y = Authority: correlation 0.92
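The slides do not state which correlation coefficient was used on slide 7. Assuming it is the Pearson coefficient, a pairwise value like those above could be computed as in the sketch below. The score lists are made-up toy numbers, not values from the CIDeR network.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-node scores for four nodes (illustration only).
authority = [0.10, 0.40, 0.35, 0.80]
pagerank = [0.05, 0.20, 0.25, 0.50]

r = pearson(authority, pagerank)
print(f"toy authority vs. PageRank correlation: {r:.2f}")
```

Each entry pairs the two scores of the same node, so both lists must be in the same node order.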
  8. Correlations of Node Centrality measures
     [Figure; axis labels: Clustering Coefficient, Hub, Authority, PageRank, Eigenvector Centrality, Betweenness Centrality, Eccentricity]
  9. Overall network characteristics
     •  PageRank, hub and authority scores are strongly correlated
     •  Clustering coefficient is negatively correlated with other node centrality measures
     •  Implications:
        1.  Nodes that are strong effectors are also strong receptors
        2.  Less central nodes are not connected to each other but mainly with an influential node
        3.  Influential nodes are mostly connected to each other
        4.  Fully connected sub-graphs are small and rare
  10. Partitioning the graph
     •  How can we capture the above characteristics?
     •  Modularity:
        $$Q = \frac{1}{2m} \sum_{l=1}^{k} \sum_{i \in C_l,\, j \in C_l} \left( A_{ij} - \frac{d_i d_j}{2m} \right)$$
     •  The objective is to maximize Q
     •  Intuition:
        –  Put influential nodes in separate clusters
        –  Create dense sub-communities (common neighbors)
     •  Algorithms (optimal solution is NP-hard: Brandes 2007):
        –  Spectral clustering based (Newman 2006)
        –  Greedy algorithm (Blondel et al. 2008)
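To make the modularity formula on slide 10 concrete, the sketch below evaluates Q directly as the double sum over node pairs within each cluster. It assumes the standard reading of the symbols, which the slide does not spell out: A is the adjacency matrix, d_i is the degree of node i, m is the number of edges, and C_1..C_k are the clusters. It only scores a given partition; it is not the spectral or greedy optimizer cited on the slide, and the toy graph is invented for illustration.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted simple graph.

    edges: list of (u, v) pairs with no duplicates and no self-loops.
    communities: list of collections of nodes, each node in exactly one.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    # Build neighbour sets; the degree d_i is the size of node i's set.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    total = 0.0
    for community in communities:
        for i in community:
            d_i = len(neighbours.get(i, ()))
            for j in community:
                d_j = len(neighbours.get(j, ()))
                a_ij = 1.0 if j in neighbours.get(i, ()) else 0.0
                total += a_ij - d_i * d_j / (2 * m)
    return total / (2 * m)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q_split = modularity(toy_edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(toy_edges, [{0, 1, 2, 3, 4, 5}])
print(f"two triangles as separate clusters: Q = {q_split:.3f}")
print(f"everything in one cluster:          Q = {q_single:.3f}")
```

Splitting the toy graph along the bridge gives a positive Q, while placing every node in a single cluster gives Q = 0, which is why maximizing Q favors dense sub-communities. The double loop is quadratic in cluster size, so it suits small examples rather than a network of several thousand nodes.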
  11. Clusters formed by maximizing modularity
  12. Dendrogram of top 8 Disease Clusters
  13. Cluster 100
     •  Nodes: 1177
     •  Edges: 2122
  14. Cluster 82
     •  Nodes: 1200
     •  Edges: 2554
  15. K-core
     •  Objective: Restrict analysis to regions of increased centrality and connectedness
     •  K-core: largest sub-graph where all nodes have a minimum degree of k (Batagelj 2002).
     •  K=5 (mode=2 for the entire network)
     •  Protein Interaction Networks (Wuchty et al. 2005, Hamelin et al. 2008)
     [Figure taken from Hamelin et al. 2008]
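The k-core definition on slide 15 can be illustrated with a simple peeling procedure: repeatedly delete any node whose remaining degree is below k until none is left. This is a minimal sketch for an undirected graph given as an edge list. It is not necessarily the algorithm used in the talk or in the cited papers, and the toy graph is invented for illustration.

```python
def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph.

    edges: iterable of (u, v) pairs; self-loops are ignored.
    """
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop does not count towards the degree here
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Start with every node that already violates the degree requirement.
    pending = [n for n, nbrs in neighbours.items() if len(nbrs) < k]
    while pending:
        node = pending.pop()
        if node not in neighbours:
            continue  # already removed via an earlier queue entry
        for nbr in neighbours.pop(node):
            neighbours[nbr].discard(node)
            if len(neighbours[nbr]) < k:
                pending.append(nbr)
    return set(neighbours)


# Toy graph: a 4-clique on nodes 0..3 with a tail 3 - 4 - 5 attached.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

print(sorted(k_core(toy_edges, 1)))
print(sorted(k_core(toy_edges, 3)))
print(sorted(k_core(toy_edges, 4)))
```

In the toy graph the tail is peeled away at k=3, leaving only the clique, and nothing survives at k=4. The slide applies the same idea with k=5 to keep only the densely connected part of the network.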
  16. 5-core graph: color code: Type
  17. 5-core graph: color code: Modularity class
  18. Disease Clusters (top 5) dendrogram
  19. 5-core graph: Cluster 5 (26%)
  20. 5-core graph: Cluster 6 (22%)
  21. 5-core graph: Cluster 0 (16%)
  22. 5-core graph: Cluster 3 (13%)
  23. 5-core graph: Cluster 4 (12.5%)
  24. Comparison of clusters
  25. Conclusion
     •  Contributing areas
        –  Biology, bioinformatics, sociology, SNA, physics, applied mathematics, computer and information sciences
     •  Summary
        –  Holistic analysis of health data
        –  Analysis based on node centrality
        –  Network partitioning
        –  Studying the effect of perturbation
     •  Where do we go from here
        –  Create a taxonomic structure of elements and interactions
        –  Search tool
        –  Biological and clinical implications