Data Integration: What I Haven't Yet Achieved

Data integration is a hot topic in bioinformatics, but the term means different things to different people. What do we think it means? Talk given at the CSIRO Bioinformatics & Biostatistics group meeting, November 21, 2012.

Transcript of "Data Integration: What I Haven't Yet Achieved"

  1. Data Integration: what I haven’t yet achieved. Neil Saunders, MATHEMATICS, INFORMATICS AND STATISTICS, www.csiro.au
  2. My main project: Ludwig colorectal cancer study
  3. Multiple “omics” platforms: exon expression, methylation, copy number
  4. We want to “integrate” these data, but what does that mean?
  5. Integration can mean “portals”
  6. Integration can mean “visualization”
  7. Integration can mean “correlation”
  8. What do we think integration means? A + B + C: more information when combined than when separate
  9. What’s already “out there”? PubMed. [Plot: PubMed search for "data integration", articles per 100 000 (y-axis) by year, 2002 to 2010 (x-axis)]
  10. What’s already “out there”? CiteULike: http://www.citeulike.org/user/neils/tag/integration
  11. Buzz-word compliant
  12. Quote from integIRTy paper: “These methods can be roughly grouped into four categories: stepwise, regression-based, correlation-based and latent variable models.” (integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, Vol. 28, No. 22 (15 November 2012), pp. 2861-2869)
  13. Regression: SIM. Integrated analysis of DNA copy number and gene expression microarray data using gene sets. BMC Bioinformatics 2009, 10:203
  14. Correlation: DR-Integrator. [Figure: correlation heatmap (scale 0 to 1), labelled by sample ID and by chromosome (Chr 1 to 22)]
  15. Latent variable: iCluster (file under impractical)
  16. Basics that are never explained 1/2: integration across groups, or description of samples?
  17. Basics that are never explained 2/2: Genes x Samples
  18. Conclusions 1/3: We’re not the first people doing this... but it’s becoming a “hot topic”
  19. Conclusions 2/3: Room for improvement in software, much of which is:
      • Poorly written
      • Poorly documented
      • Difficult to implement
  20. Conclusions 3/3: Too much for one individual!
  21. CSIRO Mathematics, Informatics and Statistics. Neil Saunders; t +61 2 9325 3144; e Neil.Saunders@csiro.au; w Mathematics, Informatics and Statistics web; www.csiro.au
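Slides 7, 14 and 17 name correlation-based integration and the genes x samples layout without showing the mechanics. The following is a minimal, generic sketch in Python, not DR-Integrator's or any other published method. It assumes each platform is stored as a hypothetical `{gene: {sample: value}}` table, keeps only the genes and samples the two platforms share, and computes one Pearson correlation per gene across samples. All gene names, sample names and values are made up.

```python
"""Minimal sketch: per-gene correlation between two omics platforms.

Assumption: each platform is a genes x samples table stored as
{gene: {sample: value}}. Names below are hypothetical, for illustration only.
"""
import math
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]


def align(a: Table, b: Table) -> Tuple[List[str], List[str]]:
    """Return genes present in both tables and the samples shared by every such gene."""
    genes = sorted(set(a) & set(b))
    samples = None
    for g in genes:
        shared = set(a[g]) & set(b[g])
        samples = shared if samples is None else samples & shared
    return genes, sorted(samples or set())


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; nan if fewer than two points or either vector is constant."""
    n = len(x)
    if n < 2:
        return math.nan
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def per_gene_correlation(copy_number: Table, expression: Table) -> Dict[str, float]:
    """Correlate each shared gene's copy number with its expression across shared samples."""
    genes, samples = align(copy_number, expression)
    return {
        g: pearson([copy_number[g][s] for s in samples],
                   [expression[g][s] for s in samples])
        for g in genes
    }


if __name__ == "__main__":
    # Made-up values: GENE_C exists only in the expression table,
    # and sample s4 appears only in GENE_A's expression values.
    cn = {"GENE_A": {"s1": 1.0, "s2": 2.0, "s3": 3.0},
          "GENE_B": {"s1": 2.0, "s2": 2.0, "s3": 2.0}}
    ex = {"GENE_A": {"s1": 5.0, "s2": 7.0, "s3": 9.0, "s4": 1.0},
          "GENE_B": {"s1": 1.0, "s2": 3.0, "s3": 2.0},
          "GENE_C": {"s1": 0.0}}
    # GENE_A is perfectly linear (r close to 1.0); GENE_B has constant copy number (nan).
    print(per_gene_correlation(cn, ex))
```

Aligning on shared genes and samples comes first because platforms rarely cover identical sets. A gene whose values are constant across samples has no defined correlation, so the sketch returns `nan` for it rather than inventing a value.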