"The bounty of the commons"
In this talk, we explore how public data becomes more valuable with reuse. Reuse helps us get to the bottom of cases where we are certain but wrong, and it helps us ask better questions.
6. Tothill et al. Clinical Cancer Research. 2008
One hundred and seventy one tumors consistently
segregated into one of the six k-means clusters.
Most of the remaining tumors (80 of 114) could be
further assigned to one of the molecular subsets by
performing class prediction.
171 clustered cleanly
80 could be assigned
34 ???
12-40% unclear
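The 12-40% range can be reproduced from the counts quoted on this slide. The sketch below assumes the cohort is the 171 clustered tumors plus the 114 remaining ones, and that the upper bound counts every tumor that did not cluster cleanly (including the 80 assigned by class prediction) as unclear. That reading and the variable names are assumptions, not something the slide states.

```python
# Counts quoted from Tothill et al. on this slide.
clustered = 171               # segregated consistently into a k-means cluster
remaining = 114               # did not cluster consistently
assigned_by_prediction = 80   # of the remaining, assigned by class prediction

total = clustered + remaining                    # assumed cohort size
unassigned = remaining - assigned_by_prediction  # the "34 ???" on the slide

low = unassigned / total   # only the unassigned tumors count as unclear
high = remaining / total   # every non-clustering tumor counts as unclear

print(f"{unassigned} unassigned of {total}")
print(f"{low:.0%} to {high:.0%} unclear")
```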
7. The Cancer Genome Atlas, Nature. 2011
The silhouette width was computed to filter out expression profiles
that were included in a subclass, but that were not robust
representatives of the subclass. This resulted in the removal of 51
of 135 samples of the Differentiated subclass; 12 of 107 samples of
the Immunoreactive subclass; 0 of 109 samples of the Mesenchymal
subclass; and 13 of 138 samples of the Proliferative subclass.
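To make the filtering step concrete, here is a minimal sketch of a silhouette-width filter on toy 2-D points. It is not the TCGA pipeline: the data, the Euclidean distance, and the cutoff of 0 are all assumptions chosen for illustration, and the function name is hypothetical. It also assumes at least two clusters and no set of identical points.

```python
from math import dist


def silhouette_widths(points, labels):
    """Return s(i) = (b - a) / max(a, b) for each point.

    a is the mean distance to the other members of the point's own cluster;
    b is the smallest mean distance to the members of any other cluster.
    """
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lab, []).append(dist(p, q))
        if own not in by_cluster:
            widths.append(0.0)  # singleton cluster: width is 0 by convention
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        widths.append((b - a) / max(a, b))
    return widths


# Toy data: two tight clusters plus one sample labeled "A" that sits nearer "B".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (6, 6)]
labels = ["A", "A", "A", "B", "B", "B", "A"]

widths = silhouette_widths(points, labels)
cutoff = 0.0  # assumed threshold; the quoted text does not give one
kept = [i for i, s in enumerate(widths) if s > cutoff]
removed = [i for i, s in enumerate(widths) if s <= cutoff]
print("kept:", kept)
print("removed:", removed)
```

Samples removed this way are the ones whose subclass membership is least clear, which is the same ambiguity the Tothill counts point to.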
16. What if you re-analyze Tothill
without LMP samples?
17. What if you re-analyze Tothill
with LMP samples?
18. Comprehensive cross-population analysis of high-grade serous
ovarian cancer supports no more than three subtypes
bioRxiv: http://dx.doi.org/10.1101/030239
github: http://github.com/greenelab/hgsc_subtypes
19. Research is to see what everybody
else has seen and to think what
nobody else has thought.
- Albert Szent-Györgyi
Image by J.W. McGuire/NIH
29. Node42 reflects Anr Activity
Figure (panel A, "Node42 - Anr Activity"): heatmaps of Node42 activity in E-GEOD-17179 (wt, Δanr, Δdnr), E-GEOD-17296 (wt, Δanr, ΔroxSR; EXP and STAT), E-GEOD-52445 (O2), and E-GEOD-33160 (O2), each with its own color key.
30. New Experiment Validates Node 42’s
Low-O2 Signature
CF lung epithelial cells
Jack Hammond
Figure (panels B and C): heatmaps comparing wt and Δanr in Microarray, RNAseq PAO1, and RNAseq J215 data, each with its own color key, shown alongside the E-GEOD-17179, E-GEOD-17296, E-GEOD-52445, and E-GEOD-33160 heatmaps.
31. Cross-platform normalization of microarray and
RNA-seq data for machine learning applications
Thompson, Tan, Greene. PeerJ. 2016
Jeff Thompson
33. New Experiment Validates Node 42’s
Low-O2 Signature
CF lung epithelial cells
Jack Hammond
Figure: same panels as slide 30 (Microarray, RNAseq PAO1, and RNAseq J215 heatmaps comparing wt and Δanr, with the E-GEOD-17179, E-GEOD-17296, E-GEOD-52445, and E-GEOD-33160 heatmaps).
34. ADAGE analysis of publicly available
gene expression data collections illuminates
Pseudomonas aeruginosa-host interactions
bioRxiv: http://dx.doi.org/10.1101/030650
github: http://github.com/greenelab/adage
Tan, Hammond, Hogan, and Greene. mSystems. 2016
35. I didn’t want to just know the
names of things. I remember really
wanting to know how it all worked.
- Elizabeth Blackburn
Image: US Embassy Sweden
50. Semi-Supervised Learning of the
Electronic Health Record for Phenotype
Stratification
bioRxiv: http://dx.doi.org/10.1101/039800
github: http://github.com/greenelab/DAPS
Reproducible computational workflows
with continuous analysis
bioRxiv: http://dx.doi.org/10.1101/056473
github: http://github.com/greenelab/continuous_analysis
52. Research Parasite Awards
(The “Parasites”)
Selection criteria for the work in question:
• The awardee must not have been involved in the design of the experiments
that generated the data.
• The awardee published independently of the original investigators, and the
original investigators are not authors of the secondary analyses but are
appropriately credited in the manuscripts.
• The awardee may have extended, replicated or disproved what the original
investigators had posited.
• The awardee has provided source code and intermediate or final results in a
manner that enhances reproducibility.
53. Research Parasite Awards
(The “Parasites”)
Additional selection criteria for the Junior
Parasite award:
• The awardee must have published the work at the training stage of their
career (postdoctoral, graduate, or undergraduate). If the awardee has
assumed a position as an independent investigator she or he should not have
been in that position for more than 2 years.
• The award will be based on work described in a single manuscript
(submitted alongside the nomination letter).
54. Research Parasite Awards
(The “Parasites”)
Additional selection criteria for the
Sustained Parasitism award:
• The awardee must be in an independent investigator position in academia,
industry or public sector.
• The awardee must be a last or corresponding author on the three
manuscripts submitted alongside the nomination letter.
• At least a five-year period must have elapsed between the publication of the
first manuscript and the final manuscript.
55. Details
• Submit by October 14, 2016 @ 5PM HST
• Additional Instructions at
http://greenelab.com/parasite-award
• COI rules are strict! I can only talk about rules.
2017 Selection Committee:
56. Greene Lab:
Jaclyn Taroni (Postdoc)
Daniel Himmelstein (Postdoc)
Jie Tan (Grad Student)
Gregory Way (Grad Student)
Brett Beaulieu-Jones (Grad Student)
Amy Campbell (Postbacc)
René Zelaya (Programmer)
Matt Huyck (Programmer)
Dongbo Hu (Programmer)
Kathy Chen (Undergrad)
Mulin Xiong (Undergrad)
Tim Chang (Undergrad)
Roshan Ravishankar (Undergrad)
Collaborators:
Deb Hogan & Jack Hammond
Data:
All investigators who publicly release their gene
expression data.
Images:
Artists who release their work under a Creative
Commons license.
Funding:
Gordon and Betty Moore Foundation
National Science Foundation
Cystic Fibrosis Foundation
National Institutes of Health
Find us online:
http://www.greenelab.com
Twitter: @GreeneScientist
Calvin and Hobbes. Bill Watterson