Data as research output;
Data as part of the scholarly record
Todd Vision
University of North Carolina at Chapel Hill
Dryad Digital Repository

SciELO15 · 24 October 2013 · São Paulo
CC-BY-NC-SA nic221
http://www.flickr.com/photos/nic221/391536867/

Source: IFEX
http://www.ifex.org/united_states/2013/09/05/cipa_libraries/
[Slide image: a collage of overlapping pages from published journal articles, with results embedded in running text, tables, and figures. Legible sources and captions:
• Burleigh et al., "Inferring the Plant Tree of Life from Gene Trees" (Table 1, summary of supertree bootstrap support from the GTP analysis; Figure 2, average quartet similarity for each taxon among bootstrap trees)
• "A computational system to select candidate genes for complex human traits" (Table 2, tests using susceptibility genes for complex human traits)
• Knies et al. (Table 3, transition–transversion rate ratios; Figs. 2 and 3, best-fit nucleotide substitution models and rate ratios for each alignment)]
Relatively little data is published within articles

Published tables and figures
Analysed data
Raw data
Reuse of open data boosts
citations to the original article

Piwowar and Vision (2013) doi:10.7717/peerj.175
  
Most analyzed data is in the ‘long tail’, for
which there is no specialized repository

[Figure: axes labelled "Volume" and "Rank frequency of datatype", with regions labelled "Structured data (e.g. GenBank, GBIF)" and "Long-tail data".]

After Heidorn (2008) doi:10.1353/lib.0.0036
Peer-to-peer data sharing does not work
Wicherts and colleagues requested data from
141 articles in American Psychological Association
journals.
“6 months later, after … 400 emails, [sending]
detailed descriptions of our study aims, approvals
of our ethical committee, signed assurances not to
share data with others, and even our full
resumes…” only 27% of authors complied 

Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) doi:10.1037/0003-066X.61.7.726
Data is best captured at the time of publication
[Figure: information content plotted against time, after Michener et al. (1997). Labels: time of publication; specific details; general details; retirement or career change; accident; death.]
  
CC-BY Adamo
http://www.piqs.de/fotos/121272.html
  

Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow,
Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
Joint Data Archiving Policy (JDAP)
Data are important products of the scientific
enterprise, and they should be preserved and
usable for decades in the future. 
As a condition for publication, data supporting the
results in the article should be deposited in an
appropriate public archive.
Authors may elect to embargo access to the data for
a period up to a year after publication. 
Exceptions may be granted at the discretion of the
editor, especially for sensitive information.
http://datadryad.org/pages/jdap
High impact factor journals have stronger data
archiving policies
[Figure: journal impact factor (IF) by data archiving policy, n=70. Group labels: IF=6.0, IF=3.6, IF=4.5.]

Piwowar HA, Chapman WW (2008) hdl:10101/npre.2008.1700.1
[Figure: Dryad submission workflow integrated with journal publication.
• author: prepare manuscript and related data files; submit manuscript; upload data
• JOURNAL: manuscript review; editor decision (accepted? yes/no); send article description; published article (with data citation)
• DRYAD: Dryad data package; curation by data curator; send data identifier (DOI); published data (with article citation)]
When using this data, please cite the original article:
Chave J, Coomes D, Jansen S, Lewis SL, Swenson NG, Zanne
AE (2009) Towards a worldwide wood economics spectrum.
Ecology Letters 12: 351-366. doi:10.1111/j.1461-0248.2009.01285.x

Additionally, please cite the Dryad data package:
Zanne AE, Lopez-Gonzalez G, Coomes DA, Ilic J, Jansen S,
Lewis SL, Miller RB, Swenson NG, Wiemann MC, Chave J
(2009) Data from: Towards a worldwide wood economics
spectrum. Dryad Digital Repository. doi:10.5061/dryad.234
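
The paired citation above can be sketched in code. This is a minimal illustration, not part of Dryad's software: the `CitedWork` type and the `cite_both` function are hypothetical names introduced here, and the shortened reference strings are abbreviations of the two citations on the slide. It assumes only that a DOI can be turned into a link by prefixing the public resolver `https://doi.org/`.

```python
from dataclasses import dataclass

# Public DOI resolver prefix; a DOI appended to it forms a resolvable link.
DOI_RESOLVER = "https://doi.org/"


@dataclass(frozen=True)
class CitedWork:
    """A citable object identified by a DOI (hypothetical helper type)."""
    short_ref: str
    doi: str

    def url(self) -> str:
        # Accept DOIs written with or without the "doi:" prefix used on the slide.
        bare = self.doi[4:] if self.doi.lower().startswith("doi:") else self.doi
        return DOI_RESOLVER + bare.strip()


def cite_both(article: CitedWork, data: CitedWork) -> str:
    """Return a two-line credit naming the article and its data package."""
    return (
        f"Article: {article.short_ref} {article.url()}\n"
        f"Data: {data.short_ref} {data.url()}"
    )


# The two works cited on the slide, with abbreviated reference strings.
article = CitedWork(
    "Chave et al. (2009) Ecology Letters 12: 351-366.",
    "doi:10.1111/j.1461-0248.2009.01285.x",
)
data = CitedWork(
    "Zanne et al. (2009) Dryad Digital Repository.",
    "doi:10.5061/dryad.234",
)

if __name__ == "__main__":
    print(cite_both(article, data))
```

Keeping the article and the data package as separate records, each with its own DOI, mirrors the slide's request that both be cited, so that reuse of the data is credited independently of the article.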
No fees for submission from low and
lower middle income countries
Dryad by the numbers
Data packages: 4,172
Authors: 15,581
Data files: 11,912
Integrated journals: 37
All journals: 268
File downloads: 4,629,256

Stats as of 23 Oct 2013
To learn more
•  Repository home: http://datadryad.org
•  News: http://blog.datadryad.org
•  Project documentation: http://wiki.datadryad.org
•  Twitter: @datadryad
•  Code: http://code.google.com/p/dryad

or contact us: 
•  http://datadryad.org/feedback 
•  Todd Vision, Director, tjv@bio.unc.edu
•  Laura Wendell, Dryad Executive Director, lwendell@datadryad.org

More Related Content

What's hot

Comparing the Amount and Quality of Information from Different Sequencing Str...
Comparing the Amount and Quality of Information from Different Sequencing Str...Comparing the Amount and Quality of Information from Different Sequencing Str...
Comparing the Amount and Quality of Information from Different Sequencing Str...jembrown
 
Leyva et al Chem&Biol 2010 (dragged)
Leyva et al Chem&Biol 2010 (dragged)Leyva et al Chem&Biol 2010 (dragged)
Leyva et al Chem&Biol 2010 (dragged)Hyunsun Park
 
02 f ijab 12-724, 011-018
02 f ijab 12-724, 011-01802 f ijab 12-724, 011-018
02 f ijab 12-724, 011-018IJAB1999
 
54 przemyslaw szafranski - 6287844 - compositions and methods for controlli...
54   przemyslaw szafranski - 6287844 - compositions and methods for controlli...54   przemyslaw szafranski - 6287844 - compositions and methods for controlli...
54 przemyslaw szafranski - 6287844 - compositions and methods for controlli...Mello_Patent_Registry
 
Genomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a modelGenomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a modelJohn B. Cole, Ph.D.
 
human_mutation_article
human_mutation_articlehuman_mutation_article
human_mutation_articleNeha Gupta
 
JSHS Poster
JSHS PosterJSHS Poster
JSHS PosterEric Zhu
 
Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...Laurence Dawkins-Hall
 
CSUPerb_2014_Calderon-Final
CSUPerb_2014_Calderon-FinalCSUPerb_2014_Calderon-Final
CSUPerb_2014_Calderon-FinalAlissa Calderon
 
The effects of banana peels on blood parameters of grower rabbits
The effects of banana peels on blood parameters of grower rabbitsThe effects of banana peels on blood parameters of grower rabbits
The effects of banana peels on blood parameters of grower rabbitsAlexander Decker
 
Alzheimer's Disease in Fruit Flies
Alzheimer's Disease in Fruit FliesAlzheimer's Disease in Fruit Flies
Alzheimer's Disease in Fruit FliesCathy_McElwain
 
A Primer to Bioinformatics: 29 September 2017
A Primer to Bioinformatics: 29 September 2017A Primer to Bioinformatics: 29 September 2017
A Primer to Bioinformatics: 29 September 2017DocSoc2017
 
ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake Jake Elwood
 
The Assessment Of Adma 1
The Assessment Of Adma 1The Assessment Of Adma 1
The Assessment Of Adma 1flic99
 
Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...
Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...
Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...CrimsonpublishersMedical
 
Research/ International Drug Discovery Science and Technology Conference 2017
Research/ International Drug Discovery Science and Technology Conference 2017Research/ International Drug Discovery Science and Technology Conference 2017
Research/ International Drug Discovery Science and Technology Conference 2017Green-book
 
ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake Jake Elwood
 

What's hot (20)

Comparing the Amount and Quality of Information from Different Sequencing Str...
Comparing the Amount and Quality of Information from Different Sequencing Str...Comparing the Amount and Quality of Information from Different Sequencing Str...
Comparing the Amount and Quality of Information from Different Sequencing Str...
 
GJB6
GJB6GJB6
GJB6
 
Grant Proposal 2006
Grant Proposal 2006Grant Proposal 2006
Grant Proposal 2006
 
Leyva et al Chem&Biol 2010 (dragged)
Leyva et al Chem&Biol 2010 (dragged)Leyva et al Chem&Biol 2010 (dragged)
Leyva et al Chem&Biol 2010 (dragged)
 
02 f ijab 12-724, 011-018
02 f ijab 12-724, 011-01802 f ijab 12-724, 011-018
02 f ijab 12-724, 011-018
 
54 przemyslaw szafranski - 6287844 - compositions and methods for controlli...
54   przemyslaw szafranski - 6287844 - compositions and methods for controlli...54   przemyslaw szafranski - 6287844 - compositions and methods for controlli...
54 przemyslaw szafranski - 6287844 - compositions and methods for controlli...
 
Genomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a modelGenomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a model
 
human_mutation_article
human_mutation_articlehuman_mutation_article
human_mutation_article
 
JSHS Poster
JSHS PosterJSHS Poster
JSHS Poster
 
Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...
 
CSUPerb_2014_Calderon-Final
CSUPerb_2014_Calderon-FinalCSUPerb_2014_Calderon-Final
CSUPerb_2014_Calderon-Final
 
14KoVar
14KoVar14KoVar
14KoVar
 
The effects of banana peels on blood parameters of grower rabbits
The effects of banana peels on blood parameters of grower rabbitsThe effects of banana peels on blood parameters of grower rabbits
The effects of banana peels on blood parameters of grower rabbits
 
Alzheimer's Disease in Fruit Flies
Alzheimer's Disease in Fruit FliesAlzheimer's Disease in Fruit Flies
Alzheimer's Disease in Fruit Flies
 
A Primer to Bioinformatics: 29 September 2017
A Primer to Bioinformatics: 29 September 2017A Primer to Bioinformatics: 29 September 2017
A Primer to Bioinformatics: 29 September 2017
 
ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake
 
The Assessment Of Adma 1
The Assessment Of Adma 1The Assessment Of Adma 1
The Assessment Of Adma 1
 
Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...
Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...
Crimson publishers-5-MethylcytosineDNA Methylation Patterns among Gut Predomi...
 
Research/ International Drug Discovery Science and Technology Conference 2017
Research/ International Drug Discovery Science and Technology Conference 2017Research/ International Drug Discovery Science and Technology Conference 2017
Research/ International Drug Discovery Science and Technology Conference 2017
 
ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake ASBMB_Poster_3_25_2015_Jake
ASBMB_Poster_3_25_2015_Jake
 

Similar to Data as research output; Data as part of the scholarly record

MS thesis presentation_FINAL
MS thesis presentation_FINALMS thesis presentation_FINAL
MS thesis presentation_FINALTom Hajek
 
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art... Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...Healthcare and Medical Sciences
 
Plang functional genome
Plang functional genomePlang functional genome
Plang functional genometcha163
 
A general model for the origin of allometric scaling laws in biology
A general model for the origin of allometric scaling laws in biologyA general model for the origin of allometric scaling laws in biology
A general model for the origin of allometric scaling laws in biologyJosé Luis Moreno Garvayo
 
Duong_H_2008a
Duong_H_2008aDuong_H_2008a
Duong_H_2008aHao Duong
 
2013_CarterEtal_MultiplexPCR-Cronobacter_ AEM
2013_CarterEtal_MultiplexPCR-Cronobacter_ AEM2013_CarterEtal_MultiplexPCR-Cronobacter_ AEM
2013_CarterEtal_MultiplexPCR-Cronobacter_ AEMMonica Pava-Ripoll
 
Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Robin Gutell
 
Science 2013-schuenemann-179-83 leprosy önemli
Science 2013-schuenemann-179-83 leprosy önemliScience 2013-schuenemann-179-83 leprosy önemli
Science 2013-schuenemann-179-83 leprosy önemliHazal Sav
 
dan.crawford.project.final
dan.crawford.project.finaldan.crawford.project.final
dan.crawford.project.finalDan Crawford
 
Gutell 056.mpe.1996.05.0391
Gutell 056.mpe.1996.05.0391Gutell 056.mpe.1996.05.0391
Gutell 056.mpe.1996.05.0391Robin Gutell
 
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Databricks
 
Genome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattleGenome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattleLaurence Dawkins-Hall
 
9th Student Conference for Conservation Science, UK 2008
9th Student Conference for Conservation Science, UK 20089th Student Conference for Conservation Science, UK 2008
9th Student Conference for Conservation Science, UK 2008Dr. Amalesh Dhar
 
1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by thAgripinaBeaulieuyw
 
1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by thsachazerbelq9l
 
1- Why was the Tomasetti et al article so misinterpreted by th.docx
1- Why was the Tomasetti et al article so misinterpreted by th.docx1- Why was the Tomasetti et al article so misinterpreted by th.docx
1- Why was the Tomasetti et al article so misinterpreted by th.docxjeremylockett77
 

Similar to Data as research output; Data as part of the scholarly record (20)

MS thesis presentation_FINAL
MS thesis presentation_FINALMS thesis presentation_FINAL
MS thesis presentation_FINAL
 
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art... Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 
Plang functional genome
Plang functional genomePlang functional genome
Plang functional genome
 
A general model for the origin of allometric scaling laws in biology
A general model for the origin of allometric scaling laws in biologyA general model for the origin of allometric scaling laws in biology
A general model for the origin of allometric scaling laws in biology
 
Duong_H_2008a
Duong_H_2008aDuong_H_2008a
Duong_H_2008a
 
2013_CarterEtal_MultiplexPCR-Cronobacter_ AEM
2013_CarterEtal_MultiplexPCR-Cronobacter_ AEM2013_CarterEtal_MultiplexPCR-Cronobacter_ AEM
2013_CarterEtal_MultiplexPCR-Cronobacter_ AEM
 
Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277
 
Science 2013-schuenemann-179-83 leprosy önemli
Science 2013-schuenemann-179-83 leprosy önemliScience 2013-schuenemann-179-83 leprosy önemli
Science 2013-schuenemann-179-83 leprosy önemli
 
Levitan
LevitanLevitan
Levitan
 
dan.crawford.project.final
dan.crawford.project.finaldan.crawford.project.final
dan.crawford.project.final
 
Poster_GCP_Knapp
Poster_GCP_KnappPoster_GCP_Knapp
Poster_GCP_Knapp
 
Gutell 056.mpe.1996.05.0391
Gutell 056.mpe.1996.05.0391Gutell 056.mpe.1996.05.0391
Gutell 056.mpe.1996.05.0391
 
Molecular phylogenetics
Molecular phylogeneticsMolecular phylogenetics
Molecular phylogenetics
 
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
 
Genome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattleGenome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattle
 
Data as research output; Data as part of the scholarly record

  • 1. Data as research output; Data as part of the scholarly record. Todd Vision, University of North Carolina at Chapel Hill, Dryad Digital Repository. SciELO15 · 24 October 2013 · São Paulo
  • 4.
  • 5. [Slide image: overlapping pages from published articles, including Burleigh et al. (2011), "Inferring the plant tree of life from gene trees" (Table 1, Summary of supertree bootstrap support from the GTP analysis; Figure 2, Average quartet similarity for each taxon among bootstrap trees); an article on a computational system to select candidate genes for complex human traits (Table 2, Tests using susceptibility genes for complex human traits); and an article on substitution patterns in RNA secondary structures (Table 3, Transition–transversion rate ratios; Fig. 2, Best-fit nucleotide substitution models for each alignment; Fig. 3, Transition–transversion rate ratios for each alignment).]
  • 6. Relatively little data is published within articles. [Diagram labels: published tables and figures; analysed data; raw data]
  • 7. Reuse of open data boosts citations to the original article. Piwowar and Vision (2013) doi:10.7717/peerj.175
  • 8. Most analyzed data is in the ‘long tail’, for which there is no specialized repository. [Figure: volume (y-axis) against rank frequency of datatype (x-axis), with regions labelled structured data (e.g. GenBank, GBIF) and long-tail data.] After Heidorn (2008) doi:10.1353/lib.0.0036
  • 9. Peer-to-peer data sharing does not work. Wicherts and colleagues requested data from 141 articles in American Psychological Association journals. “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied. Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) doi:10.1037/0003-066X.61.7.726
  • 10. Data is best captured at the time of publication. [Figure: information content (y-axis) against time (x-axis); labels: time of publication, specific details, general details, retirement or career change, accident, death.] (Michener et al. 1997)
  • 11. CC-BY Adamo http://www.piqs.de/fotos/121272.html. Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
  • 12. Joint Data Archiving Policy (JDAP): Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive. Authors may elect to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information. http://datadryad.org/pages/jdap
  • 13. High impact factor journals have stronger data archiving policies. [Figure values: IF=6.0, IF=3.6, IF=4.5; n=70] Piwowar HA, Chapman WW (2008) hdl:10101/npre.2008.1700.1
  • 14.
  • 15. [Workflow diagram labels: author; prepare manuscript and related data files; JOURNAL; submit manuscript; manuscript review; DRYAD; upload data; editor; accepted? (yes/no); send article description; Dryad data package; send data identifier (DOI); curation; data curator; published article (with data citation); published data (with article citation)]
  • 16. When using this data, please cite the original article: Chave J, Coomes D, Jansen S, Lewis SL, Swenson NG, Zanne AE (2009) Towards a worldwide wood economics spectrum. Ecology Letters 12: 351-366. doi:10.1111/j.1461-0248.2009.01285.x. Additionally, please cite the Dryad data package: Zanne AE, Lopez-Gonzalez G, Coomes DA, Ilic J, Jansen S, Lewis SL, Miller RB, Swenson NG, Wiemann MC, Chave J (2009) Data from: Towards a worldwide wood economics spectrum. Dryad Digital Repository. doi:10.5061/dryad.234
  • 17.
  • 18. No fees for submission from low and lower middle income countries
  • 19. Dryad by the numbers (stats as of 23 Oct 2013): data packages 4,172; authors 15,581; data files 11,912; integrated journals 37; all journals 268; file downloads 4,629,256
  • 20.
  • 21. To learn more: Repository home: http://datadryad.org; News: http://blog.datadryad.org; Project documentation: http://wiki.datadryad.org; Twitter: @datadryad; Code: http://code.google.com/p/dryad. Or contact us: http://datadryad.org/feedback; Todd Vision, Director, tjv@bio.unc.edu; Laura Wendell, Dryad Executive Director, lwendell@datadryad.org