Is Data Publication the Right Metaphor?
Mark A. Parsons and Peter Fox
Rensselaer Polytechnic Institute
Research Data Publication in Principle and Practice
International Data Curation Conference Workshop
San Francisco, California
24 February 2014
Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License
A Community Conversation
okay, I'll say it. The *term* data 'publication' bothers me more and more.
Am leaning toward data release and *maybe* review, #CODATA2010
Most people think they can
get along perfectly well
We have found, on the
contrary, that metaphor is
pervasive in everyday life,
not just in language but in
thought and action. Our
system, in terms of which
we both think and act, is
Language is at once a surface phenomenon and a
source of power. It is a means of expressing,
communicating, accessing, and even shaping
Language gets its power because it is defined
relative to frames, prototypes, metaphor, narratives,
images and emotions. Part of its power comes from
unconscious aspects: we are not consciously aware
of all that it evokes in us, but it is there, hidden,
always at work. If we hear the same language over
and over, we will think more and more in terms of the
frames and metaphors activated by that language.
Some attributes of the ideal system
• Trust (of data, system, and people)*
• Data are accessible to humans and machines*
• Usable, incl. some level of understandability*
• Citable data
• Simple (in concept)
• Ethically open data
• Appropriately transparent (translucent)
• Data are contextually associated
• Handles distributed security, authentication, and legality.
• Defined roles
How the current models perform
Three perceptual frames of concern in
• Peer review
• data review ≠ literature review
• quality is in the eye of the beholder—“Facts all come with points of view. Facts
don’t do what I want them to.” (Talking Heads)
• We can’t keep up with the literature now.
• Data citation
• Does a DOI imply an imprimatur? Why and what kind? What about other
• When do we need a citation vs. a simple pointer? When does credit play a role?
• Copyright and intellectual “property”
• Data are used, referenced, discovered well outside the scholarly article.
• A copyrighted article should not be a primary path to data
Relationships not just physical
Data systems that grow, evolve,
and thrive on diversity.
Ref: M. Serres—Crossing the
Northwest Passage between
culture and science
Spatial metaphors abound
photo by Frank Kovalchek (CC-BY)
“We should be treating data as an
ongoing process” (Schopf, 2012).
New forms; new agreements;
Disaggregating the functions
• A new paradigm of Archive, Release, Mediate, ... that disaggregates the
functions? Hence multiple metaphors.
• Formal, sustained archiving (like a museum or archive)
• Rapid, carefully versioned and described releases (like software)
• Simple, Weak (least power), Scalable, Open?
• Active mediation between producers and users (like specialist shop keepers
• More metaphors, please.
A research agenda based on:
Data Science in Action
• How do roles and relations change with different metaphors and world views?
• What are the new norms and contractual relations?
• What is the spectrum or space of referencing, citing, and relating? How much
does credit really matter? When?
• What approaches can bridge the domain, data, and computer science
disciplines into cohesive collaborations when needed? Is there a maturity model?
• What approaches in the data life cycle need to scale? How?
• How can research collections be discovered beyond the context of the scholarly
• How do we track context instead of quality?
• What does “Data as a first-class object” mean? Is it really necessary?