Is Data Publication the Right Metaphor?
Mark A. Parsons and Peter Fox
Rensselaer Polytechnic Institute
Research Data Publication in Principle and Practice
International Data Curation Conference Workshop
San Francisco, California
24 February 2014

Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License
A Community Conversation

	 	 okay, I'll say it. The *term* data 'publication' bothers me more and more.
Am leaning toward data release and *maybe* review, #CODATA2010
	 @taswegian
Most people think they can
get along perfectly well
without metaphor.

We have found, on the
contrary, that metaphor is
pervasive in everyday life,
not just in language but in
thought and action. Our
ordinary conceptual
system, in terms of which
we both think and act, is
fundamentally metaphorical
in nature.
Language is at once a surface phenomenon and a
source of power. It is a means of expressing,
communicating, accessing, and even shaping
thought. […]
Language gets its power because it is defined
relative to frames, prototypes, metaphor, narratives,
images and emotions. Part of its power comes from
unconscious aspects: we are not consciously aware
of all that it evokes in us, but it is there, hidden,
always at work. If we hear the same language over
and over, we will think more and more in terms of the
frames and metaphors activated by that language.
	 —George Lakoff
Data Publication

Moving from the library to the
internet. © photogj –fotolia.com
Big Iron

Image courtesy of SITEC
Map Making

“The Imperial Cartographer”
Science Support
Linked Data
Some attributes of the ideal system
• Trust (of data, system, and people)*: Difficult
• Discoverable data*: Easy
• Preserved data*: Easy
• Data are accessible to humans and machines*: Easy
• Usable, incl. some level of understandability*: Difficult
• Distributed governance*: Difficult
• Verifiable: Difficult
• Citable data: Easy
• Simple (in concept): Difficult
• Scalable/evolvable: Difficult
• Ethically open data: Difficult
• Appropriately transparent (translucent): Difficult
• Data are contextually associated: Difficult
• Handles distributed security, authentication, and legality: Difficult
• Defined roles: Difficult

*critical
How the current models perform
*critical

                        Data Pub.   Big Iron   Sci. support   Maps       Linked
Trust                   good        moderate   good           moderate   poor
Discovery               poor        moderate   poor           moderate   good
Preservation            good        poor       variable       poor       poor
Access                  moderate    moderate   moderate       good       good
Usable                  moderate    moderate   good           moderate   moderate
Governance              poor        good       poor           moderate   poor
Credit/Accountability   good        moderate   variable       poor       variable
Three perceptual frames of concern in
Data Publication
• Peer review
    • Data review ≠ literature review
    • Quality is in the eye of the beholder: “Facts all come with points of view. Facts don’t do what I want them to.” (Talking Heads)
    • We can’t keep up with the literature now.
• Data citation
    • Does a DOI imply an imprimatur? Why and what kind? What about other identifiers?
    • When do we need a citation vs. a simple pointer? When does credit play a role?
• Copyright and intellectual “property”
    • Data are used, referenced, and discovered well outside the scholarly article.
    • A copyrighted article should not be a primary path to data.
Do we need new and alternative metaphors?
Infrastructure?

Relationships not just physical
structures.
Ecosystem?

Data systems that grow, evolve,
and thrive on diversity.
Exploration	

Ref: M. Serres—Crossing the
Northwest Passage between
culture and science
Grand Bazaar?	

Spatial metaphors abound
photo by Frank Kovalchek (CC-BY)
Software Production?

“We should be treating data as an
ongoing process” (Schopf, 2012).
Contracts?

New forms; new agreements;
new parties
Disaggregating the functions
• A new paradigm of Archive, Release, Mediate, ... that disaggregates the
functions? Hence multiple metaphors.
• Formal, sustained archiving (like a museum or archive)
• Rapid, carefully versioned and described releases (like software)
• Simple, Weak (least power), Scalable, Open?
• Active mediation between producers and users (like specialist shopkeepers
filling niches)

• More metaphors, please.
A research agenda based on:	
	 Data Science in Action
• How do roles and relations change with different metaphors and world views?
• What are the new norms and contractual relations?
• What is the spectrum or space of referencing, citing, and relating? How much
does credit really matter? When?
• What approaches can bridge the domain, data, and computer science
disciplines into cohesive collaborations when needed? Is there a maturity model?
• What approaches in the data life cycle need to scale? How?
• How can research collections be discovered beyond the context of the scholarly
article?
• How do we track context instead of quality?
• What does “data as a first-class object” mean? Is it really necessary?
Thank You	

parsom3@rpi.edu
