Peter Fox (RPI) @taswegian
NFDP 2013
May 22, 2013, Oxford, UK
Fox-Keynote-Now and Now of Data Publishing-nfdp13
Keynote given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK

Published in: Education, Technology
  • http://brianwhitworth.com/STS/WordleCover.png Conversations about data: first-class object vs. discourse.
  • http://gbeaubouef.files.wordpress.com/2012/01/slide1.jpg http://eksouth.weebly.com/uploads/6/5/2/3/6523461/8933221.jpg Or lecture you, or cheer lead.
  • ICSU should establish a forum for the exploration and eventual agreement in relation to science of all the terms used under the broad umbrella of Open Access.
  • ICSU should engage actively with publishers of all kinds together with the library community and with scientific researchers to document and promote community best practice in the handling of supplemental material, publication of data and appropriate data citation. http://www.einstruction.com/files/default/files/publishers.jpg
  • http://www.aascend.org/wp-content/uploads/2012/03/argument-cartoon2.jpg
  • Frohlich et al.
  • The NCA report reader sees a figure and he/she wants to know where the data came from
  • http://www.instantdisplay.co.uk/metaphors.jpg Table 1. Parsons and Fox, DSJ 2013.
  • http://andreasgal.files.wordpress.com/2011/04/safety.jpg
  • There are lots of different kinds of audiences interested in data. While we are talking about using data in the classroom today, several other audiences are of importance to virtual observatories. In particular, on the more strategic end are groups that, while smaller, have great impact on the public’s and the government’s perception of the value of the data and its providers. In this category, I would place both science policy specialists and the media. Policy specialists and decision makers have a tremendous impact on budgets, but also feel, at least at some level, beholden to the taxpayers. They want to see the impact that data has on people’s lives. They are also looking for information that will help them make an informed decision. In addition, the media plays a critical role, providing about 85% of the science content to the general public. A third group that is worth considering is the educated general public (the science-attentive public). They take science very seriously and can be a vocal advocate for a scientific resource: look at the Hubble scenario as an example.

    1. 1. Peter Fox (RPI) @taswegian NFDP 2013 May 22, 2013, Oxford, UK The Now and Now for Data: Metaphors for Making Data Publically Available
    2. 2. Am not going to … http://mp-datamatters.blogspot.com/ Is Data Publication the Right Metaphor? http://dx.doi.org/10.2481/dsj.WDS-042
    3. 3. Just to get us going…
    4. 4. The latest (U.S. example)
    5. 5. International Council for Science – Strategic Coordinating Committee on Information and Data - recommendation http://eloquentscience.com/wp-content/uploads/2011/04/open_access.jpg http://www.icsu.org/publications/reports-and-reviews/strategic-coordinating-committee-on-information-and-data-report OECD guidelines = data access and sharing policies http://bernews.com/wp-content/uploads/2011/02/oecd-logo.jpg
    6. 6. ICSU SCCID recommendation • Engage actively – publishers of all kinds together – library community – scientific researchers • To – Document and promote community best practice in the handling of supplemental material, publication of data and appropriate data citation. http://www.leebullen.com/Finished%20Pics/Scientists.jpg ?
    7. 7. Goal? • Data as a first class object • As a subject of conversation (v. discourse) • Metaphors to achieve this abound and indicate a particular stakeholder perspective (worldview, bias, edict, etc…)
    8. 8. It seems we are not quite there yet • We* are having conversations (like the one today) about data+x (x=citation, publication, integration, integrity, ownership, trust, …) • * = ./ ../ // and / (Unix™)
    9. 9. What if we had a conversation about this data?
    10. 10. 20080602 Fox VSTO et al. 11
    11. 11. Metaphor! 12 Data Information Knowledge Producers Consumers Context Presentation Organization Integration Conversation Creation Gathering Experience • Ecosystem • A framework for talking about data, and …
    12. 12. Data perspective under some metaphors 13 Producers Consumers Quality Control Fitness for Purpose Fitness for Use Quality Assessment Trustee Trustor
    13. 13. For others: Is this separation good or not? 14 Producers Consumers Quality Control Fitness for Purpose Fitness for Use Quality Assessment Trustee Trustor Publisher “Reader” This may be us, or others
    14. 14. Technical advances From: C. Borgman, 2008, NSF Cyberlearning Report
    15. 15. Global Change Information System (GCIS) 16 Vision: A unified web based source of authoritative, accessible, usable, and timely information about climate and global change for use by scientists, decision makers, and the public.
    16. 16. Prototype Use Case
        Name: Discover and visit data center website of dataset used to generate report figure.
        Goal: The NCA Report reader sees a figure and wants to know where the data came from.
        Summary: A reader of the NCA is browsing the content via the website. He/she sees a figure and wants to know where the data came from. A reference to the publication in which the figure originated appears in the figure caption. Selecting the link to the source publication displays a page of information about the publication including, if available, the publication DOI. The page also includes references to the datasets cited in the publication. Following each of the dataset reference links presents a page of information about the dataset, including links back to the agency/data center webpage describing the dataset in more detail and making the actual data available for order or download.
        Actors: Primary Actor - reader of the NCA
        Preconditions: Reader is viewing the NCA online report
        Post Conditions: Reader visits the data center dataset website
        Normal Flow:
          1) System is presenting the NCA report to the reader in a web site. Presentation includes report figure with caption that includes reference to source publication.
          2) Reader selects publication reference in figure caption.
          3) System displays information about publication, including DOI (if available).
          4) Publication information includes publication dataset citations.
          5) Reader selects a dataset cited by the publication.
          6) System displays information about dataset including links to agency / data center webpages where more information and (potentially) data download links are available.
          7) Reader selects the data center link and is redirected to data center dataset webpage.
    17. 17. Assessment links to information 18
    18. 18. Non-specialist Use Case
        Name: Find Latest Datasets by Keyword
        Goal: Search for datasets associated with the keyword “snow”, list search results by recentness of publication.
        Summary: User story: I want to look for information concerning “snow.” I don’t know if it is a CLEAN word or a GCMD word or don’t even know what GCMD or CLEAN is. How would I do it, and what would I see on my monitor during the process?
        Assumptions: The reader is not assumed to have knowledge regarding the GCMD Keywords (or other) vocabulary.
        Actors: Primary Actor - reader of the NCA
        Preconditions: TBD
        Post Conditions: Reader is presented with a list of datasets associated with the keyword “snow” sorted by dataset publication date.
        Normal Flow: TBD
        Notes: We are looking into two user interface options for dataset selection by keyword:
          1) A free-text search where the user inputs “snow”.
          2) A faceted browse interface with a vocabulary facet which presents the user with terms from a structured vocabulary. The user can manually select the term(s) which match or contain “snow”.
          We intend to implement prototypes of both.
        Search for datasets with the keyword “snow”, ….
    19. 19. Parsons & Fox
    20. 20. Setting of the roles and relations • Yes it is about contracts… of all sorts… – An agency example, they are exploring a number of metaphors
    21. 21. An un-named US govt. agency
    22. 22. Data Review!
    23. 23. From my Research Data Alliance talk; #5 • Please all SNAP your fingers (1, 2, 3, NOW) • <snap> the culture around data has to change, as well as how we think about paradigms (metaphors)
    24. 24. Call to discussion • Multiple metaphors, many considerations • An ecosystem approach allows multiple solutions in a complex socio-technical system – transactions among providers and consumers – Significant opportunities for under-served data generators to get their data ‘out there’ perhaps publication (still a metaphor!) • Data Review !== Peer Review and more role disconnects • <discuss> • Please read our Data Science Journal essay and respond! • Thanks for your attention - pfox@cs.rpi.edu , http://tw.rpi.edu
    25. 25. Back shed
    26. 26. Pros/Cons - Data Centres (‘big iron’) • Volume • Streamlined • Automation • Auditable • Reprocessing capability • Central authority • Funded • Over-reliance on automation • Weak documentation • Use is assumed • Roles ill-defined, reputation? • Does not handle heterogeneity • Preservation ? • Overly focused on generation • …
    27. 27. Pros/Cons - Publishers • Simple • Tested • Disseminated • Shifted burden • Imprimatur • De-facto preservation • Citable • Based on science norms • Locked • Static/ • Not machine accessible • Cost? • Not scalable • Cannot verify use
    28. 28. Pros/Cons - Release (software) • Many stages (alpha, beta, release candidate, release) • Versioned • Documented and change notified • Intends to couple user feedback to developers • Packaged • Licensing well thought out • … • Provenance implicit • Preservation poorly dealt with • Quality may be difficult to determine • Attribution not part of the mindset • Derivative or embedded use not always well defined • …
    29. 29. Pros/Cons - Linked data • Scales • Built on web • Simple model design • Tested • Disseminated • Machine processable • No central authority • Heterogeneous • Use not assumed • Flexible evolution • Supports encapsulation • Poor versioning • Poor auditing • No imprimatur • No preservation/stewardship • Not human friendly • Heterogeneous vocab. • Changes data model • Unknown evolution • …
    30. 30. 33 .. Data has Lots of Audiences From “Why EPO?”, a NASA internal report on science education, 2005 More Strategic Less Strategic Science too!
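
The two GCIS use cases on slides 16 and 18 can be read as operations on linked records. The Python sketch below is only an illustration of that reading and is not the GCIS implementation: the class names, field names, function names and the example.org URLs are all hypothetical. It covers the slide 16 flow (figure, then source publication, then cited datasets, then data center page) and the free-text option from slide 18 (match a keyword, newest dataset first).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Dataset:
    title: str
    keywords: List[str]
    published: date
    data_center_url: str  # hypothetical landing page at the agency / data center


@dataclass
class Publication:
    title: str
    doi: Optional[str]  # slide 16: the DOI is shown only "if available"
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Figure:
    caption: str
    source: Publication  # the publication referenced in the figure caption


def data_center_links(figure: Figure) -> List[str]:
    """Slide 16 flow: figure -> source publication -> cited datasets -> data center pages."""
    return [ds.data_center_url for ds in figure.source.datasets]


def find_latest_by_keyword(datasets: List[Dataset], term: str) -> List[Dataset]:
    """Slide 18, free-text option: case-insensitive substring match, newest first."""
    needle = term.lower()
    hits = [ds for ds in datasets if any(needle in kw.lower() for kw in ds.keywords)]
    return sorted(hits, key=lambda ds: ds.published, reverse=True)


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    older = Dataset("Example snow cover", ["snow cover"], date(2010, 1, 1), "https://example.org/a")
    newer = Dataset("Example snow depth", ["Snow Depth"], date(2012, 6, 1), "https://example.org/b")
    paper = Publication("Example paper", None, [older])
    print(data_center_links(Figure("Example figure", paper)))
    print([ds.title for ds in find_latest_by_keyword([older, newer], "snow")])
```

The faceted-browse option on slide 18 would replace the substring match with a selection from a structured vocabulary; that is left out here because the slide marks the normal flow as TBD.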
