2. Because good research needs good data
“Data is the new oil.”
Andreas Weigend, Stanford (ex Amazon)
“The future belongs to
companies and people that turn
data into products”
Mike Loukides, O’Reilly Media
2012-03-26 Kevin Ashley, DCC, UKSG Glasgow. CC-BY 2
3. Overview
• Why should we care?
• Things you could do
• How you might get there
• Things to avoid
4.
“Information… has become a saleable commodity like never before”
Brian Aldiss, “The Secret of This Book” (1995)
Yet – 33% don’t know the Earth orbits the Sun (GB, 1999)
5. What is data curation?
• “Maintaining, preserving and adding value to research data throughout its lifecycle”
• More than preservation:
• Active management – dealing with change
• Less than preservation:
• Lifecycle sometimes involves destruction
• Sometimes, not always, about sharing, publication or citation
6. Why care?
• Data is expensive – an investment
• Reuse:
• More research
• Teaching & Learning
• Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements
7.
Without good research data management (RDM) – BAD THINGS HAPPEN
With good RDM – GOOD STUFF HAPPENS
8.
EPSRC expects all those institutions it funds
• to develop a roadmap that aligns … with EPSRC’s expectations by 1st May 2012;
• to be fully compliant … by 1st May 2015.
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
9.
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
10.
“Data is the new oil.”
Andreas Weigend, Stanford (ex Amazon)
Data is more like soup – it’s messy and you don’t know what’s in it…
13. (e)-Research Life Cycle view of Data Curation?
[Diagram: research life cycle. Stages: formulate hypothesis/ideas, test, experiment, observe (data creation, collection & capture); data management (e-infrastructure storage & validation); adding value (data description, deposit, linking, annotation, visualisation, simulation); open access (self-archiving, preservation, certification); scholarly communications (data disclosure, publication, citation, discovery, re-use); (new) knowledge extraction (data mining, modelling, analysis, synthesis). Other labels: collaboration; data processing.]
This work is licensed under a Creative Commons Attribution-ShareAlike 2.0 License. Liz Lyon, December 2005
14.
Chris Rusbridge, DCC
15. OAIS (Open Archival Information System)
16. MoReq2 (Model Requirements for the Management of Electronic Records)
• Records Management discipline
• No mention of DATA
• Simple to explain
• Easily used to organise and present resources
17. E-Science Curation Report (2003)
• E-science discipline
• Appropriate for current focus
• Takes an integrated look at higher education data curation problems
• Granularity on curation activities?
18. InterPARES (2001)
20.
RLUK/Mary Auckland: Reskilling for Research
9 areas identified as skill gaps for subject librarians
Sheila Corrall: Libraries, Librarians and Data
Many action exemplars
21. Some library roles
• Leadership – coordinate action
• Audit – who has what, where does it go?
• Advice on access – data, wherever it is
• Preservation – permanence
• Citability
• Data/publication linking
• Promoting data in teaching
22. Understanding Data Requirements
http://www.dcc.ac.uk/
23. Data management plans
24. What data to keep
How to cite data
25. Data Licensing
• Bespoke licences
• Standard licences
• Multiple licensing
• Licence mechanisms
26. Tools to track impact
http://total-impact.org/
27. Findable, citable data has value
• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication
• All benefit: researcher, institution, publisher
MORAL: build a data registry
28. How?
• Create policy – collaborate with others
• Develop existing digital services
• Learn about audit tools (DCC & others)
• Learn about data & sources
• Reskill subject librarians
• Learn about your own data
• Bridge between publishers & researchers
29.
4. Audit/Assessment
Benefits:
• Prioritisation of resources
• Capacity development and planning
• Efficiency savings – move data to more cost-effective storage
• Manage risks associated with data loss
• Realise value through improved access & re-use
Scale: departments, institutions
Dealing with Data: Rec 4
31.
“The role of the Library in data-intensive research is important and a strategic repositioning of the Library with respect to research support is now appropriate.”
“there are…not enough specialised data librarians yet”
[Overlaid label, partly illegible; fragments read “CILIP Update”, “June 2008”, “UK”]
“Recommendation: The research library community in the UK should work with universities and research institutes to define properly and to formalise the role of data librarians, and to develop a curriculum that ensures a suitable supply of librarians skilled in data handling.” Dealing with Data: Rec 34
32. How?
• Create policy – collaborate with others
• Develop existing digital services
• Learn about audit tools (DCC & others)
• Learn about data & sources
• Reskill subject librarians
• Learn about your own data
• Help promote data literacy
• Bridge between publishers & researchers
34. Observations
• National & institutional roles differ
• BUILD on existing subject data centers
• Datasets aren’t publications
• Indistinct boundaries
• Continual change
• Multi-dimensional
• Non-linear
35.
“Institutions will try to preserve the
problem(s) to which they are the
solution”
Clay Shirky
36.
[Diagram: Original Source Data; Data Objects A, B, C, D; Publications A, B, C, D]
37. On Citing Data
• Peter Buneman. How to cite curated databases and how to make them citable. In Proceedings of the 18th Conference on Scientific and Statistical Database Management, pages 195-203, July 2006 [or http://homepages.inf.ed.ac.uk/opb/papers/ssdbm2006.pdf]
• Some serious computer science – some for a very general audience
38. Summary
• Data is not just an adjunct to publication
• Data is often living – treat it as such (and be ready to kill it)
• There’s more to the world than scholarly research
• Hidden data is wasted data
• Bad things happen without RDM
• Great benefits accrue with it
39. Questions
• How does data management align with institutional mission?
• When is the library a coordinator, and when is it a service provider?
• What will you do alone, and what will you coordinate with others?
• What skills must you acquire?
• What do you want from DCC?
Editor's Notes
Not just about openness – think of seismic data or the drug industry: protected data that still needs to be reused in other parts of the company, or many years after creation when the originators have gone. You need to know what you have and how to use it.