1. How libraries can help rescue data
9 February 2017
Download pdf for reading!
2. Levels of library commitment
Research Quality Data: using workflow protocols to ensure quality & trust
Level 1: We can survey our researchers to nominate data sets & raise awareness in our community
Level 2: We have Archive-It & can do deep web archiving & document “uncrawlables”
Level 3: We know of specific data our community needs; we will rescue them.
Level 4: We can harvest “uncrawlables” for an agency through a data rescue event or dedicated team
5. Survey researchers & raise awareness (level 1)
Ask researchers & use the Nomination Form to collect data they use & need
Raise awareness by:
attending & writing about a Data Refuge event
highlighting ways your repository can preserve data
holding workshops & panels on data storytelling
10. Web archive & document uncrawlables (level 2)
Use the US Agency Coordination Spreadsheet to see what agencies can be claimed
Submit a request to claim an agency & use your Archive-It account to go as deeply as possible
Document “uncrawlables” as you run into archiving errors
HELP! We need a Chrome plug-in to launch an uncrawlables form to make this easy & consistent across all libs! (similar to this one)
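The point of a shared uncrawlables form is that every library records the same fields in the same shape. As a rough illustration only, here is a minimal Python sketch of such a record. The class name, the field names, and the example values are all hypothetical; they are not taken from any DataRefuge form or plug-in.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass
class UncrawlableRecord:
    """One page or data set a web crawl could not capture (hypothetical fields)."""
    url: str
    agency: str
    reason: str        # free text, e.g. "database query form"
    reported_by: str
    reported_on: str   # ISO 8601 date string


def to_json_line(record: UncrawlableRecord) -> str:
    """Serialize a record as one JSON line with keys in a stable order."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only; none of these come from a real report.
example = UncrawlableRecord(
    url="https://example.gov/data/query",
    agency="Example Agency",
    reason="database query form",
    reported_by="Example University Library",
    reported_on=date(2017, 2, 9).isoformat(),
)
line = to_json_line(example)
```

Writing one JSON line per uncrawlable keeps reports from different libraries easy to merge, whatever form or plug-in eventually collects them.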
16. Rescue community-specific data (level 3)
If you already know that there are data sets of high value & high priority for your community, get them!
Use your library’s preservation Workflow Protocols in the spirit of github.com/datarefuge/workflow
It’s only DataRefuge if it follows QA processes for a trusted chain of custody
We are working on a CKAN Registry to link to downloaded files in your repository
Let us know when you’re done by claiming it on the US Agency spreadsheet!
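One common building block for a trusted chain of custody is a checksum manifest: record a digest for every file at download time, then re-check it later. The sketch below shows that general fixity technique in Python. It is an assumption about what a local QA step might look like, not the procedure defined in github.com/datarefuge/workflow, and the function names and sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems


def demo() -> tuple[list[str], list[str]]:
    """Build a manifest in a temporary folder, then alter one file."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "readings.csv").write_text("station,value\nA,1\n")
        manifest = build_manifest(folder)
        before = changed_files(folder, manifest)
        (folder / "readings.csv").write_text("station,value\nA,2\n")
        after = changed_files(folder, manifest)
    return before, after
```

Calling `demo()` returns the problem list before and after the file is altered, so the second list names the changed file. In practice the manifest would be stored alongside the data set so a later holder can repeat the check.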
22. Harvest all uncrawlables (level 4)
This is the highest level of commitment. You are dedicated to harvesting all of the data you can find.
Claim an Agency using the spreadsheet
Host a Data Rescue Event or designate an internal library team
Use established library preservation workflow protocols to maintain trusted chain of custody
We are working on a CKAN Registry to link to downloaded files in your repository
23. Repeat the cycle as needed
Claim an agency → Document uncrawlables → Harvest uncrawlables → Verify/QA uncrawlables → Register uncrawlables → Update spreadsheet
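A team tracking several agencies at once may find it useful to model the cycle explicitly. The sketch below is only an illustration: the stage labels come from this slide, but the `Stage` enum and `next_stage` helper are hypothetical names, and it assumes the stages run in the order listed.

```python
from enum import Enum


class Stage(Enum):
    """The six steps of the rescue cycle, in the order listed on the slide."""
    CLAIM_AGENCY = "Claim an agency"
    DOCUMENT = "Document uncrawlables"
    HARVEST = "Harvest uncrawlables"
    VERIFY_QA = "Verify/QA uncrawlables"
    REGISTER = "Register uncrawlables"
    UPDATE_SPREADSHEET = "Update spreadsheet"


def next_stage(current: Stage) -> Stage:
    """Return the stage after `current`, wrapping around to the first stage."""
    stages = list(Stage)
    return stages[(stages.index(current) + 1) % len(stages)]
```

Because the last stage wraps back to the first, `next_stage(Stage.UPDATE_SPREADSHEET)` returns `Stage.CLAIM_AGENCY`, which mirrors the "repeat as needed" loop.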
25. URLs
Nominate data sets: docs.google.com/forms/d/e/1FAIpQLSd8JeRxvyVrBASrc4D42Z6nz8yIsQuu_cGVRtO5uWO9yjlBFw/viewform?c=0&w=1
US Agency spreadsheet: docs.google.com/spreadsheets/d/1yIrhFrZkv2Yhdk48W_P5bd-C-jxLtbzJFcm2E5oq-ec/edit
Claim an Agency form: docs.google.com/forms/d/e/1FAIpQLScE7iuLIEbEd0hkkP9_zB5skXwKqL8EeW9hVlB4JSIkvCvm6Q/viewform?c=0&w=1
Workflow protocols example: github.com/datarefuge/workflow
Chain of custody: github.com/datarefuge/chain-of-custody