The document discusses data curation services provided by the California Digital Library (CDL). It describes CDL's Merritt repository for stable storage, EZID for assigning persistent identifiers, DMPTool for creating data management plans, and tools for data discovery, citation, and preservation cost modeling. CDL supports the full data lifecycle from deposition to long-term curation and access. The document outlines how CDL's services have expanded over time to meet the growing needs of data producers and a changing technological landscape.
Supporting Data-Rich Research on Many Fronts
1. Supporting Data-Rich Research on Many Fronts
21 May 2012
University of California Curation Center
California Digital Library
2. California Digital Library
Serving the University of California
• 10 campuses
• 360K students, faculty, and staff
• 100s of museums, art galleries, observatories, marine centers, botanical gardens
• 5 medical centers
• 5 law schools
• 3 National Laboratories
CDL supports the research lifecycle
• Collections
• Digital Special Collections
• Discovery & Delivery
• Publishing Group
• UC Curation Center (UC3)
4. Our environment circa 2002-2008
Focus on preservation
For memory organizations
Infrastructure: static
Services: hosted
Content: museum & library
Sustainability: ?
5. Our environment since 2008
Focus on preservation, now curation (lifecycle)
For memory organizations and now data producers
Infrastructure: static + cloud, VM, bitbucket
Services: hosted + partnered, self-serve
Content: museum & library + research, web crawls
Sustainability: ? cost recovery, pay once
6. Today's journey
Data service basics at CDL
• Stable storage (Merritt)
• Stable identifiers (EZID)
• Data citation (DataCite)
• Management (DMPTool)
• Preservation cost modeling
... that enable
• Federation (DataONE)
• Data papers
• Capture (WAS web archiving)
• Excel add-in (DCXL)
7. The scientific record is at risk
Data dissemination is rare, risky, expensive, labor-intensive, domain-specific, and receives little credit as research output
Global Change
Galactic Change
8. The changing landscape
• Ever increasing number, size, and diversity of content
• Ever increasing diversity of partners, and stakeholders
• Decreasing resources
• Inevitability of disruptive change
– Technology
– Institutional mission
[Figure: chart with axes RESOURCES and TIME]
9. Stable storage: Merritt repository
• Curation repository open to the UC community and beyond
• Discipline / content agnostic
• Micro-services architecture
• Easy-to-use UI or API
• Hosted or locally deployed
Primary Functions
1. Deposit
2. Manage (metadata, versions, etc.)
3. Access (expose)
4. Share (with other researchers)
5. Preserve
10. EZID: Long term identifiers made easy
• Precise identification of a dataset (DOI or ARK)
• Credit to data producers and data publishers
• A link from the traditional literature to the data (DataCite)
• Exposure and research metrics for datasets (Web of Knowledge, Google)
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
Primary Functions
1. Create persistent identifiers
2. Manage identifiers (and associated metadata) over time
3. Resolve identifiers
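The "resolve identifiers" function can be illustrated with a minimal Python sketch. It assumes two public resolvers that the slides do not name: https://doi.org/ for DOIs and https://n2t.net/ for ARKs. The function name `resolver_url` and the sample identifiers are hypothetical, and the code only builds a URL; it does not call EZID.

```python
def resolver_url(identifier: str) -> str:
    """Build a resolver URL for a 'doi:' or 'ark:' identifier string.

    Assumption: DOIs resolve through https://doi.org/ and ARKs through
    https://n2t.net/. Neither resolver is named in the slides.
    """
    scheme, sep, rest = identifier.strip().partition(":")
    if not sep or not rest:
        raise ValueError(f"not a scheme-prefixed identifier: {identifier!r}")
    scheme = scheme.lower()
    if scheme == "doi":
        return "https://doi.org/" + rest
    if scheme == "ark":
        return "https://n2t.net/ark:" + rest
    raise ValueError(f"unsupported identifier scheme: {scheme!r}")


# Hypothetical identifiers, for illustration only.
print(resolver_url("doi:10.5555/example"))
print(resolver_url("ark:/12345/fk4example"))
```

Keeping the scheme-to-resolver mapping in one place reflects the point that the same dataset can be identified by either a DOI or an ARK.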
11. Discovery: DataCite consortium
• Technische Informationsbibliothek (TIB), Germany
• L'Institut de l'Information Scientifique et Technique (INIST), France
• Library of the ETH Zürich
• Library of TU Delft, The Netherlands
• Technical Information Center of Denmark
• Canada Institute for Scientific and Technical Information (CISTI)
• Australian National Data Service (ANDS)
• The British Library
• California Digital Library, USA
• Office of Scientific and Technical Information, US Department of Energy
• Purdue University, USA
12. DMPTool
Meeting funding agencies' data management plan requirements
• Connect researchers to resources to create a data management plan
• NSF and directorates, NIH, NEH, IMLS, foundations plus
• Customizable
Primary Functions
1. Step-by-step "wizard"
2. Templates and examples
3. Links to institutional resources and agency information
4. Plan publication and sharing
14. Cost Model 1: Pay as you go
• Billed/paid annually
$\begin{cases} P & \text{if year} = 0 \\ 0 & \text{if year} > 0 \end{cases}$
– Costs for archival System (A), Workflows (W), Content Types (C), Monitoring (M), and Interventions (V) are considered common goods, and are apportioned equally across all n Producers (P)
• Model components are represented by two terms: the number of units and the per-unit cost, e.g., k·S
– Storage cost (S) accounted on a per-Producer basis
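As a rough illustration of the pay-as-you-go model, the Python sketch below computes one Producer's annual bill. The full formula is not recoverable from the slide, so the way the terms are combined is an assumption: the common-good costs (A, W, C, M, V) are summed and split equally across n Producers, storage is charged as k units at per-unit cost S, and the Producer cost P is added only in year 0. The function name and all numbers are made up.

```python
def annual_bill(year: int, n: int, A: float, W: float, C: float,
                M: float, V: float, k: float, S: float, P: float) -> float:
    """One Producer's bill for a given year (assumed combination of terms)."""
    if n < 1:
        raise ValueError("there must be at least one Producer")
    shared = (A + W + C + M + V) / n       # common goods, split equally
    storage = k * S                        # units times per-unit cost
    first_year = P if year == 0 else 0.0   # the piecewise term on the slide
    return shared + storage + first_year


# Made-up figures: 10 Producers, 5 storage units at 20 each.
costs = dict(n=10, A=1000, W=200, C=100, M=50, V=150, k=5, S=20, P=300)
print(annual_bill(0, **costs))  # first year includes P
print(annual_bill(1, **costs))  # later years do not
```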
15. Model 2: Pay once, preserve for "T" years
• Paid-up price for fixed term T
– A function of r, the annual investment return, and d, the annual decrease in unit cost of preservation
– G is the cost of providing a year's preservation service; G0 includes the added first year expense of Producer engagement and registration
– Setting T = ∞ calculates the price for "forever"
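The slide names the inputs to the paid-up price, but the formula itself did not survive. The Python sketch below shows one plausible reading, which is an assumption and not CDL's published formula: the service for year t costs G·(1−d)^t, that amount is discounted by (1+r)^t, and the first year costs G0 instead of G. With T = ∞ the sum becomes a geometric series, which converges whenever (1−d)/(1+r) < 1. The function name is hypothetical.

```python
import math


def paid_up_price(G0: float, G: float, r: float, d: float, T: float) -> float:
    """Up-front price for T years of preservation (assumed discounting model).

    Year 0 costs G0. Year t >= 1 costs G * q**t in today's money, where
    q = (1 - d) / (1 + r) combines falling unit cost and investment return.
    """
    q = (1.0 - d) / (1.0 + r)
    if math.isinf(T):
        if q >= 1.0:
            raise ValueError("the 'forever' price is unbounded when q >= 1")
        return G0 + G * q / (1.0 - q)      # closed form of the infinite sum
    if T < 1:
        raise ValueError("T must be at least one year")
    return G0 + sum(G * q ** t for t in range(1, int(T)))


# Made-up figures: 5% return, no decrease in unit cost.
print(paid_up_price(G0=150, G=100, r=0.05, d=0.0, T=10))
print(paid_up_price(G0=150, G=100, r=0.05, d=0.0, T=math.inf))
```

Under this reading, a finite "forever" price exists only because costs fall or investments earn a return; if neither holds, q is 1 and the sum grows without bound.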
16. New distributed framework
Flexible, scalable, sustainable network
Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
19. Need to save data + processing
Algorithms + Data Structures = Programs
20. Vision for a "data paper"
• Wrap the unfamiliar in a familiar façade
• A "data paper" is minimally a cover sheet and a set of links to archived artifacts
• Cover sheet contains familiar elements: title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
• Just enough to permit basic exposure and discovery
– Building a basic data citation
– Indexing by services such as Web of Science, Google Scholar
– Instilling confidence in the identifier's stability
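The minimal "data paper" described above can be sketched as a small Python record: the cover-sheet elements plus a list of links to archived artifacts. The class name, field names, sample values, and the citation layout are all assumptions made for illustration; the slides do not prescribe a citation format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataPaper:
    """Cover sheet plus links to archived artifacts (hypothetical structure)."""
    title: str
    published: date
    authors: list[str]
    abstract: str
    identifier: str                      # persistent identifier, e.g. DOI or ARK
    artifact_links: list[str] = field(default_factory=list)

    def citation(self) -> str:
        """Build a basic data citation (assumed layout, not a standard)."""
        names = "; ".join(self.authors)
        return f"{names} ({self.published.year}). {self.title}. {self.identifier}"


# Hypothetical example values.
paper = DataPaper(
    title="Example Survey Data",
    published=date(2012, 5, 21),
    authors=["Doe, J.", "Roe, R."],
    abstract="A made-up dataset used to illustrate the cover sheet.",
    identifier="doi:10.5555/example",
    artifact_links=["https://example.org/archive/data.csv"],
)
print(paper.citation())
```

The cover sheet carries just enough metadata to build a citation and to be indexed, while the data stays in the archive behind the links.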
21. 43 public archives
120+ archives total
58K crawls
7,500+ sites
600 million+ URLs
40+ TB
24 institutions
Developed with LoC support by CDL, UNT, and others
22. What are people using WAS for?
Archiving at-risk government websites and publications
Archiving their own university domains
Building web archives to complement library collections
Documenting web coverage of significant events
23. Data curation for Excel
• Excel is the database of choice for many researchers
• Make it easy to share, archive, and publish data
• Keep up to date at dcxl.cdlib.org
Surveyed users and found:
• Most researchers are unaware of preservation options
• Documentation practices are poor
• Excel is just one tool in workflows
Primary Functions
1. An Excel add-in and web application
2. Metadata description (through extraction and augmentation)
3. Check for good data practices
4. Transfer to repository
24. A data curation approach at CDL
• New "data paper" publishing model [GBMF]
• DataCite consortium and citation standards
• Other fronts:
– DataONE global data network [NSF]
– Merritt: general-purpose data repository
– EZID: scheme-agnostic & de-coupled creation, resolution, and management of persistent ids
– Data management plan generator
– Web archiving service [Library of Congress]
– Open-source Excel add-in [MS Research & GBMF]