1. “Forever is composed of Nows”:
Long-term preservation of research data in an academic library
UKSG 2012
Glasgow, 26th/27th March 2012
Dr. Matthias Töwe
ETH Zurich, ETH-Bibliothek
2. OUTLINE
1. Background: issues and objectives
2. Current project
3. Roles
4. Vision
5. Limitations
6. «Nows» and caveats
3. BACKGROUND (I)
Challenges
• Research process as a whole relies on digital data
• Data can only be used in a defined technical environment, which usually remains stable for only a few years
• Good scientific practice requires retention of data in usable form
• Funding organisations require …
4. BACKGROUND (II)
Challenges
• Re-use of data becomes increasingly important and should be facilitated
• Data which cannot easily be reproduced and has permanent relevance must remain available
• Published or referenced supplementary material must be citable and remain …
5. MAJOR RISKS
• Data loss
Data cannot be found
• Loss of readability
Data cannot be rendered due to technical reasons (most often obsolescence of one required component such as application, operating system, hardware)
• Loss of interpretability
Data cannot be interpreted and …
6. DATA LOSS
Data cannot be found because…
• Their storage location is not known
• File or folder structures were changed without documentation
• Non-transparent redundancies and versions exist
• Persons originally responsible cannot be contacted
• Offline media are stored in unknown locations
• Offline media were damaged by …
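The first two causes of data loss (unknown storage location, undocumented restructuring) can be mitigated by recording a manifest of files and checksums at regular intervals. The following is only a minimal sketch under stated assumptions, not the procedure used at ETH-Bibliothek: the function names `build_manifest`, `find_moved_files` and `demo`, and the file names in the demo, are hypothetical.

```python
import hashlib
import os
import tempfile


def build_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 digest of its content."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def find_moved_files(old, new):
    """Return (old_path, new_path) pairs whose content is unchanged but whose path differs."""
    by_digest = {digest: path for path, digest in new.items()}
    return sorted(
        (path, by_digest[digest])
        for path, digest in old.items()
        if path not in new and digest in by_digest
    )


def demo():
    """Move one file inside a temporary folder and recover the move from two manifests."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "run1.csv"), "wb") as handle:
            handle.write(b"t,v\n0,1\n")
        before = build_manifest(root)
        os.mkdir(os.path.join(root, "archive"))
        os.rename(os.path.join(root, "run1.csv"),
                  os.path.join(root, "archive", "run1.csv"))
        after = build_manifest(root)
    return find_moved_files(before, after)
```

Comparing two manifests taken at different times documents a restructuring after the fact, even if nobody wrote it down when it happened. It does not help with files that were both moved and modified.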
7. LOSS OF READABILITY
Data cannot be rendered because…
• File formats are not recognized by current software or are not rendered correctly
• Software required for rendering or even editing data is no longer available
• Available older software cannot be run on current operating systems and/or hardware
Recovery «ex post» might even be …
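Whether a file format is "recognized" is commonly decided from the first bytes of a file (its signature or "magic number") rather than from the file extension. The sketch below illustrates that idea with a handful of well-known signatures. The table `SIGNATURES` and the function `identify_format` are hypothetical names for illustration; production tools rely on much larger, curated signature registries.

```python
# A few well-known file signatures; real registries hold many more.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(header: bytes) -> str:
    """Return the name of the first signature matching the start of the file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"
```

A result of "unknown" at ingest time is an early warning: it is far easier to ask the producer about an unidentified format now than to reverse-engineer it years later.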
8. LOSS OF INTERPRETABILITY
Data cannot be interpreted and used in a scientifically correct way because semantic information is missing, e.g. about…
• Sample taking and preparation
• Methods of measurement or data collection
• Known errors and corrections
• Level of data processing
• Methods of analysis and algorithms used
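The kinds of semantic information listed above can be treated as a checklist that is verified when data is handed over. The sketch below is an assumption for illustration only: the field names in `REQUIRED_FIELDS` are invented to mirror the five bullet points and do not follow any particular metadata standard.

```python
# Hypothetical field names mirroring the five kinds of semantic information.
REQUIRED_FIELDS = (
    "sample_preparation",
    "measurement_method",
    "known_errors_and_corrections",
    "processing_level",
    "analysis_methods",
)


def missing_documentation(record: dict) -> list:
    """List the required fields that are absent or empty in a documentation record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

Such a check only confirms that something was written down. Whether the documentation is correct and sufficient for the community remains a judgement for the researchers themselves.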
9. WHAT WE MEAN BY CURATION
What?                    Why?                               Who?
Data Curation            Ensure intellectual re-usability   Data Producers
Content Preservation     Ensure technical re-usability      ETH-Bibliothek
Bitstream Preservation   Ensure technical stability         IT-Services ETH Zurich
After Jens Ludwig, WissGrid
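Bitstream preservation, the lowest of the three levels, amounts to showing that the stored bits have not changed. A common way to do this is a fixity check against a checksum recorded at ingest. This is a minimal sketch, assuming SHA-256 as the checksum; the helper names `sha256_of` and `is_bitstream_intact` are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_bitstream_intact(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest of the data with the digest recorded at ingest."""
    return sha256_of(data) == recorded_digest
```

A passing fixity check says nothing about whether the format can still be rendered or the content interpreted, which is why content preservation and data curation are listed as separate levels.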
10. DIFFERENCES BETWEEN DATA TYPES?
What?                    Research data                              Library objects
Data Curation            Comprehensive documentation by producers   Full control of metadata and context
Content Preservation     More […] formats                           Mainly common standard formats
                         Same preservation procedures apply
Bitstream Preservation   „Any object is just bits“
11. OTHER VIEWS
(Many) people including potential partners (IT, research)
• Tend to mix up long-term storage (bitstream preservation) and long-term preservation (keeping data usable)
• Take preservation for granted, once data is reliably stored
• See the need to change and improve current practice in data management with the option of long-term …
12. «OUR» ROLE AND «THEIRS»
• Can we actually «raise awareness» with researchers?
• Is it really useful to bother researchers with technical background – unless asked for?
• There are researchers with a high level of awareness and concern
• Best start with those who actually …
13. COULDN’T RESEARCHERS DO IT THEMSELVES?
• Data management and digital curation handled by researchers themselves:
  • Possible in principle and sometimes done very well
  • Time consuming
  • Supportive of research productivity
  • Not productive research in itself
14. WHY DOES ETH-BIBLIOTHEK BOTHER?
• Infrastructure services such as ETH-Bibliothek and IT services
  • Support the research process
  • Can offer services to ease workload of routine tasks for researchers
  • Rely on scientists to define their requirements
  • Rely on researchers to document their data according to community needs
  • Exploit synergies in order to …
15. WHY THE LIBRARIES?
• Reputation of scientific libraries as long-lived/permanent institutions
• Organising and managing information is seen as a task where librarians can contribute their experience
• Building on former (obviously positive) track record, there should be a basis of trust
16. NEW TASKS FOR LIBRARIES
• We can be service providers – if we have a service to offer
• We take on a new role:
  • In addition to delivering information to researchers…
  • …we now offer services around their own data…
  • …which we often cannot even make publicly accessible.
• New tasks call for a new professional profile («data …
17. VISION - USER’S VIEW
18. VISION – SYSTEMS VIEW
[Diagram: archive system architecture with the following labelled blocks]
• Content sources: Primary data, Secondary data, (E-Depot), GEVER, Additional sources
• Additional Ingest components: Deliveries / Preprocessing, Ingest Requests
• Archival core modules: Preservation Planning, Administration, Data Management, Archiv-DB, Ingest, Access, Archival Storage
• Additional search/retrieve & delivery components: Catalog, Admin-Interface, Repositories
• Access components
• Other labels: Producer, User, Create, Retrieve
• Storage-Layer
• Security-Layer (Authentication, Authorisation)
Source: Aliesch, P. et al., 2007: Projekt „Pilot Langzeitarchivierung“. Abschlussbericht zur zweiten Phase „Pilot Langzeitarchivierung“ (final report on the second phase), pp. 23f. Internal.
19. ROSETTA
[Diagram: three stages]
• Data production and handling for current analysis
• Pre-ingest, e.g. structuring, re-arranging, selecting: manually or (semi-)automatically
• Long-term preservation according to OAIS
20. VISION
[Diagram: the systems view of slide 18 overlaid with the three stages of slide 19: data production and handling for current analysis; pre-ingest (e.g. structuring, re-arranging, selecting), manually or (semi-)automatically; long-term preservation according to OAIS]
Source: Aliesch, P. et al., 2007: Projekt „Pilot Langzeitarchivierung“. Abschlussbericht zur zweiten Phase „Pilot Langzeitarchivierung“ (final report on the second phase), pp. 23f. Internal.
21. LIMITATIONS
There are general limits to what we can do
• We need to make decisions now…
• …which influence if and how data can be used in future.
• We do not know…
  • Who will use data
  • When data will be used
  • For which purpose data will be used
• «Someone» needs to commit now to paying for «eternity»
22. ONLY RESEARCH DATA?
• These limitations are not specific to research data…
• …but they are more pronounced in research:
  • High mobility of staff, fluctuation in responsibilities
  • Dynamic development of methods
  • Data management not always considered a priority
  • Multitude of formats
23. THE TROUBLE WITH «NOWS»
Long-term preservation is no one-off activity
• Each generation has to act according to its best knowledge
• Usually, the aim is to hand over usable data to the next generation of curators
• The overall quality of the preservation chain is governed by the preservation step with the lowest quality
24. WHAT CAN BE DONE «NOW»
Examples for the «nows» in research data
• Only now can we communicate with data producers
  • Find out what their needs are
  • Define the required services
  • Make producers document their data
  • Discuss alternative formats where necessary
25. CAVEATS
• Digital curation cannot «improve» data retroactively: «garbage in – garbage out»
• Therefore researchers need to actively contribute (e.g. documentation)
• Who decides about data when the producer is no longer available?
• Data can be made publicly available, but this must not be a prerequisite
26. MORE CAVEATS
• Written agreement between data producer and data archive on formats, procedures and access rights
• Management of active data not treated in current project…
• …but we need to provide comfortable routes to bring research data into the archive
• There is no absolute safety against willful attacks
27. EVEN MORE CAVEATS
• «The art of communicating with the future»:
  • We now try to minimize risks with reasonable effort in order to avoid their occurrence in future
  • Together with producers we can only make educated guesses at who might want to use data for what kind of purpose
• No «rocket science», but an ongoing …
28. THANK YOU VERY MUCH!
Questions?
Dr. Matthias Töwe
Head Digital Curation
ETH-Bibliothek
Rämistrasse 101
8092 Zürich
Switzerland
+41 (0)44 632 60 32
matthias.toewe@library.ethz.ch
http://www.library.ethz.ch