Lizzy Jongma's presentation 'You've done it all. You've broken every code' during Digital Preservation for the Arts, Social Sciences and Humanities (DPASSH) 2017 in Brighton (UK), on the 14th of June.
Lizzy Jongma, Network Dutch War Collections // NIOD Institute for War, Holocaust and Genocide Studies
@LizzyJongma
L.Jongma@niod.knaw.nl
www.oorlogsbronnen.nl
Thank You
Editor's Notes
Today I would like to take you with me on my own personal journey into the digital world…
I was born in a different age, a different era: the Seventies… We had cookies, we had milk. We had children's television on Wednesday afternoon.
We had no computer, no iPad, no smartphone. Not even a small laptop.
I am a digital immigrant: I moved into a world I was not born in, and even though I try to catch up as quickly as I can, I don't think I will ever be a “local”.
I don't know if any of you remember this game? We played it at the beginning of the millennium.
Googlewhack: try to find a combination of words, like
squirreling dervishes
and get only ONE result.
Ranking was important, but not as important as it is now, when you and your data are always lost in piles and piles and piles of data.
We did everything wrong…
I started digitizing collections when I was in university. We flipped prints onto flatbed scanners. Occasionally we bent a corner of a print, or used too much light to scan the image.
We didn't know about lighting, pre- or post-scan corrections, standardized, embedded metadata, durable/archival image formats…
Like Baron von Münchhausen, we had to pull ourselves out of the swamp by our own hair!
I love to show this picture, because it also tells us something about fashion and time and the way they influence our perception.
This image is a medley of two images: one fully colour-managed version of this painting and one bad scan from a Seventies art history textbook.
Which is which and which one is available online as a hand painted reproduction?
Recently the newly appointed director of the V&A was badly misquoted when he noted in an interview that huge amounts of money were spent (or misspent) on the digitization of collections, because the differences in quality were so big.
The debate it stirred was all over the place. So I am not sure, in fact I don't think, that I am quoting the director of the V&A here, but I do think this headline represents a general feeling in the audience.
I would like to pick up on two subjects in this debate:
The fact that money immediately comes into the debate.
In my humble opinion the economization of our world is a big threat to art, to history, to our culture, to our political system.
All of our values are expressed in money, in profits, in business plans.
We don't run museums because they improve the quality of life of the visitors. They are tourist attractions, and tourists mean money.
We wrote business plans for Open Data, for the Network of War Resources, for Europeana. Selling images is business, open data is big business. Or good PR with economic value.
Better things could be done with all that money: stop digitizing and start doing things in your museum, like exhibitions.
But wait…
Is money better spent on exhibitions? Where do our audiences flock together? Where does half of our population reside? Right… online!
My youngest daughter is seven now. When she was two, she walked to the television set and tried to swipe the program that was on.
When we told her that you can't swipe television, she lost all interest.
She does not understand the time-based principle of television.
And it is not just kids! Parents are just as bad! Recently there were a lot of articles in youth development magazines: kids complain about the telephone habits of their parents. Parents are absent-minded and on their phones so much that kids feel neglected.
Even up to the beginning of this decade museum directors could make remarks like “I don't believe in the Internet”… It is not a religion, it is a real place where everyone goes!
I had the privilege to work at the Rijksmuseum in Amsterdam when the museum was closed. The museum was closed for 10 years… and the entire world outside our walls was changing while we were waiting for our new galleries.
But it gave us time to think about collections in the 21st century. And some of our conclusions were, and still are, quite revolutionary (by the way, most of us had never worked in a real museum with real visitors; heck, I had worked in an archival institution):
To be Open you have to be open and generous. You have to give it all away (open data).
There is no bigger exhibition space than the internet. You are not limited by time, space, conservation criteria or the walls of institutions. You are limited by your own imagination, technical skills and the openness of your colleagues.
Even on the internet people love to curate, collect and combine stuff and… share it.
Rijksstudio, the Pinterest-like part of the Rijksmuseum website, was created as a fun extension on the side. Currently it is the core of the Rijksmuseum website and it even inspired the Rijksmuseum to start the Rijksstudio awards.
I was fascinated by the fact that Rijksstudio became such a hit:
People spent 12 minutes on the Rijksmuseum website! On an iPad. In their free time.
Even curatorial staff picked up on this and started promoting their own collections in Rijksstudio.
For instance Mattie Boom, curator of Photography, has found exciting new ways to promote old photographs from the collection online. She has a huge fan base.
A lot of you will say: well, that's fun and all… but how about research, e-humanities, research data…
As we all sit here today we are inventing new sciences, new ways to practise science…
Most of you will probably not know that the Rijksmuseum has a big research department. Their main field of expertise is restoration and conservation: paintings, paper, textiles, metals etc.
In 2012 I got involved in paint sample research at the Rijksmuseum and I was in shock. Research material was digitally recorded and photographed in the worst ways thinkable. And I don't want to blame my colleagues: this was (maybe still is) common practice.
The black and white image on the right is not the worst image of a paint sample… but it is unbelievable that paint colours are described, in a full colour era, on a black and white image!
We wanted to record paint samples true to colour and make all paint sample data fully accessible… no one else had ever done this. We had to build the tiniest pico card on the planet to colour manage paint samples. It fits under a microscope. We also had to build our own information systems and structure our data.
Other colleagues took it further in the Jeroen Bosch project and built beautiful apps on restoration and conservation, making all layers visible to a big audience.
But making stuff available in apps or websites is not enough!
Research data should not just be generated to write theses or build apps.
Researchers need data, big, digital amounts of data to discover, explore, mark, visualize etc.
So as researchers: don’t just look at GLAMs for data. Also look at the person sitting next to you and ask him or her if he or she is making research data available in an open, reusable format! Your colleagues are not just interested in stealing your data or attacking your results. They also need data to do their own research.
Currently there is a big gap between the services built for research and researchers and for human/big audiences.
Even though I am asking you all (both researchers and GLAMs) to produce Open, Structured data… I don't think researchers have specific psychic skills.
So why do we freak researchers (and aggregators like myself) out with clumsy, undocumented, slow and unreliable endpoints?
Did we promise the boss to be open, while we don't really believe in it?
How many of you have tried to work with harvest scripts, APIs or SPARQL endpoints that are broken or that broke down halfway through the process…
We all grew up with Science Fiction: good against evil. Humans versus aliens, computers or alien computers… I know that most people are scared by the thought that computers will take over.
Too late, too late, Steve Harley would say. Computers have already taken over. Traffic, schools, healthcare… Our banking system.
I haven't seen any applications of blockchain technology in the GLAM sector yet. But I have learnt to be prepared and bought my way into a couple of cryptocurrencies and technologies to create cryptocurrencies. So far I haven't been very successful, but better be prepared…
-------
Most GLAMs are unaware of a big and fast growing audience for their data: COMPUTERS!
In the next decade computers will take over the internet. Humans will no longer be the searchers. Machines and scripts will gather data, big data.
Computers will harvest and scrape, combine and connect, analyse and predict.
Linked Open Data is the basis for machine learning/machine understanding. It is the basis of Business Intelligence (2.0) and an open, web based structure to annotate information. It is something every historian is trained in: it is a structure for referencing your data.
But… when it comes to computers, most historians forget to annotate where they got their information from.
And this makes data from GLAMs a nightmare for computers!
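What a source reference could look like for a machine can be sketched in a few lines of Python. This is only an illustration: the `Statement` class, the `unsourced` helper and the example.org identifiers are assumptions made for this sketch, not the data model of any actual GLAM system or Linked Data standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """One claim about an object, plus a reference to where it came from."""
    subject: str
    predicate: str
    value: str
    source: Optional[str] = None  # hypothetical: URI of the record or document cited


def unsourced(statements: List[Statement]) -> List[Statement]:
    """Return the statements that cannot be traced back to a source."""
    return [s for s in statements if not s.source]


# Hypothetical example data; the example.org identifiers are placeholders.
statements = [
    Statement("http://example.org/object/1", "depicts", "crane",
              source="http://example.org/catalogue/record/1"),
    Statement("http://example.org/object/1", "date", "early 17th century"),
]

for s in unsourced(statements):
    print(f"No source given for: {s.subject} {s.predicate} {s.value!r}")
```

The point of the sketch is that the reference travels with each individual claim, so a script can tell a documented statement from an undocumented one.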
Actually, we are now working on computer programs to clean up the mess people created when they started describing objects and archival documents.
This is what we call our matching machine: it is software that analyses data and matches structured datasets and thesauri with the metadata of objects, images, archives etc.
The software is able to understand preferred and non-preferred terms, to spot misspellings, misuse of computer programs etc.
And it also communicates with its users: asking you if the interpretations are right.
This software (there is also other open software like OpenRefine and LOD Laundromat) is able to match data and create Linked Open Data. You don’t have to do this manually.
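The kind of matching described above can be illustrated with a minimal Python sketch. It is not the matching machine itself: the tiny thesaurus, the function names and the 0.8 similarity cutoff are all assumptions for the example, and the fuzzy step uses `difflib.get_close_matches` from the Python standard library as a stand-in for whatever the real software does.

```python
from difflib import get_close_matches

# Hypothetical mini-thesaurus: preferred term -> non-preferred variants.
THESAURUS = {
    "Camp Vught": ["Kamp Vught", "Herzogenbusch"],
    "Sobibor": ["Sobibór"],
    "Amsterdam": ["A'dam"],
}


def build_index(thesaurus):
    """Map every known label (lower-cased) to its preferred term."""
    index = {}
    for preferred, variants in thesaurus.items():
        index[preferred.lower()] = preferred
        for variant in variants:
            index[variant.lower()] = preferred
    return index


def match_term(raw, index, cutoff=0.8):
    """Return (preferred_term, status) for a raw metadata value.

    status is "exact" for a known label, "suggested" for a fuzzy match
    that a person should still confirm, and "unmatched" otherwise.
    """
    key = raw.strip().lower()
    if key in index:
        return index[key], "exact"
    candidates = get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if candidates:
        return index[candidates[0]], "suggested"
    return None, "unmatched"


index = build_index(THESAURUS)
for value in ["kamp vught", "Sobibbor", "Bergen-Belsen"]:
    term, status = match_term(value, index)
    print(f"{value!r}: {status} -> {term}")
```

Values with the status "suggested" are the ones the software would put to its users as a question, in line with the note above about asking whether an interpretation is right.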
There is good news for us, humans! Even though computers are taking over the internet, they still work on our behalf. For us.
For me this is the best example of computer aggregated data served back to a human audience. The Google Info Box.
Gathered from DBpedia and presented in a well designed, structured format. Easy to read and helpful if you are looking for related information.
There is a Dutch proverb that says it is better to steal a good creation than to come up with a bad one yourself.
So I would like to present our own version of the Info Box. Unfortunately our information is in Dutch, but we used our thesaurus to create encyclopaedic information boxes to help and inform our audiences.
But, Houston, we have a problem.
Computers are built to rank relevancy based on actuality (current relevance) and/or economic value. Google, but also Twitter, Facebook etc. are blind to history.
And even historic events are described in time lines, starting today…
Our resources, our data, our research can only get on or stay on the timelines of Google, Facebook and Twitter if we create our own digital actuality.
A tweeting world war, Pinterest boards with World War fashion.
We can extend the impact of our data, by opening up our collections to external partners with big impact.
The Rijksmuseum shared its paintings collection with Wikimedia Commons and these are the statistics of the page views of articles with or about Rijksmuseum Art.
This is what I call impact!
Actually… this is what I call impact!
This is when science, art and a scientist (Antonie van Leeuwenhoek) beat Donald Trump.
Now that is what I call news!
But let's return to the analogue world.
Where most resources don't beat Donald Trump. They never get a chance to beat the POTUS, simply because they are not digitized and are put away in brown boxes.
We estimated that a maximum of 7% of War Resources in the Netherlands is digitized. And I guess this percentage goes for most historical and art historical resources.
So first and foremost: we need to digitize – big time/big scale!
We can’t continue at our current speed and with our current analogue working processes.
We need to come up with digitized ways to digitize collections:
conveyor belt digitization
computers weighing, measuring, recognizing and adding metadata to objects
Linked Data annotation and translation of texts
non-synchronous digitization…
GLAMs think they are unique in their digital challenges, because they hold unique collections.
But please, collection managers in this room: go spend a day in a modern warehouse. Check the computer systems and equipment they’ve got there!
Some of you, maybe most of you, have run or worked on a crowd sourcing project.
It is an excellent way to digitize collections.
There is a big difference between Crowd sourcing and tagging.
The first crowd sourcing projects, Steve.museum and the Powerhouse 2.0 collection, experimented with the power of the audience.
So many people with so much knowledge… The potential! But how to tap into this collective knowledge?
The first projects allowed users to tag collections.
And although you are not Dutch you may recognize some words: someone was able to identify the painter. His given name…
And then the issues with tagging started.
If you look at the results of tagging projects, you can see that the systems are polluted and abused and the results usually aren't very smart. Even asking audiences to name colours can lead to weird results and computers are actually much better at it than humans!
Currently we try to get experts involved. It's crowd sourcing with experts: we call it niche sourcing!
Experts are people from the audience with knowledge on specific subjects. The kind of knowledge you don't have in your institution: bird lovers, fashion freaks, genealogists.
These people are able to recognise, to see information and interpret information that is completely out of your institutional scope.
I remember this bird expert who found a specific type of crane painted on a chest that was dated to the early 17th century. This crane had only been scientifically described one year earlier, and the bird expert was amazed that the painter had already seen specimens of this newly found species. It shows how quick trading routes were in the seventeenth century.
People can also collaborate because they want to help. We are currently crowd sourcing camp cards. Concentration camp records are very emotional. They are more than just pieces of paper. We have 160 volunteers helping us to transcribe the handwritten or chaotically typed cards. We launched this project a week ago: we have approximately 30,000 cards. On Monday 6,500 cards were transcribed. Twice.
Last year I left the Rijksmuseum to start at the Network for Dutch War Resources, a digital platform for war collections. Our portal currently holds 41 collections from 35 different institutes: 9.5 million items. Next year we plan to connect 30 more collections.
We want to become a new kind of research institute / infrastructure: fully digital, subject driven – not collection driven.
In the Netherlands there are 400+ institutes with minor or major war collections. No researcher will be able to research all collections. Hardly anyone knows the whereabouts or even the existence of all these collections.
We help organizations digitize their collections, add metadata and structure them, so we can harvest the digital collections and connect them to other, similar or relevant collections.
We need institutional friends to fill our portal. But we also make friends by helping organizations make their collections digitally accessible.
We even teach some computer tricks to make life easier.
Currently we are moving beyond physical objects…
We are moving into the field of tertiary data. Primary digital data is the digitized object itself. Secondary data is metadata: data about objects. Tertiary data is metadata about the setting of an object or subject: people, places, events, concepts, dates or named events.
We are currently collecting People and personal events in the lives of people. This is the first preview of our People Portal WW2!
We have gathered data about approximately 200,000 individuals. But this is just the tip of the iceberg.
We are working on events, even on digital event modelling so computers can recognize and interpret events.
Events in personal lives: birth, arrest, deportation, death. (I am very sorry: the Second World War is a very depressing subject)
Events have actors, dates and places.
Events with multiple actors, maybe even multiple places and multiple dates, are middle level events and can be named events.
Large scale events are events like the Second World War or the Middle Ages.
We are trying to compose or decompose events to recognize a personal event and connect this to a bigger event. When was a train ride a train ride and when a deportation?
Who was where, when? What happened to my grandfather and when did he go where?
Important questions, still occupying people, 75 years after the War.
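The event structure sketched in these notes (actors, dates, places, and small events that are part of bigger ones) can be written down as a small Python data model. This is a hedged illustration, not the portal's actual event model: the `Event` class, the `contains`, `link` and `chain` helpers, and the names "Transport T", "Person A", "Station X" and "Camp Y" with their dates are hypothetical placeholders. Only the start and end dates of the Second World War are real.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Event:
    name: str
    start: date
    end: date
    places: Set[str] = field(default_factory=set)
    actors: Set[str] = field(default_factory=set)
    part_of: Optional["Event"] = None  # link to the bigger event, if known


def contains(parent: Event, child: Event) -> bool:
    """True if child falls within parent's dates and, where the parent
    lists places, shares at least one place with it."""
    in_time = parent.start <= child.start and child.end <= parent.end
    in_place = not parent.places or bool(parent.places & child.places)
    return in_time and in_place


def link(child: Event, candidates: List[Event]) -> Optional[Event]:
    """Attach child to the shortest-lasting candidate event that contains it."""
    fitting = [c for c in candidates if contains(c, child)]
    if fitting:
        child.part_of = min(fitting, key=lambda c: c.end - c.start)
    return child.part_of


def chain(event: Event) -> List[str]:
    """List the names from a personal event up to the largest event."""
    names = []
    current: Optional[Event] = event
    while current is not None:
        names.append(current.name)
        current = current.part_of
    return names


war = Event("Second World War", date(1939, 9, 1), date(1945, 9, 2))
# Hypothetical middle level event with several actors and places.
transport = Event("Transport T", date(1943, 1, 1), date(1943, 1, 3),
                  places={"Station X", "Camp Y"},
                  actors={"Person A", "Person B"}, part_of=war)
# Hypothetical personal event: one actor, one place.
ride = Event("Train ride of Person A", date(1943, 1, 1), date(1943, 1, 3),
             places={"Station X"}, actors={"Person A"})

link(ride, [war, transport])
print(" -> ".join(chain(ride)))
```

In this sketch a train ride becomes part of a deportation only when its dates and places fall inside a known transport event. A ride on another date would be linked to the war alone, which is one simple way to express the train ride versus deportation question in data.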
But the more the world becomes digital, the more we run into legal issues.
20 years ago I worked in an archive and we digitized photos, totally unaware of any law. We just did it and we put them online. No big deal.
But then photographers discovered that the internet was putting them out of business.
Like the music industry, the image industry started fighting back. And don't get me wrong, I think artists are fully entitled to earn a living from their art. I am not in the business of ripping off. But current copyright laws have a huge side effect:
[next slide]
Copyright laws are creating huge gaps in our cultural memory. Our children will be able to study 17th century art and images. But not 20th century art.
Will they know about Damien Hirst or Andy Warhol?
They probably will, because they will find rip-offs: bad and illegal copies of the work of these famous artists. On obscure websites. Not in validated resources, like scholarly institutions and websites like Wikipedia.
But will they know about the lesser gods of the 20th century?
We will either have to come up with Spotify-like services for copyrighted arts and history… or fight for heritage clauses in copyright laws. Not to steal images, but to keep culture in our collective memory.
And another twist in our times is privacy:
The more we digitize, the more we know about each other. Everybody has a right to privacy, a right to be forgotten or not be known.
But unfortunately… we are in the business of digital humanities. Quite difficult without humans…
Computers can do so many good things: like tracking down the footprints and fate of little Flora Pijpeman, a Jewish girl, 4 years old when she was incarcerated in Camp Vught, 5 years old when she died in Sobibor.
We can connect sources, combine administrations of German concentration camps and reconstruct lives.
But we cannot ask Flora or her parents if they want to be remembered. Court cases teach us that some Jewish descendants don't want their ancestors mentioned on digital monuments, because they don't want to be documented as Jews.
Current privacy laws in the Netherlands allow us to process data of the deceased, not of the living. And in our case we probably don’t have to check every individual case.
We can also build in barriers (log in), providing limited data etcetera to stay within the laws. But I am not sure about the future. Laws are getting more strict and our work more difficult.
Concluding Remarks:
We are at the beginning of a digital revolution. A profound change of the world as we know it. Changing not just our information but even our physique.
Coming from an analogue time I had to change my skills: from no data to data overload. From no news to fake news and being able to detect fake news.
A second revolution will bring more computers, computer analysis, blockchains etc. into the realm. Please prepare for impact. Start acquiring programming skills, start collecting data.
When we started digitizing collections we had to start from scratch, inventing structures, criteria and quality controls. I think we did a pretty great job. And even though people now say that a lot of money was wasted: I believe in the Google principle that you have to do things, test things, fail at things to become better. NEVER DO NOTHING.
There are setbacks and backlashes. The biggest threats for us are our own laws. With the expansion of the digital world came the expansion of protectionism: copyright laws and privacy laws make our jobs hard and create a real risk for historical research. We need to stand up, speak louder and fight for freedom of research! There is a difference between us and commercial companies. But we also need to open ourselves up.
The digital world is part of our real world. An important part of our real world, but it is blind to history. We have to become relevant by becoming smarter:
Join the digital leaders
Create linked, valuable, vast amounts of data
Use computers to become digital
Don't just be a user: also become a producer!