Before I go into why and how we are digitizing, I’d like to give you a little background on our archives and archives in general, since many people have only a vague idea of what we’re about.
The City of Vancouver Archives turned 80 a week ago. That makes us the oldest Canadian municipal archives outside of Quebec. On June 8, 1933, Major James Matthews was appointed the first City Archivist by City Council. Our clients aren’t all academics: they include people researching their house or neighbourhood, businesses, artists, and more. We have 3 kilometres of records in our holdings. Like other Canadian public archives, we are a “total” archives, which means we acquire both government and private-sector records.
There’s a City by-law that says we’re the official repository for City records worth keeping, and every year hundreds of boxes of City records arrive.
Roughly half of our holdings come from the private sector, and are the records of businesses, individuals, associations, families and more. They might be personal papers, diaries, photographs, or other media. Acquiring private-sector records means that you can get a more complete account of the city’s history: it’s not all told from the government’s point of view. I keep talking about records, and by that I mean primary sources. We are interested in the original documents, what the creator created, and in preserving the creator’s meaning. Although many of our holdings are fun and interesting, we don’t look at them as memorabilia, but as evidence, and we treat them that way.
We need to keep these records unaltered, and to be able to show that has been done. How could you trust them otherwise? It’s a very important part of what we do: we even named our blog AuthentiCity.
Once the records come into our possession, we have chain of custody procedures. The records are registered and stored in our secure facility. We employ professional archivists who are trained to understand the records and their meaning, and to describe them accurately. We don’t make the records available until they have been described, assessed for legal issues such as privacy, and rehoused safely. But then we DO make them available. Our mission boils down to preserving records and making them available to you. That’s why we exist. Digitization doesn’t suddenly allow us to share things – we’ve been sharing our holdings, for free and on demand, for 80 years, and online for 15. Later on, I’ll look at some of the other advantages digitization gives us.
Right now I’d like to talk about access. Let’s look at how we make these digital objects available to you. If you search our online database, the first thing you get is a results list with a brief description and a little thumbnail to show you’re on the right track. If you click on this, you’re taken to the full description, which will also have a larger image.
This full description is what has a permalink: not the images, not the search string, but this URL. We might put a new access image up in the future, but this link will always stay the same. If you click on this image . . .
. . . you’ll be taken to the larger version, which should fill your browser. Click one last time . . .
. . . and you can see the image in full resolution. You’ll have to scroll around to see it all. You can read on the sign how much it cost to get into the show. We figure you’re probably not going to see any more detail by looking at the original item, so we think this supports research very well. You can go ahead and right click and download the full resolution version and do whatever you want with it.
It works almost the same way with video: you get a thumbnail, then this player, which is a bit blocky but allows you to preview a video quickly. If you want to watch or save the full-resolution version, you hit the “download movie” link.
For access, it’s our practice to give you as much as we possibly can and still stay within copyright law. For instance, this is from a list of results of videos. If you are at home doing your research, it will look like this. The top one is City of Vancouver copyright, and since we treat that like public domain, you are free to click through, watch it, download it, whatever. The bottom one is 3rd-party copyright AND we don’t have the right to show it to you on the Web, so there’s a yellow warning that you can’t click through and watch it. We do show you thumbnails for 3rd-party copyrighted material, so at least you get a glimpse. If you come down to the Archives to do your research, we can show you all the 3rd-party copyrighted media. We just can’t broadcast them on the Web. You can log into our wireless network with your own laptop, or use one of our PCs, and you won’t see that yellow warning any more: you can go ahead and watch it. You can even download it for personal study or any of the other legal uses of copyrighted material. Since you are the one making the copy, the legal responsibility is yours. And that’s our access: we try to give you as much as we possibly can to support your research. As for digitization . . . The types of media we preserve and make accessible have changed over the years, and will no doubt keep changing. The ways we make them accessible to you have changed, too. But all our activities, including digitization, are driven by our goals of preservation, access and the need for authenticity.
We started our digitization program in 1998. There’s a huge research demand for photographic images, and before then, the only way we could give researchers access to negatives was to either print them onto paper as positives (which we did, but that’s expensive), or to just give researchers the negatives and let them figure it out. It’s tricky to make sense of photographs in negative form. It’s hard enough in black and white, but . . .
. . . even harder in colour. And if the researcher has to pay to have the negative turned into a print in order to understand it, we think that’s a barrier to access.
Digitizing these negatives gives the researcher not just quicker access, but better access. Preservation is better, too: we don’t have to worry about providing direct access to breakable glass negatives, or making sure the researcher wears gloves and doesn’t get fingerprints on the original. A few years after we started digitizing, we built a walk-in freezer to preserve our negatives, to stop the colours from fading and to stop the cellulose acetate materials from shrinking from this . . .
. . . to this. So one big preservation bonus we got from digitization was that we could put the plastic negatives into the freezer and leave them there. We do pull them out if someone needs the original, but it’s rare. Other access problems solved by digitization are small format negatives, like 16 or 35 mm, which are easier to appreciate, especially the fine detail, when digitized and enlarged on a computer monitor;
and really huge negatives, such as these panorama negatives which can be up to 8 feet long. We now have nearly 80,000 photographs digitized and available online, and we plan to add a lot more by the end of the year.
As well as still images, we’ve been digitizing moving images. We don’t have the capacity to do that ourselves, but we inspect, prep and rehouse all our films before we send them out. The films are digitized all the way through, from leader to leader, so you can tell we haven’t left anything out.
We inspect videotape ourselves as well. For all magnetic media, whether it’s video or audio, there’s an urgency to digitize the older tapes that are deteriorating, as well as the rarer formats that we expect nobody will be able to play (and so won’t be able to digitize) in a few years. We have had all our endangered videotapes digitized to a preservation format that we should be able to preserve into the future. We have well over 500 films and videos online.
We have in-house capability to digitize audiotape. Most of the audio we have is either open reel or cassette, and we are able to digitize both with professional equipment. Sometimes the tapes are so damaged that it’s impossible to digitize them all the way through as one continuous file.
In that case, we can produce separate audio files and another file called the Audio Decision List that shows how all these files fit together: it’s structural metadata. It’s also an Audio Engineering Society standard. This slide shows just an excerpt. This metadata is automatically generated by our software, and it’s part of our documentation for preserving the authenticity of the original. We don’t just produce a lot of files with possible timeline gaps between them.
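To give a rough idea of what that kind of structural metadata makes possible, here is a minimal Python sketch. It is not the real Audio Decision List syntax and not the output of our software: the `Segment` fields, the file names and the timings are all made up for illustration. It only shows the principle that, once each file's position on the original tape's timeline is recorded, any stretch that could not be captured can be reported instead of silently disappearing.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One captured audio file and where it sits on the original tape's timeline."""
    filename: str          # hypothetical file name
    timeline_start: float  # seconds from the start of the tape
    duration: float        # seconds of audio in this file


def find_gaps(segments, tolerance=0.001):
    """Return (gap_start, gap_end) pairs, in seconds, that no file covers."""
    ordered = sorted(segments, key=lambda s: s.timeline_start)
    gaps = []
    covered_until = 0.0
    for seg in ordered:
        # A gap exists if this file starts after the end of everything seen so far.
        if seg.timeline_start - covered_until > tolerance:
            gaps.append((covered_until, seg.timeline_start))
        covered_until = max(covered_until, seg.timeline_start + seg.duration)
    return gaps


# Hypothetical example: a damaged tape captured as three separate files.
tape = [
    Segment("side_a_part1.wav", 0.0, 600.0),
    Segment("side_a_part2.wav", 600.0, 250.0),
    Segment("side_a_part3.wav", 865.0, 400.0),
]

gaps = find_gaps(tape)
print(gaps)
```

In this invented example the third file starts 15 seconds after the second one ends, so that stretch is reported as a documented gap.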
We’ve added capacity for digitizing large maps and plans. We have tens of thousands of maps and so far we’ve scanned about 600. We could stop at making them accessible as digital images, but we realize that modern researchers need more than just a picture of a map.
We would like to turn our analogue records into data wherever we can . . .
. . . and here’s one way to do that. This software is called the Map Warper. We are in discussion with the developers, hoping to roll this out when we find $10,000. We intend that it will reside as a public-facing application on City servers, and we will upload high-resolution scans of our maps to it. Each scan would exist merely as an image until someone wanted to use it. Then they would match known points on the old map with known points on Open Street Map, and, if there were enough control points, they would rectify the old map: that is, the map would know where it belongs geographically. Once the map is rectified, the user can save it to the system in common formats and then others can download the rectified versions. We don’t have the resources to rectify all our scanned maps ourselves, but we think we can set up a system to enable you to roll your own when you need it.
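For anyone curious about what rectifying involves, here is a minimal Python sketch of the underlying idea. It is an illustration only, not how the Map Warper itself is implemented: it assumes the simplest possible model, an affine transform fitted by least squares, and the control points and coordinates are invented. Each control point pairs a pixel position on the scan with a known position on the modern base map.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m * x = v by Gaussian elimination with pivoting."""
    a = [row[:] + [val] for row, val in zip(m, v)]
    n = 3
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            raise ValueError("control points appear to lie on one line")
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n + 1):
                a[r][c] -= factor * a[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][c] * x[c] for c in range(i + 1, n))) / a[i][i]
    return x


def fit_affine(control_points):
    """Least-squares affine fit from scan pixels (col, row) to map (x, y).

    control_points is a list of ((col, row), (x, y)) pairs: at least three,
    and not all on one line.
    """
    if len(control_points) < 3:
        raise ValueError("need at least three control points")
    rows = [(c, r, 1.0) for (c, r), _ in control_points]
    # Normal equations: (A^T A) p = A^T b, solved once for x and once for y.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):
        targets = [xy[k] for _, xy in control_points]
        atb = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def pixel_to_map(coeffs, col, row):
    """Apply the fitted transform to one pixel position."""
    (a, b, c), (d, e, f) = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


# Invented control points: pixel position on the scan -> position on the base map.
points = [
    ((0, 0), (1000.0, 5000.0)),
    ((100, 0), (1200.0, 5000.0)),
    ((0, 100), (1000.0, 4800.0)),
    ((100, 100), (1200.0, 4800.0)),
]
coeffs = fit_affine(points)
centre = pixel_to_map(coeffs, 50, 50)
```

Real old maps are often stretched or distorted, so a production tool would normally offer more flexible transforms than this. The more control points a user supplies, the better the fit can be checked.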
In the meantime, while we try to figure out where to get $10,000, we’ve thought of another way to make digitized maps more useful. This is Vanmap, an amazing City GIS application. It has about 130 layers of useful information about Vancouver that you can turn on and off. Down the left side are all the layers, and you click them to turn them on and off. There’s the address search, zooming and many other functions. We’ve approached the City’s Vanmap team to see if they would be interested in turning our 1912 Fire Insurance Plan into a layer.
Fire Insurance plans exist as hundreds of separate pages. They have a lot of detail on the buildings, their materials, and building footprints on the property to allow insurance underwriters to determine the risk of fire. They are a fabulous historical resource. Although they are published, each copy is often a little different. Ours has been annotated by hand by Major Matthews. The plans were updated by gluing in little paper updates over the old buildings, but not every copy has those.
Look at the detail you can get. There’s the English Bay Bathhouse, the streetcar lines, the names of some of the apartment buildings, and the original shoreline.
Library and Archives Canada (LAC) has this available as separate scanned pages. You need to figure out which page you want using this key plan and then click through the web page until you find it. Nothing wrong with this: it’s great that they have it online, and researchers find it tremendously useful. But we think it could be even better. Imagine this as a continuous, high-resolution layer, all the pages mosaicked together so you could virtually walk the streets of 1912 Vancouver. Since it would be part of Vanmap, you could also overlay other Vanmap layers to compare with features in the present-day city. We’d also make sure the layer was available through the City’s open data catalogue for download so you could do something else with it.
Going back to our work with moving images, it’s not always the older formats that are the problem. We’re even “digitizing” video that’s already digital. We have an in-house setup for doing that with DV codec video. For example, we have the original video from the Vancouver Electoral Reform Commission meetings, and they’re on miniDV tapes. It’s a digital format, but it’s not useful in that format. The plastic layer used for digital videotape is really thin, and if it gets stretched during playback the signal can be destroyed. You might not think of digital videotape from 2004 as endangered, but it is. Even finding professional playback equipment in good shape can be hard. We worked with AV specialists who had come up with a way to transfer DV tape containing human rights testimony for the United Nations. This comes right back to the idea of records as evidence. You can’t transfer human rights testimony from a tape to a file and just say “trust us, it’s all there, we didn’t leave anything out”. It has to be proven to be all there: that’s the authenticity piece. And this is how we provide that: we are not just transferring the sound and moving images, but also all the metadata when we turn the videotape into a file, so we can show that we got it all, and can even show when the recorder for the original video was turned on and off. Everything that was on that original tape, we have transferred into the file. The results and success of this transfer are all automatically recorded in a metadata file. As well as providing authenticity, this method saves us time. It uses scripting and visualizations to automate a lot of the work of checking the quality of the transfer.
The problem areas are highlighted, and we can go back to those to see if the problem is the transfer or if there’s something wrong with the tape itself.
Otherwise we’d have to watch every minute of every file to look for errors like this.
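As a rough illustration of how scripting can stand in for watching every minute, here is a small Python sketch. It is not our actual tooling: it assumes, hypothetically, that the transfer metadata can be reduced to one error count per frame, and it uses a round 30 frames per second to keep the arithmetic simple. It groups consecutive problem frames into time ranges that a person can then go and review.

```python
def problem_ranges(error_counts, threshold=1, fps=30):
    """Group consecutive frames with errors >= threshold into (start_s, end_s) ranges."""
    ranges = []
    start = None
    for frame, errors in enumerate(error_counts):
        if errors >= threshold:
            if start is None:
                start = frame          # a new problem area begins here
        elif start is not None:
            ranges.append((start / fps, frame / fps))
            start = None
    if start is not None:              # problem area runs to the end of the file
        ranges.append((start / fps, len(error_counts) / fps))
    return ranges


# Invented per-frame error counts: mostly clean, with two short problem areas.
counts = [0] * 60 + [3, 5, 2] + [0] * 27 + [1] + [0] * 9

for start_s, end_s in problem_ranges(counts):
    print(f"check {start_s:.2f}s to {end_s:.2f}s")
```

Only the flagged ranges need a human eye, to decide whether the problem is in the transfer or on the tape itself.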
Finally, I’d like to look at the digitization of textual documents. We started with this Mc & Mc hardware catalogue, which has text and lots of images. We did optical character recognition on the pages so you can search them. There’s keyword searching, you can jump to different chapters and also browse the pages. If you need to know what colours of elevator paint were available through this catalogue, you can keyword search . . .
. . . and you’ll find this page.
We’ve also digitized the works of Harland Bartholomew that we have. These are very early planning documents for the City. They were digitized and OCR’d by the Internet Archive at their facility at the University of Toronto, and are available on the IA site in many formats, including this nifty bookreader. It has full-text searching, zooming, and it will even read the book to you.
This year we’re undertaking our most ambitious text digitization project yet. We’ll be scanning all 22,000 pages of City Council Minutes from the 1970s and then running OCR on them. This slide shows segmentation, which tells the OCR program what part of a page is text, a table or an image. This step gives a much more accurate result. We will make the Minutes available as searchable PDFs online for most researchers, but we’d like to work toward making the text available as open data, and PDF isn’t a useful open data format.
We want to separate the text layer from the image of the pages, and have the text available in a plain text file format. That’s pretty great in itself, but it’s really an intermediate stage, as it’s still just text, not structured information.
Ultimately, we’d like to be able to apply a standard XML markup to the text and make the minutes available as semantic data, both downloadable as open data and queryable on the Web.
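To make the difference between plain text and structured data concrete, here is a minimal Python sketch. The element names (`minutes`, `item`, `title`, `text`), the date and the sample content are all placeholders invented for this example; they are not drawn from any existing markup standard or from the actual Minutes.

```python
import xml.etree.ElementTree as ET


def minutes_to_xml(meeting_date, items):
    """Wrap (title, text) pairs from one meeting in a simple, made-up XML structure."""
    root = ET.Element("minutes", attrib={"date": meeting_date})
    for number, (title, text) in enumerate(items, start=1):
        item = ET.SubElement(root, "item", attrib={"number": str(number)})
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "text").text = text
    return root


# Placeholder content standing in for OCR'd text.
sample = [
    ("Example agenda item A", "Text of the first item."),
    ("Example agenda item B", "Text of the second item."),
]
doc = minutes_to_xml("1975-01-01", sample)

# Once the text is marked up, it can be queried rather than just read.
second_title = doc.find("./item[@number='2']/title").text
print(second_title)
print(ET.tostring(doc, encoding="unicode"))
```

With plain text you can only search for words. With markup like this you can ask for a specific item from a specific meeting, which is the kind of query we would like to support.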
There’s an XML markup standard for legislative documents that was developed in Africa and has become widely used worldwide. The problem is that it hasn’t been extended to apply to Council Minutes and reports. We’ll be working with others around the world who would like to have such a standard, and it will probably be built on this one. That’s way in the future; for now we need to start digitizing the minutes.
You can find all our digitized materials through searcharchives.vancouver.ca. We’ve also put curated samples onto social sites.
Find our digitized content here
• MAIN: searcharchives.vancouver.ca
• flickr
• YouTube
• Internet Archive (both text & video)
• Historypin
• Facebook
• AuthentiCity blog
Thanks to our digitization funders!
• Friends of the Vancouver City Archives
• National Archival Development Program (and predecessors)
• B.C. History Digitization Program
• Bing Thom Architects
• Dunbar Residents Association
• Vancouver 125 Program
• Vancouver Historical Society
• Archival Community Digitization Program
• Young Canada Works in Heritage Institutions
• City of Vancouver