QUALITY AND QUANTITY: 
OPENING UP THE ARCHIVES 
Katja Bargum, producer, Yle Archives
Archives are gold in the digital age 
• Online, everything and nothing is archive material
NYT Innovation Report 2014 
”We need to think more about resurfacing evergreen content, organizing and packaging our work in more useful ways and pushing relevant content to readers. And to power these efforts, we should invest more in the unglamorous but essential work of tagging and structuring data.”
Yle Archives 
• Collections of the Finnish public broadcasting company, Yle 
• Yle est. 1927, tax funded 
• TV, radio, photographs, music recordings, sheet music, sound effects... 
• Large digitization effort underway
Online archive service 
• ”The Living Archive” web portal, est. 2006 
• Well-liked brand, 100 000 weekly visits 
• Widely used in e.g. schools 
• Large-scale remake underway
Quantity and quality 
• Freeing the masses 
• Findability and linkability 
• Added value 
• Co-creating / co-curating 
(Diagram labels: QUANTITY, QUALITY)
Freeing the masses 
• Currently starting to build an online program library 
• Using the on-demand player 
• Priority for development (e.g. personalisation)
Freeing the masses 
• Currently starting to build an online media library 
• Using the on-demand player 
• Priority for development (e.g. personalisation) 
• Aim: to open the archives large-scale 
• Serving the narrow audience segments 
• ”One person’s treasure is another’s trash” 
• Usable across yle.fi platform
Discoverability 
• Ontologized, linked metadata 
• Across yle.fi platform (API solutions, graph thinking) 
• How much is sufficient / useful?
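
One way to picture "linked metadata" and "graph thinking" is as an index from key-term identifiers to everything tagged with them, so that programs and articles become connected through the terms they share. The sketch below is only an illustration in Python: the item records, the `term:` identifiers and the function names are invented here and do not describe Yle's actual data model or API.

```python
from collections import defaultdict

# Hypothetical records: IDs, titles and key-term identifiers are invented for illustration.
ITEMS = [
    {"id": "prog-1", "title": "Rock interview", "terms": {"term:rock-music", "term:interviews"}},
    {"id": "art-7", "title": "Article on rock history", "terms": {"term:rock-music"}},
    {"id": "prog-2", "title": "Submarine hunt report", "terms": {"term:submarines"}},
]


def build_term_index(items):
    """Map each key-term identifier to the set of item IDs tagged with it."""
    index = defaultdict(set)
    for item in items:
        for term in item["terms"]:
            index[term].add(item["id"])
    return index


def related_items(item_id, items, index):
    """Return IDs of other items sharing at least one key term, most shared terms first."""
    by_id = {item["id"]: item for item in items}
    shared = defaultdict(int)
    for term in by_id[item_id]["terms"]:
        for other in index[term]:
            if other != item_id:
                shared[other] += 1
    # Sort by number of shared terms (descending), then by ID for a stable order.
    return sorted(shared, key=lambda other: (-shared[other], other))


index = build_term_index(ITEMS)
print(related_items("prog-1", ITEMS, index))
```

Because every link goes through a shared term identifier rather than free text, the same index could serve any page on a platform, which is one reading of "how much is sufficient": each extra term adds links but also tagging work.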
(Program page example, labelled parts: other content, program description, key terms, persons)
Discoverability 
• Ontologized, linked key terms 
• How much is sufficient / useful? 
• Search engine optimization 
• UX perspective 
• Spreading the word 
• social media 
• personalised player to suggest
Added value 
• Articles, curated collections 
• Automatic vs. curated content production & linking
(Article page example, labelled links: part of these themes, connected to this TV show, more stuff like this, most popular, key terms)
Added value 
• Articles, curated collections 
• Automatic vs. curated content production & linking 
• Need to look at what is useful 
• Describing the past? 
• Historical background of current affairs? 
• Explaining unfamiliar content? 
• Meme generation
”Co-curation” 
• Concerns over copyright & other things 
• Privacy concerns, journalistic integrity of materials 
• PD and CC-BY for some materials 
• Flickr Commons, Freesound 
• Embedded solutions!
”Co-curation” 
• Concerns over copyright & other things 
• Privacy concerns, journalistic integrity of materials, curating expertise 
• PD and CC-BY for some materials 
• Flickr Commons, Freesound 
• Embedded solutions 
• Crowdsourcing, offline activities, audience requests
Challenges 
• National vs. international audience 
• What are the relevant terms for impact measurement? 
• Quantity vs. quality 
• Collaborations and outside platforms harder to measure
KEY POINTS 
• Make your metadata linkable 
• Enable usage across platforms 
• Optimise search 
• Analyse your user needs 
• Let others use your stuff (as far as possible)

Editor's Notes

  • #3 Archives are treasure troves. I’m preaching to the choir here, but I think this is increasingly realized. One of the reasons, I think, is that online, the concept of an archive changes. The internet is itself a large archive. At FIAT/IFTA there was apparently talk of moving from speaking about Archives to speaking about Content. And so we see these phenomena: the Open GLAM movement; Linked Open Data, as many have been talking about; competitive edge: Netflix is algorithms + Rotten Tomatoes + IMDb recently.
  • #4 Increasingly, the media is tapping into its own resources as well. I’m sure many of you have read this NYT Innovation Report, which was apparently meant for internal use but was leaked. Long tail of content.
  • #5 My organisation is... A large collection of, e.g., half a million video objects and over a million audio objects.
  • #6 So how do we get from here... To here?
  • #7 Today I will focus on online archive use at Yle. We launched an archive portal in 2006. The ”Living Archive” has been quite popular, with some hundred thousand site visits weekly (the population of Finland is 5.5 million), and is widely used e.g. in schools. Its focus has been on providing a background to the material, so basically articles accompanied by media clips. Currently, we are renewing the portal completely. And what we have been thinking a lot about is how to do this in terms of two things: quantity and quality.
  • #8 Today I will focus on online archive use at Yle. We were out quite early in this business and launched an archive portal in 2006. The ”Living Archive” has been quite popular, with some hundred thousand site visits weekly (the population of Finland is 5.5 million), and is widely used e.g. in schools. Its focus has been on providing a background to the material, highly curated, so basically articles accompanied by media clips. Currently, we are renewing the portal completely. And what we have been thinking a lot about is how to do this in terms of two things: quantity and quality.
  • #9 So with any archive, we usually have a lot of material to choose from. But some of the material is perhaps more valuable to users, and some material requires curating to be useful – for instance, providing a background to why a certain TV program from the 60s was such a big deal in its time. When we want to ”open the archive”, how do we go about it? Do we free the masses, i.e. create an online library of all that is available? So sow a field of flowers? Provide ways to find the content and link it to other stuff? Sort the flowers into patches? What kind of added value do we provide, such as curating or providing journalistic content? Make bouquets? And how can we get the public to really interact with the content, to be co-creators or co-curators? Allow others to make their own flower arrangements? I think these are themes that we have been discussing here during these two days, and so I’d like to tell you what we’ve been doing at Yle, which is basically trying to combine all of the above.
  • #10 Currently, we are starting to build an online program library. The programs will be available on our on-demand player, which currently functions as a catch-up service. This will make it easier for the users to find our stuff, as they don’t have to know what we consider to be archive material. We’re also aiming to provide unique web identifiers for the programs, in order to keep, for instance, links alive for a long time.
  • #11 So here is an example of what the ”program page” might look like. I’ll return to this in a little bit and show you the different parts.
  • #12 Our aim is to open the archives large-scale. The broadcast archive is part of the nation’s memory. And we’re also thinking that this is a great way for a public broadcaster to serve narrow audience segments that might not get a newly produced TV series very often. And we’re trying out the idea that we cannot always know what people would most like to see or what they remember – by acting large scale we can provide something for everyone. Copyright is still a nightmare for us. We have not been able to solve the issue like the other Scandinavians, so we have very limited possibilities for drama, entertainment and music. Hope lives on.
  • #13 Now for the second step: making it easy for users to find the content. Most importantly, we are building a library of ontologized key terms that we tag onto the programs. These also work to link content across the yle.fi platform. We are also providing metadata such as program descriptions. There, we face a trade-off. Our database is not structured and the entries are of variable quality, so a lot of work goes into providing descriptions. And the time spent on this is time away from publishing more. So we’re going to have to look really closely at how much we invest in this. Another possibility is of course automatic metadata production, which we will be investigating in the years to come.
  • #15 We are also adjusting our search engine from a user experience perspective. How do people arrive at the site? What are they looking for? Is it important for them to be able to search on e.g. time span, TV channel, brand, etc.? And of course marketing the collection is important.
  • #16 So then about adding value to the programs that we open. Firstly, we have a team of journalists working on the archive material to write articles around it, create curated collections, etc. The key terms that I described will make this work easier, and we can envision e.g. some collections that are created quite automatically around e.g. a person or a term.
  • #17 So here is an article that incorporates a show. And there are many ways to connect it to other stuff.
  • #19 We need to look at what is useful, though. There are many reasons that you’d want to provide added content. For instance, you could want to create a collection that shows what life was like in the 60s. Or you could want to write an article about the history of a certain event to accompany some video of it. Last week the Swedes went on a Russian submarine hunt; this also happened in the 80s, and we lifted it into an article. Or you could want to explain what an old show was about or how it affected society. Then, thinking about web logic, it seems obvious that we want to be able to lift short clips out to share. One of the biggest successes is a 1-min clip of a very famous Finnish rock star who plays in Hanoi Rocks (if there are any 80s music fans here) being interviewed as a 17-year-old wannabe guitar hero.
  • #20 And lastly, how to make it possible for people to really interact with the material. Here we are facing large challenges concerning rights. Firstly, copyright issues, which limit how the material can be reused. And then other concerns over e.g. the privacy of people appearing, or the journalistic integrity of the materials. Some material we’ve been able to release either under Public Domain or as CC-BY. For the video content, I don’t see it as being possible on a large scale. Instead, we’ve managed to negotiate the right for our player to be embedded on other pages. So this makes it possible for private users as well as e.g. museums or newspapers to use the stuff, but again, not to edit it.
  • #21 Here is an example: there’s a Finnish politician very much on the rise, who appeared in a TV interview with her father when she was nine, and the largest Finnish newspaper embedded the clip on its pages.
  • #22 We are also looking into crowdsourcing as a way to decide what to prioritize, as well as a way to source metadata. But I have to say that one hurdle we face in-house is that this is a media house, where the journalists have strong professional pride, so it’s not always easy to sell the argument that the audience should be able to do journalistic work. And there is also a strong concern for the brand. We are a public broadcaster, tax funded, and for us it can be problematic if there is a blemish on the brand due to how these materials are treated.
  • #23 Lastly, I’d like to list some other challenges that we’ve identified. Firstly, about the audience segment that we are working with. The current archive service audience is 50+, and so we do have a challenge to bring in the young audience. We’re hoping one way of doing this is by allowing archive material to be used across the Yle platform. So on Yle’s music pages there could be archive material on a band, etc. Copyright is a big concern, and it can even be seen as a risk to open what we can now. Perhaps people will be so disappointed that it will work against us? We’ve got... Today, everything is measured and Yle is striving to increase its audience, also because the mandate (tax funding) depends on having a large share of the audience. So how do we measure the usefulness of opening the archives?
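
Note #15 asks whether users need to search on things like time span or TV channel. As a rough illustration of what such a filter could look like, here is a minimal Python sketch. The `Program` record, the catalogue entries, the channel names and the `search` function are all made up for this example and are not taken from Yle's search engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Program:
    title: str
    year: int      # year of first broadcast
    channel: str   # placeholder channel name


# Hypothetical catalogue entries, invented for illustration.
CATALOGUE = [
    Program("Evening news", 1982, "TV1"),
    Program("Youth music show", 1979, "TV2"),
    Program("Radio documentary", 1965, "Radio 1"),
]


def search(
    catalogue: List[Program],
    from_year: Optional[int] = None,
    to_year: Optional[int] = None,
    channel: Optional[str] = None,
) -> List[Program]:
    """Keep programs matching every filter that was given; None means 'no filter'."""
    results = []
    for program in catalogue:
        if from_year is not None and program.year < from_year:
            continue
        if to_year is not None and program.year > to_year:
            continue
        if channel is not None and program.channel != channel:
            continue
        results.append(program)
    return results


# Example: everything first broadcast between 1970 and 1989.
for program in search(CATALOGUE, from_year=1970, to_year=1989):
    print(program.title, program.year, program.channel)
```

Each filter is optional, so the same function covers a plain browse, a time-span search and a channel search. Which filters are worth offering is exactly the user-needs question the note raises.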