Principles of Open Data
 

Usage Rights

© All Rights Reserved

  • Who am I? Geographer by training. Worked as a scientist looking after environmental NSDs. Now at DigitalNZ.
    Like many geographers I am a bit of a generalist. Background is in geography, data management, web and data visualisation.
    I am wearing three hats: National Library, software developer, hobbyist.
    There are studies and surveys about open data. I will link to some at the end, but I figure that it would be most useful for me to speak from personal experience.
  • Central to DigitalNZ is data aggregation: we bring together the metadata of over 120 New Zealand content partners' digital collections. The partners range from libraries, museums, galleries and archives to radio and TV stations, not to mention community organisations and international organisations that hold NZ-related content.
    DigitalNZ structures and standardises the metadata and stores it in a database.
    Perhaps the most important part of DigitalNZ is that we provide access to that data via a gateway called the public API (application programming interface).
    This API enables developers and programmers to build new discovery tools to help expose our partners’ content in innovative and exciting ways.
    The API is vital because DigitalNZ also uses the API to power the DigitalNZ search engine…
  • So here is the DigitalNZ Search today, http://www.digitalnz.org/, all built on the API.
  • And the search results… in this case a search on David Lange, a Labour Prime Minister in the 1980s.
    So DigitalNZ is a great place to start an NZ-related search. Our content partners include Te Ara, TVNZ, Te Papa, a wide variety of community organisations, museums and libraries.
    You can then dig down further.
  • Onto our more detailed landing pages.
  • And go right through to the original collection sites, where you can narrow your search even more.
  • While navigating the results you can also create your own sets of favourite items.
    ‘Sets’ is a feature that we launched in June this year and it is quite exciting because for the first time it allows people to come to one place and pool together related items from so many different organisations. We’ll give you a chance to make your own sets after this session.
  • And so this is a public set that someone has pulled together about New Zealand prime ministers.
    What we are seeing is people doing the digging and connecting, and then sharing it with others. You can imagine that the service is of particular interest to the education sector for use by students and classrooms.
  • Let’s get back to the all-important metadata that sits behind DigitalNZ.
    A significant offering for DigitalNZ is our role as a data provider.
    Anyone can sign up for an API key (which gives them the credentials to use the API) and thereby access the full data record of each of the 25 million items aggregated in DigitalNZ.
  • UC CEISMIC is a project run out of the University of Canterbury in Christchurch. It aims to build a comprehensive digital archive of material related to the Canterbury earthquakes.
    They were planning to build their own service to aggregate this material, but they were able to use the DigitalNZ service instead. This is their website, and they are using our DigitalNZ data service to search across items related to the earthquakes.
    People wouldn’t know that we had anything to do with this, apart from the fact that they are fans of our work and they credit us. But the point here is that we can let them run their service, they’ve designed their own site, and we take a behind-the-scenes role in distributing access to NZ material.
  • I mentioned NZResearch as one of the early collaborative efforts in New Zealand. NZResearch is a facility for sharing NZ research outputs from organisations such as NZ universities and other tertiary education institutions. See http://nzresearch.org.nz/ . This site was redeveloped and migrated onto the DigitalNZ search infrastructure in 2011.
    So it is now all powered by DigitalNZ, and all of the NZ research outputs in this service are aggregated by DigitalNZ and delivered through the API.
  • This is the New Zealand Ministry of Education website Digistore. It is a storehouse of free-to-use digital content to support learning across the New Zealand school curriculum, from early childhood through to senior secondary student level.
    DigitalNZ items have been integrated as support resources for teachers. http://digistore.tki.org.nz/ec/p/home This has been done using the DigitalNZ API.
  • A current favourite of the DigitalNZ team: a keyword analysis tool across the digitised newspaper collections in DigitalNZ, namely National Library’s ‘Papers Past’ collection.
    http://wraggelabs.com/shed/querypic/
    What you are seeing here is the frequency of the word “penguin” in historical newspaper articles.
    The peak in 1910 is around the shipwreck of the SS Penguin.
  • There are a couple of ways to think about this. The first is that good things come out of sharing scholarly data with others.
    One aspect of this is captured in the motto of the Royal Society. Replication has a long history in science. The motto of the Royal Society is 'Nullius in verba', translated "Take no man's word for it."
  • You may also want to mention that overseas funders like the Wellcome Trust are mandating open access to the research they fund, and they often require CC-BY too. I am not going to speak to these -- I assume that you are better informed about these institutions and their funding procedures than I am, but I wanted to acknowledge that they are part of the mix.
  • Here was the thing that caught my eye. It’s the terms from the Marsden Fund contracts.
  • This stuff is obvious, but I really see it in practice.
  • Linked data gets tricky quickly. I personally oscillate between thinking this is all so easy and obvious to believing it to be incredibly difficult. Entity reconciliation is hard. The technologies are maturing but still not entirely there. Ontology is HARD.
    I show this chart to give you a sense that it is not all or nothing. A star is a star is a star. You have some data as a scanned image? Great! Put it on the web. You have the data in an old Access database? Great! (...and so on...)
    DigitalNZ is at 3.5 stars. I want to say that we are at four stars, but I think there is some work we need to do around persistent URLs. We would like to get to five stars.
    Mention authority mapping exercise. Cross-walking over National Library, Te Papa, NZETC, Cenotaph and NZ on Screen.
  • 1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22865170
  • Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
    Accommodates widest range of research. Greatest flexibility.
    You can always scale back up.
  • Machines can handle certain kinds of inputs much better than others. For example, handwritten notes on paper are very difficult for machines to process. Scanning text via Optical Character Recognition (OCR) results in many matching and formatting errors. Information shared in the widely-used PDF format, for example, is very difficult for machines to parse. Thus, information should be stored in widely-used file formats that easily lend themselves to machine processing. (When other factors necessitate the use of difficult-to-parse formats, data should also be available in machine-friendly formats.) These files should be accompanied by documentation related to the format and how to use it in relation to the data.
  • Find and correct errors
    Fill in gaps
    Network effects
    Encourages use and reuse - living data
  • Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072
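
The notes on fine-grained data and machine readability can be illustrated with a minimal sketch. It assumes the data is published as plain CSV with one row per polling place. The electorate names, polling places, parties and vote counts below are invented for illustration and are not real election results. The point is only that fine-grained rows can be rolled up to any coarser level on demand, whereas a published total cannot be split back down.

```python
import csv
import io
from collections import Counter

# Hypothetical fine-grained data: one row per polling place and party.
# All names and numbers are invented for illustration only.
FINE_GRAINED_CSV = """electorate,polling_place,party,votes
Alpha,Town Hall,Red,120
Alpha,Town Hall,Blue,95
Alpha,School,Red,80
Alpha,School,Blue,110
Beta,Library,Red,60
Beta,Library,Blue,75
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting votes to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["votes"] = int(row["votes"])
    return rows


def aggregate(rows, *keys):
    """Sum votes grouped by the given column names."""
    totals = Counter()
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["votes"]
    return dict(totals)


rows = load_rows(FINE_GRAINED_CSV)

# "You can always scale back up": the same rows answer coarser questions.
by_electorate_party = aggregate(rows, "electorate", "party")
by_party = aggregate(rows, "party")

print(by_electorate_party)
print(by_party)
```

Because the source is a machine-friendly format, the parsing step is a few lines of standard-library code. The same table locked in a scanned image or PDF would need OCR and manual correction before any of this could run.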

Principles of Open Data: Presentation Transcript

  • Principles of Open Data. Chris McDowall, DigitalNZ. University of Otago, October 25, 2012
  • New CEISMIC screenshot
  • Why?
  • Nullius in verba: Take no one's word for it. Bookplate of the Royal Society (Great Britain). Flickr user: Penn Provenance Project, CC-BY 2.0. http://www.flickr.com/photos/58558794@N07/6737170267/
  • “Half of these achievements were among the original ‘design goals’ of the SDSS, but the other half were either entirely unanticipated or not expected to be nearly as exciting or powerful as they turned out to be.” http://www.sdss.org/signature.html
  • (i) Ensure that, where appropriate, Research Findings are made public within a reasonable period of time. If, in the opinion of the Society, the Research Findings have not been made public within a reasonable period of time, the Contractor will provide the Society with written reasons on request regarding why the Research Findings have not been made public.
    (j) Include the following matters in the final report to the Society required under clause 4.2(c): (i) which data and sample repositories will be used to store the metadata, data and samples collected as part of the Programme; and (ii) where the metadata will be stored if no data or sample repositories are available.
    (k) Unless prohibited under any required ethical consent or approval, establish adequate and reasonable access to metadata, data and samples collected as part of the Programme within twelve months of the Completion Date of the Contract to: (i) people carrying out research; and (ii) national and international repositories.
    (l) Seek an exemption from the Society if it is not possible to comply with clauses 4.2(j) or 4.2(k) and provide written reasons why it is not possible to comply with those clauses. The Society will grant an exemption from clauses 4.2(j) and 4.2(k) where it considers that it would be unreasonable to require the Contractor to comply with them.
  • How?
  • The Zen of Open Data
    Open is better than closed.
    Transparent is better than opaque.
    Simple is better than complex.
    Accessible is better than inaccessible.
    Sharing is better than hoarding.
    Linked is more useful than isolated.
    Fine-grained is preferable to aggregated.
    (Although there are legitimate privacy and security limitations.)
    Optimise for machine readability: they can translate for humans.
    Barriers prevent worthwhile things from happening.
    Flawed, but out there is a million times better than perfect, but unattainable.
    Opening data up to thousands of eyes makes the data better.
    Iterate in response to demand.
    There is no one true feed for all eternity: people need to maintain this stuff.
  • Linked is more useful than isolated
    Telegraph office. Lander, J M :Photographs of telephone exchanges and other post office material. Ref: 1/2-111693-G. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22801049
  • ★ make your stuff available on the web (whatever format)
    ★★ make it available as structured data (e.g. excel instead of image scan of a table)
    ★★★ non-proprietary format (e.g. csv instead of excel)
    ★★★★ use URLs to identify things, so people can point at your stuff
    ★★★★★ link your data to other people’s data to provide context
  • Fine-grained is preferable to aggregated...
    1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22865170
  • Optimise for machine readability: they can translate for humans
    Computer room. K E Niven and Co :Commercial negatives. Ref: 1/2-227397-F. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22900308
  • Flawed, but out there is a million times better than perfect, but unattainable
  • Opening data up to thousands of eyes makes the data better
    Crowd in Willis Street, Wellington, awaiting the results of the 1931 general election. Raine, William Hall. Ref: 1/1-004500-G. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/23116892
  • Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072
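
The star chart in the slides can be read as a simple checklist, and a minimal sketch of that reading follows. It assumes the levels are cumulative, so that a dataset earns a star only once every earlier level is met. The `Dataset` class, its field names and the `stars` function are hypothetical names introduced here for illustration. They are not part of any DigitalNZ tool, and the sketch does not try to model half stars.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of how a dataset is published."""
    on_web: bool        # one star: available on the web, whatever format
    structured: bool    # two stars: structured data, not an image scan
    open_format: bool   # three stars: non-proprietary format such as CSV
    uses_urls: bool     # four stars: URLs identify the things described
    linked: bool        # five stars: links out to other people's data


def stars(ds):
    """Count stars, assuming each level requires all earlier levels."""
    levels = [ds.on_web, ds.structured, ds.open_format, ds.uses_urls, ds.linked]
    count = 0
    for met in levels:
        if not met:
            break
        count += 1
    return count


# Illustrative cases taken from the examples on the slide.
scanned_table = Dataset(True, False, False, False, False)
excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)

print(stars(scanned_table), stars(excel_sheet), stars(csv_file))
```

The useful property is the one made in the talk: it is not all or nothing. A scanned image on the web already scores, and each later step is an incremental improvement rather than a precondition for publishing.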