15. Nullius in verba: Take no one's word for it
Bookplate of the Royal Society (Great Britain)
Flickr user: Penn Provenance Project, CC-BY 2.0
http://www.flickr.com/photos/58558794@N07/6737170267/
16.
17. “Half of these achievements were among
the original ‘design goals’ of the SDSS, but
the other half were either entirely
unanticipated or not expected to be nearly
as exciting or powerful as they turned out to
be.”
http://www.sdss.org/signature.html
18.
19.
20.
21.
22. (i) Ensure that, where appropriate, Research Findings are made public within a
reasonable period of time. If, in the opinion of the Society, the Research
Findings have not been made public within a reasonable period of time, the
Contractor will provide the Society with written reasons on request regarding why
the Research Findings have not been made public.
(j) Include the following matters in the final report to the Society required under
clause 4.2(c):
(i) which data and sample repositories will be used to store the
metadata, data and samples collected as part of the Programme; and
(ii) where the metadata will be stored if no data or sample repositories
are available.
(k) Unless prohibited under any required ethical consent or approval, establish
adequate and reasonable access to metadata, data and samples collected
as part of the Programme within twelve months of the Completion Date of
the Contract to:
(i) people carrying out research; and
(ii) national and international repositories.
(l) Seek an exemption from the Society if it is not possible to comply with clauses
4.2(j) or 4.2(k) and provide written reasons why it is not possible to comply with
those clauses. The Society will grant an exemption from clauses 4.2(j) and 4.2(k)
where it considers that it would be unreasonable to require the Contractor to
comply with them.
25. The Zen of Open Data
Open is better than closed.
Transparent is better than opaque.
Simple is better than complex.
Accessible is better than inaccessible.
Sharing is better than hoarding.
Linked is more useful than isolated.
Fine-grained is preferable to aggregated.
(Although there are legitimate privacy and security limitations.)
Optimise for machine readability — they can translate for
humans.
Barriers prevent worthwhile things from happening.
'Flawed, but out there' is a million times better than 'perfect, but
unattainable'.
Opening data up to thousands of eyes makes the data better.
Iterate in response to demand.
There is no one true feed for all eternity — people need to
maintain this stuff.
27. Linked is more useful than isolated
Telegraph office. Lander, J M :Photographs of telephone exchanges and other post office material. Ref: 1/2-111693-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22801049
28.
29. ★ make your stuff available on the web
(whatever format)
★★ make it available as structured data
(e.g. excel instead of image scan of a table)
★★★ non-proprietary format
(e.g. csv instead of excel)
★★★★ use URLs to identify things, so people can point
at your stuff
★★★★★ link your data to other people’s data to provide
context
30. Fine-grained is preferable to aggregated...
1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand.
http://beta.natlib.govt.nz/records/22865170
31.
32.
33. Optimise for machine readability —
they can translate for humans
Computer room. K E Niven and Co :Commercial negatives. Ref: 1/2-227397-F.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22900308
34.
35. 'Flawed, but out there' is a million times better
than 'perfect, but unattainable'
36.
37.
38. Opening data up to thousands of eyes makes
the data better
Crowd in Willis Street, Wellington, awaiting the results of the 1931 general election. Raine, William Hall Ref: 1/1-004500-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/23116892
39. Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072
Editor's Notes
Who am I? Geographer by training. Worked as a scientist looking after environmental NSDs. Now at DigitalNZ.
Like many geographers I am a bit of a generalist. My background is in geography, data management, web and data visualisation.
I am wearing three hats: National Library, software developer, hobbyist.
There are studies and surveys about open data. I will link to some at the end, but I figure it would be most useful for me to speak from personal experience.
Central to DigitalNZ is data aggregation: we bring together the metadata of the digital collections of over 120 New Zealand content partners. The partners range from libraries, museums, galleries and archives to radio and TV stations, not to mention community organisations and international organisations that hold NZ-related content.
DigitalNZ structures and standardises the metadata and stores it in a database.
Perhaps the most important part of DigitalNZ is that we provide access to that data via a gateway called the public API (application programming interface).
This API enables developers and programmers to build new discovery tools to help expose our partners’ content in innovative and exciting ways.
The API is vital, because DigitalNZ also uses the API to power the DigitalNZ search engine…
So here is the DigitalNZ Search today, http://www.digitalnz.org/, all built on the API.
And the search results… in this case a search on David Lange, a Labour Prime Minister in the 1980s.
So DigitalNZ is a great place to start an NZ-related search. Our content partners include Te Ara, TVNZ, Te Papa, a wide variety of community organisations, museums and libraries.

You can then dig down further
Onto our more detailed landing pages
And go right through to the original collection sites, where you can narrow your search even more.
While navigating the results you can also create your own sets of favourite items.
‘Sets’ is a feature that we launched in June this year and it is quite exciting because for the first time it allows people to come to one place and pool together related items from so many different organisations. We’ll give you a chance to make your own sets after this session.
And so this is a public set that someone has pulled together about New Zealand prime ministers.
What we are seeing is people doing the digging and connecting, and then sharing it with others. You can imagine that the service is of particular interest to the education sector for use by students and classrooms.
Let’s get back to the all-important metadata that sits behind DigitalNZ.
A significant offering for DigitalNZ is our role as a data provider.
Anyone can sign up for an API key (which gives them the credentials to use the API) and thereby access the full data record of each of the 25 million items aggregated in DigitalNZ.
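As a rough sketch of what "using an API key" means in practice, the snippet below builds a search request URL without sending it. The endpoint path, the parameter names (`api_key`, `text`, `per_page`) and the placeholder key are assumptions for illustration and are not taken from this talk; check the DigitalNZ API documentation for the current form.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; confirm against the DigitalNZ API docs.
BASE_URL = "https://api.digitalnz.org/records.json"


def build_search_url(api_key, text, per_page=20):
    """Return a search URL that carries the API key and query as parameters."""
    params = {"api_key": api_key, "text": text, "per_page": per_page}
    return BASE_URL + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a working credential.
url = build_search_url("YOUR_API_KEY", "David Lange", per_page=5)
print(url)
```

The key travels with every request, which is how a data provider can tell its consumers apart while still offering the same records to all of them.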
UC CEISMIC is a project run out of the University of Canterbury in Christchurch. It aims to build a comprehensive digital archive of material related to the Canterbury earthquakes.
They were planning to build their own service to aggregate this material, but they were able to use the DigitalNZ service instead. This is their website, and they are using our DigitalNZ data service to search across items related to the earthquakes.
People wouldn’t know that we had anything to do with this, apart from the fact that they are fans of our work and they credit us. But the point here is that we can let them run their service, they’ve designed their own site, and we take a behind-the-scenes role in distributing access to NZ material.
I mentioned NZResearch as one of the early collaborative efforts in New Zealand. It is a facility for sharing the research outputs of organisations such as NZ universities and other tertiary education institutions. See http://nzresearch.org.nz/ . This site was redeveloped and migrated onto the DigitalNZ search infrastructure in 2011.
So it is now all powered by DigitalNZ, and all of the NZ research outputs in this service are aggregated by DigitalNZ and delivered through the API.
This is the New Zealand Ministry of Education website Digistore, a storehouse of free-to-use digital content to support learning across the New Zealand school curriculum, from early childhood through to senior secondary level.
DigitalNZ items have been integrated as support resources for teachers, using the DigitalNZ API. http://digistore.tki.org.nz/ec/p/home
A current favourite of the DigitalNZ team: a keyword analysis tool across the digitised newspaper collections in DigitalNZ, namely the National Library’s ‘Papers Past’ collection.
http://wraggelabs.com/shed/querypic/
What you are seeing here is the frequency of the word “penguin” in historical newspaper articles.
The peak in 1910 is around the shipwreck of the SS Penguin.
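The idea of charting a keyword over time can be sketched in a few lines. This is a simplification under stated assumptions, not a description of how QueryPic itself works: the `articles` list is invented sample data, and `keyword_frequency` is a hypothetical helper that counts matching articles per year.

```python
from collections import Counter

# Invented sample records (year, headline); real data would come from an archive.
articles = [
    (1908, "A penguin was seen on the beach"),
    (1910, "Penguin inquiry continues"),
    (1910, "News of the Penguin reaches Wellington"),
    (1910, "Wool prices rise"),
    (1911, "Harbour board meets"),
]


def keyword_frequency(records, keyword):
    """Count, per year, the articles whose text contains the keyword as a word."""
    needle = keyword.lower()
    counts = Counter()
    for year, text in records:
        if needle in text.lower().split():
            counts[year] += 1
    return dict(sorted(counts.items()))


freq = keyword_frequency(articles, "penguin")
print(freq)
```

Plotting these per-year counts gives the kind of curve shown on the slide, with a spike wherever a story dominated the news.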
There are a couple of ways to think about this. The first is that good things come out of sharing scholarly data with others.
One aspect of this is captured in the motto of the Royal Society, 'Nullius in verba', translated as "Take no man's word for it." Replication has a long history in science.
You may also want to mention that overseas funders like the Wellcome Trust are mandating open access to the research they fund, and they often require CC-BY too. I am not going to speak to these. I assume that you are better informed about these institutions and their funding procedures than I am, but I wanted to acknowledge that they are part of the mix.
Here was the thing that caught my eye: the terms from the Marsden Fund contracts.
This stuff is obvious, but I really see it in practice.
Linked data gets tricky quickly. I personally oscillate between thinking this is all so easy and obvious and believing it to be incredibly difficult. Entity reconciliation is hard. The technologies are maturing but still not entirely there. Ontology is HARD.
I show this chart to give you a sense that it is not all or nothing. A star is a star is a star. You have some data as a scanned image? Great! Put it on the web. You have the data in an old Access database? Great! (...and so on...)
DigitalNZ is at 3.5 stars. I want to say that we are at four stars, but I think there is some work we need to do around persistent URLs. We would like to get to five stars.
Mention authority mapping exercise. Cross-walking over National Library, Te Papa, NZETC, Cenotaph and NZ on Screen.
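One way to read the star chart is as a cumulative checklist: each star counts only if the ones below it are already earned. The sketch below encodes that reading; the function name and its boolean parameters are made up for illustration, and the cumulative rule is my interpretation of the chart rather than something stated in the talk.

```python
def open_data_stars(on_web, structured, open_format, uses_urls, linked):
    """Count stars cumulatively, stopping at the first criterion not met."""
    stars = 0
    for criterion in (on_web, structured, open_format, uses_urls, linked):
        if not criterion:
            break
        stars += 1
    return stars


# A scanned table published as an image on the web.
scan = open_data_stars(True, False, False, False, False)
# A CSV download with no stable URLs for individual items.
csv_download = open_data_stars(True, True, True, False, False)
print(scan, csv_download)
```

Every step up is worthwhile on its own, which is the point of "a star is a star is a star".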
1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22865170
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
Accommodates the widest range of research. Greatest flexibility.
You can always scale back up.
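"You can always scale back up" can be shown with a small roll-up: fine-grained rows aggregate to any coarser level, but the reverse is not possible. The `votes` rows below are invented sample data and `roll_up` is a hypothetical helper, used only to illustrate the point.

```python
from collections import Counter

# Invented fine-grained records: one row per polling place and party.
votes = [
    {"electorate": "A", "polling_place": "School", "party": "X", "votes": 120},
    {"electorate": "A", "polling_place": "School", "party": "Y", "votes": 80},
    {"electorate": "A", "polling_place": "Hall", "party": "X", "votes": 60},
    {"electorate": "B", "polling_place": "Library", "party": "Y", "votes": 200},
    {"electorate": "B", "polling_place": "Library", "party": "X", "votes": 50},
]


def roll_up(rows, key):
    """Sum the vote counts for each distinct value of the given field."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row["votes"]
    return dict(totals)


by_party = roll_up(votes, "party")
by_electorate = roll_up(votes, "electorate")
print(by_party, by_electorate)
```

The same rows answer both questions. Had only the party totals been published, the electorate view could not be recovered from them.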
Machines can handle certain kinds of inputs much better than others. For example, handwritten notes on paper are very difficult for machines to process. Scanning text via Optical Character Recognition (OCR) results in many matching and formatting errors. Even information shared in the widely-used PDF format is very difficult for machines to parse. Thus, information should be stored in widely-used file formats that easily lend themselves to machine processing. (When other factors necessitate the use of difficult-to-parse formats, data should also be available in machine-friendly formats.) These files should be accompanied by documentation related to the format and how to use it in relation to the data.
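To make the contrast concrete, here is a minimal sketch of how little code a machine-friendly format needs. It writes a few rows as CSV and reads them straight back using Python's standard `csv` module; the rows are invented sample data and the field names are assumptions chosen for illustration.

```python
import csv
import io

# Invented sample rows; CSV carries every value as text.
rows = [
    {"year": "1957", "electorate": "A", "votes": "120"},
    {"year": "1957", "electorate": "B", "votes": "95"},
]

# Write the rows as CSV into an in-memory buffer (a file would work the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["year", "electorate", "votes"])
writer.writeheader()
writer.writerows(rows)

# Read them back: no OCR, no layout guessing, just named fields.
buffer.seek(0)
parsed = list(csv.DictReader(buffer))
total = sum(int(row["votes"]) for row in parsed)
print(total)
```

The same table locked inside a scanned image or a PDF would need error-prone extraction before even this simple sum could be computed.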
Find and correct errors
Fill in gaps
Network effects
Encourages use and reuse: living data
Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072