Principles of Open Data
Chris McDowall
DigitalNZ

University of Otago
October 25, 2012
• New CEISMIC Screen shot
Why?
Nullius in verba: Take no one's word for it

Bookplate of the Royal Society (Great Britain)
Flickr user: Penn Provenance Project, CC-BY 2.0
http://www.flickr.com/photos/58558794@N07/6737170267/
“Half of these achievements were among the original ‘design goals’ of the SDSS, but the other half were either entirely unanticipated or not expected to be nearly as exciting or powerful as they turned out to be.”

http://www.sdss.org/signature.html
(i) Ensure that, where appropriate, Research Findings are made public within a reasonable period of time. If, in the opinion of the Society, the Research Findings have not been made public within a reasonable period of time, the Contractor will provide the Society with written reasons on request regarding why the Research Findings have not been made public.

(j) Include the following matters in the final report to the Society required under clause 4.2(c):

      (i) which data and sample repositories will be used to store the metadata, data and samples collected as part of the Programme; and

      (ii) where the metadata will be stored if no data or sample repositories are available.

(k) Unless prohibited under any required ethical consent or approval, establish adequate and reasonable access to metadata, data and samples collected as part of the Programme within twelve months of the Completion Date of the Contract to:

      (i) people carrying out research; and

      (ii) national and international repositories.

(l) Seek an exemption from the Society if it is not possible to comply with clauses 4.2(j) or 4.2(k) and provide written reasons why it is not possible to comply with those clauses. The Society will grant an exemption from clauses 4.2(j) and 4.2(k) where it considers that it would be unreasonable to require the Contractor to comply with them.
How?
The Zen of Open Data

Open is better than closed.
Transparent is better than opaque.
Simple is better than complex.
Accessible is better than inaccessible.
Sharing is better than hoarding.
Linked is more useful than isolated.
Fine-grained is preferable to aggregated.
(Although there are legitimate privacy and security limitations.)
Optimise for machine readability — they can translate for humans.
Barriers prevent worthwhile things from happening.
'Flawed, but out there' is a million times better than 'perfect, but unattainable'.
Opening data up to thousands of eyes makes the data better.
Iterate in response to demand.
There is no one true feed for all eternity — people need to maintain this stuff.
Linked is more useful than isolated

Telegraph office. Lander, J M :Photographs of telephone exchanges and other post office material. Ref: 1/2-111693-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22801049
★       make your stuff available on the web (whatever format)
★★      make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★     non-proprietary format (e.g. CSV instead of Excel)
★★★★    use URLs to identify things, so people can point at your stuff
★★★★★   link your data to other people’s data to provide context
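A minimal Python sketch of stars three to five, assuming a small hypothetical table of places: the table is written as CSV rather than a spreadsheet (three stars), each row carries a URL that identifies the thing it describes (four stars), and an optional `same_as` column points at another publisher's identifier for the same thing (five stars). The example.org and example.com URLs and the `same_as` column name are placeholders, not part of the scheme; fully linked data would usually go further and use persistent identifiers and an RDF vocabulary.

```python
import csv
import io

# Hypothetical records. The URLs use reserved example domains and stand in
# for whatever persistent identifiers a real publisher would mint.
RECORDS = [
    {
        "id": "https://example.org/place/wellington",  # four stars: a URL per thing
        "name": "Wellington",
        # Five stars: link to someone else's identifier for the same place.
        "same_as": "https://example.com/other-dataset/wellington",
    },
    {
        "id": "https://example.org/place/dunedin",
        "name": "Dunedin",
        "same_as": "",  # no external match found yet
    },
]


def to_csv(records):
    """Serialise records as CSV, an open, non-proprietary format (three stars)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name", "same_as"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(RECORDS))
```

Because every row has its own URL, other people can point at a single record, and the `same_as` column is where context from other datasets gets attached.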
Fine-grained is preferable to aggregated...

1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand.
http://beta.natlib.govt.nz/records/22865170
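Keeping data at its finest grain means coarser views can always be derived later, while the reverse is impossible. A minimal Python sketch of that asymmetry, using invented booth-level vote counts (the electorates, booths, parties and numbers are all hypothetical, not the 1957 results pictured): one `aggregate` helper rolls the same rows up to electorate or national totals, but nothing can recover booth-level figures from those totals.

```python
from collections import defaultdict

# Hypothetical polling-booth results, invented purely for illustration.
BOOTH_RESULTS = [
    {"electorate": "Electorate A", "booth": "Booth 1", "party": "Party X", "votes": 120},
    {"electorate": "Electorate A", "booth": "Booth 1", "party": "Party Y", "votes": 95},
    {"electorate": "Electorate A", "booth": "Booth 2", "party": "Party X", "votes": 60},
    {"electorate": "Electorate B", "booth": "Booth 3", "party": "Party Y", "votes": 210},
]


def aggregate(rows, keys):
    """Sum votes for every distinct combination of the given key columns."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["votes"]
    return dict(totals)


# Scaling up is always possible from the fine-grained rows...
by_electorate = aggregate(BOOTH_RESULTS, ["electorate", "party"])
national = aggregate(BOOTH_RESULTS, ["party"])
# ...but `national` alone cannot be split back into booths.
```

Publishing only the national totals would rule out every finer-grained question; publishing the booth rows still lets anyone compute the totals.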
Optimise for machine readability — they can translate for humans

Computer room. K E Niven and Co :Commercial negatives. Ref: 1/2-227397-F.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22900308
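Machine readability pays off when a program can load a file without a human in the loop, and documentation of the format helps it do so. A minimal Python sketch, assuming a hypothetical CSV file published with a small data dictionary (the column names, descriptions and sample rows are invented): the dictionary documents each column and also supplies the parser for it, so a malformed value is reported with its line number rather than silently misread.

```python
import csv
import io

# Hypothetical data dictionary published alongside the file: each column
# maps to a human-readable description and a function that parses raw text.
DATA_DICTIONARY = {
    "record_id": ("Identifier of the catalogue record", str),
    "year": ("Year the item was created", int),
    "title": ("Title as catalogued", str),
}

# Invented sample file; the second data row has a malformed year.
SAMPLE_CSV = """record_id,year,title
rec-001,1957,Results board
rec-002,19xx,Crowd in the street
"""


def load(text, dictionary):
    """Parse CSV text using the documented parser for each column.

    Returns (rows, errors); rows that fail to parse are reported, not dropped silently.
    """
    rows, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    missing = set(dictionary) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns missing from file: {sorted(missing)}")
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({col: parse(raw[col]) for col, (_, parse) in dictionary.items()})
        except ValueError as exc:
            errors.append((line_no, str(exc)))
    return rows, errors


rows, errors = load(SAMPLE_CSV, DATA_DICTIONARY)
```

The same dictionary could be rendered as the human-facing documentation, so the description and the parsing rule for each column cannot drift apart.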
'Flawed, but out there' is a million times better than 'perfect, but unattainable'
Opening data up to thousands of eyes makes the data better

Crowd in Willis Street, Wellington, awaiting the results of the 1931 general election. Raine, William Hall Ref: 1/1-004500-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/23116892
Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072

More Related Content

What's hot (12)

Going Glocal—Polar Data in a Global Infrastructure (Research Data Alliance)
The Pacific Research Platform: Building a Distributed Big-Data Machine-Learni...
PRP, NRP, GRP & the Path Forward
The Pacific Research Platform Enables Distributed Big-Data Machine-Learning
Advanced Global-Scale Networking Supporting Data-Intensive Artificial Intelli...
New challenges for digital scholarship and curation in the era of ubiquitous ...
The Pacific Research Platform: a Science-Driven Big-Data Freeway System
Looking Back, Looking Forward NSF CI Funding 1985-2025
A workflow experiment; or (The Unexpected Virtue of Ignorance)
Security Challenges and the Pacific Research Platform
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
What is eScience, and where does it go from here?

Similar to Principles of Open Data (20)

GBIF and reuse of research data, Bergen (2016-12-14)
NIST Big Data Public Working Group NBD-PWG
Open Research Data: Licensing | Standards | Future
Introduction to linked data
Internet Prospective Study
Critique and Reflections on Open Data Initiatives
Making Data Dynamic: Views from UC3, CDL
Research data management: definitions, drivers and resources
Intro to RDM
Open Science Data Cloud (IEEE Cloud 2011)
EUDAT Research Data Management | www.eudat.eu |
Open Science Data Cloud - CCA 11
20160523 23 Research Data Things
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Linked Open Data_mlanet13
Cloud Computing Security From Sngle to multi Clouds Full Documentaion
Bigdatacooltools
The CSO Open Data Experience
Data Management and Horizon 2020
Open Data - strategies for research data management & impact of best practices

Editor's Notes

  1. Who am I? Geographer by training. Worked as a scientist looking after environmental NSDs. Now at DigitalNZ. Like many geographers, I am a bit of a generalist: my background is in geography, data management, web and data visualisation. I am wearing three hats: National Library, software developer, hobbyist. There are studies and surveys about open data. I will link to some at the end, but I figure it would be most useful for me to speak from personal experience.
  2. Central to DigitalNZ is data aggregation: we bring together the metadata of the digital collections of over 120 New Zealand content partners. The partners range from libraries, museums, galleries and archives to radio and TV stations, not to mention community organisations and international organisations that hold NZ-related content. DigitalNZ structures and standardises the metadata and stores it in a database. Perhaps the most important part of DigitalNZ is that we provide access to that data via a gateway called the public API (application programming interface). This API enables developers and programmers to build new discovery tools that expose our partners’ content in innovative and exciting ways. The API is vital, because DigitalNZ also uses it to power the DigitalNZ search engine…
  3. So here is DigitalNZ Search today (http://www.digitalnz.org/), all built on the API.
  4. And the search results, in this case for David Lange, a Labour Prime Minister in the 1980s. DigitalNZ is a great place to start an NZ-related search. Our content partners include Te Ara, TVNZ, Te Papa, and a wide variety of community organisations, museums and libraries. You can then dig down further…
  5. Onto our more detailed landing pages.
  6. And go right through to the original collection sites, where you can narrow your search even further.
  7. While navigating the results, you can also create your own sets of favourite items. ‘Sets’ is a feature that we launched in June this year. It is quite exciting because, for the first time, it allows people to come to one place and pool together related items from so many different organisations. We’ll give you a chance to make your own sets after this session.
  8. And so this is a public set that someone has pulled together about New Zealand prime ministers. What we are seeing is people doing the digging and connecting, and then sharing the results with others. You can imagine that the service is of particular interest to the education sector, for use by students and in classrooms.
  9. Let’s get back to the all-important metadata that sits behind DigitalNZ. A significant offering for DigitalNZ is our role as a data provider. Anyone can sign up for an API key (which gives them the credentials to use the API) and thereby access the full data record of each of the 25 million items aggregated in DigitalNZ.
  10. UC CEISMIC is a project run out of the University of Canterbury in Christchurch. It aims to build a comprehensive digital archive of material related to the Canterbury earthquakes. They were planning to build their own service to aggregate this material, but they were able to use the DigitalNZ service instead. This is their website, and they are using our DigitalNZ data service to search across items related to the earthquakes. People wouldn’t know that we had anything to do with this, apart from the fact that they are fans of our work and they credit us. But the point here is that we can let them run their service: they’ve designed their own site, and we take a behind-the-scenes role in distributing access to NZ material.
  11. I mentioned NZResearch as one of the early collaborative efforts in New Zealand. This is ‘NZResearch’, a facility for sharing NZ research outputs from organisations such as NZ universities and other tertiary education institutions (see http://nzresearch.org.nz/). The site was redeveloped and migrated onto the DigitalNZ search infrastructure in 2011, so it is now powered entirely by DigitalNZ: all of the NZ research outputs in this service are aggregated by DigitalNZ and delivered through the API.
  12. This is Digistore, a New Zealand Ministry of Education website. It is a storehouse of free-to-use digital content to support learning across the New Zealand school curriculum, from early childhood through to senior secondary level. DigitalNZ items have been integrated as support resources for teachers (http://digistore.tki.org.nz/ec/p/home) using the DigitalNZ API.
  13. A current favourite of the DigitalNZ team is a keyword analysis tool across the digitised newspaper collections in DigitalNZ, namely the National Library’s ‘Papers Past’ collection (http://wraggelabs.com/shed/querypic/). What you are seeing here is the frequency of the word “penguin” in historical newspaper articles. The peak in 1910 is around the wreck of the SS Penguin.
  15. There are a couple of ways to think about this. The first is that good things come out of sharing scholarly data with others. One aspect of this is captured in the motto of the Royal Society, 'Nullius in verba', translated as "Take no man's word for it." Replication has a long history in science.
  18. You may also want to mention that overseas funders like the Wellcome Trust are mandating open access to the research they fund, and they often require CC-BY too. I am not going to speak to these; I assume that you are better informed about these institutions and their funding procedures than I am, but I wanted to acknowledge that they are part of the mix.
  22. Here is the thing that caught my eye: the terms from the Marsden Fund contracts.
  25. This stuff is obvious, but I really do see it in practice.
  29. Linked data gets tricky quickly. I personally oscillate between thinking it is all easy and obvious and believing it to be incredibly difficult. Entity reconciliation is hard. The technologies are maturing but are still not entirely there. Ontology is HARD. I show this chart to give you a sense that it is not all or nothing. A star is a star is a star. You have some data as a scanned image? Great! Put it on the web. You have the data in an old Access database? Great! (...and so on...) DigitalNZ is at 3.5 stars. I want to say that we are at four stars, but I think there is some work we need to do around persistent URLs. We would like to get to five stars. Mention the authority mapping exercise: cross-walking across National Library, Te Papa, NZETC, Cenotaph and NZ on Screen.
  30. 1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22865170
  31. Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms. This accommodates the widest range of research and gives the greatest flexibility: you can always scale back up.
  33. Machines can handle certain kinds of inputs much better than others. For example, handwritten notes on paper are very difficult for machines to process. Scanning text via Optical Character Recognition (OCR) results in many matching and formatting errors. Likewise, information shared in the widely used PDF format is very difficult for machines to parse. Thus, information should be stored in widely used file formats that easily lend themselves to machine processing. (When other factors necessitate the use of difficult-to-parse formats, data should also be available in machine-friendly formats.) These files should be accompanied by documentation related to the format and how to use it in relation to the data.
  38. Find and correct errors. Fill in gaps. Network effects. Encourages use and reuse: living data.
  39. Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072
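Note 13 describes a keyword-frequency chart over digitised newspapers. Without claiming that this is how the tool linked in that note works, here is a minimal Python sketch of the underlying idea, assuming a hypothetical list of article records that each carry a year and some text (all invented here): count, for each year, the articles that mention a keyword as a whole word, ignoring case.

```python
import re
from collections import Counter

# Hypothetical article records standing in for digitised newspaper text;
# a real tool would fetch these from a newspaper archive.
ARTICLES = [
    {"year": 1909, "text": "A penguin was seen near the wharf."},
    {"year": 1910, "text": "Shipping notes: the Penguin sails tonight."},
    {"year": 1910, "text": "Editorial mentions the PENGUIN again."},
    {"year": 1910, "text": "Letters about penguins and seals."},
    {"year": 1910, "text": "Weather report."},
]


def articles_mentioning(articles, keyword):
    """Count, per year, the articles whose text contains the keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter()
    for article in articles:
        if pattern.search(article["text"]):
            counts[article["year"]] += 1
    return dict(sorted(counts.items()))


print(articles_mentioning(ARTICLES, "penguin"))
```

Whole-word matching means "penguins" is not counted. Because raw counts also rise and fall with how many articles were digitised for each year, dividing by each year's total article count is a common refinement.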