The document discusses principles of open data. It outlines key principles such as making data openly accessible on the web in structured formats like CSV instead of images or proprietary formats. It emphasizes linking data to provide context and optimizing data for machine readability so computers can translate for humans. The document also notes that imperfect data released is better than perfect data that is unattainable, and that opening data to many eyes improves data quality over time through iteration.
15. Nullius in verba: Take no one's word for it
Bookplate of the Royal Society (Great Britain)
Flickr User : Penn Provenance Project CC-BY 2.0
http://www.flickr.com/photos/58558794@N07/6737170267/
16.
17. “Half of these achievements were among
the original ‘design goals’ of the SDSS, but
the other half were either entirely
unanticipated or not expected to be nearly
as exciting or powerful as they turned out to
be.”
http://www.sdss.org/signature.html
18.
19.
20.
21.
22. (i) Ensure that, where appropriate, Research Findings are made public within a
reasonable period of time. If, in the opinion of the Society, the Research
Findings have not been made public within a reasonable period of time, the
Contractor will provide the Society with written reasons on request regarding why
the Research Findings have not been made public.
(j) Include the following matters in the final report to the Society required under
clause 4.2(c):
(i) which data and sample repositories will be used to store the
metadata, data and samples collected as part of the Programme; and
(ii) where the metadata will be stored if no data or sample repositories
are available.
(k) Unless prohibited under any required ethical consent or approval, establish
adequate and reasonable access to metadata, data and samples collected
as part of the Programme within twelve months of the Completion Date of
the Contract to:
(i) people carrying out research; and
(ii) national and international repositories.
(l) Seek an exemption from the Society if it is not possible to comply with clauses
4.2(j) or 4.2(k) and provide written reasons why it is not possible to comply with
those clauses. The Society will grant an exemption from clauses 4.2(j) and 4.2(k)
where it considers that it would be unreasonable to require the Contractor to
comply with them.
25. The Zen of Open Data
Open is better than closed.
Transparent is better than opaque.
Simple is better than complex.
Accessible is better than inaccessible.
Sharing is better than hoarding.
Linked is more useful than isolated.
Fine-grained is preferable to aggregated.
(Although there are legitimate privacy and security limitations.)
Optimise for machine readability — they can translate for
humans.
Barriers prevent worthwhile things from happening.
'Flawed, but out there' is a million times better than 'perfect, but
unattainable'.
Opening data up to thousands of eyes makes the data better.
Iterate in response to demand.
There is no one true feed for all eternity — people need to
maintain this stuff.
26. The Zen of Open Data
27. Linked is more useful than isolated
Telegraph office. Lander, J M :Photographs of telephone exchanges and other post office material. Ref: 1/2-111693-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22801049
28.
29. ★ make your stuff available on the web
(whatever format)
★★ make it available as structured data
(e.g. excel instead of image scan of a table)
★★★ non-proprietary format
(e.g. csv instead of excel)
★★★★ use URLs to identify things, so people can point
at your stuff
★★★★★ link your data to other people’s data to provide
context
30. Fine-grained is preferable to aggregated...
1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand.
http://beta.natlib.govt.nz/records/22865170
31.
32.
33. Optimise for machine readability —
they can translate for humans
Computer room. K E Niven and Co :Commercial negatives. Ref: 1/2-227397-F.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22900308
34.
35. 'Flawed, but out there' is a million times better
than 'perfect, but unattainable'
36.
37.
38. Opening data up to thousands of eyes makes
the data better
Crowd in Willis Street, Wellington, awaiting the results of the 1931 general election. Raine, William Hall Ref: 1/1-004500-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/23116892
39. Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G.
Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072
Editor's Notes
Who am I? Geographer by training. Worked as a scientist looking after environmental NSDs. Now at DigitalNZ. Like many geographers I am a bit of a generalist. Background is in geography, data management, web and data visualisation. I am wearing three hats: National Library, software developer, hobbyist. There are studies and surveys about open data. I will link to some at the end, but I figure that it would be most useful for me to speak from personal experience.
Central to DigitalNZ is data aggregation: we bring together the metadata of the digital collections of over 120 New Zealand content partners. The partners range from libraries, museums, galleries and archives to radio and TV stations, not to mention community organisations and international organisations that hold NZ-related content. DigitalNZ structures and standardises the metadata and stores it in a database. Perhaps the most important part of DigitalNZ is that we provide access to that data via a gateway called the public API (application programming interface). This API enables developers and programmers to build new discovery tools to help expose our partners’ content in innovative and exciting ways. The API is vital, because DigitalNZ also uses the API to power the DigitalNZ search engine…
So here is the DigitalNZ Search today http://www.digitalnz.org/ - all built on the API
And the search results… in this case a search on David Lange, a Labour Prime Minister in the 1980s. So DigitalNZ is a great place to start an NZ-related search. Our content partners include Te Ara, TVNZ, Te Papa, a wide variety of community organisations, museums and libraries. You can then dig down further
Onto our more detailed landing pages
And go right through to the original collection sites, where you can narrow your search even more.
While navigating the results you can also create your own sets of favourite items. ‘Sets’ is a feature that we launched in June this year and it is quite exciting because for the first time it allows people to come to one place and pool together related items from so many different organisations. We’ll give you a chance to make your own sets after this session.
And so this is a public set that someone has pulled together about New Zealand prime ministers. What we are seeing is people doing the digging and connecting, and then sharing it with others. You can imagine that the service is of particular interest to the education sector for use by students and classrooms.
Let’s get back to the all-important metadata that sits behind DigitalNZ. A significant offering for DigitalNZ is our role as a data provider. Anyone can sign up for an API key (which gives them the credentials to use the API) and thereby access the full data record of each of the 25 million items aggregated in DigitalNZ.
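To make the API-key idea concrete, here is a minimal sketch of how a client might build a search request. The endpoint URL and the parameter names (`api_key`, `text`, `per_page`) are assumptions for illustration only; check the DigitalNZ API documentation for the current values. The sketch only constructs the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; confirm against the API docs.
BASE_URL = "https://api.digitalnz.org/records.json"


def build_search_url(api_key, text, per_page=20):
    """Return a search URL carrying the API key and the query text.

    The parameter names are assumptions, not a confirmed API contract.
    """
    params = {"api_key": api_key, "text": text, "per_page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder for the key issued at sign-up.
    print(build_search_url("YOUR_API_KEY", "David Lange"))
```

The key travels with every request, which is what lets a provider hand out access to the full records while still knowing who is using the service.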
UC CEISMIC is a project run out of the University of Canterbury in Christchurch. It aims to build a comprehensive digital archive of material related to the Canterbury earthquakes. They were planning to build their own service to aggregate this material, but they were able to use the DigitalNZ service instead. This is their website, and they are using our DigitalNZ data service to search across items related to the earthquakes. People wouldn’t know that we had anything to do with this, apart from the fact that they are fans of our work and they credit us. But the point here is that we can let them run their service, they’ve designed their own site, and we take a behind-the-scenes role in distributing access to NZ material.
I mentioned NZResearch as one of the early collaborative efforts in New Zealand. It is a facility for sharing the research outputs of organisations such as NZ universities and other tertiary education institutions. See http://nzresearch.org.nz/ . This site was redeveloped and migrated onto the DigitalNZ search infrastructure in 2011. So it is now all powered by DigitalNZ, and all of the NZ research outputs in this service are aggregated by DigitalNZ and delivered through the API.
This is the New Zealand Ministry of Education website Digistore. It is a storehouse of free-to-use digital content to support learning across the New Zealand school curriculum, from early childhood through to senior secondary student level. DigitalNZ items have been integrated as support resources for teachers, using the DigitalNZ API. http://digistore.tki.org.nz/ec/p/home
A current favourite of the DigitalNZ team: a keyword analysis tool across the digitised newspaper collections in DigitalNZ, namely the National Library’s ‘Papers Past’ collection. http://wraggelabs.com/shed/querypic/ What you are seeing here is the frequency of the word “penguin” in historical newspaper articles. The peak in 1910 is around the shipwreck of the SS Penguin.
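The kind of chart described above can be approximated with a few lines of code once the article metadata is available. This is a minimal sketch, not the tool's actual implementation: the `(year, text)` record shape and the sample headlines are hypothetical, and counting matching articles per year is only one possible way to measure frequency.

```python
import re
from collections import Counter


def keyword_frequency_by_year(articles, keyword):
    """Count, per year, the articles that mention keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in articles:
        if pattern.search(text):
            counts[year] += 1
    # Sort by year so the result reads like the x-axis of a chart.
    return dict(sorted(counts.items()))


# Hypothetical sample records, invented for illustration.
SAMPLE_ARTICLES = [
    (1910, "Wreck of the Penguin: inquiry continues"),
    (1910, "Shipping news"),
    (1910, "The Penguin relief fund"),
    (1911, "A penguin seen on the beach"),
]

if __name__ == "__main__":
    print(keyword_frequency_by_year(SAMPLE_ARTICLES, "penguin"))
```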
There are a couple of ways to think about this. The first is that good things come out of sharing scholarly data with others. Replication has a long history in science, and one aspect of this is captured in the motto of the Royal Society: 'Nullius in verba', translated "Take no man's word for it."
You may also want to mention that overseas funders like the Wellcome Trust are mandating open access to the research they fund, and they often require CC-BY too. I am not going to speak to these. I assume that you are better informed about these institutions and their funding procedures than I am, but I wanted to acknowledge that they are part of the mix.
Here was the thing that caught my eye. It’s the terms from the Marsden Fund contracts.
This stuff is obvious, but I really see it in practice.
Linked data gets tricky quickly. I personally oscillate between thinking this is all so easy and obvious and believing it to be incredibly difficult. Entity reconciliation is hard. The technologies are maturing but still not entirely there. Ontology is HARD. I show this chart to give you a sense that it is not all or nothing. A star is a star is a star. You have some data as a scanned image? Great! Put it on the web. You have the data in an old Access database? Great! (...and so on...) DigitalNZ is at 3.5 stars. I want to say that we are at four stars, but I think there is some work we need to do around persistent URLs. We would like to get to five stars. Mention authority mapping exercise. Cross-walking over National Library, Te Papa, NZETC, Cenotaph and NZ on Screen.
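The star chart can be read as a ladder in which each star builds on the ones before it. A minimal sketch of that reading follows; the function name and its boolean parameters are hypothetical, the cumulative interpretation is an assumption, and half stars such as the 3.5 mentioned above are not modelled.

```python
def star_rating(on_web, structured, open_format, uses_urls, linked):
    """Return 0 to 5 stars, stopping at the first criterion not met.

    Assumes the stars are cumulative: a dataset cannot earn a later
    star without having earned every earlier one.
    """
    stars = 0
    for criterion in (on_web, structured, open_format, uses_urls, linked):
        if not criterion:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    # A scanned table published on the web: one star, and that still counts.
    print(star_rating(True, False, False, False, False))
    # A CSV file on the web without stable URLs for the things it describes.
    print(star_rating(True, True, True, False, False))
```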
1957 General Election results board, Wellington. Ref: 1/2-101460-F. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22865170
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
Accommodates widest range of research. Greatest flexibility.
You can always scale back up.
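"You can always scale back up" is easy to show in code: totals can be derived from fine-grained rows, but the rows cannot be recovered from the totals. A minimal sketch, using hypothetical electorate-level records invented for illustration:

```python
from collections import defaultdict

# Hypothetical fine-grained records: (electorate, party, votes).
RESULTS = [
    ("Electorate A", "Party X", 5200),
    ("Electorate A", "Party Y", 4800),
    ("Electorate B", "Party X", 3900),
    ("Electorate B", "Party Y", 6100),
]


def totals_by_party(rows):
    """Aggregate per-electorate rows up to overall totals per party."""
    totals = defaultdict(int)
    for _electorate, party, votes in rows:
        totals[party] += votes
    return dict(totals)


if __name__ == "__main__":
    print(totals_by_party(RESULTS))
```

Publishing only the output of `totals_by_party` would rule out any later question about individual electorates.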
Machines can handle certain kinds of inputs much better than others. For example, handwritten notes on paper are very difficult for machines to process. Scanning text via Optical Character Recognition (OCR) results in many matching and formatting errors. Information shared in the widely-used PDF format, for example, is very difficult for machines to parse. Thus, information should be stored in widely-used file formats that easily lend themselves to machine processing. (When other factors necessitate the use of difficult-to-parse formats, data should also be available in machine-friendly formats.) These files should be accompanied by documentation related to the format and how to use it in relation to the data.
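As a small illustration of why machine-friendly formats matter, the sketch below reads a CSV table and re-serialises it as JSON using only the Python standard library. The sample data and column names are hypothetical; the point is that no OCR or layout guessing is needed when the source is already structured.

```python
import csv
import io
import json

# Hypothetical sample table, invented for illustration.
CSV_TEXT = "id,name,count\n1,kiwi,12\n2,tui,7\n"


def csv_to_records(text):
    """Parse CSV text into a list of dicts keyed by the header row.

    All values stay as strings; type conversion is left to the caller.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def records_to_json(records):
    """Serialise the records as indented JSON."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(records_to_json(csv_to_records(CSV_TEXT)))
```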
Find and correct errors
Fill in gaps
Network effects
Encourages use and reuse - living data
Gum digger having dentistry work done. Northwood brothers :Photographs of Northland. Ref: 1/1-011206-G. Alexander Turnbull Library, Wellington, New Zealand. http://beta.natlib.govt.nz/records/22788072