Successfully reported this slideshow.
Your SlideShare is downloading. ×

A Revolution in Open Science: Open Data and the Role of Libraries (Professor Geoffrey Boulton at LIBER 2013)

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad

Check these out next

1 of 23 Ad

A Revolution in Open Science: Open Data and the Role of Libraries (Professor Geoffrey Boulton at LIBER 2013)

This talk was given by Prof. Geoffrey Boulton of the University of Edinburgh at LIBER's 42nd annual conference in Munich. Here is a brief summary: "The data storm that has been unleashed by novel means of data acquisition, manipulation and their instantaneous communication have posed both great challenges and opportunities for science. The challenge is to maintain scientific self-correction, which depends on concurrent publication of concepts and the underlying evidence. The opportunity is to exploit massive and complex data volumes in creating new knowledge. Both are non-trivial tasks. The former requires ‘intelligent openness‘."

"The latter requires new ways of thinking and new forms of collaboration, which make major demands on scientists, their institutions, those that fund science and those who publish it. Open access publishing is important, but open data is fundamental to scientific progress."

"In a post-Gutenberg era, can the library maintain its historic role as an efficient repository of scientific knowledge? Can it provide support for the creation of new knowledge? What responsibilities should it discharge, and how? What skills are required by those discharging the library function? And how do we achieve a realisable objective, of having all the publications online, all the data online, and for the two to be interoperable?"

Learn more about LIBER at www.libereurope.eu

This talk was given by Prof. Geoffrey Boulton of the University of Edinburgh at LIBER's 42nd annual conference in Munich. Here is a brief summary: "The data storm that has been unleashed by novel means of data acquisition, manipulation and their instantaneous communication have posed both great challenges and opportunities for science. The challenge is to maintain scientific self-correction, which depends on concurrent publication of concepts and the underlying evidence. The opportunity is to exploit massive and complex data volumes in creating new knowledge. Both are non-trivial tasks. The former requires ‘intelligent openness‘."

"The latter requires new ways of thinking and new forms of collaboration, which make major demands on scientists, their institutions, those that fund science and those who publish it. Open access publishing is important, but open data is fundamental to scientific progress."

"In a post-Gutenberg era, can the library maintain its historic role as an efficient repository of scientific knowledge? Can it provide support for the creation of new knowledge? What responsibilities should it discharge, and how? What skills are required by those discharging the library function? And how do we achieve a realisable objective, of having all the publications online, all the data online, and for the two to be interoperable?"

Learn more about LIBER at www.libereurope.eu

Advertisement
Advertisement

More Related Content

Slideshows for you (20)

Similar to A Revolution in Open Science: Open Data and the Role of Libraries (Professor Geoffrey Boulton at LIBER 2013) (20)

Advertisement

More from LIBER Europe (20)

Recently uploaded (20)

Advertisement

A Revolution in Open Science: Open Data and the Role of Libraries (Professor Geoffrey Boulton at LIBER 2013)

  1. 1. A revolution in open science? - open data & the role of libraries - Geoffrey Boulton LIBER Munich June 2013
  2. 2. Open communication of data: the source of a scientific revolution and of scientific progress Henry Oldenburg
  3. 3. Problems & opportunities in the data deluge 1020 bytes Available storage
  4. 4. The challenge to Oldenburg’s principle: - a crisis of replicability and credibility? A fundamental principle: the data providing the evidence for a published concept MUST be concurrently published, together with the metadata But what about the vast data volumes that are not used to support publication as well as those that are?
  5. 5. The opportunity: new knowledge from data (some technical opportunities & priorities) Exploiting the potential of linked data requires: • data integration • dynamic data Solutions/agreements are needed for: • provenance • persistent identifiers • standards • data citation formats • algorithm integration • file-format translation • software-archiving • automated data reading • metadata generation • timing of data release
  6. 6. BUT - its not just sharing and integrating data – its also what we do with it! Jim Gray - “When you go and look at what scientists are doing, day in and day out, in terms of data analysis, it is truly dreadful. We are embarrassed by our data!” ….. and we need a new breed of informatics-trained data scientist as the new librarians of the post- Gutenberg world but will they be in the Library? So what are the priorities? 1. Ensuring valid reasoning 2. Innovative manipulation to create new information 3. Effective management of the data ecology 4. Education & training in data informatics & statistics
  7. 7. Benefits of open communication of data • Greater benefit to the individual than hugging their own data (e.g. bioinformatics) • Permits faster responses to emergencies (e.g. Hamburg 2011 E. Coli outbreak) • Deterrent to fraud (system integrity as well as personal integrity) • Stimulates novel and creative collaborations (e.g. Tim Gowers & “crowd-sourcing in mathematics) • Support for citizen science movement ( e.g. Galaxy Zoo, Fold-it, Ash-Tag, etc) • Demands by citizens for access (e.g. “Climategate”) • Efficient responses to global challenges
  8. 8. Benefits 1 : data sharing in ethos & practice – the example of bio-informatics ELIXIR Hub (European Bioinformatic Institute) and ELIXIR Nodes provide infrastructure for data, computing, tools, standards and training.
  9. 9. • E-coli outbreak spread through several countries affecting 4000 people • Strain analysed and genome released under an open data license. • Two dozen reports in a week with interest from 4 continents • Crucial information about strain’s virulence and resistance Benefits 2: Response to Gastro-intestinal infection in Hamburg
  10. 10. “Scientific fraud is rife: it's time to stand up for good science” “Science is broken” Examples:  psychology academics making up data,  anaesthesiologist Yoshitaka Fujii with 172 faked articles  Nature - rise in biomedical retraction rates overtakes rise in published papers Cause: Rewards and pressures promote extreme behaviours, and normalise malpractice (e.g. selective publication of positive novel findings) Cures: Open data for replication Transparent peer review Not just personal integrity – but system integrity Benefits 3: Response to fraud
  11. 11. Mathematics related discussions Tim Gowers - crowd-sourced mathematics An unsolved problem posed on his blog. 32 days – 27 people – 800 substantive contributions Emerging contributions rapidly developed or discarded Problem solved! “Its like driving a car whilst normal research is like pushing it” What inhibits such processes? - The criteria for credit and promotion. Benefits 4: Opening-up science: e.g. crowd-sourcing
  12. 12. • Openly collected science is already helping policy makers. • AshTag app allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra). Chalara spread: 1992-2012 Benefits 5: Citizen Science
  13. 13. • Benefits 6: The Economy
  14. 14. Benefits 7: Openness to public scrutiny
  15. 15. Benefits 8: Global challenges
  16. 16. Boundaries of openness? Openness should be the default position, with proportional exceptions for: • Legitimate commercial interests (sectoral variation) • Privacy (completely anonymised data is impossible) • Safety, security & dual use (impacts contentious) All these boundaries are fuzzy
  17. 17. Openness of data per se has little value. Open science is more than disclosure For effective communication, replication and re-purposing we need intelligent openness. Data and meta-data must be: • Accessible • Intelligible • Assessable • Re-usable Only when these four criteria are fulfilled are data properly open. But, intelligent openness must be audience sensitive. Intelligent openness to fellow citizens normally makes far greater demands than openness to peers – we should prioritise “PUBLIC INTEREST SCIENCE” for the former.
  18. 18. Responsibilities & actions • Scientists: - changing the mindset • Learned Societies: - influencing their communities • Universities/Insts: - responsibility for the knowledge they produce? - incentives & promotion criteria - proactive, not just compliant - strategies (e.g. the library) - management processes • Funders of research: - mandate intelligent openness - accept diverse outputs - cost of open data is a cost of science - strategic funding for technical solutions (a priority for international collaboration) • Publishers: - mandate concurrent open deposition
  19. 19. A data management ecology? The role of the top-down and the bottom-up? Massive data loss Sum (little science data) > Sum (big science data)?
  20. 20. Can libraries rise to the challenges of a post- Gutenberg world? “Libraries do the wrong things, employ the wrong people” People • Funders mandate novel customers – the public • Can they attract data scientists? • Support for researchers & students Things • Reversing centralisation • A data repository – directory - metadata – background • Dynamic data • Selection problem • Compliant or proactive?
  21. 21. A taxonomy of openness Inputs Outputs Open access Administrative data (held by public authorities e.g. prescription data) Public Sector Research data (e.g. Met Office weather data) Research Data (e.g. CERN, generated in universities) Research publications (i.e. papers in journals) Open data Open science Collecting the data Doing research Doing science openly Researchers - Citizens - Citizen scientists – Businesses – Govt & Public sector Science as a public enterprise
  22. 22. A realiseable aspiration: all scientific literature open & online, all data open & online, and for them to interoperate … but, this is a process, not an event!
  23. 23. www.royalsociety.org

Editor's Notes

  • You have just been discussing global challenges; major targets for modern science – about science for policy. I will talk about some of the processes of science that will be vital if those challenges are to be effectively met – about policy for science. I restrict my talk to science itself, Chas Bountra will talk about one of the main domains of application – that of the commercial.
  • But first a little helpful history. This is Henry Oldenberg, the first secretary of the newly formed Royal Society in the early 1660s. Henry was an inveterate correspondent, with those we would now call scientists both in Europe and beyond. Rather than keep this correspondence private, he thought it would be a good idea to publish it, and persuaded the new Society to do so by creating the Philosophical Transactions, which remains a top-flight journal to the present day. But he demanded two things of his correspondents: that they should submit in the vernacular and not Latin; and that evidence (data) that supported a concept must be published together with the concept. It permitted others to scrutinize the logic of the concept, the extent to which it was supported by the data and permitted replication and re-use. Open publication of concept and evidence is the basis of “scientific self-correction”, which historians of science argue were the crucial building blocks on which the scientific revolution of the 18th and 19th centuries was built and remain fundamental to the progress of science, and its value to humanity as the most reliable means of acquiring knowledge. Openness to scrutiny by scientific peers is the most powerful form of peer review.
  • But Oldenberg’s world has changed. The last 20 years has seen a revolution in the rate at which data can be acquired, in the volume and complexity that can be stored and in the immediacy of ubiquitous communication. We have an unprecedented data storm. This poses challenges and opportunities in the way science is done.
  • The fundamental challenge is to scientific self-correction. Journals can no longer contain the data, and neither scientists nor journals have taken the obvious step of having data relevant to a publication concurrently available in an electronic database. (example of last year’s Nature paper revealing that only 11% of results in 50 benchmark papers in pre-clinical oncology were replicable – a crisis of credibility?)But enormous data volumes are created by publicly funded research that are never used as the basis for publication. There is a powerful argument that open sharing of such data has enormous potential.
  • The opportunity is, as some have argued, of another scientific revolution of a scale and significance equivalent to that of the 18th and 19th centuries. Integrating data from a large number of cognate databases, ensuring dynamic data that is automatically up-dated, and using powerful techniques for database interrogation and data and text mining permit us to ask questions in new ways with the potential to gain a more profound understanding of deep data relationships and the information and knowledge that they potentially contain, as promised by the vision of a semantic web.
  • Henry Oldenburg: the scientific journal and the process of peer review  Henry Oldenburg (1619-1677) was a German theologian who became the first Secretary of the Royal Society.  He corresponded with the leading scientists of Europe, and believed that rather than waiting for entire books to be published, letters were much better suited to quick communication of facts or new discoveries.  He invited people to write to him, even laymen who were not involved with science but had discovered some item of knowledge. He no longer required that science be conveyed in Latin, but in any vernacular language. From these letters the idea of printing scientific papers or articles in a scientific journal was born. In creating the Philosophical Transactions of the Royal Society in 1665, he wrote:   "It is therefore thought fit to employ the [printing] press, as the most proper way to gratify those [who] . . . delight in the advancement of Learning and profitable Discoveries [and who are] invited and encouraged to search, try, and find out new things, impart their knowledge to one another, and contribute what they can to the Grand Design of improving Natural Knowledge . . . for the Glory of God . . . and the Universal Good of Mankind."  Oldenburg also initiated the process of peer review of submissions by asking three of the Society’s Fellows who had more knowledge of the matters in question than he, to comment on submissions prior to making the decision about whether to publish.REFAaron Klug, (2000) “Address of the President, Sir Aaron Klug, O.M., P.R.S., Given at the Anniversary Meeting on 30 November 1999”, Notes Rec. R. Soc. Lond. 2000 54, 99-108.Marie Boas Hall, Henry Oldenburg: Shaping the Royal Society (Oxford: Oxford University Press 2002).
  • Open communication of data offers many benefits:  Data sharing in specific scientific communities offers greater benefits to the individual than does hugging their own data (e.g. bio-informatics).Data sharing permits faster, more efficient response to emergencies (e.g. the 2011 Hamburg-centred e-coli infection).Mandating open data concurrent with publication has the potential to deter fraud (examples of invented data) and malpractice (mention clinical trials) – stress system integrity as well as personal integrity.Openness stimulates novel, highly creative and efficient modes of scientific collaboration (e.g. crowd sourcing and Tim Gowers).The stimulus it offers to the “citizen science movement”, which has the potential in the next decade or so to fundamentally change the social dynamics of science.A response to the increasing demand from many citizens to interrogate for themselves the evidence for a particular policy that may have major impacts on the lives of individuals and society, and after all they pay, through their taxes for publicly funded scienceAnd finally, and crucially, it offers more efficient and speedier means of addressing many modern science-related challenges (e.g. climate change; energy; infectious pandemics etc)
  • Henry Oldenburg: the scientific journal and the process of peer review  Henry Oldenburg (1619-1677) was a German theologian who became the first Secretary of the Royal Society.  He corresponded with the leading scientists of Europe, and believed that rather than waiting for entire books to be published, letters were much better suited to quick communication of facts or new discoveries.  He invited people to write to him, even laymen who were not involved with science but had discovered some item of knowledge. He no longer required that science be conveyed in Latin, but in any vernacular language. From these letters the idea of printing scientific papers or articles in a scientific journal was born. In creating the Philosophical Transactions of the Royal Society in 1665, he wrote:   "It is therefore thought fit to employ the [printing] press, as the most proper way to gratify those [who] . . . delight in the advancement of Learning and profitable Discoveries [and who are] invited and encouraged to search, try, and find out new things, impart their knowledge to one another, and contribute what they can to the Grand Design of improving Natural Knowledge . . . for the Glory of God . . . and the Universal Good of Mankind."  Oldenburg also initiated the process of peer review of submissions by asking three of the Society’s Fellows who had more knowledge of the matters in question than he, to comment on submissions prior to making the decision about whether to publish.REFAaron Klug, (2000) “Address of the President, Sir Aaron Klug, O.M., P.R.S., Given at the Anniversary Meeting on 30 November 1999”, Notes Rec. R. Soc. Lond. 2000 54, 99-108.Marie Boas Hall, Henry Oldenburg: Shaping the Royal Society (Oxford: Oxford University Press 2002).
  • Henry Oldenburg: the scientific journal and the process of peer review  Henry Oldenburg (1619-1677) was a German theologian who became the first Secretary of the Royal Society.  He corresponded with the leading scientists of Europe, and believed that rather than waiting for entire books to be published, letters were much better suited to quick communication of facts or new discoveries.  He invited people to write to him, even laymen who were not involved with science but had discovered some item of knowledge. He no longer required that science be conveyed in Latin, but in any vernacular language. From these letters the idea of printing scientific papers or articles in a scientific journal was born. In creating the Philosophical Transactions of the Royal Society in 1665, he wrote:   "It is therefore thought fit to employ the [printing] press, as the most proper way to gratify those [who] . . . delight in the advancement of Learning and profitable Discoveries [and who are] invited and encouraged to search, try, and find out new things, impart their knowledge to one another, and contribute what they can to the Grand Design of improving Natural Knowledge . . . for the Glory of God . . . and the Universal Good of Mankind."  Oldenburg also initiated the process of peer review of submissions by asking three of the Society’s Fellows who had more knowledge of the matters in question than he, to comment on submissions prior to making the decision about whether to publish.REFAaron Klug, (2000) “Address of the President, Sir Aaron Klug, O.M., P.R.S., Given at the Anniversary Meeting on 30 November 1999”, Notes Rec. R. Soc. Lond. 2000 54, 99-108.Marie Boas Hall, Henry Oldenburg: Shaping the Royal Society (Oxford: Oxford University Press 2002).
  • Henry Oldenburg: the scientific journal and the process of peer review  Henry Oldenburg (1619-1677) was a German theologian who became the first Secretary of the Royal Society.  He corresponded with the leading scientists of Europe, and believed that rather than waiting for entire books to be published, letters were much better suited to quick communication of facts or new discoveries.  He invited people to write to him, even laymen who were not involved with science but had discovered some item of knowledge. He no longer required that science be conveyed in Latin, but in any vernacular language. From these letters the idea of printing scientific papers or articles in a scientific journal was born. In creating the Philosophical Transactions of the Royal Society in 1665, he wrote:   "It is therefore thought fit to employ the [printing] press, as the most proper way to gratify those [who] . . . delight in the advancement of Learning and profitable Discoveries [and who are] invited and encouraged to search, try, and find out new things, impart their knowledge to one another, and contribute what they can to the Grand Design of improving Natural Knowledge . . . for the Glory of God . . . and the Universal Good of Mankind."  Oldenburg also initiated the process of peer review of submissions by asking three of the Society’s Fellows who had more knowledge of the matters in question than he, to comment on submissions prior to making the decision about whether to publish.REFAaron Klug, (2000) “Address of the President, Sir Aaron Klug, O.M., P.R.S., Given at the Anniversary Meeting on 30 November 1999”, Notes Rec. R. Soc. Lond. 2000 54, 99-108.Marie Boas Hall, Henry Oldenburg: Shaping the Royal Society (Oxford: Oxford University Press 2002).
  • Openness of itself has no value unless it is “intelligent openness”, where data are:Accessible – they can be foundIntelligible – they can be understoodAssessable – e.g. does the creator have an interest in a particular outcome?Re-useable – sufficient meta-data to permit re-use and re-purposing.These should be standard criteria for an open data regime.But we must recognise that the amount of meta and background data required for intelligent openness to fellow citizens is usually far greater than that required for openness to scientific peers. If all data were to be intelligently open to fellow citizens on the basis that that have ultimately paid for it, science would stop tomorrow. A way forward would be to make a much greater effort to make data intelligently open in what we could call “public interest science”, including those issues that frequently arise in public debate or concern.

×