This document discusses open data and open science. It highlights Jean-Claude Bradley as a pioneer of open notebook science and open data who believed closed data means people die. It describes tools like ContentMine that can automatically extract data like chemical reactions, phylogenetic trees and clinical trial results from papers. Visitors can extract specific types of data while repositories can solve problems communally with continuous publication and validation.
contentmine.org (funded by Shuttleworth Foundation) has developed tools and workshops to allow anyone to mine scientific content. This 10-minute presentation at Wellcome Trust encourages you to become involved - no previous knowledge required.
Open Data and Open Science presented in Rio for Open Science 2014-08-22. I argue that Open Notebook Science is the way forward and will lead to great benefits
Jean-Claude Bradley was a pioneer of doing Open Science and on 2014-07-14 we held a memorial meeting in Cambridge (see also http://inmemoriamjcb.wikispaces.com/Jean-Claude+Bradley+Memorial+Symposium)
Copyright is one of the greatest barriers to Open Data. This presentation for insidegovernment UK shows the struggle between those who want to reform copyright and those opposed to reform
Scientific information is often hidden or not published properly. The ContentMine is a Social Machine consisting of semantic software and communities of domain expertise; it aims to liberate all scientific facts from the published literature on a daily basis.
The talk, delivered to the Computational Institute, was followed by a hands-on workshop on how to use the technology and work as a community.
PhD Theses are normally locked away digitally. They cost 20 billion dollars to create and we waste much of this value. By making them open we can use software to read, index, reuse, compute and add massive value
An overview of ContentMining for JISC (the infrastructure provider of UK academia). Examples, details leading to hands-on exercise (http://contentmine.org/workflow)
Can Computers understand the scientific literature (includes compscie material) - petermurrayrust
With the semantic web machines can autonomously carry out many knowledge-based tasks as well as humans. The main problems are not technical but the prevention of access to information. I advocate automatic downloading and indexing of all scientific information
Followup meeting in London to OpenCon2014, on the need for different models of scholarly communication. I explore the history of 20thC academic student-based revolutions, with special relevance to young people and the scope for action today.
Published on Aug 22, 2014 by PMR
Open Data and Open Science presented in Rio for Open Science 2014-08-22. I argue that Open Notebook Science is the way forward and will lead to great benefits
Published on Jan 29, 2016 by PMR
Keynote talk to LEARN (LERU/H2020 project) for research data management. Emphasizes that problems are cultural not technical. Promotes modern approaches such as Git / continuous integration, announces DAT. Asserts that the Right to Read is the Right to Mine. Calls for widespread development of content mining (TDM)
A presentation by Open Climate Knowledge for European Forum for Advanced Practices. Showing how the scientific literature can be searched for knowledge on this multidisciplinary topic.
The scientific scholarly literature now contains many millions of articles. They contain semi-structured information of high quality and veracity. We show how this resource can be converted to a universal Wikicite format and full-text indexed against Wikidata dictionaries. We now have > 5 million bibliographic records and over 200 dictionaries based on Wikidata properties and queryable by SPARQL.
The Publisher-Academic complex is a dystopian cycle where academia gives (mega)publishers manuscripts, reviews and money and the publishers give personal and institutional glory (vanity). This is analysed in its origins, impact and harm. The disruption can come from Advocacy/Activism, Community and Tools. Disruption comes from doing things Better or Novel, not Prices
AUDIO : https://soundcloud.com/damahub/peter-murray-rust-disturbing-the-publisher-academic-complex-210418-british-library
Thanks to DaMaHub
Ewan McAndrew (Edinburgh Wikimedian in Residence) has now edited this to synchronize the slides with the soundtrack; brilliant, many thanks. See https://media.ed.ac.uk/media/1_46h85ltt
ContentMining (aka Text and Data Mining TDM) is beneficial, legal in the UK and a few other countries. Many groups in Europe are looking to make it legal there as well but there are many vested interests who oppose it.
This short presentation shows the benefits of content mining, some of the technology, and the way that it can be used and promoted by communities of practice. I urge all attendees at CopyCamp and also the wider world to press for liberalization of Copyright
The Culture of Research Data, by Peter Murray-Rust - LEARN Project
1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Peter Murray-Rust, ContentMine.org and University of Cambridge
Keynote talk to LEARN (LERU/H2020 project) for research data management. Emphasizes that problems are cultural not technical. Promotes modern approaches such as Git / continuous integration, announces DAT. Asserts that the Right to Read is the Right to Mine. Calls for widespread development of content mining (TDM)
Open Data in a Big Data World: easy to say, but hard to do? - LEARN Project
Presentation at 3rd LEARN workshop on Research Data Management, “Make research data management policies work”
Helsinki, 28 June 2016, by Sarah Callaghan, STFC Rutherford Appleton Laboratory
Talk at the World Science Festival at Columbia, June 2, 2017: session on Big Data and Physics: http://www.worldsciencefestival.com/programs/big-data-future-physics/
Keynote address 'Opening Science' at NORFest 2023 on November 2, 2023 at the Royal Irish Academy in Dublin Ireland. Keynote speaker: Chelle Gentemann, science lead for NASA’s Transform to Open Science Mission and co-chair of the U.S. White House Office for Science and Technology and Policy (OSTP) Sub-working group on the Year of Open Science
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A... - BigData_Europe
Slides for keynote talk at the Big Data Europe workshop nr 3 on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017 conference by Ron Dekker, Director CESSDA: European Open Science Agenda: where we are and where we are going?
Data Science: History repeated? – The heritage of the Free and Open Source GI... - Peter Löwe
Data Science is described as the process of knowledge extraction from large data sets by means of scientific methods. The discipline draws heavily from techniques and theories from many fields, which are jointly used to further develop information retrieval on structured or unstructured very large datasets. While the term Data Science was already coined in 1960, the current perception of this field places it still in the first section of the hype cycle according to Gartner, being well en route from the technology trigger stage to the peak of inflated expectations.
In our view the future development of Data Science could benefit from the analysis of experiences from related evolutionary processes. One predecessor is the area of Geographic Information Systems (GIS). The intrinsic scope of GIS is the integration and storage of spatial information from often heterogeneous sources, data analysis, and sharing of reconstructed or aggregated results in visual form or via data transfer. GIS is successfully applied to process and analyse spatially referenced content in a wide and still expanding range of science areas, spanning from human and social sciences like archeology, politics and architecture to environmental and geoscientific applications, even including planetology.
This paper presents proven patterns for innovation and organisation derived from the evolution of GIS, which can be ported to Data Science. Within the GIS landscape, three strategic interacting tiers can be denoted: i) Standardisation, ii) applications based on closed-source software, without the option of access to and analysis of the implemented algorithms, and iii) Free and Open Source Software (FOSS) based on freely accessible program code enabling analysis, education and improvement by everyone. This paper focuses on patterns gained from the synthesis of three decades of FOSS development. We identified best practices which evolved from long-term FOSS projects, and describe the role of community-driven global umbrella organisations such as OSGeo, as well as the standardization of innovative services. The main driver is the acknowledgement of a meritocratic attitude.
These patterns follow evolutionary processes of establishing and maintaining a web-based democratic culture spawning new kinds of communication and projects. This culture transcends the established compartmentation and stratification of science by creating mutual benefits for the participants, irrespective of their respective research interest and standing. Adopting these best practices will enable
What is Open Science / Open Research?; Initiative of the European Union (EU); Elements of Open Science: open research process / cycle; open access (open repositories); open data; open source software; open notebook / lab book; open workflows; open reputation systems; citizen science; relationship between open research and e-research; open science in Africa and South Africa
Slides describing Force11 Work and background of several of the speakers, used for talks to University of Lethbridge, Carnegie Mellon and to Elsevier internally
The slides that will accompany my live webcast for OpenCon 2014 attendees, all about open data in research. The benefits, the how to (both legally & technically), examples, pitfalls, and the future of open research data.
Open Access and Research Communication: The Perspective of Force11 - Maryann Martone
Presentation at the National Federation of Advanced Information Services Workshop: Open Access to Published Research: Current Status and Future Directions, Philadelphia, PA USA November 22, 2013
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
Can machines understand the scientific literature? - petermurrayrust
A presentation to Cambridge MPhil Computational Biology, 2020-11-11. Presenters: Peter Murray-Rust, Shweata Hegde and Ambreen Hamadani from https://github.com/petermr/openvirus
This chunk is PMR with a large break in the middle for SH and AH talks.
I cover Global Challenges, knowledge equity, semantics of scientific articles, Wikidata, Data Extraction from images, and ethics/politics.
Answer: Yes, technically. No, politically as the Publisher-Academic Complex will block it.
Semantic content created from Open Access papers to help in the fight against viral epidemics. Includes contributions from NIPGR interns, 5 supported by Indian National Young Academy of Scientists.
Overview of openVirus project. Interns in India have worked for 2 months to extract scientific knowledge from the literature about viral epidemics. Covers data science, machine learning and virtual collaboration
Automatic mining of data from materials science literature - petermurrayrust
The literature on materials science (batteries, etc.) contains huge amounts of scientific facts, but not in easily accessible form. Our AMI program has been developed to automatically scrape, clean, annotate and display/publish data for re-use in science.
Examples will be given from electrochemistry, magnetism and other fields. The general principles and (open) tech are applicable to many other disciplines.
XML for science; its huge potential; but are publishers preventing it? - petermurrayrust
XML can represent almost all well-defined scientific objects: chemistry, plants, medicine. But it's not yet widely used. Is this because publishers oppose the re-use of science?
Early Career Researchers in Science. Start Early, Be Open, Be Brave - petermurrayrust
Highlights the importance of supporting Early Career Researchers to pursue their own ideas, possibly alongside their main research. Illustrated with biology but applies to all fields of science. This was a 14 min presentation and shows narratives of how ECRs develop and reinforce each other.
Presentation given at NUI, Galway 2019-04-11 for Open Science Week.
An overview of Early Career Researchers, their innovation and contribution towards Open Infrastructure
The ContentMine system (Open Source) can search EuropePMC and download hundreds of articles in seconds. These can be indexed by AMI dictionaries, allowing rapid evaluation and refinement of the search
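The dictionary-indexing step described above can be pictured with a small sketch. This is not the AMI implementation: the function names, the sample texts and the three-term "virus" dictionary are all hypothetical, and the matching is a plain case-insensitive whole-word count, assumed here only to show how dictionary hits might be used to rank downloaded articles.

```python
import re
from collections import Counter


def index_text(text, dictionary):
    """Count case-insensitive whole-word hits for each dictionary term."""
    counts = Counter()
    for term in dictionary:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


def rank_articles(articles, dictionary):
    """Return (article_id, total_hits) pairs, most hits first."""
    totals = {
        article_id: sum(index_text(text, dictionary).values())
        for article_id, text in articles.items()
    }
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for downloaded article texts and a dictionary.
articles = {
    "art1": "Zika virus was detected. The virus spreads via Aedes mosquitoes.",
    "art2": "Lithium batteries degrade over many charge cycles.",
}
virus_terms = ["virus", "Zika", "Aedes"]

ranking = rank_articles(articles, virus_terms)
```

Articles with no hits sink to the bottom of the ranking, which is one simple way a search could be evaluated and then refined by changing the query or the dictionary.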
The scientific and medical literature is a vast resource of knowledge, but it needs turning into semantic FAIR form. The ContentMine can do this and we presented a rapid overview of the potential
A 10-minute talk to lovers of early science (e.g. 1600-1900) at the Royal Society. Archivists, computer vision and scientific historical metadata are all relevant.
I chose 4 examples of monochrome diagrams from which I can extract something automatically. Some of the methods would scale to larger volumes, e.g. tables for figures, or maps with points
WikiFactMine: Ontology for Everybody and Everything - petermurrayrust
WikiFactMine (https://www.wikidata.org/wiki/Wikidata:WikiFactMine) consists of several hundred dictionaries created from Wikidata. They cover everything from science to medicine to geo to arts. Every item has a unique identifier (Q) and normally has several properties (P), creating a series of triples. Using SPARQL it's possible to create sophisticated queries and run them in seconds
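As an illustration of the Q/P triples and SPARQL queries mentioned above, here is a minimal sketch that builds a query and a request URL for the public Wikidata query service. It is an assumption-laden example, not WikiFactMine code: the helper names are hypothetical, and the choice of P31 ("instance of") with Q11173 ("chemical compound") is just one plausible way to seed a dictionary. No network request is made.

```python
import re
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_dictionary_query(class_qid, limit=10):
    """Build a SPARQL query listing items that are instances (P31) of class_qid."""
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError(f"not a Wikidata Q identifier: {class_qid!r}")
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query):
    """Encode the query as a GET URL asking for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Assumption for illustration: Q11173 is the Wikidata item for "chemical compound".
query = build_dictionary_query("Q11173", limit=5)
url = build_request_url(query)
```

Fetching `url` with any HTTP client should return JSON bindings of item identifiers and English labels, which could then serve as the terms of a dictionary.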
Paradise Lost and The Right to Read is the Right to Mine - petermurrayrust
Presented to UIUC CIRSS seminars to a mixed group of Library, CS, domain scientists with a great contingent of Early Career Researchers. Starts by honouring the creation of the wonderful NCSA Mosaic at UIUC in 1993 and the paradise of knowledge and community it opened. Then shows the gradual and tragic decline of the web into a megacorporate neocolonialist empire, where knowledge is sacrificed for money and power.
You have seen many of the slides before but the words are different and have been recorded.
The mining "Revolution"; are Libraries supporting Researchers or Publishers? - petermurrayrust
Increasingly we find that mega-corporations have taken control over scholarship. We could use the scholarly literature as a knowledge resource but megacorps try to stop this - and often libraries support them rather than researchers.
Ebi
1. Open Data
Peter Murray-Rust*,
Open Knowledge and University of Cambridge
European Bioinformatics Institute, UK, 2014-05-15
*Shuttleworth Fellow 2014-5
2. Overview
• Most scientific data is lost; costs many billions…
• … AND LIVES. Closed Data Means People Die
• Human problem; lack of vision + active opposition.
• Fully open data can change this
• Appreciation of Jean-Claude Bradley’s work
• Panton Fellows (Ross Mounce, Sophie Kershaw)
• Content Mining as partial solution (Hargreaves UK)
• WHAT YOU MUST DO
19. Mat Todd, University of Sydney
• JC was a pioneer in open science, and uncompromising
about its importance. We had so many productive
interactions over the years, starting from the end of
January 2006, when we started our open chemistry project
on The Synaptic Leap (JC was the first to comment!) and JC
posted his very first experiment online at Usefulchem. I
remember starting to think about how to do completely
open projects, looking around the web in 2005 to see if
anything open was going on in chemistry, and coming
across JC's lone voice, and I thought "Wow, who is this
guy?" He had dedication and integrity - we'll all miss him.
2014-05-15 (Mail to PM-R)
22. The economic value of data
• I believe that we spend globally ca 400 billion
USD / yr on public research.
• The outputs include:
– Knowledge / papers / patents
– Organizations
– People
– Materials
– Data – many billions/year and much is lost
27. https://en.wikipedia.org/wiki/Reinventing_Discovery
Michael Nielsen
Kasparov versus the World, The Wisdom of Crowds, various online collaborative projects
InnoCentive, collective intelligence, Paul Seabright's economic theory, online chat
History of Linux, Open Architecture Network, Wikipedia, MathWorks' computer programming
contest
communication in small groups, particularly as studied by Stasser and Titus; praxis of science; a
discussion of communication among scientists
Don R. Swanson and Literature-based discovery, predicting influenza with Google searches,
Sloan Digital Sky Survey, Allen Institute for Brain Science, Ocean Observatories Initiative, Human
Genome Project, Google Translate
Democratizing Science Galaxy Zoo, Foldit, citizen science, eBird, open access, arXiv, PLoS
The Challenge of Doing Science in the Open Complexity Zoo, academic publishing
The Open Science Imperative Open science, academic journal publishing reform, SPIRES
appendix - The problem solved by the Polymath Project
29. “Free” and “Open”
• “Free software is a matter of liberty, not price. ‘Free
speech’, not ‘free beer’.” (RMS)
• “A piece of data or content is open if anyone is free to use,
reuse, and redistribute it”
(OKFN) http://opendefinition.org/
• “open” (access) has multiple incompatible “definitions”.
Major split is “human eyeballs” vs copying and machine
“reusability”
• “Open” is a marketing term for publishers, who frequently
(often deliberately) do not grant full Openness.
30. 4 Freedoms (Richard Stallman)
• Freedom 0: The freedom to run the program for any purpose.
• Freedom 1: The freedom to study how the program works, and
change it to make it do what you wish.
• Freedom 2: The freedom to redistribute copies so you can help
your neighbor.
• Freedom 3: The freedom to improve the program, and release
your improvements (and modified versions in general) to the
public, so that the whole community benefits.
“I’ve spent a third of my life building software based on Stallman’s four freedoms, and
I’ve been astonished by the results. WordPress wouldn’t be here if it weren’t for those
freedoms, and it couldn’t have evolved the way it has.”
- Matt Mullenweg, co-creator of WordPress
31. Critical Historical Open Events
• Free Software Foundation (RMS,
1985) and Linux (Torvalds, 1991)
• The World Wide Web (TBL, 1991)
• The human genome (1990-2001)
• The life of Aaron Swartz (1986-2013)
32. https://en.wikipedia.org/wiki/Bermuda_Principles
• Automatic release of sequence assemblies larger than 1
kb (preferably within 24 hours).
• Immediate publication of finished annotated
sequences.
• Aim to make the entire sequence freely available in the
public domain for both research and development in
order to maximise benefits to society.
33. http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to [peer-
reviewed literature] by all scientists, scholars, teachers,
students, and other curious minds. …
…Removing access barriers to this literature will
accelerate research, enrich education, share the
learning of the rich with the poor and the poor with
the rich, make this literature as useful as it can be, and
lay the foundation for uniting humanity in a common
intellectual conversation and quest for knowledge.
(BOAI, 2002)
35. Mendeley
From Wikipedia, the free encyclopedia
• Mendeley – a social media site used by many
scientists to store metadata …
• … purchased by Elsevier in 2013
• David Dobbs, in The New Yorker, described
Elsevier’s motive as:
– to acquire its user data,
– to destroy or coöpt an open-science icon that
threatens its business model.
• PM-R: Mendeley can also Snoop and Control
43. Panton Principles for Open Data in Science (2010)
• …make an explicit and robust statement of your
wishes.
• Use a recognized waiver or license that is appropriate
for data.
• open as defined by the Open Knowledge/Data
Definition (… NOT non-commercial)
• Explicit dedication of data … into the public domain
via PDDL or CCZero
47. Reproducibility?
Begley & Ellis (2012)
Nature 483, 531-533
Image shown is from front page of Begley & Ellis
(2012), produced by the Nature Publishing Group
48. “Train a new generation of data scientists
and broaden public understanding”
“Riding The Wave”
European Commission
October 2010
49. Rotation-Based Learning (RBL)
Phase 1: Initiator
• No communication
permitted between groups
• Attempt to reproduce
existing literature
• Deliver a coherent research
story by the end of Phase 1
Phase 2: Successor
• Communication between
groups still prohibited
• Validate and develop the
inherited research story
• Critique your predecessors
• Role of research producer vs. research user
• Can this approach help to foster awareness of reproducibility issues?
Throughout Phases 1 & 2:
• Daily lectures on open
science culture & techniques
• First-hand application to own
research work
• Version control using GitHub
• Daily group supervision
50. “Do you think you would be
more confident in the future
about trying to apply Open
techniques to your work..?”
• 50% Yes, by myself
• 41% Yes, with help/guidance
• 9% No opinion/neutral
• 0% No
51. Ross Mounce (Bath), Panton Fellow
• Sharing research data:
http://www.slideshare.net/rossmounce
• How to figures from PLOS/One [link]:
Ross shows how to bring figures to life:
• PLOSOne at http://bit.ly/PLOStrees
• PLOS at http://bit.ly/phylofigs (demo)
53. Content Mining
[Diagram labels: “Lab” work; paper/thesis; Write; publish; ???; DATA; Intelligent software to read scientific papers; DATA]
Despite the inefficiency and loss, much unused data remains
in published articles. Publishers have tried to stop us mining it.
On 2014-06-01 IT WILL BE LEGAL IN THE UK!
54. Content Mining
• 1,000,000 papers/year => 3,000 / day => 2 /min
• 10,000+ phylogenetic trees (Ross Mounce, BBSRC)
• 20,000 chemical reactions / day
• >> 1 million graphs, plots, bar charts, statistics
• Possible on a laptop
• http://contentmine.org
Anyone interested in data from clinical trials papers?
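The throughput figures on this slide are rounded. A minimal sketch of the arithmetic, assuming a 365-day year and papers arriving evenly around the clock (the function name `throughput` is only illustrative):

```python
def throughput(papers_per_year):
    """Return (papers per day, papers per minute, seconds available per paper)."""
    per_day = papers_per_year / 365
    per_minute = per_day / (24 * 60)
    seconds_per_paper = 60 / per_minute
    return per_day, per_minute, seconds_per_paper


per_day, per_minute, seconds_per_paper = throughput(1_000_000)
print(f"{per_day:.0f} papers/day")         # about 2,740; the slide rounds up to 3,000
print(f"{per_minute:.1f} papers/minute")   # about 1.9; the slide rounds to 2
print(f"{seconds_per_paper:.0f} s/paper")  # roughly half a minute of processing per paper
```

On these assumptions a single machine has roughly half a minute per paper to keep up with the whole literature, which is the sense in which the slide calls this possible on a laptop.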
55. AMI2: High-throughput extraction of
semantic chemistry from the scientific
literature
Andy Howlett, Mark Williamson, Peter Murray-Rust,
Unilever Centre, Cambridge
56. AMI2 is a framework that can extract
semantic data from the scientific
literature.
58. Visitor Design Pattern/Example
Visitor = something that extracts a specific type of data
SpeciesVisitor, ChemVisitor, PhylogeneticTreeVisitor,
GeoLocationVisitor, ClinicalTrialVisitor …
Visitable = something that can have specific data extracted
PDF, SVG, Table
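The Visitor/Visitable split can be sketched in a few lines of Python. This is not AMI2's actual code: it is an illustration of the pattern only, and the `TextBlock` class, the `visit_text`/`visit_table` method names, the `KNOWN_GENERA` lookup and the toy regular expression are all assumptions made for the example.

```python
import re
from abc import ABC, abstractmethod


class Visitor(ABC):
    """Something that extracts one specific type of data."""

    @abstractmethod
    def visit_text(self, text_block): ...

    @abstractmethod
    def visit_table(self, table): ...


class Visitable(ABC):
    """Something that can have specific data extracted from it."""

    @abstractmethod
    def accept(self, visitor): ...


class TextBlock(Visitable):  # hypothetical stand-in for text taken from a PDF
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        return visitor.visit_text(self)


class Table(Visitable):
    def __init__(self, rows):
        self.rows = rows  # list of rows, each a list of cell strings

    def accept(self, visitor):
        return visitor.visit_table(self)


class SpeciesVisitor(Visitor):
    # Toy heuristic: "Genus epithet" where the genus is in a small lookup set.
    # Real species recognition needs proper dictionaries and context.
    KNOWN_GENERA = {"Escherichia", "Drosophila", "Homo"}
    BINOMIAL = re.compile(r"\b([A-Z][a-z]+) [a-z]{3,}\b")

    def _find(self, text):
        return [m.group(0) for m in self.BINOMIAL.finditer(text)
                if m.group(1) in self.KNOWN_GENERA]

    def visit_text(self, text_block):
        return self._find(text_block.text)

    def visit_table(self, table):
        return [name for row in table.rows for cell in row
                for name in self._find(cell)]


visitor = SpeciesVisitor()
print(TextBlock("We isolated Escherichia coli from soil.").accept(visitor))
print(Table([["Species", "Count"], ["Drosophila melanogaster", "12"]]).accept(visitor))
```

The point of the design is that a new kind of data (say a ChemVisitor) is added as one new Visitor class without changing any Visitable, and each Visitable decides which `visit_…` method applies to it.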
66. Thanks
• BBSRC for PLUTo project (Bath)
• Unilever Research for PhD (Andy Howlett)
• TSB / Cambridge IP (PDRA Mark Williamson)
• Shuttleworth Foundation (Fellowship PM-R)
• Julian Huppert MP and David Willetts (support for
Hargreaves copyright reform)
• Christoph Steinbeck (EBI) Metabolights
• The ContentMine team (Michelle Brook, Ross Mounce,
Jenny Molloy, Richard Smith-Unna, CottageLabs)
• The Blue Obelisk
• Open Knowledge
• Apache PDFBox and all F/LOSS software authors
• Unilever Centre and University of Cambridge
67. CLOSED ACCESS MEANS PEOPLE DIE
• Create Open Notebook Science in your discipline
• Actively release data into Public Domain.
• Actively campaign against any re-use restrictions
(including CC-BY-NC)
• Refuse to work with closed organizations
CLOSED DATA MEANS PEOPLE DIE