This document provides a summary of a presentation on open scientific knowledge and building a knowledgebase beyond traditional journals. The presentation discusses the problems with publishers controlling infrastructure and restricting access to knowledge. It demonstrates software tools like getpapers and AMI that can be used to freely access and search across scientific literature. The presentation advocates for open access to all scientific literature and building a sustainable community and organization to achieve this goal.
Open Access to Scientific Knowledge
1. MRC Cognition and Brain Sciences Unit, Cambridge, UK, 2018-11-20
Open Scientific Knowledge
Peter Murray-Rust
TheContentMine and Dept of Chemistry, Univ of Cambridge
A new knowledgebase beyond journals
Images from ContentMine CC BY and Wikimedia CC BY-SA
pm286@cam.ac.uk
peter@contentmine.org
2. Tux and GNU: Open and Free Heroes
This is a story of liberation. You can be part of it.
And it will make life easier for you and citizens everywhere.
Tux (Linux), GNU (FSF)
Might be controversial
3. Structure of the presentation
Our story is in 3 ACTS
• We’ll show you WHY we need OPEN
• Then a DEMO of SOFTWARE: getpapers and AMI
• Building COMMUNITY: we need YOU/US
Rapidly reading the literature and supporting systematic reviews
4. Sustainable Open?
We need volunteers And a sustainable organization
Cannot be bought commercially, 501(c)3, OpenLock
SSI,
Numfocus
6. (2x digital music industry!)
ContentMine is OpenLocked Non-Profit http://contentmine.org
The Right to Read is the Right to Mine
7. The problem: publishers control the infrastructure
Sucking money out of the system
And destroying science in the Global South…
Completely unregulated industry
Megapub451*
*In Fahrenheit 451 firemen burned books; in C21st publishers restrict knowledge
8. cc by-nc-sa license LabHack and Alliance Earth
1 APC = 1900 USD
1 bioreactor = 25 USD
1 Raspberry Pi = 55 USD
1 submission to bioRxiv = Free (10 USD hidden)
“a PCR machine in the UK is around £6000 but in Zimbabwe about $33000 - try convincing someone to pay APCs when they have to try and save for that.”
CITIZENS!
Zimbabwe. LabHack team from Harare Institute of Technology.
10. @Senficon (Julia Reda): Text & Data mining in times of #copyright maximalism:
"Elsevier stopped me doing my research"
http://onsnetwork.org/chartgerink/2015/11/16/elsevier-stopped-me-doing-my-research/ … #opencon #TDM
Elsevier stopped me doing my research
Chris Hartgerink
11. I am a statistician interested in detecting potentially problematic research such as data fabrication,
which results in unreliable findings and can harm policy-making, confound funding decisions, and
hampers research progress.
To this end, I am content mining results reported in the psychology literature. Content mining the
literature is a valuable avenue of investigating research questions with innovative methods. For
example, our research group has written an automated program to mine research papers for errors in
the reported results and found that 1/8 papers (of 30,000) contains at least one result that could
directly influence the substantive conclusion [1].
In new research, I am trying to extract test results, figures, tables, and other information reported in
papers throughout the majority of the psychology literature. As such, I need the research papers
published in psychology that I can mine for these data. To this end, I started ‘bulk’ downloading research
papers from, for instance, Sciencedirect. I was doing this for scholarly purposes and took into account
potential server load by limiting the amount of papers I downloaded per minute to 9. I had no intention
to redistribute the downloaded materials, had legal access to them because my university pays a
subscription, and I only wanted to extract facts from these papers.
Full disclosure, I downloaded approximately 30GB of data from Sciencedirect in approximately 10 days.
This boils down to a server load of 0.0021GB/[min], 0.125GB/h, 3GB/day.
Approximately two weeks after I started downloading psychology research papers, Elsevier notified my
university that this was a violation of the access contract, that this could be considered stealing of
content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading
(which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university.
I am now not able to mine a substantial part of the literature, and because of this Elsevier is directly
hampering me in my research.
[1] Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The
prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 1–22.
doi: 10.3758/s13428-015-0664-2
Chris Hartgerink’s blog post
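The server-load figures quoted in the blog post can be checked with simple arithmetic. The sketch below assumes the post's round numbers (about 30 GB over about 10 days) and a constant download rate; the variable names are illustrative only.

```python
# Figures taken from the blog post: about 30 GB downloaded over about 10 days.
total_gb = 30.0
days = 10.0

# Assume the downloads were spread evenly over the whole period.
gb_per_day = total_gb / days
gb_per_hour = gb_per_day / 24
gb_per_minute = gb_per_hour / 60

print(f"{gb_per_day:.1f} GB/day, {gb_per_hour:.3f} GB/h, {gb_per_minute:.4f} GB/min")
```

The three rates agree with the 3 GB/day, 0.125 GB/h and 0.0021 GB/min stated in the post.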
12. It costs 10 USD to mount an article on (bio)arXiv…
So why 2000 USD for a megapub451 article?
13. I can charge whatever I like!! No regulator!
academics pay – it’s not their money – and they get glory
14. APCs and Journals MUST GO!
arXiv, bioRxiv, chemRxiv: 10$
Commercial publisher: 1800$
Review, Production, Hosting, Corporate Branding, Marketing, philanthropy, Shareholder Profit
17. …so closed access means people die…
…The software will demonstrate how we can search in future …
I’m from Congo where Ebola comes from. The Liberia outbreak was predicted 30 years ago in a paywalled paper
18. Semantic Fulltext
• EuropePMC coherent OpenAccess
• getpapers: query, download (through API).
• AMI filters, checks[1], transforms facts in papers.
• sequences, species, genera, genes, dictionaries
[0] All operations shown run in total of <3 minutes.
[1] Dictionaries and lookup.
[2] Usable from home by anyone
Zika endemic areas
Wikimedia CC-BY-SA
19. Open Components
• All the literature – free FULLTEXT everywhere
• Universal dictionary
• Open software – modular
• FRICTIONLESS – no gatekeepers
• CC BY, CC0, BSD/MIT/Apache/GNU,
22. All the world’s 5 million FAIR Open Scientific articles (5 million × 0.1 MB each = 0.5 TB), indexed by ContentMine. Disk: 30 GBP. Raspberry Pi 3: 50 GBP.
CC BY, PeterMR
Disk
Raspberry PI
Power
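The storage estimate on this slide is a back-of-envelope calculation. A minimal sketch, assuming the slide's own figures (5 million articles, 0.1 MB per article, disk and Raspberry Pi prices as quoted) and decimal units (1 TB = 1,000,000 MB):

```python
# Figures from the slide; 0.1 MB per article is the slide's estimate.
articles = 5_000_000
mb_per_article = 0.1

# Decimal units: 1 TB = 1,000,000 MB.
total_mb = articles * mb_per_article
total_tb = total_mb / 1_000_000

# Hardware prices quoted on the slide, in GBP.
disk_gbp = 30
raspberry_pi_gbp = 50
hardware_gbp = disk_gbp + raspberry_pi_gbp

print(f"{total_tb:.1f} TB of articles on {hardware_gbp} GBP of hardware")
```

Power supply and any indexing overhead are not included in this total.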
23. *** getpapers runs FAST! Downloads 50 papers / sec => 3000 / min => about 180,000 / hour
*** AMI-search:
Dictionaries based on anything in Wikidata (50
million items!) or your own.
We show country, brainparts, funders, disease…
looking for feedback, volunteers, examples
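To put the getpapers throughput in context, the sketch below converts the quoted 50 papers per second into per-minute and per-hour rates and estimates how long the 5 million articles mentioned earlier would take. It assumes the rate is sustained throughout, which a real API may not allow.

```python
# Rate quoted on the slide; assumed constant for this estimate.
papers_per_second = 50

papers_per_minute = papers_per_second * 60
papers_per_hour = papers_per_minute * 60

# Corpus size taken from the earlier slide on Open articles.
corpus_size = 5_000_000
hours_for_corpus = corpus_size / papers_per_hour

print(f"{papers_per_minute} papers/min, {papers_per_hour} papers/hour")
print(f"{hours_for_corpus:.1f} hours for {corpus_size} papers")
```

Under these assumptions the whole corpus would download in a little over a day.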
25. DEMO!!
(a) What is “neuroimaging”??
getpapers -q "neuroimaging" -x -k 100 -o neuro;
ami-search-cooccur neuro country disease funders
(b) What does the MRC unit do?
getpapers -q "MRC Cognition and Brain Sciences Unit" -x -k 2000 -o cbsu;
ami-search-cooccur cbsu country brainparts braincognition funders animaltesting
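The idea behind dictionary-based co-occurrence search can be sketched in a few lines. This is not AMI's implementation: the dictionaries, the sample texts and the function names below are all hypothetical, and real dictionaries (for example those built from Wikidata) are far larger and handle multi-word terms.

```python
from collections import Counter
from itertools import product
import re

# Hypothetical toy dictionaries standing in for "country" and "disease".
DICTIONARIES = {
    "country": {"zimbabwe", "liberia", "brazil"},
    "disease": {"ebola", "zika"},
}

# Hypothetical stand-ins for downloaded full-text papers.
PAPERS = [
    "Ebola outbreaks were reported in Liberia.",
    "Zika virus spread through Brazil; Ebola was not detected.",
    "A LabHack team in Zimbabwe built a bioreactor.",
]

def find_terms(text, terms):
    """Return the dictionary terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & terms

def cooccurrence(papers, dict_a, dict_b):
    """Count papers in which a term from dict_a appears with a term from dict_b."""
    counts = Counter()
    for text in papers:
        hits_a = find_terms(text, dict_a)
        hits_b = find_terms(text, dict_b)
        counts.update(product(sorted(hits_a), sorted(hits_b)))
    return counts

counts = cooccurrence(PAPERS, DICTIONARIES["country"], DICTIONARIES["disease"])
for (country, disease), n in sorted(counts.items()):
    print(country, disease, n)
```

Each pair is counted once per paper, so the totals show how many papers mention both terms, which is the kind of table a co-occurrence search over country and disease dictionaries would summarise.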
27. ECR communities we work with
• Open MOOC (Jon Tennant)
• OpenKnowledge Maps (Peter Kraker)
• Unpaywall (Heather Piwowar)
• World brain (Oli Sauter)
• And ContentMine Fellows
• Alexandra Bannach-Brown (Edinburgh, Bond)
(neuroscience and animal experiments)
• And …
28. AMI-Bio Proposal to Mozilla
We invite you to submit a full application for AMI-bio: Citizen search and use of the biomedical literature - Request ID number MF-1811-05957.
Please submit your application by 11/30/2018.
29. Guanyang Zhang
Biology, Arizona
“My ContentMine Fellowship project will focus on mining weevil-plant associations from literature records.”
“Motivation. Comprising ~70,000 described and 220,000 estimated species, weevils (Curculionoidea) are one of the most diverse plant-feeding insect lineages and constitute nearly 5% of all known animals.”
“Knowledge of host plant associations is critical for pest management, conservation, and comparative biological research. This knowledge is, however, scattered in 300 years of historical literature and difficult to access.”
Weevil-plant association network graph made with Google Fusion Table. Each blue circle is a weevil
tribe and yellow circle a plant genus. The size of a circle represents the number of associations.
30. Neo Christopher Chung
Warsaw, Computational Biology
Wants to find out geographic and temporal differences in the use of genomic software tools
32. Julia Reda, Pirate MEP, running ContentMine
software to liberate science 2016-04-16
33. Lars Willighagen
15 years old NL
Wants: extract data about conifers (relations to chemicals, height etc.)
Outcome: database with webpage containing conifer properties
Table Facts Visualiser DEMO
Card DEMO
Word Cloud
“I applied to this fellowship to learn new things and combine the ContentMine with two previous projects I never got to finish, and I got really excited by the idea and the ContentMine at large.”
34. bioRxiv in
Citizen Health Search (CHS)
A proposal to Wellcome Trust (Open Research in Health call) with ContentMine, Cochrane and UCL-EPPI (CCU)
CHS puts semantic search on the desktop of the searcher. We index all the visible medical literature, then normalize, section and index it against a bank of user-chosen dictionaries.
CHS takes input from EPMC, bioRxiv and emerging community sources such as Crossref and Unpaywall, and outputs to Zenodo, Wikidata and CM-Science Source.
Citizen Dashboard
35. Question/s
• “How can I help?”
– Create dictionaries
– Document your voyage
– Spread the word
– Advocate
– Meet at the pub for hacking?
– Code (especially downstream - visualisation)
Anyone seriously interested in automatic extraction of data from tables and plots?
36. http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to [peer-reviewed literature] by all scientists, scholars, teachers, students, and other curious minds. …
…Removing access barriers to this literature will
accelerate research, enrich education, share the
learning of the rich with the poor and the poor with
the rich, make this literature as useful as it can be, and
lay the foundation for uniting humanity in a common
intellectual conversation and quest for knowledge.
(Budapest Open Access Initiative, 2002)