Better Data for a Better World

Rothamsted Research, UK
Bioinformatics Specialist
Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
Oct, 16th - iGEM 2020 webinar
Find this presentation on SlideShare
Background source: https://pxhere.com/en/photo/857152
Hello!
• Geek since the 1980s and C=64 times
• Started working with Life Science Data in 2003
  • at Univ. of Milano-Bicocca, EMBL-EBI
  • and now Rothamsted Research
• Meanwhile, (h)activism in open source, open data
A Long History
Mankind and Data
• Gather knowledge
• Know how things work, make predictions
• Improve our lives
• (in addition to being good in itself)
Egypt, 2500BC (https://brewminate.com/census-taking-in-the-ancient-world/)
In the past 20yrs or so
Economist, 2010
(https://www.economist.com/node/21521548)
Why and How?
In the past 20yrs or so
We advanced in
• Gathering (eg, smartphones, IoT, 5G)
• Storing (eg, clouds)
• Processing (eg, AI, Machine Learning)
• Sharing (eg, web, standards, data portals)
• Searching (eg, NoSQL, Indexing)
• Visualising (eg, literature on HCI, data charts)...
...Data, Information, Knowledge
[Figure: slide on Precision Farming]
Images Source:
http://ieeeagra.com/ieeeagra/Downloads/20141204-Fernandez-Presentation.pdf
and establish virtuous circles
(background: https://www.flickr.com/photos/kevinmgill/14676390490/in/photostream/)
A World of Openness
The Cause for Open Data/Knowledge
• Data portals, policies, standards
  • https://www.data.gov/, https://data.gov.uk/
  • https://www.europeandataportal.eu/en
  • https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information
  • https://joinup.ec.europa.eu/
• In science
  • https://fairsharing.org/
  • https://www.nature.com/sdata/
• Data and activism
  • DBpedia, aka Wikipedia as data (https://wiki.dbpedia.org/about)
  • Wikidata (https://www.wikidata.org/)
  • OpenStreetMap (https://www.openstreetmap.org/about)
Open Data Cause: The Life Science Use Case
https://evaprofecmc.jimdofree.com/unit-4-the-genetic-revolution/2-2-chromosomes-and-genes/
So, sequencing was (and is) pretty important...
Source: https://boydfuturist.wordpress.com/tag/human-genome-project/
(also an interesting read)
...indeed
Recommended:
• The race to sequence the human genome: https://www.youtube.com/watch?v=AhsIF-cmoQQ
• The Human Genome Project Race: https://genomics-old.soe.ucsc.edu/research/hgp_race
• How to sequence the human genome: https://www.youtube.com/watch?v=MvuYATh7Y74
Fast-forward to nowadays
Which integrates with a wealth of (open) data
And allows for Reuse and further Advancements
The Cause for Open Data: Practical Reasons
• Allows for reuse
  • no need to regenerate
  • less expensive
• Allows for integration between heterogeneous data
  • different entities (genes, proteins, chemistry, species, literature...)
  • different scales (cells, organs, individuals, populations)
• New discoveries, novel uses
• Reproducible science
  • and quality improvement
The Cause for Open Data: Ethical Reasons
• Publicly funded data are ours
• Savings opportunities add up
• (but giving them out for free has a cost)
• Data are ours anyway (eg, genetic data)
• Transparency (and again, reproducibility)
• Public benefits outweigh private interests
But, how?
Based on publications, which genes are related to yellow rust? In which biological processes are their encoded proteins involved?
Good Data Principles:
Interoperability through Standards
https://tinyurl.com/y5e6kfa2
https://doi.org/10.1186/s41074-019-0055-1
https://tinyurl.com/y3h9c65k
https://tinyurl.com/y2wzlwbk
Data Standards: schema.org example
https://www.bbcgoodfood.com/recipes/classic-potato-salad
Source & recommended read: https://www.slideshare.net/NiallBeard/bioschemas-workshop
schema.org used for KnetMiner and Agrifood Data
github.com/Rothamsted/agri-schemas https://tinyurl.com/y44a5lj9
References
• Brandizi et al, 2018, https://europepmc.org/article/med/30085931
• IB2018 presentation, https://tinyurl.com/yaq8nt5e
• AgriSchemas and data standards, IB 2019
• Reusing KnetMiner data with Python/Jupyter
• https://tinyurl.com/yyhnkuyk
• https://tinyurl.com/y446y979
Good Data Principles: FAIR
• Findable
  • eg, give your dataset a DOI that resolves to a schema.org descriptor, and register it on datasetsearch.research.google.com
• Accessible
  • eg, a resolvable DOI makes it accessible; wrap with access control as needed
• Interoperable
  • eg, data described with schema.org, GO and other OBO ontologies
  • query protocols/standards (eg, SPARQL, GraphQL APIs, JSON Schema APIs, JSON-LD APIs)
• Reusable
  • clear licence
  • ideally, machine-readable licence (eg, ccREL)
Source and recommended read: https://tinyurl.com/yxocd3b9
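The Findable and Interoperable points can be made concrete with a small schema.org descriptor. The sketch below, in Python, builds a JSON-LD `Dataset` description and wraps it in the script tag that web crawlers read from a dataset's landing page. The dataset name, description, DOI and organisation are hypothetical placeholders; only the schema.org property names and the CC BY 4.0 licence URL are real.

```python
import json

# A minimal schema.org Dataset descriptor.
# All values below are made up for illustration, including the DOI.
dataset_descriptor = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example wheat gene knowledge graph",
    "description": "Genes, proteins and publications linked in one graph.",
    "identifier": "https://doi.org/10.1234/example-dataset",  # hypothetical DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["wheat", "genes", "knowledge graph"],
    "creator": {"@type": "Organization", "name": "Example Institute"},
}

# Serialise to JSON-LD and embed it the way a landing page would.
json_ld = json.dumps(dataset_descriptor, indent=2)
html_snippet = f'<script type="application/ld+json">\n{json_ld}\n</script>'

print(html_snippet)
```

Pasting the printed snippet into the landing page that the DOI resolves to is one common way to expose such a descriptor; the explicit `license` field also covers the "clear licence" point under Reusable.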
Issues: Easier to Say than to Do
https://tinyurl.com/yxsftwvy
https://xkcd.com/927/
Issues: Common Good vs Private Interests
• "...Parts of the standard that are not priorities for Google are not well documented anywhere. If they are priorities for Google, however, Google itself provides excellent documentation about how information should be specified in schema.org so that Google can use it. Because schema.org’s documentation is poor, the focus of attention stays on Google."
  (Time to end Google’s domination of schema.org, https://tinyurl.com/y6j7ke8u)
• Not everyone wants data published, eg, failed clinical trials
• Balance needed between research needs and private lives, eg,
  • The Immortal Life of Henrietta Lacks, Rebecca Skloot
  • k-anonymity, mediation approaches (Brandizi et al, 2017, https://doi.org/10.1186/s12911-017-0424-6)
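To give an idea of what k-anonymity means in practice: a table is k-anonymous when every combination of quasi-identifier values (attributes that could be linked to a person, such as age band or postcode) is shared by at least k records. The following is a minimal Python sketch of that check, not the method of the cited paper; the function name, field names and patient records are all made up for illustration.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same quasi-identifier values (0 for an empty table). The table is
    k-anonymous for every k up to this number."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# Hypothetical, already generalised patient records.
patients = [
    {"age_band": "30-39", "postcode_area": "AL5", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode_area": "AL5", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "flu"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "asthma"},
]

k = k_anonymity(patients, ["age_band", "postcode_area"])
print(f"The table is {k}-anonymous on age band and postcode area")
```

If the result is below the k a data holder is willing to accept, the usual remedy is to generalise further (wider age bands, coarser postcodes) or to suppress the rare records before publishing.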
Issues: Data are Power
http://www.tylervigen.com/spurious-correlations
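One common way such correlations arise is that two unrelated quantities both happen to grow over the same years. The Python sketch below illustrates this with two made-up yearly series (the numbers are invented for the example and do not come from the site above): the raw values correlate strongly, while their year-on-year changes do not.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def yearly_changes(series):
    """Differences between consecutive values, which removes a shared trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Made-up yearly figures for two unrelated quantities that both grow.
series_a = [10, 12, 13, 15, 18, 19, 21, 24, 25, 27]
series_b = [100, 103, 109, 110, 116, 121, 123, 130, 131, 137]

r_levels = pearson(series_a, series_b)
r_changes = pearson(yearly_changes(series_a), yearly_changes(series_b))

print(f"raw series:           r = {r_levels:.2f}")
print(f"year-on-year changes: r = {r_changes:.2f}")
```

The first coefficient comes out close to 1 only because both series trend upwards; once the trend is removed by differencing, the association is much weaker. Neither number says anything about causation.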
Issues: Data are Power
• "My son was a typically developing toddler. ... He received his first MMR at 19 months of age. The change in him was almost immediate. He did not regress in development, but his social skills became extremely compromised. Noises became unbearable..."
  (MMR vaccine caused my son's autism, https://tinyurl.com/y2udlfcb)
It's sad, but it's a spurious correlation: vaccines do not cause autism.
Issues: are We in Control?
https://www.nature.com/articles/d41586-020-01874-9
https://tinyurl.com/yxay8w2j
https://www.bbc.com/news/business-42959755
https://tinyurl.com/ydykjugt
https://tinyurl.com/hu3lh32
And Which Control?
https://tinyurl.com/y2yjrkpa
https://tinyurl.com/y82zf8qu
https://www.youtube.com/watch?v=ciBLsJkQ1WY
So...
• Future is even more digital
• And even more data-intensive
• Everyone should at least have an idea
  • Especially if you want to become a scientist
  • About producing data (eg, FAIR, formats, standards)
  • And consuming data (eg, data resources, Graph DB query languages)
  • And more (eg, Python, Pandas, Graph DBs, APIs)
https://tinyurl.com/y5rdq7qx
So...
• Probably we need better management and (a bit of, international) regulation
  • of technical aspects (eg, PA standards, research data publishing)
  • of ethical aspects (eg, open access, algorithms, censorship)
• But also more grassroots participation
  • we are all responsible, especially as scientists
• Data science is cool!
https://tinyurl.com/y5rdq7qx
Acknowledgements
Ajit Singh, Software Engineer
Keywan Hassani-Pak, KnetMiner Team Leader
Chris Rawlings, Head of Computational & Analytical Sciences
Jeremy Parsons, Bioinformatics Scientist
• Joseph Hearnshaw, software engineer
• Samiul Haque, Ed Eyles, IT admins
• Alice Minotto, Earlham Inst, hosting providers
• William Brown, Ricardo Gregorio, IT admins
• Monika Mistry, Master's student, data curator
• Sandeep Amberkar, bioinformatician, data curator
• Madhu Donepudi, Richard Holland, ext contractors, developers
And You!
Extras
The Cause for Open Data/Knowledge
• Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control (https://en.wikipedia.org/wiki/Open_data)
• Popularised by Obama in 2009 [1], Hans Rosling [3], Tim Berners-Lee [2] (recommended readings/watches)
  • [1] https://www.govtech.com/data/What-Obama-Did-for-Tech-Transparency-and-Open-Data.html
  • [2] https://www.ted.com/talks/tim_berners_lee_the_next_web?language=en
  • [3] https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen
IBM Watson
• Not the first time that an AI beat a human champion (eg, Deep Blue at chess, 1996)
• But a big milestone (in 2011) for knowledge management
• Specialisations possible, eg, IBM Watson Health
Mini documentary at
https://www.youtube.com/watch?v=P18EdAKuC1U
Surprising Data Insights
• Couples who argue often are more likely to last long (90% accuracy)
• If you want such a life...
• Many other examples of surprising data: 9 Bizarre and Surprising Insights from Data Science (https://tinyurl.com/yywgr2rv)
https://www.businessinsider.com/mathematical-secret-to-lasting-relationships-2015-6
Issues: Data are Power
Source and recommended read:
https://theconversation.com/five-maps-that-will-change-how-you-see-the-world-74967
Better Data for a Better World

  • 1. Marco Brandizi <marco.brandizi@rothamsted.ac.uk> Oct, 16th - iGEM 2020 webinar BetterDataforaBetterWorld Find this presentation on SlidesShare Background source: https://pxhere.com/en/photo/857152
  • 2. Hello! • Geek since 1980s and C=64 times • Started working with Life Science Data 2003 • at Univ. of Milano-Bicocca, EMBL-EBI • and now Rothamsted Research • Meanwhile, (h)activism in open source, open data
  • 3. A Long History Mankind and Data • Gather knowledge • Know how things work, make predictions • Improve our lives • (in addition to being good on itself) Egypt, 2500BC (https://brewminate.com/census-taking-in-the-ancient-world/)
  • 4. In the past 20yrs or so Economist, 2010 (https://www.economist.com/node/21521548)
  • 5. Why and How? In the past 20yrs or so
  • 6. We advanced in • Gathering (eg, smartphones, IoT, 5G) • Stocking (eg, clouds) • Processing (eg, AI, Machine Learning) • Sharing (eg, web, standards, data portals) • Searching (eg, NoSQL, Indexing) • Visualising (eg, literature on HCI, data charts)... ...Data, Information, Knowledge duction Precision Farming TIM AgRA Present Future Conclusion References recision Farming [1] 13 / 42 Images Source: http://ieeeagra.com/ieeeagra/Downloads/20141204-Fernandez-Presentation.pdf and establish virtuous circles
  • 8. The Cause for Open Data/Knowledge • Data portals, policies, standards • https://www.data.gov/, https://data.gov.uk/ • https://www.europeandataportal.eu/en • https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information • https://joinup.ec.europa.eu/ • In science • https://fairsharing.org/ • https://www.nature.com/sdata/ • Data and activism • DBPedia, aka Wikipedia as data (https://wiki.dbpedia.org/about) • Wikidata (https://www.wikidata.org/) • Open Street Map (https://www.openstreetmap.org/about)
  • 9. Open Data Cause: The Life Science Use Case https://evaprofecmc.jimdofree.com/unit-4-the-genetic-revolution/2-2-chromosomes-and-genes/
  • 10. So, sequencing was (is) pretty much important... Source: https://boydfuturist.wordpress.com/tag/human-genome-project/ (also an interesting reading)
  • 11. ...indeed • The race to sequence the human genome https://www.youtube.com/watch?v=AhsIF-cmoQQ • The Human Genome Project Race https://genomics-old.soe.ucsc.edu/research/hgp_race • How to sequence human genome https://www.youtube.com/watch?v=MvuYATh7Y74 Recommended:
  • 13. Which integrates with a wealth of (open) data
  • 14. And allows for Reuse and further Advancements
  • 15. The Cause for Open Data • Allows for reuse • no need to regenerate • less expensive • Allows for integration between heterogeneous data • different entities (genes, proteins, chemistry, species, literature...) • different scales (cells, organs, individuals, populations) • New discoveries, novel uses • Reproducible science • and quality improvement Practical Reasons
  • 16. The Cause for Open Data • Public-funded data are ours • Savings opportunities add up • (but giving them out for free has a cost) • Data are ours anyway (eg, genetic data) • Transparency (and again, reproducibility) • Public benefits outweigh private interests Ethical Reasons
  • 17. But, how? Based on publications, which genes are related to yellow rust? In which biological processes are their encoded proteins involved?
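The two questions on this slide amount to following links between publications, genes, proteins and biological processes. A minimal sketch of that idea, assuming a tiny in-memory set of links: every identifier below (paper_1, gene_A, protein_A, and so on) is an invented placeholder and not real data, and the helper names are hypothetical.

```python
# Toy, invented links: none of these identifiers refer to real papers, genes or proteins.
publication_mentions = {
    "paper_1": {"topics": {"yellow rust"}, "genes": {"gene_A", "gene_B"}},
    "paper_2": {"topics": {"drought"}, "genes": {"gene_C"}},
    "paper_3": {"topics": {"yellow rust"}, "genes": {"gene_B"}},
}
gene_encodes = {"gene_A": "protein_A", "gene_B": "protein_B", "gene_C": "protein_C"}
protein_processes = {
    "protein_A": {"defence response"},
    "protein_B": {"defence response", "signal transduction"},
    "protein_C": {"water transport"},
}


def genes_for_topic(topic):
    """Question 1: collect the genes mentioned in publications about a topic."""
    genes = set()
    for paper in publication_mentions.values():
        if topic in paper["topics"]:
            genes |= paper["genes"]
    return genes


def processes_for_genes(genes):
    """Question 2: follow gene -> protein -> process links for each gene."""
    return {
        gene: sorted(protein_processes.get(gene_encodes.get(gene), set()))
        for gene in sorted(genes)
    }


rust_genes = genes_for_topic("yellow rust")
rust_processes = processes_for_genes(rust_genes)
print(rust_processes)
```

In practice each of the three dictionaries would come from a different data source with its own identifiers and formats, which is why shared standards matter for answering this kind of question.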
  • 18. Good Data Principles: Interoperability through Standards https://tinyurl.com/y5e6kfa2 https://doi.org/10.1186/s41074-019-0055-1 https://tinyurl.com/y3h9c65k https://tinyurl.com/y2wzlwbk
  • 19. Data Standards: schema.org example https://www.bbcgoodfood.com/recipes/classic-potato-salad Source & recommended read: https://www.slideshare.net/NiallBeard/bioschemas-workshop
  • 20. schema.org used for Knetminer and Agrifood Data github.com/Rothamsted/agri-schemas https://tinyurl.com/y44a5lj9
  • 21. References • Brandizi et al, 2018, https://europepmc.org/article/med/30085931 • IB2018 presentation https://tinyurl.com/yaq8nt5e • AgriSchemas and data standards, IB 2019 • Reusing Knetminer data with Python/Jupyter • https://tinyurl.com/yyhnkuyk • https://tinyurl.com/y446y979
  • 22. Good Data Principles: FAIR • Findable • eg, give your dataset a DOI, which resolves to a schema.org descriptor, and register it on datasetsearch.research.google.com • Accessible • eg, a resolvable DOI makes it accessible. Wrap with access control as needed • Interoperable • eg, data described with schema.org, GO and other OBO ontologies • Query protocols/standards (eg, SPARQL, GraphQL APIs, JSON Schema APIs, JSON-LD APIs) • Reusable • Clear licence • Ideally, machine-readable licence (eg, ccREL) Source and recommended read: https://tinyurl.com/yxocd3b9
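As an illustration of the "schema.org descriptor" mentioned under Findable, here is a minimal sketch that builds a JSON-LD description of a dataset in Python. The dataset name, description and DOI are made-up placeholders, and make_dataset_descriptor is a hypothetical helper, not part of any library; only the schema.org type and property names (Dataset, name, description, identifier, license, keywords) are real.

```python
import json


def make_dataset_descriptor(name, description, doi, license_url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # A resolvable DOI covers both findability and accessibility.
        "identifier": f"https://doi.org/{doi}",
        # An explicit licence URL makes the reuse terms machine-readable.
        "license": license_url,
        "keywords": list(keywords),
    }


descriptor = make_dataset_descriptor(
    name="Example wheat gene annotations",  # placeholder name
    description="Toy dataset used only to illustrate the descriptor format.",
    doi="10.1234/example-dataset",  # placeholder, not a registered DOI
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["wheat", "gene annotation"],
)

jsonld = json.dumps(descriptor, indent=2)
print(jsonld)
```

The resulting JSON would typically be embedded in the dataset's landing page inside a script element of type application/ld+json, so that dataset search engines can index it.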
  • 23. Issues: Easier to Say than to Do https://tinyurl.com/yxsftwvy https://xkcd.com/927/
  • 24. Issues: Common Good vs Private Interests • ...Parts of the standard that are not priorities for Google are not well documented anywhere. If they are priorities for Google, however, Google itself provides excellent documentation about how information should be specified in schema.org so that Google can use it. Because schema.org’s documentation is poor, the focus of attention stays on Google. Time to end Google’s domination of schema.org, https://tinyurl.com/y6j7ke8u • Not everyone wants data published, eg, failed clinical trials • Balance needed between research needs and private lives, eg, • The Immortal Life of Henrietta Lacks, Rebecca Skloot • k-anonymity, mediation approaches (Brandizi et al, 2017, https://doi.org/10.1186/s12911-017-0424-6)
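For readers new to k-anonymity: a table is k-anonymous when every combination of quasi-identifier values (attributes that could help re-identify a person) is shared by at least k records. The sketch below computes that k for a toy table. It follows the textbook definition only and is not taken from the cited paper; the records, column names and the k_anonymity helper are invented for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values (0 for an empty table)."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Invented toy records, not real patient data.
records = [
    {"age_band": "30-39", "postcode_area": "AL5", "diagnosis": "A"},
    {"age_band": "30-39", "postcode_area": "AL5", "diagnosis": "B"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "A"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "C"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "B"},
]

k = k_anonymity(records, ["age_band", "postcode_area"])
print(k)  # 2: the smallest group (30-39, AL5) contains two records
```

A higher k means each individual hides in a larger crowd, usually at the price of coarser data (wider age bands, shorter postcodes), which is the balance between research needs and privacy that this slide refers to.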
  • 25. Issues: Data are Power http://www.tylervigen.com/spurious-correlations
  • 26. Issues: Data are Power • My son was a typically developing toddler. ... He received his first MMR at 19 months of age. The change in him was almost immediate. He did not regress in development, but his social skills became extremely compromised. Noises became unbearable... MMR vaccine caused my son's autism, https://tinyurl.com/y2udlfcb • It's sad, but it's a spurious correlation: vaccines do not cause autism
  • 27. Issues: are We in Control? https://www.nature.com/articles/d41586-020-01874-9 https://tinyurl.com/yxay8w2j https://www.bbc.com/news/business-42959755 https://tinyurl.com/ydykjugt https://tinyurl.com/hu3lh32
  • 29. So... • Future is even more digital • And even more data-intensive • Everyone should at least have an idea • Especially if you want to become a scientist • About producing data (eg, FAIR, formats, standards) • And consuming data (eg, data resources, Graph DB query languages) • And more (eg, Python, Pandas, Graph DBs, APIs) https://tinyurl.com/y5rdq7qx
  • 30. So... • We probably need better management and (a bit of international) regulation • of technical aspects (eg, PA standards, research data publishing) • of ethical aspects (eg, open access, algorithms, censorship) • But also more grassroots participation • we are all responsible, especially as scientists • Data science is cool! https://tinyurl.com/y5rdq7qx
  • 31. Acknowledgements • Ajit Singh, Software Engineer • Keywan Hassani-Pak, KnetMiner Team Leader • Chris Rawlings, Head of Computational & Analytical Sciences • Jeremy Parsons, Bioinformatics Scientist • Joseph Hearnshaw, software engineer • Samiul Haque, Ed Eyles, IT admins • Alice Minotto, Earlham Inst, hosting providers • William Brown, Ricardo Gregorio, IT admins • Monika Mistry, master student, data curator • Sandeep Amberkar, bioinformatician, data curator • Madhu Donepudi, Richard Holland, ext contractors, developers
  • 32. Acknowledgements • And You!
  • 34. The Cause for Open Data/Knowledge • Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control (https://en.wikipedia.org/wiki/Open_data) • Popularised by Obama in 2009 [1], Tim Berners-Lee [2], Hans Rosling [3] (recommended readings/watches) • [1] https://www.govtech.com/data/What-Obama-Did-for-Tech-Transparency-and-Open-Data.html • [2] https://www.ted.com/talks/tim_berners_lee_the_next_web?language=en • [3] https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen
  • 35. IBM Watson • Not the first time that AI beat humans at a task seen as requiring intelligence (eg, Deep Blue and chess, 1996-97) • But a big milestone (in 2011) in knowledge management • Specialisations possible, eg, IBM Watson Health • Mini documentary at https://www.youtube.com/watch?v=P18EdAKuC1U
  • 36. Surprising Data Insights • Couples who argue often are more likely to last long (90% accuracy) • If you want such a life... • Many other examples of surprising data: 9 Bizarre and Surprising Insights from Data Science (https://tinyurl.com/yywgr2rv) https://www.businessinsider.com/mathematical-secret-to-lasting-relationships-2015-6
  • 37. Issues: Data are Power Source and recommended read: https://theconversation.com/five-maps-that-will-change-how-you-see-the-world-74967