Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
Oct, 16th - iGEM 2020 webinar
Better Data for a Better World
Find this presentation on SlideShare
Background source: https://pxhere.com/en/photo/857152
Hello!
• Geek since the 1980s and the C=64 days
• Started working with life science data in 2003
• at Univ. of Milano-Bicocca, EMBL-EBI
• and now Rothamsted Research
• Meanwhile, (h)activism in open source and open data
A Long History
Mankind and Data
• Gather knowledge
• Know how things work, make predictions
• Improve our lives
• (in addition to being good in itself)
Egypt, 2500BC (https://brewminate.com/census-taking-in-the-ancient-world/)
In the past 20yrs or so
Economist, 2010
(https://www.economist.com/node/21521548)
Why and How?
In the past 20yrs or so
We advanced in
• Gathering (eg, smartphones, IoT, 5G)
• Stocking (eg, clouds)
• Processing (eg, AI, Machine Learning)
• Sharing (eg, web, standards, data portals)
• Searching (eg, NoSQL, Indexing)
• Visualising (eg, literature on HCI, data charts)...
...Data, Information, Knowledge
[Figure: slide on Precision Farming]
Images Source:
http://ieeeagra.com/ieeeagra/Downloads/20141204-Fernandez-Presentation.pdf
and establish virtuous circles
(background: https://www.flickr.com/photos/kevinmgill/14676390490/in/photostream/)
A World of Openness
The Cause for Open Data/Knowledge
• Data portals, policies, standards
• https://www.data.gov/, https://data.gov.uk/
• https://www.europeandataportal.eu/en
• https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information
• https://joinup.ec.europa.eu/
• In science
• https://fairsharing.org/
• https://www.nature.com/sdata/
• Data and activism
• DBPedia, aka Wikipedia as data (https://wiki.dbpedia.org/about)
• Wikidata (https://www.wikidata.org/)
• Open Street Map (https://www.openstreetmap.org/about)
Open Data Cause: The Life Science Use Case
https://evaprofecmc.jimdofree.com/unit-4-the-genetic-revolution/2-2-chromosomes-and-genes/
So, sequencing was (and is) pretty important...
Source: https://boydfuturist.wordpress.com/tag/human-genome-project/
(also an interesting read)
...indeed
Recommended:
• The race to sequence the human genome
https://www.youtube.com/watch?v=AhsIF-cmoQQ
• The Human Genome Project Race
https://genomics-old.soe.ucsc.edu/research/hgp_race
• How to sequence human genome
https://www.youtube.com/watch?v=MvuYATh7Y74
Fast-forward to nowadays
Which integrates with a wealth of (open) data
And allows for Reuse and further Advancements
The Cause for Open Data
• Allows for reuse
• no need to regenerate
• less expensive
• Allows for integration between heterogeneous data
• different entities (genes, proteins, chemistry, species, literature...)
• different scales (cells, organs, individuals, populations)
• New discoveries, novel uses
• Reproducible science
• and quality improvement
Practical Reasons
The Cause for Open Data
• Public-funded data are ours
• Savings opportunities add up
• (but giving them out for free has a cost)
• Data are ours anyway (eg, genetic data)
• Transparency (and again, reproducibility)
• Public benefits outweigh private interests
Ethical Reasons
But, how?
Based on publications, which genes are related to yellow
rust? In which biological processes are their encoded
proteins involved?
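A question like this is essentially a path query over a knowledge graph: from publications to genes, from genes to proteins, from proteins to processes. A minimal sketch in Python, assuming a toy in-memory graph; all identifiers (GENE_A, PROTEIN_A, PROCESS_X, the relation names) are hypothetical placeholders, not real Knetminer data or its API:

```python
# Toy knowledge graph as (subject, relation, object) triples.
# All names are made-up placeholders for illustration only.
TRIPLES = [
    ("PAPER_1", "mentions", "GENE_A"),
    ("PAPER_1", "mentions", "yellow rust"),
    ("PAPER_2", "mentions", "GENE_B"),
    ("PAPER_2", "mentions", "drought"),
    ("GENE_A", "encodes", "PROTEIN_A"),
    ("GENE_B", "encodes", "PROTEIN_B"),
    ("PROTEIN_A", "participates_in", "PROCESS_X"),
    ("PROTEIN_B", "participates_in", "PROCESS_Y"),
]


def objects(subject, relation):
    """Return all objects linked to `subject` by `relation`."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def genes_for_trait(trait):
    """Genes mentioned together with `trait` in at least one publication."""
    papers = {s for s, r, o in TRIPLES if r == "mentions" and o == trait}
    genes = set()
    for paper in papers:
        genes |= {o for o in objects(paper, "mentions") if o.startswith("GENE_")}
    return genes


def processes_for_gene(gene):
    """Biological processes that the gene's encoded proteins take part in."""
    processes = set()
    for protein in objects(gene, "encodes"):
        processes |= objects(protein, "participates_in")
    return processes


answer = {g: processes_for_gene(g) for g in genes_for_trait("yellow rust")}
print(answer)
```

In a real setting the triples would come from integrated open data sources and the traversal would be written in a graph query language rather than plain Python.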
Good Data Principles:
Interoperability through Standards
https://tinyurl.com/y5e6kfa2
https://doi.org/10.1186/s41074-019-0055-1
https://tinyurl.com/y3h9c65k
https://tinyurl.com/y2wzlwbk
Data Standards: schema.org example
https://www.bbcgoodfood.com/recipes/classic-potato-salad
Source & recommended read: https://www.slideshare.net/NiallBeard/bioschemas-workshop
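A recipe page like the one above is a typical target for schema.org markup. A minimal sketch of such a descriptor, built as JSON-LD in Python; the property names (`name`, `recipeIngredient`, `recipeInstructions`) belong to the schema.org Recipe type, while the values are illustrative placeholders and are not copied from the BBC page:

```python
import json

# schema.org Recipe descriptor; all values are placeholders.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Potato salad (placeholder values)",
    "recipeIngredient": ["potatoes", "mayonnaise", "chives"],
    "recipeInstructions": "Boil the potatoes, cool, then mix with the rest.",
}

# Serialise to JSON-LD, the form usually embedded in a web page.
json_ld = json.dumps(recipe, indent=2)
print(json_ld)

# A consumer that knows schema.org can read it back with a plain JSON parser.
parsed = json.loads(json_ld)
```

The point of the shared vocabulary is that a search engine, a recipe app and a research tool can all interpret `recipeIngredient` the same way without agreeing on anything else.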
schema.org used for Knetminer and Agrifood Data
github.com/Rothamsted/agri-schemas https://tinyurl.com/y44a5lj9
References
• Brandizi et al, 2018, https://europepmc.org/article/med/30085931
• IB2018 presentation, https://tinyurl.com/yaq8nt5e
• AgriSchemas and data standards, IB 2019
• Reusing Knetminer data with Python/Jupyter
• https://tinyurl.com/yyhnkuyk
• https://tinyurl.com/y446y979
Good Data Principles: FAIR
• Findable
• Eg, give your dataset a DOI that resolves to a schema.org descriptor, so that datasetsearch.research.google.com can index it
• Accessible
• Eg, a resolvable DOI makes it accessible; wrap it with access control as needed
• Interoperable
• Eg, data described with schema.org, GO and other OBO ontologies
• Query protocols/standards (eg, SPARQL, GraphQL APIs, JSON Schema APIs, JSON-LD APIs)
• Reusable
• Clear licence
• Ideally, machine-readable licence (eg, ccREL)
Source and recommended read: https://tinyurl.com/yxocd3b9
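The four principles can be made concrete with a dataset descriptor. A minimal sketch, assuming a schema.org Dataset expressed as a Python dictionary; the DOI, URL and dataset name are hypothetical placeholders, and the mapping from each principle to a single field is a simplification made for this example, not an official FAIR checklist:

```python
# Hypothetical descriptor: the DOI, URL and name are placeholders, not real records.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example wheat gene annotations",
    "identifier": "https://doi.org/10.1234/example",
    "url": "https://example.org/datasets/wheat-annotations",
    "encodingFormat": "application/ld+json",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Assumed, simplified mapping from FAIR principle to descriptor field.
FAIR_FIELDS = {
    "Findable": "identifier",
    "Accessible": "url",
    "Interoperable": "encodingFormat",
    "Reusable": "license",
}


def missing_fair_fields(descriptor):
    """Return the FAIR principles whose field is absent or empty."""
    return [p for p, field in FAIR_FIELDS.items() if not descriptor.get(field)]


print(missing_fair_fields(dataset))

# Dropping the licence makes the dataset fail the "Reusable" check.
no_licence = {k: v for k, v in dataset.items() if k != "license"}
print(missing_fair_fields(no_licence))
```

A check like this only tests that the fields exist; whether the DOI resolves or the licence is appropriate still needs a human or a dedicated validation service.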
Issues: Easier to Say than to Do
https://tinyurl.com/yxsftwvy
https://xkcd.com/927/
Issues: Common Good vs Private Interests
• ...Parts of the standard that are not priorities for Google are not well documented
anywhere. If they are priorities for Google, however, Google itself provides excellent
documentation about how information should be specified in schema.org so that Google
can use it. Because schema.org’s documentation is poor, the focus of attention stays on
Google.
Time to end Google’s domination of schema.org,
https://tinyurl.com/y6j7ke8u
• Not everyone wants data published, eg, failed clinical trials
• Balance needed between research needs and private lives, eg,
• The Immortal Life of Henrietta Lacks, Rebecca Skloot
• k-anonymity, mediation approaches
(Brandizi et al, 2017, https://doi.org/10.1186/s12911-017-0424-6)
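A dataset is k-anonymous when every record shares its quasi-identifier values (eg, age band, postcode) with at least k-1 other records. A minimal sketch of how k could be measured, assuming made-up records whose quasi-identifiers are already generalised; this is an illustration, not the method of the paper cited above:

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for all quasi-identifiers (0 for an empty table)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


# Made-up records, with age and postcode already generalised.
records = [
    {"age": "30-39", "postcode": "AL5*", "diagnosis": "A"},
    {"age": "30-39", "postcode": "AL5*", "diagnosis": "B"},
    {"age": "40-49", "postcode": "AL5*", "diagnosis": "A"},
    {"age": "40-49", "postcode": "AL5*", "diagnosis": "C"},
    {"age": "40-49", "postcode": "AL5*", "diagnosis": "A"},
]

k = k_anonymity(records, ["age", "postcode"])
print(k)
```

The smallest group here has two records, so the table is 2-anonymous with respect to age and postcode. Publishing exact ages instead of bands would typically shrink the groups and lower k.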
Issues: Data are Power
http://www.tylervigen.com/spurious-correlations
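Spurious correlations are easy to produce: any two quantities that both drift upwards over time will correlate strongly, whether or not they have anything to do with each other. A minimal sketch, assuming two made-up series (the numbers are invented for the example and are not taken from the site above):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Two invented, unrelated series that both simply grow year after year.
series_a = [10 + 2 * i for i in range(10)]
series_b = [5 + 3 * i + (1 if i % 2 else -1) for i in range(10)]

r = pearson(series_a, series_b)
print(f"correlation: {r:.2f}")
```

The coefficient comes out close to 1 even though the only thing the series share is a trend, which is why a high correlation alone says nothing about cause.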
Issues: Data are Power
• My son was a typically developing toddler. ... He received his first MMR at 19 months of
age. The change in him was almost immediate. He did not regress in development, but
his social skills became extremely compromised. Noises became unbearable...
MMR vaccine caused my son's autism, https://tinyurl.com/y2udlfcb
It's sad, but it's a spurious correlation: vaccines do not cause autism
Issues: are We in Control?
https://www.nature.com/articles/d41586-020-01874-9
https://tinyurl.com/yxay8w2j
https://www.bbc.com/news/business-42959755
https://tinyurl.com/ydykjugt
https://tinyurl.com/hu3lh32
And Which Control?
https://tinyurl.com/y2yjrkpa
https://tinyurl.com/y82zf8qu
https://www.youtube.com/watch?v=ciBLsJkQ1WY
So...
• Future is even more digital
• And even more data-intensive
• Everyone should at least have an idea
• Especially if you want to become a scientist
• About producing data (eg, FAIR, formats, standards)
• And consuming data (eg, data resources, Graph DB query languages)
• And more (eg, Python, Pandas, Graph DBs, APIs)
https://tinyurl.com/y5rdq7qx
So...
• We probably need better management and (a bit of international) regulation
• of technical aspects (eg, PA standards, research data publishing)
• of ethical aspects (eg, open access, algorithms, censorship)
• But also more grassroots participation
• we are all responsible, especially as scientists
• Data science is cool!
https://tinyurl.com/y5rdq7qx
Acknowledgements
Ajit Singh, Software Engineer
• Joseph Hearnshaw, software engineer
• Samiul Haque, Ed Eyles, IT admins
• Alice Minotto, Earlham Inst, hosting providers
• William Brown, Ricardo Gregorio, IT admins
• Monika Mistry, Master's student, data curator
• Sandeep Amberkar, bioinformatician, data curator
• Madhu Donepudi, Richard Holland, ext contractors, developers
Keywan Hassani-Pak, KnetMiner Team Leader
Chris Rawlings, Head of Computational & Analytical Sciences
Jeremy Parsons, Bioinformatics Scientist
And You!
Extras
The Cause for Open Data/Knowledge
• Open data is the idea that some data should be freely available to everyone to use and
republish as they wish, without restrictions from copyright, patents or other mechanisms of
control (https://en.wikipedia.org/wiki/Open_data)
• Popularised by Obama in 2009 [1], Tim Berners-Lee [2], Hans Rosling [3] (recommended
readings/watches)
• [1] https://www.govtech.com/data/What-Obama-Did-for-Tech-Transparency-and-Open-Data.html
• [2] https://www.ted.com/talks/tim_berners_lee_the_next_web?language=en
• [3] https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen
IBM Watson
• Not the first time that AI beat humans at their own game (eg, Deep Blue and chess, 1996-97)
• But a big milestone (in 2011) in knowledge management
• Specialisations are possible, eg, IBM Watson Health
Mini documentary at
https://www.youtube.com/watch?v=P18EdAKuC1U
Surprising Data Insights
• Couples who argue often are more likely to
last long (90% accuracy)
• If you want such a life...
• Many other examples of surprising data:
9 Bizarre and Surprising Insights from
Data Science (https://tinyurl.com/yywgr2rv)
https://www.businessinsider.com/mathematical-secret-to-lasting-relationships-2015-6
Issues: Data are Power
Source and recommended read:
https://theconversation.com/five-maps-that-will-change-how-you-see-the-world-74967

More Related Content

What's hot

The expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry communityThe expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry community
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literature
petermurrayrust
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
SciBite Limited
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
petermurrayrust
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Alejandra Gonzalez-Beltran
 
Finding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsFinding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics Datasets
Manuel Corpas
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
C. Tobin Magle
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
petermurrayrust
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literature
petermurrayrust
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
petermurrayrust
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
Paul Groth
 
Bioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big DataBioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big Data
Philip Bourne
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...Alejandra Gonzalez-Beltran
 
eScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiativeseScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiatives
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
petermurrayrust
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
petermurrayrust
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
petermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
petermurrayrust
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
TheContentMine
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
Benjamin Good
 

What's hot (20)

The expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry communityThe expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry community
 
Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literature
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
Finding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsFinding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics Datasets
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literature
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
Bioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big DataBioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big Data
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...
 
eScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiativeseScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiatives
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
 

Similar to Better Data for a Better World

Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
Philip Bourne
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
Philip Bourne
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
Philip Bourne
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
Philip Bourne
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
LEARN Project
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
Chris Dwan
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global Ecosystem
Philip Bourne
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
African Open Science Platform
 
GODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific InstitutionsGODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific Institutions
Johannes Keizer
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data Decade
Liz Lyon
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Carole Goble
 
Introduction to Open Science and EOSC
Introduction to Open Science and EOSCIntroduction to Open Science and EOSC
Introduction to Open Science and EOSC
Sarah Jones
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...
Johann van Wyk
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon Hodson
African Open Science Platform
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
Philip Bourne
 
Rising tide of data update 20171024
Rising tide of data update 20171024Rising tide of data update 20171024
Rising tide of data update 20171024
Keith Russell
 
Rising tide of data update
Rising tide of data update Rising tide of data update
Rising tide of data update
ARDC
 
New and Emerging Forms of Data
New and Emerging Forms of DataNew and Emerging Forms of Data
New and Emerging Forms of Data
David De Roure
 
2019 June 27 - Big data and data science
2019 June 27 - Big data and data science2019 June 27 - Big data and data science
2019 June 27 - Big data and data science
Fabio Stella
 
2016 08 gxaas
2016 08 gxaas2016 08 gxaas
2016 08 gxaas
Johannes Keizer
 

Similar to Better Data for a Better World (20)

Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global Ecosystem
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
 
GODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific InstitutionsGODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific Institutions
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data Decade
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
Introduction to Open Science and EOSC
Introduction to Open Science and EOSCIntroduction to Open Science and EOSC
Introduction to Open Science and EOSC
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon Hodson
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
 
Rising tide of data update 20171024
Rising tide of data update 20171024Rising tide of data update 20171024
Rising tide of data update 20171024
 
Rising tide of data update
Rising tide of data update Rising tide of data update
Rising tide of data update
 
New and Emerging Forms of Data
New and Emerging Forms of DataNew and Emerging Forms of Data
New and Emerging Forms of Data
 
2019 June 27 - Big data and data science
2019 June 27 - Big data and data science2019 June 27 - Big data and data science
2019 June 27 - Big data and data science
 
2016 08 gxaas
2016 08 gxaas2016 08 gxaas
2016 08 gxaas
 

More from Rothamsted Research, UK

Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
Rothamsted Research, UK
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
Rothamsted Research, UK
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Rothamsted Research, UK
 
Continuos Integration @Knetminer
Continuos Integration @KnetminerContinuos Integration @Knetminer
Continuos Integration @Knetminer
Rothamsted Research, UK
 
AgriSchemas Progress Report
AgriSchemas Progress ReportAgriSchemas Progress Report
AgriSchemas Progress Report
Rothamsted Research, UK
 
Notes about SWAT4LS 2018
Notes about SWAT4LS 2018Notes about SWAT4LS 2018
Notes about SWAT4LS 2018
Rothamsted Research, UK
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Rothamsted Research, UK
 
Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018
Rothamsted Research, UK
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
Rothamsted Research, UK
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Rothamsted Research, UK
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Rothamsted Research, UK
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...
Rothamsted Research, UK
 
Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?
Rothamsted Research, UK
 
Linked Data with the EBI RDF Platform
Linked Data with the EBI RDF PlatformLinked Data with the EBI RDF Platform
Linked Data with the EBI RDF Platform
Rothamsted Research, UK
 
BioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons LearnedBioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons Learned
Rothamsted Research, UK
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
Rothamsted Research, UK
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference service
Rothamsted Research, UK
 
Dev 2014 LOD tutorial
Dev 2014 LOD tutorialDev 2014 LOD tutorial
Dev 2014 LOD tutorial
Rothamsted Research, UK
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
Rothamsted Research, UK
 
Semic 2013
Semic 2013Semic 2013

Better Data for a Better World

  • 1. Marco Brandizi <marco.brandizi@rothamsted.ac.uk> Oct 16th - iGEM 2020 webinar Better Data for a Better World Find this presentation on SlideShare Background source: https://pxhere.com/en/photo/857152
  • 2. Hello! • Geek since 1980s and C=64 times • Started working with Life Science Data 2003 • at Univ. of Milano-Bicocca, EMBL-EBI • and now Rothamsted Research • Meanwhile, (h)activism in open source, open data
  • 3. A Long History Mankind and Data • Gather knowledge • Know how things work, make predictions • Improve our lives • (in addition to being good in itself) Egypt, 2500BC (https://brewminate.com/census-taking-in-the-ancient-world/)
  • 4. In the past 20yrs or so Economist, 2010 (https://www.economist.com/node/21521548)
  • 5. Why and How? In the past 20yrs or so
  • 6. We advanced in • Gathering (eg, smartphones, IoT, 5G) • Storing (eg, clouds) • Processing (eg, AI, Machine Learning) • Sharing (eg, web, standards, data portals) • Searching (eg, NoSQL, Indexing) • Visualising (eg, literature on HCI, data charts)... ...Data, Information, Knowledge [Figure: precision farming. Image source: http://ieeeagra.com/ieeeagra/Downloads/20141204-Fernandez-Presentation.pdf] and establish virtuous circles
  • 8. The Cause for Open Data/Knowledge • Data portals, policies, standards • https://www.data.gov/, https://data.gov.uk/ • https://www.europeandataportal.eu/en • https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information • https://joinup.ec.europa.eu/ • In science • https://fairsharing.org/ • https://www.nature.com/sdata/ • Data and activism • DBPedia, aka Wikipedia as data (https://wiki.dbpedia.org/about) • Wikidata (https://www.wikidata.org/) • Open Street Map (https://www.openstreetmap.org/about)
  • 9. Open Data Cause: The Life Science Use Case https://evaprofecmc.jimdofree.com/unit-4-the-genetic-revolution/2-2-chromosomes-and-genes/
  • 10. So, sequencing was (is) pretty important... Source: https://boydfuturist.wordpress.com/tag/human-genome-project/ (also an interesting read)
  • 11. ...indeed Recommended: • The race to sequence the human genome https://www.youtube.com/watch?v=AhsIF-cmoQQ • The Human Genome Project Race https://genomics-old.soe.ucsc.edu/research/hgp_race • How to sequence human genome https://www.youtube.com/watch?v=MvuYATh7Y74
  • 13. Which integrates with a wealth of (open) data
  • 14. And allows for Reuse and further Advancements
  • 15. The Cause for Open Data: Practical Reasons • Allows for reuse • no need to regenerate • less expensive • Allows for integration between heterogeneous data • different entities (genes, proteins, chemistry, species, literature...) • different scales (cells, organs, individuals, populations) • New discoveries, novel uses • Reproducible science • and quality improvement
  • 16. The Cause for Open Data: Ethical Reasons • Publicly funded data are ours • Savings opportunities add up • (but giving them out for free has a cost) • Data are ours anyway (eg, genetic data) • Transparency (and again, reproducibility) • Public benefits outweigh private interests
  • 17. But, how? Based on publications, which genes are related to yellow rust? In which biological processes are their encoded proteins involved?
  • 18. Good Data Principles: Interoperability through Standards https://tinyurl.com/y5e6kfa2 https://doi.org/10.1186/s41074-019-0055-1 https://tinyurl.com/y3h9c65k https://tinyurl.com/y2wzlwbk
  • 19. Data Standards: schema.org example https://www.bbcgoodfood.com/recipes/classic-potato-salad Source & recommended read: https://www.slideshare.net/NiallBeard/bioschemas-workshop
  • 20. schema.org used for Knetminer and Agrifood Data github.com/Rothamsted/agri-schemas https://tinyurl.com/y44a5lj9
  • 21. References • Brandizi et al, 2018, https://europepmc.org/article/med/30085931 • IB2018 presentation https://tinyurl.com/yaq8nt5e • AgriSchemas and data standards, IB 2019 • Reusing Knetminer data with Python/Jupyter • https://tinyurl.com/yyhnkuyk • https://tinyurl.com/y446y979
  • 22. Good Data Principles: FAIR • Findable • eg, give your dataset a DOI, which resolves to a schema.org descriptor, and register it on datasetsearch.research.google.com • Accessible • eg, a resolvable DOI makes it accessible. Wrap with access control as needed • Interoperable • eg, data described with schema.org, GO and other OBO ontologies • Query protocols/standards (eg, SPARQL, GraphQL APIs, JSON Schema APIs, JSON-LD APIs) • Reusable • Clear licence • Ideally, machine-readable licence (eg, CCREL) Source and recommended read: https://tinyurl.com/yxocd3b9
  • 23. Issues: Easier to Say than to Do https://tinyurl.com/yxsftwvy https://xkcd.com/927/
  • 24. Issues: Common Good vs Private Interests • ...Parts of the standard that are not priorities for Google are not well documented anywhere. If they are priorities for Google, however, Google itself provides excellent documentation about how information should be specified in schema.org so that Google can use it. Because schema.org’s documentation is poor, the focus of attention stays on Google. Time to end Google’s domination of schema.org, https://tinyurl.com/y6j7ke8u • Not everyone wants data published, eg, failed clinical trials • Balance needed between research needs and private lives, eg, • The Immortal Life of Henrietta Lacks, Rebecca Skloot • k-anonymity, mediation approaches (Brandizi et al, 2017, https://doi.org/10.1186/s12911-017-0424-6)
  • 25. Issues: Data are Power http://www.tylervigen.com/spurious-correlations
  • 26. Issues: Data are Power • My son was a typically developing toddler. ... He received his first MMR at 19 months of age. The change in him was almost immediate. He did not regress in development, but his social skills became extremely compromised. Noises became unbearable... MMR vaccine caused my son's autism, https://tinyurl.com/y2udlfcb It's sad, but it's a spurious correlation: vaccines do not cause autism
  • 27. Issues: are We in Control? https://www.nature.com/articles/d41586-020-01874-9 https://tinyurl.com/yxay8w2j https://www.bbc.com/news/business-42959755 https://tinyurl.com/ydykjugt https://tinyurl.com/hu3lh32
  • 29. So... • Future is even more digital • And even more data-intensive • Everyone should at least have an idea • Especially if you want to become a scientist • About producing data (eg, FAIR, formats, standards) • And consuming data (eg, data resources, Graph DB query languages) • And more (eg, Python, Pandas, Graph DBs, APIs) https://tinyurl.com/y5rdq7qx
  • 30. So... • Probably we need better management and (a bit of, international) regulation • of technical aspects (eg, PA standards, research data publishing) • of ethical aspects (eg, open access, algorithms, censorship) • But also more grassroots participation • we are all responsible, especially as scientists • Data science is cool! https://tinyurl.com/y5rdq7qx
  • 31. Acknowledgements • Ajit Singh, Software Engineer • Joseph Hearnshaw, Software Engineer • Samiul Haque, Ed Eyles, IT admins • Alice Minotto, Earlham Inst, hosting providers • William Brown, Ricardo Gregorio, IT admins • Monika Mistry, Master's student, data curator • Sandeep Amberkar, bioinformatician, data curator • Madhu Donepudi, Richard Holland, ext contractors, developers • Keywan Hassani-Pak, KnetMiner Team Leader • Chris Rawlings, Head of Computational & Analytical Sciences • Jeremy Parsons, Bioinformatics Scientist
  • 32. Acknowledgements • Ajit Singh, Software Engineer • Joseph Hearnshaw, Software Engineer • Samiul Haque, Ed Eyles, IT admins • Alice Minotto, Earlham Inst, hosting providers • William Brown, Ricardo Gregorio, IT admins • Monika Mistry, Master's student, data curator • Sandeep Amberkar, bioinformatician, data curator • Madhu Donepudi, Richard Holland, ext contractors, developers • Keywan Hassani-Pak, KnetMiner Team Leader • Chris Rawlings, Head of Computational & Analytical Sciences • Jeremy Parsons, Bioinformatics Scientist • And You!
  • 34. The Cause for Open Data/Knowledge • Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control (https://en.wikipedia.org/wiki/Open_data) • Popularised by Obama in 2009 [1], Tim Berners-Lee [2], Hans Rosling [3] (recommended readings/watches) • [1] https://www.govtech.com/data/What-Obama-Did-for-Tech-Transparency-and-Open-Data.html • [2] https://www.ted.com/talks/tim_berners_lee_the_next_web?language=en • [3] https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen
  • 35. IBM Watson • Not the first time that AI beat humans at a task seen as a benchmark of intelligence (eg, Deep Blue and chess, 1996-97) • But big milestone (in 2011) about knowledge management • Specialisations possible, eg, IBM Watson Health Mini documentary at https://www.youtube.com/watch?v=P18EdAKuC1U
  • 36. Surprising Data Insights • Couples who argue often are more likely to last long (90% accuracy) • If you want such a life... • Many other examples of surprising data: 9 Bizarre and Surprising Insights from Data Science (https://tinyurl.com/yywgr2rv) https://www.businessinsider.com/mathematical-secret-to-lasting-relationships-2015-6
  • 37. Issues: Data are Power Source and recommended read: https://theconversation.com/five-maps-that-will-change-how-you-see-the-world-74967
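
As a rough illustration of the "Findable" and "Reusable" points on the FAIR slide (a DOI that resolves to a schema.org descriptor, plus a clear licence), here is a minimal Python sketch that builds such a descriptor as JSON-LD. The dataset name, description, keywords and DOI are hypothetical placeholders, and the helper `dataset_descriptor` is made up for this example; only the schema.org property names (`name`, `description`, `identifier`, `license`, `keywords`) come from the schema.org Dataset type.

```python
import json


def dataset_descriptor(name, description, doi, licence_url, keywords):
    """Build a minimal schema.org Dataset description as a JSON-LD dict.

    Hypothetical helper for illustration; it is not part of any library.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # A resolvable DOI URL makes the dataset findable and accessible
        "identifier": f"https://doi.org/{doi}",
        # A licence URL is a simple form of machine-readable licence
        "license": licence_url,
        "keywords": list(keywords),
    }


descriptor = dataset_descriptor(
    name="Example wheat gene annotations",  # hypothetical dataset
    description="Illustrative dataset linking wheat genes to GO terms.",
    doi="10.1234/example-doi",  # placeholder, not a real DOI
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["wheat", "gene ontology", "FAIR"],
)

# Serialise to JSON-LD text, eg, to embed in a dataset landing page
json_ld = json.dumps(descriptor, indent=2)
print(json_ld)
```

The resulting JSON-LD would typically be embedded in the dataset's landing page inside a `<script type="application/ld+json">` element, so that dataset search engines can index it.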