2. Vincenzo Lagani
From Calabria, Italy
Associate Professor at Ilia State University, Tbilisi, Georgia
Co-founder of Gnosis Data Analysis, Heraklion, Greece
6. Molecular biology: a normal day inside a cell
Adapted from https://www.novusbio.com/apoptosis-pathway
7. What information can we collect?
Oh, the beauty of modern biotechnology!
Genomic modifications: differences in the DNA code
Epigenetic modifications: changes in the DNA surroundings
Adapted from https://www.wikipedia.org
8. What information can we collect?
Oh, the beauty of modern biotechnology!
Gene expression: level of activity of each gene
Proteomics / metabolomics: concentration of fundamental molecules within the cell
Adapted from https://www.wikipedia.org
9. Current revolution: single cell data
Partially adapted from https://shenorrlab.github.io/bseqsc/index.html
Bulk measurements: each sample encoded as a vector
Single-cell measurements: each sample encoded as a matrix (cells C1, C2, ..., CN)
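The difference between the two encodings can be sketched in a few lines of Python. This is only an illustration: the gene names, the values, and the choice of one row per cell are assumptions made for the example, not something taken from the slide. The hypothetical helper `pseudo_bulk` shows one simple way to collapse a single-cell matrix back into a bulk-like vector by averaging over cells.

```python
from statistics import mean

# Hypothetical gene names and values, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C"]

# Bulk measurement: one sample is a single vector (one value per gene).
bulk_sample = [5.0, 0.5, 2.0]

# Single-cell measurement: one sample is a matrix.
# Assumed layout: one row per cell (C1..CN), one column per gene.
single_cell_sample = [
    [9.0, 0.0, 2.0],  # C1
    [1.0, 1.0, 2.0],  # C2
    [5.0, 0.5, 2.0],  # C3
]

def pseudo_bulk(cell_matrix):
    """Average each gene over all cells, collapsing the matrix to a vector."""
    return [mean(column) for column in zip(*cell_matrix)]

profile = pseudo_bulk(single_cell_sample)
print(dict(zip(genes, profile)))
```

In this toy sample the averaged profile has the same shape as the bulk vector, while the matrix additionally keeps the cell-to-cell variation (for example, GENE_A is high in C1 and low in C2).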
14. Data Science tasks in Life Science
Diagnosis, prognosis, risk stratification: supervised learning
Population identification: clustering, dimensionality reduction
Identification of molecular interactions: network reconstruction
Curation of biological findings: text mining, knowledge representation
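As a rough illustration of population identification through clustering, the sketch below groups cells with a plain k-means loop. The slides do not name a specific algorithm; k-means is used here only as one common choice, and the toy matrix, the two hypothetical genes, and the starting centroids are all assumptions for the example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate an assignment step and a centroid update."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each cell goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy single-cell matrix: rows are cells, columns are two hypothetical genes.
cells = [[9.0, 1.0], [8.0, 0.5], [9.5, 1.5], [1.0, 7.0], [0.5, 8.0], [1.5, 7.5]]
labels, centroids = kmeans(cells, centroids=[cells[0], cells[3]])
print(labels)
```

Cells that express the first gene strongly end up in one group and cells that express the second gene strongly in the other, which is the kind of structure a population-identification analysis looks for. Real single-cell data would typically need normalization and dimensionality reduction first.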
16. Supervised learning: risk prediction for lung cancer
Adapted from Markaki et al., EBioMedicine 2018
http://mensxmachina.org/en/HUNT-NTNU-lung-cancer-risk-calculator/
19. Text mining: identification of relevant findings
Citations added to MEDLINE yearly
https://www.nlm.nih.gov/bsd/stats/cit_added.html
20. 2. Open Data in Life Science
Unfortunately, not as open as we would like
21. (Important) example: Clinical Trials
Many countries require or strongly recommend clinical trial registration
https://clinicaltrials.gov
https://www.clinicaltrialsregister.eu
There is no legal obligation to communicate the results
About half of approved studies do not produce a publication
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114023
There is no obligation to provide the raw data
22. Are the data from clinical trials becoming more open?
Some institutes already provide their raw data
https://biolincc.nhlbi.nih.gov/studies/ (heart, lung, blood)
https://data-archive.nimh.nih.gov/ndct/ (mental health)
https://repository.niddk.nih.gov/home/ (diabetes, kidney)
https://www.ukdataservice.ac.uk/ (UK central repository)
Several initiatives aim at:
making data access mandatory for publishing any result
creating suitable standards and procedures for opening clinical trial data
23. Open Data for molecular biology
Most journals require molecular data to be deposited in an online repository before publication
Molecular data raise fewer privacy concerns (except for genomic data)
A number of standards have been developed for exchanging and storing molecular data
MIAME (Minimum Information About a Microarray Experiment), MIAPE (Minimum Information About a Proteomics Experiment), et cetera.
32. Credits
Special thanks to
◎Ilia State University: www.iliauni.edu.ge
◎my former colleagues from University of Crete:
http://mensxmachina.org/en/
◎Gnosis Data Analysis: www.gnosisda.gr