Michelangelo Ceci – Data mining techniques for the characterization of biological entities (Tecniche di data-mining per la caratterizzazione di entità biologiche)

Published in: Health & Medicine, Technology

  1. Dipartimento di Informatica. Data mining techniques for the characterization of biological entities. Michelangelo Ceci, Corrado Loglisci, Gianvito Pio, Fabio Fumarola, Pasqua Fabiana Lanotte, Donato Malerba. BiP-Day 2013.
  2. RL1: Discovery of Frequent Syntactic Structures
     Example sentences from the literature:
     “... Presenilin mutations have been hypothesised to cause Alzheimer disease either by altering amyloid precursor protein metabolism or ...”
     “... PS mutations cause the same functional consequence as mutations on amyloid precursor protein ...”
     A frequent syntactic structure: mesh_vb_mesh(T, M1, cause, M2), mesh_vb_mesh(T, M1, cause, M3), is_a(M1, mutat), is_a(M2, protein), is_a(M3, amyloid). [frequency: 80%]
     Application: integration of the syntactic structures into the PubMed search engine.
     Reference: A. Appice, M. Ceci, C. Loglisci: Discovering Informative Syntactic Relationships between Named Entities in Biomedical Literature. DBKDA 2010: 120-125.
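The frequency attached to the structure on slide 2 can be read as the share of sentences in which the whole pattern occurs. The following is a minimal sketch of that counting step only, assuming each sentence has already been reduced to a set of (subject category, verb, object category) tuples. The tuple encoding, the `frequency` function, and the toy corpus are all hypothetical simplifications and not the representation used in the cited paper.

```python
from typing import FrozenSet, List, Tuple

# A relation is (subject category, verb stem, object category).
# These tuples are a simplified, hypothetical stand-in for the
# mesh_vb_mesh / is_a predicates shown on the slide.
Relation = Tuple[str, str, str]


def frequency(pattern: FrozenSet[Relation],
              sentences: List[FrozenSet[Relation]]) -> float:
    """Fraction of sentences that contain every relation of the pattern."""
    if not sentences:
        return 0.0
    hits = sum(1 for relations in sentences if pattern <= relations)
    return hits / len(sentences)


# Toy corpus, invented for illustration (not data from the talk).
sentences = [
    frozenset({("mutat", "cause", "protein"), ("mutat", "cause", "amyloid")}),
    frozenset({("mutat", "cause", "protein"), ("mutat", "cause", "amyloid")}),
    frozenset({("mutat", "cause", "protein")}),
    frozenset({("protein", "bind", "amyloid")}),
]
pattern = frozenset({("mutat", "cause", "protein"),
                     ("mutat", "cause", "amyloid")})
print(frequency(pattern, sentences))
```

A pattern would be reported as frequent when this value reaches a user-chosen minimum threshold; the threshold itself is not given on the slide.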
  3. RL1: Discovery of Temporal Links
     [Figure: a timeline running from 1983 (Migraine) through 1984, ... to 1997 (Magnesium Deficiency), with the MeSH terms of each time interval linked by association rules.]
     Association rules mined from the literature published in each time interval:
     • Migraine → TermA, TermB
     • TermB, TermC → TermD
     • TermD, Term → TermF, TermG
     • TermF, TermG → Magnesium Deficiency
     Application: generation of hypotheses on biological associations that developed over time.
     References:
     C. Loglisci: Time-based Discovery in Biomedical Literature: Mining Temporal Links. International Journal of Data Analysis Techniques and Strategies (IJDATS), Vol. 5, No. 2, 2013.
     C. Loglisci, M. Ceci: Discovering Temporal Bisociations for Linking Concepts over Time. ECML/PKDD 2011: 358-373.
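The chaining idea on slide 3 (a rule from one time interval links to a later rule when its consequent shares a term with the later rule's antecedent) can be sketched as follows. This is only an illustration under assumptions: the `Rule` type, the `find_chain` function, the overlap-based linking condition, the year assigned to each rule, and the placeholder `TermX` are hypothetical and not taken from the cited papers.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Rule(NamedTuple):
    year: int                    # time interval the rule was mined from
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def find_chain(rules: List[Rule], source: str,
               target: str) -> Optional[List[Rule]]:
    """Return a chronologically ordered chain of rules from source to target."""
    ordered = sorted(rules, key=lambda r: r.year)
    # best[i] holds one chain ending with ordered[i], or None if none exists.
    best: List[Optional[List[Rule]]] = [None] * len(ordered)
    for i, rule in enumerate(ordered):
        if source in rule.antecedent:
            best[i] = [rule]
        else:
            for j in range(i):
                prev = best[j]
                # Link only to strictly earlier rules whose consequent
                # shares at least one term with this rule's antecedent.
                if (prev and ordered[j].year < rule.year
                        and ordered[j].consequent & rule.antecedent):
                    best[i] = prev + [rule]
                    break
        if best[i] and target in rule.consequent:
            return best[i]
    return None


# Toy rules mirroring the slide; years and TermX are invented placeholders.
rules = [
    Rule(1983, frozenset({"Migraine"}), frozenset({"TermA", "TermB"})),
    Rule(1984, frozenset({"TermB", "TermC"}), frozenset({"TermD"})),
    Rule(1990, frozenset({"TermD", "TermX"}), frozenset({"TermF", "TermG"})),
    Rule(1997, frozenset({"TermF", "TermG"}),
         frozenset({"Magnesium Deficiency"})),
]
chain = find_chain(rules, "Migraine", "Magnesium Deficiency")
print([r.year for r in chain])
```

The sketch returns the first chain it finds; ranking alternative chains (for example by rule confidence) is left out.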
  4. RL1: Extraction of Bio-molecular Events
     Bio-molecular events are processes which involve and transform biological entities. They can be formalized as conceptual frames, each characterized by entities associated with the specific roles they play in the event. For instance, the frame for the event catalyse has two roles:
     • catalyst
     • reaction being catalysed
     Application: extraction from the literature of the entities involved in an event and classification of their roles. Example sentence:
     “Helicases not only catalyse the disruption of hydrogen bonding between complementary regions of nucleic acids, but also move along nucleic acid strands in a polar fashion.”
     Extracted frame for catalyse:
     • catalyst: Helicases
     • reaction being catalysed: the disruption of hydrogen bonding between complementary regions of nucleic acids
     References:
     C. Loglisci, A. Appice, M. Ceci, D. Malerba, F. Esposito: MBlab: Molecular Biodiversity Laboratory. IRCDL 2011: 132-135.
     C. Loglisci, M. Ceci, A. Consiglio, D. D'Elia, G. Grillo, F. Licciulli, D. Malerba, S. Liuni: Functional Analysis and annotation of noncoding RNAs: a Text Mining approach. Decimo Meeting Annuale della Società Italiana di Bioinformatica, 2013.
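The conceptual frame on slide 4 maps naturally onto a small data structure: an event name, a fixed set of roles, and the text spans that fill them. The sketch below shows one possible encoding; the `EventFrame` class and its methods are hypothetical names introduced for illustration, and the extraction step that finds the spans in the sentence is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class EventFrame:
    event: str
    roles: Tuple[str, ...]
    fillers: Dict[str, str] = field(default_factory=dict)

    def fill(self, role: str, text: str) -> None:
        """Assign a text span to one of the frame's declared roles."""
        if role not in self.roles:
            raise KeyError(f"unknown role for {self.event}: {role}")
        self.fillers[role] = text

    def is_complete(self) -> bool:
        """True when every declared role has a filler."""
        return all(role in self.fillers for role in self.roles)


# The catalyse frame and fillers from the slide's example sentence.
frame = EventFrame("catalyse", ("catalyst", "reaction being catalysed"))
frame.fill("catalyst", "Helicases")
frame.fill(
    "reaction being catalysed",
    "the disruption of hydrogen bonding between complementary "
    "regions of nucleic acids",
)
print(frame.is_complete())
```

Rejecting undeclared roles keeps the role classification step honest: a span can only be assigned to a role that the event's frame actually defines.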
  5. RL2: Discovering miRNA-mRNA interaction networks
     microRNAs (miRNAs) are small noncoding RNAs acting as post-transcriptional regulators of gene expression.
     [Figure: miRNAs, mRNAs, and the resulting miRNA-mRNA networks.]
     Goal: identification of miRNA-mRNA interaction networks through biclustering/co-clustering approaches:
     • discovery of miRNA regulatory modules/networks
     • identification of unknown functional properties
  6. HOCCLUS2: Hierarchical and Overlapping Co-CLUStering 2
     1) Bottom-up approach, starting from single miRNA-mRNA interactions
     2) Discovery of (possibly) overlapping biclusters
     3) Hierarchical organization of the discovered biclusters
     Reference: G. Pio, M. Ceci, D. D'Elia, C. Loglisci, D. Malerba: A Novel Biclustering Algorithm for the Discovery of Meaningful Biological Correlations between microRNAs and their Target Genes. BMC Bioinformatics 14 (Suppl 7), S8 (2013).
  7. HOCCLUS2 - Input data
     Predicted interactions (mirDip):
     o Large datasets
     o High level of noise (false positives)
     Verified interactions (miRTarBase):
     o Context-specific
     o Small datasets
     Proposed solution: a semi-supervised ensemble learning approach which learns to combine the scores of different prediction algorithms.
     Reference: G. Pio, M. Ceci, D. D'Elia, D. Malerba: Integrating microRNA target predictions for the discovery of gene regulatory networks: a semi-supervised ensemble learning approach. BMC Bioinformatics (in press).
  8. RL3: Gene Function Hierarchical Multi-label Classification
     • Instances to be classified may belong to multiple classes at the same time.
     • The classes are organized hierarchically (hierarchical constraint).
     Example task: gene function prediction (e.g. with the FUN or GO class hierarchies).
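The hierarchical constraint on slide 8 is commonly stated as: an instance that belongs to a class also belongs to all of that class's ancestors. The sketch below checks and enforces this on a label set, assuming a tree-shaped hierarchy where each class has at most one parent (as in FUN-style codes; GO is a directed acyclic graph and would need a list of parents per class). The `PARENT` table and both function names are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical toy hierarchy: class -> parent class (None for a root).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "1.1": "1",
    "1.1.3": "1.1",
    "2": None,
    "2.4": "2",
}


def close_upward(labels: Set[str],
                 parent: Dict[str, Optional[str]]) -> Set[str]:
    """Add every ancestor of every label so the constraint holds."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Stop early once we reach a class whose ancestors are already added.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent[node]
    return closed


def satisfies_constraint(labels: Set[str],
                         parent: Dict[str, Optional[str]]) -> bool:
    """True when each label's parent (if any) is also in the label set."""
    return all(parent[l] is None or parent[l] in labels for l in labels)


predicted = {"1.1.3", "2.4"}
print(satisfies_constraint(predicted, PARENT))
print(sorted(close_upward(predicted, PARENT)))
```

Checking only the direct parent is sufficient here because the check is applied to every label in the set, so a missing ancestor is caught at the level where the chain breaks.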
  9. DIP Yeast network (PPI network)
     [Figure with four panels:]
     (a) Non-connected examples arranged along the border
     (b) Examples are randomly arranged along the border
     (c) Examples grouped according to the 1st level of FUN
     (d) Examples grouped according to the 2nd level of FUN
  10. The Basic Idea
     • We develop a tree-based algorithm, NHMC (Network Hierarchical Multi-label Classification), which takes network autocorrelation into account in the setting of Hierarchical Multi-label Classification (HMC).
     • It learns Predictive Clustering Trees (PCTs) for HMC, and the network is used as background knowledge during training.
     • Clustering is based on autocorrelation: each cluster should contain highly autocorrelated entities for the considered level.
     References:
     Daniela Stojanova, Michelangelo Ceci, Donato Malerba, Saso Dzeroski: Using PPI network autocorrelation in hierarchical multilabel classification trees for gene function prediction. BMC Bioinformatics 14: 285 (2013).
     Daniela Stojanova, Michelangelo Ceci, Donato Malerba, Saso Dzeroski: Learning Hierarchical Multi-label Classification Trees from Network Data. Discovery Science 2013: 233-248.
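To make "highly autocorrelated entities" on slide 10 concrete, the sketch below computes Moran's I, a standard autocorrelation statistic, for one class label over an undirected network with unit edge weights. The slide does not say which measure NHMC uses, so the choice of Moran's I, the `morans_i` function, and the toy gene network are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple


def morans_i(values: Dict[str, float],
             edges: Iterable[Tuple[str, str]]) -> float:
    """Moran's I of node values over an undirected, unweighted network."""
    edge_list = list(edges)
    n = len(values)
    mean = sum(values.values()) / n
    denom = sum((v - mean) ** 2 for v in values.values())
    if denom == 0 or not edge_list:
        raise ValueError("need non-constant values and at least one edge")
    # Each undirected edge counts in both directions (w_ij = w_ji = 1).
    cross = 2 * sum((values[a] - mean) * (values[b] - mean)
                    for a, b in edge_list)
    total_weight = 2 * len(edge_list)
    return (n / total_weight) * (cross / denom)


# Toy data: 1.0 means the gene has the function, 0.0 means it does not.
labels = {"g1": 1.0, "g2": 1.0, "g3": 0.0, "g4": 0.0}
same_label_edges = [("g1", "g2"), ("g3", "g4")]
mixed_label_edges = [("g1", "g3"), ("g2", "g4")]
print(morans_i(labels, same_label_edges))
print(morans_i(labels, mixed_label_edges))
```

Values near 1 indicate that connected genes tend to share the label, while values near -1 indicate that connected genes tend to differ, which is the kind of signal a network-aware split criterion could reward.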
  11. RL4: IS-BioBank project
     • A framework for:
       – enabling interoperability among different biological data sources, and
       – supporting expert users in the complex process of studying cancer microenvironments.
     • The framework is obtained by extending Connectivity Map with databases, data repositories, and ontologies.
     Reference: Michelangelo Ceci, Pietro Hiram Guzzi, Elio Masciari, Mauro Coluccia, Federica Mandreoli, Massimo Mecella, Fabio Fumarola, Riccardo Martoglia, Wilma Penzo: The IS-BioBank project: a framework for biological data normalization, interoperability, and mining for cancer microenvironment analysis. SIGHIT Record 2(2): 16-21 (2012).
  12. IS-BioBank project
     • Our goal is to develop a Web delivery system which:
       – enables interoperability among queryable data sources,
       – captures the different kinds of relationships that exist among them,
       – reinforces the cooperation of heterogeneous and distributed data bank sources,
       – supports users in the complex process of extraction, navigation and visualization of the knowledge base.
  13. IS-BioBank project
  14. Q&A
