Back in 2008, this was the presentation of the master's thesis titled "Pathway Discovery in Cancer: the Bayesian Approach".
In this thesis I focused on ovarian cancer microarray data analysis, using supervised learning and probabilistic methods to predict gene-gene interactions.
Pathway Discovery in Cancer: the Bayesian Approach
1. Pathway Discovery in Cancer:
the Bayesian Approach
Francesco Gadaleta
Developed and written at the ESAT department of Electrical Engineering, Faculty of Engineering,
Katholieke Universiteit Leuven (Belgium)
2. Genes and Diseases
Biological Assumptions
• Cancer normally originates in a single cell
• A cell's life is regulated by many genes activated in different steps
Types Of Genes
• Oncogene
• Tumor-suppressor
• DNA-repair
4. Genes and Diseases:
genetic predisposition
[Diagram: normal cell → first mutation → second mutation → third mutation → malignant cell]
5. Genes and Diseases:
Microarray Technology
[Diagram: RNA isolation from cancer cells and normal cells → reverse transcriptase turns mRNA into cDNA → labeling with red (cancer) and green (normal) fluorescent probes → combine target]
6. Genes and Diseases:
the goal of biologists and geneticists
• Prenatal diagnosis for recognized diseases, e.g. Down syndrome
• Carrier testing to help couples with a hereditary disease in the risky decision of having children
• Patient-tailored diagnosis for genetic diseases
7. Goal of this Thesis
• Microarray analysis with more sophisticated tools
• Integrate into a single model what is already known from other experiments
• Identify the genes that form disease pathways
8. Type Of Data
• Normalization (fluorescent intensity)
• Filtering of microarray data (how to select subsets of genes)
• Data Discretization (Are bio reactions discrete events?)
➡ Interval discretization
➡ Quantile discretization
➡ Exporting
9. Interval Discretization
• Sort the n observations
• Divide the observations into d levels (uniformly spaced intervals)
• The i-th observation is discretized as the j-th level iff:
x0 + j(xn-1 - x0)/d < xi < x0 + (j+1)(xn-1 - x0)/d
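The interval rule above can be sketched in code; this is a minimal illustration, not the thesis implementation, and the function name and use of NumPy are my own:

```python
import numpy as np

def interval_discretize(x, d):
    """Map each observation to one of d uniformly spaced levels.

    Level j covers [x0 + j*(xn-1 - x0)/d, x0 + (j+1)*(xn-1 - x0)/d);
    assumes the observations are not all identical.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    width = (hi - lo) / d
    levels = np.floor((x - lo) / width).astype(int)
    # The maximum falls exactly on the upper edge; fold it into the last level.
    return np.clip(levels, 0, d - 1)
```

With d = 2, observations in the lower half of the range get level 0 and the rest get level 1, regardless of how the data are distributed within the range.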
10. Quantile Discretization
• Sort the n observations
• Divide all observations into d levels by placing an equal number of observations in each bin: all levels are equally represented
• The i-th observation belongs to the j-th level iff:
jn/d < i < (j+1)n/d
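The quantile rule can be sketched the same way: assign each observation its rank in the sorted order, then map rank i to level ⌊i·d/n⌋. Again a minimal illustration with names of my own choosing:

```python
import numpy as np

def quantile_discretize(x, d):
    """Map each observation to one of d equally populated levels by rank."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)        # rank i of each observation
    return ranks * d // n              # level j iff jn/d <= i < (j+1)n/d
```

Unlike interval discretization, the cut points here depend on the data: each level receives n/d observations even if the values are heavily skewed.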
12. Knowledge Base
• Many biological processes are still unknown
• Reliability of the data
➡ hybridization is still a manual process
• Small sample size, huge number of genes
➡ integration with heterogeneous data
13. What do we want to solve?
• Genetic cancer forecasting?
• Need for a model to handle uncertain knowledge
• A model that biologists and epidemiologists can understand
• A model that can be updated at different times
15. Bayesian Networks:
features
• Can handle uncertain knowledge with probability
• Can handle subsequent changes (bio noise, multiple measurements)
• An intuitive model a biologist can understand: white box vs. black box
(neural networks)
16. Bayesian Networks:
definition
• Directed Acyclic Graph
(how the variables interact with each other)
• Set of local probability distributions F
(p(xi = k | Pa(xi) = j) = θijk)
• Example: nodes A, B, C, D, E, F, G with local distributions
p(A), p(B), p(C|A,B), p(D|C), p(E|C), p(F|B), p(G|F)
so the joint distribution factorizes as
p(A,...,G) = p(A) p(B) p(C|A,B) p(D|C) p(E|C) p(F|B) p(G|F)
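The chain-rule factorization of the example network can be sketched as follows; all CPT values are made up purely for illustration, and the variables are assumed binary:

```python
from itertools import product

# Illustrative conditional probability tables for the example DAG.
# Every value below is invented; only the factorization structure matters.
p_A = {0: 0.7, 1: 0.3}
p_B = {0: 0.6, 1: 0.4}
p_C_given_AB = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}
p_D_given_C = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_E_given_C = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}
p_F_given_B = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}
p_G_given_F = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(a, b, c, d, e, f, g):
    """p(A,B,C,D,E,F,G) = p(A)p(B)p(C|A,B)p(D|C)p(E|C)p(F|B)p(G|F)."""
    return (p_A[a] * p_B[b] * p_C_given_AB[(a, b)][c]
            * p_D_given_C[c][d] * p_E_given_C[c][e]
            * p_F_given_B[b][f] * p_G_given_F[f][g])

# Sanity check: the joint sums to 1 over all 2^7 configurations
total = sum(joint(*v) for v in product([0, 1], repeat=7))
print(round(total, 10))  # → 1.0
```

The point of the factorization is economy: seven small tables replace a full joint table with 2^7 entries.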
24. Bayesian Networks:
formal assumptions
• Structure Possibility: each of the n! structures is possible, p(Si | ξ) > 0
• Complete Data: no missing data, so that p(D, S | ξ) and p(C | D, S, ξ)
(C a new observation) can be computed in closed form
• Markov Condition
• Observational Equivalence
• Scoring Function: a function to measure how well a structure fits the data
35. Bayesian Networks:
structure learning
• Constraint Satisfaction Problem (CSP) vs. Optimization Problem (OP)
• CSP tries to discover dependencies from the data with statistical
hypothesis tests
• OP searches the space of structures, trying to improve the score assigned
by a scoring function
36. Bayesian Networks:
K2 algorithm
• Goal: maximize the probability of the structure given the data
• An initial ordering of the variables is given (A, B, C, D, E, F, G)
[Quality measure of the net given the data, by Cooper and Herskovits]
38. Bayesian Networks:
K2 algorithm
• Let D be the dataset, N the number of examples,
• G the network structure, pa_ij the j-th instantiation of Pa(xi),
• Nijk the number of cases where xi = k and Pa(xi) = j, and
• Nij = Σ_{k=1..ri} Nijk
P(G, D) = P(G) P(D|G)
P(D|G) = ∏_{i=1..n} ∏_{j=1..qi} (ri − 1)! / (Nij + ri − 1)! · ∏_{k=1..ri} Nijk!
[Quality measure of the net given the data, by Cooper and Herskovits]
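The Cooper–Herskovits measure and the greedy parent search of K2 can be sketched as follows, working in log space to avoid huge factorials. Function names and the toy dataset are illustrative, not taken from the thesis code:

```python
from math import lgamma

def log_ch_score(data, child, parents, r):
    """Log Cooper–Herskovits contribution of one node:
    sum_j [ log (r-1)! - log (N_ij + r - 1)! + sum_k log N_ijk! ].
    data is a list of dicts mapping variable -> discrete value in 0..r-1."""
    counts = {}
    for row in data:
        j = tuple(row[p] for p in parents)      # parent instantiation
        counts.setdefault(j, [0] * r)[row[child]] += 1
    score = 0.0
    for nijk in counts.values():
        nij = sum(nijk)
        score += lgamma(r) - lgamma(nij + r)    # log (r-1)! - log (N_ij+r-1)!
        score += sum(lgamma(n + 1) for n in nijk)  # sum_k log N_ijk!
    return score

def k2_parents(data, child, order, max_parents, r):
    """K2 step for one node: greedily add the predecessor (in the given
    ordering) that most improves the score, stopping when nothing helps."""
    candidates = order[:order.index(child)]
    parents, best = [], log_ch_score(data, child, [], r)
    while len(parents) < max_parents:
        gains = [(log_ch_score(data, child, parents + [v], r), v)
                 for v in candidates if v not in parents]
        if not gains:
            break
        new_score, v = max(gains)
        if new_score <= best:
            break
        best, parents = new_score, parents + [v]
    return parents

# Toy data: variable 1 copies variable 0; variable 2 is constant
data = [{0: 0, 1: 0, 2: 0}, {0: 1, 1: 1, 2: 0},
        {0: 0, 1: 0, 2: 0}, {0: 1, 1: 1, 2: 0}] * 5
print(k2_parents(data, child=1, order=[0, 2, 1], max_parents=2, r=2))  # → [0]
```

On this toy dataset the search correctly picks variable 0 as the sole parent of variable 1 and adds no parent to the constant variable 2.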
43. Data Integration
Microarray data + Clinical data
• heterogeneous data integration
• binary gene-gene relations
• Bayesian network collective learning
(Partial Integration)
52. Experiments and results
• SynTReN: a generator of synthetic gene expression data for the design and
analysis of structure learning algorithms
• Pipeline: synthetic model → synthetic data → structure learning framework
→ learned model → validator (the learned model is compared against the
synthetic one)
53. Experiments and results
• Results on a random network and on a biological network (without clinical data)
• Clinical data may improve structure learning, yielding more complete
biological models (a practical choice, since this is a type of data medical
centers are already equipped to collect)
55. Conclusions
• Partial Integration of two data sources improves performance within
the Bayesian network framework
• A huge pure-microarray dataset is not helpful
• Data integration requires fewer variables from each source (pure
microarray data is expensive)