Invited lecture on Machine Learning in Medicine at the joint "Integrated Omics" course of Hanze University and University Hospital UMCG, Groningen, The Netherlands
The statistical physics of learning revisited: Phase transitions in layered neural networks (University of Groningen)
"The statistical physics of learning revisited: Phase transitions in layered neural networks"
Physics Colloquium at the University of Leipzig/Germany, June 29, 2021
24 slides, ca. 45 minutes
System for Prediction of Non Stationary Time Series based on the Wavelet Radi... (IJECEIAES)
This paper proposes a hybrid model, the wavelet radial basis function neural network (WRBFNN), and compares its performance with that of the wavelet feedforward neural network (WFFNN) in a forecasting system that considers two input formats, input9 and input17, and four types of non-stationary time series data. The MODWT transform is used to generate wavelet and smooth coefficients, from which selected elements serve as inputs to the RBFNN and FFNN models. Forecasting performance of the WRBFNN and WFFNN models is evaluated with the MAPE and MSE indicators, while their computational cost is compared using the number of epochs and the training time. On stationary benchmark data, all models achieve very high accuracy. The WRBFNN9 model is superior on non-stationary data containing a linear trend, while the WFFNN17 model performs best on non-stationary data with non-linear trend and seasonal components. In terms of computational speed, the WRBFNN model is superior, requiring far fewer epochs and much shorter training time.
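As a rough, pure-NumPy sketch of the input construction the abstract describes (the function and the choice of features are illustrative, not the paper's exact scheme), one level of a Haar MODWT yields the wavelet and smooth coefficients that feed the networks:

```python
import numpy as np

def haar_modwt_level1(x):
    """One level of the MODWT with the Haar filter (circular boundary).

    Returns (wavelet_coeffs, smooth_coeffs), both the same length as x.
    Illustrative stand-in for the paper's MODWT-based input construction."""
    x_prev = np.roll(x, 1)          # X_{t-1}, wrapping around
    w = (x - x_prev) / 2.0          # wavelet (detail) coefficients
    v = (x + x_prev) / 2.0          # smooth (scaling) coefficients
    return w, v

x = np.array([1.0, 3.0, 2.0, 6.0])
w, v = haar_modwt_level1(x)
# one possible input vector: last two detail + last two smooth coefficients
features = np.concatenate([w[-2:], v[-2:]])
```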
A survey on methods and applications of meta-learning with GNNs (Shreya Goyal)
This survey provides a comprehensive review of work combining graph neural networks (GNNs) with meta-learning, including a summary of the methods and their applications in each category. The application of meta-learning to GNNs is a growing and exciting field, and many graph problems stand to benefit from combining the two approaches.
Gabriella Casalino, Nicoletta Del Buono, Corrado Mencar (2011). Subtractive Initialization of Nonnegative Matrix Factorizations for Document Clustering. In Fuzzy Logic and Applications (WILF 2011), pp. 188-195.
The 9th International Workshop on Fuzzy Logic and Applications, August 29-31, 2011, Trani, Italy
Study of Different Multi-instance Learning kNN Algorithms (Editor IJCATR)
Because of its applicability in various fields, multi-instance learning is becoming more popular in machine learning research. Unlike standard supervised learning, multi-instance learning concerns classifying an unknown bag as positive or negative when the labels of the instances within each bag are ambiguous. This paper studies three k-nearest neighbor algorithms, namely Bayesian-kNN, Citation-kNN and Bayesian-Citation-kNN, for solving the multi-instance problem. Similarity between two bags is measured using the Hausdorff distance. A constructive covering algorithm is used to overcome the problem of false positive instances. The paper also briefly reviews the problem definition, the learning algorithms and the experimental datasets of the multi-instance learning framework.
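The bag-level distance these algorithms rely on can be sketched in NumPy; both the classic Hausdorff distance and the minimal variant popularized by Citation-kNN are shown (function names are illustrative):

```python
import numpy as np

def hausdorff(bag_a, bag_b):
    """Classic (maximal) Hausdorff distance between two bags of instances.

    Each bag is an (n_instances, n_features) array."""
    # pairwise Euclidean distances between every pair of instances
    d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=-1)
    h_ab = d.min(axis=1).max()   # furthest instance of A from its nearest in B
    h_ba = d.min(axis=0).max()   # furthest instance of B from its nearest in A
    return max(h_ab, h_ba)

def minimal_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance, the variant used by Citation-kNN:
    the distance between the closest pair of instances across the bags."""
    d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=-1)
    return d.min()
```

The minimal variant is less sensitive to outlying instances inside a bag, which is why Citation-kNN prefers it.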
The Advancement and Challenges in Computational Physics (PhD Assistance)
For the last five decades, computational physics has been a valuable scientific instrument in physics. It has enabled physicists to understand complex problems better than theoretical and experimental approaches alone. In its early days, computational physics was mostly a research activity, with relatively few organised undergraduate courses.
IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST... (csandit)
The growing population of elders in society calls for a new approach to caregiving. By inferring which activities the elderly are performing in their houses, it is possible to determine their physical and cognitive capabilities. In this paper we show the potential of important discriminative classifiers, namely the soft-margin Support Vector Machine (C-SVM), Conditional Random Fields (CRF) and k-Nearest Neighbors (k-NN), for recognizing activities from sensor patterns in a smart home environment. We also address the class imbalance problem in activity recognition, which is known to hinder the learning performance of classifiers. Cost-sensitive learning is attractive in most imbalanced circumstances, but it is difficult to determine the precise misclassification costs in practice. We introduce a new criterion for selecting a suitable cost parameter C for the C-SVM method. Through our evaluation on four real-world imbalanced activity datasets, we demonstrate that C-SVM based on our proposed criterion outperforms state-of-the-art discriminative methods in activity recognition.
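A minimal sketch of the kind of cost-sensitive C-SVM the abstract refers to; scikit-learn's per-class weighting stands in for the paper's own criterion for picking C, and the toy data stands in for the activity datasets:

```python
# Sketch only: shows the scikit-learn knobs a cost-sensitive C-SVM involves
# (the cost parameter C and per-class misclassification weights).
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# toy imbalanced data (90% / 10% class split)
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = SVC(C=1.0).fit(X_tr, y_tr)
weighted = SVC(C=1.0, class_weight="balanced").fit(X_tr, y_tr)

acc_plain = balanced_accuracy_score(y_te, plain.predict(X_te))
acc_weighted = balanced_accuracy_score(y_te, weighted.predict(X_te))
print(f"plain C-SVM: {acc_plain:.3f}  cost-sensitive C-SVM: {acc_weighted:.3f}")
```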
Multimodal authentication is one of the prime concepts in current real-world applications, and various approaches have been proposed for it. In this paper, an intuitive strategy is proposed as a framework for providing a more secure key in biometric security. First, features are extracted via PCA using SVD from the chosen biometric patterns; key components are then extracted with the LU factorization technique, selected with different key sizes, and combined using a convolution kernel method (the Exponential Kronecker Product, eKP) as a Context-Sensitive Exponent Associative Memory model (CSEAM). Verification proceeds in the same way and is assessed with the MSE measure. This model gives better outcomes than SVD factorization [1] for feature selection. The process is computed for different key sizes and the results are presented.
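The plain Kronecker product that the paper's exponential Kronecker product (eKP) builds on can be illustrated in one NumPy call; the matrices here are illustrative, not actual key components:

```python
import numpy as np

# Combining two key components k1 (m x n) and k2 (p x q) with the
# Kronecker product yields an (m*p) x (n*q) block matrix: each entry of
# k1 is replaced by that entry times the whole of k2.
k1 = np.array([[1, 2],
               [3, 4]])
k2 = np.array([[0, 1],
               [1, 0]])
combined = np.kron(k1, k2)
print(combined.shape)  # (4, 4)
```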
The effect of gamma value on support vector machine performance with differen... (IJECEIAES)
The support vector machine (SVM) is a supervised machine learning algorithm for classification and regression. It is applied in many fields, such as bioinformatics, face recognition, text and hypertext categorization, generalized predictive control and many other areas. The performance of an SVM depends on the parameters used in the training phase, and these settings can have a profound impact on the resulting classifier. This paper investigates SVM performance as a function of the gamma parameter for different kernels, studying its impact on classification efficiency across datasets of various descriptions. The SVM classifier was implemented in Python, and the kernel functions investigated are polynomial, radial basis function (RBF) and sigmoid. All datasets come from the UC Irvine machine learning repository. Overall, the results show an uneven effect of the three kernels on classification accuracy across the datasets. Changing the gamma value influences the polynomial and sigmoid kernels, depending on the dataset, while the RBF kernel is more stable, its accuracy changing only slightly with different gamma values.
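The experiment's core idea can be sketched with scikit-learn: sweep gamma for each kernel and compare test accuracy. Toy data stands in for the UCI datasets used in the paper:

```python
# Sketch: vary gamma for the RBF, polynomial and sigmoid kernels and
# compare test accuracy on a simple nonlinear dataset.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for kernel in ("rbf", "poly", "sigmoid"):
    for gamma in (0.01, 0.1, 1.0, 10.0):
        acc = SVC(kernel=kernel, gamma=gamma).fit(X_tr, y_tr).score(X_te, y_te)
        results[(kernel, gamma)] = acc
        print(f"{kernel:7s} gamma={gamma:5.2f} accuracy={acc:.3f}")
```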
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER (cscpconf)
In this paper, we study the performance of machine learning tools in classifying breast cancer. We compare data mining tools such as Naïve Bayes, support vector machines, radial basis function neural networks, the J48 decision tree and simple CART. We use both binary and multi-class datasets, namely WBC, WDBC and Breast Tissue from the UCI machine learning repository. The experiments are conducted in WEKA. The aim of this research is to find the best classifier with respect to accuracy, precision, sensitivity and specificity in detecting breast cancer.
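A minimal analogue of this WEKA comparison can be run in scikit-learn, which ships the WDBC dataset; the classifier settings here are illustrative defaults, not the paper's exact configurations:

```python
# Compare a few classifiers on the WDBC dataset by 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # the WDBC dataset
means = {}
for name, clf in [("Naive Bayes", GaussianNB()),
                  ("SVM", SVC()),
                  ("Decision tree", DecisionTreeClassifier(random_state=0))]:
    means[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:14s} mean accuracy = {means[name]:.3f}")
```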
QUANTUM CLUSTERING-BASED FEATURE SUBSET SELECTION FOR MAMMOGRAPHIC I... (ijcsit)
In this paper, we present an algorithm for feature selection, labeled QC-FS (Quantum Clustering for Feature Selection), which performs the selection in two steps. First, the original feature space is partitioned into groups of similar features using the Quantum Clustering algorithm. Then a representative is selected for each cluster, using similarity measures such as the correlation coefficient (CC) and mutual information (MI); the feature that maximizes this information is chosen by the algorithm.
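The two-step idea can be sketched as follows; note that k-means stands in for the paper's Quantum Clustering, and mutual information with the target is used as the representative-selection score:

```python
# Sketch: group correlated features into clusters, then keep one
# representative per cluster (the feature with highest mutual information
# with the class labels).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)

# cluster the features by their correlation profiles
corr = np.corrcoef(X, rowvar=False)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(corr)

selected = [int(np.arange(X.shape[1])[labels == c][np.argmax(mi[labels == c])])
            for c in range(8)]
print(sorted(selected))  # indices of the 8 representative features
```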
On the High Dimensional Information Processing in Quaternionic Domain and its... (IJAAS Team)
There are various high-dimensional engineering and scientific applications in communication, control, robotics, computer vision, biometrics, etc., where researchers face the problem of designing intelligent and robust neural systems that can process higher-dimensional information efficiently. Conventional real-valued neural networks have been tried on problems with high-dimensional parameters, but the required network structures are highly complex, very time consuming and weak against noise. These networks are also unable to learn magnitude and phase values simultaneously in space. The quaternion is a number that possesses magnitude in all four directions, with phase information embedded within it. This paper presents a well-generalized learning machine based on a quaternionic-domain neural network that can process the magnitude and phase information of high-dimensional data without difficulty. The learning and generalization capability of the proposed machine is demonstrated through a wide spectrum of simulations.
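The 4-D arithmetic a quaternionic neuron builds on is the Hamilton product; a small self-contained sketch:

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) arrays.

    This non-commutative product is the basic operation that
    quaternion-valued network weights apply to their inputs."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(hamilton_product(i, j))  # i * j = k -> [0, 0, 0, 1]
```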
Tutorial at the Winter School on Machine Learning, Gran Canaria, January 2020 (ppsx format, 52 slides)
Michael Biehl, University of Groningen, The Netherlands
Talk presented at WSOM 2016 in Houston/Texas.
Machine learning based classification of FDG-PET scan data for the diagnosis of neurodegenerative disorders
June 2017: Biomedical applications of prototype-based classifiers and relevan... (University of Groningen)
A presentation of several biomedical applications of prototype-based machine learning and relevance learning. Invited talk at the AlCoB conference 2017 in Aveiro/Portugal.
An introduction to variable and feature selection (Marco Meoni)
Presentation of a great 2003 paper by Isabelle Guyon (Clopinet) and André Elisseeff (Max Planck Institute), which outlines the main techniques for feature selection and model validation in machine learning systems.
Metabolomic Data Analysis Workshop and Tutorials (2014), Dmitry Grapov
Get more information:
http://imdevsoftware.wordpress.com/2014/10/11/2014-metabolomic-data-analysis-and-visualization-workshop-and-tutorials/
Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Similar to last year, I’ve posted all the content (lectures, labs and software) for anyone to follow along with at their own pace. I also plan to release videos for all the lectures and labs.
Prote-OMIC Data Analysis and Visualization (Dmitry Grapov)
Introductory lecture to multivariate analysis of proteomic data.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024 (University of Groningen)
An introduction to interpretable machine learning in endocrinology.
In particular, the application of Generalized Matrix Relevance LVQ to the classification of adrenocortical tumors and the differential diagnosis of primary aldosteronism is discussed.
A tutorial given at the AMALEA workshop 2022:
Unsupervised and supervised prototype-based learning is illustrated in terms of bio-medical applications.
A tutorial given at the AMALEA workshop 2022.
This talk presents the statistical physics based theory of machine learning in terms of simple example systems. As a recent application, the occurrence of phase transitions in layered networks is discussed.
Short presentation (15 minutes) focusing on the application of unsupervised and supervised machine learning in the paper "Tissue- and development-stage specific mRNA and heterogeneous CNV signatures of human ribosomal proteins in normal and cancer samples".
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been gained using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed-tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism maintains physiological relevance and provides insights into the progression of disease, response to treatments and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology provides all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking, cell-cell interaction, vascularization and tumor metastasis in exceptional detail. This webinar also gives an overview of IVM in drug development, offering a view into the intricate interaction between drugs or nanoparticles and tissues in vivo and allowing the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Seminar of U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Cancer cell metabolism: special Reference to Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules - a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
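The glucose-hunger arithmetic above can be checked in a few lines, using the approximate textbook yields quoted in the text:

```python
# ATP yield per glucose: glycolysis alone vs. full respiration
# (glycolysis + Krebs cycle + oxidative phosphorylation, approximate).
atp_glycolysis = 2
atp_full = 36

ratio = atp_full / atp_glycolysis
print(f"A glycolysis-only cell needs ~{ratio:.0f}x more glucose per ATP")
```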
Introduction to the WARBURG PHENOMENON:
WARBURG EFFECT Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose than do normal cells from outside.
Otto Heinrich Warburg (; 8 October 1883 – 1 August 1970) In 1931 was awarded the Nobel Prize in Physiology for his "discovery of the nature and mode of action of the respiratory enzyme.
WARNBURG EFFECT : cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non- coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the
development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA
- length: 23-25 nt
- trans-acting
- binds the target mRNA with mismatches
- leads to translation inhibition
siRNA
- length: 21 nt
- cis-acting
- binds the target mRNA as a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
- length: 25-36 nt
- expressed in germ cells
- regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
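The sequence-specific pairing step above can be illustrated with a toy script (not a bioinformatics tool): finding the region of an mRNA that is perfectly complementary to an siRNA guide strand, as in siRNA-guided RISC target recognition. Both sequences here are hypothetical.

```python
# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def find_target_site(guide: str, mrna: str) -> int:
    """Start index of the mRNA region perfectly complementary
    to the siRNA guide strand, or -1 if there is none."""
    return mrna.find(reverse_complement(guide))

guide = "AUGGCAU"       # hypothetical siRNA guide strand
mrna = "GGGAUGCCAUCCC"  # hypothetical target mRNA
site = find_target_site(guide, mrna)
```

Perfect complementarity (siRNA) triggers cleavage; miRNAs, which pair with mismatches, instead typically repress translation.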
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers sequence-specific mRNA degradation.
Unwinding of the double-stranded siRNA by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1. PAZ (PIWI/Argonaute/Zwille) domain: recognition of the target mRNA.
2. PIWI (P-element Induced Wimpy testis) domain: cleaves the phosphodiester bond of the mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
1. Michael Biehl,
www.cs.rug.nl/~biehl
Intelligent Systems
Bernoulli Institute for Mathematics,
Computer Science and Artificial Intelligence
University of Groningen, The Netherlands
Medical applications of machine learning:
Prototype-based classifiers and relevance learning
2. Prototype and distance-based systems
• basic concepts of Learning Vector Quantization
• distance measures and relevance learning
• adrenal tumor classification: steroid metabolomics
• early stages of rheumatoid arthritis: cytokine expression
• neurodegenerative diseases: 3D FDG-PET scan images
Application examples
Challenges, summary and outlook
Medical applications of machine learning:
Prototype-based classifiers and relevance learning
3. supervised learning
classification / regression / prediction
based on labeled example data
generic workflow:
example data → model (training) → apply to novel data (working)
obvious performance measures: overall / class-wise accuracy,
ROC, Precision-Recall, ...
but ... validation is needed to:
- estimate working performance
- set parameters of model / training
- compare different models
4. accuracy is not enough (P. Lisboa)
a machine learning urban legend
US military in the 1990s:
- classifier to distinguish US from Russian tanks
- trained on a data set of still images
- nearly perfect classification performance
(training and also validation / test)
- complete failure “in practice”
(figure: American tank vs. Russian tank)
only almost true :-)
5. models should be:
transparent / intuitive / interpretable, white box
e.g.: decision criteria used by the classifier
important features contributing
- avoid artifacts, e.g. due to hidden bias in the data
- gain better insight into the data set / problem
- potentially understand underlying mechanisms
one useful framework:
similarity or distance based methods
representation / parameterization in terms of prototypes
to be avoided: blind application of black box machine learning
accuracy is not enough
6. IAC Winter School 2018, La Laguna
distance-based classifiers
a simple distance-based system: (K) NN classifier
• store a set of labeled examples
• classify a query according to the
label of the Nearest Neighbor
(or the majority of K NN)
• piece-wise linear decision
boundaries according
to (e.g.) Euclidean distance
from all examples
(figure: query '?' in N-dim. feature space)
+ conceptually simple,
+ no training phase
+ only one parameter (K)
- expensive (storage, computation)
- sensitive to mislabeled data
- overly complex decision boundaries
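The (K)NN classifier described above can be sketched in a few lines; a minimal illustration with Euclidean distance and made-up data:

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, query, k=3):
    """Majority vote among the k Euclidean nearest stored examples."""
    dists = np.linalg.norm(X - query, axis=1)  # distance to every stored example
    nearest = np.argsort(dists)[:k]            # indices of the k closest examples
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# tiny made-up two-class data set
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = ["A", "A", "B", "B"]
label = knn_classify(X, y, np.array([0.1, 0.0]), k=3)
```

Note how there is no training phase at all: the full data set is the model, which is exactly what makes (K)NN expensive in storage and computation.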
7. prototype-based classification
• represent the data by one or
several prototypes per class
• classify a query according to the
label of the nearest prototype
(or alternative schemes)
• local decision boundaries acc.
to (e.g.) Euclidean distances
(figure: query '?' in N-dim. feature space)
+ parameterization in feature space, interpretability
+ robust, low storage needs,
little computational effort
+ natural for multi-class problems
- model selection: number of prototypes per class, etc.
- requires training: placement of prototypes in feature space
Learning Vector Quantization [Kohonen]
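Nearest-prototype classification itself is a one-liner once the prototypes exist; a minimal sketch with one prototype per class (positions here are illustrative, in practice they are obtained by training):

```python
import numpy as np

def nearest_prototype(prototypes, labels, query):
    """Label of the closest prototype (winner-takes-all)."""
    d = np.linalg.norm(prototypes - query, axis=1)
    return labels[int(np.argmin(d))]

# illustrative prototypes, one per class
protos = np.array([[0.0, 0.0], [2.0, 2.0]])
labels = ["healthy", "disease"]
pred = nearest_prototype(protos, labels, np.array([0.4, 0.3]))
```

Compared to (K)NN, only two vectors are stored here instead of the whole data set.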
8. ∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors
for different classes
competitive learning: LVQ1 [Kohonen]
• identify the winner
(closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
- away from the data (different class)
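A single LVQ1 step as outlined above: present one example, identify the winner, and move it towards or away from the example depending on label agreement. The learning rate eta is an illustrative choice.

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, x_label, eta=0.1):
    """One competitive-learning update; modifies prototypes in place."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    j = int(np.argmin(dists))                    # winner: closest prototype
    sign = 1.0 if proto_labels[j] == x_label else -1.0
    prototypes[j] += sign * eta * (x - prototypes[j])  # attract or repel
    return j

protos = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = ["A", "B"]
winner = lvq1_step(protos, labels, np.array([0.2, 0.0]), "A")
```

Iterating this over many example presentations (with decreasing eta) yields the prototype placement used by the classifier.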
9. ∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
∙ tesselation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ generalization ability
correct classification of new data
∙ aim: discrimination of classes
( ≠ vector quantization
or density estimation )
10. cost function based LVQ
one example: Generalized LVQ (GLVQ) cost function [Sato&Yamada, 1995]
two winning prototypes per example: the closest correct prototype (distance d_J)
and the closest incorrect prototype (distance d_K)
minimize E = Σ_μ Φ(e_μ) with e_μ = (d_J − d_K) / (d_J + d_K)
E favors
- a small number of misclassifications, e.g. with a sigmoidal Φ
- large margins between classes:
small d_J, large d_K
- class-typical prototypes
There is nothing objective about objective functions
- J. McClelland
11. GLVQ
training = optimization with respect to prototype position,
e.g. single example presentation, stochastic gradient descent,
update of two prototypes per step:
based on a non-negative distance measure
additional requirement: differentiability with respect to the prototypes
a variety of distance measures can be used in the cost function
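The per-example GLVQ cost term can be sketched directly; a minimal illustration with squared Euclidean distances and made-up prototype positions (not the full stochastic-gradient training):

```python
import numpy as np

def glvq_term(prototypes, proto_labels, x, x_label):
    """Per-example GLVQ term e = (dJ - dK) / (dJ + dK), with squared
    Euclidean distances; e < 0 iff the example is classified correctly."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    correct = [i for i, c in enumerate(proto_labels) if c == x_label]
    wrong = [i for i, c in enumerate(proto_labels) if c != x_label]
    dJ = min(d[i] for i in correct)  # closest prototype with the same label
    dK = min(d[i] for i in wrong)    # closest prototype with a different label
    return (dJ - dK) / (dJ + dK)

protos = np.array([[0.0, 0.0], [1.0, 0.0]])
e = glvq_term(protos, ["A", "B"], np.array([0.1, 0.0]), "A")
```

Summing Φ(e) over all examples gives the cost E; gradient descent on E updates exactly the two winning prototypes per step.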
12. fixed, pre-defined distance measures:
Minkowski measures
kernelized distances
divergences, e.g.
...
alternative distance measures
possible work-flow
- select several distance measures according to prior knowledge
or in a data-driven preprocessing step
- compare performance of various measures (e.g. cross-validation)
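The suggested workflow, comparing several fixed distance measures by cross-validation, might look as follows; a toy leave-one-out comparison of two Minkowski measures (p = 1, p = 2) with a 1-NN classifier on made-up data:

```python
import numpy as np

def loo_accuracy(X, y, p):
    """Leave-one-out 1-NN accuracy under the Minkowski-p distance."""
    hits = 0
    for i in range(len(X)):
        d = np.sum(np.abs(X - X[i]) ** p, axis=1) ** (1.0 / p)
        d[i] = np.inf                      # exclude the held-out point itself
        hits += y[int(np.argmin(d))] == y[i]
    return hits / len(X)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [1.0, 1.0], [1.1, 1.0], [0.9, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])
scores = {p: loo_accuracy(X, y, p) for p in (1, 2)}
```

On real data the measures would differ, and the best-scoring one would be selected; relevance learning (next slide) replaces this discrete selection by a continuous, data-driven optimization.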
13. astronomy vs. astrology
the 'right' distance?
angle instead of physical 3-dim. distance ~ normalization to unit sphere
even worse in combination with over-fitting
14. Relevance Learning
elegant framework: relevance learning / adaptive distances
- employ a parameterized distance measure
with only the mathematical form fixed in advance
- optimize its parameters in the training process
- adaptive, data driven dissimilarity
example: Matrix Relevance LVQ
- data-driven optimization of prototypes
and relevance matrix
- in the same training process (≠ pre-processing )
17. GMLVQ
generalized quadratic distance in LVQ: d(w, x) = (x − w)^T Λ (x − w) with Λ = Ω^T Ω [Schneider, Biehl, Hammer, 2009]
variants:
one global, several local, class-wise relevance matrices
rectangular low-dim. representation / visualization
[Bunte et al., 2012]
diagonal matrices: single feature weights [Hammer et al., 2002]
training: adaptation of prototypes
and distance measure guided by
GLVQ cost function
Generalized Matrix Relevance LVQ:
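The GMLVQ distance d(w, x) = (x − w)^T Λ (x − w) with Λ = Ω^T Ω can be evaluated in a few lines; the parameterization via Ω guarantees d ≥ 0 for any real matrix. The Ω below is an illustrative value, not a trained matrix:

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """Adaptive quadratic distance; Lambda = Omega^T Omega is
    positive semi-definite, so the distance is non-negative."""
    diff = x - w
    return float(diff @ omega.T @ omega @ diff)

omega = np.array([[1.0, 0.0],
                  [0.0, 0.5]])   # relevance parameters (adapted during training)
d = gmlvq_distance(np.array([1.0, 2.0]), np.array([0.0, 0.0]), omega)
```

In training, both the prototypes w and the matrix Ω are updated by gradient descent on the GLVQ cost; a rectangular Ω yields the low-dimensional representation mentioned above.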
18. interpretation of the Relevance Matrix
the diagonal element Λ_ii summarizes
• the contribution of a single dimension i
• the relevance of original features in the classifier
the off-diagonal element Λ_ij quantifies the contribution of pairs of
features (i, j) to the distance
Note: this interpretation implicitly assumes that features have equal
order of magnitude, e.g. after z-score transformation
(zero mean, unit variance as averages over the data set)
after training:
prototypes represent typical class properties or subtypes
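The interpretation step can be sketched as below: after z-score transformation of the features, the diagonal of Λ = Ω^T Ω is read as single-feature relevances and the off-diagonal entries as pairwise contributions. The Ω here is an illustrative value, not a trained matrix:

```python
import numpy as np

# toy data matrix: 3 samples x 2 features
X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
Xz = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score: zero mean, unit variance

omega = np.array([[0.9, 0.1],
                  [0.2, 0.3]])             # illustrative relevance parameters
Lam = omega.T @ omega
feature_relevances = np.diag(Lam)          # Lambda_ii: single-feature relevance
pair_term = Lam[0, 1]                      # Lambda_ij: pairwise contribution
```

Without the z-score step, a feature measured on a large numerical scale would show a misleadingly small relevance value.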
19. Relevance Matrix LVQ
optimization of
prototype positions
distance measure(s)
in one training process
(≠ pre-processing)
motivation:
improved performance
- weighting of features and pairs of features
simplified classification schemes
- elimination of non-informative, noisy features
- discriminative low-dimensional representation
insight into the data / classification problem
- identification of most discriminative features
- incorporation of prior knowledge (e.g. structure of Ω)
21. empirical observation / theory:
relevance matrix becomes
singular, dominated by
very few eigenvectors
prevents over-fitting in
high-dim. feature spaces
facilitates discriminative
visualization / low-dim.
representation of datasets
confirms: Setosa well-separated
from Virginica / Versicolor
Relevance Matrix LVQ
22. three application examples
I) steroid metabolomics
- discrimination of malignant vs. benign adrenal tumors
based on urinary steroid metabolite excretion
main aim: practical diagnosis support tool
II) cytokine expression data
- detection of (early) rheumatoid arthritis
based on synovial tissue samples
main aim: marker identification, disease mechanisms
III) FDG-PET scan brain images
- diagnosis / discrimination of neurodegenerative diseases
based on 3D functional imaging
main aim: method development /processing pipelines
23. (I) Steroid metabolomics: detecting
malignancy in adrenocortical tumors
www.ensat.org
W. Arlt, M. Biehl, A. Taylor, S. Hahner, R. Libé, B. Hughes, P. Schneider,
D. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat,
F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C. Shackleton,
X. Bertagna, M.Fassnacht, P. Stewart
Urine Steroid Metabolomics as a Biomarker Tool for Detecting
Malignancy in Patients with Adrenal Tumors
J Clinical Endocrinology & Metabolism 96: 3775-3784 (2011)
24. www.ensat.org
classification of adrenocortical tumors (adenoma vs. carcinoma)
based on steroid hormone excretion profiles
benign ACA malignant ACC
features: 32 steroid metabolite excretion values
non-invasive measurement (24 hrs. urine samples)
steroid metabolomics
aim: develop a novel biomarker tool for differential diagnosis
idea: identify characteristic steroid profiles (prototypes)
25. Generalized Matrix LVQ , ACC vs. ACA classification
∙ data divided in 90% training, 10% test set, (z-score transformed)
∙ determine prototypes
typical profiles (1 per class)
∙ apply classifier to test data
evaluate performance (error rates, ROC)
∙ adaptive generalized quadratic distance measure,
parameterized by the relevance matrix Λ = Ω^T Ω
∙ repeat and average over many random splits
[Arlt et al., 2011]
[Biehl et al., 2012]
steroid metabolomics
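The evaluation protocol above (many randomized 90%/10% splits, performance averaged over runs) can be sketched as follows; a trivial nearest-mean classifier stands in for GMLVQ, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_predict(Xtr, ytr, Xte):
    """Classify by the closer class mean (one 'prototype' per class)."""
    means = {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
    classes = sorted(means)
    d = np.stack([np.linalg.norm(Xte - means[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

# synthetic two-class data: 100 samples, 3 features
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

errors = []
for _ in range(100):                       # repeat over random 90/10 splits
    idx = rng.permutation(len(X))
    tr, te = idx[:90], idx[90:]
    pred = nearest_mean_predict(X[tr], y[tr], X[te])
    errors.append(np.mean(pred != y[te]))
mean_error = float(np.mean(errors))
```

Averaging over many splits stabilizes the estimated test error, which matters for the small cohort sizes typical of clinical data.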
27. subset of selected steroids ↔ technical realization (patented, UoB)
using 9 markers only, similar ROC
Relevance matrix
… of pairs of markers
contribution of single markers
steroid metabolomics
28. ∙ Receiver Operating Characteristics (ROC)
ROC considers a modified (biased) classification scheme, sweeping a bias θ:
(figure: true positive rate (sensitivity) vs. false positive rate
(1-specificity); Area under Curve (AUC); unbiased classifier at θ = 0)
one extreme: all tumors classified as ACA
- no false positives
- no true positives detected
other extreme: all tumors classified as ACC
- all true positives detected
- max. number of false positives
steroid metabolomics
Note: different types of errors have very different consequences!
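The ROC construction described above, sweeping a threshold over the classifier output, recording sensitivity against 1−specificity, and integrating for the AUC, can be sketched as below; the scores and labels are made up:

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC curve points and AUC via the trapezoidal rule."""
    order = np.argsort(-scores)                 # sort by descending score
    labels = labels[order]
    tps = np.cumsum(labels == 1)                # true positives at each cut
    fps = np.cumsum(labels == 0)                # false positives at each cut
    tpr = np.concatenate([[0.0], tps / tps[-1]])  # sensitivity
    fpr = np.concatenate([[0.0], fps / fps[-1]])  # 1 - specificity
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])  # made-up classifier outputs
labels = np.array([1, 1, 1, 0, 1, 0])              # 1 = ACC, 0 = ACA (toy)
fpr, tpr, auc = roc_auc(scores, labels)
```

Because the two error types carry very different clinical costs, the working point on this curve, not just the AUC, must be chosen deliberately.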
29. ROC characteristics
clear improvement due to adaptive distances
90% / 10% randomized splits of the data in training and test set,
averages over 1000 runs
(ROC figure: sensitivity vs. 1-specificity)
AUC: Euclidean 0.87, diagonal relevances 0.93, full matrix 0.97
steroid metabolomics
32. visualization of the data set
ACA
ACC
generic property: relevance matrix becomes highly singular
33. • monitoring of patients after surgery and/or under medication
aim: recurrence detection proof of concept study submitted
work in progress
• high-throughput LC/MS assay to replace GC/MS,
publication in preparation
• other disorders affecting / related to steroid metabolism
e.g. liver disease (NAFLD etc.), first results submitted
(with J. Tomlinson, Oxford)
• prospective study with ~2000 patients, submitted
confirms performance as a practical diagnosis system
34. (II) Early stages of Rheumatoid Arthritis
Expression of chemokines CXCL4 and CXCL7 by synovial
macrophages defines an early stage of rheumatoid arthritis
Annals of the Rheumatic Diseases 75:763-771 (2016)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
35. patient groups: uninflamed control, established RA,
early inflammation: resolving vs. early RA
cytokine-based diagnosis of RA at the earliest possible stage?
ultimate goals: understand pathogenesis and
mechanism of progression
rheumatoid arthritis (RA)
37. GMLVQ analysis
pre-processing:
• log-transformed expression values
• 21 leading principal components explain 95% of the variation
Two two-class problems: (A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
• 1 prototype per class, global relevance matrix, distance measure:
• leave-two-out validation (one from each class)
evaluation in terms of Receiver Operating Characteristics
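The leave-two-out scheme above (hold out one sample from each class, train on the rest, repeat over all cross-class pairs) might be sketched like this; a nearest-mean rule stands in for the GMLVQ classifier, and the data are synthetic:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
# synthetic two-class data: 8 samples per class, 4 features
X = np.vstack([rng.normal(0, 1, (8, 4)), rng.normal(2, 1, (8, 4))])
y = np.array([0] * 8 + [1] * 8)

idx0 = np.where(y == 0)[0]
idx1 = np.where(y == 1)[0]

correct, total = 0, 0
for i, j in product(idx0, idx1):        # one held-out sample per class
    mask = np.ones(len(X), bool)
    mask[[i, j]] = False                # remove the held-out pair from training
    m0 = X[mask & (y == 0)].mean(axis=0)
    m1 = X[mask & (y == 1)].mean(axis=0)
    for k in (i, j):
        pred = 0 if np.linalg.norm(X[k] - m0) < np.linalg.norm(X[k] - m1) else 1
        correct += pred == y[k]
        total += 1
accuracy = correct / total
```

Holding out one sample from each class keeps the class balance of every training set intact, which matters for small cohorts like this one.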
40. CXCL4 chemokine (C-X-C motif) ligand 4
CXCL7 chemokine (C-X-C motif) ligand 7
direct study on the protein level, staining / imaging of synovial tissue:
macrophages : predominant source of CXCL4/7 expression
protein level studies
• high levels of CXCL4 and
CXCL7 in early RA
• expression on macrophages
outside of blood vessels
discriminates
early RA / resolving cases
42. future work
• more samples (difficult...) needed in order
to obtain a reliable early diagnosis
• integrated analysis of gene expression and other data
from the same / an analogous patient cohort
43. (III) Analysis of FDG-PET image data for the
diagnosis of neurodegenerative disorders
44. based on: FDG-PET scan brain images
subject scores derived from 3D images
data: acquired at three different medical centers
identical equipment, identical processing (??)
aim: ultimately, early and reliable diagnosis of neurodegenerative
disorders: Alzheimer’s disease (AD), Parkinson’s disease (PD), etc.
analysis: machine learning, classifiers: SVM, (L)GMLVQ
within center and across center performances
questions: reliable FDG-PET based diagnosis ?
compatible across different medical centers ?
can we obtain a robust ‘universal classifier’ ?
overview
45. data: FDG-PET (Fluorodeoxyglucose positron emission tomography)
brain scans, 3D images of glucose uptake, from 3 centers:
• Clínica Universidad de Navarra (CUN)
• Univ. Genoa/IRCCS San Martino (UGOSM)
• Univ. Medical Center Groningen (UMCG)
groups: Healthy Controls (HC), Parkinson’s Disease (PD), Alzheimer’s Disease (AD)

Subjects:
Source   HC   PD   AD
CUN      19   49    -
UGOSM    44   58   55
UMCG     19   20   21

http://glimpsproject.com
46. work flow
per subject: 3D image, ~200000 voxels
→ log-transform
→ masking (*): high-intensity, low-noise voxels, subject-specific anatomy
→ low-dimensional projections by SSM/PCA (*)
→ subject scores
details of pre-processing: D. Mudali et al.,
Computational and Mathematical Methods in Medicine,
March 2015, Art. ID 136921, and references therein
(*) Scaled Subprofile Model / PCA based
on a (disjoint) reference group of subjects
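The dimension-reduction step of the pipeline can be sketched with plain PCA standing in for the SSM/PCA procedure of Mudali et al.; the sizes and data below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
# toy stand-in for masked, log-transformed voxel data:
# 30 subjects x 500 voxels (real scans have ~200000 voxels)
V = rng.normal(size=(30, 500))

Vc = V - V.mean(axis=0)                        # center across subjects
U, s, Wt = np.linalg.svd(Vc, full_matrices=False)
scores = Vc @ Wt[:5].T                         # subject scores on 5 components
```

The resulting low-dimensional subject scores, rather than raw voxels, are what the classifiers on the following slides receive as input.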
48. (A) Perceptron of optimal stability (aka “SVM with linear kernel”)
- linear threshold classifier
- large margin (with errors)
- Matlab R2016a (Statistics Toolbox):
fitcsvm, predict with default parameters
performance evaluation:
averages over 10 randomized runs of 10-fold cross-validation
accuracies, sensitivity /specificity, ROC, ...
(A,B) have outperformed Decision Trees in previous projects
classifiers
(B) Generalized Matrix Learning Vector Quantization (GMLVQ)
www.cs.rug.nl/~biehl/gmlvq
(C) Local Relevance Matrix LVQ (LGMLVQ)
http://matlabserver.cs.rug.nl/gmlvqweb/web/
with default parameters, one prototype per class
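The performance-evaluation protocol above, averages over 10 randomized runs of 10-fold cross-validation, can be sketched as follows; a nearest-mean rule stands in for the actual SVM / GMLVQ classifiers, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic two-class data: 100 subjects, 5 "subject score" features
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

accs = []
for _ in range(10):                      # 10 randomized runs
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, 10)      # 10-fold cross-validation
    for te in folds:
        tr = np.setdiff1d(idx, te)       # all indices not in the test fold
        m0 = X[tr][y[tr] == 0].mean(axis=0)
        m1 = X[tr][y[tr] == 1].mean(axis=0)
        d0 = np.linalg.norm(X[te] - m0, axis=1)
        d1 = np.linalg.norm(X[te] - m1, axis=1)
        accs.append(np.mean((d1 < d0) == (y[te] == 1)))
mean_acc = float(np.mean(accs))
```

Sensitivity, specificity and ROC curves would be computed per fold in the same loop and then averaged, as in the result tables that follow.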
49. results: within centers
• subjects from one center only, here: UGOSM
relatively good within-center performance (also in 3-class setting)
           Classifier  Sens. (%)     Spec. (%)     AUC (ROC)
PD vs HC   SVM         74.23 (19.0)  68.05 (25.9)  0.80 (0.2)
           GMLVQ       75.13 (16.9)  77.50 (22.5)  0.84 (0.1)
           LGMLVQ      79.23 (15.2)  68.15 (22.6)  0.83 (0.1)
AD vs HC   SVM         95.40 (8.9)   92.00 (13.1)  0.99 (0.0)
           GMLVQ       88.67 (15.0)  92.90 (13.4)  0.97 (0.1)
           LGMLVQ      91.47 (12.3)  91.45 (14.4)  0.98 (0.0)
PD vs AD   SVM         82.10 (16.2)  83.83 (16.0)  0.92 (0.1)
           GMLVQ       81.00 (17.2)  81.67 (15.8)  0.91 (0.1)
           LGMLVQ      84.70 (15.2)  86.63 (14.8)  0.95 (0.1)
[ mean (std. dev.) ]
50. results: across centers
• compatible across different medical centers ?
reasonable, yet lower accuracies across centers
PD vs. HC                     Classifier  Sens. (%)  Spec. (%)  AUC (ROC)
Training: CUN, Test: UGOSM    SVM          58.62      70.45      0.68
                              GMLVQ        86.21      31.82      0.72
                              LGMLVQ       98.28       4.55      0.57
Training: UGOSM, Test: UMCG   SVM         100.00      21.05      0.82
                              GMLVQ        70.00      63.16      0.74
                              LGMLVQ       95.00      47.37      0.91
Training: UMCG, Test: CUN     SVM          54.41      73.68      0.70
                              GMLVQ        33.82      89.47      0.70
                              LGMLVQ       47.06      84.21      0.72
51. experiment - can we classify subjects according to medical center ?
results: prediction of centers
possible explanations:
- center-specific (pre-)processing despite supposedly
identical equipment and work flows
- significantly different patient cohorts (not the case in HC)
HC only          Classifier  Sens. (%)  Spec. (%)  AUC (ROC)
CUN vs. UGOSM    SVM          99.75      93.00      1.00
                 GMLVQ        97.30      91.00      0.99
                 LGMLVQ      100.00      89.50      0.99
52. outlook: voxel space interpretation
PD / AD prototypes: (low-dim.) back-projections into voxel space (pseudo-inverse)
on-going: assessment by radiologists / neurologists
53. outlook: voxel space interpretation
discriminative directions in voxel-space
prototypes
54. outlook: across center classification
aim: unified classifiers with good inter-center performance
check/improve: consistent protocols / assays
unified pre-processing
dummy measurements
matching patient cohorts (?)
transfer learning:
identify and correct systematic differences
adjustment using center-specific prototypes
eliminate center-discriminating directions
55. summary/conclusion
prototype- and distance based systems:
- intuitive, transparent, interpretable
- easy to implement, flexible, a natural tool for multi-class problems
- classification, regression, unsupervised learning, visualization ...
- relevance learning: further insight into data and problem
- suitable for a variety of bio-medical problems
review articles:
M. Biehl, B. Hammer, T. Villmann. Prototype-based models in
Machine Learning. Advanced Review: WIRES Cognitive Science 7(2):
92-111 (2016)
M. Biehl: Biomedical Applications of Prototype Based Classifiers
and Relevance Learning. In: Proc. 4th Intl. Conf. on Algorithms for
Computational Biology. Springer Lecture Notes in Computer Science 10252, 2017
56. http://matlabserver.cs.rug.nl/gmlvqweb/web/
Matlab code:
Relevance and Matrix adaptation in Learning Vector
Quantization (GRLVQ, GMLVQ and LiRaM LVQ) [K. Bunte]
http://www.cs.rug.nl/~biehl/
links
Related pre- and re-prints etc.:
A no-nonsense beginners’ tool for GMLVQ:
http://www.cs.rug.nl/~biehl/gmlvq
A Scikit-Learn compatible collection of Python code
for LVQ and variants, including GMLVQ [Rick van Veen]:
https://sklvq.readthedocs.io/en/stable/
source code:
https://github.com/rickvanveen/sklvq