Talk presented at WSOM 2016 in Houston/Texas.
Machine learning based classification of FDG-PET scan data for the diagnosis of neurodegenerative disorders
Simplified Knowledge Prediction: Application of Machine Learning in Real Life - Peea Bal Chakraborty
Machine learning is the scientific study of algorithms and statistical models that machines use to perform a specific task based on patterns and inference rather than explicit instructions. This research aims to observe how precisely a machine can predict whether a patient suspected of breast cancer has a malignant or benign tumour. In this paper, the classification of cancer type and the prediction of risk levels are carried out with various machine learning models and depicted with various visual analytics tools.
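A minimal sketch of the kind of malignant-vs-benign prediction described above, using the Wisconsin breast cancer data shipped with scikit-learn. The choice of logistic regression is an illustrative assumption, since the paper compares several models:

```python
# Sketch: malignant-vs-benign prediction on the scikit-learn copy of the
# Wisconsin breast cancer data. The model choice is illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```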
Clustering and Classification of Cancer Data Using Soft Computing Technique - IOSR Journals
Clustering and classification of cancer data have been used with success in the medical field. In this paper, two algorithms, K-means and fuzzy C-means, are proposed for comparison, and the accuracy of their results is determined. The paper addresses the problem of learning to classify cancer data with two different methods, using information derived from training and testing. Various soft-computing-based classification techniques are compared on this health-care data, and the accuracy of the results on the cancer data is presented.
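The comparison described above can be sketched as follows. The fuzzy c-means updates are the standard Bezdek formulation; the Wisconsin breast cancer data and the fuzzifier m = 2 are illustrative assumptions, not the paper's exact setup:

```python
# Hard k-means vs. fuzzy c-means on the Wisconsin breast cancer data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # membership matrix
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted means
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                  # Bezdek update
        U /= U.sum(axis=1, keepdims=True)
    return U.argmax(axis=1)                             # hardened labels

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

def cluster_accuracy(labels, y):
    # best of the two possible label permutations for 2 clusters
    acc = (labels == y).mean()
    return max(acc, 1 - acc)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("k-means accuracy:      ", cluster_accuracy(km.labels_, y))
print("fuzzy c-means accuracy:", cluster_accuracy(fuzzy_c_means(X), y))
```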
Classification of Breast Cancer Diseases using Data Mining Techniques - inventionjournals
Medical data mining holds great promise for exploring new knowledge in large amounts of data. Classification is one of the important data mining techniques. In this research work, we used various data-mining-based classification techniques to classify patients as cancer cases or not. We applied the Breast Cancer-Wisconsin (Original) data set to different data mining techniques and compared the accuracy of the models under two different data partitions. BayesNet achieved the highest accuracy, 97.13%, with 10-fold partitioning. We also applied the info-gain feature selection technique to BayesNet and Support Vector Machine (SVM) and achieved the best accuracy, 97.28%, with BayesNet on a 6-feature subset.
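The evaluation protocol described above (10-fold cross-validation with an information-gain-style feature filter) can be sketched as follows. scikit-learn ships no BayesNet classifier, so Gaussian naive Bayes stands in for it here, and mutual_info_classif plays the role of the info-gain ranking; both substitutions are assumptions for illustration:

```python
# 10-fold CV with a 6-feature information-theoretic filter.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(SelectKBest(mutual_info_classif, k=6), GaussianNB())
scores = cross_val_score(model, X, y, cv=10)
print(f"10-fold accuracy with 6 selected features: {scores.mean():.4f}")
```

Putting the selector inside the pipeline matters: it is refit on each training fold, so no test-fold information leaks into the feature ranking.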
Incremental learning from unbalanced data with concept class, concept drift a... - IJDKP
Recently, stream data mining applications have drawn vital attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning field has developed learning algorithms that make certain assumptions about the underlying distribution of the data, for example that it follows a predetermined distribution. Such constraints on the problem domain allow the development of learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model: applications often suffer from unbalanced data distributions, and data drawn from non-stationary environments are also common, resulting in the “concept drift” associated with streaming examples. These issues have largely been addressed separately, and the joint problem of class imbalance and concept drift has received relatively little research. If the final objective of intelligent machine learning techniques is to address a broad spectrum of real-world applications, then the need for a universal framework for learning from, and adapting to, environments in which concept drift may occur and unbalanced data distributions are present can hardly be exaggerated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a complete review of recent research dealing with each of them.
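To make "concept drift" concrete, here is an illustrative sketch (not from the paper) of a simple window-based drift signal on a synthetic binary error stream whose error rate jumps at step 1000, as happens when the concept generating the data changes. The window sizes and threshold are arbitrary choices:

```python
# Flag drift when the short-window error rate exceeds the long-window
# error rate by a fixed margin (a simple heuristic, not DDM/ADWIN).
import random
from collections import deque

def detect_drift(errors, short=50, long_=500, thresh=0.2):
    """Return the first step where the recent error rate exceeds the
    long-run error rate by `thresh`, or None if it never does."""
    recent, history = deque(maxlen=short), deque(maxlen=long_)
    for t, e in enumerate(errors):
        recent.append(e)
        history.append(e)
        if t >= long_ and sum(recent) / short - sum(history) / long_ > thresh:
            return t
    return None

random.seed(0)
# error rate 0.1 before the concept change at t = 1000, 0.4 after it
stream = [random.random() < (0.1 if t < 1000 else 0.4) for t in range(2000)]
print("drift detected at step", detect_drift(stream))
```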
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif... - ahmad abdelhafeez
The goal of this paper is to compare individual classifiers and multi-classifier fusion with respect to accuracy in detecting breast cancer on four different data sets. We implement various classification techniques representing the best-known algorithms in this field on four breast cancer datasets, two for diagnosis and two for prognosis, and fuse classifiers to find the best multi-classifier fusion approach for each data set individually. Classification accuracy is obtained from the confusion matrix, built with 10-fold cross-validation, and fusion uses majority voting (the mode of the classifier outputs). The experimental results show that no classification technique is better than the others across all datasets, since the classification task is affected by the type of dataset. With multi-classifier fusion, accuracy improved on three of the four datasets.
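Majority-voting fusion with 10-fold cross-validation, as described above, can be sketched like this; the particular base classifiers and dataset are illustrative assumptions, not the paper's exact ensemble:

```python
# Hard-voting fusion of three well-known classifiers, scored with 10-fold CV.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
fusion = VotingClassifier([
    ("lr", LogisticRegression(max_iter=5000)),
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=0)),
], voting="hard")                    # hard voting = mode of the outputs

for name, clf in [("fusion", fusion)] + fusion.estimators:
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name:>6}: {acc:.3f}")
```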
I gave this talk at the EDBT 2014 conference, which took place in Athens, Greece.
I show how data examples can be used to characterize the behavior of scientific modules. I present new methods that automatically generate the data examples, show that such data examples help the human user understand the task of the modules, and show that they can be used to assist curators in repairing broken workflows (i.e., workflows for which one or more modules are no longer supplied by their providers).
A Survey on Decision Tree Learning Algorithms for Knowledge Discovery - IJERA Editor
Immense volumes of data are populated into repositories from various applications. Data mining techniques are very helpful for finding the desired information and knowledge in large datasets. Classification is one of the knowledge discovery techniques. Within classification, decision trees are very popular in the research community due to their simplicity and easy comprehensibility. This paper presents an updated review of recent developments in the field of decision trees.
Invited lecture on Machine Learning in Medicine at the joint "Integrated Omics" course of Hanze University and University Hospital UMCG, Groningen, The Netherlands
June 2017: Biomedical applications of prototype-based classifiers and relevan... - University of Groningen
A presentation of several biomedical applications of prototype-based machine learning and relevance learning. Invited talk at the AlCoB conference 2017 in Aveiro/Portugal.
Mining System Logs to Learn Error Predictors, Universität Stuttgart, Stuttgar... - Barbara Russo
Predicting system failures can be of great benefit to managers, who gain better command over system performance. The logs that systems generate are a valuable source of information for predicting system reliability, and there is an increasing demand for tools that mine logs and provide accurate predictions. However, interpreting the information in logs poses some challenges. This talk presents how to effectively mine sequences of logs and provide correct predictions. The approach integrates different machine learning techniques to control for data brittleness, ensure sound model selection and validation, and increase the robustness of classification results. We apply the proposed approach to log sequences of 25 different applications of a software system for car telemetry.
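The talk's actual pipeline is not reproduced here, but the general idea of learning an error predictor from log sequences can be sketched as follows. The event vocabulary and the synthetic stream are hypothetical; only the overall shape (sequences of log events, n-gram count features, a classifier validated by cross-validation) matches the description above:

```python
# Log sequences as strings of event IDs, bigram counts as features,
# and a random forest predicting whether a run ends in an error.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import random

random.seed(0)
EVENTS = ["boot", "read", "write", "retry", "timeout", "ok"]  # hypothetical

def make_sequence(faulty):
    seq = random.choices(EVENTS, k=20)
    if faulty:                     # failing runs contain retry/timeout bursts
        seq += random.choices(["retry", "timeout"], k=10)
    return " ".join(seq)

docs = [make_sequence(i % 2 == 1) for i in range(400)]
labels = [i % 2 for i in range(400)]

X = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5)
print(f"cross-validated error-prediction accuracy: {scores.mean():.3f}")
```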
Tutorial at the Winter School on Machine Learning, Gran Canaria, January 2020 (ppsx format, 52 slides)
Michael Biehl, University of Groningen, The Netherlands
May 2015 talk to SW Data Meetup by Professor Hendrik Blockeel from KU Leuven & Leiden University.
With increasing amounts of ever more complex forms of digital data becoming available, the methods for analyzing these data have also become more diverse and sophisticated. With this comes an increased risk of incorrect use of these methods, and a greater burden on the user to be knowledgeable about their assumptions. In addition, the user needs to know about a wide variety of methods to be able to apply the most suitable one to a particular problem. This combination of broad and deep knowledge is not sustainable.
The idea behind declarative data analysis is that the burden of choosing the right statistical methodology for answering a research question should no longer lie with the user, but with the system. The user should be able to simply describe the problem, formulate a question, and let the system take it from there. To achieve this, we need to find answers to questions such as: what languages are suitable for formulating these questions, and what execution mechanisms can we develop for them? In this talk, I will discuss recent and ongoing research in this direction. The talk will touch upon query languages for data mining and for statistical inference, declarative modeling for data mining, meta-learning, and constraint-based data mining. What connects these research threads is that they all strive to put intelligence about data analysis into the system, instead of assuming it resides in the user.
Hendrik Blockeel is a professor of computer science at KU Leuven, Belgium, and part-time associate professor at Leiden University, The Netherlands. His research interests lie mostly in machine learning and data mining. He has made a variety of research contributions in these fields, including work on decision tree learning, inductive logic programming, predictive clustering, probabilistic-logical models, inductive databases, constraint-based data mining, and declarative data analysis. He is an action editor for Machine Learning and serves on the editorial board of several other journals. He has chaired or organized multiple conferences, workshops, and summer schools, including ILP, ECMLPKDD, IDA and ACAI, and he has been vice-chair, area chair, or senior PC member for ECAI, IJCAI, ICML, KDD, ICDM. He was a member of the board of the European Coordinating Committee for Artificial Intelligence from 2004 to 2010, and currently serves as publications chair for the ECMLPKDD steering committee.
Review of "Survey Research Methods & Design in Psychology" - James Neill
Reviews the 150-hour, third-year psychology unit which examined survey research methods, with an emphasis on the second half of the unit on MLR, ANOVA, power, and effect size.
An Influence of Measurement Scale of Predictor Variable on Logistic Regressio... - IJECEIAES
Much real-world decision making is based on binary categories of information: agree or disagree, accept or reject, succeed or fail, and so on. Information of this kind is the output of a classification method, the domain both of statistics (e.g., logistic regression) and of machine learning (e.g., Learning Vector Quantization, LVQ). The input to a classification method plays a crucial role in the resulting output. This paper investigated the influence of the measurement scale of the input data (interval, ratio, and nominal) on the performance of logistic regression and LVQ in classifying an object. Logistic regression modeling proceeded in several stages until a model satisfying the goodness-of-fit test was obtained; LVQ was tested with several codebook sizes, and the most optimal LVQ model was selected. The best model of each method was then compared on object classification using the hit ratio. Logistic regression yielded two models that passed the goodness-of-fit test, with interval- and nominal-scale predictors, while LVQ yielded three optimal models with different codebooks. On data with interval-scale predictors, the two methods performed equally well; both performed poorly on nominal-scale predictors. On data with ratio-scale predictors, LVQ produced moderate performance, while no logistic regression model passed the goodness-of-fit test. Thus, if the input dataset has interval- or ratio-scale predictor variables, the LVQ method is preferable for modeling object classification.
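The effect of measurement scale on the hit ratio can be illustrated with a small sketch: the same logistic regression is fit once on an interval-scale predictor and once on a nominal (dummy-coded) version of it. The synthetic data and the three-category discretization are assumptions for illustration; the paper used its own survey data and also compared against LVQ:

```python
# Hit ratio of logistic regression under interval vs. nominal predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 1000
x_interval = rng.normal(size=(n, 1))
y = (x_interval[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# nominal version: only 3 unordered categories, most information lost
x_nominal = np.digitize(x_interval[:, 0], [-0.5, 0.5]).reshape(-1, 1)
x_nominal_coded = OneHotEncoder().fit_transform(x_nominal).toarray()

results = {}
for name, X in [("interval", x_interval), ("nominal", x_nominal_coded)]:
    results[name] = LogisticRegression().fit(X, y).score(X, y)
    print(f"hit ratio, {name}-scale predictor: {results[name]:.3f}")
```

Discretizing the predictor discards the within-category ordering, so the nominal model's hit ratio drops relative to the interval model's.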
Episode 12 : Research Methodology ( Part 2 )
Approach to de-synthesizing data, informational, and/or factual elements to answer research questions
Method of putting together facts and figures
to solve research problem
Systematic process of utilizing data to address research questions
Breaking down research issues through utilizing controlled data and factual information
SAJJAD KHUDHUR ABBAS
Chemical Engineering, Al-Muthanna University, Iraq
Oil & Gas Safety and Health Professional - OSHACADEMY
Trainer of Trainers (TOT) - Canadian Center of Human Development
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Episode 18 : Research Methodology ( Part 8 )
Approach to de-synthesizing data, informational, and/or factual elements to answer research questions
Method of putting together facts and figures
to solve research problem
Systematic process of utilizing data to address research questions
Breaking down research issues through utilizing controlled data and factual information
SAJJAD KHUDHUR ABBAS
Chemical Engineering, Al-Muthanna University, Iraq
Oil & Gas Safety and Health Professional - OSHACADEMY
Trainer of Trainers (TOT) - Canadian Center of Human Development
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S... - Matthias Braunhofer
Context-Aware Recommender System (CARS) models are trained on datasets of context-dependent user preferences (ratings and context information). Since the number of context-dependent preferences increases exponentially with the number of contextual factors, and certain contextual information is still hard to acquire automatically (e.g., the user’s mood, or for whom the user is buying the searched item), it is fundamental to identify and acquire those factors that truly influence the user preferences and the ratings. In particular, this ensures that (i) the user effort in specifying contextual information is kept to a minimum, and (ii) the system’s performance is not negatively impacted by irrelevant contextual information. In this paper, we propose a novel method which, unlike existing ones, directly estimates the impact of context on rating predictions and adaptively identifies the contextual factors deemed useful to elicit from the users. Our experimental evaluation shows that it compares favourably to various state-of-the-art context selection methods.
Similar to 2015: Distance based classifiers: Basic concepts, recent developments and application examples (20)
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024 - University of Groningen
An introduction to interpretable machine learning in endocrinology.
In particular, the application of Generalized Matrix Relevance LVQ to the classification of adrenocortical tumors and the differential diagnosis of primary aldosteronism is presented.
A tutorial given at the AMALEA workshop 2022:
Unsupervised and supervised prototype-based learning is illustrated in terms of bio-medical applications.
A tutorial given at the AMALEA workshop 2022.
This talk presents the statistical physics based theory of machine learning in terms of simple example systems. As a recent application, the occurrence of phase transitions in layered networks is discussed.
The statistical physics of learning revisited: Phase transitions in layered ne... - University of Groningen
"The statistical physics of learning revisited: Phase transitions in layered neural networks"
Physics Colloquium at the University of Leipzig/Germany, June 29, 2021
24 slides, ca 45 minutes
Short presentation (15 minutes) focusing on the application of unsupervised and supervised machine learning in the paper "Tissue- and development-stage specific mRNA and heterogeneous CNV signatures of human ribosomal proteins in normal and cancer samples".
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini... - Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been gained using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... - Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to elicit complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Richard's adventures in two entangled wonderlands - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Nutraceutical market, scope and growth: Herbal drug technology - Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market, which includes goods like functional foods, drinks, and dietary supplements that provide health advantages beyond basic nutrition, is growing significantly. The industry is expanding quickly as healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions. Product formulation innovations and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN - Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... - University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
What are greenhouse gases and how many gases are there to affect the Earth? - moosaasad1975
What are greenhouse gases, how do they affect the Earth and its environment, and what is the future of the environment and the Earth as the weather and the climate change?
2015: Distance based classifiers: Basic concepts, recent developments and application examples
1. Michael Biehl
Mathematics and Computing Science
University of Groningen / NL
Tutorial as satellite event of CAIP 2015
Saint Martin’s Institute of Higher Education
Malta, August 31, 2015
Distance based classifiers: Basic concepts,
recent developments, and application examples
www.cs.rug.nl/~biehl
2. St. Martin’s Institute, August 2015
1) Distance based classifiers, Learning Vector Quantization
classification problems
distance based classifiers, from KNN to prototypes
the basic scheme: LVQ1
cost function based training: GLVQ
Application: classification of adrenal tumors (I)
Receiver Operator Characteristics
performance evaluation by (cross-) validation
2) GLVQ implementation
stochastic gradient descent, learning rate schedule
batch gradient descent, step size control
Demo: GLVQ with the no-nonsense GMLVQ toolbox
Overview
3. St. Martin’s Institute, August 2015
3) Alternative distance measures and Relevance Learning
Fixed distance measures:
Minkowski measures, Kernelized distances, Divergences
Application example: detection of Cassava Mosaic Disease
Adaptive distance measures
Matrix Relevance Learning Vector Quantization
Application example: Adrenal Tumors cont‘d
Demos: GMLVQ with the no-nonsense GMLVQ toolbox
Application example: Early diagnosis of Rheumatoid Arthritis
Uniqueness, regularization and singularity control
Challenges in bio-medical data analysis
Concluding remarks, references
Overview
5. St. Martin’s Institute, August 2015 5
classification problems
- character/digit/speech recognition
- medical diagnoses
- pixel-wise segmentation in image processing
- object recognition/scene analysis
- fault detection in technical systems
- ...
machine learning approach:
extract information from example data
parameterized in a learning system (neural network, LVQ, SVM...)
working phase: application to novel data
here only: supervised learning , classification:
6. St. Martin’s Institute, August 2015 6
distance based classification
assignment of data (objects, observations,...)
to one or several classes (crisp/soft) (categories, labels)
based on comparison with reference data (samples, prototypes)
in terms of a distance measure (dis-similarity, metric)
representation of data (a key step!)
- collection of qualitative/quantitative descriptors
- vectors of numerical features
- sequences, graphs, functional data
- relational data, e.g. in terms of pairwise (dis-) similarities
7. St. Martin’s Institute, August 2015
K-NN classifier
a simple distance-based classifier
- store a set of labeled examples
- classify a query according to the
label of the Nearest Neighbor
(or the majority of K NN)
- local decision boundary acc.
to (e.g.) Euclidean distances
?
- piece-wise linear class borders
parameterized by all examples
feature space
+ conceptually simple, no training required, one parameter (K)
- expensive storage and computation, sensitivity to “outliers”
can result in overly complex decision boundaries
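The K-NN rule described above fits in a few lines; a minimal NumPy sketch (function and variable names are ours for illustration, not part of any toolbox):

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Classify query x by majority vote among its k nearest
    stored examples, using squared Euclidean distance."""
    d = np.sum((X_train - x) ** 2, axis=1)        # distances to all stored examples
    nearest = np.argsort(d)[:k]                   # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority label

# toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
```

Note that all four examples must be stored and compared against for every query, which is the storage/computation drawback mentioned above.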
8. St. Martin’s Institute, August 2015
prototype based classification
a prototype based classifier [Kohonen 1990, 1997]
- represent the data by one or
several prototypes per class
- classify a query according to the
label of the nearest prototype
(or alternative schemes)
- local decision boundaries according
to (e.g.) Euclidean distances
- piece-wise linear class borders
parameterized by prototypes
feature space
?
+ less sensitive to outliers, lower storage needs, little computational
effort in the working phase
- training phase required in order to place prototypes,
model selection problem: number of prototypes per class, etc.
9. St. Martin’s Institute, August 2015
What about the curse of dimensionality ?
concentration of norms/distances for large N
„distance based methods are bound to fail in high dimensions“ ?
LVQ:
- prototypes are not just random data points
- carefully selected representatives of the data
- distances of a given data point to prototypes are compared
projection to non-trivial
low-dimensional subspace!
[Ghosh et al., 2007, Witoelar et al., 2010]
models of LVQ training, analytical treatment in the limit
successful training needs training examples
see also:
10. St. Martin’s Institute, August 2015
Nearest Prototype Classifier (NPC)
set of prototypes w_j, carrying class-labels c_j,
based on a dissimilarity / distance measure d(w, x)
nearest prototype classifier (NPC):
given x, determine the winner w_J with minimal distance d(w_J, x)
- assign x to the class c_J of the winner
minimal requirements:
d(w, x) ≥ 0, d(w, w) = 0
standard example:
squared Euclidean d(w, x) = (x − w)²
11. St. Martin’s Institute, August 2015
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors
for different classes
competitive learning: LVQ1 [Kohonen, 1990]
• identify the winner
(closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
- away from the data (different class)
12. St. Martin’s Institute, August 2015
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
∙ tessellation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ generalization ability
correct classification of new data
∙ aim: discrimination of classes
( ≠ vector quantization
or density estimation )
13. St. Martin’s Institute, August 2015
LVQ1
iterative training procedure:
randomized initial prototypes, e.g. close to the class-conditional means
sequential presentation of labelled examples
… the winner takes it all:
LVQ1 update step: w_J ← w_J + η ψ (x − w_J), with ψ = +1 (−1)
if the winner's class and the example's label agree (differ)
η: learning rate
many heuristic variants/modifications: [Kohonen, 1990, 1997]
- learning rate schedules η(t) [Darken & Moody, 1992]
- update more than one prototype per step
14. St. Martin’s Institute, August 2015
LVQ1 update step:
LVQ1-like update for a generalized, differentiable distance d(w, x):
w_J ← w_J − η ψ ∂d(w_J, x)/∂w_J, with ψ = +1 (−1) for coinciding (different) classes
requirement:
the update decreases (increases) the distance if the classes coincide (are different)
LVQ1
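The winner-takes-all LVQ1 step can be sketched in NumPy (a minimal illustration with names of our choosing; not the GMLVQ toolbox):

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, y, eta=0.1):
    """One LVQ1 update: the winner takes it all. The closest prototype
    is moved towards x if its label matches y, away from x otherwise."""
    d = np.sum((prototypes - x) ** 2, axis=1)   # squared Euclidean distances
    j = int(np.argmin(d))                       # index of the winner
    psi = 1.0 if proto_labels[j] == y else -1.0
    prototypes[j] += psi * eta * (x - prototypes[j])
    return j

# toy usage: two prototypes, one example of class 0
W = np.array([[0.0, 0.0], [1.0, 1.0]])
c = np.array([0, 1])
winner = lvq1_step(W, c, np.array([0.5, 0.0]), 0, eta=0.1)
```

Only the winning prototype is updated per step; heuristic variants update more than one.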
15. St. Martin’s Institute, August 2015
cost function based LVQ
one example: Generalized LVQ [Sato & Yamada, 1995]
minimize E = Σᵢ Φ(μᵢ) with μ = (d⁺ − d⁻)/(d⁺ + d⁻) ∈ (−1, 1),
where d⁺ and d⁻ are the distances to the two winning prototypes:
the closest prototype of the correct and of a wrong class, respectively
Φ sigmoidal (linear for small arguments), e.g. Φ(μ) = 1/(1 + e^(−γμ))
steep Φ: E approximates the number of misclassifications
linear Φ: E favors large margin separation of classes
small d⁺, large d⁻: E favors class-typical prototypes
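Per example, GLVQ evaluates the relative difference μ = (d⁺ − d⁻)/(d⁺ + d⁻) of the distances to the closest correct and closest wrong prototype; a minimal NumPy sketch (standard GLVQ quantities, function names are ours):

```python
import numpy as np

def glvq_mu(x, w_plus, w_minus):
    """Relative difference mu in (-1, 1): d+ = squared distance to the
    closest correct prototype, d- = to the closest wrong one.
    mu < 0 means correct classification, with margin |mu|."""
    d_plus = float(np.sum((x - w_plus) ** 2))
    d_minus = float(np.sum((x - w_minus) ** 2))
    return (d_plus - d_minus) / (d_plus + d_minus)

def glvq_cost(mus, steepness=2.0):
    """Sigmoidal Phi summed over examples; for large steepness the sum
    approximates the number of misclassifications."""
    return float(np.sum(1.0 / (1.0 + np.exp(-steepness * np.asarray(mus)))))

# toy usage: d+ = 1, d- = 9  ->  mu = -0.8 (correct, large margin)
mu = glvq_mu(np.array([0.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 3.0]))
```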
16. St. Martin’s Institute, August 2015
cost function based LVQ
There is nothing objective about objective functions
James L. McClelland
17. St. Martin’s Institute, August 2015
GLVQ
training = optimization with respect to prototype position,
e.g. single example presentation, stochastic gradient descent,
update of two prototypes per step
based on non-negative, differentiable distance
requirement:
18. St. Martin’s Institute, August 2015
GLVQ
training = optimization with respect to prototype position,
e.g. single example presentation, stochastic sequence of examples,
update of two prototypes per step
based on non-negative, differentiable distance
19. St. Martin’s Institute, August 2015
GLVQ
training = optimization with respect to prototype position,
e.g. single example presentation, stochastic sequence of examples,
update of two prototypes per step
based on Euclidean distance
moves prototypes towards / away from
sample with prefactors
20. St. Martin’s Institute, August 2015
related schemes
Many variants of LVQ
intuitive schemes: LVQ2, LVQ2.1, LVQ3, OLVQ, ...
cost function based: RSLVQ (likelihood ratios) ...
Supervised Neural Gas (NG)
many prototypes, rank based update
Supervised Self-Organizing Maps (SOM)
neighborhood relations, topology preserving mapping
Radial Basis Function Networks (RBF)
hidden units = centroids (prototypes) with Gaussian activation
21. An example problem:
classification of adrenal tumors
Wiebke Arlt , Angela Taylor
Dave J. Smith, Peter Nightingale
P.M. Stewart, C.H.L. Shackleton
et al.
Petra Schneider
Han Stiekema
Michael Biehl
Johann Bernoulli Institute for
Mathematics and Computer Science
University of Groningen
School of Medicine
Queen Elizabeth Hospital
University of Birmingham/UK
(+ several centers in Europe)
tumor classification
[Arlt et al., J. Clin. Endocrinology & Metabolism, 2011]
22. St. Martin’s Institute, August 2015
∙ adrenal tumors are common (1-2%)
and mostly found incidentally
∙ adrenocortical carcinomas (ACC) account
for 2-11% of adrenal incidentalomas
( ACA: adrenocortical adenomas )
∙ conventional diagnostic tools lack sensitivity
and are labor and cost intensive (CT, MRI)
www.ensat.org
adrenal
gland
∙ idea: tumor classification based on steroid excretion profile
tumor classification
23. St. Martin’s Institute, August 2015
- urinary steroid excretion (24 hours)
- 32 potential biomarkers
- biochemistry imposes correlations, grouping of steroids
tumor classification
24. St. Martin’s Institute, August 2015
ACA patient #
ACC patient #
steroid marker #
102 patients with benign ACA
45 patients with malignant ACC
color coded excretion values
(logarithmic scale, relative to healthy controls)
data set:
tumor classification
25. St. Martin’s Institute, August 2015
Generalized LVQ , training and performance evaluation
∙ data divided in 90% training and 10% test set
∙ determine prototypes by (stochastic) gradient descent
typical profiles (1 per class)
∙ apply classifier to test data
evaluate performance (error rates)
∙ employ Euclidean distance measure
in the 32-dim. feature space
∙ repeat and average over many random splits
tumor classification
26. St. Martin’s Institute, August 2015
ACA
ACC
prototypes:
steroid excretion
in ACA/ACC
tumor classification
27. St. Martin’s Institute, August 2015
∙ Receiver Operator Characteristics (ROC) [Fawcett, 2000]
obtained by introducing a biased NPC:
false positive rate
(1-specificity)
true positive rate
(sensitivity)
θ = 0
Area under Curve
all tumors classified as ACA
- no false alarms
- no true positives detected
all tumors classified as ACC
- all true positives detected
- max. number of false alarms
tumor classification
(NPC)
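The ROC construction above — sweep the bias θ of the NPC from one extreme (everything ACA) to the other (everything ACC) — can be sketched as follows (a minimal NumPy illustration; names are ours):

```python
import numpy as np

def roc_curve_npc(scores, labels):
    """ROC of a biased NPC: score(x) = d(x, w_ACA) - d(x, w_ACC),
    classify as ACC (positive) when score > theta. Sweeping theta
    from +inf to -inf traces the curve from (0,0) to (1,1)."""
    order = np.argsort(-np.asarray(scores))       # decreasing score
    y = np.asarray(labels)[order]
    tpr = np.cumsum(y == 1) / np.sum(y == 1)      # sensitivity
    fpr = np.cumsum(y == 0) / np.sum(y == 0)      # 1 - specificity
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A perfect separation of the two classes gives AUC = 1; a classifier with systematically reversed scores gives AUC = 0.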
28. St. Martin’s Institute, August 2015
ROC characteristics (averaged over splits of the data set)
AUC=0.87
GLVQ performance:
tumor classification
30. St. Martin’s Institute, August 2015 30
brief excursion: gradient descent
stochastic gradient descent: convergence requires
decreasing learning rate with ‘time’ (number of steps t ),
e.g. as
condition [Robbins and Monro, 1951]:
?
alternatives:
- more general optimization schemes
(conjugate gradient, line search, second order derivatives…)
- adaptive learning rates
- …
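Written out, the Robbins-Monro condition on the learning rate schedule η(t) is:

```latex
\sum_{t=1}^{\infty} \eta(t) = \infty
\qquad \text{and} \qquad
\sum_{t=1}^{\infty} \eta(t)^{2} < \infty ,
```

satisfied, for instance, by η(t) = η₀ / t.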
31. St. Martin’s Institute, August 2015 31
batch gradient descent
batch gradient-based descent w.r.t. GLVQ costs
concatenated prototype vector
update in the direction of
the negative (full) gradient
step size for normalized gradient
32. St. Martin’s Institute, August 2015
batch gradient descent
too small:
slow convergence
too large:
over-shooting
zig-zagging
oscillatory behavior
divergence
Waypoint averaging
[Papari, Biehl, Bunte, 2011]
(here: modified default step)
if E(mean over k last iterates) < E(next iterate)
    replace the next iterate by the mean
    reduce αw by a factor, e.g. 2/3
else (default)
    increase αw by a factor, e.g. 1.1
end
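The step-size control on this slide can be written as a small routine; a simplified single-decision sketch under the stated heuristics (factors 1.1 and 2/3; the k-waypoint buffer and names are illustrative):

```python
import numpy as np

def waypoint_step(E, w_history, w_next, alpha, inc=1.1, dec=2.0 / 3.0):
    """One decision of the waypoint-averaging step size control: if the
    mean over the last k waypoints has lower cost than the proposed
    iterate, fall back to the mean and reduce the step size
    (over-shooting / zig-zagging); otherwise accept the iterate
    and increase the step size."""
    w_mean = np.mean(w_history, axis=0)
    if E(w_mean) < E(w_next):
        return w_mean, alpha * dec
    return w_next, alpha * inc

# toy usage: quadratic cost, waypoints oscillating around the minimum
E = lambda w: float(np.sum(w ** 2))
w, a = waypoint_step(E, [np.array([1.0, 0.0]), np.array([-1.0, 0.0])],
                     np.array([2.0, 0.0]), alpha=0.1)
```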
33. St. Martin’s Institute, August 2015 33
- collection of Matlab code (no toolboxes required)
includes example data sets and limited documentation
- mainly for demo-purposes (do not use for critical applications)
efficiency, programming style, etc. were not in the focus
“no nonsense” GMLVQ code collection
provides: single runs, visualization of the data set
leave-one-out, subset validation procedure
variants/options: GLVQ, [GRLVQ], GMLVQ
null-space projection
singularity-control
A no-nonsense beginners’ tool for G(M)LVQ:
http://www.cs.rug.nl/~biehl/No-Nonsense-GMLVQ.zip
34. St. Martin’s Institute, August 2015 34
example demo
>> load twoclass-difficult.mat (98 examples, 34-dim. feature vectors, binary labels)
>> [gmlvq_system,curves_single,param_set]=run_single(fvec,lbl,100)
learning curves
and step sizes
prototypes
35. St. Martin’s Institute, August 2015 35
example demo
>> load twoclass-difficult.mat
>> [gmlvq_system,curves_single,param_set]=run_single(fvec,lbl,100)
training set ROC; visualization (features 33, 34)
36. St. Martin’s Institute, August 2015 36
example demo
avg. validation set ROC; avg. learning curves
>> [gmlvq_mean,roc_val,lcurves_mean,lcurves_std,param_set]=…
run_validation(fvec,lbl,50);
GLVQ without relevances
…
learning curves, averages over 5 validation runs
with 10 % of examples left out for testing
avg. prototypes
37. St. Martin’s Institute, August 2015 37
http://matlabserver.cs.rug.nl/gmlvqweb/web/
More sophisticated Matlab code: [K. Bunte]
(more options, training by non-linear optimization etc.)
Relevance and Matrix adaptation in Learning Vector
Quantization (GLVQ, GRLVQ, GMLVQ and LiRaM LVQ):
http://www.cs.rug.nl/~biehl/
more links
Pre- and re-prints etc.:
39. St. Martin’s Institute, August 2015 39
fixed, pre-defined distance measures:
GLVQ (or more general cost function based LVQ):
can be based on general, differentiable distances,
e.g. Minkowski measures
Alternative distance measures
possible work-flow
- select several distance measures according to prior knowledge
or a data-driven choice in a preprocessing step
- compare performance of various measures
examples: Kernelized distances
Divergences (statistics)
40. St. Martin’s Institute, August 2015 40
Kernelized distances
rewrite squared Euclidean
distance in terms of dot-product
distance measure associated with general inner product or
kernel function
e.g. Gaussian Kernel
implicit mapping to high-dimensional space for
better separability of classes, similar: Support Vector Machine
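Rewriting the squared Euclidean distance via dot-products and replacing them by a kernel gives the kernelized distance; a minimal NumPy sketch (names are ours):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def kernel_distance(x, w, kernel=gaussian_kernel):
    """Squared distance in the implicit feature space:
    ||phi(x) - phi(w)||^2 = K(x, x) - 2 K(x, w) + K(w, w)."""
    return kernel(x, x) - 2.0 * kernel(x, w) + kernel(w, w)
```

For the Gaussian kernel this equals 2 − 2 K(x, w): zero for identical arguments, saturating at 2 for very distant ones.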
41. St. Martin’s Institute, August 2015
Divergence Based LVQ:
Detection of Cassava Mosaic Disease
Ernest Mwebaye
John Quinn
Jennifer Aduwo
Petra Schneider
Michael Biehl
Johann Bernoulli Institute
University of Groningen
Department of Computer Science
Makerere University, Kampala
Namulonge Crop Research Center, Uganda
41
Thomas Villmann
Sven Haase
Frank-Michael Schleif
University of Applied Sciences, Mittweida
University Bielefeld, Germany
divergence based LVQ
[Neurocomputing, 2011]
42. St. Martin’s Institute, August 2015 42
healthy / Mosaic
Example: detection of Mosaic disease in Cassava (maniok) plants
Makerere University and Namulonge Crop Research Center, Uganda
LVQ classifiers based on histogram specific distance measures
divergences (statistics) for non-negative, possibly normalized data
(densities, spectral data, more general functional data)
leaf images
divergence based LVQ
43. St. Martin’s Institute, August 2015 43
Squared Euclidean distance:
Cauchy-Schwarz divergence
(a) (b) (c)
divergence based LVQ
44. St. Martin’s Institute, August 2015 44
example family: γ-divergences
non-symmetric (in general), violates the triangle inequality
special cases include: Kullback-Leibler, Cauchy-Schwarz, Euclidean
divergence based LVQ
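As one concrete example, the Cauchy-Schwarz divergence mentioned above can be sketched as (standard formula; function name is ours):

```python
import numpy as np

def cauchy_schwarz_divergence(p, q):
    """D_CS(p, q) = -log( <p, q> / (||p|| ||q||) ): non-negative,
    zero iff p and q are proportional; suited to non-negative
    functional data such as histograms or spectra."""
    num = float(np.dot(p, q))
    den = float(np.linalg.norm(p) * np.linalg.norm(q))
    return -np.log(num / den)
```

Unlike a metric it need not satisfy the triangle inequality, but it is differentiable and can replace the Euclidean distance in cost function based LVQ.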
47. St. Martin’s Institute, August 2015 47
relevance learning:
- employ a parameterized distance measure
with only the mathematical form fixed in advance
- update its parameters in the training process
together with prototype training
- adaptive, data driven dissimilarity
example: Matrix Relevance LVQ
data-driven optimization of prototypes
and relevance matrix
in the same training process (≠ pre-processing )
Relevance Learning
48. St. Martin’s Institute, August 2015
Quadratic distance measure
generalized quadratic distance: d_Λ(w, x) = (x − w)ᵀ Λ (x − w), with Λ = ΩᵀΩ
variants:
one global, several local, class-wise relevance matrices Λ(j)
→ piecewise quadratic decision boundaries
rectangular discriminative low-dim. representation
e.g. for visualization [Bunte et al., 2012]
diagonal matrices: single feature weights [Bojer et al., 2001]
[Hammer et al., 2002]
scaling of features, general linear transformation of feature space
potential normalization: Σᵢ Λᵢᵢ = Tr(Λ) = 1
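The parameterization Λ = ΩᵀΩ guarantees non-negative distances; a minimal NumPy sketch of the generalized quadratic distance and the trace normalization (names are ours):

```python
import numpy as np

def matrix_distance(x, w, omega):
    """Generalized quadratic distance d(x, w) = (x - w)^T Lambda (x - w)
    with Lambda = Omega^T Omega, which guarantees d >= 0: it is the
    squared Euclidean distance of the transformed vectors Omega x, Omega w."""
    z = omega @ (x - w)
    return float(z @ z)

def normalize(omega):
    """Scale Omega so that the sum of relevances trace(Lambda) = 1."""
    return omega / np.sqrt(np.trace(omega.T @ omega))
```

With Ω the identity the measure reduces to the squared Euclidean distance; a diagonal Ω weights single features.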
49. St. Martin’s Institute, August 2015
But this is just Mahalanobis distance…
[Mahalanobis, 1936]
S covariance matrix of random vectors
(calculated once from the data, fixed definition, not adaptive)
if you insist…
(‘two point version’)
So it is a generalized Mahalanobis distance ?
No.
a generalized broccoli
a generalization of Ohm’s Law
50. St. Martin’s Institute, August 2015
Relevance Matrix LVQ
optimization of prototypes and distance measure
WTA
Matrix-LVQ1
51. St. Martin’s Institute, August 2015
Relevance Matrix LVQ
Generalized Matrix LVQ
(GMLVQ)
optimization of prototypes and distance measure
52. St. Martin’s Institute, August 2015 52
heuristic interpretation
the diagonal element Λᵢᵢ summarizes
- the contribution of original dimension i to the distance
- the relevance of original feature i for the classification
interpretation assumes implicitly:
features have equal order of magnitude,
e.g. after z-score transformation (zero mean, unit variance,
averages over the data set)
note: d_Λ is the standard Euclidean distance for the
linearly transformed features Ωx, Ωw
53. St. Martin’s Institute, August 2015
Relevance Matrix LVQ
optimization of
prototype positions
distance measure(s)
in one training process
(≠ pre-processing)
motivation:
improved performance
- weighting of features and pairs of features
simplified classification schemes
- elimination of non-informative, noisy features
- discriminative low-dimensional representation
insight into the data / classification problem
- identification of most discriminative features
- incorporation of prior knowledge (e.g. structure of Ω)
54. St. Martin’s Institute, August 2015
related schemes
Relevance LVQ variants
local, rectangular, structured, restricted... relevance matrices
for visualization, functional data, texture recognition, etc.
relevance learning in Robust Soft LVQ, Supervised NG, etc.
combination of distances for mixed data ...
Relevance Learning related schemes in supervised learning ...
RBF Networks [Backhaus et al., 2012]
Neighborhood Component Analysis [Goldberger et al., 2005]
Large Margin Nearest Neighbor [Weinberger et al., 2006, 2010]
and many more!
Linear Discriminant Analysis (LDA)
one prototype per class + global matrix,
different objective function!
55. Classification of adrenal tumors (cont‘d)
Wiebke Arlt , Angela Taylor
Dave J. Smith, Peter Nightingale
P.M. Stewart, C.H.L. Shackleton
et al.
Petra Schneider
Han Stiekema
Michael Biehl
Johann Bernoulli Institute for
Mathematics and Computer Science
University of Groningen
School of Medicine
Queen Elizabeth Hospital
University of Birmingham/UK
(+ several centers in Europe)
[Arlt et al., J. Clin. Endocrinology & Metabolism, 2011]
[Biehl et al., Europ. Symp. Artificial Neural Networks (ESANN), 2012]
56. St. Martin’s Institute, August 2015
∙ adrenocortical tumors, difficult differential diagnosis:
ACC: adrenocortical carcinomas
ACA: adrenocortical adenomas
∙ idea: steroid metabolomics
tumor classification based on urinary steroid excretion
32 candidate steroid markers:
adrenocortical tumors
57. St. Martin’s Institute, August 2015
Generalized Matrix LVQ , ACC vs. ACA classification
∙ data divided in 90% training, 10% test set
∙ determine prototypes
typical profiles (1 per class)
∙ apply classifier to test data
evaluate performance (error rates, ROC)
∙ adaptive generalized quadratic distance measure
parameterized by
∙ repeat and average over many random splits
adrenocortical tumors
data set: 24 hrs. urinary steroid excretion
102 patients with benign ACA
45 patients with malignant ACC
58. St. Martin’s Institute, August 2015
Generalized Matrix LVQ , ACC vs. ACA classification
∙ data divided in 90% training, 10% test set, (z-score transformed)
∙ determine prototypes
typical profiles (1 per class)
∙ apply classifier to test data
evaluate performance (error rates, ROC)
∙ adaptive generalized quadratic distance measure
parameterized by
∙ repeat and average over many random splits
tumor classification (cont’d)
[Arlt et al., 2011]
[Biehl et al., 2012]
59. St. Martin’s Institute, August 2015
off-diagonal
diagonal elements
fraction of runs (random splits) in which a
steroid is rated among 9 most relevant markers
subset of 9 selected steroids ↔ technical realization (patented, University
of Birmingham/UK)
tumor classification
Relevance matrix
60. St. Martin’s Institute, August 2015
off-diagonal / diagonal elements
19
ACA
ACC
discriminative
e.g. steroid 19
tumor classification
61. St. Martin’s Institute, August 2015
off-diagonal / diagonal elements
8
ACA ACC
non-trivial role:
steroid 8 among the most relevant!
tumor classification
62. St. Martin’s Institute, August 2015
highly discriminative
combination of markers!
weakly discriminative markers
12
8
tumor classification
63. St. Martin’s Institute, August 2015
ROC characteristics
clear improvement due to
adaptive distances
(1-specificity)
(sensitivity)
AUC:
Euclidean 0.87
GRLVQ (diagonal relevances) 0.93
GMLVQ (full matrix) 0.97
tumor classification
64. St. Martin’s Institute, August 2015
observation / theory :
low rank of resulting relevance matrix
often: single relevant eigendirection
eigenvalue spectrum in the ACA/ACC classification
intrinsic regularization
nominally ~ NxN adaptive parameters in Matrix LVQ
reduce to ~ N effective degrees of freedom
low-dimensional representation
facilitates, e.g., visualization of labeled data sets
tumor classification
theory: stationarity of Matrix RLVQ
Biehl et al. Stationarity of Matrix Relevance LVQ
Proc. IJCNN 2015
65. St. Martin’s Institute, August 2015
tumor classification
visualization of the data set
ACA
ACC
66. St. Martin’s Institute, August 2015 66
modified batch gradient descent
batch gradient-based descent w.r.t. costs
concatenated prototype vector
elements of Ω
updates in the direction of
the normalized gradients
waypoint averaging and step size control
separately for and
67. St. Martin’s Institute, August 2015 67
example demo
>> load twoclass-difficult.mat (98 34-dim. feature vectors, binary classification)
>> [gmlvq_system,curves_single,param_set]=run_single(fvec,lbl,100)
prototypes and
relevance matrix
learning curves
and step sizes
68. St. Martin’s Institute, August 2015 68
example demo
>> load twoclass-difficult.mat
>> [gmlvq_system,curves_single,param_set]=run_single(fvec,lbl,100)
training set ROC; visualization of the data set
69. St. Martin’s Institute, August 2015 69
example demo
avg. validation set ROC; avg. prototypes and relevance matrix
>> [gmlvq_mean,roc_val,lcurves_mean,lcurves_std,param_set]=…
run_validation(fvec,lbl,50);
GMLVQ
…
learning curves, averages over 5 validation runs
with 10 % of examples left out for testing
70. St. Martin’s Institute, August 2015 70
a multi-class problem
visualization of 18-dim. data set; avg. prototypes and rel. matrix
>> load uci-segmenation-sampled
>> [gmlvq_system, curves_single,param_set]=run_single(fvec,lbl,50)
71. St. Martin’s Institute, August 2015 71
Singularity control
Note:
singularity of relevance matrix can lead to numerical instabilities
and over-simplification effects
singularity control: add a penalty term −μ ln det(Ω Ωᵀ) to the costs;
its derivative w.r.t. Ω yields a
modified matrix update
(implemented in the no-nonsense gmlvq code collection)
72. St. Martin’s Institute, August 2015 72
Uniqueness
(I) uniqueness of Ω, given Λ
matrix square root is not unique
irrelevant rotations, reflections, symmetries….
canonical representation in terms of the eigen-decomposition Λ = V D Vᵀ:
Ω = V D^(1/2) Vᵀ, the positive semi-definite,
symmetric solution
(Matlab: “sqrtm”)
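The canonical, symmetric square root of Λ can be sketched in NumPy (an illustrative stand-in for Matlab's sqrtm):

```python
import numpy as np

def symmetric_sqrt(lam):
    """Canonical Omega for a given Lambda: from the eigen-decomposition
    Lambda = V diag(e) V^T (e >= 0), the symmetric positive semi-definite
    square root is Omega = V diag(sqrt(e)) V^T."""
    e, V = np.linalg.eigh(lam)          # Lambda is symmetric psd
    e = np.clip(e, 0.0, None)           # guard against round-off negatives
    return (V * np.sqrt(e)) @ V.T
```

This fixes the irrelevant rotations/reflections: of all Ω with ΩᵀΩ = Λ, it returns the unique symmetric psd one.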
73. St. Martin’s Institute, August 2015 73
simple example: consider two identical, entirely
irrelevant features
their contributions to the distance cancel exactly if Ω
weights them with opposite signs
-> disregarded in the classification
of the training data,
but the naïve interpretation of the diagonal of Λ
suggests high relevance, could cause
non-trivial effects for novel data
Uniqueness
(II) uniqueness of relevance matrix for given data set ?
74. St. Martin’s Institute, August 2015 74
(II) uniqueness
given a transformation Ω: a modified Ω̃ = Ω + Ψ
is possible if the rows of Ψ are in the null-space of the data matrix X
→ identical mapping of all examples, but different Λ̃ ≠ Λ
(possible to extend the argument by including the prototypes)
X is singular, i.e. such Ψ exist, if
features are correlated, dependent
Uniqueness
75. St. Martin’s Institute, August 2015 75
regularization
the training process yields Ω (and Λ = ΩᵀΩ)
determine the eigenvectors and eigenvalues of X Xᵀ
regularization: project the rows of Ω onto the data
(K<J ) retain the eigenspace corresponding to the K largest eigenvalues
removes also the span of small non-zero eigenvalues
(K=J ) removes all null-space contributions, unique solution
with minimal Euclidean norm of the row vectors
equivalent: Ω̃ = Ω X X⁺ (Moore-Penrose inverse X⁺)
(implemented in the no-nonsense gmlvq code collection)
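The K = J null-space correction via the pseudoinverse is a one-liner; a NumPy sketch (illustrative names; data examples as columns of X):

```python
import numpy as np

def remove_null_space(omega, X):
    """Null-space correction via the Moore-Penrose pseudoinverse:
    Omega_reg = Omega X X^+ maps every training example (column of X)
    exactly as Omega does, but contributions from the null-space of
    the data are removed, leaving minimal-norm rows."""
    return omega @ X @ np.linalg.pinv(X)

# toy usage: 3 features, 2 examples spanning only the first two directions
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
omega = np.array([[1.0, 2.0, 3.0]])   # weight on feature 3: unseen by the data
omega_reg = remove_null_space(omega, X)
```

The spurious weight on the third feature is removed while the mapping of the training examples is unchanged.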
76. St. Martin’s Institute, August 2015 76
regularization
regularized mapping, two options:
after/during training: retains original features, flexible K,
may include prototypes; example: Wine data set
pre-processing of the data (PCA): mapped feature space, fixed K,
prototypes yet unknown; example: diagnosis of rheumatoid arthritis
Strickert, Hammer, Villmann, Biehl, IEEE SCCI 2013
Regularization and improved interpretation of linear data mappings
and adaptive distance measures
77. St. Martin’s Institute, August 2015 77
illustrative example
infra-red spectral data: 124 wine samples,
256 wavelengths; 30 training spectra,
94 test spectra
alcohol content: high / low / medium
GMLVQ classification
[UCI ML repository]
78. St. Martin’s Institute, August 2015 78
GMLVQ with null-space correction: best performance
with 7 dimensions remaining; over-fitting
effect for the full P=30 dimensions
79. St. Martin’s Institute, August 2015 79
original
regularized
regularization
- enhances generalization
- smoothens relevance profile/matrix
- removes ‘false relevances’
- improves interpretability of Λ
raw relevance matrix
posterior regularization
80. St. Martin’s Institute, August 2015
Early diagnosis of Rheumatoid Arthritis
Expression of chemokines CXCL4 and CXCL7 by synovial
macrophages defines an early stage of rheumatoid arthritis
Ann. of the Rheumatic Diseases, 2015 (available online)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
81. St. Martin’s Institute, August 2015
uninflamed control / established RA
resolving early inflammation / early RA
cytokine based diagnosis of RA
at earliest possible stage ?
ultimate goals:
understand pathogenesis and
mechanism of progression
rheumatoid arthritis (RA)
83. St. Martin’s Institute, August 2015
GMLVQ analysis
pre-processing:
• log-transformed expression values (117 dim. data, 47 samples in total)
• 21 leading principal components explain ca. 90% of the total variation
Two two-class problems: (A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
• 1 prototype per class, global relevance matrix, distance measure:
• leave-one-out validation
evaluation in terms of Receiver Operating Characteristics
84. St. Martin’s Institute, August 2015
false positive rate
true positive rate
diagonal Λii vs. cytokine index i
established RA vs.
uninflamed control
early RA vs.
resolving inflammation
Matrix Relevance LVQ
diagonal relevances; leave-one-out
initialization of LVQ system
85. St. Martin’s Institute, August 2015
CXCL4 chemokine (C-X-C motif) ligand 4
CXCL7 chemokine (C-X-C motif) ligand 7
direct study on protein level, staining / imaging of synovial tissue:
macrophages : predominant source of CXCL4/7 expression
protein level studies
• high levels of CXCL4 and CXCL7
in the first 12 weeks of synovitis
in early RA
• expression on macrophages
outside of blood vessels
discriminates
early RA / resolving cases
(2 PhD thesis projects)
86. St. Martin’s Institute, August 2015
false positive rate
true positive rate
diagonal Λii vs. cytokine index i
established RA vs.
uninflamed control
early RA vs.
resolving inflammation
relevant cytokines
macrophage stimulating 1
diagonal relevances; leave-one-out
87. St. Martin’s Institute, August 2015
four class problem
one prototype per class
and one global matrix
trained in one go
low-rank relevance
matrix (rank ≈ 2)
visualization of data
set in terms of
eigenvectors of Λ
Niels Kluiter
research internship
at JBI Groningen
88. St. Martin’s Institute, August 2015
four class problem
- extract binary classifiers (healthy vs. est. RA, resolving vs. early RA)
by restricting the system to the corresponding prototypes
for varying number K of PCs used as feature vectors
- determine corresponding ROC performances
robust in a range
of 14 < K < 20
healthy vs. est. RA
K=16: AUC = 0.92
early vs. resolving RA
K=16: AUC = 0.79
to do: nested leave-one-out validation
89. St. Martin’s Institute, August 2015
four class problem
read off problem-
specific relevances
from eigenvectors
of Λ
control
vs. est. RA
resolving
vs. early RA
91. St. Martin’s Institute, August 2015
challenges in bio-medical data
A. Filer, A. Clark, M. Juarez, J. Falconer et al.
- micro-array gene expression data
high-dimensional (~50000 probes)
PCA + GMLVQ
(work in progress)
early Arthritis vs. resolving inflammations
- preliminary result:
better than random classification
close inspection of high relevance genes:
system discriminates male/female patients
prediction reflects higher prevalence of RA in female patients
leave-one-out
“accuracy is not enough”
92. St. Martin’s Institute, August 2015 92
interpretability
- important: understand the basis of decisions
- white-box approaches for classification/regression etc.
- insights into the data and problem at hand
- e.g. selection of most discriminative bio-markers
challenges
relevance of steroid markers
www.ensat.org
adrenocortical tumors
adenomas (ACA)
carcinomas (ACC)
W. Arlt, M. Biehl et al.
Urine steroid metabolomics as a biomarker tool for detecting
malignancy in adrenal tumors J. of Clin. Endocrinology &
Metabolism 96: 3775-3784 (2011).
93. St. Martin’s Institute, August 2015 93
large amounts of data , e.g. image data bases
life lines (longitudinal patient data)
prescription data bases [E. Hak, K. Taxis]
challenges
A
B
C
D
query images
retrieval:
√ - same class
× - different class
UMCG data base of skin lesion images
K. Bunte, M. Biehl, M.F. Jonkman, N. Petkov Learning Effective Color
Features for Content Based Image Retrieval in Dermatology. Pattern
Recognition 44 (2011) 1892-1902.
94. St. Martin’s Institute, August 2015 94
high-dimensional data, e.g. medical images (CT, MRI, PET …)
gene expression, DNA sequences, …
challenges
projection on first eigenvector of Λ
M. Biehl, K. Bunte, P. Schneider Analysis
of Flow Cytometry Data by Matrix
Relevance Learning Vector Quantization
PLOS One 8: e59401 (2013)
- low-dim. representation
- feature selection
- visualization
high-throughput flow cytometry
~10k cells × 30 markers per sample
derive 186 features
GMLVQ, low-dim. projection
95. St. Martin’s Institute, August 2015 95
incomplete data
challenges
- missing values, noise, uncertain labels…
imputation, semi-supervised learning
- complementary data sets…
learning from privileged information, transfer learning
mixed data
- combination of different sources / technical platforms
suitable adaptive & integrative (dis-) similarity measures
E. Mwebaze, G. Bearda, M. Biehl, D. Zühlke Combining dissimilarity
measures for prototype-based classification Proc. of the 23rd European
Symposium on Artificial Neural Networks ESANN 2015, d-side publishing,
31-36 (2015)
96. St. Martin’s Institute, August 2015
distances combined
...
N-dim. vector M-bin histogram temporal sequence
Euclidean divergence (mis-)alignment
combined distance measure, e.g.
+source-specific prototypes
relevance learning!
E. Mwebaze, G. Bearda, M. Biehl, D. Zühlke Combining dissimilarity
measures for prototype-based classification Proc. of the 23rd European
Symposium on Artificial Neural Networks ESANN 2015, d-side publishing,
31-36 (2015)
97. St. Martin’s Institute, August 2015
challenges
imbalanced data sets
- prevalence of diseases (screening vs. differential diagnosis)
- role of false positive / false negatives
T. Villmann, M. Kaden, W. Herrmann, M. Biehl
Learning Vector Quantization for ROC-optimization
possible working points
98. St. Martin’s Institute, August 2015
causal relations vs. correlation
challenges
- predictive power vs. causal dependence ?
www.causality.inf.ethz.ch/
data/LUCAS.html
E. Mwebaze, J. Quinn, M. Biehl
Causal Relevance Learning for Robust Classification under Interventions
Proc. 19th Europ. Symp. on Artificial Neural Networks ESANN 2011
99. St. Martin’s Institute, August 2015
challenges
data not given as vectors in a Euclidean space,
e.g. symbolic sequences of different length
known: pairwise dis-similarities, e.g. edit-distance
‘relational data’ given as matrix
loooooooongword
shrtwrd
pseudo-Euclidean embedding
prototypes expressed as linear combinations of the examples, w = Σᵢ αᵢ xᵢ
Non-vectorial data:
100. St. Martin’s Institute, August 2015
non-vectorial data
distances
Training: updates w.r.t. prototype coefficients, e.g. LVQ1-like or GLVQ
Working phase: WTA classification of novel data:
distance from known example data
distance from prototypes
[Hammer, Schleif, Zhu, 2011] [Hammer & Hasenfuss, 2010]
prototypes
101. St. Martin’s Institute, August 2015
CAIP contributions
Gert-Jan de Vries, Steffen Pauws and Michael Biehl.
Facial Expression Recognition using Learning Vector Quantization
Thomas Villmann, Marika Kaden, David Nebel and Michael Biehl.
Learning Vector Quantization with Adaptive Cost-based
Outlier-Rejection
102. St. Martin’s Institute, August 2015
a review article
For a recent review and further references see:
M. Biehl, B. Hammer, T. Villmann Distance measures for prototype
based classification
In: BrainComp, Proc. of the International Workshop on Brain-
Inspired Computing. Cetraro/Italy, July 2013
L. Grandinetti, T. Lippert, N. Petkov (editors)
Springer Lecture Notes in Computer Science Vol 8603
pp. 100-116 (2014)
check
www.cs.rug.nl/~biehl
for more references and application examples