This document provides a tutorial on fuzzy clustering techniques. It begins with definitions of clustering and fuzzy clustering. It then walks through examples of applying hard c-means and fuzzy c-means clustering algorithms to classify tiles and cancer cells. Hard c-means results in data points being strictly assigned to one cluster, while fuzzy c-means allows partial membership in multiple clusters. The document demonstrates how both algorithms can be used to find optimal cluster centers and classify new data points.
Here, in this paper we are introducing a dynamic clustering algorithm using fuzzy c-mean clustering algorithm. We will try to process several sets patterns together to find a common structure. The structure is finalized by interchanging prototypes of the given data and by moving the prototypes of the subsequent clusters toward each other. In regular FCM clustering algorithm, fixed numbers of clusters are chosen and those are pre-defined. If, in case, the number of chosen clusters is wrong, then the final result will degrade the purity of the cluster. In our proposed algorithm this drawback will be overcome by using dynamic clustering architecture. Here we will take fixed number of clusters in the beginning but on iterations the algorithm will increase the number of clusters automatically depending on the nature and type of data, which will increase the purity of the result at the end. A detailed clustering algorithm is developed on a basis of the standard FCM method and will be illustrated by means of numeric examples.
Here, in this paper we are introducing a dynamic clustering algorithm using fuzzy c-mean clustering algorithm. We will try to process several sets patterns together to find a common structure. The structure is finalized by interchanging prototypes of the given data and by moving the prototypes of the subsequent clusters toward each other. In regular FCM clustering algorithm, fixed numbers of clusters are chosen and those are pre-defined. If, in case, the number of chosen clusters is wrong, then the final result will degrade the purity of the cluster. In our proposed algorithm this drawback will be overcome by using dynamic clustering architecture. Here we will take fixed number of clusters in the beginning but on iterations the algorithm will increase the number of clusters automatically depending on the nature and type of data, which will increase the purity of the result at the end. A detailed clustering algorithm is developed on a basis of the standard FCM method and will be illustrated by means of numeric examples.
Presentation of my NSERC-USRA funded summer research project given at the Canadian Undergraduate Mathematics Conference (CUMC) 2014.
Please refer to the project site: http://jessebett.com/Radial-Basis-Function-USRA/
An image can be seen as a matrix I, where I(x, y) is the brightness of the pixel located at coordinates (x, y). In the Convolutional neural network, the kernel is nothing but a filter
that is used to extract the features from the images.
Self Organizing Maps: Fundamentals.
Introduction to Neural Networks.
1. What is a Self Organizing Map?
2. Topographic Maps
3. Setting up a Self Organizing Map
4. Kohonen Networks
5. Components of Self Organization
6. Overview of the SOM Algorithm
Data Augmentation and Disaggregation by Neal FultzData Con LA
Abstract:- Machine learning models may be very powerful, but many data sets are only released in aggregated form, precluding their use directly. Various heuristics can be used to bridge the gap, but they are typically domain-specific. The data augmentation algorithm, a classic tool from Bayesian computation, can be applied more generally. We will present a brief review of DA and how to apply it to disaggregation problems. We will also discuss a case study on disaggregating daily pricing data, along with a reference implementation R package.
Perimetric Complexity of Binary Digital ImagesRSARANYADEVI
Perimetric complexity is a measure of the complexity of binary pictures. It is defined as the sum of inside and outside perimeters of the foreground, squared, divided by the foreground area, divided by . Difficulties arise when this definition is applied to digital images composed of binary pixels. In this article we identify these problems and propose solutions. Perimetric complexity is often used as a measure of visual complexity, in which case it should take into account the limited resolution of the visual system. We propose a measure of visual perimetric complexity that meets this requirement.
Presentation of my NSERC-USRA funded summer research project given at the Canadian Undergraduate Mathematics Conference (CUMC) 2014.
Please refer to the project site: http://jessebett.com/Radial-Basis-Function-USRA/
An image can be seen as a matrix I, where I(x, y) is the brightness of the pixel located at coordinates (x, y). In the Convolutional neural network, the kernel is nothing but a filter
that is used to extract the features from the images.
Self Organizing Maps: Fundamentals.
Introduction to Neural Networks.
1. What is a Self Organizing Map?
2. Topographic Maps
3. Setting up a Self Organizing Map
4. Kohonen Networks
5. Components of Self Organization
6. Overview of the SOM Algorithm
Data Augmentation and Disaggregation by Neal FultzData Con LA
Abstract:- Machine learning models may be very powerful, but many data sets are only released in aggregated form, precluding their use directly. Various heuristics can be used to bridge the gap, but they are typically domain-specific. The data augmentation algorithm, a classic tool from Bayesian computation, can be applied more generally. We will present a brief review of DA and how to apply it to disaggregation problems. We will also discuss a case study on disaggregating daily pricing data, along with a reference implementation R package.
Perimetric Complexity of Binary Digital ImagesRSARANYADEVI
Perimetric complexity is a measure of the complexity of binary pictures. It is defined as the sum of inside and outside perimeters of the foreground, squared, divided by the foreground area, divided by . Difficulties arise when this definition is applied to digital images composed of binary pixels. In this article we identify these problems and propose solutions. Perimetric complexity is often used as a measure of visual complexity, in which case it should take into account the limited resolution of the visual system. We propose a measure of visual perimetric complexity that meets this requirement.
A short presentation for beginners on Introduction of Machine Learning, What it is, how it works, what all are the popular Machine Learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning) and how they works with various Industry use-cases and popular examples.
ON OPTIMIZATION OF MANUFACTURING OF FIELD-EFFECT HETERO TRANSISTORS A THREE S...jedt_journal
In this paper we introduce an approach to increase density of field-effect hetero transistors framework a three-stage
amplifier circuit. At the same time one can obtain decreasing of dimensions of the above transistors. Dimensions of the elements will be decreased due to manufacture heterostructure with specific structure, doping of required areas of the hetero structure by diffusion or ion implantation and optimization of annealing of dopant and/or radiation defects.
Blind separation of complex-valued satellite-AIS data for marine surveillance...IJECEIAES
In this paper, the problem of the blind separation of complex-valued Satellite-AIS data for marine surveillance is addressed. Due to the specific properties of the sources under consideration: they are cyclo-stationary signals with two close cyclic frequencies, we opt for spatial quadratic time-frequency domain methods. The use of an additional diversity, the time delay, is aimed at making it possible to undo the mixing of signals at the multi-sensor receiver. The suggested method involves three main stages. First, the spatial generalized mean Ambiguity function of the observations across the array is constructed. Second, in the Ambiguity plane, Delay-Doppler regions of high magnitude are determined and Delay-Doppler points of peaky values are selected. Third, the mixing matrix is estimated from these Delay-Doppler regions using our proposed non-unitary joint zero-(block) diagonalization algorithms as to perform separation.
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Alexander Litvinenko
Just some ideas how low-rank matrices/tensors can be useful in spatial and environmental statistics, where one usually has to deal with very large data
ON INCREASING OF DENSITY OF ELEMENTS IN A MULTIVIBRATOR ON BIPOLAR TRANSISTORSijcsitcejournal
In this paper we consider an approach to increase density of elements of a multivibrator on bipolar transistors.
The considered approach based on manufacturing a heterostructure with necessity configuration,
doping by diffusion or ion implantation of required areas to manufacture the required type of conductivity
(p or n) in the areas and optimization of annealing of dopant and/or radiation defects to manufacture more
compact distributions of concentrations of dopants. We also introduce an analytical approach to prognosis
technological process.
On optimization ofON OPTIMIZATION OF DOPING OF A HETEROSTRUCTURE DURING MANUF...ijcsitcejournal
We introduce an approach of manufacturing of a p-i-n-heterodiodes. The approach based on using a δ-
doped heterostructure, doping by diffusion or ion implantation of several areas of the heterostructure. After
the doping the dopant and/or radiation defects have been annealed. We introduce an approach to optimize
annealing of the dopant and/or radiation defects. We determine several conditions to manufacture more
compact p-i-n-heterodiodes
Jiawei Han, Micheline Kamber and Jian Pei
Data Mining: Concepts and Techniques, 3rd ed.
The Morgan Kaufmann Series in Data Management Systems
Morgan Kaufmann Publishers, July 2011. ISBN 978-0123814791
Information-theoretic clustering with applicationsFrank Nielsen
Information-theoretic clustering with applications
Abstract: Clustering is a fundamental and key primitive to discover structural groups of homogeneous data in data sets, called clusters. The most famous clustering technique is the celebrated k-means clustering that seeks to minimize the sum of intra-cluster variances. k-Means is NP-hard as soon as the dimension and the number of clusters are both greater than 1. In the first part of the talk, we first present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means but also other kinds of clustering algorithms like the k-medoids, the k-medians, the k-centers, etc.
We extend the method to incorporate cluster size constraints and show how to choose the appropriate number of clusters using model selection. We then illustrate and refine the method on two case studies: 1D Bregman clustering and univariate statistical mixture learning maximizing the complete likelihood. In the second part of the talk, we introduce a generalization of k-means to cluster sets of histograms that has become an important ingredient of modern information processing due to the success of the bag-of-word modelling paradigm.
Clustering histograms can be performed using the celebrated k-means centroid-based algorithm. We consider the Jeffreys divergence that symmetrizes the Kullback-Leibler divergence, and investigate the computation of Jeffreys centroids. We prove that the Jeffreys centroid can be expressed analytically using the Lambert W function for positive histograms. We then show how to obtain a fast guaranteed approximation when dealing with frequency histograms and conclude with some remarks on the k-means histogram clustering.
References: - Optimal interval clustering: Application to Bregman clustering and statistical mixture learning IEEE ISIT 2014 (recent result poster) http://arxiv.org/abs/1403.2485
- Jeffreys Centroids: A Closed-Form Expression for Positive Histograms and a Guaranteed Tight Approximation for Frequency Histograms.
IEEE Signal Process. Lett. 20(7): 657-660 (2013) http://arxiv.org/abs/1303.7286
http://www.i.kyoto-u.ac.jp/informatics-seminar/
Computing f-Divergences and Distances of High-Dimensional Probability Density...Alexander Litvinenko
Poster presented on Stochastic Numerics and Statistical Learning: Theory and Applications Workshop in KAUST, Saudi Arabia.
The task considered here was the numerical computation of characterising statistics of
high-dimensional pdfs, as well as their divergences and distances,
where the pdf in the numerical implementation was assumed discretised on some regular grid.
Even for moderate dimension $d$, the full storage and computation with such objects become very quickly infeasible.
We have demonstrated that high-dimensional pdfs,
pcfs, and some functions of them
can be approximated and represented in a low-rank tensor data format.
Utilisation of low-rank tensor techniques helps to reduce the computational complexity
and the storage cost from exponential $\C{O}(n^d)$ to linear in the dimension $d$, e.g.
O(d n r^2) for the TT format. Here $n$ is the number of discretisation
points in one direction, r<n is the maximal tensor rank, and d the problem dimension.
The particular data format is rather unimportant,
any of the well-known tensor formats (CP, Tucker, hierarchical Tucker, tensor-train (TT)) can be used,
and we used the TT data format. Much of the presentation and in fact the central train
of discussion and thought is actually independent of the actual representation.
In the beginning it was motivated through three possible ways how one may
arrive at such a representation of the pdf. One was if the pdf was given in some approximate
analytical form, e.g. like a function tensor product of lower-dimensional pdfs with a
product measure, or from an analogous representation of the pcf and subsequent use of the
Fourier transform, or from a low-rank functional representation of a high-dimensional
RV, again via its pcf.
The theoretical underpinnings of the relation between pdfs and pcfs as well as their
properties were recalled in Section: Theory, as they are important to be preserved in the
discrete approximation. This also introduced the concepts of the convolution and of
the point-wise multiplication Hadamard algebra, concepts which become especially important if
one wants to characterise sums of independent RVs or mixture models,
a topic we did not touch on for the sake of brevity but which follows very naturally from
the developments here. Especially the Hadamard algebra is also
important for the algorithms to compute various point-wise functions in the sparse formats.
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Alexander Litvinenko
Talk presented on SIAM IS 2022 conference.
Very often, in the course of uncertainty quantification tasks or
data analysis, one has to deal with high-dimensional random variables (RVs)
(with values in $\Rd$). Just like any other RV,
a high-dimensional RV can be described by its probability density (\pdf) and/or
by the corresponding probability characteristic functions (\pcf),
or a more general representation as
a function of other, known, random variables.
Here the interest is mainly to compute characterisations like the entropy, the Kullback-Leibler, or more general
$f$-divergences. These are all computed from the \pdf, which is often not available directly,
and it is a computational challenge to even represent it in a numerically
feasible fashion in case the dimension $d$ is even moderately large. It
is an even stronger numerical challenge to then actually compute said characterisations
in the high-dimensional case.
In this regard, in order to achieve a computationally feasible task, we propose
to approximate density by a low-rank tensor.
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
We develop fast and efficient stochastic methods for characterizing scattering
from objects of uncertain shapes. This is highly needed in the
fields of electromagnetics, optics, and photonics.
The continuation multilevel Monte Carlo (CMLMC) method is
used together with a surface integral equation solver. The
CMLMC method optimally balances statistical errors due to
sampling of the parametric space, and numerical errors due
to the discretization of the geometry using a hierarchy of
discretizations, from coarse to fine. The number of realizations
of finer discretizations can be kept low, with most samples
computed on coarser discretizations to minimize computational
work. Consequently, the total execution time is significantly
reduced, in comparison to the standard MC scheme.
Chemical dynamics and rare events in soft matter physicsBoris Fackovec
Talk for the Trinity Math Society Symposium. First summarises the approximations leading from Dirac equation to molecular description and then the synthesis towards non-equilibrium statistical mechanics. The relaxation approach to projection of a molecular system to a Markov jump process is discussed.
Building Your Employer Brand with Social MediaLuanWise
Presented at The Global HR Summit, 6th June 2024
In this keynote, Luan Wise will provide invaluable insights to elevate your employer brand on social media platforms including LinkedIn, Facebook, Instagram, X (formerly Twitter) and TikTok. You'll learn how compelling content can authentically showcase your company culture, values, and employee experiences to support your talent acquisition and retention objectives. Additionally, you'll understand the power of employee advocacy to amplify reach and engagement – helping to position your organization as an employer of choice in today's competitive talent landscape.
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challengesHolger Mueller
Holger Mueller of Constellation Research shares his key takeaways from SAP's Sapphire confernece, held in Orlando, June 3rd till 5th 2024, in the Orange Convention Center.
VAT Registration Outlined In UAE: Benefits and Requirementsuae taxgpt
Vat Registration is a legal obligation for businesses meeting the threshold requirement, helping companies avoid fines and ramifications. Contact now!
https://viralsocialtrends.com/vat-registration-outlined-in-uae/
An introduction to the cryptocurrency investment platform Binance Savings.Any kyc Account
Learn how to use Binance Savings to expand your bitcoin holdings. Discover how to maximize your earnings on one of the most reliable cryptocurrency exchange platforms, as well as how to earn interest on your cryptocurrency holdings and the various savings choices available.
"𝑩𝑬𝑮𝑼𝑵 𝑾𝑰𝑻𝑯 𝑻𝑱 𝑰𝑺 𝑯𝑨𝑳𝑭 𝑫𝑶𝑵𝑬"
𝐓𝐉 𝐂𝐨𝐦𝐬 (𝐓𝐉 𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬) is a professional event agency that includes experts in the event-organizing market in Vietnam, Korea, and ASEAN countries. We provide unlimited types of events from Music concerts, Fan meetings, and Culture festivals to Corporate events, Internal company events, Golf tournaments, MICE events, and Exhibitions.
𝐓𝐉 𝐂𝐨𝐦𝐬 provides unlimited package services including such as Event organizing, Event planning, Event production, Manpower, PR marketing, Design 2D/3D, VIP protocols, Interpreter agency, etc.
Sports events - Golf competitions/billiards competitions/company sports events: dynamic and challenging
⭐ 𝐅𝐞𝐚𝐭𝐮𝐫𝐞𝐝 𝐩𝐫𝐨𝐣𝐞𝐜𝐭𝐬:
➢ 2024 BAEKHYUN [Lonsdaleite] IN HO CHI MINH
➢ SUPER JUNIOR-L.S.S. THE SHOW : Th3ee Guys in HO CHI MINH
➢FreenBecky 1st Fan Meeting in Vietnam
➢CHILDREN ART EXHIBITION 2024: BEYOND BARRIERS
➢ WOW K-Music Festival 2023
➢ Winner [CROSS] Tour in HCM
➢ Super Show 9 in HCM with Super Junior
➢ HCMC - Gyeongsangbuk-do Culture and Tourism Festival
➢ Korean Vietnam Partnership - Fair with LG
➢ Korean President visits Samsung Electronics R&D Center
➢ Vietnam Food Expo with Lotte Wellfood
"𝐄𝐯𝐞𝐫𝐲 𝐞𝐯𝐞𝐧𝐭 𝐢𝐬 𝐚 𝐬𝐭𝐨𝐫𝐲, 𝐚 𝐬𝐩𝐞𝐜𝐢𝐚𝐥 𝐣𝐨𝐮𝐫𝐧𝐞𝐲. 𝐖𝐞 𝐚𝐥𝐰𝐚𝐲𝐬 𝐛𝐞𝐥𝐢𝐞𝐯𝐞 𝐭𝐡𝐚𝐭 𝐬𝐡𝐨𝐫𝐭𝐥𝐲 𝐲𝐨𝐮 𝐰𝐢𝐥𝐥 𝐛𝐞 𝐚 𝐩𝐚𝐫𝐭 𝐨𝐟 𝐨𝐮𝐫 𝐬𝐭𝐨𝐫𝐢𝐞𝐬."
Company Valuation webinar series - Tuesday, 4 June 2024FelixPerez547899
This session provided an update as to the latest valuation data in the UK and then delved into a discussion on the upcoming election and the impacts on valuation. We finished, as always with a Q&A
Improving profitability for small businessBen Wann
In this comprehensive presentation, we will explore strategies and practical tips for enhancing profitability in small businesses. Tailored to meet the unique challenges faced by small enterprises, this session covers various aspects that directly impact the bottom line. Attendees will learn how to optimize operational efficiency, manage expenses, and increase revenue through innovative marketing and customer engagement techniques.
[Note: This is a partial preview. To download this presentation, visit:
https://www.oeconsulting.com.sg/training-presentations]
Sustainability has become an increasingly critical topic as the world recognizes the need to protect our planet and its resources for future generations. Sustainability means meeting our current needs without compromising the ability of future generations to meet theirs. It involves long-term planning and consideration of the consequences of our actions. The goal is to create strategies that ensure the long-term viability of People, Planet, and Profit.
Leading companies such as Nike, Toyota, and Siemens are prioritizing sustainable innovation in their business models, setting an example for others to follow. In this Sustainability training presentation, you will learn key concepts, principles, and practices of sustainability applicable across industries. This training aims to create awareness and educate employees, senior executives, consultants, and other key stakeholders, including investors, policymakers, and supply chain partners, on the importance and implementation of sustainability.
LEARNING OBJECTIVES
1. Develop a comprehensive understanding of the fundamental principles and concepts that form the foundation of sustainability within corporate environments.
2. Explore the sustainability implementation model, focusing on effective measures and reporting strategies to track and communicate sustainability efforts.
3. Identify and define best practices and critical success factors essential for achieving sustainability goals within organizations.
CONTENTS
1. Introduction and Key Concepts of Sustainability
2. Principles and Practices of Sustainability
3. Measures and Reporting in Sustainability
4. Sustainability Implementation & Best Practices
To download the complete presentation, visit: https://www.oeconsulting.com.sg/training-presentations
The world of search engine optimization (SEO) is buzzing with discussions after Google confirmed that around 2,500 leaked internal documents related to its Search feature are indeed authentic. The revelation has sparked significant concerns within the SEO community. The leaked documents were initially reported by SEO experts Rand Fishkin and Mike King, igniting widespread analysis and discourse. For More Info:- https://news.arihantwebtech.com/search-disrupted-googles-leaked-documents-rock-the-seo-world/
Personal Brand Statement:
As an Army veteran dedicated to lifelong learning, I bring a disciplined, strategic mindset to my pursuits. I am constantly expanding my knowledge to innovate and lead effectively. My journey is driven by a commitment to excellence, and to make a meaningful impact in the world.
1. Jan Jantzen, DTU
1
Tutorial On Fuzzy Clustering
Jan Jantzen
Technical University of Denmark
jj@oersted.dtu.dk
Abstract
uProblem: To extract rules from data
uMethod: Fuzzy c-means
uResults: e.g., finding cancer cells
2. Jan Jantzen, DTU
2
Cluster (www.m-w.com)
uA number of similar individuals that
occur together as a: two or more
consecutive consonants or vowels in a
segment of speech b: a group of
houses (...) c: an aggregation of stars or
galaxies that appear close together in
the sky and are gravitationally
associated.
Cluster analysis (www.m-w.com)
uA statistical classification technique for
discovering whether the individuals of a
population fall into different groups by
making quantitative comparisons of
multiple characteristics.
3. Jan Jantzen, DTU
3
Vehicle Example
Vehicle Top speed
km/h
Colour Air
resistance
Weight
Kg
V1 220 red 0.30 1300
V2 230 black 0.32 1400
V3 260 red 0.29 1500
V4 140 gray 0.35 800
V5 155 blue 0.33 950
V6 130 white 0.40 600
V7 100 black 0.50 3000
V8 105 red 0.60 2500
V9 110 gray 0.55 3500
Vehicle Clusters
100 150 200 250 300
500
1000
1500
2000
2500
3000
3500
Top speed [km/h]
Weight[kg]
Sports cars
Medium market cars
Lorries
4. Jan Jantzen, DTU
4
Terminology
100 150 200 250 300
500
1000
1500
2000
2500
3000
3500
Top speed [km/h]
Weight[kg]
Sports cars
Medium market cars
Lorries
Object or data point
feature
feature space
cluster
feature
label
Example: Classify cracked tiles
5. Jan Jantzen, DTU
5
475Hz 557Hz Ok?
-----+-----+---
0.958 0.003 Yes
1.043 0.001 Yes
1.907 0.003 Yes
0.780 0.002 Yes
0.579 0.001 Yes
0.003 0.105 No
0.001 1.748 No
0.014 1.839 No
0.007 1.021 No
0.004 0.214 No
Table 1: frequency
intensities for ten
tiles.
Tiles are made from clay moulded into the right shape, brushed, glazed, and
baked. Unfortunately, the baking may produce invisible cracks. Operators can
detect the cracks by hitting the tiles with a hammer, and in an automated system
the response is recorded with a microphone, filtered, Fourier transformed, and
normalised. A small set of data is given in TABLE 1 (adapted from MIT, 1997).
Algorithm: hard c-means (HCM)
(also known as k means)
6. Jan Jantzen, DTU
6
Plot of tiles by frequencies (logarithms). The whole tiles (o) seem well
separated from the cracked tiles (*). The objective is to find the two
clusters.
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
1. Place two cluster centres (x) at random.
2. Assign each data point (* and o) to the nearest cluster centre (x)
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
7. Jan Jantzen, DTU
7
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
1. Compute the new centre of each class
2. Move the crosses (x)
Iteration 2
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
8. Jan Jantzen, DTU
8
Iteration 3
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
Iteration 4 (then stop, because no visible change)
Each data point belongs to the cluster defined by the nearest centre
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
9. Jan Jantzen, DTU
9
The membership matrix M:
1. The last five data points (rows) belong to the first cluster (column)
2. The first five data points (rows) belong to the second cluster (column)
M =
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
Membership matrix M
−≤−
=
otherwise
if
m jkik
ik
0
1
22
cucu
data point k cluster centre i
distance
cluster centre j
10. Jan Jantzen, DTU
10
c-partition
Kc
iallforUCØ
jiallforØCC
UC
i
ji
c
i
i
≤≤
⊂⊂
≠=∩
=
=
2
1
U
All clusters C
together fills the
whole universe U
Clusters do not
overlap
A cluster C is never
empty and it is
smaller than the
whole universe U
There must be at least 2
clusters in a c-partition and
at most as many as the
number of data points K
Objective function
∑ ∑∑
= ∈=
−==
c
i Ck
ik
c
i
i
ik
JJ
1
2
,1 u
cu
Minimise the total sum of
all distances
11. Jan Jantzen, DTU
11
Algorithm: fuzzy c-means (FCM)
Each data point belongs to two clusters to different degrees
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
12. Jan Jantzen, DTU
12
1. Place two cluster centres
2. Assign a fuzzy membership to each data point depending on
distance
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
1. Compute the new centre of each class
2. Move the crosses (x)
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
14. Jan Jantzen, DTU
14
Iteration 10
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
Iteration 13 (then stop, because no visible change)
Each data point belongs to the two clusters to a degree
-8 -6 -4 -2 0 2
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
log(intensity) 475 Hz
log(intensity)557Hz
Tiles data: o = whole tiles, * = cracked tiles, x = centres
15. Jan Jantzen, DTU
15
The membership matrix M:
1. The last five data points (rows) belong mostly to the first cluster (column)
2. The first five data points (rows) belong mostly to the second cluster (column)
M =
0.0025 0.9975
0.0091 0.9909
0.0129 0.9871
0.0001 0.9999
0.0107 0.9893
0.9393 0.0607
0.9638 0.0362
0.9574 0.0426
0.9906 0.0094
0.9807 0.0193
Fuzzy membership matrix M
( )
∑
=
−
=
c
j
q
jk
ik
ik
d
d
m
1
1/2
1
ikikd cu −=
Distance from point k to
current cluster centre i
Distance from point k to
other cluster centres j
Point k’s membership
of cluster i
Fuzziness
exponent
16. Jan Jantzen, DTU
16
Fuzzy membership matrix M
ikm ( )
( ) ( ) ( )
( )
( ) ( ) ( )1/21/2
2
1/2
1
1/2
1/21/2
2
1/2
1
1
1/2
111
1
1
1
−−−
−
−−−
=
−
+++
=
++
+
=
=
∑
q
ck
q
k
q
k
q
ik
q
ck
ik
q
k
ik
q
k
ik
c
j
q
jk
ik
ddd
d
d
d
d
d
d
d
d
d
L
L
Gravitation to
cluster i relative
to total gravitation
Electrical Analogy
R1 R2
i1 i2
U
I
I
i
i
UI
U
R
R
RRR
R
R
R
RRR
R
RIU
i
i
i
c
i
i
c
==
+++
=
+++
=
=
11
111
1
1
111
1
21
21
L
L Same form as
mik
17. Jan Jantzen, DTU
17
Fuzzy Membership
1 2 3 4 5
0
0.5
1
Cluster centres
Membershipoftestpoint
o is with q = 1.1, * is with q = 2
Data point
Fuzzy c-partition
Kc
iallforUCØ
jiallforØCC
UC
i
ji
c
i
i
≤≤
⊂⊂
≠=∩
=
=
2
1
U
All clusters C together fill the
whole universe U.
Remark: The sum of
memberships for a data point
is 1, and the total for all
points is K
Not valid: Clusters
do overlap
A cluster C is never
empty and it is
smaller than the
whole universe U
There must be at least 2
clusters in a c-partition and
at most as many as the
number of data points K
18. Jan Jantzen, DTU
18
Example: Classify cancer cells
Normal smear Severely dysplastic smear
Using a small brush, cotton stick, or wooden
stick, a specimen is taken from the uterincervix
and smeared onto a thin, rectangular glass plate,
a slide. The purpose of the smear screening is to
diagnose pre-malignant cell changes before they
progress to cancer. The smear is stained using
the Papanicolau method, hence the name Pap
smear. Different characteristics have different
colours, easy to distinguish in a microscope. A
cyto-technician performs the screening in a
microscope. It is time consuming and prone to
error, as each slide may contain up to 300.000
cells.
Dysplastic cells have undergone precancerous changes.
They generally have longer and darker nuclei, and they
have a tendency to cling together in large clusters. Mildly
dysplastic cels have enlarged and bright nuclei.
Moderately dysplastic cells have larger and darker
nuclei. Severely dysplastic cells have large, dark, and
often oddly shaped nuclei. The cytoplasm is dark, and it
is relatively small.
Possible Features
uNucleus and cytoplasm area
uNucleus and cyto brightness
uNucleus shortest and longest diameter
uCyto shortest and longest diameter
uNucleus and cyto perimeter
uNucleus and cyto no of maxima
u(...)
19. Jan Jantzen, DTU
19
Classes are nonseparable
Hard Classifier (HCM)
Ok light
moderate
severeOk
A cell is either one
or the other class
defined by a colour.
20. Jan Jantzen, DTU
20
Fuzzy Classifier (FCM)
Ok light
moderate
severeOk
A cell can belong to
several classes to a
Degree, i.e., one column
may have several colours.
Function approximation
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
-1.5
-1
-0.5
0
0.5
1
1.5
Input
Output1
Curve fitting in a multi-dimensional space is also called function
approximation. Learning is equivalent to finding a function that best
fits the training data.
21. Jan Jantzen, DTU
21
Approximation by fuzzy sets
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
-2
-1
0
1
2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.2
0.4
0.6
0.8
1
Procedure to find a model
1. Acquire data
2. Select structure
3. Find clusters, generate model
4. Validate model
22. Jan Jantzen, DTU
22
Conclusions
uCompared to neural networks, fuzzy
models can be interpreted by human
beings
uApplications: system identification,
adaptive systems
Links
u J. Jantzen: Neurofuzzy Modelling. Technical University of Denmark:
Oersted-DTU, Tech report no 98-H-874 (nfmod), 1998. URL
http://fuzzy.iau.dtu.dk/download/nfmod.pdf
u PapSmear tutorial. URL http://fuzzy.iau.dtu.dk/smear/
u U. Kaymak: Data Driven Fuzzy Modelling. PowerPoint, URL
http://fuzzy.iau.dtu.dk/tutor/ddfm.htm
23. Jan Jantzen, DTU
23
Exercise: fuzzy clustering (Matlab)
u Download and follow the instructions in this text file:
http://fuzzy.iau.dtu.dk/tutor/fcm/exerF5.txt
u The exercise requires Matlab (no special toolboxes
are required)