Slides for the 2016/2017 edition of the Data Mining and Text Mining Course at the Politecnico di Milano. The course is also part of the joint program with the University of Illinois at Chicago.
6. Prof. Pier Luca Lanzi
Clustering algorithms group a collection of data points
into “clusters” according to some distance measure
Data points in the same cluster should have
a small distance from one another
Data points in different clusters should be at
a large distance from one another.
Clustering finds “natural” groupings/structure in unlabeled data
(Unsupervised Learning)
What is Cluster Analysis?
• A cluster is a collection of data objects
§ Similar to one another within the same cluster
§ Dissimilar to the objects in other clusters
• Cluster analysis
§ Given a set of data points, try to understand their structure
§ Finds similarities between data according to the characteristics
found in the data
§ Groups similar data objects into clusters
§ It is unsupervised learning since there are no predefined classes
• Typical applications
§ Stand-alone tool to get insight into data
§ Preprocessing step for other algorithms
Clustering Methods
• Hierarchical vs. point assignment
• Numeric and/or symbolic data
• Deterministic vs. probabilistic
• Exclusive vs. overlapping
• Hierarchical vs. flat
• Top-down vs. bottom-up
Clustering Applications
• Marketing
§ Help marketers discover distinct groups in their customer bases,
and then use this knowledge to develop targeted marketing
programs
• Land use
§ Identification of areas of similar land use in an earth observation
database
• Insurance
§ Identifying groups of motor insurance policyholders with a high
average claim cost
• City planning
§ Identifying groups of houses according to their house type, value,
and geographical location
• Earthquake studies
§ Observed earthquake epicenters should be clustered along
continental faults
What Is Good Clustering?
• A good clustering consists of high quality clusters with
§ High intra-cluster similarity
§ Low inter-cluster similarity
• The quality of a clustering result depends on both the similarity
measure used by the method and its implementation
• The quality of a clustering method is also measured by its ability
to discover some or all of the hidden patterns
• Evaluation
§ Various measures of intra/inter cluster similarity
§ Manual inspection
§ Benchmarking on existing labels
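As a minimal sketch of one simple intra/inter-cluster measure (an illustrative choice, not a measure prescribed by the slides), the code below averages the Euclidean distance over all pairs of points in the same cluster and over all pairs in different clusters. The function names `euclidean` and `intra_inter` are hypothetical. A good clustering should show a small intra-cluster average and a large inter-cluster average.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(values):
    """Arithmetic mean, returning 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def intra_inter(points, labels, dist=euclidean):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Every unordered pair of points is visited once; a pair counts as
    'intra' when both points carry the same cluster label, 'inter' otherwise.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return mean(intra), mean(inter)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = intra_inter(points, [0, 0, 1, 1])  # clusters follow the two groups
bad = intra_inter(points, [0, 1, 0, 1])   # clusters mix the two groups
print(good, bad)
```

With the "good" labels the intra-cluster average is 1.0 and the inter-cluster average is about 10; with the "bad" labels the relationship is reversed.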
Measure the Quality of Clustering
• Dissimilarity/Similarity metric: similarity is expressed in terms of a
distance function, typically a metric d(i, j)
• There is a separate “quality” function that measures the “goodness” of
a cluster
• The definitions of distance functions are usually very different for
interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
• Weights should be associated with different variables based on
applications and data semantics
• It is hard to define “similar enough” or “good enough”, as the answer is
typically highly subjective
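Data Structures
• Data matrix: n objects (rows) described by p variables (columns)
$$
\begin{bmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{bmatrix}
$$
§ Example (weather data):
Outlook    Temp   Humidity   Windy   Play
Sunny      Hot    High       False   No
Sunny      Hot    High       True    No
Overcast   Hot    High       False   Yes
…          …      …          …       …
• Dis/Similarity matrix: an n × n matrix storing d(i, j) for every pair of objects; since d(i, j) = d(j, i) and d(i, i) = 0, only the lower triangle is needed
$$
\begin{bmatrix}
0      &        &        &        &   \\
d(2,1) & 0      &        &        &   \\
d(3,1) & d(3,2) & 0      &        &   \\
\vdots & \vdots & \vdots & \ddots &   \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{bmatrix}
$$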
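The two data structures above are related: given a data matrix and a distance function, one can fill the lower-triangular dissimilarity matrix. The following is a minimal sketch under a few assumptions. `dissimilarity_matrix` and `mismatch` are hypothetical names, `mismatch` (a count of differing attribute values) is just one possible choice for nominal data, and Python indices are 0-based, so `D[1][0]` corresponds to d(2,1) on the slide.

```python
def dissimilarity_matrix(data, dist):
    """Build the lower-triangular n x n dissimilarity matrix of a data matrix.

    data: list of n rows (objects), each a sequence of p attribute values.
    dist: function d(a, b) -> float.
    Row i holds d(i, 0), ..., d(i, i-1) followed by d(i, i) = 0.
    """
    n = len(data)
    return [[dist(data[i], data[j]) for j in range(i)] + [0.0] for i in range(n)]


def mismatch(a, b):
    """Number of attributes on which two rows differ (usable for nominal values)."""
    return float(sum(x != y for x, y in zip(a, b)))


# The first three rows of the weather data shown above.
weather = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
]
D = dissimilarity_matrix(weather, mismatch)
for row in D:
    print(row)
# [0.0]
# [1.0, 0.0]
# [2.0, 3.0, 0.0]
```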
Types of Data in Cluster Analysis
• Interval-scaled variables
• Binary variables
• Nominal, ordinal, and ratio variables
• Variables of mixed types
Distance Measures
• Given a space and a set of points in this space, a distance
measure d(x,y) maps two points x and y to a real number
and satisfies four axioms:
§ d(x,y) ≥ 0 (non-negativity)
§ d(x,y) = 0 if and only if x = y
§ d(x,y) = d(y,x) (symmetry)
§ d(x,y) ≤ d(x,z) + d(z,y) (triangle inequality)
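The four axioms can be tested mechanically on a finite sample of points. The sketch below (the names `check_distance_axioms` and `squared_euclidean` are hypothetical) reports every violation it finds; an empty result only means no counterexample exists in that sample, not that the function is a metric in general. It uses `math.dist` from the Python standard library (Python 3.8+), and shows that the squared Euclidean distance, a dissimilarity often used in practice, breaks the triangle inequality.

```python
import math
from itertools import product


def check_distance_axioms(points, d, tol=1e-9):
    """Test the four distance axioms for d on every pair/triple of a finite sample.

    Returns a list of (axiom, points) violations. An empty list only means
    no counterexample was found in this sample; it is not a proof.
    """
    violations = []
    for x, y in product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:
            violations.append(("non-negativity", (x, y)))
        if (abs(dxy) <= tol) != (x == y):
            violations.append(("identity", (x, y)))
        if abs(dxy - d(y, x)) > tol:
            violations.append(("symmetry", (x, y)))
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            violations.append(("triangle inequality", (x, y, z)))
    return violations


def squared_euclidean(a, b):
    """Squared Euclidean distance: a common dissimilarity that is NOT a metric."""
    return math.dist(a, b) ** 2


sample = [(0.0,), (1.0,), (2.0,)]
print(check_distance_axioms(sample, math.dist))  # []
print(check_distance_axioms(sample, squared_euclidean))
```

For the squared distance, d((0),(2)) = 4 exceeds d((0),(1)) + d((1),(2)) = 1 + 1, so the triangle inequality fails.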
Euclidean Distances
• Lr-norm: for any constant r, the distance between two points of an n-dimensional Euclidean space is
$$
d([x_1, \dots, x_n], [y_1, \dots, y_n]) = \left( \sum_{i=1}^{n} |x_i - y_i|^r \right)^{1/r}
$$
• Euclidean distance (r=2), the usual L2-norm:
$$
d([x_1, \dots, x_n], [y_1, \dots, y_n]) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$
that is, we square the distance in each dimension, sum the squares, and take the positive square root
• Manhattan distance (r=1), the L1-norm: the sum of the magnitudes of the differences in each dimension; it is called “Manhattan distance” because it is the distance one would travel between the points if constrained to move along grid lines, as on the streets of a city such as Manhattan
• L∞-norm: the limit of the Lr-norm as r approaches infinity, which equals the maximum of |xi − yi| over all dimensions i
It is easy to verify that the Euclidean distance satisfies the first three requirements for a distance measure. The Euclidean distance between two points cannot be negative, because the positive square root is intended. Since all squares of real numbers are nonnegative, any i such that xi ≠ yi forces the distance to be strictly positive. On the other hand, if xi = yi for all i, then the distance is clearly 0. Symmetry follows because (xi − yi)² = (yi − xi)². The triangle inequality requires a good deal of algebra to verify; however, it is a well-known property of Euclidean space.
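The whole Lr family fits in a single function. The following is a minimal sketch (the name `lr_distance` is hypothetical) in which r = 1 gives the Manhattan distance, r = 2 the Euclidean distance, and `math.inf` is used as a stand-in for r = ∞.

```python
import math


def lr_distance(x, y, r):
    """L_r-norm distance between two equal-length numeric sequences.

    r = 1 gives the Manhattan distance, r = 2 the Euclidean distance,
    and r = math.inf the L-infinity norm (largest per-dimension difference).
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1.0 / r)


p, q = (0, 0), (3, 4)
print(lr_distance(p, q, 1))         # 7.0
print(lr_distance(p, q, 2))         # 5.0
print(lr_distance(p, q, math.inf))  # 4
```

For the same pair of points, the distance never increases as r grows: 7 ≥ 5 ≥ 4.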
Jaccard Distance
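• Jaccard distance is defined as d(x,y) = 1 - SIM(x,y), where SIM is
the Jaccard similarity, SIM(x,y) = |x ∩ y| / |x ∪ y|
• For binary attribute vectors, SIM can be interpreted as the fraction of
attributes present in at least one of the two objects that are present in both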
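A minimal sketch of the Jaccard distance on Python sets is shown below. The function names are hypothetical, and returning similarity 1.0 for two empty sets is an assumed convention, since the formula is undefined there. For binary vectors, one would pass the set of positions whose value is 1.

```python
def jaccard_similarity(x, y):
    """|x ∩ y| / |x ∪ y|; assumed to be 1.0 when both sets are empty."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 1.0
    return len(x & y) / len(union)


def jaccard_distance(x, y):
    """Jaccard distance d(x, y) = 1 - SIM(x, y)."""
    return 1.0 - jaccard_similarity(x, y)


def ones(bits):
    """Positions of the 1s in a binary string, e.g. '1010' -> {0, 2}."""
    return {i for i, b in enumerate(bits) if b == "1"}


print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
print(jaccard_distance(ones("1100"), ones("1010")))
```

In the binary example the two vectors share one 1 out of three positions where at least one of them is 1, so the similarity is 1/3.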
Cosine Distance
• The cosine distance between x and y is the angle between the vectors from
the origin to those points, where cos θ = (x · y) / (‖x‖ ‖y‖)
• This angle will be in the range 0 to 180 degrees, regardless of
how many dimensions the space has.
• Example: given x = (1,2,-1) and y = (2,1,1), x · y = 3 and ‖x‖ = ‖y‖ = √6,
so cos θ = 3/6 = 1/2 and the angle between the two vectors is 60 degrees
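The angle in the example can be reproduced with a few lines of standard-library Python. This is a minimal sketch (the function name is hypothetical). The cosine is clamped to [-1, 1] because floating-point rounding can push it slightly outside the domain of `math.acos`. The angle is undefined for the zero vector, so the sketch rejects it.

```python
import math


def cosine_distance_degrees(x, y):
    """Angle in degrees between the vectors x and y (range 0 to 180)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("the angle is undefined for a zero vector")
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.degrees(math.acos(cos))


print(round(cosine_distance_degrees((1, 2, -1), (2, 1, 1)), 6))  # 60.0
```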
Edit Distance
• Used when the data points are strings
• The distance between a string x=x1x2…xn and y=y1y2…ym is the smallest
number of insertions and deletions of single characters that will transform x
into y
• Alternatively, the edit distance d(x, y) can be computed from the longest common
subsequence (LCS) of x and y as
d(x,y) = |x| + |y| - 2|LCS|
• Example: the edit distance between x=abcde and y=acfdeg is 3 (delete b,
insert f, insert g); the LCS is acde, so d(x,y) = 5 + 6 - 2·4 = 3, consistent with the previous result
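The LCS formulation can be implemented with the classic dynamic-programming table. The sketch below (the function names are hypothetical) computes the insertion/deletion-only edit distance defined above. It is not the Levenshtein distance, which also allows substitutions and can therefore give smaller values.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y (dynamic programming)."""
    m, n = len(x), len(y)
    # table[i][j] = LCS length of the prefixes x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def edit_distance(x, y):
    """Insert/delete-only edit distance: |x| + |y| - 2 |LCS(x, y)|."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


print(lcs_length("abcde", "acfdeg"))     # 4  (the LCS is "acde")
print(edit_distance("abcde", "acfdeg"))  # 3
```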
Hamming Distance
• Hamming distance between two vectors is the number of
components in which they differ
• Or equivalently, given the number of variables n and the number
m of matching components, we define d(x,y) = n - m
• Example: the Hamming distance between the vectors 10101 and
11110 is 3.
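Both definitions of the Hamming distance give the same value, as the following minimal sketch shows on the example vectors. The name `hamming_distance` is hypothetical, and the vectors are represented as strings for convenience.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


x, y = "10101", "11110"
n = len(x)                             # number of variables
m = sum(a == b for a, b in zip(x, y))  # number of matching components
print(hamming_distance(x, y), n - m)   # 3 3
```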
Ordinal Variables
• An ordinal variable can be discrete or continuous
• Order is important, e.g., rank
• It can be treated as interval-scaled:
§ replace each x_if by its rank r_if ∈ {1, …, M_f}, where M_f is the
number of ordered states of variable f
§ map the range of each variable onto [0, 1] by replacing the
i-th object in the f-th variable by z_if = (r_if − 1) / (M_f − 1)
§ compute the dissimilarity using methods for interval-scaled variables
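A minimal sketch of the rank-and-rescale procedure follows. The name `ordinal_to_interval` is hypothetical, as is the ordering Cool < Mild < Hot for the Temp attribute of the weather data, which is assumed for illustration only. After the mapping, any interval-scaled distance applies; the sketch uses the absolute difference.

```python
def ordinal_to_interval(values, order):
    """Map ordinal values onto [0, 1] with z = (r - 1) / (M - 1).

    order: the M possible states of the variable, from lowest to highest rank.
    """
    M = len(order)
    if M < 2:
        raise ValueError("need at least two ordered states")
    rank = {state: i + 1 for i, state in enumerate(order)}  # r in {1, ..., M}
    return [(rank[v] - 1) / (M - 1) for v in values]


temps = ["Cool", "Hot", "Mild"]
z = ordinal_to_interval(temps, ["Cool", "Mild", "Hot"])  # assumed ordering
print(z)  # [0.0, 1.0, 0.5]

# Interval-scaled dissimilarity on the rescaled values: |z_i - z_j|
print(abs(z[0] - z[1]))  # 1.0
```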
Requirements of Clustering in Data Mining
• Scalability
• Ability to deal with different types of attributes
• Ability to handle dynamic data
• Discovery of clusters with arbitrary shape
• Minimal requirements for domain knowledge to determine input
parameters
• Ability to deal with noise and outliers
• Insensitivity to the order of input records
• Ability to handle high dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
Curse of Dimensionality
• In high dimensions, almost all pairs of points
are about equally far away from one another
• Almost any two vectors are almost orthogonal
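Both effects can be observed empirically on random data. The sketch below makes several assumptions. Points are drawn uniformly from the unit hypercube, vectors have independent standard normal components, and the spread of pairwise distances is summarized as (max − min) / min. The function names are hypothetical and only the standard library is used (`math.dist` and multi-argument `math.hypot` need Python 3.8+). As the dimension grows, the relative spread of distances shrinks and the average |cos| between random vectors approaches 0, that is, the angle approaches 90 degrees.

```python
import math
import random


def relative_contrast(dim, n_points=100, seed=0):
    """(max - min) / min over all pairwise Euclidean distances of
    n_points drawn uniformly at random from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    return (max(dists) - min(dists)) / min(dists)


def mean_abs_cosine(dim, n_pairs=200, seed=0):
    """Average |cos(angle)| between pairs of random vectors with N(0, 1) components."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        y = [rng.gauss(0, 1) for _ in range(dim)]
        dot = sum(a * b for a, b in zip(x, y))
        total += abs(dot) / (math.hypot(*x) * math.hypot(*y))
    return total / n_pairs


for dim in (2, 10, 1000):
    print(dim, round(relative_contrast(dim), 3), round(mean_abs_cosine(dim), 3))
```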