Clustering Algorithm
COMPLEX NETWORK ALGORITHM
AMIR HADIFAR
1
Objectives
• By the end of this presentation you will:
  • Understand data science and its applications
  • Get an overview of machine learning
  • Learn several types of clustering algorithms
  • Implement clustering with R
2
Data science and its Applications
• Extract knowledge or insight from data
• Applications range from speech recognition and search engines to health care and the humanities
• These scenarios involve:
  • Storing, organizing, and integrating huge amounts of unstructured data
  • Processing and analyzing data
  • Extracting knowledge and insight from data and predicting the future
• Processing, analyzing, and extracting knowledge and insight are done through machine learning
3
Data science and its Applications
4
Machine Learning
• Field of study that gives computers the ability to learn without being explicitly programmed
• Classified into three broad categories:
  • Supervised learning
  • Unsupervised learning
  • *Reinforcement learning
5
Machine Learning Categories
• Supervised learning
  • Decision tree learning
  • Classification
  • …
• Unsupervised learning
  • Clustering
  • Association rule learning
  • …
6
Cluster definition
• Cluster analysis, or clustering, groups similar objects together (each group is called a cluster)
• Types of similarity in clustering
  • Intra-class similarity
  • Inter-class similarity
7
Clustering Scenarios
• Clustering is used in scenarios such as:
  • Market segmentation
  • News summarization (cluster, then find the centroid)
  • City planning
  • Image segmentation
8
Methods of clustering
• Partitioning methods (centroid models)
• Hierarchical methods (connectivity models)
• Density-based methods
• Grid-based methods
• Model-based methods
• Constraint-based methods
9
Partitioning method
• Given a database of n objects, a partitioning method constructs k partitions of the data that satisfy the following:
  • Each group contains at least one object
  • Each object belongs to exactly one group
• Points to remember
  • The method creates an initial partitioning
  • It then uses an iterative relocation technique to improve the partitioning
10
K-Means or Lloyd's algorithm
11
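The iterative relocation idea behind Lloyd's algorithm can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation shown on the slide: the function name `kmeans`, the `init` and `seed` parameters, and the sample points are all assumptions made for the example. It picks starting centroids, assigns each point to the nearest centroid, recomputes each centroid as the mean of its members, and stops when no point changes cluster or `max_iter` is reached.

```python
import random


def kmeans(points, k, max_iter=10, init=None, seed=None):
    """Cluster equal-length numeric tuples with Lloyd's algorithm (illustrative sketch)."""
    # Step 1: start from k centroids, by default k data points chosen at random.
    # `init` is a hypothetical parameter that lets the caller fix the starting centroids.
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Stop as soon as no point is reassigned.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Two well-separated groups; both starting centroids are deliberately taken
# from the same group so that the relocation step has work to do.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2, init=[points[0], points[1]])
print(labels, centroids)
```

Like any k-means run, the result depends on the starting centroids, which is one motivation for variants such as k-means++.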
Other K-means variants
• K-means++
• K-means stream
• Mini-batch k-means
• K-medoids
• Fuzzy k-means
• Many others
12
K-means Clustering with R
13
Hierarchical Clustering
• Agglomerative
  • Bottom up
• Divisive
  • Top down
14
Calculate distance between clusters
• Single linkage
• Complete linkage
• Average linkage
15
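The three linkage criteria differ only in how they summarize the point-to-point distances between two clusters. The sketch below is a plain Python illustration under assumed names (`pairwise_distances`, `single_linkage`, `complete_linkage`, `average_linkage`) and made-up sample clusters `r` and `s`; it uses Euclidean distance, which is one common choice rather than a requirement.

```python
import math
from itertools import product


def pairwise_distances(cluster_r, cluster_s):
    """Euclidean distance between every point of one cluster and every point of the other."""
    return [math.dist(p, q) for p, q in product(cluster_r, cluster_s)]


def single_linkage(cluster_r, cluster_s):
    # Shortest distance between any two points, one from each cluster.
    return min(pairwise_distances(cluster_r, cluster_s))


def complete_linkage(cluster_r, cluster_s):
    # Longest distance between any two points, one from each cluster.
    return max(pairwise_distances(cluster_r, cluster_s))


def average_linkage(cluster_r, cluster_s):
    # Mean of all cross-cluster point-to-point distances.
    distances = pairwise_distances(cluster_r, cluster_s)
    return sum(distances) / len(distances)


r = [(0.0, 0.0), (0.0, 1.0)]
s = [(3.0, 0.0), (4.0, 0.0)]
print(single_linkage(r, s), complete_linkage(r, s), average_linkage(r, s))
```

An agglomerative method would repeatedly merge the two clusters with the smallest linkage value, so the choice of criterion changes which clusters merge first.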
Hierarchical Clustering with R
16
Density-based Methods
• Areas of higher density are considered clusters
• Sparse areas are usually considered noise
• They use two basic ideas:
  • Density reachability
  • Density connectivity
17
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
18
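A compact way to see how DBSCAN grows clusters from dense neighborhoods is the plain Python sketch below. It is an illustration under stated assumptions, not a reference implementation: the names `dbscan`, `eps`, `min_points`, and `NOISE` and the sample data are chosen for the example, neighborhoods are found by brute force, and a point counts as a core point when at least `min_points` points (itself included) lie within `eps`.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_points):
    """Label each point with a cluster index, or NOISE (illustrative brute-force sketch)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # j is a core point, so keep growing the cluster
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_points=3)
print(labels)
```

The number of clusters is never passed in; it falls out of the two density parameters, and the isolated point is reported as noise.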
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
19
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• Advantages
  • Does not require the number of clusters to be specified in advance
  • Able to identify noise while clustering
  • Able to find clusters of arbitrary size and shape
• Disadvantages
  • Fails on neck-type datasets
  • Does not work well on high-dimensional data
20
Grid-based algorithms
• Use a multi-resolution grid data structure
• Clustering complexity depends on the number of grid cells, not the number of objects
• The space is divided into a finite number of cells that form a grid structure, on which all clustering operations are performed
• Examples: CLIQUE, STING, WaveCluster
21
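The quantization step shared by grid-based methods can be sketched as follows. This is not CLIQUE, STING, or WaveCluster; it only illustrates how points are mapped to grid cells and how dense cells are kept, under assumed names (`dense_cells`, `cell_size`, `min_count`) and made-up sample data.

```python
from collections import Counter


def dense_cells(points, cell_size, min_count):
    """Map points to grid cells and return the cells holding at least min_count points."""
    # Quantize each coordinate to the integer index of the cell that contains it.
    counts = Counter(tuple(int(x // cell_size) for x in p) for p in points)
    # Everything after this step works on cells, not on the original points.
    return {cell: n for cell, n in counts.items() if n >= min_count}


points = [(0.1, 0.2), (0.4, 0.9), (0.8, 0.3), (5.5, 5.1), (5.9, 5.8), (9.0, 0.5)]
dense = dense_cells(points, cell_size=1.0, min_count=2)
print(dense)
```

Once the points are summarized as cell counts, later steps such as merging adjacent dense cells scale with the number of occupied cells rather than the number of objects.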
CLIQUE (CLustering In QUEst)
• CLIQUE is used for clustering high-dimensional data
• High-dimensional data means data with many attributes
• CLIQUE identifies the dense units in subspaces
22
StackOverFlow Analysis Using R
23
StackOverFlow Analysis Using R
24
StackOverFlow Analysis Using R
25

Introduction to Clustering algorithm


Editor's Notes

  1. Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured. It is a continuation of data analysis fields such as statistics, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD). Data science employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, chemometrics, information science, and computer science, including signal processing, probability models, machine learning, statistical learning, data mining, databases, data engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modeling, data warehousing, data compression, computer programming, artificial intelligence, and high-performance computing. The development of machine learning has enhanced the growth and importance of data science. Data science affects academic and applied research in many domains, including machine translation, speech recognition, robotics, search engines, and the digital economy, as well as the biological sciences, medical informatics, health care, social sciences, and the humanities. It heavily influences economics, business, and finance. From the business perspective, data science is an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and data analysis.
  2. Detection of fake book reviews (Amazon) and fake restaurant reviews (Zagat). A major car company is exploring how deep learning can react to audio recordings from the engine to determine whether maintenance is necessary or parts are nearing the need for replacement. Outdoor marketing company Route is using big data to define and justify its pricing model for advertising space on billboards, benches, and the sides of buses. Traditionally, outdoor media was priced “per impression”, based on an estimate of how many eyes would see the ad in a given day. Now the company uses sophisticated GPS, eye-tracking software, and analysis of traffic patterns to get a much more realistic idea of which advertisements will be seen the most, and will therefore be the most effective.
  3. Alan Turing: “Can machines think?”
     Supervised learning: the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
     Unsupervised learning: the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution.
     Reinforcement learning: a computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a car), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.
     Other categorizations also exist, for example by output, … Between supervised and unsupervised learning is semi-supervised learning, where the teacher gives an incomplete training signal: a training set with some (often many) of the target outputs missing.
  4. Supervised learning is the most common technique for training neural networks and decision trees.
     Differences between clustering and classification: in general, in classification you have a set of predefined classes and want to know which class a new object belongs to. Clustering tries to group a set of objects and find whether there is some relationship between the objects.
  5. Intra-class: minimize dissimilarity. Inter-class: minimize similarity.
  6. (Find the best place to open emergency-care wards.)
  7. Classification. Clustering algorithms may be classified as listed below:
     • Exclusive clustering
     • Overlapping clustering
     • Hierarchical clustering
     • Probabilistic clustering
     Sometimes models are used for grouping:
     • Connectivity models: for example, hierarchical clustering builds models based on distance connectivity.
     • Centroid models: for example, the k-means algorithm represents each cluster by a single mean vector.
     • Distribution models: clusters are modeled using statistical distributions, such as the multivariate normal distributions used by the expectation-maximization algorithm.
     • Density models: for example, DBSCAN and OPTICS define clusters as connected dense regions in the data space.
     • Subspace models: in biclustering (also known as co-clustering or two-mode clustering), clusters are modeled with both cluster members and relevant attributes.
     • Group models: some algorithms do not provide a refined model for their results and just provide the grouping information.
     • Graph-based models: a clique, that is, a subset of nodes in a graph such that every two nodes in the subset are connected by an edge, can be considered a prototypical form of cluster. Relaxations of the complete connectivity requirement (a fraction of the edges can be missing) are known as quasi-cliques, as in the HCS clustering algorithm.
     In recent years considerable effort has been put into improving the performance of existing algorithms. Among them are CLARANS (Ng and Han, 1994) and BIRCH (Zhang et al., 1996). With the recent need to process larger and larger data sets (also known as big data), the willingness to trade semantic meaning of the generated clusters for performance has been increasing. This led to the development of pre-clustering methods such as canopy clustering, which can process huge data sets efficiently, but the resulting "clusters" are merely a rough pre-partitioning of the data set, whose partitions are then analyzed with existing slower methods such as k-means clustering.
     Various other approaches to clustering have been tried, such as seed-based clustering. For high-dimensional data, many of the existing methods fail due to the curse of dimensionality, which renders particular distance functions problematic in high-dimensional spaces. This led to new clustering algorithms for high-dimensional data that focus on subspace clustering (where only some attributes are used, and cluster models include the relevant attributes for the cluster) and correlation clustering, which also looks for arbitrarily rotated ("correlated") subspace clusters that can be modeled by giving a correlation of their attributes. Examples of such clustering algorithms are CLIQUE and SUBCLU. Ideas from density-based clustering methods (in particular the DBSCAN/OPTICS family of algorithms) have been adapted to subspace clustering (HiSC, hierarchical subspace clustering, and DiSH) and correlation clustering (HiCO, hierarchical correlation clustering; 4C, using "correlation connectivity"; and ERiC, exploring hierarchical density-based correlation clusters).
     Several different clustering systems based on mutual information have been proposed. One is Marina Meilă's variation of information metric; another provides hierarchical clustering. Using genetic algorithms, a wide range of different fit functions can be optimized, including mutual information. Message-passing algorithms, a recent development in computer science and statistical physics, have also led to the creation of new types of clustering algorithms.
  8. Points to remember: for a given number of partitions (say k), the partitioning method creates an initial partitioning. It then uses the iterative relocation technique to improve the partitioning by moving objects from one group to another. K-means clustering can handle larger datasets than hierarchical clustering approaches.
  9. R offers two functions for this kind of clustering: pam() and kmeans(). The k-means algorithm:
     1. Selects K centroids (K rows chosen at random).
     2. Assigns each data point to its closest centroid.
     3. Recalculates the centroids as the average of all data points in a cluster (i.e., the centroids are p-length mean vectors, where p is the number of variables).
     4. Assigns data points to their closest centroids.
     5. Continues steps 3 and 4 until the observations are not reassigned or the maximum number of iterations (R uses 10 as a default) is reached.
  10. MiniBatchKMeans is a variant of the KMeans algorithm that uses mini-batches to reduce the computation time.
  11. Usually used for small datasets (on the order of 100 observations). In R, use hclust().
      Agglomerative: a "bottom up" approach. Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. O(n^3).
      Divisive: a "top down" approach. All observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. O(2^n).
  12. Single linkage: in single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points, one in each cluster. For example, the distance between clusters “r” and “s” is equal to the length of the arrow between their two closest points.
      Complete linkage: in complete linkage hierarchical clustering, the distance between two clusters is defined as the longest distance between two points, one in each cluster. For example, the distance between clusters “r” and “s” is equal to the length of the arrow between their two furthest points.
      Average linkage: in average linkage hierarchical clustering, the distance between two clusters is defined as the average distance from each point in one cluster to every point in the other cluster. For example, the distance between clusters “r” and “s” is equal to the average length of the arrows connecting the points of one cluster to the points of the other.
  13. The idea is that if a particular point belongs to a cluster, it should be near to lots of other points in that cluster. It works like this: First we choose two parameters, a positive number epsilon and a natural number minPoints. We then begin by picking an arbitrary point in our dataset. If there are more than minPoints points within a distance of epsilon from that point, (including the original point itself), we consider all of them to be part of a "cluster". We then expand that cluster by checking all of the new points and seeing if they too have more than minPoints points within a distance of epsilon, growing the cluster recursively if so. Eventually, we run out of points to add to the cluster. We then pick a new arbitrary point and repeat the process. Now, it's entirely possible that a point we pick has fewer than minPoints points in its epsilon ball, and is also not a part of any other cluster. If that is the case, it's considered a "noise point" not belonging to any cluster.
  14. Advantage: recognizes noise. Disadvantage: cannot recognize clusters that are not dense (see OPTICS).
  15. OPTICS
  16. CLIQUE, STING, WaveCluster
  17. References: http://varianceexplained.org/r/introducing-stackr/
  18. References: http://varianceexplained.org/r/introducing-stackr/