OPTICS: Ordering Points To
Identify the Clustering Structure
Presented By:
Rohit Paul
 Disadvantages of DBSCAN
 Requires two user inputs (Eps and MinPts)
 Cannot detect clusters of varying density
 OPTICS:
 Can detect clusters of varying density
 Mainly requires one input (i.e., MinPts)
 Eps can be considered as ‘infinite’
Idea
 Creates an augmented ordering of the database
representing its density-based clustering structure
 Helps us gain a high-level understanding of how the data
is structured
Observation
 For a constant MinPts value, density-based
clusters with higher density are completely
contained in density-connected sets with respect
to a lower density.
 Extend the DBSCAN algorithm such that several
distance parameters are processed at the same
time
OPTICS
 Works, in principle, with an infinite number of distance
parameters eps’ that are no larger than a “generating
distance” eps (i.e. 0 <= eps’ <= eps).
 Stores the order in which the objects are processed,
together with the information an extended DBSCAN
algorithm would use to assign cluster memberships
 This information consists of only two values for
each object: the core-distance and the
reachability-distance
Terminology
 Core-distance of an object p:
 The smallest distance eps’ between p and an object in its
eps-neighborhood such that p would be a core object with
respect to eps’ if this neighbor is contained in N_eps(p);
UNDEFINED if p is not a core object with respect to eps.
 Reachability-distance of an object p w.r.t. an object o:
 The smallest distance such that p is directly
density-reachable from o if o is a core object, i.e.
max(core-distance(o), distance(o, p)); UNDEFINED if o is
not a core object.
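The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes Euclidean distance, assumes that a point counts as a member of its own eps-neighborhood, and uses `math.inf` to stand in for UNDEFINED. The helper names (`neighborhood`, `core_distance`, `reachability_distance`) are made up for this example.

```python
import math

UNDEFINED = math.inf  # assumption: represent UNDEFINED as "infinitely far"


def neighborhood(points, p, eps):
    """Indices of all points within eps of points[p] (p itself included)."""
    return [q for q in range(len(points))
            if math.dist(points[p], points[q]) <= eps]


def core_distance(points, p, eps, min_pts):
    """Distance from p to its min_pts-th nearest neighbor (counting p),
    or UNDEFINED if fewer than min_pts points lie within eps of p."""
    dists = sorted(math.dist(points[p], points[q])
                   for q in neighborhood(points, p, eps))
    if len(dists) < min_pts:
        return UNDEFINED
    return dists[min_pts - 1]


def reachability_distance(points, p, o, eps, min_pts):
    """max(core-distance(o), dist(o, p)), or UNDEFINED if o is not core."""
    cd = core_distance(points, o, eps, min_pts)
    if cd == UNDEFINED:
        return UNDEFINED
    return max(cd, math.dist(points[o], points[p]))


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(core_distance(points, 0, eps=5.0, min_pts=3))             # 2.0
print(reachability_distance(points, 1, 0, eps=5.0, min_pts=3))  # 2.0
print(core_distance(points, 3, eps=5.0, min_pts=3))             # inf
```

Point 1 lies only 1.0 away from point 0, yet its reachability-distance is 2.0: the value can never drop below the core-distance of the object it is reached from.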
Algorithm
 FOR i FROM 1 TO SetOfObjects.size DO
 Object := SetOfObjects.get(i);
 IF NOT Object.Processed THEN
1. neighbors := SetOfObjects.neighbors(Object, eps);
2. Object.Processed := TRUE;
3. Object.reachability_distance := UNDEFINED;
4. Object.setCoreDistance(neighbors, eps, MinPts);
5. OrderedFile.write(Object);
6. IF Object.core_distance <> UNDEFINED THEN
 OrderSeeds.update(neighbors, Object);
 WHILE NOT OrderSeeds.empty() DO
 Object := OrderSeeds.next();
 Repeat steps 1, 2, 4 and 5 for this Object; if it is a
core object, call OrderSeeds.update(neighbors, Object)
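The pseudocode above could be turned into runnable Python roughly as follows. This is a sketch under several assumptions: Euclidean distance, brute-force neighbor search, `math.inf` standing in for UNDEFINED, and a binary heap with lazy deletion in place of the OrderSeeds priority queue. The function name `optics` and its return value (three lists instead of an ordered file) are choices made for this example only.

```python
import heapq
import math

UNDEFINED = math.inf  # assumption: UNDEFINED behaves like "larger than any distance"


def optics(points, eps, min_pts):
    """Return (order, reach, core) for a list of coordinate tuples."""
    n = len(points)
    processed = [False] * n
    reach = [UNDEFINED] * n   # step 3: every object starts UNDEFINED
    core = [UNDEFINED] * n
    order = []                # plays the role of OrderedFile

    def neighbors(i):
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]

    def update(seeds, nbrs, center):
        # Lower the reachability of unprocessed neighbors and queue them.
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core[center], math.dist(points[center], points[j]))
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    def visit(i, seeds):
        # Steps 1, 2, 4, 5 and the OrderSeeds update of step 6.
        nbrs = neighbors(i)
        processed[i] = True
        if len(nbrs) >= min_pts:
            dists = sorted(math.dist(points[i], points[j]) for j in nbrs)
            core[i] = dists[min_pts - 1]
        order.append(i)
        if core[i] != UNDEFINED:
            update(seeds, nbrs, i)

    for i in range(n):
        if processed[i]:
            continue
        seeds = []
        visit(i, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if not processed[j]:  # skip stale entries left by lazy deletion
                visit(j, seeds)
    return order, reach, core


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (30, 30)]
order, reach, core = optics(points, eps=5.0, min_pts=3)
for i in order:
    print(i, points[i], reach[i], core[i])
```

On this toy data the objects that open a new group (indices 0, 4 and 7) keep an UNDEFINED reachability-distance, and every other object gets 1.0. Pushing a fresh heap entry on each improvement, rather than decreasing a key in place, keeps the code short at the cost of some stale entries.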
 If the reachability-distance of the current object
Object is larger than the clustering-distance eps’
(an UNDEFINED value counts as larger)
 Object is not density-reachable with respect to eps’
from any of the objects located before it in the
cluster-ordering.
 We look at the core-distance of Object and start a new
cluster if Object is a core object with respect to eps’
and MinPts; otherwise, Object is assigned to NOISE
 If the reachability-distance of the current object is
at most eps’
 We can simply assign this object to the current cluster
because it is then density-reachable from a preceding
core object in the cluster-ordering.
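The two cases above amount to a single scan over the cluster-ordering. The sketch below assumes the ordering is already available as three Python lists and again uses `math.inf` for UNDEFINED. The input values are hand-made for illustration and the name `extract_dbscan` is hypothetical.

```python
import math

UNDEFINED = math.inf  # assumption: UNDEFINED compares larger than any eps'
NOISE = -1


def extract_dbscan(order, reach, core, eps_prime):
    """Assign a cluster id (or NOISE) to every object in one pass."""
    labels = {}
    cluster_id = -1
    for obj in order:
        if reach[obj] > eps_prime:
            # Not reachable from anything earlier in the ordering.
            if core[obj] <= eps_prime:
                cluster_id += 1          # obj is a core object: new cluster
                labels[obj] = cluster_id
            else:
                labels[obj] = NOISE
        else:
            labels[obj] = cluster_id     # reachable: stay in current cluster
    return labels


# Hand-made cluster-ordering: two dense groups followed by an outlier.
order = [0, 1, 2, 3, 4, 5]
reach = [UNDEFINED, 1.0, 1.0, UNDEFINED, 1.0, UNDEFINED]
core = [1.0, 1.0, 1.0, 1.0, 1.0, UNDEFINED]

print(extract_dbscan(order, reach, core, eps_prime=2.0))
# {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: -1}
```

Re-running the same scan with a smaller eps’ (for example 0.5) marks every object as NOISE without recomputing the ordering, which is the practical benefit of storing the two distances.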
Reachability plot is rather insensitive to the input
parameters
• The smaller the Eps value, the more objects have an
UNDEFINED reachability-distance
• With lower MinPts values the reachability plot looks
more jagged; higher values smooth the curve.
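A reachability plot is just the reachability-distances drawn as bars in cluster-ordering. A rough text-only rendering, assuming `math.inf` stands in for UNDEFINED and using hand-made values, might look like this. The function name `text_reachability_plot` and the bar width are arbitrary choices for the example.

```python
import math


def text_reachability_plot(order, reach, width=20):
    """One row per object in cluster-ordering; UNDEFINED fills the whole row."""
    finite = [reach[i] for i in order if math.isfinite(reach[i])]
    top = max(finite, default=1.0) or 1.0  # avoid dividing by zero
    rows = []
    for i in order:
        if math.isfinite(reach[i]):
            bar = "#" * max(1, round(width * reach[i] / top))
            rows.append(f"{i:>3} {bar} {reach[i]:.2f}")
        else:
            rows.append(f"{i:>3} {'#' * width} UNDEFINED")
    return rows


# Hand-made values: a dense group, a jump, then a looser group.
order = [0, 1, 2, 3, 4, 5, 6]
reach = [math.inf, 1.0, 1.0, 1.0, 4.0, 1.5, 1.5]
rows = text_reachability_plot(order, reach)
print("\n".join(rows))
```

Runs of short bars are the valleys that correspond to clusters, and a tall bar marks the jump to the next group.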
Thank You
