An Introduction to Torch: A Machine Learning Library in C++. Analysis and Implementation of the CLARANS K-medoid Clustering Algorithm in the Java Programming Language. Application of K-medoid to Image Processing. By Adeyemi Fowe. CPSC 7375 (Machine Learning), Spring 2008. Instructor: Dr. Mariofanna (Fani) Milanova. Computer Science Department, University of Arkansas at Little Rock. Final Project Presentation.
Torch: www.torch.ch. Usage: powerful and fast; steep learning curve; made for the Linux environment; object-oriented C++ structure, but written largely in C-style code; open source, distributed as plain-text .cc source files. Features: gradient machines, support vector machines, ensemble models, K-nearest neighbors, distributions and classifiers, speech recognition tools.
Clustering (Unsupervised Learning). Different types of clustering: Partitioning algorithms: K-means, K-medoid. Hierarchical clustering: a tree of clusters rather than disjoint clusters. Density-based clustering: clusters formed around regions of high concentration (density). Statistical clustering: statistical techniques such as probability and hypothesis testing.
K-means and K-medoid: K-means clustering uses the exact center of a cluster (the mean, or center of gravity), while K-medoid uses the most centrally located object in a cluster (the medoid). K-medoid is less sensitive to outliers than K-means. In both, the value of K (the number of clusters) has to be determined a priori.
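To make the outlier claim concrete, here is a minimal Java sketch (the class MeanVsMedoid and its methods are hypothetical, written only for illustration) that computes both kinds of center for a one-dimensional cluster, taking the medoid to be the data point with the smallest total absolute distance to all the others.

```java
public class MeanVsMedoid {

    // K-means style center: the arithmetic mean ("center of gravity").
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // K-medoid style center: the data point with the smallest total distance to all points.
    static double medoid(double[] xs) {
        double best = xs[0];
        double bestCost = Double.POSITIVE_INFINITY;
        for (double candidate : xs) {
            double cost = 0.0;
            for (double x : xs) cost += Math.abs(x - candidate);
            if (cost < bestCost) {
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] clean = {1, 2, 3, 4, 5};
        double[] withOutlier = {1, 2, 3, 4, 100};
        System.out.println("clean:        mean " + mean(clean) + ", medoid " + medoid(clean));
        System.out.println("with outlier: mean " + mean(withOutlier) + ", medoid " + medoid(withOutlier));
    }
}
```

For the values 1, 2, 3, 4, 5 both centers are 3.0; replacing 5 with the outlier 100 drags the mean to 22.0, while the medoid stays at 3.0 because it must be one of the actual data points.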
K-medoid Algorithms: PAM (Partitioning Around Medoids) was developed by Kaufman and Rousseeuw (1990). CLARA (Clustering LARge Applications) was also designed by Kaufman and Rousseeuw, to handle large data sets. CLARANS (Clustering Large Applications based on Randomized Search) was proposed by Raymond T. Ng and Jiawei Han (2002).
CLARANS Minimum Cost Search: The diagram illustrates how the CLARANS algorithm performs a randomized search for the minimum-cost set of medoids over the entire data set, swapping one medoid at a time.
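The randomized search can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the project's original code: points are stored as double[] rows, Euclidean distance is assumed, and the names Clarans, clarans, cost, and randomNode are hypothetical. The parameters numLocal (number of restarts) and maxNeighbor (number of consecutive failed random swaps before a node is accepted as a local minimum) follow Ng and Han's numlocal and maxneighbor.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal CLARANS sketch: numLocal random restarts, each a random-swap descent.
public class Clarans {

    // Euclidean distance between two points (an assumed metric for this sketch).
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    // Total cost of a node: each point contributes its distance to the nearest medoid.
    static double cost(double[][] pts, int[] medoids) {
        double total = 0.0;
        for (double[] p : pts) {
            double best = Double.POSITIVE_INFINITY;
            for (int m : medoids) best = Math.min(best, dist(p, pts[m]));
            total += best;
        }
        return total;
    }

    // True if v occurs among the first len entries of arr.
    static boolean contains(int[] arr, int len, int v) {
        for (int i = 0; i < len; i++) if (arr[i] == v) return true;
        return false;
    }

    // Picks k distinct point indices as an arbitrary starting node.
    static int[] randomNode(int n, int k, Random rnd) {
        int[] node = new int[k];
        int filled = 0;
        while (filled < k) {
            int c = rnd.nextInt(n);
            if (!contains(node, filled, c)) node[filled++] = c;
        }
        return node;
    }

    // Requires 0 < k < pts.length and numLocal >= 1.
    static int[] clarans(double[][] pts, int k, int numLocal, int maxNeighbor, long seed) {
        Random rnd = new Random(seed);
        int n = pts.length;
        int[] bestNode = null;
        double minCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < numLocal; i++) {
            int[] current = randomNode(n, k, rnd);
            double currentCost = cost(pts, current);
            int j = 0; // consecutive neighbors examined without improvement
            while (j < maxNeighbor) {
                // Random neighbor: replace one medoid with one randomly chosen non-medoid.
                int[] neighbor = current.clone();
                int candidate;
                do {
                    candidate = rnd.nextInt(n);
                } while (contains(current, k, candidate));
                neighbor[rnd.nextInt(k)] = candidate;
                double neighborCost = cost(pts, neighbor);
                if (neighborCost < currentCost) {
                    current = neighbor;   // move to the cheaper neighbor
                    currentCost = neighborCost;
                    j = 0;                // and restart the neighbor count
                } else {
                    j++;
                }
            }
            // No improvement in maxNeighbor tries: keep the best local minimum found so far.
            if (currentCost < minCost) {
                minCost = currentCost;
                bestNode = current;
            }
        }
        return bestNode;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10}, {10, 11}, {11, 10}, {11, 11}};
        int[] medoids = clarans(pts, 2, 2, 100, 42L);
        System.out.println("medoids " + Arrays.toString(medoids) + ", cost " + cost(pts, medoids));
    }
}
```

Each restart descends from a random node until maxNeighbor consecutive random swaps fail to lower the cost; a larger maxNeighbor examines more of each node's neighbors, trading running time for search quality.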
Java Implementation of CLARANS K-medoid Algorithm
To form a cluster (image classification), a medoid has to navigate within this 3-D space, made up of each pixel's two image coordinates and its gray value, to find the closest set of pixels. This makes K-medoid take the pixel gray values into consideration while clustering.
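A minimal Java sketch of this 3-D representation, assuming the image is a gray-scale array gray[row][col] and that the three axes are row, column, and gray value. PixelSpace and Pixel are hypothetical names, and the record syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class PixelSpace {

    // One pixel as a point in the assumed 3-D space (row, column, gray value).
    record Pixel(int row, int col, int gray) {}

    // Flattens a gray-scale image stored as gray[row][col] into a list of 3-D points.
    static List<Pixel> toPoints(int[][] gray) {
        List<Pixel> pts = new ArrayList<>();
        for (int r = 0; r < gray.length; r++)
            for (int c = 0; c < gray[r].length; c++)
                pts.add(new Pixel(r, c, gray[r][c]));
        return pts;
    }

    public static void main(String[] args) {
        int[][] img = {{0, 255}, {128, 64}};
        for (Pixel p : toPoints(img)) System.out.println(p);
    }
}
```

A K-medoid routine can then treat each Pixel as one object, so every medoid is itself an actual pixel of the image.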
Spectral and Spatial Pattern Recognition: Spectral pattern recognition is based on the set of spectral radiance measurements obtained in the various wavelength bands for each pixel. Spatial pattern recognition involves categorizing image pixels on the basis of their spatial relationship with the pixels surrounding them. The aim of this experiment is to delineate the behavior of the K-medoid clustering algorithm while varying these two criteria. We want to show that changing the weight w trades off spectral against spatial pattern recognition in an image.
Spatial and Spectral Differences: The cost of assigning node i to representative pixel j combines their spatial and spectral differences, weighted by w. The weight w serves as a measure of our preference for spatial or spectral pattern recognition; it is a weight metric for the preference structure in MCDA (multi-criteria decision analysis). When w = 0: spatial pattern only. When w = 1: spectral pattern only. When 0 < w < 1: both spatial and spectral patterns are considered, a typical MADA.
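The exact cost formula is not given here, so the Java sketch below is only an assumption consistent with the stated behavior of w: a convex combination (1 - w) · spatial + w · spectral, using Manhattan distance between pixel positions (suggested by the Manhattan-shaped clusters reported for w = 0) and the absolute gray-value difference as the spectral term. WeightedPixelCost and its members are hypothetical names.

```java
public class WeightedPixelCost {

    record Pixel(int row, int col, int gray) {}

    // ASSUMED form of the cost: (1 - w) * spatial + w * spectral.
    // w = 0 gives spatial only, w = 1 gives spectral only, as described for the weight w.
    static double cost(Pixel i, Pixel j, double w) {
        if (w < 0.0 || w > 1.0) throw new IllegalArgumentException("w must lie in [0, 1]");
        // Spatial difference: Manhattan (city-block) distance between positions (assumption).
        double spatial = Math.abs(i.row() - j.row()) + Math.abs(i.col() - j.col());
        // Spectral difference: absolute gray-value difference (assumption).
        double spectral = Math.abs(i.gray() - j.gray());
        return (1.0 - w) * spatial + w * spectral;
    }

    public static void main(String[] args) {
        Pixel a = new Pixel(0, 0, 10);
        Pixel b = new Pixel(3, 4, 50);
        for (double w : new double[] {0.0, 0.5, 1.0})
            System.out.println("w = " + w + " -> cost " + cost(a, b, w));
    }
}
```

For pixels (0, 0, gray 10) and (3, 4, gray 50) this returns 7.0 at w = 0, 23.5 at w = 0.5, and 40.0 at w = 1. Because gray values (typically 0 to 255 for 8-bit images) and pixel offsets live on different scales, a practical version would probably normalize both terms before weighting them.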
This clearly displays Manhattan-shaped clusters for w = 0, where only spatial properties are considered. The decision maker needs to consider how the edges of the clusters should be formed; this decision will most likely be informed by the type of information to be extracted.
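One informal way to see how the distance function shapes cluster edges is to draw the set of pixels lying within a fixed distance of a medoid. The hypothetical Java sketch below (MetricShapes and its members are illustrative names) does this for Manhattan distance, Euclidean distance, and, for comparison, Chebyshev (chessboard) distance, which the slides do not discuss.

```java
import java.util.function.IntBinaryOperator;

public class MetricShapes {

    // Spatial distances from a pixel offset (dx, dy) to a medoid at the origin.
    static final IntBinaryOperator MANHATTAN = (dx, dy) -> Math.abs(dx) + Math.abs(dy);
    static final IntBinaryOperator CHEBYSHEV = (dx, dy) -> Math.max(Math.abs(dx), Math.abs(dy));
    // Squared Euclidean distance, kept in integers so the comparison with r * r is exact.
    static final IntBinaryOperator EUCLIDEAN_SQ = (dx, dy) -> dx * dx + dy * dy;

    // Draws the (2r+1) x (2r+1) neighborhood: '#' = within limit, '.' = outside.
    static String render(IntBinaryOperator metric, int r, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int dy = -r; dy <= r; dy++) {
            for (int dx = -r; dx <= r; dx++)
                sb.append(metric.applyAsInt(dx, dy) <= limit ? '#' : '.');
            sb.append('\n');
        }
        return sb.toString();
    }

    // Counts the pixels of the neighborhood that lie within limit.
    static int countWithin(IntBinaryOperator metric, int r, int limit) {
        int n = 0;
        for (int dy = -r; dy <= r; dy++)
            for (int dx = -r; dx <= r; dx++)
                if (metric.applyAsInt(dx, dy) <= limit) n++;
        return n;
    }

    public static void main(String[] args) {
        int r = 3;
        System.out.println("Manhattan:\n" + render(MANHATTAN, r, r));
        System.out.println("Euclidean:\n" + render(EUCLIDEAN_SQ, r, r * r));
        System.out.println("Chebyshev:\n" + render(CHEBYSHEV, r, r));
    }
}
```

With r = 3 the Manhattan neighborhood is a diamond of 25 pixels whose edges run at 45° to the pixel grid, the Euclidean one an approximate disk of 29 pixels, and the Chebyshev one a 7 × 7 square of 49 pixels whose edges follow the grid.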
Conclusion: We implemented the CLARANS algorithm for K-medoid clustering, which is more efficient than PAM and CLARA, in the Java programming language. We used our code to explore the differences between distance functions, a choice that can be left to the user. We showed that the choice of distance function should depend on the expected edge orientation of the clusters.
References
• Chan, Y. (2001). Location Theory and Decision Analysis. ITP/South-Western.
• Chan, Y. Location, Transport and Land-Use: Modeling Spatial-Temporal Information. Heidelberg, Germany: Springer-Verlag.
• Wittenbrink, C. M., Langdon, G., Jr., and Fernandez, G. (1999). Feature Extraction of Clouds from GOES Satellite Data for Integrated Model Measurement Visualization. Working paper.
• Ng, R. T., and Han, J. (1994). Efficient and Effective Clustering Methods for Spatial Data Mining. Proceedings of the 20th VLDB Conference, Santiago, Chile.
• Zaiane, O. R., Foss, A., Lee, C.-H., and Wang, W. On Data Clustering Analysis: Scalability, Constraints and Validation. Working paper.
• Dittberner, G. J. (2001). NOAA's GOES Satellite System: Status and Plans.
• Environment Canada. Weather Satellites Teacher's Guide. Cat. No. En56-172/2001E-IN, ISBN 0-662-31474-3.
• ArcView User's Manual.
• Marcel, S., and Rodriguez, Y. Torch3vision. http://torch3vision.idiap.ch/
• Collobert, R., Bengio, S., and Mariéthoz, J. (2002). Torch: A Modular Machine Learning Software Library. Technical Report IDIAP-RR 02-46, IDIAP.
• Kaufman, L., and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
Websites: http://goes2.gsfc.nasa.gov, http://www.osd.noaa.gov/sats/goes.htm, http://rsd.gsfc.nasa.gov/goes/, http://gtielectronics.com
Images: http://images.ibsys.com/sh/images/weather/auto/2xat_ir_anim.gif, http://ali.apple.com/space/space_images/9908212300Bret.jpg, http://www.esri-ireland.ie/graphics/products/Image/ArcGIS_diag.jpg, http://www.noaanews.noaa.gov/stories2006/images/goes-over-earth2.jpg, http://www.slipperybrick.com/wp-content/uploads/2007/08/escape-key.jpg