This document describes a clustering-based gene selection method for classifying samples in microarray data. The method uses mutual information to measure two quantities: the relevance of each gene to the class labels, and the redundancy between pairs of genes. Genes are clustered according to their relevance, and the most relevant gene in each cluster is selected as its representative. Min-hash clustering is then applied to remove redundant genes and reduce cluster size. The goal is to select a minimal set of non-redundant genes that classifies samples accurately, since discarding irrelevant genes reduces the noise they would otherwise introduce.
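
The relevance and redundancy measures can be illustrated with a minimal Python sketch. It assumes expression values have already been discretized, and it replaces the clustering and min-hash steps with a simple greedy filter, so it is a simplified stand-in rather than the method described here. The function names, the toy data, and the `redundancy_threshold` value are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_genes(genes, labels, redundancy_threshold=0.5):
    """Greedy stand-in for the clustering step (an assumption, not the source method).

    Genes are ranked by relevance (MI with the labels). A gene is kept only if
    its relevance is positive and its MI with every already-kept gene is below
    the hypothetical redundancy_threshold.
    """
    relevance = {name: mutual_information(vals, labels) for name, vals in genes.items()}
    ranked = sorted(genes, key=lambda g: relevance[g], reverse=True)
    selected = []
    for g in ranked:
        if relevance[g] <= 0:
            continue  # irrelevant gene: carries no information about the labels
        if all(mutual_information(genes[g], genes[s]) < redundancy_threshold
               for s in selected):
            selected.append(g)
    return selected, relevance


# Toy data (hypothetical): 8 samples, 2 classes, 4 discretized genes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly tracks the labels
    "g2": [1, 1, 1, 1, 0, 0, 0, 0],  # inverse of g1: relevant but redundant
    "g3": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the labels
    "g4": [0, 0, 0, 1, 1, 1, 1, 1],  # noisy copy of g1
}

selected, relevance = select_genes(genes, labels)
print("selected:", selected)
for name, score in relevance.items():
    print(name, round(score, 3))
```

In this toy example, `g2` and `g4` carry information about the labels but are dropped because they share too much information with `g1`, while `g3` is dropped as irrelevant. In practice the threshold and the discretization scheme would need to be tuned to the data.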