This paper proposes an approach for indexing multimedia clips by speaker that combines audio segmentation, speaker recognition, and metadata indexing. Segmentation uses the Bayesian Information Criterion (BIC) together with silence detection. Gaussian Mixture Models (GMMs) are trained for each speaker using Mel-Frequency Cepstral Coefficients (MFCCs) as features. Because GMM training is stochastic, an ensemble of three GMMs is used to reduce the resulting errors. For indexing, MFCC features sampled from each segment are stored as metadata linked to the speaker models. The system achieves a 20% true miss rate and a 10% false alarm rate on segments of 15-25 seconds; performance decreases for shorter segments.
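
To make the BIC segmentation step concrete, the following is a minimal sketch of Delta-BIC change-point detection, not the implementation used in this work. It assumes diagonal-covariance Gaussians (a simplification that keeps the example dependency-free) and uses synthetic feature vectors in place of real MFCC frames. The function names, the `margin` and `penalty_weight` parameters, and the synthetic data are all hypothetical and chosen for illustration only.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of a diagonal covariance: sum of log per-dimension variances."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for d in range(dim):
        col = [f[d] for f in frames]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, t, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at index t; positive favors a speaker change."""
    n = len(frames)
    dim = len(frames[0])
    left, right = frames[:t], frames[t:]
    # Likelihood gain of modeling the window with two Gaussians instead of one.
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # The two-model hypothesis adds dim means and dim variances (diagonal assumption).
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def best_change_point(frames, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best positive Delta-BIC split, or (None, 0.0)."""
    best_t, best_score = None, 0.0
    for t in range(margin, len(frames) - margin):
        score = delta_bic(frames, t, penalty_weight)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


def synthetic_frames(seed=0, n_per_speaker=200, dim=4, shift=3.0):
    """Hypothetical stand-in for MFCC frames: two Gaussian 'speakers' back to back."""
    rng = random.Random(seed)
    first = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_per_speaker)]
    second = [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_per_speaker)]
    return first + second


if __name__ == "__main__":
    frames = synthetic_frames()
    print(best_change_point(frames))
```

In this sketch a change point is accepted only when the likelihood gain from splitting the window exceeds the model-complexity penalty, so a homogeneous window yields no boundary. A full-covariance variant would replace `log_det_diag` with the log-determinant of the sample covariance matrix and adjust the parameter count in the penalty accordingly.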