
Speaker Diarization

PyCon KR 2017
Splitting Up the Presidential TV Debates (대선 TV토론 쪼개기)


1. Splitting Up the Presidential TV Debates (대선 TV토론 쪼개기): Speaker Diarization. Hongjoo LEE
2. Who am I?
     ● Machine Learning Engineer
       ○ Fraud Detection System
       ○ Software Defect Prediction
     ● Software Engineer
       ○ Email services (40+ mil. users)
       ○ High-traffic servers (IPC, networking, concurrent programming)
     ● MPhil, HKUST
       ○ Major: Software Engineering based on ML techniques
       ○ Research interests: ML, NLP, IR
3. Outline
     ● Problem definition
       ○ What is speaker diarization?
     ● Feature extraction
       ○ Featurizing the audio signal
       ○ Time domain vs. frequency domain
       ○ Mel-Frequency Cepstral Coefficients (MFCC)
     ● Segmentation
       ○ Chromagram: pitch-count vectorization
       ○ MFCC: Gaussian Mixture Model & the Bayesian Information Criterion
     ● Clustering
       ○ k-means
       ○ Hierarchical agglomerative clustering
4. What is Speaker Diarization?
     The process of partitioning an input audio stream into homogeneous segments according to speaker identity.
     Image credit: G. Friedland et al., "Prosodic and other Long-Term Features for Speaker Diarization", 2009.
     (Figure: a debate audio stream segmented by speaker: Sim Sang-jung, Moon Jae-in, Ahn Cheol-soo, Sim Sang-jung, Moon Jae-in.)
5. Time domain vs Frequency domain
     Image credit: http://www.cbcity.de/this-animation-will-tell-you-everything-about-the-connection-between-time-and-frequency-domain-of-a-signal
6. Time domain vs Frequency domain
     Image credit: http://www.eenewsanalog.com/content/signal-chain-basics-56-clock-jitter-demystified%E2%80%94random-jitter-and-phase-noise
7. Feature Extraction
     (Figure: successive frames f(t), f(t+1) of the waveform are mapped to d-dimensional feature vectors x_d[t], x_d[t+1].)
8. Time domain to Frequency domain
     import numpy as np
     import librosa as rs
     import librosa.display

     # waveform
     y, sample_rate = rs.load(wavefilename)
     rs.display.waveplot(y, sr=sample_rate)

     # short-time Fourier transform (magnitude)
     d = np.abs(rs.stft(y))

     # spectrogram (in dB)
     rs.display.specshow(rs.amplitude_to_db(d))

     # Mel-scale spectrogram
     s = rs.feature.melspectrogram(S=d**2)
     rs.display.specshow(rs.amplitude_to_db(s))
9. Chromagram
     import librosa
     import librosa.display

     y, sr = librosa.load(librosa.util.example_audio_file(), offset=15, duration=5)
     chroma = librosa.feature.chroma_stft(y=y, sr=sr)
     librosa.display.specshow(chroma)
10. Mel-Frequency Cepstral Coefficients
     import librosa.display
     import sklearn.preprocessing

     mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=10, n_fft=4096, hop_length=2048)
     librosa.display.specshow(mfccs)

     # standardize each coefficient across time
     mfccs_scaled = sklearn.preprocessing.scale(mfccs, axis=1)
     librosa.display.specshow(mfccs_scaled)
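     (Note: the GMM/BIC code on slide 21 operates on windows whose rows are feature frames. A minimal bridging sketch; the helper name mfcc_windows and the window length are assumptions, not from the slides:)

     # Sketch: slice the (n_mfcc x T) MFCC matrix into row-wise windows
     # for the GMM/BIC segmentation step. `win` (frames per window) is
     # an assumed parameter, not taken from the slides.
     def mfcc_windows(mfccs, win=100):
         frames = mfccs.T  # shape (T, n_mfcc): one row per frame
         return [frames[i:i + win] for i in range(0, len(frames) - win + 1, win)]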
11. Segmentation
  12. Segmentation
13. Segmentation
     Chroma matrix (12 pitch classes x N frames); e.g. three consecutive frames:
     pitch  B     Bb    A     Ab    G     Gb    F     E     Eb    D     Db    C
     t      0.08  0.    0.16  0.18  0.16  0.24  0.08  0.    0.    0.04  0.    0.06
     t+1    0.    0.02  0.    0.12  0.3   0.18  0.22  0.14  0.02  0.    0.    0.
     t+2    0.02  0.08  0.14  0.16  0.14  0.2   0.08  0.04  0.04  0.    0.06  0.04
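     (One way to turn the chromagram into per-segment vectors for clustering is to average the chroma columns over fixed-length segments. A minimal sketch; the segment length and the averaging choice are assumptions, as the slides only show the 12 x N matrix:)

     # Sketch: average 12-dimensional chroma columns over fixed-length
     # segments to get one vector per segment (segment length is assumed).
     import numpy as np

     def chroma_segments(chroma, seg_frames=50):
         n = chroma.shape[1] // seg_frames
         return np.array([chroma[:, i*seg_frames:(i+1)*seg_frames].mean(axis=1)
                          for i in range(n)])  # shape (n_segments, 12)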
14. Clustering
     ● k-means
     Image credit: https://www.edureka.co/blog/k-means-clustering/
15. Clustering
     ● k-means
       ○ predefined K: 2
       ○ distance measure: Euclidean distance
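     (A minimal sketch of this step with scikit-learn, clustering the per-segment chroma vectors from the earlier sketch into K = 2 speakers; chroma_segments is carried over from that sketch, not from the slides:)

     from sklearn.cluster import KMeans

     # K = 2 speakers; KMeans uses Euclidean distance by default
     km = KMeans(n_clusters=2, random_state=0)
     labels = km.fit_predict(chroma_segments(chroma))  # one label per segment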
16. Segmentation
     Boundary between two speakers.
17. Gaussian Mixture Model
     (Figure: a candidate boundary at frame t splits the feature stream into x_d[:t] and x_d[t+1:]; each side is modeled by a d-dimensional Gaussian over its N frames.)
18. Bayesian Information Criterion
     ● BIC is a likelihood criterion penalized by model complexity
     ● Used for model selection: which model fits the data better (higher penalized likelihood)?
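     (A standard statement of the criterion in the form used on the following slides, where #(M) is the number of free parameters of model M, N the number of samples, and λ a tunable penalty weight; the model with the higher value is preferred:)

     BIC(M) = \log L(X \mid M) - \frac{\lambda}{2}\,\#(M)\,\log N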
19. Bayesian Information Criterion
     The window can be modeled by one Gaussian, or by two Gaussians (M1 and M2).
20. Bayesian Information Criterion
     ● For segmentation: is it better to model the whole window with one model M (H0), or with two separate models M1 and M2 (H1)?
21. Bayesian Information Criterion
     import numpy as np
     from numpy import log
     from scipy.linalg import det

     lambda_c = 1.0  # penalty weight (tuning parameter)

     # window1, window2: arrays of shape (n_frames, d), one feature vector per row
     N1 = window1.shape[0]
     S1 = np.cov(window1, rowvar=False)
     N2 = window2.shape[0]
     S2 = np.cov(window2, rowvar=False)

     window = np.vstack([window1, window2])
     S = np.cov(window, rowvar=False)
     d = window.shape[1]

     # penalty: number of free parameters of a full-covariance Gaussian
     P = 0.5 * (d + 0.5 * d * (d + 1)) * log(N1 + N2)

     # Delta BIC in the convention of Chen & Gopalakrishnan (1998);
     # a positive value favors two models (H1), i.e. a speaker change
     BIC_delta = 0.5 * ((N1 + N2) * log(det(S))
                        - N1 * log(det(S1))
                        - N2 * log(det(S2))) - lambda_c * P
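     (Wrapping the computation above in a function, scanning for change points might look like this sketch; the window length, hop, and the function name delta_bic are assumptions, not from the slides:)

     # Sketch: scan candidate boundaries and keep those where Delta BIC > 0.
     # `delta_bic(w1, w2)` is assumed to package the computation above.
     def find_changes(frames, win=100, hop=10):
         changes = []
         for t in range(win, len(frames) - win, hop):
             if delta_bic(frames[t - win:t], frames[t:t + win]) > 0:
                 changes.append(t)  # speaker change likely at frame t
         return changes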
22. Bayesian Information Criterion
23. Bayesian Information Criterion
     from sklearn.datasets import make_blobs

     X, y = make_blobs(n_samples=500, n_features=2, centers=4, shuffle=True)
24. Bayesian Information Criterion
     import matplotlib.pyplot as plt
     from sklearn.cluster import KMeans
     from sklearn.mixture import GaussianMixture

     fig, (ax1, ax2) = plt.subplots(1, 2)

     range_n_clusters = [2, 3, 4, 5, 6]
     bics = []
     for n_clusters in range_n_clusters:
         clusterer = KMeans(n_clusters=n_clusters, random_state=0)
         cluster_labels = clusterer.fit_predict(X)

         # fit a GMM with the same number of components and record its BIC
         gmm = GaussianMixture(n_components=n_clusters)
         gmm.fit(X)
         bics.append(gmm.bic(X))

     # scatter of the data, colored by cluster label
     ax1.scatter(X[:, 0], X[:, 1], c=cluster_labels)

     # normalized BIC per candidate number of clusters (lower is better)
     ax2.bar(range_n_clusters,
             [(bic - min(bics)) / (max(bics) - min(bics)) for bic in bics])
25. Bayesian Information Criterion
26. Hierarchical Clustering
     ● Builds a binary tree over the data
       ○ by successively merging similar groups (agglomerative method)
       ○ or by splitting dissimilar groups (divisive method)
     ● Visualizing the tree provides a useful summary of the data
       ○ the "dendrogram" shows how clusters are merged
     ● Hierarchical clustering vs. k-means
       ○ k-means requires:
         ■ a predeclared number of clusters k
         ■ an initial guess of the centroids
         ■ a distance measure between a pair of data points
       ○ Hierarchical clustering requires:
         ■ only a measure of similarity between a pair of clusters
27. Agglomerative Clustering
     ● Bottom-up approach to hierarchical clustering
       ○ start: treat each data point as a singleton cluster
       ○ iterate: merge the pair of closest clusters
       ○ until: all clusters have been merged into a single cluster (or the distance measure exceeds a threshold)
28. Agglomerative Clustering
     In [1]: import numpy as np
     In [2]: X = np.array([[2.2, 3.1], [0. , 0.5], [1. , 0.8], [0.5, 0. ],
        ...:               [0.3, 1. ], [0.1, 2.5], [3.2, 1.3], [3. , 2. ],
        ...:               [2.7, 2. ], [1.5, 0.4], [0.5, 0.5]])
29. Agglomerative Clustering
     In [3]: from scipy.spatial.distance import pdist, squareform
     In [4]: dist_matrix = pdist(X)
     In [5]: squareform(dist_matrix)
     Out[5]:
     array([[ 0.  ,  3.41,  2.59,  3.54,  2.83,  2.18,  2.06,  1.36,  1.21,  2.79,  3.11],
            [ 3.41,  0.  ,  1.04,  0.71,  0.58,  2.  ,  3.3 ,  3.35,  3.09,  1.5 ,  0.5 ],
            [ 2.59,  1.04,  0.  ,  0.94,  0.73,  1.92,  2.26,  2.33,  2.08,  0.64,  0.58],
            [ 3.54,  0.71,  0.94,  0.  ,  1.02,  2.53,  3.  ,  3.2 ,  2.97,  1.08,  0.5 ],
            [ 2.83,  0.58,  0.73,  1.02,  0.  ,  1.51,  2.92,  2.88,  2.6 ,  1.34,  0.54],
            [ 2.18,  2.  ,  1.92,  2.53,  1.51,  0.  ,  3.32,  2.94,  2.65,  2.52,  2.04],
            [ 2.06,  3.3 ,  2.26,  3.  ,  2.92,  3.32,  0.  ,  0.73,  0.86,  1.92,  2.82],
            [ 1.36,  3.35,  2.33,  3.2 ,  2.88,  2.94,  0.73,  0.  ,  0.3 ,  2.19,  2.92],
            [ 1.21,  3.09,  2.08,  2.97,  2.6 ,  2.65,  0.86,  0.3 ,  0.  ,  2.  ,  2.66],
            [ 2.79,  1.5 ,  0.64,  1.08,  1.34,  2.52,  1.92,  2.19,  2.  ,  0.  ,  1.  ],
            [ 3.11,  0.5 ,  0.58,  0.5 ,  0.54,  2.04,  2.82,  2.92,  2.66,  1.  ,  0.  ]])
30. Agglomerative Clustering
     In [7]: from scipy.cluster.hierarchy import linkage
     In [8]: linkage(X)
     Out[8]:
     array([[  7.  ,   8.  ,   0.3 ,   2.  ],
            [  1.  ,  10.  ,   0.5 ,   2.  ],
            [  3.  ,  12.  ,   0.5 ,   3.  ],
            [  4.  ,  13.  ,   0.54,   4.  ],
            [  2.  ,  14.  ,   0.58,   5.  ],
            [  9.  ,  15.  ,   0.64,   6.  ],
            [  6.  ,  11.  ,   0.73,   3.  ],
            [  0.  ,  17.  ,   1.21,   4.  ],
            [  5.  ,  16.  ,   1.51,   7.  ],
            [ 18.  ,  19.  ,   1.92,  11.  ]])
     Each row merges two clusters (by index) at the given distance; the last column is the size of the new cluster.
31. Agglomerative Clustering
     In [10]: from scipy.cluster.hierarchy import dendrogram
     In [11]: dendrogram(linkage(X))
32. Hierarchical Clustering & Dendrogram
     ● suggests natural clusters
     ● outlier detection
     (Figure: cutting the dendrogram at different heights yields 2, 3, or 4 clusters.)
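     (Cutting the tree programmatically can be done with scipy's fcluster; a minimal sketch, where the choice of 3 clusters is illustrative:)

     from scipy.cluster.hierarchy import fcluster, linkage

     Z = linkage(X)
     # cut the tree into a fixed number of flat clusters, e.g. 2, 3, or 4
     labels = fcluster(Z, t=3, criterion='maxclust')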
33. Agglomerative Clustering
     ● Two segments can be modeled by one Gaussian or by two Gaussians
     (Figure: six initial segment clusters C1 ... C6.)
34. Agglomerative Clustering
     ● Two segments can be modeled by one Gaussian or by two Gaussians
     (Figure: among clusters C1 ... C6, ΔBIC(C2, C3) > ΔBIC(C2, C4).)
35. Agglomerative Clustering
     ● Measure of similarity
       ○ distance metric: the Bayesian Information Criterion (ΔBIC)
       ○ linkage method: merge, then update the ΔBIC table
     (Figure: starting from C1 ... C6, the merges are C7 = [C2, C4], C8 = [C7, C6], C9 = [C3, C5], C10 = [C1, C9], C11 = [C10, C8]; merging stops at ΔBIC(C9, C10) < 0, leaving 3 speakers: C1, C8, C9.)
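     (A sketch of this BIC-driven merging loop. It reuses the assumed delta_bic from slide 21, under whose sign convention a positive ΔBIC favors keeping two separate models, so we merge while the smallest pairwise ΔBIC is negative; note the slides' clustering figures appear to use the opposite sign convention:)

     # Sketch: BIC-based agglomerative clustering over segments.
     # `delta_bic(a, b)` (slide 21) scores splitting two frame arrays;
     # a negative value means one shared Gaussian fits better, so merge.
     import numpy as np

     def bic_agglomerate(segments):
         clusters = list(segments)  # each item: (n_frames, d) array
         while len(clusters) > 1:
             pairs = [(i, j) for i in range(len(clusters))
                             for j in range(i + 1, len(clusters))]
             scores = [delta_bic(clusters[i], clusters[j]) for i, j in pairs]
             best = int(np.argmin(scores))
             if scores[best] >= 0:  # every pair is better kept apart: stop
                 break
             i, j = pairs[best]
             merged = np.vstack([clusters[i], clusters[j]])
             clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
             clusters.append(merged)
         return clusters  # one cluster per speaker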
36. References
     ● Brian McFee et al., "librosa: Audio and music signal analysis in Python", 2015. http://librosa.github.io/librosa/
     ● Nishant Shukla, "Machine Learning with TensorFlow", Manning.
     ● Xavier Anguera et al., "Speaker Diarization: A Review of Recent Research", 2012.
     ● S. S. Chen and P. S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", 1998.
37. Contacts
     lee.hongjoo@yandex.com
     linkedin.com/in/hongjoo-lee
