IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 09 | Sep-2013, Available @ http://www.ijret.org 404
DYNAMIC THRESHOLDING ON SPEECH SEGMENTATION
Md. Mijanur Rahman¹, Md. Al-Amin Bhuiyan²
¹Assistant Professor, Dept. of Computer Science & Engineering, Jatiya Kabi Kazi Nazrul Islam University, Bangladesh
²Professor, Dept. of Computer Science & Engineering, Jahangirnagar University, Bangladesh
mijan_cse@yahoo.com, alamin_bhuiyan@yahoo.com
Abstract
The word is the preferred and natural unit of speech because word units have a well-defined acoustic representation. This paper presents
several dynamic thresholding approaches for segmenting continuous Bangla speech sentences into words/sub-words. We propose
three efficient methods for speech segmentation: two of them are usually used in pattern classification (i.e., k-means and
FCM clustering) and one is used in image segmentation (i.e., Otsu’s thresholding method). We also use two new techniques,
blocking black area and boundary detection, to properly detect word boundaries in continuous speech and label the entire
speech sentence as a sequence of words/sub-words. The k-means and FCM clustering methods produce better segmentation results than
Otsu’s method. All the algorithms and methods used in this research are implemented in MATLAB, and the proposed system
achieved an average segmentation accuracy of approximately 94%.
Keywords: Blocking Black Area, Clustering, Dynamic Thresholding, Fuzzy Logic and Speech Segmentation.
----------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
Automated segmentation of speech has been a subject of study
for over 30 years [1] and plays a central role in many speech
processing and ASR applications. It is important to various
automated speech processing tasks, such as speech recognition,
speech corpus collection and speaker verification, in the
research field of natural language processing [2,3]. The
traditional manual segmentation approach is impractical for
very large databases because it is extremely laborious and tedious.
This is why automated methods are widely used. A common
approach to identifying speech signal zones is to use a
threshold value. In many papers, speech segmentation is done
using wavelets [4], fuzzy methods [5], artificial neural
networks [6] and hidden Markov models [7]. This paper
proposes several pattern classification and image
segmentation methods for segmenting continuous Bangla
speech into words/sub-words. To do so, we use
modified versions of the k-means, fuzzy c-means and Otsu’s
algorithms together with the blocking black area method and shape
identification.
The paper is organized as follows: Section 1 introduces
speech processing and the organization of this
paper. Section 2 discusses speech segmentation
and the types of segmentation. Section 3 describes pattern
clustering and different clustering algorithms. Section 4
discusses Otsu’s thresholding method. The
methodological steps of the proposed system are described
in Section 5. Sections 6 and 7 present the experimental
results and conclusions, respectively.
2. SPEECH SEGMENTATION
A speech recognition system requires segmentation of the speech
signal into discrete, non-overlapping acoustic units [8][9]. The unit
can be at the segment, phone, syllable, word, sentence or dialog
turn level. The word is the preferred and natural unit of speech
because word units have a well-defined acoustic representation,
so we have chosen the word as our basic unit. Automatic
speech segmentation methods can be classified in many ways,
but one very common classification is the division into blind and
aided segmentation algorithms [10]. A central difference
between aided and blind methods is how much the
segmentation algorithm uses previously obtained data or
external knowledge to process the expected speech. The term
blind segmentation refers to methods that use no pre-existing
or external knowledge regarding linguistic properties,
such as orthography or the full phonetic annotation, of the
signal to be segmented. Aided
segmentation algorithms, on the other hand, use some sort of external linguistic
knowledge of the speech stream to segment it into
corresponding segments of the desired type. Due to the lack of
external information, the first phase of blind segmentation
relies entirely on the acoustical features present in the signal.
The second phase is usually built on front-end processing of
the speech signal using MFCCs, LP coefficients, or the pure FFT
spectrum [11]. In our research, we have used FFT
spectrum features (the spectrogram) of the speech signal.
3. CLUSTERING
Clustering is the process of assigning a set of objects to a set
of disjoint groups called clusters so that objects in the same
cluster are more similar to each other than objects from
different clusters. Clustering techniques are applied in many
application areas such as pattern recognition [12], data mining
[13] and machine learning [14]. Clustering algorithms can be
broadly classified as hard, fuzzy, possibilistic and
probabilistic [15], each of which has its own special
characteristics. Applications of clustering
methods range from unsupervised learning in neural networks to
pattern recognition, classification analysis, artificial
intelligence, image processing and machine vision. Several
clustering methods have been widely studied and successfully
applied in image segmentation [16–22].
In this paper, hard and fuzzy clustering schemes are
used to obtain the optimal threshold for speech
segmentation. The conventional hard clustering method
restricts each point of the data set to exactly one
cluster, so the membership values are either 0 or 1. The
k-means clustering algorithm is a widely used hard clustering
method. The fuzzy clustering method, on the other hand, allows
an object to belong to several clusters at the same time,
with different degrees of membership between 0 and 1.
The fuzzy c-means algorithm is the most popular fuzzy
clustering method used in pattern classification.
3.1 K-Means Clustering Algorithm
K-means clustering is an unsupervised learning
algorithm. The term was first used by James MacQueen in 1967 [23], and
the algorithm was first proposed by Stuart Lloyd in 1957 as a technique for
pulse-code modulation [24]. It is also referred to as Lloyd's
algorithm [25]. Here, a modified algorithm is used to
compute the threshold from the speech spectrogram. The
algorithm classifies a given data set into a certain number
of clusters (i.e., k clusters) based on distance measures, defines
k centers, one for each cluster, and finally computes the
threshold. The algorithm iterates until no data point
moves to a different cluster, as shown in Figure-1.
It aims at minimizing an objective function,
known as the squared error function, given by:
$$J = \sum_{i=1}^{k} \sum_{j=1}^{c_i} \left\| x_j^{(i)} - v_i \right\|^2$$
where:
$\| x_j^{(i)} - v_i \|$ is the Euclidean distance between the jth data point of the ith cluster, $x_j^{(i)}$, and the cluster center $v_i$;
‘ci’ is the number of data points in the ith cluster; and
‘k’ is the number of cluster centers.
The algorithmic steps for calculating a threshold using k-means
clustering are given below:
Let x = {x1, x2, x3,…,xn} be the set of data values in the speech
spectrogram.
1) Assign the number of clusters, k=3, and define the 3 cluster
centers.
Let v = {v1,v2,v3} be the set of 3 cluster centers.
2) Calculate the distance (dij) between the ith data value and the jth cluster
center.
3) Assign each data value to the closest cluster, i.e., the one at minimum
distance.
4) Recalculate the new cluster centers using:
$$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j^{(i)}$$
where ‘ci’ represents the number of data points in the ith cluster and $x_j^{(i)}$ is the jth data point assigned to that cluster.
5) Recalculate the distance between each data value and the newly
obtained cluster centers.
6) If no data value was reassigned then stop; otherwise repeat from
step (3).
7) Calculate the desired threshold as the average of the 3
cluster centers.
Figure 1. K-Means Clustering Algorithm - Calculating the
desired threshold.
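The steps of Figure 1 can be sketched for one-dimensional data as follows. This is a minimal illustration in plain Python rather than the MATLAB implementation used in the paper; the function name `kmeans_threshold`, the evenly spaced initial centers and the `max_iter` cap are assumptions made for the example.

```python
def kmeans_threshold(values, k=3, max_iter=100):
    """Cluster 1-D values into k groups; return (threshold, centers)."""
    data = sorted(values)
    lo, hi = data[0], data[-1]
    # Assumption: initial centers are spread evenly over the data range.
    centers = [lo + (hi - lo) * (2 * j + 1) / (2 * k) for j in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Steps 2-3: assign each value to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: abs(x - centers[j])) for x in data
        ]
        if new_assignment == assignment:
            break  # Step 6: no value changed cluster, so stop.
        assignment = new_assignment
        # Step 4: move each center to the mean of its members.
        for j in range(k):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = sum(members) / len(members)
    # Step 7: the threshold is the average of the cluster centers.
    return sum(centers) / k, centers


threshold, centers = kmeans_threshold(
    [0.1, 0.12, 0.08, 0.5, 0.52, 0.48, 0.9, 0.92, 0.88]
)
```

For this toy input the three centers settle near 0.1, 0.5 and 0.9, so the threshold is close to 0.5.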
3.2 FCM Clustering Algorithm
The algorithm was developed by J. C. Bezdek in 1973 for pattern
classification [26] and first reported by J. C. Dunn in 1974
[27]; later, J. C. Bezdek used this algorithm for pattern
recognition [28]. FCM is a generalization of k-means. While
k-means assigns each object to one and only one cluster,
FCM allows clusters to be fuzzy sets, so that each object
belongs to all clusters with varying degrees of membership,
subject to the following restriction: the sum of all membership
degrees for any given data point is equal to 1. It works by
assigning a membership to each object on the basis of the distance
between the cluster center and the object: the nearer the object is
to the cluster center, the higher its membership. The
algorithm is iterative, as shown in Figure-2. After
each iteration, the memberships and cluster centers are updated,
and finally a threshold is computed from the cluster centroids.
The algorithmic steps for calculating a threshold using fuzzy c-means
clustering are given below:
Let x = {x1, x2, x3, ..., xn} be the set of data values in the speech
spectrogram.
1) Assign the number of clusters, c=3, and define the cluster
centers.
Let v = {v1, v2, v3} be the set of 3 cluster centers.
2) Calculate the Euclidean distance (dij) between the ith data value and
the jth cluster center.
3) Update the fuzzy membership function (µij) using:
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, c$$
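4) Compute the fuzzy centers 'vj' using:
$$v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^m \, x_i}{\sum_{i=1}^{n} (\mu_{ij})^m}, \quad j = 1, 2, 3, \ldots, c$$
where m is the fuzziness index, m ∈ [1, ∞); in practice m > 1, because the exponent 2/(m−1) in step (3) is undefined at m = 1.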
5) Repeat from step (2) until the minimum 'J' value is
achieved or $\| U^{(k+1)} - U^{(k)} \| < e$,
where:
k is the iteration step;
e is the termination criterion, between 0 and 1;
$U = (\mu_{ij})_{n \times c}$ is the fuzzy membership matrix; and
J is the objective function, obtained by:
$$J = \sum_{j=1}^{c} \sum_{i=1}^{n} (\mu_{ij})^m \, d_{ij}^2$$
6) Calculate the desired threshold as the average of the 3
cluster centroids.
Figure 2. FCM Clustering Algorithm - Calculating the desired
threshold.
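The FCM steps of Figure 2 can be sketched in the same way. This is an illustrative plain-Python version for one-dimensional data, not MATLAB's ‘fcm’ function; the name `fcm_threshold`, the evenly spaced initial centers, the default m = 2 and the max-norm stopping test are assumptions made for the example.

```python
def fcm_threshold(values, c=3, m=2.0, eps=1e-6, max_iter=200):
    """Fuzzy c-means on 1-D values; return (threshold, centers, memberships)."""
    data = list(values)
    lo, hi = min(data), max(data)
    # Assumption: initial centers are spread evenly over the data range.
    centers = [lo + (hi - lo) * (2 * j + 1) / (2 * c) for j in range(c)]
    u = [[0.0] * c for _ in data]
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        new_u = []
        for x in data:
            # Step 2: distance from this value to every center.
            d = [abs(x - v) for v in centers]
            if min(d) == 0.0:
                # The value sits on a center: full membership there.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                row = [h / sum(hits) for h in hits]
            else:
                # Step 3: membership update; each row sums to 1.
                row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
            new_u.append(row)
        # Step 4: centers are membership-weighted means of the data.
        for j in range(c):
            w = [row[j] ** m for row in new_u]
            total = sum(w)
            if total > 0.0:
                centers[j] = sum(wi * x for wi, x in zip(w, data)) / total
        # Step 5: stop when the membership matrix barely changes.
        change = max(
            abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb)
        )
        u = new_u
        if change < eps:
            break
    # Step 6: the threshold is the average of the cluster centroids.
    return sum(centers) / c, centers, u
```

Unlike the hard assignment in k-means, every value contributes to every center here, weighted by its membership raised to the power m.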
4. OTSU’S THRESHOLDING METHOD
Thresholding is the simplest method of image segmentation; it
can be used to create a binary image from a grayscale image. In
order to convert the image into a binary representation, we first
converted the image into a grayscale representation and
then performed threshold analysis [29]. Otsu’s
method is a simple and effective automatic thresholding
method, invented by Nobuyuki Otsu in 1979 [30] and also known
as a binarization algorithm. It is used to automatically perform
histogram shape-based image thresholding, i.e., the reduction
of a grayscale image to a binary image. The algorithm
assumes that the image is composed of two basic classes:
foreground and background [30]. It then computes the optimal
threshold value that minimizes the weighted within-class
variance, which is equivalent to maximizing the between-class variance of these
two classes. Otsu’s threshold is used in many applications,
from medical imaging to low-level computer vision. Our main
aim is to use Otsu’s threshold in speech segmentation.
Figure-3 shows the step-wise Otsu’s algorithm.
4.1 Mathematical Formulation
The mathematical formulation of Otsu’s method for
computing the optimum threshold is given below:
Let P(i) represent the image histogram of the speech
spectrogram. The two class probabilities w1(t) and w2(t) at
level t are computed by:
$$w_1(t) = \sum_{i=1}^{t} P(i) \quad \text{and} \quad w_2(t) = \sum_{i=t+1}^{I} P(i)$$
The class means, μ1(t) and μ2(t), are:
$$\mu_1(t) = \sum_{i=1}^{t} \frac{i \, P(i)}{w_1(t)} \quad \text{and} \quad \mu_2(t) = \sum_{i=t+1}^{I} \frac{i \, P(i)}{w_2(t)}$$
The individual class variances are:
$$\sigma_1^2(t) = \sum_{i=1}^{t} [i - \mu_1(t)]^2 \, \frac{P(i)}{w_1(t)} \quad \text{and} \quad \sigma_2^2(t) = \sum_{i=t+1}^{I} [i - \mu_2(t)]^2 \, \frac{P(i)}{w_2(t)}$$
The within-class variance ($\sigma_w^2$) is defined as the weighted sum
of the variances of the two classes and is given by:
$$\sigma_w^2(t) = w_1(t) \, \sigma_1^2(t) + w_2(t) \, \sigma_2^2(t)$$
The between-class variance ($\sigma_b^2$) is defined as the difference between the
total variance and the within-class variance and is given by:
$$\sigma_b^2(t) = \sigma^2 - \sigma_w^2(t) = w_1(t) \, w_2(t) \, [\mu_1(t) - \mu_2(t)]^2$$
These two variances, $\sigma_w^2$ and $\sigma_b^2$, are calculated for all possible
thresholds, t = 0, …, I (the maximum intensity). Otsu’s method finds the
threshold that minimizes the weighted within-class variance
($\sigma_w^2$) and thereby maximizes the between-class variance
($\sigma_b^2$). Finally, each pixel with luminance less than or equal to the
threshold is replaced by 0 (black) and each pixel with luminance greater than the threshold is
replaced by 1 (white) to obtain the binary (B/W) image.
Figure 3. Otsu’s Thresholding Algorithm – Converting to a binary
image.
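As a rough illustration of the formulation above, the following sketch searches for the level that maximizes the between-class variance w1(t)·w2(t)·[μ1(t) − μ2(t)]². It is plain Python, not MATLAB's ‘graythresh’; the function name `otsu_threshold` is hypothetical, and levels are indexed from 0 (an assumption, since the sums in the text start at i = 1).

```python
def otsu_threshold(hist):
    """Return the level t that maximizes the between-class variance."""
    total = sum(hist)
    p = [h / total for h in hist]  # P(i): normalized histogram
    mean_all = sum(i * pi for i, pi in enumerate(p))
    best_t, best_var = 0, -1.0
    w1 = 0.0   # running class probability w1(t)
    cum = 0.0  # running sum of i * P(i) for i <= t
    for t in range(len(p) - 1):
        w1 += p[t]
        cum += t * p[t]
        w2 = 1.0 - w1
        if w1 <= 0.0 or w2 <= 0.0:
            continue  # one class is empty, so the means are undefined
        mu1 = cum / w1
        mu2 = (mean_all - cum) / w2
        var_between = w1 * w2 * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Toy 8-level histogram with two well-separated groups of pixels.
level = otsu_threshold([5, 5, 0, 0, 0, 0, 5, 5])
```

Maximizing the between-class variance avoids computing the two class variances explicitly, since the total variance does not depend on t.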
5. THE PROPOSED APPROACH
We propose clustering methods (the k-means and fuzzy c-means
algorithms) and Otsu’s thresholding method, combined with
blocking black area and boundary detection techniques, for
continuous Bangla speech segmentation. The proposed
segmentation system is shown in Figure-4 and is discussed in
the following sub-sections.
Figure 4. Proposed Speech Segmentation System
5.1 Speech Spectrogram
A spectrogram is a time-varying spectral representation of a
signal. It shows how the spectral density of an audio signal
varies over time [31]. Spectrograms can be used to identify
spoken words phonetically. They are used extensively in
the fields of music, sonar (sound navigation
and ranging), radar and speech processing [32].
The spectrogram is calculated using the short-time Fourier
transform (STFT) of the audio signal. The STFT of a signal
frame x[n] can be expressed by:
$$\mathrm{STFT}\{x[n]\}(m, \omega) \equiv X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n] \, w[n-m] \, e^{-j \omega n}$$
where x[n] is the signal, w[n] is the window function, m is
discrete and ω is continuous.
Thus, the spectrogram of the signal x[n] can be estimated by
computing the squared magnitude of the STFT of the signal
[33], i.e.,
$$\mathrm{Spectrogram}(m, \omega) = \left| \mathrm{STFT}(m, \omega) \right|^2$$
We used MATLAB’s ‘spectrogram’ function, which converts the
speech signal into a spectrogram image. The spectrogram of the
speech sentence (‘আমাদের জাতীয় কবি কাজী নজরুল ইসলাম’) is
shown in Figure-5.
(a) Original Speech Signal
(b) Spectrogram Image
Figure 5. Spectrogram of the speech signal ‘আমাদের জাতীয় কবি
কাজী নজরুল ইসলাম’.
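To make the squared-magnitude STFT concrete, here is a small sketch that evaluates it on a discrete frequency grid using only the standard library. It is not MATLAB's ‘spectrogram’ function; the function name, the Hann window, the frame length and the hop size are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(x, frame_len=8, hop=4):
    """Return one row per frame holding |X(m, w_k)|^2 for k = 0..frame_len//2."""
    # Hann window (an assumption; the text does not name the window used).
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        # Windowed frame: x[n] * w[n - m] with m = start.
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the frame at frequency bin k.
            X = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(X) ** 2)  # squared magnitude
        frames.append(row)
    return frames


# A tone with a period of 4 samples peaks in bin 2 of every frame.
frames = spectrogram([1.0, 0.0, -1.0, 0.0] * 8)
```

A direct DFT is used for clarity; a practical implementation would use an FFT and much longer frames.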
5.2 Dynamic Thresholding
First, we used MATLAB’s ‘rgb2gray’ function to convert the
spectrogram RGB image to a grayscale image with pixel values
ranging from 0 to 255, where 0 represents a fully black pixel
and 255 represents a fully white pixel. Then we used the ‘im2bw’
function with a threshold to convert the grayscale image to a binary
image. The remaining problem is how to choose the threshold.
In our research, we proposed three efficient approaches,
namely the k-means, FCM and Otsu’s thresholding algorithms, to
compute the desired threshold, a value between 0 and 1.
We used MATLAB’s ‘kmeans’ and ‘fcm’ functions, each with
three classes. Both functions return the centroids of the three
clusters. The average of the largest and second-largest centroids is
taken as the desired threshold value. We also used
MATLAB’s ‘graythresh’ function, which uses Otsu's
thresholding method and returns the level (threshold value) for
which the intra-class variance of the black and white pixels is
minimum. The output image replaces all pixels in the input
image with luminance greater than the threshold
with the value 1 (fully white) and all other pixels with 0
(fully black). Figure-6 shows the thresholded spectrogram
image of the speech sentence (‘আমাদের জাতীয় কবি কাজী
নজরুল ইসলাম’).
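The thresholding step can be sketched as below, assuming the three centroids have already been computed and scaled to the range 0 to 1. The helper names are hypothetical, and the example treats only pixels strictly above the cutoff as white, which is a convention chosen for the sketch.

```python
def threshold_from_centroids(centroids):
    """Average the two largest centroids, as described in the text."""
    top_two = sorted(centroids, reverse=True)[:2]
    return sum(top_two) / 2.0


def binarize(gray, level):
    """Binarize rows of 0..255 pixel values with a level in [0, 1]."""
    cutoff = level * 255  # map the normalized level to the pixel range
    return [[1 if px > cutoff else 0 for px in row] for row in gray]


level = threshold_from_centroids([0.2, 0.9, 0.5])  # about 0.7
binary = binarize([[0, 128, 255], [200, 100, 179]], level)
```

Here the cutoff is roughly 178.5, so the pixel values 179, 200 and 255 map to 1 and the rest to 0.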
5.3 Blocking Black Area Method
Here, we applied a new approach, the ‘Blocking Black Area’ method,
to the thresholded spectrogram image. It produces
rectangular black boxes in the voiced regions of the speech
sentence, as shown in Figure-7. The method works as follows:
• Sum the intensity values of the thresholded spectrogram image column by column.
• Find the image columns with fewer white pixels, based on the column sum, and replace all pixels in these columns with
luminance 0 (black).
• Find the image columns with fewer black pixels, based on the column sum, and replace all pixels in these columns with
luminance 1 (white).
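The three steps above can be sketched as follows. The text does not state the cutoff that separates "fewer white" from "fewer black" columns, so the `min_white_fraction` parameter (default one half of the image height) is an assumption, as is the function name.

```python
def block_black_area(binary, min_white_fraction=0.5):
    """Turn each column of a 0/1 image fully black or fully white."""
    rows = len(binary)
    cols = len(binary[0])
    # Column-wise sum: the number of white (1) pixels in each column.
    col_sums = [sum(binary[r][c] for r in range(rows)) for c in range(cols)]
    # Assumed rule: a column with too few white pixels becomes black (0),
    # every other column becomes white (1).
    fill = [1 if s >= min_white_fraction * rows else 0 for s in col_sums]
    return [fill[:] for _ in range(rows)]


blocked = block_black_area([
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
])
```

In this toy image the two middle columns hold one white pixel each, so they become a solid black box, while the outer columns become white.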
(a) K-Means thresholded Image
(b) FCM thresholded Image
(c) Otsu’s thresholded Image
Figure 6. Thresholded Spectrogram Images of the speech signal
‘আমাদের জাতীয় কবি কাজী নজরুল ইসলাম’
(a) Thresholded Spectrogram Image from Speech Signal
(b) After applying Blocking Black Area Method
Figure 7. Effect of applying the Blocking Black Area Method –
Producing rectangular black boxes in voiced regions of the speech
signal
5.4 Boundary Detection
We used MATLAB’s ‘regionprops’ function to measure the
properties of each connected object in the binary image. We
also used different shape measurements (such as 'Area',
'BoundingBox' and 'Centroid') to identify each rectangular object
that represents a speech word/sub-word. The 'Extrema'
measurement, which lists the extremal points [top-left top-right right-top
right-bottom bottom-right bottom-left left-bottom left-top], is
used to detect the start (bottom-left) and end (bottom-right)
points of each rectangular object, as shown in Figure-8.
Figure 8. Start and end point detection of a rectangular object.
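Once every column is entirely black or entirely white, the start and end of each black box can also be found with a simple run-length scan over the columns. The sketch below is a stand-in for the ‘regionprops’ measurements, not a reimplementation of that function; the function name and the one-value-per-column input are assumptions.

```python
def black_box_boundaries(column_fill):
    """Return inclusive (start, end) column pairs of each run of black columns.

    column_fill holds one value per image column: 0 = black, 1 = white.
    """
    boxes = []
    start = None
    for c, value in enumerate(column_fill):
        if value == 0 and start is None:
            start = c  # a black box begins here
        elif value != 0 and start is not None:
            boxes.append((start, c - 1))  # the box ended at the previous column
            start = None
    if start is not None:
        # The last box runs up to the right edge of the image.
        boxes.append((start, len(column_fill) - 1))
    return boxes


boxes = black_box_boundaries([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
# boxes == [(1, 2), (5, 5), (7, 9)]
```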
5.5 Final Speech Segments
Each rectangular black box represents a speech word. After
detecting the start and end points of each black box, the word
boundaries in the original speech sentence are marked
automatically by these two points, and finally the
word segments are cut from the speech sentence. Figure-9 shows
6 (six) black boxes representing 6 (six) word segments in the
speech sentence ‘আমাদের জাতীয় কবি কাজী নজরুল ইসলাম’.
Figure 9. 6 (six) detected word segments in the speech
sentence ‘আমাদের জাতীয় কবি কাজী নজরুল ইসলাম’.
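Cutting the word segments then amounts to mapping image columns back to sample positions. The sketch below assumes that the image columns span the whole signal uniformly, which the text does not state; the function name `cut_segments` and its parameters are hypothetical.

```python
def cut_segments(samples, boxes, n_cols):
    """Slice the waveform at the column boundaries of each black box.

    Assumption: the n_cols image columns cover the signal uniformly,
    so column c starts at sample c * len(samples) // n_cols.
    """
    n = len(samples)
    segments = []
    for start_col, end_col in boxes:
        first = start_col * n // n_cols
        last = (end_col + 1) * n // n_cols  # exclusive end sample
        segments.append(samples[first:last])
    return segments


# Toy signal of 100 samples shown as an image with 10 columns.
segments = cut_segments(list(range(100)), [(1, 2), (5, 5), (7, 9)], n_cols=10)
```

With these inputs the three segments contain 20, 10 and 30 samples respectively.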
6. EXPERIMENTAL RESULTS
The proposed system has been implemented in a Windows
environment. All the methods and algorithms discussed in
this paper have been implemented in MATLAB 7.12.0
(R2011a). The experiments compared three
types of segmentation algorithms: k-means clustering, fuzzy c-means
clustering and Otsu’s thresholding method. To evaluate the
performance of the proposed methods, different experiments
were carried out. In our experiments, various speech sentences
in the Bangla language were recorded, analyzed and
segmented by the proposed methods. Table-1 shows the detailed
segmentation results for ten Bangla speech sentences and
reveals that the average segmentation accuracy is
94.11% overall: 95.55% for k-means, 96.19% for FCM clustering,
and 90.58% for Otsu’s method.
Table 1. The detailed segmentation results
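The 94.11% figure matches the unweighted mean of the three per-method rates quoted above. The short check below assumes that this is how the overall value was obtained, which the text does not state explicitly.

```python
# Per-method average accuracy rates (percent), as quoted in the text.
rates = {"k-means": 95.55, "FCM": 96.19, "Otsu": 90.58}

# Unweighted mean over the three methods (assumed aggregation).
overall = sum(rates.values()) / len(rates)
print(round(overall, 2))  # 94.11
```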
7. CONCLUSIONS AND FURTHER RESEARCH
We have proposed three efficient methods for continuous
Bangla speech segmentation: two of them are usually used in
pattern classification (i.e., k-means and fuzzy c-means
clustering) and one is used in image segmentation
(i.e., Otsu’s thresholding method). We also used two new
techniques, blocking black area and boundary detection,
to properly detect word boundaries. From the
experimental results it is concluded that the k-means and FCM
clustering methods produce better segmentation results than
Otsu’s method. It was observed that some of the words
were not properly segmented. This is due to several causes:
(i) the utterance of words differs depending on their position
in the sentence; (ii) the pauses between the words are not
identical in all cases; and (iii) the articulation of
speech is non-uniform. Also, the speech signal is very sensitive to the
properties of the speaker and the environment. Experimental results
showed that the proposed approach is effective in continuous
speech segmentation. The major goal of future research is to
find a mechanism that can be employed to enable pattern
discovery by learning. We are continuing our effort to develop
a continuous speech recognition system using a neuro-fuzzy
system.
REFERENCES
[1] Okko Rasanen, “Speech Segmentation and Clustering
Methods for a New Speech Recognition
Architecture”, M.Sc Thesis, Dept. of Electrical and
Communication Engineering, Helsinki University of
Technology, Espoo, November 2007.
[2] Cherif A., Bouafif L., and Dabbabi T. “Pitch
Detection and Formant Analysis of Arabic Speech
processing”, Applied Acoustics, pp. 1129-1140, 2001.
[3] Sharma M., and Mammone R., “Subword-based Text
Dependent Speaker Verification System with User-
selectable Password”, Proceedings of ICASSP, pp.
93-96, 1996.
[4] Y. Hioka and N. Hamada, “Voice activity detection
with array signal processing in the wavelet domain”,
IEICE Trans. Fund. Elec. Commun. Comput. Sci.,
vol. 86, no. 11, pp. 2802–2811, 2003.
[5] F. Beritelli and S. Casale, “Robust voiced/unvoiced
speech classification using fuzzy rules”, in IEEE
Worksh. Speech Cod. Telecommun. Proc., Pocono
Manor, pp. 5–6, USA, 1997.
[6] Y. Qi and B. R. Hunt, “Voiced-unvoiced-silence
classifications of speech using hybrid features and a
network classifier”, IEEE Trans. Speech Audio
Proces., vol. 1, no. 2, pp. 250–255, 1993.
[7] S. Basu, “A linked-HMM model for robust voicing
and speech detection”, Proceedings IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP’03), Hong Kong, China,
Vol. 1, pp. 816–819, 2003.
[8] Thangarajan R, Natarajan A.M., “Syllable Based
Continuous Speech Recognition for Tamil”, In South
Asian Language Review VOL.XVIII. No.1, 2008.
[9] Kvale K. “Segmentation and Labeling of Speech”,
PhD Dissertation, The Norwegian Institute of
Technology, 1993.
[10] Md. Mijanur Rahman and Md. Al-Amin Bhuiyan,
“Continuous Bangla Speech Segmentation using
Short-term Speech Features Extraction Approaches”,
International Journal of Advanced Computer Science
and Applications (IJACSA), Vol. 3, No. 11, 2012.
[11] Sai Jayram A K V, Ramasubramanian V and
Sreenivas T V, “Robust parameters for automatic
segmentation of speech”, Proceedings IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP '02), Vol. 1, pp. 513-516,
2002.
[12] A. Webb, “Statistical Pattern Recognition”, New
Jersey, John Wiley & Sons, 2002.
[13] P. N. Tan, M. Steinbach, V. Kumar, “Introduction to
Data Mining”, Boston, Addison-Wesley, 2005.
[14] E. Alpaydin, “Introduction to Machine Learning”,
Cambridge, the MIT Press, 2004.
[15] R. J. Hathaway and J. C. Bezdek, “Optimization of
Clustering Criteria by Reformulation”, IEEE
Transactions on Fuzzy Systems, pp. 241-245, 1995.
[16] Fu, S.K. Mui, J.K. “A Survey on Image
Segmentation”, Pattern Recognition, Vol. 13, pp.3–
16, 1981.
[17] Haralick, R.M. Shapiro, L.G., “Image Segmentation
Techniques”, Computer Vision Graphics Image
Process. Vol. 29, pp.100–132, 1985.
[18] Pal, N. Pal, S., “A Review on Image Segmentation
Techniques”, Pattern Recognition, Vol. 26, pp.1277–
1294, 1993.
[19] Bezdek, J.C. Hall, L.O. Clarke, L.P. “Review of MR
Image Segmentation Techniques Using Pattern
Recognition”. Med. Phys., Vol. 20, pp.1033–1048,
1993.
[20] Kwon, M. J. Han, Y. J Shin, I.H. Park, H.W.
“Hierarchical Fuzzy Segmentation of Brain MR
Images”, International Journal of Imaging Systems
and Technology, Vol. 13, pp.115–125, 2003.
[21] Tolias, Y.A. Panas, S.M., “Image Segmentation by a
Fuzzy Clustering Algorithm Using Adaptive Spatially
Constrained Membership Functions”, IEEE Trans.
Systems, Man, Cybernet., Vol. 28, pp. 359–369, 1998.
[22] Pham, D. L. Prince, J. L., “Adaptive Fuzzy
Segmentation of Magnetic Resonance Images”, IEEE
Trans. Medical Imaging, Vol. 18, pp. 737–752, 1999.
[23] MacQueen, J. B., “Some Methods for classification
and Analysis of Multivariate Observations”,
Proceedings of 5th Berkeley Symposium on
Mathematical Statistics and Probability. University of
California Press. pp. 281–297, 1967.
[24] Lloyd, S. P., “Least square quantization in PCM”,
Bell Telephone Laboratories Paper, 1957. Published
in journal later: Lloyd, S. P., “Least squares
quantization in PCM”, IEEE Transactions on
Information Theory. 28 (2): 129–137, 1982.
[25] E.W. Forgy, “Cluster analysis of multivariate data:
efficiency versus interpretability of classifications”,
Biometrics 21: 768–769, 1965.
[26] J. C. Bezdek, “Fuzzy Mathematics in Pattern
Classification”, PhD Thesis, Cornell University,
Ithaca, NY, 1973.
[27] J. C. Dunn, “A Fuzzy Relative of the ISODATA
Process and its Use in Detecting Compact, Well
Separated Clusters”, J. Cyber., 3, 32-57, 1974.
[28] J. C. Bezdek, “Pattern Recognition with Fuzzy
Objective Function Algorithms”, Plenum, NY, 1981.
[29] Gonzalez, Rafael C. & Woods, Richard E,
“Thresholding”, In Digital Image Processing,
pp. 595–611. Pearson Education, 2002.
[30] Nobuyuki Otsu, “A threshold selection method from
gray-level histograms”, IEEE Trans. Sys., Man.,
Cyber. 9 (1): 62–66, 1979.
[31] Haykin, S., “Advances in Spectrum Analysis and
Array Processing”, Vol.1, Prentice Hall, 1991.
[32] JL Flanagan, “Speech Analysis, Synthesis and
Perception”, Springer- Verlag, New York, 1972.
[33] National Instruments, “STFT Spectrogram VI”:
http://zone.ni.com/reference/en-XX/help/371361E-
01/lvanls/stft_spectrogram_core/#details
BIOGRAPHIES
Mr. Md. Mijanur Rahman is working as
Assistant Professor of the Dept. of
Computer Science and Engineering at
Jatiya Kabi Kazi Nazrul Islam University,
Trishal, Mymensingh, Bangladesh. Mr.
Rahman obtained his B. Sc. (Hons) and M.
Sc degrees, both with first class first in
Computer Science and Engineering from Islamic University,
Kushtia, Bangladesh. Now he is continuing his PhD program
in the department of Computer Science and Engineering at
Jahangirnagar University, Savar, Dhaka, Bangladesh.
His teaching and research interest lies in the areas such as
Digital Speech Processing, Pattern Recognition, Neural
Networks, Fuzzy Logic, Artificial Intelligence, etc. He has got
many research articles published in both national and
international journals.
Dr. Md. Al-Amin Bhuiyan is serving as a
professor of the Dept. of Computer Science
and Engineering at Jahangirnagar
University, Savar, Dhaka, Bangladesh. Dr.
Bhuiyan obtained his B. Sc. (Hons) and M.
Sc degrees both with first class in Applied
Physics & Electronics from the University of Dhaka,
Bangladesh. He obtained his PhD degree in Information &
Communication Engineering from Osaka City University,
Japan.
His teaching and research interest lies in the areas such as
Image Processing, Computer Vision, Computer Graphics,
Pattern Recognition, Soft Computing, Artificial Intelligence,
Robotics, etc. He has got many articles published in both
national and international journals.

More Related Content

What's hot

3ways to improve semantic segmentation
3ways to improve semantic segmentation3ways to improve semantic segmentation
3ways to improve semantic segmentation
Frozen Paradise
 
Face Detection and Recognition using Back Propagation Neural Network (BPNN)
Face Detection and Recognition using Back Propagation Neural Network (BPNN)Face Detection and Recognition using Back Propagation Neural Network (BPNN)
Face Detection and Recognition using Back Propagation Neural Network (BPNN)
IRJET Journal
 
Multi fractal analysis of human brain mr image
Multi fractal analysis of human brain mr imageMulti fractal analysis of human brain mr image
Multi fractal analysis of human brain mr image
eSAT Journals
 
Multi fractal analysis of human brain mr image
Multi fractal analysis of human brain mr imageMulti fractal analysis of human brain mr image
Multi fractal analysis of human brain mr image
eSAT Publishing House
 
Ch35473477
Ch35473477Ch35473477
Ch35473477
IJERA Editor
 
Mixed Language Based Offline Handwritten Character Recognition Using First St...
Mixed Language Based Offline Handwritten Character Recognition Using First St...Mixed Language Based Offline Handwritten Character Recognition Using First St...
Mixed Language Based Offline Handwritten Character Recognition Using First St...
CSCJournals
 
Investigation on Using Fractal Geometry for Classification of Partial Dischar...
Investigation on Using Fractal Geometry for Classification of Partial Dischar...Investigation on Using Fractal Geometry for Classification of Partial Dischar...
Investigation on Using Fractal Geometry for Classification of Partial Dischar...
IOSR Journals
 
Water Industry Process Automation & Control
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 

Recently uploaded (20)

CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 

Dynamic thresholding on speech segmentation

IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 02 Issue: 09 | Sep-2013, Available @ http://www.ijret.org 404

DYNAMIC THRESHOLDING ON SPEECH SEGMENTATION

Md. Mijanur Rahman (1), Md. Al-Amin Bhuiyan (2)
(1) Assistant Professor, Dept. of Computer Science & Engineering, Jatiya Kabi Kazi Nazrul Islam University, Bangladesh
(2) Professor, Dept. of Computer Science & Engineering, Jahangirnagar University, Bangladesh
mijan_cse@yahoo.com, alamin_bhuiyan@yahoo.com

Abstract
The word is the preferred and natural unit of speech, because word units have a well-defined acoustic representation. This paper presents several dynamic thresholding approaches for segmenting continuous Bangla speech sentences into words/sub-words. We propose three efficient methods for speech segmentation: two of them are usually used in pattern classification (k-means and FCM clustering) and one is used in image segmentation (Otsu's thresholding method). We also use two new techniques, blocking black area and boundary detection, to detect word boundaries in continuous speech and to label the entire speech sentence as a sequence of words/sub-words. K-means and FCM clustering produce better segmentation results than Otsu's method. All the algorithms and methods used in this research are implemented in MATLAB, and the proposed system achieved an average segmentation accuracy of approximately 94%.

Keywords: Blocking Black Area, Clustering, Dynamic Thresholding, Fuzzy Logic and Speech Segmentation.

1. INTRODUCTION
Automated segmentation of speech has been a subject of study for over 30 years [1], and it plays a central role in many speech processing and ASR applications. It is important to various automated speech processing algorithms, such as speech recognition, speech corpus collection and speaker verification, in the research field of natural language processing [2,3]. Traditional manual segmentation is extremely laborious and tedious, and it is impractical for very large databases. This is why automated methods are widely used. A common approach to identifying speech signal zones is to use a threshold value. In many papers, speech segmentation is done using wavelets [4], fuzzy methods [5], artificial neural networks [6] and hidden Markov models [7]. This paper proposes several pattern classification and image segmentation methods for segmenting continuous Bangla speech into words/sub-words. To do so, we use modified versions of the k-means, fuzzy c-means and Otsu's algorithms, together with the blocking black area method and shape identification.

The paper is organized as follows. Section 1 introduces speech processing and the organization of this paper. Section 2 discusses speech segmentation and the types of segmentation. Section 3 describes pattern clustering and different clustering algorithms. Section 4 discusses Otsu's thresholding method. Section 5 describes the methodological steps of the proposed system. Sections 6 and 7 present the experimental results and the conclusion, respectively.

2. SPEECH SEGMENTATION
A speech recognition system requires segmentation of the speech signal into discrete, non-overlapping acoustic units [8][9]. The unit can be at the segment, phone, syllable, word, sentence or dialog-turn level. The word is the preferred and natural unit of speech because word units have a well-defined acoustic representation. We have therefore chosen the word as our basic unit.
Automatic speech segmentation methods can be classified in many ways, but one very common classification is the division into blind and aided segmentation algorithms [10]. A central difference between aided and blind methods is how much the segmentation algorithm uses previously obtained data or external knowledge to process the expected speech. The term blind segmentation refers to methods that have no pre-existing or external knowledge of the linguistic properties of the signal to be segmented, such as its orthography or full phonetic annotation. Aided segmentation algorithms, on the other hand, use some sort of external linguistic knowledge of the speech stream to divide it into segments of the desired type. Due to the lack of external information, the first phase of blind segmentation relies entirely on the acoustical features present in the signal. The second phase is usually built on front-end processing of the speech signal using MFCC, LP coefficients, or the pure FFT spectrum [11]. In our research work, we have used FFT spectrum features (specifically, the spectrogram) of the speech signal.

3. CLUSTERING
Clustering is the process of assigning a set of objects to a set of disjoint groups called clusters, so that objects in the same cluster are more similar to each other than to objects from different clusters. Clustering techniques are applied in many areas such as pattern recognition [12], data mining [13] and machine learning [14]. Clustering algorithms can be broadly classified as hard, fuzzy, possibilistic and probabilistic [15], each of which has its own special characteristics. Applications of clustering methods range from unsupervised learning of neural networks to pattern recognition, classification analysis, artificial intelligence, image processing and machine vision. Several clustering methods have been widely studied and successfully applied in image segmentation [16–22]. In this paper, hard and fuzzy clustering schemes are used to obtain the optimal threshold for speech segmentation. The conventional hard clustering method restricts each point of the data set to exactly one cluster, so the membership values are either 0 or 1. The k-means algorithm is a widely used hard clustering method. The fuzzy clustering method, on the other hand, allows an object to belong to several clusters at the same time, with different degrees of membership between 0 and 1. The fuzzy c-means algorithm is the most popular fuzzy clustering method used in pattern classification.

3.1 K-Means Clustering Algorithm
K-means clustering is an unsupervised learning algorithm. The term was first used by James MacQueen in 1967 [23], and the algorithm was first proposed by Stuart Lloyd in 1957 as a technique for pulse-code modulation [24]. It is also referred to as Lloyd's algorithm [25]. Here, a modified version of the algorithm is used to compute the threshold from the speech spectrogram.
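As a rough illustration of how a threshold can be derived from cluster centers, the following Python sketch (not the authors' MATLAB code) runs a one-dimensional k-means with k = 3 on a few intensity values and takes the mean of the final centers as the threshold. The function names, the evenly spaced initial centers and the sample intensities are assumptions made for illustration only.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """One-dimensional k-means; returns the final cluster centers."""
    lo, hi = min(values), max(values)
    # Assumption: start with k centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every value to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: abs(x - centers[j])) for x in values
        ]
        if new_assignment == assignment:
            break  # no value changed cluster, so the centers are stable
        assignment = new_assignment
        # Move each center to the mean of the values assigned to it.
        for j in range(k):
            members = [x for x, a in zip(values, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return centers


def kmeans_threshold(values, k=3):
    """Threshold taken as the average of the k cluster centers."""
    centers = kmeans_1d(values, k)
    return sum(centers) / len(centers)


# Hypothetical normalized spectrogram intensities in [0, 1].
intensities = [0.05, 0.10, 0.08, 0.45, 0.50, 0.55, 0.90, 0.95, 0.92]
print(kmeans_threshold(intensities))
```

On real data the input would be the flattened gray-level values of the spectrogram image. The initialization matters for k-means in general, so a different starting rule may give slightly different centers.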
The algorithm partitions a given data set into a fixed number of clusters (k clusters) based on a distance measure, defines k centers (one for each cluster), and finally computes the threshold. It iterates until no data point moves to a different cluster, as shown in Figure 1. The algorithm aims at minimizing an objective function, also known as the squared error function, given by:

$$J = \sum_{i=1}^{k} \sum_{j=1}^{c_i} \left( \left\| x_j^{(i)} - v_i \right\| \right)^2$$

where $\| x_j^{(i)} - v_i \|$ is the Euclidean distance between the jth data point of the ith cluster, $x_j^{(i)}$, and the cluster center $v_i$; $c_i$ is the number of data points in the ith cluster; and $k$ is the number of cluster centers.

The algorithmic steps for calculating a threshold using k-means clustering are given below. Let x = {x1, x2, x3, …, xn} be the set of data values in the speech spectrogram.
1) Set the number of clusters, k = 3, and define the 3 cluster centers. Let v = {v1, v2, v3} be the set of 3 cluster centers.
2) Calculate the distance (dij) between the ith data point and the jth cluster center.
3) Assign each data point to the closest cluster, i.e., the one at minimum distance.
4) Recalculate each cluster center using:

$$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j^{(i)}$$

where $c_i$ is the number of data points in the ith cluster.
5) Recalculate the distance between each data point and the newly obtained cluster centers.
6) If no data point was reassigned, stop; otherwise repeat from step (3).
7) Calculate the desired threshold as the average of the 3 cluster centers.

Figure 1. K-means clustering algorithm: calculating the desired threshold.

3.2 FCM Clustering Algorithm
The algorithm was developed by J. C. Bezdek in 1973 for pattern classification [26] and first reported by J. C. Dunn in 1974 [27]; Bezdek later used it for pattern recognition [28]. FCM is a generalization of k-means. While k-means assigns each object to one and only one cluster, FCM allows clusters to be fuzzy sets, so that each object belongs to all clusters with varying degrees of membership, subject to the following restriction: the sum of all membership degrees for any given data point is equal to 1. It works by assigning a membership to each object on the basis of the distance between the cluster center and the object. The nearer the object is to the cluster center, the greater its membership. The algorithm is iterative, as shown in Figure 2. After each iteration the memberships and cluster centers are updated, and finally a threshold is computed from the cluster centroids.

The algorithmic steps for calculating a threshold using fuzzy c-means clustering are given below. Let x = {x1, x2, x3, ..., xn} be the set of data values in the speech spectrogram.
1) Set the number of clusters, c = 3, and define the cluster centers. Let v = {v1, v2, v3} be the set of 3 cluster centers.
2) Calculate the Euclidean distance (dij) between the ith data point and the jth cluster center.
3) Update the fuzzy membership function (µij) using:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, c$$

4) Compute the fuzzy centers vj using:

$$v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}, \quad j = 1, 2, 3, \ldots, c$$

where m is the fuzziness index, $m \in (1, \infty)$.
5) Repeat from step (2) until the minimum value of J is achieved or $\| U^{(k+1)} - U^{(k)} \| < e$, where k is the iteration step; e is the termination criterion, between 0 and 1; $U = (\mu_{ij})_{n \times c}$ is the fuzzy membership matrix; and J is the objective function, given by:

$$J = \sum_{j=1}^{c} \sum_{i=1}^{n} (\mu_{ij})^{m} \, d_{ij}^{2}$$

6) Calculate the desired threshold as the average of the 3 cluster centroids.

Figure 2. FCM clustering algorithm: calculating the desired threshold.

4. OTSU'S THRESHOLDING METHOD
Thresholding is the simplest method of image segmentation; it can be used to create a binary image from a gray scale image. To convert an image into a binary representation, we first convert it into a gray scale representation and then perform a particular threshold analysis [29]. Otsu's method, also known as a binarization algorithm, is a simple and effective automatic thresholding method invented by Nobuyuki Otsu in 1979 [30]. It automatically performs histogram shape-based image thresholding, i.e., the reduction of a grayscale image to a binary image. The algorithm assumes that the image is composed of two basic classes, foreground and background [30]. It then computes the optimal threshold value that minimizes the weighted within-class variance, which is equivalent to maximizing the between-class variance of the two classes. Otsu's threshold is used in many applications, from medical imaging to low-level computer vision. Our main aim is to use Otsu's threshold in speech segmentation. Figure 3 shows the step-wise Otsu's algorithm.
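The fuzzy c-means updates described in Section 3.2 can be sketched in Python as follows. This is a minimal one-dimensional illustration under stated assumptions, not the authors' MATLAB implementation: the function name `fcm_1d`, the evenly spaced initial centers, the default fuzziness index m = 2, the tolerance, and the sample intensities are all hypothetical choices.

```python
def fcm_1d(values, c=3, m=2.0, eps=1e-6, max_iter=300):
    """One-dimensional fuzzy c-means; returns (centers, memberships)."""
    n = len(values)
    lo, hi = min(values), max(values)
    # Assumption: start with c centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    u = [[0.0] * c for _ in range(n)]
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        new_u = []
        for x in values:
            d = [abs(x - v) for v in centers]
            zeros = [j for j in range(c) if d[j] == 0.0]
            if zeros:
                # A value lying exactly on a center belongs fully to it.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                # Membership update: 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
                row = [1.0 / sum((d[j] / d[k]) ** power for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        # Center update: membership-weighted mean of the data.
        for j in range(c):
            weights = [new_u[i][j] ** m for i in range(n)]
            total = sum(weights)
            if total > 0.0:
                centers[j] = sum(w * x for w, x in zip(weights, values)) / total
        # Largest change in any membership since the previous iteration.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < eps:
            break  # memberships have stopped changing
    return centers, u


# Hypothetical normalized spectrogram intensities in [0, 1].
intensities = [0.05, 0.10, 0.08, 0.45, 0.50, 0.55, 0.90, 0.95, 0.92]
centers, memberships = fcm_1d(intensities)
threshold = sum(centers) / len(centers)  # average of the 3 centroids
print(sorted(centers), threshold)
```

Each row of `memberships` sums to 1, which is the restriction stated in Section 3.2. The stopping rule here uses the largest absolute change in any membership value, which is one possible choice of norm for comparing successive membership matrices.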
4.1 Mathematical Formulation
The mathematical formulation of Otsu's method for computing the optimum threshold is given below. Let P(i) represent the image histogram of the speech spectrogram. The two class probabilities w1(t) and w2(t) at level t are computed by:

$$w_1(t) = \sum_{i=1}^{t} P(i) \quad \text{and} \quad w_2(t) = \sum_{i=t+1}^{I} P(i)$$

The class means, μ1(t) and μ2(t), are:

$$\mu_1(t) = \sum_{i=1}^{t} \frac{i \, P(i)}{w_1(t)} \quad \text{and} \quad \mu_2(t) = \sum_{i=t+1}^{I} \frac{i \, P(i)}{w_2(t)}$$

The individual class variances are:

$$\sigma_1^2(t) = \sum_{i=1}^{t} [i - \mu_1(t)]^2 \frac{P(i)}{w_1(t)} \quad \text{and} \quad \sigma_2^2(t) = \sum_{i=t+1}^{I} [i - \mu_2(t)]^2 \frac{P(i)}{w_2(t)}$$

The within-class variance (σw) is defined as the weighted sum of the variances of the two classes:

$$\sigma_w^2(t) = w_1(t) \, \sigma_1^2(t) + w_2(t) \, \sigma_2^2(t)$$

The between-class variance (σb) is defined as the difference between the total variance and the within-class variance:

$$\sigma_b^2(t) = \sigma^2 - \sigma_w^2(t) = w_1(t) \, w_2(t) \, [\mu_1(t) - \mu_2(t)]^2$$

These two variances, σw and σb, are calculated for all possible thresholds, t = 0, …, I (the maximum intensity). Otsu's method finds the threshold that minimizes the weighted within-class variance (σw), which also maximizes the between-class variance (σb). Finally, each pixel with luminance less than or equal to the threshold is replaced by 0 (black) and each pixel with luminance greater than the threshold is replaced by 1 (white) to obtain the binary (B/W) image.

Figure 3. Otsu's thresholding algorithm: converting to a binary image.

5. THE PROPOSED APPROACH
We propose clustering methods (the k-means and fuzzy c-means algorithms) and Otsu's thresholding method, combined with blocking black area and boundary detection techniques, for continuous Bangla speech segmentation. The proposed segmentation system is shown in Figure 4 and is discussed in the following sub-sections.

Figure 4. Proposed speech segmentation system.
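The between-class variance criterion of Section 4.1 can be sketched in Python as below. This is an illustrative sketch rather than the authors' code or MATLAB's `graythresh`: the function names and the toy pixel list are assumptions, pixel values are assumed to be integers in 0..255, and ties between equally good thresholds are resolved by keeping the lowest level.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    count1 = 0  # number of pixels with level <= t (class 1)
    sum1 = 0    # sum of the levels of those pixels
    for t in range(levels):
        count1 += hist[t]
        sum1 += t * hist[t]
        count2 = total - count1
        if count1 == 0 or count2 == 0:
            continue  # one class is empty, so the variance is undefined
        w1, w2 = count1 / total, count2 / total
        mu1 = sum1 / count1
        mu2 = (sum_all - sum1) / count2
        var_between = w1 * w2 * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Levels <= t become 0 (black); levels > t become 1 (white)."""
    return [1 if p > t else 0 for p in pixels]


# Hypothetical two-class image: dark pixels near 10, bright pixels near 200.
pixels = [10] * 40 + [20] * 10 + [200] * 50
t = otsu_threshold(pixels)
print(t, binarize([10, 20, 200], t))
```

Maximizing the between-class variance avoids computing the two class variances explicitly, since the total variance does not depend on t.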
5.1 Speech Spectrogram
A spectrogram is a time-varying spectral representation of a signal. It shows how the spectral density of an audio signal varies over time [31]. Spectrograms can be used to identify spoken words phonetically. They are used extensively in fields such as music, sonar (sound navigation and ranging), radar and speech processing [32]. The spectrogram is calculated using the short-time Fourier transform (STFT) of the audio signal. The STFT of a signal frame x[n] can be expressed as:

$$\mathrm{STFT}\{x[n]\}(m, \omega) \equiv X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n] \, w[n - m] \, e^{-j \omega n}$$

where x[n] is the signal; w[n] is the window function; m is discrete and ω is continuous. The spectrogram of the signal x[n] can then be estimated by computing the squared magnitude of the STFT of the signal [33], i.e.,

$$\mathrm{spectrogram}\{x[n]\}(m, \omega) = \left| X(m, \omega) \right|^2$$

We used MATLAB's 'spectrogram' function, which converts the speech signal into a spectrogram image. The spectrogram of the speech sentence 'আমাদের জাতীয় কবি কাজী নজরুল ইসলাম' ("Our national poet Kazi Nazrul Islam") is shown in Figure 5.

Figure 5. Spectrogram of the speech signal 'আমাদের জাতীয় কবি কাজী নজরুল ইসলাম': (a) original speech signal; (b) spectrogram image.

5.2 Dynamic Thresholding
First, we used MATLAB's 'rgb2gray' function to convert the spectrogram RGB image to a gray scale image with pixel values ranging from 0 to 255, where 0 represents a fully black pixel and 255 represents a fully white pixel. Then we used the 'im2bw' function with a threshold to convert the gray scale image to a binary image. The problem is how to choose the threshold. In our research, we propose three efficient approaches, namely the k-means, FCM and Otsu's thresholding algorithms, to compute the desired threshold, a value between 0 and 1.
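The squared-magnitude STFT of Section 5.1 can be sketched in a few lines of Python. This is only an illustration of the definition, not a substitute for MATLAB's 'spectrogram' function: the function name, the frame length, the hop size and the Hamming window are assumptions (the paper does not state which window or frame settings were used), and the transform is a direct DFT rather than an FFT.

```python
import cmath
import math


def spectrogram(signal, frame_len=16, hop=4):
    """Squared-magnitude STFT; returns frames[m][k] for frame m, bin k."""
    # Assumption: symmetric Hamming window (the paper names no window).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        # Keep only the non-negative frequency bins of a real signal.
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(abs(coeff) ** 2)
        frames.append(row)
    return frames


# Hypothetical test tone: two cycles per 16-sample frame, 64 samples long.
tone = [math.cos(2 * math.pi * 2 * n / 16) for n in range(64)]
spec = spectrogram(tone)
# The strongest bin of every frame should be bin 2.
print([max(range(len(row)), key=row.__getitem__) for row in spec])
```

Plotting `frames` with time on one axis and frequency bin on the other, with intensity mapped to gray level, gives the kind of image that the thresholding step operates on.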
We used MATLAB's 3-class 'kmeans' function and 3-class 'fcm' function. Both functions return the centroids of three clusters. The desired threshold value is calculated as the average of the largest and second-largest centroids. We also used MATLAB's 'graythresh' function, which uses Otsu's thresholding method and returns the level (threshold value) for which the intra-class variance of the black and white pixels is minimal. The output image replaces all pixels in the input image whose luminance is greater than the threshold with the value 1 (fully white) and all other pixels with 0 (fully black). Figure 6 shows the thresholded spectrogram images of the speech sentence 'আমাদের জাতীয় কবি কাজী নজরুল ইসলাম'.

Figure 6. Thresholded spectrogram images of the speech signal 'আমাদের জাতীয় কবি কাজী নজরুল ইসলাম': (a) k-means thresholded image; (b) FCM thresholded image; (c) Otsu's thresholded image.

5.3 Blocking Black Area Method
Here, we applied a new approach, the 'Blocking Black Area' method, to the thresholded spectrogram image. It produces rectangular black boxes in the voiced regions of the speech sentence, as shown in Figure 7. The method works as follows:
• Sum the intensity values of the thresholded spectrogram image column by column.
• Based on the column sums, find the image columns with fewer white pixels and replace all pixels in those columns with luminance 0 (black).
• Based on the column sums, find the image columns with fewer black pixels and replace all pixels in those columns with luminance 1 (white).

Figure 7. Effect of applying the Blocking Black Area method, which produces rectangular black boxes in the voiced regions of the speech signal: (a) thresholded spectrogram image of the speech signal; (b) after applying the Blocking Black Area method.

5.4 Boundary Detection
We used MATLAB's 'regionprops' function to measure the properties of each connected object in the binary image. We also used different shape measurements (such as 'Area', 'BoundingBox' and 'Centroid') to identify each rectangular object that represents a speech word/sub-word. The 'Extrema' measurement, which lists the points [top-left top-right right-top right-bottom bottom-right bottom-left left-bottom left-top], is used to detect the start (bottom-left) and end (bottom-right) points of each rectangular object, as shown in Figure 8.

Figure 8. Start and end point detection of a rectangular object.

5.5 Final Speech Segments
Each rectangular black box represents a speech word. After the start and end points of each black box are detected, the word boundaries in the original speech sentence are marked automatically by these two points, and the word segments are then cut from the speech sentence. Figure 9 shows that 6 (six) black boxes represent 6 (six) word segments in the speech sentence 'আমাদের জাতীয় কবি কাজী নজরুল ইসলাম'.

Figure 9. 6 (six) detected word segments in the speech sentence 'আমাদের জাতীয় কবি কাজী নজরুল ইসলাম'.

6. EXPERIMENTAL RESULTS
The proposed system has been implemented in a Windows environment. All the methods and algorithms discussed in this paper have been implemented in MATLAB 7.12.0 (R2011a).
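For readers without MATLAB, the blocking and boundary detection steps of Sections 5.3 to 5.5 can be sketched in Python as follows. Several parts are assumptions rather than the authors' method: the paper does not say how "fewer" white or black pixels is decided, so a simple majority rule (`white_fraction=0.5`) is assumed; a run-length scan over columns stands in for the 'regionprops' 'Extrema' measurement; and the function names and the toy image are hypothetical.

```python
def block_black_area(image, white_fraction=0.5):
    """Make every column uniformly black (0) or white (1).

    `image` is a list of rows of 0/1 pixels. A column becomes white when
    at least `white_fraction` of its pixels are white (assumed rule).
    """
    rows, cols = len(image), len(image[0])
    # Column-wise sum of intensities = number of white pixels per column.
    col_sums = [sum(image[r][c] for r in range(rows)) for c in range(cols)]
    col_value = [1 if s >= white_fraction * rows else 0 for s in col_sums]
    return [list(col_value) for _ in range(rows)]


def black_box_boundaries(blocked):
    """Return (start_column, end_column) for each run of black columns."""
    top_row = blocked[0]  # after blocking, all rows are identical
    runs, start = [], None
    for c, value in enumerate(top_row):
        if value == 0 and start is None:
            start = c                    # a black box begins here
        elif value == 1 and start is not None:
            runs.append((start, c - 1))  # the box ended at the previous column
            start = None
    if start is not None:
        runs.append((start, len(top_row) - 1))  # box touches the right edge
    return runs


# Hypothetical 3x7 thresholded image (0 = black, 1 = white).
image = [
    [1, 0, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
]
blocked = block_black_area(image)
print(black_box_boundaries(blocked))  # [(1, 2), (5, 6)]
```

Each returned column pair would then be mapped back to a time interval of the original signal, using the time axis of the spectrogram, in order to cut out the corresponding word segment.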
Three segmentation algorithms were compared in the experiments: k-means clustering, fuzzy c-means clustering and Otsu's thresholding method. To evaluate the performance of the proposed methods, different experiments were carried out. In our experiments, various speech sentences in the Bangla language were recorded, analyzed and segmented by the proposed method. Table 1 shows the detailed segmentation results for ten Bangla speech sentences and reveals that the average segmentation accuracy is 94.11% overall: 95.55% for k-means clustering, 96.19% for FCM clustering, and 90.58% for Otsu's method.

Table 1. Detailed segmentation results
7. CONCLUSIONS AND FURTHER RESEARCH
We have proposed three efficient methods for continuous Bangla speech segmentation: two of them are usually used in pattern classification (k-means and fuzzy c-means clustering) and one is used in image segmentation (Otsu's thresholding method). We also used two new techniques, blocking black area and boundary detection, to detect word boundaries. From the experimental results we conclude that k-means and FCM clustering produce better segmentation results than Otsu's method. Some of the words were not properly segmented, for several reasons: (i) the utterance of a word differs depending on its position in the sentence; (ii) the pauses between words are not identical in all cases; and (iii) the articulation of speech is non-uniform. The speech signal is also very sensitive to the properties of the speaker and the environment. The experimental results show that the proposed approach is effective in continuous speech segmentation. The major goal of future research is to find a mechanism that can be employed to enable pattern discovery by learning. We are continuing our effort to develop a continuous speech recognition system using a neuro-fuzzy system.

REFERENCES
[1] Okko Rasanen, "Speech Segmentation and Clustering Methods for a New Speech Recognition Architecture", M.Sc Thesis, Dept. of Electrical and Communication Engineering, Helsinki University of Technology, Espoo, November 2007.
[2] Cherif A., Bouafif L., and Dabbabi T., "Pitch Detection and Formant Analysis of Arabic Speech processing", Applied Acoustics, pp. 1129-1140, 2001.
[3] Sharma M., and Mammone R., "Subword-based Text Dependent Speaker Verification System with User-selectable Password", Proceedings of ICASSP, pp. 93-96, 1996.
[4] Y. Hioka and N. Hamada, "Voice activity detection with array signal processing in the wavelet domain", IEICE Trans. Fund. Elec. Commun. Comput. Sci., vol. 86, no. 11, pp. 2802–2811, 2003.
[5] F. Beritelli and S. Casale, "Robust voiced/unvoiced speech classification using fuzzy rules", in IEEE Worksh. Speech Cod. Telecommun. Proc., Pocono Manor, pp. 5–6, USA, 1997.
[6] Y. Qi and B. R. Hunt, "Voiced-unvoiced-silence classifications of speech using hybrid features and a network classifier", IEEE Trans. Speech Audio Proces., vol. 1, no. 2, pp. 250–255, 1993.
[7] S. Basu, "A linked-HMM model for robust voicing and speech detection", Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'03), Hong Kong, China, Vol. 1, pp. 816–819, 2003.
[8] Thangarajan R, Natarajan A.M., "Syllable Based Continuous Speech Recognition for Tamil", In South Asian Language Review VOL.XVIII. No.1, 2008.
[9] Kvale K., "Segmentation and Labeling of Speech", PhD Dissertation, The Norwegian Institute of Technology, 1993.
[10] Md. Mijanur Rahman and Md. Al-Amin Bhuiyan, "Continuous Bangla Speech Segmentation using Short-term Speech Features Extraction Approaches", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 3, No. 11, 2012.
[11] Sai Jayram A K V, Ramasubramanian V and Sreenivas T V, "Robust parameters for automatic segmentation of speech", Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), Vol. 1, pp. 513-516, 2002.
[12] A. Webb, "Statistical Pattern Recognition", New Jersey, John Wiley & Sons, 2002.
[13] P. N. Tan, M. Steinbach, V. Kumar, "Introduction to Data Mining", Boston, Addison-Wesley, 2005.
[14] E. Alpaydin, "Introduction to Machine Learning", Cambridge, the MIT Press, 2004.
[15] R. J. Hathaway and J. C. Bezdek, "Optimization of Clustering Criteria by Reformulation", IEEE Transactions on Fuzzy Systems, pp. 241-245, 1995.
[16] Fu, S.K. Mui, J.K., "A Survey on Image Segmentation", Pattern Recognition, Vol. 13, pp. 3–16, 1981.
[17] Haralick, R.M. Shapiro, L.G., "Image Segmentation Techniques", Computer Vision Graphics Image Process., Vol. 29, pp. 100–132, 1985.
[18] Pal, N. Pal, S., "A Review on Image Segmentation Techniques", Pattern Recognition, Vol. 26, pp. 1277–1294, 1993.
[19] Bezdek, J.C. Hall, L.O. Clarke, L.P., "Review of MR Image Segmentation Techniques Using Pattern Recognition", Med. Phys., Vol. 20, pp. 1033–1048, 1993.
[20] Kwon, M. J. Han, Y. J Shin, I.H. Park, H.W., "Hierarchical Fuzzy Segmentation of Brain MR Images", International Journal of Imaging Systems and Technology, Vol. 13, pp. 115–125, 2003.
[21] Tolias, Y.A. Panas, S.M., "Image Segmentation by a Fuzzy Clustering Algorithm Using Adaptive Spatially Constrained Membership Functions", IEEE Trans. Systems, Man, Cybernet., Vol. 28, pp. 359–369, 1998.
[22] Pham, D. L. Prince, J. L., "Adaptive Fuzzy Segmentation of Magnetic Resonance Images", IEEE Trans. Medical Imaging, Vol. 18, pp. 737–752, 1999.
[23] MacQueen, J. B., "Some Methods for classification and Analysis of Multivariate Observations", Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp. 281–297, 1967.
[24] Lloyd, S. P., "Least square quantization in PCM", Bell Telephone Laboratories Paper, 1957. Published in journal later: Lloyd, S. P., "Least squares quantization in PCM", IEEE Transactions on Information Theory, 28 (2): 129–137, 1982.
[25] E.W. Forgy, "Cluster analysis of multivariate data: efficiency versus interpretability of classifications", Biometrics 21: 768–769, 1965.
[26] J. C. Bezdek, "Fuzzy Mathematics in Pattern Classification", PhD Thesis, Cornell University, Ithaca, NY, 1973.
[27] J. C. Dunn, "A Fuzzy Relative of the ISODATA Process and its Use in Detecting Compact, Well Separated Clusters", J. Cyber., 3, 32-57, 1974.
[28] J. C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum, NY, 1981.
[29] Gonzalez, Rafael C. & Woods, Richard E, "Thresholding", In Digital Image Processing, pp. 595–611, Pearson Education, 2002.
[30] Nobuyuki Otsu, "A threshold selection method from gray-level histograms", IEEE Trans. Sys., Man., Cyber. 9 (1): 62–66, 1979.
[31] Haykin, S., "Advances in Spectrum Analysis and Array Processing", Vol. 1, Prentice Hall, 1991.
[32] JL Flanagan, "Speech Analysis, Synthesis and Perception", Springer-Verlag, New York, 1972.
[33] National Instruments, "STFT Spectrogram VI": http://zone.ni.com/reference/en-XX/help/371361E-01/lvanls/stft_spectrogram_core/#details

BIOGRAPHIES
Mr. Md. Mijanur Rahman is working as an Assistant Professor in the Dept. of Computer Science and Engineering at Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh. Mr. Rahman obtained his B. Sc. (Hons) and M. Sc degrees, both with first class first, in Computer Science and Engineering from Islamic University, Kushtia, Bangladesh.
He is currently pursuing his PhD in the Department of Computer Science and Engineering at Jahangirnagar University, Savar, Dhaka, Bangladesh. His teaching and research interests lie in areas such as Digital Speech Processing, Pattern Recognition, Neural Networks, Fuzzy Logic and Artificial Intelligence. He has published many research articles in both national and international journals.

Dr. Md. Al-Amin Bhuiyan is a Professor in the Dept. of Computer Science and Engineering at Jahangirnagar University, Savar, Dhaka, Bangladesh. Dr. Bhuiyan obtained his B. Sc. (Hons) and M. Sc. degrees, both with first class, in Applied Physics & Electronics from the University of Dhaka, Bangladesh. He obtained his PhD degree in Information & Communication Engineering from Osaka City University, Japan. His teaching and research interests lie in areas such as Image Processing, Computer Vision, Computer Graphics, Pattern Recognition, Soft Computing, Artificial Intelligence and Robotics. He has published many articles in both national and international journals.