2. Team Members:
1. Dithya Prasanna
2. Shamanth H S
3. Priyadarshini B
4. Sarthak Sharma
5. Meghana K
6. Prasun Sarkar
3. Introduction
• Voice is one of the most common means of
communication in the world. In everyday life, people can
usually identify a speaker's gender from the voice
alone.
• A voice carries many acoustic features, which are
often treated as voice prints for recognizing the
gender of the speaker.
4. Problem Statement
Build reliable models that classify a voice as male or
female, based on acoustic properties of the voice and
speech.
The goal is to compare the outputs of different models and
recommend the best model for voice-based gender
recognition on real-world inputs.
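The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample values are invented (not taken from the dataset), and the two toy "models" (a fixed baseline and a threshold on meanfun) stand in for the real classifiers, which this deck does not show. The helper names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Toy (meanfun in kHz, label) pairs, invented purely for illustration.
SAMPLES: List[Tuple[float, str]] = [
    (0.11, "male"), (0.12, "male"), (0.13, "male"), (0.15, "male"),
    (0.14, "female"), (0.17, "female"), (0.18, "female"), (0.19, "female"),
]

Model = Callable[[float], str]


def always_male(meanfun: float) -> str:
    """Baseline that ignores its input."""
    return "male"


def threshold_model(cutoff: float) -> Model:
    """Return a classifier that predicts 'female' above the cutoff."""
    return lambda meanfun: "female" if meanfun > cutoff else "male"


def accuracy(model: Model, samples: List[Tuple[float, str]]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


def best_model(models: Dict[str, Model],
               samples: List[Tuple[float, str]]) -> Tuple[str, Dict[str, float]]:
    """Score every model and return the best name plus all scores."""
    scores = {name: accuracy(m, samples) for name, m in models.items()}
    return max(scores, key=scores.get), scores


MODELS: Dict[str, Model] = {
    "baseline": always_male,
    "threshold_0.16": threshold_model(0.16),
}

best, scores = best_model(MODELS, SAMPLES)
print(best, scores)
```

In practice the scores would be computed on held-out samples rather than on the data used to fit the models.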
5. About the dataset
• The dataset consists of 3,168 recorded voice samples,
collected from male and female speakers. The voice
samples are pre-processed by acoustic analysis in R
using the seewave and tuneR packages, with an
analyzed frequency range of 0 Hz to 280 Hz (human vocal
range). Each sample is described by 20 acoustic features.
6. About the dataset – Data Description
• meanfreq: mean frequency (in kHz)
• sd: standard deviation of frequency
• median: median frequency (in kHz)
• Q25: first quartile (in kHz)
• Q75: third quartile (in kHz)
• IQR: interquartile range (in kHz)
• skew: skewness (see note in specprop description)
• kurt: kurtosis (see note in specprop description)
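As a rough illustration of how the location and spread features above relate to each other, the sketch below computes them with Python's statistics module. It assumes a plain list of frequency values in kHz; the original R pipeline derives these statistics from the frequency spectrum, so the numbers are not comparable with the dataset. The function name and the input values are hypothetical.

```python
import statistics
from typing import Dict, List


def frequency_summary(freqs_khz: List[float]) -> Dict[str, float]:
    """Summarize a list of frequency values (kHz) with the slide's feature names."""
    # quantiles(n=4) returns the three cut points: Q25, median, Q75.
    q25, median, q75 = statistics.quantiles(freqs_khz, n=4, method="inclusive")
    return {
        "meanfreq": statistics.fmean(freqs_khz),
        "sd": statistics.stdev(freqs_khz),
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,  # interquartile range = Q75 - Q25
    }


summary = frequency_summary([0.1, 0.2, 0.3, 0.4, 0.5])
print(summary)
```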
7. Data Description (contd.)
• sp.ent: spectral entropy
• sfm: spectral flatness
• mode: mode frequency
• centroid: frequency centroid (see specprop)
• meanfun: average of fundamental frequency measured
across acoustic signal
• minfun: minimum fundamental frequency measured
across acoustic signal
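Two of the features on this slide, sfm and sp.ent, can be illustrated with their textbook definitions: spectral flatness as the ratio of the geometric mean to the arithmetic mean of the spectrum, and spectral entropy as the Shannon entropy of the normalized spectrum, scaled to lie between 0 and 1. This is a minimal sketch that assumes a list of strictly positive amplitude values with at least two bins; it is not necessarily the exact computation used by the R packages, and the function names are hypothetical.

```python
import math
from typing import List


def spectral_flatness(amplitudes: List[float]) -> float:
    """Geometric mean of the spectrum divided by its arithmetic mean."""
    n = len(amplitudes)
    # Average the logs to get the geometric mean without overflow.
    log_mean = sum(math.log(a) for a in amplitudes) / n
    return math.exp(log_mean) / (sum(amplitudes) / n)


def spectral_entropy(amplitudes: List[float]) -> float:
    """Shannon entropy of the normalized spectrum, divided by log(N)."""
    total = sum(amplitudes)
    probs = [a / total for a in amplitudes]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(amplitudes))


flat = [1.0, 1.0, 1.0, 1.0]       # noise-like: energy spread evenly
peaky = [1000.0, 1.0, 1.0, 1.0]   # tone-like: energy in one bin

print(spectral_flatness(flat), spectral_entropy(flat))
print(spectral_flatness(peaky), spectral_entropy(peaky))
```

Both measures approach 1 for a flat, noise-like spectrum and approach 0 when the energy is concentrated in a single frequency bin.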
8. Data Description (contd.)
• maxfun: maximum fundamental frequency measured
across acoustic signal
• meandom: average of dominant frequency measured
across acoustic signal
• mindom: minimum of dominant frequency measured
across acoustic signal
• maxdom: maximum of dominant frequency measured
across acoustic signal
9. Data Description (contd.)
• modindx: modulation index. Calculated as the
accumulated absolute difference between adjacent
measurements of fundamental frequencies divided by
the frequency range
• dfrange: range of dominant frequency measured across
acoustic signal
• label: male or female (Target/Dependent Variable)
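The modindx definition above translates directly into code. The sketch below assumes that "frequency range" means the maximum minus the minimum of the same fundamental frequency measurements, and it returns 0.0 for a constant signal to avoid division by zero; both choices are assumptions, and the function name and input values are hypothetical.

```python
from typing import Sequence


def modulation_index(fundamentals: Sequence[float]) -> float:
    """Accumulated absolute difference between adjacent values / frequency range."""
    freq_range = max(fundamentals) - min(fundamentals)
    if freq_range == 0:
        # Assumed convention: a constant signal has no modulation.
        return 0.0
    accumulated = sum(
        abs(b - a) for a, b in zip(fundamentals, fundamentals[1:])
    )
    return accumulated / freq_range


# Adjacent differences: 20 + 10 + 20 = 50; range: 130 - 100 = 30.
print(modulation_index([100, 120, 110, 130]))
```

A steadily rising or falling contour gives a value of 1, and the index grows as the contour changes direction more often.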
12. Conclusion
• Of all the models built, the Gradient Boosting
Classifier has the best accuracy score, 0.9887.
• The features that play the most important role in
identifying the gender by voice are meanfun, sd, Q25,
sfm, sp.ent and meanfreq.
• Voice-based gender recognition can be used in various
applications, such as detecting feelings and differentiating
between audio and video using tags.