Creating a Custom Audio Dataset with PyTorch
Extracting Mel Spectrograms with PyTorch and Torchaudio: UrbanSound8K Dataset
[By KAHSAY]
August 22, 2024
Introduction
• Objective:
• Learn how to create a custom audio dataset using PyTorch
and torchaudio.
• Use the UrbanSound8K dataset as a practical example.
• Focus Areas:
• Basic I/O operations with torchaudio.
• Handling audio files and annotations.
Libraries and Imports
• Import Libraries:
• os for handling file paths.
• torch.utils.data.Dataset for creating custom
datasets.
• pandas for reading CSV files.
• torchaudio for loading and processing audio files.
import os
import torch
from torch.utils.data import Dataset
import pandas as pd
import torchaudio
UrbanSoundDataset Class Overview
• Purpose:
• Create a custom dataset class that handles the
UrbanSound8K dataset.
• Inherit from PyTorch’s Dataset class.
class UrbanSoundDataset(Dataset):
    def __init__(self, annotations_file, audio_dir, transformation,
                 target_sample_rate):
        self.annotations = pd.read_csv(annotations_file)
        self.audio_dir = audio_dir
        self.transformation = transformation
        self.target_sample_rate = target_sample_rate

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        # The slide cuts off here; the remaining lines follow the
        # workflow described later: load, resample, mix down, transform.
        audio_sample_path = self._get_audio_sample_path(index)
        label = self.annotations.iloc[index, 6]  # classID column
        signal, sr = torchaudio.load(audio_sample_path)
        signal = self._resample_if_necessary(signal, sr)
        signal = self._mix_down_if_necessary(signal)
        signal = self.transformation(signal)
        return signal, label

    def _get_audio_sample_path(self, index):
        # UrbanSound8K stores clips in fold1/ ... fold10/ subdirectories
        fold = f"fold{self.annotations.iloc[index, 5]}"
        return os.path.join(self.audio_dir, fold,
                            self.annotations.iloc[index, 0])
Methods
• _resample_if_necessary():
• Resamples the audio signal to the target sample rate.
• _mix_down_if_necessary():
• Mixes down stereo audio to mono if necessary.
def _resample_if_necessary(self, signal, sr):
    if sr != self.target_sample_rate:
        resampler = torchaudio.transforms.Resample(sr, self.target_sample_rate)
        signal = resampler(signal)
    return signal

def _mix_down_if_necessary(self, signal):
    if signal.shape[0] > 1:
        signal = torch.mean(signal, dim=0, keepdim=True)
    return signal
Main Script Execution
• Purpose:
• Set up file paths and instantiate the
UrbanSoundDataset class.
• Print the number of samples and access the first audio
sample.
if __name__ == "__main__":
    ANNOTATIONS_FILE = "/path/to/UrbanSound8K.csv"
    AUDIO_DIR = "/path/to/audio"
    SAMPLE_RATE = 16000

    mel_spectrogram = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE,
        n_fft=1024,
        hop_length=512,
        n_mels=64
    )

    # The slide cuts off here; finishing the steps described above:
    usd = UrbanSoundDataset(ANNOTATIONS_FILE, AUDIO_DIR,
                            mel_spectrogram, SAMPLE_RATE)
    print(f"There are {len(usd)} samples in the dataset.")
    signal, label = usd[0]
Extracting Mel Spectrograms
• Mel Spectrogram:
• A representation of the audio signal on the Mel frequency scale.
• Useful for capturing the characteristics of audio signals for classification tasks.
• Using torchaudio:
• Create a Mel spectrogram with torchaudio.transforms.MelSpectrogram().
Resampling Audio
• Resampling:
• Adjusting the sample rate of audio signals to a target sample rate.
• Important for ensuring consistent processing across different audio files.
• Using torchaudio:
• Use torchaudio.transforms.Resample() to resample audio signals.
Common torchaudio Transforms
• MelSpectrogram: converts audio waveforms to Mel spectrograms.
• Spectrogram: converts audio waveforms to regular spectrograms.
• AmplitudeToDB: converts amplitude to decibels for better visualization.
• TimeStretch: stretches a (complex) spectrogram in time without altering pitch.
• PitchShift: shifts the pitch of the audio signal by a specified number of semitones.
• Usage:
• Combine transforms using torch.nn.Sequential for a pipeline.
Dataset Loading Workflow
• Workflow Steps:
1. Read Annotations:
• Load metadata from the CSV file using pandas.
2. Load Audio:
• Use torchaudio.load() to read audio files.
3. Return Data:
• Output the audio signal and its label.
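Stripped of the class machinery, the three steps can be sketched as follows; an in-memory CSV and a random tensor stand in for the real annotations file and the torchaudio.load() result, and the row values are made up for illustration:

```python
import io
import pandas as pd
import torch

# Step 1: read annotations (in-memory CSV stands in for UrbanSound8K.csv)
csv_text = "slice_file_name,fold,classID\n100032-3-0-0.wav,5,3\n"
annotations = pd.read_csv(io.StringIO(csv_text))

# Step 2: load audio -- torchaudio.load(path) would return (signal, sr);
# a random tensor stands in here
signal, sr = torch.randn(1, 16000), 16000

# Step 3: return the signal and its label
label = int(annotations.iloc[0]["classID"])
```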
UrbanSound8K Dataset
• Description:
• A dataset with 8,732 labeled sound excerpts from 10
classes.
• Useful for training audio classification models.
Conclusion
• Key Takeaways:
• Understanding the process of creating a custom audio
dataset.
• Importance of audio transformations for machine learning
tasks.
• Practical implementation using torchaudio.
