A PyTorch library for audio data augmentation. Inspired by audiomentations. Useful for deep learning.

Transforms extend nn.Module, so they can be integrated as a part of a PyTorch neural network model.
Three modes: per_batch, per_example and per_channel.
Install with pip:
pip install torch-audiomentations
import torch
from torch_audiomentations import Compose, Gain, PolarityInversion
# Initialize augmentation callable
apply_augmentation = Compose(
    transforms=[
        Gain(
            min_gain_in_db=-15.0,
            max_gain_in_db=5.0,
            p=0.5,
        ),
        PolarityInversion(p=0.5),
    ]
)
torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Make an example tensor with white noise.
# This tensor represents 8 audio snippets with 2 channels (stereo) and 2 s of 16 kHz audio.
audio_samples = torch.rand(size=(8, 2, 32000), dtype=torch.float32, device=torch_device) - 0.5
# Apply augmentation. This varies the gain and polarity of (some of)
# the audio snippets in the batch independently.
perturbed_audio_samples = apply_augmentation(audio_samples, sample_rate=16000)
Contributors welcome!
Join Asteroid's Slack
to discuss torch-audiomentations with us.
We don't want data augmentation to be a bottleneck in model training speed. Here is a comparison of the time it takes to run 1D convolution:

torch-audiomentations is in an early development stage, so the APIs are subject to change.
Every transform has mode, p, and p_mode -- the parameters that decide how the augmentation is performed.
mode decides how the randomization of the augmentation is grouped and applied.
p decides the on/off probability of applying the augmentation.
p_mode decides how the on/off of the augmentation is applied.
This visualization shows how different combinations of mode and p_mode would perform an augmentation.
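To make the grouping idea concrete, here is a minimal sketch in plain PyTorch. It assumes a random gain factor as the augmentation, and the helper names `group_shape` and `sketch_random_gain` are hypothetical. It is not the library's implementation, and it does not enforce whatever restrictions the library may place on combining mode and p_mode.

```python
import torch


def group_shape(mode, batch_size, num_channels):
    """Shape of the random draw for a given mode (hypothetical helper)."""
    shapes = {
        "per_batch": (1, 1, 1),  # one draw shared by the whole batch
        "per_example": (batch_size, 1, 1),  # one draw per example
        "per_channel": (batch_size, num_channels, 1),  # one draw per channel
    }
    return shapes[mode]


def sketch_random_gain(audio, mode="per_example", p=0.5, p_mode="per_example"):
    """Illustrative only: random amplitude factor, grouped by mode and p_mode."""
    batch_size, num_channels, _ = audio.shape
    # mode controls how the random parameter (here: a factor) is grouped
    factors = torch.empty(group_shape(mode, batch_size, num_channels)).uniform_(0.5, 1.5)
    # p_mode controls how the on/off decision is grouped; p is its probability
    apply = torch.rand(group_shape(p_mode, batch_size, num_channels)) < p
    # Broadcasting spreads each factor and each on/off decision over its group
    return torch.where(apply, audio * factors, audio)


audio = torch.rand(8, 2, 32000) - 0.5  # (batch, channels, samples)
augmented = sketch_random_gain(audio, mode="per_channel", p=0.5, p_mode="per_example")
print(tuple(augmented.shape))
```

In this sketch, mode="per_channel" with p_mode="per_example" means each channel gets its own factor, while the decision to augment at all is made once per example.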

AddBackgroundNoise
Added in v0.5.0
Add background noise to the input audio.
AddColoredNoise
Added in v0.7.0
Add colored noise to the input audio.
ApplyImpulseResponse
Added in v0.5.0
Convolve the given audio with impulse responses.
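As a rough illustration of what convolving audio with an impulse response means, the sketch below uses FFT-based convolution in plain PyTorch. The function name `convolve_with_ir`, the toy impulse response, and the choice to truncate the output to the input length are assumptions of this sketch, not a description of the library's internals.

```python
import torch


def convolve_with_ir(audio, ir):
    """FFT-based linear convolution along the last axis (hypothetical helper).

    audio: (batch, channels, samples), ir: (ir_samples,)
    The result is truncated to the input length (an assumption of this sketch).
    """
    n = audio.shape[-1] + ir.shape[-1] - 1  # full linear convolution length
    spectrum = torch.fft.rfft(audio, n=n) * torch.fft.rfft(ir, n=n)
    return torch.fft.irfft(spectrum, n=n)[..., : audio.shape[-1]]


audio = torch.rand(8, 2, 32000) - 0.5
# Toy impulse response: the direct sound plus one quieter echo 400 samples later
ir = torch.zeros(1000)
ir[0] = 1.0
ir[400] = 0.5
wet = convolve_with_ir(audio, ir)
print(tuple(wet.shape))
```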
BandPassFilter
Added in v0.9.0
Apply band-pass filtering to the input audio.
BandStopFilter
Added in v0.10.0
Apply band-stop filtering to the input audio. Also known as a notch filter.
Gain
Added in v0.1.0
Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.
Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
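The relationship between a gain in decibels and the amplitude factor can be sketched as follows, using the standard conversion factor = 10^(dB / 20). The helper `apply_gain_db` is hypothetical and written in plain PyTorch. It only illustrates the arithmetic and the out-of-range warning above.

```python
import torch


def apply_gain_db(audio, gain_db):
    """Scale audio by a gain given in decibels (hypothetical helper)."""
    factor = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude ratio
    return audio * factor


audio = torch.rand(8, 2, 32000) - 0.5  # values in [-0.5, 0.5)
# One random gain per example, drawn from the same range as the usage example
gain_db = torch.empty(8, 1, 1).uniform_(-15.0, 5.0)
scaled = apply_gain_db(audio, gain_db)

# Positive gain can push samples outside [-1, 1]; count how many ended up there
num_out_of_range = int((scaled.abs() > 1.0).sum())
print(num_out_of_range)
```

If later stages require samples within [-1, 1], one option is to follow the gain with a peak normalization step or an explicit `torch.clamp`.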
HighPassFilter
Added in v0.8.0
Apply high-pass filtering to the input audio.
Identity
Added in v0.11.0
This transform returns the input unchanged. It can be used for simplifying the code in cases where data augmentation should be disabled.
LowPassFilter
Added in v0.8.0
Apply low-pass filtering to the input audio.
PeakNormalization
Added in v0.2.0
Apply a constant amount of gain, so that the highest signal level present in each audio snippet in the batch becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1.
This transform has an alternative mode (apply_to="only_too_loud_sounds") where it only applies to audio snippets that have extreme values outside the [-1, 1] range. This is useful for avoiding digital clipping in audio that is too loud, while leaving other audio untouched.
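A minimal sketch of the peak normalization idea in plain PyTorch is shown below. The function `peak_normalize` and its `only_too_loud` flag are hypothetical stand-ins for the two behaviors described above, and leaving all-zero snippets untouched is an assumption of this sketch.

```python
import torch


def peak_normalize(audio, only_too_loud=False):
    """Scale each snippet so its peak absolute value becomes 1.0 (0 dBFS).

    Hypothetical helper. audio has shape (batch, channels, samples).
    """
    # Highest absolute sample value per snippet, taken across channels and time
    peaks = audio.abs().amax(dim=(1, 2), keepdim=True)
    if only_too_loud:
        needs_scaling = peaks > 1.0  # only touch snippets that exceed [-1, 1]
    else:
        needs_scaling = peaks > 0.0  # skip silent snippets to avoid dividing by 0
    divisor = torch.where(needs_scaling, peaks, torch.ones_like(peaks))
    return audio / divisor


batch = torch.tensor([[[0.5, -2.0, 1.0]], [[0.1, -0.25, 0.2]]])  # loud, quiet
print(peak_normalize(batch).abs().amax(dim=(1, 2)))
print(peak_normalize(batch, only_too_loud=True).abs().amax(dim=(1, 2)))
```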
PitchShift
Added in v0.9.0
Pitch-shift sounds up or down without changing the tempo.
PolarityInversion
Added in v0.1.0
Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. The result will sound the same compared to the original when played back in isolation. However, when mixed with other audio sources, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or obtaining the difference between two waveforms. However, in the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
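The cancellation effect mentioned above can be seen in a few lines of plain PyTorch. This is only an illustration of multiplying by -1, not the transform's actual code.

```python
import torch

audio = torch.rand(8, 2, 32000) - 0.5
inverted = -audio  # polarity inversion: multiply the waveform by -1

# In isolation, the magnitudes are identical, so it sounds the same
same_magnitude = torch.equal(inverted.abs(), audio.abs())

# Mixed with the original, the two signals cancel out completely
residual_peak = float((audio + inverted).abs().max())
print(same_magnitude, residual_peak)
```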
Shift
Added in v0.5.0
Shift the audio forwards or backwards, with or without rollover.
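The difference between shifting with and without rollover can be sketched as below. The helper `shift_samples` is hypothetical and works on a fixed number of samples. With rollover, samples pushed past one end reappear at the other. Without it, the gap is filled with zeros (silence), which is an assumption of this sketch.

```python
import torch


def shift_samples(audio, num_samples, rollover=True):
    """Shift audio along the time axis by num_samples (hypothetical helper)."""
    if rollover:
        # Samples that fall off one end wrap around to the other end
        return torch.roll(audio, shifts=num_samples, dims=-1)
    shifted = torch.zeros_like(audio)  # vacated positions stay silent
    if num_samples > 0:
        shifted[..., num_samples:] = audio[..., :-num_samples]
    elif num_samples < 0:
        shifted[..., :num_samples] = audio[..., -num_samples:]
    else:
        shifted = audio.clone()
    return shifted


x = torch.arange(5, dtype=torch.float32).reshape(1, 1, 5)
print(shift_samples(x, 2, rollover=True))
print(shift_samples(x, 2, rollover=False))
print(shift_samples(x, -2, rollover=False))
```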
ShuffleChannels
Added in v0.6.0
Given multichannel audio input (e.g. stereo), shuffle the channels, e.g. so left can become right and vice versa. This transform can help combat positional bias in machine learning models that input multichannel waveforms.
If the input audio is mono, this transform does nothing except emit a warning.
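One way to picture channel shuffling is an independent random permutation of the channel axis for each example. The sketch below does that with `torch.randperm` and `torch.gather`. The function name `shuffle_channels` is hypothetical, and this version does not emit the mono warning described above.

```python
import torch


def shuffle_channels(audio):
    """Randomly permute the channels of each example (hypothetical helper).

    audio has shape (batch, channels, samples).
    """
    batch_size, num_channels, _ = audio.shape
    # One independent permutation of channel indices per example
    perms = torch.stack([torch.randperm(num_channels) for _ in range(batch_size)])
    # Expand to the full audio shape so gather can pick whole channels
    index = perms.unsqueeze(-1).expand_as(audio)
    return torch.gather(audio, dim=1, index=index)


audio = torch.rand(8, 2, 32000) - 0.5
shuffled = shuffle_channels(audio)
print(tuple(shuffled.shape))
```

With mono input, the only possible permutation is the identity, so the output equals the input.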
TimeInversion
Added in v0.10.0
Reverse (invert) the audio along the time axis, similar to a random flip of an image in the visual domain. This can be relevant in the context of audio classification. It was successfully applied in the paper AudioCLIP: Extending CLIP to Image, Text and Audio.
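Reversing along the time axis amounts to flipping the last dimension of the tensor. The short sketch below uses `torch.flip` in plain PyTorch and is only an illustration of the operation, not the transform's actual code.

```python
import torch

audio = torch.rand(8, 2, 32000) - 0.5  # (batch, channels, samples)
reversed_audio = torch.flip(audio, dims=(-1,))  # reverse the time axis only

# The first sample of the reversed audio is the last sample of the original
first_matches_last = torch.equal(reversed_audio[..., 0], audio[..., -1])
# Reversing twice restores the original signal
round_trip_ok = torch.equal(torch.flip(reversed_audio, dims=(-1,)), audio)
print(first_matches_last, round_trip_ok)
```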
Mix, Padding, RandomCrop and SpliceOut
Identity
ObjectDict output type as alternative to torch.Tensor. This alternative is opt-in for now (for backwards-compatibility), but note that the old output type (torch.Tensor) is deprecated and support for it will be removed in a future version.
AddBackgroundNoise and ApplyImpulseResponse
torch-pitch-shift to ensure support for torchaudio 0.11 in PitchShift
BandPassFilter didn't work on GPU
AddBackgroundNoise
AddBackgroundNoise
OneOf and SomeOf for applying one or more of a given set of transforms
BandStopFilter and TimeInversion
ir_paths in transform_parameters in ApplyImpulseResponse so it is possible to inspect what impulse responses were used. This also gives freeze_parameters() the expected behavior.
BandPassFilter. The default values have been updated accordingly. If you were previously specifying min_bandwidth_fraction and/or max_bandwidth_fraction, you now need to double those numbers to get the same behavior as before.
compensate_for_propagation_delay in ApplyImpulseResponse
BandPassFilter
PitchShift
HighPassFilter and LowPassFilter
AddColoredNoise
ShuffleChannels
AddBackgroundNoise did not work on CUDA
ApplyImpulseResponse.
AddBackgroundNoise and ApplyImpulseResponse
Shift
sample_rate optional. Allow specifying sample_rate in __init__ instead of forward. This means torchaudio transforms can be used in Compose now.
parameters method of the nn.Module subclass
Compose for applying multiple transforms
from_dict and from_yaml for loading data augmentation configurations from dict, json or yaml
per_batch and per_channel
PeakNormalization
convolve in the API
Gain and PolarityInversion
A GPU-enabled development environment for torch-audiomentations can be created with conda:
conda env create
pytest
The development of torch-audiomentations is kindly backed by Nomono.
Thanks to all contributors who help improve torch-audiomentations.