PyTorch-based feature extraction that mimics the feature extraction of the Kaldi Speech Recognition Toolkit.
```bash
git clone https://github.com/apple/pytorch-speech-features.git
cd pytorch-speech-features
pip install .
```
```python
import torch
from apple_pytorch_speech_features import FBank

feature_extractor = FBank()
dummy_wav = torch.randint(2, 1000, (2, 16000)).float()  # random integer tensor of shape (2, 16000), cast to float
features = feature_extractor(dummy_wav)                 # outputs features of shape (2, 98, 40)
```
- Spectrogram: spectrogram extraction from time-domain audio = Window --> Remove DC --> Pre-Emphasis --> STFT (DTFT) --> Power/Energy
```python
Spectrogram(
    sr=16000,             # sample rate of input wav signal
    winlen=400,           # window length used for FFT/DFT/STFT
    winstep=160,          # window step used for FFT/DFT/STFT
    premph_k=0.97,        # pre-emphasis coefficient
    winfunc=np.hamming,   # window function to be used for FFT
    remove_dc=True,       # remove mean after windowing
    add_noise=True,       # same as dither in Kaldi
    do_log=False,         # produce results in log
    scale_spectrogram=1,  # constant scale factor applied to spectral outputs (like 1/NFFT)
    requires_grad=False,  # make DTFT and pre-emphasis trainable if True
)
```
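A minimal usage sketch, assuming Spectrogram is importable from the same top-level package as FBank above; the output bin count depends on the internal FFT size, which is not documented here:

```python
import torch
from apple_pytorch_speech_features import Spectrogram  # assumed import path, mirroring FBank

spec_extractor = Spectrogram()   # defaults as listed above
wav = torch.randn(1, 16000)      # one second of 16 kHz audio
spec = spec_extractor(wav)       # (batch, frames, fft_bins); ~98 frames for winlen=400, winstep=160
```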
- FBank: Mel-Filterbank Extraction = Spectrogram --> Mel-transform
```python
FBank(
    sr=16000,             # sample rate of input wav signal
    winlen=400,           # window length used for FFT/DFT/STFT
    winstep=160,          # window step used for FFT/DFT/STFT
    mel_filt_path=None,   # path of pre-calculated mel filter coefficients
    mel_min=64,           # minimum frequency (Hz) for mel filter coefficient calculation
    mel_max=8000,         # maximum frequency (Hz) for mel filter coefficient calculation
    num_mels=40,          # number of mel filterbanks
    premph_k=0.97,        # pre-emphasis coefficient
    winfunc=np.hamming,   # window function to be used for FFT
    remove_dc=True,       # remove mean after windowing
    add_noise=True,       # same as dither in Kaldi
    do_log=True,          # produce results in log
    scale_spectrogram=1,  # constant scale factor applied to spectral outputs (like 1/NFFT)
    scale_fbanks=1,       # constant scale factor applied to fbank outputs (e.g. 1/ln(10) for log-10 outputs)
    requires_grad=False,  # make FFT, pre-emphasis, and mel filter trainable if True
)
```
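Beyond the default usage shown at the top, the constructor arguments can be overridden. The 80-mel configuration below is a hypothetical example, not a library default:

```python
import torch
from apple_pytorch_speech_features import FBank

fbank = FBank(num_mels=80, mel_min=20, mel_max=7600)  # hypothetical overrides for illustration
wav = torch.randn(2, 16000)
feats = fbank(wav)                                    # expected shape (2, 98, 80)
```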
- MFCC: MFCC Extraction = FBank --> DCT --> Roll features
```python
MFCC(
    sr=16000,             # sample rate of input wav signal
    winlen=400,           # window length used for FFT/DFT/STFT
    winstep=160,          # window step used for FFT/DFT/STFT
    mel_filt_path=None,   # path of pre-calculated mel filter coefficients
    mel_min=64,           # minimum frequency (Hz) for mel filter coefficient calculation
    mel_max=8000,         # maximum frequency (Hz) for mel filter coefficient calculation
    num_mels=40,          # number of mel filterbanks
    num_mfccs=20,         # number of MFCCs
    premph_k=0.97,        # pre-emphasis coefficient
    winfunc=np.hamming,   # window function to be used for FFT
    remove_dc=True,       # remove mean after windowing
    add_noise=True,       # same as dither in Kaldi
    do_log=True,          # produce results in log
    scale_spectrogram=1,  # constant scale factor applied to spectral outputs (like 1/NFFT)
    scale_fbanks=1,       # constant scale factor applied to fbank outputs (e.g. 1/ln(10) for log-10 outputs)
    requires_grad=False,  # make FFT, pre-emphasis, mel filter, and DCT filter trainable if True
)
```
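A minimal sketch, assuming MFCC is importable from the same top-level package; num_mfccs=13 is a hypothetical override of the documented default of 20:

```python
import torch
from apple_pytorch_speech_features import MFCC  # assumed import path, mirroring FBank

mfcc = MFCC(num_mfccs=13)    # hypothetical override; the documented default is 20
wav = torch.randn(1, 16000)
feats = mfcc(wav)            # expected shape (1, 98, 13)
```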
- SlidingCMVN: sliding CMVN (mean normalization over a sliding window) with a given window size and minimum window size
```python
SlidingCMVN(
    cmn_window=600,      # window (in frames) over which the mean is calculated
    min_cmn_window=100,  # minimum window (in frames) used at the start of the utterance, as in Kaldi's min-cmn-window
)
```
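A sketch of normalizing FBank output, assuming SlidingCMVN operates on B x T x N features and is importable like the modules above:

```python
import torch
from apple_pytorch_speech_features import SlidingCMVN  # assumed import path

cmvn = SlidingCMVN(cmn_window=600, min_cmn_window=100)
feats = torch.randn(2, 98, 40)   # B x T x N, e.g. FBank output
normed = cmvn(feats)             # same shape, mean-normalized over a sliding window
```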
- SubSample: subsample features along the time axis
```python
SubSample(
    stride=1,     # subsampling parameter (stride along the time axis)
    feat_dim=40,  # N in dimension of input features B x T x N
)
```
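A sketch under the assumption that SubSample keeps every stride-th frame; the exact frame-selection behavior is not documented here, and stride=3 is a hypothetical value:

```python
import torch
from apple_pytorch_speech_features import SubSample  # assumed import path

subsample = SubSample(stride=3, feat_dim=40)  # hypothetical stride
feats = torch.randn(2, 98, 40)                # B x T x N
out = subsample(feats)                        # roughly (2, 33, 40) if every 3rd frame is kept
```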
- Splicing: splice features with left-context and right-context parameters
```python
Splicing(
    leftpad=3,    # left context (frames)
    rightpad=3,   # right context (frames)
    feat_dim=40,  # N in dimension of input features B x T x N
)
```
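A sketch assuming context frames are concatenated along the feature axis (the usual Kaldi-style splicing), so the output feature dimension becomes feat_dim * (leftpad + 1 + rightpad):

```python
import torch
from apple_pytorch_speech_features import Splicing  # assumed import path

splice = Splicing(leftpad=3, rightpad=3, feat_dim=40)
feats = torch.randn(2, 98, 40)
out = splice(feats)   # expected (2, 98, 280) = 40 * (3 + 1 + 3), assuming edge frames are padded
```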
- GlobalCMVN: apply CMVN/normalization using the mean & std computed over the whole dataset
```python
GlobalCMVN(
    cmvn_mean=<list of feature means, len(list) = input_feat_dim>,
    cmvn_std=<list of feature stds, len(list) = input_feat_dim>,
)
```
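A sketch assuming GlobalCMVN is importable like the other modules and normalizes each feature dimension with the supplied statistics; the zero-mean/unit-std lists below are placeholders, not real dataset statistics:

```python
import torch
from apple_pytorch_speech_features import GlobalCMVN  # assumed import path

feat_dim = 40
cmvn = GlobalCMVN(
    cmvn_mean=[0.0] * feat_dim,  # replace with per-dimension means computed over your dataset
    cmvn_std=[1.0] * feat_dim,   # replace with per-dimension stds computed over your dataset
)
feats = torch.randn(2, 98, feat_dim)
normed = cmvn(feats)             # same shape, globally normalized
```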