Features
feature module.
This module provides functions for computing the main speech features that the package is designed to extract, along with the elements those features require.
Functions:
- filterbanks: Compute the Mel-filterbanks. The filterbanks must be created before extracting speech features such as MFCC.
- mfcc: Extract Mel Frequency Cepstral Coefficient features.
- mfe: Extract Mel Energy features.
- lmfe: Extract Log Mel Energy features.
- extract_derivative_feature: Extract the first and second derivative features. This function directly uses the derivative_extraction function in the processing module.
MFCC
speechpy.feature.mfcc(signal, sampling_frequency, frame_length=0.02, frame_stride=0.01, num_cepstral=13, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None, dc_elimination=True)
Compute MFCC features from an audio signal.
Parameters:
- signal (array) – the audio signal from which to compute features. Should be an N x 1 array.
- sampling_frequency (int) – the sampling frequency of the signal.
- frame_length (float) – the length of each frame in seconds. Default is 0.02s.
- frame_stride (float) – the step between successive frames in seconds. Default is 0.01s (50% overlap with the default frame_length; a stride equal to frame_length means no overlap).
- num_cepstral (int) – the number of cepstral coefficients. Default is 13.
- num_filters (int) – the number of filters in the filterbank. Default is 40.
- fft_length (int) – the number of FFT points. Default is 512.
- low_frequency (float) – the lowest band edge of the mel filters, in Hz. Default is 0.
- high_frequency (float) – the highest band edge of the mel filters, in Hz. Default is sampling_frequency / 2.
- dc_elimination (bool) – whether the first DC component should be eliminated. Default is True.
Returns: A numpy array of size (num_frames x num_cepstral) containing the MFCC features.
Return type: array
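The relationship between frame_length, frame_stride, and the num_frames dimension of the returned array can be sketched with plain arithmetic. This is an illustration only: the helper name frame_layout is hypothetical, and the convention of counting only frames that fit entirely inside the signal is an assumption, so speechpy's own framing code may yield a slightly different count.

```python
def frame_layout(num_samples, sampling_frequency, frame_length=0.02, frame_stride=0.01):
    """Return (frame_samples, stride_samples, num_frames, overlap_fraction).

    Hypothetical helper for illustration; not part of speechpy.
    """
    # Convert durations in seconds to whole numbers of samples.
    frame_samples = int(round(sampling_frequency * frame_length))
    stride_samples = int(round(sampling_frequency * frame_stride))
    if num_samples < frame_samples:
        return frame_samples, stride_samples, 0, 0.0
    # Assumed convention: keep only frames that fit entirely inside the signal.
    num_frames = 1 + (num_samples - frame_samples) // stride_samples
    # Fraction of each frame shared with the next one (0.0 means no overlap).
    overlap = max(0.0, 1.0 - stride_samples / frame_samples)
    return frame_samples, stride_samples, num_frames, overlap


# One second of audio at 16 kHz with the default frame settings.
frame_samples, stride_samples, num_frames, overlap = frame_layout(16000, 16000)
print(frame_samples, stride_samples, num_frames, overlap)
```

Under these assumptions, the default settings give 320-sample frames that advance by 160 samples, so consecutive frames share half of their samples.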
Mel Frequency Energy
speechpy.feature.mfe(signal, sampling_frequency, frame_length=0.02, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)
Compute Mel-filterbank energy features from an audio signal.
Parameters:
- signal (array) – the audio signal from which to compute features. Should be an N x 1 array.
- sampling_frequency (int) – the sampling frequency of the signal.
- frame_length (float) – the length of each frame in seconds. Default is 0.02s.
- frame_stride (float) – the step between successive frames in seconds. Default is 0.01s (50% overlap with the default frame_length; a stride equal to frame_length means no overlap).
- num_filters (int) – the number of filters in the filterbank. Default is 40.
- fft_length (int) – the number of FFT points. Default is 512.
- low_frequency (float) – the lowest band edge of the mel filters, in Hz. Default is 0.
- high_frequency (float) – the highest band edge of the mel filters, in Hz. Default is sampling_frequency / 2.
Returns: features, the filterbank energies, of size num_frames x num_filters; and the energy of each frame, of size num_frames x 1.
Return type: array
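To make the roles of num_filters, low_frequency, and high_frequency concrete, the sketch below spaces filter band edges evenly on the mel scale. It assumes the widely used conversion mel = 2595 * log10(1 + f / 700); the function names and the 16 kHz sampling rate are illustrative assumptions, and speechpy's filterbanks function may differ in its details.

```python
import math


def hz_to_mel(hz):
    # Common mel-scale formula (assumed here, not confirmed as speechpy's).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_filters=40, low_frequency=0.0, high_frequency=None,
                   sampling_frequency=16000):
    """Return num_filters + 2 band-edge frequencies in Hz (hypothetical helper)."""
    if high_frequency is None:
        # Mirror the documented default: half the sampling frequency.
        high_frequency = sampling_frequency / 2
    low_mel = hz_to_mel(low_frequency)
    high_mel = hz_to_mel(high_frequency)
    # Each triangular filter needs a left edge, a centre, and a right edge,
    # so num_filters filters share num_filters + 2 equally spaced mel points.
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_filters + 2)]


edges = mel_band_edges()
print(len(edges), round(edges[0], 3), round(edges[-1], 3))
```

Because the points are equally spaced in mel rather than in Hz, the filters are narrow at low frequencies and progressively wider toward high_frequency.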
Log Mel Frequency Energy
speechpy.feature.lmfe(signal, sampling_frequency, frame_length=0.02, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)
Compute log Mel-filterbank energy features from an audio signal.
Parameters:
- signal (array) – the audio signal from which to compute features. Should be an N x 1 array.
- sampling_frequency (int) – the sampling frequency of the signal.
- frame_length (float) – the length of each frame in seconds. Default is 0.02s.
- frame_stride (float) – the step between successive frames in seconds. Default is 0.01s (50% overlap with the default frame_length; a stride equal to frame_length means no overlap).
- num_filters (int) – the number of filters in the filterbank. Default is 40.
- fft_length (int) – the number of FFT points. Default is 512.
- low_frequency (float) – the lowest band edge of the mel filters, in Hz. Default is 0.
- high_frequency (float) – the highest band edge of the mel filters, in Hz. Default is sampling_frequency / 2.
Returns: features, the log energy of the filterbank, of size num_frames x num_filters; frame_log_energies, the log energy of each frame, of size num_frames x 1.
Return type: array
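The step from Mel energies to log Mel energies is an element-wise logarithm. The sketch below shows that step on a small hand-written matrix. It assumes a natural logarithm and clamps values to a small positive floor so that zero energies do not produce an error; both choices, and the helper name log_energies, are assumptions made for illustration rather than a description of speechpy's implementation.

```python
import math
import sys


def log_energies(energies, floor=sys.float_info.epsilon):
    """Element-wise natural log of a num_frames x num_filters nested list.

    Hypothetical helper; values below `floor` are clamped to avoid log(0).
    """
    return [[math.log(max(value, floor)) for value in frame] for frame in energies]


# Two frames, three filters each (made-up numbers for illustration).
mel_energies = [[1.0, math.e, 0.0],
                [10.0, 0.5, 2.0]]
log_mel = log_energies(mel_energies)
print(log_mel)
```

The output keeps the num_frames x num_filters shape of its input; only the scale of the values changes.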
Extract Derivative Features
speechpy.feature.extract_derivative_feature(feature)
This function extracts the temporal derivative features, which are the first and second derivatives.
Parameters: feature (array) – the feature array, of size N x M.
Returns: The feature cube, which contains the static, first derivative, and second derivative features, of size N x M x 3.
Return type: array
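The N x M to N x M x 3 layout can be illustrated with a small pure-Python sketch. It uses a simple central difference along the frame axis as the derivative, with edge frames repeated at the boundaries. That differencing scheme and the helper names are assumptions chosen for brevity; the library itself relies on derivative_extraction from the processing module, which may compute the derivatives differently.

```python
def central_difference(frames):
    """First temporal derivative of an N x M nested list (assumed scheme)."""
    n = len(frames)
    deltas = []
    for t in range(n):
        # Repeat the first and last frames at the boundaries.
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return deltas


def stack_derivatives(frames):
    """Return an N x M x 3 nested list: static, first and second derivatives."""
    first = central_difference(frames)
    second = central_difference(first)
    return [
        [[s, d1, d2] for s, d1, d2 in zip(static_row, first_row, second_row)]
        for static_row, first_row, second_row in zip(frames, first, second)
    ]


# Four frames (N = 4) with two coefficients each (M = 2).
features = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
cube = stack_derivatives(features)
print(len(cube), len(cube[0]), len(cube[0][0]))
```

In the resulting cube, index 0 along the last axis holds the original (static) value, index 1 the first derivative, and index 2 the second derivative.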