Features

MFCC

speechpy.feature.mfcc(signal, sampling_frequency, frame_length=0.02, frame_stride=0.01, num_cepstral=13, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None, dc_elimination=True)

Compute MFCC features from an audio signal.

Parameters:
  • signal (array) – the audio signal from which to compute features. Should be an N x 1 array
  • sampling_frequency (int) – the sampling frequency of the signal we are working with.
  • frame_length (float) – the length of each frame in seconds. Default is 0.02 s.
  • frame_stride (float) – the step between successive frames in seconds. Default is 0.01 s (with the default frame_length, successive frames overlap by 50%).
  • num_filters (int) – the number of filters in the filterbank, default 40.
  • fft_length (int) – number of FFT points. Default is 512.
  • low_frequency (float) – lowest band edge of mel filters. In Hz, default is 0.
  • high_frequency (float) – highest band edge of mel filters. In Hz, default is sampling_frequency/2.
  • num_cepstral (int) – number of cepstral coefficients. Default is 13.
  • dc_elimination (bool) – whether the first (DC) component should be eliminated. Default is True.
Returns:

A NumPy array of size (num_frames x num_cepstral) containing the MFCC features.

Return type:

array
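
The frame_length and frame_stride parameters are given in seconds, so it can help to see what they mean in samples. The following is a minimal sketch, not the library's own framing code: the helper name frame_layout is hypothetical, and the frame count assumes the common convention of keeping only complete frames with no zero padding, which may differ from what speechpy does internally.

```python
def frame_layout(num_samples, sampling_frequency, frame_length=0.02, frame_stride=0.01):
    """Convert frame parameters from seconds to samples (illustrative helper)."""
    frame_size = int(round(sampling_frequency * frame_length))  # samples per frame
    step = int(round(sampling_frequency * frame_stride))        # samples between frame starts
    overlap = frame_size - step                                 # samples shared by neighbours
    if num_samples < frame_size:
        num_frames = 0
    else:
        # Assumption: only complete frames are counted, no zero padding.
        num_frames = 1 + (num_samples - frame_size) // step
    return frame_size, step, overlap, num_frames


# One second of audio sampled at 16 kHz with the default frame settings.
frame_size, step, overlap, num_frames = frame_layout(16000, 16000)
print(frame_size, step, overlap, num_frames)
```

With the defaults, each frame shares half of its samples with the next one. Setting frame_stride equal to frame_length gives non-overlapping frames.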

Mel Frequency Energy

speechpy.feature.mfe(signal, sampling_frequency, frame_length=0.02, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)

Compute Mel-filterbank energy features from an audio signal.

Parameters:
  • signal (array) – the audio signal from which to compute features. Should be an N x 1 array
  • sampling_frequency (int) – the sampling frequency of the signal we are working with.
  • frame_length (float) – the length of each frame in seconds. Default is 0.02 s.
  • frame_stride (float) – the step between successive frames in seconds. Default is 0.01 s (with the default frame_length, successive frames overlap by 50%).
  • num_filters (int) – the number of filters in the filterbank, default 40.
  • fft_length (int) – number of FFT points. Default is 512.
  • low_frequency (float) – lowest band edge of mel filters. In Hz, default is 0.
  • high_frequency (float) – highest band edge of mel filters. In Hz, default is sampling_frequency/2.
Returns:

features: the energy of the filterbank, of size num_frames x num_filters

frame_energies: the energy of each frame, of size num_frames x 1

Return type:

array
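
To illustrate how low_frequency, high_frequency and num_filters interact, the sketch below places filter band edges evenly on the mel scale. It is an illustration under stated assumptions rather than the library's implementation: the function names are hypothetical, and the Hz-to-mel formula is one widely used variant that may not match the one speechpy applies.

```python
import math


def hz_to_mel(hz):
    # A common mel-scale formula (assumption: not necessarily the library's).
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel above.
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def filter_edges_hz(num_filters, sampling_frequency, low_frequency=0.0, high_frequency=None):
    """Band-edge frequencies in Hz for num_filters triangular mel filters."""
    if high_frequency is None:
        high_frequency = sampling_frequency / 2  # default upper edge: Nyquist frequency
    low_mel = hz_to_mel(low_frequency)
    high_mel = hz_to_mel(high_frequency)
    # num_filters triangles need num_filters + 2 edge points.
    num_points = num_filters + 2
    mel_step = (high_mel - low_mel) / (num_points - 1)
    return [mel_to_hz(low_mel + i * mel_step) for i in range(num_points)]


edges = filter_edges_hz(num_filters=40, sampling_frequency=16000)
print(len(edges), edges[0], edges[-1])
```

Because the edges are equally spaced in mel rather than in Hz, the filters are narrow at low frequencies and widen toward high_frequency.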

Log Mel Frequency Energy

speechpy.feature.lmfe(signal, sampling_frequency, frame_length=0.02, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)

Compute log Mel-filterbank energy features from an audio signal.

Parameters:
  • signal (array) – the audio signal from which to compute features. Should be an N x 1 array
  • sampling_frequency (int) – the sampling frequency of the signal we are working with.
  • frame_length (float) – the length of each frame in seconds. Default is 0.02 s.
  • frame_stride (float) – the step between successive frames in seconds. Default is 0.01 s (with the default frame_length, successive frames overlap by 50%).
  • num_filters (int) – the number of filters in the filterbank, default 40.
  • fft_length (int) – number of FFT points. Default is 512.
  • low_frequency (float) – lowest band edge of mel filters. In Hz, default is 0.
  • high_frequency (float) – highest band edge of mel filters. In Hz, default is sampling_frequency/2.
Returns:

features: the log energy of the filterbank, of size num_frames x num_filters

frame_log_energies: the log energy of each frame, of size num_frames x 1

Return type:

array
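
Taking the logarithm of filterbank energies needs some care when an energy is exactly zero. The sketch below shows one common way to handle this, by flooring the energies at machine epsilon before the log. The helper name log_energies and the choice of floor are assumptions made for illustration, not a description of how the library treats zeros.

```python
import numpy as np


def log_energies(energies, floor=np.finfo(float).eps):
    """Natural log of non-negative energies, floored to avoid log(0)."""
    energies = np.asarray(energies, dtype=float)
    return np.log(np.maximum(energies, floor))


# Two frames, three filters; one energy is exactly zero.
energies = np.array([[1.0, 0.5, 0.0],
                     [2.0, 4.0, 8.0]])
log_feats = log_energies(energies)
print(log_feats.shape)
```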

Extract Derivative Features

speechpy.feature.extract_derivative_feature(feature)

Extract temporal derivative features, namely the first and second derivatives of the input features over time.

Parameters: feature (array) – the feature array, of size N x M
Returns: the feature cube containing the static, first-derivative and second-derivative features, of size N x M x 3
Return type: array
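
The N x M x 3 layout of the returned cube can be illustrated with plain NumPy. This sketch substitutes simple finite differences (numpy.gradient along the frame axis) for the derivative computation, which is not necessarily the method the library uses. The function name stack_derivatives is hypothetical, and it assumes N is the number of frames and M the number of coefficients per frame.

```python
import numpy as np


def stack_derivatives(feature):
    """Stack static, first- and second-difference features into N x M x 3."""
    feature = np.asarray(feature, dtype=float)
    delta = np.gradient(feature, axis=0)   # finite difference over frames
    delta2 = np.gradient(delta, axis=0)    # difference of the difference
    return np.stack([feature, delta, delta2], axis=-1)


# N = 4 frames, M = 3 coefficients per frame.
feature = np.arange(12, dtype=float).reshape(4, 3)
cube = stack_derivatives(feature)
print(cube.shape)
```

Indexing the last axis selects one of the three feature sets: cube[..., 0] is the static input, cube[..., 1] the first derivative and cube[..., 2] the second.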