Preprocessing

Pre-emphasis

speechpy.processing.preemphasis(signal, shift=1, cof=0.98)

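Apply pre-emphasis to the signal, a first-order high-pass filter y[n] = x[n] - cof * x[n - shift] that boosts the high-frequency components.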

Parameters:
  • signal (array) – The input signal.
  • shift (int) – The delay, in samples, of the scaled copy that is subtracted from the signal.
  • cof (float) – The pre-emphasis coefficient; 0 means no filtering.
Returns:

The pre-emphasized signal.

Return type:

array
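A minimal NumPy sketch of the same filter, y[n] = x[n] - cof * x[n - shift], may help make the two parameters concrete. The name preemphasis_sketch is hypothetical, and treating samples before the start of the signal as zero is an assumption of this sketch; the library may handle the first shift samples differently.

```python
import numpy as np


def preemphasis_sketch(signal, shift=1, cof=0.98):
    """Hypothetical re-implementation of y[n] = x[n] - cof * x[n - shift].

    Assumption: samples before the start of the signal are zero, so the
    first `shift` outputs equal the input. `shift` must be >= 1.
    """
    signal = np.asarray(signal, dtype=float)
    delayed = np.zeros_like(signal)
    # Delay the signal by `shift` samples, leaving zeros at the start.
    delayed[shift:] = signal[:-shift]
    return signal - cof * delayed


x = np.array([1.0, 2.0, 3.0, 4.0])
print(preemphasis_sketch(x, cof=0.5).tolist())  # [1.0, 1.5, 2.0, 2.5]
```

With cof=0 the delayed copy is scaled to zero and the output equals the input, matching the "no filtering" case described above.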

Stacking

speechpy.processing.stack_frames(sig, sampling_frequency, frame_length=0.02, frame_stride=0.02, filter=lambda x: np.ones((x,)), zero_padding=True)

Split a signal into frames; consecutive frames overlap when frame_stride is smaller than frame_length.

Parameters:
  • sig (array) – The one-dimensional audio signal to frame, of shape (N,).
  • sampling_frequency (int) – The sampling frequency of the signal.
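  • frame_length (float) – The length of each frame, in seconds.
  • frame_stride (float) – The step between the starts of consecutive frames, in seconds.
  • filter (callable) – A function that takes the frame length in samples and returns the time-domain window applied to each frame (for example, a Hamming window). The default returns an array of ones, so the frames are left unchanged.
  • zero_padding (bool) – If True and the remaining samples do not fill a complete final frame, the signal is zero-padded so that the last frame can still be generated; if False, that incomplete frame is dropped.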
Returns:

stacked_frames – Array of frames of shape (number_of_frames x frame_len), where frame_len is frame_length converted to samples.

Return type:

array
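The framing arithmetic can be sketched in NumPy as follows. This is a hypothetical re-implementation (stack_frames_sketch is not part of speechpy), assuming that durations are rounded to whole samples and that, with zero_padding=False, only complete frames are kept. The window argument plays the role of filter and is renamed only to avoid shadowing Python's built-in filter.

```python
import math

import numpy as np


def stack_frames_sketch(sig, sampling_frequency, frame_length=0.02,
                        frame_stride=0.02, window=lambda n: np.ones((n,)),
                        zero_padding=True):
    """Hypothetical framing helper returning shape (num_frames, frame_len)."""
    sig = np.asarray(sig, dtype=float)
    if sig.ndim != 1:
        raise ValueError("sig must have shape (N,)")
    # Convert durations in seconds to whole numbers of samples.
    frame_len = int(round(sampling_frequency * frame_length))
    step = int(round(sampling_frequency * frame_stride))
    n = sig.shape[0]
    if zero_padding:
        # Enough frames to cover every sample; the tail is zero-padded.
        num_frames = 1 + max(0, math.ceil((n - frame_len) / step))
    else:
        # Only complete frames are kept; leftover samples are dropped.
        num_frames = max(0, 1 + (n - frame_len) // step)
    needed = (num_frames - 1) * step + frame_len
    padded = np.zeros(max(n, needed))
    padded[:n] = sig
    # Row i holds samples i*step ... i*step + frame_len - 1.
    indices = (np.arange(num_frames)[:, None] * step
               + np.arange(frame_len)[None, :])
    frames = padded[indices]
    # Multiply every frame by the time-domain window (ones by default).
    return frames * window(frame_len)


# One second of 16 kHz audio, 25 ms frames, 10 ms stride, Hamming window.
audio = np.random.default_rng(0).standard_normal(16000)
frames = stack_frames_sketch(audio, 16000, 0.025, 0.01, window=np.hamming)
print(frames.shape)  # (99, 400)
```

In this sketch the same input gives 98 frames with zero_padding=False, because the last 80 samples cannot fill a complete 400-sample frame.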

FFT Spectrum

speechpy.processing.fft_spectrum(frames, fft_points=512)

Compute the one-dimensional n-point discrete Fourier Transform (DFT) of each real-valued frame using the Fast Fourier Transform (FFT). See https://docs.scipy.org/doc/numpy/reference/generated/numpy.fft.rfft.html for further details.

Parameters:
  • frames (array) – The frame array in which each row is a frame.
  • fft_points (int) – The length of the FFT. If fft_points is greater than the frame length, each frame is zero-padded; if it is smaller, each frame is truncated.
Returns:

The FFT spectrum. If frames is a num_frames x samples_per_frame matrix, the output is num_frames x (fft_points // 2 + 1), since a real-input FFT keeps only the non-negative frequency bins.

Return type:

array
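A short sketch of this computation with numpy.fft.rfft shows where the output width comes from. It assumes, for illustration, that the spectrum is returned as magnitudes |X[k]|; the helper name fft_spectrum_sketch is hypothetical.

```python
import numpy as np


def fft_spectrum_sketch(frames, fft_points=512):
    """Hypothetical helper: magnitude of the real-input FFT of each row."""
    # rfft zero-pads (or crops) each row to fft_points samples and keeps
    # only the non-negative frequency bins, i.e. fft_points // 2 + 1 of them.
    spectrum = np.fft.rfft(frames, n=fft_points, axis=-1)
    return np.abs(spectrum)


frames = np.ones((3, 400))  # three constant frames of 400 samples
magnitude = fft_spectrum_sketch(frames, fft_points=512)
print(magnitude.shape)  # (3, 257)
```

Because each constant frame is zero-padded from 400 to 512 samples, its DC bin (k = 0) equals the sum of the 400 ones.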

Power Spectrum

speechpy.processing.power_spectrum(frames, fft_points=512)

Power spectrum of each frame.

Parameters:
  • frames (array) – The frame array in which each row is a frame.
  • fft_points (int) – The length of the FFT. If fft_points is greater than the frame length, each frame is zero-padded; if it is smaller, each frame is truncated.
Returns:

The power spectrum. If frames is a num_frames x samples_per_frame matrix, the output is num_frames x (fft_points // 2 + 1).

Return type:

array
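A sketch of a per-frame power spectrum, assuming the common periodogram scaling |X[k]|^2 / fft_points. Both that scaling and the helper name power_spectrum_sketch are assumptions for illustration rather than a description of speechpy's internals.

```python
import numpy as np


def power_spectrum_sketch(frames, fft_points=512):
    """Hypothetical helper: periodogram |X[k]|^2 / fft_points per frame."""
    magnitude = np.abs(np.fft.rfft(frames, n=fft_points, axis=-1))
    # Squaring the magnitude gives power; dividing by fft_points is the
    # assumed periodogram normalization.
    return magnitude ** 2 / fft_points


# A constant frame puts all of its energy in the DC (k = 0) bin.
power = power_spectrum_sketch(np.ones((2, 8)), fft_points=8)
print(power.shape)  # (2, 5)
```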

Log Power Spectrum

speechpy.processing.log_power_spectrum(frames, fft_points=512, normalize=True)

Log power spectrum of each frame in frames.

Parameters:
  • frames (array) – The frame array in which each row is a frame.
  • fft_points (int) – The length of the FFT. If fft_points is greater than the frame length, each frame is zero-padded; if it is smaller, each frame is truncated.
  • normalize (bool) – If normalize=True, the log power spectrum will be normalized.
Returns:

The log power spectrum. If frames is a num_frames x samples_per_frame matrix, the output is num_frames x (fft_points // 2 + 1).

Return type:

array
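A sketch of a log power spectrum in decibels. The 1e-20 floor (which keeps log10 finite) and normalization by subtracting the maximum, so that the peak sits at 0 dB, are assumptions made for illustration; speechpy's normalize option may be defined differently. log_power_spectrum_sketch is a hypothetical name.

```python
import numpy as np


def log_power_spectrum_sketch(frames, fft_points=512, normalize=True):
    """Hypothetical helper: per-frame power spectrum in decibels."""
    spectrum = np.fft.rfft(frames, n=fft_points, axis=-1)
    power = np.abs(spectrum) ** 2 / fft_points
    # Assumed floor so that log10 never receives zero.
    power = np.maximum(power, 1e-20)
    log_power = 10.0 * np.log10(power)
    if normalize:
        # Assumed normalization: shift so the largest value is 0 dB.
        log_power = log_power - log_power.max()
    return log_power


frames = np.ones((2, 8))
raw = log_power_spectrum_sketch(frames, fft_points=8, normalize=False)
normalized = log_power_spectrum_sketch(frames, fft_points=8)
print(normalized.shape)  # (2, 5)
```

Here the empty bins are clamped to the floor (-200 dB), which keeps the output finite at the cost of a large negative constant.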

Derivative Extraction

speechpy.processing.derivative_extraction(feat, DeltaWindows)

Compute the derivative features.

Parameters:
  • feat (array) – The feature matrix of shape NUMFRAMES x NUMFEATURES. To obtain second-order derivatives, pass the first-order derivatives as feat.
  • DeltaWindows (int) – The half-width of the derivative window, i.e. the number of neighbouring values used on each side; it is set using the configuration parameter DELTAWINDOW.
Returns:

Derivative feature vector – a NUMFRAMES x NUMFEATURES NumPy array containing the derivatives computed along the features.

Return type:

array
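Derivative (delta) features are commonly computed with the regression formula d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), where N plays the role of DeltaWindows. The sketch below applies that formula along the frame (row) axis, replicating the first and last frames at the boundaries. The axis, the padding mode and the name delta_sketch are assumptions for illustration, and speechpy's own implementation may differ.

```python
import numpy as np


def delta_sketch(feat, delta_window=2):
    """Hypothetical regression-based delta over the frame (row) axis.

    d_t = sum_{n=1}^{N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1}^{N} n^2)
    """
    num_frames = feat.shape[0]
    # Replicate the first and last frames so every t has N neighbours.
    padded = np.pad(feat, ((delta_window, delta_window), (0, 0)), mode="edge")
    numerator = np.zeros_like(feat, dtype=float)
    for n in range(1, delta_window + 1):
        ahead = padded[delta_window + n: delta_window + n + num_frames]
        behind = padded[delta_window - n: delta_window - n + num_frames]
        numerator += n * (ahead - behind)
    denominator = 2 * sum(n * n for n in range(1, delta_window + 1))
    return numerator / denominator


# Two features that grow linearly over ten frames, with slopes 1 and 3.
feat = np.arange(10.0)[:, None] * np.array([1.0, 3.0])
first = delta_sketch(feat, delta_window=2)
second = delta_sketch(first, delta_window=2)  # second-order derivative
print(first[4].tolist())  # [1.0, 3.0]
```

Away from the edges the first-order output recovers each feature's slope, and feeding it back in, as described for feat above, gives second-order derivatives that are zero for these linear features.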