Preprocessing¶
Processing module for signal processing operations.
This module documents the signal processing functions that the package requires for its internal computations.
ivar preemphasis: | Pre-emphasis of the signal. This is a preprocessing step. |
---|---|
ivar stack_frames: | Create stacked frames from the raw signal. |
ivar fft_spectrum: | Calculation of the Fast Fourier Transform. |
ivar power_spectrum: | Power spectrum calculation. |
ivar log_power_spectrum: | Log power spectrum calculation. |
ivar derivative_extraction: | Calculation of the derivative of the extracted features. |
ivar cmvn: | Cepstral mean variance normalization. This is a post-processing operation. |
ivar cmvnw: | Cepstral mean variance normalization over a sliding window. This is a post-processing operation. |
Pre-emphasis¶
Stacking¶
speechpy.processing.stack_frames(sig, sampling_frequency, frame_length=0.02, frame_stride=0.02, filter=<function <lambda>>, zero_padding=True)
Frame a signal into overlapping frames.
Parameters: - sig (array) – The audio signal to frame, of size (N,).
- sampling_frequency (int) – The sampling frequency of the signal.
- frame_length (float) – The length of each frame in seconds.
- frame_stride (float) – The stride between frames in seconds.
- filter (array) – The time-domain filter applied to each frame. By default it is one, so the frames are left unchanged.
- zero_padding (bool) – If the number of samples is not a multiple of the frame length (in samples), the signal is zero-padded to generate the last frame.
Returns: Stacked_frames: array of frames of size (number_of_frames x frame_len).
Return type: array
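The framing described above can be sketched with plain NumPy. This is a minimal illustration, not the speechpy implementation: `frame_signal` is a hypothetical helper, the per-frame `filter` argument is left out, and the rule used to count frames when zero-padding is an assumption.

```python
import numpy as np


def frame_signal(sig, sampling_frequency, frame_length=0.02,
                 frame_stride=0.02, zero_padding=True):
    """Hypothetical helper: split a 1-D signal into rows of frames."""
    sig = np.asarray(sig, dtype=float)
    # Convert durations in seconds to sample counts.
    frame_len = int(round(sampling_frequency * frame_length))
    stride = int(round(sampling_frequency * frame_stride))

    if zero_padding:
        # Assumed rule: round the frame count up and pad the tail with zeros.
        num_frames = int(np.ceil(max(len(sig) - frame_len, 0) / stride)) + 1
        padded_len = (num_frames - 1) * stride + frame_len
        sig = np.concatenate([sig, np.zeros(padded_len - len(sig))])
    else:
        # Drop trailing samples that do not fill a whole frame.
        num_frames = (len(sig) - frame_len) // stride + 1

    # Row i holds samples [i * stride, i * stride + frame_len).
    idx = np.arange(frame_len)[None, :] + stride * np.arange(num_frames)[:, None]
    return sig[idx]


# 10 samples at 100 Hz, 40 ms frames (4 samples) every 20 ms (2 samples).
frames = frame_signal(np.arange(10), sampling_frequency=100,
                      frame_length=0.04, frame_stride=0.02)
```

With a stride smaller than the frame length, consecutive rows share samples. With the default values (both 0.02) the frames sit back to back without overlap.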
FFT Spectrum¶
speechpy.processing.fft_spectrum(frames, fft_points=512)
This function computes the one-dimensional n-point discrete Fourier Transform (DFT) of a real-valued array by means of an efficient algorithm called the Fast Fourier Transform (FFT). Please refer to https://docs.scipy.org/doc/numpy/reference/generated/numpy.fft.rfft.html for further details.
Parameters: - frames (array) – The frame array in which each row is a frame.
- fft_points (int) – The length of the FFT. If fft_points is greater than frame_len, the frames will be zero-padded.
Returns: The FFT spectrum. If frames is a num_frames x sample_per_frame matrix, output will be num_frames x FFT_LENGTH.
Return type: array
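As a rough illustration of the zero-padding behaviour, the sketch below calls `numpy.fft.rfft` (the routine linked above) directly on a small frame matrix. Taking the magnitude with `np.abs` is an assumption about what "spectrum" means here. Note that `numpy.fft.rfft` itself returns `fft_points // 2 + 1` bins per frame for real input; this page does not confirm whether `fft_spectrum` keeps exactly that shape.

```python
import numpy as np

# Two frames of 8 samples each: a constant and an alternating sequence.
frames = np.array([
    [1.0] * 8,
    [1.0, -1.0] * 4,
])

fft_points = 16  # longer than the frame, so each frame is zero-padded to 16

# rfft along the last axis; np.abs gives the magnitude of each complex bin.
magnitude = np.abs(np.fft.rfft(frames, n=fft_points, axis=-1))

# The constant frame peaks at bin 0 (DC); the alternating frame peaks at
# the last bin (the Nyquist frequency).
dc_peak = magnitude[0].argmax()
nyquist_peak = magnitude[1].argmax()
```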
Power Spectrum¶
speechpy.processing.power_spectrum(frames, fft_points=512)
Power spectrum of each frame.
Parameters: - frames (array) – The frame array in which each row is a frame.
- fft_points (int) – The length of the FFT. If fft_points is greater than frame_len, the frames will be zero-padded.
Returns: The power spectrum. If frames is a num_frames x sample_per_frame matrix, output will be num_frames x fft_length.
Return type: array
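One common way to obtain a per-frame power spectrum is the periodogram estimate, the squared FFT magnitude divided by the FFT length. The sketch below assumes that normalisation; `power_spectrum_sketch` is a hypothetical helper and the scaling used by speechpy is not stated on this page.

```python
import numpy as np


def power_spectrum_sketch(frames, fft_points=512):
    """Hypothetical helper: periodogram-style power per frame."""
    magnitude = np.abs(np.fft.rfft(frames, n=fft_points, axis=-1))
    # Assumed normalisation: divide the squared magnitude by the FFT length.
    return magnitude ** 2 / fft_points


# Three constant frames of 4 samples, zero-padded to an 8-point FFT.
frames = np.ones((3, 4))
power = power_spectrum_sketch(frames, fft_points=8)
```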
Power Spectrum Log¶
Derivative Extraction¶
speechpy.processing.derivative_extraction(feat, DeltaWindows)
This function computes the derivative features.
Parameters: - feat (array) – The main feature vector. (To obtain the second-order derivative, this can be the first-order derivative.)
- DeltaWindows (int) – The value of DeltaWindows is set using the configuration parameter DELTAWINDOW.
Returns: Derivative feature vector: a NUMFRAMES x NUMFEATURES numpy array holding the derivative features along the features.
Return type: array
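Derivative (delta) features are often computed with a regression over a window of neighbouring values, d[t] = sum over n of n * (c[t+n] - c[t-n]), divided by 2 * sum of n^2, for n = 1..DeltaWindows. The sketch below assumes that formula and edge padding; `delta_sketch` and its `axis` argument are hypothetical and are not part of speechpy. The default `axis=1` follows this page's wording "along the features"; pass `axis=0` to differentiate across frames instead.

```python
import numpy as np


def delta_sketch(feat, delta_window, axis=1):
    """Hypothetical helper: regression-based derivative of a feature matrix."""
    feat = np.asarray(feat, dtype=float)
    size = feat.shape[axis]
    # Repeat the edge values so every position has delta_window neighbours.
    pad = [(0, 0)] * feat.ndim
    pad[axis] = (delta_window, delta_window)
    padded = np.pad(feat, pad, mode="edge")

    out = np.zeros_like(feat)
    for n in range(1, delta_window + 1):
        # Values n steps ahead of and n steps behind each position.
        ahead = np.take(
            padded, np.arange(delta_window + n, delta_window + n + size), axis=axis)
        behind = np.take(
            padded, np.arange(delta_window - n, delta_window - n + size), axis=axis)
        out += n * (ahead - behind)
    # Normalise by twice the sum of squared offsets.
    return out / (2 * sum(n * n for n in range(1, delta_window + 1)))


# A linear ramp has slope 1 away from the padded edges.
features = np.arange(6, dtype=float).reshape(1, 6)
deltas = delta_sketch(features, delta_window=1)
```

Applying the same helper to `deltas` gives second-order derivatives, matching the note on the feat parameter above.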