In this letter, the researchers propose a method for time-stretching audio signals while preserving their quality. The approach, called Noise Morphing (NM), builds on the sines-transients-noise (STN) decomposition, which separates an audio signal into tonal content (sines), impulsive events (transients), and a noise-like residual. The decomposition uses soft spectral masking, so the three components add back up to the original signal, allowing for perfect reconstruction.
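To illustrate why soft spectral masking allows perfect reconstruction, here is a minimal sketch. It is not the authors' implementation: the per-bin `tonalness` score, the thresholds `lo` and `hi`, the sin-squared ramp, and the function names are all assumptions made for illustration. The only property it is meant to show is that when the three masks sum to one in every time-frequency bin, the three components add back to the original spectrum.

```python
import math


def soft_masks(tonalness, lo=0.6, hi=0.8):
    """Return (tonal, transient, residual) mask weights for one bin.

    `tonalness` is a hypothetical score in [0, 1]: 1 means purely tonal,
    0 means purely impulsive. `lo` and `hi` are illustrative thresholds.
    """
    def ramp(x):
        # Smooth sin^2 ramp: 0 for x <= lo, 1 for x >= hi.
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return math.sin(0.5 * math.pi * (x - lo) / (hi - lo)) ** 2

    tonal = ramp(tonalness)
    transient = ramp(1.0 - tonalness)
    # The residual takes whatever weight is left, so the masks sum to 1.
    residual = 1.0 - tonal - transient
    return tonal, transient, residual


def decompose(spectrum, tonalness):
    """Split complex spectrum bins into three soft-masked components."""
    tonal, transient, residual = [], [], []
    for value, score in zip(spectrum, tonalness):
        m_tonal, m_transient, m_residual = soft_masks(score)
        tonal.append(m_tonal * value)
        transient.append(m_transient * value)
        residual.append(m_residual * value)
    return tonal, transient, residual


# Toy example: four spectrum bins with made-up tonalness scores.
spectrum = [1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.1j]
tonalness = [0.95, 0.7, 0.5, 0.1]

tonal, transient, residual = decompose(spectrum, tonalness)
rebuilt = [a + b + c for a, b, c in zip(tonal, transient, residual)]
max_error = max(abs(x - y) for x, y in zip(rebuilt, spectrum))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the masks are soft, a bin that is neither clearly tonal nor clearly impulsive is shared between components rather than forced into one of them, and no energy is lost or duplicated in the split.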
The proposed method was evaluated in a blind listening test against several baseline methods: a standard phase vocoder, a fuzzy phase vocoder, an enhanced version of the fuzzy phase vocoder with transient preservation, and a method that uses a neural synthesizer to stretch the noise component. In the test, NM outperformed all the baseline methods in perceived quality.
To demystify the idea, consider an everyday analogy: imagine you have a recipe for your favorite dessert and you want to change how long it takes to prepare without compromising the taste or texture. In the same way, NM changes the duration of an audio signal while preserving the character of the original sound.
In summary, by combining the STN decomposition with soft spectral masking, NM offers a robust way to time-stretch audio signals while preserving their quality, which is useful in applications where accurate time-stretching is crucial, such as audio restoration or enhancement.
Audio and Speech Processing, Electrical Engineering and Systems Science