Bridging the gap between complex scientific research and the curious minds eager to explore it.

Computer Science, Sound

Enhancing Noisy Speech via Asymmetric Tucker Decomposition

In this article, we propose a novel approach to self-supervised representation learning called DAS-based DNN, which leverages the channel-dependent sensitivity of distributed acoustic sensing (DAS) to extract common features from multiple inputs. The proposed method addresses a limitation of traditional self-supervised learning methods that rely solely on spatial- or temporal-domain priors.
We introduce a framework that combines SimSiam [19], which enables the extraction of common features from multiple inputs, with DAS priors [20]. By incorporating the channel-dependent sensitivity of DAS, we can avoid the trivial solution and extract meaningful features in a self-supervised manner. The proposed method minimizes a loss function that combines the reconstruction error between the input and reconstructed signals with a rank constraint that enforces the low-rank property of the resulting representation.
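As a rough illustration of such an objective, the sketch below combines a mean-squared reconstruction error with a nuclear-norm penalty, a standard convex surrogate for matrix rank. The function names and the weight `lam` are hypothetical; the paper's exact loss is not given here.

```python
import numpy as np

def nuclear_norm(Z):
    # Sum of singular values: a standard convex surrogate for matrix rank.
    return np.linalg.svd(Z, compute_uv=False).sum()

def combined_loss(x, x_hat, Z, lam=0.1):
    # Hypothetical sketch: reconstruction error plus a rank penalty on
    # the learned representation Z. `lam` is an assumed hyperparameter.
    recon = np.mean((x - x_hat) ** 2)   # fidelity to the input signal
    rank_pen = nuclear_norm(Z)          # encourages a low-rank representation
    return recon + lam * rank_pen

# Toy usage: perfect reconstruction, rank-1 representation.
x = np.ones((4, 8))
Z = np.outer(np.ones(4), np.ones(8))    # rank-1 matrix of ones
loss = combined_loss(x, x, Z)           # reconstruction term is zero here
```

With perfect reconstruction the loss reduces to the weighted nuclear norm alone, so minimizing it trades off signal fidelity against the rank of the representation.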

Our main contributions can be summarized as follows:

  • We propose a novel approach to self-supervised representation learning called DAS-based DNN, which leverages the channel-dependent sensitivity of DAS to extract common features from multiple inputs.
  • We introduce a framework that combines SimSiam with DAS priors, enabling the extraction of meaningful features in a self-supervised manner.
  • We propose a new loss function that combines reconstruction error and rank constraint, which enables the extraction of low-rank representations that preserve the important information from multiple inputs.

The proposed method has several advantages over traditional self-supervised learning methods, including:
  • It can handle complex signals with multiple modalities, such as speech corrupted by noise [18] or images with different levels of noise [19].
  • It does not require any additional annotations or labels, making it a cost-effective and efficient approach.
  • It can be applied to various applications, including image and speech recognition, anomaly detection, and medical imaging.

In summary, the proposed DAS-based DNN is a novel approach to self-supervised representation learning that leverages the channel-dependent sensitivity of DAS to extract common features from multiple inputs. It handles complex signals across a range of applications while remaining cost-effective and efficient, since it requires no additional annotations or labels.
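For readers unfamiliar with how SimSiam [19] avoids the trivial (collapsed) solution mentioned above, the sketch below shows its standard symmetrized negative-cosine objective. This is the generic SimSiam loss, not the paper's DAS-specific variant; in a real training loop the `.copy()` stands in for a stop-gradient operation provided by an autograd framework.

```python
import numpy as np

def neg_cosine(p, z):
    # Negative cosine similarity; z is treated as a constant target
    # (stop-gradient), which is key to avoiding representation collapse.
    z = z.copy()  # stand-in for stop-gradient: no gradient flows through z
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return -float(p @ z)

def simsiam_loss(p1, p2, z1, z2):
    # Symmetrized over the two augmented views: each predictor output p
    # is matched to the other view's (detached) encoder output z.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

# Toy usage: identical predictions and targets give the minimum loss of -1.
v = np.array([3.0, 4.0])
loss = simsiam_loss(v, v, v, v)
```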