TEA-PSE 2.0: SUB-BAND NETWORK FOR REAL-TIME PERSONALIZED SPEECH ENHANCEMENT

Yukai Ju1,2, Shimin Zhang1, Wei Rao2, Yannan Wang2, Tao Yu2, Lei Xie1, Shidong Shang2
1Audio, Speech and Language Processing Group (ASLP@NPU), Northwestern Polytechnical University, Xi'an, China
2Tencent Ethereal Audio Lab, Tencent Corporation, Shenzhen, China

Paper PDF

0. Contents

  1. Abstract
  2. DNS blind test set (with speaker interference)
  3. DNS blind test set (without speaker interference)


1. Abstract

Personalized speech enhancement (PSE) uses additional cues such as speaker embeddings to remove background noise and interfering speech and to extract the speech of the target speaker. Our previous work, the Tencent-Ethereal-Audio-Lab personalized speech enhancement (TEA-PSE) system, ranked 1st in the ICASSP 2022 deep noise suppression (DNS2022) challenge. In this paper, we extend TEA-PSE to a sub-band version, TEA-PSE 2.0, to reduce computational complexity and further improve performance. Specifically, we adopt finite impulse response filter banks and spectrum splitting to reduce computational complexity. We introduce a time-frequency convolution module (TFCM) to enlarge the receptive field while keeping the convolution kernels small. In addition, we explore several training strategies to optimize the two-stage network and investigate various loss functions for the PSE task. TEA-PSE 2.0 significantly outperforms TEA-PSE in both speech enhancement performance and computational complexity. Experimental results on the DNS2022 blind test set show that TEA-PSE 2.0 achieves a 0.102 improvement in OVRL personalized DNSMOS while requiring only 21.9% of the multiply-accumulate operations of the previous TEA-PSE.
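
The spectrum splitting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_spectrum` and `merge_spectrum`, the equal-width contiguous bands, and the requirement that the number of frequency bins be divisible by the number of bands are all assumptions made for illustration. The idea shown is only that a full-band spectrum of shape (frequency, time) can be rearranged into several narrower sub-bands, which a network could then process as separate channels over a shorter frequency axis.

```python
import numpy as np


def split_spectrum(spec: np.ndarray, num_bands: int) -> np.ndarray:
    """Split a (freq, time) spectrum into equal-width contiguous sub-bands.

    Hypothetical helper: assumes the number of frequency bins is divisible
    by num_bands. Returns an array of shape (num_bands, freq // num_bands, time),
    where band b holds bins [b * width, (b + 1) * width).
    """
    freq, time = spec.shape
    if freq % num_bands != 0:
        raise ValueError("number of frequency bins must be divisible by num_bands")
    return spec.reshape(num_bands, freq // num_bands, time)


def merge_spectrum(bands: np.ndarray) -> np.ndarray:
    """Inverse of split_spectrum: stack sub-bands back into a (freq, time) array."""
    num_bands, width, time = bands.shape
    return bands.reshape(num_bands * width, time)


# Example: a dummy complex spectrum with 256 bins and 10 frames, split into 4 bands.
rng = np.random.default_rng(0)
spec = rng.standard_normal((256, 10)) + 1j * rng.standard_normal((256, 10))
bands = split_spectrum(spec, num_bands=4)
restored = merge_spectrum(bands)
```

In this sketch each sub-band covers a quarter of the frequency axis, so any layer whose cost scales with the frequency dimension sees a shorter axis per band. The actual band layout and the way sub-bands are fed to the network in TEA-PSE 2.0 are described in the paper, not here.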



2. DNS blind test set (with speaker interference)

Models Sample 1 Sample 2 Sample 3 Sample 4
Noisy
TEA-PSE
Full TEA-PSE 2.0
Sub TEA-PSE 2.0


3. DNS blind test set (without speaker interference)

Models Sample 1 Sample 2 Sample 3 Sample 4
Noisy
TEA-PSE
Full TEA-PSE 2.0
Sub TEA-PSE 2.0