A crucial problem facing hearing-impaired listeners is the difficulty of understanding speech in the presence of nearby competing sound sources. In this project, we exploit a distributed microphone network that is either embedded in the room infrastructure or formed from the microphones in low-cost personal mobile devices such as smartphones. The project aims, firstly, to research novel methods for estimating a monaural time-frequency mask that characterises the desired speech using a distributed microphone network. Secondly, by developing appropriate binaural signal processing, the project will use this mask to enhance the noisy signal in the listener's binaural hearing aids while preserving the spatial cues contained in the binaural signal.
Early Stage Researcher: Vikas Tokala
Host Institution: Imperial College
Supervisors: Patrick Naylor, Mike Brookes, Jesper Jensen, Simon Doclo
Research Progress
STOI-optimal masking has previously been proposed and developed for single-channel speech enhancement. In this paper, we consider its extension to binaural speech enhancement, in which spatial information is known to be important to speech understanding and should therefore be preserved by the enhancement processing. Masks are estimated for each of the binaural channels individually, and a "better-ear listening" mask is computed by choosing the maximum of the two masks. The estimated mask is used to supply information about the probability of speech presence in each time-frequency bin to an Optimally-Modified Log-Spectral Amplitude (OM-LSA) enhancer. We show that, for binaural signals with a directional noise, the proposed method not only improves the SNR of the noisy signal but also preserves the binaural cues and intelligibility. This paper was presented at the 17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022).
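The "better-ear listening" step described above can be illustrated with a minimal sketch. It assumes the two per-channel masks are already available as equally sized frame-by-frequency grids with values in [0, 1]. The function name `better_ear_mask` and the toy values are hypothetical and are not taken from the paper; the mask estimator and the OM-LSA enhancer are not shown.

```python
def better_ear_mask(mask_left, mask_right):
    """Combine two time-frequency masks by taking the element-wise maximum.

    Each mask is a list of frames, and each frame is a list of per-bin
    values in [0, 1]. Both masks must have the same shape.
    """
    if len(mask_left) != len(mask_right):
        raise ValueError("masks must have the same number of frames")

    combined = []
    for frame_left, frame_right in zip(mask_left, mask_right):
        if len(frame_left) != len(frame_right):
            raise ValueError("frames must have the same number of bins")
        # Keep, for every bin, the larger of the left and right mask values.
        combined.append([max(l, r) for l, r in zip(frame_left, frame_right)])
    return combined


# Toy example (made-up values): 2 frames x 3 frequency bins per ear.
mask_left = [[0.9, 0.2, 0.1], [0.4, 0.7, 0.0]]
mask_right = [[0.3, 0.6, 0.1], [0.5, 0.1, 0.8]]

mask = better_ear_mask(mask_left, mask_right)
```

In this sketch a single combined mask is produced for both ears. Applying one common real-valued gain to the left and right channels leaves the interaural level and phase differences unchanged, which is one plausible reading of how the binaural cues are preserved; the paper's actual processing chain may differ.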
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 956369.