Multi- and moving-talker automatic speech recognition (ASR) with distributed microphone networks

Distributed microphones capture a mixture of sound that is useful for automatic speech recognition (ASR) and sound that degrades it. In any given room, the number and locations of relevant talkers are unknown and change over time, and existing ASR strategies are not robust to these factors. This project will jointly consider adaptive microphone subset selection and multi-channel processing for speech detection, talker diarization and multi-talker ASR, covering stationary and moving talkers, both single and multiple. A specific requirement for robust ASR is an estimate of confidence in the input data. The project will therefore research confidence measures for the acoustic parameter estimates, with the aim of developing a probabilistic framework for acoustic metadata that can be used to make ASR more robust.

Research Team

Early Stage Researcher: Francesco Nespoli
Host Institution: Nuance Communications Inc.
Supervisors: Daniel Barreda, Patrick Naylor, Jörg Bitzer

Research Progress

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 956369.