A. El-Rayyes, H. Löllmann, C. Hofmann, and W. Kellermann (FAU Erlangen-Nuremberg)
Abstract: For natural human-robot communication, a reliable automatic speech recognition (ASR) system is essential. This in turn requires a system for acoustic echo control (AEC) so that the robot can speak and listen, i.e., emit and record speech signals, at the same time (‘barge-in’). This contribution investigates the specific problems of AEC for humanoid robots and possible solutions. Special attention is paid to possible nonlinear transmission characteristics and a low near-end to far-end ratio (NFR). In addition, robot ego-noise impairs the performance of the AEC system. Various approaches to tackle these problems are presented, and their combination with time-varying spatial filtering on the robot’s microphone array is also taken into account. Evaluation is performed in terms of echo return loss enhancement (ERLE) as well as the word error rate (WER) of an ASR system.
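As a reminder of what the ERLE figure measures, it is commonly defined as the ratio, in dB, of the echo power at the microphone to the residual echo power after cancellation, evaluated on segments without near-end speech. The following is a minimal sketch of that common definition and is not taken from the paper; the function name `erle_db` and the toy signals are assumptions made for illustration.

```python
import math


def erle_db(mic, residual):
    """Echo return loss enhancement in dB.

    mic:      microphone samples containing only echo (no near-end speech)
    residual: the same segment after echo cancellation
    Both arguments are hypothetical inputs for this sketch.
    """
    if len(mic) != len(residual) or not mic:
        raise ValueError("signals must be non-empty and of equal length")
    p_mic = sum(x * x for x in mic) / len(mic)            # echo power before AEC
    p_res = sum(x * x for x in residual) / len(residual)  # residual power after AEC
    if p_res == 0.0:
        return math.inf   # perfect cancellation
    if p_mic == 0.0:
        return -math.inf  # no echo present, but a residual remains
    return 10.0 * math.log10(p_mic / p_res)


# Toy segment: the canceller leaves 10% of the echo amplitude,
# i.e. 1% of the echo power, which corresponds to 20 dB.
echo = [math.sin(0.1 * n) for n in range(1000)]
residual = [0.1 * x for x in echo]
print(round(erle_db(echo, residual), 1))  # 20.0
```

A higher ERLE means more echo has been removed. On its own it says nothing about distortion of the near-end speech, which is one reason to report the WER of the ASR system alongside it.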
Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.