This page provides supplementary audio and visual data for the ICASSP 2015 submission “Phase-Optimized K-SVD for Signal Extraction from Underdetermined Multichannel Sparse Mixtures” by Antoine Deleforge and Walter Kellermann (manuscript available at http://arxiv.org/abs/1410.2430). The paper introduces a new sparse matrix factorization technique operating in the complex Fourier domain. In contrast to existing non-negative factorization methods, the proposed approach is complex-valued and multichannel, and it estimates the instantaneous phase of all involved sound sources.
MATLAB code and usage examples for PO-KSVD:
Copyright (c) 2014 Friedrich Alexander Universität
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
The method is applied to the challenging problem of “egonoise” reduction, i.e., how to reduce the auditory noise produced by a robot performing motor actions such as hand waving or walking. All recordings were made with the commercial robot NAO V5 from Aldebaran Robotics, in the audio lab of the LMS chair (Erlangen, Germany). The T60 reverberation time of the room was around 200 ms. Although the recordings made for and processed by the proposed method are 4-channel (left, right, front and rear microphones), this page provides stereo sounds corresponding to the left and right microphones only, for a better listening experience.
Below are two videos of the robot NAO waving and walking. The soundtracks correspond to the sounds recorded at the left and the right microphones of the robot (best heard with headphones).
As can be heard, these signals are highly non-stationary and possess an intricate spatial distribution, making them challenging to model or extract from a mixture.
Multichannel Wiener pre-filtering
To reduce the noise produced by the CPU fan, the multichannel Wiener filtering technique described in [Löllmann et al. 2014] was used on all recordings, as illustrated below:
Fan noise (training signal):
Input noisy signal (waving right arm + speech + fan noise):
Cleaned signal:
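The pre-filtering idea illustrated above (learn the fan noise from a training recording, then attenuate it in the noisy input) can be sketched in a much simplified form. The example below is not the multichannel method of [Löllmann et al. 2014]; it is a single-channel, per-frequency Wiener-style gain, given only as an illustration. The function names `stft` and `wiener_gain`, the frame length, the hop size, the gain floor, and the synthetic signals are all assumptions made for this sketch.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Short-time Fourier transform with a Hann window.
    Returns an array of shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def wiener_gain(noisy_stft, noise_psd, floor=0.1):
    """Per-bin gain 1 - noise_psd / |Y|^2, clipped below at `floor`."""
    noisy_psd = np.abs(noisy_stft) ** 2
    gain = 1.0 - noise_psd / np.maximum(noisy_psd, 1e-12)
    return np.maximum(gain, floor)

# Synthetic stand-ins (assumptions): white noise plays the role of the
# fan noise, and a 1 kHz tone plays the role of the signal to preserve.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
fan_training = 0.1 * rng.standard_normal(fs)
noisy = np.sin(2 * np.pi * 1000 * t) + 0.1 * rng.standard_normal(fs)

# Noise power per frequency bin, averaged over the training frames.
noise_psd = np.mean(np.abs(stft(fan_training)) ** 2, axis=0)

Y = stft(noisy)
G = wiener_gain(Y, noise_psd)
cleaned_stft = G * Y  # filtered spectrum; resynthesis is omitted here
```

In this sketch, bins dominated by the tone keep a gain close to 1, while bins containing only noise are pushed towards the floor. A multichannel filter additionally exploits the spatial correlation between microphones, which this example does not attempt.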
All spectrograms shown on this page correspond to the left microphone channel and use the following color code:
The “waving” egonoise
Test mixtures were generated by summing utterances from the GRID corpus and out-of-training “waving noise” recordings. The utterances were emitted by a loudspeaker placed 1 meter in front of the robot at zero elevation, and recorded with the fan turned off. The waving noise was recorded with the fan turned on, and with NAO repeatedly waving its right arm. It was then pre-processed using the multichannel Wiener filtering method described in the previous section.
Below are the results obtained using the proposed PO-KSVD+mask and PO-KSVD methods, as compared to results obtained using conventional NMF [Yifeng and Ngom 2013] and conventional K-SVD [Aharon et al. 2006]. All methods were trained with a 1-minute recording of NAO repeatedly moving the arm.
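The mixing step described above amounts to a sample-wise sum of two separately recorded multichannel signals. A minimal sketch is given below, assuming both recordings are already loaded as arrays of shape (samples, channels) at the same sampling rate. The helper names `mix` and `snr_db` and the random stand-in signals are hypothetical and are not taken from the released MATLAB code.

```python
import numpy as np

def mix(speech, noise):
    """Sum two recordings sample by sample, truncating to the shorter one.
    Both inputs have shape (n_samples, n_channels)."""
    n = min(len(speech), len(noise))
    return speech[:n] + noise[:n]

def snr_db(speech, noise):
    """Signal-to-noise ratio of the mixture in dB, over all channels."""
    n = min(len(speech), len(noise))
    return 10 * np.log10(np.sum(speech[:n] ** 2) / np.sum(noise[:n] ** 2))

# Random stand-ins (assumptions) for a 4-channel utterance and a
# 4-channel egonoise recording of different lengths.
rng = np.random.default_rng(0)
speech = rng.standard_normal((16000, 4))
noise = 0.5 * rng.standard_normal((20000, 4))

mixture = mix(speech, noise)
input_snr = snr_db(speech, noise)
```

Because speech and noise are recorded separately, the clean reference signals remain available, which is what makes it possible to compute quantities such as the input SNR of each mixture.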
PO-KSVD + mask:
The “walking” egonoise
Similarly, test mixtures were generated by summing utterances from the GRID corpus and out-of-training “walking noise” recordings. The utterances were emitted by a loudspeaker placed 1 meter in front of the robot at zero elevation, and recorded with the fan turned off. The walking noise was recorded with the fan turned on, and with NAO walking in place. Again, it was pre-processed using multichannel Wiener filtering.
Below are the results obtained using the proposed PO-KSVD+mask and PO-KSVD methods, as compared to results obtained using conventional NMF [Yifeng and Ngom 2013] and conventional K-SVD [Aharon et al. 2006]. All methods were trained with a 1-minute recording of NAO walking in place.
PO-KSVD + mask: