Measured HRIRs for Prototype Head

Head-Related Impulse Responses (HRIRs) have been measured for the Benchmark II prototype head for the NAO robot. This prototype head was developed within the EARS project as part of Deliverable D5.3. The head contains 12 microphones in a pseudo-spherical arrangement whose positions were determined as part of Deliverable D1.2. The head used for the HRIR measurements is not the same unit as the robot head used for the IEEE-AASP Challenge on Acoustic Source Localization and Tracking (LOCATA), but it was manufactured to the same specifications. This zip archive provides a MAT-file with the measured HRIRs and the corresponding documentation (PDF file).
 
 

EARS map object

The EARS map objects are MATLAB classes designed to store and visualise data for acoustic scene mapping. EARS map objects allow the storage of a) individual speakers at one time step using a mapFeature object, b) a collection of speakers at one time step using a map object, and c) a trajectory of the evolution of map objects over time using a mapFeature object. The objects are designed to contain data from both sound source localisation (SSL) and speaker tracking algorithms, providing a complete representation of the acoustic scene.
The MATLAB code, along with documentation, can be found here: https://github.com/cevers/ears_map_objects
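
The three storage levels described above (one speaker, all speakers at one time step, and maps over time) can be pictured with a small sketch. This is an illustration in Python, not the MATLAB classes from the repository: the class names `SpeakerFeature`, `SceneMap` and `SceneTrajectory`, their fields, and the choice of Cartesian positions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SpeakerFeature:
    """One speaker at one time step (hypothetical, loosely mirrors level a)."""
    speaker_id: int
    position: Tuple[float, float, float]  # assumed Cartesian coordinates in metres


@dataclass
class SceneMap:
    """All speakers observed at one time step (hypothetical, level b)."""
    time: float
    features: List[SpeakerFeature] = field(default_factory=list)


@dataclass
class SceneTrajectory:
    """Time-ordered sequence of maps (hypothetical, level c)."""
    maps: List[SceneMap] = field(default_factory=list)

    def add(self, scene_map: SceneMap) -> None:
        # Keep the maps sorted by time so that tracks come out in order.
        self.maps.append(scene_map)
        self.maps.sort(key=lambda m: m.time)

    def track(self, speaker_id: int) -> List[Tuple[float, Tuple[float, float, float]]]:
        # Collect (time, position) pairs of one speaker across all maps.
        return [
            (m.time, f.position)
            for m in self.maps
            for f in m.features
            if f.speaker_id == speaker_id
        ]
```

In this sketch, an SSL algorithm would fill one `SceneMap` per time step, while a tracker would additionally keep `speaker_id` consistent over time so that `track` returns a meaningful path.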

Phase-Optimized K-SVD: A New Sparse Representation for Multichannel Mixtures

This page provides supplementary audio and visual data for the ICASSP 2015 submission “Phase-Optimized K-SVD for Signal Extraction from Underdetermined Multichannel Sparse Mixtures” by Antoine Deleforge and Walter Kellermann (manuscript available at http://arxiv.org/abs/1410.2430). The paper introduces a new sparse matrix factorization technique operating in the complex Fourier domain. In contrast to existing non-negative factorization methods, the proposed approach is complex-valued and multichannel, and it estimates the instantaneous phase of all sound sources involved.


MATLAB code and usage examples for PO-KSVD:

Copyright (c) 2014 Friedrich Alexander Universität
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


The method is applied to the challenging problem of “egonoise” reduction, i.e., reducing the acoustic noise produced by a robot performing motor actions such as hand waving or walking. All recordings were made with the commercial robot NAO V5 from Aldebaran Robotics, in the audio lab of the LMS chair (Erlangen, Germany). The T60 reverberation time of the room was around 200 ms. Although the recordings used by the proposed method are 4-channel (left, right, front and rear microphones), this page provides stereo sounds corresponding to the left and right microphones only, for a better listening experience.
Below are two videos of the robot NAO waving and walking. The soundtracks correspond to the sounds recorded at the left and the right microphones of the robot (best heard with headphones).

As can be heard, these signals are highly non-stationary and possess an intricate spatial distribution, making them challenging to model or extract from a mixture.


Multichannel Wiener pre-filtering

To reduce the noise produced by the CPU fan, the multichannel Wiener filtering technique described in [Löllmann et al. 2014] was used on all recordings, as illustrated below:
MWF_explained
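
For orientation, a multichannel Wiener filter is commonly written in the speech-distortion-weighted form below. This is the textbook formulation, stated here as an assumption about the general technique; the symbols are not taken from [Löllmann et al. 2014], whose exact formulation may differ.

```latex
\mathbf{w} = \left( \boldsymbol{\Phi}_{ss} + \mu\, \boldsymbol{\Phi}_{nn} \right)^{-1}
             \boldsymbol{\Phi}_{ss}\, \mathbf{e}_{\mathrm{ref}},
\qquad
\hat{s}_{\mathrm{ref}} = \mathbf{w}^{\mathsf{H}} \mathbf{y}
```

Here y stacks the microphone signals in one time-frequency bin, Φ_ss and Φ_nn are the spatial covariance matrices of the desired signal and of the noise, e_ref selects the reference microphone, and μ ≥ 0 trades noise reduction against signal distortion (μ = 1 gives the standard MWF). In such a scheme, Φ_nn would typically be estimated from a noise-only recording, and Φ_ss obtained as Φ_yy − Φ_nn under the assumption that signal and noise are uncorrelated.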
Fan noise (training signal):

Input noisy signal (waving right arm + speech + fan noise):

Cleaned signal:

All spectrograms shown on this page correspond to the left microphone channel and use the following color code:
color_bar


The “waving” egonoise

Test mixtures were generated by summing utterances from the GRID corpus and out-of-training “waving noise” recordings. The utterances were emitted by a loudspeaker placed 1 meter in front of the robot at zero elevation, and recorded with the fan turned off. The waving noise was recorded with the fan turned on, and with NAO repeatedly waving its right arm. It was then pre-processed using the multichannel Wiener filtering method described in the previous section.
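
The mixing step described above amounts to adding the separately recorded speech and noise signals sample by sample in each microphone channel. A minimal sketch in plain Python follows; the function names `mix` and `input_snr_db` and the list-of-channels layout are illustrative assumptions, and the SNR helper is only a convenience for inspecting a mixture, since no SNR values are stated on this page.

```python
import math


def power(channel):
    """Mean squared amplitude of one channel (a list of samples)."""
    return sum(x * x for x in channel) / len(channel)


def mix(speech, noise):
    """Sum speech and noise sample by sample, channel by channel.

    Both inputs are lists of channels, each channel being a list of samples.
    Each output channel is truncated to the shorter of the two recordings.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same number of channels")
    mixture = []
    for s_ch, n_ch in zip(speech, noise):
        n = min(len(s_ch), len(n_ch))
        mixture.append([s_ch[i] + n_ch[i] for i in range(n)])
    return mixture


def input_snr_db(speech_channel, noise_channel):
    """Speech-to-noise power ratio of one channel, in decibels."""
    return 10.0 * math.log10(power(speech_channel) / power(noise_channel))
```

Because speech and noise are recorded separately, the clean speech remains available as a reference when comparing the outputs of the different methods.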
waving_mixture
Clean speech:

“Waving” Noise:

Noisy input:

Below are the results obtained using the proposed PO-KSVD+mask and PO-KSVD methods, compared with results obtained using conventional NMF [Yifeng and Ngom 2013] and conventional K-SVD [Aharon et al. 2006]. All methods were trained with a 1-minute recording of NAO repeatedly moving its arm.
waving_results
PO-KSVD + mask:

PO-KSVD:

NMF:

K-SVD:


The “walking” egonoise

Similarly, test mixtures were generated by summing utterances from the GRID corpus and out-of-training “walking noise” recordings. The utterances were emitted by a loudspeaker placed 1 meter in front of the robot at zero elevation, and recorded with the fan turned off. The walking noise was recorded with the fan turned on, and with NAO walking in place. Again, it was pre-processed using multichannel Wiener filtering.
walking_mixture
Clean speech:

“Walking” noise:

Noisy input:

Below are the results obtained using the proposed PO-KSVD+mask and PO-KSVD methods, compared with results obtained using conventional NMF [Yifeng and Ngom 2013] and conventional K-SVD [Aharon et al. 2006]. All methods were trained with a 1-minute recording of NAO walking in place.
walking_results
PO-KSVD + mask:

PO-KSVD:

NMF:

K-SVD:

Multi-channel Wiener filter for fan noise reduction

MATLAB implementation of a distortion-weighted multi-channel Wiener filter, designed to reduce the fan ego-noise recorded by the head microphones of the NAO robot: MWF.zip
Copyright (c) 2014 Friedrich Alexander Universität
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.