EARS map objects

The EARS map objects are MATLAB classes designed to store and visualise data for acoustic scene mapping. They allow the storage of a) an individual speaker at one time step using a mapFeature object, b) a collection of speakers at one time step using a map object, and c) the evolution of a map object over time as a trajectory. The objects are designed to contain data from both sound source localisation (SSL) and speaker tracking algorithms, providing a complete representation of the acoustic scene.

The MATLAB code along with documentation can be found here: https://github.com/cevers/ears_map_objects
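The released classes are MATLAB objects; as a rough illustration of the data layout they describe, here is a minimal Python sketch. The class and field names (`MapFeature`, `Map`, `MapHistory`, azimuth/elevation fields) are hypothetical stand-ins, not the names used in the repository:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MapFeature:
    """One speaker estimate at a single time step (hypothetical fields)."""
    azimuth: float      # degrees
    elevation: float    # degrees
    speaker_id: int

@dataclass
class Map:
    """All speaker estimates for one time step."""
    timestamp: float
    features: List[MapFeature] = field(default_factory=list)

@dataclass
class MapHistory:
    """Evolution of the acoustic-scene map over time."""
    maps: List[Map] = field(default_factory=list)

    def trajectory(self, speaker_id: int) -> List[Tuple[float, float, float]]:
        """Collect one speaker's (time, azimuth, elevation) across all steps."""
        return [(m.timestamp, f.azimuth, f.elevation)
                for m in self.maps for f in m.features
                if f.speaker_id == speaker_id]
```

Both SSL estimates and tracker outputs would populate the same structures, which is what makes a single scene representation possible.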

Audio-Visual Synchronization with Modularity

This page contains all necessary instructions and files to implement the audio-visual synchronization demo presented by Aldebaran during the synchronization workshop held at the first EARS annual meeting in Erlangen on December 9, 2014.

The demo synchronizes the results of a face detector with the results of a sound direction-of-arrival (DOA) estimator in order to identify the speaking person in NAO’s field of view (see image above).
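The core association step can be sketched as follows: once the two streams are time-aligned, the estimated DOA azimuth is compared against the azimuth of each detected face, and the closest face within a tolerance is declared the speaker. This is a generic sketch of that idea, not code from the demo; the function name and the 15° tolerance are illustrative assumptions:

```python
def match_doa_to_faces(doa_azimuth_deg, face_azimuths_deg, tolerance_deg=15.0):
    """Return the index of the detected face closest to the estimated DOA,
    or None if no face lies within the tolerance (speaker may be off-screen).
    Angle differences are wrapped to [-180, 180] degrees."""
    best_idx, best_err = None, tolerance_deg
    for i, az in enumerate(face_azimuths_deg):
        err = abs((doa_azimuth_deg - az + 180.0) % 360.0 - 180.0)
        if err <= best_err:
            best_idx, best_err = i, err
    return best_idx
```

In practice the synchronization of the two data streams (the Modularity update mentioned below) matters as much as the matching itself, since face and DOA estimates arrive at different rates.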

The following files can be downloaded:

  • workshop_slides.pdf: Slides presented by Gregory Rump on the day of the workshop, introducing the Modularity framework and the recent update allowing better synchronization of data streams.

Phase-Optimized K-SVD: A New Sparse Representation for Multichannel Mixtures

This page provides supplementary audio and visual data for the ICASSP 2015 submission “Phase-Optimized K-SVD for Signal Extraction from Underdetermined Multichannel Sparse Mixtures” by Antoine Deleforge and Walter Kellermann (manuscript available at http://arxiv.org/abs/1410.2430). This paper introduces a new sparse matrix factorization technique operating in the complex Fourier domain. Contrary to existing non-negative factorization methods, the proposed approach is complex, multichannel, and estimates the instantaneous phase of all involved sound sources.
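The complex-domain sparse coding that such a method relies on can be illustrated with generic orthogonal matching pursuit over a complex dictionary. This is only a sketch of the sparse-coding step: PO-KSVD itself additionally optimizes phases during dictionary learning, and the function below is not taken from the released code:

```python
import numpy as np

def complex_omp(x, D, k):
    """Sparse-code a complex vector x over a complex dictionary D (unit-norm
    atoms as columns) with at most k atoms via orthogonal matching pursuit.
    The complex coefficients carry the phase of each selected component."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1], dtype=complex)
    for _ in range(k):
        # pick the atom most correlated with the residual (complex inner product)
        j = int(np.argmax(np.abs(D.conj().T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of the coefficients on the current support
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coeffs[:] = 0
        coeffs[support] = sol
        residual = x - D @ coeffs
    return coeffs
```

Because the coefficients are complex, the reconstruction `D @ coeffs` recovers both magnitude and phase of each component, which is the key difference from magnitude-only non-negative factorizations.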

 

Download MATLAB code and usage examples for PO-KSVD:

The method is applied to the challenging problem of “egonoise” reduction, i.e., how to reduce the auditory noise produced by a robot performing motor actions such as hand waving or walking. All recordings were made with the commercial robot NAO V5 of Aldebaran Robotics, in the audio lab of the LMS chair (Erlangen, Germany). The T60 reverberation time of the room was around 200 ms. Although the recordings performed and used by the proposed method are 4-channel (left, right, front and rear microphones), this page provides stereo sounds corresponding to the left and right microphones only, for a better listening experience.

Below are two videos of the robot NAO waving and walking. The soundtracks correspond to the sounds recorded at the left and the right microphones of the robot (best heard with headphones).

As can be heard, these signals are highly non-stationary and possess an intricate spatial distribution, making them challenging to model or extract from a mixture.

 

Multichannel Wiener pre-filtering

To reduce the noise produced by the CPU fan, the multichannel Wiener filtering technique described in [Löllmann et al. 2014] was used on all recordings, as illustrated below:
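The filter can be sketched per frequency bin in its standard textbook form: estimate the noise covariance from a noise-only (fan) segment, subtract it from the noisy covariance to get a speech covariance estimate, and solve for the filter at a reference microphone. This is a generic multichannel Wiener filter sketch, not necessarily the exact variant of [Löllmann et al. 2014]:

```python
import numpy as np

def mwf_weights(R_noise, R_noisy, ref_mic=0):
    """Multichannel Wiener filter weights for one frequency bin, assuming the
    noisy covariance is speech + noise, so R_speech = R_noisy - R_noise.
    Returns the filter w for the chosen reference microphone; the enhanced
    bin is w.conj() @ y_frame."""
    R_speech = R_noisy - R_noise
    # solve R_noisy @ w = R_speech[:, ref_mic]  (i.e. w = R_noisy^{-1} R_s e_ref)
    w = np.linalg.solve(R_noisy, R_speech[:, ref_mic])
    return w
```

Estimating `R_noise` from the fan-only training signal is what makes this a pre-filtering step: the fan noise is nearly stationary, so its covariance generalizes to the noisy recordings.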

[Figure: multichannel Wiener pre-filtering overview]

Fan noise (training signal) :

 

Input noisy signal (waving right arm + speech + fan noise) :

 

Cleaned signal :

 

All spectrograms shown on this page correspond to the left microphone channel and use the following color code:

[Figure: spectrogram color bar]

The “waving” egonoise

Test mixtures were generated by summing utterances from the GRID corpus and out-of-training “waving noise” recordings. The utterances were emitted by a loudspeaker placed 1 meter in front of the robot at null elevation, and recorded with the fan turned off. The waving noise was recorded with the fan turned on, and with NAO repeatedly waving its right arm. It was then pre-processed using the multichannel Wiener filtering method described in the previous section.
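Generating such a test mixture amounts to summing the clean utterance and the ego-noise recording, typically after scaling one of them to a target signal-to-noise ratio. The page does not state the SNR used, so the target value in this sketch is an illustrative parameter:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Sum a clean utterance and an ego-noise recording at a target SNR,
    scaling the noise so the speech level is preserved."""
    n = min(len(speech), len(noise))
    speech, noise = speech[:n], noise[:n]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Keeping the noise recordings out-of-training (separate from the 1-minute dictionary-learning recording) is what makes the evaluation below a fair test of generalization.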

[Figure: spectrograms of the waving mixture]

Clean speech:

 

“Waving” Noise:

 

Noisy input:

 

Below are the results obtained using the proposed PO-KSVD+mask and PO-KSVD methods, as compared to results obtained using conventional NMF [Yifeng and Ngom 2013] and conventional K-SVD [Aharon et al. 2006]. All methods were trained with a 1-minute recording of NAO repeatedly moving its arm.
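For reference, the NMF baseline family works on magnitude spectrograms only. A minimal version with Euclidean multiplicative updates (the classic Lee–Seung rules) is sketched below; the specific variant of [Yifeng and Ngom 2013] may differ, so treat this as a generic illustration of what the proposed complex-domain method is compared against:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Basic NMF with Euclidean multiplicative updates, factorising a
    non-negative (magnitude) spectrogram as V ~ W @ H."""
    rng = np.random.default_rng(0)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis spectra
    return W, H
```

Because `V` here is a magnitude spectrogram, phase must be borrowed from the mixture at synthesis time, which is exactly the limitation the phase-optimized approach addresses.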

[Figure: spectrograms of the waving results]

PO-KSVD + mask:

 

PO-KSVD:

 

NMF:

 

K-SVD:

 

The “walking” egonoise

Similarly, test mixtures were generated by summing utterances from the GRID corpus and out-of-training “walking noise” recordings. The utterances were emitted by a loudspeaker placed 1 meter in front of the robot at null elevation, and recorded with the fan turned off. The walking noise was recorded with the fan turned on, and with NAO walking in place. Again, it was pre-processed using multichannel Wiener filtering.

[Figure: spectrograms of the walking mixture]

Clean speech:

 

“Walking” noise:

 

Noisy input:

 

Below are the results obtained using the proposed PO-KSVD+mask and PO-KSVD methods, as compared to results obtained using conventional NMF [Yifeng and Ngom 2013] and conventional K-SVD [Aharon et al. 2006]. All methods were trained with a 1-minute recording of NAO walking in place.

[Figure: spectrograms of the walking results]

PO-KSVD + mask:

 

PO-KSVD:

 

NMF:

 

K-SVD:

Database of head-related impulse responses

This database contains a set of head-related impulse responses measured with the NAO robot (version 4) in the low-reverberation chamber of FAU.

The complete database, including comprehensive documentation, can be downloaded as a zip file.

Database of measured room impulse responses

This database contains different sets of room impulse responses measured with the NAO robot (version 4) at source-robot distances of 1 m, 2 m, and 4 m. The measurements were conducted in the audio lab of FAU in two configurations with a T60 of approximately 190 ms and 600 ms. The database was created for the development and evaluation of algorithms for signal extraction, blind source separation, sound source localization, and automatic speech recognition.

The complete database, including comprehensive documentation, can be downloaded as a zip file.

Database for source localization

This database was created for the development and evaluation of sound source localization algorithms for the NAO robot in real acoustic environments. It contains recordings from the NAO head microphones for one or two loudspeakers emitting speech or white noise from 38 annotated positions, in two different room environments with T60 = 190 ms and T60 = 510 ms.
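A typical starting point for algorithms developed on such recordings is a time-delay estimate between two microphone channels via GCC-PHAT. This is a generic sketch of that standard building block, not an algorithm from the EARS project:

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Estimate the time delay of arrival between two microphone signals via
    GCC-PHAT. Returns tau in seconds; positive tau means y lags x."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    R = np.conj(X) * Y
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs
    return tau
```

With the annotated source positions in the database, such delay estimates can be converted to azimuth angles and scored against ground truth.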

The complete database, including comprehensive documentation, can be downloaded as a zip file (two parts): Part 1, Part 2

 

Database for ego-noise reduction

This database was created for the development and evaluation of unsupervised ego-noise reduction algorithms for the NAO robot. Speech signals emitted by one or two loudspeakers were recorded with internal and external microphones while the robot performed different predefined motor actions.
The complete database, including comprehensive documentation, can be downloaded here.