D3.2: Augmentation of robot audition by visual information

This deliverable reports on the audio-visual calibration of multimodal data (T1.5, including software) and describes the methodology for extracting 3D descriptors from visual cues (T3.1, including software). It also summarises the results of T3.2 and T3.3 by describing the methods developed for audio-visual event localisation and classification (including software).

This deliverable was due in M30 and was submitted to the EC on December 31, 2016.

Report: EARS_Report_on_D3-2_20161231_v1_RH