Examples of use

 SMARD is useful for evaluating various audio and speech processing algorithms, such as:

  • source localisation,
  • room geometry estimation,
  • distributed array processing,
  • source separation,
  • noise reduction,
  • dereverberation.

Below, we present some examples of using SMARD.

If you have been using our data and have other examples of use, like the ones below, that you are willing to share in this section, feel free to contact us.

Configuration XX21


Localisation

The use of SMARD is exemplified by applying two localisation algorithms to excerpts of it. The evaluated algorithms are:

  • the steered response power with phase transform (SRP-PHAT) method [1], and
  • a near-field, maximum likelihood (ML) method [2].

Since the ML method assumes that the desired signal is quasi-periodic, the methods were applied to the synthetic harmonic signals and the violin signals. More specifically, a single segment of 100 samples was used from each microphone in the different configurations. The segment from the harmonic signals was taken from the last part of the signal, where the pitch is 500 Hz, while the segment from the violin signal was taken from the first part.
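To illustrate the first of the two methods, a minimal SRP-PHAT sketch in Python could look as follows (the released SMARD scripts themselves are MATLAB). The function names, the search-grid handling, and the two-microphone setup are our own illustrative assumptions, not part of the SMARD code:

```python
import numpy as np

def gcc_phat(x1, x2, n_fft=None):
    """GCC-PHAT cross-correlation; index k corresponds to lag k - n_fft//2."""
    n = n_fft or 2 * len(x1)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12          # phase transform (PHAT) weighting
    cc = np.fft.irfft(cross, n)
    return np.concatenate((cc[-n // 2:], cc[:n // 2 + 1]))  # lags -n/2 .. n/2

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Pick the candidate position whose TDOAs maximise the summed GCC-PHAT values."""
    n_mics, n_samp = signals.shape
    n = 2 * n_samp
    half = n // 2
    pairs = [(i, j) for i in range(n_mics) for j in range(i + 1, n_mics)]
    ccs = {p: gcc_phat(signals[p[0]], signals[p[1]], n) for p in pairs}
    power = np.zeros(len(grid))
    for k, r in enumerate(grid):
        # Distance from the candidate position to each microphone.
        dists = np.linalg.norm(mic_pos - r, axis=1)
        for (i, j), cc in ccs.items():
            # Sample the pairwise correlation at the candidate's TDOA lag.
            lag = int(round((dists[i] - dists[j]) / c * fs)) + half
            if 0 <= lag < len(cc):
                power[k] += cc[lag]
    return grid[int(np.argmax(power))], power
```

The cost-function plots described below correspond to evaluating such a `power` surface over a grid of candidate positions.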

The pitch of the signals, which is needed by the ML method, is estimated using a recently proposed pitch estimator [3], with the model order assumed known. Further details about the simulation setup can be found in the code used for generating the results below.
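The Bayesian estimator of [3] is too involved for a short snippet, but the harmonic signal model it builds on can be illustrated with a simple, approximate nonlinear least-squares (NLS) pitch estimator. This is a hedged sketch of the general idea, not the method of [3]; the function name, grid range, and the 100-sample segment length (mirroring the setup above) are our own assumptions:

```python
import numpy as np

def nls_pitch(x, fs, order, f_min=100.0, f_max=1000.0, n_grid=1801):
    """Approximate NLS pitch estimate: pick the fundamental frequency whose
    harmonic subspace captures the most signal energy (model order known)."""
    t = np.arange(len(x)) / fs
    best_f0, best_cost = f_min, -np.inf
    for f0 in np.linspace(f_min, f_max, n_grid):
        # Harmonic basis: cosine and sine columns at each of `order` harmonics.
        Z = np.column_stack(
            [fn(2 * np.pi * f0 * h * t) for h in range(1, order + 1)
             for fn in (np.cos, np.sin)])
        coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
        cost = np.linalg.norm(Z @ coef) ** 2   # energy in the harmonic subspace
        if cost > best_cost:
            best_f0, best_cost = f0, cost
    return best_f0
```

On a clean 100-sample harmonic segment with a 500 Hz fundamental, as used in the evaluation above, such an estimator recovers the pitch to within the grid resolution.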

Cost Functions

In the figure below, we have depicted an example of the cost functions of the SRP-PHAT [1] and ML [2] methods when applied to configuration 2000:

[Figures: cost functions of the SRP-PHAT and ML methods for configuration 2000]

To generate these plots of the cost functions versus two coordinates at a time, the remaining coordinate was fixed to the value estimated by that method. The plots clearly show that the cost functions peak relatively close to the true source position.

Further Results

The methods were also evaluated on both the synthetic harmonic signal (denoted simply as synthetic in the remaining text) and the violin signal in other configurations. The results from these evaluations are summarised in the tables below, where φ denotes azimuth, ψ denotes elevation, and rc denotes range.
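For reference, a Cartesian position estimate can be converted to the (φ, ψ, rc) parametrisation used in the tables roughly as follows; note that the exact angle convention (reference point, zero-azimuth axis) used in SMARD is an assumption here:

```python
import numpy as np

def cartesian_to_spherical(pos, ref=(0.0, 0.0, 0.0)):
    """Convert a Cartesian position to azimuth (deg), elevation (deg), and
    range (m) relative to a reference point, e.g. the array centre."""
    d = np.asarray(pos, float) - np.asarray(ref, float)
    rc = float(np.linalg.norm(d))                        # range
    azimuth = float(np.degrees(np.arctan2(d[1], d[0])))  # angle in the x-y plane
    elevation = float(np.degrees(np.arcsin(d[2] / rc)))  # angle above the plane
    return azimuth, elevation, rc
```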

Results obtained with speaker 0 (configuration 00XX): 

 

Results obtained with speaker 1 (configuration 10XX): 

 

Results obtained with speaker 2 (configuration 20XX):

 

In general, most angle estimates (azimuth and elevation) are close to the true angles, except for a few cases where the methods find a reflection from the reflective wooden floor (e.g., configurations 1001, 2001, and 2003 for the synthetic signal, and configuration 1003 for the violin signal). The range estimates are less accurate, but the speaker is also relatively far from the arrays in all these configurations. These results clearly demonstrate the potential of applying SMARD for the evaluation of localisation methods.

Downloads

The code for evaluating the SRP-PHAT and ML methods on the SMARD data and producing the above results can be obtained here. Some details about running the code are given below:

  • To produce the localisation results for generating the cost function plots and the tables, the masterLocalizationTest.m script needs to be run. Note that before you can run this script, you need to download the data for configurations X00X and X02X and put them in the 'data' folder. In its current state, the master script is only compatible with these data and needs to be extended if you want to conduct localisation on other configurations.
  • To produce the cost functions above and in the IWAENC2014 paper, you should run masterLocalizationTestPrintFigures.m.
  • To produce the tables above and in the IWAENC2014 paper, you should run masterLocalizationTestPrintTables.m.

REFERENCES

  • [1]: J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, “Robust localization in reverberant rooms,” in Microphone Arrays - Signal Processing Techniques and Applications, M. S. Brandstein and D. B. Ward, Eds. Springer-Verlag, 2001, ch. 8, pp. 157–180.
  • [2]: J. R. Jensen and M. G. Christensen, “Near-field Localization of Audio: A Maximum Likelihood Approach,” in Proc. European Signal Process. Conf., Sep. 2014.
  • [3]: J. K. Nielsen, M. G. Christensen, and S. H. Jensen, “Default Bayesian Estimation of the Fundamental Frequency,” IEEE Trans. on Audio, Speech, and Language Process., vol. 21, no. 3, Mar. 2013.