Description of SMARD

SMARD is a database containing approximately 18 GB of 24-bit audio recordings at a sampling frequency of 48 kHz. The database is described below.

The Room

All recordings were made in a 60 m² multichannel listening room at Aalborg University; a sketch of the room is given in Fig. 1. The carpet on the floor was removed during the measurement campaign. The room is box-shaped and symmetrical, and its reverberation time, measured with a Brüel & Kjær Type 2270, is approximately 0.15 seconds.

Equipment

Recordings were made for various combinations of different loudspeakers and microphone arrays. As detailed in the equipment section, three loudspeakers were used. The Brüel & Kjær OmniPower 4296 and the Brüel & Kjær OmniSource 4295 are both approximately omnidirectional loudspeakers within a limited frequency range. The OmniPower 4296 loudspeaker can emit more sound power than the OmniSource 4295, but can only be considered omnidirectional over a narrower frequency range. The directional loudspeaker is a conventional 3'' speaker in a wooden cabinet.

Up to 22 G.R.A.S. microphones were used in various array configurations. The simplest array was just a single microphone. The other array types were uniform linear arrays (ULAs), uniform circular arrays (UCAs), and an orthogonal array. Finally, a dummy microphone, which is simply a capacitor mounted on the microphone pre-amplifier, was also present in all recordings. The recordings from the dummy microphone can be used to inspect electrical noise, cross-talk, etc. Except for the loudspeakers, the microphones, and the arrays, all measurement equipment was situated in a control room adjacent to the multichannel listening room. A list and pictures of this equipment can be found in the equipment section.

Measurement Configurations

Measurements were made for a total of 48 different configurations. Each configuration is identified by a four-digit number of the form ABCD, where the first and most significant digit A denotes the type of loudspeaker; the second digit B denotes the position and orientation of the loudspeaker; the third digit C denotes the type(s) of microphone arrays; and the least significant digit D denotes the positions and orientations of these arrays. Table 1 summarises these configurations, and depictions of all the configurations can be found in the download section.

Table 1: The 48 measurement configurations.

Loudspeaker type

 0XXX OmniPower 4296
 1XXX OmniSource 4295
 2XXX Directional loudspeaker

Loudspeaker position and orientation 

 X0XX Placed at (2.00, 6.50, 1.25), Angle of -90° in XY-plane
 X1XX Placed at (3.50, 4.50, 1.50), Angle of -45° in XY-plane

Array types

 XX0X  Orthogonal array, single microphone, and dummy microphone
 XX1X Three ULAs and dummy microphone
 XX2X Two UCAs, one ULA, and dummy microphone

Array positions and orientations 

 XX00 See the sketches in the download section.
 XX01 See the sketches in the download section.
 XX02 See the sketches in the download section.
 XX03 See the sketches in the download section.
 XX10 See the sketches in the download section.
 XX11 See the sketches in the download section.
 XX20 See the sketches in the download section.
 XX21 See the sketches in the download section.
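
The ABCD numbering in Table 1 can be decoded mechanically. The following is a minimal Python sketch, not part of the database itself: the names decode_config and all_configs are hypothetical helpers, the dictionaries are transcribed from Table 1, and the loudspeaker coordinates are copied as listed without assuming a unit.

```python
from itertools import product

# Digit meanings transcribed from Table 1.
LOUDSPEAKER_TYPES = {
    0: "OmniPower 4296",
    1: "OmniSource 4295",
    2: "Directional loudspeaker",
}
# (position as listed in Table 1, angle in the XY-plane in degrees)
LOUDSPEAKER_POSES = {
    0: ((2.00, 6.50, 1.25), -90),
    1: ((3.50, 4.50, 1.50), -45),
}
ARRAY_TYPES = {
    0: "Orthogonal array, single microphone, and dummy microphone",
    1: "Three ULAs and dummy microphone",
    2: "Two UCAs, one ULA, and dummy microphone",
}
# Number of array placements listed per array type
# (XX00 to XX03, XX10 to XX11, XX20 to XX21).
ARRAY_POSE_COUNT = {0: 4, 1: 2, 2: 2}


def decode_config(config_id):
    """Split a four-digit ID 'ABCD' into its Table 1 fields (hypothetical helper)."""
    text = str(config_id)
    if len(text) != 4 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected four digits, got {config_id!r}")
    a, b, c, d = (int(ch) for ch in text)
    if a not in LOUDSPEAKER_TYPES or b not in LOUDSPEAKER_POSES or c not in ARRAY_TYPES:
        raise ValueError(f"digit out of range in {text}")
    if d >= ARRAY_POSE_COUNT[c]:
        raise ValueError(f"array placement {d} is not listed for array type {c}")
    position, angle_deg = LOUDSPEAKER_POSES[b]
    return {
        "loudspeaker": LOUDSPEAKER_TYPES[a],
        "loudspeaker_position": position,
        "loudspeaker_angle_deg": angle_deg,
        "arrays": ARRAY_TYPES[c],
        "array_placement": d,
    }


def all_configs():
    """List every ID implied by Table 1, as zero-padded strings."""
    ids = []
    for a, b, c in product(LOUDSPEAKER_TYPES, LOUDSPEAKER_POSES, ARRAY_TYPES):
        for d in range(ARRAY_POSE_COUNT[c]):
            ids.append(f"{a}{b}{c}{d}")
    return ids
```

IDs are handled as strings so that the leading zero of the OmniPower 4296 configurations is preserved. Enumerating the table this way gives 3 x 2 x (4 + 2 + 2) = 48 IDs, which agrees with the stated number of configurations.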

Audio Segments

For each of the 48 configurations, a total of 20 audio segments were played and recorded. As listed in Table 2, seven artificial signals, six speech/vocal signals, and seven musical signals were used. The artificial signals were all created in MATLAB, and a description of them is available for download. The speech and musical signals comprise both reverberant and anechoic signals: the signals from the EBU SQAM CD are reverberant, whereas the signals from the TSP speech database and the musical instrument samples (MIS) database are anechoic. All of these databases are freely available online for research use.

Table 2: The 20 audio segments.

Artificial sounds

1 Five seconds of silence
2 Exponential sine sweep from 10 Hz to 24 kHz
3 Harmonic signals with increasing fundamental frequency in steps
4 Eight repetitions of a 16th order MLS sequence
5 Pink noise
6 Single sinusoidal tone with increasing frequency in steps
7 White Gaussian noise

Speech/vocal signals

8 Soprano vocal from the EBU SQAM CD
9 Quartet vocal from the EBU SQAM CD
10  Male voice from the EBU SQAM CD
11  Child's voice from the TSP speech database
12  Female voice from the TSP speech database
13  Male voice from the TSP speech database

Musical signals

14 Clarinet from the EBU SQAM CD
15  Trumpet from the EBU SQAM CD
16  Xylophone from the EBU SQAM CD
17  Abba excerpt from the EBU SQAM CD
18  Bass flute from the MIS database
19  Guitar from the MIS database
20  Violin from the MIS database
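
As an illustration of the MLS entry in Table 2, the following Python sketch generates one period of a 16th-order maximum length sequence and works out how long eight repetitions last. It rests on assumptions: the original signals were created in MATLAB and the generator polynomial is not stated here, so the feedback taps (16, 14, 13, 11) are simply a well-known maximal-length choice; the names mls_period_16 and mls_segment_duration are hypothetical; and playback at 48 kHz is assumed from the stated sampling frequency.

```python
def mls_period_16(start_state=0xACE1):
    """One period of a 16th-order MLS from a Fibonacci LFSR.

    Assumption: feedback taps (16, 14, 13, 11); the polynomial used for
    the database signals is not given in this description.
    """
    if not 0 < start_state < 2 ** 16:
        raise ValueError("start_state must be a non-zero 16-bit value")
    lfsr = start_state
    bits = []
    while True:
        bits.append(lfsr & 1)  # output the least significant bit
        feedback = (lfsr ^ (lfsr >> 2) ^ (lfsr >> 3) ^ (lfsr >> 5)) & 1
        lfsr = (lfsr >> 1) | (feedback << 15)
        if lfsr == start_state:  # the register has completed one full cycle
            return bits


def mls_segment_duration(order=16, repetitions=8, fs=48000):
    """Duration in seconds of repeated MLS periods at sample rate fs."""
    samples = repetitions * (2 ** order - 1)
    return samples / fs


period = mls_period_16()
print(len(period))             # samples in one period
print(mls_segment_duration())  # seconds for eight repetitions at 48 kHz
```

One period of a 16th-order MLS contains 2^16 - 1 = 65535 samples, so eight repetitions contain 524280 samples, or roughly 10.9 seconds at 48 kHz under the stated assumptions.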

For every configuration, the temperature inside the multichannel listening room was measured and stored before these 20 audio segments were played. For each of the audio segments, all of the microphone recordings and a loopback of the loudspeaker signal were stored in the database. A pause of two seconds was added between the segments to ensure that the sound field within the room was approximately stationary before the next segment was played. The first audio segment was just five seconds of silence. The recordings made with this input signal can be used to inspect the stationary acoustical background noise.
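
One way to inspect the background noise from the silence segment is to compute the RMS level of a recorded channel. The sketch below is only an illustration under stated assumptions: it takes samples that have already been loaded and scaled to the range -1 to 1 (the file format and loading step are not covered here), and the name rms_level_db is hypothetical.

```python
import math


def rms_level_db(samples):
    """RMS level in dB relative to an amplitude of 1.0.

    Assumption: samples are floats already scaled to [-1, 1].
    """
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square)


# Synthetic stand-in for a recorded channel; not data from the database.
example = [0.001, -0.001] * 24000
print(round(rms_level_db(example), 1))
```

Applying the same function to the dummy microphone channel of the same segment gives a rough picture of how much of the measured level is electrical rather than acoustical.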

Fig. 1: Sketch of the multichannel listening room