TIDIGITS is a well-known dataset containing speech from four categories of speakers (man, woman, boy, and girl), each speaking 77 isolated and continuous digit sequences; more detailed information can be found here.
Getting from the TIDIGITS disc to waveforms that can actually be read into MATLAB (or even played in a common media player such as iTunes or QuickTime Player) was not trivial. Here are the steps I took:
Install Sphere
Download Sphere from: http://www.itl.nist.gov/iad/mig/tools/. The version I used was sphere-2.6a.
Unpack the archive and change into the extracted directory:
zcat sphere_2.6a.tar.Z | tar xvf -
cd nist
Save the contents of the following page:
http://www.paulaoki.com/software/sphere.patch.amd64
as sphere.patch.amd64 in the current directory, then apply the patch:
patch -p0 < sphere.patch.amd64
With the patch applied, run the install script:
sh src/scripts/install.sh
(I tested this on my Ubuntu 10.04 x86_64 system.)
The install script should now run without error. Each .WAV file on the TIDIGITS CD can then be decoded by running:
w_decode -o pcm_01 INFILE.WAV OUTFILE.WAV
Batch Uncompress
I used this bash script to extract only the audio files that contain isolated digits from the TIDIGITS file hierarchy for a specific type of speaker. Each of these audio files is uncompressed, renamed (with an optional prefix), and saved into a new directory.
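Since the script itself is not reproduced here, the following is only a minimal sketch of the idea, not the original script. It assumes that isolated-digit files are named with a single digit character (1-9, z for "zero", o for "oh") followed by one production letter, and that each file's parent directory is the speaker ID; neither assumption is stated in this post. The function names, the DECODER variable, and the output naming are hypothetical.

```shell
#!/bin/sh
# Decoder command; w_decode comes from the Sphere install described above.
DECODER="${DECODER:-w_decode}"

# Succeeds if the file name looks like an isolated digit.
# ASSUMPTION: names are one digit character (1-9, z, o), one production
# letter, then ".wav"; check this against the TIDIGITS documentation.
is_isolated_digit() {
    name=$(basename "$1" | tr 'A-Z' 'a-z')
    case "$name" in
        [1-9zo][a-z].wav) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: decode_isolated SRC_DIR OUT_DIR [PREFIX]
# Decodes every isolated-digit file under SRC_DIR into OUT_DIR, naming each
# output PREFIX + speaker directory + "_" + lower-cased original name.
decode_isolated() {
    src=$1
    out=$2
    prefix=${3:-}
    mkdir -p "$out" || return 1
    find "$src" -type f | while IFS= read -r f; do
        is_isolated_digit "$f" || continue
        # ASSUMPTION: the parent directory name identifies the speaker.
        speaker=$(basename "$(dirname "$f")")
        base=$(basename "$f" | tr 'A-Z' 'a-z')
        "$DECODER" -o pcm_01 "$f" "$out/${prefix}${speaker}_${base}"
    done
}
```

Calling decode_isolated with the directory for one speaker type, an output directory, and a prefix such as man_ would then collect all of that group's isolated digits in one flat directory. The speaker ID is included in the output name because the same file name would otherwise recur for every speaker.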
Resample
At this point, the uncompressed audio files should be playable in software such as Audacity. However, because they have an unconventional sampling rate of 20000 Hz, they are not playable in many other programs such as iTunes or QuickTime Player and, most importantly, cannot be read by MATLAB’s wavread function.
To fix this, I passed each audio file through a resampling utility, converting it from 20000 Hz to the more conventional 44100 Hz. I used a program called resample, available at http://www-ccrma.stanford.edu/~jos/resample/Available_Software.html. I had trouble installing it on my Mac, so I eventually gave up and ran it on a Linux machine. I used this bash script for batch processing.
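The batch script is not reproduced here, so what follows is a rough sketch of such a loop rather than the original. The calling convention "-to RATE INPUT OUTPUT" is an assumption about the resample command line and should be checked against the usage message of your installed version; the function name and the RESAMPLER and TARGET_RATE variables are hypothetical.

```shell
#!/bin/sh
# Resampling command and target rate; both can be overridden from the environment.
RESAMPLER="${RESAMPLER:-resample}"
TARGET_RATE="${TARGET_RATE:-44100}"

# Usage: resample_dir IN_DIR OUT_DIR
# Resamples every .wav file in IN_DIR and writes the result, under the same
# file name, into OUT_DIR.
resample_dir() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no .wav files.
        [ -f "$f" ] || continue
        # ASSUMPTION: "-to RATE INPUT OUTPUT" is how resample is invoked.
        "$RESAMPLER" -to "$TARGET_RATE" "$f" "$out_dir/$(basename "$f")"
    done
}
```

Writing into a separate output directory keeps the 20000 Hz files untouched, which makes it easy to rerun the step with different settings.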
Some Voodoo Programming
For some reason, the resulting audio files, now with a sampling rate of 44100 Hz, still were not readable by MATLAB’s wavread function. However, I found that if I opened each file in Audacity and simply exported it as a WAV file, it would finally work. I have no idea why.
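One possibility worth ruling out, though it was not verified for this post, is that a file carries a .wav extension without being a RIFF WAV container. RIFF files begin with the bytes "RIFF", while NIST SPHERE files begin with "NIST_1A". The sketch below only inspects those leading bytes; the function name is hypothetical and this is a diagnostic aid, not an explanation of what went wrong here.

```shell
#!/bin/sh
# Usage: container_type FILE
# Prints "riff", "sphere", or "unknown" based on the first four bytes of FILE.
container_type() {
    # Read four bytes; strip NUL bytes so the shell can hold the value.
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null | tr -d '\000')
    case "$magic" in
        RIFF) echo "riff" ;;
        NIST) echo "sphere" ;;
        *)    echo "unknown" ;;
    esac
}
```

Running this on a file before and after the Audacity export would show whether the export changed the container or only something inside it.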
The new beta version of Audacity 1.3 now comes with a “batch command” feature. Exporting to WAV is one of the possible batch commands, which I made full use of for this particular situation.
You may ask why I did not simply resample in Audacity, since it does have this feature. For me, there were two problems: 1) the resampling function in Audacity is kooky: there are two different ways to resample, and the first distorts the sound of the audio file while the second tacks on an extra five or so seconds of silence; and 2) resampling is not available as a batch command in Audacity 1.3.
After all this, I was finally able to call MATLAB’s wavread function on each audio file.
*The TIDIGITS readme suggested trying the Shorten compression utility first, but I ended up with nothing but gibberish, most likely due to a little-/big-endian issue.