Background:
WARNING (select-voiced-frames[5.2.183~1-32310]:main():select-voiced-frames.cc:75) Mismatch in number for frames 307 for features and VAD 305, for utterance file1.wav
LOG (select-voiced-frames[5.2.183~1-32310]:main():select-voiced-frames.cc:105) Done selecting voiced frames; processed 0 utterances, 1 had errors.
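Solution: use the same --snip-edges setting everywhere, so the features have the same number of frames as the VAD. Here, that meant setting this in conf/mfcc.conf:
--snip-edges=true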
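The warning reports a two-frame gap (307 vs 305). With the default 25 ms window and 10 ms shift, that is exactly the difference --snip-edges makes, which suggests the features and the VAD were computed with different settings. The Python sketch below mirrors the frame-count arithmetic in Kaldi's NumFrames() (feature-window.cc). The 16 kHz sample rate and the sample count are assumptions for illustration, not values taken from the log.

```python
# Sketch of the frame-count rule in Kaldi's NumFrames() (feature-window.cc).
# Assumes the default 16 kHz sample rate, 25 ms window and 10 ms shift.
SAMPLE_RATE = 16000
FRAME_LENGTH = SAMPLE_RATE * 25 // 1000  # 400 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000   # 160 samples


def num_frames(num_samples: int, snip_edges: bool) -> int:
    if snip_edges:
        # HTK-style: keep only frames that lie entirely inside the file.
        if num_samples < FRAME_LENGTH:
            return 0
        return 1 + (num_samples - FRAME_LENGTH) // FRAME_SHIFT
    # --snip-edges=false: file length / frame shift, rounded to nearest integer.
    return (num_samples + FRAME_SHIFT // 2) // FRAME_SHIFT


# Hypothetical file length (about 3.07 s), chosen only to illustrate the gap.
n = 49100
print(num_frames(n, snip_edges=False), num_frames(n, snip_edges=True))  # 307 305
```

With these defaults, false always yields exactly two more frames than true for any file at least one window (400 samples) long, because (N + 80) is (N - 400) plus three frame shifts.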
Everyone,
Whenever we start a new recipe (e.g. using new data, or a new version of an existing recipe, like s5c->s5d), let's make it a practice to add the config variable
--snip-edges=false
in all feature-extraction config files in conf/, such as mfcc.conf and mfcc_hires.conf.
This will ensure that the number of frames is related to the length of the file in a consistent and obvious way.
The original Kaldi feature-extraction code aimed to be compatible with HTK, which truncates two or three frames so each frame fits entirely in the file; but this is a hassle and has led to all kinds of inconsistency regarding the meaning of 'segments' files. It's been hard to switch away from that, because if you changed the config of an existing recipe and people re-ran the MFCC feature extraction, the newly extracted features would no longer match the existing alignments.
I think the only way to make the change is to use --snip-edges=false whenever we write recipes from scratch.
Dan
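Part of the 'segments' confusion comes from where each frame sits in time. The sketch below follows the frame placement in Kaldi's FirstSampleOfFrame(), again assuming 16 kHz audio, a 25 ms window and a 10 ms shift. With --snip-edges=false the centre of frame i falls in the middle of the i-th 10 ms interval, so converting times to frame indices is essentially a division by the shift; with --snip-edges=true every centre is shifted by half a window (12.5 ms).

```python
# Sketch of frame placement modeled on Kaldi's FirstSampleOfFrame().
# Assumes 16 kHz audio, 25 ms window, 10 ms shift.
SAMPLE_RATE = 16000
FRAME_LENGTH = SAMPLE_RATE * 25 // 1000  # 400 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000   # 160 samples


def first_sample_of_frame(frame: int, snip_edges: bool) -> int:
    if snip_edges:
        # Frames start at sample 0 and never extend past the file.
        return frame * FRAME_SHIFT
    # Frame is centred on the middle of its shift-sized interval; for frame 0
    # the start is negative (those samples are filled in by reflection).
    midpoint = frame * FRAME_SHIFT + FRAME_SHIFT // 2
    return midpoint - FRAME_LENGTH // 2


def frame_center_seconds(frame: int, snip_edges: bool) -> float:
    start = first_sample_of_frame(frame, snip_edges)
    return (start + FRAME_LENGTH / 2) / SAMPLE_RATE


for i in (0, 1, 100):
    print(i, frame_center_seconds(i, False), frame_center_seconds(i, True))
```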
ref:
https://groups.google.com/forum/#!topic/kaldi-developers/NhdT6nc_Ldc
https://sourceforge.net/p/kaldi/discussion/1355348/thread/7a745cac/?limit=25
IIRC there's an option called --snip-edges or something like that. If you set it to false, the number of frames depends only on the frame shift (the data at the ends of the audio file is reflected to fill the edge frames).
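The reflection mentioned above can be sketched as follows. This is modeled on the edge handling in Kaldi's ExtractWindow(), but uses made-up toy sizes (10 samples, window of 4, shift of 2) instead of real audio so the output is easy to read; it only covers framing, not the dithering, windowing and other processing Kaldi applies afterwards.

```python
# Sketch of --snip-edges=false framing with edge reflection, modeled on
# Kaldi's ExtractWindow(). Toy sizes keep the output readable.


def reflect_index(i: int, n: int) -> int:
    # Mirror an out-of-range sample index back into [0, n):
    # -1 -> 0, -2 -> 1, n -> n - 1, n + 1 -> n - 2.
    assert n > 0
    while i < 0 or i >= n:
        if i < 0:
            i = -i - 1
        else:
            i = 2 * n - 1 - i
    return i


def frames_no_snip(wave, frame_length, frame_shift):
    n = len(wave)
    # Frame count depends only on the shift: length / shift, rounded.
    count = (n + frame_shift // 2) // frame_shift
    frames = []
    for f in range(count):
        # Centre frame f on the middle of its shift-sized interval.
        start = f * frame_shift + frame_shift // 2 - frame_length // 2
        frames.append([wave[reflect_index(start + k, n)] for k in range(frame_length)])
    return frames


wave = list(range(10))  # stand-in for 10 audio samples
for fr in frames_no_snip(wave, frame_length=4, frame_shift=2):
    print(fr)  # first frame [0, 0, 1, 2], last frame [7, 8, 9, 9]
```

Changing frame_length here changes which samples each frame contains, but not how many frames there are.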