Step 0: If you can, first make sure you are working with the best speech audio quality available:
Instead of the lossy (μ-law) coded audio, use the full-bandwidth version of the microphone data, if available [1].
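To get a feel for why μ-law coding is called lossy, here is a minimal sketch that compresses samples to 8-bit codes and expands them again. It assumes the textbook continuous μ-law formula with μ = 255 and uniform 8-bit quantization, which is a simplification of the segment-based G.711 codec; the function names and the 440 Hz test tone are hypothetical and chosen only for illustration. Telephone μ-law audio is usually also sampled at 8 kHz, so the bandwidth loss comes on top of the quantization error shown here.

```python
import math

MU = 255  # assumed: the usual value for 8-bit mu-law companding


def mulaw_encode(x, mu=MU):
    """Compress a sample in [-1, 1] and quantize it to an 8-bit code (0..255)."""
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * mu))


def mulaw_decode(code, mu=MU):
    """Expand an 8-bit code back to an approximate sample in [-1, 1]."""
    y = 2 * code / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def roundtrip_error(samples):
    """Largest absolute difference between a sample and its decoded value."""
    return max(abs(s - mulaw_decode(mulaw_encode(s))) for s in samples)


# Hypothetical test signal: 0.1 s of a 440 Hz tone sampled at 8 kHz.
samples = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
err = roundtrip_error(samples)
print("max round-trip error:", err)
```

The error is never zero, and it grows with the sample amplitude, because the quantization steps are wider for loud samples.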
Step 1: Feature extraction: use an i-vector system.
Step 2: Back-end classification: use a multi-session back-end [2].
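As a rough illustration of what a back-end has to do when a speaker is enrolled from several sessions, here is a minimal sketch of one simple baseline: length-normalize each session's i-vector, average them into a single speaker model, and score a test i-vector by cosine similarity. This is an assumption for illustration and not necessarily the back-end investigated in [2]; the function names and the 3-dimensional toy vectors are hypothetical (real i-vectors typically have a few hundred dimensions).

```python
import math


def length_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def enroll(session_ivectors):
    """Build one speaker model by averaging length-normalized session i-vectors."""
    normalized = [length_normalize(v) for v in session_ivectors]
    dim = len(normalized[0])
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    return length_normalize(mean)


def cosine_score(model, test_ivector):
    """Cosine similarity between a unit-length model and a test i-vector."""
    t = length_normalize(test_ivector)
    return sum(a * b for a, b in zip(model, t))


# Toy data (hypothetical): three enrollment sessions of the same speaker.
sessions = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1], [1.1, 0.0, -0.1]]
model = enroll(sessions)

target_score = cosine_score(model, [1.0, 0.1, 0.0])      # same speaker
nontarget_score = cosine_score(model, [-0.2, 1.0, 0.3])  # different speaker
print(target_score, nontarget_score)
```

A verification decision would then compare the score against a threshold tuned on development data.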
References:
[1] A. Stolcke, M. Graciarena, and L. Ferrer, "Effects of audio and ASR quality on cepstral and high-level speaker verification systems," Proc. Odyssey 2012: The Speaker and Language Recognition Workshop, 2012.
[2] G. Liu, T. Hasan, H. Boril, and J. H. L. Hansen, "An investigation on back-end for speaker recognition in multi-session enrollment," Proc. IEEE ICASSP 2013, Vancouver, Canada, 2013.