Blog Archive

Tuesday, February 28, 2012

Bo Yin LID

Bo Yin
http://www.nicta.com.au/people/yinb

Researcher
ATP Research Laboratory

Experiences

Dr. Bo Yin has worked extensively on speech-based cognitive load monitoring, cognitive capability assessment, automatic spoken language identification, speech synthesis, and distributed speech synthesis over the past ten years.

Dr. Yin started research on cognitive load measurement in the Decision Support for Incident Management (DSIM) project during his Ph.D., and joined National ICT Australia in 2009 to continue his work. He has been involved in the commercialisation process since then.

Dr. Yin also co-founded iFlytek Co. Ltd., a speech technology company that is now publicly listed, and between 1999 and 2005 served as manager of its TTS division, research director, director of embedded products, and vice president.

Qualifications

Dr. Yin received his bachelor's and master's degrees in electrical engineering from the University of Science and Technology of China, and his Ph.D. degree in electrical engineering from the University of New South Wales, Australia.

Research Interests

Dr. Yin's research interests include cognitive capability assessment, cognitive load monitoring, human-computer interaction, audio classification, brain-computer interfaces, and their applications.

Affiliations

Dr. Yin has been appointed a visiting fellow at the University of New South Wales, and is a member of the IEEE and the IET.

Patents

  1. “The Method of Distributed Speech Synthesis”, 2002, No. 02116017.1, China;
  2. “Distributed Speech Synthesis System”, 2002, No. 02108890.X, China;
  3. “Data Exchanging in Speech Synthesis System”, 2002, No. 02148666.2, China;
  4. “In-vehicle Detachable Voice System”, 2003, No. ZL 2003 2 0120441 2, China;
  5. “In-vehicle Wireless Voice System”, 2003, No. ZL 2003 2 0120442 7, China;

Selected Publications

  1. B. Yin, E. Ambikairajah, and F. Chen, "Articulatory Feature based Duration Modelling for Language Identification," in Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, 2009.
  2. Yin, B., Ambikairajah, E. and Chen, F., "Introducing FM based Feature to Hierarchical Language Identification", Proc. the InterSpeech 2008, Brisbane, Australia.
  3. Yin, B., Chen, F., Ruiz, N. and Ambikairajah, E., "Exploring Classification Techniques in Speech based Cognitive Load Monitoring", Proc. the InterSpeech 2008, Brisbane, Australia.
  4. Yin, B., Ambikairajah, E. and Chen, F., "Language-dependent Contribution Measuring and Weighting for Combining Likelihood Scores in Language Identification Systems", the Journal of Signal Processing Systems, published by Springer.
  5. Yin, B., Chen, F., Ruiz, N. and Ambikairajah, E., “Speech-based Cognitive Load Monitoring System”, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’08), Las Vegas, March 2008.
  6. Yin, B., Ambikairajah, E. and Chen, F., “Improvements on Hierarchical Language Identification based-on Automatic Language Clustering”, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’08), Las Vegas, March 2008.
  7. Yin, B., Ruiz, N., Chen, F. and Khawaja, M. A., “Automatic Cognitive Load Detection from Speech Features”, Proc. Australasian Computer-Human Interaction Conference (OzCHI’07), Adelaide, Nov. 2007, pp. 249-255.
  8. Yin, B., Ambikairajah, E. and Chen, F., “A Novel Weighting Technique for Fusing Language Identification Systems based on Pair-wise Performances”, Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’07), Kyoto, Japan, Dec. 2007, pp. 408-412
  9. Yin, B., Ambikairajah, E. and Chen, F., “A Novel Weighting Technique for Combining Likelihood Scores in Language Identification Systems”, Proc. IEEE International Conference on Information, Communications and Signal Processing (ICICS) 2007, Singapore
  10. Yin, B. and Chen, F., “Towards Automatic Cognitive Load Measurement from Speech Analysis”, Human-Computer Interaction International 2007, Beijing, China.
  11. Yin, B., Ambikairajah, E. and Chen, F., “Hierarchical Language Identification based on Automatic Language Clustering”, InterSpeech 07 - EuroSpeech, Antwerp, Belgium
  12. Yin, B., Ambikairajah, E. and Chen, F., “Language-dependent Fusion in Language Identification”, the Eleventh Australasian International Conference on Speech Science and Technology (SST-2006), Auckland
  13. Yin, B., Ambikairajah, E. and Chen, F., “Combining Prosodic and Cepstral Features in Language Identification”, the IEEE International Conference on Pattern Recognition (ICPR-2006), Hong Kong
  14. Yin, B. and Wang, R. H., “A Hierarchic Processing Model In Chinese TTS”, ISCSLP-2000, paper 44, Beijing
  15. Tang, H., Yin, B., Wang, R. H., “Study on Distributed Speech Synthesis System”, ICASSP-2003, pI-732, Hong Kong
  16. Tang, H., Yin, B., Wang, R. H., "Design of embedded application oriented distributed speech synthesis system with high naturalness", In ISCSLP 2002, paper 76, Taipei
Contact: Dr. Bo Yin
Email: bo.yin@nicta.com.au

interview advice for fellow Chinese

http://www.mitbbs.com/article_t1/JobHunting/32016319_0_1.html



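It's the last night before the holiday and I'm too excited to sleep, so let me share a few thoughts from the hiring interviews I've been doing lately.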

My company has just started hiring, and I am one of the interviewers for the phone screens. My boss takes part in the interviews as well.

The job is highly technical, and every candidate has a PhD. So far we have interviewed three people: two Chinese and one Middle Eastern.

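The two Chinese candidates had far stronger academic backgrounds and work experience than the Middle Eastern candidate, yet my boss did not like them at all. If I hadn't argued hard for the better of the two, my boss would have cut both of them in the first round. Sigh, what a pity.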

The two Chinese candidates shared exactly the same weaknesses:

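1. English!!! Their English was painful even for me to listen to, let alone for my boss. I can't understand it: they have lived in the US for about ten years, earned their PhDs here, and even have some US work experience, so how can their English be that poor? By contrast, the Middle Eastern candidate has been in the US for three years and has never left campus, yet apart from a slight accent his English is practically at native speaker level.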

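---Solution: listen to more English, speak more English, watch more American movies and TV, and get out and talk with Americans. This is a long-term process with no shortcuts. For last-minute cramming: take your resume and turn every item on it into a short story you could talk about for 20 minutes. Write out a draft and rehearse it 20 times! Write out your self-introduction too, and rehearse it 30 times. Interview yourself: imagine you are the interviewer, judge your own answers from the interviewer's point of view, and ask yourself questions. Cover everything from your work and study experience to your everyday life, until you run out of questions to ask. Practice your answers in front of a mirror. Do all this and your confidence will jump 50-fold, and you will no longer be nervous in the interview.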

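2. Not knowing how to sell yourself! Take today's candidate: every question got a vague one- or two-sentence answer, followed by silence. As the interviewer I had to rack my brains to dig possible highlights out of his resume and his brief answers, and then ask about those. Even then he managed three or four sentences at most before running out of things to say. He had clearly done a lot of impressive work, but he couldn't articulate it. My boss was getting very impatient; the interview only kept going because I kept digging for his possible strengths. Now look at the Middle Eastern candidate: he had clearly done only one very simple piece of work, but he could really talk. He said he was the first person to do that work and that nobody had done it before. Come on, what PhD research isn't something nobody has done before? Can anyone graduate with a PhD by repeating other people's work? But if you don't say it, how would bosses in industry know? They all stopped at a bachelor's degree.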

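---Solution: take your resume and turn every item on it into a short story you could talk about for 20 minutes. Write out a draft and rehearse it 20 times! Write out your self-introduction too, and rehearse it 30 times. Interview yourself: imagine you are the interviewer, judge your own answers from the interviewer's point of view, and ask yourself questions. Cover everything from your work and study experience to your everyday life, until you run out of questions to ask. Practice your answers in front of a mirror.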

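3. Not showing enthusiasm! This is a cultural issue: Chinese culture values moderation and restraint. But in the US that simply. Does. Not. Work! One of these two Chinese candidates had been laid off and the other was about to be, so they should have been fighting hard for this job, yet what they said was far too flat. Just one sentence: I believe I'm a good fit for this job. And that was it. That was it. That was it... Do you know what the Middle Eastern candidate said? He said: I know I'm perfect for this job. And your company is exactly where I want to build my career at. Then he summarized our company's main projects, highlighting the ones he was interested in and good at, which convinced us that he had done his homework and really wanted to join us.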


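---Solution: do your homework! Check the company website and look up your interviewers; this information is very easy to find. Put yourself in the interviewer's shoes and think about how to convince them that you can do the job well and bring the most value to the company.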

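That's all for now. I think everyone here is fine on the technical side and can leave the Middle Eastern and Indian candidates far behind, but we just don't know how to express ourselves. Whatever you do, don't believe that nonsense about "good wine sells itself no matter how deep the alley". An interview is a process of selling yourself: if you have 10 points of ability but can only convey 1, you are worse off than someone who has 1 point and conveys 2. Finally, I hope the guy I fought so hard to keep sees this post, prepares well for the onsite interview, and ideally volunteers a presentation or two to win back some points after his earlier performance.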

Sunday, February 26, 2012

MFCC flowchart

http://www.stanford.edu/class/cs224s/lec/224s.09.lec9.pdf

power spectrum and magnitude spectrum


2 Apr 2009

I often get confused about the scaling of magnitude and power spectra. I think I've worked it out, so I'm putting it here for my own benefit, and maybe for the benefit of someone on the internet. If I'm wrong, please tell me.
Let's take a unit-amplitude sinusoid at 100 Hz and do a Fourier transform. If we look at the amplitude, we get two components, at 100 Hz and -100 Hz, each with amplitude 1/2.
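The factor of 1/2 follows from Euler's formula. As a brief sketch, assuming the sinusoid is written as a cosine (a sine differs only in phase) and using f_0 = 100 Hz as shorthand for its frequency:

```latex
\cos(2\pi f_0 t) = \tfrac{1}{2}\, e^{j 2\pi f_0 t} + \tfrac{1}{2}\, e^{-j 2\pi f_0 t}
\quad\Longrightarrow\quad
X(f) = \tfrac{1}{2}\,\delta(f - f_0) + \tfrac{1}{2}\,\delta(f + f_0)
```

The unit amplitude is split evenly between the positive- and negative-frequency components.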
Now, if we do the same with a DFT we will get basically the same thing, except we'll see the effects of discretization of course. But depending on which variation of the DFT you're using, the magnitude of the components will be either about 1/2 or about N/2. The difference is, of course, the 1/N factor that you included or left out. In the case of the fft function in Octave (and I assume MATLAB), it's without the 1/N factor.
max(abs(fft(x,1024)))
ans = 443.23
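To make the N/2 versus 1/2 scaling concrete, here is a minimal sketch in standard-library Python. The `dft` helper, the length N = 64, and the choice of exactly 4 cycles per window are assumptions made for illustration; they are not the signal used for the Octave output above.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT with no 1/N factor (the same convention as Octave's fft)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64       # number of samples (assumed for this sketch)
CYCLES = 4   # whole number of cycles, so the tone lands exactly on bin 4

# Unit-amplitude sinusoid
x = [math.cos(2 * math.pi * CYCLES * t / N) for t in range(N)]

mag = [abs(c) for c in dft(x)]
peak_unscaled = max(mag)          # about N/2 without the 1/N factor
peak_scaled = peak_unscaled / N   # about 1/2 with the 1/N factor

print(peak_unscaled, peak_scaled)
```

When the tone does not land exactly on a bin, or the signal is zero-padded, the energy spreads over neighbouring bins and the peak comes out below N/2.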
The power spectrum is the magnitude spectrum squared: abs(fft(x)).^2, or (abs(fft(x))/N).^2 if you include the 1/N factor. Why would you want to square it? Well, there are of course many reasons, but the nature of audio signals makes most of the ones I know about moot. Furthermore, the relative shape of the power spectrum and the magnitude spectrum is the same.
Try this. Take an audio signal (something more interesting than a single sinusoid) and plot the magnitude spectrum in dB, then plot the power spectrum also in dB, e.g.
plot(10*log10(abs(fft(x))))
figure
plot(10*log10(abs(fft(x)).^2))

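They have a different scale, but they have the same shape: the second curve is exactly twice the first, because log10(a^2) = 2*log10(a). (With the more common 20*log10 convention for magnitude, the two dB curves would be identical.) If you were doing something like peak-picking, it wouldn't matter which you used if you're working in dB.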
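The peak-picking claim can be checked numerically. The following is a rough sketch in standard-library Python; the two-tone test signal, the `dft` and `local_peaks` helpers, and the small floor that guards against log10(0) are all assumptions made for illustration.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT with no 1/N factor (the same convention as Octave's fft)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def local_peaks(v):
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(v) - 1) if v[i - 1] < v[i] > v[i + 1]]

N = 64
# Hypothetical test signal: a strong tone on bin 5 plus a weaker off-bin tone
x = [math.cos(2 * math.pi * 5 * t / N) + 0.3 * math.sin(2 * math.pi * 12.5 * t / N)
     for t in range(N)]

# Keep the non-negative-frequency half and floor the magnitudes to avoid log10(0)
mag = [max(abs(c), 1e-12) for c in dft(x)[: N // 2 + 1]]

mag_db = [10 * math.log10(m) for m in mag]       # like 10*log10(abs(fft(x)))
pow_db = [10 * math.log10(m ** 2) for m in mag]  # like 10*log10(abs(fft(x)).^2)

peaks_mag = local_peaks(mag_db)
peaks_pow = local_peaks(pow_db)
print(peaks_mag, peaks_pow)
```

Both curves give the same peak locations, and each value of pow_db is twice the corresponding value of mag_db, which is the "different scale, same shape" observation in numbers.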

Friday, February 10, 2012