Blog Archive

Sunday, October 30, 2016

Nuance Acquires Agnitio




Nuance Communications has acquired long-time Voice Biometrics specialist Agnitio, making the Spanish company’s personnel, intellectual property and global customer base part of the Security Group of the Nuance Enterprise Division. The move marks another milestone in the metamorphosis of voice from a discrete biometric factor for caller verification into a flexible component in a family of technologies that support fraud prevention and simple, secure, conversational commerce.
For Agnitio, it represents the second shoe to drop after it sold its line of embedded voice authentication technologies to Cirrus Logic in September 2015. At that time the company focused on refining its server-based solutions with special attention to government segments that grow out of forensic applications as well as fraud prevention. Its core technologies will complement Nuance Enterprise’s offerings that integrate voice-based authentication with its branded virtual assistant, Nina, on IVRs, Web sites and mobile applications.
Nuance expects to see immediate benefits from ‘acquihiring’ Agnitio’s R&D and product development personnel, as well as sales resources that have strengths in Spanish-speaking countries. Broadening the combined presence in government markets and citizens’ services will also be a benefit.
During its tenure as an independent company, Agnitio focused primarily on establishing OEM relationships and indirect sales. It has been integrated into fraud detection and prevention products from the likes of Verint and Pindrop Security. It is in these domains that Nuance will find immediate expansion of its technology footprint.

Wednesday, October 26, 2016

Why Netflix Never Implemented The Algorithm That Won The Netflix $1 Million Challenge | Techdirt


from the times-change dept

You probably recall all the excitement that went around when a group finally won the big Netflix $1 million prize in 2009, improving Netflix's recommendation algorithm by 10%. But what you might not know is that Netflix never implemented that solution itself. Netflix recently put up a blog post discussing some of the details of its recommendation system, which (as an aside) explains why the winning entry was never used. First, they note that they did make use of an earlier bit of code that came out of the contest:
"A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize. And, they gave us the source code. We looked at the two underlying algorithms with the best performance in the ensemble: Matrix Factorization (which the community generally called SVD, Singular Value Decomposition) and Restricted Boltzmann Machines (RBM). SVD by itself provided a 0.8914 RMSE (root mean squared error), while RBM alone provided a competitive but slightly worse 0.8990 RMSE. A linear blend of these two reduced the error to 0.88. To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine."
Neat. But the prize-winning solution? Eh... just not worth it:
"We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment."
It wasn't just that the improvement was marginal. Netflix's business had shifted, and with it the way customers used the product and the kinds of recommendations the company needed to make. Suddenly, the prize-winning solution just wasn't that useful, in part because many people were streaming videos rather than renting DVDs, and it turns out that recommending a video to stream right now is different from recommending a rental that will be watched a few days later.
"One of the reasons our focus in the recommendation algorithms has changed is because Netflix as a whole has changed dramatically in the last few years. Netflix launched an instant streaming service in 2007, one year after the Netflix Prize began. Streaming has not only changed the way our members interact with the service, but also the type of data available to use in our algorithms. For DVDs our goal is to help people fill their queue with titles to receive in the mail over the coming days and weeks; selection is distant in time from viewing, people select carefully because exchanging a DVD for another takes more than a day, and we get no feedback during viewing. For streaming members are looking for something great to watch right now; they can sample a few videos before settling on one, they can consume several in one session, and we can observe viewing statistics such as whether a video was watched fully or only partially."
The viewing data obviously makes a huge difference, but I also find it interesting that there's a clear distinction in the kinds of recommendations that work if people are going to "watch now" vs. "watch in the future." I think this is an issue that Netflix probably has faced on the DVD side for years: when people rent a movie that won't arrive for a few days, they're making a bet on what they want at some future point. And, people tend to have a more... optimistic viewpoint of their future selves. That is, they may be willing to rent, say, an "artsy" movie that won't show up for a few days, feeling that they'll be in the mood to watch it a few days (weeks?) in the future, knowing they're not in the mood immediately. But when the choice is immediate, they deal with their present selves, and that choice can be quite different. It would be great if Netflix revealed a bit more about those differences, but it is already interesting to see that the shift from delayed gratification to instant gratification clearly makes a difference in the kinds of recommendations that work for people.
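
As an aside on the Netflix passage quoted earlier, here is a minimal sketch of what a "linear blend" of two models means: a weighted average of their predicted ratings, with the weight chosen to minimise RMSE. The ratings, the two stand-in prediction lists and the grid search are all made up for illustration; this is not Netflix's or the Korbell team's code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def blend(pred_a, pred_b, weight):
    """Linear blend: weight * pred_a + (1 - weight) * pred_b, element-wise."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(pred_a, pred_b)]


def best_weight(pred_a, pred_b, actual, steps=100):
    """Grid-search the blend weight in [0, 1] that gives the lowest RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(blend(pred_a, pred_b, w), actual))


# Hypothetical data: true ratings and the outputs of two imperfect models.
actual = [4.0, 3.0, 5.0, 2.0, 1.0]
model_a = [4.5, 3.5, 4.0, 2.5, 1.5]  # stand-in for one model's predictions
model_b = [3.5, 2.0, 5.5, 1.0, 1.0]  # stand-in for the other model's predictions

w = best_weight(model_a, model_b, actual)
print("RMSE of model A:", rmse(model_a, actual))
print("RMSE of model B:", rmse(model_b, actual))
print("best weight:", w, "blended RMSE:", rmse(blend(model_a, model_b, w), actual))
```

Because weights 0 and 1 are among the candidates, the blended error on this data can never be worse than the better of the two models alone, which is the basic appeal of blending.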



Friday, October 14, 2016

sox cannot play files with a-law encoding

sox cannot play files with a-law encoding - Stack Overflow:



Question:

sox cannot play files with a-law encoding



I am on Linux Mint 14 and trying to play a .sph file using sox with

play foo.sph

and got the following error:

play FAIL formats: can't open input file 'foo.sph': sph: unsupported coding 'alaw'



Doesn't sox support alaw encoding? What can I do to play this file? Note that it can successfully play ulaw. Thanks!



Solution:

Here is the relevant SoX source code (from sox-14.4.2/src/sphere.c, starting at line 79):
(the source code can be downloaded from: https://sourceforge.net/projects/sox/files/sox/14.4.2/sox-14.4.2.tar.gz/download)
if (!strcasecmp(fldsval, "ulaw") || !strcasecmp(fldsval, "mu-law"))
  encoding = SOX_ENCODING_ULAW;
else if (!strcasecmp(fldsval, "pcm"))
  encoding = SOX_ENCODING_SIGN2;
else {
  lsx_fail_errno(ft, SOX_EFMT, "sph: unsupported coding `%s'", fldsval);
  /* ... */
}
As you can see, the format handler only knows about µ-law and PCM encodings, nothing else. As you say, SoX does have decoding routines for A-law; therefore, it would suffice to add these lines before the final else branch:
else if (!strcasecmp(fldsval, "alaw"))
  encoding = SOX_ENCODING_ALAW;
Obviously, this is only going to help you if you can compile SoX yourself from source with this addition.

A probably simpler way is to use the libsndfile driver, which is supposed to support A-law encoding in Sphere files:

play -t sndfile foo.sph
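
To check which coding a Sphere file declares before choosing a driver, the text header can be read directly. The sketch below assumes the usual NIST SPHERE layout (a "NIST_1A" line, a header-size line, then "name -type value" lines up to "end_head") and assumes the coding is stored in a field called sample_coding; neither detail is stated in the answer above, and the sample bytes are invented.

```python
def read_sphere_header(data: bytes) -> dict:
    """Parse the text header at the start of a NIST SPHERE file into a dict.

    Assumed layout: 'NIST_1A', a header-size line, then one
    'name -type value' line per field, terminated by 'end_head'.
    """
    lines = data.decode("ascii", errors="replace").split("\n")
    if not lines or lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    fields = {}
    for line in lines[2:]:  # lines[1] holds the header size in bytes
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # field name, type flag, value
        if len(parts) == 3:
            fields[parts[0]] = parts[2]
    return fields


# Invented header bytes for illustration; a real file would be read with
# open("foo.sph", "rb").read(1024) instead.
SAMPLE_HEADER = (
    b"NIST_1A\n   1024\n"
    b"sample_coding -s4 alaw\n"
    b"sample_rate -i 8000\n"
    b"channel_count -i 1\n"
    b"end_head\n"
)

header = read_sphere_header(SAMPLE_HEADER)
print(header.get("sample_coding"))
```

If the reported coding is alaw, the stock sph handler shown above would reject the file, and the sndfile driver is the one to try.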


Reference:

http://stackoverflow.com/questions/18169817/sox-cannot-play-files-with-a-law-encoding


Saturday, October 8, 2016

Rumors in China about what Bay Area programmers (men and women) earn - Chats&&华人闲话 - Chinese In North America(北美华人e网) - Powered by Huaren.us

http://forums.huaren.us/showtopic.aspx?topicid=2074519


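Many people presumably have classmates, relatives or friends working as programmers at internet companies in the US Bay Area, yet accounts of what those people earn vary wildly and are hard to make sense of. For example, why do news reports say that a Google software engineer makes only 120 to 130 thousand US dollars a year, a figure matched by the Glassdoor average, while people online routinely talk about Google programmers earning several hundred thousand dollars a year? There is also the widely circulated claim that the Bay Area poverty line is 300 thousand dollars; what is that about? On Shuimu (水木), people regularly ask whether they should move to the Bay Area or stay in China, and income is always the first consideration. On the Overseas Scholars board (海外学人版), Bay Area programmers have shown up to plead poverty, listing their expenses and concluding that little is left at the end of each month. All of this leaves friends in China confused, and even Chinese living in the US outside the Bay Area often cannot tell what is true. The purpose of this post is to give a clear answer on Bay Area programmer income (I stress clear, because many discussions of similar questions pile on information without drawing a conclusion, which only makes readers more confused), and in particular on the point people care about most: how does a Bay Area programmer's income rank against incomes in China?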
  
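First, do programmers at Bay Area internet companies earn a bit over a hundred thousand dollars, or several hundred thousand?
Tech news often says that Google programmers earn 120 to 130 thousand dollars a year, and Glassdoor's average is the same; that figure is the average pre-tax base salary. But base salary is not a programmer's whole income. When Chinese engineers compare incomes online, they almost always mean total pre-tax income for the year, which has three main parts: base salary, bonus and stock. Bonuses include the annual bonus as well as a sign-on bonus paid at hiring, either as a lump sum or in installments. Stock includes the grant made at hiring, which is paid out over four years, plus new grants awarded each year based on performance (paid out over several years starting from the year of the review). Overall, for most Bay Area internet company programmers with a master's degree or PhD, a pre-tax income of 200 to 350 thousand dollars in the fourth year of work is fairly typical (this excludes the small number of star engineers, on whom large Bay Area companies are always willing to spend heavily). After the first four years, most people's annual income drops, because the large sign-on grant has been fully paid out and the yearly performance grants are smaller than the sign-on grant. Stock does, of course, add some uncertainty to income. Take Facebook: in 2013 its share price was in the 20s, and the stock granted to a new hire then was worth 150 to 160 thousand dollars at the time, paid out over four years. By this year (2016) Facebook's share price has reached 120 and shows no sign of stopping. An employee who joined then and never sold would now hold stock worth 800 to 900 thousand dollars, which averaged over four years adds more than 200 thousand dollars a year from this part alone. That is an unusual case, though. The ordinary case is the one described above: between 200 and 350 thousand dollars.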
  
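Second, is a Bay Area poverty line of 300 thousand dollars an outrageous claim?
First, this poverty line refers to total pre-tax household income, not individual income. Second, it has to be understood in its context, which is Bay Area programmers. Applied to the whole Bay Area population it is obviously unreasonable, since the Bay Area has plenty of low-income residents. But for Chinese programmers, a couple reaching a combined 300 thousand dollars a year takes hardly any effort: after two years of work the two base salaries alone add up to 300 thousand, before stock and bonuses. So I think that flatly calling 300 thousand dollars the Bay Area poverty line is unrealistic, but saying that a Chinese programmer household earning 300 thousand dollars a year sits at the lower end of that group's income range is not far off.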
  
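Finally, how does a Bay Area programmer's income convert to the equivalent level in China?
Because the US and Chinese personal income tax systems are very different, I think comparing after-tax income is the more reasonable way to answer this. Below I work out after-tax income for Bay Area programmers in four cases. Note that none of the calculations include 401k contributions, that is, the retirement savings you set aside for yourself. (I think including the 401k only complicates things without changing much, since the tax not paid on that money now still has to be paid later. Some people put a lot into their 401k and then post complaints that nothing is left each month. That argument is pointless: you chose to put that much into the 401k, and it is still your money, so complaining that you have nothing to spend is disingenuous.) Also, the US taxes company stock and bonuses together with base salary; no special rate applies.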
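  (1) Single, pre-tax income 200,000 dollars (including bonus and stock; same below)
     Federal tax 49606, state tax 16077, Social Security tax 7049, Medicare tax 2900, health insurance + SDI 4000
     Remaining: 120368
     Sales tax in most of the Bay Area is between 9.25% and 10%; using 9.5% (same below), 10,000 dollars spent on goods in the Bay Area buys only 10000/(1+9.5%) = 9132 dollars' worth
     So after-tax income is equivalent to 109925 dollars
  (2) Single, pre-tax income 300,000 dollars
      Federal tax 82606, state tax 25744, Social Security tax 7049, Medicare tax 5250, health insurance + SDI 4000
      Remaining: 175351, 175351 x 1/(1+9.5%) = 160130
      In this case after-tax income is equivalent to about 160130 dollars
  (3) Couple, pre-tax income 300,000 dollars
      Federal tax 74529, state tax 22853, Social Security tax 14098 (assuming both work), Medicare tax 5250, health insurance + SDI 5000
      Remaining: 178270, 178270 x 1/(1+9.5%) = 162796 dollars
      In this case after-tax income is equivalent to about 162796 dollars
  (4) Couple, pre-tax income 400,000 dollars
      Federal tax 107529, state tax 32153, Social Security tax 14098 (assuming both work), Medicare tax 7600, health insurance + SDI 5000
      Remaining: 233620, 233620 x 1/(1+9.5%) = 213341 dollars
      In this case after-tax income is equivalent to about 213341 dollars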
  
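Comparing the figures above with a Beijing programmer's after-tax income raises one more issue, however: housing costs differ greatly. The differences are mainly these:
1. Renting: the Bay Area is more expensive. A one-bedroom apartment now averages about 2600 dollars a month and a two-bedroom about 3300 dollars, so even a single programmer sharing an apartment will pay no less than 1700 dollars a month in rent including utilities and parking. In Beijing, even a well-furnished two-bedroom apartment rents for only 1000 to 1500 dollars a month, so two people sharing pay just 500 to 750 dollars each, utilities included.
2. Buying: Bay Area programmers must pay property tax every year (yes, every year). This is a large expense that home buyers in China do not have. Property tax rates around the Bay Area run from about 1.15% to 1.25% of the home price; taking 1.2% and, given current conditions, an average price of roughly 1 million dollars for homes bought by Chinese buyers, the annual property tax comes to 12000 dollars.
Based on these two points, and assuming that single programmers rent an apartment while couples own their home (which matches the situation of most Bay Area programmers), the four cases above are adjusted as follows: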
  
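(1) Single, pre-tax income 200,000 dollars: effective after-tax income minus annual rent of 20400 dollars leaves 89525 dollars
(2) Single, pre-tax income 300,000 dollars: effective after-tax income minus annual rent of 20400 dollars leaves 139730 dollars
(3) Couple, pre-tax income 300,000 dollars: effective after-tax income minus annual property tax of 12000 dollars leaves 150796 dollars
(4) Couple, pre-tax income 400,000 dollars: effective after-tax income minus annual property tax of 12000 dollars leaves 201341 dollars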
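
The arithmetic behind the four cases can be reproduced with a short script, which also makes it easy to plug in your own numbers. This is only a sketch of the post's method: the tax and housing figures are copied from the post as-is (not recomputed from any tax table), and the function and variable names are made up here. Exact division gives the post's figure for case (1) and lands within about ten dollars of the post's figures for cases (2) to (4).

```python
SALES_TAX_RATE = 0.095  # the post's assumed Bay Area sales tax rate

# label -> (pre-tax income, [federal, state, Social Security, Medicare,
#           health insurance + SDI], annual housing cost), all from the post
CASES = {
    "(1) single, 200k": (200_000, [49_606, 16_077, 7_049, 2_900, 4_000], 1_700 * 12),
    "(2) single, 300k": (300_000, [82_606, 25_744, 7_049, 5_250, 4_000], 1_700 * 12),
    "(3) couple, 300k": (300_000, [74_529, 22_853, 14_098, 5_250, 5_000], 12_000),
    "(4) couple, 400k": (400_000, [107_529, 32_153, 14_098, 7_600, 5_000], 12_000),
}


def effective_income(gross, deductions, housing_cost):
    """Return (take-home pay, sales-tax-adjusted income, income after housing)."""
    take_home = gross - sum(deductions)
    spendable = take_home / (1 + SALES_TAX_RATE)  # what the money buys after sales tax
    return take_home, spendable, spendable - housing_cost


for label, (gross, deductions, housing) in CASES.items():
    take_home, spendable, after_housing = effective_income(gross, deductions, housing)
    print(f"{label}: take-home {take_home}, "
          f"sales-tax adjusted {spendable:.0f}, after housing {after_housing:.0f}")
```

For singles the housing cost is 12 months of the post's 1700-dollar rent; for couples it is the post's 12000-dollar property tax (1.2% of a 1-million-dollar home).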
  
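You can now compare these figures with your own income. Use your own after-tax income net of the four insurances and one fund (四险一金), including any bonus and stock from your company. If you rent, also subtract your annual rent before comparing with cases (1) and (2). (If you own your home, do not subtract mortgage payments, because the Bay Area figures do not subtract them either.) Readers in China should also add their housing provident fund to their income.
The adjustment above covers only housing, not other spending, because housing is an expense nearly everyone has and it takes a large share; other items matter much less. Still, here are some Bay Area child-related costs for anyone who wants to refine the figures further:
Postpartum nanny (月嫂), 26 days: 4000 dollars
Nanny, 5 days a week: 2500 dollars per month
Daycare: 1200-2100 dollars per month
Preschool: 1000-1300 dollars per month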
  




Sunday, October 2, 2016

How to restart / resubmit the SGE job in Eqw status

[Rocks-Discuss] job in error state will not restart:



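qstat shows a job whose array tasks 2 and 17 are in the Eqw (error) state:

   1146 0.55500 SomeTask user        Eqw  10/02/2016 19:24:09                                    1 2,17

Clear the error state so that the scheduler tries the job again:

qmod -cj jobnumber

For an array job, address each failed task as job_id.task_id. For example,

qmod -cj 1146.2

qmod -cj 1146.17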


If you want to reschedule a job that is already running (state 'r'), you can use:

qmod -rj jobnumber
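
When many array tasks are stuck, typing one qmod call per task gets tedious. The sketch below only builds the command lines from a job id and the task field shown by qstat; it does not run anything. The helper names are made up, and it assumes the task field is a comma-separated list such as "2,17". A range such as "1-10:1" is passed through unchanged, which matches the job_id.task_id_range form in the qmod usage text.

```python
def parse_task_ids(task_field):
    """Split a comma-separated task field such as '2,17' into ['2', '17']."""
    return [task.strip() for task in task_field.split(",") if task.strip()]


def qmod_commands(job_id, task_field, flag="-cj"):
    """Build one qmod argument list per task; nothing is executed here."""
    return [["qmod", flag, f"{job_id}.{task}"] for task in parse_task_ids(task_field)]


# Values taken from the qstat line in this post.
for argv in qmod_commands(1146, "2,17"):
    print(" ".join(argv))
    # To actually run it on a cluster, one could call subprocess.run(argv, check=True).
```

Passing flag="-rj" builds the reschedule commands instead of the clear-error ones.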





Reference:

1) qmod usage (running qmod with no arguments in a terminal prints the following)

qmod
SGE 8.1.8
usage: qmod [options]
   [-c job_wc_queue_list]  clear error state
   [-cj job_list]          clear job error state
   [-cq wc_queue_list]     clear queue error state
   [-d wc_queue_list]      disable
   [-e wc_queue_list]      enable
   [-f]                    force action
   [-help]                 print this help
   [-r job_wc_queue_list]  reschedule jobs (running in queue)
   [-rj job_list]          reschedule jobs
   [-rq wc_queue_list]     reschedule all jobs in a queue
   [-s job_wc_queue_list]  suspend
   [-sj job_list]          suspend jobs
   [-sq wc_queue_list]     suspend queues
   [-us job_wc_queue_list] unsuspend
   [-usj job_list]         unsuspend jobs
   [-usq wc_queue_list]    unsuspend queues

job_wc_queue_list          {job_tasks|wc_queue}[{','|' '}{job_tasks|wc_queue}[{','|' '}...]]
job_list                   {job_tasks}[{','|' '}job_tasks[{','|' '}...]]
job_tasks                  {{job_id'.'task_id_range}|job_name|pattern}[' -t 'task_id_range]
task_id_range              task_id['-'task_id[':'step]]
wc_cqueue                  wildcard expression matching a cluster queue
wc_host                    wildcard expression matching a host
wc_hostgroup               wildcard expression matching a hostgroup
wc_qinstance               wc_cqueue@wc_host
wc_qdomain                 wc_cqueue@wc_hostgroup
wc_queue                   wc_cqueue|wc_qdomain|wc_qinstance
wc_queue_list              wc_queue[','wc_queue[','...]]


2)
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-December/022877.html


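Using Audacity to label speech audio

Part 1: How to manually label an audio file

1) Download and install Audacity:    http://www.audacityteam.org/download
2) Open Audacity and import the audio file demo.wav (the panel on the left of the track lets you display it as "Waveform" or "Spectrogram")
3) Click in turn:  Tracks --> Add New --> Label Track
4) In the audio track, hold the left mouse button and drag to select a region, then press Ctrl + B.
An empty text box appears in the Label Track; type your label (e.g. speech) into it.
5) Repeat step 4) until you reach the end of the audio
6) Click in turn:  Tracks --> Edit Labels --> Export   and save as: demo.txt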

Part 2: How to prepare and import a label file for audio in Audacity

Step 1: Make sure your label file is in the correct format:

segment_startTime_inSecond    segment_endTime_inSecond    label
The separator must be a TAB, not spaces.

For example:
Audacity_label.txt
29.473812    49.066307    text1
57.213861    96.885166    text2
Note:   columns 1 and 2 give the segment start and end times in seconds, respectively.
Save the label file as a .txt file.
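
If the segments come from another tool, the label file can be generated instead of typed by hand. This is a minimal sketch of the three-column, tab-separated format described above; the function name and the six-decimal formatting are choices made here, not something Audacity requires.

```python
def format_labels(segments):
    """Format (start_seconds, end_seconds, label) tuples as tab-separated lines."""
    lines = []
    for start, end, label in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start} > {end}")
        lines.append(f"{start:.6f}\t{end:.6f}\t{label}")
    return "\n".join(lines) + "\n"


# The two example segments from this post.
segments = [
    (29.473812, 49.066307, "text1"),
    (57.213861, 96.885166, "text2"),
]

text = format_labels(segments)
print(text, end="")
# To save it for import, one could write the text to a file, for example:
# with open("Audacity_label.txt", "w", encoding="utf-8") as f:
#     f.write(text)
```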

Step 2: Import:
File->Import->Label

For more details, see [1].

Reference:
[1] http://manual.audacityteam.org/o/man/label_tracks.html