Blog Archive

Thursday, February 27, 2014

Awesome Examples to Manipulate Audio Files Using Sound eXchange (SoX)

source: 15 Awesome Examples to Manipulate Audio Files Using Sound eXchange (SoX):
Reference: http://sox.sourceforge.net/sox.html


SoX stands for Sound eXchange. SoX is a cross-platform command-line audio utility that works on Linux, Windows and macOS. It is very helpful in the following areas when dealing with audio and music files:
  • Converting audio file formats
  • Editing audio files
  • Changing audio attributes
  • Adding audio effects
  • Plus a lot of advanced sound manipulation features

In general, audio data is described by the following four characteristics:
  1. Rate – The sample rate, in samples per second. For example, 44100 or 8000.
  2. Data size – The precision the data is stored in. For example, 8 or 16 bits.
  3. Data encoding – What encoding the data type uses. For example, u-law or A-law.
  4. Channels – How many channels are contained in the audio data. For example, stereo has 2 channels.
SoX supports over 20 audio file formats. To get the list of all supported formats, run sox -h from the command line. One of the major benefits of a command-line audio tool is that it is easy to use in scripts to perform more complex tasks in batch mode.
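As a rough illustration of batch use, the sketch below converts every .wav file in a directory to FLAC. The directory name recordings and the helper names flac_name and convert_dir are placeholders invented for this example, and FLAC output assumes your SoX build supports that format.

```shell
#!/bin/sh
# Sketch: convert every .wav file in a directory to FLAC with SoX.
# "recordings", flac_name and convert_dir are hypothetical names.

# Map an input path to its output path by swapping the extension.
flac_name() {
    printf '%s\n' "${1%.wav}.flac"
}

# Convert each .wav file found directly inside the directory "$1".
convert_dir() {
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue      # no matches: skip the literal pattern
        sox "$f" "$(flac_name "$f")"
    done
}

# Run only when SoX is installed and the directory exists.
if command -v sox >/dev/null 2>&1 && [ -d recordings ]; then
    convert_dir recordings
fi
```

Keeping the name mapping in its own function makes it easy to change the target format without touching the loop.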

All of the examples below can be used to manipulate audio files on Unix, Windows and macOS. Make sure to download the corresponding SoX utility for your platform from the SoX – Sound eXchange download page.

1. Combine Multiple Audio Files into a Single File

With the -m flag, sox mixes two input files together to produce its output. The example below mixes first_part.wav and second_part.wav, leaving the result in whole_part.wav. Older SoX releases also provided a soxmix command for this purpose.
$ sox -m first_part.wav second_part.wav whole_part.wav

(or)

$ soxmix first_part.wav second_part.wav whole_part.wav

Note:

You can also mix together two audio files of different formats.
 sox -m music.mp3 voice.wav mixed.flac

To concatenate two audio files, omit -m:
 sox short.wav long.wav longer.wav

2. Extract Part of the Audio File

The trim effect removes unwanted audio from a file.
Syntax: sox old.wav new.wav trim [SECOND TO START] [SECONDS DURATION]
  • SECOND TO START – Starting point in the audio file.
  • SECONDS DURATION – Duration of audio to keep from that point.
The command below extracts the first 10 seconds from input.wav and stores them in output.wav.
$ sox input.wav output.wav trim 0 10

3. Increase and Decrease Volume Using Option -v

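Option -v changes (increases or decreases) the volume. It takes a linear amplitude factor and applies to the input file that follows it: values above 1.0 amplify and values between 0 and 1.0 attenuate.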

Increase Volume

$ sox -v 2.0 foo.wav bar.wav

Decrease Volume

If we need to lower the volume on some files, we can use a factor between 0 and 1; the smaller the factor, the softer the result. A negative factor does not make the audio quieter by itself: it scales the amplitude by the magnitude of the number and also inverts the phase. In the following example, the 1st command (0.5) will be louder than the 2nd command (0.1).
$ sox -v 0.5 srcfile.wav test05.wav

$ sox -v 0.1 srcfile.wav test01.wav
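
Volume changes are often quoted in decibels, while -v expects a linear amplitude factor. The sketch below converts between the two using factor = 10^(dB/20); the helper name db_to_factor is made up for this example and is not part of SoX.

```shell
# Convert a gain in decibels to the linear amplitude factor that -v expects.
# db_to_factor is a hypothetical helper, not a SoX command.
db_to_factor() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(db / 20 * log(10)) }'
}

db_to_factor -6    # prints 0.501 (roughly half amplitude)
db_to_factor 6     # prints 1.995 (roughly double amplitude)
```

The result can be passed straight to SoX, for example: sox -v "$(db_to_factor -6)" foo.wav bar.wav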

4. Get Audio File Information

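The stat effect can provide a lot of statistical information about a given audio file. In older SoX releases, the -e flag tells sox not to generate any output other than the statistical information; current releases use the null output file -n for the same purpose (sox foo.wav -n stat).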
$ sox foo.wav -e stat
Samples read: 3528000
Length (seconds): 40.000000
Scaled by: 2147483647.0
Maximum amplitude: 0.999969
Minimum amplitude: -1.000000
Midline amplitude: -0.000015
Mean norm: 0.217511
Mean amplitude: 0.003408
RMS amplitude: 0.283895
Maximum delta: 1.478455
Minimum delta: 0.000000
Mean delta: 0.115616
RMS delta: 0.161088
Rough frequency: 3982
Volume adjustment: 1.000
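
The first two figures in the output above can be cross-checked with simple arithmetic. The sketch below assumes that "Samples read" counts samples across all channels and that the file is stereo; neither is stated in the output itself.

```shell
samples=3528000   # "Samples read" from the output above
seconds=40        # "Length (seconds)" from the output above
channels=2        # assumption: the file is stereo

echo $(( samples / seconds ))              # 88200 samples per second in total
echo $(( samples / seconds / channels ))   # 44100 samples per second per channel
```

Under those assumptions the numbers are consistent with a 44100 Hz stereo recording.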

5. Play an Audio Song

SoX provides options for playing and recording sound files. This example explains how to play an audio file on Unix and Linux systems that use OSS. Playing a sound file is accomplished by copying the file to the device special file /dev/dsp. The following command plays the file music.wav; option -t specifies the type of the file /dev/dsp.
$ sox music.wav -t ossdsp /dev/dsp
You can also use the play command to play the audio file as shown below.
Syntax: play options filename audio_effects

$ play -r 8000 -w music.wav

6. Play an Audio Song Backwards

Use the ‘reverse’ effect to reverse the sound in a sound file. This will reverse the file and store the result in output.wav
$ sox input.wav output.wav reverse
You can also use the play command to hear the song in reverse without modifying the source file, as shown below.
$ play test.wav reverse

7. Record a Voice File

The ‘play’ and ‘rec’ commands are companion commands for sox. /dev/dsp is the digital sampling and digital recording device. Reading the device activates the A/D converter for sound recording and analysis. The /dev/dsp file works for both playing and recording sound samples.
$ sox -t ossdsp /dev/dsp test.wav
You can also use the rec command for recording voice. If SoX is invoked as ‘rec’, the default sound device is used as the input source.
$ rec -r 8000 -c 1 record_voice.wav

8. Changing the Sampling Rate of a Sound File

To change the sampling rate of a sound file, use option -r followed by the sample rate to use, in Hertz. The following example changes the sampling rate of the file ‘old.wav’ to 16000 Hz and writes the output to ‘new.wav’.
$ sox old.wav -r 16000 new.wav

9. Changing the Sampling Size of a Sound File

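A larger sample size gives finer amplitude resolution when recording, although converting an existing 8-bit file to 16 bits cannot restore detail that was never captured. Sample size for audio is most often expressed as 8 bits or 16 bits; 8-bit audio is more often used for voice recording. Older SoX releases use the following flags:
  • -b Sample data size in bytes (8 bits)
  • -w Sample data size in words (16 bits)
  • -l Sample data size in long words (32 bits)
  • -d Sample data size in double long words (64 bits)
The following example will convert an 8-bit audio file to a 16-bit audio file. In current releases, -b takes the number of bits instead (for example, -b 16).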
$ sox -b input.wav -w output.wav

10. Changing the Number of Channels

The following example converts a mono audio file to stereo. Use option -c to specify the number of channels.
$ sox mono.wav -c 2 stereo.wav
There are also ways to convert stereo sound files to mono, i.e. to get a single channel from a stereo file.

Selecting a Particular Channel

In older SoX releases, this is done by using the avg effect with an option indicating which channel to use. The options are -l for left, -r for right, -f for front, and -b for back. The following example extracts the left channel.
$ sox stereo.wav -c 1 mono.wav avg -l

Average the Channels

$ sox stereo.wav -c 1 mono.wav avg

Newer versions may only support:
$ sox stereo.wav -c 1 mono.wav

Extract LEFT channel from a stereo file with sox (The following two are exactly the same)

 sox in-stereo.wav -c 1 out-mono.wav  remix 1

 sox in-stereo.wav out-mono.wav  remix 1

Extract RIGHT channel from a stereo file with sox (The following two are the same)

 sox in-stereo.wav -c 1 out-mono.wav  remix 2

 sox in-stereo.wav out-mono.wav  remix 2

Note:
To extract the RIGHT channel, in-stereo.wav must have two channels. You can check the channel count with:
soxi -c in-stereo.wav

11. Audio Converter – Music File Format Conversion

SoX is useful for converting one audio format to another, i.e. from one encoding (A-law, MP3) to another. SoX can recognize the input and desired output formats by parsing the file name extensions. The command below takes infile.ulaw and creates a GSM-encoded file called outfile.gsm. You can also use sox to convert WAV to MP3.
$ sox infile.ulaw outfile.gsm
If the file doesn’t have an extension in its name, we can state the format with the -t option, which specifies the file type.
$ sox -t ulaw infile -t gsm outfile

12. Generate Different Types of Sounds

Using the synth effect we can generate a number of standard waveforms and types of noise. Though this effect is used to generate audio, an input file must still be given; the -n option specifies a null file as the input.
$ sox -n outfile synth len type freq
  • len – length of audio to synthesize. Format for specifying lengths in time is hh:mm:ss.frac
  • type – one of sine, square, triangle, sawtooth, trapezium, exp, [white]noise, pinknoise, brownnoise. Default is sine
  • freq – frequencies at the beginning/end of synthesis in Hz
The following example produces a 3-second, 8 kHz (8000 Hz) audio file containing a sine wave swept from 300 to 3300 Hz.
$ sox -r 8000 -n output.au synth 3 sine 300-3300

13. Speed up the Sound in an Audio File

To speed up or slow down the sound of a file, use the speed effect, which changes both the pitch and the duration of the file. The default factor is 1.0, which makes no change to the audio. A factor of 2.0 doubles the speed, so the length is cut in half and the pitch is one octave higher.
Syntax: sox input.wav output.wav speed factor

$ sox input.wav output.wav speed 2.0
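
The relationship between the speed factor, the new length and the pitch shift can be worked out ahead of time: the new length is the original length divided by the factor, and the pitch moves by 12 × log2(factor) semitones. The helper name speed_effect below is invented for this sketch and is not part of SoX.

```shell
# Predict the result of "speed FACTOR" on a file of a given length.
# $1 = original length in seconds, $2 = speed factor
speed_effect() {
    awk -v len="$1" -v f="$2" 'BEGIN {
        printf "%.2f s, %.2f semitones\n", len / f, 12 * log(f) / log(2)
    }'
}

speed_effect 40 2.0   # prints: 20.00 s, 12.00 semitones
speed_effect 40 0.5   # prints: 80.00 s, -12.00 semitones
```

Twelve semitones is one octave, which matches the factor of 2.0 used in the example above.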

14. Multiple Changes to Audio File in Single Command

By default, SoX attempts to write audio data using the same data type, sample rate and channel count as the input data. If the user wants the output file to be of a different format, the user has to specify format options. If an output file format doesn’t support the same data type, sample rate, or channel count as the given input file format, then SoX will automatically select the closest values which it supports.
Converting a WAV to raw: the following example converts the sampling rate, sampling size and channel count in a single command line.
$ sox -r 8000 -w -c 1 -t wav source -r 16000 -b -c 2 -t raw destination

15. Convert Raw Audio File to MP3 Music File

A raw file has no header describing its sample rate, sample size or channel count, so an MP3 encoder cannot interpret it on its own. The approach used here is to first convert raw to WAV, supplying those parameters, and then convert WAV to MP3. In the example below, option -h indicates high quality.
Convert Raw Format to Wav Format:
$ sox -w -c 2 -r 8000 audio1.raw audio1.wav
Convert Wav Format to MP3 Format:
$ lame -h audio1.wav audio1.mp3

16. Convert Raw Audio File to Wav File

Convert Raw Format to Wav Format (old format)
$ sox -s -2 -r 8000 input1.raw output1.wav

To convert from mono raw 8000 Hz 8-bit unsigned PCM data to a WAV file:


sox -r 8000 -u -b -c 1 filename.raw filename.wav

Convert Raw Format to Wav Format (new format)
$ sox -e signed-integer -b 16 -r 8000 input1.raw output1.wav

PCM to WAV: some people use .pcm as the extension for raw files. In that case, add -t raw so that SoX knows the input type:

sox -e signed-integer -b 16 -r 8000 -t raw input1.pcm output1.wav
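
Because a raw file carries no header, a quick sanity check on the parameters you pass to SoX is to compute the duration they imply from the file size: bytes / (rate × bytes per sample × channels). The helper name raw_seconds is hypothetical, and the demo uses a synthetic file of zeros rather than real audio.

```shell
# Duration in seconds implied by the size of a headerless raw file.
# $1 = file, $2 = sample rate (Hz), $3 = bits per sample, $4 = channels
raw_seconds() {
    bytes=$(( $(wc -c < "$1") ))
    awk -v b="$bytes" -v r="$2" -v bits="$3" -v c="$4" \
        'BEGIN { printf "%.3f\n", b / (r * bits / 8 * c) }'
}

# Demo with a synthetic 16000-byte file standing in for real audio.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1000 count=16 2>/dev/null
raw_seconds "$tmp" 8000 16 1    # 16000 / (8000 * 2 * 1) = 1.000
rm -f "$tmp"
```

If the computed duration is far from what you expect, the sample rate, sample size or channel count is probably wrong.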

MATLAB solution example:
x = read_RAW_file('s6.raw', 'short', 'b');
x = x / 2^15;

function data = read_RAW_file(filename, datatype, endianness)
% data = read_RAW_file(filename, datatype)
% datatype is a string, e.g. 'float32', 'int16', etc.
% endianness can be 'l' for little endian or 'b' for big endian (optional;
%   if left unspecified, the native endianness is used).
% - returns the samples as a column vector (D x 1)
if nargin < 3
    endianness = 'n';
end
file = fopen(filename, 'rb', endianness);
data = fread(file, datatype);
fclose(file);


17. Cut an Audio File into Multiple Equal Pieces


The command below splits the input file into multiple files of 60 seconds each.
Each output filename will have a unique number in its name, as documented in the Output Files section of the SoX manual.
Example:
If infile.wav is 180 seconds long, then the following command will produce 3 files.
sox infile.wav output.wav trim 0 60 : newfile : restart
output001.wav
output002.wav
output003.wav

Note:
It is an example of multiple effects chains. 
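
To estimate how many pieces a split will produce, divide the total length by the piece length and round up. The helper name piece_count is invented for this sketch; it only does the arithmetic and does not call SoX.

```shell
# Number of pieces when a file is cut into fixed-length chunks.
# $1 = total length in seconds, $2 = piece length in seconds (integers)
piece_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

piece_count 180 60   # prints 3, as in the example above
piece_count 200 60   # prints 4; the last piece holds the remaining 20 seconds
```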

18. Convert MP3 to Wav File

Install SoX with MP3 support (Debian/Ubuntu):
sudo apt-get install sox libsox-fmt-mp3
Then convert every MP3 file under Music/ to WAV:
find Music/ -iname "*.mp3" -exec sox -V3 {} {}.SOX-CONVERTED.wav \;

Nuance Cloud Services - Experiencing a more human conversation


Saturday, February 1, 2014

Jiayuan (世纪佳缘) Founder Invests Again in Online Education; 8 Stocks Poised to Surge (2) - Securities Times Online (证券时报网)

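Practicing Spoken English with Your Phone: The Business Logic of Liulishuo (英语流利说)


A 2012 report by the English training company EF Education First found that Chinese people now spend 30 billion yuan a year learning English. For spoken-English training, which has to be taught by foreign teachers, tuition at EF, Wall Street English, and New Oriental Elite routinely runs to tens of thousands of yuan.
In the view of former Google product manager Wang Yi (王翌), however, learning spoken English need not be so expensive. For the past year, as co-founder and CEO of Liulishuo, Wang has tried to use games and social features to motivate users to practice speaking on their phones. Liulishuo is a free spoken-English learning app for iPhone and Android phones. Unlike traditional teaching apps, it gives users no syllabus or study plan. After opening the app, users have only one thing to do: speak into the microphone, following the intonation they hear.
“English is not quite like math, physics, or chemistry; it is a basic skill. We don't think teachers or textbooks are the core of learning English. They matter, but the core is constant practice,” Wang told CBN Weekly (第一财经周刊). “It's like making jianbing (Chinese crepes). I don't need to teach you anything; I just give you flour, eggs, and the necessary ingredients every day. However you go about it, after a month you will certainly be making them well.”
Learning in Liulishuo is divided into two parts: practice and challenge levels. In the practice stage, users follow a recorded two-person dialogue and imitate its pronunciation and intonation into the phone's microphone. The app judges whether the user's reading is correct based on the pronunciation of individual words, the placement of pauses within the sentence, and the intonation; incorrect words are marked in red. The entire evaluation is done on the phone by computer algorithms. Once users feel they are proficient, they can start a challenge, playing one side of the conversation against the app. The app gives a score based on pronunciation and decides whether the level has been passed.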
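In April 2011, Wang Yi left Google and returned to China. He first spent a year as product director at the digital advertising platform AdChina (易传媒) to get familiar with the domestic business environment. The idea for Liulishuo was born during this period. Wang recalled: “More than half of my colleagues on the AdChina team paid out of their own pockets for English classes, but many said it didn't work and that it was hard to keep going. Some couldn't attend because of overtime, and even when they weren't working overtime they might skip class because they were too tired from work or for other reasons. At the time I thought: so many people chat on WeChat and sing on Changba with their phones, so it should be entirely possible to build an app that lets them learn English by talking to their phones.”
After leaving AdChina last summer, Wang recruited his former colleague Lin Hui (林晖), who did speech and natural language processing research at Google, and Hu Zheren (胡哲人), an engineer at the Silicon Valley big-data advertising startup Quantcast, and together they founded Liulishuo. Liulishuo's entire speech analysis algorithm was built by Lin Hui. It can assess the user's speaking level during the practice stage and uses three colors (red, black, and green) to tell users which words need better pronunciation, so that users do not keep practicing the wrong way.
To get users to speak up, Liulishuo has more than a hundred built-in topics. The topics are varied, ranging from current events such as the situation in Syria to everyday subjects like the workplace, music reviews, and American TV series. Each topic is divided into ten one-on-one dialogues, most under a minute long. To sustain users' enthusiasm, Wang also follows the design of hit social games such as Candy Crush, which keep adding levels: every week foreign teachers come to the office the company rents near Shanghai Railway Station to record four new topics.
Topic selection is decided through internal team discussion. The Liulishuo team includes an American teacher who spent four years at EF Education First, and content production draws on works licensed from American spoken-English experts such as Judy Gilbert, including Clear Speech and Teaching Pronunciation, which are aimed at teaching in the United States.
A small team of fewer than 15 people preparing 40 dialogues each week naturally cannot weigh every word the way textbooks that take years to produce do. But covering different scenarios with a large volume of content may actually be closer to human instinct than the carefully selected dialogues in textbooks. Ray Kurzweil, Google's engineering director for speech recognition and a futurist who has studied how the human brain works for decades, explains in his book How to Create a Mind, published last year, that the brain does not remember and converse word by word; what it remembers is the sequence of bodily movements involved in saying a particular sentence, much like the automatic motions of getting dressed in the morning. This means fluent speech depends on repeatedly practicing dialogues for all kinds of possible scenarios; once the brain starts thinking about which word to use, speech inevitably becomes halting.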
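Because its users do not speak English in daily life, Liulishuo has to rely on incentives to keep them practicing. Wang Yi chose to imitate the design of addictive games, Angry Birds in particular. Liulishuo lets users pick any topic that interests them, but at first only the first dialogue in a topic is open, and they must pass a level to unlock the next one. Just as with levels in a game, you have to clear them one at a time. After a level is cleared, Liulishuo awards one to three gold coins based on the user's performance, and only a fairly high standard earns all three. The app also uploads users' scores to a gold-coin leaderboard to encourage competition. Angry Birds has already shown that the level-and-star model can spark users' desire for a challenge.
“At the end of August, one user spent 7 hours and 14 minutes in a single day practicing into the microphone in Liulishuo. That number is quite extreme; normally each day's top user records three to four hours,” Wang told CBN Weekly. Driven by game mechanics and social rankings, Liulishuo has achieved good user stickiness.
Since its first version launched on Apple's App Store in February of this year, Liulishuo has accumulated more than one million users. According to the mobile app analytics platform Flurry, Liulishuo's user activity is more than twice the worldwide average for education apps on iOS. Figures from Umeng (友盟), an analytics platform focused on China, put Liulishuo's activity in the top 5% of all apps.
Liulishuo has completed a seed funding round but does not yet have a source of revenue. Although Wang once worked in a core part of Google's advertising business, the Analytics data-analysis product he worked on does not generate profit, and he believes in the business logic of “focus on the user and everything else will follow naturally.” Still, as an independent commercial company, Liulishuo will sooner or later have to face the question of how to make money.
The answer may lie in Liulishuo's next goal: giving users personalized advice. For now it only tells users which words in a sentence are pronounced well and which are not, leaving users to improve on their own. Many users cannot correct their mistakes by themselves, which is the main reason so many people are willing to spend tens of thousands of yuan on speaking classes at traditional training institutions.
“We are analyzing the data to improve the scoring algorithm, and in the future we may be able to offer personalized services for the different problems each user runs into,” Wang said. He plans to continue the technology-led strategy for personalized advice, but using algorithms to give effective correction suggestions to users with widely differing accents will be no easy task.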

Detecting Parkinson's for better treatment --Speech based




Parkinson’s disease is a neurological disorder that affects a half million people in the United States, with about 50,000 newly diagnosed cases each year. There is no cure and, until now, no reliable method for detecting the disease. But an MSU research team has developed an innovative detection method that is a major breakthrough in diagnosing Parkinson’s in early stages—the point at which treatment to control symptoms is most effective.
Parkinson's, a disorder of the nervous system that affects movement, occurs when nerve cells in the brain stop producing the chemical dopamine, which helps control muscle movement.  Without dopamine, nerve cells can’t properly send messages, causing the loss of muscle function.
The method of detection developed in part by Rahul Shrivastav, professor and chair of MSU’s Department of Communicative Sciences and Disorders, involves monitoring a patient’s speech patterns, specifically movement patterns of the tongue and jaw. Shrivastav says Parkinson’s affects all patients’ speech and changes in speech patterns are detectable before other movement and muscles are affected by the disease.
The new early detection method has proved to be more than 90 percent effective and is noninvasive and inexpensive. Requiring as little as two seconds of speech, monitoring can be done remotely and in telemedicine applications. In addition, the new method has the potential to track the progression of Parkinson’s and measure the effectiveness of treatment.
- See more at: http://msutoday.msu.edu/360/2013/detecting-parkinsons-for-better-treatment/#sthash.8C7bfEOz.dpuf

