steven@Jenkins-server:~/works/asr/dataset/thchs30$ ll
total 20
drwxrwxr-x 5 steven steven 4096 5月 1 12:49 ./
drwxrwxr-x 4 steven steven 4096 5月 1 12:49 ../
drwxr-xr-x 8 steven steven 4096 12月 30 2015 data_thchs30/
drwxr-xr-x 4 steven steven 4096 1月 25 2016 resource/
drwxr-xr-x 5 steven steven 4096 1月 25 2016 test-noise/
Next, go into the egs/thchs30/s5/ directory, which looks like this:
steven@Jenkins-server:~/kaldi/egs/thchs30/s5$ ls -l
total 108
-rw-rw-r-- 1 steven steven 1062 5月 1 12:53 cmd.sh
drwxrwxr-x 2 steven steven 4096 4月 30 23:28 conf
drwxrwxr-x 15 steven steven 4096 5月 1 14:13 data
drwxrwxr-x 18 steven steven 4096 5月 1 14:24 exp
drwxrwxr-x 5 steven steven 4096 5月 1 14:22 fbank
drwxrwxr-x 4 steven steven 4096 4月 30 23:28 local
drwxrwxr-x 5 steven steven 4096 5月 1 13:14 mfcc
-rw------- 1 steven steven 57117 5月 1 17:40 nohup.out
-rwxrwxr-x 1 steven steven 374 4月 30 23:28 path.sh
-rw-rw-r-- 1 steven steven 4469 4月 30 23:28 RESULTS
-rwxrwxr-x 1 steven steven 5287 5月 1 13:03 run.sh
lrwxrwxrwx 1 steven steven 19 4月 30 23:28 steps -> ../../wsj/s5/steps/
lrwxrwxrwx 1 steven steven 19 4月 30 23:28 utils -> ../../wsj/s5/utils/
creating data/{train,dev,test}
cleaning data/train
preparing scps and text in data/train
cleaning data/dev
preparing scps and text in data/dev
cleaning data/test
preparing scps and text in data/test
creating test_phone for phone decoding
steps/make_mfcc.sh --nj 4 --cmd run.pl data/mfcc/train exp/make_mfcc/train mfcc/train
utils/validate_data_dir.sh: Successfully validated data-directory data/mfcc/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for train
steps/compute_cmvn_stats.sh data/mfcc/train exp/mfcc_cmvn/train mfcc/train
Succeeded creating CMVN stats for train
### IS CUDA GPU AVAILABLE? 'Jenkins-server' ###
### CUDA WAS NOT COMPILED IN! ###
To support CUDA, you must run 'configure' on a machine that has the CUDA compiler 'nvcc' available.
# Accounting: time=1 threads=1
# Ended (code 1) at Wed May 1 14:24:20 CST 2019, elapsed time 1 seconds
...
steps/diagnostic/analyze_lats.sh: see stats in exp/mono/decode_test_phone/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(6,20,64) and mean=29.6
steps/diagnostic/analyze_lats.sh: see stats in exp/mono/decode_test_phone/log/analyze_lattice_depth_stats.log
local/score.sh --cmd run.pl --mem 4G data/mfcc/test_phone exp/mono/graph_phone exp/mono/decode_test_phone
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
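The log output above shows that the DNN stage exited with an error (code 1), because by default the DNN is set up to run on a GPU. It could presumably be changed to run on the CPU, but I read online that someone ran it on a CPU for 7 or 8 days and only got through 17 or 18 iterations, so I gave up on trying.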
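Since the log complains that the CUDA compiler 'nvcc' is not available, a quick check before launching run.sh can save a wasted run. This is only a minimal sketch: it assumes nvcc would be on PATH if the CUDA toolkit were installed, and `have_cmd` is a helper name made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): succeeds if the named command
# can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Assumption: an installed CUDA toolkit exposes 'nvcc' on PATH.
if have_cmd nvcc; then
  echo "nvcc found on PATH"
else
  echo "nvcc not found on PATH"
fi
```

If nvcc is missing, the GMM stages (mono, tri1 and so on) are unaffected; only the DNN stage depends on it.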
steven@Jenkins-server:~/kaldi/egs/thchs30/s5/exp$ ls -l
total 64
drwxrwxr-x 5 steven steven 4096 5月 1 14:24 fbank_cmvn
drwxrwxr-x 5 steven steven 4096 5月 1 14:22 make_fbank
drwxrwxr-x 5 steven steven 4096 5月 1 13:14 make_mfcc
drwxrwxr-x 5 steven steven 4096 5月 1 13:15 mfcc_cmvn
drwxrwxr-x 7 steven steven 4096 5月 1 16:37 mono
drwxrwxr-x 3 steven steven 4096 5月 1 13:27 mono_ali
drwxrwxr-x 7 steven steven 4096 5月 1 15:45 tri1
drwxrwxr-x 3 steven steven 4096 5月 1 13:31 tri1_ali
drwxrwxr-x 7 steven steven 4096 5月 1 15:38 tri2b
drwxrwxr-x 3 steven steven 4096 5月 1 13:39 tri2b_ali
drwxrwxr-x 9 steven steven 4096 5月 1 15:56 tri3b
drwxrwxr-x 3 steven steven 4096 5月 1 13:57 tri3b_ali
drwxrwxr-x 9 steven steven 4096 5月 1 16:16 tri4b
drwxrwxr-x 3 steven steven 4096 5月 1 14:13 tri4b_ali
drwxrwxr-x 3 steven steven 4096 5月 1 14:13 tri4b_ali_cv
drwxrwxr-x 4 steven steven 4096 5月 1 14:24 tri4b_dnn
steven@Jenkins-server:~/kaldi/egs/thchs30/s5/exp/tri1$ ls -l
total 66176
-rw-rw-r-- 1 steven steven 3435133 5月 1 13:31 35.mdl
-rw-rw-r-- 1 steven steven 8326 5月 1 13:31 35.occs
-rw-rw-r-- 1 steven steven 1524364 5月 1 13:30 ali.1.gz
-rw-rw-r-- 1 steven steven 1495302 5月 1 13:30 ali.2.gz
-rw-rw-r-- 1 steven steven 1655834 5月 1 13:30 ali.3.gz
-rw-rw-r-- 1 steven steven 1533052 5月 1 13:30 ali.4.gz
-rw-rw-r-- 1 steven steven 1 5月 1 13:27 cmvn_opts
drwxrwxr-x 4 steven steven 4096 5月 1 17:21 decode_test_phone
drwxrwxr-x 4 steven steven 4096 5月 1 15:44 decode_test_word
lrwxrwxrwx 1 steven steven 6 5月 1 13:31 final.mdl -> 35.mdl
lrwxrwxrwx 1 steven steven 7 5月 1 13:31 final.occs -> 35.occs
-rw-rw-r-- 1 steven steven 14109331 5月 1 13:27 fsts.1.gz
-rw-rw-r-- 1 steven steven 13876553 5月 1 13:27 fsts.2.gz
-rw-rw-r-- 1 steven steven 15234357 5月 1 13:27 fsts.3.gz
-rw-rw-r-- 1 steven steven 14475131 5月 1 13:27 fsts.4.gz
drwxrwxr-x 3 steven steven 4096 5月 1 15:45 graph_phone
drwxrwxr-x 3 steven steven 4096 5月 1 13:40 graph_word
drwxrwxr-x 2 steven steven 12288 5月 1 13:31 log
-rw-rw-r-- 1 steven steven 2 5月 1 13:27 num_jobs
-rw-rw-r-- 1 steven steven 2098 5月 1 13:27 phones.txt
-rw-rw-r-- 1 steven steven 8161 5月 1 13:27 questions.int
-rw-rw-r-- 1 steven steven 35098 5月 1 13:27 questions.qst
-rw-rw-r-- 1 steven steven 305009 5月 1 13:27 tree
steven@Jenkins-server:~/kaldi/egs/thchs30/s5/exp/tri1$ ls -l graph_word/
total 842184
-rw-rw-r-- 1 steven steven 290 5月 1 13:31 disambig_tid.int
-rw-rw-r-- 1 steven steven 861725005 5月 1 13:40 HCLG.fst
-rw-rw-r-- 1 steven steven 5 5月 1 13:40 num_pdfs
drwxrwxr-x 2 steven steven 4096 5月 1 13:40 phones
-rw-rw-r-- 1 steven steven 2098 5月 1 13:40 phones.txt
-rw-rw-r-- 1 steven steven 646753 5月 1 13:40 words.txt
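Before moving on to decoding with the tri1 model, it may help to confirm that training really left behind the acoustic model, the decoding graph and the word table shown in the two listings above. The following is a minimal sketch, assuming those three files (final.mdl, graph_word/HCLG.fst, graph_word/words.txt) are the ones needed downstream; `check_model_files` is a made-up name, not a Kaldi script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): check that an experiment
# directory such as exp/tri1 contains non-empty copies of the model,
# graph and word table. Each missing file is reported on stderr.
check_model_files() {
  local exp_dir missing f
  exp_dir=$1
  missing=0
  for f in final.mdl graph_word/HCLG.fst graph_word/words.txt; do
    # -s follows symlinks, so final.mdl -> 35.mdl is handled as well
    if [ ! -s "$exp_dir/$f" ]; then
      echo "missing or empty: $exp_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (path taken from the listing above; adjust to your checkout):
# check_model_files ~/kaldi/egs/thchs30/s5/exp/tri1 && echo "all files present"
```

The function returns non-zero if any file is missing, so it can be chained with `&&` in front of later steps.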
steven@Jenkins-server:~/kaldi/src/online$ make
steven@Jenkins-server:~/kaldi/src/online$ cd ../onlinebin
steven@Jenkins-server:~/kaldi/src/onlinebin$ ls -l
total 137340
drwxrwxr-x 3 steven steven 4096 4月 30 23:28 java-online-audio-client
-rw-rw-r-- 1 steven steven 1364 4月 30 23:28 Makefile
-rwxrwxr-x 1 steven steven 8351432 5月 1 21:09 online-audio-client
-rw-rw-r-- 1 steven steven 11986 4月 30 23:28 online-audio-client.cc
-rwxrwxr-x 1 steven steven 43179504 5月 1 21:09 online-audio-server-decode-faster
-rw-rw-r-- 1 steven steven 13653 4月 30 23:28 online-audio-server-decode-faster.cc
-rwxrwxr-x 1 steven steven 27141312 5月 1 21:08 online-gmm-decode-faster
-rw-rw-r-- 1 steven steven 8113 4月 30 23:28 online-gmm-decode-faster.cc
-rwxrwxr-x 1 steven steven 7978200 5月 1 21:08 online-net-client
-rw-rw-r-- 1 steven steven 4970 4月 30 23:28 online-net-client.cc
-rwxrwxr-x 1 steven steven 26017240 5月 1 21:08 online-server-gmm-decode-faster
-rw-rw-r-- 1 steven steven 8255 4月 30 23:28 online-server-gmm-decode-faster.cc
-rwxrwxr-x 1 steven steven 27878392 5月 1 21:08 online-wav-gmm-decode-faster
-rw-rw-r-- 1 steven steven 10116 4月 30 23:28 online-wav-gmm-decode-faster.cc
# Change this to "tri2a" if you like to test using a ML-trained model
#ac_model_type=tri2b_mmi
ac_model_type=tri1
- Comment out the following block of code, which downloads the test corpus and models from voxforge
#if [ ! -s ${data_file}.tar.bz2 ]; then
#    echo "Downloading test models and data ..."
#    wget -T 10 -t 3 $data_url;
#
#    if [ ! -s ${data_file}.tar.bz2 ]; then
#        echo "Download of $data_file has failed!"
#        exit 1
#    fi
#fi
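The ac_model_type change shown earlier was made by hand in an editor. If the same edit has to be repeated, for example to try another model directory, it could be scripted. This is a rough sketch only: `set_ac_model_type` is a made-up helper, and it assumes the script contains exactly one uncommented `ac_model_type=` assignment at the start of a line and that the model name contains no `/` characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): rewrite the uncommented
# ac_model_type= line of a script in place. Commented-out lines such as
# "#ac_model_type=tri2b_mmi" are left alone because of the ^ anchor.
set_ac_model_type() {
  local script model tmp status
  script=$1
  model=$2
  tmp=$(mktemp) || return 1
  # Write to a temp file first, then copy the content back so the
  # original file keeps its permissions (run.sh is executable).
  sed "s/^ac_model_type=.*/ac_model_type=$model/" "$script" > "$tmp" &&
    cat "$tmp" > "$script"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (run from the directory that holds the demo's run.sh):
# set_ac_model_type run.sh tri1
```

Going through a temporary file avoids `sed -i`, whose syntax differs between GNU and BSD sed.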
SIMULATED ONLINE DECODING - pre-recorded audio is used
The (bigram) language model used to build the decoding graph was estimated on an audio book's text. The text in question is "King Solomon's Mines" (http://www.gutenberg.org/ebooks/2166). The audio chunks to be decoded were taken from the audio book read by John Nicholson(http://librivox.org/king-solomons-mines-by-haggard/)
NOTE: Using utterances from the book, on which the LM was estimated is considered to be "cheating" and we are doing this only for the purposes of the demo.
You can type "./run.sh --test-mode live" to try it using your own voice!