ㅇㅇ
gogyzzz@gmail.com
2017년 8월 18일 금요일
Emotion Recognition using GMM-HMM in Kaldi
This post is migrated from my [previous blog's post](https://gogyzzz.github.io/2017/03/01/emotion-recognition-using-GMMHMM-in-kaldi.html). I wanted to implement the paper [Hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition](http://ieeexplore.ieee.org/document/6681449/#full-text-section), so here I explain how to prepare the dataset and build a similar system. The implementation is based on Kaldi's yesno recipe script.

You should first have some experience using Kaldi in a Linux environment. A few useful references:

- [Linux command line basics - Udacity](https://www.udacity.com/course/linux-command-line-basics--ud595)
- [Introduction to the use of WFSTs in Speech and Language processing](http://www.lvcsr.com/static/pubs/apsipa_09_tutorial_dixon_furui.pdf)
- [Kaldi ASR](http://kaldi-asr.org/)
- [Josh Meyer's website](http://jrmeyer.github.io/) <- I think this is the best Kaldi material for beginners.

## Dataset preparation

Download the [Berlin Database of Emotional Speech](http://emodb.bilderbar.info/docu/). The DB contains 535 wave files, each labeled with one of 7 emotions (anger, boredom, disgust, anxiety, happiness, sadness, neutral).

After downloading, split the dataset into training and test sets (a validation set may also be needed when training a DNN). In my case, following the paper, I shuffled the dataset and split it into training (50%), test (40%), and validation (10%) sets. (The validation set was not used here.)

Then I prepared the `wav.scp`, `text`, `spk2utt`, and `utt2spk` files for each set. You can make `spk2utt` and `utt2spk` more detailed by using the actual speaker IDs, but I chose not to distinguish who the speakers are. The files look like the examples below.

- files of the training set

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/train_data.PNG)

## Additional preparation

You should also prepare several language-model files.
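As a minimal sketch of what those dictionary files could contain, here is one way to write them, assuming a single whole-word "phone" per emotion label (the phone names such as `anger_p`, the `<sil>` entry, and the `data/local/dict` path are my assumptions, not the post's exact files):

```shell
mkdir -p data/local/dict

# One dummy "phone" per emotion word, plus silence (names are assumptions)
cat > data/local/dict/lexicon.txt <<'EOF'
<sil> sil
anger anger_p
anxiety anxiety_p
boredom boredom_p
disgust disgust_p
happiness happiness_p
neutral neutral_p
sadness sadness_p
EOF

# Derive the phone lists from the lexicon
cut -d' ' -f2 data/local/dict/lexicon.txt | grep -v '^sil$' | sort -u \
  > data/local/dict/nonsilence_phones.txt
echo sil > data/local/dict/silence_phones.txt
echo sil > data/local/dict/optional_silence.txt
```

With whole-word modeling like this, each emotion HMM is trained directly on whole utterances, which matches treating emotion labels as "words".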
It is really easy.

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/lang_dict.PNG)

To remove cycles from the WFST, I made the G.fst file manually. This tip came from [Dan Povey's comment](https://sourceforge.net/p/kaldi/discussion/1355348/thread/d927baef/). To build G.fst without cycles, write a text-format FST like the one below (one arc per line: source state, destination state, input symbol, output symbol; plus a final-state line at the end):

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/g_fst_txt.PNG)

and compile it:

> fstcompile --isymbols=words.txt --osymbols=words.txt G.fst.txt G.fst

After removing the cycles, G.fst looks like the figure below. I used `fstdraw` and `dot` to render it as a PDF:

> fstdraw --portrait=true --isymbols=words.txt --osymbols=words.txt G.fst | dot -Tpdf > G.pdf

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/g_fst_without_cyclic.PNG)

## Implementation

First, set the file and directory paths.

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/prep_script.PNG)

Then make the language model. `prepare_lang.sh` builds the language-model directory from your dictionary directory. Note the `sil_prob` option, the probability of the silence phone: it should be 0.0 to get a better score (at least in this experiment). This tip is also [Dan Povey's](https://groups.google.com/forum/#!topic/kaldi-developers/z4km_Q8kO0U).

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/prep_lang.PNG)

Extract 42-dimensional MFCC features using the configuration file defined below.

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/mfcc_conf.PNG)

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/prep_data.PNG)

Then train the GMM-HMM model, build the WFST graph, and decode.

![](https://github.com/gogyzzz/gogyzzz.github.io/raw/master/_posts/train_mkgraph_decode.PNG)

If everything went well, you will get a result; see `exp/your_experiment/decode/wer_*`. (I don't think the weighting between the language model and the acoustic model matters much here.) I got a WER of 26.51, i.e. about 73.5% accuracy, which is lower than the paper's.
I would appreciate it if somebody could point me to a better implementation.