ㅇㅇ: 180913 CMU speech recognition ppt notes - 27 (final). Rescoring, Nbest and Confidence

Thursday, September 13, 2018


Link: http://www.cs.cmu.edu/afs/cs/user/bhiksha/WWW/courses/11-756.asr/spring2013/

I feel like I am writing this without really understanding any of it.

Summary

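The entries produced by the Viterbi search form a tree (each entry points back to its parent). Adding extra edges to this tree turns it into a lattice.
Decoding here means finding the best path through the lattice with the A* algorithm; continuing the same search yields the N-best list.
Rescoring means replacing the edge costs in the A* search with those of a more complex n-gram (a higher-order language model) and decoding again.
Confidence of a word on a path = its posterior = (total probability of all paths through the word) / (total probability of all paths).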
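Written as a formula, the confidence above is the ratio below. Here X is the audio, π ranges over complete lattice paths, and α, β are the forward and backward sums described in note 4 (α(w): total probability of all paths from the start node up to and including w; β(w): from w's outgoing edges to the end node). The notation is my own labeling, not taken from the slides.

```latex
P(w \mid X) \;=\; \frac{\sum_{\pi \ni w} P(\pi, X)}{\sum_{\pi} P(\pi, X)}
\;=\; \frac{\alpha(w)\,\beta(w)}{\alpha(\text{sink})}
```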

Additional notes

1.

entry: (id, time, parent, score, word)
How to add edges: connect words in the tree whose times are adjacent, i.e. a word that ends right before another word starts gets an edge to it.
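A minimal sketch of the edge-adding step, assuming `time` in each entry is the word's last frame and that a word starts one frame after its parent ends (a parent of `None` means it starts at frame 0). `Entry`, `start_time` and `build_lattice_edges` are hypothetical names for illustration; real decoders store more information per entry.

```python
from collections import namedtuple

# Hypothetical entry layout following the note: (id, time, parent, score, word).
# Assumption: `time` is the last frame of the word.
Entry = namedtuple("Entry", ["id", "time", "parent", "score", "word"])


def start_time(entry, by_id):
    """A word starts one frame after its parent (predecessor) word ends."""
    if entry.parent is None:
        return 0
    return by_id[entry.parent].time + 1


def build_lattice_edges(entries):
    """Return the tree edges plus extra edges between time-adjacent words."""
    by_id = {e.id: e for e in entries}
    # Index entry ids by end frame so words ending at frame t can be looked up.
    ending_at = {}
    for e in entries:
        ending_at.setdefault(e.time, []).append(e.id)
    edges = set()
    for e in entries:
        s = start_time(e, by_id)
        # Every word ending at s - 1 may precede e; the tree edge parent -> e is one of them.
        for pred in ending_at.get(s - 1, []):
            edges.add((pred, e.id))
    return sorted(edges)


entries = [
    Entry(0, 9, None, -10.0, "the"),
    Entry(1, 9, None, -11.0, "a"),
    Entry(2, 20, 0, -25.0, "cat"),
    Entry(3, 20, 0, -26.0, "cap"),
    Entry(4, 30, 2, -40.0, "sat"),
]
edges = build_lattice_edges(entries)
```

The tree only had (0, 2), (0, 3) and (2, 4); the time-adjacency rule adds (1, 2), (1, 3) and (3, 4), so "a" can now precede "cat"/"cap" and "cap" can precede "sat".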

2.

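The reason for not using a plain stack decoder: when popping from the stack, short paths that stop at early nodes tend to come out first, because they have accumulated comparatively little cost.
When the A* algorithm computes a path's cost, it also adds the cost from the path's last node to the sink node, precomputed with Dijkstra's algorithm, so it is much less prone to popping only early-node paths. (If that precomputed cost is the exact best cost to the sink, complete paths come out in order of total cost.)
Dijkstra's algorithm: finds minimum-cost paths from one node to the other nodes of a graph with non-negative edge costs (a lattice is a Directed Acyclic Graph). Run backward from the sink, it gives each node's best cost to the sink.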
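A minimal sketch of lattice N-best with A*, assuming the lattice is given as `(from, to, cost)` edges with non-negative costs (e.g. negative log probabilities) and a single source and sink. The heuristic is the exact best cost to the sink, computed with Dijkstra's algorithm on the reversed edges. Function names and the toy lattice are hypothetical.

```python
import heapq
from itertools import count


def best_cost_to_sink(edges, sink):
    """Dijkstra on reversed edges: h[n] = minimum cost from node n to the sink."""
    rev = {}
    for u, v, cost in edges:
        rev.setdefault(v, []).append((u, cost))
    h = {sink: 0.0}
    heap = [(0.0, sink)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > h.get(v, float("inf")):
            continue  # stale heap entry
        for u, cost in rev.get(v, []):
            nd = d + cost
            if nd < h.get(u, float("inf")):
                h[u] = nd
                heapq.heappush(heap, (nd, u))
    return h


def astar_nbest(edges, source, sink, n):
    """Pop partial paths by g (cost so far) + h (exact cost to sink); return up to n full paths."""
    out = {}
    for u, v, cost in edges:
        out.setdefault(u, []).append((v, cost))
    h = best_cost_to_sink(edges, sink)
    if source not in h:
        return []  # no path from source to sink
    tie = count()  # tie-breaker so the heap never compares paths
    heap = [(h[source], next(tie), 0.0, [source])]
    results = []
    while heap and len(results) < n:
        _, _, g, path = heapq.heappop(heap)
        node = path[-1]
        if node == sink:
            results.append((g, path))
            continue
        for v, cost in out.get(node, []):
            if v in h:  # skip nodes that cannot reach the sink
                heapq.heappush(heap, (g + cost + h[v], next(tie), g + cost, path + [v]))
    return results


LATTICE = [
    ("<s>", "the", 1.0), ("<s>", "a", 1.5),
    ("the", "cat", 2.0), ("the", "cap", 2.5),
    ("a", "cat", 1.0), ("a", "cap", 3.0),
    ("cat", "</s>", 0.5), ("cap", "</s>", 0.2),
]
nbest = astar_nbest(LATTICE, "<s>", "</s>", 3)
```

Because h is exact, `nbest` lists complete paths cheapest first ("a cat", then "the cat", then "the cap"). Rescoring as described in the summary would rerun the same search with different edge costs; in this simplified sketch only costs that depend on a single edge fit, and a higher-order n-gram would need the lattice nodes to be expanded by word history first.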

4.

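To compute (total probability of all paths through the word), the forward and backward algorithms are run beforehand.
Forward-backward algorithm: sums the probabilities of all paths between two given nodes of a DAG (forward: from the start node to each node; backward: from each node to the end node). It is the same as the one used for HMMs.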
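A minimal sketch of forward-backward on a lattice and the resulting word posteriors, assuming edges carry plain (unnormalized) probabilities and that `nodes` is listed in topological order, e.g. sorted by time. Real systems work in the log domain to avoid underflow; plain probabilities are used here only for readability. Names and numbers are hypothetical.

```python
from collections import defaultdict


def forward_backward(nodes, edges, source, sink):
    """alpha[n]: summed prob of all paths source -> n; beta[n]: all paths n -> sink.
    Assumes `nodes` is in topological order."""
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for u, v, p in edges:
        outgoing[u].append((v, p))
        incoming[v].append((u, p))
    alpha = {n: 0.0 for n in nodes}
    beta = {n: 0.0 for n in nodes}
    alpha[source] = 1.0
    for n in nodes:  # forward pass
        for u, p in incoming[n]:
            alpha[n] += alpha[u] * p
    beta[sink] = 1.0
    for n in reversed(nodes):  # backward pass
        for v, p in outgoing[n]:
            beta[n] += p * beta[v]
    return alpha, beta


def node_posteriors(nodes, edges, source, sink):
    """Posterior of each node = (prob of all paths through it) / (prob of all paths)."""
    alpha, beta = forward_backward(nodes, edges, source, sink)
    total = alpha[sink]  # summed probability of every complete path
    return {n: alpha[n] * beta[n] / total for n in nodes}


NODES = ["<s>", "the", "a", "cat", "cap", "</s>"]
EDGES = [
    ("<s>", "the", 0.2), ("<s>", "a", 0.1),
    ("the", "cat", 0.5), ("the", "cap", 0.3),
    ("a", "cat", 0.4), ("a", "cap", 0.2),
    ("cat", "</s>", 1.0), ("cap", "</s>", 1.0),
]
post = node_posteriors(NODES, EDGES, "<s>", "</s>")
```

The four paths have probabilities 0.1, 0.06, 0.04 and 0.02, so "cat" gets (0.1 + 0.04) / 0.22 ≈ 0.64. Posteriors here are per lattice node; in this toy lattice each word occupies a single node.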

Now I think I can go back to studying the chain model.
