Link: http://www.cs.cmu.edu/afs/cs/user/bhiksha/WWW/courses/11-756.asr/spring2013/
Class 4 - Text matching
Edit distance
- How do we compute minimum distance between template 'ABBAAACBADDA' and input 'CBBBACCCBDDDDA'?
- It also called 'cost, 'Levenshtein distance'
Insertion
- template: DDA
- input: DDDDA
- insertion: DD(DD)A
Deletion
- template: AACB
- input: CB
- deletion: (AA)CB
Substitution
- template: BBA
- input: CBA
- substitution: (C)BA
Find the minimum edit distance!
- But it is computationally intractable. (possible paths exponentially increase)
To get minimum edit distance, use dynamic programming(DP)
- DP allows using the previous result(minimum edit distance at the previous stage).
- Of course, DP needs a back-pointer to track an optimal path.
Memory cost for string matching
- 2 columns only needed
- In most cases using in-place, only 1 column needed
Parallel computation
- Stacking trellises along with columns, Then all trellis can be computed concurrently.
Trellis sharing -> template sharing -> lexical trees (!!!)
- Having one sub-trellis belonging several trellises
Pruning
- Heuristic, the right threshold should be found
Pruning by limiting search path
- Having a width on diagonal
- Cons: hard to imagine diagonal on lexical tree
Pruning by limiting path cost
- kill partial paths which have large cost
- Cons: how can we define the threshold?
- the absolute cost increases along with the length of the path!
Beam search
- Thresholding with relative cost T
- How can we define the threshold? -> heuristically -> It works great to many cases!
댓글 없음:
댓글 쓰기