N - Gram Overlap
Use the n-gram index (recall wild-card search) to retrieve all lexicon terms matching any of the query n-grams
Example )
- Simple Lower Bound of Substring Edit Distance
정확하진 않지만 Lower Bound ( 근사값 ) - The minimun number of edit operations
SOUNDEX
Class of heuristics to expand a query into phonetic equivalents
발음이 동등한지 여부를 판단한다
-
Retain the first letter of the word ( 첫 글자는 생략 )
- Change all occurrences of the following letters to ‘0’ (zero)
- A, E, I, O, U, H, W, Y
- Change letters to digits as follows
- B, F, P, V : 1
- C,G,J,K,Q,S,X,Z : 2
- D,T : 3
- L : 4
- M, N : 5
- R : 6
–> 외울 필요 없음
- Repeatedly remove one out of each pair of consecutive identical digits.( 반복되는 Pair 제거 )
- Remove all zeros from the resulting string ( 0 모두 지움 )
- Pad the resulting string with trailing zeros and return the first four positions ( 길이 맞추기 위해 0 padding )
Example )
What queries can we process?
- Positional inverted index with skip pointers
- Wild-card index
- Spell-correction
- Soundex
Query Search Engine Such as (SPELL(moriset) /3 toron*to) OR
SOUNDEX(chaikofski) …