Wild - Card Queries
- “ * “ / “ $ “
- ” * “ means 0 or more occurrences
- ” $ “ represents the end ( m$ : m 으로 끝남 , $m : m 으로 시작함 )
Example
Easy with Binary tree ( or B+ Tree ) lexicon ( 사전 ) : retrieve all words in range : mon <= w < moo
Maintain an additional B+ Tree for terms backwards
- Query Processing
- At this point, we have an enumeration of all terms in the Dictionary that match the wild-card query
- We still have to look up the postings for each enumerated term
Example
–> Expensive ( B+ Tree 를 하나 더 사용하게 된다 )
Solution : Transform wild-card queries so that the “ * “ occur at the end
This give rise to the Permuterm Index
- Permuterm Index
해당 Word 에 대해 모든 Permuterm 을 Dictinary 에 넣고 Search 시 사용한다.
Exercise
- Bigram ( k - gram ) Indexes
- Enumerate all k - grams ( sequence of k chars ) occuring in any term
- $ is a special word boundary symbol
- Maintain a second inverted index from bigrams to dictionary terms that match each bigram
Example
- Processing wild - cards
-
Query mon * can now be run as
$m AND mo AND on - Gets terms that match AND version of our wildcard query
- But we may enumerate moon too!
- Must post-filter these terms against query.
- Or positional index can handle this problem
- Surviving enumerated terms are then looked up in the term-document inverted index
- Fast, space efficient ( Compared to permuterm )
- As before, we must execute a Boolean query for each enumerated, filtered term
- Wild-cards can result in expensive query execution (very large disjunctions…)