Intersecting two postings lists ( Continue )
AND NOT (p1, p2) -> p1 에는 있고 p2 에는 없는것
OR (p1,p2) -> p1 이나 p2 둘중 아무 곳에다 있는것
Boolean queries
Boolean Queries are queries using AND, OR and NOT to join query terms
Perhaps the simplest model to build an IR system on
- Views each document as a set of words
- Is precise: document matches condition or not
==> set : word 의 order / frequency 는 고려하지 않는다.
Example - WestLaw
Phrase queries
Biword indexes
————
- Index every consecutive pair of terms in the text as a phrase
- Each of these biwords is now a dictionary term
#Problem
Longer phrase queries -> n words ==> (n+1)Combination(2) ==> O(n^2)
Index blowup due to bigger dictionary
Infeasible for more than biwords, big even for them
ex)
A B C D E -> there is 6 position to stand words
So (6)Combination(2) biwords exists
Positional indexes
- Extract inverted index entries for each distinct term
- Merge their doc: position lists to enumerate all positions with terms
- Same general method for proximity searches
“to be or not to be” -> “to” “be” “to” “be”
Index -> “to” 0 / “be” 1 / “to” 4 / “be” 5
1 / 2 과정
3 과정
DocId 가 다르면 포인터를 다음 DocId 로 옮긴다.
Result ==> 429/430 (first to be) , 433/434 (second to be)