Intersecting two postings lists ( Continue )
For the intersection of two posting lists of lengths x and y, its time complexity is O(x + y)
Completeness
Prove that the algorithm INTERSECTION finds the complete list of common docIDs
Proof by mathematical induction
-
Premise
Lists L1 and L2 share a set of common docIDs <d1, d2, .., dn> which are in the increasing order -
Prove
- we change the presudo code slightly that initial answer is { d0 }
- At the beginning of each Iteration where d(i-1) < docId(p1) < d(i) and d(i-1) < docId(p2) < d(i)
-> answer is {d0, d1, d2, … d(i-1)}
(Loop Invariant) -
Maintenance
=> p1 이나 p2 가 shift 되면 d(i)보다 커질수 있다?
===> 아니다. Loop Invariant 에서 어긋남
===> Thus, At the beginning of next Iteration, the loop invariant is still true
4. Temination
For simple proof, assume docId(NULL) = d(Last number) > d(n)
W.L.O.G , Let’s say p1 = NULL . That is d(n) < docId(p1) <= d(L) and d(n) < docId(p2) <= d(L)
By the Loop invariant, answer is {d0, …, d(n)}