Intersecting two postings lists ( Continue )
For the intersection of two posting lists of lengths x and y, its time complexity is O(x + y)
Intersect(p1,p2)
answer = {}
while p1 is not null and p2 is not null:
if docId(p1) == docId(p2):
add(answer,docId(p1))
p1 <- next(p1)
p2 <- next(p2)
else if docId(p1) < docId(p2):
p1 <- next(p1)
else :
p2 <- next(p2)
return answer
Completeness
Prove that the algorithm INTERSECTION finds the complete list of common docIDs
Proof by mathematical induction
-
Premise
Lists L1 and L2 share a set of common docIDs <d1, d2, .., dn> which are in the increasing order -
Prove
- we change the presudo code slightly that initial answer is { d0 }
- At the beginning of each Iteration where d(i-1) < docId(p1) < d(i) and d(i-1) < docId(p2) < d(i)
-> answer is {d0, d1, d2, … d(i-1)}
(Loop Invariant) -
Maintenance
if docId(p1) == docId(p2) == d(i):
answer = {d0, ... , d(i)}
d(i) < docId(p1) <= d(i+1)
d(i) < docId(p2) <= d(i+1)
else :
d(i-1) < docId(p1) <= d(i)
d(i-1) < docId(p2) <= d(i)
=> p1 이나 p2 가 shift 되면 d(i)보다 커질수 있다?
===> 아니다. Loop Invariant 에서 어긋남
===> Thus, At the beginning of next Iteration, the loop invariant is still true
4. Temination
For simple proof, assume docId(NULL) = d(Last number) > d(n)
W.L.O.G , Let’s say p1 = NULL . That is d(n) < docId(p1) <= d(L) and d(n) < docId(p2) <= d(L)
By the Loop invariant, answer is {d0, …, d(n)}