Here is the question. It concerns a claim of plagiarism. There are
two indexes of a similar text numbering about 750,000 words. The first
index has 27,740 terms, while the second index has 3,500 terms.
The authors of the first index claim that the authors of the
second plagiarized their index, but it turns out the indexes are mostly
different, and only a few terms are similar. Can anyone calculate what
the random similarity would be? That is, if we assume that there was no
plagiarism and that index 1 (27,740 terms) and index 2 (3,500 terms) were
independently derived, what would be the probability that some of the
terms would still be identical, given that the texts to which the indexes
refer are 80%-90% similar?
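
To make the question concrete, here is a rough sketch of one way the calculation might be set up. It assumes (and this is only an assumption for illustration) that both indexers draw their terms uniformly at random, without replacement, from a common pool of candidate terms. The pool size of 50,000 is a hypothetical figure, not something known about the texts, and the function names are made up for this example. Real indexers favor prominent terms, so a uniform model like this probably understates the overlap that independent work would produce.

```python
import math


def expected_overlap(n1, n2, pool):
    """Expected number of shared terms when two indexes of sizes n1 and n2
    are drawn independently and uniformly from `pool` candidate terms
    (the mean of a hypergeometric distribution)."""
    return n1 * n2 / pool


def prob_at_least_one_shared(n1, n2, pool):
    """Probability that the two indexes share at least one term.

    P(no shared term) = C(pool - n1, n2) / C(pool, n2), computed with
    log-gamma so the large factorials do not overflow.
    """
    if n2 > pool - n1:
        # Index 2 cannot fit entirely outside index 1, so overlap is certain.
        return 1.0
    log_p_none = (
        math.lgamma(pool - n1 + 1)
        - math.lgamma(pool - n1 - n2 + 1)
        - math.lgamma(pool + 1)
        + math.lgamma(pool - n2 + 1)
    )
    return 1.0 - math.exp(log_p_none)


if __name__ == "__main__":
    POOL = 50_000  # hypothetical number of candidate index terms
    print(expected_overlap(27_740, 3_500, POOL))
    print(prob_at_least_one_shared(27_740, 3_500, POOL))
```

Under these assumptions the expected overlap is n1 * n2 / pool, which for a pool of 50,000 comes to roughly 1,940 shared terms, and the chance of at least one shared term is essentially 1. The answer depends heavily on the pool size, which would have to be estimated from the texts themselves.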

