flame.dawn@[EMAIL PROTECTED] wrote:
> Here is the question. This concerns a claim of plagiarism. There are
> two indexes of a similar text numbering about 750,000 words. The first
> index has 27,740 terms in it, while the second index has 3,500 terms
> in it. The authors of the first index claim that the authors of the
> second plagiarized their index, but it turns out the indexes are mostly
> different, and only a few terms are similar. Can anyone calculate what
> the random similarity would be, i.e., if we assume that there was no
> plagiarism and that index 1 (27,740 terms) and index 2 (3,500 terms) were
> independently derived, what would be the probability that some of the
> terms would still be identical if the text to which the indexes refer
> is 80%-90% similar?
No.
I find it confusing that you are talking about two different texts, and
I have no idea what "80%-90% similar" should be taken to mean.
If two indices were generated from the same text, using a largely
automated technique, I would expect the smaller index to be a subset of
the larger one.
I find it difficult to conceive of two indices, to "similar" text, that
are "mostly different" - unless one (at least) is an abysmally bad index.
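To show why the question cannot be answered as posed, here is a minimal sketch under a deliberately crude null model. This model is my assumption, not anything stated in the question: both indexers draw their terms uniformly at random, without replacement, from a common pool of V candidate terms. Under that model the number of shared terms follows a hypergeometric distribution with mean n_a * n_b / V. The pool size V (called `vocab` below) and all function names are hypothetical; the question gives no value for V at all.

```python
import math

# Toy null model (an assumption, not from the question): each index is a
# uniform random sample, without replacement, from a shared pool of `vocab`
# candidate terms. The overlap is then hypergeometric.

def log_comb(n: int, k: int) -> float:
    """Natural log of C(n, k), via lgamma to avoid enormous integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def overlap_pmf(vocab: int, n_a: int, n_b: int, k: int) -> float:
    """P(exactly k shared terms) under the hypothetical null model."""
    if k < max(0, n_a + n_b - vocab) or k > min(n_a, n_b):
        return 0.0  # impossible overlap size
    return math.exp(
        log_comb(n_a, k) + log_comb(vocab - n_a, n_b - k) - log_comb(vocab, n_b)
    )

def expected_overlap(vocab: int, n_a: int, n_b: int) -> float:
    """Mean of the hypergeometric overlap: n_a * n_b / vocab."""
    return n_a * n_b / vocab

def prob_at_least(vocab: int, n_a: int, n_b: int, k: int) -> float:
    """P(at least k shared terms) under the same null model."""
    return sum(overlap_pmf(vocab, n_a, n_b, j) for j in range(k, min(n_a, n_b) + 1))

if __name__ == "__main__":
    # Hypothetical pool sizes; the question supplies none.
    for vocab in (50_000, 100_000, 500_000):
        print(vocab, round(expected_overlap(vocab, 27_740, 3_500), 1))
```

With those hypothetical pool sizes the expected number of shared terms is about 1942, 971 and 194 respectively, so the same two index sizes give wildly different answers depending on a quantity the question never states. Real indexers do not pick terms at random either: both would be drawn to the same prominent names and subjects, so genuine independent overlap should sit well above this toy baseline, which is the point about the smaller index being close to a subset of the larger one.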
--
Lau AS! d-(!) a++ c++++ p++ t+ f-- e++ h+ r--(+) n++(*) i++ P- m++
ASC Decoder at <http://www32.brinkster.com/ascdecode/>