singhals wrote:
> Didn't someone in the past, oh, say, month, mention software that would
> compare databases and flag matches?
>
> Not necessarily a *specific* genealogy program database, just databases
> in general?
>
> I'm looking for an easy way to vacuum up "hit" lists from Ancestry, WC,
> Google, et al, and find the common ones.
>
>
> Cheryl
I don't recall anything like that and a quick google doesn't find
anything. Wishful thinking?
It's an interesting problem. First of all what's the format of the hit
lists? Are the hits from all the sources in the same format?
Secondly, most comparison tools that I can think of work on a specific
file format, usually a flat text file although there are some that work
on XML files. You would need to get the files into the appropriate
format.
Thirdly, many comparison tools do the opposite of what you want - they
look for differences. My favourite approach to looking for multiple
occurrences of *identical* lines across multiple files would be the Unix
command
cat x y z|sort|uniq -c|sort -rn|more
where x, y & z would be 3 file names (you can cat as few or many files
as you like). This will merge the contents and sort them so that
duplicates follow each other, prefix each distinct line with the number
of times it was found, re-sort the lines in descending order of count
and page the output. You can then see which lines were in more than one
file but not which files they were in. Note that a line repeated within
a single file is counted each time too.
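If you do want to know which files each match came from, something along
these lines might do it. This is only a sketch: the function name
common_hits is made up for illustration, and it assumes one hit per line
in plain text files. It counts each line once per file, so a line
repeated within a single file is not mistaken for a match across files.

```shell
# common_hits FILE...  (hypothetical helper name, not a standard command)
# Lists every line that occurs in two or more of the given files as:
#   number-of-files <TAB> line <TAB> names of the files containing it
common_hits() {
    awk '
        !seen[FILENAME, $0]++ {      # first time this line is seen in this file
            count[$0]++
            files[$0] = files[$0] " " FILENAME
        }
        END {
            for (line in count)
                if (count[line] > 1)
                    printf "%d\t%s\t%s\n", count[line], line, substr(files[line], 2)
        }
    ' "$@" | sort -t "$(printf '\t')" -k1,1nr -k2,2   # most widely shared first
}
```

It would be run as "common_hits x y z | more", with x, y and z being the
same three file names as above.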
This requires that you have the hits in a common flat file format or can
convert them to that; that hits which you would consider matching are
identical within the files; that you either don't care which lists the
matches were in, don't mind just comparing them in pairs, or are prepared
to hunt for them in the files; and finally that you have access to
Unix-style commands (if you're on Windows only, google for "cygwin").
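On the "identical within the files" point, differences in case or
spacing can sometimes be ironed out before comparing. A rough sketch,
assuming that is all that differs between matching hits (the name
normalize_hits is invented for illustration):

```shell
# normalize_hits  (hypothetical helper name)
# Reads lines on stdin and writes them lower-cased, with each run of
# spaces or tabs squeezed to one space and leading/trailing space removed.
normalize_hits() {
    tr '[:upper:]' '[:lower:]' |
        sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//'
}
```

For example "normalize_hits < x > x.norm" for each file, then run the
comparison on the .norm files. It won't help with hits that differ in
spelling or layout.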
--
Ian
Hotmail is for spammers. Real mail address is igoddard
at nildram co uk

