Ian Goddard wrote:
> singhals wrote:
>
>> Didn't someone in the past, oh, say, month, mention software that
>> would compare databases and flag matches?
>>
>> Not necessarily a *specific* genealogy program database, just
>> databases in general?
>>
>> I'm looking for an easy way to vacuum up "hit" lists from Ancestry,
>> WC, Google, et al, and find the common ones.
>>
>>
>> Cheryl
>
>
> I don't recall anything like that and a quick google doesn't find
> anything. Wishful thinking?
>
> It's an interesting problem. First of all what's the format of the hit
> lists? Are the hits from all the sources in the same format?
>
> Secondly, most comparison tools that I can think of work on a specific
> file format, usually a flat text file although there are some that work
> on XML files. You would need to get the files into the appropriate
> format.
>
> Thirdly, many comparison tools do the opposite of what you want - they
> look for differences. My favourite approach to looking for multiple
> occurrences of *identical* lines across multiple files would be the Unix
> command
>
> cat x y z|sort|uniq -c|sort -rn|more
>
> where x, y & z would be 3 file names (you can cat as few or many files
> as you like). This will merge the contents into alphabetical order so
> that duplicates follow each other, prefix each line with the number of
> times it was found, re-sort them in descending order of count and page
> the output. You can then see which lines were in more than one file but
> not which file they were in.
>
> This requires that you have the hits in a common flat file format or can
> convert them to that; that hits which you would consider matching are
> identical within the files; that you either don't care which lists the
> matches were in, don't mind just comparing them in pairs or are prepared
> to hunt for them in the files and finally that you have access to
> Unix-style commands (if you're on Windows only, google for "cygwin").
>
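For anyone who also wants the "which file" part, here is a minimal Python sketch. It is an illustration only, not a tool anyone in the thread mentioned. It assumes each hit list has been saved as a plain text file with one hit per line. It also assumes that hits differing only in spacing or capitalisation should count as the same hit. The script name common_hits.py and the helper names are hypothetical. Unlike uniq -c, it counts how many different files a line appears in, not how many times the line occurs overall.

```python
"""Find hits shared across several plain-text hit lists and say which files they are in."""
import sys
from collections import defaultdict


def normalize(line: str) -> str:
    # Assumption: hits that differ only in whitespace or letter case are the same hit.
    return " ".join(line.split()).casefold()


def find_common(paths):
    """Return (hit, [files]) pairs for hits found in more than one file."""
    seen = defaultdict(set)  # normalized hit -> set of files containing it
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                key = normalize(line)
                if key:  # skip blank lines
                    seen[key].add(path)
    # Keep hits shared by at least two files, most widely shared first,
    # then alphabetically (like sort -rn followed by the text).
    common = [(key, sorted(files)) for key, files in seen.items() if len(files) > 1]
    common.sort(key=lambda item: (-len(item[1]), item[0]))
    return common


if __name__ == "__main__":
    # Hypothetical usage: python common_hits.py x y z
    for key, files in find_common(sys.argv[1:]):
        print(f"{len(files)}\t{key}\t{', '.join(files)}")
```

Each output line gives the number of files, the normalised hit, and the list of files it was found in. This removes the need to hunt through the files afterwards.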
Yes, quite possibly I was mis-remembering either the details
or the list. I couldn't find it either. (g)
I've done it by hand, and it's not /that/ onerous, but the
person who needs it would reach for the smellin' salts if I
mentioned Unix or even CMD lines.
Thanks.
Cheryl

