Talk About Network

Re: I would've sworn it was mentioned here

by Ian Goddard <goddai01@[EMAIL PROTECTED] > Mar 6, 2008 at 08:42 PM

singhals wrote:
> Didn't someone in the past, oh, say, month, mention software that would 
> compare databases and flag matches?
> 
> Not necessarily a *specific* genealogy program database, just databases 
> in general?
> 
> I'm looking for an easy way to vacuum up "hit" lists from Ancestry, WC, 
> Google, et al, and find the common ones.
> 
> 
> Cheryl

I don't recall anything like that and a quick google doesn't find 
anything.  Wishful thinking?

It's an interesting problem.  First of all, what's the format of the hit 
lists?  Are the hits from all the sources in the same format?

Secondly, most comparison tools that I can think of work on a specific 
file format, usually a flat text file although there are some that work 
on XML files.  You would need to get the files into the appropriate
format.

Thirdly, many comparison tools do the opposite of what you want - they 
look for differences.  My favourite approach to looking for multiple 
occurrences of *identical* lines across multiple files would be the Unix 
command

cat x y z | sort | uniq -c | sort -rn | more

where x, y and z are three file names (you can cat as few or as many 
files as you like).  This merges the contents and sorts them so that 
duplicates follow each other, replaces each run of identical lines with 
a single line prefixed by the number of times it was found, re-sorts the 
result in descending order of count and pages the output.  You can then 
see which lines were in more than one file (provided no single file 
repeats a line), but not which files they were in.
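
If you do want to know which files each match came from, something along these lines might do it.  This is only a sketch, not a tested tool: the function name common_lines is made up for illustration, it assumes a POSIX-style awk is available, and it counts a line once per file so that repeats inside a single list don't inflate the count.

```shell
# common_lines FILE...
# Print every line found in more than one of the given files as
#   <number of files> TAB <line> TAB <names of the files containing it>
# with the most widely shared lines first.
common_lines() {
    awk '
        # Handle each distinct line only once per file.
        !seen[FILENAME, $0]++ {
            count[$0]++
            files[$0] = files[$0] " " FILENAME
        }
        END {
            for (line in count)
                if (count[line] > 1)
                    # substr(..., 2) drops the leading space in the file list
                    printf "%d\t%s\t%s\n", count[line], line, substr(files[line], 2)
        }
    ' "$@" | sort -rn
}
```

You would call it as "common_lines x y z | more", with x, y and z being the same three file names as before.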

This requires that you have the hits in a common flat file format, or 
can convert them to one; that hits which you would consider matching are 
identical within the files; that you either don't care which lists the 
matches were in, don't mind just comparing them in pairs, or are prepared 
to hunt for them in the files; and finally that you have access to 
Unix-style commands (if you're on Windows only, google for "cygwin").
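
The "identical within the files" condition is the one most likely to bite, since the same hit may differ in case or spacing from one source to another.  One possible way round it is to pass each list through a small clean-up filter first.  The sketch below is only a guess at what might be needed: the name normalize is made up, and the choice of fixes (lower-casing, dropping Windows carriage returns, squeezing runs of spaces and tabs) is an assumption about how the lists differ, not something known about any particular site's output.

```shell
# normalize: read lines on stdin and write a canonical form on stdout, so
# that hits differing only in case or spacing compare as identical.
normalize() {
    tr '[:upper:]' '[:lower:]' |    # fold everything to lower case
    tr -d '\r' |                    # drop carriage returns from Windows files
    sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//'
                                    # squeeze blanks, trim both ends
}

# Possible use, with x, y and z as placeholder file names:
#   for f in x y z; do normalize < "$f" | sort -u; done |
#       sort | uniq -c | sort -rn | more
```

The "sort -u" in the usage comment removes repeats within each list before the lists are merged, so a count above 1 really does mean more than one list.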

-- 
Ian

Hotmail is for spammers.  Real mail address is igoddard
at nildram co uk
 




