In article <9N-dnW2F5OThdRHanZ2dnUVZ_oKhnZ2d@[EMAIL PROTECTED]>,
singhals <singhals@[EMAIL PROTECTED]> wrote:
>Robert Grumbine wrote:
>
>> Oh well, a new person to the field, with ideas shaped by another,
>> to whine some about what's available. Nothing new there. But maybe
>> my whining can provide targets (some things I complain about might
>> be solved) or, as we continue, some support for doing certain things
>> could develop. I could write some suitable software to implement
>> certain ideas, if it looked worthwhile.
>>
>> I've done some back reading as I get into the subject, including
>> the gedcom/xml arguments, and am not really trying to go back to those.
>>
>> One interesting thing to me was the mention of the GENTECH
>> Genealogical Data Model. The sad news there being that, apparently,
>> nobody actually implements it. Or anything particularly close.
>>
>> I come to the computing/data from a science field (oceanography)
>> and one of the things which has promptly bothered me is that the
>> software available (paf, legacy, reunion) seems far too aimed
>> at conclusions rather than evidence, and even more poorly aimed
>> at representing source information trails.
>>
>> The evidence trail is something particularly bothersome
>> to me. From my field, let's say our original observation is that it
>> was 22.2 C. Now, if that was all we had, we'd be ticked, because it
>> doesn't tell us when the observation was taken, where it was, or
>> how it was taken. All these metadata are important, and usually you can
>> get them (with sufficient patience and phone calls, rather like
>> genealogy in that, it seems).
>>
>> But that is only the proverbial tip of the iceberg. Because
>> that 22.2 C observation (with rest of support) is almost certainly not
>> exactly the number we're going to use for analyzing the air-sea
>> heat flux, or sea surface temperature, or whatever it is we're doing.
>> The thing is, each observing method has biases. We know this, so
>> adjust for them as relevant to our problem at hand. The problem that
>> we _could_ run in to is that the 22.2 we now see is not the actual
>> original observation. Someone could already have made the adjustment
>> for intake temperature bias. We avoid this because the data
>> are (supposed to be) given histories. The original observation
>> (and its metadata) are augmented by a new value and _its_ metadata
>> (22.4 C after George applied John Doe's intake temperature bias
>> correction, say), and this additional information then follows along.
>> I could decide that John Doe's correction method is not the best,
>> and instead apply, myself, Mary Roe's -- to the original 22.2, now
>> that I know the 22.4 was after somebody else applied a correction I
>> don't like to arrive at it. Not clear to me yet (I've been doing
>> some light reading of the data model document, but neither careful
>> nor complete) whether GENTECH supports this sort of consideration.
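
A minimal sketch of the "data are given histories" idea described above, in Python. The class and field names (Step, Observation, correct) are made up for illustration, and the 0.1 C used for Mary Roe's correction is an assumed number; only the 22.2 -> 22.4 step comes from the example. The point is that a correction appends a new value with its own metadata and never overwrites the original.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One entry in a value's history: the value and how it was arrived at."""
    value_c: float                    # temperature in degrees C
    method: str                       # "original observation" or a correction name
    applied_by: Optional[str] = None  # who applied the correction, if anyone


@dataclass
class Observation:
    """A measurement that keeps every earlier value instead of overwriting it."""
    history: List[Step] = field(default_factory=list)

    @property
    def original(self) -> Step:
        return self.history[0]

    @property
    def current(self) -> Step:
        return self.history[-1]

    def correct(self, delta_c: float, method: str, applied_by: str,
                from_original: bool = False) -> None:
        """Append a corrected value, based on the original or the latest value."""
        base = self.original if from_original else self.current
        new_value = round(base.value_c + delta_c, 3)
        self.history.append(Step(new_value, method, applied_by))


obs = Observation([Step(22.2, "original observation")])
obs.correct(0.2, "John Doe intake temperature bias correction", "George")
# Go back to the untouched 22.2 and apply a different correction.
# The 0.1 here is an assumed value, purely for illustration.
obs.correct(0.1, "Mary Roe correction", "me", from_original=True)

for step in obs.history:
    print(step.value_c, "|", step.method, "|", step.applied_by)
```

Because the whole history travels with the value, anyone receiving it can tell that the 22.4 is already corrected and can still get at the 22.2.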
>>
>> A different problem is that the typical software treatment seems
>> to be that it has little or no ability to track exactly what the
>> evidence and sources are. For instance, it seems that if I import a
>> file from someone and they cite a census record, I have my choice of
>> ignoring that _my_ source was Jane Genealogist, not the original record,
>> and preserving the census citation, or I can _add_ Jane as a source.
>> Now this is a problem, in my mind. When I look later, it will show
>> two sources -- the census, and Jane. But my real state of knowledge
>> is only that Jane _said_ the census had some information. This isn't
>> two independent sources, it's 1 source, 1 step removed from the
>> primary document. (Please, no jumping on that usage, I realize that
>> there's a trade meaning to the term 'primary document', and census
>> isn't an example.) What I want the software to do is, when I import
>> a file that has citations, mark that my source is Jane, and her
>> sources were ... whatever she said. If I'm making a 20th generation
>> copy/import (of a copy of a copy ...), then the software should show
>> the prior 19 importers as well as the original person who looked at
>> a document. GENTECH seems to support this concern of mine, but
>> with no implementation thereof, I'm still sol.
>
>
>First off -- PAF, Legacy, Reunion are all lineage-linked
>databases. You'll probably be slightly happier with one of
>the EVENT-linked databases; I know there are at least two, I
>remember only one name (The Master Genealogist).
Thanks. I'll look for that one. Do mention the other if
the name comes to you.
>Second, when those older programs were being written, a
>permanent way to record conclusions is what was wanted. NO
>ONE wanted to have to keep handwriting copies for the family
>if the computer would print it out for you. TMG came along
>later, when computer genealogy wasn't quite as insular as it
>had been. But, I'd venture to suggest that out of any 100
>genealogists at least 51% _still_ want a program to record
>their conclusions so they can print it out. This doesn't
>mean that 49% is insignificant, it just means it's the minority.
>
>Now.
And probably for at least a while yet. Although plenty of people
_could_ set up a proper meteorological enclosure, most people are
happy with the official recording station's numbers, or even somebody
else's far less than proper stations and their reported figures.
This is one reason the gedcom argument(s) struck me as misdirected.
It does do a job more or less well -- this one here of representing
conclusions. Once I've gotten to some conclusions, there's definitely
a big plus to having something that makes up nice pictures of them
and would let me pass them on to other people. I also consider it
a plus that the gedcom is a plain text format. Give or take some
nuisance, you're at least not _worse_ off than having a stack of
paper -- just make a text print of the gedcom file itself. (Sure,
many pages would be involved. But paper doesn't go technologically
obsolete the way the usual computer storage media do.)
>I like the concept (I can hear people falling over in
>droves) of tracking who-said-what-and-when-did-he-say-it.
>However, let's bring a touch of realism in ... I'll even
>play fair and use one of my smaller databases as the example.
Actually, data base size is another of my 'issues'. The
software (I've seen) seems to have been aimed at being a step
up from index cards. It succeeds, but it is aimed at index-card-scale
problems. I'd like something that was designed thinking that I
might want to work with my last 20 generations of ancestry.
(Not that I believe I could actually get them all!) Or, maybe,
the last few generations of everybody in a town of 20,000.
(And such is probably already happening for some biological
research.)
>Database L has 2000 names; each name has one source per
>datapoint (i.e., a source for the name, for the parent
>relationship, for the bd, for the bp, for the spouse, for
>the md, for the mp, for the dd, for the dp), which is 10
>sources per name, potentially 20,000 source entries. By
>the time that data is re-tagged with each of 20 iterations,
>it is going to be unmanageable. The more supporting
>documentation (i.e., complete extracts of books, images of
>documents, etc etc) you include, the faster it will become
>unmanageable.
>
>I tried doing it manually for one project, but it palled
>very quickly.
>
>I still like the idea of knowing where you got it, but I'm
>unconvinced it is worth the programmer's effort or the
>user's effort of maintaining the chain-of-evidence.
This is one that programmers _can_ deal with easily. At least
I can envision a solution, and I only play at being a programmer.
(But hard enough that casual observers might not notice that I'm
not one.)
I'd hate to have to deal with this kind of thing manually.
Agreed. That's why we invented computers.
I think, for instance, if you're going to take the time to
carefully examine and cite 20,000 sources, that your work
in doing so should be acknowledged by all later users. As it
stands, it's actually hard to do so. There's no reason for
this. A program can easily say 'I got this from person A'
for each of the 20,000 sources. Doing so in the worst possible
way would add only on the order of 500 kB. (n.b. In my day job
I work with data sets in the gigabyte to terabyte range.)
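
For anyone who wants to check that estimate, here is a back-of-envelope version in Python. It assumes the crudest scheme I can think of: the literal text 'I got this from person A' plus a line terminator is stored again for every one of the 20,000 source entries, with no sharing or compression. The 20-import figure takes the 20 iterations from the example and assumes each import adds its own tag to every entry.

```python
# Rough storage cost of tagging every source entry with who it came from.
N_SOURCES = 20_000                  # from the example: 2000 names x 10 sources
TAG = "I got this from person A"    # worst case: repeat the full text every time

bytes_per_tag = len(TAG.encode("utf-8")) + 1   # +1 for a line terminator
per_import = N_SOURCES * bytes_per_tag         # cost of one round of tagging
after_20_imports = 20 * per_import             # every entry tagged 20 times over

print(f"per tag:    {bytes_per_tag} bytes")
print(f"one import: {per_import / 1000:.0f} kB")
print(f"20 imports: {after_20_imports / 1e6:.0f} MB")
```

That works out to 25 bytes per tag, 500 kB for one import, and 10 MB after 20 imports. Storing each person's name once and pointing at it would shrink this considerably, but even the naive version is small.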
I was more thinking the other side, though. If I, as a would-be
data importer, see that my currently-attempted source is 20
steps removed from the one who looked at the original document
or first made the conclusion, I (finally) have the option of tracking
down the original person and asking if they still believe what
was referenced. As notions of handwriting change (one relative's
name was transcribed as Lyda in an online source, but a glance at
the image made it obvious, to me, that it was Lydia -- once I got
there to see it), we'll revise our conclusions, for instance. The
person citing the edition of my conclusions that was made 20 years
ago, well, they're behind the times. But at least they can know
this upfront, without even writing me.
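
A sketch of how the import trail might be carried along, again in Python. Nothing here is taken from GENTECH or from any existing program; Citation, holders, and imported_by are names I made up, and "census record" is just a placeholder. Each import appends the new holder to the chain, so the person who actually examined the record is never lost and the number of steps removed is always available.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Citation:
    """A cited record plus everyone who has held the citation, oldest first."""
    record: str
    holders: Tuple[str, ...]   # holders[0] is the person who examined the record

    def imported_by(self, person: str) -> "Citation":
        """Return the citation as it looks after `person` imports it."""
        return Citation(self.record, self.holders + (person,))

    @property
    def examined_by(self) -> str:
        return self.holders[0]

    @property
    def steps_removed(self) -> int:
        """Hand-offs between the current holder and the original examiner."""
        return len(self.holders) - 1

    def describe(self) -> str:
        trail = " <- ".join(reversed(self.holders))
        return f"{self.record}: {trail} ({self.steps_removed} step(s) removed)"


jane = Citation("census record", ("Jane Genealogist",))
mine = jane.imported_by("me")
print(mine.describe())

# A 20th-generation copy: 19 intermediate importers, then me.
copy = jane
for n in range(1, 20):
    copy = copy.imported_by(f"importer {n}")
copy = copy.imported_by("me")
print(copy.steps_removed, "steps from", copy.examined_by)
```

With this, importing Jane's file gives me one source that is one step removed, not two independent sources, and a would-be importer can see the length of the chain before deciding whether to track down the original examiner.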
It seems that the standard response in genealogy is 'you should
always verify everything yourself'. As an ideal, whether we're
talking my day job or genealogy, this might have some merit. But,
part of what has made science successful is that we _can_ use (with
due citation :-) others' work. I don't have to re-develop all of
science before putting some degree of trust in a thermometer reading.
How much trust ... depends on what kind of thermometer, and what kind
of conclusion I'm trying to draw.
So it goes for my thinking regarding genealogy. Almost certainly
at some point in the past 200 years some sizeable chunks of my family's
tree have been documented fairly well, by somebody, somewhere, somewhen.
If I could start with that, well, that's a big help. Not that
I'm going to take it as gospel. Some parts of that rendition will
be incorrect, for any of many reasons. But, as with the thermometers,
some parts of it, I also don't care as much about. My 6th cousin
5 times removed, if that's wrong, I won't lose sleep over it. But
I _would_ like the source properly labelled and transmitted so that
if somebody ever finds out that my source was wrong, they could tell
me of this. (Not that I expect everybody would, or even many, possibly
not at all. But I'd like it to be possible, and the audit trail
supported by the software by default.) The ones I'm most concerned
about are my direct lines (though I've started noting interesting
aunts and uncles along the way). _If_ sources could be shared better,
then when I nail down who Leonhart Krumbein's parents were (Leonhart
being the origin for my name, to the colonies in 1754) then Leonhart
Am Weg's descendants (he's two generations further back for me) who
aren't Krumbeins could suitably update this part of their tree. Not
a prime concern, but it's nice to keep things correct.
Can't say you're wrong, as I don't have standing for that.
But I'll explain my notions and let you try (it's possible, if
perhaps not easy :-) to persuade me that I'm looking at things
the wrong way. Better to figure out now, and what the better way
is, before I develop a major data set that turns out to be
constructed in a way that would make sense if they were
oceanographic observations, but not for genealogy.
--
Robert Grumbine http://www.radix.net/~bobg/
Science faqs and amateur
activities notes and links.
Sagredo (Galileo Galilei) "You present these recondite matters with too
much evidence and ease; this great facility makes them less appreciated
than they would be had they been presented in a more abstruse manner."
Two New Sciences

