I did some work in this area, Cheryl, but I elected to keep a simple rich-text
description of the blow-by-blow gathering of evidence, e.g. where it came
from, how, snippets of conversations with individuals (copied from email,
IM, etc). It felt like projects such as GENTECH might be trying to
over-formalise such data. Obviously a lot of data such as linkages, events,
dates, and stuff can be formalised but the record of the 'breadcrumb trails'
you followed to get that data could be as varied in content and format as
any of us could imagine. The provision of a simple "notes" item to accompany
each item of formalised data seemed to be a practical compromise.
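
A minimal sketch of that compromise, assuming Python and entirely hypothetical names (DataItem, Note and item_id are not taken from GENTECH or from any existing program, and the sample values are invented for illustration). Each formalised item simply carries a list of free-form notes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """Free-form research note: rich text, pasted email, IM snippets, etc."""
    text: str


@dataclass
class DataItem:
    """One item of formalised data (an event, a date, a linkage, ...)."""
    item_id: str  # hypothetical identifier that a note elsewhere could link to
    kind: str     # e.g. "birth-date" or "parent-link"
    value: str
    notes: List[Note] = field(default_factory=list)

    def add_note(self, text: str) -> None:
        """Attach another piece of the breadcrumb trail to this item."""
        self.notes.append(Note(text))


# Invented example: the formal value stays small, the trail stays free-form.
birth = DataItem(item_id="I12-birth", kind="birth-date", value="12 Mar 1842")
birth.add_note("From an email with a cousin, who read it off a family bible.")
birth.add_note("Parish register not yet checked.")
```

The formalised fields stay small and queryable, while the notes are left unstructured so they can hold whatever the trail happened to look like.
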
The use of "rich-text" as opposed to plain text allowed me to embed links to
specific parts of the formalised data, but that's covered in other threads.
Tony Proctor
"singhals" <singhals@[EMAIL PROTECTED]> wrote in message
news:9N-dnW2F5OThdRHanZ2dnUVZ_oKhnZ2d@[EMAIL PROTECTED]
> Robert Grumbine wrote:
>
> > Oh well, a new person to the field, with ideas shaped by another,
> > to whine some about what's available. Nothing new there. But maybe
> > my whining can provide targets (some things I complain about might
> > be solved) or, as we continue, some support for doing certain things
> > could develop. I could write some suitable software to implement
> > certain ideas, if it looked worthwhile.
> >
> > I've done some back reading as I get into the subject, including
> > the gedcom/xml arguments, and am not really trying to go back to those.
> >
> > One interesting thing to me was the mention of the GENTECH
> > Genealogical Data Model. The sad news there being that, apparently,
> > nobody actually implements it. Or anything particularly close.
> >
> > I come to the computing/data from a science field (oceanography)
> > and one of the things which has promptly bothered me is that the
> > software available (paf, legacy, reunion) seems far too aimed
> > at conclusions rather than evidence, and even more poorly aimed
> > at representing source information trails.
> >
> > The evidence trail is something particularly bothersome
> > to me. From my field, let's say our original observation is that it
> > was 22.2 C. Now, if that was all we had, we'd be ticked, because it
> > doesn't tell us when the observation was taken, where it was, or
> > how it was taken. All these metadata are important, and usually you can
> > get them (with sufficient patience and phone calls, rather like
> > genealogy in that, it seems).
> >
> > But that is only the proverbial tip of the iceberg. Because
> > that 22.2 C observation (with rest of support) is almost certainly not
> > exactly the number we're going to use for analyzing the air-sea
> > heat flux, or sea surface temperature, or whatever it is we're doing.
> > The thing is, each observing method has biases. We know this, so
> > adjust for them as relevant to our problem at hand. The problem that
> > we _could_ run in to is that the 22.2 we now see is not the actual
> > original observation. Someone could already have made the adjustment
> > for intake temperature bias. How we avoid this is that the data
> > (are supposed to be) are given histories. The original observation
> > (and its metadata) are augmented by a new value and _its_ metadata
> > (22.4 C after George applied John Doe's intake temperature bias
> > correction, say), and this additional information then follows along.
> > I could decide that John Doe's correction method is not the best,
> > and instead apply, myself, Mary Roe's -- to the original 22.2, now
> > that I know the 22.4 was after somebody else applied a correction I
> > don't like to arrive at it. Not clear to me yet (I've been doing
> > some light reading of the data model document, but not carefully
> > nor complete) whether the GENTECH supports this sort of consideration.
> >
> > A different problem is that the typical software treatment seems
> > to be that it has little or no ability to track exactly what the
> > evidence and sources are. For instance, it seems that if I import a
> > file from someone and they cite a census record, I have my choice of
> > ignoring that _my_ source was Jane Genealogist, not the original record,
> > and preserve the census citation, or I can _add_ Jane as a source.
> > Now this is a problem, in my mind. When I look later, it will show
> > two sources -- the census, and Jane. But my real state of knowledge
> > is only that Jane _said_ the census had some information. This isn't
> > two independent sources, it's 1 source, 1 step removed from the
> > primary document. (Please, no jumping on that usage, I realize that
> > there's a trade meaning to the term 'primary document', and census
> > isn't an example.) What I want the software to do is, when I import
> > a file that has citations, mark that my source is Jane, and her
> > sources were ... whatever she said. If I'm making a 20th generation
> > copy/import (of a copy of a copy ...), then the software should show
> > the prior 19 importers as well as the original person who looked at
> > a document. GENTECH seems to support this concern of mine, but
> > with no implementation thereof, I'm still sol.
> >
> >
>
>
> First off -- PAF, Legacy, Reunion are all lineage-linked
> databases. You'll probably be slightly happier with one of
> the EVENT-linked databases; I know there are at least two, I
> remember only one name (The Master Genealogist).
>
> Second, when those older programs were being written, a
> permanent way to record conclusions is what was wanted. NO
> ONE wanted to have to keep handwriting copies for the family
> if the computer would print it out for you. TMG came along
> later, when computer genealogy wasn't quite as insular as it
> had been. But, I'd venture to suggest that out of any 100
> genealogists at least 51% _still_ want a program to record
> their conclusions so they can print it out. This doesn't
> mean that 49% is insignificant, it just means it's the minority.
>
> Now.
>
> I like the concept (I can hear people falling over in
> droves) of tracking who-said-what-and-when-did-he-say-it.
> However, let's bring a touch of realism in ... I'll even
> play fair and use one of my smaller databases as the example.
>
> Database L has 2000 names; each name has one source per
> datapoint (i.e., a source for the name, for the parent
> relationship, for the bd, for the bp, for the spouse, for
> the md, for the mp, for the dd, for the dp), which is 10
> sources per name, potentially 20,000 source entries. By
> the time that data is re-tagged with each of 20 iterations,
> it is going to be unmanageable. The more supporting
> documentation (i.e., complete extracts of books, images of
> documents, etc etc) you include, the faster it will become
> unmanageable.
>
> I tried doing it manually for one project, but it palled
> very quickly.
>
> I still like the idea of knowing where you got it, but I'm
> unconvinced it is worth the programmer's effort or the
> user's effort of maintaining the chain-of-evidence.
>
> Cheryl

