>> you don't need anything so hi-falautin' as a data-model
....which is basically my point Cheryl. You do need the data model for the
formalised data, and that model must be flexible enough to cover the
semantic & syntactic issues with variant dates, names, places, etc. All I
was saying is that this kind of audit trail of how you came about the
formalised data can be simply attached to each item as a free-form meta-data
tag. It would seem to be a case of knowing when best to formalise data and
when best to leave it free-form.
Tony Proctor
"singhals" <singhals@[EMAIL PROTECTED]
> wrote in message
news:tM6dnc77o_CMRj3anZ2dnUVZ_oKhnZ2d@[EMAIL PROTECTED]
> If that's all you're wanting to do, you don't need anything
> so hi-falautin' as a data-model. You need a simple lab
> notebook used as a log. Your .RTF file is perfectly good,
> up to a point, and that point is where/when/if you have to
> PROVE you didn't go back and tweak the data in it to make it
> fit.
>
> Cheryl
>
> Tony Proctor wrote:
>
> > I did some work in this area, Cheryl, but I elected to keep a simple
> > rich-text description of the blow-by-blow gathering of evidence, e.g.
> > where it came from, how, snippets of conversations with individuals
> > (copied from email, IM, etc). It felt like projects such as Gentech
> > might be trying to over-formalise such data. Obviously a lot of data
> > such as linkages, events, dates, and stuff can be formalised but the
> > record of the 'breadcrumb trails' you followed to get that data could
> > be as varied in content and format as any of us could imagine. The
> > provision of a simple "notes" item to accompany each item of formalised
> > data seemed to be a practical compromise.
> >
> > The use of "rich-text" as opposed to plain text allowed me to embed
> > links to specific parts of the formalised data, but that's covered in
> > other threads.
> >
> > Tony Proctor
> >
> > "singhals" <singhals@[EMAIL PROTECTED]
> wrote in message
> > news:9N-dnW2F5OThdRHanZ2dnUVZ_oKhnZ2d@[EMAIL PROTECTED]
> >
> >>Robert Grumbine wrote:
> >>
> >>
> >>> Oh well, a new person to the field, with ideas shaped by another,
> >>>to whine some about what's available. Nothing new there. But maybe
> >>>my whining can provide targets (some things I complain about might
> >>>be solved) or, as we continue, some support for doing certain things
> >>>could develop. I could write some suitable software to implement
> >>>certain ideas, if it looked worthwhile.
> >>>
> >>> I've done some back reading as I get into the subject, including
> >>>the gedcom/xml arguments, and am not really trying to go back to those.
> >>>
> >>> One interesting thing to me was the mention of the GENTECH
> >>>Genealogical Data Model. The sad news there being that, apparently,
> >>>nobody actually implements it. Or anything particularly close.
> >>>
> >>> I come to the computing/data from a science field (oceanography)
> >>>and one of the things which has promptly bothered me is that the
> >>>software available (paf, legacy, reunion) seems far too aimed
> >>>at conclusions rather than evidence, and even more poorly aimed
> >>>at representing source information trails.
> >>>
> >>> The evidence trail is something particularly bothersome
> >>>to me. From my field, let's say our original observation is that it
> >>>was 22.2 C. Now, if that was all we had, we'd be ticked, because it
> >>>doesn't tell us when the observation was taken, where it was, or
> >>>how it was taken. All these metadata are important, and usually you
> >>>can get them (with sufficient patience and phone calls, rather like
> >>>genealogy in that, it seems).
> >>>
> >>> But that is only the proverbial tip of the iceberg. Because
> >>>that 22.2 C observation (with rest of support) is almost certainly not
> >>>exactly the number we're going to use for analyzing the air-sea
> >>>heat flux, or sea surface temperature, or whatever it is we're doing.
> >>>The thing is, each observing method has biases. We know this, so
> >>>adjust for them as relevant to our problem at hand. The problem that
> >>>we _could_ run into is that the 22.2 we now see is not the actual
> >>>original observation. Someone could already have made the adjustment
> >>>for intake temperature bias. How we avoid this is that the data
> >>>are (supposed to be) given histories. The original observation
> >>>(and its metadata) are augmented by a new value and _its_ metadata
> >>>(22.4 C after George applied John Doe's intake temperature bias
> >>>correction, say), and this additional information then follows along.
> >>>I could decide that John Doe's correction method is not the best,
> >>>and instead apply, myself, Mary Roe's -- to the original 22.2, now
> >>>that I know the 22.4 was after somebody else applied a correction I
> >>>don't like to arrive at it. Not clear to me yet (I've been doing
> >>>some light reading of the data model document, but not carefully
> >>>nor completely) whether the GENTECH model supports this sort of
> >>>consideration.
> >>>
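
A minimal sketch, in Python, of the kind of value history described above: each adjusted reading keeps a link to the reading it was derived from, so the original observation is never overwritten. The class name `Reading`, its fields, and the 0.1 C figure used for Mary Roe's correction are all invented for illustration; only the 22.2 C and 22.4 C values come from the post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Reading:
    """One value plus free-form metadata and a link to its predecessor."""
    value_c: float
    note: str
    derived_from: Optional["Reading"] = None

    def corrected(self, delta_c: float, note: str) -> "Reading":
        # A correction creates a NEW reading; the old one stays untouched.
        return Reading(round(self.value_c + delta_c, 1), note, derived_from=self)

    def lineage(self) -> List["Reading"]:
        # Walk back to the original observation, then return oldest first.
        chain = []
        node: Optional[Reading] = self
        while node is not None:
            chain.append(node)
            node = node.derived_from
        return list(reversed(chain))

    def original(self) -> "Reading":
        return self.lineage()[0]


raw = Reading(22.2, "original observation (when, where, how it was taken)")
georges = raw.corrected(0.2, "George applied John Doe's intake temperature bias correction")

# Someone who distrusts John Doe's method can go back to the original value.
# The 0.1 here is a made-up number, not a real correction.
mine = georges.original().corrected(0.1, "Mary Roe's correction (hypothetical value)")

for step in georges.lineage():
    print(step.value_c, "-", step.note)
```

The point of the design is only that a correction appends to the history rather than replacing the value, so anyone receiving the 22.4 can still see and reuse the 22.2.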
> >>> A different problem is that the typical software treatment seems
> >>>to be that it has little or no ability to track exactly what the
> >>>evidence and sources are. For instance, it seems that if I import a
> >>>file from someone and they cite a census record, I have my choice of
> >>>ignoring that _my_ source was Jane Genealogist, not the original
> >>>record, and preserving the census citation, or I can _add_ Jane as a
> >>>source.
> >>>Now this is a problem, in my mind. When I look later, it will show
> >>>two sources -- the census, and Jane. But my real state of knowledge
> >>>is only that Jane _said_ the census had some information. This isn't
> >>>two independent sources, it's 1 source, 1 step removed from the
> >>>primary document. (Please, no jumping on that usage, I realize that
> >>>there's a trade meaning to the term 'primary document', and census
> >>>isn't an example.) What I want the software to do is, when I import
> >>>a file that has citations, mark that my source is Jane, and her
> >>>sources were ... whatever she said. If I'm making a 20th generation
> >>>copy/import (of a copy of a copy ...), then the software should show
> >>>the prior 19 importers as well as the original person who looked at
> >>>a document. GENTECH seems to support this concern of mine, but
> >>>with no implementation thereof, I'm still SOL.
> >>>
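
One way the import behaviour asked for above might look, sketched in Python. This is not how GENTECH or any existing program does it; `Citation`, `imported_by` and `independent_sources` are hypothetical names. The idea is that an import never adds a second source: it extends the chain of people the same citation has passed through.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Citation:
    record: str                       # the document that was examined
    examined_by: str                  # the person who actually looked at it
    importers: Tuple[str, ...] = ()   # everyone who copied it since, oldest first

    def imported_by(self, importer: str) -> "Citation":
        # Importing keeps the record and extends the chain by one link.
        return Citation(self.record, self.examined_by, self.importers + (importer,))

    @property
    def steps_removed(self) -> int:
        # How many copies stand between this citation and the document.
        return len(self.importers)


def independent_sources(citations: Iterable[Citation]) -> int:
    # Assumption: citations of the same record count as one source,
    # no matter how many hands they passed through.
    return len({c.record for c in citations})


janes = Citation("census record", examined_by="Jane Genealogist")
mine = janes.imported_by("me")

print(mine.steps_removed)                  # one step removed from the document
print(independent_sources([janes, mine]))  # still a single source

# A 20th generation copy carries the whole chain with it.
copy = janes
for n in range(1, 21):
    copy = copy.imported_by(f"importer {n}")
print(copy.steps_removed, len(copy.importers[:-1]))
```

Counting independent sources by record rather than by citation is the design choice that keeps "the census" and "Jane said the census" from showing up as two sources.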
> >>>
> >>
> >>
> >>First off -- PAF, Legacy, Reunion are all lineage-linked
> >>databases. You'll probably be slightly happier with one of
> >>the EVENT-linked databases; I know there are at least two, I
> >>remember only one name (The Master Genealogist).
> >>
> >>Second, when those older programs were being written, a
> >>permanent way to record conclusions is what was wanted. NO
> >>ONE wanted to have to keep handwriting copies for the family
> >>if the computer would print it out for you. TMG came along
> >>later, when computer genealogy wasn't quite as insular as it
> >>had been. But, I'd venture to suggest that out of any 100
> >>genealogists at least 51% _still_ want a program to record
> >>their conclusions so they can print it out. This doesn't
> >>mean that 49% is insignificant, it just means it's the minority.
> >>
> >>Now.
> >>
> >>I like the concept (I can hear people falling over in
> >>droves) of tracking who-said-what-and-when-did-he-say-it.
> >>However, let's bring a touch of realism in ... I'll even
> >>play fair and use one of my smaller databases as the example.
> >>
> >>Database L has 2000 names; each name has one source per
> >>datapoint (i.e., a source for the name, for the parent
> >>relationship, for the bd, for the bp, for the spouse, for
> >>the md, for the mp, for the dd, for the dp), which is 10
> >>sources per name, potentially 20,000 source entries. By
> >>the time that data is re-tagged with each of 20 iterations,
> >>it is going to be unmanageable. The more supporting
> >>documentation (i.e., complete extracts of books, images of
> >>documents, etc etc) you include, the faster it will become
> >>unmanageable.
> >>
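
A rough back-of-envelope version of the numbers above, in Python. The 2000 names, 10 sources per name and 20 iterations are the figures as stated in the post; the assumption that each iteration adds exactly one extra provenance tag to every source entry is mine, purely to put a scale on "unmanageable".

```python
names = 2000
sources_per_name = 10      # figure as stated in the post
iterations = 20            # number of re-taggings mentioned in the post

# Source entries in the database before any re-tagging.
source_entries = names * sources_per_name

# Assumption: every iteration attaches one more tag to every source entry.
provenance_tags = source_entries * iterations

print(source_entries)    # 20000
print(provenance_tags)   # 400000
```

Under that assumption the growth is linear in the number of iterations, so the burden comes from volume and upkeep rather than from any combinatorial blow-up.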
> >>I tried doing it manually for one project, but it palled
> >>very quickly.
> >>
> >>I still like the idea of knowing where you got it, but I'm
> >>unconvinced it is worth the programmer's effort or the
> >>user's effort of maintaining the chain-of-evidence.
> >>
> >>Cheryl
> >
> >
> >

