20130502

GEDCOM musings and rambles (no rants)

GEDCOM


After several partially successful attempts to write a GEDCOM C++ library, I've slowly run into enough design problems to suggest my approach is flawed. Heading in what I thought was the obvious direction (top-down), I created top-level objects for the major record types found in a GEDCOM file. Starting at the highest level:
  • 0 «Header»
  • 0 «Submission_Record»
  • 0 «Record»
  • 0 TRLR
Where Record expands to:
  • n «FAM_RECORD»
  • n «INDIVIDUAL_RECORD»
  • n «MULTIMEDIA_RECORD»
  • n «NOTE_RECORD»
  • n «REPOSITORY_RECORD»
  • n «SOURCE_RECORD»
  • n «SUBMITTER_RECORD»
And as an example the FAM_RECORD expands to:
  • n @<XREF:FAM>@ FAM
    • +1 RESN <RESTRICTION_NOTICE>
    • +1 «FAMILY_EVENT_STRUCTURE»
    • +1 HUSB @<XREF:INDI>@
    • +1 WIFE @<XREF:INDI>@
    • +1 CHIL @<XREF:INDI>@
    • +1 NCHI <COUNT_OF_CHILDREN>
    • +1 SUBM @<XREF:SUBM>@
    • +1 «LDS_SPOUSE_SEALING»
    • +1 REFN <USER_REFERENCE_NUMBER>
      • +2 TYPE <USER_REFERENCE_TYPE>
    • +1 RIN <AUTOMATED_RECORD_ID>
    • +1 «CHANGE_DATE»
    • +1 «NOTE_STRUCTURE»
    • +1 «SOURCE_CITATION»
    • +1 «MULTIMEDIA_LINK»
Were this all, it would have been a successful strategy. However, note the presence of the «SOMETHING» references. These are why GEDCOM is referred to as a Lineage-Linked document form. Any item shown that way links to another sub-record, which may itself be a similar mix of primitives and higher-level forms. As an illustration of the problem, consider the NOTE_STRUCTURE and CHANGE_DATE links. Interestingly enough, each NOTE_STRUCTURE link contains a CHANGE_DATE link. Even more fun, each CHANGE_DATE link contains a NOTE_STRUCTURE link, so neither can be written as a self-contained object without the other.
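To make the circular reference concrete, here is a rough sketch of what the object-per-record approach ends up looking like in C++. The struct names, the fields and the depth() helper are my own inventions for illustration, not anything from the GEDCOM spec, and the sketch assumes C++17, where std::vector accepts an incomplete element type.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types: the names and fields are illustrative only.
struct NoteStructure;  // forward declaration, needed to break the cycle

struct ChangeDate {
    std::string date;
    std::vector<NoteStructure> notes;  // a change date may carry notes
};

struct NoteStructure {
    std::string text;
    std::vector<ChangeDate> changes;   // a note may carry change dates
};

// Every operation has to be written as a mutually recursive pair, too.
std::size_t depth(const NoteStructure& note);

std::size_t depth(const ChangeDate& change) {
    std::size_t deepest = 0;
    for (const NoteStructure& n : change.notes) {
        deepest = std::max(deepest, depth(n));
    }
    return 1 + deepest;
}

std::size_t depth(const NoteStructure& note) {
    std::size_t deepest = 0;
    for (const ChangeDate& c : note.changes) {
        deepest = std::max(deepest, depth(c));
    }
    return 1 + deepest;
}
```

It compiles, but every pair of types that refer to each other needs the same forward-declaration dance, and every function over them has to be duplicated per type.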

This ramble is a kind of thinking on paper exercise to allow stating the problem and hopefully deriving a solution.

That said, my current thinking (hopefully box escaping) lies in reconsidering the nature of the Lineage-Linked format. In my initial rush to create a hierarchy of objects I believe that I missed the obvious. The clue lies in the word 'linked'. The entire form can (and hopefully should) be thought of as a linked list. In a kind of lispish format, the highest level can be thought of this way:


  • (header)(link)
  • (submission_record)(link)
  • (record)(link)
  • (trlr)
Essentially the idea is to flatten the entire form into a single data type: a list. Each list has a type and data. Data may be terminal or a list of lists. If terminal it is a simple string. If a list of lists, it is a simple collection of the basic data type. (type)(link) all the way down.
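As a first stab at what that single data type might look like in C++, here is a minimal sketch. Node, add_link, count_nodes and to_lisp are all names I made up for the purpose. The sketch also assumes C++17 (for a std::vector of the still-incomplete Node) and simplifies "terminal or list of lists" to "a string, or a vector of child nodes".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single node type: (type)(link) all the way down.
struct Node {
    std::string type;         // the tag, e.g. "FAM", "HUSB", "NOTE"
    std::string text;         // terminal data: a simple string
    std::vector<Node> links;  // otherwise: a list of further nodes
};

// Attach a child list. In this sketch a node holds a string or links, not both.
Node& add_link(Node& parent, Node child) {
    assert(parent.text.empty() && "a node holds a string or links, not both");
    parent.links.push_back(std::move(child));
    return parent.links.back();
}

// One recursive function handles every record type, since there is only one type.
std::size_t count_nodes(const Node& n) {
    std::size_t total = 1;
    for (const Node& child : n.links) {
        total += count_nodes(child);
    }
    return total;
}

// Render a node in the lispish notation used above.
std::string to_lisp(const Node& n) {
    std::string out = "(" + n.type;
    if (!n.text.empty()) {
        out += " " + n.text;
    }
    for (const Node& child : n.links) {
        out += " " + to_lisp(child);
    }
    return out + ")";
}
```

A FAM node given a HUSB child holding @I1@ and a WIFE child holding @I2@ renders as (FAM (HUSB @I1@) (WIFE @I2@)). The NOTE_STRUCTURE and CHANGE_DATE tangle disappears because both are just nodes with different type strings.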

I'm going to wander off and play with this idea—I'll return to this white board when I learn more about what I am thinking here…

20130423

A Leaf named Postscript

As in the language that is. I'll probably switch to writing about what I know best and that would most likely be programming. I certainly won't stop ranting, but that doesn't have anything to do with "know best".

This was a draft that I had forgotten about. At a guess it was written while I was doing a fair amount of Postscript programming. For reasons that are unclear to me at least, I've always liked Postscript and before that Forth. Reverse Polish Stack Based Languages (RPSBLs ?) are just cool in my book. More on this at some point—back to the real world ah-well!

At it again...

After a substantial absence it occurs to me that I should get back to it so to speak. So I will. Soon. Really. Trust me™