Date: Tue, 17 Apr 2007 15:25:24 -0400
From: William Pence
To: FITSBITS
Subject: [fitsbits] INHERIT and Hierarchical Grouping

Doug Tody wrote (06-04-2007) regarding INHERIT:

> I agree with Steve that this is a simple example of a broader problem
> of associating relational entities. FITS is in essence a relational
> system; every FITS object (even an image) is actually a table.
> INHERIT is a simple means for specifying the relationship between two
> or more tables composed as an MEF. A FITS MEF is a simple container
> with one level of structure.
...
> The real problem with INHERIT is that it is a simplistic solution to
> what is a more general problem.

Since no one else has commented on this yet, I'll just point out that the
Hierarchical Grouping convention (now also open for public comment)
provides a more general mechanism for specifying the relationship
between multiple HDUs that may be in different files or even on
different computer systems. Information about both the INHERIT and
Hierarchical Grouping conventions is available at
http://fits.gsfc.nasa.gov/fits_registry.html.

Does anyone have any comments about the Hierarchical Grouping convention
itself? Are there any deficiencies or limitations in this convention
that have not been considered? Are there alternative ways of
accomplishing the same thing that might be simpler or offer more
features than this convention?

Bill Pence

*****************************************************************************

Date: Tue, 17 Apr 2007 14:20:23 -0600 (MDT)
From: Doug Tody
Cc: FITSBITS
Subject: Re: [fitsbits] INHERIT and Hierarchical Grouping

Hi Bill -

I just took a quick look at this. A word of warning: when I first heard
of this convention some weeks ago, I thought from the name that what was
being referred to was the HIERARCH (hierarchical keyword) convention
from ESO. I see now that this is something completely different.
I only had time to skim the document, but it does appear to offer a more
general alternative to INHERIT, and does much more. It reminds me
somewhat of the GROUP construct in VOTable, which might be worth looking
at as a comparison (the VOTable GROUP refers to fields rather than
tables or extensions, but otherwise is similar in providing a way to
describe hierarchical relationships). One difference is that in the
FITS HGC a group tells what other groups it is a member of, as well as
lists member extensions, whereas in VOTable GROUP, a GROUP only lists
its member elements, which can be either other GROUPs or simpler member
elements.

Both of these constructs provide generic ways to specify a logical
structure which applies to an otherwise unstructured collection of
objects; since the structure is not explicit, the data can be viewed
either way. An alternative approach is a data model, for example the
entity-relationship model often used in relational databases. This can
describe more complex relationships and does not require use of an
explicit grouping construct, but is less direct.

- Doug

*****************************************************************************

Date: Tue, 17 Apr 2007 13:30:27 -0700
From: Steve Allen
To: FITSbits
Subject: Re: [fitsbits] INHERIT and Hierarchical Grouping

On Tue 2007-04-17T15:25:24 -0400, William Pence hath writ:
> the Hierarchical Grouping convention (now also open for public comment)
> provides a more general mechanism for specifying the relationship
> between multiple HDUs that may be in different files

> Does anyone have any comments about the Hierarchical Grouping convention
> itself? Are there any deficiencies or limitations in this convention
> that have not been considered? Are there alternative ways of
> accomplishing the same thing that might be simpler or offer more
> features than this convention?
My initial impression was that the Hierarchical Grouping Convention
(HGC) was initially a way of handling interconversions between FITS
files and HDF files. That made it seem like "feature envy", and also
like a solution in search of a problem.

I think that in general the FITS community has not tried hard enough to
acknowledge the cases where interoperability is hindered because one
team chose one way of representing complex data structures. In most
cases everyone else simply uses exactly the same data reduction system
for those sorts of FITS files. The problems are solved because
everything that has to know how somehow just knows how. Doug Mink's
WCSTools code for FITS has all sorts of heuristics about how it tries to
make sense of the zoo of different coordinate conventions which evolved
before the WCS papers were approved and adopted. Bill Joye's DS9 viewer
has a raft of GUI buttons to push so that the user can select which set
of heuristics should be used when trying to decide how to present the
data in a multi-HDU FITS file.

HGC adopted a mentality that no existing HDU would have to be modified.
That means that no single-HDU-minded application has any way of knowing
that a given HDU belongs to a group. In some cases that's okay because
the individual HDUs have significant meaning even when they stand alone.
But in other cases the individual HDUs are small parts of a normalized
data scheme where the picture only makes sense when all parts are
considered. Those cases more nearly resemble the sorts of activity that
goes on inside relational databases.

What HGC does not provide merely within its descriptive document is a
strongly motivating example of such things or MUST/SHOULD/MAY advisories
which would guide the implementation of some sort of integrity checking
mechanism.
My impression is that I would want to have Bill Pence's questions
answered by someone who has a deep knowledge of the sorts of problems
that arise with highly structured data and the sorts of mechanisms that
are used in relational databases.

--
Steve Allen                                                WGS-84 (GPS)
UCO/Lick Observatory      Natural Sciences II, Room 165    Lat  +36.99845
University of California  Voice: +1 831 459 3046           Lng -122.06025
Santa Cruz, CA 95064      http://www.ucolick.org/~sla/     Hgt +250 m

*****************************************************************************

Date: Tue, 17 Apr 2007 17:36:04 -0400
From: William Pence
To: FITSbits
Subject: Re: [fitsbits] INHERIT and Hierarchical Grouping

Steve Allen wrote:
> My initial impression was that the Hierarchical Grouping Convention
> (HGC) was initially a way of handling interconversions between FITS
> files and HDF files. That made it seem like "feature envy", and also
> like a solution in search of a problem.

Here's the history as far as I remember it:

The first draft of the grouping convention was written by Don Jennings
in May 1994, and was designed for use in the HDF <--> FITS data file
conversion project. The first really major application of the grouping
convention didn't occur, however, until a few years later when Jennings
began working for the INTEGRAL Gamma-ray satellite project. The grouping
convention solved a major problem they had in how to manage the hundreds
of individual FITS files that could be generated by a single
observation. Support for the grouping convention is built into the
INTEGRAL Data Access Layer software, and the API for creating and
managing grouping tables (written by Jennings) was added to CFITSIO in
approximately 1999.

Bill

*****************************************************************************

Date: Tue, 17 Apr 2007 23:29:47 +0100 (BST)
From: "Malcolm J. Currie"
Cc: FITSBITS
Subject: Re: [fitsbits] INHERIT and Hierarchical Grouping

Just glancing through the proposal, I noticed that it continues to
misuse EXTNAME, although understandable given an omission from the
generalized extensions paper. (Sorry Bill if I sound like a broken
record on this point.) GROUPING is a type; it describes the function or
semantics of the HDU or component in a data structure, it is not a named
instance.

Don added these EXT* keywords specifically with Starlink hierarchical
data in mind. It's my fault for not picking up the omission of an
EXTTYPE keyword from the extension paper before publication. It only
became blindingly obvious once I tried to convert a Starlink NDF dataset
into FITS and back to the hierarchical format. Once I'd added an EXTTYPE
to record the component data type, to supplement the path within the
hierarchy stored in EXTNAME, it was possible to convert to FITS and back
to NDF recovering the original structure.

"Generalized extensions and blocking factors for FITS", Section 9 does
suggest using the component paths "to establish easy to understand
relationships between different extensions and even between different
extensions in different FITS files."

The overall concept looks good and goes beyond the single file.
However, I should re-read it more carefully. I do prefer the VOTable way
of grouping as explained by Doug, through familiarity with that
perspective in our own packages.
Malcolm Currie
Rutherford Appleton Laboratory

*****************************************************************************

From: Mark Calabretta
To: fitsbits@donar.cv.nrao.edu
Date: Wed, 18 Apr 2007 12:34:46 +1000
Subject: Re: [fitsbits] Start of the ''Hierarchical Grouping' Public Comment Period

On Mon 2007/04/09 15:17:45 -0400, William Pence wrote
in a message to: fitsbits@donar.cv.nrao.edu

>This is to announce the start of the 30-day Public Comment Period on the
>'Hierarchical Grouping' FITS convention that has been submitted for

Notes on "A Hierarchical Grouping Convention for FITS".

The documentation is well written and leaves little doubt as to how the
convention would be implemented in software. The API described in
Sect. 7 is also very useful in this respect. I have only a few minor
comments.

* Some comment needs to be made on fragility of references to external
  FITS files, especially those using http:// or ftp:// URLs. What is
  supposed to happen if one of these references can't be satisfied?

* There should be some mention of the need to avoid recursive grouping;
  i.e. a group cannot be a member of itself either directly or
  indirectly.

* Can a HDU be added more than once to a grouping table?

* In Sect. 2.2 it should be noted that EXTVER does not have to be
  sequential. The caption for example 2 incorrectly concludes that
  "it is the seventh group table" on the basis that EXTVER = 7.

* Sect. 3, there needs to be some discussion on the meaning and use of
  the index "n" in GRPIDn and GRPLCn.

* If grouping table A is a member of grouping table B, and grouping
  table B is a member of grouping table C, then should there be a
  GRPIDn/GRPLCn in grouping table A to say that (indirectly) it is also
  a member of grouping table C?

* I suggest extending example 2 by illustrating a small portion of the
  grouping table itself, i.e. in rows and columns with column headers as
  fv would.

* Appendix 1 seems to define keyword values with no associated
  keyword(s).
* Sect 7 (third para) has "a given HDU may be referenced by up to 999
  grouping tables simultaneously". Where does this limit come from?

Typos noticed in passing:

p3: "values, Thus, this" -> "values. Thus, this"

p4: "mne-monic" -> "mnemonic".

p9: Writing 'MEMBER LOCATION' for the value of MEMBER_LOCATION (etc.)
    makes them look like keyvalues, which they are for TTYPEn, but not
    in this situation. I suggest writing MEMBER_LOCATION (etc.) in
    italics instead.

p11: "CFITSIO software library that is maintained by" ->
     "CFITSIO software library which is maintained by"

p11 & 13: Use ``xxx'' for double-quoting in latex.

p11: Missing full-stop after "Hierarchical Grouping Convention for
     FITS".

*****************************************************************************

Date: Tue, 24 Apr 2007 14:32:14 -0400
From: William Pence
To: fitsbits@donar.cv.nrao.edu
Subject: Re: [fitsbits] Start of the ''Hierarchical Grouping' Public Comment Period

Mark,

I only played a minor role in the development of this Grouping
convention, but here is my take on the issues you raised:

Mark Calabretta wrote:
> * Some comment needs to be made on fragility of references to external
>   FITS files, especially those using http:// or ftp:// URLs. What is
>   supposed to happen if one of these references can't be satisfied?

This is mainly an implementation issue; it is up to the application
program to decide whether to continue in this case, or possibly abort
with an error message. Note that one of the routines in the sample API,
called fits_verify_group, will test if all group members are accessible.
This is an issue regardless of whether the location of the group member
is specified by a http or ftp URL, or as an absolute or relative
directory path on the local file system.

> * There should be some mention of the need to avoid recursive grouping;
>   i.e. a group cannot be a member of itself either directly or
>   indirectly.

Yes.
This is mentioned in the discussion of the API, but this restriction
should be stated up front in the document.

> * Can a HDU be added more than once to a grouping table?

I don't think this is strictly prohibited, although I can't think of a
good reason to do it.

> * In Sect. 2.2 it should be noted that EXTVER does not have to be
>   sequential. The caption for example 2 incorrectly concludes that
>   "it is the seventh group table" on the basis that EXTVER = 7.

Yes, this should be noted. Example 2 probably really is the seventh
grouping table in the FITS file, but as you note, this cannot be assumed
simply by the value of EXTVER.

> * Sect. 3, there needs to be some discussion on the meaning and use of
>   the index "n" in GRPIDn and GRPLCn.

The definition of GRPIDn says it is the "nth group table that the HDU is
a member of" but I agree this could be made clearer.

> * If grouping table A is a member of grouping table B, and grouping
>   table B is a member of grouping table C, then should there be a
>   GRPIDn/GRPLCn in grouping table A to say that (indirectly) it is also
>   a member of grouping table C?

I think the answer to this is "no".

> * I suggest extending example 2 by illustrating a small portion of the
>   grouping table itself, i.e. in rows and columns with column headers as
>   fv would.

This probably wasn't done simply because it is hard to display the wide
table on a single page. There are sample FITS files available for
people that want to see the full implementation.

> * Appendix 1 seems to define keyword values with no associated
>   keyword(s).

This appendix was provided for informational purposes and is not
directly necessary to the grouping convention itself. It defines a
syntax by which any keyword value or column entry could point to the
location of another HDU. In simple cases, it might be desirable for a
keyword or column to point directly to an associated HDU, rather than
using a grouping table to associate the 2 HDUs.
> * Sect 7 (third para) has "a given HDU may be referenced by up to 999
>   grouping tables simultaneously". Where does this limit come from?

The indices on the GRPIDn/GRPLCn keywords are limited to 3 digits. An
HDU could actually belong to more than 999 groups, but that HDU could
only point back to at most 999 grouping tables of which it is a member.

Bill

*****************************************************************************

Date: Mon, 21 May 2007 15:35:27 +0200 (CEST)
From: Lucio Chiappetti
To: IAU-FWG
Subject: Re: [iaufwg] [fitsbits] Start of the ''Hierarchical Grouping' Public Comment Period

On Mon, 9 Apr 2007, William Pence wrote:

> This is to announce the start of the 30-day Public Comment Period on the
> 'Hierarchical Grouping' FITS convention that has been submitted for

I am trying to recover the backlog on my FITS (WG) duties, so I'm
looking at the pending conventions in the registry. Since formally the
Public Comment Period for this convention has expired, I thought better
to send this to iaufwg than to fitsbits.

I must say that a (rather quick) reading of the document is rather
intimidating, and I wonder whether this is the reason why there have
been NO comments whatsoever (that I'm aware of) !

Of course I cannot have any valid reason to object to register a
convention already under production use at a major centre like ISDC.

But probably the documentation would take advantage of some improvements
(in the sense of helping the reader to understand, and attracting
potential users ... at the moment I'm utterly confused) :

- one would be to provide in the example not only the headers of the
  grouping table, but also the content (or an excerpt of the content)
  of such table

- a second would be to provide some sort of figure or diagram showing
  the structure of a FITS file (group) obeying this convention and the
  links between the various components.
--
------------------------------------------------------------------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
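
*****************************************************************************

The restriction raised in the thread, that a group cannot be a member of
itself either directly or indirectly, can be illustrated with a small
check. This is only a sketch and not taken from the convention document
or from CFITSIO: it assumes the grouping tables have already been read
into a plain dictionary, and the function name find_recursive_groups and
the group and HDU labels are hypothetical.

```python
from typing import Dict, List, Set


def find_recursive_groups(members: Dict[str, List[str]]) -> Set[str]:
    """Return every group that contains itself, directly or indirectly.

    `members` maps a group label to the labels of its members. A member
    that is not itself a key is treated as an ordinary HDU with no
    members of its own. (Hypothetical in-memory model, not a FITS reader.)
    """
    recursive: Set[str] = set()
    for start in members:
        stack = list(members[start])   # members still to be visited
        seen: Set[str] = set()         # members already expanded
        while stack:
            node = stack.pop()
            if node == start:
                # We walked from `start` back to `start`: recursive grouping.
                recursive.add(start)
                break
            if node in seen:
                continue
            seen.add(node)
            stack.extend(members.get(node, []))
    return recursive


# Hypothetical layout: A contains B, B contains C, C contains A again.
# D lists the same HDU twice, which is a duplicate but not a recursion.
groups = {
    "A": ["image1", "B"],
    "B": ["table1", "C"],
    "C": ["A"],
    "D": ["image1", "image1"],
}
print(sorted(find_recursive_groups(groups)))  # ['A', 'B', 'C']
```

The walk follows membership downwards only, so it does not depend on the
GRPIDn/GRPLCn back-pointers being present in the member HDUs.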