This is a digest of the discussions on the FITSBITS email list regarding the INHERIT keyword convention. The original discussions may be seen in the FITSBITS archive at http://listmgr.cv.nrao.edu/pipermail/fitsbits/ ***************************************************************************** Date: Fri, 23 Mar 2007 16:06:53 -0400 From: William Pence To: FITSBITS Subject: [fitsbits] Start of the 'INHERIT' Public Comment Period This is to announce the start of the 30-day Public Comment Period on the 'INHERIT' FITS convention that has been submitted by Nelson Zarate and Perry Greenfield for inclusion in the 'Registry of FITS Conventions' that is maintained by the IAU FITS Working Group. This is the 4th in a growing series of conventions submitted for inclusion in the Registry. A document describing this convention and a sample FITS file that uses it are available for public review and comment from the FITS registry web page at http://fits.gsfc.nasa.gov/fits_registry.html Under this convention, the presence of the keyword INHERIT = T in an extension header indicates that the extension should inherit the keywords from the primary header (except for the mandatory and commentary keywords). Comments may be posted here on the FITSBITS mail exploder or the sci.astro.fits newsgroup. Minor typographical issues may be sent directly to the authors of the convention. Bill Pence (on behalf of the IAU FITS Working Group) ***************************************************************************** Date: Thu, 05 Apr 2007 15:14:59 -0400 From: Robert Hanisch I have to express some concern about registering the INHERIT convention. The documentation notes a number of potential problems that can occur when FITS software that is unaware of the convention is used to read, interpret, and write new copies of files that use INHERIT. I consulted with Perry Greenfield, one of the authors of the document, as to whether these problems arise in practice or not. 
He said yes, indeed, they do, and that he would prefer to discourage people from using the convention. If we are to include INHERIT in the FITS registry, we should perhaps do so solely to document the practice. However, it seems to me too fragile a construct to recommend for wider adoption. It would be cleaner, I think, to define more clearly the rules for how primary headers pertain to extension headers (e.g., the concept of inheritance applies by default, or whatever). This is something we might recommend to the recently formed FITS review panel to discuss. Bob ***************************************************************************** From: Rob Seaman Date: Thu, 5 Apr 2007 15:57:31 -0700 Bob says: > I have to express some concern about registering the INHERIT > convention. > The documentation notes a number of potential problems that can occur > when FITS software that is unaware of the convention is used to read, > interpret, and write new copies of files that use INHERIT. 1) Isn't the notion of the convention registry to record what is already being used around the FITS community? NOAO relies on INHERIT for our MEF data from the MOSAIC DHS and will rely on it for the NEWFIRM DHS. The NOAO Pipeline understands INHERIT. The convention has served us well. 2) NOT documenting INHERIT certainly won't help the community. 3) IRAF has understood INHERIT for three-score-and-ten dog years. > It would be cleaner, I think, to define more clearly the rules for > how primary > headers pertain to extension headers (e.g., the concept of inheritance > applies by default, or whatever). 4) Yes, but that boat has sailed. The community has been on a course to deal with inheritance since this note from the image extension paper: "Although allowed, it is recommended that the primary header does not set the keyword NAXIS=0, since it would not make sense to extend a non-existing image with another image." 
FITS is either going to tie the contents of separate HDUs together semantically or not. The community eagerly - and widely - adopted the notion of the primacy of the primary HDU - likely before the words above were published. Implicit here is that the primary header of an empty HDU is often used for information that applies to the entire file. 5) If not INHERIT, then what? 6) And we'd still be left with gazillions of files that rely on this convention as an organizing semantic principle. Clearly the first step in revisiting the fundamental semantics of a FITS file (of which keyword inheritance is only a small part) would be to protect our investment in previous data products by documenting the current de facto standards. 6a) In any event, the first step in deprecating any convention would be to recognize its existence. > This is something we might recommend to the recently formed FITS > review > panel to discuss. 7) By all means, but only in an advisory capacity. I presume we're not thinking of changing the fundamental FITS standards process? It has served us well for many years. 8) Really - isn't documenting the current usage the simplest thing to do? All of these conventions are conventional, rather than standard, precisely because they reflect issues that were thorny to deal with the first time around. Few will fall into the same category as the checksum keywords - i.e., pre-existing legal FITS usage demanding no clarification of the standard. Rob ***************************************************************************** Date: Thu, 05 Apr 2007 20:00:20 -0400 From: Robert Hanisch On 4/5/07 6:57 PM, "Rob Seaman" wrote: > Bob says: > >> I have to express some concern about registering the INHERIT >> convention. >> The documentation notes a number of potential problems that can occur >> when FITS software that is unaware of the convention is used to read, >> interpret, and write new copies of files that use INHERIT. 
> > 1) Isn't the notion of the convention registry to record what is already > being used around the FITS community? NOAO relies on INHERIT > for our MEF data from the MOSAIC DHS and will rely on it for the > NEWFIRM DHS. The NOAO Pipeline understands INHERIT. > The convention has served us well. It is clear it is in use. STScI uses it for HST data, too. > 2) NOT documenting INHERIT certainly won't help the community. > > 3) IRAF has understood INHERIT for three-score-and-ten dog years. Well, I doubt it is THAT long.... >> It would be cleaner, I think, to define more clearly the rules for >> how primary >> headers pertain to extension headers (e.g., the concept of inheritance >> applies by default, or whatever). > > 4) Yes, but that boat has sailed. The community has been on a course to > deal with inheritance since this note from the image extension paper: > > "Although allowed, it is recommended that the primary header does > not set the keyword NAXIS=0, since it would not make sense to extend a > non-existing image with another image." > > FITS is either going to tie the contents of separate HDUs together > semantically or not. The community eagerly - and widely - adopted the > notion of the primacy of the primary HDU - likely before the words above > were published. Implicit here is that the primary header of an empty > HDU is often used for information that applies to the entire file. That would be my interpretation, too, but as the INHERIT document notes, this was not made explicit. > 5) If not INHERIT, then what? Making the rules explicit. The FITS review panel should look at this and see if it would be a clarification to existing practice, or something new and potentially standard-breaking. > 6) And we'd still be left with gazillions of files that rely on this > convention > as an organizing semantic principle. 
Clearly the first step in > revisiting the > fundamental semantics of a FITS file (of which keyword inheritance is > only > a small part) would be to protect our investment in previous data > products > by documenting the current de facto standards. > > 6a) In any event, the first step in deprecating any convention would be > to recognize its existence. I did write that, Rob.... "If we are to include INHERIT in the FITS registry, we should perhaps do so solely to document the practice." >> This is something we might recommend to the recently formed FITS >> review >> panel to discuss. > > 7) By all means, but only in an advisory capacity. I presume we're > not thinking > of changing the fundamental FITS standards process? It has served us > well > for many years. No such radical suggestion. It is like the panel that I chaired some ten years ago. The idea is to look for ambiguities, inconsistencies, etc., and tidy them up. There is no idea of circumventing the process. > 8) Really - isn't documenting the current usage the simplest thing to > do? > All of these conventions are conventional, rather than standard, > precisely > because they reflect issues that were thorny to deal with the first > time around. > Few will fall into the same category as the checksum keywords - i.e., > pre-existing legal FITS usage demanding no clarification of the > standard. > > Rob The thing I am concerned about is conveying a sense of "this is a great idea" by registering the convention. This one in particular, given the caveats written into the document itself, requires one to pause. The language describing registered conventions says "These conventions are not necessarily endorsed by the IAU FITS Working Group." But that is pretty weak, so what I am suggesting is that this review for this extension might say that the FITS WG notes its existence, and provides documentation, but does not encourage further use. Or that potential adopters fully understand the potential problems. 
Something like that. Bob ***************************************************************************** Date: Fri, 06 Apr 2007 09:44:05 -0400 From: William Pence Rob Seaman wrote: > 4) Yes, but that boat has sailed. The community has been on a course to > deal with inheritance since this note from the image extension paper: > > "Although allowed, it is recommended that the primary header does > not set the keyword NAXIS=0, since it would not make sense to extend a > non-existing image with another image." > > FITS is either going to tie the contents of separate HDUs together > semantically or not. The community eagerly - and widely - adopted the > notion of the primacy of the primary HDU - likely before the words above > were published. Implicit here is that the primary header of an empty > HDU is often used for information that applies to the entire file. Maybe I'm missing your point, but I don't see how that paper can be interpreted as an endorsement of the inherit convention. In that sentence you quote, and elsewhere in the paper, they make it clear that they do not recommend appending an image extension to a null primary array; instead they think the primary array should be filled first, and then only append more image extensions if the primary array is already occupied. This is contrary to the inherit convention, which requires that the primary array be empty to avoid confusion about whether the keywords in the primary array should be interpreted as applying globally to the following extensions or not. Some might suggest that with the abundance of low cost disk space that is now available, the inherit convention is trying to fix a non-problem. The amount of diskspace that is saved by not duplicating the keywords in every extension is rather insignificant in most cases and doesn't warrant the extra software complexity in supporting the inherit convention.. 
There are no doubt some pathological cases where the size of the headers could dominate the size of the whole file, but in those cases there may be alternate ways to pack the data more efficiently (e.g. pack the separate image extension data into vectors in rows of a single binary table extension). Bill Pence ***************************************************************************** From: Archie Warnock Date: Fri, 06 Apr 2007 14:52:27 GMT Hi all! Yes - I'm still lurking around. William Pence wrote in news:mailman.11.1175867088.4349.fitsbits@listmgr.cv.nrao.edu: > Some might suggest that with the abundance of low cost disk space that > is now available, the inherit convention is trying to fix a > non-problem. The amount of diskspace that is saved by not duplicating > the keywords in every extension is rather insignificant in most cases > and doesn't warrant the extra software complexity in supporting the No, but avoiding potential errors by not duplicating text strings is a worthy effort, as we learned long ago from relational database theory. > inherit convention.. There are no doubt some pathological cases where > the size of the headers could dominate the size of the whole file, but > in those cases there may be alternate ways to pack the data more > efficiently (e.g. pack the separate image extension data into vectors > in rows of a single binary table extension). In current practice or not, I think the philosophy of "it's better to seek forgiveness than permission" is dangerous in this context. If a convention breaks FITS, I believe it should be considered a private agreement and not part of the FITS standard. That doesn't mean it can't be used in practice - just that it's not FITS. -- Archie -- Archie Warnock warnock at awcubed dot com -- A/WWW Enterprises www.awcubed.com -- As a matter of fact, I _do_ speak for my employer. 
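The inheritance rule stated in the announcement at the top of this digest (an extension with INHERIT = T takes on the primary header keywords, except the mandatory and commentary ones) can be sketched roughly as follows. This is only an illustration, not the implementation used by IRAF or any other package: headers are modeled as plain lists of (keyword, value) pairs, the function names `is_inheritable` and `resolve_inherit` are hypothetical, the exact set of excluded keywords is an assumption, and so is the choice that a keyword already present in the extension takes precedence over the primary header's copy.

```python
# Illustrative sketch only. Headers are ordered lists of (keyword, value)
# pairs, not objects from any real FITS library.

# Assumed set of mandatory/structural keywords that are never inherited.
MANDATORY = {"SIMPLE", "BITPIX", "NAXIS", "EXTEND",
             "XTENSION", "PCOUNT", "GCOUNT", "END"}
# Commentary keywords (including the blank keyword) are never inherited.
COMMENTARY = {"COMMENT", "HISTORY", ""}


def is_inheritable(keyword):
    """Return True if a primary-header keyword may be inherited."""
    if keyword in MANDATORY or keyword in COMMENTARY:
        return False
    # NAXISn keywords describe the primary array itself, so skip them too.
    if keyword.startswith("NAXIS") and keyword[5:].isdigit():
        return False
    return True


def resolve_inherit(primary, extension):
    """Return the extension header with inherited keywords appended.

    If the extension does not carry INHERIT = T, it is returned unchanged.
    Assumption: keywords already present in the extension win over the
    primary header's values.
    """
    if dict(extension).get("INHERIT") is not True:
        return list(extension)
    present = {key for key, _ in extension}
    resolved = list(extension)
    for key, value in primary:
        if is_inheritable(key) and key not in present:
            resolved.append((key, value))
    return resolved


if __name__ == "__main__":
    # Hypothetical example values, not taken from any real file.
    primary = [("SIMPLE", True), ("BITPIX", 16), ("NAXIS", 0),
               ("EXTEND", True), ("TELESCOP", "EXAMPLE"),
               ("COMMENT", "applies to the whole file")]
    extension = [("XTENSION", "IMAGE"), ("BITPIX", 16), ("NAXIS", 2),
                 ("NAXIS1", 100), ("NAXIS2", 100), ("INHERIT", True)]
    for key, value in resolve_inherit(primary, extension):
        print(key, "=", value)
```

A reader that is unaware of the convention would in effect skip `resolve_inherit` and see only the extension's own keywords, which is the situation the concerns above are about.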
***************************************************************************** From: Rob Seaman Date: Fri, 6 Apr 2007 08:03:37 -0700 William Pence wrote: > Maybe I'm missing your point, but I don't see how that paper can be > interpreted as an endorsement of the inherit convention. It can't. The fact that the paper attempted to force a particular outcome - that primary HDUs not be empty - and that empty primary HDUs have instead become widespread, respected usage was my (admittedly obscure) point. A FITS file is either conforming or it isn't. Nothing requires that HDUs not be empty, for whatever purpose. Users are also permitted to define new keywords with new interpretations. In this case, fundamental goals of DB normalization drive the existence of a primary header to contain keywords that apply to all other extensions in a file. There are reasons more basic than not consuming an additional N*80 bytes (1.2 KB for each Mosaic keyword) for not duplicating redundant keywords. > Some might suggest that with the abundance of low cost disk space > that is now available, the inherit convention is trying to fix a > non-problem. The diskspace may be a non-problem (although this is a quirky opinion coming from a FITS compression stalwart :–), but the underlying question is about the purpose of registering conventions in the first place. I would have thought that the key goal was to collect descriptions of local usage, not to vet long-established usage against esthetic criteria. By insisting on the latter, the danger is that conventions will go unregistered, perhaps undocumented. Is this a preferred outcome? If the warning: "These conventions are not necessarily endorsed by the IAU FITS Working Group." is not deemed strong enough, how about labeling *all* of the conventions with something snarkier? There is nothing demonstrably less conforming to the standard about INHERIT than any other convention. 
I also suggest deleting the entire section "Practical Considerations" from http://fits.gsfc.nasa.gov/registry/inherit/fits_inheritance.txt. It amounts to nothing more than stating that unusual things might happen if files are run through software that doesn't know about the particular convention. This applies to all conventions (and all software), and it seems to this observer that INHERIT is rather more user friendly in such a case than most. Rob ***************************************************************************** Date: Fri, 06 Apr 2007 13:00:55 -0400 From: William Pence Rob Seaman wrote: >> Some might suggest that with the abundance of low cost disk space that >> is now available, the inherit convention is trying to fix a non-problem. > > The diskspace may be a non-problem (although this is a quirky opinion > coming from a FITS compression stalwart :–) I see it mainly as a cost/benefit issue. Compression can reduce the size of an image by a large factor and hence is probably worth the cost of added software complexity. The INHERIT convention on the other hand only reduces the size of the FITS file by a small fraction of 1% in typical cases. (e.g., each additional keyword in the header of a 2000 x 2000 x 16-bit CCD image only increases the file size by 0.001%.) > but the underlying question > is about the purpose of registering conventions in the first place. Nobody has suggested that the inherit convention shouldn't be documented in the registry. The main issue that Bob raised earlier is whether the IAUFWG (or anyone else for that matter) should be able to offer any advice, or recommendations, to potential new users of the convention, beyond simply documenting what keywords are used by the convention. This is a general issue that will affect a number of conventions, not just the inherit convention. If the IAUFWG decides this would be useful, then a mechanism for adding usage comments or recommendations could be added to the Registry. 
Bill Pence ***************************************************************************** Date: Fri, 06 Apr 2007 13:20:43 -0400 From: Robert Hanisch > Nobody has suggested that the inherit convention shouldn't be documented > in the registry. The main issue that Bob raised earlier is whether the > IAUFWG (or anyone else for that matter) should be able to offer any > advice, or recommendations, to potential new users of the convention, > beyond simply documenting what keywords are used by the convention. Thanks Bill -- that is my main concern. There seem to be enough operational difficulties with this convention that I think potential adopters must be warned, and the caveat language we have right now is very non-specific. A FITS reader that does not understand the column limits convention, for example, would probably not cause any problems. A reader that does not understand a FOREIGN extension would simply ignore it. But a reader that encounters INHERIT, and manipulates headers with even simple copy operations, could make a bit of a mess. It is good to document what is out there, but we do not necessarily want to encourage its further adoption. Bob ***************************************************************************** Date: Fri, 6 Apr 2007 10:27:06 -0700 From: Steve Allen On Fri 2007-04-06T08:03:37 -0700, Rob Seaman hath writ: > William Pence wrote: > > >Maybe I'm missing your point, but I don't see how that paper can be > >interpreted as an endorsement of the inherit convention. > > It can't. The fact that the paper attempted to force a particular > outcome There are factors other than the approved FITS papers which have forced particular outcomes. This was also in the context of PHDUs with empty data arrays. The data reduction package developed at Lick would not accept FITS files unless the PHDU contained these keywords: NAXIS = 2 NAXIS1 = 0 NAXIS2 = 0 So that's what we put in the empty PHDU of our mosaic files. 
But as we got close to the point of deploying the instrument producing these files we found that we could not continue that practice, for the data reduction package developed at NOAO would not accept such FITS files. What it wanted was NAXIS = 0 I note with gratitude that Pence's FITSIO toolkit allowed either scenario without forcing the issue. The larger internet world of interoperable standards is attacking another such issue right now with calendaring programs. http://www.ietf.org/html.charters/calsify-charter.html One might think that scheduling appointments using civil time and the Gregorian calendar had a well tested, obvious solution, but no. In that arena the initial standard in RFC 2445 was not firm enough. Existing implementations ignore the letter of the standard and are not fully interoperable. They are struggling with a new version in hopes of removing ambiguity while maintaining compatibility, but this has been opined: I believe we should make it clear to implementors that they ignore VERSION at their peril. To some extent FITS, at least the voice of the community if not the letter of the standard, probably has to do the same. ***************************************************************************** From: Rob Seaman Date: Fri, 6 Apr 2007 11:24:32 -0700 Bill wrote: > The INHERIT convention on the other hand only reduces the size of > the FITS file by a small fraction of 1% in typical cases. There is an assumption here that only images will use such a feature, and further, that all images are large. A file containing reduced MOS data might have one or more bintables or even as many extensions as spectra. In general, we shouldn't assume that headers are smaller than data units. A notion of FITS compression is to preserve readable headers. To the extent that this discussion is about minimizing the size of headers (and not about the correct data model for FITS objects), INHERIT is a natural complement to the tile compression convention. 
> Nobody has suggested that the inherit convention shouldn't be > documented > in the registry. Bob started the discussion with: "I have to express some concern about registering the INHERIT convention." My apologies if I misunderstood. > If the IAUFWG decides this would be useful, then a mechanism for > adding usage comments or recommendations could be added to the > Registry. By all means, comment away. Feel free to append mine. Bob wrote: >> a reader that encounters INHERIT, and manipulates headers with >> even simple copy operations, could make a bit of a mess. A reader that does not understand INHERIT will copy the extensions verbatim, including the INHERIT keyword itself. The only trouble that might arise is if the primary header is disconnected from the extensions, but similar trouble might afflict any FITS file that is naively split apart. A reader that does understand INHERIT may trigger inheritance, of course, in the copy. In this case, the extensions will contain all the keywords. Or a reader may more subtly implement INHERIT and choose between these two correct behaviors on an application specific basis. I think concern about "confused users" is inevitable, but overstated in this case. >> It is good to document what is out there, but we do not >> necessarily want to encourage its further adoption. One might consider a prerequisite to discouraging the use of unique conventions to be the adoption of similar functionality within the standard. A broader discussion of a coherent FITS data model and of how individual HDUs are related to one another sounds interesting, but beyond the scope of the registry. 
Rob ***************************************************************************** From: Rob Seaman Date: Fri, 6 Apr 2007 12:28:10 -0700 Steve wrote: > The data reduction package developed at Lick would not accept > FITS files unless the PHDU contained these keywords: > > NAXIS = 2 > NAXIS1 = 0 > NAXIS2 = 0 > > So that's what we put in the empty PHDU of our mosaic files. But as > we got close to the point of deploying the instrument producing these > files we found that we could not continue that practice, for the data > reduction package developed at NOAO would not accept such FITS files. > What it wanted was > > NAXIS = 0 > > I note with gratitude that Pence's FITSIO toolkit allowed either > scenario without forcing the issue. This is also a case in which there is a clear answer in the standard - either form is legal, but the latter is preferred: >> NAXIS Keyword - The value field shall contain a non-negative >> integer no greater than 999, representing the number of axes in >> the associated data array. A value of zero signifies that no data >> follow the header in the HDU. >> NAXISn Keywords - The value field of this indexed keyword shall >> contain a non-negative integer, representing the number of >> elements along axis n of a data array. The NAXISn must be present >> for all values n = 1,...,NAXIS, and for no other values of n. A >> value of zero for any of the NAXISn signifies that no data follow >> the header in the HDU. If NAXIS is equal to 0, there should not >> be any NAXISn keywords. The IRAF support is incomplete, but the Lick behavior is simply bizarre. It is gratifying that CFITSIO accepted either form, but data providers are certainly well advised to test compliance with a range of community software resources. Note that precisely the same issue applies here as with INHERIT. A Lick file might be copied with third party software that quite reasonably would note that since there are no data records that NAXIS should equal zero. 
The Lick software would then refuse to recognize the copy. The original file is conforming FITS, however. Rob ***************************************************************************** From: Arnold Rots Date: Fri, 6 Apr 2007 15:32:49 -0400 (EDT) I must admit that I have serious misgivings about this particular convention, as I think I stated in past discussions on the subject. That's neither here nor there for the present discussion, but I mention it to indicate that I feel it would be a bad idea to change our mind on the principle of self-sufficient HDU headers. Back to the present discussion: I agree that it is a good idea to register the convention, but the exchanges make me wonder whether we should add some components to the registry for each convention: - An analysis on general useability; i.e., an explicit discussion of what will happen to a naive reader and what the prerequisites are for proper understanding - A general assessment as to whether the convention is recommended for general use The former could be provided by the submitter, the latter by the FWG. - Arnold ***************************************************************************** Date: Fri, 6 Apr 2007 23:44:27 +0200 (CEST) From: Thierry Forveille On Fri, 6 Apr 2007, Arnold Rots wrote: > Back to the present discussion: > I agree that it is a good idea to register the convention, but the > exchanges make me wonder whether we should add some components to the > registry for each convention: > - An analysis on general useability; i.e., an explicit discussion of > what will happen to a naive reader and what the prerequisites are for > proper understanding > - A general assessment as to whether the convention is recommended > for general use > The former could be provided by the submitter, the latter by the FWG. I am on the same line: we should document our past mistakes, but clearly mark them as mistakes to avoid (or at least limit ;-)) additional usage slipping in. 
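The NAXIS and NAXISn rules quoted from the standard earlier in this thread can be turned into a small check, which shows why both the Lick form (NAXIS = 2 with NAXIS1 = NAXIS2 = 0) and the NOAO form (NAXIS = 0) describe an HDU with no data. This is a minimal sketch under stated assumptions: the header is modeled as a plain dict of keyword to value, the function name `has_data` is hypothetical, and other keywords that affect the data size in some HDU types (for example PCOUNT and GCOUNT) are deliberately ignored.

```python
# Illustrative sketch only. The header is a plain dict, not an object from
# any real FITS library, and only the NAXIS/NAXISn rules are checked.

def has_data(header):
    """Return True if NAXIS/NAXISn describe a non-empty data array.

    Raises ValueError if the keywords violate the quoted rules:
    NAXIS must be an integer in 0..999, and NAXISn must be present for
    n = 1..NAXIS and for no other n.
    """
    naxis = header.get("NAXIS")
    if isinstance(naxis, bool) or not isinstance(naxis, int):
        raise ValueError("NAXIS must be an integer")
    if not 0 <= naxis <= 999:
        raise ValueError("NAXIS must be in the range 0..999")

    # Collect the axis numbers n for which a NAXISn keyword is present.
    present = {int(key[5:]) for key in header
               if key.startswith("NAXIS") and key[5:].isdigit()}
    if present != set(range(1, naxis + 1)):
        raise ValueError("NAXISn must exist for n = 1..NAXIS and no other n")

    lengths = [header["NAXIS%d" % n] for n in range(1, naxis + 1)]
    for length in lengths:
        if isinstance(length, bool) or not isinstance(length, int) or length < 0:
            raise ValueError("NAXISn must be a non-negative integer")

    # NAXIS = 0, or any NAXISn = 0, means no data follow the header.
    return naxis > 0 and all(length > 0 for length in lengths)


if __name__ == "__main__":
    print(has_data({"NAXIS": 0}))                            # NOAO form
    print(has_data({"NAXIS": 2, "NAXIS1": 0, "NAXIS2": 0}))  # Lick form
    print(has_data({"NAXIS": 2, "NAXIS1": 2000, "NAXIS2": 2000}))
```

Software that accepts only one of the two empty forms is stricter than the quoted rules require, since both pass this check and both report no data.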
***************************************************************************** Date: Fri, 6 Apr 2007 16:59:11 -0600 (MDT) From: Doug Tody There are two different issues being discussed here: 1) What is a convention, and what is our role in documenting them, and 2) Is the INHERIT convention (most recently) a good idea Regarding 1), I suggest that a convention is not an official recommended standard, and if we are confused about this fact, we cannot proceed to document conventions. I suspect that if we examine any convention carefully in the same way that we do broad FITS standards, we will find plenty of things to be concerned about with each individual convention. Probably if this were not the case, and it were generally useful, it would long ago have been promoted as a general standard. Nonetheless, conventions can be quite useful for solving more limited problems. The FITS registry of conventions should not "recommend" or "discourage" any convention. If we start making such distinctions we are starting to repeat the standards process. I suggest it is better to merely document them uniformly. We can attach some of the discussions of the review groups to inform potential adopters of any issues. It is not our job however, to revisit the design of each convention, or we are repeating the standards process (and we will probably throw out 80% of them). Regarding 2), INHERIT is an established convention in current use, and as such should be documented. It is absolutely fine if we do so with various caveats about possible issues. Regarding whether INHERIT is a good idea: I can comment on that a bit as I had a lot to do with creating this way back when. The point is not a reduction in file size as Bill suggests, but to avoid duplicating information in the way the MEF file is stored. Duplicating information in a complex data structure is bad, and causes problems with, for example, dynamic updates. 
At run time, when an individual extension is accessed, the inherited information is supposed to be included, and the header is restored to its full logical size. Hence if one "imcopies" a single extension, the inheritance is resolved and the result is a self-contained FITS object. I agree with Steve that this is a simple example of a broader problem of associating relational entities. FITS is in essence a relational system; every FITS object (even an image) is actually a table. INHERIT is a simple means for specifying the relationship between two or more tables composed as an MEF. A FITS MEF is a simple container with one level of structure. Every extension logically inherits from the "global header" (primary HDU). One can resolve the inheritance to simplify access to an individual extension, but this is problematic as it is very easy to get into a situation where updates affecting the entire MEF object do not propagate to all the extensions. The real problem with INHERIT is that it is a simplistic solution to what is a more general problem. On the other hand, it *is* simple, and is adequate for such complex data where we need to aggregate a number of primary data objects (images, tables) into a container. Like most conventions, it does not fully address the underlying problem. - Doug ***************************************************************************** Date: Sat, 7 Apr 2007 21:07:56 +0200 (CEST) From: Thierry Forveille On Fri, 6 Apr 2007, Archie Warnock wrote: > William Pence wrote in > news:mailman.11.1175867088.4349.fitsbits@listmgr.cv.nrao.edu: > >> Some might suggest that with the abundance of low cost disk space that >> is now available, the inherit convention is trying to fix a >> non-problem. 
The amount of diskspace that is saved by not duplicating >> the keywords in every extension is rather insignificant in most cases >> and doesn't warrant the extra software complexity in supporting the > > No, but avoiding potential errors by not duplicating text strings is a > worthy effort, as we learned long ago from relational database theory. > Well, if one really cares about such consistency, using multiple image extensions doesn't sound like a very good base. One single binary table maps a lot better to a data base than multiple image extensions that may or may not duplicate header information. I have (perhaps incorrect?) memories that the image extension was sold to the FITS community on the basis of being easier to use for simple cases than rows within a binary table (I was never quite convinced by that argument, but didn't really voice those concerns...). It seems that its use has grown beyond simple cases and that its limitations now bite. I know I am being a bit provocative here, but would it perhaps be time to consider deprecating the IMAGE extension?? ***************************************************************************** Date: Sat, 7 Apr 2007 13:37:14 -0600 (MDT) From: Doug Tody Indeed, this is getting off topic, but we might want to have a separate discussion sometime about the "FITS data model" and perhaps even how to map this into more flexible serializations or storage mechanisms. (also, as a matter of history, FITS began as an image transport format, and tables and image extensions came much later). The basic FITS model provides a keyword table (PHU or some other form of empty image kludge), an N-dimensional image object, a table, plus a simple general container (the MEF). We can aggregate instances of these three basic objects in a container, and associate them in some fashion to model more complex objects, such as instrumental datasets. Usually this is done by defining a convention, e.g., using custom keywords in the PHU and/or extensions. 
One can argue that Table can contain anything including an image,
but the regularly sampled N-Dim Image case is so important that it
deserves its own class. If nothing else, this is still required to be
able to efficiently store and access large data arrays. In addition,
the basic Image object is much simpler than Table, and much existing
code can do useful things with a FITS image, but cannot do anything
with a FITS table.

Within VO, FITS is still the preferred format for image data, whereas
VOTable is often used instead of FITS for table data. One could
argue that the FITS Image is the most successful and widely used part
of FITS, and even today provides a better mechanism for storing and
manipulating regularly sampled data arrays than any existing
alternative.

- Doug

*****************************************************************************
Date: Sat, 7 Apr 2007 13:10:59 -0700
From: Steve Allen

On Fri 2007-04-06T14:52:27 +0000, Archie Warnock hath writ:
> No, but avoiding potential errors by not duplicating text strings is a
> worthy effort, as we learned long ago from relational database theory.

What FITS did not learn from relational database theory was how to
create mechanisms which document and enforce the self consistency of
data which have been neatly separated into distinct logical chunks.

I think that's the way forward.

*****************************************************************************
Date: Sat, 7 Apr 2007 14:53:40 -0600 (MDT)
From: Doug Tody

On Sat, 7 Apr 2007, Steve Allen wrote:
> On Fri 2007-04-06T14:52:27 +0000, Archie Warnock hath writ:
>> No, but avoiding potential errors by not duplicating text strings is a
>> worthy effort, as we learned long ago from relational database theory.

I mentioned this point in my earlier mail as well. Within IRAF, the
main motivation for INHERIT was to avoid duplication of information in
multiple places within a MEF. This would very likely lead to problems
with updates.
It could also have advantages when viewing a MEF as a more complex
object.

> What FITS did not learn from relational database theory was how to
> create mechanisms which document and enforce the self consistency of
> data which have been neatly separated into distinct logical chunks.
>
> I think that's the way forward.

One could also say that this is not a FITS issue at all, but rather a
more general data modeling issue. We are already getting into this
within VO in several different contexts. What we will probably be doing
is mapping some more general model or mechanism into a FITS
representation. Typically such relationships and models need to be
consistent regardless of how the information is stored, with FITS being
only part of the picture. While this can be done with the current FITS
mechanisms, it is awkward. The sometimes discussed "FITS 2.0", if it
ever comes to pass, could address the representation issues but should
not change the basic FITS data models.

- Doug

*****************************************************************************
Date: Sat, 7 Apr 2007 23:51:19 +0200 (CEST)
From: Thierry Forveille

> Indeed, this is getting off topic, but we might want to have a separate
> discussion sometime about the "FITS data model" and perhaps even how
> to map this into more flexible serializations or storage mechanisms.
> (also, as a matter of history, FITS began as an image transport format,
> and tables and image extensions came much later).
>
Yeah, I am unfortunately old enough to remember that (even enough to
have written random groups) ;-). Tables were the first extension I
think, then the image extension.

> The basic FITS model provides a keyword table (PHU or some other form
> of empty image kludge), an N-dimensional image object, a table, plus
> a simple general container (the MEF).
> We can aggregate instances of
> these three basic objects in a container, and associate them in some
> fashion to model more complex objects, such as instrumental datasets.
> Usually this is done by defining a convention, e.g., using custom
> keywords in the PHU and/or extensions.
>
Well, that's one way of looking at it. The alternate perspective
that I am arguing for is that everything should go into one table
extension, with images as either multiple entries in one row
or entries in successive rows. Essentially, that's the perspective
that's taken by the Green Bank convention for sets of radioastronomical
spectra.

> One can argue that Table can contain anything including an image,
> but the regularly sampled N-Dim Image case is so important that it
> deserves its own class. If nothing else, this is still required to be
> able to efficiently store and access large data arrays.

Actually, storage and access inside a binary table is perhaps slightly
more difficult to get right, but it is just as efficient as using image
extensions (if anything marginally more efficient, due to less block
padding).

> In addition,
> the basic Image object is much simpler than Table, and much existing
> code can do useful things with a FITS image, but cannot do anything
> with a FITS table.
>
That's definitely a factor that needs consideration. For DENIS we used
a large binary table to store stripes of 180 1kx1k images, but
often/usually ended up working through a filter that extracted one image
to a FITS file because a tool expected that. On the other hand, that
format did provide very robust consistency (stable header items in the
extension header, variable ones as elements of the data rows, and
nothing ever duplicated).

> Within VO, FITS is still the preferred format for image data, whereas
> VOTable is often used instead of FITS for table data.
> One could
> argue that the FITS Image is the most successful and widely used part
> of FITS, and even today provides a better mechanism for storing and
> manipulating regularly sampled data arrays than any existing
> alternative.
>
Simpler and most successful for sure. Better, that depends on what your
goals/criteria are :-)

*****************************************************************************
Date: Sat, 7 Apr 2007 16:16:30 -0600 (MDT)
From: Doug Tody

On Sat, 7 Apr 2007, Thierry Forveille wrote:
>> The basic FITS model provides a keyword table (PHU or some other form
>> of empty image kludge), an N-dimensional image object, a table, plus
>> a simple general container (the MEF). We can aggregate instances of
>> these three basic objects in a container, and associate them in some
>> fashion to model more complex objects, such as instrumental datasets.
>> Usually this is done by defining a convention, e.g., using custom
>> keywords in the PHU and/or extensions.
>>
> Well, that's one way of looking at it. The alternate perspective
> that I am arguing for is that everything should go into one table
> extension, with images as either multiple entries in one row
> or entries in successive rows. Essentially, that's the perspective
> that's taken by the Green Bank convention for sets of radioastronomical
> spectra.

There are cases where this is the best approach. If what you have is a
large uniform collection of (not terribly large) images or spectra, then
representation as a table is often the best approach. However I would
not suggest that we replace Image with a Table-based representation
containing one very long row. If the aggregation includes heterogeneous
objects (e.g., images with substantially different headers) then a
single table is not appropriate, and a MEF representation is probably
better.
- Doug

*****************************************************************************
From: Rob Seaman
Date: Sat, 7 Apr 2007 19:59:17 -0700

Archie Warnock wrote:

>> No, but avoiding potential errors by not duplicating text strings
>> is a worthy effort, as we learned long ago from relational
>> database theory.

Like I said, well-worn principles of database normalization.

>> In current practice or not, I think the philosophy of "it's better
>> to seek forgiveness than permission" is dangerous in this context.

I'm a little unclear what permission should have been sought and from
whom. INHERIT is completely legal FITS usage - the MEF format is legal,
the dataless HDU is legal and the keyword is a legal boolean. This is
particularly true since in the absence of a coherent data model, FITS
is silent on issues of the semantic interconnectedness of extensions.
Absent a data model, software developers still need to develop.

>> If a convention breaks FITS, I believe it should be considered a
>> private agreement and not part of the FITS standard. That doesn't
>> mean it can't be used in practice - just that it's not FITS.

None of the conventions are part of the FITS standard. However, even
nonconforming FITS cannot "break FITS" or even break FITS applications.
An application should do something reasonable even if presented with
nonconforming input. In any event, input conforming to the INHERIT
convention also conforms to FITS. Some applications may not know what
to do with it, but the absence of a feature is not precisely the same
thing as the presence of a bug.

Thierry Forveille wrote:

> One single binary table maps a lot better to a data base than
> multiple image extensions that may or may not duplicate header
> information.

I disagree. A typical normalized database consists of several tables.
These tables may correspond to binary tables in FITS, but also may
correspond to a hierarchy of FITS headers.
Well chosen image extension headers will often be better than a single
flat binary table.

> would it perhaps be time to consider deprecating the IMAGE extension??

Obviously a rhetorical question, but no, of course not. IMAGE
extensions provide a mechanism for aggregating classical FITS image
objects. FITS exists for mere astronomical mortals, not just for titans
of software engineering. An MEF file of image extensions is vastly more
accessible to our users, and likely much more robust for our
applications. Not all astronomical data maps well onto image arrays,
but CCDs and other array detectors do.

On the other hand, tile compression provides a natural path for image
extensions to map, one-to-one, onto binary tables. The headers, of
course, copy directly across. Presumably by recommending the
deprecation of the image extension, you're really suggesting
deprecating the idea of the FITS header itself.

Rob

*****************************************************************************
From: Mark Calabretta
Date: Thu, 12 Apr 2007 13:07:58 +1000

My comments only refer to the documentation of the convention, not the
convention itself.

The document states:

"It is recommended that this inherit convention be used only in
FITS files that have a null primary array (e.g., with NAXIS = 0) to
avoid possible confusion if array-specific keywords (e.g., BSCALE
and BZERO) were to be inherited."

If this is the way that INHERIT has always been used then it should be
required, not "recommended" (inheriting from a non-null primary HDU
would be asking for trouble, especially concerning WCS keywords). Also,
while recommended usage may assist human interpretation, it is
unhelpful for software which can't assume that any recommendations have
been followed and so must be completely general.
"If the INHERIT keyword is not present, nothing should be inferred about whether the inherit convention should apply or not because the FITS standard says nothing regarding the relationship of keywords in the primary header to those in an extension." This statement is contentious. It is widely understood that FITS HDUs must be self-contained, whether or not that is stated explicitly in the standard. Basically this statement tries to excuse the exploitation of a loophole in the standard. "When an application reads an extension header with INHERIT = T, it should merge the keywords in the current extension with the primary header keywords, ..." What does "merge" mean here? A simple way to picture keyword inheritance is to think of appending the extension header onto the primary header (as a character array) and feeding the lot into a header parser that accepts the last occurrence of any repeated keyword. "... with the exclusion of the mandatory keywords, and any COMMENT, HISTORY, and blank keywords in the primary header." If COMMENT and HISTORY keywords are not propagated then there should also be some statement that extension headers must contain a full complement of COMMENT and HISTORY cards whether or not they duplicate those in the primary header. General: The FITS WCS standard defines default values for WCS keywords that are omitted from a header. The document should discuss how the INHERIT convention interacts with the omission of such keywords. Although it is reasonable to discuss the relative merits and deficiencies of the convention compared to other alternatives, much of the content of the first paragraph of the section entitled "Practical Considerations" seems to deal with one particular implementation. As such it should either be omitted or recast into a discussion of the way that it should be implemented. As another practical consideration, serial readers (e.g. 
tape or internet download) have no way of knowing in advance whether
the primary header will be required for later use by an extension that
inherits it. Therefore, in order to implement this convention, they
would need to cache the primary header from ANY general FITS file in
case it later turns out to use the INHERIT convention.

Mark Calabretta
ATNF

*****************************************************************************
From: "Lucio Chiappetti's NoSpam Newsreading account"
Date: Thu, 12 Apr 2007 14:19:11 +0200

On Thu, 12 Apr 2007, Mark Calabretta wrote:

> "It is recommended that this inherit convention be used only in
> FITS files that have a null primary array (e.g., with NAXIS = 0) to
> avoid possible confusion if array-specific keywords (e.g., BSCALE
> and BZERO) were to be inherited."
>
> If this is the way that INHERIT has always been used then it should be
> required, not "recommended" (inheriting from a non-null primary HDU
> would be asking for trouble, especially concerning WCS keywords).

1) I doubt the authors of a convention (which, once again, is NOT a
   standard) can "require" anything.
   The convention is all just a recommendation, or even less, a
   suggestion.

2) WCS keywords could/should be considered "array specific keywords"

> "If the INHERIT keyword is not present, nothing should be inferred about
> whether the inherit convention should apply or not because the FITS
> standard says nothing regarding the relationship of keywords in the
> primary header to those in an extension."
>
> This statement is contentious. It is widely understood that FITS HDUs
> must be self-contained, whether or not that is stated explicitly in the
> standard. Basically this statement tries to excuse the exploitation of
> a loophole in the standard.

Here I agree. The sentence does not belong to the convention.
One of the purposes of convention documentation should be to allow the
user/reader to identify a file as respecting a particular convention,
ideally in a univocal manner. Then, IF the file is univocally
identified as obeying a convention, and the reader supports it, it
shall interpret the file according to the convention. If the reader
does not support it, it interprets it as a "plain" file. If the file is
not identified as obeying a convention, ditto.

About self-containment, it seems to me similar e.g. to the requirement
of page independence in postscript (which is good practice but can be
and is violated sometimes). Should the STANDARD say something about
this? (Food for the IAUFWG or the IAUFWG Technical Panel.)

> "When an application reads an extension header with INHERIT = T, it
> should merge the keywords in the current extension with the primary
> header keywords, ..."
>
> What does "merge" mean here?

E.g. creating a data structure with the keywords in the current
extension, another with the keywords in the PHDU, and appending to the
current one the kwds in the PHDU which are not overridden in the
current. Sort of like when configuration files in /etc, /usr/share,
/usr/local and ~user are combined.

> "... with the exclusion of the mandatory keywords, and any COMMENT,
> HISTORY, and blank keywords in the primary header."
>
> If COMMENT and HISTORY keywords are not propagated then there should
> also be some statement that extension headers must contain a full
> complement of COMMENT and HISTORY cards whether or not they duplicate
> those in the primary header.

Commentary keywords are intended as human readable documentation, not
for computer processing. So they'd appear only once.

For the rest it seems normal sound practice to me to report in the PHDU
computer-readable (keywords) information relevant to all extensions,
e.g. the instrument configuration of the observation.
Of course each extension should contain its own necessary data (so
"array related" might include not only NAXIS* BSCALE WCS but also e.g.
TFIELDS and TT* keywords).

If a file contains, say, n image extensions which are respectively
512x512, 512x1024 and 512x768, it seems silly (and incorrect) to me to
have NAXIS1=512 only in the PHDU.

If a file contains n binary tables, each with 5 columns but with names
and types different, it is also silly to have TFIELDS=5 only in the
PHDU (which, BTW, is not a binary table).

If a file contains n binary tables with different number of columns,
but one of them (say the second) is always the same, it is equally
silly having TTYPE2='energy' only in the PHDU.

*****************************************************************************
From: Mark Calabretta
Date: Fri, 13 Apr 2007 09:50:47 +1000

On Thu 2007/04/12 14:19:11 +0200, "Lucio Chiappetti's NoSpam Newsreading account" wrote
in a message to: fitsbits@donar.cv.nrao.edu

>1) I doubt the authors of a convention (which, once again, is NOT a
>   standard) can "require" anything.

I agree, you took me too literally!

>   The convention is all just a recommendation, or even less, a
>   suggestion.

It's not even supposed to be a recommendation, simply a description of
what was done.

Cheers, Mark

*****************************************************************************
Date: Thu, 12 Apr 2007 14:42:03 -1000
From: Maren Purves

Mark Calabretta wrote:
> On Thu 2007/04/12 14:19:11 +0200, "Lucio Chiappetti's NoSpam Newsreading account" wrote
> in a message to: fitsbits@donar.cv.nrao.edu
>
>>1) I doubt the authors of a convention (which, once again, is NOT a
>>   standard) can "require" anything.
>
> I agree, you took me too literally!

Maybe too literally, but if you're going to follow a convention you
should follow all of it (otherwise you may end up with unreducible
data), even if it's not a standard.

Aloha,
Maren
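
*****************************************************************************
Illustration of the "merge" step discussed above

The thread gives two pictures of what "merge" could mean: appending the
extension header to the primary header and keeping the last occurrence
of each keyword, or appending to the extension those primary keywords
that it does not override. Both give the extension's own value
precedence. The Python sketch below follows the second picture. It is
only a sketch under stated assumptions, not the convention's reference
implementation: the name merge_inherited, the representation of a header
as a list of (keyword, value) pairs, and the exact set of keywords
treated as "mandatory" are hypothetical choices made for this example.

```python
import re

# Assumption: primary-header keywords treated as "mandatory" in this sketch.
_MANDATORY = re.compile(r"SIMPLE|BITPIX|NAXIS\d*|EXTEND|END")
# Commentary keywords (including the blank keyword) are never inherited.
_COMMENTARY = {"COMMENT", "HISTORY", ""}


def merge_inherited(primary, extension):
    """Return the effective (keyword, value) pairs for one extension.

    Both arguments are lists of (keyword, value) pairs in header order.
    Primary keywords are appended only when the extension sets
    INHERIT = T and does not already define the keyword itself.
    """
    ext_values = dict(extension)
    if ext_values.get("INHERIT") is not True:
        # Convention not signalled: leave the extension header as it is.
        return list(extension)

    merged = list(extension)
    seen = set(ext_values)
    for key, value in primary:
        if key in _COMMENTARY or _MANDATORY.fullmatch(key):
            continue  # excluded from inheritance
        if key in seen:
            continue  # the extension's own value takes precedence
        merged.append((key, value))
        seen.add(key)
    return merged


# Made-up header contents, for illustration only.
primary = [
    ("SIMPLE", True), ("BITPIX", 8), ("NAXIS", 0), ("EXTEND", True),
    ("TELESCOP", "TEL-A"), ("OBSERVER", "primary-observer"),
    ("HISTORY", "written by acquisition system"),
]
extension = [
    ("XTENSION", "IMAGE"), ("BITPIX", 16), ("NAXIS", 2),
    ("NAXIS1", 512), ("NAXIS2", 512), ("INHERIT", True),
    ("OBSERVER", "extension-observer"),
]
effective = dict(merge_inherited(primary, extension))
```

In this example the extension keeps its own BITPIX, NAXIS and OBSERVER
values, picks up TELESCOP from the primary header, and does not receive
SIMPLE, EXTEND or the HISTORY card. A serial reader would have to keep
the primary list in memory until the last extension has been read, which
is the caching cost noted earlier in the thread.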