NCSA HDF Specifications
DRAFT
January 1993
University of Illinois at Urbana-Champaign

Introduction

Overview

The Hierarchical Data Format (HDF) was designed to make the sharing of scientific data between different people, different projects, and different types of computers easy and self-describing. An extensible header, along with carefully crafted internal layers, provides a system that can grow along with the software that NCSA develops. This chapter provides a brief overview of HDF capabilities and design.

Why HDF?

A fundamental requirement of scientific data management is the ability to access as much information in as many ways and as quickly and easily as possible. To make this possible, there needs to be a data storage and retrieval system that facilitates these capabilities. Specific needs of such a system include the following.

* Support for scientific data and metadata. Scientific data is characterized by a variety of different data types and representations, data sets (including images) that can be extremely large and complex, and the need to attach accompanying attributes, parameters, notebooks, and other metadata.

* Support for a range of hardware platforms. Data can originate on one machine, only to be used later on many different machines. Scientists must be able to access data and metadata on as many hardware platforms as possible.

* Support for a range of software tools. Scientists need a variety of software tools and utilities for easily searching, analyzing, archiving, and transporting the data and metadata. These tools range from a library of routines for reading and writing data and metadata to small utilities that simply display an image on a console, to full-blown database retrieval systems that provide multiple views of thousands of sets of data and metadata.

* Rapid data transfer. Both the size and the dispersion of scientific data sets require that mechanisms exist to get the data from place to place rapidly.

* Extendibility.
As new types of information are generated and new kinds of science are done, a means must be provided to support them.

What is HDF?

The structure of HDF. HDF is a self-describing extensible file format based on the use of tagged objects that have standard meanings. The idea is to store both a known format description and the data in the same file. HDF tags describe the format of the data in the sense that each tag is assigned a specific meaning--one tag is assigned to "Color Palette," another is assigned to "Raster Image," and so on (see Figure 1). A program that has been written to understand a certain list of tag types can scan the file for those tag types and process the data. This program also can ignore any data that is beyond its scope. The set of available data objects encompasses both primary and secondary data (metadata). Most HDF objects are machine- and medium-independent, physical representations of data and metadata.

HDF Tags. HDF is designed with the assumption that we cannot know a priori what types of data objects will be needed in the future, nor can we know how scientists will want to view their data. As new science is done, new types of data objects are needed, and new tags must be created. In order to avoid unnecessary proliferation of tags, and to ensure that all tags are available to potential users who need to share data, a portable public domain library is available that interprets all public tags. The library contains user interfaces designed to provide views of the data that are most natural for users. As we learn more about the way scientists need to view their data, we can add user interfaces that reflect data models consistent with those views.

Types of data and structures.
HDF currently supports the most common types of data and metadata that scientists use, including multidimensional gridded data, 2d and 3d raster images, polygonal mesh data, multivariate datasets, sparse matrices, finite-element data, splines, non-Cartesian coordinate data, and text. In the future there will almost certainly be a need to incorporate new types of data, such as voice and video, some of which might actually be stored on other media than the central file itself. In this sense, it may be desirable to employ the concept of a "virtual file", which functions like a file, but doesn't fit our normal notion of a file as a monolithic sequence of bits stored entirely on a disk or tape somewhere. HDF also makes it possible for the user to include annotations, titles, and specific descriptions of the data in the file, so that files can be archived with human-readable information about the data and its origins. One collection of HDF tags supports a hierarchical grouping structure called vset that allows scientists to organize data objects within HDF files to fit their views of how the objects go together, much as a person in an office or laboratory organizes information in folders, drawers, journal boxes, and on their desktops. *** INSERT FIGURE HERE *** Backward and forward compatibility. An important goal of HDF is to maximize backward and forward compatibility among its interfaces. This is not always achievable, because changes sometimes have to be made to the way data is organized in order to enhance performance, to correct errors, or for other reasons. However, whenever possible, HDF files should not become out of date. For example, suppose a site falls far behind in the HDF standard, so its users can only work with the portions of the specification that are three years old. Users at this site might produce files with their old HDF software, then read them with newer software designed to work with more advanced data files. 
The newer software should still be able to read the old files. Conversely, if the site receives files that contain objects that its HDF software does not understand, it should still be able to list the types of data in the file, and it should still be able to access all of the older types of data objects that it understands, despite the fact that the older types of data objects are mixed in with new kinds of data. In addition, if the more advanced site uses the text annotation facilities of HDF effectively, the files will arrive

Appendix A, "NCSA HDF Tags," presents a list of brief descriptions of the tags assigned at NCSA for general use.

Appendix B, "Header Files," includes the general header files used in compiling all HDF libraries.

Form of Presentation

The material in this manual is presented in text or screen displays.

Text

In explaining various features and commands, this manual often presents a word within a paragraph in italics to indicate that the word is defined within the paragraph. Some portions of this manual refer to other portions that explain related topics. These cross references usually mention the title of sections or chapters enclosed in quotation marks, such as: See Chapter 1, "The Basic Structure of HDF Files."

Screen Displays

Screen displays in this manual are presented in Courier type.

long process of redesigning the lower layers of HDF began. As of this writing, in Summer 1992, we are about to release the first version of HDF that incorporates the new lower layers of HDF.

Use of This Manual

This manual is designed for software developers who are designing applications or routines for use with HDF files and for users who need detailed information about HDF. Users who are interested in using HDF to store or manipulate their data do not normally need the kind of detail presented in this manual.
They should instead consult a user manual, such as "HDF Calling Interfaces and Utilities," "HDF Vset", or perhaps a manual having to do with software that uses HDF. Manual Contents The manual is organized into the following chapters: Chapter 1, "The Basic Structure of HDF Files," introduces and describes the components and organization of Hierarchical Data Format files. Chapter 2, "HDF Software Overview," describes the organization of the software layers that make up the basic HDF library. Chapter 3, "The NCSA HDF General Purpose Interface," describes the HDF modules that make up the general purpose HDF routines, sometimes referred to as the lower layer of HDF. Chapter 4, "Sets and Groups," explains the role of sets and groups in an HDF file. It contains descriptions of raster image sets, scientific datasets, and Vsets. Vsets are covered in more detail in another chapter. Chapter 5, "Annotations," explains how annotations are currently organized in HDF files. Chapter 6, "Number Conversion," describes the HDF module that is used for number conversion. Chapter 7, "Vsets," describes the structure and functioning of the Vset module. Chapter 8, "Portability," describes techniques and conventions used in the HDF code to achieve portability. Chapter 9, "HDF Conventions," presents guidelines regarding the use of HDF that are not discussed elsewhere. 
Table of Contents

Introduction
    Overview
    Why HDF
    What Is HDF
    Some History
    Use of This Manual

Chapter 1  The Basic Structure of HDF Files
    Chapter Overview
    File Header
    Data Object
    Physical Organization of HDF Files
    Sample HDF File

Chapter 2  Software Overview
    Chapter Overview
    Software Layers
    Organization of HDF Software
    Some HDF Conventions

Chapter 3  The NCSA HDF General Purpose Interface
    Chapter Overview
    Introduction
    Overview of the interface
    Function Specifications

Chapter 4  Sets and Groups
    Chapter Overview
    Sets
    Groups
    Raster Image Sets
    Scientific Datasets
    Vsets and Vdatas
    Appendix: The Raster-8 Set

Chapter 5  Annotations
    Chapter Overview
    Types of Annotations
    File Annotations
    Object Annotations
    Getting Reference Numbers for Object Annotations

Chapter 6  Tag Specifications
    Overview
    The HDF Tag Space
    Physical Storage Methods
    Specifications for Supported Tags

Chapter 7  Making HDF Portable
    Chapter Overview
    The HDF Environment
    Organization of Source Files
    Passing Strings Between FORTRAN and C
    Function Return Values between FORTRAN and C
    Differences in Acceptable Routine Names
    ANSI C vs. Old C
    Type Differences
    Access to Library Functions

Figures and Tables

    Figure 0.1  Raster Image Sets in an HDF File
    Figure 1.1  Three Data Objects
    Figure 1.2  A Data Descriptor
    Figure 1.3  Model of a Data Descriptor Block
    Figure 1.4  Sample Data Descriptor Block
    Figure 1.5  Physical Representation of Data Objects
    Figure 2.1  HDF software layers
    Figure 4.1  Physical organization of Sample RIG Groupings
    Figure 5.1  Three SDS Tags with Their Ref Numbers
    Figure 5.2  Displayed Example of SDS, Ref #, and Annotation
    Figure 6.1  Description Record for a Linked Block Element
    Figure 6.2  A Linked Block Table
    Figure 6.3  A Data Block
    Figure 6.4  Description Record for an External Element
    Figure 7.1  Illustration of the sequence of actions involved when a FORTRAN call includes a string as a parameter

    Table 1.1  Parts of a Data Descriptor
    Table 1.2  Summary of the Relationships among Parts of an HDF File
    Table 1.3  Sample Data Objects in an HDF File
    Table 2.1  HDF 3.2 source code modules
    Table 4.1  Tags for Raster Image Sets
    Table 4.2  Additional tags for Raster Image Sets
    Table 4.3  Required tags for SDG
    Table 4.4  Optional Tags for SDG
    Table 4.5  Required tags for NDG
    Table 4.6  Optional Tags for NDG
    Table 4.7  Required Tags for NDG structure that is compatible with SDG structure
    Table 4.8  Tags for Raster-8 Sets
    Table 5.1  HDF Annotation tags
    Table 6.1  Number Type Values
    Table 6.2  Possible Machine Types
    Table 6.3  Possible Tag Types in an RIG
    Table 6.4  Color Format String Values
    Table 6.5  Possible Tag Types in an NDG
    Table 6.6  Possible calibrated data types
    Table 6.7  Possible Tag Types in an SDG
    Table 6.9  Scientific Data Dimension Record Fields

Chapter 1  The Basic Structure of HDF Files

    Chapter Overview
    File Header
    Data Object
    Data Descriptor
    DD Blocks
    Data Element
    Naming and Assigning Tags
    Physical Organization of HDF Files
    Sample HDF File

Chapter Overview

This chapter introduces and describes the components and organization of Hierarchical Data Format (HDF) files.

File Header

The first component of an HDF file is the file header (FH), which takes up the first four bytes in an HDF file. The file header is a signature that indicates that the file is an HDF file. Specifically, it is a 32-bit magic number with the hexadecimal value 0e031301.

NOTE: HDF assumes big-endian order in reading and writing files. On some machines the order of bytes in the file header might be swapped when the header is written to an HDF file, causing these bytes to be written in little-endian order. To maintain portability of HDF files when developing software for such machines, you should counteract this byte-swapping by making sure the bytes are read and written in the exact order shown.

Data Object

The basic building block in an HDF file is the data object, which contains both data and information about the data. A data object has two parts: a 12-byte data descriptor (DD) and a data element. Figure 1.1 shows three examples of data objects. As the names imply, the data descriptor gives information about the data, and the data element is the data itself.
In other words, all data in an HDF file has attached to it information about itself. In this sense, HDF files are examples of self-describing files.

ED. NOTE: Figures are not available in this plain text version of the specification.

Figure 1.1 Three Data Objects

Data Descriptor (DD)

A data descriptor (DD) has four fields: a 16-bit tag, a 16-bit reference number, a 32-bit data offset, and a 32-bit data length. These parts of a DD are depicted in Figure 1.2 and are briefly described in Table 1.1. Explanations of each part appear in the paragraphs following Table 1.1.

*** INSERT FIGURE HERE ***

Table 1.1 Parts of a Data Descriptor

    Part               Description
    tag                designates the type of data in a data element
    reference number   uniquely distinguishes corresponding data element from others with the same tag
    data identifier    tag/ref; uniquely identifies data element
    offset             byte offset of corresponding data element
    length             length of data element

Tag

A tag is the part of a data descriptor that tells what kind of data is contained in the corresponding data element. A tag is actually a 16-bit unsigned integer between 1 and 65535, but every tag is also usually given a name that programs can refer to instead of the number. If a DD has no corresponding data element, the value of its tag is DFTAG_NULL, indicating that no data is present. A tag may never be zero.

Tags are assigned by NCSA as part of the specification of HDF. The following ranges are to be used to guide tag assignment:

    00001 - 32767   reserved for NCSA use
    32768 - 64999   user-definable
    65000 - 65535   reserved for expansion of the format

Appendix A contains full specifications for all currently supported NCSA HDF tags. Appendix B, "Assigned Tag Numbers," contains the current number assignments. See the section "Some HDF Conventions" in the chapter "Software Overview" for more information on allocating tags.
Reference Number

For each occurrence of a tag in an HDF file, a unique reference number is stored with the tag in the data descriptor. Reference numbers are 16-bit unsigned integers. Reference numbers are not necessarily assigned consecutively, so you cannot assume that the actual value of a reference number has any meaning beyond providing a way of distinguishing among objects with the same tag.

Data Identifier

The combination of a tag and its reference number uniquely identifies the corresponding data object in the file. For this reason, the tag/ref combination is sometimes referred to as a data identifier.

Data Offset and Length

The data offset reflects the byte position of the corresponding data element from the start of the file. The length gives the number of bytes occupied by the data element. Offset and length are both 32-bit unsigned integers.

DD Blocks

Data descriptors are stored physically in a linked list of blocks called data descriptor blocks, or DD blocks. The individual components of a data descriptor block are depicted in Figure 1.3. All of the DDs in a DD block are assumed to contain significant data unless they have a tag that is equal to DFTAG_NULL (no data). In addition to its DDs, each data descriptor block has a data descriptor header (DDH). The DDH has two fields--a block size field and a next block field. The block size field is a 16-bit unsigned integer that indicates the number of DDs in the following DD block. The next block field is a 32-bit unsigned integer giving the offset of the next DD block, if there is one. The last DDH in the list contains a 0 in its next block field.

*** INSERT FIGURE HERE ***

Data Element

A data element is the raw data part of a data object. Its basic data type is determined by its tag, but other interpretive information may be required before it can be processed properly.
Each data element is stored as a set of contiguous bytes starting at the offset given in the corresponding DD (see Figure 1.4).(1)

*** INSERT FIGURE HERE ***

Physical Organization of HDF Files

Physically, the file header, DD blocks, and data elements are organized as follows. The file header is followed by the first DD block, which is followed by data elements and, if necessary, more DD blocks. These relationships are summarized in Table 1.2.

There are no rules governing the distribution of DD blocks and data elements within a file, except that the first DD block must follow immediately after the file header. The pointers in the DD headers connect the DD blocks in a linked list, and the offsets in the individual DDs connect the DDs to the data elements. Beyond this basic structure there is no assumed order among the objects in an HDF file.

Table 1.2 Summary of the Relationships among Parts of an HDF File

    Part       Constituents
    HDF File   FH, DD-block, data, DD-block, data, DD-block, data ...
    FH         0x0e031301 (32-bit magic number)
    DD-block   DDH, DD, DD, DD ...
    DDH        number-of-DDs (16 bits), offset-to-next-DD-block (32 bits)
    DD         tag (16 bits), ref (16 bits), offset (32 bits), length (32 bits)

(1) Some HDF software provides the capability of storing objects as a series of linked blocks or external elements, but this occurs at a higher level. At the lowest level each object with a tag/ref is stored contiguously.

Sample HDF File

Consider an HDF file that contains two 400-by-600 8-bit raster images. Typically, such a file might contain the objects described in Table 1.3.
Table 1.3 Sample Data Objects in an HDF File

    Tag   Ref   Data
    FID   1     file identifier: user-assigned title for file
    FD    1     file descriptor: user-assigned block of text describing overall file contents
    IP8   1     image palette (768 bytes)
    ID8   1     x and y dimensions of the 2D arrays that contain the raster images (4 bytes)
    RI8   1     first 2D array of raster image pixel data (x*y bytes)
    RI8   2     second 2D array of pixel data (also x*y bytes)

Assuming, for example, that the size of a DD block is 10 DDs, the physical organization of the contents of the file might be described as shown in Figure 1.5.

Figure 1.5 Physical Representation of Data Objects

    Offset    Contents
    0         FH
    4         DDH (10 0)
    10        DD (FID 1 130 4)
    22        DD (FD 1 134 41)
    34        DD (IP8 1 175 768)
    46        DD (ID8 1 943 4)
    58        DD (RI8 1 947 240000)
    70        DD (RI8 2 240947 240000)
    82        DD (empty)
    94        DD (empty)
    106       DD (empty)
    118       DD (empty)
    130       "sw3"
    134       "solar wind simulation: third try. 8/8/88"
    175       [data for IP8 1]
    943       400, 600
    947       [data for RI8 1]
    240947    [data for RI8 2]

In this instance, the file contains two raster images. The two images have the same dimensions and are to be used with the same palette. So, the same data objects for the palette (IP8) and dimension record (ID8) can be used with both images.

Chapter 2  HDF Software Overview

    Chapter Overview
    Introduction
    Software Layers
    Organization of HDF Software
    Versions and Release Numbers
    ANSI C and Portability
    Modules and Interfaces
    Header Files
    The HDF Test Suite and Examples
    Some HDF Conventions
    Naming and Assigning Tags
    Using Reference Numbers to Organize Data Objects
    Multiple References and File Compaction

Chapter Overview

This chapter contains a description of how HDF software is organized. It also contains some guidelines on writing HDF software.

HDF Software Layers

HDF-based software comes in four basic forms: an HDF interface library, user programs that store and retrieve data in HDF files, HDF command-line utilities, and HDF-based software tools.
The HDF interface library has two types of interfaces: (1) sets of general purpose routines that form the basis of all higher-level HDF development, and (2) application interfaces that support higher level views of data.

User programs access HDF files via calls to the HDF library. User programs are attached to the HDF library when they are compiled and linked.

The HDF command-line utilities are a group of programs that are distributed with the HDF library. The functionality of the command-line utilities ranges from general purpose, such as listing the contents of an HDF file, to special purpose, such as converting data between different HDF data types (e.g., raster images to scientific data sets). In general, the utilities perform data management tasks.

In contrast, HDF-based software tools usually perform data analysis tasks and have polished interactive user interfaces. They include the NCSA Visualization Tool Suite and commercial software packages that use HDF.

HDF software is implemented in layers, as illustrated in Figure 2.1. At the lowest level are the general purpose modules, which perform basic I/O. At the next level are interfaces that reflect commonly used objects such as 8-bit raster images (RIS8) and multidimensional arrays (SDS). At the top layer are users' programs, utilities, and software tools such as the NCSA visualization software.

*** INSERT FIGURE HERE ***

The general purpose interfaces are described in detail in this document. Descriptions of the applications interfaces and command-line utilities can be found in the manual "HDF Calling Interfaces and Utilities." Each HDF-based software tool should have its own manual.

Since the NCSA user community writes programs primarily in C and Fortran, all of the HDF application interfaces developed at NCSA are callable from both C and Fortran programs. Since the general purpose interface is primarily for program development, not for applications, it provides C routines only.
Organization of Software

Versions and Release Numbers

Since HDF is under continual development, new releases are periodically made available. An HDF version number looks like "3.2r1", which means that it is major version 3, minor version 2, release 1. The three parts of a version number have different meanings:

* A new major version number implies that there is some fundamental difference between this code and code with earlier major version numbers. When a new major version is made available, HDF users and developers are strongly encouraged to obtain the new source code and documentation. There will likely be added functionality in successive major versions of the library and possibly some deletion of obsolete code, so some user code may have to be modified to use the new library.

* The meaning of a new minor version number is somewhat less well defined. It essentially means that there is some appreciable difference in the new code which was not deemed drastic enough to warrant a new major version, but is more substantial than a new release number would indicate.

* A new release number implies some bug fixes or other small modifications have been made to the code. Using a new release of the same version of the library will not usually require modification of existing user code.

ANSI C and Portability

In order to provide for easy porting of HDF to new platforms, all versions of the HDF source code from version 3.2 on will be written in ANSI standard C, with special provisions made for non-ANSI compilers. For more information about porting HDF and writing portable HDF-based code, refer to the chapter "Making HDF Portable."

Modules and Interfaces

The HDF distribution contains many source files or modules which can be grouped into families according to their root name. For example, dfp.c, dfpf.c and dfpff.f all share the root name "dfp" and, therefore, all belong to the "dfp" family. In general, each family of source modules represents one HDF applications interface.
Thus, the "dfp" family together represents the HDF Palette Interface. There are a few exceptions to this rule which will be discussed later in this section.

For each interface, there is necessarily one file that contains the C code that provides the basic functionality of that interface. But some interfaces may have one or two additional code modules that provide Fortran callability for the interface. So there are three possible family sizes:

1 file: Modules of this sort are generally not calling interfaces themselves, but rather provide useful support functions for actual calling interfaces. Since they are not meant to be called by any routine outside the HDF library itself, they do not need to be callable from Fortran programs. An example of such a module is hblocks.c.

2 files: Although there are currently no examples of this situation, it is conceivable (and desirable) that some future interface may need only one extra source module to provide Fortran compatibility. If this were to happen, there would only be two source modules for the interface. For instance, dfnew.c and dfnewf.c would make up the "New Interface."

3 files: Most current implementations of Fortran-callable HDF interfaces require the passing of character string arguments to some of their functions. Due to differences in the way C and Fortran represent strings, the passing of strings requires that there be a small amount of special purpose Fortran code written for each function that takes a string argument. For this reason, most Fortran-callable HDF interfaces consist of three source modules: (1) the primary C module, (2) a Fortran-callable C module, and (3) a Fortran module. For example, dfsd.c, dfsdf.c and dfsdff.f make up the Scientific Data Set Interface. dfsd.c contains the basic functionality of the interface, dfsdf.c provides the major part of Fortran callability, and dfsdff.f contains the special purpose Fortran code that allows the passing of character string arguments.
Header Files

In addition to the source code modules discussed above, some interfaces also have C header files associated with them that are meant to be included by C applications programmers with the "#include" preprocessor directive. They contain some useful constants and data structures for interaction with the interface from C programs. The header files can be identified by the same name as the root name for the rest of the family with the ".h" extension added. For example, dfsd.h is the header file for the Scientific Data Set Interface.

Of particular importance among the header files are hdf.h and hdfi.h. hdf.h is the C header file that must be included by any program that calls the HDF library. It contains all the symbolic constants and public data structures that are needed to use HDF. hdfi.h contains specific portability information about each platform on which HDF is supported. It is automatically included in programs when hdf.h is included, so programmers need not explicitly include it. For more information on hdfi.h and other portability issues, refer to the chapter "Making HDF Portable."

Table 2.1 shows all of the source code modules and header files grouped into families for HDF 3.2.
Table 2.1 HDF 3.2 source code modules

    Family                  Files
    general headers         hdf.h, hdfi.h, hproto.h, dfivms.h
    general purpose         hfile.c, hfilef.c, hfileff.f, hkit.c, hblocks.c, hextelt.c, herr.c, herrf.c, hfile.h, herr.h
    grouping (non-Vset)     dfgroup.c, dfgroup.h
    utilities               dfutil.c, dfutilf.c, dfutilff.f, dfutil.h
    Vsets                   vg.c, vgf.c, vgff.f, vfp.c, vgi.h, vio.c, vconv.c, vparse.c, vrw.c, vsfld.c, vg.h, vproto.h
    old general purpose     dfstubs.c, dff.c, dfff.f, df.h, dfi.h, dfstubs.h
    8/24 bit raster         dfr8.c, dfr8f.c, dfr8ff.f, df24.c, df24f.c, df24ff.f
    general raster          dfgr.c, dfgr.h, dfcomp.c, dfimcomp.c, dfrig.h
    palettes                dfp.c, dfpf.c, dfpff.f
    scientific data sets    dfsd.c, dfsdf.c, dfsdff.f, dfsd.h
    annotations             dfan.c, dfanf.c, dfanff.f, dfan.h
    special FORTRAN         constants.f, functions.f

The HDF Test Suite and Examples

In addition to the source code for the HDF library, versions 3.2 and higher will have an available suite of test programs. There are at least two test programs for most interfaces: one for the C version and one for the Fortran-callable version. Some interfaces have more than two test programs to test special features of that interface and some have only one test program, since they only provide C-callability.

Every effort will be made to ensure that the test programs provide a thorough and accurate assessment of the health of the HDF library. Although it is hoped that the test suite will greatly improve the reliability of HDF code, it is almost inevitable that some parts of the code will be untested. Therefore, no guarantees can be made on the basis of test suite performance.

There is also a set of example programs to help users write HDF programs. They illustrate some of the common ways in which users program with HDF.

Some HDF Conventions

The specification of HDF described in the previous chapter is not sufficient to guarantee its success. It is also important for users to adhere to certain conventions in using HDF.
Guidelines in the use of HDF are implicit in many discussions in other sections of this document, and others are presented in the manual "HDF Calling Interfaces and Utilities." Guidelines not covered elsewhere are introduced in this section. Naming and Assigning Tags Tags that are to be made available to a general population of HDF users should be assigned and controlled by NCSA. Tags of this type are given numbers in the range 1-32,767. If you have an application that fits this criterion, contact NCSA at the address listed on the README page at the beginning of this manual and specify the tags you would like. For each tag, your specifications should include a suggested name, information about the type and structure of the data that the tag will refer to, and information about how the tag will be used. Your specifications should be similar to those contained in Appendix A. NCSA will assign you a set of tags for your application and include your tag descriptions in its documentation. Tags in the range 32,768-64,999 are user-definable. That is, you can assign them for any private application. Of course, if you use tags in this range you need to be aware that they may conflict with other people's private tags. Using Reference Numbers to Organize Data Objects The HDF library itself uses reference numbers solely for the purpose of distinguishing between different objects with the same tag. While application programmers may find it convenient to impart some meaning to reference numbers, they should be forewarned that the HDF library will be ignorant of any such meaning. In other words, any meaning attached to reference numbers exists only at the application program or software tool level. Some users have used reference numbers to indicate how objects should be grouped by considering all objects with the same reference number to be part of the same group. This practice is not recommended. 
Instead, if object grouping is desired it is recommended that you use either the simple grouping procedures used by the SDS, RIS8, and RIS24 applications (supported by the routines in dfgroup.c), or the more general (and more complex) Vset structures.

Another possible use of reference numbers is for keyed access to HDF objects. An HDF data identifier (tag/ref) provides a unique identifier for any HDF object within a file, and hence could be used as a primary key for that object. One could keep a table of data identifiers as a way of providing random access to HDF objects.

Reference numbers might also be used to impose an ordering on HDF objects. Once again, because the assignment scheme for reference numbers in HDF files does not guarantee any order, caution is advised in these uses of reference numbers.

Multiple References

Multiple references to a single data element are quite common in HDF. The general purpose routine Hdupdd generates a new reference to data that is already pointed to by another DD. If Hdupdd is used several times, there could be several DDs that point to the same data element.

It is important to note that when a multiply-referenced data element is deleted or moved, the various DDs that previously pointed to the data element are not automatically deleted or adjusted to point to the data element in its new location. Consequently, each DD to be deleted or moved should be checked for multiple references and handled as the programmer sees fit.

Chapter 3  The NCSA HDF General Purpose Interface

    Chapter Overview
    Introduction
    Overview of the Interface
    Function Specifications
    Opening and Closing Files
    Finding Tags, Refs, and Element Lengths
    Reading and Writing Entire Data Elements
    Reading and Writing Part of a Data Element
    Manipulating Data Descriptors (DDs)
    Creating Special Data Elements
    Development Routines
    Error Reporting

Chapter Overview

This chapter contains a detailed description of the routines that make up the general purpose HDF interface.

Introduction

NCSA supports interfaces for HDF users--both high level interfaces to support certain application areas, such as image processing, and low level general purpose interfaces for performing basic operations on HDF files. These interfaces are written in C only but most functions are typically accessible from Fortran.

The routines in the general purpose interface enable you to build and manipulate HDF objects of any type, including those of your own invention. All HDF applications developed at NCSA use these routines as their basic building blocks.

The routines described in this chapter represent a second set of general purpose routines. All HDF applications prior to HDF 3.2 (released in June 1992) used an earlier set of general purpose routines. These low level general purpose routines have been changed to allow for better functionality. Old routines will still be emulated but at a cost of reduced functionality. Users are strongly advised to use the new interface.

The new lower layer, first used with HDF Version 3.2, incorporates the following improvements over its predecessor:

* More consistent data and function types.
* An error handling module that supports more meaningful and extensive reporting of errors.
* Simplification of key lower level functions.
* Simplified techniques for facilitating portability.
* Support for alternate forms of physical storage, such as linked blocks storage, and storage of the data portion of an object in an external file.
* A version tag indicating which version of the HDF library last changed an HDF file.
* Support for simultaneous access to multiple files.
* Support for simultaneous access to multiple objects within a single file.

The previous lower layer is called the "DF layer", because all routines began with the letters "DF", as in "DFopen" and "DFclose." The new layer is called the "H layer" because all routines begin with the letter "H" (Hopen, Hclose, Hwrite, etc.).
The source modules that implement these changes can be found in files that begin with the letter "h". Also, the number of basic source modules has changed, and now includes:

  hfile.c     basic I/O
  herr.c      error-handling
  hkit.c      general purpose routines
  hblocks.c   to support linked block physical storage
  hextelt.c   to support external storage of HDF data

Overview of the Interface

Following is a listing of the public functions that can be found in the general purpose interface. This section provides specifications and descriptions of these routines.

Opening and Closing HDF Files

These calls are used to open and close HDF files.

  Hopen
      Provides an access path to an HDF file. It also reads into memory all of the DD blocks in the file.
  Hclose
      Closes the access path to a file.

Locating Elements for Access and Getting Information

These routines make it possible to locate elements or find out other information. Except for Hendaccess, they initialize the element that they locate and return an access id that is used in later references to the data element. Calls to them can include wild cards so that one can search for unknown tags and refs.

  Hstartread
      Locates an existing data element with matching tag/ref and returns an access id for reading it.
  Hnextread
      Continues the search with the same access id.
  Hstartwrite
      Allows writing to the object with the supplied tag/ref. If the object exists, the object will be modified, otherwise it is created.
  Hendaccess
      Disposes of access id for tag/ref.
  Hinquire
      Returns access information about a data element.
  Hishdf
      Determines whether a file is an HDF file.
  Hnumber
      Returns the number of occurrences of a specified data identifier (tag/ref) in a file.
  Hgetlibversion
      Returns version information for the current HDF library.
  Hgetfileversion
      Returns version information for an HDF file.

Reading and Writing Entire Data Elements

There are two sets of routines for reading and writing data elements.
The set of routines described here is used to store and retrieve entire data elements. A second set of routines, described in the next section, may be used if you wish to access only part of a data element at a time.

  Hputelement
      Adds or replaces elements in a file.
  Hgetelement
      Obtains the data referred to by the tag/ref combination that is passed to it.

Reading and Writing Part of a Data Element

The second set of routines for reading and writing data elements makes it possible to read or write all or part of a data element, in contrast to the routines described above which can only read or write an entire element. One of the access routines Hstartread or Hstartwrite must be called before calling these routines.

  Hwrite
      Appends data to a data element. It starts at the last position left by a Hwrite or Hseek command, writes up to a specified number of bytes, then leaves the access pointer at the end of the data written.
  Hread
      Reads a portion of a data element. It starts at the last position left by a Hread or Hseek command and reads any data that remains in the element up to a specified number of bytes.
  Hseek
      Sets the access pointer to an offset within a data element. The next time Hread or Hwrite is called, the access occurs from the new position. The location to seek to can be specified as an offset from the current location or from the start of the element.

Manipulating Data Descriptors (DDs)

These routines perform operations on DDs without doing anything with the data to which the DDs refer.

  Hdupdd
      Is used to generate new references to data that is already referenced from somewhere else.
  Hdeldd
      Deletes a tag/ref from the list of DDs.
  Hnewref
      Returns the next available reference number for the HDF file.

Creating Special Data Elements

HDF 3.2 introduces two alternate methods of physical storage for HDF objects. Previously, all of the objects in an HDF "file" had to be in the same file and any given object had to be contiguous.
This last requirement caused many problems, especially with regard to appending to existing objects. Objects needed to be deleted and rewritten to the end of the file in order to append to them.

The two new storage methods are "linked blocks" and "external elements". Linked blocks allow elements in a single HDF file to be non-contiguous. External elements allow a single HDF object to be stored in an external file. It is not currently possible to have a single object (such as a very large data set) stored in multiple files. Nor is it possible to have multiple objects stored in an "external" file.

Special data elements can be accessed with the same routines as for normal data elements once they are created. These routines create special data elements.

  HLcreate
      Creates a new linked block special data element.
  HXcreate
      Creates a new external file special data element.

Both of these routines have two modes of operation. For example, calling HLcreate with a tag and ref which do not exist in a file will create a new element with the given tag and ref that will be stored as linked blocks. On the other hand, if the tag/ref pair already exists in the file, the referenced object is "promoted" to being stored as linked blocks. All data which had been stored in the object before the promotion is retained. HXcreate behaves similarly.

Development Routines

The HDF library provides a number of "developer" level routines that are meant to simplify the task of writing HDF applications. Most of these routines mirror basic C library functions which are, unfortunately, not always completely portable in their library form.

  HDgettagname
      Return a pointer to a text string describing a given tag.
  HDgetspace
      Allocate space.
  HDfreespace
      Free space.
  HDstrncpy
      Copy a string from one location to another up to a given number of characters.

Error Reporting

The HDF library now provides a much more robust error reporting scheme. Previously, only a single error value could be returned to the user.
There is now the notion of an error stack. This allows for more of the context to be known when trying to decipher a problem.

  HEprint
      Print out all of the errors on the error stack to a specified file.
  HEclear
      Clear the error stack.
  HERROR
      Macro to report an error. This will push the error type, file name, line number and name of the function reporting the error.
  HEreport
      Add a text string to the description of the most recently reported error. Only a single text string may be supplied per error.

The only problem with the error module is that standard C does not have any way for the code inside a function to know the name of the function. Therefore, in order to use the macro HERROR to report errors, there must exist a variable FUNC which points to a string containing the name of the reporting function.

Other

  Hsync
      Synchronize stored version of HDF file with image in memory.

Function Specifications

Opening and Closing Files

Hopen

  int32 Hopen(char *path, int access, int16 ndds)

  path      IN: Name of file to be opened
  access    IN: Access mode; one of the DFACC_ codes described below (the codes must not be combined)
  ndds      IN: Number of dds in a block if this file needs to be created

Purpose: Provides an access path to an HDF file. It also reads into primary memory all of the DD blocks in the file.
Returns: On success returns file id, on failure returns FAIL.
Description: Opens an HDF file.

Interpretations of access: HDF provides several constants for use as access privilege codes. Below is a list of these codes and their meanings. It is important to note that these constants are NOT bitflags and should NOT be or'd together to combine access modes. Doing so may cause odd behavior and, in some cases, loss of data.

Recommended:
  DFACC_READ:   Open for read only. If file does not exist, error.
  DFACC_RDWR:   Open for read/write. If file does not exist, create it.
  DFACC_CREATE: Force creation. If file exists, delete it, then open a new file for read/write.
                (in the spirit of UNIX "clobber")
Others:
  DFACC_ALL:    Same as DFACC_RDWR.
  DFACC_WRITE:  Same as DFACC_RDWR.

On successful exit,
* File_rec members are filled in.
* File is opened with the relevant permission.
* Information about dd's is set up in memory.
For a new file, in addition,
* The file headers and initial information are set up.

Hclose

  intn Hclose(int32 id)

  id        IN: the file id of the file to be closed

Purpose: Closes the access path to the file.
Returns: SUCCEED (0) if successful and FAIL (-1) if failed.
Description: Id is first validated. If valid, the function closes the access path to the file. If there are still access elements attached to the file, the error DFE_OPENAID is returned and the file is not closed. This is a fairly common error when developing new interfaces. See the discussion of Hendaccess below for hints on how to debug this problem.

Locating Elements for Access and Getting Information

Hstartread

  int32 Hstartread(int fileid, int tag, int ref)

  fileid    IN: id of file to attach access element to
  tag       IN: tag to search for
  ref       IN: ref to search for

Purpose: Locate an existing data element with matching tag/ref and return a descriptor for reading it.
Returns: Id of access element if successful, otherwise FAIL (-1).
Description: Searches the DD's for a particular tag/ref combination. Wildcards can be used for tag or ref (DFTAG_WILDCARD, DFREF_WILDCARD) and they match any values. Searching on wildcards begins from the beginning of the DD list. If the search is successful, an access element is created, attached to the file, and positioned to the start of that tag/ref; otherwise it is an error.

Hnextread

  intn Hnextread(int32 access_id, int16 tag, int16 ref, int origin)

  access_id IN: Id of a READ access elt
  tag       IN: the tag to search for
  ref       IN: ref to search for
  origin    IN: from where to start searching

Purpose: Locate and position a read access id on next occurrence of tag/ref.
Returns: SUCCEED (0) if successful and FAIL (-1) otherwise.
Description: Searches for the "next" DD that fits the tag/ref. Wildcards apply. If origin is DF_START, search from start of DD list; if origin is DF_CURRENT, search from current position. Searching from the end of the file via DF_END is not yet implemented. If the search is successful, then the access element is positioned at the start of that tag/ref; otherwise, the access_id is not modified.

Hstartwrite

  int32 Hstartwrite(int fileid, int tag, int ref, long len)

  fileid    IN: Id of file to write to
  tag       IN: tag to write to
  ref       IN: ref to write to
  len       IN: the length of the data element

Purpose: Creates a data element with matching tag/ref, or opens an existing one for modification.
Returns: Id of access element if successful and FAIL otherwise.
Description: Set up an access element to write out a data element. The DD list of the file is searched first. If the tag/ref is found, the data element is NOT replaced; rather, it is then possible to modify the existing data. If an object with the corresponding tag and ref does not exist, a new one is created.

Hendaccess

  int32 Hendaccess(int access_id)

  access_id IN: id of access element to dispose of

Purpose: Disposes of descriptor for tag/ref.
Returns: SUCCEED (0) if successful, FAIL (-1) otherwise.
Description: Used to dispose of an access element. There is only a finite number of access elements allowed to be active at a time. Therefore, it is very important to call Hendaccess whenever you are done using an element.

When developing new interfaces, we have found that a fairly common mistake is to not call Hendaccess for all of the elements accessed. When this happens, Hclose will return FAIL, and the dump of the error stack (see HEprint, below) will tell how many access elements are still active. This is a rather difficult problem to debug, as the low levels of the HDF library have no record of who opened an access element, or where, and forgot to release it.
It's tedious, but the most effective means we have found to debug this problem is to annotate the locations where the `attached' count of a file record is changed (there are a couple of places in hfile.c and a few in hblocks.c and hextelt.c).

Hinquire

  intn Hinquire(int access_id, int32 *pfile_id, uint16 *ptag, uint16 *pref, int32 *plength, int32 *poffset, int32 *pposn, int *paccess, int *pspecial)

  access_id IN:  Id of an access elt
  pfile_id  OUT: file id
  ptag      OUT: tag of the element pointed to
  pref      OUT: ref of the element pointed to
  plength   OUT: length of the element pointed to
  poffset   OUT: offset of elt in the file
  pposn     OUT: position pointed to within the data elt
  paccess   OUT: the access type of this access elt
  pspecial  OUT: special code

Purpose: Returns access information of a data element.
Returns: SUCCEED (0) if the access elt points to some data element, otherwise FAIL (-1).
Description: Inquire statistics of the data element pointed to by access element. If a piece of information is not needed, it is possible to send NULL in for that value. There is a set of convenience macros for calls to Hinquire (HQueryposition, HQuerylength, etc.) defined in hdf.h.

Hishdf

  int32 Hishdf(char *path)

  path      IN: name of file

Purpose: Determine if a file is an HDF file.
Returns: TRUE (non-zero) if file is HDF, FALSE (0) otherwise.
Description: The decision of whether a file is an HDF file or not is based solely on the magic number stored in the first four bytes of an HDF file. It is possible that Hishdf will identify a file as an HDF file but Hopen will be unable to open the file (for example if the DD list in the file is corrupted).

Hnumber

  int Hnumber(int32 file_id, uint16 tag)

  file_id   IN: file id
  tag       IN: tag to be counted

Purpose: Find the number of occurrences of a tag in a file.
Returns: The number of instances of a tag in a file.
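
The count that Hnumber reports can be pictured as a scan over the DD list held in memory. The following is a minimal sketch of that idea, not the library's implementation: the type `count_dd`, the function `sketch_count_tag`, the wildcard value 0, and the tag numbers in the sample list are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory data descriptor; only the fields the count needs. */
typedef struct {
    uint16_t tag;
    uint16_t ref;
} count_dd;

/* Wildcard value assumed for this sketch; the library defines its own. */
#define SKETCH_TAG_WILDCARD 0

/* Count the DDs whose tag matches; the wildcard matches every DD. */
int sketch_count_tag(const count_dd *dds, size_t ndds, uint16_t tag)
{
    int n = 0;

    assert(dds != NULL || ndds == 0);
    for (size_t i = 0; i < ndds; i++) {
        if (tag == SKETCH_TAG_WILDCARD || dds[i].tag == tag)
            n++;
    }
    return n;
}

/* Illustrative DD list; the tag numbers are placeholders, not a real file. */
const count_dd sketch_dds[] = {
    {201, 1}, {300, 1}, {302, 1}, {300, 2}, {302, 2}, {306, 1}, {306, 2}
};
const size_t sketch_ndds = sizeof sketch_dds / sizeof sketch_dds[0];
```

With this list, counting tag 302 finds two entries, and counting with the wildcard finds all seven.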

Hgetlibversion

  Hgetlibversion(uint32 *majorv, uint32 *minorv, uint32 *release, char string[])

  majorv    OUT: major version number
  minorv    OUT: minor version number
  release   OUT: release number
  string    OUT: informational text string (80 chars)

Purpose: Get version information for current HDF library.
Returns: SUCCEED (0).
Description: Returns the version of the HDF library. The version information is statically compiled into the HDF library, so it is not necessary to have any open files for this function to execute.

Hgetfileversion

  Hgetfileversion(uint32 file_id, uint32 *majorv, uint32 *minorv, uint32 *release, char string[])

  file_id   IN:  handle of file
  majorv    OUT: major version number
  minorv    OUT: minor version number
  release   OUT: release number
  string    OUT: informational text string (80 chars)

Purpose: Get version information for an HDF file.
Returns: SUCCEED (0) if successful and FAIL (-1) if failed.
Description: Returns the HDF version number stored in the given file. It is still an open question as to what exactly the version number of a file should mean, so we recommend that user code not call this function.

Reading and Writing Entire Data Elements

Hputelement

  int Hputelement(int fileid, int tag, int ref, char *data, long length)

  fileid    IN: Id of file
  tag       IN: tag of data element to put
  ref       IN: ref of data element to put
  data      IN: pointer to buffer
  length    IN: length of data

Purpose: Add or replace element in a file.
Returns: SUCCEED (0) if successful and FAIL (-1) otherwise.
Description: Writes a data element or replaces an existing data element in an HDF file. Uses Hwrite and its associated routines.
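
The add-or-replace behavior described for Hputelement, and the whole-element read that pairs with it, can be modeled with a small in-memory table. This is only a sketch under stated assumptions: `sketch_file`, `sketch_putelement`, `sketch_getelement`, and the fixed size limits are hypothetical names and values chosen for illustration, and nothing here touches a real HDF file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ELTS 8     /* arbitrary limits for the sketch */
#define SKETCH_MAX_LEN  64
#define SKETCH_SUCCEED  0
#define SKETCH_FAIL     (-1)

typedef struct {
    int      used;
    uint16_t tag, ref;
    int32_t  length;
    char     data[SKETCH_MAX_LEN];
} sketch_elt;

typedef struct {
    sketch_elt elts[SKETCH_MAX_ELTS];
} sketch_file;

/* Return the element with this tag/ref, or NULL if none exists. */
sketch_elt *sketch_find(sketch_file *f, uint16_t tag, uint16_t ref)
{
    for (int i = 0; i < SKETCH_MAX_ELTS; i++) {
        sketch_elt *e = &f->elts[i];
        if (e->used && e->tag == tag && e->ref == ref)
            return e;
    }
    return NULL;
}

/* Add a whole element, or replace the data of an existing tag/ref. */
int sketch_putelement(sketch_file *f, uint16_t tag, uint16_t ref,
                      const char *data, int32_t length)
{
    assert(f != NULL && data != NULL);
    if (length < 0 || length > SKETCH_MAX_LEN)
        return SKETCH_FAIL;

    sketch_elt *e = sketch_find(f, tag, ref);
    if (e == NULL) {
        for (int i = 0; i < SKETCH_MAX_ELTS && e == NULL; i++) {
            if (!f->elts[i].used)
                e = &f->elts[i];
        }
        if (e == NULL)
            return SKETCH_FAIL;           /* no free slot left */
        e->used = 1;
        e->tag = tag;
        e->ref = ref;
    }
    memcpy(e->data, data, (size_t)length);
    e->length = length;
    return SKETCH_SUCCEED;
}

/* Copy the whole element into buf; the caller must size buf adequately. */
int sketch_getelement(sketch_file *f, uint16_t tag, uint16_t ref, char *buf)
{
    sketch_elt *e = sketch_find(f, tag, ref);
    if (e == NULL)
        return SKETCH_FAIL;
    memcpy(buf, e->data, (size_t)e->length);
    return SKETCH_SUCCEED;
}
```

As in the description of Hgetelement, the read side trusts the caller's buffer to be large enough, which is why a caller would normally find out the element length first.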

Hgetelement

  int Hgetelement(int file_id, int tag, int ref, char *data)

  file_id   IN:  Id of the file to read from
  tag       IN:  tag of data element to read
  ref       IN:  ref of data element to read
  data      OUT: buffer to read into

Purpose: Obtains the data referred to by the tag/ref combination that is passed to it.
Returns: SUCCEED (0) if successful, FAIL (-1) otherwise.
Description: Reads in a data element from an HDF file and puts it into the buffer pointed to by data. The space allocated for the buffer is assumed to be large enough.

Reading and Writing Part of a Data Element

Hread

  int32 Hread(int access_id, long length, char *data)

  access_id IN:  Id of READ access element
  length    IN:  length of segment to read in
  data      OUT: pointer to data array to read to

Purpose: Read a portion of a data element.
Returns: Length of segment actually read in if successful and FAIL otherwise.
Description: Reads in the next segment in the data element pointed to by the access element. It starts at the last position left by a Hread or Hseek command and reads any data that remains in the element up to a specified number of bytes. If the data element is too short then it only reads to the end of the data element.

Hwrite

  int32 Hwrite(int access_id, long len, char *data)

  access_id IN: Id of WRITE access element
  len       IN: length of segment to write
  data      IN: pointer to data to write

Purpose: Write next data segment to data element.
Returns: Length of segment successfully written, FAIL (-1) otherwise.
Description: Writes the data to the data element starting where the last Hwrite or Hseek stopped. It writes up to a specified number of bytes, then leaves the write pointer at the end of the data written. If the space reserved is less than the length to write, then only as much as can fit is written. It is the responsibility of the user to ensure that no two access elements are writing to the same data element.
It is possible to interlace writes to more than one data element in the same file though.

Hseek

  intn Hseek(int32 access_id, long offset, int origin)

  access_id IN: Id of access element
  offset    IN: offset to seek to
  origin    IN: position to seek from by offset, 0: from beginning; 1: current position; 2: end of data element

Purpose: Set the access pointer to an offset within a data element. The next time Hread or Hwrite is called, the read or write occurs from the new position.
Returns: FAIL (-1) if fail, SUCCEED (0) otherwise.
Description: Sets the position of an access element in a data element so that the next Hread or Hwrite will start from that position. origin determines the position to which the offset is added. This routine fails if the access element is not associated with any data element or if the position sought is outside of the data element. Seeking from the end of a data element is not currently supported.

Manipulating Data Descriptors

Hdupdd

  int Hdupdd(int32 file_id, uint16 tag, uint16 ref, uint16 old_tag, uint16 old_ref)

  file_id   IN: Id of file
  tag       IN: tag of new data descriptor
  ref       IN: ref of new data descriptor
  old_tag   IN: tag of data descriptor to duplicate
  old_ref   IN: ref of data descriptor to duplicate

Purpose: Generate new references to data that is already referenced from somewhere else.
Returns: SUCCEED (0) if successful, FAIL (-1) otherwise.
Description: Duplicates a data descriptor so that the new tag/ref points to the same data element pointed to by the old tag/ref.

Hdeldd

  int Hdeldd(int file_id, int tag, int ref)

  file_id   IN: Id of file
  tag       IN: tag of data descriptor to delete
  ref       IN: ref of data descriptor to delete

Purpose: Delete a tag/ref from the list of DDs.
Returns: SUCCEED (0) if successful, FAIL (-1) otherwise.
Description: Deletes a data descriptor of tag/ref from the dd list of the file. This routine is unsafe and may leave a file in a condition that is not usable by some routines. Use with care.

Hnewref

  uint16 Hnewref(int32 file_id)

  file_id   IN: id of file

Purpose: Return the next available ref for HDF file.
Returns: The ref number if successful, 0 otherwise.
Description: Returns a ref number that can be used with any tag to produce a unique tag/ref. Successive calls to Hnewref will generate a strictly increasing sequence until the highest possible ref has been returned, then Hnewref will return unused refs starting from 1.

Creating Special Data Elements

HLcreate

  int32 HLcreate(int32 file_id, uint16 tag, uint16 ref, int32 block_length, int32 number_blocks)

  file_id        IN: Id of file
  tag            IN: tag of new data descriptor
  ref            IN: ref of new data descriptor
  block_length   IN: length of blocks to be used
  number_blocks  IN: number of blocks to use per linked block record

Purpose: Create a new linked block special data element.
Returns: Access id for special data element if successful, otherwise FAIL (-1).
Description: Appending to existing elements has been a problem in HDF in the past because HDF objects were required to be stored contiguously. When appending, the HDF library forced the user to delete the existing element and rewrite it at the end of the file. With HDF 3.2 we have added the concept of linked blocks, which allow unlimited appending to existing elements without copying over existing data. Initially, a table is set up to accommodate number_blocks linked blocks for this object. Each block has size block_length bytes. If an existing object is being promoted, block_length does not have to be the same size as the original element.

This routine can be used either to create an object with the given tag/ref as a linked block element, or to promote an existing element to be stored with linked blocks. This routine will return an active access id with write permission to the linked block element.
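
The point of linked blocks is that an append only ever fills the last block or allocates a new one; data already written stays where it is. The following sketch models that idea in memory with a plain chain of fixed-size blocks. It is a simplification under stated assumptions: it omits the table of number_blocks entries per linked block record, and the names `lb_elt`, `lb_init`, `lb_append`, `lb_read`, and `lb_free` are hypothetical, not library routines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct lb_block {
    struct lb_block *next;
    char            *data;          /* block_length bytes */
} lb_block;

typedef struct {
    int32_t   block_length;
    int32_t   length;               /* total bytes stored so far */
    int32_t   nblocks;
    lb_block *head, *tail;
} lb_elt;

void lb_init(lb_elt *e, int32_t block_length)
{
    assert(e != NULL && block_length > 0);
    e->block_length = block_length;
    e->length = 0;
    e->nblocks = 0;
    e->head = e->tail = NULL;
}

/* Append len bytes; existing blocks are never moved or copied. */
int32_t lb_append(lb_elt *e, const char *src, int32_t len)
{
    int32_t written = 0;
    while (written < len) {
        int32_t used = e->length % e->block_length;  /* bytes in last block */
        if (used == 0) {            /* no block yet, or last block is full */
            lb_block *b = malloc(sizeof *b);
            if (b == NULL)
                return -1;
            b->data = malloc((size_t)e->block_length);
            if (b->data == NULL) {
                free(b);
                return -1;
            }
            b->next = NULL;
            if (e->tail != NULL)
                e->tail->next = b;
            else
                e->head = b;
            e->tail = b;
            e->nblocks++;
        }
        int32_t room = e->block_length - used;
        int32_t n = (len - written < room) ? len - written : room;
        memcpy(e->tail->data + used, src + written, (size_t)n);
        e->length += n;
        written += n;
    }
    return written;
}

/* Read up to len bytes starting at offset, walking across blocks. */
int32_t lb_read(const lb_elt *e, int32_t offset, char *dst, int32_t len)
{
    if (offset < 0 || offset > e->length || len < 0)
        return -1;
    if (len > e->length - offset)
        len = e->length - offset;   /* stop at the end of the element */
    const lb_block *b = e->head;
    for (int32_t skip = offset / e->block_length; skip > 0; skip--)
        b = b->next;
    int32_t pos = offset % e->block_length, done = 0;
    while (done < len) {
        int32_t room = e->block_length - pos;
        int32_t n = (len - done < room) ? len - done : room;
        memcpy(dst + done, b->data + pos, (size_t)n);
        done += n;
        pos = 0;
        b = b->next;
    }
    return done;
}

void lb_free(lb_elt *e)
{
    lb_block *b = e->head;
    while (b != NULL) {
        lb_block *next = b->next;
        free(b->data);
        free(b);
        b = next;
    }
    e->head = e->tail = NULL;
    e->length = 0;
    e->nblocks = 0;
}
```

The trade-off this illustrates is the one the description names: appends become cheap, at the cost of reads that may have to cross block boundaries.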

HXcreate

  int32 HXcreate(int32 file_id, uint16 tag, uint16 ref, char *extern_file_name)

  file_id           IN: file record id
  tag, ref          IN: tag/ref of the special data element to create
  extern_file_name  IN: name of external file to use as data element

Purpose: Create a new external file special data element.
Returns: Access id for special data element if successful, otherwise FAIL (-1).
Description: This routine is used to create a new element in an external file or promote an existing element to be in an external file. If an existing element is to be promoted, it is deleted from the original file and copied over into the new external file. Distributing a single object over multiple external files is currently not supported. In addition, it is not possible to place multiple objects into the same external file. This routine will return an active access id with write permission to the external element.

Development Routines

HDgettagname

  char *HDgettagname(uint16 tag)

  tag       IN: tag to look up

Purpose: Get a meaningful description of a tag.
Returns: A pointer to a string describing this tag or NULL if the tag is unknown.
Description: To reduce the amount of duplicated code, this routine can be used to map a tag to a character string containing the name of the tag. If the tag is unknown, NULL is returned, as programs may have different ways of dealing with unknown tags. For formatting purposes, the string returned by this routine is guaranteed to be 30 characters or less.

HDgetspace

  void *HDgetspace(uint32 qty)

  qty       IN: number of bytes to allocate

Purpose: Allocate space.
Returns: Pointer to space that was allocated.
Description: This routine is very platform-dependent. It uses an appropriate allocation routine on the local machine to get space.

HDfreespace

  void *HDfreespace(void *ptr)

  ptr       IN: pointer to previously-allocated space to be freed

Purpose: Free space.
Returns: NULL.
Description: It uses an appropriate routine on the local machine to free space.
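
One consequence of HDfreespace returning NULL is that a caller can release a buffer and clear its pointer in a single assignment. The sketch below shows that idiom with stand-in wrappers over the standard C allocator. The names `sketch_getspace` and `sketch_freespace` are hypothetical, and mapping them onto malloc and free is an assumption; the text above only says that the real routines pick whatever allocator suits the local machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for an allocation wrapper; here it simply defers to malloc. */
void *sketch_getspace(uint32_t qty)
{
    assert(qty > 0);    /* malloc(0) is implementation-defined, so avoid it */
    return malloc((size_t)qty);
}

/* Frees ptr and returns NULL so the caller can clear its pointer at once. */
void *sketch_freespace(void *ptr)
{
    free(ptr);          /* free(NULL) is a no-op in standard C */
    return NULL;
}
```

A typical use would be `buf = sketch_freespace(buf);`, which leaves `buf` safely NULL after the release.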

HDstrncpy

  char *HDstrncpy(register char *dest, register char *source, int32 len)

  dest      OUT: pointer to area to copy string to
  source    IN:  pointer to area to copy string from
  len       IN:  maximum number of bytes to copy

Purpose: Copy a string with some maximum length.
Returns: Address of dest.
Description: This function creates a string in dest that is at most `len' characters long. The `len' characters include the NULL terminator, which must be added for historical reasons. Hence, if you have the string 'Foo\0' you must call this copy function with len = 4.

Error Reporting

HEprint

  void HEprint(FILE *stream, int level)

  stream    IN: stream to print error messages on
  level     IN: level of the error stack to print

Purpose: Print out information on the error stack.
Returns: No return value.
Description: This routine will print out information on reported errors. If level is zero all of the errors currently on the error stack are printed. Output of this function is sent to the file pointed to by stream. Information printed is: an ASCII description of the error, the reporting routine, its file name and the line at which the error was reported. In addition, if the programmer has supplied extra information by means of HEreport, this information is printed as well.

HEclear

  void HEclear(void)

Purpose: Clear all information on reported errors off of the error stack.
Returns: No return value.
Description: Clear all of the information off of the error stack.

HERROR

  void HERROR(int number)

  number    IN: error number

Purpose: Report an error.
Returns: No return value.
Description: HERROR can be used to report an error. Any function which calls HERROR must have a variable FUNC which points to a string containing the name of the function. HERROR is implemented as a macro.

HEreport

  void HEreport(char *format, ...)

  format    IN: printf style format and arguments

Purpose: Provide extra information to the error reporting routines.
Returns: No return value.
Description: This routine can be used to provide further annotation to an error report. Only one such annotation is remembered for each error report. The arguments to this routine follow the style of printf.

An example from hfile.c:

    char *FUNC = "Hclose";
    ...
    if (file_rec->attach > 0) {
        file_rec->refcount++;
        HERROR(DFE_OPENAID);
        HEreport("There are still %d active aids attached", file_rec->attach);
        return FAIL;
    }

Other

Hsync

  int Hsync(int32 file_id)

  file_id   IN: id of the file to sync

Purpose: Synchronize on-disk HDF file with image in memory.
Returns: SUCCEED.
Description: This routine is currently vacuous as the on-disk representation of an HDF file is always the same as its in-memory representation. However, future releases of the HDF library may employ buffering schemes, so this might not always be the case. Hsync will be provided to force the two representations to be consistent.

Chapter 4
Sets and Groups

  Chapter Overview
  Sets
  Types of Sets
  Calling Interfaces for Sets
  Groups
  Sample Groups
  General Features of Groups
  Raster Image Sets
  Raster Image Groups
  Tags for Raster Image Sets
  Compression of Raster Images
  Scientific Datasets
  Required Tags
  Optional Tags
  Vsets and Vdatas
  Chapter Appendix: Raster-8 Sets
  Compatibility between Raster-8 and Raster Image Sets

Chapter Overview

This chapter describes raster image sets, scientific datasets and Vsets, and explains the role of sets and groups in an HDF file. It also discusses the programming interfaces available for the three types of sets.

Sets

Sometimes tags are grouped into sets, where each set is designed to serve a particular user requirement. For example, the raster image set that is described in the following sections contains several tags that are used for storing information about 8-bit raster images.

Types of Sets

In the current implementation of HDF there are three kinds of sets:

* A raster image set contains a raster image, along with descriptive information about the image, such as its dimensions and (optionally) a color lookup table.
* A scientific data set contains a multidimensional array, along with descriptive information about the data.
* A Vset is a general grouping structure that can contain any kind of HDF object that a user wishes.

Each HDF set is defined in terms of a minimum collection of data objects that must be present for the set to make sense when it is used. For instance, every raster image set must contain at least the following three data objects:

* an image dimension record, which gives the width and height of the corresponding image;
* raster image data, which consists of the pixel values that make up the image;
* a raster image group, which lists all of the members in the set.

In addition to the required objects, there are optional data objects that may be included in a set. A raster image set, for instance, often contains a palette, or color lookup table, which gives the red, green, and blue values to be associated with each pixel in the raster image data.

Calling Interfaces for Sets

NCSA provides calling interfaces for all the HDF sets that it supports. The primary purpose of these calling interfaces is to provide libraries of routines for reading and writing the data that is associated with each set. The libraries currently supported at NCSA are callable from either C or Fortran programs.

In addition to the libraries, a growing number of command-line utility routines are available for working with sets. For example, the utility r8tohdf converts one or more raw raster images to HDF 8-bit raster image set format.

NCSA supports calling interfaces for the following machines: Cray (UNICOS), Silicon Graphics (UNIX), Sun (UNIX), Macintosh (MacOS), and IBM PC (MS-DOS).
The calling interfaces that are currently available are described in the manual NCSA HDF Calling Interfaces and Utilities.

Groups

An HDF set is a collection of HDF data objects in a file. Unless some mechanism is used to identify explicitly those objects that belong to a set, there is often no way to tie them together. This problem is solved in HDF by means of groups. A group is a data object that explicitly identifies all of the data objects in a set.

Since a group is a type of data object, its structure is like that of any other data object. A group data identifier (tag/ref) points to a data element that consists of the collection of data identifiers that make up the corresponding set.

A group tag can be defined for any set. For instance, raster image group (RIG) is the group tag used to group members of raster image sets; RIG data consists of a list of all data identifiers that belong to a particular raster image set.

Groups provide a convenient mechanism for application programs to locate all of the information that they need about a set. Application programs that deal with RIGs, for instance, read all of the elements in a RIG group, using only those that they need for their application and ignoring the others.

Sample Groups

Suppose that the two images shown in Figure 1.5 are organized into two sets with group tags. Since they are images, they may be stored as RIG groups. Figure 4.1 illustrates the type of organization that incorporates RIG groupings of these images.

Figure 4.1 Physical Organization of Sample RIG Grouping

  Offset    Contents
  0         FH
  4         DDH   (10 0L)
  10        DD    (FID 1 130 4)
  22        DD    (FD 1 134 41)
  34        DD    (IP8 1 175 768)
  46        DD    (ID 1 943 4)
  58        DD    (RI 1 947 240000)
  70        DD    (ID 2 240947 4)
  82        DD    (RI 2 240951 240000)
  94        DD    (RIG 1 480951 12)
  106       DD    (RIG 2 480963 12)
  118       DD    (empty)
  130       "sw3"
  134       "solar wind simulation: third try. 8/8/88"
  175       (palette data, 768 bytes)
  943       400, 600
  947       (image data, 240000 bytes)
  240947    400, 600
  240951    (image data, 240000 bytes)
  480951    tag/refs for 1st RIG: IP8/1, ID/1, RI/1
  480963    tag/refs for 2nd RIG: IP8/1, ID/2, RI/2

The structure depicted in Figure 4.1 reflects the grouping of raster image sets. This file contains the same raster image information as the file in Figure 1.5, but the information is organized into two sets and groups. Note that there is only one palette (IP8/1) and it is included in both groups.

General Features of Groups

Figure 4.1 also illustrates a number of important general features of groups:

* The contents of each set are consistent with one another. Since the palette (IP8) is designed for use with 8-bit images, the image must be an 8-bit image, rather than a 24-bit, 12-bit, or other image.
* An application program can easily process all of the images in the file by accessing the groups in the file. The non-RIG information contained in the file can be used or ignored, depending on the needs and capabilities of the application program.
* There is usually more than one way to group sets. For example, an extra copy of the image palette (IP8) could have been stored in the file, so that each grouping would have its own image palette. But in this instance that is not necessary because the same palette is to be used with both images. On the other hand, in this example there are two image dimension records (one per group), even though one would suffice.
* Group status does not alter the fundamental role of HDF objects. They are still accessible as individual data objects, despite the fact that they also belong to raster image sets.

In a very real sense, the individual data elements are in the file, whether or not there are groups that contain them. RIGs provide an index showing what sets exist and what their members are. There is nothing to prevent the imposition of other groupings (indexes) that provide a different view of the same collection of data objects.
In fact, HDF is designed to encourage the addition of alternate views, when appropriate.

Raster Image Sets

The raster image set (RIS) provides a framework for storing images and any number of optional image descriptors. It provides for a description of the image data layout, with the optional presence of color look-up tables, aspect ratio, color correction, associated matte or other overlay information, or any other data related to the display of the image.

Raster Image Groups (RIGs)

Tying everything together is the raster image group (RIG), examples of which were given earlier (Figure 4.1). A RIG contains a list of data identifiers that point in turn to the data objects that describe and make up the image. The number of entries in a RIG is variable, and the presence of most of the description information is optional.

Complex applications can store data identifiers of image-modifying data, such as the color table and aspect ratio, in the RIG along with the reference to the image data itself. Simple applications can use simple application-level calls and ignore specialized video production or film color correction parameters.

NCSA currently supports two calling interfaces, RIS8 and RIS24, defined for the easy storage and retrieval of raster images using RIGs. These interfaces are documented in the manual NCSA HDF Calling Interfaces and Utilities.

Tags for Raster Image Sets

The tags presented in Table 4.1 must be fully supported by any raster image set implementation.

Table 4.1 Tags for Raster Image Sets

Tag    Contents of Data Element
RIG    raster image group
ID     image dimension record
RI     raster image data

With full support for the above tags, images can be stored and read from HDF files at any bit depth, with several different component ordering schemes. As illustrated in Fig. 4.1, the RIG tag points to a collection of the tag/refs that make up the RIG.
The ID data element identifies the dimensions of the image, the number type of the elements that make up its pixels, the number of elements per pixel, the interlace scheme used, and the compression scheme used, if any. The RI data element contains the actual raster image data.

*** INSERT FIGURE HERE ***

In addition to the required tags that define an image dataset, the tags listed in Table 4.2 define color properties and other image features. These tags are described fully in Appendix A.

Table 4.2 Additional Tags for Raster Image Sets

Tag    Contents of Data Element
XYP    XY position of image
LD     look-up table dimension record
LUT    color look-up table for non-true-color images
MD     matte channel dimension record
MA     matte channel data
CCN    color correction factors
CFM    color format designation
AR     aspect ratio
MTO    machine-type override

Fig. 4.2 illustrates the storage of a RIS that contains an image palette (IP8), in addition to the required tags.

*** INSERT FIGURE HERE ***

Compression of Raster Images

Tags for two types of compression have been defined for raster images. They are run-length encoding (RLE) and IMCOMP aerial averaging (IMC). Others may be added at any time. Each encoding tag is documented under its specific tag type (see Appendix A).

Support for RIG and RI does not require that all of the compression tag types be supported. If you find an unknown compression type, provide a suitable error message to the user.

Scientific Datasets

The scientific dataset (SDS) provides a framework for storing multidimensional arrays of data, together with descriptive information about the data. Current specifications support the following types of numbers in SDS arrays:

* 8-bit, 16-bit, and 32-bit signed and unsigned integers
* 32-bit and 64-bit floating-point numbers

SDS numbers can be stored either as IEEE Standard integers or floats or in the format used by the machine from which they were written ("native mode"). Rank and dimension sizes may vary.
A user interface exists for storing and retrieving SDS. See the NCSA HDF manual for details.

Internal structures

For reasons having to do with backward compatibility, the group structure that HDF uses for SDS is complicated. HDF 3.1 and previous versions supported only 32-bit IEEE floating-point numbers and Cray floating-point numbers in scientific data sets. HDF 3.2 and later releases support 8-bit, 16-bit, and 32-bit signed and unsigned integers, and 32-bit and 64-bit floating-point numbers. They also allow data sets to be written to HDF files in the local machine format ("native mode"). Furthermore, it is anticipated that later versions of HDF will support new number types and other variations in the physical storage of scientific data, such as compressed data.

The internal structure used to store SDS in HDF 3.1 and earlier versions was not adequate to support the anticipated future changes to SDS. A new structure had to be developed. At the same time, it was important to try to retain compatibility with earlier versions of the HDF library. Earlier versions of the library should be able to read SDS written by HDF 3.2, if the SDS is "understandable" by that earlier software, i.e. if the number type of the data is 32-bit IEEE floating point or Cray floating point. Likewise, new libraries (HDF 3.2 and beyond) should be able to recognize SDS written by earlier versions of the library.

This compatibility is achieved by examining every SDS that is written to an HDF file. If the SDS is compatible with older libraries, it is written to the file using the old structure used to represent SDS, as well as the new structure. If it is not compatible with older libraries, only the newer structure is used.

The old structure for storing SDS is called SDG ("scientific data group"). The newer structure is called NDG ("numeric data group"). Hence, SDS user interfaces in HDF 3.2 and beyond handle three types of numerical data groups:

1. SDG: created by old libraries and containing floating-point data.
2. NDG: created by the new library and containing non-floating-point data. This data group should not be recognized by old libraries.
3. SDG-like NDG: created by the new library and containing IEEE 32-bit floating-point data only. The old libraries should be able to recognize and interpret this kind of numerical data group correctly.

In the following sections, we describe the SDG and NDG grouping structures.

SDG structure

Scientific datasets represented internally by the SDG tag must always contain at least the data objects listed in Table 4.3.

Table 4.3 Required Tags for SDG

Tag    Contents of Data Element
SDG    scientific data group
SDD    scientific data dimension record for array-stored data. It includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the array-stored data and of each dimension. In the case of SDG, the number types are all 32-bit IEEE floating-point values.
SD     scientific data

The data objects presented in Table 4.4 are optional. NCSA's SDS user interface supports these objects.

Table 4.4 Optional Tags for SDG

Tag    Contents of Data Element
SDS    scales along the different dimensions to be used when interpreting or displaying the data (must be of type float32).
SDL    labels for all dimensions and for the data. Each of the dimension labels can be interpreted as an independent variable, and the data label as the dependent variable.
SDU    units for all dimensions and for the data.
SDF    format specifications to be used when displaying values of the data.
SDM    maximum and minimum values of the data (must be of type float32).
SDC    coordinate system to be used when interpreting or displaying the data.

As illustrated in Fig. 4.3, the SDG tag points to a collection of the tag/refs that make up the SDG.
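The idea that a group's data element is simply a list of tag/refs can be sketched in a few lines. The following is a minimal illustration, not part of the HDF library: it assumes that each tag and each reference number is stored as a 16-bit unsigned big-endian integer (so each tag/ref occupies 4 bytes, which is consistent with the 12-byte RIG elements holding three tag/refs in Figure 4.1). The function name parse_group and the tag numbers are placeholders chosen for illustration; the assigned tag numbers are given in Appendix A.

```python
import struct

def parse_group(data):
    """Split a group data element into a list of (tag, ref) pairs."""
    if len(data) % 4 != 0:
        raise ValueError("group element length must be a multiple of 4 bytes")
    # Assumption: tag and ref are 16-bit unsigned integers, big-endian.
    return [struct.unpack_from(">HH", data, i) for i in range(0, len(data), 4)]

# Placeholder tag numbers for SDD, SD, and SDL (illustrative only).
SDD, SD, SDL = 701, 702, 704

# A group element listing three members, all with reference number 1.
element = struct.pack(">6H", SDD, 1, SD, 1, SDL, 1)
members = parse_group(element)
print(members)
```

A reader written this way can walk the list, act on the tags it understands, and skip the rest, which is the behavior the text describes for applications that process groups.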
*** INSERT FIGURE HERE ***

NDG structure

SDS represented internally by the NDG tag must always contain at least the data objects listed in Table 4.5.

Table 4.5 Required Tags for NDG

Tag    Contents of Data Element
NDG    Numerical data group
SDD    Scientific data dimension record for array-stored data. It includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the array-stored data and of each dimension. In HDF 3.2, the number types of dimension scales are forced to be the same as that of the array-stored data, but in later implementations each dimension scale will be allowed its own type.
SD     Scientific data.
NT     Number type of the data set. The default NT is the value most recently set by DFSDsetNT(). If DFSDsetNT() has not been called, the default is floating-point.

The data objects presented in Table 4.6 are optional. NCSA's SDS user interface in HDF 3.2 and later versions supports these objects. Other optional objects can be added at any time.

Table 4.6 Optional Tags for NDG, HDF 3.2

Tag    Contents of Data Element
SDS    scales along the different dimensions to be used when interpreting or displaying the data.
SDL    labels for all dimensions and for the data. Each of the dimension labels can be interpreted as an independent variable, and the data label as the dependent variable.
SDU    units for all dimensions and for the data.
SDF    format specifications to be used when displaying values of the data.
SDM    maximum and minimum values of the data.
SDC    coordinate system to be used when interpreting or displaying the data.

As illustrated in Fig. 4.4, the NDG is identical to the SDG, except that the NDG tag is different. This ensures that older (pre-HDF 3.2) software cannot recognize this form of SDS.

*** INSERT FIGURE HERE ***

SDG-like NDG structure

An SDS written by HDF 3.2 or later that is compatible with earlier SDS is represented internally by both an SDG and an NDG.
Table 4.7 lists the objects that this group must always contain.

Table 4.7 Required Tags for NDG structure that is compatible with SDG structure

Tag      Contents of Data Element
NDG      Numerical data group
SDG      Scientific data group
SDLNK    The NDG and SDG linked to the scientific data set in this group.
SDD      Scientific data dimension record for array-stored data. It includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the array-stored data and of each dimension. In an SDG-like NDG the number types are all 32-bit IEEE floating-point values.
SD       Scientific data

*** INSERT FIGURE HERE ***

Compatibility with future NDG structures

It is likely that future versions of SDS will support optional features that are not supported by the current version. These features fall into two general categories:

* Optional-compatible features: optional features that are compatible with older versions of HDF even though they may not be supported by older versions of HDF. For example, suppose a new attribute, such as a time stamp, is added to SDS. Such an attribute would not be "understood" by older libraries, but it would not render the SDS data unreadable by the older libraries.
* Optional-incompatible features: optional new features that might not be compatible with older versions of HDF in the sense that they could render the data unreadable by older HDF libraries. For example, suppose compression is added to SDS. Since some older HDF libraries contain no compression routines, they would not be able to read the compressed data correctly.

The scheme that has been developed to address this problem involves numbering conventions for tags. The following conventions are used:

* Required tags. These tags are described in Tables 4.3 and 4.5. All SDS must contain all of the tags in at least one of these sets.
* Optional-compatible tags. These tags can have any valid tag number except those in the other two categories.
* Optional-incompatible tags. A range of tags is defined for SDS features that might render the dataset unreadable by older versions of the library. This range has been specified as tag numbers 780-799.

Vsets and vdatas

An HDF Vset is a logical grouping of HDF data objects within an HDF file. Data organization within the file resembles the UNIX file system in that it is basically hierarchical in structure and also allows cross-linking of data objects. Unlike Scientific Data Sets and Raster Image Sets, Vsets have no prespecified content or structure. Users can use them to create structural relationships among HDF objects according to their needs. Figure 4.6 illustrates a Vset.

*** INSERT FIGURE HERE ***

A Vset is represented by a vgroup, an HDF object that contains information about the members of the Vset. The vgroup tag is VGDESCTAG. The VGDESCTAG record contains a list of the data identifiers of its members, an optional user-specified name, an optional user-specified class, and some fields that enable it to be extended to contain more information. The VGDESCTAG is described fully in Appendix A. A full treatment of Vsets can be found in the manual "NCSA HDF Vset, Version 2.0".

An HDF object that is often used in connection with Vsets is the vdata. A vdata is a table. The data in a vdata is organized into fields. Each field is identified by a unique fieldname. The type of each field may be any of the data types supported by the SDS interface: 8-, 16-, and 32-bit integers (signed or unsigned), and 32- and 64-bit floats. Several fields of different types may exist within a vdata. Appendix A contains full descriptions of the vdata tags (VSDESCTAG and VSDATATAG). A full treatment of vdatas can be found in the manual "NCSA HDF Vset, Version 2.0".

Chapter Appendix: The Raster-8 Set

The raster image set (RIS), as described above, is the set currently supported by HDF for managing raster images.
Before the RIS was added to HDF, a simpler, less flexible set called the raster-8 set was used for storing 8-bit raster images. This set is no longer supported in the HDF software, although it may turn up in some older HDF files. In fact, during the first three years that RIS was used, the HDF software stored raster images in both RIS and raster-8 sets.

Raster-8 Sets

The raster-8 set is a set of tags that provide the basic information necessary to store 8-bit raster images in a data file and display them accurately without prompting the user to supply dimensions or color information. The raster-8 set consists of the tags presented in Table 4.8.

Table 4.8 Tags for Raster-8 Sets

Tag    Contents of Data Element
RI8    eight-bit raster image data
CI8    eight-bit raster image data compressed with run-length encoding
II8    IMCOMP compressed image data
ID8    image dimension record
IP8    image palette data

If you develop software for processing raster-8 sets, it must support RI8, ID8, and IP8. If you do not implement CI8 or II8, then be sure to provide appropriate error indicators to higher layers that might expect to find these tags.

Compatibility between Raster-8 and Raster Image Sets

In order to maintain backward compatibility with raster-8 sets, the raster image set interface has stored tag/refs for both types of sets in HDF raster image files. For example, when an image was stored as part of a raster image set, one copy each of the image dimension data, image data, and palette data was stored, but two sets of tag/refs pointed to each data element, one from each set. The image data, for instance, was associated with both tag RI8 and tag RI.

NOTE: Although this policy is continued in the current release (HDF 3.2), future plans call for phasing out the use of the raster-8 structure. Therefore, future software should not expect to find both raster-8 and RIS structures supporting 8-bit raster images. Eventually, RIS structures will be used exclusively.
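The compatibility scheme described above, in which one data element is reachable through two tag/refs, can be illustrated with a short sketch. The code below is not the library's implementation. It assumes a DD made of a 16-bit tag, a 16-bit reference number, a 32-bit offset, and a 32-bit length, all big-endian, which matches the 12-byte spacing of the DDs in Figure 4.1. The tag numbers used for RI8 and RI, the reuse of the same reference number in both DDs, and the helper name dd_pair_for_image are assumptions made for illustration.

```python
import struct

# Assumption: tag, ref (16-bit each), offset, length (32-bit each), big-endian.
DD = struct.Struct(">HHII")

# Assumed tag numbers for RI8 and RI (illustrative only; see Appendix A).
TAG_RI8, TAG_RI = 202, 302

def dd_pair_for_image(ref, offset, length):
    """Build a raster-8 DD and an RIS DD that share one data element."""
    return (DD.pack(TAG_RI8, ref, offset, length),
            DD.pack(TAG_RI, ref, offset, length))

# Offset and length of the first image in Figure 4.1.
old_dd, new_dd = dd_pair_for_image(ref=1, offset=947, length=240000)

# Both DDs point at the same bytes in the file; only the tag differs.
print(DD.unpack(old_dd))
print(DD.unpack(new_dd))
```

Because only the descriptors are duplicated, the cost of keeping both structures is 12 bytes per object rather than a second copy of the image.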
Chapter 5 Annotations

Chapter Overview
Types of Annotations
File Annotations
Object Annotations
Getting Reference Numbers for Object Annotations

Chapter Overview

This chapter introduces and describes HDF objects that can be used to annotate HDF files and HDF objects.

Types of Annotations

It is often useful to associate textual information with an HDF file and its data contents, and to keep that information in the same file that contains the data. HDF provides this capability in the form of annotations. An HDF annotation is a sequence of ASCII characters that is associated with one of three types of objects: (1) the file itself, (2) the individual HDF data objects in the file, or (3) the tags that identify the data elements. The current annotation interface supports only the first two types of annotation. This interface is described in detail in the manual NCSA HDF Calling Interfaces and Utilities.

Annotations are optionally supplied by a creator or user of an HDF file or data object. Annotations come in two forms: labels, which normally consist of short strings of characters, and descriptions, which can be long and complex bodies of text. Table 5.1 shows the types of annotations currently defined for HDF files and their tag names.

Table 5.1 HDF Annotation tags

                      "Label"    "Description"
File Annotations      FID        FD
Object Annotations    DIL        DIA
Tag Annotations       TID        TD

File Annotations

Any HDF file can have labels (FID) and descriptions (FD) stored in it. There are routines in the annotations interface specifically designed for reading and writing file IDs and file descriptions. Specifications for the tags FID and FD are given in Appendix A.

Object Annotations

The annotation of HDF data objects is complicated by the fact that you have to uniquely identify the objects being annotated.
Since a data identifier (tag/ref) for a data object uniquely identifies that object, the data object that a particular annotation refers to can be identified by storing the object's tag and reference number together with the annotation.

Note that an HDF annotation is itself a data object, so it has its own DD. This DD has a tag and a reference number, and it points to the "data" that constitutes the annotation. The "data" that goes with an annotation consists of three things: (1) the tag of the object that it is an annotation for, (2) the ref of the object that it is an annotation for, and (3) the annotation itself.

For example, suppose you have an HDF file that contains three scientific datasets (SDS). Each SDS has its own DD consisting of the SDS tag DFTAG_SDG and a unique reference number, as illustrated in Figure 5.1.

*** INSERT FIGURE HERE ***

Suppose you wish to annotate the second SDS by storing the following annotation with it in the file:

"Data from black hole experiment 8/18/87."

This text would be stored in an HDF file as an annotation, and it would have stored with it the tag DFTAG_SDG and reference number 4. Figure 5.2 illustrates how the annotation would look in the file.

*** INSERT FIGURE HERE ***

Getting Reference Numbers for Object Annotations

Note that in order to use annotation routines, you need to know the tags and reference numbers of the objects you wish to annotate. Special routines are available for obtaining the reference numbers of certain tags, including tags for SDSs, Raster Image Sets, palettes, and annotations. These are DFSDlastref, DFR8lastref, DFPlastref, and DFANlastref. They return the most recent reference number used in either reading or writing the corresponding data object. Reference numbers for objects other than these can be obtained with the routine Hfindnextref, a general-purpose HDF routine that searches through an HDF file for reference numbers that go with a given tag.
These routines are described and illustrated in the manual "NCSA HDF Calling Interfaces and Utilities."

Chapter 6 NCSA HDF Tags

Chapter Overview
The HDF Tag Space
Physical Storage Methods
Specifications of Supported Tags

Chapter Overview

This chapter addresses issues related to HDF tags and the data they represent. The first section discusses some general information about tags and their interpretation. The remainder of the chapter contains a complete list of HDF tags that have been assigned by NCSA as of version 3.2 of the library and a detailed discussion of their specifications.

The HDF Tag Space

As discussed in the chapter entitled "The Basic Structure of HDF Files," there are 16 bits allotted to an HDF tag number, providing for 65535 possible tags ranging from 1 to 65535, with zero (0) unused. This tag space is broken down into three ranges as shown below.

1--32767        reserved for NCSA-supported tags
32768--64999    user-definable
65000--65535    reserved for expansion of the format

No restrictions are placed on the user-definable tags, but it should be noted that tags from this range cannot be guaranteed to be unique across all user-developed HDF applications. The rest of this chapter will be devoted to the NCSA-supported tags in the range 1 to 32767.

Physical Storage Methods

In previous versions of HDF, each data element was required to occupy one contiguous block of space in a single file. Beginning with HDF Version 3.2, however, a mechanism was added to support different methods of physical storage of data elements. The new mechanism is called the "extended tag." Any of the NCSA standard tags can take advantage of the new features of the extended tags.

Extended tags are automatically recognized by the library and interpreted according to a description record. The description record is a complete data element unto itself which identifies the type of extended element and provides the relevant parameters for retrieval of that element.
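The three tag ranges listed under "The HDF Tag Space" can be expressed as a small classifier. This is a minimal sketch for illustration; the function name classify_tag and the returned strings are not part of the HDF library.

```python
def classify_tag(tag):
    """Return the range of the HDF tag space that a tag number falls in."""
    if not 0 <= tag <= 0xFFFF:
        raise ValueError("an HDF tag number must fit in 16 bits")
    if tag == 0:
        return "unused"
    if tag <= 32767:
        return "NCSA-supported"
    if tag <= 64999:
        return "user-definable"
    return "reserved for expansion"

# DFTAG_NT is tag 106, so it lies in the NCSA-supported range.
print(classify_tag(106))
print(classify_tag(40000))
```

An application that defines its own tags could use a check like this to confirm that it stays inside the user-definable range, keeping in mind that uniqueness across applications is still not guaranteed.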
Currently, there are two types of extended tags, both of which offer alternate methods of physical storage: linked block elements and external elements.

Linked Block Elements

Linked block elements provide a convenient way of adding data to a pre-existing element. They consist of a series of blocks of data chained together in a linked list (similar to the DD list). In general, the data blocks are of a uniform size. However, the first block is considered a special case and is allowed to have a different size from the rest of the blocks.

The description record for a linked block element begins with the constant EXT_LINKED, which identifies the linked block storage method. It also contains information about the organization of the linked block element as a whole. Figure 6.1 shows a diagram of a description record for a linked block element.

*** INSERT FIGURE HERE ***
   any NCSA standard tag converted to an extended tag (16-bit integer)
   reference number (16-bit integer)
   EXT_LINKED constant identifying this as a linked block description record (32-bit integer)
   length of entire element (32-bit integer)
   length of the first data block (32-bit integer)
   length of successive data blocks (32-bit integer)
   number of blocks per block table (32-bit integer)
   reference number of first block table (16-bit integer)

The last field of the description record gives the reference number of the first linked block table for the element. This table is identified by the tag DFTAG_LINKED and contains one entry per data block, up to the number of blocks per block table given in the description record. There may be any number of linked block tables chained together to describe a linked block element. Figure 6.2 shows a diagram of a linked block table.

*** INSERT FIGURE HERE ***
   reference number for this block table (16-bit integer)
   reference number for next block table (16-bit integer)
   reference number for data block (16-bit integer)

The next-table field contains the reference number of the next linked block table. A value of zero (0) in this field indicates that there are no additional linked block tables associated with this linked block element.

The remaining fields of each linked block table contain reference numbers for the individual data blocks that make up the data portion of the linked block element. These data blocks are also identified by the tag DFTAG_LINKED, as shown in Figure 6.3. Although it may seem ambiguous to use the same tag to refer to two different objects, this ambiguity is resolved by the context in which the tags appear.

*** INSERT FIGURE HERE ***
   reference number for this data block (16-bit integer)
   block of actual data (size given by the first-block length or the successive-block length from the description record)

Linked block elements can be created using the function HLcreate(), which is discussed in detail in the chapter "The NCSA HDF General Purpose Interface."

External Elements

External elements allow the data portion of an HDF element to reside in a separate file. The potential of external data elements is largely unexplored in the HDF context, although other file formats (most notably CDF) have used external data elements, apparently to great advantage.

Because there has been little discussion of external elements within the HDF user community, the structure of these elements is still not completely defined. Figure 6.4 shows a diagram of the proposed structure for an external element.

*** INSERT FIGURE HERE ***
   any NCSA standard tag converted to an extended tag (16-bit integer)
   reference number (16-bit integer)
   EXT_EXTERN constant identifying this as an external element description record (16-bit integer)
   location of the data within the external file (32-bit integer)
   length in bytes of the data in the external file (32-bit integer)
   non-null terminated ASCII string containing the name of the external file in which the data resides (any length)

The description record for an external element begins with the constant EXT_EXTERN, which identifies the external storage method.
It also contains information about how to find the element.

External elements can be created using the function HXcreate(), which is discussed in detail in the chapter "The NCSA HDF General Purpose Interface."

Specifications of Supported Tags

The following pages contain the specifications of all the tags that are officially supported as of HDF version 3.2. Each entry is to be interpreted as follows:

* The word in capital letters on the left is the tag name.
* The three short lines at the beginning of each description uniquely identify the tag. The first line is the full name of the tag. The second line describes the type and (where possible) the amount of data in the corresponding data element. When the data element is a variable-sized data structure, such as text, a string, or a variable-sized array, the amount of data cannot be specified exactly. Where possible, a formula is given for estimating the amount of data. If the second line is "? bytes," it means that neither the size nor the structure of the data element can be specified. The third line gives the tag number in decimal and (hexadecimal).
* Next is a diagram showing, as nearly as possible, the structure of the tag and its associated data.
* Finally, a full specification of the tag is presented, including a description of the data element and a discussion of its intended use.

These listings are grouped approximately according to the roles that the tags play under the headings Utility Tags, Annotation Tags, Raster Image Tags, and so forth. These groupings imply a general context for the use of each tag, but are not meant to restrict the use of the tags to any particular context.

Please note that the subsection under the heading Obsolete Tags contains the specifications for tags that have fallen out of use with the continuing development of HDF.
These tags are still recognized by the HDF library, but it is not recommended that users write out new objects using these tags, since some of them may eventually be dropped from the HDF specification.

Utility Tags

DFTAG_NULL
   No data
   0 bytes
   1 (0x0001)

*** INSERT FIGURE HERE ***
   reference number (16-bit integer; always 0)

This tag is used for place holding and to fill empty portions of the data description block. The length and offset fields (not shown) of a NULL DD must be equal to zero.

DFTAG_VERSION
   Library version number
   12 bytes plus the length of a string
   30 (0x001E)

*** INSERT FIGURE HERE ***
   reference number (16-bit integer)
   major version number (32-bit integer)
   minor version number (32-bit integer)
   release number (32-bit integer)
   non-null terminated ASCII string (any length)

The data portion of this tag gives the complete version number and a descriptive string for the latest version of the HDF library to write to the file.

DFTAG_NT
   Number type
   4 bytes
   106 (0x006A)

*** INSERT FIGURE HERE ***
   reference number (16-bit integer)
   version number of NT information (8-bit integer)
   unsigned int, signed int, unsigned char, char, float, double (8-bit code)
   number of bits (assumed all significant) (8-bit code)
   a generic value, with different interpretations depending on type: floating-point, integer, or character (8-bit code)

Some possible values that may be included for each of the three types in the field CLASS are listed in Table 6.1.

Table 6.1 Number Type Values

Type      Possible Values
floats    DFNTF_NONE, DFNTF_IEEE, DFNTF_VAX, DFNTF_CRAY, DFNTF_PC, DFNTF_CONVEX
ints      DFNTI_MBO, DFNTI_IBO, DFNTI_VBO
chars     ASCII, EBCDIC, BYTE

The number type flag is used by any other element in the file to indicate specifically what a numeric value looks like. Other tag types should contain a reference number pointer to a DFTAG_NT instead of containing their own number type definitions.
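The 4-byte DFTAG_NT data element described above can be unpacked as four 8-bit values. The sketch below is illustrative only. The field names version, type_code, width_bits, and class_code are chosen here for readability (only CLASS is named in the text), and the numeric codes in the example are placeholders, since the values behind constants such as DFNTF_IEEE are not given in this section.

```python
import struct
from collections import namedtuple

# Hypothetical field names for the four 8-bit fields of a DFTAG_NT element.
NumberType = namedtuple("NumberType", "version type_code width_bits class_code")

def parse_nt(data):
    """Unpack the four 8-bit fields of a number type data element."""
    if len(data) != 4:
        raise ValueError("a DFTAG_NT data element is exactly 4 bytes")
    return NumberType(*struct.unpack("4B", data))

# Placeholder codes: version 1, some type code, 32 bits wide, some class code.
nt = parse_nt(bytes([1, 5, 32, 1]))
print(nt.version, nt.width_bits)
```

Keeping the type and class codes as raw integers avoids guessing at their meanings; a real reader would compare them with the constants defined in the library header.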
The version field allows expansion of the number type information, in case some future number types cannot be described using the fields currently defined. Successive versions of the DFTAG_NT may be substantially different from the current definition; however, backward compatibility will be maintained. The current DFTAG_NT version number is 1.

DFTAG_MT
   Machine type
   0 bytes
   107 (0x006B)

*** INSERT FIGURE HERE ***
   specifies method of encoding double precision floating point (4-bit code)
   specifies method of encoding single precision floating point (4-bit code)
   specifies method of encoding integers (4-bit code)
   specifies method of encoding characters (4-bit code)

The DFTAG_MT specifies that all unconstrained or partially constrained values in this HDF file are of the default type for that hardware. When the DFTAG_MT is set to VAX, for example, all integers will be assumed to be in VAX byte order unless specifically defined otherwise with a DFTAG_NT. Note that all of the headers and many tags, the whole raster image set for example, are defined with bit-wise precision and will not be overridden by the DFTAG_MT setting.

For DFTAG_MT, the reference field itself is the encoding of the DFTAG_MT information. The reference field is 16 bits, taken as four groups of four bits, specifying the types for double, float, int, and char respectively. This allows 16 generic specifications for each type. To the user, these will be defined constants in the header file hdf.h, specifying the proper descriptive numbers for Sun, VAX, Cray, Convex, and other computer systems.

If there is no DFTAG_MT in a file, the application may assume that the data in the file has been written on the local machine; any portability problems must then be taken care of by the user. For this reason, we recommend that all HDF files contain a DFTAG_MT for maximum portability.

Possible data encodings are shown in Table 6.2.
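Splitting the 16-bit DFTAG_MT reference field into its four 4-bit groups takes only shifts and masks, as the sketch below shows. It assumes that the groups are ordered from the most significant bits to the least significant bits (double first, char last); the text gives the order of the groups but not their bit positions, so this ordering is an assumption. The function name decode_mt is hypothetical, and the codes are returned as raw numbers because the constants defined in hdf.h are not listed here.

```python
def decode_mt(ref):
    """Split a 16-bit DFTAG_MT reference field into four 4-bit codes."""
    if not 0 <= ref <= 0xFFFF:
        raise ValueError("the reference field is a 16-bit value")
    # Assumption: the first group (double) occupies the top four bits.
    return {
        "double": (ref >> 12) & 0xF,
        "float": (ref >> 8) & 0xF,
        "int": (ref >> 4) & 0xF,
        "char": ref & 0xF,
    }

codes = decode_mt(0x1234)
print(codes)
```

Each code is then a value from 0 to 15, which corresponds to the 16 generic specifications per type mentioned in the text.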
Table 6.2  Possible Machine Types

Type      Possible Encodings
double    IEEE64, VAX64, CRAY128
floats    IEEE32, VAX32, CRAY64
ints      VAX32, Intel16, Intel32, Motorola32, CRAY64
chars     ASCII, EBCDIC

New encodings can be added for each data type as the need arises.

DFTAG_FID    File identifier    string    100 (0x0064)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* non-null terminated ASCII text (any length)

This tag points to a string which the user wants to associate with this file. The string is not null terminated. It is intended to be a user-supplied title for the file.

DFTAG_FD    File description    text    101 (0x0065)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* non-null terminated ASCII text (any length)

This tag points to a block of text describing the overall file contents. The text can be any length and is not null terminated. It is intended to be user-supplied comments about the file.

DFTAG_TID    Tag identifier    string    102 (0x0066)

*** INSERT FIGURE HERE ***

* tag number to which this tag refers (16-bit integer)
* non-null terminated ASCII text (any length)

The data for this tag is a string that identifies the functionality of the tag indicated in the space normally used for the reference number. For example, the tag identifier for DFTAG_TID might point to data that reads "tag identifier."

Many tags are identified in the HDF specification, so it is usually unnecessary to include their identifiers in the HDF file. But with user-defined tags or special-purpose tags, the only way for a human reader to diagnose what kind of data is stored in a file is to read tag identifiers. Use tag descriptions to define even more detail about your user-defined tags.

Note that you can use this tag to check user-defined tags for consistency. Although two persons may use the same user-defined tag, they probably will not use the same tag identifier.
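The consistency check mentioned for DFTAG_TID can be sketched as follows. This is only an illustration: it assumes the tag identifiers have already been read from the file into a dictionary keyed by tag number, and the function name, the tag number 32768, and the identifier strings are all hypothetical.

```python
def find_identifier_mismatches(found, expected):
    """Compare tag identifiers read from a file with the ones a program expects.

    found    -- dict mapping tag number -> identifier string stored in the file
    expected -- dict mapping tag number -> identifier string the program assumes
    Returns a dict of tag number -> (expected, found) for every disagreement.
    Tags with no identifier in the file are skipped, since nothing can be checked.
    """
    mismatches = {}
    for tag, wanted in expected.items():
        stored = found.get(tag)
        if stored is not None and stored != wanted:
            mismatches[tag] = (wanted, stored)
    return mismatches


# Hypothetical example: two groups picked the same user-defined tag number.
expected = {32768: "wind velocity grid"}
found = {32768: "seismic trace header"}
print(find_identifier_mismatches(found, expected))
```

A non-empty result suggests that the file was written by a program that uses the same tag number for a different purpose, so its data should not be read as if it were your own.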
DFTAG_TD    Tag description    text    103 (0x0067)

*** INSERT FIGURE HERE ***

* tag number to which this tag refers (16-bit integer)
* non-null terminated ASCII text (any length)

The data for this tag is a text block which describes in relative detail the functionality and format of the tag which is indicated in the space normally occupied by the reference number. This tag is mainly intended to be used with user-defined tags and provides a medium for users to exchange files that include human-readable descriptions of the data.

It is important to provide everything that a programmer might need to know to read the data from your user-defined tag. At the minimum, you should specify everything you would need to know in order to retrieve your data at a later date if the original program were lost.

DFTAG_DIL    Data identifier label    string    104 (0x0068)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* tag number of the data to which this label applies (16-bit integer)
* reference of the data to which this label applies (16-bit integer)
* non-null terminated ASCII text (any length)

The data for this tag is a data identifier, made up of a tag and reference number, followed by a string that the user wants to place in the file. The purpose of this tag is to associate the string with the data identifier as a label for whatever that data identifier refers to in turn. By including DFTAG_DILs, you can give a data object a label for future reference. For example, DFTAG_DIL is often used to give titles to images.

DFTAG_DIA    Data identifier annotation    text    105 (0x0069)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* tag number of the data to which this annotation applies (16-bit integer)
* reference of the data to which this annotation applies (16-bit integer)
* non-null terminated ASCII text (any length)

The data for this tag is a data identifier, which is made up of a tag and a reference number, followed by a text block that the user wants to place in the file.
Its purpose is to associate the text block with the data identifier as an annotation for whatever that data identifier points to in turn. With DFTAG_DIA, any data object can have a lengthy, user-written description of why that data is in the file. This can be used to include user comments about images, datasets, source code, and so forth.

Compression Tags

DFTAG_RLE    Run length encoded data    0 bytes    11 (0x000B)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag is used in the compression field of a DFTAG_ID and other places to indicate that an image or section of data is encoded with a run-length encoding scheme. The RLE method used is byte-wise. Each run is preceded by a count byte. The low seven bits of the count byte give the number of bytes (n). The high bit of the count byte indicates whether the next byte should be replicated n times (high bit=1), or whether the next n bytes should be included as is (high bit=0).

See also: DFTAG_ID (General Raster Image Tags)
          DFTAG_NDG (Scientific Dataset Tags)

DFTAG_IMC    IMCOMP compressed data    0 bytes    12 (0x000C)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag is used in the compression field of a DFTAG_ID and other places to indicate that an image or section of data is encoded with an IMCOMP encoding scheme. This scheme is a 4:1 areal averaging method which is easy to decompress. It counts color frequencies in 4x4 squares to optimize color sampling.

See also: DFTAG_ID (General Raster Image Tags)
          DFTAG_NDG (Scientific Dataset Tags)

DFTAG_JPEG    24-bit JPEG compression information    ? bytes    13 (0x000D)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag points to header information for 24-bit JPEG compressed images. The data in this tag is identical to the data stored in a JFIF (JPEG File Interchange Format) file up to the Start-of-Frame parameter (see the JFIF format document for further details).
The Start-of-Frame parameter and all further data for the JPEG image are stored in the associated DFTAG_CI data element, which is the companion to the DFTAG_JPEG element.

DFTAG_GREYJPEG    8-bit JPEG compression information    ? bytes    14 (0x000E)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag points to header information for 8-bit JPEG compressed images. The data in this tag is identical to the data stored in a JFIF (JPEG File Interchange Format) file up to the Start-of-Frame parameter (see the JFIF format document for further details). The Start-of-Frame parameter and all further data for the JPEG image are stored in the associated DFTAG_CI data element, which is the companion to the DFTAG_GREYJPEG element.

General Raster Image Tags

DFTAG_RIG    Raster image group    n*4 bytes (where n is the number of data objects in the group)    306 (0x0132)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* tag number for nth member of the group (16-bit integer)
* reference number for nth member of the group (16-bit integer)

The raster image group (RIG) data is a list of data identifiers (tag/ref) that describe a raster image. All of the members of the group are required in order to display the image correctly. Application programs that deal with RIGs should read all the elements of a RIG and process those identifiers which they can display correctly. Even if the application cannot process all of the tags, the tags that it can process will be usable.

Tag types that may appear in a RIG are listed in Table 6.3.

Table 6.3  Possible Tag Types in a RIG

Tag          Description
DFTAG_ID     Image dimension
DFTAG_RI     Raster image
DFTAG_XYP    X-Y position
DFTAG_LD     LUT dimension
DFTAG_LUT    Color lookup table
DFTAG_MD     Matte channel dimension
DFTAG_MA     Matte channel
DFTAG_CCN    Color correction
DFTAG_CFM    Color format
DFTAG_AR     Aspect ratio

Example

DFTAG_ID, DFTAG_RI, DFTAG_LD, DFTAG_LUT

An image dimension record, the raster image, a LUT dimension record, and the LUT go together.
The application reads the image dimensions, then reads the image with those dimensions. It also reads the lookup table according to its dimensions and displays the corresponding image.

DFTAG_ID    Image dimension     20 bytes    300 (0x012C)
DFTAG_LD    LUT dimension       20 bytes    307 (0x0133)
DFTAG_MD    Matte dimension     20 bytes    308 (0x0134)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* length of x (horizontal) dimension (32-bit integer)
* length of y (vertical) dimension (32-bit integer)
* reference number of number type information for associated object
* number of elements that comprise one entry (16-bit integer)
* defines type of interlacing used (16-bit integer)
* tag which tells the type of compression used and any associated parameters (16-bit integer)
* reference number of compression tag (16-bit integer)

The three dimension records have exactly the same format. They define the dimensions of the 2D array to which they refer. The diagram above pictures a DFTAG_ID for illustration. A DFTAG_ID specifies the dimensions of a DFTAG_RI, a DFTAG_LD specifies the dimensions of a DFTAG_LUT, and a DFTAG_MD specifies the dimensions of a DFTAG_MA.

For example, a 512x256 row-wise 24-bit raster image with each pixel stored as RGB bytes would have the following values:

x dimension:    512
y dimension:    256
number type:    UINT8
elements:       3 (3 elements per pixel: e.g., R, G, and B)
interlace:      0 (RGB values not separated)
compression:    0 (no compression is used)

DFTAG_RI    Raster image    xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize are given by the corresponding DFTAG_ID)    302 (0x012E)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag points to raster image data. It is stored in row-major order and must be interpreted according to the interlace value specified in the corresponding DFTAG_ID:

* interlace=0 means the components of each pixel are together.
* interlace=1 means color elements are grouped by scan lines.
* interlace=2 means color elements are grouped by planes.
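The size formula and the three interlace schemes for DFTAG_RI can be illustrated with a short sketch. The function names are hypothetical, and the index arithmetic is one reading of "grouped by scan lines" and "grouped by planes" for row-major data, not code taken from the HDF library.

```python
def raster_size(xdim, ydim, elements, nt_size):
    """Size in bytes of a DFTAG_RI data element: xdim*ydim*elements*NTsize."""
    return xdim * ydim * elements * nt_size


def component_offset(x, y, c, xdim, ydim, elements, nt_size, interlace):
    """Byte offset of component c of pixel (x, y) in row-major raster data."""
    if not (0 <= x < xdim and 0 <= y < ydim and 0 <= c < elements):
        raise ValueError("pixel or component index out of range")
    if interlace == 0:
        # Components of each pixel are stored together: RGB, RGB, ...
        index = (y * xdim + x) * elements + c
    elif interlace == 1:
        # Within each scan line, all values of one component come first.
        index = (y * elements + c) * xdim + x
    elif interlace == 2:
        # Each component forms a complete xdim*ydim plane.
        index = (c * ydim + y) * xdim + x
    else:
        raise ValueError("unknown interlace code")
    return index * nt_size


# The 512x256 RGB example from the text, one byte per component.
print(raster_size(512, 256, 3, 1))
for interlace in (0, 1, 2):
    # Offset of the blue component (c=2) of the pixel at x=1, y=0.
    print(interlace, component_offset(1, 0, 2, 512, 256, 3, 1, interlace))
```

The total size is the same for all three interlace values; only the position of each component within the data changes.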
DFTAG_LUT    Lookup table    xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize are given by the corresponding DFTAG_LD)    301 (0x012D)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* mth value of parameter n (size is given by the DFTAG_NT in the corresponding DFTAG_LD)

The DFTAG_LUT, sometimes called a palette, is used by many kinds of hardware to assign colors to data values. When a raster image consists of data values which are going to be interpreted through hardware with a LUT capability, the DFTAG_LUT should be loaded along with the image.

The most common lookup table is the RGB lookup table, which has X dimension = 256 and Y dimension = 1 with three elements per entry, one each for red, green, and blue. The interlace will be either 0, where the LUT values are given as RGB, RGB, RGB, ..., or 1, where the LUT values are given as 256 reds, 256 greens, 256 blues.

DFTAG_MA    Matte channel    xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize are given by the corresponding DFTAG_MD)    309 (0x0135)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

The DFTAG_MA contains transparency data which can be used to facilitate the overlaying of images. The data consist of a two-dimensional array of unsigned 8-bit integers ranging from 0 to 255. Each point in a DFTAG_MA indicates the transparency of the corresponding point in a raster image of the same dimensions. A value of 0 indicates that the data at that point is to be considered totally transparent, while a value of 255 indicates that the data at that point is totally opaque. It is assumed that a linear scale applies to the transparency values, but users may opt to interpret the data in any way they wish.
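As an illustration of the linear transparency scale, the sketch below overlays one image on another using a matte channel. It assumes single-component 8-bit pixels stored as byte strings of equal length; the function name and the rounding choice are assumptions made for this example, and other interpretations of the matte data are equally valid.

```python
def overlay_with_matte(foreground, background, matte):
    """Blend foreground over background using a DFTAG_MA-style matte.

    A matte value of 0 leaves the background untouched (fully transparent
    foreground); 255 shows only the foreground (fully opaque). Values in
    between are interpolated linearly and rounded to the nearest integer.
    """
    if not (len(foreground) == len(background) == len(matte)):
        raise ValueError("all three arrays must have the same length")
    result = bytearray(len(matte))
    for i, m in enumerate(matte):
        blended = foreground[i] * m + background[i] * (255 - m)
        result[i] = (blended + 127) // 255
    return bytes(result)


foreground = bytes([200, 200, 200])
background = bytes([100, 100, 100])
matte = bytes([0, 128, 255])
print(list(overlay_with_matte(foreground, background, matte)))  # [100, 150, 200]
```

For multi-component images the same blend would be applied to each component of a pixel, using the single matte value for that pixel.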
DFTAG_CCN    Color correction    52 bytes (usually)    310 (0x0136)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* gamma parameter (32-bit IEEE float)
* red x/y/z correction factors (32-bit IEEE floats)
* green x/y/z correction factors (32-bit IEEE floats)
* blue x/y/z correction factors (32-bit IEEE floats)
* white x/y/z correction factors (32-bit IEEE floats)

Color correction specifies the gamma correction for the image and the color primaries used to generate the image.

DFTAG_CFM    Color format    string    311 (0x0137)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* non-null terminated ASCII string (any length)

The color format is a clue to how each element of each pixel in a raster image can be interpreted. It is defined to be a string in all caps and is one of the values shown in Table 6.4.

Table 6.4  Color Format String Values

String      Description
VALUE       Pseudo-color, or just a value associated with the pixel
RGB         Red, green, blue model
XYZ         Color-space model
HSV         Hue, saturation, value model
HSI         Hue, saturation, intensity
SPECTRAL    Spectral sampling method

DFTAG_AR    Aspect ratio    4 bytes    312 (0x0138)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* ratio of width to height (32-bit IEEE float)

The data for this tag is the visual aspect ratio for this image. The image should be visually correct if displayed on a screen with this aspect ratio. The data consists of one floating-point number which represents width divided by height. An aspect ratio of 1.0 indicates a display with perfectly square pixels; 1.33 is a standard aspect ratio used by many monitors.

Composite Image Tags

DFTAG_DRAW    Draw    n*4 bytes (where n is the number of data objects that comprise the composite image)
400 (0x0190)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* tag number of the nth member of the draw list (16-bit integer)
* reference number of the nth member of the draw list (16-bit integer)

The data for this tag is a list of data identifiers (tag/ref pairs) which define a composite image. Each member of the DFTAG_DRAW data should be displayed, in order, on the screen. This can be used to indicate several RIGs which should be displayed simultaneously, or even to include vector overlays, like DFTAG_T14, which should be placed on top of a RIG.

Some of the elements in a DRAW list may be instructions about how images are to be composited (XOR, source put, anti-aliasing, etc.). These are defined as individual tags.

DFTAG_XYP    XY position    8 bytes    500 (0x01F4)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* x-coordinate (32-bit integer)
* y-coordinate (32-bit integer)

A DFTAG_XYP is used in composites and other groups to indicate an XY position on the screen. For this, (0,0) is the lower left, X is the number of pixels to the right along the horizontal axis, and Y is the number of pixels up along the vertical axis. The X and Y pixel positions are given as two 32-bit integers.

For example, if DFTAG_XYP is present inside a DFTAG_RIG, the DFTAG_XYP refers to the position of the lower left corner of the raster image on the screen.

See also: DFTAG_DRAW (this section)

Vector Image Tags

DFTAG_T14    Tektronix 4014    ? bytes    602 (0x025A)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag points to a Tektronix 4014 data stream. The bytes in the data field, when read and sent to a Tektronix 4014 terminal, will display a vector image. Only the lower seven bits of each byte are significant. There are no record markings or non-Tektronix codes in the data.

DFTAG_T105    Tektronix 4105    ? bytes    603 (0x025B)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag points to a Tektronix 4105 data stream.
The bytes in the data field, when read and sent to a Tektronix 4105 terminal, will be displayed as a vector image. Only the lower seven bits of each byte are significant. Some terminal emulators will not correctly interpret every feature of the Tektronix 4105 terminal, so you may wish to use only a subset of the possible Tektronix 4105 vector commands.

Scientific Dataset Tags

DFTAG_NDG    Numeric data group    n*4 bytes (where n is the number of data objects in the group)    720 (0x02D0)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* tag number of nth member of the group (16-bit integer)
* reference number of nth member of the group (16-bit integer)

The numeric data group (NDG) data is a list of data identifiers (tag/ref pairs) that describe a scientific dataset. It supersedes the old DFTAG_SDG, which became obsolete as of version 3.2 of the HDF library. A more complete explanation of the relationship between DFTAG_NDG and DFTAG_SDG can be found in the chapter entitled "Sets and Groups."

All of the members of the group provide information for correctly interpreting and displaying the data. Application programs that deal with NDGs should read all of the elements of an NDG and process those identifiers which they can use. Even if an application cannot process all of the tags, the tags that it can understand will be usable.

Tag types that may appear in a DFTAG_NDG are listed in Table 6.5.
Table 6.5  Possible Tag Types in an NDG

Tag            Description
DFTAG_SDD      Scientific data dimension record (rank and dimensions)
DFTAG_SD       Scientific data
DFTAG_SDS      Scales
DFTAG_SDL      Labels
DFTAG_SDU      Units
DFTAG_SDF      Formats
DFTAG_SDM      Maximum and minimum values
DFTAG_SDC      Coordinate system
DFTAG_CAL      Calibration information
DFTAG_FV       Fill value
DFTAG_LUT      Color lookup table
DFTAG_LD       Lookup table dimension record
DFTAG_SDLNK    Link to old-style DFTAG_SDG (see Sets and Groups)

Example

DFTAG_SDD, DFTAG_SD, DFTAG_SDM

A dimension record, the scientific data, and the maximum and minimum values of the data go together. The application reads the rank and dimensions from the dimension record, then reads the data array with those dimensions. If it needs the maximum and minimum, it also reads them.

See also: Sets and Groups

DFTAG_SDD    Scientific data dimension record    6 + 8*rank bytes    701 (0x02BD)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* number of dimensions (16-bit integer)
* number of values along the nth dimension (32-bit integer)
* reference number of DFTAG_NT for data (16-bit integer)
* reference number of DFTAG_NT for the scale for the nth dimension (16-bit integer)

This record defines the rank and dimensions of the array in the scientific dataset. For example, a DFTAG_SDD for a 500x600x3 array of floating-point numbers would have the following values and components:

* Rank: 3
* Dimensions: 500, 600, and 3
* One data NT
* Three scale NTs

DFTAG_SD    Scientific data    NTsize*x*y*z*... bytes (where NTsize is the size of the data NT given by the corresponding DFTAG_SDD and x, y, z, etc. are the dimension sizes)    702 (0x02BE)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)

This tag points to an array of scientific data. The type of the data may be specified by a DFTAG_NT included with the SDG. If there is no DFTAG_NT, the type of the data is floating-point in standard IEEE 32-bit format. The rank and dimensions must be stored as specified in the corresponding DFTAG_SDD.
The diagram above shows a three-dimensional data array.

DFTAG_SDS    Scientific data scales    rank + NTsize0*x + NTsize1*y + NTsize2*z + ... bytes (where rank is the number of dimensions, x, y, z, etc. are the dimension sizes, and NTsize# are the sizes of each scale NT from the corresponding DFTAG_SDD)    703 (0x02BF)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)
* tells whether a scale exists for the nth dimension (8-bit integer; 0 or 1)
* list of scale values for the nth dimension (type is given by corresponding DFTAG_SDD)

This tag points to the scales for the dataset. The first n bytes, one per dimension, indicate whether there is a scale for the corresponding dimension (1=yes, 0=no). This is followed by the scale values for each dimension. The scale consists of a simple series of values, where the number of values and their types are given by the corresponding DFTAG_SDD.

DFTAG_SDL    Scientific data labels    ? bytes    704 (0x02C0)

*** INSERT FIGURE HERE ***

* reference number (16-bit integer)