How Vector Files Are Organized

Although vector files, like bitmap files, vary considerably in design, most contain the same basic structure: a header, a data section, and an end-of-file marker. Some structure is needed in the file to contain information global to the file and to correctly interpret the vector data at render time. Although most vector files place this information in a header, some rely solely on a footer to perform the same task.

Vector files on the whole are structurally simpler than most bitmap files and tend to be organized as data streams. Most of the information content of the file is found in the image data.

The basic components of a simple vector file are the following:

Header

Image Data

If a file contains no image data, only a header will be present. If additional information is required that does not fit in the header, you may find a footer appended to the file, and a palette may be included as well:

Header

Palette

Image Data

Footer

Header

The header contains information that is global to the vector file and must be read before the remaining information in the file can be interpreted. Such information can include a file format identification number, a version number, and color information.

Headers may also contain default attributes, which will apply to any vector data elements in the file lacking their own attributes. While this may afford some reduction in file size, it does so at the cost of introducing the need to cache the header information throughout the rendering operation.
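As a rough illustration of what caching header defaults involves, the following Python sketch assumes a hypothetical in-memory representation in which the header defaults and each element are plain dictionaries. The names HEADER_DEFAULTS and resolve_attributes are invented for this example and do not come from any real format.

```python
# Hypothetical header defaults, kept in memory for the whole rendering pass.
HEADER_DEFAULTS = {"color": "BLACK", "line_width": 1}


def resolve_attributes(element, defaults=HEADER_DEFAULTS):
    """Return the element's attributes, falling back to the header defaults."""
    resolved = dict(defaults)   # start from the cached header values
    resolved.update(element)    # the element's own attributes take precedence
    return resolved


elements = [
    {"type": "CIRCLE", "color": "BLUE"},  # carries its own color
    {"type": "LINE"},                     # relies entirely on the defaults
]
resolved = [resolve_attributes(e) for e in elements]
```

The renderer cannot discard the header after reading it, because any later element may still need the defaults.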

Headers and footers found in vector-format files are not necessarily of a fixed size. For historical reasons mentioned above, it is not uncommon to find vector formats that use streams of variable-length records to store all data. If this is the case, then the file must be read sequentially and will normally not provide the offset information that a rendering application needs in order to subsample the image.
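To see why such a stream must be read sequentially, consider a sketch in Python that assumes a purely hypothetical record layout: a 1-byte type code, a 2-byte big-endian payload length, and then the payload itself. No real format is being described; read_records and make_record are illustrative names.

```python
import io
import struct


def make_record(type_code, payload):
    """Build one record in the assumed layout: type, length, payload."""
    return struct.pack(">BH", type_code, len(payload)) + payload


def read_records(stream):
    """Yield (type_code, payload) pairs by walking the stream front to back."""
    while True:
        head = stream.read(3)
        if len(head) < 3:
            return  # clean end of stream (or a truncated trailing header)
        type_code, length = struct.unpack(">BH", head)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record")
        yield type_code, payload


data = (
    make_record(1, b"header")
    + make_record(2, b"circle-data")
    + make_record(2, b"line")
)
records = list(read_records(io.BytesIO(data)))
```

Because the position of each record depends on the lengths of all the records before it, the reader cannot jump directly to the third record without first decoding the first two.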

The type of information stored in the header is governed by the types of data stored in the file. Basic header information contains the height and width of the image, the position of the image on the output device, and possibly the number of layers in the image. Thus, the size of the header may vary from file to file within the same format.

Vector Data

The bulk of all but tiny files consists of vector element data that contain information on the individual objects making up the image. The size of the data used to represent each object will depend upon the complexity of the object and how much thought went into reducing the file size when the format was designed.

Following the header is usually the image data. The data is composed of elements, which are the smaller parts that make up the overall image. Each element is either explicitly associated with information that specifies its size, shape, position relative to the overall image, color, and possibly other attributes, or inherits that information from defaults. An example of vector data in ASCII format containing three elements (a circle, a line, and a rectangle) might appear as:

;CIRCLE,40,100,100,BLUE;LINE,200,50,136,227,BLACK;RECT,80,65,25,78,RED;

Although this example is a simple one, it illustrates the basic problem of deciphering vector data, which is the existence of multiple levels of complexity. When deciphering a vector format, you not only must find the data, but you also must understand the formatting conventions and the definitions of the individual elements. This is hardly ever the case in bitmap formats; bitmap pixel data is all pretty much the same.

In this example, elements are separated by semicolons, and each is named, followed by numerical parameters and color information. Note, however, that consistency of syntax across image elements is never guaranteed. We could have just as easily defined the format in such a way as to make blocks of unnamed numbers signify lines by default:

;CIRCLE,40,100,100,BLUE;200,50,136,227,BLACK;RECT,80,65,25,78,RED;

and made black the default color when none is specified:

;CIRCLE,40,100,100,BLUE;200,50,136,227;RECT,80,65,25,78,RED;

Many formats allow abbreviations:

;C,40,100,100,BL;200,50,136,227;R,80,65,25,78,R;

Notice that the R for RECT and the R for RED are distinguished only by context. You will find that many formats have opted to reduce data size at the expense of conceptual simplicity. You are free to consider this as evidence of flawed reasoning on the part of the format designer. ASCII was originally chosen for ease of reading and parsing. Unfortunately, using ASCII may make the data too bulky. Solution: reduce the data size through implied rules and conventions and allow abbreviation (in the process making the format unreadable). The format designer would have been better off using a binary format in the first place.
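The cost of these conventions shows up in the parser. The Python sketch below handles all four variants of the example shown above. It assumes that an element beginning with a number is a line, that an element ending with a number is black, and that only the abbreviations appearing in the example (C, R, BL) exist. The function and table names are invented for illustration.

```python
# Lookup tables limited to the names and abbreviations used in the example.
ELEMENT_NAMES = {"CIRCLE": "CIRCLE", "C": "CIRCLE", "LINE": "LINE",
                 "RECT": "RECT", "R": "RECT"}
COLOR_NAMES = {"BLUE": "BLUE", "BL": "BLUE", "BLACK": "BLACK",
               "RED": "RED", "R": "RED"}


def is_number(field):
    return field.lstrip("-").isdigit()


def parse_elements(text):
    """Return a list of (kind, params, color) tuples."""
    elements = []
    for chunk in text.split(";"):
        fields = [f.strip() for f in chunk.split(",") if f.strip()]
        if not fields:
            continue  # empty chunk before the first or after the last ';'
        # Leading position: element name, or an implied LINE if numeric.
        if is_number(fields[0]):
            kind = "LINE"
        else:
            kind = ELEMENT_NAMES[fields.pop(0).upper()]
        # Trailing position: color name, or an implied BLACK if numeric.
        if fields and not is_number(fields[-1]):
            color = COLOR_NAMES[fields.pop().upper()]
        else:
            color = "BLACK"
        elements.append((kind, [int(f) for f in fields], color))
    return elements


verbose = ";CIRCLE,40,100,100,BLUE;LINE,200,50,136,227,BLACK;RECT,80,65,25,78,RED;"
terse = ";C,40,100,100,BL;200,50,136,227;R,80,65,25,78,R;"
same_result = parse_elements(verbose) == parse_elements(terse)
```

The token R is resolved purely by position: looked up in the element table when it leads an element, and in the color table when it ends one.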

After the image data is usually an end-of-section or end-of-file marker. This can be as simple as the string EOF at the end of the file. For the same reasons discussed in Chapter 3, Bitmap Files, some vector formats also append a footer to the file. Information stored in a footer is typically not necessary for the correct rendering of the image and may be incidental information such as the time and date the file was created, the name of the application used to create the file, and the number of objects contained in the image data.

Palettes and Color Information

Like bitmap files, vector files can contain palettes. (For a full discussion of palettes, see Chapter 3.) Because the smallest objects defined in vector format files are the data elements, these are the smallest features for which color can be specified. Naturally, then, a rendering application must look up color definitions in the file palette before rendering the image. Our example above, to be correct, would thus need to include the color definitions, which take the form of a palette with associated ASCII names:

RED,255,0,0,
BLACK,0,0,0,
BLUE,0,0,255
;C,40,100,100,BL;200,50,136,227;R,80,65,25,78,R;
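A renderer might load such a palette along the following lines. This Python sketch assumes that each palette entry occupies one line, that the components are red, green, and blue values in that order, and that the palette ends at the first line beginning with a semicolon. These assumptions are drawn from the example only, and parse_palette is an illustrative name.

```python
def parse_palette(lines):
    """Map each color name to an (r, g, b) tuple."""
    palette = {}
    for line in lines:
        if line.startswith(";"):
            break  # element data begins here
        # Drop empty fields so trailing commas are tolerated.
        fields = [f for f in line.strip().split(",") if f]
        if not fields:
            continue
        name, r, g, b = fields
        palette[name] = (int(r), int(g), int(b))
    return palette


sample = """RED,255,0,0,
BLACK,0,0,0,
BLUE,0,0,255
;C,40,100,100,BL;200,50,136,227;R,80,65,25,78,R;"""

palette = parse_palette(sample.splitlines())
```

Note that the element data uses the abbreviations BL and R while the palette uses full names, so a real reader would have to expand the abbreviations before looking the colors up.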

Some vector files allow the definition of enclosed areas, which are considered outlines of the actual vector data elements. Outlines may be drawn with variations in thickness or by using what are known as different pen styles, which are typically combinations of dots and dashes and which may be familiar from technical and CAD drawings. Non-color items of information necessary for the reproduction of the image by the rendering application are called element attributes.

Fills and color attributes

Enclosed elements may be designed to be filled with color by the rendering application. The filling is usually allowed to be colored independently from the element outline. Thus, each element may have two or more colors associated with it, such as one for the element outline and one for the filling. Some formats also define what are called color attributes; a fill color may be transparent, for instance. In addition to being filled with solid colors, enclosed vector elements may contain hatching or shading, which are in turn called fill attributes. In some cases, fill and color attributes are lumped together, either conceptually in the format design, or physically in the file.

Formats that do not support fill patterns must simulate them by drawing parts of the pattern (lines, circles, dots, etc.) as separate elements. This not only introduces an uneven quality to the fill, but also dramatically increases the number of objects in the file and consequently the file size.
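The growth in object count is easy to demonstrate. The Python sketch below simulates a horizontal hatch by emitting one line element per stroke. It assumes, purely for illustration, that a rectangle is described by an origin, a width, and a height; hatch_rect is an invented name.

```python
def hatch_rect(x, y, width, height, spacing):
    """Return the (x1, y1, x2, y2) line elements that hatch a rectangle."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    lines = []
    offset = spacing
    while offset < height:
        lines.append((x, y + offset, x + width, y + offset))
        offset += spacing
    return lines


# One rectangle becomes many separate line elements.
strokes = hatch_rect(80, 65, 25, 78, spacing=4)
stroke_count = len(strokes)
```

Halving the spacing roughly doubles the number of elements written to the file, whereas a format with a native fill attribute would store the pattern once.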

Gradient fills

An enclosed vector element may also be filled with more than one color. The easiest way is with what is called a gradient fill, which appears as a smooth transition between two colors located in different parts of the element fill area. Gradient fills are typically stored as a starting color, an ending color, and the direction and type of the fill. A rendering application is then expected to construct the filled object, usually at the highest resolution possible. CGM is an example of a format that supports horizontal, vertical, and circular gradient fills. Figure 4-1 illustrates a gradient fill.

Figure 4-1: Gradient fill

[Graphic: Figure 4-1]
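How a rendering application might construct a gradient from the stored start and end colors can be sketched in Python as follows. The sketch assumes simple linear interpolation of RGB components across a horizontal run of pixels. Real renderers may interpolate differently, and the function names are illustrative.

```python
def lerp_color(start, end, t):
    """Blend two RGB tuples; t = 0 gives start and t = 1 gives end."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def horizontal_gradient(start, end, width):
    """Return one RGB tuple per pixel column, running from start to end."""
    if width <= 0:
        raise ValueError("width must be positive")
    if width == 1:
        return [tuple(start)]
    return [lerp_color(start, end, i / (width - 1)) for i in range(width)]


# The file stores only the two colors; the renderer produces the ramp
# at whatever resolution the output device supports.
ramp = horizontal_gradient((255, 0, 0), (0, 0, 255), 5)
```

Because only the endpoints and the fill type are stored, the same file yields a smoother ramp on a higher-resolution device simply by calling the function with a larger width.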

Footer

A footer may contain information that can be written to the file only after all the object data is written, such as the number of objects in the image. The footer in most vector formats, however, is simply used to mark the end of the object data.
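The following Python sketch shows why an object count naturally ends up in a footer when elements are written as they are generated. The footer syntax (an OBJECTS line followed by EOF) is hypothetical, as are the function names.

```python
import io


def write_vector_file(elements, out):
    """Stream elements to out, then append a footer holding their count."""
    count = 0
    for element in elements:
        out.write(";" + element)
        count += 1
    out.write(";\n")
    out.write(f"OBJECTS,{count}\n")  # only known once every element is written
    out.write("EOF\n")
    return count


def generate_elements():
    # Elements arrive one at a time, so the total is unknown in advance.
    yield "CIRCLE,40,100,100,BLUE"
    yield "LINE,200,50,136,227,BLACK"
    yield "RECT,80,65,25,78,RED"


buffer = io.StringIO()
written = write_vector_file(generate_elements(), buffer)
text = buffer.getvalue()
```

Placing the count in the header instead would require either buffering all of the elements first or seeking back to patch the header after the data has been written.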