Data stream formats in the Andrew User Interface System

Wilfred J. Hansen
Andrew Consortium
Carnegie Mellon University

(The Andrew Toolkit (ATK) is the architecture and tools for building applications in the Andrew User Interface System.)

In order to support the inclusion of arbitrary objects in multi-media editors, the Andrew Toolkit requires data objects to conform to a set of conventions for their file representation. A data object must write its data enclosed in a begin/end marker pair. Each marker must include a tag denoting the type of the object being written and a unique identifier, which other data objects use to refer to this one. If a data object includes other data objects, they must be properly nested. The begin/end markers make it possible to find the data associated with an object without actually parsing the data. For example, a text with an embedded picture has the format:

    \begindata{text,1}
    \begindata{picture,2}
    \enddata{picture,2}
    \view {pictureview,2}
    \enddata{text,1}

In order to transport files across most networks, data streams use only printable 7-bit ASCII characters, including tab, space and new-line, and keep line lengths below 80 characters.

____________________________________

Text format

Text data streams in the Andrew User Interface System follow the general principles for Andrew Toolkit data streams. The overall structure of a text data stream is

    A. \begindata line
    B. \textdsversion line
    C. \template line
    D. definitions of additional styles
    E. the text body itself
    F. styled text
    G. embedded objects in text body
    H. \enddata line

Subsequent sections of this document describe each of these components.

As usual in ATK, the appropriate way to read or write the data stream is to call upon the corresponding Read or Write method from the AUIS distribution. Only in this way is your code likely to continue to work in the face of changes to the data stream definition.
Moreover, there are a number of special features (mostly outdated data streams) that are implemented in the code but not described here.

A. \begindata line

Standard ATK begindata line having the form

    \begindata{text,99999}

where 99999 is some identifying number unique within this data stream.

B. \textdsversion line

This line always has the form

    \textdsversion{12}

Files written with earlier data stream versions have values other than 12.

C. \template line

If the file uses a style template, there will be a line of this form:

    \template{default}

where 'default' is whatever template name is used. This template name is the prefix of a filename. The suffix ".tpl" is appended to the name, and the resulting file is sought in the directories named in the user's atktemplatepath preference value. If there is none, the default directory is $ANDREWDIR/lib/tpls. 'default' is the most usual template name. Every installation of AUIS is expected to have $ANDREWDIR/lib/tpls/default.tpl, and its styles are not defined further in the document.

D. definitions of additional styles

A document may define and use styles that are not in the template. Each such definition is two or more lines:

    \define{internalstylename
    menuname
    attribute
    . . .
    attribute}

The internalstylename is lower case and may have digits, but no spaces. There may be no menuname, in which case there is an empty line; if there is a menuname line, it is of the form

    menu:[Menu card name,Style name]

If there are no attributes, the closing '}' appears at the end of the menuname line. Each attribute line is of the form

    attr:[attributename basis units value]

where the first three are strings and the fourth is an integer, possibly signed. The specific values allowed are beyond the scope of this document; they correspond closely to values in style.H.

E. the text body itself

Text is represented by itself. n consecutive newlines in the text are represented by n+1 newlines in the data stream.
Single newlines are used to break the stream into lines of less than 80 bytes; these are ignored when the file is read. Earlier data stream versions required a space before a newline if there was to be a space in the text; version 12 invents a space before the newline if one is not there. The latter is prevented by ending the line with a single backslash (\). If a sentence ends a line and has more than one space after its punctuation, the additional spaces must appear at the start of the next line. The characters backslash, left brace, and right brace are always preceded in the text with a backslash. There is a convention for representing non-ASCII ISO-8859 characters, but I don't know what it is offhand.

F. styled text

If text in the body is to be displayed in a style, e.g. italic, the text is preceded with

    \internalstylename{

and followed by a closing curly brace. The internal style name is one of the names defined either in the template or in a \define line.

G. embedded objects in text body

When an object is embedded in a text body, two items appear: the data stream for the object and a \view line. The \begindata for the object is always at the beginning of a line. (The previous line is terminated with backslash if there is to be no space before the object.) The \enddata line for the object always ends with a newline (which is not treated as a space). The \view line has the form:

    \view{rasterview,8888,777,0,0}

In future data stream versions, other items may appear before the '}'; each such item is preceded by a comma. The first item in the list is the textual name of the view object to be used to display the dataobject. The second item is the identifying integer that also appears in the \begindata for the object. The third value is ignored. The fourth and fifth items are usually zero; however, if non-zero they specify the desired width and height at which to display the object.

H. \enddata line

Has the form

    \enddata{text,99999}

that is, it is the same as the \begindata line, but has 'end' instead of 'begin'.

____________________________________

Format of ATK raster images

The raster data object writes a standard ATK data stream beginning with a \begindata line and ending with an \enddata line. Between these comes a header and possibly an image body. The first line of the header looks like this:

    2 0 65536 65536 0 0 484 603

where the values are these:

RasterVersion: '2'
This specification describes the second version of this encoding.

Options: '0'
This field may specify changes to the image before displaying it:

    raster_INVERT (1<<0)  /* exchange black and white */
    raster_FLIP   (1<<1)  /* exch top and bottom */
    raster_FLOP   (1<<2)  /* exch left and right */
    raster_ROTATE (1<<3)  /* rotate 90 clockwise */

xScale, yScale: '65536 65536'
These scale factors affect the size at which the image is printed. The value raster_UNITSCALE (136535) will print the image at approximately the size on the screen. The default scale of 65536 is approximately half the screen size. (It is not exactly half screen size in an effort to simplify scaling on 300-dots-per-inch printers.)

x, y, width, height: '0 0 484 603'
It is possible for a raster object to display a portion of an image. These fields select this portion by specifying the index of the upper left pixel and the width and height of the image in pixels. In all instances so far, x and y are both zero and the width and height specify the entire raster.

The second header line specifies the actual raster in one of three forms, but only the first of these forms is actually used.

First form:

    bits 10156544 484 603

RasterType: 'bits'
This form.

RasterId: '10156544'
An identifier so other raster objects can refer to this one. Usually this is the same identifier as in the \begindata line.

Width, Height: '484 603'
Describes the width of each row and the number of rows. This many rows follow on subsequent lines.
Second form:

    refer 10135624

The current data object does not have the bits, but refers to the bits as stored in another data object (which should appear earlier in the same data stream). 'refer' identifies this form and the integer is the identifying number of that other data object.

Third form:

    file 10235498 filename path

The raster is not in the current data object, but is in a file. 'file' identifies this form. The id number '10235498' allows this raster data to be referred to by a 'refer' form. The filename is the full pathname of the file. Path is the element of a "rasterpath" list against which the filename was resolved. (This is not fully implemented. The idea is to achieve a measure of recovery in case the file is moved.)

In the first form ('bits'), the header is followed by lines specifying the image. There is at least one line per raster row, though some rows may take more lines. The bits of a row are encoded in blocks of eight; a multiple of 8 bits is specified, though trailing bits will be ignored after reading the row. Following the last bits for a row are a space, a vertical bar (|), and a newline. White space is ignored, so the bytes of the encoding are broken into blocks of 13 or 14 bytes separated with tabs.

The bits of the row are run-length encoded by bytes. That is, a sequence of identical bytes will be represented in only a few bytes rather than at full length. Hexadecimal is a subset of this encoding, with a one bit representing black and a zero bit representing white. Here is the interpretation of each range of byte values:

control characters and space:
Ignored.

@ [ ] ^ _ ` } ~ 0x7F and all characters with high bit set:
These are errors, but at present they are ignored.

{ \:
Illegal end of line. Treat as end of row.

|:
Legal end of row. If there have not been enough codes for the entire width, pad with white bits.

0x21 ... 0x2F (punctuation characters):
The next two bytes specify a hex value.
This value is to be repeated in the row the number of times given by c-0x1F, where c is the input code. (That is, 0x21 means to repeat the byte two times, 0x22 three times, and so on.)

0x30 ... 0x3F (digit or punctuation):
This is a hex digit with the value c-0x30. Two consecutive hex digits encode one byte of the row.

A ... F  a ... f:
These are hex digits with values 0xA ... 0xF.

g ... z:
Multiple white bytes. c-'f' bytes of white are generated into the row.

G ... Z:
Multiple black bytes. c-'F' bytes of black are generated into the row.

____________________________________

Example

\begindata{text,538375988}
\textdsversion{12}
\template{default}
\define{global
}
\define{up15
menu:[Justify,Up15]
attr:[Script PreviousScriptMovement Point -15]}
This is text in the document. \italic{This is italic.} These two lines
are one paragraph.


This paragraph is preceded by two newlines, but it will be displayed with
only one blank line between it and the previous one. When two spaces are
required between words, the second must appear at the beginning of a line.
 When a newli\
ne is not to be replaced with a space, it must be preceded with backslash.
\begindata{bp,9233088}
\enddata{bp,9233088}
\view{bpv,9233088,38,0,0}
This second page has a raster on it.
\begindata{raster,10156544}
2 0 68266 68266 0 0 484 603
bits 10156544 484 603
zzzg |
zzzg |
7fZZHfeKfeOc0g |
. . .
zzzg |
\enddata{raster, 10156544}
\view{rasterview,10156544,31,0,0}
\enddata{text,538375988}

-----------------------------

The only immediate comment I would add is that, if you come across a file which purports to be an AUIS raster file, it may be either of two things:

A) A raster datastream, as defined in Fred's document under "Format of ATK raster images." The first line of this file would be

    \begindata{raster,99999}

(with an arbitrary ID integer in place of the 99999). This would be followed by the header and possibly the image body, and then the final line would be

    \enddata{raster,99999}

(with the same integer).
B) A text datastream (or some other kind of datastream) containing a raster as an embedded object and no other data. This is not the preferred way to store a raster image, but it tends to happen every now and then. In this case, the raster datastream will occur, as described above, somewhere within the larger datastream. It is legal to read in lines and throw them away until you find a line that begins

    \begindata{raster,

(The backslash will always be the first character on the line.) You then read in the datastream until the \enddata line occurs, and ignore the rest of the file. (You can compare the ID numbers as a consistency check.)
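
The scan described in case B can be sketched in a few lines of Python. This is only an illustration, not code from the AUIS distribution: the function name extract_raster_stream and the two regular expressions are hypothetical. The patterns allow optional white space after the comma, on the assumption that markers such as "\enddata{raster, 10156544}" (as in the example data stream) are acceptable.

```python
import re

# Hypothetical patterns for the raster begin/end markers. White space after
# the comma is tolerated (an assumption based on the example data stream).
BEGIN_RE = re.compile(r"^\\begindata\{raster,\s*(\d+)\}")
END_RE = re.compile(r"^\\enddata\{raster,\s*(\d+)\}")


def extract_raster_stream(lines):
    r"""Return (raster_id, body_lines) for the first raster data stream.

    Lines before the \begindata{raster,...} marker are thrown away, and
    everything after the matching \enddata marker is never read.
    """
    raster_id = None
    body = []
    for line in lines:
        line = line.rstrip("\n")
        if raster_id is None:
            # Still skipping the enclosing data stream.
            match = BEGIN_RE.match(line)
            if match:
                raster_id = int(match.group(1))
            continue
        match = END_RE.match(line)
        if match:
            # Consistency check: both markers must carry the same ID.
            if int(match.group(1)) != raster_id:
                raise ValueError("enddata id does not match begindata id")
            return raster_id, body
        body.append(line)
    raise ValueError("no complete raster data stream found")
```

Given an open text file f, extract_raster_stream(f) would return the ID and the header and image lines found between the two markers. The same call works for case A, since the \begindata line is then simply the first line read.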
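
Once the raster lines are in hand, the row encoding described under "Format of ATK raster images" can be decoded along the following lines. This is a hedged sketch, not the AUIS reader. The names hex_value and decode_row are hypothetical. It assumes that two consecutive hex digits form one byte (which is what the 484-pixel rows of the example data stream require), that black bytes are 0xFF and white bytes 0x00, and that the input row is well formed.

```python
# Characters the format lists as errors that are currently ignored.
ERROR_CHARS = set("@[]^_`}~")


def hex_value(c):
    """Value of one hex digit; 0x30..0x3F map to 0..15, as do A-F and a-f."""
    code = ord(c)
    if 0x30 <= code <= 0x3F:
        return code - 0x30
    if c in "ABCDEF":
        return code - ord("A") + 10
    if c in "abcdef":
        return code - ord("a") + 10
    raise ValueError("not a hex digit: %r" % c)


def decode_row(text, width):
    """Decode one encoded row (possibly spanning lines) into packed bytes."""
    nbytes = (width + 7) // 8  # rows are stored as a whole number of bytes
    # Drop control characters, space, 0x7F, high-bit characters and the
    # listed error characters, all of which the format says to ignore.
    chars = [c for c in text if 0x20 < ord(c) < 0x7F and c not in ERROR_CHARS]
    out = bytearray()
    i = 0
    while i < len(chars):
        c = chars[i]
        i += 1
        code = ord(c)
        if c in "|{\\":
            break  # '|' is the legal end of row; '{' and '\' also end it
        if 0x21 <= code <= 0x2F:
            # Repeat the following hex byte (code - 0x1F) times.
            value = hex_value(chars[i]) * 16 + hex_value(chars[i + 1])
            i += 2
            out.extend([value] * (code - 0x1F))
        elif "g" <= c <= "z":
            out.extend(b"\x00" * (code - ord("f")))  # run of white bytes
        elif "G" <= c <= "Z":
            out.extend(b"\xff" * (code - ord("F")))  # run of black bytes
        else:
            # A pair of hex digits gives one literal byte (assumption).
            out.append(hex_value(c) * 16 + hex_value(chars[i]))
            i += 1
    out.extend(b"\x00" * (nbytes - len(out)))  # pad short rows with white
    return bytes(out[:nbytes])
```

For instance, the row "zzzg |" from the example decodes to 61 white bytes for a width of 484: three runs of 20 bytes plus one more byte. A truncated row, where a repeat code or a hex digit is not followed by the digits it needs, raises IndexError in this sketch rather than being repaired.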