Microsoft RTF

Also Known As: Rich Text Format


Type: Metafile
Colors: 256
Compression: None
Maximum Image Size: NA
Multiple Images Per File: No
Numerical Format: Little-endian
Originator: Microsoft Corporation
Platform: MS-DOS
Supporting Applications: Most word processing, some spreadsheet
See Also: None

Usage
Used for document data interchange.

Comments
A least-common-denominator format used mainly in word-processor documents.

Vendor specifications are available for this format.


Microsoft RTF (Rich Text Format) is a metafile standard developed by Microsoft Corporation to encode formatted text and graphics for interchange between applications. Normally, exporting a formatted file from one word processor to another requires that the file be converted from its original format to the format supported by the target application. This conversion almost never produces a target document that is an exact functional duplicate of the original. This is due both to the different features present in the word processor formats, and to limitations of the format converters. If a document is stored as an RTF file, however, and the reading application can also handle RTF files, no intermediate conversion is necessary and therefore no data is misinterpreted or lost.

Contents:
File Organization
File Details
For Further Information

RTF has excellent font-handling capabilities and bitmap storage features. RTF files contain only 7-bit ASCII characters, so the format can support documents formatted using the ANSI, MS-DOS, and Macintosh character sets. These features and others make the RTF format a good choice for use as a multi-platform interchange format.

File Organization

The encoded data in RTF files is arranged more like a stream than a fixed data structure, so there is no fixed information header that is the same in all RTF files. Instead, an RTF code stream consists of variable-sized fields called control words, control symbols, and groups. Control words and control symbols begin with a backslash character (\), followed by one or more ASCII characters; groups are delimited by braces ({}). A control word is an RTF code that contains special formatting and printing instructions.
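As a rough illustration of this stream structure, the following Python sketch splits an RTF stream into tokens. The names TOKEN_RE and tokenize are hypothetical, chosen for this example only. The pattern covers just the basic syntax described here: it does not handle binary data, and it passes line breaks through as ordinary text.

```python
import re

# One alternative per kind of token. The \'xx form is tried before the
# generic control-symbol form so that its two hex digits are captured.
TOKEN_RE = re.compile(
    r"""
    (?P<group_open>\{)
  | (?P<group_close>\})
  | \\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)?[ ]?   # control word, optional number, optional delimiter space
  | \\'(?P<hex>[0-9a-fA-F]{2})                   # hexadecimal character value
  | \\(?P<symbol>[^a-zA-Z])                      # control symbol
  | (?P<text>[^\\{}]+)                           # plain text
    """,
    re.VERBOSE,
)


def tokenize(rtf):
    """Yield (kind, value) pairs for each token found in an RTF string."""
    for m in TOKEN_RE.finditer(rtf):
        if m.group("group_open") is not None:
            yield ("group_open", None)
        elif m.group("group_close") is not None:
            yield ("group_close", None)
        elif m.group("word") is not None:
            param = m.group("param")
            yield ("word", (m.group("word"), int(param) if param is not None else None))
        elif m.group("hex") is not None:
            yield ("hex", int(m.group("hex"), 16))
        elif m.group("symbol") is not None:
            yield ("symbol", m.group("symbol"))
        else:
            yield ("text", m.group("text"))
```

Calling list(tokenize(...)) on a short stream gives a flat token list that a reader could then walk while tracking brace depth.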

File Details

Looking at the 22 lines of RTF code included in this section, we see the following control codes at the beginning of the file:

\rtf1\ansi

These control codes indicate that this data stream is an RTF document, that the code conforms to version 1 of the RTF specification, and that the document uses the ANSI (\ansi) rather than the PC (\pc), PS/2 (\pca), or Macintosh (\mac) character sets.
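A reader might check these opening control codes before doing anything else. The sketch below is one way to do that in Python. The names CHARSETS, HEADER_RE, and read_header are hypothetical, and the function assumes the character-set control word, if present, directly follows the version number.

```python
import re

# Character-set control words named in the text, mapped to descriptive labels.
CHARSETS = {"ansi": "ANSI", "pc": "PC", "pca": "PS/2", "mac": "Macintosh"}

# "{\rtf" plus a version number, optionally followed by a character-set word.
# The lookahead keeps \mac from matching the start of a longer control word.
HEADER_RE = re.compile(r"\{\\rtf(\d+)\s*(?:\\(ansi|pca|pc|mac)(?![a-z]))?")


def read_header(rtf):
    """Return (version, charset_label); charset_label is None if not declared."""
    m = HEADER_RE.match(rtf)
    if m is None:
        raise ValueError("not an RTF stream: missing {\\rtf and version number")
    return int(m.group(1)), CHARSETS.get(m.group(2))
```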

Control symbols are special escape character sequences consisting of a backslash that is followed by a single, nonalphabetic character. RTF control symbols include:

\~    Nonbreaking space
\_    Nonbreaking hyphen
\:    Index subentry
\'xx  Hexadecimal value xx
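The hexadecimal control symbol is how characters outside 7-bit ASCII are carried. The following Python sketch decodes such escapes into text. The function name and the CODECS table are hypothetical, and mapping \ansi to Windows code page 1252 is an assumption made for this example; \pc and \pca are mapped to code pages 437 and 850.

```python
import re

# Assumed mapping from character-set control word to a Python codec name.
CODECS = {"ansi": "cp1252", "pc": "cp437", "pca": "cp850", "mac": "mac_roman"}

# Match every backslash pair so that an escaped backslash (\\) is consumed
# as a unit and never mistaken for the start of a \'xx escape.
ESCAPE_RE = re.compile(r"\\(?:'([0-9a-fA-F]{2})|.)", re.DOTALL)


def decode_hex_escapes(text, charset="ansi"):
    """Replace each \\'xx escape with the character it encodes."""
    codec = CODECS[charset]

    def replace(match):
        if match.group(1) is None:
            return match.group(0)  # some other escape: leave it untouched
        byte = bytes([int(match.group(1), 16)])
        return byte.decode(codec, errors="replace")

    return ESCAPE_RE.sub(replace, text)
```

The same byte value can mean different characters under different character sets, which is why the character set declared at the start of the file matters when decoding.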

A group is a collection of text, control words, and control symbols, enclosed in a set of braces ({}). In fact, the entire RTF code stream is considered a group and is always enclosed in braces. The first control word in the group identifies the group type. Both the backslash (\) and the brace characters ({}) have special meanings in RTF and should be preceded by a backslash if they are to be interpreted as text.
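When writing RTF, the escaping rule above has to be applied to every piece of literal text. A minimal Python sketch follows. The function name escape_rtf_text is hypothetical, and encoding non-ASCII characters through code page 1252 is an assumption that suits an \ansi document only.

```python
def escape_rtf_text(text, codec="cp1252"):
    """Escape literal text so it can be placed inside an RTF group."""
    pieces = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are structural, so prefix them with a backslash.
            pieces.append("\\" + ch)
        elif ord(ch) < 0x80:
            pieces.append(ch)
        else:
            # Keep the output 7-bit: emit each encoded byte as a \'xx escape.
            # Raises UnicodeEncodeError if the codec cannot represent ch.
            for byte in ch.encode(codec):
                pieces.append("\\'%02x" % byte)
    return "".join(pieces)


def minimal_document(text):
    """Wrap escaped text in the outermost \\rtf group."""
    return "{\\rtf1\\ansi " + escape_rtf_text(text) + "}"
```

This sketch does not translate line breaks into \par; a real writer would need to decide how paragraphs are represented.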

{\rtf1\ansi \deff0\deflang1024
{\fonttbl{\f0\froman Tms Rmn;}{\f1\froman Symbol;}{\f2\fswiss Helv;}}
{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;
\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue127;
\red0\green127\blue127;\red0\green127\blue0;\red127\green0\blue127;
\red127\green0\blue0;\red127\green127\blue0;\red127\green127\blue127;
\red192\green192\blue192;}
{\stylesheet{\fs20\lang1033 \snext0 Normal;}}
{\info{\author \'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00}
{\operator \'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00\'00}
{\creatim\yr1992\mo1\dy9\hr12\min53}
{\revtim\yr1992\mo1\dy9\hr12\min53}{\version1}{\edmins3}{\nofpages0}
{\nofwords0}{\nofchars0}{\vern16504}}
\paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\gutter0
\widowctrl\ftnbj \sectd \linex0\endnhere \pard\plain \fs20\lang1033
Four Basic Principles to Unify Mind and Body.
\par \tab 1. Keep one point.
\par \tab 2. Relax completely.
\par \tab 3. Keep weight underside.
\par \tab 4. Extend Ki.
\par }

Looking again at the RTF code in the figure, we can see a number of groups. The first group is obviously the \rtf group, which contains the code for the entire file.
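To see the group structure programmatically, one can track brace depth and record the first control word of each group nested directly inside the outer \rtf group. The Python sketch below does this. The function name is hypothetical, and the scan assumes the stream contains no \bin binary data, which could hold unescaped brace bytes.

```python
import re

WORD_RE = re.compile(r"\\([a-z]+)")


def child_group_names(rtf):
    """List the first control word of each group directly inside the outer group."""
    names = []
    depth = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            # Skip the backslash and the character after it, so that the
            # escaped braces \{ and \} are never counted as group delimiters.
            i += 2
            continue
        if ch == "{":
            depth += 1
            if depth == 2:
                m = WORD_RE.match(rtf, i + 1)
                names.append(m.group(1) if m else None)
        elif ch == "}":
            depth -= 1
        i += 1
    return names
```

Run against a file laid out like the example, this would be expected to report the font table, color table, style sheet, and information groups in order.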

The \fonttbl group contains the descriptions of the fonts used within the document. This document defines Times Roman, Symbol, and Helvetica font sets.

The next group, \colortbl, is a color table used to control screen and printer colors. This file defines a basic palette of 16 colors, with the red, green, and blue components of each color given as an 8-bit intensity value in the range of 0 to 255.
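The color table can be read by splitting the group on semicolons, since each entry ends with one. The Python sketch below does this; parse_colortbl and COMPONENT_RE are hypothetical names. In the example file the table begins with a bare semicolon, which leaves entry 0 empty; the sketch reports that entry as None.

```python
import re

COMPONENT_RE = re.compile(r"\\(red|green|blue)(\d+)")


def parse_colortbl(group):
    """Parse the text of a {\\colortbl ...} group into a list of RGB tuples."""
    start = group.index("\\colortbl") + len("\\colortbl")
    end = group.rindex("}")
    # Every entry is terminated by ';', so the piece after the last ';' is not an entry.
    entries = group[start:end].split(";")[:-1]
    table = []
    for entry in entries:
        parts = {name: int(value) for name, value in COMPONENT_RE.findall(entry)}
        if parts:
            table.append((parts.get("red", 0), parts.get("green", 0), parts.get("blue", 0)))
        else:
            table.append(None)  # empty entry: no explicit color given
    return table
```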

The \stylesheet group contains descriptions and definitions of the various styles and formats used in the document. In this example, we can see that Normal is the only style defined in this document.

The \info group contains one or more pieces of information about the document, such as title, subject, author, version, keywords, and comments. In this example, the author and operator (the person who made the last change to the document) are blank. The remaining fields identify the creation time and last revision time of the document and its application version number.

After the groups, we see a series of control words that define the document, section, and paragraph formats, including the width, height, and margins. Following these control words is the actual text, which is one line of text followed by four lines of tab-indented text.
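The page dimensions in these control words are easier to read once converted. The RTF specification expresses them in twips, 1/1440 of an inch, a detail not stated above. The Python sketch below converts the page-setup control words from the example; the function name and the list of control words it looks for are choices made for this illustration.

```python
import re

TWIPS_PER_INCH = 1440

PAGE_RE = re.compile(r"\\(paperw|paperh|margl|margr|margt|margb|gutter)(-?\d+)")


def page_setup_inches(rtf):
    """Return the page-setup control words found in rtf, converted to inches."""
    return {name: int(value) / TWIPS_PER_INCH for name, value in PAGE_RE.findall(rtf)}


line = "\\paperw12240\\paperh15840\\margl1800\\margr1800\\margt1440\\margb1440\\gutter0"
setup = page_setup_inches(line)
# 12240 / 1440 = 8.5 and 15840 / 1440 = 11, so the example describes a
# US Letter page with 1.25-inch side margins and 1-inch top and bottom margins.
print(setup["paperw"], setup["paperh"], setup["margl"], setup["margt"])
```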

RTF can also handle bitmap images encoded in either a hexadecimal or binary format. The control word \pict always begins a group containing bitmapped data. A \pict group might appear in an RTF code stream as follows:

{\pict\wmetafile8\picw23918\pich14552\picwgoal13562\pichgoal8251
\picscalex63\picscaley63

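The control words are the following:

\wmetafile8      Image source is a Windows metafile; the parameter is the mapping mode (8 is MM_ANISOTROPIC)
\picw23918       Picture width (the x extent for a metafile, or the width in pixels for a bitmap)
\pich14552       Picture height (the y extent for a metafile, or the height in pixels for a bitmap)
\picwgoal13562   Desired width of the picture, in twips
\pichgoal8251    Desired height of the picture, in twips
\picscalex63     Horizontal scaling, as a percentage
\picscaley63     Vertical scaling, as a percentage

If the image source is a bitmap (\wbitmap), then the following additional control words may appear:

\wbmbitspixel    Number of bits per pixel
\wbmplanes       Number of color planes
\wbmwidthbytes   Number of bytes in each raster line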

Source images may also be Macintosh PICT files.

Following the \pict control words, and inside the same group, is the actual picture data, which is hexadecimal in format by default (as shown in the example below). If the data is in binary format, it is introduced by the \bin control word, whose numeric parameter gives the number of bytes of binary data that follow.
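Hexadecimal picture data can be turned back into bytes by removing the line breaks and converting each pair of digits. The Python sketch below does this and then reads the first three little-endian 16-bit fields. The function names are hypothetical, and interpreting those fields as the type, header size, and version of a Windows metafile header is an assumption that holds only for \wmetafile pictures.

```python
import struct


def decode_pict_hex(hex_text):
    """Convert the hexadecimal text of a \\pict group into raw bytes."""
    digits = "".join(hex_text.split())  # drop spaces and line breaks
    return bytes.fromhex(digits)


def metafile_header_fields(data):
    """Read three little-endian 16-bit words from the start of the data."""
    return struct.unpack_from("<HHH", data, 0)


# A prefix of picture data, split over two lines as it might be in a file.
data = decode_pict_hex("01000900\n000328ea")
file_type, header_words, version = metafile_header_fields(data)
print(file_type, header_words, hex(version))
```

For binary pictures no such conversion is needed; the reader would instead copy the number of bytes given by the \bin parameter.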

{\rtf1\ansi \deff0\deflang1024
{\fonttbl{\f0\froman CG Times (WN);}{\f1\fdecor Symbol;}{\f2\fswiss Univers (WN);}}
{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;
\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue127;
\red0\green127\blue127;\red0\green127\blue0;\red127\green0\blue127;
\red127\green0\blue0;\red127\green127\blue0;\red127\green127\blue127;
\red192\green192\blue192;}
{\stylesheet{\fs20\lang1033 \snext0 Normal;}}
{\info{\author James D. Murray}
{\creatim\yr1992\mo1\dy9\hr15\min31}{\printim\yr1992\mo1\dy9\hr15\min32}
{\version1}{\edmins2}{\nofpages1}{\nofwords0}{\nofchars2}{\vern16504}}
\paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\gutter0
\widowctrl\ftnbj \sectd \linex0\endnhere \pard\plain \fs20\lang1033
{\pict\wmetafile8\picw23918\pich14552\picwgoal13562\pichgoal8251
\picscalex63\picscaley63
01000900000328ea01000000fee901000000050000000b0200000000050000000c024c0410070500
00000b0200000000050000000c024c04100705000000090200000000050000000102ffffff00fee9
0100430f2000cc0000004c041007000000004c0410070000000028000000100700004c0400000100
010000000000000000000000000000000000000000000000000000000000ffffff00ffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffff0000030000000000000000000000000000000000000000000000
0000}\par}

For Further Information

For further information, see the specification included on the CD-ROM. You may be able to get additional information by contacting Microsoft:

Microsoft Corporation
Attn: Department RTF
16011 N.E. 36th Way
Box 97017
Redmond, WA 98073-9717
WWW: http://www.microsoft.com/

The RTF file format is also documented in the following reference:

Microsoft Corporation. Microsoft Word Technical Reference Manual, Microsoft Press, Redmond, WA.

This book is available in bookstores or from:

Microsoft Press
Voice: 800-677-7377

You may also be able to get information via FTP through the Developer Relations Group at:

ftp://ftp.microsoft.com/developer/drg/



Copyright © 1996, 1994 O'Reilly & Associates, Inc. All Rights Reserved.
