A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers.
There currently exist a multitude of incompatible document file formats.
In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. The effort did not succeed.
Page description languages such as PostScript and PDF have become the de facto standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification for PDF itself, ISO 32000.
HTML is a widely used open international standard that also serves as a document file format. It has also been published as an ISO/IEC standard (ISO/IEC 15445:2000).
The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.
Common document file formats
ASCII, UTF-8 — plain text encodings. With these two encodings, three different line endings are in use: (a) LF (line feed), used by UNIX and similar systems; (b) CRLF (carriage return followed by line feed), used by DOS and Windows systems; and (c) CR (carriage return), used by older Macintosh systems.
PDF — Open standard for document exchange. ISO standards include PDF/X (eXchange), PDF/A (Archive), PDF/E (Engineering), ISO 32000 (PDF), PDF/UA (Accessibility) and PDF/VT (Variable data and transactional printing). PDF is readable on almost every platform with free or open source readers. Open source PDF creators are also available.
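The three line-ending conventions listed for plain text (LF, CRLF, and CR) can be told apart by inspecting a file's raw bytes. The following is a minimal sketch in Python, not a definitive tool. The function name `count_line_endings` and the sample string are illustrative assumptions. It relies on the fact that in both ASCII and UTF-8 the bytes 0x0A (LF) and 0x0D (CR) never occur inside a multi-byte character, so a plain byte count is safe.

```python
def count_line_endings(data: bytes) -> dict:
    """Count LF, CRLF, and lone CR line endings in ASCII or UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    # Each CRLF pair contains exactly one CR byte and one LF byte.
    crlf = data.count(b"\r\n")
    # Subtract the pairs so that only standalone LF and CR bytes remain.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"LF": lf, "CRLF": crlf, "CR": cr}


# Illustrative sample mixing all three conventions.
sample = "first\nsecond\r\nthird\rfourth".encode("utf-8")
print(count_line_endings(sample))  # {'LF': 1, 'CRLF': 1, 'CR': 1}
```

To check a real file, the bytes would typically be read in binary mode, for example with `open(path, "rb").read()`, so that the platform does not translate line endings before they are counted.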