Abstract

STD 11, RFC 822, defines a message representation protocol specifying considerable detail about US-ASCII message headers, and leaves the message content, or message body, as flat US-ASCII text. This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for

  • (1) textual message bodies in character sets other than US-ASCII,

  • (2) an extensible set of different formats for non-textual message bodies,

  • (3) multi-part message bodies, and

  • (4) textual header information in character sets other than US-ASCII.

These documents are based on earlier work documented in RFC 934, STD 11, and RFC 1049, but extend and revise that work. Because RFC 822 said so little about message bodies, these documents are largely orthogonal to (rather than a revision of) RFC 822.

The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. The second document, RFC 2046, defines the general structure of the MIME media typing system and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text data in Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities. This fifth and final document describes MIME conformance criteria and provides some illustrative examples of MIME message formats, acknowledgements, and the bibliography.

These documents are revisions of RFCs 1521, 1522, and 1590, which themselves were revisions of RFCs 1341 and 1342. Appendix B of this document describes differences and changes from previous versions.

Authors:

  1. Ned Freed, Innosoft International, Inc.
  2. Nathaniel S. Borenstein, First Virtual Holdings

Introduction

The first and second documents in this set define MIME header fields and the initial set of MIME media types. The third document describes extensions to RFC 822 formats to allow for character sets other than US-ASCII. This document describes what portions of MIME must be supported by a conformant MIME implementation. It also describes various pitfalls of contemporary messaging systems as well as the canonical encoding model MIME is based on.

MIME Conformance

The mechanisms described in these documents are open-ended. It is definitely not expected that all implementations will support all available media types, nor that they will all share the same extensions. In order to promote interoperability, however, it is useful to define the concept of "MIME-conformance" to define a certain level of implementation that allows the useful interworking of messages with content that differs from US-ASCII text. In this section, we specify the requirements for such conformance.

A mail user agent that is MIME-conformant MUST:

A user agent that meets the above conditions is said to be MIME-conformant. The meaning of this phrase is that it is assumed to be "safe" to send virtually any kind of properly-marked data to users of such mail systems, because such systems will at least be able to treat the data as undifferentiated binary, and will not simply splash it onto the screen of unsuspecting users.

There is another sense in which it is always "safe" to send data in a format that is MIME-conformant, which is that such data will not break or be broken by any known systems that are conformant with RFC 821 and RFC 822. User agents that are MIME-conformant have the additional guarantee that the user will not be shown data that were never intended to be viewed as text.

Guidelines for Sending Email Data

Internet email is not a perfect, homogeneous system. Mail may become corrupted at several stages in its travel to a final destination. Specifically, email sent throughout the Internet may travel across many networking technologies. Many networking and mail technologies do not support the full functionality possible in the SMTP transport environment. Mail traversing these systems is likely to be modified in order that it can be transported.

There exist many widely-deployed non-conformant MTAs in the Internet. These MTAs, speaking the SMTP protocol, alter messages on the fly to take advantage of the internal data structure of the hosts they are implemented on, or are just plain broken.

The following guidelines may be useful to anyone devising a data format (media type) that is supposed to survive the widest range of networking technologies and known broken MTAs unscathed. Note that anything encoded in the base64 encoding will satisfy these rules, but that some well-known mechanisms, notably the UNIX uuencode facility, will not. Note also that anything encoded in the Quoted-Printable encoding will survive most gateways intact, but possibly not some gateways to systems that use the EBCDIC character set.

"'"  (US-ASCII decimal value 39)
"("  (US-ASCII decimal value 40)
")"  (US-ASCII decimal value 41)
"+"  (US-ASCII decimal value 43)
","  (US-ASCII decimal value 44)
"-"  (US-ASCII decimal value 45)
"."  (US-ASCII decimal value 46)
"/"  (US-ASCII decimal value 47)
":"  (US-ASCII decimal value 58)
"="  (US-ASCII decimal value 61)
"?"  (US-ASCII decimal value 63)

Please note that the above list is NOT a list of recommended practices for MTAs. RFC 821 MTAs are prohibited from altering the character of white space or wrapping long lines. These BAD and invalid practices are known to occur on established networks, and implementations should be robust in dealing with the bad effects they can cause.
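
The earlier remark that base64 output satisfies these guidelines, while uuencode output does not, can be checked with a short sketch. It assumes that the safe repertoire consists of the US-ASCII letters and digits plus the eleven special characters listed above; the names SAFE_CHARS and uses_only_safe_chars are hypothetical and exist only for this illustration.

```python
import base64
import binascii
import string

# Assumption: the "safe" repertoire is A-Z, a-z, 0-9 plus the eleven
# special characters listed above. SAFE_CHARS is a hypothetical name.
SAFE_CHARS = set(string.ascii_letters + string.digits + "'()+,-./:=?")


def uses_only_safe_chars(encoded: bytes) -> bool:
    """Return True if every character, ignoring line breaks, is in SAFE_CHARS."""
    text = encoded.decode("ascii")
    return all(ch in SAFE_CHARS for ch in text if ch not in "\r\n")


payload = bytes(range(256))              # every possible octet value
b64 = base64.encodebytes(payload)        # base64, broken into short lines
uu = binascii.b2a_uu(bytes(range(45)))   # one uuencoded line (45 octets max)

b64_is_safe = uses_only_safe_chars(b64)
uu_is_safe = uses_only_safe_chars(uu)
```

Under these assumptions b64_is_safe is true and uu_is_safe is false, because uuencode draws on characters such as space, '"', and "$" that fall outside the listed repertoire.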

Canonical Encoding Model

There was some confusion, in earlier versions of these documents, regarding the model for when email data was to be converted to canonical form and encoded, and in particular how this process would affect the treatment of CRLFs, given that the representation of newlines varies greatly from system to system. For this reason, a canonical model for encoding is presented below.

The process of composing a MIME entity can be modeled as being done in a number of steps. Note that these steps are roughly similar to those steps used in PEM [RFC-1421] and are performed for each "innermost level" body:

Conversion from entity form to local form is accomplished by reversing these steps. Note that reversal of these steps may produce differing results since there is no guarantee that the original and final local forms are the same.

It is vital to note that these steps are only a model; they are specifically NOT a blueprint for how an actual system would be built. In particular, the model fails to account for two common designs:

Other implementation variations are conceivable as well. The vital aspect of this discussion is that, in spite of any optimizations, collapsings of required steps, or insertion of additional processing, the resulting messages must be consistent with those produced by the model described here. For example, a message with the following header fields:

Content-type: text/foo; charset=bar
Content-Transfer-Encoding: base64

must be first represented in the text/foo form, then (if necessary) represented in the "bar" character set, and finally transformed via the base64 algorithm into a mail-safe form.
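
The ordering described above can be sketched in a few lines. Because "text/foo" and "bar" are placeholders, the sketch substitutes plain text and ISO-8859-1, and it assumes that the local newline convention is a bare LF. The function name encode_body is hypothetical, and this is an illustration of the model rather than a prescribed implementation.

```python
import base64


def encode_body(local_text: str, charset: str) -> bytes:
    """Apply the three transformations in the order the model requires."""
    # 1. Media type form: for text, line breaks are represented as CRLF.
    #    Assumption: the local form uses bare LF as its line break.
    canonical = local_text.replace("\n", "\r\n")
    # 2. Character set: represent the text as octets in the named charset.
    octets = canonical.encode(charset)
    # 3. Transfer encoding: make the octets mail-safe with base64.
    return base64.encodebytes(octets)


encoded = encode_body("caf\u00e9\n", "iso-8859-1")
decoded = base64.decodebytes(encoded)    # reverses step 3 only
```

Reversing only the last step yields the octets of the canonical form in the chosen character set (here b"caf\xe9\r\n"), which shows that the base64 transformation was applied last.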

NOTE: Some confusion has been caused by systems that represent messages in a format that uses local newline conventions that differ from the RFC 822 CRLF convention. It is important to note that these formats are not canonical RFC 822/MIME. These formats are instead *encodings* of RFC 822, where CRLF sequences in the canonical representation of the message are encoded as the local newline convention. Note that formats which encode CRLF sequences as, for example, LF are not capable of representing MIME messages containing binary data which contains LF octets not part of CRLF line separation sequences.
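
The limitation described in this note can be demonstrated with a small sketch. The two helper functions are hypothetical stand-ins for a storage format that encodes CRLF as LF; they are not taken from any particular system.

```python
def to_local_lf(canonical: bytes) -> bytes:
    """Encode canonical data in a local form where CRLF is stored as LF."""
    return canonical.replace(b"\r\n", b"\n")


def from_local_lf(local: bytes) -> bytes:
    """Decode the local form by turning every LF back into CRLF."""
    return local.replace(b"\n", b"\r\n")


text_body = b"line one\r\nline two\r\n"
binary_body = b"\x00\x0a\x01\r\n\x02"    # the first LF is data, not a line break

text_round_trip = from_local_lf(to_local_lf(text_body))
binary_round_trip = from_local_lf(to_local_lf(binary_body))
```

The text body survives the round trip unchanged. The binary body does not: the bare LF octet comes back as CRLF, because the local form has no way to distinguish it from an encoded line break.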

Summary

This document defines what is meant by MIME Conformance. It also details various problems known to exist in the Internet email system and how to use MIME to overcome them. Finally, it describes MIME's canonical encoding model.

Security Considerations

Security issues are discussed in the second document in this set, RFC 2046.

Acknowledgements

This document is the result of the collective effort of a large number of people, at several IETF meetings, on the IETF-SMTP and IETF-822 mailing lists, and elsewhere. Although any enumeration seems doomed to suffer from egregious omissions, the following are among the many contributors to this effort:

Harald Tveit Alvestrand       Marc Andreessen
Randall Atkinson              Bob Braden
Philippe Brandon              Brian Capouch
Kevin Carosso                 Uhhyung Choi
Peter Clitherow               Dave Collier-Brown
Cristian Constantinof         John Coonrod
Mark Crispin                  Dave Crocker
Stephen Crocker               Terry Crowley
Walt Daniels                  Jim Davis
Frank Dawson                  Axel Deininger
Hitoshi Doi                   Kevin Donnelly
Steve Dorner                  Keith Edwards
Chris Eich                    Dana S. Emery
Johnny Eriksson               Craig Everhart
Patrik Faltstrom              Erik E. Fair
Roger Fajman                  Alain Fontaine
Martin Forssen                James M. Galvin
Stephen Gildea                Philip Gladstone
Thomas Gordon                 Keld Simonsen
Terry Gray                    Phill Gross
James Hamilton                David Herron
Mark Horton                   Bruce Howard
Bill Janssen                  Olle Jarnefors
Risto Kankkunen               Phil Karn
Alan Katz                     Tim Kehres
Neil Katin                    Steve Kille
Kyuho Kim                     Anders Klemets
John Klensin                  Valdis Kletniek
Jim Knowles                   Stev Knowles
Bob Kummerfeld                Pekka Kytolaakso
Stellan Lagerstrom            Vincent Lau
Timo Lehtinen                 Donald Lindsay
Warner Losh                   Carlyn Lowery
Laurence Lundblade            Charles Lynn
John R. MacMillan             Larry Masinter
Rick McGowan                  Michael J. McInerny
Leo Mclaughlin                Goli Montaser-Kohsari
Tom Moore                     John Gardiner Myers
Erik Naggum                   Mark Needleman
Chris Newman                  John Noerenberg
Mats Ohrman                   Julian Onions
Michael Patton                David J. Pepper
Erik van der Poel             Blake C. Ramsdell
Christer Romson               Luc Rooijakkers
Marshall T. Rose              Jonathan Rosenberg
Guido van Rossum              Jan Rynning
Harri Salminen                Michael Sanderson
Yutaka Sato                   Markku Savela
Richard Alan Schafer          Masahiro Sekiguchi
Mark Sherman                  Bob Smart
Peter Speck                   Henry Spencer
Einar Stefferud               Michael Stein
Klaus Steinberger             Peter Svanberg
James Thompson                Steve Uhler
Stuart Vance                  Peter Vanderbilt
Greg Vaudreuil                Ed Vielmetti
Larry W. Virden               Ryan Waldron
Rhys Weatherly                Jay Weber
Dave Wecker                   Wally Wedel
Sven-Ove Westberg             Brian Wideen
John Wobus                    Glenn Wright
Rayan Zachariassen            David Zimmerman

The authors apologize for any omissions from this list, which are certainly unintentional.

Appendix

Appendix A -- A Complex Multipart Example

What follows is the outline of a complex multipart message. This message contains five parts that are to be displayed serially: two introductory plain text objects, an embedded multipart message, a text/enriched object, and a closing encapsulated text message in a non-ASCII character set. The embedded multipart message itself contains two objects to be displayed in parallel, a picture and an audio fragment.

MIME-Version: 1.0
From: Nathaniel Borenstein <nsb@nsb.fv.com>
To: Ned Freed <ned@innosoft.com>
Date: Fri, 07 Oct 1994 16:15:05 -0700 (PDT)
Subject: A multipart example
Content-Type: multipart/mixed;
              boundary=unique-boundary-1

This is the preamble area of a multipart message.
Mail readers that understand multipart format
should ignore this preamble.

If you are reading this text, you might want to
consider changing to a mail reader that understands
how to properly display multipart messages.

--unique-boundary-1

  ... Some text appears here ...

[Note that the blank between the boundary and the start
 of the text in this part means no header fields were
 given and this is text in the US-ASCII character set.
 It could have been done with explicit typing as in the
 next part.]

--unique-boundary-1
Content-type: text/plain; charset=US-ASCII

This could have been part of the previous part, but
illustrates explicit versus implicit typing of body
parts.

--unique-boundary-1
Content-Type: multipart/parallel; boundary=unique-boundary-2

--unique-boundary-2
Content-Type: audio/basic
Content-Transfer-Encoding: base64

  ... base64-encoded 8000 Hz single-channel
      mu-law-format audio data goes here ...

--unique-boundary-2
Content-Type: image/jpeg
Content-Transfer-Encoding: base64

  ... base64-encoded image data goes here ...

--unique-boundary-2--

--unique-boundary-1
Content-type: text/enriched

This is <bold><italic>enriched.</italic></bold>
<smaller>as defined in RFC 1896</smaller>

Isn't it
<bigger><bigger>cool?</bigger></bigger>

--unique-boundary-1
Content-Type: message/rfc822

From: (mailbox in US-ASCII)
To: (address in US-ASCII)
Subject: (subject in US-ASCII)
Content-Type: Text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: Quoted-printable

  ... Additional text in ISO-8859-1 goes here ...

--unique-boundary-1--
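
As an informal check of the structure outlined above, the following sketch parses an abbreviated copy of the message with the Python standard library "email" package and lists the media type of each entity. The placeholder bodies ("AAAA", "caf=E9") and the example.com addresses are assumptions made for this illustration, and the string uses bare LF line breaks, which this parser accepts as a local encoding of the message.

```python
import email

# Abbreviated copy of the outline above; bodies and addresses are placeholders.
RAW = "\n".join([
    "MIME-Version: 1.0",
    "Subject: A multipart example",
    "Content-Type: multipart/mixed; boundary=unique-boundary-1",
    "",
    "This is the preamble area of a multipart message.",
    "",
    "--unique-boundary-1",
    "",
    "  ... Some text appears here ...",
    "",
    "--unique-boundary-1",
    "Content-type: text/plain; charset=US-ASCII",
    "",
    "Explicitly typed text.",
    "",
    "--unique-boundary-1",
    "Content-Type: multipart/parallel; boundary=unique-boundary-2",
    "",
    "--unique-boundary-2",
    "Content-Type: audio/basic",
    "Content-Transfer-Encoding: base64",
    "",
    "AAAA",
    "",
    "--unique-boundary-2",
    "Content-Type: image/jpeg",
    "Content-Transfer-Encoding: base64",
    "",
    "AAAA",
    "",
    "--unique-boundary-2--",
    "",
    "--unique-boundary-1",
    "Content-type: text/enriched",
    "",
    "This is <bold>enriched.</bold>",
    "",
    "--unique-boundary-1",
    "Content-Type: message/rfc822",
    "",
    "From: sender@example.com",
    "To: recipient@example.com",
    "Subject: inner",
    "Content-Type: Text/plain; charset=ISO-8859-1",
    "Content-Transfer-Encoding: Quoted-printable",
    "",
    "caf=E9",
    "",
    "--unique-boundary-1--",
    "",
])

msg = email.message_from_string(RAW)

# walk() visits the top-level entity, then each nested entity depth-first.
content_types = [part.get_content_type() for part in msg.walk()]

# The fifth part encapsulates a complete message with its own header.
inner = msg.get_payload()[4].get_payload(0)
inner_charset = inner.get_content_charset()
```

With this input the walk visits multipart/mixed, two text/plain parts (the first typed implicitly), multipart/parallel with its audio/basic and image/jpeg children, text/enriched, message/rfc822, and finally the encapsulated text/plain message, whose charset is reported as iso-8859-1.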

Appendix B -- Changes from RFC 1521, 1522, and 1590

These documents are a revision of RFC 1521, 1522, and 1590. For the convenience of those familiar with the earlier documents, the changes from those documents are summarized in this appendix. For further history, note that Appendix H in RFC 1521 specified how that document differed from its predecessor, RFC 1341.

  • (1) This document has been completely reformatted and split into multiple documents. This was done to improve the quality of the plain text version of this document, which is required to be the reference copy.

  • (2) BNF describing the overall structure of MIME object headers has been added. This is a documentation change only -- the underlying syntax has not changed in any way.

  • (3) The specific BNF for the seven media types in MIME has been removed. This BNF was incorrect, incomplete, and inconsistent with the type-independent BNF. And since the type-independent BNF already fully specifies the syntax of the various MIME headers, the type-specific BNF was, in the final analysis, completely unnecessary and caused more problems than it solved.

  • (4) The more specific "US-ASCII" character set name has replaced the use of the informal term ASCII in many parts of these documents.

  • (5) The informal concept of a primary subtype has been removed.

  • (6) The term "object" was being used inconsistently. The definition of this term has been clarified, along with the related terms "body", "body part", and "entity", and usage has been corrected where appropriate.

  • (7) The BNF for the multipart media type has been rearranged to make it clear that the CRLF preceding the boundary marker is actually part of the marker itself rather than the preceding body part.

  • (8) The prose and BNF describing the multipart media type have been changed to make it clear that the body parts within a multipart object MUST NOT contain any lines beginning with the boundary parameter string.

  • (9) In the rules on reassembling "message/partial" MIME entities, "Subject" is added to the list of headers to take from the inner message, and the example is modified to clarify this point.

  • (10) "Message/partial" fragmenters are restricted to splitting MIME objects only at line boundaries.

  • (11) In the discussion of the application/postscript type, an additional paragraph has been added warning about possible interoperability problems caused by embedding of binary data inside a PostScript MIME entity.

  • (12) Added a clarifying note to the basic syntax rules for the Content-Type header field to make it clear that the following two forms:

    • Content-type: text/plain; charset=us-ascii (comment)

    • Content-type: text/plain; charset="us-ascii"

    are completely equivalent.

  • (13) The following sentence has been removed from the discussion of the MIME-Version header: "However, conformant software is encouraged to check the version number and at least warn the user if an unrecognized MIME-version is encountered."

  • (14) A typo was fixed that said "application/external-body" instead of "message/external-body".

  • (15) The definition of a character set has been reorganized to make the requirements clearer.

  • (16) The definition of the "image/gif" media type has been moved to a separate document. This change was made because of potential conflicts with IETF rules governing the standardization of patented technology.

  • (17) The definitions of "7bit" and "8bit" have been tightened so that CR and LF can only appear as part of CRLF end-of-line sequences. The document also no longer requires that NUL characters be preserved, which brings MIME into alignment with real-world implementations.

  • (18) The definition of canonical text in MIME has been tightened so that line breaks must be represented by a CRLF sequence. CR and LF characters are not allowed outside of this usage. The definition of quoted-printable encoding has been altered accordingly.

  • (19) The definition of the quoted-printable encoding now includes a number of suggestions for how quoted-printable encoders might best handle improperly encoded material.

  • (20) Prose was added to clarify the use of the "7bit", "8bit", and "binary" transfer-encodings on multipart or message entities encapsulating "8bit" or "binary" data.

  • (21) In the section on MIME Conformance, "multipart/digest" support was added to the list of requirements for minimal MIME conformance. Also, the requirements for "message/rfc822" support were strengthened to clarify the importance of recognizing recursive structure.

  • (22) The various restrictions on subtypes of "message" are now specified entirely on a subtype by subtype basis.

  • (23) The definition of "message/rfc822" was changed to indicate that at least one of the "From", "Subject", or "Date" headers must be present.

  • (24) The required handling of unrecognized subtypes as "application/octet-stream" has been made more explicit in both the type definitions sections and the conformance guidelines.

  • (25) Examples using text/richtext were changed to text/enriched.

  • (26) The BNF definition of subtype has been changed to make it clear that either an IANA registered subtype or a nonstandard "X-" subtype must be used in a Content-Type header field.

  • (27) MIME media types that are simply registered for use and those that are standardized by the IETF are now distinguished in the MIME BNF.

  • (28) All of the various MIME registration procedures have been extensively revised. IANA registration procedures for character sets have been moved to a separate document that is not included in this set of documents.

  • (29) The use of escape and shift mechanisms in the US-ASCII and ISO-8859-X character sets these documents define has been clarified: such mechanisms should never be used in conjunction with these character sets, and their effect if they are used is undefined.

  • (30) The definition of the AFS access-type for message/external-body has been removed.

  • (31) The handling of the combination of multipart/alternative and message/external-body is now specifically addressed.

  • (32) Security issues specific to message/external-body are now discussed in some detail.