Abstract

Although HTML [RFC 1866] was designed within the context of MIME, more than the specification of HTML as defined in RFC 1866 is needed for two electronic mail user agents to be able to interoperate using HTML as a document format. These issues include the naming of objects that are normally referred to by URIs, and the means of aggregating objects that go together. This document describes a set of guidelines that will allow conforming mail user agents to be able to send, deliver and display these objects, such as HTML objects, that can contain links represented by URIs. In order to be able to handle inter-linked objects, the document uses the MIME type multipart/related and specifies the MIME content-headers "Content-Location" and "Content-Base".

Authors:

  1. Jacob Palme, Stockholm University and KTH
  2. Alex Hopmann, Microsoft Corporation

Introduction

There are a number of document formats, HTML T. Berners-Lee, D. Connolly: "Hypertext Markup Language - 2.0", PDF R. and Meehan, J.: "Portable Document Format Reference Manual, Version 1.1", Adobe Systems Inc and VRML for example, which provide links using URIs for their resolution. There is an obvious need to be able to send documents in these formats in e-mail [RFC821=SMTP, RFC822]. This document gives additional specifications on how to send such documents in MIME [RFC 1521=MIME1] e-mail messages. This version of this standard was based on full consideration only of the needs for objects with links in the Text/HTML media type (as defined in RFC 1866 T. Berners-Lee, D. Connolly: "Hypertext Markup Language - 2.0"), but the standard may still be applicable also to other formats for sets of interlinked objects, linked by URIs. There is no conformance requirement that implementations claiming conformance to this standard are able to handle URIs in other document formats than HTML.

URIs in documents in HTML and other similar formats reference other objects and resources, either embedded or directly accessible through hypertext links. When mailing such a document, it is often desirable to also mail all of the additional resources that are referenced in it; those elements are necessary for the complete interpretation of the primary object.

An alternative way for sending an HTML document or other object containing URIs in e-mail is to only send the URL, and let the recipient look up the document using HTTP. That method is described in N. Freed and Keith Moore: "Definition of the URL MIME External-Body Access-Type" and is not described in this document.

An informational RFC will at a later time be published as a supplement to this standard. The informational RFC will discuss implementation methods and some implementation problems. Implementors are recommended to read this informational RFC when developing implementations of the MHTML standard. This informational RFC is, when this RFC is published, still in IETF draft status, and will stay that way for at least six months in order to gain more implementation experience before it is published.

Terminology

Conformance requirement terminology

This specification uses the same words as RFC 1123 R. Braden (editor): "Requirements for Internet Hosts -- Application and Support", STD-3 for defining the significance of each particular requirement. These words are:

MUST    This word or the adjective "required" means that the item is
        an absolute requirement of the specification.

SHOULD  This word or the adjective "recommended" means that there may
        exist valid reasons in particular circumstances to ignore this
        item, but the full implications should be understood and the
        case carefully weighed before choosing a different course.

MAY     This word or the adjective "optional" means that this item is
        truly optional. One vendor may choose to include the item
        because a particular marketplace requires it or because it
        enhances the product, for example; another vendor may omit
        the same item.

An implementation is not compliant if it fails to satisfy one or more of the MUST requirements for the protocols it implements. An implementation that satisfies all the MUST and all the SHOULD requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST requirements but not all the SHOULD requirements for its protocols is said to be "conditionally compliant."

Other terminology

Most of the terms used in this document are defined in other RFCs.

Absolute URI, AbsoluteURI

CID

See E. Levinson: "Content-ID and Message-ID Uniform Resource Locators".

Content-Base

See section 4.2 below.

Content-ID

See E. Levinson: "Content-ID and Message-ID Uniform Resource Locators".

Content-Location

MIME message or content part header with the URI of the MIME message or content part body, defined in section 4.3 below.

Content-Transfer-Encoding

Conversion of a text into 7-bit octets as specified in N. Borenstein & N. Freed: "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies".

CR

See D. Crocker: "Standard for the format of ARPA Internet text messages.".

CRLF

See D. Crocker: "Standard for the format of ARPA Internet text messages.".

Displayed text

The text shown to the user reading a document with a web browser. This may be different from the HTML markup; see the definition of HTML markup below.

Header

Field in a message or content heading specifying the value of one attribute.

Heading

Part of a message or content before the first CRLFCRLF, containing formatted fields with attributes of the message or content.

HTML

See RFC 1866 T. Berners-Lee, D. Connolly: "Hypertext Markup Language - 2.0".

HTML Aggregate

An HTML object together with some or all of the objects to which the HTML object contains hyperlinks.

HTML markup

A file containing HTML encodings as specified in [HTML] which may be different from the displayed text which a person using a web browser sees. For example, the HTML markup may contain "&lt;" where the displayed text contains the character "<".

LF

See D. Crocker: "Standard for the format of ARPA Internet text messages.".

MIC

Message Integrity Codes, codes used to verify that a message has not been modified.

MIME

See RFC 1521 N. Borenstein & N. Freed: "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", N. Borenstein & N. Freed: "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types".

MUA

Messaging User Agent.

PDF

Portable Document Format, see R. and Meehan, J.: "Portable Document Format Reference Manual, Version 1.1", Adobe Systems Inc.

Relative URI, RelativeURI

URI, absolute and relative

URL

See RFC 1738 T. Berners-Lee, L. Masinter, M. McCahill: "Uniform Resource Locators (URL)".

URL, relative

See R. Fielding: "Relative Uniform Resource Locators".

VRML

Virtual Reality Markup Language.

Overview

An aggregate document is a MIME-encoded message that contains a root document as well as other data that is required in order to represent that document (inline pictures, style sheets, applets, etc.). Aggregate documents can also include additional elements that are linked to the first object. It is important to keep in mind the differing needs of several audiences. Mail sending agents might send aggregate documents as an encoding of normal day-to-day electronic mail. Mail sending agents might also send aggregate documents when a user wishes to mail a particular document from the web to someone else. Finally mail sending agents might send aggregate documents as automatic responders, providing access to WWW resources for non-IP connected clients.

Mail receiving agents also have several differing needs. Some mail receiving agents might be able to receive an aggregate document and display it just as any other text content type would be displayed. Others might have to pass this aggregate document to a browsing program, and provisions need to be made to make this possible.

Finally several other constraints on the problem arise. It is important that it be possible for a document to be signed and for it to be able to be transmitted to a client and displayed with a minimum risk of breaking the message integrity (MIC) check that is part of the signature.

The Content-Location and Content-Base MIME Content Headers

MIME content headers

In order to resolve URI references to other body parts, two MIME content headers are defined, Content-Location and Content-Base. Both these headers can occur in any message or content heading, and will then be valid within this heading and for its content.

In practice, at present only those URIs which are URLs are used, but it is anticipated that other forms of URIs will in the future be used.

The syntax for these headers is, using the syntax definition tools from D. Crocker: "Standard for the format of ARPA Internet text messages.":

content-location ::= "Content-Location:" ( absoluteURI |
                     relativeURI )

content-base ::= "Content-Base:" absoluteURI

where URI is at present (June 1996) restricted to the syntax for URLs as defined in RFC 1738 T. Berners-Lee, L. Masinter, M. McCahill: "Uniform Resource Locators (URL)".

These two headers are valid only for exactly the content heading or message heading where they occur and its text. They are thus not valid for the parts inside multipart headings, and are thus meaningless in multipart headings.

These two headers may occur both inside and outside of a multipart/related part.
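
The syntax rules above can be checked mechanically. The following is a minimal sketch in Python, not part of this specification: it assumes that a URI counts as absolute when it carries a scheme, and the helper names is_absolute_uri and check_header are hypothetical.

```python
from urllib.parse import urlsplit


def is_absolute_uri(value: str) -> bool:
    """Assumption: a URI is absolute when it carries a scheme (http:, cid:, ...)."""
    return bool(urlsplit(value.strip()).scheme)


def check_header(name: str, value: str) -> bool:
    """Return True when the value is acceptable for the named header."""
    name = name.lower()
    if name == "content-base":
        # content-base ::= "Content-Base:" absoluteURI
        return is_absolute_uri(value)
    if name == "content-location":
        # content-location accepts an absoluteURI or a relativeURI,
        # so this sketch only rejects an empty value.
        return bool(value.strip())
    raise ValueError(f"not one of the two headers defined here: {name}")
```

A sending agent could run such a check before emitting a heading; a receiving agent would more likely ignore a malformed header than reject the message.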

The Content-Base header

The Content-Base header gives a base for relative URIs occurring in other heading fields and in HTML documents which do not have any BASE element in their HTML code. Its value MUST be an absolute URI.

Example showing which Content-Base is valid where:

Content-Type: Multipart/related; boundary="boundary-example-1";
              type=Text/HTML; start=foo2*foo3@bar2.net
 ; A Content-Base header cannot be placed here, since this is a
 ; multipart MIME object.

--boundary-example-1

Part 1:
Content-Type: Text/HTML; charset=US-ASCII
Content-ID: <foo2*foo3@bar2.net>
Content-Location: http://www.ietf.cnri.reston.va.us/images/foo1.bar1
;  This Content-Location must contain an absolute URI, since no base
;  is valid here.

--boundary-example-1

Part 2:
Content-Type: Text/HTML; charset=US-ASCII
Content-ID: <foo4*foo5@bar2.net>
Content-Location: foo1.bar1   ; The Content-Base below applies to
                              ; this relative URI
Content-Base: http://www.ietf.cnri.reston.va.us/images/

--boundary-example-1--
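
How a receiving agent might combine the two headers of Part 2 can be sketched as follows. This is an illustration only, using the Python standard library; the function name absolute_location is hypothetical, and the heading is parsed from a string that repeats the Part 2 headers of the example.

```python
from email import message_from_string
from urllib.parse import urljoin

# The heading of Part 2 from the example, followed by an empty body.
PART_2 = (
    "Content-Type: Text/HTML; charset=US-ASCII\n"
    "Content-ID: <foo4*foo5@bar2.net>\n"
    "Content-Location: foo1.bar1\n"
    "Content-Base: http://www.ietf.cnri.reston.va.us/images/\n"
    "\n"
)


def absolute_location(part):
    """Resolve Content-Location against Content-Base from the same heading."""
    location = part.get("Content-Location")
    if location is None:
        return None
    base = part.get("Content-Base")
    # Without a Content-Base the value is returned unchanged; it may
    # still be relative in that case.
    return urljoin(base.strip(), location.strip()) if base else location.strip()


part = message_from_string(PART_2)
resolved = absolute_location(part)
```

Only headers from the same heading are consulted, in line with the rule that a Content-Base is valid only for the heading in which it occurs.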

The Content-Location Header

The Content-Location header specifies the URI that corresponds to the content of the body part in whose heading the header is placed. Its value MAY be an absolute or relative URI. Any URI or URL scheme may be used, but use of non-standardized URI or URL schemes might entail some risk that recipients cannot handle them correctly.

The Content-Location header can be used to indicate that the data sent under this heading is also retrievable, in identical format, through normal use of this URI. If used for this purpose, it must contain an absolute URI or be resolvable, through a Content-Base header, into an absolute URI. In this case, the information sent in the message can be seen as a cached version of the original data.

The header can also be used for data which is not available to some or all recipients of the message, for example if the header refers to an object which is only retrievable using this URI in a restricted domain, such as within a company-internal web space. The header can even contain a fictitious URI and need in that case not be globally unique.

Example:

Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=Text/HTML

   --boundary-example-1

   Part 1:
   Content-Type: Text/HTML; charset=US-ASCII

   ... ... <IMG SRC="fiction1/fiction2"> ... ...

   --boundary-example-1

   Part 2:
   Content-Type: Text/HTML; charset=US-ASCII
   Content-Location: fiction1/fiction2

   --boundary-example-1--
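
A receiving agent could locate the body part that an IMG SRC value refers to by indexing the parts on their Content-Location values. The sketch below is an assumption about one possible implementation, written with the Python standard library; the function name index_by_location is hypothetical, the "Part 1:" and "Part 2:" labels of the example are left out because they are annotations rather than message content, and the body of the second part is a placeholder.

```python
from email import message_from_string

RAW = (
    'Content-Type: multipart/related; boundary="boundary-example-1";\n'
    '              type="text/html"\n'
    "\n"
    "--boundary-example-1\n"
    "Content-Type: text/html; charset=US-ASCII\n"
    "\n"
    '<IMG SRC="fiction1/fiction2">\n'
    "--boundary-example-1\n"
    "Content-Type: text/html; charset=US-ASCII\n"
    "Content-Location: fiction1/fiction2\n"
    "\n"
    "placeholder body\n"
    "--boundary-example-1--\n"
)


def index_by_location(message):
    """Map each Content-Location value to the body part that carries it."""
    return {
        part["Content-Location"].strip(): part
        for part in message.walk()
        if part["Content-Location"] is not None
    }


message = message_from_string(RAW)
parts = index_by_location(message)
target = parts.get("fiction1/fiction2")  # the part named by the IMG SRC
```

Because the index is built from one message only, a fictitious URI needs to be unique only within that message.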

Encoding of URIs in e-mail headers

Since MIME header fields have a limited length and URIs can get quite long, these lines may have to be folded. If such folding is done, the algorithm defined in N. Freed and Keith Moore: "Definition of the URL MIME External-Body Access-Type" section 3.1 should be employed.
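
The following sketch illustrates folding and unfolding of a long URI in a header. It rests on two assumptions that should be checked against the cited section 3.1: that unfolding amounts to removing all whitespace introduced by folding, and that any whitespace belonging to the URI itself has been percent-encoded beforehand. The function names and the 76-character width are illustrative choices, not taken from this document.

```python
import re


def fold_uri_header(name: str, uri: str, width: int = 76) -> str:
    """Fold a long URI onto continuation lines that each start with a space."""
    first = width - len(name) - 2      # room left after "Name: " on line one
    step = width - 1                   # continuation lines lose one column
    chunks = [uri[:first]]
    chunks += [uri[i:i + step] for i in range(first, len(uri), step)]
    return f"{name}: " + "\r\n ".join(chunks)


def unfold_uri_header_value(value: str) -> str:
    """Assumption: every whitespace run in the value came from folding."""
    return re.sub(r"\s+", "", value)


uri = "http://www.ietf.cnri.reston.va.us/images/" + "a" * 100 + ".gif"
folded = fold_uri_header("Content-Location", uri)
restored = unfold_uri_header_value(folded.split(":", 1)[1])
```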

Base URIs for resolution of relative URIs

Relative URIs inside contents of MIME body parts are resolved relative to a base URI. In order to determine this base URI, the first-applicable method in the following list applies.

When the methods above do not yield an absolute URI the procedure in section 8.2 for matching relative URIs MUST be followed.

Sending documents without linked objects

If a document, such as an HTML object, is sent without the other objects to which it is linked, it MAY be sent as a Text/HTML body part by itself. In this case, multipart/related need not be used.

Such a document may either not include any links, or contain links which the recipient resolves via ordinary net look up, or contain links which the recipient cannot resolve.

Inclusion of links which the recipient has to look up through the net may not work for some recipients, since not all e-mail recipients have full internet connectivity. Also, such links may work for the sender but not for the recipient, for example when the link refers to a URI within a company-internal network not accessible from outside the company.

Note that documents with links that the recipient cannot resolve MAY be sent, although this is discouraged. For example, two persons developing a new HTML page may exchange incomplete versions.

Examples

Example of a HTML body without included linked objects

The first example is the simplest form of an HTML email message. This is not an aggregate HTML object, but simply a message with a single HTML body part. This message contains a hyperlink but does not provide the ability to resolve the hyperlink. To resolve the hyperlink the receiving client would need either IP access to the Internet, or an electronic mail web gateway.

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: Text/HTML; charset=US-ASCII

<HTML>
<head></head>
<body>
<h1>Hi there!</h1>
An example of an HTML message.<p>
Try clicking <a href="http://www.resnova.com/">here.</a><p>
</body></HTML>
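
A message of this simple form could be produced as sketched below with the Python standard library. This is one possible way for a sending agent to build it, not a required method; the addresses and markup are those of the example.

```python
from email.message import EmailMessage

HTML_BODY = (
    "<HTML>\n"
    "<head></head>\n"
    "<body>\n"
    "<h1>Hi there!</h1>\n"
    "An example of an HTML message.<p>\n"
    'Try clicking <a href="http://www.resnova.com/">here.</a><p>\n'
    "</body></HTML>\n"
)

msg = EmailMessage()
msg["From"] = "foo1@bar.net"
msg["To"] = "foo2@bar.net"
msg["Subject"] = "A simple example"
# A single Text/HTML body part; multipart/related is not needed because
# no linked objects are included. The charset is stated explicitly.
msg.set_content(HTML_BODY, subtype="html", charset="us-ascii")

wire_form = msg.as_string()
```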

Example with absolute URIs to an embedded GIF picture

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: Multipart/related; boundary="boundary-example-1";
              type=Text/HTML; start=foo3*foo1@bar.net

--boundary-example-1
   Content-Type: Text/HTML;charset=US-ASCII
   Content-ID: <foo3*foo1@bar.net>

   ... text of the HTML document, which might contain a hyperlink
   to the other body part, for example through a statement such as:
   <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
    ALT="IETF logo">

--boundary-example-1
   Content-Location:
         http://www.ietf.cnri.reston.va.us/images/ietflogo.gif
   Content-Type: IMAGE/GIF
   Content-Transfer-Encoding: BASE64

   R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
   NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
   etc...

--boundary-example-1--

Example with relative URIs to an embedded GIF picture

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Base: http://www.ietf.cnri.reston.va.us
Content-Type: Multipart/related; boundary="boundary-example-1";
              type=Text/HTML

--boundary-example-1
   Content-Type: Text/HTML; charset=ISO-8859-1
   Content-Transfer-Encoding: QUOTED-PRINTABLE

   ... text of the HTML document, which might contain a hyperlink
   to the other body part, for example through a statement such as:
   <IMG SRC="/images/ietflogo.gif" ALT="IETF logo">
   Example of a copyright sign encoded with Quoted-Printable: =A9
   Example of a copyright sign mapped onto HTML markup: &#169;

--boundary-example-1
   Content-Location: /images/ietflogo.gif
   Content-Type: IMAGE/GIF
   Content-Transfer-Encoding: BASE64

   R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
   NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
   etc...

--boundary-example-1--

Example using CID URL and Content-ID header to an embedded GIF picture

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: Multipart/related; boundary="boundary-example-1";
              type=Text/HTML

--boundary-example-1
   Content-Type: Text/HTML; charset=US-ASCII

   ... text of the HTML document, which might contain a hyperlink
   to the other body part, for example through a statement such as:
   <IMG SRC="cid:foo4*foo1@bar.net" ALT="IETF logo">

--boundary-example-1
   Content-ID: <foo4*foo1@bar.net>
   Content-Type: IMAGE/GIF
   Content-Transfer-Encoding: BASE64

   R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
   NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
   etc...

--boundary-example-1--
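
The CID example can be reproduced, and the reference resolved again on receipt, roughly as follows. This is a sketch using the Python standard library; the GIF bytes are a placeholder rather than a real image, and the helper name find_by_cid is hypothetical.

```python
from email.message import EmailMessage
from urllib.parse import unquote

GIF_PLACEHOLDER = b"GIF89a"  # stand-in for real image data

msg = EmailMessage()
msg["From"] = "foo1@bar.net"
msg["To"] = "foo2@bar.net"
msg["Subject"] = "A simple example"
msg.set_content(
    '<IMG SRC="cid:foo4*foo1@bar.net" ALT="IETF logo">\n',
    subtype="html",
    charset="us-ascii",
)
# add_related() turns the message into multipart/related and attaches
# the image as a further body part carrying the given Content-ID.
msg.add_related(
    GIF_PLACEHOLDER, maintype="image", subtype="gif", cid="<foo4*foo1@bar.net>"
)


def find_by_cid(message, cid_url):
    """Return the body part whose Content-ID matches a cid: URL, if any."""
    wanted = "<" + unquote(cid_url[len("cid:"):]) + ">"
    for part in message.walk():
        if part.get("Content-ID") == wanted:
            return part
    return None


image_part = find_by_cid(msg, "cid:foo4*foo1@bar.net")
```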

Content-Disposition header

Note the specification in Edward Levinson: "The MIME Multipart/Related Content-Type" on the relations between Content-Disposition and multipart/related.

Character encoding issues and end-of-line issues

For the encoding of characters in HTML documents and other text documents into a MIME-compatible octet stream, the following mechanisms are relevant:

The above mechanisms are well defined and documented, and therefore not further explained here. In sending a message, all the above mentioned mechanisms MAY be used, and any mixture of them MAY occur when sending the document via e-mail. Receiving mail user agents (together with any Web browser they may use to display the document) MUST be capable of handling any combinations of these mechanisms.

Also note that:

Note that this might cause problems with integrity checks based on checksums, which might not be preserved when moving a document from the HTTP to the MIME environment. If a document has to be converted in such a way that a checksum integrity check becomes invalid, then this integrity check header SHOULD be removed from the document.

Other sources of problems are Content-Encoding used in HTTP but not allowed in MIME, and charsets that are not able to represent line breaks as CRLF. A good overview of the differences between HTTP and MIME with regards to "Content-Type: Text" can be found in T. Berners-Lee, R. Fielding, H. Frystyk: Hypertext Transfer Protocol -- HTTP/1.0, appendix C.

If the original document has line breaks in the canonical form (CRLF), then the document SHOULD remain unconverted so that integrity check sums are not invalidated.

A provider of HTML documents who wants his documents to be transferable via both HTTP and SMTP without invalidating checksum integrity checks, should always provide original documents in the canonical form with CRLF for line breaks.
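
The effect of line-break conversion on a checksum can be illustrated as follows. This is a sketch only: MD5 is used because it appears in the references, the sample document is invented for the illustration, and the function names are hypothetical.

```python
import hashlib


def to_canonical_crlf(data: bytes) -> bytes:
    """Rewrite bare CR or bare LF line breaks as CRLF; existing CRLF is kept."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def md5_hex(data: bytes) -> str:
    """Checksum of the exact octets, as an integrity check would compute it."""
    return hashlib.md5(data).hexdigest()


lf_document = b"<HTML>\n<body>Hi there!</body>\n</HTML>\n"
crlf_document = to_canonical_crlf(lf_document)

# Converting an LF document changes its octets, so its checksum changes.
checksum_changed = md5_hex(lf_document) != md5_hex(crlf_document)
# A document already in canonical form passes through unchanged.
already_canonical_is_stable = to_canonical_crlf(crlf_document) == crlf_document
```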

Some transport mechanisms may specify a default "charset" parameter if none is supplied [HTTP, MIME1]. Because the default differs for different mechanisms, when HTML is transferred through mail, the charset parameter SHOULD be included, rather than relying on the default.

Security Considerations

Some Security Considerations include the potential to mail someone an object, and claim that it is represented by a particular URI (by giving it a Content-Location header). There can be no assurance that a WWW request for that same URI would normally result in that same object. It might be unsuitable to cache the data in such a way that the cached data can be used for retrieval of this URI from other messages or message parts than those included in the same message as the Content-Location header. Because of this problem, receiving User Agents SHOULD NOT cache this data in the same way that data that was retrieved through an HTTP or FTP request might be cached.
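
One way to respect this caching restriction is to key stored body parts by the message they arrived in, so that a Content-Location claimed in one message can never satisfy a lookup made on behalf of another. The class below is a hypothetical sketch of that idea, not a prescribed design; the message identifiers and contents are invented for the illustration.

```python
from typing import Dict, Optional


class MessageScopedParts:
    """Store body parts per message instead of in a shared URI cache."""

    def __init__(self) -> None:
        self._parts: Dict[str, Dict[str, bytes]] = {}

    def store(self, message_id: str, location: str, body: bytes) -> None:
        """Remember a body part under the message that contained it."""
        self._parts.setdefault(message_id, {})[location] = body

    def lookup(self, message_id: str, location: str) -> Optional[bytes]:
        """Resolve a URI only against parts of the same message."""
        return self._parts.get(message_id, {}).get(location)


store = MessageScopedParts()
store.store("<msg1@bar.net>", "http://example.invalid/logo.gif", b"claimed logo")
same_message = store.lookup("<msg1@bar.net>", "http://example.invalid/logo.gif")
other_message = store.lookup("<msg2@bar.net>", "http://example.invalid/logo.gif")
```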

URLs, especially File URLs, may in their name contain company-internal information, which may then inadvertently be revealed to recipients of documents containing such URLs.

One way of implementing messages with linked body parts is to handle the linked body parts in a combined mail and WWW proxy server. The mail client is only given the start body part, which it passes to a web browser. This web browser requests the linked parts from the proxy server. If this method is used, and if the combined server is used by more than one user, then methods must be employed to ensure that body parts of a message to one person are not retrievable by another person. Use of passwords (also known as tickets or magic cookies) is one way of achieving this. Note that some caching WWW proxy servers may not distinguish between cached objects from e-mail and HTTP, which may be a security risk.

In addition, by allowing people to mail aggregate objects, we are opening the door to other potential security problems that until now were only problems for WWW users. For example, some HTML documents now either themselves contain executable content (JavaScript) or contain links to executable content (The "INSERT" specification, Java). It would be exceedingly dangerous for a receiving User Agent to execute content received through a mail message without careful attention to restrictions on the capabilities of that executable content.

Some WWW applications hide passwords and tickets (access tokens to information which may not be available to anyone) and other sensitive information in hidden fields in the web documents or in on-the-fly constructed URLs. If a person gets such a document, and forwards it via e-mail, the person may inadvertently disclose sensitive information.

Acknowledgments

Harald T. Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst, Lewis Geer, Roy Fielding, Al Gilman, Paul Hoffman, Richard W. Jesmajian, Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin Nicol, Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski, Steve Zilles and several other people have helped us with preparing this document. I alone take responsibility for any errors which may still be in the document.

References

  1. S. Dorner: "Communicating Presentation Information in Internet Messages: The Content-Disposition Header", June 1995
  2. R. Braden (editor): "Requirements for Internet Hosts -- Application and Support", STD-3, October 1989
  3. M. Duerst: "Internationalization of the Hypertext Markup Language", January 1997
  4. T. Berners-Lee, D. Connolly: "Hypertext Markup Language - 2.0", November 1995
  5. T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext Transfer Protocol -- HTTP/1.0", May 1996
  6. R. Rivest: "The MD5 Message-Digest Algorithm", April 1992
  7. E. Levinson: "Content-ID and Message-ID Uniform Resource Locators", February 1997
  8. N. Freed & N. Borenstein: "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", November 1996
  9. N. Borenstein & N. Freed: "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", September 1993
  10. N. Borenstein & N. Freed: "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", November 1996
  11. M.R. Horton, R. Adams: "Standard for interchange of USENET messages", December 1987
  12. R. and Meehan, J.: "Portable Document Format Reference Manual, Version 1.1", Adobe Systems Inc
  13. Edward Levinson: "The MIME Multipart/Related Content-Type", February 1997
  14. R. Fielding: "Relative Uniform Resource Locators", June 1995
  15. D. Crocker: "Standard for the format of ARPA Internet text messages.", August 1982
  16. ISO 8879. Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML), <URL:http://www.iso.ch/cate/d16387.html>, 1986
  17. J. Postel: "Simple Mail Transfer Protocol", August 1982
  18. T. Berners-Lee, L. Masinter, M. McCahill: "Uniform Resource Locators (URL)", December 1994
  19. N. Freed and Keith Moore: "Definition of the URL MIME External-Body Access-Type", October 1996