Network Working Group E. Nebel
Request For Comments: 1867 L. Masinter
Category: Experimental Xerox Corporation
November 1995
Form-based File Upload in HTML
Status of this Memo
This memo defines an Experimental Protocol for the Internet
community. This memo does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited.
1. Abstract
Currently, HTML forms allow the producer of the form to request
information from the user reading the form. These forms have proven
useful in a wide variety of applications in which input from the user
is necessary. However, this capability is limited because HTML forms
don't provide a way to ask the user to submit files of data. Service
providers who need to get files from the user have had to implement
custom user applications. (Examples of these custom browsers have
appeared on the www-talk mailing list.) Since file-upload is a
feature that will benefit many applications, this proposes an
extension to HTML to allow information providers to express file
upload requests uniformly, and a MIME compatible representation for
file upload responses. This also includes a description of a
backward compatibility strategy that allows new servers to interact
with the current HTML user agents.
The proposal is independent of which version of HTML it becomes a
part of.
2. HTML forms with file submission
The current HTML specification defines eight possible values for the
attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
PASSWORD, RADIO, RESET, SUBMIT, TEXT.
In addition, it defines the ENCTYPE attribute of the FORM element
using the POST METHOD to have the default value
"application/x-www-form-urlencoded".
Nebel & Masinter Experimental [Page 1]
RFC 1867 Form-based File Upload in HTML November 1995
This proposal makes two changes to HTML:
1) Add a FILE option for the TYPE attribute of INPUT.
2) Allow an ACCEPT attribute for the INPUT tag, which is a list of
media types or type patterns allowed for the input.
In addition, it defines a new MIME media type, multipart/form-data,
and specifies the behavior of HTML user agents when interpreting a
form with ENCTYPE="multipart/form-data" and/or <INPUT TYPE="file">
tags.
These changes might be considered independently, but are all
necessary for reasonable file upload.
The author of an HTML form who wants to request one or more files
from a user would write (for example):
The change to the HTML DTD is to add one item to the entity
"InputType". In addition, it is proposed that the INPUT tag have an
ACCEPT attribute, which is a list of comma-separated media types.
... (other elements) ...
... (other elements) ...
3. Suggested implementation
While user agents that interpret HTML have wide leeway to choose the
most appropriate mechanism for their context, this section suggests
how one class of user agent, WWW browsers, might implement file
upload.
3.1 Display of FILE widget
When an INPUT tag of type FILE is encountered, the browser might show
a display of (previously selected) file names, and a "Browse" button
or selection method. Selecting the "Browse" button would cause the
browser to enter into a file selection mode appropriate for the
platform. Window-based browsers might pop up a file selection window,
for example. In such a file selection dialog, the user would have the
option of replacing a current selection, adding a new file selection,
etc. Browser implementors might choose to let the list of file names
be manually edited.
If an ACCEPT attribute is present, the browser might constrain the
file patterns offered in the selection dialog to those whose file
extensions correspond, on the platform, to the listed media types.
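As a rough illustration of the ACCEPT check described above, the following Python sketch tests whether a candidate file name matches a comma-separated ACCEPT list. The function name matches_accept is hypothetical, and mapping file extensions to media types through Python's mimetypes table is an assumption; a real browser would use whatever typing information its platform provides.

```python
import fnmatch
import mimetypes


def matches_accept(filename: str, accept: str) -> bool:
    """Return True if the media type guessed for filename matches ACCEPT."""
    # Guess the media type from the file extension; unknown extensions give None.
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is None:
        return False
    # ACCEPT is a comma-separated list of media types or patterns (e.g. image/*).
    patterns = [p.strip().lower() for p in accept.split(",") if p.strip()]
    return any(fnmatch.fnmatchcase(guessed.lower(), p) for p in patterns)
```

A file selection dialog could call such a function on each candidate name to decide which files to offer.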
3.2 Action on submit
When the user completes the form, and selects the SUBMIT element, the
browser should send the form data and the content of the selected
files. The encoding type application/x-www-form-urlencoded is
inefficient for sending large quantities of binary data or text
containing non-ASCII characters. Thus, a new media type,
multipart/form-data, is proposed as a way of efficiently sending the
values associated with a filled-out form from client to server.
3.3 Use of multipart/form-data
The definition of multipart/form-data is included in section 7. A
boundary is selected that does not occur in any of the data. (This
selection is sometimes done probabilistically.) Each field of the form
is sent, in the order in which it occurs in the form, as a part of
the multipart stream. Each part identifies the INPUT name within the
original HTML form. Each part should be labelled with an appropriate
content-type if the media type is known (e.g., inferred from the file
extension or operating system typing information) or as
application/octet-stream.
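The encoding steps in this paragraph might be sketched in Python as follows. This is only an illustration under stated assumptions: the Field type and the functions choose_boundary and encode_form are hypothetical names, field and file names are assumed to be US-ASCII without quote characters, and the media type is inferred from the file extension with Python's mimetypes table.

```python
import mimetypes
import secrets
from typing import NamedTuple, Optional


class Field(NamedTuple):
    """One form field; filename is set only for file inputs (hypothetical type)."""
    name: str
    value: bytes
    filename: Optional[str] = None


def choose_boundary(fields: list[Field]) -> str:
    """Pick a random boundary, retrying until it occurs in none of the values."""
    while True:
        boundary = secrets.token_hex(16)
        if not any(boundary.encode("ascii") in f.value for f in fields):
            return boundary


def encode_form(fields: list[Field]) -> tuple[str, bytes]:
    """Return the Content-Type header value and the multipart/form-data body."""
    boundary = choose_boundary(fields)
    delimiter = b"--" + boundary.encode("ascii")
    lines: list[bytes] = []
    for f in fields:  # parts are emitted in the order the fields are given
        disposition = f'Content-Disposition: form-data; name="{f.name}"'
        lines.append(delimiter)
        if f.filename is None:
            lines.append(disposition.encode("ascii"))
        else:
            # Infer the media type from the extension, else use the fallback.
            media_type = (mimetypes.guess_type(f.filename)[0]
                          or "application/octet-stream")
            lines.append(f'{disposition}; filename="{f.filename}"'.encode("ascii"))
            lines.append(f"Content-Type: {media_type}".encode("ascii"))
        lines.append(b"")  # blank line separates part headers from the value
        lines.append(f.value)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

Checking the random boundary against every value before use is one way to make the probabilistic selection safe; a client that streams large files might instead rely on the boundary being long enough that a collision is very unlikely.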
If multiple files are selected, they should be transferred together
using the multipart/mixed format.
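A minimal Python sketch of grouping several files into one multipart/mixed part follows. The function name mixed_part is hypothetical, file names are assumed to be US-ASCII, and the subparts use the "attachment" disposition that appears in this memo's own multi-file example.

```python
import mimetypes
import secrets


def mixed_part(field_name: str, files: list[tuple[str, bytes]]) -> bytes:
    """Build one form-data part whose body is a multipart/mixed file group."""
    contents = [content for _, content in files]
    # The inner boundary is chosen so that it occurs in none of the files.
    inner = secrets.token_hex(16)
    while any(inner.encode("ascii") in c for c in contents):
        inner = secrets.token_hex(16)
    lines = [
        f'Content-Disposition: form-data; name="{field_name}"'.encode("ascii"),
        f"Content-Type: multipart/mixed; boundary={inner}".encode("ascii"),
        b"",
    ]
    for filename, content in files:
        media_type = (mimetypes.guess_type(filename)[0]
                      or "application/octet-stream")
        lines += [
            b"--" + inner.encode("ascii"),
            f'Content-Disposition: attachment; filename="{filename}"'.encode("ascii"),
            f"Content-Type: {media_type}".encode("ascii"),
            b"",
            content,
        ]
    lines.append(b"--" + inner.encode("ascii") + b"--")
    return b"\r\n".join(lines)
```

The returned bytes would be placed between two delimiters of the enclosing multipart/form-data body, whose boundary must differ from the inner one.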
While the HTTP protocol can transport arbitrary BINARY data, the
default for mail transport (e.g., if the ACTION is a "mailto:" URL)
is the 7BIT encoding. The value supplied for a part may need to be
encoded and the "content-transfer-encoding" header supplied if the
value does not conform to the default encoding. [See section 5 of
RFC 1521 for more details.]
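For a "mailto:" ACTION, a client might decide per part whether an encoding is needed. The Python sketch below is an assumption-laden illustration: the function name encode_part_for_mail is hypothetical, base64 is chosen as the encoding, and only non-ASCII and NUL octets are checked (line length limits are ignored).

```python
import base64


def encode_part_for_mail(value: bytes) -> tuple[dict[str, str], bytes]:
    """Return extra part headers and a body suitable for 7BIT transport."""
    # A value made only of non-NUL US-ASCII octets is sent unchanged.
    if all(0 < octet < 128 for octet in value):
        return {}, value
    # Otherwise encode it and announce the encoding in a part header.
    # encodebytes breaks the output into short lines ending in "\n";
    # they are converted to CRLF here for use in a mail message.
    encoded = base64.encodebytes(value).replace(b"\n", b"\r\n")
    return {"Content-Transfer-Encoding": "base64"}, encoded
```

Over HTTP the same part could be sent as raw binary data with no such header.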
The original local file name may be supplied as well, either as a
'filename' parameter of the 'content-disposition: form-data' header
or, in the case of multiple files, in a 'content-disposition: file'
header of the subpart. The client application should make best
effort to supply the file name; if the file name of the client's
operating system is not in US-ASCII, the file name might be
approximated or encoded using the method of RFC 1522. This is a
convenience for those cases where, for example, the uploaded files
might contain references to each other, e.g., a TeX file and its .sty
auxiliary style description.
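One way a client could encode a non-US-ASCII file name with the RFC 1522 method is sketched below in Python. The function name encode_filename is hypothetical, and the choice of UTF-8 as the character set is an assumption; the memo does not name one.

```python
from email.header import Header, decode_header


def encode_filename(name: str) -> str:
    """Return name unchanged if it is US-ASCII, else as an encoded-word."""
    if name.isascii():
        return name
    # Header.encode() produces the =?charset?encoding?text?= form.
    return Header(name, charset="utf-8").encode()
```

A receiver using the same library could recover the name with decode_header, which returns the raw bytes together with the character set label.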
On the server end, the ACTION might point to an HTTP URL that
implements the form's action via CGI. In such a case, the CGI program
would note that the content-type is multipart/form-data and parse the
various fields (checking for validity, writing the file data to local
files for subsequent processing, etc.).
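The server-side parsing step might look like the following Python sketch, which leans on the standard email package rather than a hand-written parser. The function name parse_form and the shape of the returned dictionaries are hypothetical, and nested multipart/mixed file groups are not expanded here.

```python
from email import message_from_bytes


def parse_form(content_type: str, body: bytes) -> list[dict]:
    """Parse a multipart/form-data body into a list of field dictionaries."""
    # The email parser learns the boundary from the Content-Type header,
    # so the header is prepended to the body to form a complete message.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if (message.get_content_type() != "multipart/form-data"
            or not message.is_multipart()):
        raise ValueError("expected a multipart/form-data body")
    fields = []
    for part in message.get_payload():
        fields.append({
            "name": part.get_param("name", header="content-disposition"),
            "filename": part.get_filename(),          # None for plain fields
            "content_type": part.get_content_type(),  # defaults to text/plain
            "data": part.get_payload(decode=True),    # None for nested multiparts
        })
    return fields
```

A CGI program could then validate each entry and write the data of file entries to local storage.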
3.4 Interpretation of other attributes
The VALUE attribute might be used with <INPUT TYPE=file> tags for a
default file name. This use is probably platform dependent. It might
be useful, however, in sequences of more than one transaction, e.g.,
to avoid having the user prompted for the same file name over and
over again.
The SIZE attribute might be specified using SIZE=width,height, where
width is some default for file name width, while height is the
expected size showing the list of selected files. For example, this
would be useful for forms designers who expect to get several files
and who would like to show a multiline file input field in the
browser (with a "browse" button beside it, hopefully). It would be
useful to show a one line text field when no height is specified
(when the forms designer expects one file, only) and to show a
multiline text area with scrollbars when the height is greater than 1
(when the forms designer expects multiple files).
4. Backward compatibility issues
While not necessary for successful adoption of an enhancement to the
current WWW form mechanism, it is useful to also plan for a migration
strategy: users with older browsers can still participate in file
upload dialogs, using a helper application. Most current web browsers,
when given <INPUT TYPE=FILE>, will treat it as <INPUT TYPE=TEXT> and
give the user a text box. The user can type a file name into this
text box. In addition, current browsers seem to ignore the ENCTYPE
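parameter in the FORM element.
Suppose a form has a name field "field1" and a file field "pics" that
asks 'What files are you sending?', and the user types "Joe Blow" in
the name field and selects a text file "file1.txt" for the answer to
'What files are you sending?'
The client might send back the following data: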
Content-type: multipart/form-data; boundary=AaB03x

--AaB03x
content-disposition: form-data; name="field1"

Joe Blow
--AaB03x
content-disposition: form-data; name="pics"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--
If the user also indicated an image file "file2.gif" for the answer
to 'What files are you sending?', the client might send back the
following data:
Content-type: multipart/form-data; boundary=AaB03x

--AaB03x
content-disposition: form-data; name="field1"

Joe Blow
--AaB03x
content-disposition: form-data; name="pics"
Content-type: multipart/mixed; boundary=BbC04y

--BbC04y
Content-disposition: attachment; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--BbC04y
Content-disposition: attachment; filename="file2.gif"
Content-type: image/gif
Content-Transfer-Encoding: binary

...contents of file2.gif...
--BbC04y--
--AaB03x--
7. Registration of multipart/form-data
The media-type multipart/form-data follows the rules of all multipart
MIME data streams as outlined in RFC 1521. It is intended for use in
returning the data that comes about from filling out a form. In a
form (in HTML, although other applications may also use forms), there
are a series of fields to be supplied by the user who fills out the
form. Each field has a name. Within a given form, the names are
unique.
multipart/form-data contains a series of parts. Each part is expected
to contain a content-disposition header where the value is
"form-data" and a name attribute specifies the field name within the
form, e.g., 'content-disposition: form-data; name="xxxxx"', where
xxxxx is the field name corresponding to that field. Field names
originally in non-ASCII character sets may be encoded using the
method outlined in RFC 1522.
As with all multipart MIME types, each part has an optional
Content-Type which defaults to text/plain. If the contents of a file are
returned via filling out a form, then the file input is identified as
application/octet-stream or the appropriate media type, if known. If
multiple files are to be returned as the result of a single form
entry, they can be returned as multipart/mixed embedded within the
multipart/form-data.
Each part may be encoded and the "content-transfer-encoding" header
supplied if the value of that part does not conform to the default
encoding.
File inputs may also identify the file name. The file name may be
described using the 'filename' parameter of the "content-disposition"
header. This is not required, but is strongly recommended in any case
where the original filename is known. This is useful or necessary in
many applications.
8. Security Considerations
It is important that a user agent not send any file that the user has
not explicitly asked to be sent. Thus, HTML interpreting agents are
expected to confirm any default file names that might be suggested
with the VALUE attribute of <INPUT TYPE=file>. Hidden fields must
never be able to specify a file.
This proposal does not contain a mechanism for encryption of the
data; this should be handled by whatever other mechanisms are in
place for secure transmission of data, whether via secure HTTP, or by
security provided by MOSS (described in RFC 1848).
Once the file is uploaded, it is up to the receiver to process and
store the file appropriately.
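As one possible example of handling on the receiving side, a server that stores uploads under the client-supplied file name might first reduce that name to a single harmless path component. The Python sketch below is an assumption, not part of this proposal: the function name safe_filename, the allowed character set, and the fallback name "upload.bin" are all hypothetical.

```python
import re


def safe_filename(supplied: str, default: str = "upload.bin") -> str:
    """Reduce a client-supplied file name to a single safe path component."""
    # Keep only the last component, whichever separator the client used.
    base = re.split(r"[\\/]", supplied)[-1]
    # Replace anything outside a conservative allowlist, then drop leading dots.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned or default
```

With this sketch a supplied name such as "../../etc/passwd" is stored as "passwd" inside the server's upload directory rather than being interpreted as a path.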
9. Conclusion
The suggested implementation gives the client a lot of flexibility in
the number and types of files it can send to the server; it gives the
server control of the decision to accept the files; and it gives
servers a chance to interact with browsers which do not support INPUT
TYPE "file".
The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via the
World-Wide Web than is currently possible due to the lack of a file
submission facility. This would be an extremely valuable addition to
the capabilities of the World-Wide Web.
Authors' Addresses
Larry Masinter
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Phone: (415) 812-4365
Fax: (415) 812-4333
EMail: masinter@parc.xerox.com
Ernesto Nebel
XSoft, Xerox Corporation
10875 Rancho Bernardo Road, Suite 200
San Diego, CA 92127-2116
Phone: (619) 676-7817
Fax: (619) 676-7865
EMail: nebel@xsoft.sd.xerox.com
A. Media type registration for multipart/form-data
Media Type name:
multipart
Media subtype name:
form-data
Required parameters:
none
Optional parameters:
none
Encoding considerations:
No additional considerations other than as for other multipart types.
Published specification:
RFC 1867
Security Considerations
The multipart/form-data type introduces no new security
considerations beyond what might occur with any of the enclosed
parts.
References
[RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One:
Mechanisms for Specifying and Describing the Format of
Internet Message Bodies. N. Borenstein & N. Freed.
September 1993.
[RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two:
Message Header Extensions for Non-ASCII Text. K. Moore.
September 1993.
[RFC 1806] Communicating Presentation Information in Internet
Messages: The Content-Disposition Header. R. Troost & S.
Dorner. June 1995.