2 URI Codec: Encoding and Decoding URIs
The net/uri-codec module provides utilities for encoding and decoding
strings using the URI encoding rules given in RFC 2396 [RFC2396], and
for encoding and decoding name/value pairs using the
application/x-www-form-urlencoded mimetype given in the HTML 4.0
specification. There are minor differences between the two encodings.
The URI encoding allows a few characters to be represented as-is:
a through z, A through Z, 0 through 9, -, _, ., !, ~, *, ', (, and ).
The remaining characters are encoded as %‹xx›, where ‹xx› is the
two-character hex representation of the integer value of the character
(where the character-to-integer mapping is determined by US-ASCII if
the integer is less than 128).
The encoding, in line with RFC 2396’s recommendation, represents a
character as-is, if possible. The decoding allows any characters
to be represented by their hex values, and allows characters to be
incorrectly represented as-is.
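The encoding and decoding behavior described above can be sketched as follows. This is a minimal illustration, assuming the module's uri-encode and uri-decode procedures are the ones implementing these rules; the input strings are made up for the example.

```racket
#lang racket/base
(require net/uri-codec)

;; Letters, digits, -, _ and . are kept as-is; the space is escaped.
(define encoded (uri-encode "hello world-1_2.txt"))
;; encoded is "hello%20world-1_2.txt"

;; The decoder accepts hex escapes even for characters that could
;; have been written as-is: %61 is the letter a.
(define decoded (uri-decode "%61bc%20def"))
;; decoded is "abc def"

;; Decoding an encoded string gives back the original string.
(define round-trip (uri-decode (uri-encode "a b/c?d=e")))
```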
The rules for the application/x-www-form-urlencoded mimetype
given in the HTML 4.0 spec are:
Control names and values are escaped. Space characters are
replaced by +, and then reserved characters are escaped as
described in RFC 1738, section 2.2: Non-alphanumeric characters are
replaced by %‹xx› representing the ASCII code of
the character. Line breaks are represented as CRLF pairs:
%0D%0A. Note that RFC 2396 supersedes RFC 1738
[RFC1738].
The control names/values are listed in the order they appear
in the document. The name is separated from the value by =
and name/value pairs are separated from each other by either
; or &. When encoding, & is used as
the separator by default. When decoding, both ; and
& are parsed as separators by default.
These rules differ slightly from the straight encoding in RFC 2396 in
that + is allowed, and it represents a space. The
net/uri-codec library follows this convention,
encoding a space as + and decoding + as a space.
In addition, since there appear to be some brain-dead decoders on the
web, the library also encodes !, ~, ',
(, and ) using their hex representation, which is
the same choice made by Java's URLEncoder.
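A short sketch of how the form encoding differs from the plain URI encoding, assuming the module's form-urlencoded-encode and form-urlencoded-decode procedures implement the convention described here. The sample strings are illustrative only.

```racket
#lang racket/base
(require net/uri-codec)

;; A space becomes + under the form rules.
(define form-encoded (form-urlencoded-encode "hello world"))
;; form-encoded is "hello+world"

;; Decoding with the form rules turns + back into a space.
(define form-decoded (form-urlencoded-decode "hello+world"))
;; form-decoded is "hello world"

;; Parentheses are hex-escaped by the form encoder.
(define form-parens (form-urlencoded-encode "f(x)"))
;; form-parens is "f%28x%29"

;; Under the plain URI rules listed earlier, ( and ) are among the
;; characters that may be represented as-is.
(define uri-parens (uri-encode "f(x)"))
```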
2.1 Functions
(uri-encode str)
Encodes a string using the URI encoding rules.
(uri-decode str)
Decodes a string using the URI decoding rules.
(uri-path-segment-encode str)
Encodes a string according to the rules in [RFC3986] for path segments.
(uri-path-segment-decode str)
Decodes a string according to the rules in [RFC3986] for path segments.
(uri-userinfo-encode str)
Encodes a string according to the rules in [RFC3986] for the userinfo field.
(uri-userinfo-decode str)
Decodes a string according to the rules in [RFC3986] for the userinfo field.
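The path-segment and userinfo variants can be exercised as in the sketch below, which assumes the procedures are exported as uri-path-segment-encode, uri-path-segment-decode, uri-userinfo-encode, and uri-userinfo-decode. The exact escape sequences are deliberately not spelled out; the example only relies on the fact that a / cannot appear literally inside a path segment and that a space or @ cannot appear literally in userinfo.

```racket
#lang racket/base
(require net/uri-codec)

;; A name containing a slash must be escaped before it can be used
;; as a single path segment.
(define segment (uri-path-segment-encode "reports/2024 final"))
(define segment-back (uri-path-segment-decode segment))

;; A user name containing @ and a space must be escaped before it
;; can be placed in the userinfo field.
(define userinfo (uri-userinfo-encode "ada lovelace@example"))
(define userinfo-back (uri-userinfo-decode userinfo))

;; Both encoded forms are free of the delimiters they had to hide.
(define segment-safe? (not (regexp-match? #rx"[/ ]" segment)))
(define userinfo-safe? (not (regexp-match? #rx"[ @]" userinfo)))
```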
(form-urlencoded-encode str)
Encodes a string using the application/x-www-form-urlencoded
encoding rules. The result string contains no non-ASCII characters.
(form-urlencoded-decode str)
Decodes a string encoded using the
application/x-www-form-urlencoded encoding rules.
(alist->form-urlencoded alist)
Encodes an association list using the
application/x-www-form-urlencoded encoding rules.
The current-alist-separator-mode parameter determines the
separator used in the result.
(form-urlencoded->alist str)
Decodes a string encoded using the
application/x-www-form-urlencoded encoding rules into an
association list. All keys are case-folded for conversion to symbols.
The current-alist-separator-mode parameter determines the way
that separators are parsed in the input.
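As an illustration of the association-list conversions, the following sketch assumes the procedures are exported as alist->form-urlencoded and form-urlencoded->alist and that current-alist-separator-mode is left at its default. The keys and values are invented for the example, and the keys are kept lowercase so that case folding does not affect the result.

```racket
#lang racket/base
(require net/uri-codec)

;; Keys are symbols and values are strings.
(define fields '((name . "Ada Lovelace") (lang . "racket")))

;; Encode the list into a query string. The separator that appears
;; depends on current-alist-separator-mode.
(define query (alist->form-urlencoded fields))

;; Decoding the query string recovers the original pairs.
(define fields-back (form-urlencoded->alist query))

;; With the default mode, both & and ; are accepted when parsing.
(define mixed (form-urlencoded->alist "x=foo&y=bar;z=baz"))
;; mixed is '((x . "foo") (y . "bar") (z . "baz"))
```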
(current-alist-separator-mode)
The default value is 'amp-or-semi, which means that both
& and ; are treated as separators when parsing,
and & is used as a separator when encoding. The 'semi-or-amp
mode is similar, but ; is used when encoding. The other modes
use/recognize only one of the separators.
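The effect of the separator mode can be seen by setting the parameter with parameterize. This sketch assumes the two single-separator modes are named 'semi and 'amp, which the text above does not spell out, and uses made-up data.

```racket
#lang racket/base
(require net/uri-codec)

(define params '((a . "1") (b . "2")))

;; Assumed mode name 'semi: only ; is used as the separator.
(define with-semi
  (parameterize ([current-alist-separator-mode 'semi])
    (alist->form-urlencoded params)))
;; with-semi is "a=1;b=2"

;; Assumed mode name 'amp: only & is used as the separator.
(define with-amp
  (parameterize ([current-alist-separator-mode 'amp])
    (alist->form-urlencoded params)))
;; with-amp is "a=1&b=2"

;; 'semi-or-amp still recognizes both separators when parsing.
(define parsed
  (parameterize ([current-alist-separator-mode 'semi-or-amp])
    (form-urlencoded->alist "a=1&b=2;c=3")))
;; parsed is '((a . "1") (b . "2") (c . "3"))
```

Using parameterize keeps the change local to one call, so other code that relies on the default mode is unaffected.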
2.2 URI Codec Unit
uri-codec@ and uri-codec^ are deprecated.
They exist for backward-compatibility and will likely be removed in
the future. New code should use the net/uri-codec module.
2.3 URI Codec Signature
Includes everything exported by the net/uri-codec module.