3.4 Byte Strings
Bytes and Byte Strings in The Racket Guide introduces byte strings.
A byte string is a fixed-length array of bytes. A
byte is an exact integer between 0 and
255 inclusive.
A byte string can be
mutable or immutable. When an immutable byte
string is provided to a procedure like bytes-set!, the
exn:fail:contract exception is raised. Byte-string constants generated by the
default reader (see Reading Strings) are immutable.
Two byte strings are equal? when they have the same length
and contain the same sequence of bytes.
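As a minimal sketch of the mutability and equality rules above (the binding names are illustrative only), the following contrasts a reader-produced literal with a freshly allocated byte string:

```racket
#lang racket/base

;; A byte-string literal produced by the default reader is immutable.
(define literal #"Apple")

;; `bytes` allocates a fresh mutable byte string with the same content.
(define fresh (bytes 65 112 112 108 101))

;; Mutating the immutable literal raises exn:fail:contract; catch the
;; exception and record that it happened.
(define mutation-rejected?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (bytes-set! literal 0 97)
    #f))

;; Equality depends only on length and content, not on mutability.
(define same-content? (equal? literal fresh))
```

Here both mutation-rejected? and same-content? end up as #t.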
A byte string can be used as a single-valued sequence (see
Sequences). The bytes of the string serve as elements
of the sequence. See also in-bytes.
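A short sketch of sequence use, assuming only the standard for forms from racket/base (the names as-list and total are illustrative):

```racket
#lang racket/base

;; A byte string can be iterated directly; each element is a byte.
(define as-list
  (for/list ([b #"Apple"]) b))

;; `in-bytes` states the expected sequence type explicitly.
(define total
  (for/sum ([b (in-bytes #"Apple")]) b))
```

In this sketch as-list is '(65 112 112 108 101) and total is 498.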
See Reading Strings for information on reading byte strings and
Printing Strings for information on printing byte strings.
See also: immutable?.
3.4.1 Byte String Constructors, Selectors, and Mutators
Returns #t if v
is a byte string, #f otherwise.
Returns a new mutable byte string of length k where each
position in the byte string is initialized with the byte b.
Returns a new mutable byte
string whose length is the number of provided bs, and whose
positions are initialized with the given bs.
Example:
> (bytes 65 112 112 108 101)
#"Apple"
Returns an immutable byte string with the same content
as bstr, returning bstr itself if bstr is
immutable.
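The constructors described above can be sketched as follows, assuming the descriptions correspond to the standard make-bytes, bytes, and bytes->immutable-bytes procedures (the binding names are illustrative):

```racket
#lang racket/base

;; Three positions, each initialized to 65 (the byte for #\A).
(define filled (make-bytes 3 65))

;; One position per argument.
(define listed (bytes 65 66 67))

;; An immutable copy of a mutable byte string.
(define frozen (bytes->immutable-bytes filled))

;; Converting an already-immutable byte string returns the same object.
(define same-object? (eq? frozen (bytes->immutable-bytes frozen)))
```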
Returns #t if v is
a byte (i.e., an exact integer between 0 and 255
inclusive), #f otherwise.
Returns the length of bstr.
Returns the byte at position k in bstr. The first position in the
byte string corresponds to 0, so the position k must be less than
the length of the byte string; otherwise, the exn:fail:contract
exception is raised.
Changes the byte at position k in bstr to b. The first position in
the byte string corresponds to 0, so the position k must be less
than the length of the byte string; otherwise, the exn:fail:contract
exception is raised.
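A small sketch of length, access, and mutation, assuming the descriptions above correspond to bytes-length, bytes-ref, and bytes-set! (binding names are illustrative):

```racket
#lang racket/base

;; A mutable byte string holding the bytes of "Apple".
(define bstr (bytes 65 112 112 108 101))

(define len (bytes-length bstr))
(define first-byte (bytes-ref bstr 0))

;; Replace the first byte, 65 (#\A), with 97 (#\a).
(bytes-set! bstr 0 97)

;; Valid positions are 0 through (- len 1); position len is out of range.
(define out-of-range?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (bytes-ref bstr len)
    #f))
```

After the mutation, bstr holds the bytes of "apple".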
Returns a new mutable byte string that is (- end start) bytes long,
and that contains the same bytes as bstr from start inclusive to
end exclusive. The start and end arguments must be less than or
equal to the length of bstr, and end must be greater than or equal
to start, otherwise the exn:fail:contract exception is raised.
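The range rules can be illustrated with a sketch that assumes the description above is that of subbytes, called with explicit start and end arguments:

```racket
#lang racket/base

(define src #"Apple pie")

;; Positions 0 (inclusive) to 5 (exclusive).
(define word (subbytes src 0 5))

;; Positions 6 to 9, where 9 is the length of src.
(define tail (subbytes src 6 9))

;; An end that is smaller than start is rejected.
(define bad-range?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (subbytes src 5 3)
    #f))
```

Although src is an immutable literal, word and tail are fresh mutable byte strings.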
Changes the bytes of dest starting at position dest-start to match
the bytes in src from src-start (inclusive) to src-end (exclusive).
The byte strings dest and src can be the same byte string, and in
that case the destination region can overlap with the source region;
the destination bytes after the copy match the source bytes from
before the copy. If any of dest-start, src-start, or src-end are out
of range (taking into account the sizes of the byte strings and the
source and destination regions), the exn:fail:contract exception is
raised.
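The overlapping case can be sketched as follows, assuming the description above is that of bytes-copy! with the argument order dest, dest-start, src, src-start, src-end:

```racket
#lang racket/base

(define buf (bytes 1 2 3 4 5))

;; Copy positions 0..3 of buf onto positions 2..5 of the same buffer.
;; The regions overlap, but the destination receives the source bytes
;; as they were before the copy began.
(bytes-copy! buf 2 buf 0 3)

;; Three bytes do not fit when the destination starts at position 4.
(define too-far?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (bytes-copy! buf 4 buf 0 3)
    #f))
```

After the first call, buf contains the bytes 1 2 1 2 3.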
Changes dest so that every position in the byte string is filled
with b.
Returns a new mutable byte string that is as long as the sum of the
given bstrs’ lengths, and that contains the concatenated bytes of
the given bstrs. If no bstrs are provided, the result is a
zero-length byte string.
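A brief sketch of filling and concatenation, assuming the two descriptions above correspond to bytes-fill! and bytes-append (binding names are illustrative):

```racket
#lang racket/base

;; Start with four zero bytes, then overwrite every position with 255.
(define scratch (make-bytes 4 0))
(bytes-fill! scratch 255)

;; Concatenation accepts any number of byte strings, including none.
(define joined (bytes-append #"ab" #"" #"cd"))
(define nothing (bytes-append))
```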
Returns a new list of bytes corresponding to the content of bstr.
That is, the length of the list is (bytes-length bstr), and the
sequence of bytes in bstr is the same sequence in the result list.
Returns a new mutable byte string whose content is the list of bytes
in lst. That is, the length of the byte string is (length lst), and
the sequence of bytes in lst is the same sequence in the result byte
string.
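The two conversions are inverses of each other, as this sketch shows (it assumes the descriptions above correspond to bytes->list and list->bytes):

```racket
#lang racket/base

;; Byte string to list of bytes.
(define lst (bytes->list #"Apple"))

;; List of bytes back to a fresh, mutable byte string.
(define back (list->bytes lst))
```

Here lst is '(65 112 112 108 101), and back is equal? to #"Apple".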
Returns a new mutable byte string of length k where each position in
the byte string is initialized with the byte b. For communication
among places, the new byte string is allocated in the shared memory
space.
Returns a new mutable byte string whose length is the number of
provided bs, and whose positions are initialized with the given bs.
For communication among places, the new byte string is allocated in
the shared memory space.
3.4.2 Byte String Comparisons
Returns #t if all of the arguments contain the same sequence of
bytes, #f otherwise.
Returns #t if the arguments are lexicographically sorted increasing,
where individual bytes are ordered by <, #f otherwise.
Like bytes<?, but checks whether the arguments are decreasing.
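A sketch of the comparisons, assuming the first and last descriptions in this section correspond to bytes=? and bytes>? (only bytes<? is named in the text; binding names are illustrative):

```racket
#lang racket/base

;; Content comparison: a literal and a fresh byte string with equal bytes.
(define same? (bytes=? #"abc" (bytes 97 98 99)))

;; Lexicographic order, byte by byte, across several arguments.
(define increasing? (bytes<? #"abc" #"abd" #"b"))

;; A proper prefix sorts before the longer byte string.
(define prefix-first? (bytes<? #"ab" #"abc"))

;; The same arguments in reverse order are decreasing.
(define decreasing? (bytes>? #"b" #"abd" #"abc"))
```

All four bindings are #t in this sketch.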
3.4.3 Bytes to/from Characters, Decoding and Encoding
Produces a string by decoding the start to end substring of bstr as
a UTF-8 encoding of Unicode code points. If err-char is not #f, then
it is used for bytes that fall in the range 128 to 255 but are not
part of a valid encoding sequence. (This rule is consistent with
reading characters from a port; see Encodings and Locales for more
details.) If err-char is #f, and if the start to end substring of
bstr is not a valid UTF-8 encoding overall, then the
exn:fail:contract exception is raised.
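The role of err-char can be sketched as follows. The sketch assumes the description above is that of bytes->string/utf-8 and that its optional arguments are ordered err-char, start, end:

```racket
#lang racket/base

;; 206 187 is the UTF-8 encoding of U+03BB (Greek small lambda).
(define decoded (bytes->string/utf-8 (bytes 65 206 187 66)))

;; 255 never appears in valid UTF-8; with an err-char it is replaced.
(define patched (bytes->string/utf-8 (bytes 65 255 66) #\?))

;; With err-char as #f, the same input raises exn:fail:contract.
(define rejected?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (bytes->string/utf-8 (bytes 65 255 66) #f)
    #f))

;; Decode only positions 1 (inclusive) to 3 (exclusive).
(define middle (bytes->string/utf-8 (bytes 65 206 187 66) #f 1 3))
```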
Produces a string by decoding the start to end substring of bstr
using the current locale’s encoding (see also Encodings and
Locales). If err-char is not #f, it is used for each byte in bstr
that is not part of a valid encoding; if err-char is #f, and if the
start to end substring of bstr is not a valid encoding overall, then
the exn:fail:contract exception is raised.
Produces a string by decoding the start to end substring of bstr as
a Latin-1 encoding of Unicode code points; i.e., each byte is
translated directly to a character using integer->char, so the
decoding always succeeds. The err-char argument is ignored, but
present for consistency with the other operations.
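To contrast Latin-1 decoding with UTF-8 decoding, here is a sketch that assumes the last description above is that of bytes->string/latin-1:

```racket
#lang racket/base

;; In Latin-1 every byte maps directly to the code point of the same value,
;; so byte 233 becomes U+00E9.
(define latin (bytes->string/latin-1 (bytes 233)))

;; The same single byte is not a complete UTF-8 sequence.
(define utf-8-rejected?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (bytes->string/utf-8 (bytes 233) #f)
    #f))
```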
Produces a byte string by encoding the start to end
substring of str via UTF-8 (always succeeding). The
err-byte argument is ignored, but included for consistency with
the other operations.
Produces a byte string by encoding the start to end substring of str
using the current locale’s encoding (see also Encodings and
Locales). If err-byte is not #f, it is used for each character in
str that cannot be encoded for the current locale; if err-byte is
#f, and if the start to end substring of str cannot be encoded, then
the exn:fail:contract exception is raised.
Produces a byte string by encoding the start to end substring of str
using Latin-1; i.e., each character is translated directly to a byte
using char->integer. If err-byte is not #f, it is used for each
character in str whose value is greater than 255. If err-byte is #f,
and if the start to end substring of str has a character with a
value greater than 255, then the exn:fail:contract exception is
raised.
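The encoding direction can be sketched in the same way, assuming the descriptions above correspond to string->bytes/utf-8 and string->bytes/latin-1, with err-byte as the first optional argument:

```racket
#lang racket/base

;; U+00E9 fits in Latin-1 as the single byte 233,
;; while UTF-8 needs the two bytes 195 169.
(define latin (string->bytes/latin-1 "\u00E9"))
(define utf-8 (string->bytes/utf-8 "\u00E9"))

;; U+03BB is greater than 255, so Latin-1 needs an err-byte (63 is #\?).
(define substituted (string->bytes/latin-1 "\u03BB" 63))

;; Without an err-byte, the same string raises exn:fail:contract.
(define rejected?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (string->bytes/latin-1 "\u03BB" #f)
    #f))
```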
Returns the length in bytes of the UTF-8 encoding of str’s
substring from start to end, but without actually
generating the encoded bytes.
Returns the length in characters of the UTF-8 decoding of bstr’s
substring from start to end, but without actually generating the
decoded characters. If err-char is #f and the substring is not a
UTF-8 encoding overall, the result is #f. Otherwise, err-char is
used to resolve decoding errors as in bytes->string/utf-8.
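A sketch of the two length queries, assuming the descriptions above correspond to string-utf-8-length and bytes-utf-8-length, the latter taking err-char as its first optional argument:

```racket
#lang racket/base

;; U+03BB needs two bytes in UTF-8 and #\x needs one.
(define encoded-size (string-utf-8-length "\u03BBx"))

;; Three bytes decode to two characters.
(define decoded-size (bytes-utf-8-length (bytes 206 187 65)))

;; Invalid input yields #f unless an err-char resolves the error.
(define invalid-size (bytes-utf-8-length (bytes 65 255) #f))
(define patched-size (bytes-utf-8-length (bytes 65 255) #\?))
```

Here encoded-size is 3, decoded-size is 2, invalid-size is #f, and patched-size is 2.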
Returns the skipth character in the UTF-8 decoding of bstr’s
substring from start to end, but without actually generating the
other decoded characters. If the substring is not a UTF-8 encoding
up to the skipth character (when err-char is #f), or if the
substring decoding produces fewer than skip characters, the result
is #f. If err-char is not #f, it is used to resolve decoding errors
as in bytes->string/utf-8.
Returns the offset in bytes into bstr at which the skipth
character’s encoding starts in the UTF-8 decoding of bstr’s
substring from start to end (but without actually generating the
other decoded characters). The result is relative to the start of
bstr, not to start. If the substring is not a UTF-8 encoding up to
the skipth character (when err-char is #f), or if the substring
decoding produces fewer than skip characters, the result is #f. If
err-char is not #f, it is used to resolve decoding errors as in
bytes->string/utf-8.
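Character lookup and byte offsets can be sketched as follows, assuming the descriptions above correspond to bytes-utf-8-ref and bytes-utf-8-index, with skip as the first optional argument counted from 0:

```racket
#lang racket/base

;; The bytes encode three characters: #\A, U+03BB (two bytes), and #\B.
(define sample (bytes 65 206 187 66))

;; The character after skipping one character.
(define second-char (bytes-utf-8-ref sample 1))

;; The byte offset at which the third character's encoding starts.
(define third-offset (bytes-utf-8-index sample 2))

;; Skipping more characters than the decoding contains yields #f.
(define missing (bytes-utf-8-ref sample 5))
```

In this sketch second-char is U+03BB, third-offset is 3 because the two-byte sequence occupies offsets 1 and 2, and missing is #f.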
3.4.4 Bytes to Bytes Encoding Conversion
Produces a byte converter to go from the encoding named by from-name
to the encoding named by to-name. If the requested conversion pair
is not available, #f is returned instead of a converter.
Certain encoding combinations are always available:
(bytes-open-converter "UTF-8" "UTF-8") — the
identity conversion, except that encoding errors in the input lead
to a decoding failure.
(bytes-open-converter "UTF-8-permissive" "UTF-8") —
the identity conversion, except that
any input byte that is not part of a valid encoding sequence is
effectively replaced by the UTF-8 encoding sequence for
#\uFFFD. (This handling of invalid sequences is
consistent with the interpretation of port byte streams into
characters; see Ports.)
(bytes-open-converter "" "UTF-8") — converts from
the current locale’s default encoding (see Encodings and Locales)
to UTF-8.
(bytes-open-converter "UTF-8" "") — converts from
UTF-8 to the current locale’s default encoding (see
Encodings and Locales).
(bytes-open-converter "platform-UTF-8" "platform-UTF-16")
— converts UTF-8 to UTF-16 on Unix and Mac OS X, where each UTF-16
code unit is a sequence of two bytes ordered by the current
platform’s endianness. On Windows, the input can include
encodings that are not valid UTF-8, but which naturally extend the
UTF-8 encoding to support unpaired surrogate code units, and the
output is a sequence of UTF-16 code units (as little-endian byte
pairs), potentially including unpaired surrogates.
(bytes-open-converter "platform-UTF-8-permissive" "platform-UTF-16")
— like (bytes-open-converter "platform-UTF-8" "platform-UTF-16"),
but an input byte that is not part of a valid UTF-8 encoding
sequence (or valid for the unpaired-surrogate extension on
Windows) is effectively replaced with (char->integer #\?).
(bytes-open-converter "platform-UTF-16" "platform-UTF-8")
— converts UTF-16 (bytes ordered by the current platform’s
endianness) to UTF-8 on Unix and Mac OS X. On Windows, the input can
include UTF-16 code units that are unpaired surrogates, and the
corresponding output includes an encoding of each surrogate in a
natural extension of UTF-8. On Unix and Mac OS X, surrogates are
assumed to be paired: a pair of bytes with the bits 55296 (#xD800)
starts a surrogate pair, and the 1023 (#x3FF) bits are used from
the pair and following pair (independent of the value of the
56320 (#xDC00) bits). On all platforms, performance may be poor
when decoding from an odd offset within an input byte string.
A newly opened byte converter is registered with the current custodian
(see Custodians), so that the converter is closed when
the custodian is shut down. A converter is not registered with a
custodian (and does not need to be closed) if it is one of the
guaranteed combinations not involving "" on Unix, or if it
is any of the guaranteed combinations (including "") on
Windows and Mac OS X.
In the Racket software distributions for Windows, a suitable
"iconv.dll" is included with "libmzschVERS.dll".
The set of available encodings and combinations varies by platform,
depending on the iconv library that is installed; the
from-name and to-name arguments are passed on to
iconv_open. On Windows, "iconv.dll" or
"libiconv.dll" must be in the same directory as
"libmzschVERS.dll" (where VERS is a version
number), in the user’s path, in the system directory, or in the
current executable’s directory at run time, and the DLL must either
supply _errno or link to "msvcrt.dll" for _errno;
otherwise, only the guaranteed combinations are available.
Use bytes-convert with the result to convert byte strings.
Converts the bytes from src-start-pos to src-end-pos
in src-bstr.
If dest-bstr is not #f, the converted bytes are
written into dest-bstr from dest-start-pos to
dest-end-pos. If dest-bstr is #f, then a
newly allocated byte string holds the conversion results, and if
dest-end-pos is not #f, the size of the result byte
string is no more than (- dest-end-pos dest-start-pos).
The result of bytes-convert is three values:
result-bstr or dest-wrote-amt — a byte
string if dest-bstr is #f or not provided, or the
number of bytes written into dest-bstr otherwise.
src-read-amt — the number of bytes successfully converted
from src-bstr.
'complete, 'continues,
'aborts, or 'error — indicates
how conversion terminated:
'complete: The entire input was processed, and
src-read-amt will be equal to (- src-end-pos src-start-pos).
'continues: Conversion stopped due to the limit on
the result size or the space in dest-bstr; in this case,
fewer than (- dest-end-pos dest-start-pos) bytes may be
returned if more space is needed to process the next complete
encoding sequence in src-bstr.
'aborts: The input stopped part-way through an
encoding sequence, and more input bytes are necessary to continue.
For example, if the last byte of input is 195 for a
"UTF-8-permissive" decoding, the result is
'aborts, because another byte is needed to determine how to
use the 195 byte.
'error: The bytes starting at position (+ src-start-pos src-read-amt) in src-bstr do not form
a legal encoding sequence. This result is never produced for some
encodings, where all byte sequences are valid encodings. For
example, since "UTF-8-permissive" handles an invalid UTF-8
sequence by dropping characters or generating “?,” every byte
sequence is effectively valid.
Applying a converter accumulates state in the converter (even when the
third result of bytes-convert is 'complete). This
state can affect both further processing of input and further
generation of output, but only for conversions that involve “shift
sequences” to change modes within a stream. To terminate an input
sequence and reset the converter, use bytes-convert-end.
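The termination statuses can be observed with the always-available UTF-8 converters. This is a sketch only: convert-once is a hypothetical helper, not part of the library, and it assumes the standard bytes-close-converter procedure for releasing a converter.

```racket
#lang racket/base

;; Hypothetical helper: open a converter, convert `input` once, close the
;; converter, and return the three results of bytes-convert as a list.
(define (convert-once from-name to-name input)
  (define conv (bytes-open-converter from-name to-name))
  (define-values (result read-amt status) (bytes-convert conv input))
  (bytes-close-converter conv)
  (list result read-amt status))

;; Complete, valid input: 65 followed by the two-byte encoding of U+03BB.
(define ok (convert-once "UTF-8" "UTF-8" (bytes 65 206 187)))

;; Input that stops part-way through a two-byte sequence.
(define truncated (convert-once "UTF-8" "UTF-8" (bytes 65 206)))

;; Input containing 255, which cannot appear in UTF-8.
(define broken (convert-once "UTF-8" "UTF-8" (bytes 65 255 66)))

;; The permissive converter accepts the same input and repairs it.
(define repaired (convert-once "UTF-8-permissive" "UTF-8" (bytes 65 255 66)))
```

The third element of each list is the status symbol: 'complete for ok and repaired, 'aborts for truncated, and 'error for broken.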
Like
bytes-convert, but instead of converting bytes, this
procedure generates an ending sequence for the conversion (sometimes
called a “shift sequence”), if any. Few encodings use shift
sequences, so this function will succeed with no output for most
encodings. In any case, successful output of a (possibly empty) shift
sequence resets the converter to its initial state.
The result of bytes-convert-end is two values:
result-bstr or dest-wrote-amt — a byte string if
dest-bstr is #f or not provided, or the number of
bytes written into dest-bstr otherwise.
'complete or 'continues —
indicates whether conversion completed. If 'complete, then
an entire ending sequence was produced. If 'continues, then
the conversion could not complete due to the limit on the result
size or the space in dest-bstr, and the first result is
either an empty byte string or 0.
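For an encoding without shift sequences, ending a conversion produces no bytes. This sketch uses the guaranteed "UTF-8" to "UTF-8" converter and assumes the standard bytes-close-converter procedure:

```racket
#lang racket/base

(define conv (bytes-open-converter "UTF-8" "UTF-8"))

;; UTF-8 has no shift sequences, so the ending sequence is empty.
(define-values (ending status) (bytes-convert-end conv))

(bytes-close-converter conv)
```

Here ending is a zero-length byte string and status is 'complete.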
Returns a string for the current locale’s encoding (i.e., the
encoding normally identified by ""). See also
system-language+country.
3.4.5 Additional Byte String Functions
Examples:
> (bytes-append* #"a" #"b" '(#"c" #"d"))
#"abcd"

#"Alpha, Beta, Gamma"
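As a sketch of how bytes-append* treats its final argument as a list of byte strings, the following builds a comma-separated result. The interleave helper is hypothetical and assumes a non-empty list of items:

```racket
#lang racket/base
(require racket/bytes)

;; The last argument supplies the remaining byte strings as a list.
(define flat (bytes-append* #"a" #"b" '(#"c" #"d")))

;; Hypothetical helper: put `sep` before every item, then drop the
;; leading separator. Assumes `items` is non-empty.
(define (interleave sep items)
  (cdr (apply append (map (lambda (x) (list sep x)) items))))

;; With a single list argument, every element is appended in order.
(define listed
  (bytes-append* (interleave #", " '(#"Alpha" #"Beta" #"Gamma"))))
```

In this sketch flat is #"abcd" and listed is #"Alpha, Beta, Gamma".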
Appends the byte strings in strs, inserting sep between each pair of
byte strings in strs.
Example:
> (bytes-join '(#"one" #"two" #"three" #"four") #" potato ")
#"one potato two potato three potato four"