4.5 Byte Strings
Bytes and Byte Strings in The Racket Guide introduces byte strings.
A byte string is a fixed-length array of bytes. A byte is an exact integer between 0 and 255 inclusive.
A byte string can be mutable or immutable. When an immutable byte string is provided to a procedure like bytes-set!, the exn:fail:contract exception is raised. Byte-string constants generated by the default reader (see Reading Strings) are immutable, and they are interned in read-syntax mode. Use immutable? to check whether a byte string is immutable.
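The mutable/immutable distinction can be sketched as follows. The names `literal`, `fresh`, `outcome`, and `copy` are illustrative only, and the sketch assumes the default reader and a `racket/base` module.

```racket
#lang racket/base

;; A byte-string literal produced by the default reader is immutable;
;; the bytes constructor returns a fresh mutable byte string.
(define literal #"Apple")
(define fresh (bytes 65 112 112 108 101))

(immutable? literal) ; #t
(immutable? fresh)   ; #f

;; Mutating the literal raises exn:fail:contract; catch it to observe the failure.
(define outcome
  (with-handlers ([exn:fail:contract? (lambda (e) 'rejected)])
    (bytes-set! literal 0 97)
    'mutated))
outcome ; 'rejected

;; bytes-copy returns a mutable copy that can be changed safely.
(define copy (bytes-copy literal))
(bytes-set! copy 0 97)
copy ; #"apple"
```

Copying before mutating is one way to work with byte strings whose origin (reader constant or freshly allocated) is not known.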
Two byte strings are equal? when they have the same length and contain the same sequence of bytes.
A byte string can be used as a single-valued sequence (see Sequences). The bytes of the string serve as elements of the sequence. See also in-bytes.
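A minimal sketch of sequence use, assuming a `racket/base` module; the names `bstr`, `total`, and `as-list` are illustrative.

```racket
#lang racket/base

(define bstr #"Apple")

;; A byte string used directly as a sequence yields its bytes in order.
(define total (for/sum ([b bstr]) b))

;; in-bytes names the sequence explicitly, which lets for forms specialize the loop.
(define as-list (for/list ([b (in-bytes bstr)]) b))

total   ; 498
as-list ; '(65 112 112 108 101)
```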
See Reading Strings for information on reading byte strings and Printing Strings for information on printing byte strings.
See also: immutable?.
4.5.1 Byte String Constructors, Selectors, and Mutators
procedure
(make-bytes k [b]) → bytes?
k : exact-nonnegative-integer? b : byte? = 0
> (make-bytes 5 65) #"AAAAA"
> (bytes 65 112 112 108 101) #"Apple"
procedure
(bytes->immutable-bytes bstr) → (and/c bytes? immutable?)
bstr : bytes?
> (bytes->immutable-bytes (bytes 65 65 65)) #"AAA"
> (define b (bytes->immutable-bytes (make-bytes 5 65)))
> (bytes->immutable-bytes b) #"AAAAA"
> (eq? (bytes->immutable-bytes b) b) #t
procedure
(bytes-length bstr) → exact-nonnegative-integer?
bstr : bytes?
> (bytes-length #"Apple") 5
procedure
(bytes-ref bstr k) → byte?
bstr : bytes? k : exact-nonnegative-integer?
> (bytes-ref #"Apple" 0) 65
procedure
(bytes-set! bstr k b) → void?
bstr : (and/c bytes? (not/c immutable?)) k : exact-nonnegative-integer? b : byte?
> (define s (bytes 65 112 112 108 101))
> (bytes-set! s 4 121)
> s #"Apply"
procedure
(subbytes bstr start [end]) → bytes?
bstr : bytes? start : exact-nonnegative-integer? end : exact-nonnegative-integer? = (bytes-length bstr)
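This entry has no example of its own, so here is a small sketch of `subbytes`, assuming the usual half-open range from `start` (inclusive) to `end` (exclusive). The names `fruit`, `head`, and `tail` are illustrative.

```racket
#lang racket/base

(define fruit #"AppleBanana")

;; Bytes 0 through 4: end is exclusive.
(define head (subbytes fruit 0 5))

;; Omitting end extends the range to the end of the byte string.
(define tail (subbytes fruit 5))

head ; #"Apple"
tail ; #"Banana"

;; The result is a new mutable byte string, even when the source is an immutable literal.
(immutable? head) ; #f
```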
procedure
(bytes-copy bstr) → bytes?
bstr : bytes?
procedure
(bytes-copy! dest dest-start src [src-start src-end]) → void?
dest : (and/c bytes? (not/c immutable?)) dest-start : exact-nonnegative-integer? src : bytes? src-start : exact-nonnegative-integer? = 0 src-end : exact-nonnegative-integer? = (bytes-length src)
> (define s (bytes 65 112 112 108 101))
> (bytes-copy! s 4 #"y")
> (bytes-copy! s 0 s 3 4)
> s #"lpply"
procedure
(bytes-fill! dest b) → void?
dest : (and/c bytes? (not/c immutable?)) b : byte?
> (define s (bytes 65 112 112 108 101))
> (bytes-fill! s 113)
> s #"qqqqq"
procedure
(bytes-append bstr ...) → bytes?
bstr : bytes?
> (bytes-append #"Apple" #"Banana") #"AppleBanana"
procedure
(bytes->list bstr) → (listof byte?)
bstr : bytes?
> (bytes->list #"Apple") '(65 112 112 108 101)
procedure
(list->bytes lst) → bytes?
lst : (listof byte?)
> (list->bytes (list 65 112 112 108 101)) #"Apple"
procedure
(make-shared-bytes k [b]) → bytes?
k : exact-nonnegative-integer? b : byte? = 0
> (make-shared-bytes 5 65) #"AAAAA"
procedure
(shared-bytes b ...) → bytes?
b : byte?
> (shared-bytes 65 112 112 108 101) #"Apple"
4.5.2 Byte String Comparisons
Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.
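As an illustration of the comparison procedures in this section, the following sketch uses `bytes=?`, `bytes<?`, and `bytes>?` from `racket/base`. It assumes that ordering is lexicographic by byte value, and the single-argument call assumes a version at or after the change noted above.

```racket
#lang racket/base

;; Equality compares content, regardless of mutability.
(define same (bytes=? #"Apple" (bytes 65 112 112 108 101)))

;; Ordering is by byte value, so uppercase ASCII sorts before lowercase.
(define ordered (bytes<? #"Apple" #"Apple pie" #"Banana"))
(define case-order (bytes>? #"b" #"B")) ; 98 > 66

;; A single argument is accepted and trivially satisfies the comparison.
(define single (bytes<? #"Apple"))

(list same ordered case-order single) ; '(#t #t #t #t)
```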
4.5.3 Bytes to/from Characters, Decoding and Encoding
procedure
(bytes->string/utf-8 bstr [err-char start end]) → string?
bstr : bytes? err-char : (or/c #f char?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (bytes-length bstr)
> (bytes->string/utf-8 (bytes 195 167 195 176 195 182 194 163)) "çðö£"
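The optional arguments in the signature above can be sketched as follows. The sketch assumes that `err-char`, when not `#f`, replaces each byte that is not part of a valid UTF-8 sequence, and that decoding fails with `exn:fail:contract` otherwise. The names `bad`, `strict-result`, `lenient-result`, and `banana` are illustrative.

```racket
#lang racket/base

;; Byte 255 can never appear in valid UTF-8.
(define bad (bytes 65 255 66))

;; With err-char left as #f, invalid input raises exn:fail:contract.
(define strict-result
  (with-handlers ([exn:fail:contract? (lambda (e) 'decode-error)])
    (bytes->string/utf-8 bad)))

;; With an err-char, the invalid byte is replaced instead.
(define lenient-result (bytes->string/utf-8 bad #\?))

;; start and end select a slice of the byte string to decode.
(define banana (bytes->string/utf-8 #"AppleBanana" #f 5))

strict-result  ; 'decode-error
lenient-result ; "A?B"
banana         ; "Banana"
```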
procedure
(bytes->string/locale bstr [err-char start end]) → string?
bstr : bytes? err-char : (or/c #f char?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (bytes-length bstr)
procedure
(bytes->string/latin-1 bstr [err-char start end]) → string?
bstr : bytes? err-char : (or/c #f char?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (bytes-length bstr)
> (bytes->string/latin-1 (bytes 254 211 209 165)) "þÓÑ¥"
procedure
(string->bytes/utf-8 str [err-byte start end]) → bytes?
str : string? err-byte : (or/c #f byte?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (string-length str)
> (define b (bytes->string/utf-8 (bytes 195 167 195 176 195 182 194 163)))
> (string->bytes/utf-8 b) #"\303\247\303\260\303\266\302\243"
> (bytes->string/utf-8 (string->bytes/utf-8 b)) "çðö£"
procedure
(string->bytes/locale str [err-byte start end]) → bytes?
str : string? err-byte : (or/c #f byte?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (string-length str)
procedure
(string->bytes/latin-1 str [err-byte start end]) → bytes?
str : string? err-byte : (or/c #f byte?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (string-length str)
> (define b (bytes->string/latin-1 (bytes 254 211 209 165)))
> (string->bytes/latin-1 b) #"\376\323\321\245"
> (bytes->string/latin-1 (string->bytes/latin-1 b)) "þÓÑ¥"
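The round trip above works because every character involved fits in Latin-1. The sketch below shows what may happen otherwise, assuming that `err-byte` substitutes for characters above 255 and that encoding raises `exn:fail:contract` when `err-byte` is `#f`. The name `text` and the choice of 63 (ASCII `?`) are illustrative.

```racket
#lang racket/base

;; "a" followed by Greek small lambda (U+03BB), which has no Latin-1 encoding.
(define text (string #\a (integer->char 955)))

;; Without err-byte, the unencodable character is an error.
(define strict-result
  (with-handlers ([exn:fail:contract? (lambda (e) 'encode-error)])
    (string->bytes/latin-1 text)))

;; With err-byte 63, the character is replaced by "?".
(define lenient-result (string->bytes/latin-1 text 63))

;; UTF-8 can encode every character, here using two bytes for the lambda.
(define utf8-result (string->bytes/utf-8 text))

strict-result  ; 'encode-error
lenient-result ; #"a?"
utf8-result    ; #"a\316\273"
```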
procedure
(string-utf-8-length str [start end]) → exact-nonnegative-integer?
str : string? start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (string-length str)
> (string-utf-8-length (bytes->string/utf-8 (bytes 195 167 195 176 195 182 194 163))) 8
> (string-utf-8-length "hello") 5
procedure
(bytes-utf-8-length bstr [err-char start end]) → (or/c exact-nonnegative-integer? #f)
bstr : bytes? err-char : (or/c #f char?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (bytes-length bstr)
> (bytes-utf-8-length (bytes 195 167 195 176 195 182 194 163)) 4
> (bytes-utf-8-length (make-bytes 5 65)) 5
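To contrast the three length measures, here is a short sketch. It assumes that `bytes-utf-8-length` returns `#f` for input that is not valid UTF-8 unless an `err-char` is supplied, which is what the `(or/c exact-nonnegative-integer? #f)` result contract suggests. The name `encoded` is illustrative.

```racket
#lang racket/base

(define encoded (bytes 195 167 195 176 195 182 194 163))

;; Number of bytes versus number of decoded characters.
(define byte-count (bytes-length encoded))
(define char-count (bytes-utf-8-length encoded))

;; Going the other way: bytes needed to encode the decoded string.
(define needed (string-utf-8-length (bytes->string/utf-8 encoded)))

;; Invalid input yields #f, or a count when err-char stands in for bad bytes.
(define invalid (bytes-utf-8-length (bytes 255)))
(define patched (bytes-utf-8-length (bytes 255) #\?))

(list byte-count char-count needed invalid patched) ; '(8 4 8 #f 1)
```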
procedure
(bytes-utf-8-ref bstr [skip err-char start end]) → (or/c char? #f)
bstr : bytes? skip : exact-nonnegative-integer? = 0 err-char : (or/c #f char?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (bytes-length bstr)
> (bytes-utf-8-ref (bytes 195 167 195 176 195 182 194 163) 0) #\ç
> (bytes-utf-8-ref (bytes 195 167 195 176 195 182 194 163) 1) #\ð
> (bytes-utf-8-ref (bytes 195 167 195 176 195 182 194 163) 2) #\ö
> (bytes-utf-8-ref (bytes 65 66 67 68) 0) #\A
> (bytes-utf-8-ref (bytes 65 66 67 68) 1) #\B
> (bytes-utf-8-ref (bytes 65 66 67 68) 2) #\C
procedure
(bytes-utf-8-index bstr skip [err-char start end]) → (or/c exact-nonnegative-integer? #f)
bstr : bytes? skip : exact-nonnegative-integer? err-char : (or/c #f char?) = #f start : exact-nonnegative-integer? = 0 end : exact-nonnegative-integer? = (bytes-length bstr)
> (bytes-utf-8-index (bytes 195 167 195 176 195 182 194 163) 0) 0
> (bytes-utf-8-index (bytes 195 167 195 176 195 182 194 163) 1) 2
> (bytes-utf-8-index (bytes 195 167 195 176 195 182 194 163) 2) 4
> (bytes-utf-8-index (bytes 65 66 67 68) 0) 0
> (bytes-utf-8-index (bytes 65 66 67 68) 1) 1
> (bytes-utf-8-index (bytes 65 66 67 68) 2) 2
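One possible use of these byte offsets is to slice a UTF-8 byte string on character boundaries without decoding all of it. The sketch below combines `bytes-utf-8-index` with `subbytes`; the names `encoded`, `from`, `to`, and `middle` are illustrative.

```racket
#lang racket/base

(define encoded (bytes 195 167 195 176 195 182 194 163))

;; Byte offsets at which characters 1 and 3 begin.
(define from (bytes-utf-8-index encoded 1)) ; 2
(define to (bytes-utf-8-index encoded 3))   ; 6

;; Slice out characters 1 and 2, then decode only that slice.
(define middle (bytes->string/utf-8 (subbytes encoded from to)))

;; middle holds the two characters U+00F0 and U+00F6.
(map char->integer (string->list middle)) ; '(240 246)
```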
4.5.4 Bytes to Bytes Encoding Conversion
procedure
(bytes-open-converter from-name to-name) → (or/c bytes-converter? #f)
from-name : string? to-name : string?
Certain encoding combinations are always available:
(bytes-open-converter "UTF-8" "UTF-8"): the identity conversion, except that encoding errors in the input lead to a decoding failure.
(bytes-open-converter "UTF-8-permissive" "UTF-8"): the identity conversion, except that any input byte that is not part of a valid encoding sequence is effectively replaced by the UTF-8 encoding sequence for #\uFFFD. (This handling of invalid sequences is consistent with the interpretation of port bytes streams into characters; see Ports.)
(bytes-open-converter "" "UTF-8"): converts from the current locale’s default encoding (see Encodings and Locales) to UTF-8.
(bytes-open-converter "UTF-8" ""): converts from UTF-8 to the current locale’s default encoding (see Encodings and Locales).
(bytes-open-converter "platform-UTF-8" "platform-UTF-16"): converts UTF-8 to UTF-16 on Unix and Mac OS, where each UTF-16 code unit is a sequence of two bytes ordered by the current platform’s endianness. On Windows, the conversion is the same as (bytes-open-converter "WTF-8" "WTF-16") to support unpaired surrogate code units.
(bytes-open-converter "platform-UTF-8-permissive" "platform-UTF-16"): like (bytes-open-converter "platform-UTF-8" "platform-UTF-16"), but an input byte that is not part of a valid UTF-8 encoding sequence (or valid for the unpaired-surrogate extension on Windows) is effectively replaced with #\uFFFD.
(bytes-open-converter "platform-UTF-16" "platform-UTF-8"): converts UTF-16 (bytes ordered by the current platform’s endianness) to UTF-8 on Unix and Mac OS. On Windows, the conversion is the same as (bytes-open-converter "WTF-16" "WTF-8") to support unpaired surrogates. On Unix and Mac OS, surrogates are assumed to be paired: a pair of bytes with the bits #xD800 starts a surrogate pair, and the #x03FF bits are used from the pair and following pair (independent of the value of the #xDC00 bits). On all platforms, performance may be poor when decoding from an odd offset within an input byte string.
(bytes-open-converter "WTF-8" "WTF-16"): converts the WTF-8 [Sapin18] superset of UTF-8 to a superset of UTF-16 to support unpaired surrogate code units, where each UTF-16 code unit is a sequence of two bytes ordered by the current platform’s endianness.
(bytes-open-converter "WTF-8-permissive" "WTF-16"): like (bytes-open-converter "WTF-8" "WTF-16"), but an input byte that is not part of a valid WTF-8 encoding sequence is effectively replaced with #\uFFFD.
(bytes-open-converter "WTF-16" "WTF-8"): converts the WTF-16 [Sapin18] superset of UTF-16 to the WTF-8 superset of UTF-8. The input can include UTF-16 code units that are unpaired surrogates, and the corresponding output includes an encoding of each surrogate in a natural extension of UTF-8.
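The difference between the first two guaranteed combinations can be sketched as follows, using `bytes-convert` (documented later in this section) on input that contains one invalid byte. The names `input`, `strict`, and `permissive` are illustrative, and the expected results in the comments follow from the descriptions above.

```racket
#lang racket/base

;; Byte 255 is not part of any valid UTF-8 encoding sequence.
(define input (bytes 65 255 66))

;; Strict identity conversion: stops at the invalid byte.
(define strict (bytes-open-converter "UTF-8" "UTF-8"))
(define-values (strict-out strict-read strict-status)
  (bytes-convert strict input))
(bytes-close-converter strict)

;; Permissive conversion: the invalid byte becomes the encoding of #\uFFFD.
(define permissive (bytes-open-converter "UTF-8-permissive" "UTF-8"))
(define-values (perm-out perm-read perm-status)
  (bytes-convert permissive input))
(bytes-close-converter permissive)

strict-status ; 'error
strict-read   ; 1
perm-status   ; 'complete
perm-out      ; #"A\357\277\275B"
```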
A newly opened byte converter is registered with the current custodian (see Custodians), so that the converter is closed when the custodian is shut down. A converter is not registered with a custodian (and does not need to be closed) if it is one of the guaranteed combinations not involving "" on Unix, or if it is any of the guaranteed combinations (including "") on Windows and Mac OS.
In the Racket software distributions for Windows, a suitable "iconv.dll" is included with "libmzschVERS.dll".
The set of available encodings and combinations varies by platform, depending on the iconv library that is installed; the from-name and to-name arguments are passed on to iconv_open. On Windows, "iconv.dll" or "libiconv.dll" must be in the same directory as "libmzschVERS.dll" (where VERS is a version number), in the user’s path, in the system directory, or in the current executable’s directory at run time, and the DLL must either supply _errno or link to "msvcrt.dll" for _errno; otherwise, only the guaranteed combinations are available.
Use bytes-convert with the result to convert byte strings.
Changed in version 7.9.0.17 of package base: Added built-in converters for "WTF-8", "WTF-8-permissive", and "WTF-16".
procedure
(bytes-close-converter converter) → void
converter : bytes-converter?
procedure
(bytes-convert converter src-bstr [src-start-pos src-end-pos dest-bstr dest-start-pos dest-end-pos])
→ (or/c bytes? exact-nonnegative-integer?) exact-nonnegative-integer? (or/c 'complete 'continues 'aborts 'error)
converter : bytes-converter? src-bstr : bytes? src-start-pos : exact-nonnegative-integer? = 0 src-end-pos : exact-nonnegative-integer? = (bytes-length src-bstr) dest-bstr : (or/c bytes? #f) = #f dest-start-pos : exact-nonnegative-integer? = 0 dest-end-pos : (or/c exact-nonnegative-integer? #f) = (and dest-bstr (bytes-length dest-bstr))
If dest-bstr is not #f, the converted bytes are written into dest-bstr from dest-start-pos to dest-end-pos. If dest-bstr is #f, then a newly allocated byte string holds the conversion results, and if dest-end-pos is not #f, the size of the result byte string is no more than (- dest-end-pos dest-start-pos).
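The two destination modes described above might look like this in practice. This is a sketch using the guaranteed "UTF-8" to "UTF-8" converter on ASCII input; the names `conv`, `buffer`, and `limited` are illustrative.

```racket
#lang racket/base

(define conv (bytes-open-converter "UTF-8" "UTF-8"))

;; Mode 1: write into a caller-supplied buffer between positions 2 and 8.
(define buffer (make-bytes 8 0))
(define-values (wrote read-amt status)
  (bytes-convert conv #"Apple" 0 5 buffer 2 8))

wrote  ; 5
status ; 'complete
buffer ; #"\0\0Apple\0"

;; Mode 2: no buffer, but dest-end-pos caps the size of the allocated result.
(define-values (limited limited-read limited-status)
  (bytes-convert conv #"Apple" 0 5 #f 0 3))

;; At most (- 3 0) bytes come back; the rest of the input is left unread.
(bytes-length limited)

(bytes-close-converter conv)
```

When a buffer is supplied, the first result is a count, not a byte string, so callers typically read the converted bytes back out of the buffer with `subbytes`.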
The result of bytes-convert is three values:
result-bstr or dest-wrote-amt: a byte string if dest-bstr is #f or not provided, or the number of bytes written into dest-bstr otherwise.
src-read-amt: the number of bytes successfully converted from src-bstr.
'complete, 'continues, 'aborts, or 'error: indicates how conversion terminated:
'complete: The entire input was processed, and src-read-amt will be equal to (- src-end-pos src-start-pos).
'continues: Conversion stopped due to the limit on the result size or the space in dest-bstr; in this case, fewer than (- dest-end-pos dest-start-pos) bytes may be returned if more space is needed to process the next complete encoding sequence in src-bstr.
'aborts: The input stopped part-way through an encoding sequence, and more input bytes are necessary to continue. For example, if the last byte of input is 195 for a "UTF-8-permissive" decoding, the result is 'aborts, because another byte is needed to determine how to use the 195 byte.
'error: The bytes starting at (+ src-start-pos src-read-amt) bytes in src-bstr do not form a legal encoding sequence. This result is never produced for some encodings, where all byte sequences are valid encodings. For example, since "UTF-8-permissive" handles an invalid UTF-8 sequence by dropping characters or generating “?,” every byte sequence is effectively valid.
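The 'aborts case matters when input arrives in chunks. The following sketch shows one way to handle it: keep the unread tail of a chunk and prepend it to the next one. The chunk boundaries and the names `chunk1`, `chunk2`, and `combined` are illustrative, and the sketch assumes that the incomplete trailing byte is not counted in src-read-amt.

```racket
#lang racket/base

(define conv (bytes-open-converter "UTF-8-permissive" "UTF-8"))

;; The first chunk ends with 195, the first byte of a two-byte sequence.
(define chunk1 (bytes 65 195))
(define-values (out1 read1 status1) (bytes-convert conv chunk1))

status1 ; 'aborts

;; Carry the unread tail over and convert it together with the next bytes.
(define chunk2 (bytes-append (subbytes chunk1 read1) (bytes 167 66)))
(define-values (out2 read2 status2) (bytes-convert conv chunk2))

(bytes-close-converter conv)

(define combined (bytes-append out1 out2))
combined ; #"A\303\247B"
```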
Applying a converter accumulates state in the converter (even when the third result of bytes-convert is 'complete). This state can affect both further processing of input and further generation of output, but only for conversions that involve “shift sequences” to change modes within a stream. To terminate an input sequence and reset the converter, use bytes-convert-end.
> (define convert (bytes-open-converter "UTF-8" "UTF-16"))
> (bytes-convert convert (bytes 65 66 67 68))
#"\377\376A\0B\0C\0D\0"
4
'complete
> (bytes 195 167 195 176 195 182 194 163) #"\303\247\303\260\303\266\302\243"
> (bytes-convert convert (bytes 195 167 195 176 195 182 194 163))
#"\347\0\360\0\366\0\243\0"
8
'complete
> (bytes-close-converter convert)
procedure
(bytes-convert-end converter [dest-bstr dest-start-pos dest-end-pos])
→ (or/c bytes? exact-nonnegative-integer?) (or/c 'complete 'continues)
converter : bytes-converter? dest-bstr : (or/c bytes? #f) = #f dest-start-pos : exact-nonnegative-integer? = 0 dest-end-pos : (or/c exact-nonnegative-integer? #f) = (and dest-bstr (bytes-length dest-bstr))
The result of bytes-convert-end is two values:
result-bstr or dest-wrote-amt: a byte string if dest-bstr is #f or not provided, or the number of bytes written into dest-bstr otherwise.
'complete or 'continues: indicates whether conversion completed. If 'complete, then an entire ending sequence was produced. If 'continues, then the conversion could not complete due to the limit on the result size or the space in dest-bstr, and the first result is either an empty byte string or 0.
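A typical call pattern, sketched under the assumption that the guaranteed "UTF-8" to "UTF-8" converter involves no shift sequences and therefore produces an empty ending sequence. The names `body`, `ending`, and `result` are illustrative.

```racket
#lang racket/base

(define conv (bytes-open-converter "UTF-8" "UTF-8"))

;; Convert the input, then ask the converter for any ending sequence.
(define-values (body read-amt status) (bytes-convert conv #"Apple"))
(define-values (ending end-status) (bytes-convert-end conv))

(bytes-close-converter conv)

;; The full output is the converted body followed by the ending sequence.
(define result (bytes-append body ending))

end-status ; 'complete
result     ; #"Apple"
```

For encodings that do use shift sequences, the ending bytes may be non-empty, so appending them is the safe general pattern.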
procedure
(bytes-converter? v) → boolean?
v : any/c
> (bytes-converter? (bytes-open-converter "UTF-8" "UTF-16")) #t
> (bytes-converter? (bytes-open-converter "whacky" "not likely")) #f
> (define b (bytes-open-converter "UTF-8" "UTF-16"))
> (bytes-close-converter b)
> (bytes-converter? b) #t
4.5.5 Additional Byte String Functions
(require racket/bytes) | package: base |
procedure
(bytes-append* str ... strs) → bytes?
str : bytes? strs : (listof bytes?)
> (bytes-append* #"a" #"b" '(#"c" #"d")) #"abcd"
> (bytes-append* (cdr (append* (map (lambda (x) (list #", " x)) '(#"Alpha" #"Beta" #"Gamma"))))) #"Alpha, Beta, Gamma"
procedure
(bytes-join strs sep) → bytes?
strs : (listof bytes?) sep : bytes?
> (bytes-join '(#"one" #"two" #"three" #"four") #" potato ") #"one potato two potato three potato four"