4.6 Characters


Characters in The Racket Guide introduces characters.

Characters range over Unicode scalar values: the values from #x0 to #x10FFFF, excluding the surrogate range #xD800 to #xDFFF. The scalar values are a subset of the Unicode code points.

Two characters are eqv? if they correspond to the same scalar value. For each scalar value less than 256, character values that are eqv? are also eq?. Characters produced by the default reader are interned in read-syntax mode.
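
The identity rules above can be illustrated with a short sketch. The names a1, a2, l1, and l2 are introduced only for this example. For scalar values of 256 or more, only eqv? (or char=?) is guaranteed by the text above, so the sketch does not apply eq? to them.

```racket
#lang racket/base

;; Two characters obtained in different ways from the same scalar value.
(define a1 #\a)
(define a2 (integer->char 97))

(eqv? a1 a2) ; => #t, same scalar value
(eq? a1 a2)  ; => #t, scalar value is less than 256

;; For larger scalar values, rely on eqv? or char=? rather than eq?.
(define l1 (integer->char #x3BB))
(define l2 (integer->char #x3BB))

(eqv? l1 l2)   ; => #t
(char=? l1 l2) ; => #t
```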

See Reading Characters for information on reading characters and Printing Characters for information on printing characters.

Changed in version 6.1.1.8 of package base: Updated from Unicode 5.0.1 to Unicode 7.0.0.

4.6.1 Characters and Scalar Values

procedure
(char? v) → boolean?
v : any/c

Returns #t if v is a character, #f otherwise.

procedure
(char->integer char) → exact-integer?
char : char?

Returns a character’s code-point number.

Example:

> (char->integer #\A)
65

procedure
(integer->char k) → char?
k : (and/c exact-integer?
           (or/c (integer-in 0 #xD7FF)
                 (integer-in #xE000 #x10FFFF)))

Returns the character whose code-point number is k. For k less than 256, the result is the same object for the same k.

Example:

> (integer->char 65)
#\A
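
Because the contract on k excludes the surrogate range, code that converts untrusted integers may want to check the argument first. The following is a minimal sketch; safe-integer->char is a hypothetical helper, not part of Racket, and it returns #f where integer->char would raise a contract error.

```racket
#lang racket/base

;; safe-integer->char is a hypothetical helper (an assumption of this
;; example). It mirrors the contract of integer->char and returns #f
;; for any value that the contract would reject.
(define (safe-integer->char k)
  (and (exact-integer? k)
       (or (<= 0 k #xD7FF)
           (<= #xE000 k #x10FFFF))
       (integer->char k)))

(safe-integer->char 65)       ; => #\A
(safe-integer->char #xD800)   ; => #f, a surrogate code point
(safe-integer->char #x110000) ; => #f, beyond #x10FFFF

;; char->integer inverts integer->char on valid scalar values.
(char->integer (integer->char #x3BB)) ; => 955
```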

procedure
(char-utf-8-length char) → (integer-in 1 6)
char : char?

Produces the same result as (bytes-length (string->bytes/utf-8 (string char))).
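
As an illustration, the sketch below applies char-utf-8-length to characters from different ranges and compares the result with the equivalent expression given above. The helper utf-8-length-via-bytes is a hypothetical name used only here.

```racket
#lang racket/base

;; utf-8-length-via-bytes is a hypothetical helper that spells out the
;; equivalent expression from the description above.
(define (utf-8-length-via-bytes c)
  (bytes-length (string->bytes/utf-8 (string c))))

(char-utf-8-length #\A)      ; => 1
(char-utf-8-length #\u03BB)  ; => 2, GREEK SMALL LETTER LAMDA
(char-utf-8-length #\u20AC)  ; => 3, EURO SIGN
(char-utf-8-length #\U1F600) ; => 4, a character outside the BMP

;; Both approaches agree on each sample character.
(define sample-chars (list #\A #\u03BB #\u20AC #\U1F600))
(define lengths-agree?
  (for/and ([c (in-list sample-chars)])
    (= (char-utf-8-length c) (utf-8-length-via-bytes c))))
```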

4.6.2 Character Comparisons

procedure
(char=? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Returns #t if all of the arguments are eqv?.

Examples:

> (char=? #\a #\a)
#t
> (char=? #\a #\A #\a)
#f

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.

procedure
(char<? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Returns #t if the arguments are sorted increasing, where two characters are ordered by their scalar values, #f otherwise.

Examples:

> (char<? #\A #\a)
#t
> (char<? #\a #\A)
#f
> (char<? #\a #\b #\c)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.
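
Since char<? orders characters by scalar value, it can serve directly as the comparison for sort. A brief sketch, which also shows that a repeated character breaks a strictly increasing sequence:

```racket
#lang racket/base

;; Uppercase ASCII letters have smaller scalar values than lowercase ones.
(define sorted (sort (list #\b #\A #\c #\a) char<?))
sorted ; => '(#\A #\a #\b #\c)

;; char<? requires strictly increasing arguments.
(char<? #\a #\a #\b)  ; => #f
(char<=? #\a #\a #\b) ; => #t
```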

procedure
(char<=? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Like char<?, but checks whether the arguments are nondecreasing.

Examples:

> (char<=? #\A #\a)
#t
> (char<=? #\a #\A)
#f
> (char<=? #\a #\b #\b)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.

procedure
(char>? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Like char<?, but checks whether the arguments are decreasing.

Examples:

> (char>? #\A #\a)
#f
> (char>? #\a #\A)
#t
> (char>? #\c #\b #\a)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.

procedure
(char>=? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Like char<?, but checks whether the arguments are nonincreasing.

Examples:

> (char>=? #\A #\a)
#f
> (char>=? #\a #\A)
#t
> (char>=? #\c #\b #\b)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.

procedure
(char-ci=? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Returns #t if all of the arguments are eqv? after locale-insensitive case-folding via char-foldcase.

Examples:

> (char-ci=? #\A #\a)
#t
> (char-ci=? #\a #\a #\a)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.
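
The relationship to char-foldcase can be sketched for the two-argument case. Here my-char-ci=? is a hypothetical re-implementation written only for illustration; it is not how Racket necessarily implements char-ci=?.

```racket
#lang racket/base

;; my-char-ci=? is a hypothetical two-argument version of char-ci=?,
;; written directly from the description above.
(define (my-char-ci=? c1 c2)
  (eqv? (char-foldcase c1) (char-foldcase c2)))

;; #\u03A3 is capital sigma, #\u03C3 is small sigma, and #\u03C2 is
;; final small sigma. Both capital and final sigma fold to small sigma.
(my-char-ci=? #\u03A3 #\u03C2) ; => #t
(char-ci=? #\u03A3 #\u03C2)    ; => #t
(char=? #\u03C3 #\u03C2)       ; => #f, distinct scalar values
```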

procedure
(char-ci<? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Like char<?, but checks whether the arguments would be in increasing order if each was first case-folded using char-foldcase (which is locale-insensitive).

Examples:

> (char-ci<? #\A #\a)
#f
> (char-ci<? #\a #\b)
#t
> (char-ci<? #\a #\b #\c)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.
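
The difference between char<? and char-ci<? is easiest to see with mixed-case input. The following sketch sorts the same list with both; the result for char-ci<? relies on sort being stable, so #\A stays ahead of #\a.

```racket
#lang racket/base

(char<? #\B #\a)    ; => #t, since 66 < 97
(char-ci<? #\B #\a) ; => #f, since b follows a after case-folding

(define mixed (list #\b #\A #\C #\a))

(define by-scalar (sort mixed char<?))
by-scalar ; => '(#\A #\C #\a #\b)

(define by-folded (sort mixed char-ci<?))
by-folded ; => '(#\A #\a #\b #\C)
```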

procedure
(char-ci<=? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Like char-ci<?, but checks whether the arguments would be nondecreasing after case-folding.

Examples:

> (char-ci<=? #\A #\a)
#t
> (char-ci<=? #\a #\A)
#t
> (char-ci<=? #\a #\b #\b)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.

procedure
(char-ci>? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Like char-ci<?, but checks whether the arguments would be decreasing after case-folding.

Examples:

> (char-ci>? #\A #\a)
#f
> (char-ci>? #\b #\A)
#t
> (char-ci>? #\c #\b #\a)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.

procedure
(char-ci>=? char1 char2 ...) → boolean?
char1 : char?
char2 : char?

Like char-ci<?, but checks whether the arguments would be nonincreasing after case-folding.

Examples:

> (char-ci>=? #\A #\a)
#t
> (char-ci>=? #\a #\A)
#t
> (char-ci>=? #\c #\b #\b)
#t

Changed in version 7.0.0.13 of package base: Allow one argument, in addition to allowing two or more.

4.6.3 Classifications

procedure
(char-alphabetic? char) → boolean?
char : char?

Returns #t if char has the Unicode “Alphabetic” property.

procedure
(char-lower-case? char) → boolean?
char : char?

Returns #t if char has the Unicode “Lowercase” property.

procedure
(char-upper-case? char) → boolean?
char : char?

Returns #t if char has the Unicode “Uppercase” property.

procedure
(char-title-case? char) → boolean?
char : char?

Returns #t if char’s Unicode general category is Lt, #f otherwise.

procedure
(char-numeric? char) → boolean?
char : char?

Returns #t if char has a Unicode “Numeric_Type” property value that is not None.

procedure
(char-symbolic? char) → boolean?
char : char?

Returns #t if char’s Unicode general category is Sm, Sc, Sk, or So, #f otherwise.

procedure
(char-punctuation? char) → boolean?
char : char?

Returns #t if char’s Unicode general category is Pc, Pd, Ps, Pe, Pi, Pf, or Po, #f otherwise.

procedure
(char-graphic? char) → boolean?
char : char?

Returns #t if char’s Unicode general category is Ll, Lm, Lo, Lt, Lu, Nd, Nl, No, Mn, Mc, or Me, or if one of the following produces #t when applied to char: char-alphabetic?, char-numeric?, char-symbolic?, or char-punctuation?.

procedure
(char-whitespace? char) → boolean?
char : char?

Returns #t if char has the Unicode “White_Space” property.

procedure
(char-blank? char) → boolean?
char : char?

Returns #t if char’s Unicode general category is Zs or if char is #\tab. (These correspond to horizontal whitespace.)
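
To illustrate how char-blank? differs from char-whitespace?, the sketch below counts leading horizontal whitespace in a string. The helper leading-blank-count is a hypothetical name introduced for this example.

```racket
#lang racket/base

(char-whitespace? #\newline) ; => #t
(char-blank? #\newline)      ; => #f, a newline is not horizontal whitespace
(char-blank? #\tab)          ; => #t
(char-blank? #\space)        ; => #t

;; leading-blank-count is a hypothetical helper: it returns the number
;; of char-blank? characters at the start of s.
(define (leading-blank-count s)
  (let loop ([i 0])
    (if (and (< i (string-length s))
             (char-blank? (string-ref s i)))
        (loop (add1 i))
        i)))

(leading-blank-count " \t x") ; => 3
(leading-blank-count "\n x")  ; => 0
```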

procedure
(char-iso-control? char) → boolean?
char : char?

Returns #t if char is between #\nul and #\u001F inclusive or #\rubout and #\u009F inclusive.

procedure
(char-extended-pictographic? char) → boolean?
char : char?

Returns #t if char has the Unicode “Extended_Pictographic” property.

Added in version 8.6.0.1 of package base.

procedure
(char-general-category char) → symbol?
char : char?

Returns a symbol representing the character’s Unicode general category, which is 'lu, 'll, 'lt, 'lm, 'lo, 'mn, 'mc, 'me, 'nd, 'nl, 'no, 'ps, 'pe, 'pi, 'pf, 'pd, 'pc, 'po, 'sc, 'sm, 'sk, 'so, 'zs, 'zp, 'zl, 'cc, 'cf, 'cs, 'co, or 'cn.
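
A few sample applications show how the symbols relate to the predicates described earlier in this section:

```racket
#lang racket/base

(char-general-category #\A)       ; => 'lu
(char-general-category #\a)       ; => 'll
(char-general-category #\7)       ; => 'nd
(char-general-category #\space)   ; => 'zs
(char-general-category #\newline) ; => 'cc
(char-general-category #\()       ; => 'ps
(char-general-category #\+)       ; => 'sm

;; The category predicates agree with the reported symbols.
(char-punctuation? #\() ; => #t, because Ps is a punctuation category
(char-symbolic? #\+)    ; => #t, because Sm is a symbol category

;; #\u01C5 is a titlecase digraph, so its category is Lt.
(char-general-category #\u01C5) ; => 'lt
(char-title-case? #\u01C5)      ; => #t
```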

procedure
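(char-grapheme-break-property char)
 → (or/c 'Other 'CR 'LF 'Control 'Extend 'ZWJ 'Regional_Indicator
         'Prepend 'SpacingMark 'L 'V 'T 'LV 'LVT)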
char : char?

Returns the Unicode grapheme-break property for char, which is 'Other, 'CR, 'LF, 'Control, 'Extend, 'ZWJ, 'Regional_Indicator, 'Prepend, 'SpacingMark, 'L, 'V, 'T, 'LV, or 'LVT.

Added in version 8.6.0.1 of package base.
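
A short sketch of the property values for a few characters follows. It assumes a Racket version that includes char-grapheme-break-property (8.6.0.1 or later, per the note above).

```racket
#lang racket/base

(char-grapheme-break-property #\a)       ; => 'Other
(char-grapheme-break-property #\return)  ; => 'CR
(char-grapheme-break-property #\newline) ; => 'LF
(char-grapheme-break-property #\u0300)   ; => 'Extend, COMBINING GRAVE ACCENT
(char-grapheme-break-property #\u200D)   ; => 'ZWJ, ZERO WIDTH JOINER
```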

procedure
(make-known-char-range-list)
 → (listof (list/c exact-nonnegative-integer?
                   exact-nonnegative-integer?
                   boolean?))

Produces a list of three-element lists, where each three-element list represents a set of consecutive code points for which the Unicode standard specifies character properties. Each three-element list contains two integers and a boolean; the first integer is a starting code-point value (inclusive), the second integer is an ending code-point value (inclusive), and the boolean is #t when all characters in the code-point range have identical results for all of the character predicates above, have analogous transformations (shifting by the same amount, if any, in code-point space) for char-downcase, char-upcase, and char-titlecase, and have the same decomposition–normalization behavior. The three-element lists are ordered in the overall result list such that later lists represent larger code-point values, and all three-element lists are separated from every other by at least one code-point value that is not specified by Unicode.
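
The shape of the result can be checked with a small sketch. The names ranges, well-formed-entry?, and in-known-range? are hypothetical helpers introduced for this example; the sketch inspects only the structure described above and makes no claim about specific range boundaries.

```racket
#lang racket/base

(define ranges (make-known-char-range-list))

;; well-formed-entry? is a hypothetical checker for one element:
;; (list start end uniform?) with start <= end.
(define (well-formed-entry? e)
  (and (list? e)
       (= (length e) 3)
       (exact-nonnegative-integer? (car e))
       (exact-nonnegative-integer? (cadr e))
       (<= (car e) (cadr e))
       (boolean? (caddr e))))

(define all-well-formed? (andmap well-formed-entry? ranges))

;; Later entries cover larger code-point values.
(define ascending?
  (for/and ([a (in-list ranges)]
            [b (in-list (cdr ranges))])
    (< (cadr a) (car b))))

;; in-known-range? is a hypothetical lookup: is code point k covered
;; by some entry (both ends inclusive)?
(define (in-known-range? k)
  (for/or ([e (in-list ranges)])
    (<= (car e) k (cadr e))))

(in-known-range? (char->integer #\A)) ; => #t
```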

4.6.4 Character Conversions

procedure
(char-upcase char) → char?
char : char?

Produces a character consistent with the 1-to-1 code point mapping defined by Unicode. If char has no upcase mapping, char-upcase produces char.

String procedures, such as string-upcase, handle the case where Unicode defines a locale-independent mapping from the code point to a code-point sequence (in addition to the 1-to-1 mapping on scalar values).

Examples:

> (char-upcase #\a)
#\A
> (char-upcase #\λ)
#\Λ
> (char-upcase #\space)
#\space
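
The distinction between the 1-to-1 character mapping and the string-level mapping can be seen with #\ß (U+00DF), whose uppercase form is a two-character sequence. A minimal sketch:

```racket
#lang racket/base

;; U+00DF has no 1-to-1 upcase mapping, so char-upcase returns it unchanged.
(char-upcase #\u00DF) ; => #\ß

;; string-upcase applies the code-point-sequence mapping instead.
(string-upcase (string #\u00DF)) ; => "SS"
(string-upcase "Stra\u00DFe")    ; => "STRASSE"
```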

procedure
(char-downcase char) → char?
char : char?

Like char-upcase, but for the Unicode downcase mapping.

Examples:

> (char-downcase #\A)
#\a
> (char-downcase #\Λ)
#\λ
> (char-downcase #\space)
#\space

procedure
(char-titlecase char) → char?
char : char?

Like char-upcase, but for the Unicode titlecase mapping.

Examples:

> (char-titlecase #\a)
#\A
> (char-titlecase #\λ)
#\Λ
> (char-titlecase #\space)
#\space
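
For most characters the titlecase and upcase mappings coincide. They differ for digraph characters such as U+01C6, as the following sketch shows; the definition name dz is introduced only for this example.

```racket
#lang racket/base

;; U+01C6 is LATIN SMALL LETTER DZ WITH CARON, a single-character digraph.
(define dz #\u01C6)

(char-upcase dz)    ; => #\u01C4, both letters capitalized
(char-titlecase dz) ; => #\u01C5, only the first letter capitalized

;; The titlecase form belongs to general category Lt.
(char-title-case? (char-titlecase dz)) ; => #t
```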

procedure
(char-foldcase char) → char?
char : char?

Like char-upcase, but for the Unicode case-folding mapping.

Examples:

> (char-foldcase #\A)
#\a
> (char-foldcase #\Σ)
#\σ
> (char-foldcase #\ς)
#\σ
> (char-foldcase #\space)
#\space

4.6.5 Character Grapheme-Cluster Streaming

procedure
(char-grapheme-step char state) → boolean? fixnum?
char : char?
state : fixnum?

Encodes a state machine for Unicode’s grapheme-cluster specification on a sequence of code points. It accepts a character for the next code point in a sequence, and it returns two values: whether a (single) grapheme cluster has terminated since the most recently reported termination (or the start of the stream), and a new state to be used with char-grapheme-step and the next character.

A value of 0 for state represents the initial state or a state where no characters are pending toward a new boundary. Thus, if a sequence of characters is exhausted and accumulated state is not 0, then the end of the stream creates one last grapheme-cluster boundary. When char-grapheme-step produces a true value as its first result and a non-0 value as its second result, then the given char must be the only character pending toward the next grapheme cluster (by the rules of Unicode grapheme clustering).

The char-grapheme-step procedure will produce a result for any fixnum state, but the meaning of a non-0 state is specified only in that providing such a state produced by char-grapheme-step in another call to char-grapheme-step continues detecting grapheme-cluster boundaries in the sequence.

See also string-grapheme-span and string-grapheme-count.

Examples:

> (char-grapheme-step #\a 0)
#f
1
> (let*-values ([(consumed? state) (char-grapheme-step #\a 0)]
                [(consumed? state) (char-grapheme-step #\b state)])
    (values consumed? state))
#t
1
> (let*-values ([(consumed? state) (char-grapheme-step #\return 0)]
                [(consumed? state) (char-grapheme-step #\newline state)])
    (values consumed? state))
#t
0
> (let*-values ([(consumed? state) (char-grapheme-step #\a 0)]
                [(consumed? state) (char-grapheme-step #\u300 state)])
    (values consumed? state))
#f
5

Added in version 8.6.0.2 of package base.
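
The state machine can be driven over a whole string to split it into grapheme clusters. The following is a minimal sketch based on the rules stated above, not a definitive implementation; string->grapheme-clusters is a hypothetical helper, and it assumes a Racket version that provides char-grapheme-step.

```racket
#lang racket/base

;; string->grapheme-clusters is a hypothetical helper (not part of
;; Racket). It returns the grapheme clusters of s as a list of strings.
(define (string->grapheme-clusters s)
  (define len (string-length s))
  (let loop ([i 0] [start 0] [state 0] [acc '()])
    (cond
      [(= i len)
       ;; A non-0 state at the end means characters are still pending,
       ;; so the end of the string closes one last cluster.
       (reverse (if (zero? state)
                    acc
                    (cons (substring s start len) acc)))]
      [else
       (let-values ([(terminated? new-state)
                     (char-grapheme-step (string-ref s i) state)])
         (cond
           [(not terminated?)
            (loop (add1 i) start new-state acc)]
           [(zero? new-state)
            ;; Nothing is pending, so the cluster includes position i.
            (loop (add1 i) (add1 i) new-state
                  (cons (substring s start (add1 i)) acc))]
           [else
            ;; The character at i is pending toward the next cluster,
            ;; so the finished cluster ends just before it.
            (loop (add1 i) i new-state
                  (cons (substring s start i) acc))]))])))

(string->grapheme-clusters "ab")                     ; => '("a" "b")
(length (string->grapheme-clusters "\r\n"))          ; => 1
(length (string->grapheme-clusters (string #\a #\u0300))) ; => 1
```

The three calls mirror the examples above: two separate letters form two clusters, while a return followed by a newline, or a letter followed by a combining accent, forms a single cluster.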
