15.1.4 Windows Paths
In general, a Windows pathname consists of an optional drive specifier and a drive-specific path. A Windows path can be absolute but still relative to the current drive; such paths start with a / or \ separator and are not UNC paths or paths that start with \\?\.
A path that starts with a drive specification is complete. Roughly, a drive specification is either a Latin letter followed by a colon, a UNC path of the form \\‹machine›\‹volume›, or a \\?\ form followed by something other than REL\‹element›, or RED\‹element›. (Variants of \\?\ paths are described further below.)
Racket fails to implement the usual Windows path syntax in one way. Outside of Racket, a pathname "C:rant.txt" can be a drive-specific relative path. That is, it names a file "rant.txt" on drive "C:", but the complete path to the file is determined by the current working directory for drive "C:". Racket does not support drive-specific working directories (only a working directory across all drives, as reflected by the current-directory parameter). Consequently, Racket implicitly converts a path like "C:rant.txt" into "C:\rant.txt".
Racket-specific: Whenever a path starts with a drive specifier ‹letter›: that is not followed by a / or \, a \ is inserted as the path is cleansed.
Otherwise, Racket follows standard Windows path conventions, but also adds \\?\REL and \\?\RED conventions to deal with paths inexpressible in the standard convention, plus conventions to deal with excessive \s in \\?\ paths.
In the following, ‹letter› stands for a Latin letter (case does not matter), ‹machine› stands for any sequence of characters that does not include \ or / and is not ?, ‹volume› stands for any sequence of characters that does not include \ or / , and ‹element› stands for any sequence of characters that does not include \.
Trailing spaces and . in a path element are ignored when the element is the last one in the path, unless the path starts with \\?\ or the element consists of only spaces and .s.
The following special “files”, which access devices, exist in all directories, case-insensitively, and with all possible endings after a period or colon, except in pathnames that start with \\?\: "NUL", "CON", "PRN", "AUX", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9".
Except for \\?\ paths, /s are equivalent to \s. Except for \\?\ paths and the start of UNC paths, multiple adjacent /s and \s count as a single \. In a path that starts \\?\ paths, elements can be separated by either a single or double \.
A directory can be accessed with or without a trailing separator. In the case of a non-\\?\ path, the trailing separator can be any number of /s and \s; in the case of a \\?\ path, a trailing separator must be a single \, except that two \s can follow \\?\‹letter›:.
Except for \\?\ paths, a single . as a path element means “the current directory,” and a .. as a path element means “the parent directory.” Up-directory path elements (i.e., ..) immediately after a drive are ignored.
A pathname that starts \\‹machine›\‹volume› (where a / can replace any \) is a UNC path, and the starting \\‹machine›\‹volume› counts as the drive specifier.
Normally, a path element cannot contain a character in the range #\x 0 to #\x 1F nor any of the following characters:
< > : " / \ |
Except for \, path elements containing these characters can be accessed using a \\?\ path (assuming that the underlying filesystem allows the characters).
In a pathname that starts \\?\‹letter›:\, the \\?\‹letter›:\ prefix counts as the path’s drive, as long as the path does not both contain non-drive elements and end with two consecutive \s, and as long as the path contains no sequence of three or more \s. Two \s can appear in place of the \ before ‹letter›. A / cannot be used in place of a \ (but /s can be used in element names, though the result typically does not name an actual directory or file).
In a pathname that starts \\?\UNC\‹machine›\‹volume›, the \\?\UNC\‹machine›\‹volume› prefix counts as the path’s drive, as long as the path does not end with two consecutive \s, and as long as the path contains no sequence of three or more \s. Two \s can appear in place of the \ before UNC, the \s after UNC, and/or the \s after‹machine›. The letters in the UNC part can be uppercase or lowercase, and / cannot be used in place of \s (but / can be used in element names).
Racket-specific: A pathname that starts \\?\REL\‹element› or \\?\REL\\‹element› is a relative path, as long as the path does not end with two consecutive \s, and as long as the path contains no sequence of three or more \s. This Racket-specific path form supports relative paths with elements that are not normally expressible in Windows paths (e.g., a final element that ends in a space). The REL part must be exactly the three uppercase letters, and /s cannot be used in place of \s. If the path starts \\?\REL\.. then for as long as the path continues with repetitions of \.., each element counts as an up-directory element; a single \ must be used to separate the up-directory elements. As soon as a second \ is used to separate the elements, or as soon as a non-.. element is encountered, the remaining elements are all literals (never up-directory elements). When a \\?\REL path value is converted to a string (or when the path value is written or displayed), the string does not contain the starting \\?\REL or the immediately following \s; converting a path value to a byte string preserves the \\?\REL prefix.
Racket-specific: A pathname that starts \\?\RED\‹element› or \\?\RED\\‹element› is a drive-relative path, as long as the path does not end with two consecutive \s, and as long as the path contains no sequence of three or more \s. This Racket-specific path form supports drive-relative paths (i.e., absolute given a drive) with elements that are not normally expressible in Windows paths. The RED part must be exactly the three uppercase letters, and /s cannot be used in place of \s. Unlike \\?\REL paths, a .. element is always a literal path element. When a \\?\RED path value is converted to a string (or when the path value is written or displayed), the string does not contain the starting \\?\RED and it contains a single starting \; converting a path value to a byte string preserves the \\?\RED prefix.
Three additional Racket-specific rules provide meanings to character sequences that are otherwise ill-formed as Windows paths:
Racket-specific: In a pathname of the form \\?\‹any›\\ where ‹any› is any non-empty sequence of characters other than ‹letter›: or \‹letter›:, the entire path counts as the path’s (non-existent) drive.
Racket-specific: In a pathname of the form \\?\‹any›\\\‹elements›, where ‹any› is any non-empty sequence of characters and ‹elements› is any sequence that does not start with a \, does not end with two \s, and does not contain a sequence of three \s, then \\?\‹any›\\ counts as the path’s (non-existent) drive.
Racket-specific: In a pathname that starts \\?\ and does not match any of the patterns from the preceding bullets, \\?\ counts as the path’s (non-existent) drive.
Outside of Racket, except for \\?\ paths, pathnames are typically limited to 259 characters when used as a file path and 247 characters when used as a directory path. Racket internally converts pathnames longer than 247 characters to \\?\ form to avoid the limits; in that case, the path is first simplified syntactically (in the sense of simplify-path). The operating system cannot access files through \\?\ paths that are longer than 32,000 characters or so.
Where the above descriptions says “character,” substitute “byte” for interpreting byte strings as paths. The encoding of Windows paths into bytes preserves ASCII characters, and all special characters mentioned above are ASCII, so all of the rules are the same.
Beware that the \ path separator is an escape character in Racket strings. Thus, the path \\?\REL\..\\.. as a string must be written "\\\\?\\REL\\..\\\\..".
A path that ends with a directory separator syntactically refers to a directory. In addition, a path syntactically refers to a directory if its last element is a same-directory or up-directory indicator (not quoted by a \\?\ form), or if it refers to a root.
Even on variants of Windows that support symbolic links, up-directory .. indicators in a path are resolved syntactically, not sensitive to links. For example, if a path ends with d\..\f and d refers to a symbolic link that references a directory with a different parent than d, the path nevertheless refers to f in the same directory as d. A relative-path link is parsed as if prefixed with \\?\REL paths, except that .. and . elements are allowed throughout the path, and any number of redundant \ separators are allowed.
Windows paths are cleansed as follows: In paths that start \\?\, redundant \s are removed, an extra \ is added in a \\?\REL if an extra one is not already present to separate up-directory indicators from literal path elements, and an extra \ is similarly added after \\?\RED if an extra one is not already present. For other paths, multiple /s and \s are converted to single /s or \ (except at the beginning of a shared folder name), and a \ is inserted after the colon in a drive specification if it is missing.
For (bytes->path-element bstr), /s, colons, trailing dots, trailing whitespace, and special device names (e.g., “aux”) in bstr are encoded as a literal part of the path element by using a \\?\REL prefix. The bstr argument must not contain a \, otherwise the exn:fail:contract exception is raised.
For (path-element->bytes path) or (path-element->string path), if the byte-string form of path starts with a \\?\REL, the prefix is not included in the result.
For (build-path base-path sub-path ...), trailing spaces and periods are removed from the last element of base-path and all but the last sub-path (unless the element consists of only spaces and periods), except for those that start with \\?\. If base-path starts \\?\, then after each non-\\?\REL\ and non-\\?\RED\ sub-path is added, all /s in the addition are converted to \s, multiple consecutive \s are converted to a single \, added . elements are removed, and added .. elements are removed along with the preceding element; these conversions are not performed on the original base-path part of the result or on any \\?\REL\ or \\?\RED\ or sub-path. If a \\?\REL\ or \\?\RED\ sub-path is added to a non-\\?\ base-path, the base-path (with any additions up to the \\?\REL\ or \\?\RED\ sub-path) is simplified and converted to a \\?\ path. In other cases, a \ may be added or removed before combining paths to avoid changing the root meaning of the path (e.g., combining //x and y produces /x/y, because //x/y would be a UNC path instead of a drive-relative path).
For (simplify-path path use-filesystem?), path is expanded, and if path does not start with \\?\, trailing spaces and periods are removed, a / is inserted after the colon in a drive specification if it is missing, and a \ is inserted after \\?\ as a root if there are elements and no extra \ already. Otherwise, if no indicators or redundant separators are in path, then path is returned.
For (split-path path) producing base, name, and must-be-dir?, splitting a path that does not start with \\?\ can produce parts that start with \\?\. For example, splitting C:/x /aux/ twice produces \\?\REL\\x and \\?\REL\\aux; the \\?\ is needed in these cases to preserve a trailing space after x and to avoid referring to the AUX device instead of an "aux" file.
15.1.4.1 Windows Path Representation
A path on Windows is natively a sequence of UTF-16 code units, where the sequence can include unpaired surrogates. This sequence is encoded as a byte string through an extension of UTF-8, where unpaired surrogates in the UTF-16 code-unit sequence are converted as if they were non-surrogate values. The extended encodings are implemented on Windows as the "platform-UTF-16" and "platform-UTF-8" encodings for bytes-open-converter.
Racket’s internal representation of a Windows path is a byte string, so that path->bytes and bytes->path are always inverses. When converting a path to a native UTF-16 code-unit sequence, #\tab is used in place of platform-UTF-8 decoding errors (on the grounds that tab is normally disallowed as a character in a Windows path, unlike #\uFFFD).
A Windows path is converted to a string by treating the platform-UTF-8 encoding as a UTF-8 encoding with #\uFFFD in place of decoding errors. Similarly, a string is converted to a path by UTF-8 encoding (in which case no errors are possible).