MzScheme version 300 is different from previous versions of MzScheme
in several significant ways:

 * MzScheme's reader is case-sensitive for symbols/identifiers by
   default. Prefix an S-expression with #ci to make it
   case-insensitive.

 * MzScheme now directly supports Unicode. The "char" datatype
   corresponds to a Unicode scalar value, and strings correspond to a
   sequence of scalar values. Meanwhile, a new "byte string" datatype
   implements a sequence of bytes (exact integers between 0 and 255),
   and byte strings take over the old role of strings with respect to
   low-level port operations. Regexp matching works on both char
   strings and byte strings, and MzScheme provides various operations
   for encoding chars as byte strings. See the "Unicode" section below
   for more information.

 * Related to the Unicode change, MzScheme now uses a distinct "path"
   datatype for file and directory names, instead of using strings.
   Built-in procedures that accept a path also accept a string (and
   implicitly convert it); procedures that produce a path never
   produce a string. See the "Paths" section below for more details.

 * The new "foreign.ss" library in MzLib provides access to foreign
   libraries dynamically and directly in Scheme. See "PLT Foreign
   Interface Manual" for more information.

 * File-stream output ports (including file ports, the initial output
   port, and ports created by `subprocess') are now block-buffered by
   default, instead of line-buffered. The exception is when an output
   port corresponds to a terminal, in which case it is line-buffered
   by default. Also, the initial error port remains unbuffered. TCP
   output ports are block-buffered (instead of unbuffered) by default.

   The file-stream changes are especially likely to affect stdio-based
   communication among OS-level processes. For example, when
   communicating with an ispell subprocess, adding a newline at the
   end of a command previously would have been enough to send the
   command to ispell. Now, the output must be flushed explicitly
   (using `flush-output') or the buffer mode must be explicitly
   changed to by-line (using `file-stream-buffer-mode').

   The TCP changes affect most TCP-based communication. Explicitly
   flush output using `flush-output' or change the buffer mode using
   `file-stream-buffer-mode'.

 * The class system has changed slightly. The `rename' keyword has
   been changed to `rename-super', but the new `super' expression form
   eliminates the need for most `rename' declarations. Also, the class
   system supports methods that cannot be overridden entirely, but
   that can be augmented through "inner" methods (as in Beta). Some
   methods in MrEd's classes have been changed to augment-only
   methods. Finally, `class*/names' has been eliminated, and `this',
   `super-new', etc. are all exported by the "class.ss" module. See
   the "Classes" section below for more details.

 * The built-in exception hierarchy has been revised and streamlined
   (again). See the "Exceptions" section below for more details.

 * A continuation is no longer tied to its creating thread. Various
   continuation barriers remain in place, such as around the call to
   an exception handler or syntax expander, and also around the start
   of MzScheme's main thread. The main thread's barrier prevents
   continuations captured in the main thread from being used in other
   threads (which should make sense, intuitively, because then other
   threads could become "main"). A newly created thread, however, has
   no such barrier, so that created threads can trade continuations.

 * The "parameter" construct has been redefined. The revised
   `parameterize' is like the old one, except that:

   - The `parameterize' form accepts only parameter procedures
     created by `make-parameter', not arbitrary procedures that
     accept 0 or 1 arguments.

   - The body of a `parameterize' is in tail position with respect
     to the entire `parameterize' expression.

   - The given parameter procedures are not called on exit from a
     `parameterize' form, so the parameter guards (if any) are not
     called.

   - A `parameterize' expression tends to execute much more quickly,
     while parameter lookup can be slightly slower.

   - A `parameterize' has the expected effect if a continuation is
     captured during the `parameterize' body and invoked in a
     different thread.

   Preserved thread cells now provide precisely the semantics of old
   "parameters" (but without a form like `parameterize'). Meanwhile, a
   new "parameter" maps a continuation to a preserved thread cell,
   which in turn provides a thread-specific value.

 * The `break-enabled' procedure no longer corresponds to a parameter,
   because changing the break-enable state implies a check for a
   suspended break, and this check is incompatible with tail
   evaluation of `parameterize' forms.

   Related to this change, if a `with-handlers' handler is called to
   handle an exception, breaks are initially disabled for the handler,
   but the handler is not called in tail position with respect to the
   `with-handlers' form. (The body is in tail position, though.) Use
   `with-handlers*' to make a handler called in tail position, but
   without breaks disabled.

 * The `object-wait-multiple' function has been renamed to
   `sync/timeout', and `sync' is the same procedure without a timeout
   argument. The `object-wait-multiple/enable-break' procedure has
   been renamed to `sync/timeout/enable-break', and
   `sync/enable-break' enables breaks without a timeout.

   The "waitable" procedures have been renamed to "evt" procedures in
   general, often dropping "make-". "Evt" stands for "synchronizable
   event". Several new event-generating procedures have been added.

      Old                          New
      ---                          ---
      object-waitable?             evt?
      waitables->waitable-set      choice-evt
      make-channel-put-waitable    channel-put-evt
      make-semaphore-peek          semaphore-peek-evt
      make-wrapped-waitable        wrap-evt or handle-evt
      make-guard-waitable          guard-evt
      make-nack-guard-waitable     nack-guard-evt
      make-poll-guard-waitable     poll-guard-evt
      thread-dead-waitable         thread-dead-evt
      thread-suspend-waitable      thread-suspend-evt
      thread-resume-waitable       thread-resume-evt
      udp-receive-waitable         udp-receive-ready-evt
      udp-send-waitable            udp-send-ready-evt
                                   alarm-evt
                                   write-bytes-avail-evt
                                   udp-receive!-evt
                                   udp-send-to-evt
                                   udp-send-evt
                                   ...

 * The new `require-for-template' core form serves as a kind of dual
   to `require-for-syntax', and the new `define-for-syntax' and
   `begin-for-syntax' forms allow macro helper functions to be placed
   closer to macro definitions. See the MzScheme manual for more
   information.

 * Unexported module bindings are more secure because they can only
   appear in certified contexts, and they can be made completely
   secure by changing the current code inspector. Certification
   management is automatic for most macros, but certification requires
   changes to programs that transform the result of `expand' and feed
   the transformed program back to `eval'. See the MzScheme manual for
   more information.

======================================================================
 Unicode
======================================================================

The "char" datatype means "Unicode scalar value", which technically
should not be confused with "Unicode character". But most things that
a literate human would call a "character" can be represented by a
single scalar value in Unicode, so the "scalar value" approximation of
"character" works well for many purposes.

See section 1.2 in the MzScheme manual for an overview of MzScheme's
approach to Unicode and locales. In particular, `integer->char'
produces a character for every exact integer from 0 to #x10FFFF,
except #xD800 to #xDFFF (which are reserved for surrogates in some
encodings of Unicode).
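
The scalar-value range described above can be sketched in C as
follows. This is only an illustration of the rule, not MzScheme's
implementation; the names is_scalar_value() and utf8_encode() are
hypothetical and do not belong to any MzScheme API. The second
function shows how such a value would be written as a UTF-8 byte
sequence.

```c
/* Returns 1 if v is a Unicode scalar value: an integer from 0 to
   0x10FFFF, excluding the surrogate range 0xD800 to 0xDFFF.
   (Hypothetical helper, for illustration only.) */
static int is_scalar_value(unsigned long v)
{
  if (v > 0x10FFFFUL)
    return 0;
  if (v >= 0xD800UL && v <= 0xDFFFUL)
    return 0;
  return 1;
}

/* Encodes scalar value v as UTF-8 into out, which must have room for
   4 bytes. Returns the number of bytes written, or 0 if v is not a
   scalar value. (Hypothetical helper, for illustration only.) */
static int utf8_encode(unsigned long v, unsigned char *out)
{
  if (!is_scalar_value(v))
    return 0;
  if (v < 0x80UL) {
    out[0] = (unsigned char)v;
    return 1;
  }
  if (v < 0x800UL) {
    out[0] = (unsigned char)(0xC0 | (v >> 6));
    out[1] = (unsigned char)(0x80 | (v & 0x3F));
    return 2;
  }
  if (v < 0x10000UL) {
    out[0] = (unsigned char)(0xE0 | (v >> 12));
    out[1] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (v & 0x3F));
    return 3;
  }
  out[0] = (unsigned char)(0xF0 | (v >> 18));
  out[1] = (unsigned char)(0x80 | ((v >> 12) & 0x3F));
  out[2] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
  out[3] = (unsigned char)(0x80 | (v & 0x3F));
  return 4;
}
```

Under this sketch, #xD7FF and #xE000 are accepted while #xD800 and
#x110000 are rejected, which mirrors the domain stated for
`integer->char'.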

The `bytes->string/utf-8' and `string->bytes/utf-8' functions convert
between byte strings and character strings via UTF-8. The
`bytes->string/utf-8' procedure accepts an optional character to use
in place of bad encoding sequences (otherwise an exception is raised).

A general `bytes-convert' interface converts byte strings among
different encodings, including UTF-8 and the current locale's
encoding. The conversion interface can deal with input that ends
mid-encoding, so it can be used for conversion on streams, too. (The
converter uses iconv where available.)

Internally, strings are encoded as UCS-4, but symbols are encoded in
UTF-8.

Other details:

 * The `char->latin-1-integer' and `latin-1-integer->char' procedures
   have been removed.

 * A `bytes-...' operation has been added for nearly every
   `string-...' operation. The `byte?' predicate returns true for
   exact integers in [0,255].

 * `regexp' produces a char regexp, and `byte-regexp' produces a byte
   regexp. A char regexp can be matched against a byte string (or
   port), in which case the byte string (or port) is interpreted as a
   UTF-8 encoding. Similarly, a byte regexp can be matched against a
   string, in which case the string is encoded via UTF-8 before
   matching.

 * A hash before a string makes it a byte-string literal:

     (string->list "hi") = '(#\h #\i)
     (bytes->list #"hi") = '(104 105)

   Similarly, #rx"...." is a regexp, while #rx#"...." is a byte
   regexp.

 * Use #\uXXXX or #\UXXXXXX for arbitrary character constants, where
   each X is a hexadecimal digit and the resulting number identifies a
   scalar value. In a string (but not a byte string), use "\uXXXX" or
   "\UXXXXXX".

 * All of the `char-whitespace?', `char-alphabetic?', etc. functions
   are defined in accordance with SRFI-14. New functions include
   `char-title-case?', `char-blank?', `char-graphic?',
   `char-symbolic?', and `char-titlecase'.

 * The built-in string functions remain locale-independent (as in
   SRFI-13), and `string-locale=?', etc. provide locale-sensitive
   comparisons. The `string-locale-upcase' and
   `string-locale-downcase' functions provide locale-sensitive case
   conversion. No locale-sensitive character operations are provided
   (the old ones have been removed).

 * Case-insensitivity for symbols is consistent with SRFI-13, which
   means using the 1-1 character mapping defined by the Unicode
   consortium. Number parsing recognizes only ASCII digits (and
   A-F/a-f) for numbers, but all `char-whitespace?' characters are
   treated as whitespace by `read'.

 * MzScheme effectively assumes UTF-8 stdin and stdout, but library
   procedures like `reencode-input-port' can be used to accommodate
   other encodings, including the locale's encoding. DrScheme reads
   and writes files using UTF-8.

Ports
-----

"Port" still means "byte port" in MzScheme. Various port operations,
like `read-string-avail!', have been renamed to `read-bytes-avail!'.

Character operations on a port, such as `read-char' and `read-string',
are defined in terms of a UTF-8 parsing/writing of the port's byte
stream. (With a custom-port wrapper and the byte-string conversion
functions, other decodings can be implemented.)

Position and column counting for a port is sensitive to UTF-8. For
example, reading #o302 followed by #o251 increments the position and
column by 1, instead of 2.

======================================================================
 Paths
======================================================================

Under Unix, paths are fundamentally byte strings, not strings.
Typically, the correct printing of a path uses the current locale's
encoding, but there's no guarantee that the path is well-formed using
the current locale's encoding.

To mediate this view of paths, MzScheme now supplies a "path"
datatype, with operations `path->string', `string->path',
`path->bytes', and `bytes->path'. Use `path->string' to print a path
to the user, but use `path->bytes' to marshal a path (e.g., for saving
a pathname in a file).

All functions that consume a pathname accept a string and implicitly
convert it (via the user's locale's default encoding) to a byte-string
pathname.

Under Windows, where a pathname is an array of UTF-16 code units,
MzScheme internally converts to and from byte strings via
UTF-8<->UTF-16, but extended to support unpaired surrogates and other
code units that are invalid in an encoding. A byte string that is not
a UTF-8 encoding will never correspond to a pathname under Windows.

======================================================================
 Classes
======================================================================

Changes to the `(lib "class.ss")' object system are in three parts:

 - a syntactic clean-up to eliminate `class*/names',

 - a syntactic clean-up for super calls, and

 - new constructs for augment-only methods.

Meanwhile, keywords such as `public' are now bound to syntactic forms
that report out-of-context uses (much like `unquote' and
`unquote-splicing').

The Demise of `class*/names'
----------------------------

The `class*/names' form allowed the programmer to specify names to be
bound instead of `this', `super-new', etc. The `class*' and `class'
forms non-hygienically introduced those names. Macros that would
naturally expand to `class' or `class*' had to expand to
`class*/names', instead, because expanding to a non-hygienic macro
usually does not work.

In v300, `this', `super-new', etc. are exported by `(lib "class.ss")',
and attempting to use the keywords outside of a `class' or `class*'
form results in a syntax error. Meanwhile, macros can easily and
correctly expand to uses of `class' and `class*'.

Super Calls
-----------

A `rename' clause is no longer necessary in a typical class with
method overrides, due to the new `super' form. For example,

   (class splotch%
     (rename [super-paint paint])
     (define/override (paint x)
       (super-paint x)
       ....)
     (super-new))

can now be written

   (class splotch%
     (define/override (paint x)
       (super paint x)
       ....)
     (super-new))

An `override' declaration enables the corresponding (internal) method
name to be used with the `super' form. The `super' form is legal only
for expressions within a `class' (or `class*', etc.).

For cases where `super' cannot be used --- either because no
overriding method is declared in a class that calls a super method, or
because the super call is in a lexically nested class --- the
`rename-super' form can be used just like the old `rename' form.

The script plt/notes/mzscheme/rename-super-fixup.ss may be useful for
converting code that uses `rename' to use `super'.

Augment-Only Methods
--------------------

A `pubment' clause declares a method like `public', but the resulting
method cannot be overridden. Instead, the `pubment' method can use
`inner' to dispatch to an augmenting method declared in a subclass.
The word "pubment" is a contraction of "public, but merely augmentable
in subclasses".

The `inner' expression form includes an expression to evaluate when a
subclass does not provide an augmenting method.

A subclass augments a `pubment' method with `augment' instead of
`override'. The `augment' declaration itself is non-overridable, and
it can use `inner' to allow further augmentation in further
subclasses.

Example:

   (define img%
     (class object%
       ;; No subclass can avoid clearing the dc in `paint',
       ;; but a subclass can augment `paint' to draw afterward.
       ;; The result indicates the size of the drawn image,
       ;; which is 0 if the paint method is not augmented.
       (define/pubment (paint dc)
         (send dc clear)
         (inner 0 paint dc))
       (super-new)))

   (define box%
     (class img%
       ;; Add a square to the drawing, but allow subclasses
       ;; to draw first. Subclasses cannot skip the final
       ;; square-drawing step. Note that the result of the
       ;; method is the result of the `inner' call, which is 20
       ;; if the paint method is not augmented.
       (define/augment (paint dc)
         (begin0
           (inner 20 paint dc)
           (send dc draw-rectangle 0 0 20 20)))
       (super-new)))

   (define frbox%
     (class box%
       ;; Add a larger red square as a background.
       (define/augment (paint dc)
         (send dc set-color (make-object color% "red"))
         (send dc draw-rectangle -1 -1 22 22)
         (send dc set-color (make-object color% "black"))
         (inner 22 paint dc))
       (super-new)))

   (send (new img%) paint dc)   ; => 0
                                ; and clears the dc
   (send (new box%) paint dc)   ; => 20
                                ; and clears the dc,
                                ; then draws a black rectangle
   (send (new frbox%) paint dc) ; => 22
                                ; and clears the dc,
                                ; then draws a big red rectangle
                                ; then draws a black rectangle

An augmentation itself can be made overridable using `augride', which
is a contraction of "augment, but allow the augment to be overridden".
Similarly, `overment' overrides a method, but allows subclasses only
to augment this overriding.

   (define dot%
     (class img%
       ;; This augmentation of img% can be replaced in
       ;; subclasses.
       (define/augride (paint dc)
         (send dc draw-ellipse 0 0 20 20)
         20)
       (super-new)))

   (define emptydot%
     (class dot%
       ;; Draw nothing, but still claim to have
       ;; drawn something of size 20. The dc is still
       ;; cleared in `paint' from img%; the override
       ;; replaces only `paint' in dot%.
       (define/override (paint dc)
         20)
       (super-new)))

   (define frdot%
     (class dot%
       ;; This method re-uses the `paint' augmentation in
       ;; dot%, and allows further augmentation in subclasses
       ;; (which cannot skip the painting here).
       (define/overment (paint dc)
         (send dc set-color (make-object color% "red"))
         (send dc draw-ellipse -1 -1 22 22)
         (send dc set-color (make-object color% "black"))
         (super paint dc)
         (inner 22 paint dc))
       (super-new)))

Note that `pubment', `augment', or `overment' without an `inner' call
is effectively the same as `public-final', `augment-final', or
`override-final'.
However, the `-final' variants report a class error if a subclass
attempts to augment the method, whereas the non-`-final' variants
allow subclasses to include an augmentation (that is always ignored).

In general:

                     Can use `inner'?   Can use `super'?
   public                   N                  N
   pubment                  Y                  N
   override                 N                  Y
   augment                  Y                  N
   overment                 Y                  Y
   augride                  N                  N
   public-final             N                  N
   override-final           N                  Y
   augment-final            N                  N

The `rename-inner' form is similar to `rename-super'. Like
`rename-super', it is rarely useful compared to `inner'. A use of a
binding introduced by `rename-inner' must include a `lambda' pattern
after the identifier to provide the default expression (i.e., the
expression to evaluate if no subclass augments the method); see the
documentation for further information.

Keywords
--------

The various keywords for class clauses are now all defined as syntax
and exported by `(lib "class.ss")'. Use of a keyword in an expression
position produces a syntax error.

A complete list of keywords:

   private public override augment
   pubment overment augride
   public-final override-final augment-final
   field init init-field
   rename-super rename-inner inherit
   super inner

======================================================================
 Exceptions
======================================================================

The new exception hierarchy distinguishes between breaks and failures
at nearly the top level of the hierarchy. In particular, most
`with-handlers' expressions should use the `exn:fail?' predicate,
instead of the old (and now removed) `not-break-exn?' predicate.

The "type" and "mismatch" exceptions have been merged into
`exn:fail:contract'. Similarly, `exn:i/o:tcp' and `exn:i/o:udp' have
been merged into `exn:fail:network', `exn:i/o:filesystem' has moved to
`exn:fail:filesystem', and other `exn:i/o' exceptions have been
simplified to just `exn:fail'. The `exn:read' and `exn:syntax'
hierarchies moved to `exn:fail:read' and `exn:fail:syntax'.
Most other exceptions merged with `exn:fail:contract' or simply
`exn:fail'.

Many exception fields have been eliminated, but certain exceptions
contain multiple source locations instead of just one. Instead of a
single type for all exceptions with source locations, the
`exn:srclocs' property identifies exceptions with source-location
information.

Field guards are triggered when an exception record is created, and
they check the "type" of the field arguments. Mutators are not
exported for exception fields.

 Structs:

   exn - message continuation-marks
     exn:fail
       exn:fail:contract
         exn:fail:contract:arity
         exn:fail:contract:divide-by-zero
         exn:fail:contract:continuation
         exn:fail:contract:variable - id
       exn:fail:syntax - exprs
       exn:fail:read - sources
         exn:fail:read:eof
         exn:fail:read:non-char
       exn:fail:filesystem
         exn:fail:filesystem:exists
         exn:fail:filesystem:version
       exn:fail:network
       exn:fail:out-of-memory
       exn:fail:unsupported
     exn:break - continuation

 Properties:

   exn:srclocs - accessor

======================================================================
 Inside MzScheme (extend MzScheme via C)
======================================================================

[See "PLT Foreign Interface Manual" for a new alternative to extending
 MzScheme with C code.]

A structure that represents a Scheme type should now start with a
Scheme_Object, instead of Scheme_Type. A Scheme_Object contains only a
Scheme_Type (except in 3m mode), so it takes the same amount of space
as before. But using Scheme_Object instead of Scheme_Type ensures that
casts to and from Scheme_Object* do not run afoul of C99's aliasing
assumptions.

SCHEME_STRINGP(), etc. have been replaced by SCHEME_CHAR_STRINGP(),
etc. and SCHEME_BYTE_STRINGP(), etc. A character is represented by the
`mzchar' type, which corresponds to an unsigned integer (4 bytes).

Use the functions scheme_char_string_to_byte_string() and
scheme_byte_string_to_char_string() to convert between string types
via UTF-8. Several UTF-8/UTF-16 <-> mzchar conversion functions are
also provided.

In addition to functions scheme_char_string...() which operate on
`mzchar' arrays, some functions scheme_utf8_string...() are provided,
which accept a `char' array and interpret it as a UTF-8 encoding.

SCHEME_PATHP() recognizes the new path type. Use SCHEME_STRING_PATHP()
to recognize either a string or path, and use scheme_string_to_path()
to convert a string to a path.

The error_buf field of Scheme_Thread is now a pointer to a mz_jmp_buf,
instead of an inlined mz_jmp_buf. The protocol for temporarily
catching an exception is now as follows:

   mz_jmp_buf *save, fresh;

   save = scheme_current_thread->error_buf;
   scheme_current_thread->error_buf = &fresh;
   if (scheme_setjmp(scheme_error_buf)) {
     /* There was an error or continuation invocation */
     if (scheme_jumping_to_continuation) {
       /* It was a continuation jump */
       scheme_longjmp(*save, 1);
       /* To block the jump, instead: scheme_clear_escape(); */
     } else {
       /* It was a primitive error escape */
     }
   } else {
     /* Whatever might escape. */
     ....
   }
   scheme_current_thread->error_buf = save;

The input and output port driver interfaces have changed to
accommodate progress events and commits (for input ports) and write
events (for output ports). For most port types, the new features can
be implemented automatically by MzScheme with a small amount of extra
work in the driver.
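
The save/install/restore shape of the error_buf protocol earlier in
this section can be tried outside MzScheme with plain setjmp() and
longjmp(). The following is a minimal sketch under stated assumptions,
not MzScheme code: current_error_buf stands in for
scheme_current_thread->error_buf, and might_escape() and
run_protected() are hypothetical names introduced only for this
illustration. It does not model continuation jumps.

```c
#include <setjmp.h>

/* Hypothetical stand-in for the error_buf field of Scheme_Thread: a
   pointer to the jump buffer that an escape should target. */
static jmp_buf *current_error_buf;

/* Hypothetical operation that escapes through the currently
   installed buffer when `fail' is non-zero. */
static void might_escape(int fail)
{
  if (fail)
    longjmp(*current_error_buf, 1);
}

/* Installs a fresh buffer, runs might_escape(), and restores the
   saved buffer on both the normal and the escaping path. Returns 1
   if an escape was caught, 0 otherwise. */
static int run_protected(int fail)
{
  jmp_buf *save, fresh;
  int escaped;

  save = current_error_buf;      /* remember the enclosing buffer */
  current_error_buf = &fresh;    /* install a fresh one */
  if (setjmp(*current_error_buf)) {
    /* Reached via longjmp(): the operation escaped. */
    escaped = 1;
  } else {
    /* Whatever might escape. */
    might_escape(fail);
    escaped = 0;
  }
  current_error_buf = save;      /* always restore the saved buffer */
  return escaped;
}
```

Because the field is a pointer, saving and restoring it is a pointer
assignment rather than a copy of the whole jump buffer, which is the
point of the change described above.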