29: Localization

by Scott G. Miller

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-29@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This document specifies an interface to retrieving and displaying locale sensitive messages. A Scheme program can register one or more translations of templated messages, and then write Scheme code that can transparently retrieve the appropriate message for the locale under which the Scheme system is running.

Rationale

As any programmer that has ever had to deal with making his or her code readable in more than one locale, the process of sufficiently abstracting program messages from their presentation to the user is non-trivial without help from the programming language. Most modern programming language libraries do provide some mechanism for performing this separation.

A portable API that allows a piece of code to run without modification in different countries and under different languages is a must for any non-trivial software project.  The interface should separate the logic of a program from the myriad of translations that may be necessary.

The interface described in this document provides such functionality. The underlying implementation is also allowed to use whatever datastructures it likes to provide access to the translations in the most efficient manner possible.  In addition, the implementation is provided with standardized functions that programs will use for accessing an external, unspecified repository of translations.

This interface does not cover all aspects of localization, including support for non-latin characters, number and date formatting, etc. Such functionality is the scope of a future SRFI that may extend this one.

Dependencies

A SRFI-29 conformant implementation must also implement SRFI-28, Basic Format Strings. Message templates are strings that must be processed by the format function specified in that SRFI.

Specification

Message Bundles

A Message Bundle is a set of message templates and their identifying keys. Each bundle contains one or more such key/value pairs. The bundle itself is associated with a bundle specifier which uniquely identifies the bundle.

Bundle Specifiers

A Bundle Specifier is a Scheme list that describes, in order of importance, the package and locale of a message bundle.  In most cases, a locale specifier will have between one and three elements. The first element is a symbol denoting the package for which this bundle applies. The second and third elements denote a locale. The second element (first element of the locale) if present, is the two letter, ISO 639-1 language code for the bundle. The third element, if present, is a two letter ISO 3166-1 country code.  In some cases, a fourth element may be present, specifying the encoding used for the bundle.  All bundle specifier elements are Scheme symbols.

If only one translation is provided, it should be designated only by a package name, for example (mathlib). This translation is called the default translation.

Bundle Searching

When a message template is retrieved from a bundle, the Scheme implementation will provide the locale under which the system is currently running. When the template is retrieved, the package name will be specified. The Scheme system should construct a Bundle Specifier from the provided package name and the active locale. For example, when retrieving a message template for French Canadian, in the mathlib package, the bundle specifier '(mathlib fr ca)' is used. A program may also retrieve the elements of the current locale using the no-argument procedures:

current-language -> symbol
current-language symbol -> undefined

When given no arguments, returns the current ISO 639-1 language code as a symbol.  If provided with an argument, the current language is set to that named by the symbol for the currently executing Scheme thread (or for the entire Scheme system if such a distinction is not possible).  

current-country -> symbol
current-country symbol -> undefined

returns the current ISO 3166-1 country code as a symbol.  If provided with an argument, the current country is set to that named by the symbol for the currently executing Scheme thread (or for the entire Scheme system if such a distinction is not possible).   

current-locale-details -> list of symbols
current-locale-details list-of-symbols -> undefined

Returns a list of additional locale details as a list of symbols.  This list may contain information about encodings or other more specific information.  If provided with an argument, the current locale details are set to those given in the currently executing Scheme thread (or for the entire Scheme system if such a distinction is not possible). 

The Scheme System should first check for a bundle with the exact name provided. If no such bundle is found, the last element from the list is removed and a search is tried for a bundle with that name. If no bundle is then found, the list is shortened by removing the last element again. If no message is found and the bundle specifier is now the empty list, an error should be raised.

The reason for this search order is to provide the most locale sensitive template possible, but to fall back on more general templates if a translation has not yet been provided for the given locale.

Message Templates

A message template is a localized message that may or may not contain one of a number of formatting codes. A message template is a Scheme string. The string is of a form that can be processed by the format procedure found in many Scheme systems and formally specified in SRFI-28 (Basic Format Strings).

This SRFI also extends SRFI-28 to provide an additional format escape code:

~[n]@* - Causes a value-requiring escape code that follows this code immediately to reference the [N]'th optional value absolutely, rather than the next unconsumed value. The referenced value is not consumed.

This extension allows optional values to be positionally referenced, so that message templates can be constructed that can produce the proper word ordering for a language.

Preparing Bundles

Before a bundle may be used by the Scheme system to retrieve localized template messages, they must be made available to the Scheme system.  This SRFI specifies a way to portably define the bundles, as well as store them in and retrieve them from an unspecified system which may be provided by resources outside the Scheme system.

declare-bundle! bundle-specifier association-list -> undefined

Declares a new bundle named by the given bundle-specifier.  The contents of the bundle are defined by the provided association list.  The list contains associations between Scheme symbols and the message templates (Scheme strings) they name.  If a bundle already exists with the given name, it is overwritten with the newly declared bundle.
store-bundle bundle-specifier -> boolean
Attempts to store a bundle named by the given bundle specifier, and previously made available using declare-bundle! or load-bundle!, in an unspecified mechanism that may be persistent across Scheme system restarts.  If successful, a non-false value is returned.  If unsuccessful, #f is returned.
load-bundle! bundle-specifier -> boolean
Attempts to retrieve a bundle from an unspecified mechanism which stores bundles outside the Scheme system.  If the bundle was retrieved successfully, the function returns a non-false value, and the bundle is immediately available to the Scheme system. If the bundle could not be found or loaded successfully, the function returns #f, and the Scheme system's bundle registry remains unaffected.

A compliant Scheme system may choose not to provide any external mechanism to store localized bundles.  If it does not, it must still provide implementations for store-bundle and load-bundle!. In such a case, both functions must return #f regardless of the arguments given. Users of this SRFI should recognize that the inability to load or store a localized bundle in an external repository is not a fatal error.

Retrieving Localized Message Templates

localized-template package-name message-template-name -> string or #f

Retrieves a localized message template for the given package name and the given message template name (both symbols).  If no such message could be found, false (#f) is returned.

After retrieving a template, the calling program can use format to produce a string that can be displayed to the user.

Examples

The below example makes use of SRFI-29 to display simple, localized messages.  It also defines its bundles in such a way that the Scheme system may store and retrieve the bundles from a more efficient system catalog, if available.

(let ((translations
       '(((en) . ((time . "Its ~a, ~a.")
                (goodbye . "Goodbye, ~a.")))
         ((fr) . ((time . "~1@*~a, c'est ~a.")
                (goodbye . "Au revoir, ~a."))))))
  (for-each (lambda (translation)
              (let ((bundle-name (cons 'hello-program (car translation))))
                (if (not (load-bundle! bundle-name))
                    (begin
                     (declare-bundle! bundle-name (cdr translation))
                     (store-bundle! bundle-name)))))
             translations))

(define localized-message
  (lambda (message-name . args)
    (apply format (cons (localized-template 'hello-program
                                            message-name)
                        args))))

(let ((myname "Fred"))
  (display (localized-message 'time "12:00" myname))
  (display #\newline)

  (display (localized-message 'goodbye myname))
  (display #\newline))

;; Displays (English):
;; Its 12:00, Fred.
;; Goodbye, Fred.
;;
;; French:
;; Fred, c'est 12:00.
;; Au revoir, Fred.

Implementation

The implementation requires that the Scheme system provide a definition for current-language and current-country capable of distinguishing the correct locale present during a Scheme session. The definitions of those functions in the reference implementation are not capable of that distinction. Their implementation is provided only so that the following code can run in any R4RS scheme system.  

In addition, the below implementation of a compliant format requires SRFI-6 (Basic String Ports) and SRFI-23 (Error reporting)

;; The association list in which bundles will be stored
(define *localization-bundles* '())

;; The current-language and current-country functions provided
;; here must be rewritten for each Scheme system to default to the
;; actual locale of the session
(define current-language
  (let ((current-language-value 'en))
    (lambda args
      (if (null? args)
          current-language-value
          (set! current-language-value (car args))))))

(define current-country
  (let ((current-country-value 'us))
    (lambda args
      (if (null? args)
          current-country-value
          (set! current-country-value (car args))))))

;; The load-bundle! and store-bundle! both return #f in this
;; reference implementation.  A compliant implementation need
;; not rewrite these procedures.
(define load-bundle!
  (lambda (bundle-specifier)
    #f))

(define store-bundle!
  (lambda (bundle-specifier)
    #f))

;; Declare a bundle of templates with a given bundle specifier
(define declare-bundle!
  (letrec ((remove-old-bundle
            (lambda (specifier bundle)
              (cond ((null? bundle) '())
                    ((equal? (caar bundle) specifier)
                     (cdr bundle))
                    (else (cons (car bundle)
                                (remove-old-bundle specifier
                                                   (cdr bundle))))))))
    (lambda (bundle-specifier bundle-assoc-list)
      (set! *localization-bundles*
            (cons (cons bundle-specifier bundle-assoc-list)
                  (remove-old-bundle bundle-specifier
                                     *localization-bundles*))))))

;;Retrieve a localized template given its package name and a template name
(define localized-template
  (letrec ((rdc
            (lambda (ls)
              (if (null? (cdr ls))
                  '()
                  (cons (car ls) (rdc (cdr ls))))))
           (find-bundle
            (lambda (specifier template-name)
              (cond ((assoc specifier *localization-bundles*) =>
                     (lambda (bundle) bundle))
                    ((null? specifier) #f)
                    (else (find-bundle (rdc specifier)
                                       template-name))))))
    (lambda (package-name template-name)
      (let loop ((specifier (cons package-name
                                  (list (current-language)
                                        (current-country)))))
        (and (not (null? specifier))
             (let ((bundle (find-bundle specifier template-name)))
               (and bundle
                    (cond ((assq template-name bundle) => cdr)
                          ((null? (cdr specifier)) #f)
                          (else (loop (rdc specifier)))))))))))

;;A SRFI-28 and SRFI-29 compliant version of format.  It requires
;;SRFI-23 for error reporting.
(define format
  (lambda (format-string . objects)
    (let ((buffer (open-output-string)))
      (let loop ((format-list (string->list format-string))
                 (objects objects)
                 (object-override #f))
        (cond ((null? format-list) (get-output-string buffer))
              ((char=? (car format-list) #\~)
               (cond ((null? (cdr format-list))
                      (error 'format "Incomplete escape sequence"))
                     ((char-numeric? (cadr format-list))
                      (let posloop ((fl (cddr format-list))
                                    (pos (string->number
                                          (string (cadr format-list)))))
                        (cond ((null? fl)
                               (error 'format "Incomplete escape sequence"))
                              ((and (eq? (car fl) '#\@)
                                    (null? (cdr fl)))
                                    (error 'format "Incomplete escape sequence"))
                              ((and (eq? (car fl) '#\@)
                                    (eq? (cadr fl) '#\*))
                               (loop (cddr fl) objects (list-ref objects pos)))
                              (else
                                (posloop (cdr fl)
                                         (+ (* 10 pos)
                                            (string->number
                                             (string (car fl)))))))))
                     (else
                       (case (cadr format-list)
                         ((#\a)
                          (cond (object-override
                                 (begin
                                   (display object-override buffer)
                                   (loop (cddr format-list) objects #f)))
                                ((null? objects)
                                 (error 'format "No value for escape sequence"))
                                (else
                                  (begin
                                    (display (car objects) buffer)
                                    (loop (cddr format-list)
                                          (cdr objects) #f)))))
                         ((#\s)
                          (cond (object-override
                                 (begin
                                   (display object-override buffer)
                                   (loop (cddr format-list) objects #f)))
                                ((null? objects)
                                 (error 'format "No value for escape sequence"))
                                (else
                                  (begin
                                    (write (car objects) buffer)
                                    (loop (cddr format-list)
                                          (cdr objects) #f)))))
                         ((#\%)
                          (if object-override
                              (error 'format "Escape sequence following positional override does not require a value"))
                          (display #\newline buffer)
                          (loop (cddr format-list) objects #f))
                        ((#\~)
                          (if object-override
                              (error 'format "Escape sequence following positional override does not require a value"))
                          (display #\~ buffer)
                          (loop (cddr format-list) objects #f))
                         (else
                           (error 'format "Unrecognized escape sequence"))))))
              (else (display (car format-list) buffer)
                    (loop (cdr format-list) objects #f)))))))

© 2002 Scott G. Miller.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: David Rush
Author: Scott G. Miller