Package COM.INFORMATIMAGO.COMMON-LISP.CESARUM.CHARACTER-SETS

This package exports functions to manage character sets, character
encodings, coding systems and external formats.  It's all the same, but
everyone likes to have his own terms...

The base character set repertoire will be the IANA one, published at:
http://www.iana.org/assignments/character-sets


The cs-lisp-encoding and cs-emacs-encoding of the character sets are
hooked in by the implementation-specific initialization code in the
COM.INFORMATIMAGO.CLEXT.CHARACTER-SET package.


See also: COM.INFORMATIMAGO.CLEXT.CHARACTER-SET


License:

    AGPL3

    Copyright Pascal J. Bourguignon 2005 - 2012

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Affero General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Affero General Public License for more details.

    You should have received a copy of the GNU Affero General Public License
    along with this program.
    If not, see <http://www.gnu.org/licenses/>

*character-sets*

variable

The list of Character Sets.

Initial value:

    (#S(CHARACTER-SET :MIB-ENUM 106 :NAME UTF-8 :ALIASES NIL :SOURCE RFC 3629 :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 4 :NAME ISO_8859-1:1987 :ALIASES (csISOLatin1 CP819 IBM819 l1 latin1 ISO-8859-1 ISO_8859-1 iso-ir-100) :SOURCE ECMA registry :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 2025 :NAME GB2312 :ALIASES (csGB2312) :SOURCE Chinese for People's Republic of China (PRC) mixed one byte, two byte set: 20-7E = one byte ASCII A1-FE = two byte PRC Kanji See GB 2312-80 PCL Symbol Set Id: 18C :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 18 :NAME Extended_UNIX_Code_Packed_Format_for_Japanese :ALIASES (EUC-JP csEUCPkdFmtJapanese) :SOURCE Standardized by OSF, UNIX International, and UNIX Systems Laboratories Pacific. Uses ISO 2022 rules to select code set 0: US-ASCII (a single 7-bit byte set) code set 1: JIS X0208-1990 (a double 8-bit byte set) restricted to A0-FF in both bytes code set 2: Half Width Katakana (a single 7-bit byte set) requiring SS2 as the character prefix code set 3: JIS X0212-1990 (a double 7-bit byte set) restricted to A0-FF in both bytes requiring SS3 as the character prefix :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 2024 :NAME Windows-31J :ALIASES (csWindows31J) :SOURCE Windows Japanese. A further extension of Shift_JIS to include NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). The CCS's are JIS X0201:1997, JIS X0208:1997, and these extensions. This charset can be used for the top-level media type "text", but it is of limited or specialized use (see RFC2278). PCL Symbol Set id: 19K :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 1019 :NAME UTF-32LE :ALIASES NIL :SOURCE <http://www.unicode.org/unicode/reports/tr19/> :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 1018 :NAME UTF-32BE :ALIASES NIL :SOURCE <http://www.unicode.org/unicode/reports/tr19/> :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 1017 :NAME UTF-32 :ALIASES NIL :SOURCE <http://www.unicode.org/unicode/reports/tr19/> :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 1014 :NAME UTF-16LE :ALIASES NIL :SOURCE RFC 2781 :COMMENTS ...)
     #S(CHARACTER-SET :MIB-ENUM 1013 :NAME UTF-16BE :ALIASES NIL :SOURCE RFC 2781 :COMMENTS ...)
     ...)

(character-in-character-set-p character character-set)

function

RETURN: Whether the CHARACTER belongs to the CHARACTER-SET.
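
A membership test of this kind can be pictured as a lookup in a set of
code ranges.  The sketch below is only an illustration, not the package's
implementation: the helper names CODE-IN-RANGES-P and
CHARACTER-IN-RANGES-P are hypothetical, the ranges are assumed to be a
list of inclusive (LOW . HIGH) pairs, and CHAR-CODE is assumed to return
the Unicode code point, which holds in Unicode-aware implementations but
is not guaranteed by the standard.

```lisp
;; Hypothetical helpers, not part of the package.
;; RANGES is assumed to be a list of inclusive (LOW . HIGH) code pairs.

(defun code-in-ranges-p (code ranges)
  "Return true if the integer CODE lies inside one of RANGES."
  (some (lambda (range)
          (<= (car range) code (cdr range)))
        ranges))

(defun character-in-ranges-p (character ranges)
  "Return true if the code of CHARACTER lies inside one of RANGES.
Assumes CHAR-CODE yields the Unicode code point."
  (code-in-ranges-p (char-code character) ranges))

;; Example: a 7-bit repertoire described as a single range.
(defparameter *seven-bit-ranges* '((0 . 127)))

(character-in-ranges-p #\A *seven-bit-ranges*)
(code-in-ranges-p 233 *seven-bit-ranges*)
```

With the real package, the equivalent question would be asked with
character-in-character-set-p and a character-set structure.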

character-set

structure

Describes a character-set.

character-set-error

condition

The CHARACTER-SET-ERROR condition.

Class precedence list: CHARACTER-SET-ERROR ERROR SERIOUS-CONDITION CONDITION STANDARD-OBJECT T

Class init args: CHARACTER-SET FORMAT-CONTROL FORMAT-ARGUMENTS

(character-set-error-character-set error)

generic-function

The character-set in error.

(character-set-to-mime-encoding cs)

function

RETURN: The MIME encoding of the given character set, or its NAME.

(cs-aliases character-set)

function

A list of aliases for the character set (strings).

(cs-comments character-set)

function

A comment (string).

(cs-emacs-encoding character-set)

function

The name of the encoding in GNU Emacs.

(cs-lisp-encoding character-set)

function

The name of the encoding in the current Lisp implementation.

(cs-mib-enum character-set)

function

The integer identifying the character set in the SNMP MIBs.

(cs-mime-encoding character-set)

function

The name of the encoding in MIME.

(cs-name character-set)

function

The name of the character set (a string).

(cs-ranges character-set)

function

The set of Unicode ranges of the characters that are in this character-set.

(cs-references character-set)

function

References (string).

(cs-source character-set)

function

The normative reference specifying the character set (string).

(find-character-set name)

function

RETURN: The character-set in *CHARACTER-SETS* that has NAME as name or alias,
        or some variation of NAME (removing non-alphanumeric characters
        and prefixing 'cs').
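
The name variations described above can be sketched as follows.  This is
a guess at the general idea, not the package's code: NAME-VARIATIONS and
FIND-ENTRY are hypothetical helpers, each entry is assumed to be a plain
list of strings (the name followed by its aliases), and the
case-insensitive comparison with STRING-EQUAL is an assumption.

```lisp
;; Hypothetical helpers, not part of the package.

(defun name-variations (name)
  "Return NAME, NAME without non-alphanumeric characters,
and that stripped form prefixed with \"cs\"."
  (let ((stripped (remove-if-not #'alphanumericp name)))
    (list name stripped (concatenate 'string "cs" stripped))))

(defun find-entry (name entries)
  "Return the first entry of ENTRIES matching a variation of NAME.
Each entry is assumed to be a list of strings: (NAME ALIAS...)."
  (find-if (lambda (entry)
             (some (lambda (variation)
                     ;; Case-insensitive comparison is an assumption.
                     (member variation entry :test #'string-equal))
                   (name-variations name)))
           entries))

;; Sample data taken from the *character-sets* listing above.
(defparameter *sample-entries*
  '(("Windows-31J" "csWindows31J")
    ("GB2312" "csGB2312")))

;; "Windows31J" is neither a name nor an alias, but its "cs"-prefixed
;; variation matches the alias "csWindows31J".
(find-entry "Windows31J" *sample-entries*)
```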

(read-character-sets-file file)

function

DO:     Parse the <http://www.iana.org/assignments/character-sets> file,
        and extract the character-sets defined there.
RETURN: A list of character-set structures read from the file.

(register-character-set cs)

function

DO:     Register a new character-set CS.  If there's already a
        character set with the same name or alias, signal a
        CHARACTER-SET-ERROR.
RETURN: CS
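
The register-or-signal behaviour described above follows a common
pattern, sketched here with stand-in definitions.  Nothing in this block
is the package's code: DUPLICATE-NAME-ERROR, *REGISTRY* and
REGISTER-ENTRY are hypothetical, entries are assumed to be lists of
strings (name followed by aliases), and names are assumed to compare
case-insensitively.

```lisp
;; Hypothetical stand-ins, not part of the package.

(define-condition duplicate-name-error (error)
  ((name :initarg :name :reader duplicate-name-error-name))
  (:report (lambda (condition stream)
             (format stream "A character set named ~S is already registered."
                     (duplicate-name-error-name condition)))))

(defvar *registry* '()
  "Hypothetical registry: a list of entries, each a list (NAME ALIAS...).")

(defun register-entry (entry)
  "Push ENTRY onto *REGISTRY* and return it.
Signal DUPLICATE-NAME-ERROR if any name or alias of ENTRY is already used."
  (dolist (name entry)
    (when (find-if (lambda (existing)
                     (member name existing :test #'string-equal))
                   *registry*)
      (error 'duplicate-name-error :name name)))
  (push entry *registry*)
  entry)

;; A caller can recover from the duplicate with HANDLER-CASE.
(defun try-register (entry)
  "Return ENTRY if it was registered, or NIL if a name was already taken."
  (handler-case (register-entry entry)
    (duplicate-name-error () nil)))
```

With the real package, a caller would presumably handle
CHARACTER-SET-ERROR in the same way and inspect the offending structure
with character-set-error-character-set.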