This package exports functions to manage character-sets, character encodings, coding systems and external format. It's all the same, but everyone likes to have his own terms...
The base character set repertoire will be the IANA one, published at: http://www.iana.org/assignments/character-sets
The cs-lisp-encoding and cs-emacs-encoding of the character sets are hooked in by the implementation specific initialization code in the COM.INFORMATIMGO.CLEXT.CHARACTER-SET package.
See also: COM.INFORMATIMGO.CLEXT.CHARACTER-SET
License: AGPL3
Copyright Pascal J. Bourguignon 2005 - 2012
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>
*character-sets* |
variable |
The list of Character Sets.
Initial value:
(#S(CHARACTER-SET :MIB-ENUM 106 :NAME UTF-8 :ALIASES NIL :SOURCE RFC 3629 :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 4 :NAME ISO_8859-1:1987 :ALIASES (csISOLatin1 CP819 IBM819 l1 latin1 ISO-8859-1 ISO_8859-1 iso-ir-100) :SOURCE ECMA registry :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 2025 :NAME GB2312 :ALIASES (csGB2312) :SOURCE Chinese for People's Republic of China (PRC) mixed one byte, two byte set: 20-7E = one byte ASCII A1-FE = two byte PRC Kanji See GB 2312-80 PCL Symbol Set Id: 18C :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 18 :NAME Extended_UNIX_Code_Packed_Format_for_Japanese :ALIASES (EUC-JP csEUCPkdFmtJapanese) :SOURCE Standardized by OSF, UNIX International, and UNIX Systems Laboratories Pacific. Uses ISO 2022 rules to select code set 0: US-ASCII (a single 7-bit byte set) code set 1: JIS X0208-1990 (a double 8-bit byte set) restricted to A0-FF in both bytes code set 2: Half Width Katakana (a single 7-bit byte set) requiring SS2 as the character prefix code set 3: JIS X0212-1990 (a double 7-bit byte set) restricted to A0-FF in both bytes requiring SS3 as the character prefix :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 2024 :NAME Windows-31J :ALIASES (csWindows31J) :SOURCE Windows Japanese. A further extension of Shift_JIS to include NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). The CCS's are JIS X0201:1997, JIS X0208:1997, and these extensions. This charset can be used for the top-level media type "text", but it is of limited or specialized use (see RFC2278). PCL Symbol Set id: 19K :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 1019 :NAME UTF-32LE :ALIASES NIL :SOURCE <http://www.unicode.org/unicode/reports/tr19/> :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 1018 :NAME UTF-32BE :ALIASES NIL :SOURCE <http://www.unicode.org/unicode/reports/tr19/> :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 1017 :NAME UTF-32 :ALIASES NIL :SOURCE <http://www.unicode.org/unicode/reports/tr19/> :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 1014 :NAME UTF-16LE :ALIASES NIL :SOURCE RFC 2781 :COMMENTS ...)
 #S(CHARACTER-SET :MIB-ENUM 1013 :NAME UTF-16BE :ALIASES NIL :SOURCE RFC 2781 :COMMENTS ...)
 ...)
(character-in-character-set-p character character-set) |
function |
RETURN: Whether the CHARACTER belongs to the CHARACTER-SET.
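The membership test presumably relies on the Unicode ranges exposed by CS-RANGES, but this reference does not say how those ranges are represented. The following is a minimal sketch, assuming a list of inclusive (LOW . HIGH) code-point pairs. The names CODE-IN-RANGES-P, CHARACTER-IN-RANGES-P and *SAMPLE-RANGES* are hypothetical and not part of the package.

```lisp
;; ASSUMPTION: ranges are a list of inclusive (LOW . HIGH) code-point pairs.
;; The package's real representation is not documented here.
(defun code-in-ranges-p (code ranges)
  "Return true if the integer CODE lies within one of the inclusive RANGES."
  (some (lambda (range) (<= (car range) code (cdr range)))
        ranges))

(defun character-in-ranges-p (character ranges)
  "Return true if the code of CHARACTER lies within one of the RANGES."
  (code-in-ranges-p (char-code character) ranges))

;; Hypothetical sample data, chosen only for illustration.
(defparameter *sample-ranges* '((32 . 126) (160 . 255)))

(character-in-ranges-p #\A *sample-ranges*) ; true: code 65 is in 32..126
(code-in-ranges-p 955 *sample-ranges*)      ; false: 955 is outside both ranges
```

A linear scan is enough for a handful of ranges; a sorted vector with binary search would be a natural refinement for large repertoires.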
character-set |
structure |
Describes a character-set.
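The printed #S(CHARACTER-SET ...) values and the CS-* accessors listed in this reference suggest a structure whose accessors share a common prefix. A minimal sketch of such a definition follows. The structure name SKETCH-CHARACTER-SET, the SKETCH-CS- prefix, and the slot defaults are assumptions made for illustration, not the package's actual definition.

```lisp
;; Hypothetical stand-in for the package's CHARACTER-SET structure.
;; Slot names mirror the documented CS-* accessors; defaults are assumptions.
(defstruct (sketch-character-set (:conc-name sketch-cs-))
  (mib-enum       nil :type (or null integer)) ; e.g. 106 for UTF-8
  (name           ""  :type string)
  (aliases        '() :type list)              ; list of strings
  (source         nil)
  (comments       nil)
  (references     nil)
  (mime-encoding  nil)
  (lisp-encoding  nil)
  (emacs-encoding nil)
  (ranges         nil))

(defparameter *utf-8-sketch*
  (make-sketch-character-set :mib-enum 106 :name "UTF-8" :source "RFC 3629"))

(sketch-cs-name *utf-8-sketch*)     ; the string "UTF-8"
(sketch-cs-mib-enum *utf-8-sketch*) ; the integer 106
```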
character-set-error |
condition |
The CHARACTER-SET-ERROR condition.
Class precedence list: CHARACTER-SET-ERROR ERROR SERIOUS-CONDITION CONDITION STANDARD-OBJECT T
Class init args: CHARACTER-SET FORMAT-CONTROL FORMAT-ARGUMENTS
(character-set-error-character-set error) |
generic-function |
The character-set in error.
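To illustrate how a condition with these init args can be signaled and handled, here is a minimal sketch. It defines its own hypothetical condition, SKETCH-CHARACTER-SET-ERROR, with the same three init args and a direct ERROR superclass. The reader names and the report format are assumptions, not the package's code.

```lisp
;; Hypothetical re-creation of a condition with the documented init args.
(define-condition sketch-character-set-error (error)
  ((character-set    :initarg :character-set
                     :reader sketch-error-character-set)
   (format-control   :initarg :format-control
                     :initform "Character set error"
                     :reader sketch-error-format-control)
   (format-arguments :initarg :format-arguments
                     :initform '()
                     :reader sketch-error-format-arguments))
  (:report (lambda (condition stream)
             (apply #'format stream
                    (sketch-error-format-control condition)
                    (sketch-error-format-arguments condition)))))

(defun offending-character-set (thunk)
  "Call THUNK; if it signals our error, return the character set in error."
  (handler-case (funcall thunk)
    (sketch-character-set-error (err)
      (sketch-error-character-set err))))

(offending-character-set
 (lambda ()
   (error 'sketch-character-set-error
          :character-set "UTF-8"
          :format-control "Character set ~A is already registered."
          :format-arguments '("UTF-8"))))
```

With the real package, the equivalent handler would name CHARACTER-SET-ERROR and call CHARACTER-SET-ERROR-CHARACTER-SET on the condition object.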
(character-set-to-mime-encoding cs) |
function |
RETURN: The MIME encoding of the given character set, or its NAME.
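The fallback described here can be sketched in a few lines. The structure MIME-SKETCH and the function SKETCH-TO-MIME-ENCODING are hypothetical stand-ins, assuming the function simply prefers the MIME encoding slot when it is set.

```lisp
;; Hypothetical two-slot structure, just enough to show the fallback.
(defstruct (mime-sketch (:conc-name ms-))
  name
  (mime-encoding nil))

(defun sketch-to-mime-encoding (cs)
  "Return the MIME encoding of CS, falling back to its name when unset."
  (or (ms-mime-encoding cs) (ms-name cs)))

(sketch-to-mime-encoding
 (make-mime-sketch :name "ISO_8859-1:1987" :mime-encoding "ISO-8859-1"))
(sketch-to-mime-encoding (make-mime-sketch :name "UTF-8"))
```

The first call yields the explicit MIME encoding; the second has none set and yields the name instead.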
(cs-aliases character-set) |
function |
A list of aliases for the character set (strings).
(cs-comments character-set) |
function |
A comment (string).
(cs-emacs-encoding character-set) |
function |
The name of the encoding in GNU Emacs.
(cs-lisp-encoding character-set) |
function |
The name of the encoding in the current Lisp implementation.
(cs-mib-enum character-set) |
function |
The integer identifying the character set in the SNMP MIBs.
(cs-mime-encoding character-set) |
function |
The name of the encoding in MIME.
(cs-name character-set) |
function |
The name of the character set (a string).
(cs-ranges character-set) |
function |
The set of Unicode ranges of the characters that are in this character-set.
(cs-references character-set) |
function |
References (string).
(cs-source character-set) |
function |
The normative reference specifying the character set (string).
(find-character-set name) |
function |
RETURN: The character-set in *CHARACTER-SETS* that has NAME as name or alias, or some variation of NAME (removing non-alphanumeric characters and prefixing 'cs').
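The name variations mentioned here can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the registry is modelled as a list of (NAME ALIAS...) string lists, comparison is case-insensitive, and the names *SKETCH-REGISTRY*, NAME-VARIATIONS and SKETCH-FIND-CHARACTER-SET are hypothetical.

```lisp
;; Hypothetical registry: each entry is a list (NAME ALIAS...).
(defparameter *sketch-registry*
  '(("UTF-8")
    ("GB2312" "csGB2312")
    ("Windows-31J" "csWindows31J")))

(defun name-variations (name)
  "Return NAME, NAME without non-alphanumerics, and that form prefixed by cs."
  (let ((stripped (remove-if-not #'alphanumericp name)))
    (list name stripped (concatenate 'string "cs" stripped))))

(defun sketch-find-character-set (name)
  "Return the first entry matching NAME or one of its variations, else NIL."
  (find-if (lambda (entry)
             (some (lambda (variation)
                     ;; ASSUMPTION: names are compared case-insensitively.
                     (member variation entry :test #'string-equal))
                   (name-variations name)))
           *sketch-registry*))

(sketch-find-character-set "gb2312")      ; matches the name directly
(sketch-find-character-set "windows_31j") ; matches via "cswindows31j"
```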
(read-character-sets-file file) |
function |
DO: Parse the <http://www.iana.org/assignments/character-sets> file and extract the character-sets defined there. RETURN: A list of character-set structures read from the file.
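As a rough illustration of this kind of parsing, the sketch below reads records from a stream and returns them as property lists. It assumes a plain-text layout in which each record starts with a `Name:` line followed by `MIBenum:` and `Alias:` lines, with `Alias: None` marking the absence of aliases. That layout, the helper names FIELD-VALUE and PARSE-RECORDS, and the plist result are assumptions; the actual function returns character-set structures and may handle more fields.

```lisp
(defun field-value (line prefix)
  "If LINE starts with PREFIX, return the first token after it, else NIL."
  (let ((n (length prefix)))
    (when (and (>= (length line) n)
               (string= prefix line :end2 n))
      (let ((value (string-trim " " (subseq line n))))
        ;; Keep only the first token, dropping any trailing annotation.
        (subseq value 0 (position #\Space value))))))

(defun parse-records (stream)
  "Return a list of plists (:NAME ... :MIB-ENUM ... :ALIASES ...) from STREAM."
  (let ((records '())
        (current nil))
    (flet ((finish ()
             (when current
               (setf (getf current :aliases)
                     (reverse (getf current :aliases)))
               (push current records))))
      (loop for line = (read-line stream nil nil)
            while line
            do (let ((name  (field-value line "Name:"))
                     (mib   (field-value line "MIBenum:"))
                     (alias (field-value line "Alias:")))
                 (cond (name
                        (finish) ; a Name: line starts a new record
                        (setf current
                              (list :name name :mib-enum nil :aliases '())))
                       ((and mib current)
                        (setf (getf current :mib-enum)
                              (parse-integer mib :junk-allowed t)))
                       ((and alias current (string/= alias "None"))
                        (push alias (getf current :aliases))))))
      (finish))
    (nreverse records)))

(with-input-from-string
    (in (format nil "Name: UTF-8~%MIBenum: 106~%Alias: None~%~%Name: GB2312~%MIBenum: 2025~%Alias: csGB2312~%"))
  (parse-records in))
```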
(register-character-set cs) |
function |
DO: Register a new character-set CS. If there's already a character set with the same name or alias, signal a CHARACTER-SET-ERROR. RETURN: CS
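The check-then-register behaviour can be sketched as below. The registry *SKETCH-SETS*, the condition DUPLICATE-NAME-ERROR, and the function SKETCH-REGISTER are hypothetical, and the case-insensitive comparison is an assumption. The real function works on character-set structures and signals CHARACTER-SET-ERROR.

```lisp
;; Hypothetical condition standing in for CHARACTER-SET-ERROR.
(define-condition duplicate-name-error (error)
  ((name :initarg :name :reader duplicate-name))
  (:report (lambda (condition stream)
             (format stream "~S is already registered."
                     (duplicate-name condition)))))

;; Hypothetical registry: a list of (NAME ALIAS...) entries.
(defparameter *sketch-sets* '())

(defun sketch-register (entry)
  "Add ENTRY, a list (NAME ALIAS...), unless one of its names is taken."
  (dolist (name entry)
    ;; ASSUMPTION: names and aliases are compared case-insensitively.
    (when (find-if (lambda (known) (member name known :test #'string-equal))
                   *sketch-sets*)
      (error 'duplicate-name-error :name name)))
  (push entry *sketch-sets*)
  entry)

(sketch-register '("UTF-8"))            ; registers and returns the entry
(handler-case (sketch-register '("X-Other" "utf-8"))
  (duplicate-name-error (err)
    (duplicate-name err)))              ; the clash is on the alias "utf-8"
```

Checking every name and alias before pushing keeps the registry unchanged when a clash is detected.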