****************************************************************************** FILE: $RCSfile: csfile.txt,v $ $Revision: 3.71 $ $Date: 1996/08/18 20:41:31 $ ****************************************************************************** An 8-bit Implementation of BibTeX 0.99 with a Very Large Capacity ================================================================= Contents -------- 1. The Codepage and Sort Order (CS) File 1.1 CS file syntax 1.2 Testing a CS file 1.3 Sharing your CS file 2. Change Log 1. The Codepage and Sort Order (CS) File ----------------------------------------- The Codepage and Sort definition (CS) file is used to define the 8-bit character set used by BibTeX and the order in which those characters should be sorted. NOTE: this file contains many 8-bit characters and may cause problems if you try to display or print it on 7-bit systems (e.g. older versions of Unix). 1.1 CS file syntax ------------------ The codepage and sorting order (CS) file defines how BibTeX will treat an 8-bit character set, specifically which characters are to be treated as letters, the upper/lower case relationships between characters, and the sorting order of characters. The CS file may contain a number of sections, each presented in the form of a TeX macro: \section-name{
} Four sections are currently supported: \lowupcase, \lowercase, \uppercase and \order. The syntax of the four supported sections is summarised below. 8-bit characters may be entered naturally, but to avoid problems with character set translation or corruption, they can also be entered using the TeX-style portable notation for character codes, i.e. ^^XX, where XX is the hexadecimal value ofthe character code. Reading of the sections ends when the first '}' character is reached, so '}' can't be included in a section. You can, however, use ^^7d instead. The percent sign ('%') is used to introduce a trailing comment - it and all remaining characters on a line are ignored. If you need to insert a percent character, you can use ^^25 instead. \lowupcase section The \lowupcase section of the CS file is used to define the lower /upper and upper/lower case relationship of pairs of specified characters. It is only used if the relationship is symmetrical - use \lowercase or \upcase if it isn't. The syntax of the \lowupcase section is: \lowupcase{ % Comment begins with a percent sign ... } Each pair of characters defines that the upper case equivalent of is *and* the lower case equivalent of is . You cannot redefine the lower or upper case equivalent of an ASCII character (code < 128), so all instances of and (i.e. both sides of the relationship) must have codes > 127. \lowercase section The \lowercase section of the CS file is used to define the lower case equivalent of specified characters. It should normally only be used if the relationship isn't symmetrical - use \lowupcase if it is. The syntax of the \lowercase section is: \lowercase{ % Comment begins with a percent sign ... } Each pair of characters defines that the lower case equivalent of is . You cannot redefine the lower case equivalent of an ASCII character (code < 128), so all instances of (i.e. the left hand side of the relationship) must have codes > 127. \uppercase section The \uppercase section of the CS file is used to define the upper case equivalent of specified characters. It should normally only be used if the relationship isn't symmetrical - use \lowupcase if it is. The syntax of the \uppercase section is: \uppercase{ % Comment begins with a percent sign ... } Each pair of characters defines that the upper case case equivalent of is . You cannot redefine the upper case equivalent of an ASCII character (code < 128), so all instances of (i.e. the left hand side of the relationship) must have codes > 127. \order section The \order section of the CS file is used to define the order in which characters are sorted. The syntax of the \order section is: \order{ % Comment begins with a percent sign % whitespace between the chars - % a hyphen between the chars _ % an underscore between the chars ... } All characters on the same line are given the same sorting weight. The construct is used to denote that all characters in the range to should be given the same sorting weight. For example, "A _ Z" would cause all ASCII upper case alphabetical characters to have the same sorting weight and would be equivalent to placing all 26 characters on the same line. The construct is used to denote that all characters in the range to should be given an ascending set of sorting weights, starting with and ending with . For example, "A - Z" would cause all upper case ASCII alphabetical characters to be sorted in ascending order and would be equivalent to placing 'A' on the first line, 'B' on the second, through to 'Z' on the 26th line. The characters at the beginning of the order section are given a lower sorting weight than characters occuring later. When sorting alphabetically, characters with the lowest weight come first. All characters not in the \order section (including ASCII characters) are given the same very high sorting weight to ensure that they come last when sorting alphabetically. 1.2 Testing a CS file --------------------- If you create a CS, you'll want a straightforward way of testing that BibTeX is interpreting it as you expect it to. The --debug=csf option will report the results of parsing the CS file. Specifically, BibTeX will report: o characters with codes > 127 that it has defined as type ALPHA o characters with upper case equivalents o characters with lower case equivalents o characters in ascending sorting order An example of the output when processing the cp437lat.csf CS file is shown below. This output will only make sense if you are using a computer that supports the IBM codepage 437. D-CSF: c8read_csf: trying to open CS file `cp437lat.csf' ... D-CSF: reading the \lowupcase section ... D-CSF: finished reading the \lowupcase section. D-CSF: reading the \uppercase section ... D-CSF: finished reading the \uppercase section. D-CSF: reading the \order section ... D-CSF: finished reading the \order section. D-CSF: 8-bit ALPHA characters D-CSF: ---------------------- D-CSF: 80: D-CSF: 90: . . . . . . D-CSF: a0: . . . . . . . . . . D-CSF: b0: . . . . . . . . . . . . . . . . D-CSF: c0: . . . . . . . . . . . . . . . . D-CSF: d0: . . . . . . . . . . . . . . . . D-CSF: e0: . . . . . . . . . . . . . . . D-CSF: f0: . . . . . . . . . . . . . . . . D-CSF: Characters with upper case equivalents D-CSF: -------------------------------------- D-CSF: a [61] <<< A [41] b [62] <<< B [42] c [63] <<< C [43] D-CSF: d [64] <<< D [44] e [65] <<< E [45] f [66] <<< F [46] D-CSF: g [67] <<< G [47] h [68] <<< H [48] i [69] <<< I [49] D-CSF: j [6a] <<< J [4a] k [6b] <<< K [4b] l [6c] <<< L [4c] D-CSF: m [6d] <<< M [4d] n [6e] <<< N [4e] o [6f] <<< O [4f] D-CSF: p [70] <<< P [50] q [71] <<< Q [51] r [72] <<< R [52] D-CSF: s [73] <<< S [53] t [74] <<< T [54] u [75] <<< U [55] D-CSF: v [76] <<< V [56] w [77] <<< W [57] x [78] <<< X [58] D-CSF: y [79] <<< Y [59] z [7a] <<< Z [5a] [81] <<< [9a] D-CSF: [82] <<< [90] [83] <<< A [41] [84] <<< [8e] D-CSF: [85] <<< A [41] [86] <<< [8f] [87] <<< [80] D-CSF: [88] <<< E [45] [89] <<< E [45] [8a] <<< E [45] D-CSF: [8b] <<< I [49] [8c] <<< I [49] [8d] <<< I [49] D-CSF: [91] <<< [92] [93] <<< O [4f] [94] <<< [99] D-CSF: [95] <<< O [4f] [96] <<< U [55] [97] <<< U [55] D-CSF: [a0] <<< A [41] [a1] <<< I [49] [a2] <<< O [4f] D-CSF: [a3] <<< U [55] [a4] <<< [a5] D-CSF: Characters with lower case equivalents D-CSF: -------------------------------------- D-CSF: A [41] >>> a [61] B [42] >>> b [62] C [43] >>> c [63] D-CSF: D [44] >>> d [64] E [45] >>> e [65] F [46] >>> f [66] D-CSF: G [47] >>> g [67] H [48] >>> h [68] I [49] >>> i [69] D-CSF: J [4a] >>> j [6a] K [4b] >>> k [6b] L [4c] >>> l [6c] D-CSF: M [4d] >>> m [6d] N [4e] >>> n [6e] O [4f] >>> o [6f] D-CSF: P [50] >>> p [70] Q [51] >>> q [71] R [52] >>> r [72] D-CSF: S [53] >>> s [73] T [54] >>> t [74] U [55] >>> u [75] D-CSF: V [56] >>> v [76] W [57] >>> w [77] X [58] >>> x [78] D-CSF: Y [59] >>> y [79] Z [5a] >>> z [7a] [80] >>> [87] D-CSF: [8e] >>> [84] [8f] >>> [86] [90] >>> [82] D-CSF: [92] >>> [91] [99] >>> [94] [9a] >>> [81] D-CSF: [a5] >>> [a4] D-CSF: Characters in sorting order D-CSF: --------------------------- D-CSF: 00: 0 [30] D-CSF: 01: 1 [31] D-CSF: 02: 2 [32] D-CSF: 03: 3 [33] D-CSF: 04: 4 [34] D-CSF: 05: 5 [35] D-CSF: 06: 6 [36] D-CSF: 07: 7 [37] D-CSF: 08: 8 [38] D-CSF: 09: 9 [39] D-CSF: 0a: A [41] a [61] [83] [84] [85] [86] [8e] [8f] [a0] D-CSF: 0b: [91] [92] D-CSF: 0c: B [42] b [62] D-CSF: 0d: C [43] c [63] [80] [87] D-CSF: 0e: D [44] d [64] D-CSF: 0f: E [45] e [65] [82] [88] [89] [8a] [90] D-CSF: 10: F [46] f [66] D-CSF: 11: G [47] g [67] D-CSF: 12: H [48] h [68] D-CSF: 13: I [49] i [69] [8b] [8c] [8d] [a1] D-CSF: 14: J [4a] j [6a] D-CSF: 15: K [4b] k [6b] D-CSF: 16: L [4c] l [6c] D-CSF: 17: M [4d] m [6d] D-CSF: 18: N [4e] n [6e] D-CSF: 19: [a4] [a5] D-CSF: 1a: O [4f] o [6f] [93] [94] [95] [99] [a2] D-CSF: 1b: P [50] p [70] D-CSF: 1c: Q [51] q [71] D-CSF: 1d: R [52] r [72] D-CSF: 1e: S [53] s [73] D-CSF: 1f: [e1] D-CSF: 20: T [54] t [74] D-CSF: 21: U [55] u [75] [81] [96] [97] [9a] [a3] D-CSF: 22: V [56] v [76] D-CSF: 23: W [57] w [77] D-CSF: 24: X [58] x [78] D-CSF: 25: Y [59] y [79] D-CSF: 26: Z [5a] z [7a] D-CSF: (All other characters are sorted equally after any of the above.) 1.3 Sharing your CS file ------------------------ Although we have provided a limited number of CS files, we hope that other will soon produce different examples for other character sets and national sorting orders. We will also be happy to accept corrections to the example files supplied. If you'd like to contribute a CS file, please send it to one of the authors or upload it to one of the CTAN FTP archives. If you e-mail a copy please ZIP and encode (UU/MIME) it so that we can be confident that the file has not become corrupted in transit. 2. Change Log -------------- $Log: csfile.txt,v $ # Revision 3.71 1996/08/18 20:41:31 kempson # Placed under RCS control. # ******************************** END OF FILE *******************************