EMACS 18.59
by Thomas Bellman

Originally Howard Gayle wrote a set of patches for GNU Emacs 18.55 for
displaying, sorting and converting 8-bit characters. However, Emacs
18.55 contains some bugs, and I wanted to apply the patches to 18.57.
This wasn't straightforward, since there were lots of internal
differences between 18.55 and 18.57. However, I finally succeeded in
applying them. At least I thought I had...

When Emacs 18.58 was released soon after, I applied my patches to that
version. I soon found out that I had a bug, causing Emacs to
sporadically, but repeatably, abort and dump core. After many months,
Linus Tolke (Linus@Lysator.LiU.Se) got tired of this, and found the
bug.

Now the time has come for Emacs 18.59!

Below follows what Howard Gayle has written about his patches for
Emacs 18.55. You should read that before installing these patches.
Just substitute 18.59 when he speaks about 18.55. If you have only the
diffs, then you need to recompile all the .elc files in the lisp
directory, but if you have the entire patched 18.59, I have recompiled
them for you (using a newer, and presumably better, byte-compiler).

I have also included an extra elisp file, iso-chars.el, for displaying
ISO 8859-1 characters, since I wanted an alternative to Howard Gayle's
variants. Sorry, not documented, but it exists...

Included in this version is a patch by Niclas Wiberg (nicwi@isy.liu.se)
that allows insertion of eight-bit characters under X-windows, while
retaining the use of the Meta key. It works by inserting a C-q before
any character with the high bit set. Thus it is not quite as general
as I would like, but since some people liked it, and it wasn't any
worse than before, I include it.

Install the patches by copying all files into the respective
directories in the Emacs distribution. Then apply the diffs in the
file 'DIFFS'. You will also have to rebuild the info files emacs* in
the info directory, if you only have the diffs.

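For illustration only, the C-q mechanism of Niclas Wiberg's patch
described above can be modelled in a few lines of Python. This is a
sketch of the idea, not the code from the patch; the function name
quote_high_bit is made up for this example.

```python
# Model of "insert a C-q before any character with the high bit set".
# Assumption: the key input is seen as a sequence of bytes, and C-q
# (quoted-insert) is the ASCII control character 0x11.
QUOTE = 0x11  # C-q


def quote_high_bit(keys: bytes) -> bytes:
    """Return keys with a C-q byte placed before every 8-bit byte."""
    out = bytearray()
    for b in keys:
        if b & 0x80:          # high bit set: would otherwise read as Meta
            out.append(QUOTE)
        out.append(b)
    return bytes(out)
```

Here quote_high_bit(b"a\xf6") yields the three bytes "a", C-q, 0xF6, so
the last byte is inserted literally instead of being read as a Meta
key, while plain ASCII input passes through unchanged.
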
There is one known incompatibility with the original Emacs. The
standard Emacs interprets the regexp "[abc-]" as being equal to
"[abc---]", while Howard Gayle's patches make Emacs give the error
"Invalid regexp: Premature end of regular expression" when seeing such
a regexp. I know of only one elisp package that uses a regexp of this
form, and that is supersite.el. If you are using supersite, then the
easy way out is to change that (single) regexp.

If you find any bugs when using these routines, or if you find any
bugs not in the standard 18.59, I would like to know about them. I
might take some time and try to fix them, but no promises.

Share and enjoy!

-- 
Thomas Bellman, Lysator Academic Computer Club
University of Linkoping, Sweden
Bellman@Lysator.LiU.Se

------------------------------------------------------------------------

SUMMARY

I have modified GNU Emacs version 18.55 to handle many 8-bit character
sets, including the ISO 8859 character sets. For each character, it is
possible to customize the byte(s) sent to the terminal to display that
character. X11R4 is also supported, to an extent. Case determination,
case changing, and sorting can all be customized. Input facilities are
primitive.

DISCOURAGEMENT

Emacs version 19 will support 8-bit character sets. That support is
based on my modifications, but there will probably be some differences
between the 8-bit character set support in this modified version 18.55
and the support in version 19. Therefore, if you can wait for version
19 I urge you to do so. Richard Stallman says he does not know when
version 19 will be available.

This is alpha-test software. It has known bugs. I'm posting it to
alt.sources to emphasize that, and to avoid having it archived.

If you don't know your way around GNU Emacs, please don't try to
install it. I don't have time to provide support. (But please send bug
reports anyway.)

Input support is primitive.

X windows support is for X11 only, and is incomplete.

CHARACTER SETS SUPPORTED

So I haven't scared you off yet. OK, you were warned. My modifications
allow GNU Emacs to handle any character set provided that each
character is represented by exactly one 8-bit byte, and the codes for
space, newline, and horizontal tab are the same as in ASCII. Now for
some definitions.

DEFINITIONS

A glyf is something that takes up exactly one position on the display
of a terminal, terminal emulator, or window system. For example, 'a'
is a glyf, as is a yellow, blinking, underlined '7' on a red
background. It may be necessary to transmit many bytes to a terminal
to display one glyf.

A rope is a sequence of glyfs. (The name is an analogy to string,
which is a sequence of characters.) For example, the glyf '^' followed
by the glyf 'C' forms a rope of length 2.

Glyfs are represented as unsigned 16-bit integers. Ropes are
represented as vectors of glyfs.

CHAR TABLES

There's a new lisp object: char tables. A char table specifies, for
each 8-bit character, the rope to use to display that character. Char
tables are associated with windows, not buffers, so one buffer can be
displayed in several different windows with several different char
tables.

CASE TABLES

Case tables, another new lisp object, specify for each 8-bit character
its case: upper, lower, or none.

SORT TABLES

Sort tables, another new lisp object, specify for each 8-bit character
its sorting position. Sort tables are also used for searching. Special
sort tables can be set up, for example, to ignore diacritical marks
when searching.

TRANS TABLES

Finally, trans tables are lisp objects that map each 8-bit character
into some other character. They are used for case conversion, and can
also be used for character set conversion.

ISO 8859/1 SUPPORT

I include support for displaying ISO 8859/1 characters. On ASCII
terminals they display as various ropes, e.g. A with grave accent
displays as {`A}. If your terminal can display some of the characters
correctly, e.g.
by using shift-out and shift-in, then you can write a lisp/term file
to do that. I include as an example lisp/term/fa4440a.el for the Facit
4440 Twist terminal with a Swedish PROM. If your terminal (emulator)
provides full ISO 8859/1, you can just send 8-bit characters to it
directly. See the code in lisp/term/x-win.el starting with
"(if (fboundp 'get-glyf)" for an example.

SWEDISH SUPPORT

I include support for Swedish as an example of language support. This
includes a swedish mode analogous to text mode, and sort tables for
Swedish alphabetical order.

INPUT

Input is kludgy. The file lisp/iso8859-1-insert.el defines little
functions to insert each non-ASCII ISO 8859/1 character. These are put
into the global keymap under C-x 8, which is supposed to be mnemonic
for 8859. So e.g. "C-x 8 ` A" runs insert-A-grave. This is OK for
infrequently used characters, but for those you use often I suggest
you use programmable keys on your terminal, if possible. For example,
Swedish uses o with umlaut a lot, so I have one of the programmable
keys on my terminal set up to transmit "C-q 3 6 6". Using C-q also
means this works with e.g. incremental search, not just for inserting.
Here's what I do on my Facit 4440 Twist:

1) Press Setup
2) Press 5 to enter Setup B mode
3) Press F4        C-q 3 4 5 C-Return
   Press F5        C-q 3 4 4 C-Return
   Press F6        C-q 3 6 6 C-Return
   Press F7        C-q 3 5 1 C-Return
   Press F8        C-q 3 7 4 C-Return
   Press Shift-F4  C-q 3 0 5 C-Return
   Press Shift-F5  C-q 3 0 4 C-Return
   Press Shift-F6  C-q 3 2 6 C-Return
   Press Shift-F7  C-q 3 1 1 C-Return
   Press Shift-F8  C-q 3 3 4 C-Return
4) Press S to save everything in nonvolatile memory.

This puts a with ring on function key 4, a with umlaut on F5, o with
umlaut on F6, e with acute accent on F7, and u with umlaut on F8. The
shifted keys give the corresponding capital letters.

X WINDOWS SUPPORT

Only X11 is supported, not X10. I've only tried this on X11R4.

Eventually, the idea is for each glyf, which is really just an
unsigned 16-bit integer, to be treated as two bytes.
The low order byte selects one face code in a font, for example 'g'.
The high order byte selects a graphic context (GC). But for now,
there's only one GC.

For input of frequently-used characters I just hacked stringFuncVal in
src/x11term.c. You may wish to do the same.

Many of the X11R4 fonts advertised as ISO 8859/1 don't really contain
all the characters; 7x14 does, so that's what I use for now. Here's
another font to try:

>From: jw@sics.se (Johan Widen)
>Newsgroups: comp.windows.x
>Subject: eightbit version of the 'fixed' font available
>Message-ID: <1990Mar9.164011.1775@sics.se>
>Date: 9 Mar 90 16:40:11 GMT
>Distribution: comp
>Organization: Swedish Institute of Computer Science, Kista
>
>An eightbit version of the X11R4 'fixed' font (also known as 6x13) is available
>for anonymous ftp from
>    sics.se (192.16.123.90)
>in the compressed tar file
>    archive/fixed.bdf.Z
>
>The glyphs below 128 are unchanged. The ISO-8859-1 characters from 160 to 255
>have been added.
>
>I'm interested in any improvements/fixes that you make to this font.
>
>--
>Johan Widen
>SICS, PO Box 1263, S-164 28 KISTA, SWEDEN    Internet: jw@sics.se
>Tel: +46 8 752 15 32    Ttx: 812 61 54 SICS S    Fax: +46 8 751 72 30

OTHER APPLICATIONS

These modifications have other uses than supporting 8-bit character
sets. The file lisp/emphasis.el uses the high bit to indicate
emphasis, e.g. underlining, of 7-bit ASCII. A hook in lisp/man.el then
displays italicized text in manual entries with emphasis if possible.

The file lisp/rot13.el contains a disgusting hack that displays a
buffer in another window, but with a rot13 char table. I really use
this when reading rec.humor.funny with Gnews.

If you don't like unprintable characters to be displayed in octal, you
can change to hex or whatever.

RELATED SOFTWARE

My cz system lets you print ISO 8859/1 text on PostScript printers. It
interfaces to GNU Emacs.
To get it, get these articles from your nearest comp.sources.misc
archive:

    cz         comp.sources.misc volume 8 issues 65-75, 77-78 ( 1 Oct 1989)
                                          issue 97            (28 Oct 1989)
    libhoward  comp.sources.misc volume 8 issues 80-87        ( 1 Oct 1989)
                                          issue 96            (28 Oct 1989)

BUGS

It should be possible to format texinfo files into info files by doing
this (e.g. for cl.texinfo):

    % cd man; emacs -batch -funcall batch-texinfo-format cl.texinfo
    texinfo formatting /usr/local/free/gnu-emacs/18.55i/man/cl.texinfo...
    Formatting Info file...
    Making tags table for Info file...
    >> Error: (void-variable This)
    >> point at
    >> Info file: cl, -*-Text-*-
    >> produced by texinfo-format-buffer
    >> from file: cl.texinfo
    >> Copyright (C

But that gives the error shown. However this works:

    % emacs -batch -load info -funcall batch-texinfo-format cl.texinfo

To the first person who supplies me with a fix for this bug, I offer a
color portrait of the Swedish Royal Family, with a genuine Swedish
postage stamp on the other side.

INSTALLATION

Start with a copy of GNU Emacs 18.55 as distributed.

Parts 1 through 4 are shar archives; unshar them.

Two of the lisp files have high-order bits set. They are encoded with
Brad Templeton's abe system, which was posted to comp.sources.misc on
4 June 1989 as volume 7, issues 1 and 2, archive name abe. To extract
them, you must have the dabe command. Do:

    % cd lisp
    % dabe el.abe
    % cd ..

Parts 5 through 12 are context diffs. Parts 11 and 12 are together the
diffs to man/emacs.tex; they must be concatenated. Apply the diffs
with patch.

Now install Emacs as usual. When byte-recompiling the elisp code, it
may be necessary to load case-table.el, char-table.el, sort-table.el,
and trans-table.el first. Be sure to byte-compile all the new .el
files you intend to use.
Here's the complete list:

    case-table.el
    char-table-vt100.el
    char-table.el
    emphasis.el
    iso8859-1-ascii.el
    iso8859-1-insert.el
    iso8859-1-swedish.el
    iso8859-1.el
    rot13.el
    sort-table.el
    swedish.el
    trans-table.el
    term/id100.el
    term/fa4440a.el
    term/fa4440b.el

You'll probably want to load some character set and language support
from lisp/site-init.el. For example, ours starts like this:

    (load "iso8859-1")
    (garbage-collect)
    (load "iso8859-1-insert")
    (garbage-collect)
    (load "swedish")
    (garbage-collect)

CHANGES

Here's a brief summary of what I changed in each file.

In src:

abbrev.c: expand-abbrev: Use casetab.h macros. Use HYPHEN.
alloc.c: GC case, char, sort, and trans tabs.
buffer.c: reset_buffer_local_variables: Initialize case_table_v, etc.
    Drop selective_display_ellipses.
buffer.h: Add case_table_p, etc. & buffer_char_table. Drop ctl_arrow.
casefiddle.c: casify_object & casify_region: Use casetab.h macros.
config.h-dist: Add 30000 to PURESIZE.
cmds.c: Use chartab.h macros.
data.c: Add arg_out_of_range.
dired.c: Use standard_downcase_table_p instead of downcase_table.
dispextern.h: Change char to glyf_t.
dispnew.c: Use chartab.h macros. Change char to glyf_t.
    Check for X windows in chartab.c now.
editfns.c: Use casetab.h & chartab.h macros.
emacs.c: Call init_case_table_once, init_char_table_once,
    syms_of_case_table, and syms_of_char_table.
fileio.c: #include casetab.h
fns.c: Add string-lessp*.
indent.c: Use chartab.h macros. Use char table to compute lengths
    instead of hard code. Drop selective_display_ellipses.
keyboard.c: Use ROPE_LEN to check if direct insertion OK.
lisp.h: Move case macros to casetab.h. Add Lisp_Chartab and related
    definitions.
minibuf.c: Use casetab.h macros.
process.c: Use transtab.h macros.
print.c: Print out char tables.
regex.c: Drop translate.
regex.h: Use sort table when compiling pattern.
scroll.c: lisp.h must be included before dispextern.h.
search.c: Remove downcase_table & compute_trt_inverse.
    syms_of_search: Remove initialization of downcase_table. Use NEWLINE.
term.c: char -> glyf.
termchar.h: Replace vector DCICcost by function.
termhooks.h: {insert,write,delete}_chars_hook ->
    {insert,write,delete}_glyfs_hook
window.c: Add window-char-table & set-window-char-table. Save char
    tables for saved windows.
window.h: Add window_char_table.
xdisp.c: Use chartab.h macros. char -> glyf.
    Drop selective_display_ellipses.
x11term.c: char -> glyf
ymakefile: Add new files and include dependencies.

In lisp:

keypad.el: Add backtab code. Comments.
man.el: Add manual-entry-hook. Default to default-manual-entry-hook,
    which removes underlining and overstriking.
mlconvert.el: Changing control-code display is different.
rmail.el: Run rmail-get-new-mail-hook after getting new mail.
sendmail.el: Run mail-send-hook just before sending mail.
sort.el: string< -> string-lessp*
text-mode.el: (provide 'text-mode)
term/x-win.el: direct-map high-order ISO 8859 bits

In etc:

NEWS
makedoc.com

In man:

emacs.tex

EMAIL

Here's how I read and send email in ISO 8859/1 while still living in a
7-bit (ISO 646) world. I run Chip Salzenberg's deliver program. My
.deliver file looks like this:

    cat $HEADER $BODY | 78seus | deliver -n "$1"
    echo DROP

(OK, I'm lying. My real .deliver file also saves a copy of incoming
messages. Also, it has absolute path names to 78seus and deliver,
because they're not in /usr/bin. But you get the idea.) The 78seus
filter is part of my cz system (see above). It converts mixed English
and Swedish to ISO 8859/1. Cz also has one for Danish, plus a paper on
how to make your own. I then read mail with GNU Emacs rmail mode, as
usual.

When sending mail I write it in ISO 8859/1 in Emacs sendmail mode.
Just before sending it, sendmail runs mail-send-hook, which is set in
lisp/swedish.el to call the function 8859-to-swascii-buffer. This
function maps the ISO 8859/1 to ISO 646.

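As a rough illustration of what such an 8859-to-646 mapping involves,
here is a small Python model. It is not the code of
8859-to-swascii-buffer, whose exact table is not shown here; it
assumes the Swedish national variant of ISO 646 (SEN 850200), which
places the six Swedish letters on the codes ASCII uses for the
brackets, braces, backslash and vertical bar. The names
ISO8859_TO_SWASCII and to_swascii are made up for this example.

```python
# Assumed table: ISO 8859/1 code -> Swedish ISO 646 (SEN 850200) code.
ISO8859_TO_SWASCII = {
    0xC4: ord("["),   # A with umlaut
    0xD6: ord("\\"),  # O with umlaut
    0xC5: ord("]"),   # A with ring
    0xE4: ord("{"),   # a with umlaut
    0xF6: ord("|"),   # o with umlaut
    0xE5: ord("}"),   # a with ring
}


def to_swascii(data: bytes) -> bytes:
    """Map the six Swedish letters to their 7-bit codes.

    Bytes not in the table are passed through unchanged, so other
    8-bit characters stay 8-bit in this sketch.
    """
    return bytes(ISO8859_TO_SWASCII.get(b, b) for b in data)
```

With this table, the ISO 8859/1 bytes for "Linköping" come out as the
7-bit text "Link|ping", which a terminal with a Swedish character set
shows with the o with umlaut in place of the vertical bar.
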
Deliver was posted to comp.sources.unix on 16 October 1989 as volume
20, issues 23 through 26, archive name deliver2.0. These are the
patches I know about:

    1  comp.sources.unix volume 20 issue 27  (16 Oct 1989)
    2  comp.sources.bugs,comp.mail.misc      15 Dec 1989
    3  comp.sources.bugs,comp.mail.misc      15 Dec 1989
    4  comp.sources.bugs,comp.mail.misc      15 Dec 1989
    5  comp.sources.bugs,comp.mail.misc      19 Dec 1989
    6  comp.sources.bugs,comp.mail.misc      19 Feb 1990
    7  comp.sources.bugs,comp.mail.misc       7 Mar 1990
    8  comp.sources.bugs,comp.mail.misc       7 Mar 1990
    9  comp.sources.bugs,comp.mail.misc       7 Mar 1990

-- 
Howard Gayle
TN/ETX/TT/HL
Ericsson Telecom AB
S-126 25 Stockholm
Sweden
howard@ericsson.se
uunet!ericsson.se!howard
Phone: +46 8 719 5565
FAX  : +46 8 719 8439