Title: Unicon FAQ/Information
Last Updated: 11/8/2000
Current Unicon Version: 3.0
Author: Bob Frey <bfrey@turbolinux.com.cn>

Q. What is Unicon?
A. Unicon is a software package that adds double-byte(*) language support
   to the Linux console. Independent of any particular application, the
   package allows double-byte characters to be input and displayed on
   the console and any virtual console. In addition, input methods (the
   ways in which double-byte language characters are typed into a computer)
   have been defined so that they can be shared with X Window applications.
   This gives the end user a consistent set of double-byte input methods
   for all applications used under Linux, in both console text mode and
   X Window graphical mode.

   The Unicon software package consists of the following groups:
   1) kernel patch, 2) unicon daemon, and 3) input method shared libraries.

   1) The kernel patch adds support for double-byte input and display
      to the kernel. With the exception of changes to the keyboard,
      console, and frame buffer drivers, kernel changes for Unicon are
      added in a modular way with new kernel modules.
   2) The unicon daemon is a program run at system initialization time
      on the console to provide a double-byte input interface to the user.
      The program manages a status bar on the display that indicates the
      current input mode.
   3) The input method shared libraries define input methods that are
      used to input double-byte characters. Character, word, and phrase
      tables are maintained by the input methods to increase the speed
      at which a user can input double-byte text. For the TurboLinux
      Chinese distribution the shared libraries are used both by unicon
      and chinput, the X Window system double-byte input program.

   Unicon currently supports the Simplified Chinese (GB and GBK) and
   Traditional Chinese (BIG5) character sets. Three input method shared
   library modules are provided, based on CCE (Chinese Console Environment)
   and TL (TurboLinux). Together these modules support four input
   methods: PinYin, TurboLinux PinYin, GBK, and Smart PinYin.

   * The term double-byte is used to indicate any character set encoding
     method that uses 2 bytes. Unicon currently supports only 2 byte encoding
     methods.
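
   The footnote above can be made concrete with a small sketch. The byte
   ranges below are the commonly documented lead/trail byte ranges for
   GB 2312 (in its EUC-CN form), GBK, and Big5; they are not taken from
   the Unicon source. The enum and function names are hypothetical and
   exist only for this illustration.

```c
#include <stddef.h>

/* Encodings named in this FAQ. The enum itself is hypothetical. */
enum dbcs_encoding { DBCS_GB, DBCS_GBK, DBCS_BIG5 };

/* Return 1 if (b1, b2) is a valid lead/trail byte pair in the encoding. */
static int dbcs_is_pair(enum dbcs_encoding enc,
                        unsigned char b1, unsigned char b2)
{
    switch (enc) {
    case DBCS_GB:   /* GB 2312, EUC-CN form */
        return b1 >= 0xA1 && b1 <= 0xF7 && b2 >= 0xA1 && b2 <= 0xFE;
    case DBCS_GBK:
        return b1 >= 0x81 && b1 <= 0xFE &&
               b2 >= 0x40 && b2 <= 0xFE && b2 != 0x7F;
    case DBCS_BIG5:
        return b1 >= 0x81 && b1 <= 0xFE &&
               ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE));
    }
    return 0;
}

/*
 * Count characters in a byte string. A valid pair consumes two bytes
 * and counts as one character; every other byte (ASCII or an
 * unpaired high byte) counts as one character on its own.
 */
static size_t dbcs_count_chars(enum dbcs_encoding enc,
                               const unsigned char *s, size_t len)
{
    size_t i = 0, n = 0;

    while (i < len) {
        if (s[i] >= 0x80 && i + 1 < len && dbcs_is_pair(enc, s[i], s[i + 1]))
            i += 2;
        else
            i += 1;
        n++;
    }
    return n;
}
```

   For example, the five bytes 41 D6 D0 CE C4 (the letter "A" followed by
   two GB 2312 characters) count as three characters under DBCS_GB. This
   is why a console that treats every byte as one cell cannot display
   such text correctly.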

Q. Is there a Unicon WWW page and mailing list?
A. Homepage:  http://turbolinux.com.cn/TLDN/chinese/project/unicon/index.html
   Mailing List: unicon@turbolinux.com.cn

Q. Where can Unicon be downloaded from?
A. http://turbolinux.com.cn/TLDN/chinese/project/unicon/download.html

Q. Who wrote Unicon?
A. Chris Li <chrisl at gnuchina dot org>
   Arthur Ma <arthur.ma@turbolinux.com.cn>
   Justin Yu <justin.yu@turbolinux.com.cn>

Q. Why is Unicon needed?
A. The Linux kernel by itself does not include support for double-byte
   language input and display on the console. If an end user who is
   logged on to the Linux console uses the cat(1) program to display the
   contents of a double-byte language file, the file will not display
   correctly. The user will also be unable to input double-byte
   characters to applications on the console.
   
   To add double-byte language support, either a user-mode application
   can be added to Linux, or the Linux kernel can be changed and used in
   combination with a user-mode application.
   
   Unicon and other solutions are not needed when an individual user
   application provides its own double-byte input and display support.
   But applications that follow this model can result in a lack of
   consistent input and display support which can be confusing and
   inconvenient to the end-user.
   
   Unicon and other solutions are also not needed if the user requires
   only X Window system applications. The X Window system has good
   solutions for input and display of double-byte languages. (Note: The
   Unicon shared library input methods or some replacement are still
   required.) But where a user's requirements or system do not allow
   the use of the X Window system, a console mode double-byte input and
   display system is required.

Q. What about other text mode double-byte language support programs?
A. There are many other double-byte language support programs for the
   Linux console, including the precursors to Unicon named CCE and Kon.
   These solutions do not support shared library input methods. They
   also do not make changes to the Linux kernel. The advantages and
   disadvantages of changing the kernel are discussed below.
   XXX - Add more information about CCE and Kon.

Q. What is the architecture of Unicon?
A. Unicon includes kernel driver changes, kernel modules, the unicon daemon,
   and shared libraries. The diagram below describes the relationship of
   these different components.


               Unicon User/Kernel/Hardware Architecture
               ========================================

                             +------------------+ 
   +-------------------+     | libimm_server.so |     +--------------+
   |   unicon          |  +->|                  |<-+  |   chinput    |
   |                   |  |  |   shared input   |  |  |              |
   |   console         |  |  |   method server  |  |  |              |
   |   daemon          |  |  +------------------+  |  | X Window XIM |
   |                   |  |  |.so input methods |  |  |              |
   |      +------------|  |  |  /usr/local/lib  |  |  |--------------|
   |      |immclient.a |--+  +------------------+  +--| immclient.a  |
   +-------------------+                              +--------------+
     ^  ^ 
     |  |              +---------------+
     |  |              |  text mode    | 
     |  |              | applications  |
     |  |              +---------------+
     |  |                      ^
     |  |                      |                              user
   --------------------------------------------------------------------
     |  |                      |                              kernel
     |  +-----------------+    |          font modules  
     v                    |    |         +--------------+
   +---------------+      |    |         | encode-gb.o  |------+
   |  /dev/unikey  |      |    |         +--------------+      |
   |               |      |    |         +--------------+      |
   | driver module |      |    |         | encode-gbk.o |----+ |
   +---------------+      |    |         +--------------+    | |
        ^                 |    |         +--------------+    | |
        |                 |    |         | encode-big5.o|--+ | |
        |                 |    |         +--------------+  | | |
        v                 v    v                           | | |
   +-------------+    +-------------+    +--------------+  | | |
   | keyboard    |    |/dev/tty[0-6]|    | frame buffer |<-+ | |
   | driver      |<-->|             |<-->| driver       |<---+ |
   |             |    | console tty |    |              |<-----+
   | pc_keyb.c   |    |   driver    |    | fbcon.c      |
   +-------------+    +-------------+    +--------------+
        ^                                      |
        |                                      |
   --------------------------------------------------------------------
        |                                      |               hardware
        |                                      v
   +-----------+                         +------------+
   | keyboard  |                         |  display   |
   +-----------+                         +------------+


Q. Explain the Unicon source directory hierarchy.
A.
      README - readme file
      CREDITS - credits file
      INSTALL - manual installation directions
      configure - configure script
      ImmModules - directory of input method module source files
      client - directory with client library source code (immclient.a)
      datas - input method table data
      include - input method include files
      pth - GNU Portable Threads (pth) library
      server - input method shared library source (imm_server.o)
      unicon - daemon application
      unikey - kernel driver used by unicon for keyboard input

Q. What dependencies does Unicon have on other software?
A. Unicon and the input method libraries depend on the GNU Portable
   Threads library (pth). A version of this library is included with the
   unicon software package; it is not needed if pth is already installed
   on the system. The current version uses pth 1.2.1.

   Unicon also depends on frame buffer support being enabled in
   the kernel. More details about kernel configuration are given
   below.

Q. What is the road map for Unicon?
A. Future plans for Unicon include the following:
   Near Term:
     * Add support for additional double-byte language character sets
       including Japanese and Korean.
     * Support IME Window Input method.

   Long Term:
     * Support more multi-byte character sets.
     * Support multi-lingual operation where multiple language character
       sets can be input and displayed simultaneously.

    XXX - add more here.

Q. Why does Unicon need to patch the kernel?
A. A kernel patch is not strictly required: just as the X Window System
   runs entirely in user mode, a double-byte language console solution
   could as well. The justifications for the Unicon kernel patch, and
   its costs, are listed here:

   Advantages:
   * The kernel patches to the console, keyboard, and frame buffer
     drivers add to the single-byte character support that is already
     present. With these changes a large user mode program that duplicates
     this kernel function is not required. With a kernel patch the user
     mode application is much simpler.
   * The kernel patch to display double-byte language characters modifies
     the frame buffer driver and adds loadable font modules. This patch
     allows double-byte language characters to be displayed without any
     user mode application action. This results in faster display
     performance. Also if the font module is compiled into the kernel
     statically, it is possible for kernel messages to be displayed
     in a double-byte language.
   * Warning: This is not a technical argument. Rightly or wrongly
     there is a perceived value to using a kernel solution for
     double-byte language support. To some end-users it makes the
     solution seem more official or technically advanced.

   Disadvantages:
   * Adds additional complexity to the kernel.
   * Because the kernel is modified and must be recompiled, adding
     double-byte support to a system is more complicated.
     

     XXX - Add more advantages and disadvantages.

Q. What kernel options need to be enabled to build a kernel for Unicon?
A. First apply the unicon kernel patch.

     1. Run "make menuconfig" or "make xconfig".
        A. Code maturity level options ->Prompt for development...->Y
        B. Character Devices->Unikey->Y
        C. Console Drivers->Frame-Buffer Support->
           a. Support for frame buffer devices->Y
           b. UNICON Support->Y
           c. Double Byte GB encode->M
           d. Double Byte BIG5 encode->M
           e. Double Byte GBK encode->M
           f. VGA 16-color graphics console->Y

     2. Run "make dep; make bzImage". The new kernel will be at
        arch/i386/boot/bzImage.

     3. After the kernel is booted, insmod the unikey driver,
        insmod a font module, and run the unicon program. Detailed
        directions for making and installing Unicon can be found
        in the file INSTALL.

Q. What are the details of the unicon kernel patch?
A. The following list assumes the unicon1.1-2-kernel-2.2.15.patch:

        1. configuration file changes
           linux/arch/i386/config.in
           linux/drivers/char/Config.in
           linux/drivers/char/Makefile
           linux/drivers/video/Config.in
           linux/drivers/video/Makefile

        2. console driver
           linux/drivers/char/console.c

        3. keyboard driver
           linux/include/linux/tty_flip.h
           linux/drivers/char/pc_keyb.c

        4. frame buffer driver
           *linux/include/linux/fb_doublebyte.h
           linux/drivers/video/fbcon.c
           *linux/drivers/video/test.c

        5. /dev/unikey driver - Built as a driver loaded at system
           initialization. Does not need to be loaded if Unicon is
           not used.

           *linux/drivers/char/unikey.h
           *linux/drivers/char/xl_hzfb.c
           *linux/drivers/char/xl_hzfb.h
           *linux/drivers/char/xl_key.h
           *linux/drivers/char/xl_keyhooks.c
           *linux/drivers/char/xl_keyhooks.h
           *linux/drivers/char/xl_keymasks.c
           *linux/drivers/char/xl_unikey.c

        6. Double-byte font table modules. Built as modules and
           loaded at system initialization time. Do not need to
           be loaded if Unicon is not used.

           *linux/drivers/video/encode-big5.c
           *linux/drivers/video/encode-gb.c
           *linux/drivers/video/encode-gbk.c
           *linux/drivers/video/font_big5_16.h
           *linux/drivers/video/font_gb16.h
           *linux/drivers/video/font_gbk16.h

        7. Console default Unicode map change. This change
           is not needed and should be replaced with a call
           to the user command loadunimap.
           linux/drivers/char/consolemap.c
           linux/drivers/char/direct.uni

    * - Indicates new file

    XXX - add more details.

Q. What are the minimal source changes to the kernel needed by Unicon
   so that it can be made an optional part of the system without having
   to re-compile the kernel?
A. The following list assumes the unicon1.1-2-kernel-2.2.15.patch:

   1. CONFIG_UNICON must be enabled before compiling the kernel.
   2. CONFIG_FB (frame buffer support) must be enabled before
      compiling the kernel.
   3. linux/drivers/char/console.c
      Add changes that double the size of the screen buffers,
      support double-byte scrolling, and differentiate double-byte
      characters from IBM graphics characters.
   4. linux/drivers/char/pc_keyb.c
      Add hooks for the /dev/unikey driver. There is a change
      that disables the caps lock key that should be removed.
   5. linux/drivers/video/fbcon.c
      Add changes to display double-byte characters from font modules.

   6. The unikey driver and font modules can be compiled separately
      from the kernel as modules. It is more convenient to include them
      in the kernel source and compile them as modules there, but if this
      is not feasible they need to be compiled separately.

   Not needed:
   7. The consolemap.c change is not needed. Instead at system initialization
      the loadunimap command should be used to load a different UNICODE
      mapping table.
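
   Item 5 above has fbcon.c draw double-byte characters from the font
   modules. The sketch below shows one way such a lookup can work. It
   assumes a 16x16 bitmap font stored as 32 bytes per glyph in row-major
   order over a 94x94 GB 2312 code grid, which is a common layout for
   16-pixel GB fonts. The actual layout of font_gb16.h is not described
   in this FAQ, so the macros and function names here are hypothetical.

```c
#include <stddef.h>

#define GLYPH_W      16                        /* pixels per row     */
#define GLYPH_H      16                        /* rows per glyph     */
#define GLYPH_BYTES  (GLYPH_W / 8 * GLYPH_H)   /* 32 bytes per glyph */
#define GB_CELLS     94                        /* cells per GB row   */

/*
 * Byte offset of the glyph for the pair (b1, b2) in the assumed font
 * table, or -1 if the pair is outside the 94x94 grid.
 */
static long gb16_glyph_offset(unsigned char b1, unsigned char b2)
{
    if (b1 < 0xA1 || b1 > 0xFE || b2 < 0xA1 || b2 > 0xFE)
        return -1;
    return ((long)(b1 - 0xA1) * GB_CELLS + (b2 - 0xA1)) * GLYPH_BYTES;
}

/*
 * Return 1 if pixel (x, y) is set in a glyph bitmap. Each row is two
 * bytes, most significant bit first (assumed bit order).
 */
static int glyph_pixel(const unsigned char *glyph, int x, int y)
{
    return (glyph[y * (GLYPH_W / 8) + x / 8] >> (7 - x % 8)) & 1;
}
```

   A frame buffer driver following this scheme would compute the offset
   for each byte pair, then walk the 16 rows of the glyph and set the
   matching pixels across two character cells. This is one reason the
   console driver change in item 3 has to track which cells belong to a
   double-byte character.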

Q. What is the input method shared library module interface?
A.  The following structure defines the interface between the input method
    server (imm_server.so) and the input method shared library objects in
    /usr/local/lib (CCE_pinyin.so, CCE_hzinput.so, and TL_hzinput.so).

    struct ImmOperation
    {
        char     *name;
        char     *version;
        char     *comments;
        u_long   type;               /* eg. IMM_BIG5 << 24 | IMM_CCE */

        /* File I/O Operation */
        IMM_CLIENT *(*open) (char *szFileName, long type);
        int (*save)  (IMM_CLIENT *p, char *szFileName);
        int (*close) (IMM_CLIENT *p);

        /* Independent Modules support */
        int (*KeyFilter) (IMM_CLIENT *p, u_char key, char *buf, int *len);
        int (*ResetInput) (IMM_CLIENT *p);

        /* Input Area Configuration & Operation */
        int (*ConfigInputArea) (IMM_CLIENT *pImm, int SelectionLen);
        int (*GetInputDisplay) (IMM_CLIENT *pImm, char *buf, long buflen);
        int (*GetSelectDisplay) (IMM_CLIENT *pImm, char *buf, long buflen);
        
        /* Phrase Operation */
        PhraseItem * (*pGetItem) (IMM_CLIENT *p, u_long n);
        int    (*AddPhrase) (IMM_CLIENT *pClient, PhraseItem *p);
        int    (*ModifyPhraseItem) (IMM_CLIENT *p, long n, PhraseItem *pItem);
        int    (*Flush) (IMM_CLIENT *p);
    };
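
    To illustrate how a module might plug into a table of function
    pointers like the one above, here is a minimal sketch. It does not
    use the real Unicon headers: DemoClient and DemoOperation are
    hypothetical, cut-down stand-ins for IMM_CLIENT and ImmOperation,
    and the return convention of KeyFilter (1 when text is committed to
    buf, 0 otherwise) is an assumption, not documented behavior. The
    two table entries use GB 2312 byte values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for IMM_CLIENT: holds the keys typed so far. */
typedef struct DemoClient {
    char   keys[16];
    size_t nkeys;
} DemoClient;

/* Hypothetical subset of struct ImmOperation. */
typedef struct DemoOperation {
    const char *name;
    const char *version;
    int (*KeyFilter)(DemoClient *p, unsigned char key, char *buf, int *len);
    int (*ResetInput)(DemoClient *p);
} DemoOperation;

/* Tiny illustrative table: typed keys -> GB 2312 bytes. */
static const struct { const char *keys; const char *text; } demo_table[] = {
    { "zhong", "\xD6\xD0" },
    { "wen",   "\xCE\xC4" },
};

/* Discard any pending keystrokes. */
static int demo_reset(DemoClient *p)
{
    p->nkeys = 0;
    p->keys[0] = '\0';
    return 0;
}

/*
 * Letters are collected; a space looks the collected keys up in the
 * table. Assumed convention: return 1 and fill buf/len when text is
 * committed, otherwise return 0 with *len set to 0.
 */
static int demo_key_filter(DemoClient *p, unsigned char key,
                           char *buf, int *len)
{
    size_t i;

    *len = 0;
    if (key >= 'a' && key <= 'z') {
        if (p->nkeys + 1 < sizeof p->keys) {
            p->keys[p->nkeys++] = (char)key;
            p->keys[p->nkeys] = '\0';
        }
        return 0;
    }
    if (key == ' ') {
        for (i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++) {
            if (strcmp(p->keys, demo_table[i].keys) == 0) {
                size_t n = strlen(demo_table[i].text);
                memcpy(buf, demo_table[i].text, n);
                *len = (int)n;
                break;
            }
        }
        demo_reset(p);
        return *len > 0;
    }
    return 0;   /* all other keys are ignored in this sketch */
}

/* The module exports one table of entry points, as ImmOperation does. */
static const DemoOperation demo_module = {
    "demo-pinyin", "0.0", demo_key_filter, demo_reset
};
```

    A caller in the role of the input method server would feed each
    keystroke to demo_module.KeyFilter and write the committed bytes to
    the console whenever it returns 1. Keeping every entry point behind
    one structure is what lets unicon and chinput share the same .so
    modules.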

Q. What useful features does the input method server support?
A. The input methods support character, word, and phrase input.
   There is a system phrase table as well as a user phrase table
   that the user can add phrases to.
   XXX - add more here.

Q. Where are configuration files maintained?
A. By default unicon is installed to /usr/local/bin and its configuration
   files are installed to /usr/local/lib. The main configuration file
   for unicon is unicon.ini.

Q. How is X Window double-byte language support related to Unicon?
A. The TurboLinux Chinese distribution includes both Unicon and a
   package with double-byte X Window support called ZWinPro. ZWinPro
   includes the XIM (X Input Method), chinput, which shares input methods
   with Unicon using the Unicon input method shared library architecture.
   The TurboLinux Chinese ZWinPro package also includes other utilities
   related to double-byte language input under X Window.

Q. There are a lot of different terms used to describe software that
   runs in different languages. What are some of their definitions?
A. locale - A locale is a subset of a language. Some languages, like
   English and Chinese, are used in different countries or regions.
   In these regions there may be different monetary and weight symbols
   or even different writing. A locale captures all the idiosyncrasies
   of a particular region in one specification.
   
   L10N - L10N is an abbreviation for localization. It refers to the
   process of changing a software program to work for a specific locale.
   After L10N the program will work correctly in only one locale.

   I18N - I18N is an abbreviation for internationalization. It refers
   to the process of changing a software program to work for more
   than one locale.

   Character Set - A character set is a group of characters usually
   defined by a standards committee for use in a particular locale.
   
   Input Method - An input method is a way of inputting characters
   into a computer. For a language with many more characters than will
   fit on a keyboard, a combination of keys must be used to specify
   a particular character. There are many different input methods for
   inputting Chinese into a computer.
   
   Double-Byte Language - A term used to specify a language that has
   more characters than can be encoded in a single byte. Chinese is
   an example of a language that requires more than one byte to encode
   characters.

   GB and GBK - GB and GBK are character set standards of the PRC or
   mainland China. GB in Chinese is short for National Standard. GBK
   is short for National Standard Extensions.

   Big5 - Big5 is a character set standard of Taiwan.

