CTOOL - A Program for Displaying Chinese Characters in Psychological Experiments
Version: 0.95
License: Free for noncommercial use
Copyright © 1995-2006 Chih-Hao Tsai
Note. Most bitmap fonts mentioned in this article can be downloaded at ftp://ftp.ccu.edu.tw/pub2/chinese/fonts/big5/hbf/.
Read This First
I wrote Ctool in 1993 and last updated it in November 1995. It was originally available at ftp.ifcss.org. However, since ifcss.org has closed, the program is no longer accessible to the public there. Recently a few people asked me where to find the program. (Interestingly, they told me that they wanted to develop a Chinese system for the Palm OS.) I thought about it for a while and decided to make my old code available at my own website.
This article and program are archaic as you might have noticed from the date they were written. I did not update most of the information except for addresses pointing to important files. The information in the article may be obsolete, but the code may still be instructive to those who want to write their own programs to access Chinese bitmap fonts.
Please be aware that the contents of this article, as well as Ctool itself, may be obsolete and will not be updated.
Abstract
Chinese emulation systems provide a convenient way of inputting and displaying Chinese characters, as well as adding Chinese capability to text-mode English software. However, their screen refresh frequencies, ranging from 4.55 Hz to 18.2 Hz, are too low for implementing psychological experiments. CTOOL has been developed to avoid this problem by accessing and displaying the font files of the Chinese emulation systems independently, without executing the emulation system. CTOOL allows researchers to control the presentation of Chinese characters more precisely.
Introduction
Experimental Studies on the Processing of Chinese Writing System
Ever since Hung and Tzeng's (1981) influential paper which brought the processing issue of reading Chinese orthography to many researchers' attention, there have been more and more cognitive studies on the Chinese writing system.
Most studies are experimental in nature. These Chinese experiments are similar to their English counterparts in the tasks they use. Some commonly adopted tasks are the lexical decision task (LDT; e.g., Liu, Zhu, & Wu, 1992; Taft, Huang, & Zhu, 1993; Tsai, 1994), the naming task (e.g., Wu, Chou, & Liu, 1993), rapid serial visual presentation (RSVP; e.g., Chen, 1985; Hung, Tzeng, & Tzeng, 1994), self-paced reading (Chen, 1994), and visually presented recall and recognition tasks (e.g., Liu, Zhu, & Wu, 1994).
In this article, the problems of implementing psychological experiments in the standard Chinese emulation environment are discussed, and CTOOL, which was developed to avoid them, is introduced. CTOOL is a collection of C subroutines for displaying traditional Chinese characters in psychological experiments on IBM-compatible personal computers, without entering a Chinese emulation system.
Computers
In most experiments, computers serve as a tool for two functions: presenting stimuli, and timing (either controlling display duration or measuring reaction times or both). To do millisecond-resolution timing, one must reprogram the 8253/8254 Programmable Interval Timer of an IBM-compatible PC (see Graves & Bradley, 1991, for a review of millisecond timing algorithms). There is no specific hardware requirement; it can be done even on the oldest IBM PC. In spite of that, IBM-compatible PCs were not widely used in Chinese experiments until recently (Wu, 1988).
The major reason is that the hardware had not been powerful enough to display Chinese characters. First of all, at least a 16 x 16 dot matrix is required to represent a Chinese character, so displaying 40 x 25 characters per page requires a graphic mode resolution of at least 640 x 400. Secondly, in order to display Chinese characters efficiently, parts (if not all) of the fonts have to be loaded into memory, because disk access would inevitably slow down the display. Since the size of a typical font file is about 400 KB for 16 x 16 fonts and 1 MB for 24 x 24 fonts (ETen Information System, 1992a), a PC with only 640 KB of main memory is barely capable of using Chinese characters.
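As a rough illustration of the timer reprogramming mentioned at the start of this section, the following sketch shows only the arithmetic involved. The 1,193,182 Hz input clock and the default divisor of 65,536 are standard PC values rather than figures from this article, the function names are hypothetical, and the hardware port writes that actually load the divisor are deliberately left out.

```c
/* Input clock of the PC's 8253/8254 timer in Hz (standard PC value,
   not taken from the article). */
#define PIT_CLOCK_HZ 1193182L

/* Hypothetical helper: divisor to load into timer channel 0 so that
   the timer interrupt fires about target_hz times per second. */
long pit_divisor(long target_hz)
{
    long d = (PIT_CLOCK_HZ + target_hz / 2) / target_hz; /* round to nearest */
    if (d < 1)
        d = 1;
    if (d > 65536L)
        d = 65536L;   /* 65536 is the power-on default divisor */
    return d;
}

/* Tick rate actually obtained with a given divisor. */
double pit_rate_hz(long divisor)
{
    return (double)PIT_CLOCK_HZ / (double)divisor;
}
```

Under these assumptions, pit_divisor(1000) gives 1193, which corresponds to roughly one tick per millisecond, while the default divisor of 65,536 gives about 18.2 ticks per second.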
The Standard Chinese Environment
Chinese Characters
Currently there are two types of Chinese characters in the world: traditional and simplified. The traditional Chinese characters are primarily used in Taiwan and Hong Kong, while the simplified Chinese characters are primarily used in China. These two types of characters differ not only in their complexity, but also in the coding schemes that are used to represent them in computers. Our discussion will be exclusively on traditional Chinese characters.
BIG-5 Coding Scheme
There are many Chinese characters. The estimated number of characters that is used by most people in Taiwan is about 5,666 (Chinese Knowledge Information Processing Group, 1993). Besides these, there are many rarely used characters that are not used in representing modern spoken Chinese but do appear in some classical literature.
In Taiwan and Hong Kong, the standard method for coding Chinese characters is the BIG-5 coding scheme, in which a character is represented as a combination of two bytes. BIG-5 allows a maximum of 19,782 characters to be defined. However, in currently available Chinese emulation systems, only 13,053 characters and about 1,000 special symbols (graphic symbols, punctuation marks, etc.) are used (ETen Information System, 1992b). To be distinguishable from regular ASCII characters (e.g., English letters), the first byte of a BIG-5 Chinese code always has a value greater than $80 (hexadecimal 80, i.e., 128). For a detailed description of the BIG-5 coding scheme, see ETen Information System (1992b).
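The first-byte rule can be sketched in C as follows. This is a minimal illustration, not code from CTOOL: the function names are hypothetical, and the only rule used is the one stated above (a first byte greater than hexadecimal 80). Because the second byte of a BIG-5 code may fall in the ASCII range, the string has to be scanned from its beginning, two bytes at a time for Chinese codes.

```c
#include <stddef.h>

/* Rule stated in the text: the first byte of a BIG-5 code is greater
   than hexadecimal 80. */
int is_big5_lead(unsigned char c)
{
    return c > 0x80;
}

/* Hypothetical helper: count the characters in a NUL-terminated string
   that mixes ASCII (one byte each) and BIG-5 (two bytes each).
   A lead byte with no byte after it is counted as a single-byte
   character so that the scan never runs past the terminator. */
void count_mixed(const unsigned char *s, size_t *n_ascii, size_t *n_chinese)
{
    *n_ascii = 0;
    *n_chinese = 0;
    while (*s != '\0') {
        if (is_big5_lead(s[0]) && s[1] != '\0') {
            ++*n_chinese;
            s += 2;        /* skip the first and the second byte */
        } else {
            ++*n_ascii;
            s += 1;
        }
    }
}
```

For a string such as 'A', $A4 $40, 'B', $A4 $A1, the scan reports two ASCII characters and two Chinese characters, even though the byte $40 on its own would be the ASCII character '@'.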
Chinese Fonts and Chinese Emulation Systems
As mentioned earlier, two basic specifications of Chinese fonts are 16 x 16 and 24 x 24 pixels, respectively. When displayed, each character usually occupies two English letter spaces.
A Chinese emulation system (e.g., ETen Information System, 1992a; Kuo-Chiao Information, 1992) is basically a terminate-and-stay-resident (TSR) program. Once loaded, it switches the display to graphic mode and stays active in the background. The default screen refresh frequency is 4.55 Hz in the ETen Chinese System, and can be increased to 18.2 Hz (ETen Information System, 1992b). During each screen refresh cycle, the Chinese system scans the text video RAM, identifies BIG-5 and regular ASCII codes, then maps their bit-map patterns onto the graphic page. Consequently, such a Chinese emulation environment is able to execute text-mode English software without difficulty.
Psychological Experiments and Problems with the Chinese Emulation Systems
With such an emulation system, the most intuitive way of implementing a Chinese experiment (for example, one using the LDT) is to use the program of its English counterpart and replace all English words with Chinese character strings. However, due to the 4.55 Hz (or 18.2 Hz) refresh cycle of the Chinese system, there is usually a time lag between requesting the display of a character string and the time at which the characters actually appear on the graphic screen. If one starts the timer as soon as the display command is executed, the reaction time (or reading time) obtained will include a random time lag component, which reduces the sensitivity of statistical tests. Such random time lags could even endanger the validity of experiments in which a precise presentation duration is required. It has to be noted that such a refresh rate problem is inevitable in Chinese emulation systems that aim to be compatible with the ASCII text mode.
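The size of this lag can be estimated with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the mean lag assumes that display requests arrive at a uniformly random point within the refresh cycle, which is an assumption made here rather than a claim from the article.

```c
/* Length of one refresh cycle in milliseconds; this is also the
   largest lag a display request can suffer. */
double refresh_period_ms(double refresh_hz)
{
    return 1000.0 / refresh_hz;
}

/* Mean lag, assuming (hypothetically) that requests arrive at a
   uniformly random point within the cycle. */
double mean_lag_ms(double refresh_hz)
{
    return refresh_period_ms(refresh_hz) / 2.0;
}
```

At 4.55 Hz the cycle lasts about 220 ms, and at 18.2 Hz about 55 ms, so under the uniform assumption the average lag would be roughly 110 ms and 27 ms, respectively. Lags of this size are comparable to or larger than many of the effects that reaction time experiments are designed to detect.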
A Solution to the Refreshing Cycle Problem
Our solution to this problem is to write programs that access and display the fonts of the Chinese emulation systems without executing the emulation system. Since the structures of the font files of different Chinese emulation systems are not exactly the same, separate programs have been written for the font files of each system. All the source code was developed under Turbo C 3.0 and should be executed only in standard VGA modes.
ETen Chinese System
The ETen Chinese System (ETen Information System, 1992a) is by far the most popular commercial Chinese emulation system in Taiwan. The font files to be used are listed in Table 1.
Table 1
ETen Chinese Fonts Used by CTOOL-ET
______________________________________________________________________
File Name      Font Size (dots)   File Size (bytes)   Characters
______________________________________________________________________
ascfont.24     12 x 24                  12288         ASCII
ascfont.15      8 x 15                   3840         ASCII
stdfont.24     24 x 24                 942876         Chinese
stdfont.15     16 x 15                 392820         Chinese
spcfont.24     24 x 24                  29376         Special
spcfont.15     16 x 15                  12240         Special
spcfsupp.24    24 x 24                  26280         Supplemental special
spcfsupp.15    16 x 15                  10950         Supplemental special
______________________________________________________________________
The program used to access and display these files is "ctool-et.c"; the header file is "ctool-et.h". To use this program, compile it together with the program that controls the experiment. Alternatively, compile ctool-et first to make an object file (i.e., ctool-et.obj), then link the object file when building the experiment-controlling program.
Open and close font files. Before accessing/displaying characters, the font files must be opened. And before quitting programs, the font files must be closed. For example, assuming that font files are in C:\ET3, the following functions open/close 12 x 24 and 24 x 24 font files.
    open_et24_font("C:\\ET3");
    close_et24_font();
The following functions open/close 8 x 15 and 16 x 15 font files.
    open_et16_font("C:\\ET3");
    close_et16_font();
Display ASCII/Chinese character strings. There are two ways of displaying a character string. One is to use the following functions to display 16 x 15 and 24 x 24 ASCII/Chinese characters. The character string will be displayed from left to right. The display functions should be executed in VGA graphic modes.
    display_et24_string(unsigned char string[], int x, int y, int spacing, unsigned char fg_color);
    display_et16_string(unsigned char string[], int x, int y, int spacing, unsigned char fg_color);
In the above functions, "string" is the character string to be displayed (the string can be a mixture of Chinese and ASCII characters; the functions identify them automatically), "x" and "y" refer to the x- and y-coordinates (in pixels) of the top-left corner of the first character, "spacing" refers to the amount of space (in pixels) that is inserted after displaying an ASCII character, and "fg_color" refers to the foreground color that is used to plot the character string.
Note that since a Chinese character is twice as wide as an ASCII character, the space inserted after a Chinese character is automatically doubled. The above functions plot the characters transparently. That is, the background will not be affected.
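The layout rule just described (spacing after an ASCII character, doubled spacing after a Chinese character) can be used to work out how wide a string will be on screen, for instance to centre it. The helper below is a hypothetical sketch and not part of CTOOL; it assumes that a Chinese glyph is exactly twice as wide as an ASCII glyph of the same font, and that spacing is added after every character including the last.

```c
/* Hypothetical helper, not part of CTOOL: pixel width of a mixed
   ASCII/BIG-5 string laid out by the rules described above.
   ascii_width is the width of one ASCII glyph (12 for the 24-dot
   fonts, 8 for the 15-dot fonts); a Chinese glyph is taken to be
   twice as wide and to be followed by twice the spacing. */
int mixed_string_width(const unsigned char *s, int ascii_width, int spacing)
{
    int width = 0;
    while (*s != '\0') {
        if (*s > 0x80 && s[1] != '\0') {   /* BIG-5 code: two bytes */
            width += 2 * (ascii_width + spacing);
            s += 2;
        } else {                           /* ASCII: one byte */
            width += ascii_width + spacing;
            s += 1;
        }
    }
    return width;
}
```

For two Chinese characters in the 24-dot font with a spacing of 6, the function returns 72. Dropping the trailing 12 pixels of spacing leaves a visible width of 60, and (640 - 60) / 2 = 290 would centre the pair on a 640-pixel-wide screen.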
The "display_et16_string" and "display_et24_string" functions read the fonts and display them on-line. That is, when they are called, they identify the ASCII/Chinese characters in the string, search the font files for the corresponding bit-map patterns, then display them. The disk operations slow down the display to some degree. Consequently, these functions are suitable only for displaying instructions.
To control the display process precisely, one must separate the font reading process from the display process. Assume "UV" is one Chinese character and "WX" is another Chinese character, and that one wants to display 24 x 24 fonts. A 144-byte buffer is needed to store the bit-map patterns of characters "UV" and "WX" (each character requires 72 bytes for a 24 x 24 font, and 30 bytes for a 16 x 15 font).
    unsigned char string[4] = "UVWX";
    unsigned char buffer[144];
The following program is an example of how to get bit-map patterns of Chinese characters. The function "convert_big5_to_serial" converts the BIG-5 code to a serial number, which is returned to the variable "sernum". The "search_et24_big5_font" function then uses the serial number of the character to read the corresponding font file, and store the bit map pattern to the buffer.
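    unsigned sernum;

    /* first character: bytes 0-1 of the string, pattern stored in buffer[0..71] */
    sernum = convert_big5_to_serial(string[0], string[1]);
    search_et24_big5_font(sernum, &buffer[0]);
    /* second character: bytes 2-3 of the string, pattern stored in buffer[72..143] */
    sernum = convert_big5_to_serial(string[2], string[3]);
    search_et24_big5_font(sernum, &buffer[72]);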
The stored bit-map patterns can be displayed by "display_et24_big5_font", one character at a time.
    display_et24_big5_font(&buffer[0], 290, 228, 15);
    display_et24_big5_font(&buffer[72], 326, 228, 15);
The first argument represents the memory address of the first byte of a given character's bit-map pattern. The second and third arguments represent the x- and y-coordinates, respectively. The fourth argument is the foreground color.
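The article does not show how "convert_big5_to_serial" works internally or how the font files are laid out, so the following is only a sketch of one plausible approach, not CTOOL's actual method. It relies on the standard BIG-5 layout, in which each first byte is paired with 157 valid second bytes ($40-$7E and $A1-$FE). All function names are hypothetical, and a real font file may skip unused code ranges or begin with a header, in which case the numbering would differ.

```c
/* Number of valid second bytes per first byte in BIG-5:
   0x40-0x7E (63 values) plus 0xA1-0xFE (94 values). */
#define BIG5_TRAILS_PER_LEAD 157

/* Column of a second byte within its row, or -1 if the byte is not a
   valid BIG-5 second byte. */
int big5_trail_column(unsigned char lo)
{
    if (lo >= 0x40 && lo <= 0x7E)
        return lo - 0x40;
    if (lo >= 0xA1 && lo <= 0xFE)
        return lo - 0xA1 + 63;
    return -1;
}

/* Hypothetical stand-in for a BIG-5-to-serial conversion: position of
   the code (hi, lo) counted from a chosen first code (hi0, lo0).
   Returns -1 for invalid input.  Assumes codes are numbered
   contiguously, which may not hold for real font files. */
long big5_serial_from(unsigned char hi0, unsigned char lo0,
                      unsigned char hi, unsigned char lo)
{
    int c0 = big5_trail_column(lo0);
    int c = big5_trail_column(lo);
    long serial;
    if (c0 < 0 || c < 0)
        return -1;
    serial = ((long)hi - (long)hi0) * BIG5_TRAILS_PER_LEAD + (c - c0);
    return serial < 0 ? -1 : serial;
}

/* Byte offset of a glyph in a headerless file of fixed-size patterns
   (72 bytes per 24 x 24 glyph, 30 bytes per 16 x 15 glyph). */
long glyph_offset(long serial, int bytes_per_glyph)
{
    return serial * (long)bytes_per_glyph;
}
```

The 157-column layout also accounts for the maximum quoted earlier for BIG-5: with 126 possible first bytes ($81 to $FE), 126 x 157 = 19,782 codes.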
KC Chinese System
The KC Chinese System (Kuo-Chiao Information, 1992) is free for noncommercial use, and is available at Internet FTP sites, such as ftp.ifcss.org, cnd.org, or nctuccca.edu.tw. In fact, Kuo-Chiao is still producing commercial software, and the Chinese emulation system is one of its major commercial products. The free KC Chinese System (Version 6.22) is its contribution to Taiwan's Internet users, which is now also available to every Internet user. Please refer to the reference list for ftp sites, directory, and file names. The Internet version has the same function as its commercial counterpart, except that the Internet version does not provide a user's manual. The font files to be used are listed in Table 2.
Table 2
KC Chinese Fonts Used by CTOOL-KC
______________________________________________________________________
File Name      Font Size (dots)   File Size (bytes)   Characters
______________________________________________________________________
kctext24.f00   12 x 24                  12976         ASCII
kctext16.f00    8 x 16                   4608         ASCII
kcchin24.f00   24 x 24                1005376         Chinese
               24 x 24                                Special
               24 x 24                                Supplemental special
kcchin16.f00   16 x 14                 394712         Chinese
               16 x 16                                Special
               16 x 16                                Supplemental special
______________________________________________________________________
The program used to access and display these files is "ctool-kc.c"; the header file is "ctool-kc.h". To use this program, compile it together with the program that controls the experiment. Alternatively, compile ctool-kc first to make an object file (i.e., ctool-kc.obj), then link the object file when building the experiment-controlling program.
CTOOL-KC has exactly the same set of functions as CTOOL-ET, except for function names. They differ only in the identifiers embedded in function names (i.e., et24 vs. kc24; et16 vs. kc16).
There is also another subtle difference. An ETen 16 x 15 character consists of 30 bytes. However, there are both 16 x 14 and 16 x 16 fonts in kcchin16.f00. Therefore, one should reserve 32 bytes for both 16 x 14 and 16 x 16 fonts.
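The buffer sizes quoted for the different fonts follow from a simple rule, sketched below. The function name is hypothetical, and the rule assumes that each row of pixels is padded to a whole number of bytes, which is consistent with the byte counts given in this article.

```c
/* Bytes needed to store one glyph of the given size, assuming each
   row of pixels is padded to a whole number of bytes. */
unsigned glyph_bytes(unsigned width, unsigned height)
{
    return ((width + 7) / 8) * height;
}
```

This gives 72 bytes for a 24 x 24 glyph, 30 bytes for an ETen 16 x 15 glyph, and 28 and 32 bytes for the KC 16 x 14 and 16 x 16 glyphs, so a 32-byte buffer is large enough for either KC size. The same rule agrees with the ASCII files in Table 1 if they hold 256 glyphs each: 256 x 48 = 12,288 bytes for the 12 x 24 font and 256 x 15 = 3,840 bytes for the 8 x 15 font.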
Chinese Character Database (CCDB)
CCDB (Chinese Character Analysis Group for Information Application, the Council for Cultural Planning and Development, Executive Yuan, Taiwan, 1993) is a nearly exhaustive collection of all Chinese characters. CCDB is free for noncommercial use and is available on the Internet. Please refer to the reference list for the FTP site, directory, and file names. CCDB is introduced because the quality of its fonts is higher: each character is represented as a 64 x 64 bit-map pattern.
Re-organize CCDB font files. Most CCDB font files are huge, both because they contain many characters that most people (including the experimenters and participants) will never use, and because each character needs more bytes to represent. A program has been worked out to pick out only those characters defined in the ETen Chinese system, and to make two reduced font files. The reduced font files are listed in Table 3. (Available FTP: Hostname: ftp://dongpo.math.ncu.edu.tw Directory: /pub/shann/chinese/CCDB/ Files: stdfont.64 and spcfont.64.)
Table 3
Recoded CCDB Font Files Used by CTOOL-CC
______________________________________________________________________
File Name      Font Size (dots)   File Size (bytes)   Characters
______________________________________________________________________
stdfont.64     64 x 64                6704128         Chinese
spcfont.64     64 x 64                 208896         Special
______________________________________________________________________
The program used to access and display these files is "ctool-cc.c"; the header file is "ctool-cc.h". To use this program, compile it together with the program that controls the experiment. Alternatively, compile ctool-cc first to make an object file (i.e., ctool-cc.obj), then link the object file when building the experiment-controlling program.
Display Chinese character strings. The functions of CTOOL-CC are very similar to those of CTOOL-ET and CTOOL-KC.
The following two functions open/close font files (stdfont.64 and spcfont.64). The "convert_big5_to_serial" function is the same as that in CTOOL-ET and CTOOL-KC.
    open_etccdb_font(char dir[]);
    close_etccdb_font();
To get the bit-map pattern of a specific character:
search_etccdb_big5_font(unsigned sernum, unsigned char pattern[512], int fontsize);
This differs from its counterparts in CTOOL-ET and CTOOL-KC in the last argument: fontsize. The original CCDB font size is 64 x 64. However, with the "varsize" function (and other related functions) provided by CCDB, one can reduce the font size to 56 x 56, 48 x 48, 40 x 40, or 32 x 32. For example, to get a 48 x 48 font, the fontsize argument should be set to 48.
To display a character string:
display_etccdb_big5_font_string(unsigned char str[], int fontsize, int x, int y, int sp, unsigned char fg_color);
Again, one must specify font size. Note that this function can only display pure Chinese character strings, since CCDB does not have ASCII fonts.
To display a character whose bit-map pattern is already available:
display_etccdb_big5_font(unsigned char pattern[512], int fontsize, int x, int y, unsigned char fg_color);
This differs from its counterparts in CTOOL-ET and CTOOL-KC in the "fontsize" argument. Note that the "display_etccdb_big5_font" function does not reduce the font size. The "fontsize" argument here tells the function the font size of the input bit-map pattern.
Discussion
CTOOL is useful for displaying Chinese characters in psychological experiments, especially reading experiments. Another advantage of CTOOL is that it can handle three different formats of font files (two of which are freely available on the Internet), which increases its flexibility.
Although CTOOL is written for the Turbo C compiler and its Borland Graphics Interface (BGI), there is actually only one function that depends on the graphics library: fill_bitmap_pattern. As a result, CTOOL can be ported to other compilers and graphics libraries without much difficulty.
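To make the porting argument concrete, here is a sketch of how a library-independent plotting routine might look. It is not the actual fill_bitmap_pattern, whose signature is not shown in this article. The names are hypothetical, and the bit-map layout (row by row, each row padded to whole bytes, most significant bit leftmost) is an assumption about the font format. Pixel plotting is delegated to a callback, so only the callback needs to know about the graphics library; an in-memory framebuffer is used here as an example backend.

```c
/* Callback that plots one pixel; with BGI it could wrap putpixel(),
   with another library its equivalent. */
typedef void (*plot_fn)(int x, int y, unsigned char color, void *ctx);

/* Hypothetical portable plotting routine.  Assumes the pattern is
   stored row by row, each row padded to whole bytes, most significant
   bit leftmost.  Only set bits are plotted, so the background is left
   untouched (transparent plotting). */
void plot_bitmap_pattern(const unsigned char *pattern, int width, int height,
                         int x, int y, unsigned char fg_color,
                         plot_fn plot, void *ctx)
{
    int row_bytes = (width + 7) / 8;
    int r, c;
    for (r = 0; r < height; ++r) {
        for (c = 0; c < width; ++c) {
            unsigned char byte = pattern[r * row_bytes + c / 8];
            if (byte & (0x80 >> (c % 8)))
                plot(x + c, y + r, fg_color, ctx);
        }
    }
}

/* Example backend: an in-memory framebuffer with one byte per pixel. */
struct framebuffer {
    int width;
    int height;
    unsigned char *pixels;
};

void plot_to_framebuffer(int x, int y, unsigned char color, void *ctx)
{
    struct framebuffer *fb = (struct framebuffer *)ctx;
    if (x >= 0 && x < fb->width && y >= 0 && y < fb->height)
        fb->pixels[y * fb->width + x] = color;
}
```

Porting to another graphics library would then amount to writing one more small callback in the style of plot_to_framebuffer.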
The use of CTOOL avoids the problem of the slow refresh rate of standard Chinese emulation systems. However, CTOOL cannot get rid of the refresh rate problem completely, because the VGA mode itself also scans the screen at a fixed frequency, usually around 70 Hz. The 70 Hz refresh frequency is tolerable for most purposes, such as LDT, naming, or self-paced reading. For experiments that require stimuli to be displayed for only a very small amount of time, it is suggested that the experiment-controlling program synchronize the displaying with the screen refresh.
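One common way to synchronize with the screen refresh on VGA hardware is to poll the vertical-retrace bit of the VGA status register before drawing. The sketch below is not part of CTOOL. The port address $3DA and bit mask $08 are standard VGA values rather than figures from this article, and the port-reading function is passed in as a parameter so that the sketch compiles with any C compiler; under Turbo C it could presumably be a thin wrapper around inportb(). A simulated port is included so the logic can be tried without VGA hardware.

```c
/* VGA Input Status Register 1 (colour modes) and its vertical-retrace
   bit.  Standard VGA values, not taken from the article. */
#define VGA_STATUS_PORT  0x3DA
#define VGA_VRETRACE_BIT 0x08

/* Function that reads one byte from an I/O port. */
typedef unsigned char (*port_read_fn)(unsigned port);

/* Block until the start of the next vertical retrace: first wait for
   any retrace in progress to end, then wait for the next one to begin.
   Returns the number of port reads performed. */
unsigned long wait_for_retrace_start(port_read_fn read_port)
{
    unsigned long reads = 0;
    while (read_port(VGA_STATUS_PORT) & VGA_VRETRACE_BIT)
        ++reads;                 /* still inside a retrace */
    ++reads;                     /* the read that ended the loop */
    while (!(read_port(VGA_STATUS_PORT) & VGA_VRETRACE_BIT))
        ++reads;                 /* screen is being drawn */
    ++reads;                     /* the read that saw the retrace begin */
    return reads;
}

/* Simulated port for trying the routine off-hardware: the retrace bit
   is set during the first 2 of every 10 reads. */
static unsigned long sim_reads = 0;

unsigned char simulated_port(unsigned port)
{
    (void)port;
    return (unsigned char)((sim_reads++ % 10 < 2) ? VGA_VRETRACE_BIT : 0);
}
```

In an experiment, one would call such a routine immediately before drawing the stimulus and start the timer right after it returns. At 70 Hz one refresh cycle lasts about 14.3 ms, so display durations controlled this way come in multiples of roughly that amount.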
Software and Hardware Requirements
CTOOL must be compiled with Turbo C/C++ (Version 1.0 or greater). Programs compiled with CTOOL must be executed within the MS-DOS environment (Version 3.3 or greater) on an IBM AT or compatible. There is no specific requirement for base memory. CTOOL should be used only in standard VGA modes.
References
Chen, H. C. (1985). Reading Chinese text in sequential display format: Effects of display size. Perceptual and Motor Skills, 61, 595-598.
Chen, H. C. (1994). Comprehension processes in reading Chinese. Paper presented at the 35th Annual Meeting of the Psychonomic Society, St. Louis, MO.
Chinese Character Analysis Group for Information Application, the Council for Cultural Planning and Development, Executive Yuan, Taiwan. (1993). Chinese Character Database (CCDB) [Electronic database]. Available FTP: Hostname: nctuccca.edu.tw Directory: Chinese/CCDB Files: README, cdbdat.zip, execute.zip, newvfimg.zip, nfimg.zip, rareimg.zip, source.zip, and vfimg.zip.
Chinese Knowledge Information Processing Group. (1993). Corpus-based frequency count of characters in journal Chinese (CKIP Technical Report no. 93-01). Taipei, Taiwan: Academia Sinica.
ETen Information System. (1992a). ETen Chinese System (Version 3.51) [Computer software]. Taipei, Taiwan: Author.
ETen Information System. (1992b). Yitian zhongwen xitong jishu shouce [Reference Manual for the ETen Chinese System] (2nd ed.). Taipei, Taiwan: Author.
Graves, R. E., & Bradley, R. (1991). Millisecond timing on the IBM PC/XT/AT and PS/2: A review of the options and corrections for the Graves and Bradley algorithm. Behavior Research Methods, Instruments, & Computers, 23, 377-379.
Hung, D. L., & Tzeng, O. (1981). Orthographic variations and visual information processing. Psychological Bulletin, 90, 377-414.
Hung, D. L., Tzeng, O. J. L., & Tzeng, A. K. Y. (1994). The implicit measure of the repetition blindness effect. Paper presented at the 35th Annual Meeting of the Psychonomic Society, St. Louis, MO.
Kuo-Chiao Information. (1992). Kuo-Chiao Chinese System (Version 6.22) [Computer software]. Available FTP: Hostname: cnd.org Directory: pub/software/dos/c-sys Files: kc622-1.zip, kc622-2.zip, kc622-3.zip, and kc622-4.zip.
Liu, I. M., Zhu, Y., & Wu, J. T. (1992). The long-term modality effect: In search of differences in processing logographs and alphabetic words. Cognition, 43, 31-66.
Taft, M., Huang, J., & Zhu, X. (1993). The influence of character frequency on word recognition responses in Chinese. Paper presented at the Sixth International Symposium on Cognitive Aspects of the Chinese Language. Taipei, Taiwan.
Tsai, C. H. (1994). Effects of semantic transparency on the recognition of Chinese two-character words: Evidence for a dual-process model. Unpublished master's thesis, National Chung-Cheng University, Chia-Yi, Taiwan.
Wu, J. T. (1988). Zai zhongwen diannao nei sheji xinlixue shiyian huo celiang chengshi de yixie wenti yu jiefa [Some problems and solutions of designing psychological experiments or reaction time measuring programs in Chinese computers]. Chinese Journal of Psychology, 30, 105-116.
Wu, J. T., Chou, T. L., & Liu, I. M. (1993). The locus of the character/word frequency effect. Paper presented at the Sixth International Symposium on Cognitive Aspects of the Chinese Language. Taipei, Taiwan.