Hiebeler's Chinese Stuff
This page has some Chinese language resources, which I hope may be particularly useful to people learning the language.
Vocabulary files, stories, etc.
- The vocabulary list (in GB format) from "Practical Chinese Reader volume 3" (实用汉语课本第3册, Shi2 Yong4 Han4 Yu3 Ke4 Ben3 volume 3). I may eventually get around to converting it into BIG5 format. Or if someone else does, let me know and I'll put a copy here.
- Here is a link to CEDICT, a free downloadable Chinese-English dictionary originally started by Paul Denisowski. See also the MDBG CEDICT home and MDBG Chinese-English dictionary which lets you look up words on-line (as opposed to being a downloadable dictionary).
- Two children's stories which I published in China, with accompanying vocabulary lists for people learning Chinese.
- A few pictures from a 1998 trip to China, and a few more pictures from another trip to China in 2002. I've been to China seven times so far.
- People have often asked me for suggestions about how to learn Chinese, so I threw together some rough thoughts and info about Learning Chinese which I hope to improve someday.
cedicttools
"cedicttools" is a set of several Perl scripts and an emacs interface I wrote for checking, merging, sorting, and doing vocabulary lookups from CEDICT-format files. I use these tools under Linux, but Perl and emacs are available for other operating systems too, so the tools should work there as well (and if you use cedicttools under another OS, I'd appreciate it if you let me know).
My latest version of these programs is version 1.2.1, released on July 22, 2005. The primary changes made in summer 2005 were making the emacs code work with more modern versions of emacs (version 21) with MULE ("MUltiLingual Environment"), making the cedictmerge program much faster (by using hash tables), and adding a new script, "cedictnew". I also renamed "cedictlookup.doc" to "cedictlookupdoc.txt" because it really is just a plain text file, not a Microsoft Word file. Note that as far as I know, the cedictlookup code does not currently work with UTF-8 encoded Chinese, sorry. I don't know when I will have time to work on that, so if you want to send me a patch that adds UTF-8 support, it would be much appreciated (and acknowledged in the programs).
Update: Jos van Wolput has made an updated version of the tools. In his words: "I modified the script to lookup the new cedict-format vocabulary files containing both traditional and simplified Chinese and tone marks pinyin in UTF8 encoding. It now also converts tone number pinyin to show tone marks (if not present)." His version is available here on his web site.
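To give a rough idea of what converting tone-number pinyin to tone marks involves, here is a minimal Python sketch. It is not taken from Jos van Wolput's version or from cedicttools (which are Perl and emacs-lisp); the function names are made up for illustration, and it assumes the usual placement rule (the mark goes on "a" or "e" if present, on the "o" of "ou", otherwise on the last vowel), with "u:" or "v" standing for "ü" and tone 5 meaning neutral tone.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number 1-4.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable):
    """Convert one syllable such as 'han4' to its tone-mark form."""
    match = re.fullmatch(r"([A-Za-zü:]+)([1-5])", syllable)
    if match is None:
        return syllable  # not tone-number pinyin; leave it untouched
    letters, tone = match.group(1), int(match.group(2))
    # Assumption: 'u:' and 'v' are both used as ASCII stand-ins for 'ü'.
    letters = letters.replace("u:", "ü").replace("v", "ü")
    if tone == 5:
        return letters  # neutral tone carries no mark
    lower = letters.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in TONE_MARKS]
        if not vowels:
            return letters  # e.g. a bare 'r'; nothing to mark
        idx = vowels[-1]
    marked = TONE_MARKS[lower[idx]][tone - 1]
    if letters[idx].isupper():
        marked = marked.upper()
    return letters[:idx] + marked + letters[idx + 1:]


def convert_pinyin(text):
    """Convert space-separated tone-number pinyin to tone marks."""
    return " ".join(mark_syllable(s) for s in text.split())


print(convert_pinyin("Shi2 Yong4 Han4 Yu3 Ke4 Ben3"))
```

Syllables that already carry tone marks have no trailing digit, so they pass through unchanged, which matches the "if not present" behaviour described in the quote.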
Here are tar files of the current and previous releases:
- cedicttools-1.2.1.tar.gz, released on July 22, 2005. The emacs code should work under emacs with or without MULE.
- cedicttools-1.2.tar.gz, released on June 24, 2005. The emacs code will probably only work under emacs with MULE.
- cedicttools-1.1.01.tar.gz, also released on June 24, 2005. This is really the same as version 1.1, released in June 1999, but with my updated address. This is only for older versions of emacs not using MULE. I'm not sure how old your emacs has to be for you to need this; maybe version 20 or 19 or so. Basically, the older emacs saw each Chinese character as two characters in the emacs buffers, rather than one, so the code had to always work with pairs of characters.
And here are links to the individual files in the current release (1.2.1):
- COPYING, the GNU General Public License, under which the code is released.
- README, the basic file listing everything in the collection.
- CHANGELOG, listing changes between versions (only starting after version 1.2).
- cedictcheckformat, a Perl script which checks a CEDICT-format file for lines which are not in the standard CEDICT format. It works with both GB and BIG5 format files.
- cedictlookup, a Perl script for looking up vocabulary (by Chinese, pinyin, or English) from CEDICT-format files.
- cedictlookupdoc.txt, documentation for the cedictlookup script.
- cedictlookup.el, emacs-lisp code which calls the cedictlookup script above, so that it's pretty convenient to look up words when reading Chinese from within emacs.
- cedictmerge, a Perl script for merging multiple CEDICT-format files together, and flagging multiple entries with the same Chinese field.
- cedictnew, a Perl script for comparing multiple CEDICT-format files, finding new entries in the later files which are not in the earlier ones. (Very similar to cedictmerge.)
- cedictsort, a Perl script for sorting CEDICT-format files alphabetically by pinyin, as you find in a regular Chinese dictionary. It works with both GB and BIG5 format files.
- samplevocab.gb, a small sample vocabulary file in CEDICT format (contains vocabulary from my two children's stories).
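For readers who want a feel for what the checking, merging, and "new entry" scripts do, here is a small Python sketch. It is not the actual Perl code: the function names are hypothetical, it assumes the classic one-line CEDICT layout `Chinese [pin1 yin1] /gloss 1/gloss 2/`, and it assumes that two entries count as "the same" when their Chinese fields match. A dict plays the role of the hash table.

```python
import re

# Assumed classic CEDICT layout: Chinese [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+) \[([^\]]+)\] /(.+)/$")


def parse_line(line):
    """Return (chinese, pinyin, glosses), or None if the line is malformed."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    chinese, pinyin, glosses = match.groups()
    return chinese, pinyin, glosses.split("/")


def merge(*files):
    """Merge several lists of lines, keyed by the Chinese field.

    Returns (entries, duplicates): a dict mapping each Chinese field to
    its distinct lines, and a sorted list of Chinese fields that ended
    up with more than one distinct line.
    """
    entries = {}
    for lines in files:
        for line in lines:
            parsed = parse_line(line)
            if parsed is None:
                continue  # a format checker would report this line
            bucket = entries.setdefault(parsed[0], [])
            if line.strip() not in bucket:
                bucket.append(line.strip())
    duplicates = sorted(k for k, v in entries.items() if len(v) > 1)
    return entries, duplicates


def new_entries(old_lines, new_lines):
    """Return lines of new_lines whose Chinese field is absent from old_lines."""
    seen = set()
    for line in old_lines:
        parsed = parse_line(line)
        if parsed is not None:
            seen.add(parsed[0])
    result = []
    for line in new_lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[0] not in seen:
            result.append(line.strip())
    return result


old = ["中文 [zhong1 wen2] /Chinese language/", "你好 [ni3 hao3] /hello/"]
new = [
    "中文 [zhong1 wen2] /Chinese (written)/",
    "谢谢 [xie4 xie5] /thanks/",
    "bad line",
]
merged, dups = merge(old, new)
print(dups)
print(new_entries(old, new))
```

Looking each Chinese field up in a dict takes roughly constant time, which is why a hash-table merge scales much better than comparing every entry against every other entry.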