GBK Text Codec

The GBK codec provides conversion to and from the Chinese GB 18030, GBK and GB 2312 encodings.

GBK, formally the Chinese Internal Code Specification, is a commonly used extension of GB 2312-80. Microsoft Windows uses it under the name codepage 936.

GBK has been superseded by the Chinese national standard GB 18030-2000, which adds a 4-byte encoding while remaining compatible with GB 2312 and GBK. GB 18030-2000 may be described as a special encoding of Unicode 3.x and ISO/IEC 10646-1.
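
The compatibility relationship described above can be checked with a short sketch. It uses Python's built-in "gb2312", "gbk" and "gb18030" codecs as stand-ins, not the codec documented here, so behavior in edge cases may differ. The sample characters and the helper name encoded_length are chosen for illustration only.

```python
# Illustrative sketch: Python's standard codecs stand in for the codec
# described in this document; they are not the same implementation.

SIMPLIFIED = "中"    # U+4E2D, present in GB 2312
TRADITIONAL = "學"   # U+5B78, added by GBK, absent from GB 2312
THAI = "ก"           # U+0E01, outside GBK, reachable only via GB 18030


def encoded_length(text, encoding):
    """Return the encoded byte length, or None if the text is unencodable."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for ch in (SIMPLIFIED, TRADITIONAL, THAI):
        lengths = {enc: encoded_length(ch, enc)
                   for enc in ("gb2312", "gbk", "gb18030")}
        print(f"U+{ord(ch):04X}", lengths)
```

A character covered by an older standard keeps the same 2-byte sequence in the newer ones, while a character outside GBK is rejected by the older codecs and takes 4 bytes in GB 18030.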

Special thanks to charset gurus Markus Scherer (IBM), Dirk Meyer (Adobe Systems) and Ken Lunde (Adobe Systems) for publishing an excellent GB 18030-2000 summary and specification on the Internet. Some must-read documents are:

  • ftp://ftp.oreilly.com/pub/e