Introduction to ICTCLAS
The word is the smallest meaningful unit of a language. Chinese text, as is well known, has no separators between words, so Chinese lexical analysis is a prerequisite for Chinese information processing. Based on years of research, we have developed ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), a Chinese lexical analysis system that uses an approach based on a multi-layer HMM.

ICTCLAS includes word segmentation, part-of-speech (POS) tagging and unknown word recognition. Its segmentation precision is 97.58% (result from a recent official evaluation in the national 973 project). The recall rate for unknown words recognized using role tagging exceeds 90%; for Chinese person names in particular, recall is nearly 98%. The speed of word segmentation and POS tagging is 31.5 KB/s.

All source code, papers and documents for ICTCLAS are freely available from http://www.nlp.org.cn/project/project.php?proj_id=6 or http://www.ict.ac.cn/freeware/003_ictclas.asp. ICTCLAS and 14 other free systems from the Institute of Computing Technology have been widely reported on in China and abroad. As of Sep., ICTCLAS had been downloaded by over 2,000 researchers and commercial organizations from China, Japan, Singapore, Korea, the USA and other countries and regions.

We are honored to distribute ICTCLAS free of charge and to help users solve problems in Chinese lexical analysis. In addition, we provide ICTCLAS.dll for developers to invoke from their own systems. Questions, comments and advice about ICTCLAS are welcome.

Author: Kevin Zhang (张华平); Qun Liu (刘群)
Inst. of Computing Tech., Chinese Academy of Sciences
Email: zhanghp@software.ict.ac.cn
Tel: +86-10-88455001/5/7 to 714
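
To make the precision and recall figures quoted above concrete, here is a minimal sketch of one common way to score a word segmentation against a reference. It is not part of ICTCLAS, and the official evaluation may define its metrics differently. The function names (to_spans, segmentation_scores) and the sample sentence are hypothetical and chosen only for illustration. A predicted word counts as correct when it covers exactly the same characters as a word in the reference.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def segmentation_scores(gold, predicted):
    """Return (precision, recall) of a predicted segmentation.

    Assumption: both lists segment the same underlying character string.
    """
    gold_spans = to_spans(gold)
    pred_spans = to_spans(predicted)
    correct = len(gold_spans & pred_spans)  # words with identical boundaries
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


# Hypothetical example: the person name is wrongly split into two words.
gold = ["张华平", "欢迎", "您"]
predicted = ["张", "华平", "欢迎", "您"]
precision, recall = segmentation_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example two of the four predicted words match the reference, so splitting a single person name lowers both precision and recall. This is one reason unknown word recognition matters for overall segmentation quality.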