Lexicon
- A Dictionary of Chinese Common Words (by State Language Affairs Commission, and for word segmentation) [download]
- A Dictionary from People's Daily Corpus (by ICL at Peking Univ., and for word segmentation) [download]
- Tongyici Cilin (Extended) (by SCIR at Harbin Inst. of Tech., and for information retrieval, opinion mining) [download]
- Chinese/English Vocabulary for Sentiment Analysis (VSA) (as part of HowNet, and for opinion mining) [download]
- Chinese Emotion Word Ontology (by IRLab at Dalian Univ. of Tech., and for opinion mining) [download]
- MPQA Subjectivity Lexicon (by Univ. of Pittsburgh, and for opinion mining) [download]
- SentiWordNet (by Inst. of Info. Sci. and Tech. of National Research Council in Italy, and for opinion mining) [download]
- Sentiment Dictionary (by NLPGroup at Tsinghua Univ., and for opinion mining) [download]
- A Dictionary of Chinese Praise and Blame Words (by me via Xuesheng Biaobianyi Cidian, and for opinion mining) [download]
- A List of Common "HAO BU" + AP (常见“好不AP”表) (by me, and for opinion mining) [download]
- A List of Modifiers including negation, degree, and conjunction words (by me, and for opinion mining) [download]
- A List of English and Chinese stopwords (by me, and for text classification) [download]
Corpus
- People's Daily Corpus (199801) (by ICL at Peking Univ., and for word segmentation, part-of-speech tagging) [download]
- SIGHAN2005-Chinese Word Segmentation Bakeoff (by Academia Sinica, City Univ. of Hong Kong, Peking Univ. and Microsoft Research, and for word segmentation) [download]
- SIGHAN2008-Chinese Word Segmentation Bakeoff (by Shanxi Univ., and for word segmentation) [download]
- Chinese Spam Corpus (by Wang Bin et al. at ICT of Chinese Academy of Sciences, and for spam filtering) [download]
- Tan Corpus (by Tan Songbo at ICT of Chinese Academy of Sciences, and for text classification) [download]
- TC Corpus (by Li Ronglu at Fudan Univ., and for text classification) [download]
- Reuters-21578 Collection Apté Split, Ohsumed Collection, 20Newsgroups Corpus (for text classification) [download]
- Multi-Label TC Corpus (by our team, and for multi-label learning) [download]
- ChnSentiCorp (by Tan Songbo at ICT of Chinese Academy of Sciences, and for opinion mining) [download]
- MPQA Opinion Corpus (by Janyce Wiebe et al. at Univ. of Pittsburgh, and for opinion mining) [download]
- Cornell Movie-Review Corpus (by Lillian Lee et al. at Cornell Univ., and for opinion mining) [download]
- Multiple-Aspect Restaurant Reviews (by Regina Barzilay et al. at MIT, and for opinion mining) [download]
- Multi-Domain Sentiment Dataset (by Mark Dredze et al. at Johns Hopkins Univ., and for opinion mining) [download]
- Customer Review Datasets (by Bing Liu et al. at Univ. of Illinois at Chicago, and for opinion mining) [download]
- NLP&CC2012-Weibo Opinion Analysis Evaluation (by CCF Chinese Information Committee, and for opinion mining) [download]
- Hotel Review Dataset (by our team, and for opinion mining) [download]
Tool
- NLPIR Chinese Word Segmentation (or ICTCLAS2013) (by Zhang Huaping at Beijing Inst. of Tech.) [download (old version)] [link]
- FudanNLP (by NLPGroup at Fudan Univ.) [link]
- LTP:Language Technology Platform (by SCIR at Harbin Inst. of Tech.) [link]
- Stanford Parser (by NLPGroup at Stanford Univ.) [link]
- NLTK:Natural Language Toolkit (by Steven Bird et al.) [download (guided book)] [link]
- LingPipe:A Toolkit for Processing Text Using Computational Linguistics (by Alias-i, Inc.) [link]
- MALLET:A Machine Learning for Language Toolkit (by Andrew McCallum et al. at University of Massachusetts Amherst) [link]
- Mulan:A Java library for multi-label learning (by MLKD Group at Aristotle Univ. of Thessaloniki) [link]
Others
- Best Paper Awards in Computer Science (by Jeff Huang at Univ. of Washington) [link]
- An Annotated List of Resources about Statistical NLP (by Christopher Manning at Stanford Univ.) [link]
- Opinion Mining, Sentiment Analysis, and Opinion Spam Detection (by Bing Liu at Univ. of Illinois at Chicago) [link]
- Conditional Random Fields Webpage (by Hanna Wallach at Univ. of Massachusetts Amherst) [link]
- Topic Modeling Resources (by David Blei at Princeton Univ.) [link]
- Multi-Instance Multi-Label Learning Resources (by Min-Ling Zhang at Southeast Univ.) [link]