Lexicon

  • A Dictionary of Chinese Common Words (by State Language Affairs Commission, and for word segmentation) [download]
  • A Dictionary from People's Daily Corpus (by ICL at Peking Univ., and for word segmentation) [download]
  • Tongyici Cilin (Extended) (by SCIR at Harbin Inst. of Tech., and for information retrieval, opinion mining) [download]
  • Chinese/English Vocabulary for Sentiment Analysis (VSA) (as part of HowNet, and for opinion mining) [download]
  • Chinese Emotion Word Ontology (by IRLab at Dalian Univ. of Tech., and for opinion mining) [download]
  • MPQA Subjectivity Lexicon (by Univ. of Pittsburgh, and for opinion mining) [download]
  • SentiWordNet (by Inst. of Info. Sci. and Tech. of National Research Council in Italy, and for opinion mining) [download]
  • Sentiment Dictionary (by NLPGroup at Tsinghua Univ., and for opinion mining) [download]
  • A Dictionary of Chinese Praise and Blame Words (by me via Xuesheng Biaobianyi Cidian, and for opinion mining) [download]
  • A List of Common "HAO BU" + AP (常见“好不AP”表) (by me, and for opinion mining) [download]
  • A List of Modifiers including negation, degree, and conjunction words (by me, and for opinion mining) [download]
  • A List of English and Chinese stopwords (by me, and for text classification) [download]

Corpus

  • People's Daily Corpus (199801) (by ICL at Peking Univ., and for word segmentation, part-of-speech tagging) [download]
  • SIGHAN2005-Chinese Word Segmentation Bakeoff (by Academia Sinica, City Univ. of Hong Kong, Peking Univ. and Microsoft Research, and for word segmentation) [download]
  • SIGHAN2008-Chinese Word Segmentation Bakeoff (by Shanxi Univ., and for word segmentation) [download]
  • Chinese Spam Corpus (by Wang Bin et al. at ICT of Chinese Academy of Sciences, and for spam filtering) [download]
  • Tan Corpus (by Tan Songbo at ICT of Chinese Academy of Sciences, and for text classification) [download]
  • TC Corpus (by Li Ronglu at Fudan Univ., and for text classification) [download]
  • Reuters-21578 Collection Apté Split, Ohsumed Collection, 20Newsgroups Corpus (for text classification) [download]
  • Multi-Label TC Corpus (by our team, and for multi-label learning) [download]
  • ChnSentiCorp (by Tan Songbo at ICT of Chinese Academy of Sciences, and for opinion mining) [download]
  • MPQA Opinion Corpus (by Janyce Wiebe et al. at Univ. of Pittsburgh, and for opinion mining) [download]
  • Cornell Movie-Review Corpus (by Lillian Lee et al. at Cornell Univ., and for opinion mining) [download]
  • Multiple-Aspect Restaurant Reviews (by Regina Barzilay et al. at MIT, and for opinion mining) [download]
  • Multi-Domain Sentiment Dataset (by Mark Dredze et al. at Johns Hopkins Univ., and for opinion mining) [download]
  • Customer Review Datasets (by Bing Liu et al. at Univ. of Illinois at Chicago, and for opinion mining) [download]
  • NLP&CC2012-Weibo Opinion Analysis Evaluation (by CCF Chinese Information Committee, and for opinion mining) [download]
  • Hotel Review Dataset (by our team, and for opinion mining) [download]

Tool

  • NLPIR Chinese Word Segmentation (or ICTCLAS2013) (by Zhang Huaping at Beijing Inst. of Tech.) [download(old version)] [link]
  • FudanNLP (by NLPGroup at Fudan Univ.) [link]
  • LTP:Language Technology Platform (by SCIR at Harbin Inst. of Tech.) [link]
  • Stanford Parser (by NLPGroup at Stanford Univ.) [link]
  • NLTK:Natural Language Toolkit (by Steven Bird et al.) [download(guided book)] [link]
  • LingPipe:A Toolkit for Processing Text Using Computational Linguistics (by Alias-i, Inc.) [link]
  • MALLET:A Machine Learning for Language Toolkit (by Andrew McCallum et al. at University of Massachusetts Amherst) [link]
  • Mulan:A Java library for multi-label learning (by MLKD Group at Aristotle Univ. of Thessaloniki) [link]

Others

  • Best Paper Awards in Computer Science (by Jeff Huang at Univ. of Washington) [link]
  • An Annotated List of Resources about Statistical NLP (by Christopher Manning at Stanford Univ.) [link]
  • Opinion Mining, Sentiment Analysis, and Opinion Spam Detection (by Bing Liu at Univ. of Illinois at Chicago) [link]
  • Conditional Random Fields Webpage (by Hanna Wallach at Univ. of Massachusetts Amherst) [link]
  • Topic Modeling Resources (by David Blei at Princeton Univ.) [link]
  • Multi-Instance Multi-Label Learning Resources (by Min-Ling Zhang at Southeast Univ.) [link]