Commonly Used NLP Resources

resource portal

http://nlp.hivefire.com/ NLP News

https://nlppeople.com/ NLP Jobs

http://www.cs.rochester.edu/~tetreaul/conferences.html Computational Linguistics / NLP Conferences

http://www.ldc.upenn.edu/ LDC: The Linguistic Data Consortium

http://www.clt.gu.se/wiki/nlp-resources NLP Resources

http://www.aaai.org/AITopics/html/natlang.html AAAI Topics on NLP

http://www-nlp.stanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources

http://wordnet.princeton.edu/ WordNet

http://www.keenage.com/ HowNet (知网)

http://www.corpus4u.org/ Corpus Linguistics Online (语料库语言学在线)

http://trec.nist.gov/ TREC

  • The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the TIPSTER Text program.

tutorial

http://nlp.cs.berkeley.edu/tutorials/variational-tutorial-slides.pdf Variational Inference in Structured NLP Models, Presented at NAACL 2012 with David Burkett.

http://pages.cs.wisc.edu/~jerryzhu/pub/ZhuCCFADL46.pdf Tutorial on Statistical Machine Learning for NLP 2013

courses

http://www.stanford.edu/class/cs224n/ CS 224N / Ling 284 — Natural Language Processing

http://www.cs.berkeley.edu/~klein/cs288/sp10/ CS 288: Statistical Natural Language Processing, Spring 2010

http://demo.clab.cs.cmu.edu/fa2013-11711/index.php/Main_Page Algorithms for NLP: Basic Information (Fall 2013)

http://www.cs.colorado.edu/~martin/csci5832/lectures_and_readings.html Natural Language Processing, CSCI 5832 FALL 2013

http://www.cs.columbia.edu/~cs4705/ COMS W4705: Natural Language Processing 2013

http://www1.cs.columbia.edu/~julia/courses/CS4705/syllabus10.htm COMS 4705: Natural Language Processing, Fall 2010

http://www1.cs.columbia.edu/~julia/courses/CS4706/syllabus12.htm CS4706: Spoken Language Processing, Spring 2012

http://www.cs.cornell.edu/courses/cs4740/2014sp/ CS 4740/5740 - Introduction to Natural Language Processing, Spring 2014

http://l2r.cs.uiuc.edu/~danr/Teaching/CS546-13/ Machine Learning and Natural Language Spring 2013

http://www.cs.jhu.edu/~jason/465/ Natural Language Processing Course # 600.465 — Fall 2013

http://web.stanford.edu/class/cs224s/ CS 224S/LINGUIST 285 Spoken Language Processing

http://www.umiacs.umd.edu/~resnik/ling773_sp2014/ Ling773/CMSC773/INST728C, Spring 2014 Computational Linguistics II

http://cs.nyu.edu/courses/spring13/CSCI-GA.2590-001/index.html

http://www.cis.upenn.edu/~cis530/ CIS 530 Fall 2013 Computational Linguistics

http://pages.cs.wisc.edu/~jerryzhu/cs769.html CS 769: Advanced Natural Language Processing Spring 2010

http://pages.cs.wisc.edu/~bsnyder/cs769.html

group

http://nlp.stanford.edu/ Stanford NLP group

http://nlp.cs.berkeley.edu/ Berkeley NLP group

http://www.lti.cs.cmu.edu/ CMU Language Technologies Institute

http://nlp.ict.ac.cn/index_zh.php NLP Research Group, Institute of Computing Technology, Chinese Academy of Sciences

http://www.sogou.com/labs/ Sogou Labs

http://linguistics.georgetown.edu/ Department of Linguistics, Georgetown University

http://ir.hit.edu.cn/ Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology (HIT-SCIR)

http://www.childrenshospital.org/research-and-innovation/research-labs/natural-language-processing-lab

https://wiki.umiacs.umd.edu/clip/index.php/Main_Page

http://nlp.cs.nyu.edu/

http://nlp.cis.upenn.edu/

http://www.eng.utah.edu/~cs5340/

Textbook

http://www.cs.colorado.edu/~martin/slp2.html SPEECH and LANGUAGE PROCESSING 2nd edition 2009

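  • 浔雨: The authority of this book (published in Chinese as 自然语言处理综论) goes without saying. The translators are Feng Zhiwei and Sun Le. When I read it I did not yet know who Professor Feng was, but it read very well; a translation this fluent would not be possible without years of accumulated expertise in the field. The book is well regarded both in China and abroad. It covers the concerns of both schools of NLP (the linguistic school and the statistical school), but loses some focus as a result. I personally lean toward the statistical side, so I need to understand statistical …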

http://cognet.mit.edu/library/books/view?isbn=0262133601 Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press. Cambridge, MA: May 1999.

people

http://nlp.stanford.edu/~manning/

http://www.umiacs.umd.edu/~hal/

http://mimno.infosci.cornell.edu/ David Mimno

  • maintainer of MALLET

http://www.cs.berkeley.edu/~klein/ Dan Klein

http://cs.brown.edu/people/ec/home.html Eugene Charniak

http://www.cs.colorado.edu/~martin/

http://www.cs.columbia.edu/~mcollins/

http://www1.cs.columbia.edu/~julia/

http://www.cs.cornell.edu/home/cardie/

http://www.eecs.harvard.edu/shieber/

  • Computational Linguistics

http://l2r.cs.uiuc.edu/~danr/

http://www.cs.jhu.edu/~jason/

http://www.stanford.edu/~jurafsky/

http://www.umiacs.umd.edu/~resnik/

http://cs.nyu.edu/grishman/

http://homes.cs.washington.edu/~taskar/

http://www.cis.upenn.edu/~nenkova/

http://www.cs.utah.edu/~riloff/

http://pages.cs.wisc.edu/~jerryzhu/

http://pages.cs.wisc.edu/~bsnyder/

http://www.cs.cmu.edu/~nasmith/

http://www.cs.cmu.edu/~alavie/

Tools

NLP Toolbox

http://gate.ac.uk GATE

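  • 孔牧: You can add components to it according to its requirements to carry out your own NLP tasks. My project team tried it; although it supports component development, it is still not very flexible, so we ended up building our own pipeline.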

http://nltk.org Natural Language Toolkit(NLTK)

http://mallet.cs.umass.edu MALLET MAchine Learning for LanguagE Toolkit

http://opennlp.apache.org/ OpenNLP

http://alias-i.com/lingpipe/ LingPipe, a toolkit for processing text using computational linguistics.

https://textblob.readthedocs.org/en/dev/ TextBlob: Simplified Text Processing (python)

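https://github.com/HIT-SCIR/ltp Language Technology Platform (LTP), a complete Chinese language processing system developed over ten years by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology.

  • http://www.ltp-cloud.com/ LTP-Cloud (Language Technology Platform Cloud)
  • 孔牧: This is a fairly complete pipeline. Setting quality aside, it provides word segmentation, semantic annotation, dependency parsing, and named entity recognition. It does produce incorrect results, but I could not find anything better.

https://github.com/xpqiu/fnlp/ FNLP, a Chinese natural language processing toolkit

  • 邱锡鹏: I recommend our own FudanNLP.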

English Stemmer

http://snowball.tartarus.org/ Snowball

English POS Tagger

http://nlp.stanford.edu/software/tagger.shtml Stanford POS Tagger

http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/ TreeTagger

http://www.coli.uni-saarland.de/~thorsten/tnt/ TnT

Parser

http://nlp.stanford.edu/software/lex-parser.shtml Stanford Parser

http://nlp.cs.berkeley.edu/software.shtml Berkeley Parser

https://github.com/BLLIP/bllip-parser BLLIP Parser (Charniak-Johnson reranking parser), Copyright Mark Johnson, Eugene Charniak, 24th November 2005 to August 2006

English Keyphrase Extractor

http://www.nzdl.org/Kea/index_old.html KEA keyphrase extraction

English Named Entity Recognizer

http://nlp.stanford.edu/software/CRF-NER.shtml Stanford NER

Chinese Word Segmentation

http://nlp.stanford.edu/software/segmenter.shtml Stanford Word Segmenter

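https://github.com/fxsjy/jieba jieba Chinese word segmentation

http://ictclas.org/ ICTCLAS, the Chinese Academy of Sciences word segmenter

  • 孔牧: A fairly authoritative segmenter. I expect you will end up choosing it as your project's segmentation tool; it has many problems of its own, but I could not find a better open-source project.

http://msdn.microsoft.com/zh-cn/library/jj163981.aspx

  • 孔牧: This one is not open source, of course, but its segmentation is very accurate. Unfortunately, it performs segmentation and entity recognition in a single step, and the segmentation (in the tools it provides) does not include part-of-speech tagging.

https://github.com/ansjsun/ansj_seg ansj_seg, described by the project as a true Java implementation of ICTCLAS whose segmentation quality and speed both exceed the open-source version of ICTCLAS. Features: Chinese word segmentation, person-name recognition, part-of-speech tagging, and user-defined dictionaries.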

speech recognition

http://cmusphinx.sourceforge.net/ CMU Sphinx

Topic Modeling Tools

http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm Matlab Topic Modeling Toolbox 1.4

http://gibbslda.sourceforge.net/ GibbsLDA++

http://code.google.com/p/glda/ GLDA GPU-accelerated Latent Dirichlet allocation training

Search Engines

http://lucene.apache.org/ Lucene

classic papers

Chinese Word Segmentation

http://zhangkaixu.github.io/bibpage/cws.html Bibliography compiled by 张开旭 (Zhang Kaixu)

Information Extraction

(2008) Sunita Sarawagi. Information extraction. Foundations and Trends in Databases.

Language Model

(2000) Rosenfeld, R. Two decades of statistical language modeling: where do we go from here? Proc. IEEE. http://www.cs.cmu.edu/~roni/papers/survey-slm-IEEE-PROC-0004.pdf

(2009) Chengxiang Zhai. Statistical Language Models for Information Retrieval. Lecture Notes.

Parsing

(2009) Sandra Kübler, Ryan McDonald, Joakim Nivre. Dependency Parsing. Synthesis Lectures on Human Language Technologies.

Sentiment Analysis and Opinion Mining

(2008) Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval.

Word Sense Disambiguation

(2009) Navigli, R. Word sense disambiguation: A survey. ACM Computing Surveys.

Topic Models

http://mimno.infosci.cornell.edu/topics.html Topic modeling bibliography

Parsing (syntactic structure analysis; heavy on linguistic knowledge, so it can be rather dry)

Klein & Manning: "Accurate Unlexicalized Parsing"
Klein & Manning: "Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency" (a revolutionary use of unsupervised learning to build a parser)
Nivre "Deterministic Dependency Parsing of English Text" (shows that deterministic parsing actually works quite well)
McDonald et al. "Non-Projective Dependency Parsing using Spanning-Tree Algorithms" (the other main method of dependency parsing, MST parsing)

Machine Translation (can be skipped if you do not work on machine translation, though translation models also have applications in other areas)

Knight "A statistical MT tutorial workbook" (easy to understand, use instead of the original Brown paper)
Och "The Alignment-Template Approach to Statistical Machine Translation" (foundations of phrase based systems)
Wu "Inversion Transduction Grammars and the Bilingual Parsing of Parallel Corpora" (arguably the first realistic method for biparsing, which is used in many systems)
Chiang "Hierarchical Phrase-Based Translation" (significantly improves accuracy by allowing for gappy phrases)

Language Modeling

Goodman "A bit of progress in language modeling" (a survey that describes just about everything related to n-gram language models, including smoothing and clustering)
Teh "A Bayesian interpretation of Interpolated Kneser-Ney" (shows how to get state-of-the-art accuracy in a Bayesian framework, opening the path for other applications)
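
As a rough illustration of the n-gram models that Goodman's survey covers, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. It is not taken from either paper: add-one smoothing is used only because it is the simplest scheme to show, and the toy corpus and the function names `train_bigram` and `bigram_prob` are assumptions made for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        # Pad each sentence with start and end markers.
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy corpus, assumed for illustration only.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)

p_seen = bigram_prob("the", "cat", unigrams, bigrams)    # bigram observed once
p_unseen = bigram_prob("the", "sat", unigrams, bigrams)  # bigram never observed
```

On this toy corpus the vocabulary has 6 entries (including the two boundary markers), so the seen bigram gets probability 2/8 and the unseen one 1/8. The smoothing methods discussed in the papers above, such as Kneser-Ney, redistribute probability mass far more carefully than add-one does.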

Machine Learning for NLP

Sutton & McCallum "An introduction to conditional random fields for relational learning" (CRFs are extremely useful in NLP, and many off-the-shelf tools implement them; this is a fairly accessible paper introducing CRFs, though it is still quite mathematical)
Knight "Bayesian Inference with Tears" (explains the general idea of Bayesian techniques quite well)
Berg-Kirkpatrick et al. "Painless Unsupervised Learning with Features" (this is from this year and thus a bit of a gamble, but this has the potential to bring the power of discriminative methods to unsupervised learning)

Information Extraction

Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora. COLING 1992. (The very first paper for all the bootstrapping methods for NLP. It is a hypothetical work in the sense that it doesn't give experimental results, but it influenced its followers a lot.)
Collins and Singer. Unsupervised Models for Named Entity Classification. EMNLP 1999. (It applies several variants of co-training-like IE methods to the NER task and gives the motivation for doing so. Students can learn from this work the logic of writing a good research paper in NLP.)
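
To give a feel for the kind of lexico-syntactic pattern that Hearst's paper uses to find hyponyms, here is a minimal sketch that matches the single pattern "X such as A, B and C" with a regular expression. It is a simplification, not the paper's method: it handles only single-word terms where the original works on noun phrases, and the names `SEP`, `SUCH_AS` and `extract_hyponyms` are assumptions made for illustration.

```python
import re

# Separators between listed hyponyms: ", ", ", and ", ", or ", " and ", " or ".
SEP = r"(?:, (?:and |or )?| and | or )"

# "X such as A, B and C": X is the hypernym; A, B, C are hyponyms.
# Single words only, a simplifying assumption.
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:" + SEP + r"\w+)*)")


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in re.split(SEP, match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


pairs = extract_hyponyms("He plays instruments such as guitar, piano and drums.")
```

For the example sentence this yields the pairs (guitar, instruments), (piano, instruments) and (drums, instruments). A bootstrapping system would go on to use such pairs to discover further patterns, which this sketch does not attempt.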

Computational Semantics

Gildea and Jurafsky. Automatic Labeling of Semantic Roles. Computational Linguistics 2002. (It opened up the trend in NLP toward semantic role labeling, followed by several CoNLL shared tasks dedicated to SRL. It shows how linguistics and engineering can collaborate with each other. It has a shorter version in ACL 2000.)
Pantel and Lin. Discovering Word Senses from Text. KDD 2002. (Supervised WSD was explored extensively in the early 2000s thanks to the Senseval workshop, but few systems actually benefit from WSD because manually crafted sense mappings are hard to obtain. These days we see a lot of evidence that unsupervised clustering improves NLP tasks such as NER, parsing, SRL, etc,

Reference

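  1. http://www.newsmth.net/nForum/#!article/NLP/43 zibuyu (得之我幸失之我命), "Commonly Used NLP Resources" (NLP常用信息资源), Shuimu Community (水木社区) (Wed Mar 14 23:56:43 2007)

  2. http://www.newsmth.net/nForum/#!article/NLP/3849 zibuyu (得之我幸失之我命), "Commonly Used Open-Source/Free NLP Tools" (NLP常用开源/免费工具), Shuimu Community (水木社区) (Wed Mar 14 23:56:43 2007)

  3. http://www.newsmth.net/nForum/#!article/NLP/5461 zibuyu (得之我幸失之我命), "Classic Surveys in NLP" (NLP领域经典综述), Shuimu Community (水木社区) (Tue Feb 24 11:13:53 2009)

  4. http://www.zhihu.com/question/19929473 "What open-source NLP projects and toolkits are commonly used today?" (目前常用的自然语言处理开源项目/开发包有哪些?) 孔牧, 邱锡鹏, 裴飞, 贺一帆, 武博文

  5. http://www.zhihu.com/question/19895141 "What is the fastest way to get started with natural language processing?" (自然语言处理怎么最快入门?)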