CS 57700: Natural Language Processing
Course Description:
This course will cover the key concepts and methods used in modern Natural Language Processing (NLP). Throughout the course, several core NLP tasks, such as sentiment analysis, information extraction, and syntactic and semantic analysis, will be discussed. The course will emphasize machine-learning and data-driven algorithms and techniques, and will compare several approaches to these problems in terms of their performance, supervision effort, and computational complexity.
Course Outline:
Introduction (1 week)
What is natural language processing? Overview of natural language processing applications, computational linguistics and machine learning.
Language modeling (1 week)
Probability review. Language modeling, smoothing, evaluation. Applications of Language models.
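As an illustrative sketch of the language-modeling unit (not course-provided code), the following builds a bigram model with add-one (Laplace) smoothing, one common smoothing method; the toy corpus is an assumption for demonstration.

```python
from collections import defaultdict

class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.bigram_counts = defaultdict(lambda: defaultdict(int))
        self.context_counts = defaultdict(int)
        self.vocab = set()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.vocab.update(tokens)
            for prev, word in zip(tokens, tokens[1:]):
                self.bigram_counts[prev][word] += 1
                self.context_counts[prev] += 1

    def prob(self, prev, word):
        # Add-one smoothing: every bigram gets a pseudo-count of 1,
        # so unseen bigrams still receive nonzero probability.
        return (self.bigram_counts[prev][word] + 1) / (
            self.context_counts[prev] + len(self.vocab))

lm = BigramLM([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(lm.prob("the", "cat"))  # → 0.25
```

Smoothing matters because a single unseen bigram would otherwise zero out the probability of an entire sentence.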
Text classification (2 weeks)
Generative and discriminative classification models: Naïve Bayes, the perceptron, log-linear models, large-margin classification, multiclass classification, ranking, and hierarchical classification. Applications: sentiment analysis, text categorization.
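Of the generative models named above, Naïve Bayes is the simplest to sketch. The following is a minimal multinomial Naïve Bayes sentiment classifier with add-one smoothing; the two-document training set is an assumption for demonstration.

```python
import math
from collections import defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns counts needed for prediction."""
    word_counts = defaultdict(lambda: defaultdict(int))
    label_counts = defaultdict(int)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        for w in tokens:
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def predict_nb(model, tokens):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            # Add-one smoothed log likelihood of each token given the label.
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb([(["great", "movie"], "pos"),
                  (["boring", "plot"], "neg")])
print(predict_nb(model, ["great", "movie"]))  # → pos
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on longer documents.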
Introduction to Sequence prediction (2 weeks)
Hidden Markov models, the Viterbi algorithm. Discriminative models for sequence prediction. Local vs. global training protocols (MEMM vs. CRF). Applications: part-of-speech tagging, chunking.
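The Viterbi algorithm mentioned above can be sketched in a few lines. In this hedged example the HMM parameters (a toy determiner/noun/verb tagset with hand-picked probabilities) are assumptions for demonstration, not real estimates.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # V[t][s] = log-probability of the best path ending in state s at time t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev  # remember the best predecessor for backtracing
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {"DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},
           "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
           "VB": {"DT": 0.4, "NN": 0.3, "VB": 0.3}}
emit_p = {"DT": {"the": 0.9, "dog": 0.05, "barks": 0.05},
          "NN": {"the": 0.05, "dog": 0.9, "barks": 0.05},
          "VB": {"the": 0.05, "dog": 0.05, "barks": 0.9}}
print(viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p))
```

Dynamic programming makes the search linear in sentence length rather than exponential in the number of possible tag sequences.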
Beyond Sequence Prediction (2 weeks)
Unified view of all the models as loss minimization. Global inference using Integer Linear Programming. Overview of inference in graphical models. Latent variable models. Applications: Semantic Role Labeling, Textual Entailment, Co-reference resolution.
Syntax and Semantics (3 weeks)
In-depth review of algorithms used for solving various problems in syntactic and semantic analysis: constituency parsing using the CYK algorithm, local and global models for dependency parsing, Abstract Meaning Representation (AMR), relation extraction, and semantic parsing.
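The CYK algorithm named above admits a compact recognizer sketch. This is an assumption-laden toy (a hand-written grammar in Chomsky normal form), not the course's implementation.

```python
def cyk(words, grammar, lexicon):
    """CYK recognition: does the CNF grammar derive the word sequence?"""
    n = len(words)
    # chart[i][j] = set of nonterminals that span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i + 1] = {A for A, word in lexicon if word == w}
    for span in range(2, n + 1):          # widths, smallest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # every split point
                for A, (B, C) in grammar:
                    if B in chart[i][k] and C in chart[k][j]:
                        chart[i][j].add(A)
    return "S" in chart[0][n]

grammar = [("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("VP", ("VB", "NP"))]
lexicon = [("DT", "the"), ("NN", "dog"), ("NN", "bone"), ("VB", "ate")]
print(cyk(["the", "dog", "ate", "the", "bone"], grammar, lexicon))  # → True
```

Keeping back-pointers in each chart cell instead of bare nonterminals turns this recognizer into a parser that recovers the tree.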
Deep Learning for NLP (3 weeks)
Distributed representation (word embedding), classification using feed-forward and convolutional networks. Using deep learning for structured data: recurrent and recursive networks.
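Distributed representations are typically compared with cosine similarity. In this hedged sketch the three-dimensional vectors are made-up stand-ins, not trained embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings: semantically related words get nearby vectors.
emb = {"king":   [0.9, 0.1, 0.4],
       "queen":  [0.85, 0.15, 0.45],
       "banana": [0.1, 0.9, 0.2]}
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["banana"]))  # → True
```

With real embeddings the vectors have hundreds of dimensions, but the similarity computation is identical.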
Current Research Topics (2 weeks)