Week 2: Text Analysis

 Slides.

Can we use machines to help us understand text? In this class we will cover basic text analysis techniques, from word counting to topic modeling. The algorithms we will discuss this week are used in just about everything: search engines, document set visualization, figuring out when two different articles are about the same story, finding trending topics. The vector space document model is fundamental to algorithmic handling of news content, and we will need it to understand how just about every filtering and personalization system works.

Required

  • Online Natural Language Processing Course, Stanford University
    • Week 7: Information Retrieval, Term-Document Incidence Matrix
    • Week 7: Ranked Information Retrieval, Introducing Ranked Retrieval
    • Week 7: Ranked Information Retrieval, Term Frequency Weighting
    • Week 7: Ranked Information Retrieval, Inverse Document Frequency Weighting
    • Week 7: Ranked Information Retrieval, TF-IDF weighting

Recommended

Examples

Assignment: TF-IDF analysis of State of the Union speeches.

Leave a Reply

Your email address will not be published. Required fields are marked *