Week 11
Text as Data I: Discovery and Topics
Lecture notes
- Text analysis I: Counting, Describing and Fitting Topics (html, Jupyter Notebook)
Slides
Readings
Required Reading:
A copy of the book listed below is available in the library; unfortunately, there is no online version. If you cannot get hold of the book, please read one of the articles suggested below.
- Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press. Chapter 2.
Or one of the three below:
Political Science Perspective: Wilkerson, J. and Casas, A. (2017). Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges. Annual Review of Political Science, 20, 529-544.
Economics Perspective: Gentzkow, Matthew, Bryan Kelly, and Matt Taddy. “Text as data.” Journal of Economic Literature 57, no. 3 (2019): 535-574.
Sociology Perspective: https://www.sciencedirect.com/science/article/pii/S0049089X22000904
Additional Readings:
Text Pre-Processing: Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Political Analysis, 26(2): 168-189.
Descriptive Analysis of Text: Linder, Fridolin, Bruce Desmarais, Matthew Burgess, and Eugenia Giraudy. “Text as policy: Measuring policy similarity through bill text reuse.” Policy Studies Journal 48, no. 2 (2020): 546-574.
Topic Models: David Blei, Probabilistic Topic Models