PPOL 5203 - Data Science I: Foundations
Week 1: Introductions, Installations, IDEs, Command line
Welcome to Data Science I
Plans for Today
Motivating Data Science for Public Policy.
Goals of the course
Introductions
Course Logistics
IDEs
- Jupyter
Introduction to the command line
Motivation: Digital Information Age
Abundance of data
Powerful and Cheap Computing
World of Generative AI Models
As a consequence, we live in a world:
With abundance of data we can use for research and to recommend policy decisions:
- Internet data, social media, geo-tracking tools, etc…
With easily available technologies to analyze this data at scale
- Your laptops, cloud computing, ChatBots…
But with new technologies that also have novel social implications:
Privacy issues
Use of technology by bad actors.
Use of technology by governments to censor/monitor citizens.
- Policy scholars (and pretty much any researcher) need to be equipped to work properly with this data and understand the effects of these new technologies
Data Science for Public Policy
Data Science for Public Policy focuses on computational approaches to solving and understanding policy problems.
- It is part of a larger and new field called computational social science
Social science
- Understands human behavior: human psychology, language, economic behavior, political systems, policy problems
- Involves many approaches: qualitative interviews, statistical analysis, simulations
Data Science:
- Tools to work with large-scale data + learning models + novel data sources
Course Schedule
Week | Topic | Date |
---|---|---|
Week 01 | Introduction, Installations, IDEs, Command line | September 09, 2025 |
Week 02 | Version Control, Workflow and Reproducibility: Or a bit of Git & GitHub | September 16, 2025 |
Week 03 | Intro to Python - OOP, Data Types, Control Statements and Functions | September 23, 2025 |
Week 04 | Intro to Python II: Scaling up your code - Iteration, Comprehension and Functions | September 30, 2025 |
Week 05 | From Nested lists to Dataframes: Numpy and Intro to Pandas | October 07, 2025 |
Week 06 | Pandas II: Data Wrangling | October 14, 2025 |
Week 07 | Joining, Tidying and Visualizing Data | October 21, 2025 |
Week 08 | Scraping + APIs | October 28, 2025 |
Week 09 | Statistical Learning | November 04, 2025 |
Week 10 | Text as Data I: Discovery and Topics | November 11, 2025 |
Week 11 | Text as Data II: Supervised Learning | November 18, 2025 |
Week 12 | Generative AI: Classification, Surveys and Prompting (Invited Speaker - Dr. Patrick Wu, American University) | November 25, 2025 |
Week 13 | SQL | December 02, 2025 |
Week 14 | Presentations of Final Projects | December 09, 2025 |
Between Text-as-Data I and II… I might have some updates…
Introductions
Professor Tiago Ventura (he/him)
- Assistant Professor at McCourt School.
- Political Science PhD
- Postdoc at Center for Social Media and Politics - NYU.
- Researcher at Twitter.
My Research: Effects of technology on politics + applications of computational models to social science:
- Global Social Media Deactivation.
- Developing a data donation pipeline for WhatsApp data.
- Measuring Humanness vs AI-Generated Content on Social Media.
- Using LLMs to augment web-browsing data with synthetic data
Outside of work, I enjoy watching soccer, reading sci-fi and running
Quiz!
Which programming language did I use the most at each stage?
PhD
Postdoc
Twitter
As faculty
A comment from the pre-course survey (from last year)
Hi professor Ventura! I noticed that we gonna learn multiple data analysis tool this semester and I am definitely a novice of data science. I am little worried about how can I master all of them without being confused, because some commands might be very similar.
Your turn!
Name
(Briefly) what you were up to prior to the DSPP
If you could have any data source at your disposal, what would it be?
Logistics
Communication: via Slack. Join the workspace!
All materials: hosted on the class website: https://tiagoventura.github.io/ppol5203/
Syllabus: also on the website.
My Office Hours: Every Tuesday from 4 to 5 pm. Just stop by!
Canvas: Only for official communication! Materials will be hosted on the website!
Datacamp: Additional exercises! Access our free account here
Task (5 minutes): go on Slack and send me a message about the data source you chose in your last answer and, if you feel comfortable, add a picture to your profile
TA
- Rebecca Wagner (DSPP Second-Year Student)
- Email: rlw137@georgetown.edu
- Office Hours:
  - In person: Every Monday at 2 pm, McCourt, Room 602
  - Virtual: Every Tuesday at 7 pm. Zoom link
Evaluation
Assignment | Percentage of Grade |
---|---|
Participation/Attendance | 5% |
Coding Discussion | 5% |
Problem sets | 50% |
Final Project | 40% |
Problem Sets
Individual submission through GitHub.
Assignment | Date Assigned | Date Due |
---|---|---|
No. 1 | Week 2 | Before EOD of Friday of Week 3 |
No. 2 | Week 4 | Before EOD of Friday of Week 5 |
No. 3 | Week 6 | Before EOD of Friday of Week 7 |
No. 4 | Week 8 | Before EOD of Friday of Week 9 |
No. 5 | November 10 | Before EOD of Friday of Week 11 |
EOD = 11:59pm!
Final Project
You will work in groups!
The project is composed of three parts:
- a 2-page project proposal (which should be discussed with and approved by me),
- an in-class presentation,
- a 10-page project report.
Due dates and Points:
Requirement | Due | Length | Percentage |
---|---|---|---|
Project Proposal | October 31 | 2 pages | 5% |
Presentation | December 09 | 10-15 minutes | 10% |
Project Report | December 16 | 10 pages | 25% |
GenAI
You are allowed to use GenAI in this class as you would use Google. This means:
Do not copy the responses from GenAI chatbots: a lot of them are wrong or will just not run on your computer.
Use GenAI chatbots as an auxiliary source.
If your entire homework comes straight from GenAI chatbots, I will consider it plagiarism.
If you use GenAI chatbots, I ask you to mention in your code how the chatbot worked for you.
Be mature and make smart decisions. You will not be able to cheat on a coding interview. Remember, you are a master's student now!
Let’s take a break! (10 minutes)
Survey Results
Summary of the survey
Most of you have some experience with Python.
Very few of you were using primarily Python in your work before!
- Most others are using R and Excel!
Most of you have Python on your laptops, but some still do not have a GitHub account. If you are having issues after today, talk to your TAs.
Main Policy Areas:
- Social Media/Tech (Talk to me!)
- Elections (Talk to Professor Warshaw, Bailey or Ladd)
- Education (Talk with Professor Johnson)
Open Ended
This is a big group of international students, and some of you are concerned about not having English as your first language!
- THIS IS FINE! I hope my non-native and strongly accented English will encourage you to participate in class and speak up!
“That’s just what translation is, I think. That’s all speaking is. Listening to the other and trying to see past your own biases to glimpse what they’re trying to say. Showing yourself to the world, and hoping someone else understands.” - R.F. Kuang, Babel
Open Ended
Some of you are slightly anxious about Python and the pace of the class
Python is definitely harder than Excel and Stata.
But you will be fine!
Our approach: We start slow, cover the basics, and move fast!
Transition: Coding!
Set up your course infrastructure
Install Python
Install Jupyter
Set up your Git/GitHub account
- Homework: Try to make one successful Git push before class next week.
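Quick check: once you think everything is installed, a short script like the one below can confirm that Python can find the other tools. This is only a minimal sketch, not an official course script. The function name `check_setup` is hypothetical, and the executable names `jupyter` and `git` are assumptions about a typical installation; yours may differ.

```python
import shutil
import sys

# Executable names are assumptions about a typical install;
# adjust them if your system uses different command names.
REQUIRED_TOOLS = ("jupyter", "git")


def check_setup(tools=REQUIRED_TOOLS):
    """Map each tool name to its location on PATH (None if not found)."""
    return {tool: shutil.which(tool) for tool in tools}


def main():
    # Report which Python interpreter is running this script.
    v = sys.version_info
    print(f"Python {v.major}.{v.minor}.{v.micro} at {sys.executable}")

    # Report whether each required tool is visible on PATH.
    for tool, path in check_setup().items():
        print(f"{tool}: {path if path else 'NOT FOUND'}")


if __name__ == "__main__":
    main()
```

If a tool shows up as NOT FOUND, it may still be installed but missing from your PATH, so bring the output to office hours rather than reinstalling right away.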
Jupyter:
Jupyter Notebook Tutorial on the Class Website
Note on my approach to notebooks: I will go through the notebooks quite quickly. You should run them by yourselves at a later point!
Command Line
Datacamp Course
Additional Materials: Quarto
See the Quarto Notebook on the Class Website