Week 1: Introductions, Installations, IDEs, Command line
Motivating Data Science for Public Policy.
Goals of the course
Introductions
Course Logistics
IDEs
Introduction to the command line
With an abundance of data we can use for research and to inform policy decisions:
With easily available technologies to analyze this data at scale
But with new technologies that also have novel social implications:
Privacy issues
Use of technology by bad actors.
Use of technology by governments to censor/monitor citizens.
Data Science for Public Policy focuses on computational approaches to solve and understand policy problems.
Social science
Data Science:
....
so far ...
Step 1: Recruiting participants online (SS)
Step 2: Running online surveys (SS)
Step 3: Development of data donation pipeline (with MDI) (DS/CS)
Step 4: Analyze the data (SS/DS)
Bit by Bit: Social Research in the Digital Age, by Matthew Salganik
Training Computational Social Science PhD Students for Academic and Non-Academic Careers - Written by me and some colleagues in academia, industry and non-profits
The goal of this course is to teach you:
Computational thinking: how to approach problems and devise solutions from a computational perspective.
Get you started on Python, introduce some useful tools (scraping, text analysis, and API applications of GenAI models) and a bit of SQL for applied data science, and lay the foundations for the remainder of the core DS sequence.
Workflows and tools: Git/GitHub + command line.
Week | Topic | Date |
---|---|---|
Week 01 | Introduction, Installations, IDEs, Command line | September 09, 2025 |
Week 02 | Version Control, Workflow and Reproducibility: Or a bit of Git & GitHub | September 16, 2025 |
Week 03 | Intro to Python - OOP, Data Types, Control Statements and Functions | September 23, 2025 |
Week 04 | Intro to Python II: Scaling up your code - Iteration, Comprehension and Functions | September 30, 2025 |
Week 05 | From Nested lists to Dataframes: Numpy and Intro to Pandas | October 07, 2025 |
Week 06 | Pandas II: Data Wrangling | October 14, 2025 |
Week 07 | Joining, Tidying and Visualizing Data | October 21, 2025 |
Week 08 | Scraping + APIs | October 28, 2025 |
Week 09 | Statistical Learning | November 04, 2025 |
Week 10 | Text as Data I: Discovery and Topics | November 11, 2025 |
Week 11 | Text as Data II: Supervised Learning | November 18, 2025 |
Week 12 | Generative AI: Classification, Surveys and Prompting (Invited Speaker - Dr. Patrick Wu, American University) | November 25, 2025 |
Week 13 | SQL | December 03, 2025 |
Week 14 | Presentations of Final Projects | December 09, 2025 |
Professor Tiago Ventura (he/him)
My Research: Effects of technology in politics + applications of computational models to social science:
Outside of work, I enjoy watching soccer, reading sci-fi and running
PhD
Postdoc
As faculty
Hi professor Ventura! I noticed that we gonna learn multiple data analysis tool this semester and I am definitely a novice of data science. I am little worried about how can I master all of them without being confused, because some commands might be very similar.
Name
(Briefly) what you were up to prior to the DSPP
If you could have any data source at your disposal, what would it be?
Communication: via Slack. Join the workspace!
All materials: hosted on the class website: https://tiagoventura.github.io/ppol5203/
Syllabus: also on the website.
My Office Hours: Every Tuesday from 4 to 5 pm. Just stop by!
Canvas: Only for official communication! Materials will be hosted on the website!
DataCamp: Additional exercises! Access our free account here
Task: go on Slack and send me a message about the data source you chose in your last answer and, if you feel comfortable, add a picture to your profile.
Assignment | Percentage of Grade |
---|---|
Participation/Attendance | 5% |
Coding Discussion | 5% |
Problem sets | 50% |
Final Project | 40% |
Individual submission through GitHub.
Assignment | Date Assigned | Date Due |
---|---|---|
No. 1 | Week 2 | Before EOD of Friday of Week 3 |
No. 2 | Week 4 | Before EOD of Friday of Week 5 |
No. 3 | Week 6 | Before EOD of Friday of Week 7 |
No. 4 | Week 8 | Before EOD of Friday of Week 9 |
No. 5 | November 10 | Before EOD of Friday of Week 11 |
You will work in groups!
The project is composed of three parts:
Requirement | Due | Length | Percentage |
---|---|---|---|
Project Proposal | October 31 | 2 pages | 5% |
Presentation | December 09 | 10-15 minutes | 10% |
Project Report | December 16 | 10 pages | 25% |
You are allowed to use GenAI as you would use Google in this class. This means:
Do not copy the responses from GenAI Chatbots – a lot of them are wrong or will just not run on your computer
Use GenAI Chatbots as an auxiliary source.
If your entire homework comes straight from GenAI Chatbots, I will consider it plagiarism.
If you use GenAI Chatbots, I ask you to mention in your code how ChatGPT worked for you.
Most of you have some experience with Python.
Very few of you were using primarily Python in your work before!
Most of you have Python on your laptops; some still do not have a GitHub account. If you are having issues after today, talk to your TAs.
Main Policy Areas:
This is a big group of international students, and some are concerned about not having English as your first language!
“That’s just what translation is, I think. That’s all speaking is. Listening to the other and trying to see past your own biases to glimpse what they’re trying to say. Showing yourself to the world, and hoping someone else understands.” - R.F. Kuang, Babel
Some of you are slightly anxious about Python and the pace of the class
Python is definitely harder than Excel and Stata.
But you will be fine!
Our approach: We start slow, cover the basics, and move fast!
Install Python
Install Jupyter
Set up your Git/GitHub account
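A quick way to check the installation steps above is a short Python script. This is only a minimal sketch, not part of the official course materials: the helper name `check_setup` is hypothetical, and it assumes that the `jupyter` and `git` commands are discoverable on your PATH once installed.

```python
import shutil
import sys


def check_setup():
    """Report which course tools appear to be available on this machine.

    Assumption: an installed tool exposes a command on the PATH
    ("jupyter" for Jupyter, "git" for Git). A tool installed in a
    different environment may show up as missing here.
    """
    return {
        # Version of the Python interpreter running this script
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # True if a "jupyter" executable can be found on the PATH
        "jupyter_on_path": shutil.which("jupyter") is not None,
        # True if a "git" executable can be found on the PATH
        "git_on_path": shutil.which("git") is not None,
    }


status = check_setup()
for tool, value in status.items():
    print(f"{tool}: {value}")
```

If either check prints `False`, the tool may be missing or installed in another environment. In that case, talk to your TAs.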
Jupyter Notebook Tutorial on the class website
Note on my approach to notebooks: I will go through the notebooks quite quickly. You should run them yourselves at a later point!
See the Quarto Notebook on the class website
Data science I: Foundations