PPOL 5203 - Data Science I: Foundations


Week 1: Introductions, Installations, IDEs, Command line

Author

Professor: Tiago Ventura

Welcome to Data Science I

Plans for Today

  • Motivating Data Science for Public Policy, or Computational Social Science.

  • Goals of the course

  • Introductions

  • Course Logistics

  • IDEs

    • Jupyter
    • Quarto
  • Introduction to the command line

Why are we here?

Rise of the digital information age

https://www.washingtonpost.com/wp-dyn/content/graphic/2011/02/11/GR2011021100614.html

Powerful and Cheap Computing Power

As a consequence:

  • An abundance of data that researchers can use to study society and governments can use to make better decisions

    • Novel research questions

    • New ways to answer old, long-standing research questions

  • New technologies also have social implications and can generate important policy questions.

    • Privacy issues

    • Use of technology by bad actors.

    • Use of technology by governments to censor/monitor citizens.

    • etc…

  • Policy scholars (but pretty much any researcher) need to be equipped to properly deal with these challenges

Data Science for Public Policy

Data Science for Public Policy focuses on computational approaches to solving and understanding policy problems.

  • Part of a larger and newer field called computational social science

  • But with a stronger policy focus.

  • What is social science? It refers to a domain of study - social phenomena:

    • Encompasses many scales: human psychology, language, economic behavior, political systems, policy problems
    • Involves many approaches: qualitative interviews, statistical analysis, simulations
  • What is Data Science?

    • Uses data (often large-scale) + algorithms to answer questions

An example: Data Donation WhatsApp Groups

All the steps + tools ... so far ...

  • Step 1: Recruiting participants online

    • Online Panels + Facebook Ads
  • Step 2: Running online surveys

    • Qualtrics + R
  • Step 3: Development of data donation pipeline (with MDI)

    • JavaScript (JS) + Python
  • Step 4: Analyze the data

    • SQL + Python + R

Readings for this week

Goals of the course


The goal of this course is to teach you:

  • Computational thinking: how to approach problems and devise solutions from a computational perspective.

  • Python and a bit of SQL for applied data science: enough to get you started and to lay the foundations for the remainder of the core sequence.

  • Workflows and tools: Git/GitHub + command line.


Course Schedule

Introductions

About me

  • Professor Tiago Ventura (he/him)

    • Assistant Professor at the McCourt School.
    • Political Science Ph.D.
    • Postdoc at the Center for Social Media and Politics at NYU.
    • Researcher at Twitter.
  • Research Interests:

    • Social media and politics
    • Computational methods, NLP and LLMs
    • Focus on Global South
  • Outside of work, I enjoy watching soccer and reading sci-fi.

    • Sometimes I enjoy soccer while working!

    • And I am from Brazil!

Quiz!

Which programming language did I use the most at each stage?

  • PhD

  • Postdoc

  • Twitter

  • As a faculty member

A comment from the pre-course survey (from last year)

Hi professor Ventura! I noticed that we gonna learn multiple data analysis tool this semester and I am definitely a novice of data science. I am little worried about how can I master all of them without being confused, because some commands might be very similar.

Your turn!

  • Name

  • (Briefly) what you were up to prior to the DSPP

  • If you could have any data source at your disposal, what would it be?

Logistics

  • Communication: via Slack. Join the workspace!

  • All materials: hosted on the class website: https://tiagoventura.github.io/ppol5203/

  • Syllabus: also on the website.

  • My Office Hours: Every Tuesday from 4pm to 6pm. Just stop by!

  • Canvas: Only for official communication! Materials will be hosted on the website!

  • DataCamp: Additional exercises! I will assign modules for you! Access our free account here


TAs

  • Aastha Jha (DSPP Second-Year Student)
    • Email: aj935@georgetown.edu
    • Office Hours:
      • Every Wednesday, from 1pm to 2pm
  • Shirui Zhou (DSPP Alumni)
    • Email: sz614@georgetown.edu
    • Office Hours:
      • Every Monday, from 1pm to 2pm

Evaluation

| Assignment | Percentage of Grade |
|---|---|
| Participation/Attendance | 5% |
| Coding Discussion | 5% |
| Problem sets | 50% |
| Final Project | 40% |
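
To make the weighting concrete, here is a minimal sketch of how the four components could combine into a final grade. It assumes every component is scored on a 0-100 scale; the function name `final_grade` and the example scores are hypothetical and only illustrate the arithmetic, not the official grading procedure.

```python
# Weights taken from the Evaluation table (as fractions of the final grade).
WEIGHTS = {
    "participation": 0.05,
    "coding_discussion": 0.05,
    "problem_sets": 0.50,
    "final_project": 0.40,
}


def final_grade(scores):
    """Return the weighted average of component scores (assumed 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "participation": 100,
    "coding_discussion": 90,
    "problem_sets": 80,
    "final_project": 85,
}

print(round(final_grade(example_scores), 2))
```

Because problem sets carry half of the weight, a 10-point change in the problem set average moves the final grade by 5 points.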

Problem Sets

Individual submission through GitHub.

| Assignment | Date Assigned | Date Due |
|---|---|---|
| No. 1 | Week 2 | Before EOD of Friday of Week 3 |
| No. 2 | Week 4 | Before EOD of Friday of Week 5 |
| No. 3 | Week 6 | Before EOD of Friday of Week 7 |
| No. 4 | Week 8 | Before EOD of Friday of Week 9 |
| No. 5 | November 10 | Before EOD of Friday of Week 11 |

EOD = 11:59pm!

Final Project

  • You will work in randomly assigned groups!

  • The project is composed of three parts:

    • a 2-page project proposal (which should be discussed with and approved by me),
    • an in-class presentation,
    • a 10-page project report.

Due dates and Points:

| Requirement | Due | Length | Percentage |
|---|---|---|---|
| Project Proposal | October 31 | 2 pages | 5% |
| Presentation | December 10 | 10-15 minutes | 10% |
| Project Report | December 17 | 10 pages | 25% |

ChatGPT

You are allowed to use ChatGPT in this class the same way you would use Google. This means:

  • Do not copy the responses from ChatGPT. A lot of them are wrong or will just not run on your computer.

  • Use ChatGPT as an auxiliary source.

  • If your entire homework comes straight from ChatGPT, I will consider it plagiarism.

  • If you use ChatGPT, I ask you to mention in your code how ChatGPT helped you.

Be mature and make smart decisions. You will not be able to cheat on a coding interview. Remember, you are a master's student now!

Let’s take a break!


Survey Results

Summary of the survey

  • 72% of you have some experience with Python.

  • Only three of you were using primarily Python in your work before!

    • Most others are using R and Excel!
  • You all have Python on your laptops, but some of you still do not have a GitHub account. If you are still having issues after today, talk to your TAs.

  • Main Policy Areas:

    • Social Media/Tech (Talk to me!)
    • Elections (aha! great timing for it!)
    • Education (Talk with Professor Johnson)

Open Ended

  • Most of you are worried or slightly anxious that Python is hard and that you will not be able to keep up.
  • Python is definitely harder than Excel and Stata.

  • But you will be fine!

  • Our approach: We start slow, cover the basics, and move fast!

Transition: Coding!

Set up your course infrastructure

See Course Website

Jupyter:

See Jupyter Notebook on the class website

Note on my approach to notebooks: I will go through the notebooks quite quickly. You should run them yourselves at a later point!

Command Line

See Command Line Tutorial on the class website

Quarto

See Quarto Notebook on the class website

DataCamp Course

Introduction to Shell

See you next week!