PPOL 5203 - Data Science I: Foundations

McCourt School of Public Policy, Georgetown University

Course Description

This first course in the core data science sequence teaches Data Science for Public Policy (DSPP) students how to synthesize disparate, possibly unstructured data in order to draw meaningful insights. Topics covered include the fundamentals of object-oriented programming in Python; literate programming; an introduction to algorithms and data types; data wrangling, visualization, and extraction; an introduction to machine learning methods; and text analysis. In addition, students will be exposed to Git and GitHub for version control and reproducible research. The objective of the course is to teach students how to incorporate data into their decision-making and analysis. No prior programming experience is assumed or required.

This course has been taught at DSPP before. For this reason, several of the materials here are borrowed from previous iterations of PPOL 564, taught by Dr. Rebecca Johnson and Dr. Eric Dunford.

Goals

After completing this course, students will be able to:

  • Demonstrate a general understanding of Python’s object-oriented programming syntax and data structures.

  • Use version control (Git/GitHub) competently.

  • Manipulate and explore data with Pandas and other tools.

  • Analyze algorithms and data structures at a general level.

  • Extract and process data from structured and unstructured sources.

  • Build some intuition for modeling text data in Python.

  • Apply the basics of machine learning as a modeling approach.

  • Use the basics of SQL to query databases.

Instructors and TAs

Instructor: Professor Tiago Ventura

  • Pronouns: He/Him
  • Email: tv186@georgetown.edu
  • Office hours:
    • Time: Every Thursday, 4pm - 6pm
    • Location: Old North, 312
When should I go to your office hours?

You are all welcome at office hours. You can come to:

  • drink some coffee;

  • talk about soccer;

  • ask what I am researching;

  • ask any question about our class.

All are valid options! And no need to schedule time with me!

TA: Sierra Sikorski (DSPP, Second Year)

  • Email: sps126@georgetown.edu
  • Office Hours:
    • Wednesdays, 12:30 to 1:30 pm, at the Old North Lounge
    • Thursdays, 1:00 to 2:00 pm, via Zoom

Course Infrastructure

Class Website: This class website will be used throughout the course and should be checked on a regular basis for lecture materials and required readings.

Class Slack Channel: The class also has a dedicated Slack channel. The channel serves as an open forum to discuss, collaborate, pose problems and questions, and offer solutions. Students are encouraged to post their questions there, so that the professor and TA can answer them where everyone can see the response. If you’re unfamiliar with Slack, please consult the following start-up tutorial: https://get.slack.help/hc/en-us/articles/218080037-Getting-started-for-new-members. Please follow the invite link to be added to the Slack channel.

Canvas: A Canvas site (http://canvas.georgetown.edu) will be used throughout the course and should be checked on a regular basis for announcements. Course materials will be posted on the class website, not on Canvas, and will not be distributed in class or by e-mail. Support for Canvas is available at (202) 687-4949.

DataCamp: As part of this course, you will have access to a DataCamp classroom that you can use to take DataCamp modules for free. DataCamp courses can be a useful way to practice the concepts we see in class. Although I will not assign specific courses, you can use DataCamp courses to review the topics we cover in class. The lecture notes will cover all our in-class discussions in detail, and DataCamp courses should be considered supplementary material.