Instructions for Data Science I - Final Project

Data Science and Public Policy, McCourt School of Public Policy, Georgetown University

General Instructions

Data science is an applied field, and the DSPP is particularly geared towards providing students with the tools to make policy and substantive contributions using data and recent computational developments.

In this sense, it is fundamental that you understand how to conduct a complete analysis from collecting data, to cleaning and analyzing it, to presenting your findings.

For this reason, a considerable part of your grade (40%) will come from an independent data science project, applying concepts learned throughout the course.

The project is composed of three parts:

  • A 2-page project proposal (which should be discussed with and approved by me),
  • An in-class presentation,
  • A 10-page project report.

Due dates and breakdowns for the project are as follows:

Requirement        Due           Length          Percentage
Project Proposal   October 31    2 pages         5%
Presentation       December 5    10-15 minutes   10%
Project Report     December 12   10 pages        25%

Logistics

  • Groups of three students.

  • You pick your groups.

  • Before October 24, you should have met with me to discuss your proposal.

  • At least one hour before our meeting, send me a draft of your proposal.

  • Email me saying you are planning to go to my office hours, and I will block the time.

Spreadsheet

After you have your groups, add the information here: Final Projects PPOL 5203

Also remember to add when you want to meet with me. You can pick any office-hours slot (Thursdays, 4-6pm). If the office hours do not work for you, let me know and we will find a solution.

All components of the project:

Project Proposal

For the project proposal, sketch out a general 2-page (single-spaced; 12pt font) plan for your project. The proposal should offer the following information:

  • A high-level statement of the problem you intend to address or the analysis you aim to generate

  • The data source(s) you intend to use

  • Your plan to obtain that data

  • The methods (learned in class) that you aim to employ

    • data wrangling
    • multiple visualizations
    • a text analysis component
    • a machine (statistical) learning component
  • A definition for what “success” means with respect to your project.

    • In your words, what would a successful project look like? What are the results you want to show? What would be a surprising finding for you?

Presentation

You will have the opportunity to present your final project in class. As a data scientist, your presentation skills are as important as your data skills. You should prepare a 10-12 minute presentation. It should cover:

  • Motivation or problem statement

  • Data collection

  • Methods

  • Results

  • Lessons learned and next steps

Project Report

The report is a complete description of the project’s analysis and results. The report should be 10 pages in length (single-spaced; 12pt font) and cover the points presented below. Citations and the front page do not count toward the 10 pages.

Below I’ve outlined points that you should aim to discuss in each section. Note that the paper should read as a cohesive report:

  • Introduction

    • summarize your motivation, present some previous work related to your question
    • present your research question
    • summarize your report
  • Data and Methods

    • Where does the data come from?
    • What is the unit of observation?
    • What are the variables of interest?
    • What steps did you take to wrangle the data?
  • Analysis

    • Describe the methods/tools you explored in your project.
  • Results

    • Give a detailed summary of your results.
    • Present your results clearly and concisely.
    • Please use visualizations instead of tables whenever possible.
  • Discussion

    • Re-introduce your main results
    • State your contributions
    • Where do you want to take this project next?

IMPORTANT: There is no literature review section. You can use the literature to motivate your work, but you don’t need a full section on literature review. Read, for example, papers in the main general-interest journals, like here, here and here. None of these highly accomplished articles has a long literature review.

Submission of the Final Project

The end product should be a GitHub repository that contains:

  • The raw source data you used for the project. If the data is too large for GitHub, talk with me, and we will find a solution.

  • Your proposal

  • A README for the repository that, for each file, describes in detail:

    • Inputs to the file: e.g., raw data; a file containing credentials needed to access an API

    • What the file does: describe major transformations.

    • Output: if the file produces any outputs (e.g., a cleaned dataset; a figure or graph).

  • A set of code files that transform that data into a form usable to answer the question you have posed in your descriptive research proposal.

  • Your final 10-page report (I will share a template later in the semester)

Of course, no commits after the due date will be considered in the assessment.
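
As an illustration only, the per-file README description above (inputs, what the file does, output) can be mirrored at the top of each code file. The sketch below assumes Python and uses hypothetical names throughout: the file name `clean_data.py`, the paths under `data/`, the column `value`, and the functions `clean_rows` and `run` are not prescribed by the course and are only placeholders.

```python
"""clean_data.py (hypothetical file name)

Inputs:  data/raw/survey.csv (hypothetical raw data file with a `value` column)
Does:    drops rows with a missing `value`; converts `value` to a float
Output:  data/clean/survey_clean.csv (hypothetical cleaned dataset)
"""
import csv
from pathlib import Path


def clean_rows(rows):
    """Keep rows with a non-empty `value` field and convert it to float."""
    cleaned = []
    for row in rows:
        raw = (row.get("value") or "").strip()
        if not raw:
            continue  # transformation 1: drop rows with a missing value
        cleaned.append({**row, "value": float(raw)})  # transformation 2: cast to float
    return cleaned


def run(raw_path, clean_path):
    """Read the raw CSV, clean it, write the cleaned CSV, return the row count."""
    raw_path, clean_path = Path(raw_path), Path(clean_path)
    with raw_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    cleaned = clean_rows(rows)
    # Create the output folder if it does not exist yet.
    clean_path.parent.mkdir(parents=True, exist_ok=True)
    with clean_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cleaned)
    return len(cleaned)
```

Under these assumptions, calling `run("data/raw/survey.csv", "data/clean/survey_clean.csv")` would turn the raw file into the cleaned one, and the docstring at the top gives you the text for that file's README entry.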

Templates for writing

Here, I suggest a few templates for writing your reports. In my experience, writing data science reports using LaTeX is a gain in productivity in the long run.

If you want to experiment with LaTeX via Overleaf, you should use this template from PNAS (just remember to switch to a one-column template):

You can also use the Journal of Quantitative Description templates in Word Doc or LaTeX:

Or, you can use templates from Quarto and write your entire project using Quarto or Markdown files.

Example of final projects from previous DSPP Years

Here you can download some examples of projects by last year's DSPP students. These are great projects, and they involve both skills students learned in this class and skills learned in other classes (like causal inference techniques).

PPOL 5203 Projects 2022