Week 1: Course Infrastructure
Introduction
Throughout the semester, we will use a combination of tools. This a summary of the main tools:
CommandLine: primarily to interact with git, install programs and run a few scripts
Python3: for programming taks
Git/Github: for version control, reproucibility and for sharing materials
Jupyter Notebooks: as a main IDE to work with Python
Quarto via RStudio: as a secondary IDE (you can make your primary if you prefer) to coding in Python/R/SQL
Slack: for communication.
Let’s cover how to set up each of these tools in your local machines.
If you run into issues, please reach out to the Teaching Assistant for assistance
CommandLine
At times, we’ll use a unix-based commandline. The commandline will feature into our discussion on using git
and also running Python programs. If you use a Mac or a Linux operating system, then a functioning commandline comes with your operating system. For Apple machines, this is the Terminal.
For Windows (specifically Windows 10), you can enable Linux Bash shell. The following offers a tutorial on how to do this.
If you’re using a version of Windows that pre-dates version 10, then Git Bash offers a program will allow you to use git
commands from your windows machine.
Later in the first class, we will cover some concepts of working with the commandline. You can get a full notebook with a intro to commandline in the materials for week 1
Python3
We’ll use Python3 throughout this course. Below are instructions for downloading Python3 using commandline packages manager (Homebrew
for mac, Chocolatey
for windows).
An alternative way to install Python3 is to download an Anaconda distribution. I will use pip
rather than conda
in the instruction for downloading Python modules. These are simply two ways of downloading and managing open-source software packages. Choose which ever works best for you
Most computers already have python3 installed. You can check if that is your case through your commandline
python3 --version
On some versions of Windows, you may need to use py instead of python3:
py --version
In either case, the output of this command should be something like Python 3.8.5
Jupyter Notebooks
Once you have Python3 on your computer, you can install a Jupyter Notebook. If you downloaded Python3 using Anaconda, then Jupyter Notebook comes with the distribution and requires no further installation on your part. If you are not using Anaconda, you can install Jupyter notebook running the following code using your commandline.
# on your command line
pip install jupyter
You can then activate a Jupyter Notebook from the commandline by typing:
# on your command line
jupyter notebook
Workflow to work with Juyter Notebooks
Here is my workflow to open Jupyter Notebooks using the commandline.
Open the terminal
Navigate (using
cd
) to the folder you want to be the root of your jupyter notebookOpen the notebook (
jupyter notebook
)
It looks like this if I were to open a notebook in the folder I have for this course
# open terminal
cd ppol_5203 jupyter notebook
Workflow with Anaconda.
If you installed Python using Anaconda distribution system (here: https://www.anaconda.com/products/individual). You can open Jupyter through a point-and-click system. It take forever, but it works!
In the lecture notes, you can also find a Introduction to Jupyter notebook. We will cover this in the first class of the course.
Rstudio + Reticulate|Quarto
A quick digression of the R vs Python debate
For some of your classes in the Data Science and Public Policy Masters, you will be using R. Some data scientists and computational social scientists have strong beleifs as to which langugae is better. I, and the DSPP faculty, do not subscribe to that view. Most techniques that are relevent for applied data science can be done in either language.
In my personal opinion, R outperforms Python in data manipulation tasks, visualization and statistical modeling. This is because R started out as a statistical programming environment, and that heritage is still visible. Meanwhile, Python started our a general purpose programming language, which was heavily adopted by computer scientists, which means Python outperforms on abstraction, machine learning tasks, working with more complex data types.
Most importantly, you will be using more one language compared to the other conditional on your career path and the type of tasks you end up working with. If you end up in a team full of computer scientists, it is more likely Python will be your favored language. If you move to work more with social scientists, R will probably be more heavily used. There is no need to chose now. Learn both and broad your horizons. As general programming languages, learning both requires almost the same amount of effort of learning one in isolation.
Back to the course infrastructure
In your classes that are focused on using R
, RStudio
will be your main IDE. However, RStudio
isn’t just for R
. It can handle a number of different languages. We can use Python in RStudio
using the reticulate
package.
I create a full notebook to teach you how to use Python in Rstudio. Check the intro do quarto notebook. Let’s cover some of the installation steps here:
To install RStudio
, download from the following link . reticulate
is an R package that allows one run a Python REPL in the R console. In addition, it allows one to read in and use Python code, and pass data between R and and Python. The following provides instructions on installing reticulate
.
With reticulate
, you can use Rstudio as a IDE for Python. Another option is to use Quarto (the next-generation version of R Markdown) as an unified framework to generate notebooks with text + code. If you’re an R Markdown user, you will see how Quarto is just an extension of the capabilities that were previously provided by R Markdown. Now, instead of .rmd
files, we have .qmd
files. Quarto is already installed with RStudio.
Git
We’ll go over more Git/GitHub instructions during the second class session. Before that session:
Slack
Our course will make use of Slack for internal communication. Enter in our workspace with this link: (https://join.slack.com/t/ppol5203fall2024/shared_invite/zt-2ou5gm0ww-YJUJfR3vTfzN4IuKcULFDQ).
When should you use slack?
Interact with the TAs
Ask questions to your colleagues
Share links that are interesting to the discussions in class.
When I should not use slack?
- If you have a question you believe will require a longer conversation, I prefer if you can stop by at my office hours
Remember, you don’t need to let me know you are going to my office hours. Just stop by!
If you’re new to Slack, check out this tutorial. In the first class, I’ll send out the invitation for everyone to join Slack and we’ll discuss how to use it.