PPOL 5203 Data Science I: Foundations

Week 2: Practice Exercise

Tiago Ventura

Question 1: Writing good code!

Consider the Python snippet below, which simulates data from a linear model and plots the outcome against the predictor.

This code is poorly written. It has the following issues:

  • No comments documenting what the code is doing
  • Poor readability
  • Graphs with no meaning
  • Lines of code doing multiple things (you could split them up to make the code more readable).

As an exercise, provide refactored code as the solution to this problem. Your solution needs to fix all the issues mentioned above. Check the slides if you need help.

In [2]:
import numpy as np
import matplotlib.pyplot as plt


a = np.random.rand(100)
b = 5 + 3*a + np.random.normal(size=100)
plt.scatter(a, b, color = 'red')
Out[2]:
<matplotlib.collections.PathCollection at 0x127d1c610>
In [3]:
# import your packages
import numpy as np
import matplotlib.pyplot as plt

# create variables
ind_variable = np.random.rand(100)
intercept = 5
slope = 3
error_term = np.random.normal(size=100)

# simulate the data generation process
dep_variable = intercept + slope*ind_variable + error_term

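# plot the independent variable (x-axis) against the dependent variable (y-axis)
plt.scatter(ind_variable, dep_variable, color = 'red')
plt.xlabel("Independent variable")
plt.ylabel("Dependent variable")
plt.title("Simulated data: y = 5 + 3x + noise")
plt.show()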
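
The refactored cell simulates and plots the data, but it never estimates the regression line itself. The sketch below is one possible way to add that step. It is not part of the original solution: the function names `simulate_linear_data` and `fit_ols`, the seed, and the larger sample size are all assumptions chosen for illustration. It uses only the Python standard library and the closed-form least-squares formulas, so it runs without extra packages; in the notebook the same thing could be done with numpy.

```python
import random


def simulate_linear_data(n_obs, intercept, slope, seed):
    """Simulate y = intercept + slope * x + noise, with x ~ U(0, 1) and noise ~ N(0, 1)."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    x = [rng.random() for _ in range(n_obs)]
    y = [intercept + slope * xi + rng.gauss(0, 1) for xi in x]
    return x, y


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope_hat = cov_xy / var_x
    intercept_hat = mean_y - slope_hat * mean_x
    return intercept_hat, slope_hat


# Hypothetical settings: same intercept and slope as the exercise, larger sample
x, y = simulate_linear_data(n_obs=10_000, intercept=5, slope=3, seed=42)
intercept_hat, slope_hat = fit_ols(x, y)
print(f"estimated intercept: {intercept_hat:.2f}, estimated slope: {slope_hat:.2f}")
```

With a sample this large, the estimates should land close to the true values of 5 and 3. Splitting simulation and estimation into separate functions also follows the exercise's advice that each piece of code should do one thing.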

Question 2: Initialize version control with git.

Create a folder called “Practice” on your local computer. Use the command line to navigate to that folder. Next, initialize a git repository in that folder. Do all these steps using the command line.

In [ ]:
# mkdir Practice
# cd Practice
# git init

Question 3: Git

Using the git repository that you created in Question 2, do the following:

  1. Create a .txt file named notes_about_ds5203, either from the command line or by creating the file yourself and moving it into the folder.

  2. Add that file to the staging area.

  3. Commit that file with a clear commit message.

  4. Check to see if the file is staged using git status.

  5. Now update the file (i.e., make some change to it), then stage and commit that change.

  6. Look at your git log to see the two commits you made.

In [ ]:
# create file
touch notes_about_ds5203.txt

# add to the staging area
git add .

# commit
git commit -m "my notes for dspp"

# check the status of the repository
git status

# update the file (make some change and save it)
vim notes_about_ds5203.txt

# stage the change
git add .

# commit the change
git commit -m "update my notes for dspp"

# log
git log
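
The exercise itself should be done at the command line. As a way to check the sequence end to end in a throwaway directory, the same steps can be scripted. The sketch below is an illustration only: the helper names `run_git` and `two_commit_demo`, the commit messages, and the placeholder identity are assumptions, and it requires git to be installed. It relies on Python's standard `subprocess` module to call the same git commands used above.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_git(args, cwd):
    """Run a git command inside cwd and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def two_commit_demo(repo_dir):
    """Initialize a repository, commit a file twice, and return the commit subjects."""
    repo_dir = Path(repo_dir)
    notes = repo_dir / "notes_about_ds5203.txt"

    run_git(["init"], repo_dir)
    # Placeholder identity, stored for this repository only (no --global)
    run_git(["config", "user.name", "Your Name"], repo_dir)
    run_git(["config", "user.email", "your-email@example.com"], repo_dir)

    # First version of the file: stage it, then commit it
    notes.write_text("first note\n")
    run_git(["add", notes.name], repo_dir)
    run_git(["commit", "-m", "Add course notes file"], repo_dir)

    # Change the file, then stage and commit the change
    notes.write_text("first note\nsecond note\n")
    run_git(["add", notes.name], repo_dir)
    run_git(["commit", "-m", "Add second note"], repo_dir)

    # --format=%s prints one commit subject per line, newest first
    return run_git(["log", "--format=%s"], repo_dir).splitlines()


if shutil.which("git") is None:
    subjects = None
    print("git is not installed; skipping the demo")
else:
    with tempfile.TemporaryDirectory() as tmp:
        subjects = two_commit_demo(tmp)
    print(subjects)
```

The two subjects returned by `git log` correspond to the two commits the question asks for, with the most recent one listed first.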

Question 4: GitHub

  1. Create a new repository on your own GitHub account.

  2. Clone the repository to your local machine.

  3. Make some changes on your local machine.

  4. Push your local changes to your GitHub repository.

In [ ]:
# example code of what would work

# clone
git clone <repo>

# cd to your new folder
cd repo

# modify something, stage and commit
git add .
git commit -m "something"

# push
git push

# if the commit fails because git does not know who you are,
# set up your identity first and then commit again
git config --global user.name "Your Name"
git config --global user.email "your-email@example.com"