PPOL 5203 - Data Science I: Foundations
Week 3: Intro to Python - OOP, Data Types, Control Statements and Functions
Plans for Today
Quick review of Git and remotes with GitHub
In-class exercise about Git
Introduction to Python
Python as an Object-Oriented Programming Language
Data Types in Python
Next week (likely):
Control statements
Functions
The three stages of Git
GitHub (Remote Git Repositories)
Git Remotes: Git + GitHub
Most of the time, you will use Git integrated with GitHub.
GitHub allows multiple researchers to write and share code at the same time.
This is my GitHub workflow when starting a new project. Before you write any code:
Go to your GitHub account and create a new repository
Open your terminal and clone the project
cd into your new local folder
# clone
git clone <url>
# Move your working directory to this new folder
cd <project-directory>
# Write code!
Track your changes:
git add .
Commit:
git commit -m 'describe your commit'
Push the changes in your local repository to GitHub:
git push # or with branch
git push -u origin [branch-name]
The first time you do this, you need to set your identity:
git config --global user.name "Your Name"
git config --global user.email "your-email@example.com"
Can anybody push to my repository?
No, all repositories are read-only for anonymous users. By default, only the owner of the repository has write access. If you can push to your own repo, it’s because you are using one of the supported authentication methods (HTTPS, SSH, …).
If you want to grant someone else privileges to push to your repo, you would need to configure that access in the project settings.
To contribute to projects to which you don’t have push access, you push to your own copy of the repo (a fork) and then open a pull request. Linux is not a good example of this, because the kernel developers do not use GitHub pull requests.
Pull from Remotes
To keep up with your colleagues’ work, you first need to pull their updates from the Git repo.
# go to your repo
cd <gitrepo>
# pull the changes
git pull
See this tutorial
Some additional tasks:
Check the discussion about .gitignore in the lecture notes.
You might need to set up a personal access token to push things to GitHub, see here
Play around with GitHub: README, directories, and issues.
Practice!
Click here to set up your GitHub Classroom and do the in-class exercise for practice.
Git In-Class Exercise (20 minutes)
Solution
Any questions?
Git is probably new to you. Don’t despair if you are struggling with Git; most people do.
Google every time you have a question. That’s a big part of being a data scientist.
Check the cheatsheet and your lecture notes.
Introduction to Python
Python: Object-Oriented Programming Language
In most Python introductions out there, you would start with:
Python as a calculator
Data Types in Python
Objects
We will take a different route, starting with a deeper understanding of Python’s object-oriented programming.
Object-oriented programming (OOP) vs Functional Programming Language
Python is an object-oriented programming language (OOP).
- Key concept: it uses “objects” that contain data and methods (functions).
R is a functional programming language
- Key concept: it treats functions as first-class citizens, meaning they can be assigned to variables, passed as arguments, and returned by other functions
Python OOP
# call library
import pandas as pd
# create a Series object
s = pd.Series([1, 2, 3, 4, 5])
# use a method of this object (OOP) to take the mean
s.mean()
R Functional Programming in Practice
vec <- c(1, 2, 3, 4, 5)
mean(vec)
- Objects are the core of Python.
- From the objects, you will access functions that will make Python work.
- You can also access data stored in objects.
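To see both ideas on a built-in type, here is a small sketch using a complex number (chosen only for illustration): its real and imaginary parts are data stored on the object, while conjugate() is a method you call on it.

```python
# A complex number is an object that carries both data and behavior
z = 3 + 4j

# Data: attributes are read with a dot and no parentheses
print(z.real)         # 3.0
print(z.imag)         # 4.0

# Behavior: methods are functions attached to the object, called with ()
print(z.conjugate())  # (3-4j)
```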
Basics: Creating an object in Python.
The assignment operator in Python is = (different from R, which conventionally uses <-).
x = 4
What happens upon assignment?
Action 1: The name x becomes a reference to an object, which has an id number in memory (its identity).
id(x)
5019018248
Action 2: An object’s type (class) is determined at run time.
type(x)
<class 'int'>
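A short sketch of Actions 1 and 2, using only built-ins: two names can point to the same object (same id), and because the type lives on the object rather than on the name, rebinding a name to a different kind of object changes what type() reports. The variable y is introduced here just for the illustration.

```python
x = 4
y = x                   # y is a second name for the same object
print(id(x) == id(y))   # True: both names point to one object in memory
print(x is y)           # True: `is` compares identity, i.e. the id numbers

# The type belongs to the object, not to the name
x = "four"              # rebind x to a brand-new str object
print(type(x))          # <class 'str'>
print(type(y))          # <class 'int'>: y still refers to the int 4
```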
Action 3: The object’s class is instantiated, i.e., an instance of that class is created. An object is an instance of a particular class.
# what is the class?
type(x)
<class 'int'>
# Access methods (behaviors) using .
x.bit_length()
3
# see all methods
dir(x)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_count', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
Object-Oriented Programming: What are these classes?
Instantiation: When we create an object, we are creating an instance of a class.
Inheritance: A class can inherit attributes and methods from a parent class, and every object has access to everything defined on its class.
A class is a blueprint holding the properties of a particular data structure.
An instance is a realization of a particular class. This instance inherits the characteristics of its class.
Components: Classes have two major components:
Attributes: data, the characteristics of the class or of a particular instance
Methods: actions or behaviors of the class, i.e., functions attached to it
Polymorphism: Both attributes and methods are accessed through the . operator, and the same method name can behave differently depending on the object’s class.
Create a Class
class Example():
    def __init__(self, name):
        self.name = name
    def hello(self):
        print('Hi, I am ' + self.name)
Instantiate
# Instantiate
me = Example(name="Tiago")
type(me)
<class '__main__.Example'>
Attributes
me.name
'Tiago'
method
me.hello()
Hi, I am Tiago
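Building on the Example class, here is a minimal sketch of inheritance and polymorphism. The Student subclass, the name "Ana", and the program "MSDS" are hypothetical, added only for illustration: Student inherits name and hello() from Example, then overrides hello(), so the same .hello() call behaves differently depending on the object’s class.

```python
# Parent class, same as the Example class above
class Example():
    def __init__(self, name):
        self.name = name

    def hello(self):
        print('Hi, I am ' + self.name)


# Hypothetical child class: Student inherits from Example
class Student(Example):
    def __init__(self, name, program):
        super().__init__(name)   # reuse the parent's __init__ to set self.name
        self.program = program   # an extra attribute only Students have

    def hello(self):             # override: same method name, new behavior
        print('Hi, I am ' + self.name + ' and I study ' + self.program)


people = [Example(name="Tiago"), Student(name="Ana", program="MSDS")]
for person in people:
    person.hello()  # polymorphism: Python runs the hello() of each object's class
```

Running the loop should print "Hi, I am Tiago" followed by "Hi, I am Ana and I study MSDS".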
Coding
Data types: Native Classes in Python
Data Type
Python comes with a number of built-in data types. There are two major groups of data types in Python:
Scalar types: hold one piece of information, like a number.
Collection types: hold multiple pieces of information
There are two ways of instantiating a data class in Python:
Literals: syntactic representation in Python, e.g. []
Constructors: class constructors, e.g. list()
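A quick sketch comparing the two routes: a literal and a constructor build equal objects, and constructors can also convert from another type. One known gotcha is included: {} is an empty dict, so an empty set can only be made with the set() constructor.

```python
# Literals: Python syntax that builds the object directly
a = []
b = {}              # careful: {} is an empty dict, not an empty set
c = 4.5

# Constructors: call the class by name
a2 = list()
b2 = dict()
c2 = float("4.5")   # constructors can also convert other types (here, a str)

print(a == a2, b == b2, c == c2)   # True True True
print(type(b))                      # <class 'dict'>

# An empty set has no literal, so the constructor is the only option
empty = set()
print(type(empty))                  # <class 'set'>
```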
Scalar
Type | Description | Example | Literal | Constructor |
---|---|---|---|---|
int | integer types | 4 | x = 4 | int(4) |
float | 64-bit floating point numbers | 4.567 | x = 4.567 | float(4) |
bool | boolean logical values | True | x = True | bool(0) |
None | null object (serves as a valuable placeholder) | None | x = None | |
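A small sketch exercising the scalar types from the table, including two behaviors worth knowing when converting: int() truncates instead of rounding, and bool() treats zero as False.

```python
x = 4
y = 4.567
flag = True
missing = None

print(type(x), type(y), type(flag), type(missing))
# <class 'int'> <class 'float'> <class 'bool'> <class 'NoneType'>

print(float(4))          # 4.0
print(int(4.9))          # 4: int() truncates toward zero rather than rounding
print(bool(0), bool(3))  # False True: zero is falsy, other numbers are truthy
print(missing is None)   # True: the idiomatic test for None uses `is`
```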
Collections
Type | Description | Example | Mutable | Literal | Constructor |
---|---|---|---|---|---|
list | heterogeneous sequences of objects | [1,"2",True] | ✓ | x = ["c","a","t"] | x = list("cat") |
str | sequences of characters | "A word" | ✘ | x = "12345" | x = str(12345) |
tuple | heterogeneous sequence of objects | (1,2) | ✘ | x = (1,2) | x = tuple([1,2]) |
set | unordered collection of distinct objects | {1,2} | ✓ | x = {1,2} | x = set([1,2]) |
dict | associative array of key/value mappings | {"a": 1} | keys ✘ values ✓ | x = {'a':1} | x = dict(a = 1) |
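The Mutable column can be checked directly. In this sketch, a list and a dict are changed in place, a tuple raises a TypeError when you try to assign to one of its items, and a string has to be rebuilt as a new object instead of being edited.

```python
x = ["c", "a", "t"]
x[0] = "b"            # lists are mutable: change an element in place
print(x)              # ['b', 'a', 't']

t = (1, 2)
try:
    t[0] = 99         # tuples are immutable, so this raises TypeError
except TypeError as err:
    print("TypeError:", err)

word = "cat"
new_word = "b" + word[1:]   # strings are immutable: build a new string instead
print(word, new_word)        # cat bat

d = {"a": 1}
d["a"] = 2            # dict values can change
d["b"] = 3            # and new keys can be added
print(d)              # {'a': 2, 'b': 3}
```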
Coding
Key concepts
Why, unlike in R, should I use . to run specific functions (obj.function()), instead of just running the function on an object, function(obj)?
Tuples and lists are pretty much the same, but tuples are more computationally efficient (storage and speed of access) than lists. Why do you think this is the case?
Assume I have this dictionary:
my_dict = {'a': 4, 'b': 7, 'c': 9.2}
If I try to access the first element of the dictionary with my_dict[0], Python will throw an error. Why?
In-class exercises
Let’s practice with lists first. One way to explore data structures is to learn their methods. Check all the methods of a list by running dir() on a list object. Let’s explore these methods using the following list object by answering the questions below. See here for list methods:
list_exercise = ["Ramy", "Victorie", "Letty", "Robin", "Antoine", "Griffin"]
- Add “Cathy O’Neil” to the list. Insert “Professor Crast” as the first element of the list
- Remove “Letty” from the list. Also remove the last element of the list.
- Find the index of the first occurrence of the name “Robin”. Count the number of times None appears in the list.
- Create a new list with the names in alphabetical order. Then copy this list into a new list, copied_list, without changing the values of the original list.
- Add the string “Lovell” to copied_list and ensure that list_exercise remains unchanged.
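To get started with the dir() suggestion, a small sketch that filters out the underscore-prefixed special methods so only the everyday list methods remain. This only lists the tools; it does not answer the questions above.

```python
list_exercise = ["Ramy", "Victorie", "Letty", "Robin", "Antoine", "Griffin"]

# dir() lists every attribute; names starting with "_" are special/internal
public_methods = [name for name in dir(list_exercise) if not name.startswith("_")]
print(public_methods)
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
```

To read the documentation for any single method, pass it to help(), e.g. help(list_exercise.sort).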
Let’s do a similar exercise with Dictionaries. Consider the dictionary below. See here for dictionary methods:
dict_exercise = {"Ramy": "India",
                 "Victorie": "Haiti",
                 "Letty": "England",
                 "Robin": "Canton",
                 "Antoine": "Nigeria",
                 "Griffin": "China"}
dict_exercise
- Look up the keys in the dictionary, and store them in a list object called keys
- Add yourself and two other colleagues to this dictionary. The values are the countries where each person was born.
- Remove “Ramy” from the dictionary, and save as another dictionary
Let’s now play around with some string methods. See the string below from the book “Babel: An Arcane History”. See here for string methods:
babel = "That's just what translation is, I think. That's all speaking is. Listening to the other and trying to see past your own biases to glimpse what they're trying to say. Showing yourself to the world, and hoping someone else understands."
- Determine if the word “Babel” is present in the string.
- Count how many times the word “translation” appears
- Convert the entire string to upper case
- Convert the pronoun “I” to “We” in the entire text.
- Strip any punctuation (like commas, exclamation marks, etc.) from the string.
Let’s take a break! (10 minutes, or until next week!)
Class Part II: Control Statements, Loops and Functions
We will go over some concepts that apply to almost any programming language:
Using logical operators for comparisons.
Control the behavior of code when iterating using control statements.
Explore iterating through containers using loops
Defining functions to make code more flexible, debuggable, and readable. (Probably for next week)
Comparison Operators
Operator | Property |
---|---|
== | (value) equivalence |
> | strictly greater than |
< | strictly less than |
<= | less than or equal |
>= | greater than or equal |
!= | not equal |
is | object identity |
is not | negated object identity |
in | membership in |
not in | negated membership in |
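A sketch of the difference between value comparison (==) and identity (is), plus membership tests, using only built-in lists and strings.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)            # True: same values
print(a is b)            # False: two different list objects
print(a is c)            # True: c refers to the same object as a
print(2 in a)            # True: membership in a list
print("z" not in "cat")  # True: membership works on strings too
print(3 >= 3, 3 != 4)    # True True
```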
Control Statements
Any programming language needs statements that control the order in which a particular piece of code is executed:
- if-else statements
- for loops
- while loops
If-else Statements
Definition: Conditional execution.
if <logical statement>:
    ~~~~ CODE ~~~~
elif <logical statement>:
    ~~~~ CODE ~~~~
else:
    ~~~~ CODE ~~~~
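A runnable instance of the template, classifying a number (the value -2 is arbitrary). Python checks the conditions from top to bottom and runs only the first block whose condition is True; else catches everything left over.

```python
x = -2

if x > 0:
    label = "positive"
elif x < 0:
    label = "negative"
else:
    label = "zero"   # reached only when neither condition above is True

print(label)  # negative
```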
For loops
Definition: Taking one item at a time from a collection. We start at the beginning of the collection and move through it until we reach the end.
In Python, we can iterate over:
lists
strings
dictionaries
file connections
grouped pandas DataFrames
Example:
# create a list
my_list = [1, 2, 3, 4, 5]
# iterate with a for loop:
for m in my_list:
    print(m)
1
2
3
4
5
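The same pattern works for the other containers listed above. A sketch for strings and dictionaries, reusing two entries from dict_exercise: looping over a dictionary directly gives its keys, while .items() gives key/value pairs.

```python
# Strings: the loop takes one character at a time
for letter in "cat":
    print(letter)

# Dictionaries: iterating directly yields the keys
birthplaces = {"Ramy": "India", "Letty": "England"}  # a subset of dict_exercise
for name in birthplaces:
    print(name, birthplaces[name])

# .items() yields (key, value) pairs, unpacked into two loop variables
for name, country in birthplaces.items():
    print(name + " was born in " + country)
```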
While Loops
- While loops are all about repetition. Instead of stepping through a sequence, the operation repeats as long as the condition in the loop is True.
# while loops
x = 0
while x < 5:
    print("This")
    x += 1
This
This
This
This
This
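While loops are most useful when you don’t know in advance how many repetitions you need. A sketch with a hypothetical example: how many years does it take for 100 to double at an assumed 7% yearly growth rate? The condition is re-checked before every pass, and the loop stops once it becomes False.

```python
balance = 100.0
years = 0

while balance < 200:   # keep going until the balance has doubled
    balance *= 1.07    # grow by 7% (hypothetical rate)
    years += 1         # update the counter so the loop can end

print(years)  # 11
```

Forgetting to update the variables in the condition (here, balance) is the classic way to write an infinite loop.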