PPOL 5203 Data Science I: Foundations

Intro to Python: OOP + Data Types

Tiago Ventura


Learning Objectives

We will start these lecture notes with a broad overview of what object-oriented programming means. This might sound a bit generic, but it is an important concept to grasp if you want a more general understanding of Python.

After that, we will move on to a classic introduction to Python, with a focus on scalar and collection data types.

  • Get some intuition behind objects in Python
  • Explore the different built-in data types.
  • Examine how to look up values in collection data types using an index and/or key
  • Learn about the issues that can arise when copying mutable objects and how to resolve them.

This notebook draws from materials developed by Dr. Eric Dunford for a previous iteration of PPOL 5203

Notion of Object-Oriented Programming

Python is an object-oriented programming (OOP) language, in which objects play a fundamental role in how we structure a program. Specifically, OOP allows one to bundle properties and behavior into individual objects. In Python, objects can hold both data and the methods used to manipulate that data.

As you progress in the DSPP, you are also being introduced to R. R, on the other hand, is primarily a functional programming language, where functions are objects and data is manipulated using functions.

At first glance, the distinction is subtle, but the way we build programs in R and Python differs considerably. In practice, the OOP vs. Functional distinction changes how one engages with objects instantiated in the environment.

In Python, methods (functions) are bundled with the object, whereas in R functions are external to the object. In other words, while much of the work in R consists of writing functions that are stored outside of classes/objects, in Python you can borrow from general classes, inherit their methods, or add new functionality to objects created by others.

Objects are at the core of OOP languages. They are also important in R, but they are not as flexible as in Python.

As an example, see the differences between taking the mean of a vector in R (using a function) and the mean of a pandas series in Python (using a method).

Python OOP in Practice

# call library
import pandas as pd

# Using a Pandas Series method (OOP)
s = pd.Series([1, 2, 3, 4, 5])
s.mean()

R Functional Programming in Practice

vec <- c(1, 2, 3, 4, 5)
mean(vec)

In Python, you access an object's methods and attributes using the dot (.) operator.

Creating objects in Python

= is the assignment operator in Python. Unlike R, which has several assignment operators (<-, -> and =), Python uses a single operator, =, for ordinary assignment.

In [23]:
# creating an object in Python
x = 4

This simple act of creating an object in Python involves three distinct and interesting actions.

1) A reference (a name) is bound to an object: in the statement x = 4, the name x references the object 4. The name is how the object is known in your environment, but it is not how the object is stored in your machine. Internally, the object has an identity, which id() returns (in CPython, this is the object's memory address).

In [24]:
# in your machine
id(x)
Out[24]:
4376552448

In Python, variable names:

  • can include letters, digits, and underscores
  • cannot start with a digit
  • are case sensitive.

Use names that make sense. This simple habit will make your code much easier to read.

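2) An object's type is determined at runtime: Python is a dynamically typed language, which differs from statically typed languages such as C++ and Java, where a variable's type must be declared explicitly. (Python's related "duck typing" philosophy means that what matters is what an object can do, i.e., which methods it has, rather than its declared type.) An object's type cannot be changed once the object is created; coercing an object into a different type actually creates a new object.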

/* Creating a variable in C: the type (int) must be declared */
int result = 0;

3) The object is created as an instance of its class. An object's class provides a blueprint for the object's behavior and functionality. We use the dot operator (.) to access an object's methods and attributes.

In [71]:
# what is the class?
type(x)
Out[71]:
int
In [72]:
# Access methods (behaviors) using .
x.bit_length()
Out[72]:
3
In [73]:
# see all
dir(x)
Out[73]:
['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'as_integer_ratio',
 'bit_count',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'numerator',
 'real',
 'to_bytes']
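Point 2) above says Python determines types at runtime. A minimal sketch of what that looks like in practice (the names value, number, and as_float are purely illustrative): the same name can be rebound to objects of different types, and coercion hands back a new object instead of changing the old one.

```python
# The name `value` is rebound to objects of different types at runtime
value = 4
print(type(value))   # <class 'int'>

value = "four"
print(type(value))   # <class 'str'>

# Coercion builds a new object; the original int is untouched
number = 4
as_float = float(number)
print(type(number), type(as_float))  # <class 'int'> <class 'float'>
print(number is as_float)            # False: two different objects
```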

101 on Python classes (or the core of OOP in Python)

Every time we create an object, that object is an instance of some class. Creating an object from a class is called instantiation.

Classes are used to create user-defined data structures. A class is a blueprint for how something should be defined.

An instance is a realization of a particular class, and it gets the attributes and methods that its class defines.

Imagine we have a class called dog. Every time I use this class to create a new, concrete dog (an object), I am creating an instance, or realization, of this abstract class.

Classes have two major components:

  • Attributes: data stored on the object, i.e., its features or characteristics

  • Methods: actions or behaviors that objects of this class can perform.

Both attributes and methods are accessed through the dot (.) operator.

We will see later how to create our own classes. What matters most now is that every object in Python has a class, and every instance of a class has access to the attributes and methods that the class defines.

Let's see a quick example here.

In [25]:
# create a class
class dog():
    # __init__ runs when a new dog is created and stores its attributes
    def __init__(self, name, breed, age):
        self.name = name
        self.breed = breed
        self.age = age

    # a method: a behavior every dog instance can perform
    def say_hello_to_my_friends(self):
        print("Hi, I am " + self.name + " and I am a " + self.breed)
In [26]:
# Instantiate
brisa = dog(name="Brisa", breed="Beagle Mix", age="6 years old")
In [27]:
type(brisa)
Out[27]:
__main__.dog
In [28]:
# Attributes
print(brisa.name + " " + brisa.breed + " " + brisa.age)
Brisa Beagle Mix 6 years old
In [29]:
# method
brisa.say_hello_to_my_friends()
Hi, I am Brisa and I am a Beagle Mix

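We can list all of the object's attributes and methods using the dir() function, which returns an internal directory of the names defined on the object and its class. As we can see, there is a lot going on inside this single dog object! Most of the names with double underscores come built in with every Python class; our own attributes and method appear at the end.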

In [30]:
dir(brisa)
Out[30]:
['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'age',
 'breed',
 'name',
 'say_hello_to_my_friends']
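The dog example above creates a single instance. The sketch below, built around a hypothetical Dog class written only for illustration (capitalized, following the usual Python naming convention), creates two instances to show that each stores its own attribute values, while the method lives once on the class and is shared by both.

```python
class Dog:
    """Hypothetical class used only for illustration."""

    def __init__(self, name, breed):
        self.name = name      # instance attribute: stored on each object
        self.breed = breed

    def describe(self):       # method: defined once, on the class
        return self.name + " is a " + self.breed


brisa = Dog("Brisa", "Beagle Mix")
rex = Dog("Rex", "Boxer")

print(brisa.describe())   # Brisa is a Beagle Mix
print(rex.describe())     # Rex is a Boxer

# Each instance keeps its own data in a per-object dictionary...
print(vars(brisa))        # {'name': 'Brisa', 'breed': 'Beagle Mix'}

# ...while the method is stored on the class, not on the instances
print("describe" in vars(Dog))    # True
print("describe" in vars(brisa))  # False
```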

Data Types

There are two ways of instantiating a built-in data type in Python:

  1. Literals: syntactic representation in Python, e.g. []
  2. Constructors: class constructors, e.g. list()

Python comes with a number of built-in data types. When talking about data types, it's useful to differentiate between:

  • scalar types (data types that hold one piece of information, like a single number) and
  • collection types (data types that hold multiple pieces of information).

These built-in data types are the building blocks for more complex data types, like a pandas DataFrame (which we'll cover later).

Scalars

Scalar Data Types

Type    Description                                      Example   Literal      Constructor
int     integer types                                    4         x = 4        int(4)
float   64-bit floating point numbers                    4.567     x = 4.567    float(4)
bool    boolean logical values                           True      x = True     bool(0)
None    null object (serves as a valuable placeholder)   None      x = None     n/a


Note two things from the above table:

  1. the literal occurs through assignment, and
  2. the constructor can be used to coerce one data type into another.
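To see point 2 in action, here is a short sketch of constructors coercing values between types. One detail worth knowing: int() applied to a float drops the fractional part (it truncates toward zero rather than rounding).

```python
# Each constructor builds a new object of its own type from the input
print(int("42"))    # 42    (str -> int)
print(int(3.9))     # 3     (float -> int: the fraction is dropped)
print(int(-3.9))    # -3    (truncation goes toward zero, not down)
print(float(4))     # 4.0   (int -> float)
print(bool(0))      # False (int -> bool)
print(str(12345))   # 12345 (int -> str)
```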

Int

Definition: An int, or integer, is a whole number, positive or negative, without decimals. Python integers have unlimited length (they are limited only by available memory).

Here we assign an integer (3) to the name x.

In [33]:
# int
x = 3
x
Out[33]:
3
In [34]:
# check type
type(x)
Out[34]:
int

Float

Definition: Floating-point numbers (floats) represent real numbers, with both an integer and a fractional component, such as 4.56.

Below, we first create a float directly, and then coerce the integer x to a float using the constructor float().

In [35]:
# creating a float
y = 4.56
type(y)
Out[35]:
float
In [36]:
# float() returns a new float object; x itself is unchanged
x_float = float(x)
print(type(x))
print(type(x_float))
<class 'int'>
<class 'float'>

Note that the type of an operation's result depends on the classes of the objects involved: adding an int and a float, for example, returns a float.

In [37]:
#int
x=3

# int + float returns a float
type(x + 3.0)
Out[37]:
float

Bool

A bool takes one of two values, True or False. Other objects can also be evaluated as booleans: objects that evaluate to True are called truthy, and those that evaluate to False are falsy. Numerically, zero is falsy and any nonzero value (positive or negative) is truthy.
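A quick way to build intuition for truthiness is to call bool() on a few values. This sketch also includes empty and non-empty collections and None, which follow the same rule: empty and "nothing" values are falsy.

```python
# Print each value next to its truth value
values = [0, 1, -3, 0.0, 2.5, "", "text", [], [0], None]
for v in values:
    print(repr(v), "->", bool(v))
```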

In [38]:
# literal
x=True
x
Out[38]:
True
In [39]:
# constructor
x = bool(1)
id(x)
type(x)
x
Out[39]:
True

Immutability

Finally, all scalar data types are immutable, meaning they can't be changed after assignment. When we make changes to a data type, say by coercing it to another type as we did above, we're actually creating a new object. We can see this by looking at the object id.

id() tells us the "identity" of an object. The number itself doesn't need to mean anything to you. Just know that when two object ids are the same, they reference the same data in the computer. We'll explore the implications of this when we look at copying.

In [40]:
x = 4
id(x) 
Out[40]:
4376552448

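Here we coerce x to a float and then look up the id() of the result. As we can see, there is a new number associated with it: float(x) is a different object from x, and x itself still references the original int. Likewise, reassigning x = 6 (below) binds the name x to a different object rather than changing the int 4 in place.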

In [41]:
id(float(x))
Out[41]:
4744180880
In [42]:
x=6
id(x)
Out[42]:
4376552512
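The same idea explains what happens when we "change" a number. A minimal sketch (names illustrative): arithmetic on an int builds a new object and rebinds the name, so any other name pointing at the old object still sees the old value. String methods behave the same way, returning new strings.

```python
x = 4
y = x            # y references the same int object as x
print(x is y)    # True

x = x + 1        # builds a NEW int (5) and rebinds x to it
print(x, y)      # 5 4: the object y references was not modified
print(x is y)    # False

s = "policy"
t = s.upper()    # string methods return new strings
print(s, t)      # policy POLICY
```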

Object types determine behavior

Python knows how an object should behave from the methods defined by its class. These methods dictate how different data types handle similar operations (such as addition, multiplication, comparisons, etc.).

Using what we learned from OOP, this means that every class has its own methods. These methods can have ordinary names, like any user-defined function, or they can share special, universal names surrounded by double underscores (magic or "dunder" methods, such as __add__). See the addition example with int instances.

In [43]:
# create int
x=4

# add literally
x + 4

# what is happening under the hood? 
x.__add__(4)
Out[43]:
8

Each class that supports the + operator defines its own __add__ method. For this reason, adding two ints returns an int, while adding an int and a float returns a float.

In [13]:
# create int
x=4

# add literally
type(x + 4.2)
Out[13]:
float
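What happens under the hood for int + float is a small negotiation between the two classes. A minimal sketch: int's __add__ does not know how to handle a float, so it returns the special value NotImplemented, and Python then asks the float on the right through its reflected __radd__ method.

```python
x = 4
y = 0.5

# int's __add__ does not handle floats, so it signals NotImplemented...
print(x.__add__(y))    # NotImplemented

# ...and Python falls back to the right operand's reflected method
print(y.__radd__(x))   # 4.5

# Both routes give the same answer as the + operator
print(x + y)           # 4.5
```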

Collections

Collection Data Types

Type    Description                                  Example         Mutable              Literal              Constructor
list    heterogeneous sequence of objects            [1,"2",True]    ✓                    x = ["c","a","t"]    x = list("cat")
str     sequence of characters                       "A word"        ✘                    x = "12345"          x = str(12345)
tuple   heterogeneous sequence of objects            (1,2)           ✘                    x = (1,2)            x = tuple([1,2])
set     unordered collection of distinct objects     {1,2}           ✓                    x = {1,2}            x = set([1,2])
dict    associative array of key/value mappings      {"a": 1}        keys ✘, values ✓     x = {'a':1}          x = dict(a = 1)

Each built-in collection data type in Python is distinct in important ways. Recall that an object's class defines how the object behaves with operators and its methods.

I'll explore some of the differences in behavior for each class type so we can see what this means in practice

Mutable vs. Immutable

Note the column referring to mutable and immutable collection types. Simply put, mutable objects can be changed after they are created; immutable objects cannot. All the scalar data types are immutable. Even when we coerced objects into a different class, we weren't changing the existing object; we were creating a new one.

Some collection types, however, allow us to edit the data values contained within without needing to create a new object. This lets us use the computer's memory efficiently. It can also create problems down the line if we aren't careful (see the section on copies).

In practice, mutability means we can alter values in the collection on the fly.

In [130]:
my_list = ["sarah","susan","ralph","eddie"]
my_list
Out[130]:
['sarah', 'susan', 'ralph', 'eddie']
In [131]:
## see id
id(my_list)
Out[131]:
4594940992
In [133]:
my_list[1] = "josh"
my_list
Out[133]:
['sarah', 'josh', 'ralph', 'eddie']
In [134]:
## see id
# Still the same object, even though we changed something in it
id(my_list) 
Out[134]:
4594940992

Immutability, on the other hand, means that we cannot alter values after the object is created. Python will throw an error at us if we try.

In [137]:
my_tuple =("sarah","susan","ralph","eddie")
my_tuple 
Out[137]:
('sarah', 'susan', 'ralph', 'eddie')
In [138]:
my_tuple[1] = "josh"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[138], line 1
----> 1 my_tuple[1] = "josh"

TypeError: 'tuple' object does not support item assignment

list

Lists allow heterogeneous membership: a single list can hold many different data types (even other collection types!). In a list, one can change the items contained within the object after creating the instance.

In [140]:
x = [1, 2.2, "str", True, None] 
x
Out[140]:
[1, 2.2, 'str', True, None]

A list constructor takes an iterable object as input. (We'll delve more into what makes an object iterable when covering loops; the key is that the object can hand back its elements one at a time, typically through an .__iter__() method.)

In [146]:
list([1, 2.2, "str", True, None])
Out[146]:
[1, 2.2, 'str', True, None]
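Any iterable works as input to list(). A short sketch with a few built-in iterables, plus one non-iterable (an int) to show the error Python raises:

```python
print(list("cat"))               # ['c', 'a', 't']
print(list((1, 2, 3)))           # [1, 2, 3]
print(list(range(4)))            # [0, 1, 2, 3]
print(list({"a": 1, "b": 2}))    # ['a', 'b']: iterating a dict yields its keys

# An int is not iterable, so list() raises a TypeError
try:
    list(5)
except TypeError as err:
    print("TypeError:", err)     # TypeError: 'int' object is not iterable
```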

At its core, a list is a bucket for collecting different types of information, which makes it useful for storing data items of any kind. For example, we can store multiple container types in a list.

In [148]:
a = (1,2,3,4) # Tuple
b = {"a":1,"b":2} # Dictionary
c = [1,2,3,4] # List
In [151]:
# Combine these different container objects into a single list
together = [a,b,c] 
type(together[0])
Out[151]:
tuple
In [152]:
type(together[1])
Out[152]:
dict
In [153]:
type(together[2])
Out[153]:
list

A list class has a range of specific methods geared toward querying, counting, sorting, and adding/removing elements in the container. For a list of all the list methods, see here.

Let's explore some of the common methods used.

In [154]:
country_list = ["Russia","Latvia","United States","Nigeria","Mexico","India","Costa Rica"]
In [156]:
country_list
Out[156]:
['Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica']

Inserting values

Option 1: use the .append() method.

In [157]:
country_list.append("Germany")
country_list
Out[157]:
['Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Germany']

Option 2: use the + (add) operator.

In [158]:
country_list = country_list + ['Canada']
country_list
Out[158]:
['Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Germany',
 'Canada']

Addition means "append"? Recall that an object's class dictates how it behaves with different operators. A list object has an .__add__() method built into it that provides instructions for what the object should do when it encounters the + operator; for lists, + concatenates the two lists into a new list. Likewise, the .__mul__() method handles the * operator, and so on. This is why it's so important to know the class that you're using: different object classes mean different behavior.
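This has a practical consequence for the two options above: .append() changes the existing list in place, while + builds a brand-new list and rebinds the name to it. A minimal sketch (the names countries and same_list are illustrative):

```python
countries = ["Mexico", "India"]
same_list = countries                 # a second name for the same list

countries.append("Canada")            # modifies the list in place
print(countries is same_list)         # True: still one object
print(same_list)                      # ['Mexico', 'India', 'Canada']

countries = countries + ["Brazil"]    # + builds a NEW list, then rebinds the name
print(countries is same_list)         # False: two different objects
print(same_list)                      # ['Mexico', 'India', 'Canada']

print(["na"] * 3)                     # ['na', 'na', 'na']: * repeats the list
```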

You can also combine lists by referring to them by name. (Since we don't assign the result, country_list itself is unchanged.)

In [159]:
more_countries = ["Brazil", "Argentina"]
country_list + more_countries
Out[159]:
['Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Germany',
 'Canada',
 'Brazil',
 'Argentina']

Deleting values

Option 1: use the del statement with an index.

In [160]:
# Drop Latvia
del country_list[1]
country_list
Out[160]:
['Russia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Germany',
 'Canada']

Option 2: use the .remove() method

In [161]:
country_list.remove("Nigeria")
country_list
Out[161]:
['Russia',
 'United States',
 'Mexico',
 'India',
 'Costa Rica',
 'Germany',
 'Canada']

Sorting values

In [162]:
country_list.sort()
country_list
Out[162]:
['Canada',
 'Costa Rica',
 'Germany',
 'India',
 'Mexico',
 'Russia',
 'United States']
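Note that .sort() changes the list in place and returns None. If you need to keep the original order, the built-in sorted() function returns a new sorted list instead. A minimal sketch (the list of names is illustrative):

```python
names = ["Russia", "Canada", "India"]

alphabetical = sorted(names)       # returns a NEW sorted list
print(names)                       # ['Russia', 'Canada', 'India'] (unchanged)
print(alphabetical)                # ['Canada', 'India', 'Russia']

result = names.sort(reverse=True)  # sorts in place, Z to A
print(result)                      # None: .sort() returns nothing
print(names)                       # ['Russia', 'India', 'Canada']
```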
In [23]:
dir(country_list)
Out[23]:
['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

str

Strings are containers too. String elements can be accessed using an index, much like the items in a list (see the section on indices and keys).

In [166]:
s = "This is a string"
s[2]
Out[166]:
'i'

The literal for a string is a pair of quotation marks: '' or "". To include quotation marks inside a string, wrap the string in the other type of quotation mark (or escape the inner quote with a backslash, as in 'It\'s').

In [167]:
s = 'This is a "string"'
print(s)
This is a "string"
In [168]:
s = "This is a 'string'"
print(s)
This is a 'string'

A multiline string can be created using triple quotes (''' or """). This is useful when writing documentation (docstrings) for a function.

In [170]:
s2 = '''
This is a long string!
    
    With many lines
    
    Many. Lines '''
print(s2)
This is a long string!
    
    With many lines
    
    Many. Lines 

Strings are quite versatile in Python! In fact, many of the manipulations that we like to perform on strings, such as splitting text up (also known as "tokenizing"), cleaning out punctuation and characters we don't care about, and changing the case (to name a few), are built into the string class as methods.

For example, say we wanted to convert a string to upper case.

In [171]:
str1 = "the professor is here!"
str1.upper()
Out[171]:
'THE PROFESSOR IS HERE!'
Or split it into a list of words (tokenizing).

In [174]:
str1.split(" ")
Out[174]:
['the', 'professor', 'is', 'here!']

Or replace words.

In [172]:
str1.replace("professor","student")
Out[172]:
'the student is here!'

This is just a taste. The best way to learn what we can do with a string is to use it. We'll deal with strings all the time when dealing with public policy data. So keep in mind that the str data type is a powerful tool in Python. For a list of all the str methods, see here.
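As a small taste of what else is available, here is a sketch chaining a few common str methods (.strip(), .lower(), .startswith(), .replace(), .split(), and .join()) on an illustrative string. Because strings are immutable, each method returns a new string (or list) rather than modifying raw.

```python
raw = "  Public Policy, Data Science  "

clean = raw.strip()              # remove leading/trailing whitespace
print(clean)                     # Public Policy, Data Science
print(clean.lower())             # public policy, data science
print(clean.startswith("Pub"))   # True

words = clean.replace(",", "").split(" ")
print(words)                     # ['Public', 'Policy', 'Data', 'Science']
print("_".join(words))           # Public_Policy_Data_Science
```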

tuple

Like a list, a tuple allows for heterogeneous membership among the various scalar data types.

However, unlike a list, a tuple is immutable, meaning you cannot change the object after creating it.

The literal for a tuple is the parentheses ()

In [47]:
my_tuple = (1,"a",1.2,True)
my_tuple
Out[47]:
(1, 'a', 1.2, True)

The constructor is tuple(). Like the list constructor, tuple() takes an iterable object (like a list) as input.

In [45]:
my_tuple = tuple([1,"a",1.2,True])
my_tuple
Out[45]:
(1, 'a', 1.2, True)

Tuples are valuable if you want a data value to be fixed, such as an index on a data frame denoting a unit of analysis, or a key in a dictionary. Tuples pop up all the time in the wild when dealing with more complex libraries like pandas, so we'll see them again and again.

One nice thing about tuples is that they support unpacking. Unpacking deconstructs the tuple into named references (i.e., it assigns the values in the tuple to their own objects). This gives us flexibility regarding which objects we want when performing sequential operations, like iterating. Unpacking is not limited to tuples: any iterable, such as a list, can be unpacked the same way.

In [178]:
# Unpacking a tuple
my_tuple = ("A","B","C")
# Here we're unpacking the three values into their own objects
obj1, obj2, obj3 = my_tuple

# Now let's print each object
print(obj1)
print(obj2)
print(obj3)
A
B
C
In [ ]:
# Unpacking works with lists (and other iterables) too
my_list = ["A","B","C"]

# Here we're unpacking the three values into their own objects
obj1, obj2, obj3 = my_list
In [182]:
type(my_list)
Out[182]:
list
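Unpacking has a few more tricks that come up often. A short sketch (all names illustrative): a starred name collects leftover values into a list, unpacking lets us swap two values in one line, and a mismatch between the number of names and values raises a ValueError.

```python
record = ("Brisa", "Beagle Mix", 6)

# One name per value
name, breed, age = record
print(name, age)           # Brisa 6

# A starred name collects the leftover values into a list
first, *rest = (1, 2, 3, 4)
print(first, rest)         # 1 [2, 3, 4]

# Swapping two values in one line
a, b = 1, 2
a, b = b, a
print(a, b)                # 2 1

# Too many values for the names on the left raises a ValueError
try:
    x, y = (1, 2, 3)
except ValueError as err:
    print("ValueError:", err)
```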

Also, like a list, a tuple can store different collection data types as well as the scalar types. For example, we can store multiple container types in a tuple.

In [ ]:
a = (1,2,3,4) # Tuple
b = {"a":1,"b":2} # Dictionary
c = [1,2,3,4] # List

# Combine these different container objects into a single tuple
together = (a,b,c)
together

For a list of all the tuple methods, see here.

set

A set is an unordered collection of unique elements (this just means there can be no duplicates). set is a mutable data type (elements can be added and removed). Moreover, the set methods allow for set algebra. This will come in handy if we want to know something about unique values and membership.

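The literal for a set is curly braces, as in {1, 2} (see note 1). Note that empty braces {} create an empty dictionary, not a set; use set() to create an empty set.


  1. Note that this is very similar to the literal for a dictionary, but in that data structure we define key/value pairs (see the dict section).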

In [48]:
my_set = {1,2,3,3,3,4,4,4,5,1}
my_set
Out[48]:
{1, 2, 3, 4, 5}

The constructor is set(). As before, it takes an iterable object as an input.

In [49]:
new_set1 = set([1,2,4,4,5])
new_set1
Out[49]:
{1, 2, 4, 5}
In [50]:
new_set2 = set("Georgetown")
new_set2
Out[50]:
{'G', 'e', 'g', 'n', 'o', 'r', 't', 'w'}

In the above, we can see that a set does not keep the order of its input, and that duplicates (the repeated "e" and "o" in "Georgetown") are dropped.

We can add elements to a set using the .add() method (one element) or the .update() method (which takes an iterable of elements).

In [187]:
my_set.add(6)
my_set
Out[187]:
{1, 2, 3, 4, 5, 6}
In [188]:
my_set.update({8})
my_set
Out[188]:
{1, 2, 3, 4, 5, 6, 8}

Where a set really shines is with the set operations. Say we had a set of country names.

In [189]:
countries = {"nigeria","russia","united states","canada"}

And we wanted to see which countries from our set were in another set (say another data set). Not a problem for a set!

In [190]:
other_data = {"nigeria","netherlands","united kingdom","canada"}

Which countries are in both sets?

In [191]:
countries.intersection(other_data) 
Out[191]:
{'canada', 'nigeria'}

Which countries are in our data but not in the other data?

In [38]:
countries.difference(other_data)
Out[38]:
{'russia', 'united states'}
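Sets also support these operations through operators. A minimal sketch reusing the two country sets above: & is intersection, - is difference, | is union, and ^ is symmetric difference (elements in exactly one of the sets). Sets print in an arbitrary order, so the exact output order may differ on your machine.

```python
countries = {"nigeria", "russia", "united states", "canada"}
other_data = {"nigeria", "netherlands", "united kingdom", "canada"}

print(countries & other_data)   # intersection, like .intersection()
print(countries - other_data)   # difference, like .difference()
print(countries | other_data)   # union: countries in either set
print(countries ^ other_data)   # symmetric difference: in exactly one set

# Membership tests with `in` are fast on sets
print("canada" in countries)    # True
```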

Note that values in a set cannot be accessed using an index.

In [39]:
my_set[1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[39], line 1
----> 1 my_set[1]

TypeError: 'set' object is not subscriptable

Instead, we can .pop() values out of the set (this removes and returns an arbitrary element).

In [41]:
my_set.pop()
 
Out[41]:
2

Or we can .remove() specific values from the set.

In [42]:
my_set.remove(3)
my_set
Out[42]:
{4, 5, 6, 8}

Finally, note that sets can contain heterogeneous scalar types, but they cannot contain mutable container data types such as lists, dictionaries, or other sets, because set elements must be hashable.

In [43]:
set_a = {.5,6,"a",None}
set_a
Out[43]:
{0.5, 6, None, 'a'}

In set_b, the list object is mutable, and therefore unhashable, so Python raises an error.

In [44]:
set_b = {.5,6,"a",None,[8,5,6]}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[44], line 1
----> 1 set_b = {.5,6,"a",None,[8,5,6]}

TypeError: unhashable type: 'list'
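If we do need a collection inside a set, we can use an immutable counterpart. A short sketch: converting the list to a tuple makes set_b valid, and frozenset, an immutable version of set, can be stored in a set too.

```python
# Converting the list to a tuple makes it hashable, so this works
set_b = {.5, 6, "a", None, tuple([8, 5, 6])}
print((8, 5, 6) in set_b)   # True

# frozenset is an immutable (hashable) version of set
groups = {frozenset({1, 2}), frozenset({2, 1})}
print(len(groups))          # 1: both frozensets hold the same elements
```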

All this is barely scratching the surface of what we can do with sets. For a list of all the set methods, see here.

dict

A dictionary is the true star of the Python data types. dict is an associative array of key-value pairs. That means we have some data (value) that we can quickly reference by calling its name (key). As we'll see next week, this is a very efficient way to look up data values, especially when the dictionary is quite large.

Since Python 3.7, dictionaries remember the order in which keys were inserted, but we look values up by key rather than by position. Keys must be immutable (technically, hashable) objects such as strings, numbers, or tuples, while values can be anything, including mutable objects like lists, and can be changed. Finally, keys are unique: assigning a value to an existing key overwrites the old value. Recall that we use keys to look up data values, so duplicate keys would defeat the purpose!
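A minimal sketch of these rules in action, using an illustrative dictionary d: reusing a key overwrites its value, new keys keep their insertion order, tuples are valid keys, and a list as a key raises a TypeError.

```python
d = {"a": 1, "b": 2}

d["a"] = 100               # reusing an existing key overwrites its value
print(d)                   # {'a': 100, 'b': 2}

d["c"] = 3                 # new keys are kept in insertion order
print(list(d))             # ['a', 'b', 'c']

d[("x", 1)] = "ok"         # a tuple is immutable, so it can be a key

try:
    d[["x", 1]] = "fails"  # a list is mutable (unhashable), so it cannot
except TypeError as err:
    print("TypeError:", err)
```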

The literal for a dict is {:} as in {<key>:<value>}.

In [51]:
my_dict = {'a': 4, 'b': 7, 'c': 9.2}
my_dict
Out[51]:
{'a': 4, 'b': 7, 'c': 9.2}

The constructor is dict(). Note the special way we can designate the key/value pairs when using the constructor: as keyword arguments (this works only when the keys are strings that are valid Python names).

In [52]:
my_dict = dict(a = 4.23, b = 10, c = 6.6)
my_dict
Out[52]:
{'a': 4.23, 'b': 10, 'c': 6.6}

The dict class has a number of methods geared toward listing the information contained within. To access the dict's keys, use the .keys() method.

In [55]:
l = my_dict.keys()
l
Out[55]:
dict_keys(['a', 'b', 'c'])

Just want the values? Use .values()

In [54]:
my_dict.values()
Out[54]:
dict_values([4.23, 10, 6.6])

Want both? Use .items(), which returns a view of the (key, value) pairs, each one a tuple. Below, we unpack those three tuples into their own objects. This just goes to show you how intertwined the different data types are in Python.

In [60]:
a, s, b = my_dict.items()
print(a)
print(s)
print(b)
('a', 4.23)
('b', 10)
('c', 6.6)

We can combine dictionary with other data types (such as a list) to make an efficient and effective data structure.

In [61]:
grades = {"John": [90,88,95,86],"Susan":[87,91,92,89],"Chad":[56,None,72,77]}

We can use the keys for efficient lookup.

In [62]:
grades["John"]
Out[62]:
[90, 88, 95, 86]

We can also use the .get() method to get the values that correspond to a specific key.

In [63]:
grades.get("Susan")
Out[63]:
[87, 91, 92, 89]
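One practical difference between the two lookup styles: with a missing key, .get() returns None (or a default we supply), while square brackets raise a KeyError. A short sketch, where "Maria" is a hypothetical student not in the dictionary:

```python
grades = {"John": [90, 88, 95, 86], "Susan": [87, 91, 92, 89]}

print(grades.get("Maria"))       # None: missing key, but no error
print(grades.get("Maria", []))   # []: a default value of our choosing

try:
    grades["Maria"]              # square brackets raise on a missing key
except KeyError as err:
    print("KeyError:", err)      # KeyError: 'Maria'
```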

Updating Dictionaries

We can add new dictionary data entries using the .update() method.

In [64]:
new_entry = {"Wendy":[99,98,97,94]} # Another student dictionary entry with grades
grades.update(new_entry) # Update the current dictionary 
grades
Out[64]:
{'John': [90, 88, 95, 86],
 'Susan': [87, 91, 92, 89],
 'Chad': [56, None, 72, 77],
 'Wendy': [99, 98, 97, 94]}

In a similar fashion, we can update the dictionary directly by providing a new key entry and storing the data.

In [65]:
grades["Seth"] = [10, "sh"]
grades
Out[65]:
{'John': [90, 88, 95, 86],
 'Susan': [87, 91, 92, 89],
 'Chad': [56, None, 72, 77],
 'Wendy': [99, 98, 97, 94],
 'Seth': [10, 'sh']}

One can also drop keys by .pop()ing the key value pair out of the collection...

In [66]:
grades.pop("Seth")
Out[66]:
[10, 'sh']

...or deleting the key using the del statement.

In [67]:
del grades['Wendy']
grades
 
Out[67]:
{'John': [90, 88, 95, 86],
 'Susan': [87, 91, 92, 89],
 'Chad': [56, None, 72, 77]}

Likewise, one can drop values by:

  • overwriting the original data stored in the key
  • dropping the key (and thus deleting the data value)
  • clearing the dictionary (and deleting all the data values stored within).
In [68]:
# Example of using .clear()
grades.clear()
grades
Out[68]:
{}

Indices & Keys

Learning how to access the elements of each data type is a foundation of your fluency as a data scientist.

As you move between languages, keeping track of how to access elements in different data types is actually quite challenging, and you will definitely find yourself searching online for it many times. The important thing is to make an effort to understand the general rules for accessing elements across languages and data types.

Indices

A first way to access elements in collections is through their index position.

Unlike R, Python starts indexing at zero: the first element is at position 0.

In [223]:
# Define a list 
x = [1, 2.2, "str", True, None]
In [225]:
# the first element in Python is at index 0; the second is at index 1
print(x[0])
print(x[1])
1
2.2
In [ ]:
# can see how many values are in our container with len()
len(x)
In [226]:
# Can look up individual data values by referencing its location
x[3]
Out[226]:
True
In [227]:
# Python throws an error if we reference an index location that doesn't exist
x[7]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[227], line 2
      1 # Python throws an error if we reference an index location that doesn't exist
----> 2 x[7]

IndexError: list index out of range
In [231]:
# We use a negative index to count BACKWARDS in our collection data type.
x[-3]
 
Out[231]:
'str'

Accessing data by index position works the same way across the sequence data types, such as lists, tuples, and strings.

In [232]:
# tuples
tup = (1, 2, "no", True)
In [233]:
# first element
tup[0]
Out[233]:
1
In [234]:
# Last element
tup[-1]
Out[234]:
True

Slicing

We use the : operator to slice (i.e., select ranges of values) using the numerical indices we just learned. In a nutshell, x[start:stop] returns the elements from position start up to, but not including, position stop:

In [236]:
# To pull out values in position 1 and 2
x[1:3]
Out[236]:
[2.2, 'str']
In [ ]:
# When we leave left or right side blank, Python implicitly goes to the beginning or end
x[:3]
In [239]:
x[2:]
Out[239]:
['str', True, None]
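Slices can also take a third number, the step, as in x[start:stop:step], and negative indices work inside slices too. A short sketch on the same list x (redefined so the block runs on its own):

```python
x = [1, 2.2, "str", True, None]

print(x[1:3])     # [2.2, 'str']: start included, stop excluded
print(x[-2:])     # [True, None]: the last two elements
print(x[::2])     # [1, 'str', None]: every second element
print(x[::-1])    # [None, True, 'str', 2.2, 1]: a reversed copy
print(x[3:100])   # [True, None]: slicing past the end does not raise

# Slicing works the same way on other sequences, such as strings
print("Georgetown"[:6])   # George
```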

Using keys in dictionaries

In [240]:
# Define a dictionary
grades = {"John":[90,88,95,86],"Susan":[87,91,92,89],"Chad":[56,None,72,77]}
In [242]:
# Unlike lists and tuples, which use positions, we use a key to look up a value in a dictionary
grades["John"]
Out[242]:
[90, 88, 95, 86]
In [244]:
# We can then index in the data structure housed in that key's value position
# as is appropriate for that data object
grades["John"][0:2]
Out[244]:
[90, 88]

Copies

In [251]:
# Copies with mutable objects -----------------------
# Create a list object
x = ["a","b","c","d"]
x
Out[251]:
['a', 'b', 'c', 'd']
In [252]:
# Dual assignment: when objects reference the same data.
y = x
print(id(x)) 
print(id(y))
4653772736
4653772736
In [253]:
# If we make a change in one
y[1] = "goat"
In [254]:
# That change is reflected in the other
print(x)
['a', 'goat', 'c', 'd']

This happens because x and y are two names for the same object, not independent objects.

In [256]:
# We can get around this issue by making **copies**
y = x.copy() # Here y is a copy of x.
id(y)
Out[256]:
4652574400

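This creates a new list object, so that y and x are independent. Note that these are shallow copies: if the list contains other mutable objects (e.g., nested lists), those inner objects are still shared between the copies. The copy module's deepcopy() function copies them as well.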

Three ways to make a copy:

In [257]:
# (1) Use copy method
y = x.copy()
In [258]:
# (2) Use constructor
y = list(x)
In [259]:
# (3) Slice it
y = x[:]
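All three methods above make shallow copies. A minimal sketch of why that matters with nested lists: a shallow copy gets a new outer list but shares the inner lists, whereas copy.deepcopy() from the standard library copies the inner lists as well.

```python
import copy

nested = [["a", "b"], ["c", "d"]]

shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list AND new inner lists

nested[0][0] = "goat"          # change an element of an inner list

print(shallow)   # [['goat', 'b'], ['c', 'd']]: the change shows up
print(deep)      # [['a', 'b'], ['c', 'd']]: fully independent
print(shallow is nested, shallow[0] is nested[0])   # False True
```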

In-Class Exercises

Exercise 1

Let's practice with lists first. One way to explore data structures is to learn their methods. Check all the methods of a list by running dir() on a list object. Then explore these methods with the list object below by answering the questions that follow. See here for list methods:

In [ ]:
list_exercise = ["Ramy", "Victorie", "Letty", "Robin", "Antoine", "Griffin"] 
  1. Add "Cathy O'Neil" to the list. Insert "Professor Crast" as the first element of the list.
  2. Remove "Letty" from the list. Also remove the last element of the list.
  3. Find the index of the occurrence of the name "Robin". Count the number of times None appears in the list.
  4. Create a new list with the names in alphabetical order. Then copy this list into a new list called copied_list without changing the values of the original list.
  5. Add the string "Lovell" to copied_list and ensure that list_exercise remains unchanged.

Exercise 2

Let's do a similar exercise with dictionaries. Consider the dictionary below. See here for dictionary methods:

In [ ]:
dict_exercise = {"Ramy": "India",
                  "Victorie":"Haiti", 
                  "Letty":"England", 
                  "Robin":"Canton", 
                  "Antoine":"Nigeria", 
                  "Griffin":"China"}
dict_exercise
  1. Look up the keys in the dictionary, and store them in a list object called keys.

  2. Add yourself and two other colleagues to this dictionary. The values are the countries where each person was born.

  3. Remove "Ramy" from the dictionary, and save the result as another dictionary.

Exercise 3

Let's now play around with some string methods. See the string below from the book "Babel: An Arcane History". See here for string methods:

In [93]:
babel = "That's just what translation is, I think. That's all speaking is. Listening to the other and trying to see past your own biases to glimpse what they're trying to say. Showing yourself to the world, and hoping someone else understands."
babel
Out[93]:
"That's just what translation is, I think. That's all speaking is. Listening to the other and trying to see past your own biases to glimpse what they're trying to say. Showing yourself to the world, and hoping someone else understands."
  1. Determine if the word "Babel" is present in the string.

  2. Count how many times the word "translation" appears

  3. Convert the entire string to upper case

  4. Replace the pronoun "I" with "We" throughout the text.

  5. Strip any punctuation (like commas, exclamation marks, etc.) from the string.
