In this notebook, we will cover:
As a Data Scientist, quite often you will need to iterate over a sequence. We learned one general approach to iterations: loops.
There are some more efficient and more readable options to repeat operations compared to loops.
For efficiency: we will always prefer a vectorized approach to element-wise repetition. This will be covered in the NumPy notebook.
Instead of focusing on efficiency, this notebook focuses on readability. It discusses comprehensions and generators, which are different, and generally more readable, ways to repeat operations over a series of elements in Python.
This operation would be equivalent to the following loop:
a_list = [...]
result = []
for e in a_list:
    if type(e) == int: # use int for Python 3
        result.append(e**2)
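The loop above can be collapsed into a single list comprehension. A minimal sketch, assuming a hypothetical mixed-type list (the contents of `a_list` here are illustrative and not taken from the original notebook):

```python
# Hypothetical input: a list mixing integers with other types
a_list = [1, "two", 3, 4.0, 5]

# Loop version: square only the elements whose type is exactly int
result = []
for e in a_list:
    if type(e) == int:
        result.append(e**2)

# Equivalent list comprehension: expression, loop, then filter
result_comp = [e**2 for e in a_list if type(e) == int]

print(result_comp)  # [1, 9, 25]
```

Both versions produce the same list; the comprehension simply places the expression first, followed by the `for` clause and the optional `if` filter.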
Attention on the output:
Using the list literal [] (brackets), we construct a for loop from within.
words = "This is a such a long course".split(" ")
words
[w for w in words]
[len(w) for w in words]
[w for w in words if "This" in w]
List comprehensions are a tool for transforming one list (or any iterable object in Python) into another list. They are a syntactic alternative to the long-standing filter() and map() functions in Python, or to an explicit loop.
# object
words = "This is a such a long course".split(" ")
# Filter step: include those with more than one character
filtered_words = list(filter(lambda word: len(word) > 1, words))
print(filtered_words)
# Map step: apply a function to a container
lengths = list(map(len, filtered_words))
print(lengths)
result_list = []
for word in words:
    if len(word) > 1:
        result_list.append(len(word))
# much easier with list comprehensions
list_comp = [len(w) for w in words if len(w) > 1]
result_list
list_comp
(New in Python 3; also backported to Python 2.7)
Using the set braces {}, we construct a for loop from within.
# example 1
{len(word) for word in words}
# example 2
{word for word in [1, 2, 3, 3, 3, 3, 4]}
(New in Python 3; also backported to Python 2.7)
Using braces {} and a key-value pair {key: value}, we construct a for loop from within.
As with lists/sets:
Dictionary comprehension can replace loops when creating dictionaries
Or for transforming one dictionary into another dictionary.
# object
words = "This is a such a long course".split(" ")
# dict comprehension
dict_ = {word:len(word) for word in words}
print(dict_)
# modifying a fully formed dictionary => use the .items() method
dict_.items()
# dict comprehension modifying both keys and values
{keys.upper(): values for keys, values in dict_.items()}
# dict comprehension with a dict as input
{keys.lower(): values for keys, values in dict_.items()}
if statements in comprehensions
# Quickly produce a series of numbers
[i for i in range(10)]
[i for i in range(10) if i > 5 ]
A trailing else clause isn't valid in the filter part of a comprehension, so the filter needs to be kept simple. The following line raises a SyntaxError:
[i for i in range(10) if i > 5 else "hello"]
Concise if-then statements
<this_thing> if <this_is_true> else <this_other_thing>
x = 4
"Yes" if x > 5 else "No"
x = 6
"Yes" if x > 5 else "No"
["Yes" if x > 5 else "No" for x in range(10)]
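The two uses of `if` can be combined in one comprehension: the conditional expression at the front chooses the value, and the trailing `if` filters elements. A small sketch with illustrative labels (`"big"` and `"small"` are assumptions made for this example):

```python
# Front: conditional expression picks the value to emit.
# Back: trailing `if` keeps only the even numbers.
labels = ["big" if i > 5 else "small" for i in range(10) if i % 2 == 0]

print(labels)  # ['small', 'small', 'small', 'big', 'big']
```

Only the even numbers 0, 2, 4, 6, 8 pass the filter, and each one is then mapped to a label.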
# Started with a nested list
nested_list = [[1, 2, 3, 4, 5],
["this", "is", "starting", "to", "get", "weird"]]
nested_list
Works like a nested loop: it reads from the outer to the inner element.
# Unnesting a nested list, for example.
[element for sublist in nested_list for element in sublist] # notice the order inverted here
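To make the ordering concrete, here is a sketch of the nested loop that the comprehension corresponds to. The `for` clauses appear in the same order in both forms; only the expression moves to the front.

```python
nested_list = [[1, 2, 3, 4, 5],
               ["this", "is", "starting", "to", "get", "weird"]]

# Explicit nested loop: outer loop first, inner loop second
flat_loop = []
for sublist in nested_list:
    for element in sublist:
        flat_loop.append(element)

# Same clause order inside the comprehension
flat_comp = [element for sublist in nested_list for element in sublist]

print(flat_loop == flat_comp)  # True
```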
Read here for an in-depth discussion about the performance of loops, map/filter, and comprehension techniques in Python.
Comprehensions are generally faster (no middle-man).
Map/Filter can improve performance on complex functions.
IMHO:
Comprehensions not only make our code more concise, they also increase the speed of our code
import time
start = time.time()
container = []
for i in range(100000000):
    container.append(i)
end = time.time()
end-start
import time
start = time.time()
container = []
container = [i for i in range(100000000)]
end = time.time()
end-start
In this run, the comprehension expression takes roughly half of the time! Exact timings depend on the machine and the Python version.
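A single `time.time()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module repeats each snippet several times. The size `n` and the repeat count below are assumptions chosen so the cell runs quickly, and the function names are hypothetical:

```python
import timeit

n = 100_000  # assumed size, much smaller than the example above


def with_loop():
    # Build the list with an explicit loop and append
    container = []
    for i in range(n):
        container.append(i)
    return container


def with_comprehension():
    # Build the same list with a comprehension
    return [i for i in range(n)]


# Total time for 20 calls of each function
loop_time = timeit.timeit(with_loop, number=20)
comp_time = timeit.timeit(with_comprehension, number=20)

print(f"loop: {loop_time:.3f}s, comprehension: {comp_time:.3f}s")
```

The ratio between the two numbers will vary from one environment to another, so treat any single result as indicative rather than definitive.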
We will now introduce an important tool in Python called generators.
Definition: Generators are a special type of function in Python that create an iterator. They allow you to generate a series of values over time, rather than computing them all at once and holding them in memory.
Let's compare the idea of a generator with a simple iterable object, a list. A list readily stores all of its members; you can access any of its contents via indexing, or iterate over them with a loop. A generator, on the other hand, works with lazy evaluation and only creates contents on request. A generator produces one value at a time, on the fly, only when you ask for it.
The whole point of this is that you can use a generator to produce a long sequence of items without having to store them all in memory.
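A rough way to see the memory difference is to compare a list with a generator expression, which is a comprehension written with parentheses instead of brackets. This is a sketch only: `sys.getsizeof` reports the size of the container object itself, and the value of `n` is an arbitrary assumption.

```python
import sys

n = 1_000_000  # assumed size for illustration

squares_list = [i**2 for i in range(n)]  # all values stored at once
squares_gen = (i**2 for i in range(n))   # values produced on demand

# getsizeof counts the list's own pointer array, not the integers it
# references, so the real footprint of the list is even larger.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# The generator can still be consumed like any other iterable
total = sum(squares_gen)
print(total)
```

The generator object stays small regardless of `n`, because it only keeps track of where it is in the computation.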
def gen123():
    yield 1
    yield 2
    yield 3
gen123
g = gen123() # instantiate in an object
g
Generators are similar to functions; however, rather than use the return keyword, we leverage the yield keyword. If you use the yield keyword once in a function, then that function is a generator.
Understanding the yield keyword
When a generator function reaches a yield statement, the "state" of the generator function is frozen; the values of all variables are saved and the next line of code to be executed is recorded, until the generator is called again. Once it is, it picks up where it left off and continues execution until it hits another yield statement.
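The frozen state can be observed with a generator that keeps a local variable between calls. A minimal sketch, using a hypothetical `running_total` generator written for this illustration:

```python
def running_total(values):
    """Yield the cumulative sum after each value."""
    total = 0
    for v in values:
        total += v
        # Execution pauses here; `total` and the loop position are kept
        yield total


rt = running_total([10, 20, 30])
print(next(rt))  # 10
print(next(rt))  # 30
print(next(rt))  # 60
```

Each call to `next` resumes right after the previous `yield`, with `total` still holding its earlier value.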
To recap, generators:
Behave just like an iterator; however, the next thing being demanded isn't the next stored item, but rather the next computation.
next(g)
next(g)
next(g)
next(g) # raises StopIteration: the generator is exhausted
l_ = [1, 2, 3]
l_iter = iter(l_)
next(l_iter)
next(l_iter)
next(l_iter)
A generator is a simple way to construct a new iterator object that evaluates lazily. You can take a non-iterable object and make it iterable with a generator.
That flexibility gives Python many different ways to create iterable objects.
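As one illustration of making a non-iterable object iterable, an integer cannot be looped over directly, but a generator can walk through its digits. The `digits` function below is a hypothetical helper written for this sketch and assumes a non-negative integer:

```python
def digits(number):
    """Yield the decimal digits of a non-negative integer, left to right."""
    for char in str(number):
        yield int(char)


# An int is not iterable on its own
try:
    iter(2024)
    int_is_iterable = True
except TypeError:
    int_is_iterable = False

print(int_is_iterable)     # False
print(list(digits(2024)))  # [2, 0, 2, 4]
```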
Let's see some examples of three built-in functions that return lazy iterables. Notice that when a function returns a generator, it doesn't have to generate all the output at once. Instead, it generates each item one at a time on-the-fly as you iterate over the generator. This means that a generator can generate a very large, or even infinite, amount of output while using very little memory.
We will focus on three that are heavily used in data science:
range()
zip()
enumerate()
range()
An extremely popular built-in lazy iterable is range. Range is often used as a way to implement loops. It takes the following inputs, range(start, stop, step), where:
‘start’ (inclusive, default=0)
‘stop’ (exclusive)
‘step’ (default=1)
Like a generator, range produces the corresponding sequence of integers (from start to stop, using the step size) upon iteration. Strictly speaking, a range object is a lazy sequence rather than a generator, but remember that the evaluation is still lazy.
Consider the following example usages of range:
# create a range
r = range(0, 10, 2)
# type
type(r)
# list comprehension
print([r_ for r_ in r])
for r_ in range(10):
    print(r_)
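One detail worth noting: a range object is evaluated lazily, but strictly speaking it is a sequence rather than a generator. The sketch below shows what that means in practice; the size is an arbitrary assumption chosen to be too large to store comfortably as a list.

```python
r = range(0, 1_000_000_000, 2)  # nothing is materialized here

length = len(r)       # ranges know their length
fourth = r[3]         # ranges support indexing
has_ten = 10 in r     # membership is computed, not searched
has_eleven = 11 in r

# A range is iterable but is not itself an iterator
try:
    next(r)
    is_iterator = True
except TypeError:
    is_iterator = False

it = iter(r)          # ask for an iterator explicitly
first = next(it)

print(length, fourth, has_ten, has_eleven, is_iterator, first)
# 500000000 6 True False False 0
```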
zip()
Syncs two series of numbers up into tuples.
a = list(range(10))
b = list(range(-10,0))
sync = zip(a,b) # Its own object type
sync
next(sync)
[item for item in zip(a,b)]
# you can also unpack those
for a_, b_ in zip(a,b):
    print("element a:", a_, "element b:", b_)
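Two behaviors of `zip` are worth seeing in a small sketch: it stops at the shortest input, and its pairs can be passed straight to `dict`. The names and scores below are made-up illustrative data.

```python
names = ["ana", "ben", "cy"]  # hypothetical data
scores = [90, 85]             # one element shorter on purpose

# zip stops as soon as the shortest input is exhausted
pairs = list(zip(names, scores))
print(pairs)   # [('ana', 90), ('ben', 85)]

# The (key, value) tuples can build a dictionary directly
lookup = dict(zip(names, scores))
print(lookup)  # {'ana': 90, 'ben': 85}
```

Note that the unmatched element `"cy"` is silently dropped, which can hide length mismatches in real data.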
enumerate()
Generates index and value tuple pairings.
my_list = 'Iterator tools are useful to move across iterable objects in complex ways.'.split()
print(my_list)
my_list_gen = enumerate(my_list)
next(my_list_gen)
[i for i in enumerate(my_list)]
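In a loop, the index and value are usually unpacked directly, and the optional `start` argument changes the first index. A short sketch with an illustrative word list:

```python
words = "Iterator tools are useful".split()

# Unpack each (index, value) pair; start counting at 1 instead of 0
for position, word in enumerate(words, start=1):
    print(position, word)

numbered = list(enumerate(words, start=1))
print(numbered)
# [(1, 'Iterator'), (2, 'tools'), (3, 'are'), (4, 'useful')]
```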
Part of the Python standard library, itertools deals with Python's iterator objects. It provides robust functionality for iterable sequences. Functions in itertools operate on iterators to produce more complex iterators.
There are many functions in itertools. See the documentation here. Most importantly, try to understand what is going on behind each function just by reading the documentation. It is fun!
Combinations and permutations of the elements
import itertools
x = ['a','b','c','d']
[i for i in itertools.combinations(x,2)]
for i in itertools.permutations(x):
    print(i)
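The difference between the two functions is whether order matters. A quick sketch that checks the counts against the usual formulas, assuming Python 3.8 or later for `math.comb`:

```python
import itertools
from math import comb, factorial

x = ['a', 'b', 'c', 'd']

combos = list(itertools.combinations(x, 2))  # order does not matter
perms = list(itertools.permutations(x))      # order matters

print(len(combos), comb(4, 2))    # 6 6
print(len(perms), factorial(4))   # 24 24

# ('a', 'b') appears as a combination, but its reverse does not
print(('a', 'b') in combos, ('b', 'a') in combos)  # True False
```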
Creates a count generator.
# don't loop over this; it is an infinite generator
counter = itertools.count(start=0,step=.3)
next(counter)
next(counter)
list(zip(itertools.count(step=5),"Georgetown"))
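Besides pairing it with a finite iterable through `zip`, one safe way to take values from an infinite counter is `itertools.islice`, which stops after a fixed number of items. A minimal sketch:

```python
import itertools

counter = itertools.count(start=0, step=5)

# Take only the first five values from the infinite counter
first_five = list(itertools.islice(counter, 5))
print(first_five)  # [0, 5, 10, 15, 20]

# The counter remembers its position and continues from there
next_value = next(counter)
print(next_value)  # 25
```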
Lazily concatenate lists (or any iterables) together without the memory overhead of duplication.
list(itertools.chain('ABC', 'DEF'))
list(itertools.repeat("a",10))
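When the pieces to concatenate are already collected in one container, `itertools.chain.from_iterable` does the same job as `chain`. The sketch below uses it to flatten a small nested list, as an alternative to the nested comprehension shown earlier (the list contents are illustrative):

```python
import itertools

nested_list = [[1, 2, 3], ["a", "b"]]  # illustrative data

# Lazily walk through each sublist in turn
flat = list(itertools.chain.from_iterable(nested_list))
print(flat)  # [1, 2, 3, 'a', 'b']
```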