In this notebook, we will cover:
- open(), close(), and with to manage file connections
- .csv files
TLDR: Most often we will use high-level functions from Pandas to load data into Python objects. However, if you are migrating from R or Stata to Python, note that explicit connection management (open(), close(), and the with statement) is very characteristic of Python code and is not as heavily used in those languages. These file handlers are also important when working with non-tabular data, which you often don't need to (or don't want to) fit into a dataframe.
Reading: Check Section 3.3 of Python for Data Analysis to learn more about the topics covered in the notebook.
import os
# check my working directory
os.getcwd()
open()
The built-in open() function opens files on our system. The call below passes three arguments: the file path, the mode, and the encoding.
file = open("redrising.txt",mode='r',encoding='UTF-8')
open() returns an object of type _io.TextIOWrapper. It is a file-like object, a term that is only loosely defined in Python: roughly, any object that supports methods such as read() or write().
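To make the idea of a "file-like object" concrete, here is a minimal sketch using io.StringIO from the standard library, an in-memory text stream that can be read and iterated like an open text file. The helper count_lines and the sample text are made up for illustration.

```python
import io

# io.StringIO is an in-memory text stream; the text is invented for this sketch.
fake_file = io.StringIO("first line\nsecond line\n")

def count_lines(stream):
    """Count the lines of anything that can be iterated line by line."""
    return sum(1 for _ in stream)

# The same helper would also accept an object returned by open().
n_lines = count_lines(fake_file)
print(n_lines)
```

Code written against this loose interface does not need to know whether it was handed a real file on disk or some other stream.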
type(file)
file_ = file.read()
file_
# Once we've read through the contents, the file position is at the end,
# so a second read() returns an empty string
print(file.read())
file.close()
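The file itself is unchanged after a read; only the position inside the open file has moved to the end. A minimal sketch of rewinding with seek(0), assuming a hypothetical scratch file created in a temporary directory so the code runs anywhere:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")
f = open(path, mode="wt", encoding="utf-8")
f.write("line one\nline two\n")
f.close()

file = open(path, mode="rt", encoding="utf-8")
first_read = file.read()    # reads everything; the position is now at the end
second_read = file.read()   # nothing left to read, so this is an empty string
file.seek(0)                # move the position back to the start of the file
third_read = file.read()    # the full contents again
file.close()

print(repr(second_read))
print(third_read == first_read)
```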
Opening and forgetting to close files can lead to a number of issues, mainly the mismanagement of computational resources (such as open file handles) on your machine.
Moreover, close() flushes any buffered writes, so it is needed to make sure written data actually reaches the file on disk.
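The link between closing and writing comes from buffering: text passed to write() may sit in memory until the file is flushed or closed. The sketch below, which uses a hypothetical scratch file, calls flush() to push the buffered text out explicitly and then closes the file.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "buffered.txt")

writer = open(path, mode="wt", encoding="utf-8")
writer.write("buffered text\n")
writer.flush()              # push buffered text to the operating system now

# A second, independent handle can now see the flushed text.
reader = open(path, mode="rt", encoding="utf-8")
seen_after_flush = reader.read()
reader.close()

writer.close()              # close() also flushes, then releases the file
print(seen_after_flush)
```

Whether unflushed text is visible to other readers depends on buffer sizes and the platform, which is why relying on close() (or flush()) is the safe habit.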
Attribute or method | Description |
---|---|
._CHUNK_SIZE | Internal attribute (int) controlling the read chunk size; not intended for direct use. |
._finalizing | Internal attribute (bool); not intended for direct use. |
.buffer | The underlying binary buffer that the text wrapper reads from. |
.closed | True if the file has been closed. |
.encoding | Name of the encoding used to decode or encode the file. |
.errors | The error-handling setting for decoding and encoding. |
.line_buffering | Whether line buffering is enabled. |
.mode | The mode the file was opened with. |
.name | The name (path) of the file. |
.readlines() | Return a list of lines from the stream. |
.reconfigure() | Reconfigure the text stream with new parameters. |
.write_through | Whether writes are passed immediately to the underlying binary buffer. |
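Several entries in the table are plain attributes rather than methods, so they are read without parentheses. A short sketch, assuming a hypothetical scratch file, that collects a few of them:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "attributes.txt")
f = open(path, mode="wt", encoding="utf-8")
f.write("some text\n")
f.close()

f = open(path, mode="rt", encoding="utf-8")
info = {
    "name": f.name,          # the path passed to open()
    "mode": f.mode,          # the mode the file was opened with
    "encoding": f.encoding,  # the text encoding in use
    "closed": f.closed,      # False while the file is still open
}
f.close()
closed_after = f.closed      # True once close() has been called

print(info)
print(closed_after)
```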
file = open("redrising.txt",mode='rt',encoding='UTF-8')
file.readlines() # convert all items to a list
# Is the file closed?
file.closed
modes
Mode | Description |
---|---|
r | open for reading (default) |
w | open for writing |
x | open for exclusive creation, failing if the file already exists |
a | open for writing, appending to the end of the file if it exists |
b | binary mode |
t | text mode (default) |
Examples:
mode = 'rb' → "read binary"
mode = 'wt' → "write text"
f = open('redrising.txt',mode="rt",encoding='utf-8')
# Print the mode
print(f.mode)
f.close()
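The practical difference between the binary and text flags is the type that read() returns: bytes in binary mode, str in text mode. A minimal sketch with a hypothetical scratch file (note that binary mode does not take an encoding argument):

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes.txt")
f = open(path, mode="wt", encoding="utf-8")
f.write("café\n")
f.close()

f = open(path, mode="rb")                    # "read binary": no decoding
raw = f.read()                               # a bytes object
f.close()

f = open(path, mode="rt", encoding="utf-8")  # "read text": bytes are decoded
text = f.read()                              # a str object
f.close()

print(type(raw).__name__, type(text).__name__)
```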
f = open('text_file.txt',mode="wt",encoding='utf-8')
f.write('This is an example\n')
f.write('Of writing a file...\n')
f.write('Neat!\n')
f.close()
NOTE: you must call close() for your lines to be reliably written to the file, because writes are buffered until the file is flushed or closed.
Now, read the file back in using "read mode":
f = open('text_file.txt',mode="rt",encoding='utf-8')
print(f.read())
f.close()
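The two remaining modes from the table, 'x' and 'a', can be sketched the same way. The file name notes.txt below is hypothetical and lives in a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch location, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "notes.txt")

f = open(path, mode="xt", encoding="utf-8")  # exclusive creation: file must not exist
f.write("first\n")
f.close()

f = open(path, mode="at", encoding="utf-8")  # append: keeps the existing content
f.write("second\n")
f.close()

# A second exclusive creation fails because the file now exists.
try:
    open(path, mode="xt", encoding="utf-8")
    created_twice = True
except FileExistsError:
    created_twice = False

f = open(path, mode="rt", encoding="utf-8")
contents = f.read()
f.close()
print(contents)
```

By contrast, reopening the file with mode 'w' would have truncated it and discarded the first line.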
file = open("redrising.txt",mode='rt',encoding='UTF-8')
[i for i in dir(file) if (i=="__iter__" or i=="__next__")]
Looking at the object's attributes, we see __iter__() and __next__() methods, meaning we can iterate over the open file object line by line.
Most of the time we will iterate over the open file to collect its lines into a single list.
See a few options below:
# with a loop
file = open("redrising.txt",mode='rt',encoding='UTF-8')
text=[]
for line in file:
    text.append(line)
file.close()
# The lines are now stored in a Python list
text
# with list comprehension
file = open("redrising.txt",mode='rt',encoding='UTF-8')
result = [line.replace("\n", "") for line in file if line!="\n"]
file.close()
result
Or you can assign the output of .read(), a single string, to an object:
file = open("redrising.txt",mode='rt',encoding='UTF-8')
result = file.read()
file.close()
result
file = open("redrising.txt",mode='rt',encoding='UTF-8')
for line in file:
    if line == '\n':
        continue
    n_words_per_line = len(line.split())
    print(n_words_per_line)
file.close()
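Because an open file implements __next__(), the iteration can also be driven by hand with the built-in next(). A small sketch, assuming a hypothetical scratch file, that pulls off the first line and then lets a comprehension continue from where the file position stopped:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "lines.txt")
f = open(path, mode="wt", encoding="utf-8")
f.write("title line\nsecond\nthird\n")
f.close()

file = open(path, mode="rt", encoding="utf-8")
header = next(file)                                 # consumes only the first line
remaining = [line.rstrip("\n") for line in file]    # continues with the second line
file.close()

print(repr(header))
print(remaining)
```

Lines are produced one at a time, so this style works even when a file is too large to hold in memory at once.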
with: beyond opening and closing with context managers
As you'll note, the need to open() and close() files can get a bit redundant after a while. Closing after opening to handle resource cleanup is common enough that Python has a special protocol for it: the with code block, which closes the file automatically when the block ends.
# using list comprehension
# with open() as alias:
with open("redrising.txt",mode='rt',encoding='UTF-8') as file:
    res=[len(line.split()) for line in file if line!="\n"]
print(res)
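One way to convince yourself that the with block handles cleanup is to check .closed after the block, including when an error interrupts it. A minimal sketch with a hypothetical scratch file and a simulated error:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "words.txt")
with open(path, mode="wt", encoding="utf-8") as f:
    f.write("one two three\n\nfour five\n")

with open(path, mode="rt", encoding="utf-8") as file:
    n_words = [len(line.split()) for line in file if line != "\n"]
closed_after_block = file.closed      # the file was closed on leaving the block

try:
    with open(path, mode="rt", encoding="utf-8") as file:
        raise ValueError("simulated problem while processing")
except ValueError:
    pass
closed_after_error = file.closed      # still closed, despite the exception

print(n_words, closed_after_block, closed_after_error)
```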
In practice, we will almost always use pandas.read_csv() to import csv files into Python. In case you want to learn a bit about the csv module, here are some examples.
See the Python documentation for more on the csv module, which is located in the standard library.
import csv
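# The csv documentation recommends opening csv files with newline=''
with open("student_data.csv",mode='rt',newline='') as file:
    data = csv.reader(file)
    # This prints the reader object itself, not the rows
    print(data)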
Reading in .csv data
with open("student_data.csv",mode='rt',newline='') as file:
    data = csv.reader(file)
    for row in data:
        print(row)
with open("student_data.csv",mode='rt',newline='') as file:
    data = csv.reader(file)
    output = [row for row in data]
output
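When the first row holds column names, csv.DictReader from the same module may be more convenient than csv.reader, since each row comes back as a mapping from column name to value. The sketch below writes a hypothetical two-column file first so that it runs on its own; the file name and contents are assumptions for illustration.

```python
import csv
import os
import tempfile

# Hypothetical scratch csv, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "students.csv")
with open(path, mode="w", newline="", encoding="utf-8") as file:
    csv.writer(file).writerows([["Student", "Grade"], ["Susan", "A"], ["Sean", "B-"]])

with open(path, mode="r", newline="", encoding="utf-8") as file:
    # The header row supplies the keys; every later row becomes one record.
    records = list(csv.DictReader(file))

grades = {row["Student"]: row["Grade"] for row in records}
print(grades)
```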
# Student data as a nested list.
student_data = [["Student","Grade"],
["Susan","A"],
["Sean","B-"],
["Cody","A-"],
["Karen",'B+']]
# Write the rows with the .writerows() method
# newline='' prevents extra blank lines between rows on some platforms
with open("student_data_write.csv",mode='w',newline='') as file:
    csv_file = csv.writer(file)
    csv_file.writerows(student_data)
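A quick way to check a write is to read the file back and compare it with the original rows. This sketch repeats the student data and uses a hypothetical path in a temporary directory; both files are opened with newline="", as the csv module documentation recommends.

```python
import csv
import os
import tempfile

student_data = [["Student", "Grade"],
                ["Susan", "A"],
                ["Sean", "B-"],
                ["Cody", "A-"],
                ["Karen", "B+"]]

# Hypothetical scratch location, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "student_data_write.csv")

with open(path, mode="w", newline="", encoding="utf-8") as file:
    csv.writer(file).writerows(student_data)

with open(path, mode="r", newline="", encoding="utf-8") as file:
    read_back = [row for row in csv.reader(file)]

# csv.reader returns every field as a string, which matches this data exactly.
round_trip_ok = read_back == student_data
print(round_trip_ok)
```

Numeric columns would not survive this comparison unchanged, because the csv module reads every value back as text.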