In this notebook, we will cover:
- open(), close()
- with to manage connections
- .csv files

TLDR: Most often we will use high-level functions from Pandas to load data into Python objects. However, if you are migrating from R or Stata to Python, note that explicitly managing file connections (open(), close() and the with statement) is very characteristic of Python code and is not heavily used in those languages. These file handlers are also important when working with non-tabular data, which you often don't need (or don't want) to fit into a dataframe.
Reading: Check Section 3.3 of Python for Data Analysis to learn more about the topics covered in the notebook.
import os
# check my working directory
os.getcwd()
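## open()

The built-in open() function opens files on our system. The function takes the following arguments:

- file: the path to the file, e.g. "redrising.txt"
- mode: how the file is opened, e.g. 'r' for reading
- encoding: the text encoding used to decode the file, e.g. 'UTF-8'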
file = open("redrising.txt",mode='r',encoding='UTF-8')
open() returns an object of the special type _io.TextIOWrapper. It is a file-like object, a loosely defined concept in Python: any object that exposes file methods such as .read() or .write() counts as one.
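Because "file-like" is defined by behaviour rather than by a specific class, code written for an opened file also works on other objects with the same methods. A minimal sketch, assuming a hypothetical helper count_lines(); io.StringIO from the standard library is an in-memory text stream used here in place of a real file:

```python
import io

def count_lines(handle):
    """Count the lines in any iterable file-like object (hypothetical helper)."""
    return sum(1 for _ in handle)

# io.StringIO behaves like a text file opened with open(), but lives in memory.
fake_file = io.StringIO("first line\nsecond line\nthird line\n")
n = count_lines(fake_file)
print(n)  # 3

# Both io.StringIO and the _io.TextIOWrapper returned by open() are text streams.
print(isinstance(fake_file, io.TextIOBase))  # True
```

The helper never checks the type of its argument; it only iterates over it, which is why the same function would accept the object returned by open().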
type(file)
file_ = file.read()
file_
# Once we've read to the end, the file position sits at the end of the file, so another read() returns an empty string
print(file.read())
file.close()
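Forgetting to close files you have opened can lead to a number of issues, mainly the mismanagement of computational resources on your machine (each open file holds an operating-system file handle).
Moreover, close() flushes any buffered writes, so it is necessary for the data you write to actually reach the file on disk.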
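The buffering behaviour behind that last point can be observed directly. A minimal sketch, assuming a throwaway file in a temporary directory (the name buffer_demo.txt is hypothetical); flush() pushes buffered text to the operating system without closing the file, and close() flushes and then releases the handle:

```python
import os
import tempfile

# Use a temporary directory so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "buffer_demo.txt")

f = open(path, mode="wt", encoding="utf-8")
f.write("hello\n")  # this text may still be sitting in an in-memory buffer

f.flush()  # push the buffered text to the operating system
with open(path, mode="rt", encoding="utf-8") as reader:
    after_flush = reader.read()

f.close()  # close() also flushes, then releases the file handle
print(after_flush)  # hello
print(f.closed)  # True
```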
| Attribute / method | Description |
|---|---|
| ._CHUNK_SIZE | int: internal chunk size used for buffered I/O |
| ._finalizing | bool: internal flag (implementation detail) |
| .buffer | the underlying buffered binary stream |
| .closed | bool: whether the file has been closed |
| .encoding | str: the encoding used to decode and encode text |
| .errors | str: how encoding errors are handled (e.g. 'strict') |
| .line_buffering | bool: whether the stream flushes whenever a write contains a newline |
| .mode | str: the mode the file was opened with |
| .name | str: the file name |
| .readlines() | Return a list of lines from the stream. |
| .reconfigure() | Reconfigure the text stream with new parameters. |
| .write_through | bool: whether writes are passed straight to the underlying binary buffer |
file = open("redrising.txt",mode='rt',encoding='UTF-8')
dir(file)
# read line by line
file.readlines() # read all remaining lines into a list, one string per line
# Is the file closed?
file.closed
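The .closed attribute flips to True once close() is called, and any further reading then fails. A minimal sketch, assuming a small throwaway file in a temporary directory (closed_demo.txt is a hypothetical name):

```python
import os
import tempfile

# Create a small file to work with.
path = os.path.join(tempfile.mkdtemp(), "closed_demo.txt")
f = open(path, mode="wt", encoding="utf-8")
f.write("some text\n")
f.close()

file = open(path, mode="rt", encoding="utf-8")
print(file.closed)  # False
file.close()
print(file.closed)  # True

# Reading from a closed file raises ValueError.
try:
    file.read()
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```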
## Modes

| Mode | Description |
|---|---|
| r | open for reading (default) |
| w | open for writing |
| x | open for exclusive creation, failing if the file already exists |
| a | open for writing, appending to the end of the file if it exists |
| b | binary mode |
| t | text mode (default) |
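The difference between 'x' and 'a' is easiest to see by running both against the same file. A minimal sketch, assuming a throwaway file in a temporary directory (modes_demo.txt is a hypothetical name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'x': exclusive creation succeeds because the file does not exist yet.
f = open(path, mode="xt", encoding="utf-8")
f.write("line 1\n")
f.close()

# A second 'x' open fails because the file now exists.
try:
    open(path, mode="x", encoding="utf-8")
    created_twice = True
except FileExistsError:
    created_twice = False

# 'a': writes go to the end of the file instead of overwriting it.
f = open(path, mode="at", encoding="utf-8")
f.write("line 2\n")
f.close()

f = open(path, mode="rt", encoding="utf-8")
contents = f.read()
f.close()

print(created_twice)  # False
print(contents)
```

Opening the same path with 'w' instead of 'a' would have truncated the file and left only "line 2".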
Examples:

- mode = 'rb' → "read binary"
- mode = 'wt' → "write text"

f = open('redrising.txt',mode="rt",encoding='utf-8')
# Print the mode
print(f.mode)
f.close()
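In binary mode ('rb') Python does no decoding at all: read() returns bytes rather than str. A minimal sketch, assuming a throwaway file in a temporary directory (binary_demo.txt is a hypothetical name) that contains a non-ASCII character:

```python
import os
import tempfile

# Write the file in binary mode so the bytes on disk are exactly what we expect.
path = os.path.join(tempfile.mkdtemp(), "binary_demo.txt")
f = open(path, mode="wb")
f.write("café\n".encode("utf-8"))
f.close()

# 'rb' returns raw bytes; binary mode does not accept an encoding argument.
f = open(path, mode="rb")
raw = f.read()
f.close()

print(type(raw))  # <class 'bytes'>
print(raw)  # b'caf\xc3\xa9\n'

# Decoding the bytes yourself gives the text that 'rt' mode would return.
text = raw.decode("utf-8")
print(text)
```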
f = open('text_file.txt',mode="wt",encoding='utf-8')
f.write('This is an example\n')
f.write('Of writing a file...\n')
f.write('Neat!\n')
f.close()
NOTE: you must call close() for your lines to be reliably written to the file; until then they may still sit in a memory buffer.
Now, read the file back in using read mode:
f = open('text_file.txt',mode="rt",encoding='utf-8')
print(f.read())
f.close()
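Two details of writing text are worth checking: write() returns the number of characters written and never adds a newline for you, while print() can also target an open file through its file argument and appends "\n" automatically. A minimal sketch, assuming a throwaway file in a temporary directory (text_file_demo.txt is a hypothetical name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text_file_demo.txt")

f = open(path, mode="wt", encoding="utf-8")
# write() returns the number of characters written; newlines must be explicit.
n1 = f.write("This is an example\n")
n2 = f.write("Of writing a file...\n")
# print() writes to the file and adds the trailing newline itself.
print("Neat!", file=f)
f.close()  # flush the buffered text to disk

f = open(path, mode="rt", encoding="utf-8")
lines = f.readlines()
f.close()

print(n1, n2)  # 19 21
print(lines)
```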
Most of the time, we will iterate over the open file object to collect its lines into a single list.
See a few options below:
# with a loop
file = open("redrising.txt",mode='rt',encoding='UTF-8')
text=[]
for line in file:
    text.append(line)
file.close()
# text is now a regular Python list of strings
text
# with a list comprehension: drop the newline characters and skip blank lines
file = open("redrising.txt",mode='rt',encoding='UTF-8')
result = [line.replace("\n", "") for line in file if line!="\n"]
file.close()
result
Or you can assign the output of .read() to an object (note that this gives one single string, not a list of lines):
file = open("redrising.txt",mode='rt',encoding='UTF-8')
result = file.read()
file.close()
result
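If you start from the single string that .read() returns, str.splitlines() turns it into a list of lines without the trailing newline characters. A minimal sketch, assuming io.StringIO as a stand-in for redrising.txt (the sample text is made up):

```python
import io

# io.StringIO stands in for an opened text file here.
file = io.StringIO("First line\n\nSecond line\n")
result = file.read()
file.close()

print(type(result))  # <class 'str'>

# splitlines() splits on line boundaries and drops the newline characters.
lines = result.splitlines()
print(lines)  # ['First line', '', 'Second line']

# Blank lines become empty strings, which are falsy, so they are easy to filter.
non_blank = [line for line in lines if line]
print(non_blank)  # ['First line', 'Second line']
```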
# count the words on each non-blank line
file = open("redrising.txt",mode='rt',encoding='UTF-8')
for line in file:
    if line == '\n':
        continue  # skip blank lines
    n_words_per_line = len(line.split())
    print(n_words_per_line)
file.close()
## with: beyond opening and closing with context managers

As you'll note, the need to open() and close() files can get a bit redundant after a while. Cleaning up resources after opening them is common enough that Python has a special protocol for it: the with code block, which closes the file automatically when the block ends.
# using list comprehension
# with open() as alias:
with open("redrising.txt",mode='rt',encoding='UTF-8') as file:
    res=[len(line.split()) for line in file if line!="\n"]
print(res)
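The with block closes the file even when an error interrupts it, which is what makes it safer than calling close() by hand. A minimal sketch, assuming a throwaway file in a temporary directory (with_demo.txt is a hypothetical name); the last part writes out roughly the same cleanup manually with try/finally:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, mode="wt", encoding="utf-8") as f:
    f.write("one two three\n")
print(f.closed)  # True: leaving the block closed the file for us

# The file is still closed when an exception escapes the block.
try:
    with open(path, mode="rt", encoding="utf-8") as g:
        raise RuntimeError("something went wrong mid-read")
except RuntimeError:
    pass
print(g.closed)  # True

# Roughly the cleanup the with statement performs, written by hand.
h = open(path, mode="rt", encoding="utf-8")
try:
    words = h.read().split()
finally:
    h.close()
print(words)  # ['one', 'two', 'three']
```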
Here we will almost always use pandas.read_csv() to import .csv files into Python. In case you want to learn a bit about the csv module, here are some examples.
See the Python documentation for more on the csv module, which is part of the standard library.
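import csv
# newline='' is recommended by the csv docs so the module handles line endings itself
with open("student_data.csv",mode='rt',newline='') as file:
    data = csv.reader(file)
    print(data)  # a csv.reader object, not the rows themselves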
Reading in .csv data
with open("student_data.csv",mode='rt',newline='') as file:
    data = csv.reader(file)
    for row in data:
        print(row)
with open("student_data.csv",mode='rt',newline='') as file:
    data = csv.reader(file)
    output = [row for row in data]
output
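The list comprehension above sits inside the with block for a reason: csv.reader is lazy and pulls rows from the file only as you iterate, so using it after the block has closed the file fails. A minimal sketch, assuming a temporary stand-in for student_data.csv with made-up rows:

```python
import csv
import os
import tempfile

# Hypothetical stand-in for student_data.csv, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "student_data.csv")
with open(path, mode="w", newline="") as file:
    file.write("Student,Grade\nSusan,A\nSean,B-\n")

with open(path, mode="rt", newline="") as file:
    data = csv.reader(file)  # lazy: no rows have been read yet

# The file is closed now, so asking the reader for rows raises ValueError.
try:
    late_rows = list(data)
except ValueError:
    late_rows = None
print(late_rows)  # None

# Materialize the rows while the file is still open instead.
with open(path, mode="rt", newline="") as file:
    rows = list(csv.reader(file))
print(rows)  # [['Student', 'Grade'], ['Susan', 'A'], ['Sean', 'B-']]
```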
# Student data as a nested list.
student_data = [["Student","Grade"],
["Susan","A"],
["Sean","B-"],
["Cody","A-"],
["Karen",'B+']]
# Write the rows with the .writerows() method
# newline='' stops extra blank lines from appearing between rows on Windows
with open("student_data_write.csv",mode='w',newline='') as file:
    csv_file = csv.writer(file)
    csv_file.writerows(student_data)
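A quick way to check a write is to read the file straight back and compare it with the original nested list. A minimal sketch, assuming a temporary output path and a shortened copy of the student data; note that csv.reader returns every field as a string, so the comparison only holds because all values here are already strings:

```python
import csv
import os
import tempfile

student_data = [["Student", "Grade"],
                ["Susan", "A"],
                ["Sean", "B-"]]

path = os.path.join(tempfile.mkdtemp(), "student_data_write.csv")

# Write all rows at once; newline="" lets the csv module control line endings.
with open(path, mode="w", newline="", encoding="utf-8") as file:
    csv.writer(file).writerows(student_data)

# Read the file back into a nested list.
with open(path, mode="r", newline="", encoding="utf-8") as file:
    round_trip = list(csv.reader(file))

print(round_trip == student_data)  # True
```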