PPOL 5203 Data Science I: Foundations

Using Jupyter Notebooks, Magic Commands, & Extensions

Tiago Ventura


What is a Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain

  • live code,
  • equations,
  • visualizations and
  • narrative text.

Its use includes:

  • data cleaning and transformation,
  • numerical simulation,
  • statistical modeling,
  • data visualization,
  • machine learning (some!!),
  • And much more.

What doesn't go on a Jupyter Notebook?

When running time-consuming and computationally intensive models, particularly when making use of cloud computing, you will often test your code in a notebook but submit it for running as a .py file.
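
One way to move from notebook to script is to pull the code cells out into a .py file. The usual tool for this is jupyter nbconvert --to script; the sketch below is only a minimal standard-library illustration of the idea. The function name notebook_to_script, the demo notebook, and the file names are made up for this example, and the code assumes the nbformat 4 layout.

```python
import json
import tempfile
from pathlib import Path


def notebook_to_script(nb_path, script_path):
    """Copy the source of every code cell in a notebook into a .py file."""
    nb = json.loads(Path(nb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip Markdown and raw cells
        source = cell["source"]
        # nbformat stores the source either as one string or as a list of lines
        text = source if isinstance(source, str) else "".join(source)
        # Crude filter: drop magic (%) and shell (!) lines, which are not plain Python
        lines = [ln for ln in text.splitlines()
                 if not ln.lstrip().startswith(("%", "!"))]
        chunks.append("\n".join(lines))
    Path(script_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return script_path


# Hypothetical two-cell notebook, written to a temporary folder for the demo
demo_nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Model run"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [],
         "source": ["%matplotlib inline\n", "total = sum(range(5))\n", "print(total)"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}
workdir = Path(tempfile.mkdtemp())
nb_file = workdir / "demo.ipynb"
nb_file.write_text(json.dumps(demo_nb), encoding="utf-8")

script_file = notebook_to_script(nb_file, workdir / "demo.py")
script_text = Path(script_file).read_text(encoding="utf-8")
print(script_text)
```

The resulting demo.py contains only the two Python lines from the code cell, so it can be run with python demo.py outside the notebook.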

.ipynb is really a JSON file

At its core, a Jupyter notebook is a JSON (JavaScript Object Notation) file.

Open a notebook with a text editor and you will see it!
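
You can also see this from Python by parsing the file with the json module. The sketch below uses a small made-up notebook in the nbformat 4 layout (held in the string raw_text) rather than a real file; with a real notebook you would read the text from the .ipynb file instead.

```python
import json

# A minimal notebook in the nbformat 4 layout (contents are made up for the demo)
raw_text = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "execution_count": 1, "metadata": {},
     "outputs": [], "source": ["x = 1 + 1\\n", "x"]}
  ],
  "metadata": {"kernelspec": {"name": "python3",
                              "display_name": "Python 3",
                              "language": "python"}},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""

nb = json.loads(raw_text)

# The top level is an ordinary dictionary; the cells are a list of dictionaries
top_level_keys = sorted(nb)
cell_types = [cell["cell_type"] for cell in nb["cells"]]
kernel_name = nb["metadata"]["kernelspec"]["name"]

print(top_level_keys)
print(cell_types)
print(kernel_name)
```

Everything the notebook shows (text, code, stored outputs, and the kernel it expects) lives somewhere in this dictionary.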


Pros and Cons of Jupyter Notebooks

Pros:

  • Notebooks are ubiquitous,
  • Reproducible: transmitting and conveying results
  • We can build code interactively (like we do in R). This makes Jupyter notebooks particularly friendly when you're first learning Python
  • stable

Cons:

  • There is a process to spinning notebooks up. (easy to overcome!)
  • It doesn't allow you to run the code line-by-line. (very annoying!)
  • For those used to working with a text editor, writing code in cells in a notebook can be frustrating.
  • Non-linear: sometimes we can fall out of sequence when writing code.
    • E.g. write code dependencies after we first need to use them.
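
The non-linearity problem can be imitated in plain Python. In the sketch below, each string stands in for one notebook cell and a dictionary plays the role of the kernel's memory; the cell contents and the helper run_cells are hypothetical and only meant to illustrate the point.

```python
# Each string stands in for one notebook cell (hypothetical contents).
cells = {
    "define": "rate = 0.5",
    "use": "result = 10 * rate",
}


def run_cells(order):
    """Run the cells in the given order inside one fresh, shared namespace."""
    namespace = {}  # plays the role of the kernel's memory
    for name in order:
        exec(cells[name], namespace)
    return namespace["result"]


# Running the cells in dependency order works.
in_order = run_cells(["define", "use"])

# Running "use" first in a fresh kernel fails, because rate does not exist yet.
try:
    run_cells(["use", "define"])
    out_of_order = "ok"
except NameError as err:
    out_of_order = f"NameError: {err}"

print(in_order)
print(out_of_order)
```

A notebook written out of sequence can appear to work while the kernel still remembers old variables, and then break for the next reader. Restarting the kernel and running all cells from the top is a simple check.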

Initializing a Notebook

There are two primary methods for initializing a notebook.

1. Via the command line + Jupyter

  • Go into the working directory containing your .ipynb notebook.
    • e.g. cd /Users/me/Desktop/
  • Type jupyter notebook
  • The web application will open up in your default browser.
  • From there, click on the notebook and "spin it up". The notebook will then be "running".
  • We can close the notebook by clicking on the Quit and Logout buttons on the page.
    • Quit == close the local server (i.e. the web application connection)
    • Logout == shut down the home page of the web application (but keep the server running)
  • We can also shut down the server from the console by pressing Control-C.
  • We can also relocate the server (say, if we accidentally close the Notebook) by using the local URL provided when the notebook first activates.

Jupyter lab

Instead of using the simple jupyter notebooks, you can also follow the same steps above using JupyterLab. Jupyter notebook only offers a very simple interface. Jupyter lab offers a more interactive interface that includes notebooks, consoles, terminals, CSV editors, markdown editors, interactive maps, and more.

To use Jupyter lab:

  • Install JupyterLab
  • Go into the terminal
  • Type jupyter lab
  • Navigate to your folder, and code.

2. Point and Click with Anaconda Distribution

If you prefer point-and-click, you can start jupyter notebook through your anaconda distribution.

  • Make sure Anaconda is installed
  • Go to Applications and click on the Anaconda-Navigator icon
  • Click on the Launch icon under Jupyter Notebook

Kernels

A kernel is a computational engine that executes the code contained in a notebook document. A cell (or "Chunk") is a container for text to be displayed in the notebook or code to be executed by the notebook's kernel.

Though we can only have one type of kernel running for any given notebook (we can't change between kernels in the middle of a notebook), we can use jupyter beyond just a python kernel.

Here is a list of all the kernels that you can use with a jupyter notebook. For example, we can easily employ an R kernel in a jupyter notebook.

This was always the notebook's original intent. Actually, "Jupyter" is a loose acronym meaning Julia, Python, and R.

Usage

Code Chunks

Code chunks are what we use to execute Python (or whatever kernel we have running) code. In addition, we can write prose in a code chunk by altering the metadata regarding how the code should be run.

There are two states of a code chunk:

  • Edit Mode: Edit mode is indicated by a green cell border and a prompt showing in the editor area. When a cell is in edit mode, you can type into the cell, like a normal text editor. Enter edit mode by pressing Enter or using the mouse to click on a cell's editor area.
  • Command Mode: Command mode is indicated by a grey cell border with a blue left margin. When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently. For example, if you are in command mode and you press c, you will copy the current cell - no modifier is needed. Don't try to type into a cell in command mode. Enter command mode by pressing Esc or using the mouse to click outside a cell's editor area.

How do we switch between Markdown and Code chunks?

  • By using the drop down menu in the tool bar (in either mode)
  • By using the shortcut:
    • Press y when on the cell in Command Mode to switch to a code chunk.
    • Press m when on the cell in Command Mode to switch to a markdown chunk

Executing Code

A code chunk will always reflect the behavior of the kernel that you're using (e.g. a Python code chunk will follow Python syntax).

Best Practices

  • Break code chunks up!
  • Every code chunk should render some output (the aim is to be able to read what we were doing without needing to fire the notebook back up)
  • Use spaces. Keep the chunk readable. Less is more.

Using Markdown

Markdown chunks are written in Markdown and allow for writing mathematical equations using LaTeX.
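
For instance, a Markdown chunk could contain something like the following. The formulas are illustrative only: single dollar signs give inline math, and double dollar signs give a displayed equation.

```latex
The sample mean is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

$$
\hat{\beta} = (X^\top X)^{-1} X^\top y
$$
```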

Shortcuts

As with most user interfaces, Jupyter Notebooks have developed their own way of doing things. Thus there are a number of useful shortcuts that you can employ to help perform useful tasks.

We can access a full (searchable) list of keyboard shortcuts by pressing p when in Command Mode, or by clicking the keyboard icon in the toolbar.

Important ones while in Command Mode:

  • a: create a new code chunk above the current one.
  • b: create a new code chunk below the current one.
  • ii: interrupt the kernel (really useful when some code is running too long or you've accidentally initiated an infinite loop!)
  • y: code mode
  • m: markdown mode
  • shift + m: merge cells (when more than one cell is highlighted)
  • dd: delete cell.

Important ones while in Edit Mode:

  • shift + ctrl + minus: split cell

Magic Commands

Magic commands are special commands provided by the IPython kernel and are prefixed by the % character. These magic commands are designed to succinctly solve various common problems in standard data analysis.

Magic commands come in two flavors:

  • line magics, which are denoted by a single % prefix and operate on a single line of input,
  • cell magics, which are denoted by a double %% prefix and operate on multiple lines of input.

List all the available magic commands.

In [1]:
%lsmagic
Out[1]:
Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

Or consult the quick reference sheet of all available magic commands.

In [2]:
%quickref

Useful Magic

Here are some useful magic commands that come in handy as you're working with code.

Bookmarking

"Come back here later"

In [4]:
%bookmark Home

See below

Changing working directories

In [21]:
%cd ~/Dropbox
/Users/tb186/Dropbox
In [22]:
%pwd
Out[22]:
'/Users/tb186/Dropbox'

Using the bookmark to return to where we were...

In [23]:
%cd -b Home
(bookmark:Home) -> /Users/tb186/Dropbox/courses/ds-1/ppol5203/lecture_notes/week-01
/Users/tb186/Dropbox/courses/ds-1/ppol5203/lecture_notes/week-01
In [24]:
%pwd
Out[24]:
'/Users/tb186/Dropbox/courses/ds-1/ppol5203/lecture_notes/week-01'
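
Magic commands only work inside IPython, so a plain .py script needs another route. A rough standard-library counterpart to %bookmark, %cd, and %pwd is sketched below. The bookmarks dictionary is a hypothetical stand-in for IPython's bookmark store, and a temporary folder is used so the example does not depend on any particular path.

```python
import os
import tempfile

bookmarks = {}  # hypothetical stand-in for IPython's bookmark store

start_dir = os.getcwd()
bookmarks["Home"] = start_dir            # like: %bookmark Home

other_dir = tempfile.mkdtemp()           # throwaway folder for the demo
os.chdir(other_dir)                      # like: %cd <folder>
# realpath resolves symlinks so the comparison is reliable on every platform
moved = os.path.realpath(os.getcwd()) == os.path.realpath(other_dir)

os.chdir(bookmarks["Home"])              # like: %cd -b Home
back_home = os.path.realpath(os.getcwd()) == os.path.realpath(start_dir)

print(moved, back_home)                  # os.getcwd() plays the role of %pwd
```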

Writing code to files

Extremely useful when we develop some functionality that we'd like to utilize later on.

In [25]:
%%writefile my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x
Overwriting my_fib_func.py
In [26]:
%ls # list files ( see our function)
_basics_of_cmd.html             _using-jupyter-notebooks.ipynb
_basics_of_cmd.ipynb            course_infrastructure.qmd
_minimal_example_python.html    intro-to-quarto.qmd
_minimal_example_python.qmd     intro-to-quarto_files/
_minimal_example_python_files/  my_fib_func.py
_using-jupyter-notebooks.html

Reading in files

In [27]:
# %load my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x

Run an external file as a program

In [28]:
%run my_fib_func.py
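
Outside a notebook, the same write-then-run workflow can be approximated with the standard library. This is only a sketch of the idea, not how IPython implements %%writefile or %run: the file is written to a temporary folder, and the namespace dictionary is an assumption standing in for the notebook's interactive namespace.

```python
import tempfile
from pathlib import Path

source = '''\
def fib(n):
    """Fibonacci Sequence"""
    x = [0] * n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i - 2] + x[i - 1]
    return x
'''

# Like %%writefile: save the text to a file (a temporary folder is used here)
script = Path(tempfile.mkdtemp()) / "my_fib_func.py"
script.write_text(source, encoding="utf-8")

# Like %run: execute the file and keep the names it defines
namespace = {"__name__": "__main__"}
exec(compile(script.read_text(encoding="utf-8"), str(script), "exec"), namespace)
fib = namespace["fib"]

first_ten = fib(10)
print(first_ten)
```

After the file has been executed, fib is available for later use, which mirrors what happens after %run in a notebook.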

Timing Code

How fast does what we wrote run?

In [29]:
%time fib(10)
CPU times: user 14 µs, sys: 1e+03 ns, total: 15 µs
Wall time: 28.1 µs
Out[29]:
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

How long do many runs take (statistical sample)?

In [30]:
%timeit fib(10)
679 ns ± 1.39 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
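
In a plain script, similar timings can be collected with Python's timeit module. The sketch below is an approximation: the repeat and loop counts are chosen by hand (7 repeats of 10,000 loops), whereas %timeit picks the loop count automatically, and fib is a compact rewrite of the function above that returns the same values.

```python
import timeit


def fib(n):
    """First n Fibonacci numbers (same results as the fib function above)."""
    x = [0] * n
    for i in range(n):
        x[i] = i if i < 2 else x[i - 2] + x[i - 1]
    return x


# One timing of a single call, roughly the wall time that %time reports
start = timeit.default_timer()
fib(10)
single_run = timeit.default_timer() - start

# Many repeated runs, closer to %timeit: 7 repeats of 10,000 loops each
loops = 10_000
totals = timeit.repeat(lambda: fib(10), repeat=7, number=loops)
per_loop = [total / loops for total in totals]
best = min(per_loop)

print(f"single run: {single_run:.2e} s")
print(f"best of 7:  {best:.2e} s per loop")
```

A single run is noisy, which is why the repeated version is usually the more trustworthy number. Actual timings depend on the machine.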

Look up object names in the name space

In [31]:
main_dat = [1,2,3,4]
main_key = ["a","b"]
x = 5
y = 6
In [32]:
%psearch main*
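
The same kind of wildcard lookup can be sketched in plain Python with the fnmatch module. Here an explicit dictionary named namespace (an assumption for this example) holds the variables; in a live session you could pass globals() instead.

```python
import fnmatch

# Explicit namespace for the demo; in a live session, use globals() instead
namespace = {
    "main_dat": [1, 2, 3, 4],
    "main_key": ["a", "b"],
    "x": 5,
    "y": 6,
}

# Keep the names that match the wildcard pattern (case-sensitive)
matches = sorted(name for name in namespace if fnmatch.fnmatchcase(name, "main*"))
print(matches)
```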

Debugging

Whenever you encounter an error or exception, just open a new notebook cell, type %debug, and run the cell. This will open a command line where you can test your code and inspect all variables right up to the line that threw the error. Type n and hit Enter to run the next line of code (the -> arrow shows you the current position). Use c to continue until the next breakpoint. q quits the debugger and code execution.
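
To give a feel for what the debugger lets you inspect, the sketch below catches an error and reads the local variables of the function where it was raised, without opening an interactive prompt. The function scale and its inputs are made up for the example. In a plain script, calling pdb.post_mortem on the traceback opens an interactive prompt roughly like the one %debug provides.

```python
def scale(values, factor):
    """Divide each value by factor (fails when factor is 0)."""
    scaled = []
    for v in values:
        scaled.append(v / factor)
    return scaled


try:
    scale([1, 2, 3], 0)
except ZeroDivisionError as exc:
    error_type = type(exc).__name__
    tb = exc.__traceback__
    # Walk to the innermost frame, which is where the error was raised
    while tb.tb_next is not None:
        tb = tb.tb_next
    failed_in = tb.tb_frame.f_code.co_name
    local_values = dict(tb.tb_frame.f_locals)
    # For an interactive prompt instead: import pdb; pdb.post_mortem(exc.__traceback__)

print(error_type, "raised in", failed_in)
print(local_values)
```

Seeing that factor is 0 and v is 1 at the moment of failure is exactly the kind of information you would look up by hand at the debugger prompt.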

Asking for help

In [33]:
%%timeit?

Do you need the % sign?

Let's try

In [39]:
psearch main*
In [40]:
pwd
Out[40]:
'/Users/tb186/Dropbox/courses/ds-1/ppol5203/lecture_notes/week-01'
In [41]:
timeit fib(10)
682 ns ± 9.24 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

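Answer: actually, no. Because Automagic is on (see the %lsmagic output above), you can use line magics without the % prefix. Cell magics still need the %% prefix.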


Notebook Extensions

We can expand the functionality of Jupyter notebooks through extensions. Extensions allow us to create and use new features that better customize the notebook's user experience. For example, there are extensions for spell check, a table of contents to ease navigation, running code in parallel, and viewing differences in notebooks when using version control.

Download python module to install notebook extensions: https://github.com/ipython-contrib/jupyter_contrib_nbextensions

Using pip (PyPI package manager):

pip install jupyter_nbextensions_configurator jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

Using conda (Anaconda package manager):

conda install -c conda-forge jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

Extensions can be activated most easily on the home screen when you first activate your Jupyter notebook.

Useful Extensions

  • Collapsible headings: allows you to collapse some parts of the notebooks.
  • Notify: sends a notification when the notebook becomes idle (for long running tasks)
  • Code folding: folds function, loops, and indented code chunks (makes things tidy)
  • nbdime: provides tools for git differencing and merging of Jupyter Notebooks.
    • Requires installation: pip install nbdime