class: center, middle, inverse, title-slide # General Programming: Conditional, Loops and Functions ## IPSA-Flacso Summer School ### Tiago Ventura --- # Introduction Today we will cover more advanced programming skills. They are: - Conditional Flows. - Functions. - Iteration. Mastering these skills will allow you to make all your work scalable and eventually more reproducible. These concepts are general programming skills. Not specific to R. Honestly, once you learn the logic behind loops, conditionals, and functions, it is going to be easy for you to move through different programming languages. --- class: inverse, middle, center ## Conditional Flow --- ## Conditional Flow It allows you to impose restrictions on your code. - When should you run this? - Is this condition true? - Is this object a numeric? To do that, we use an these three main statements: `if`, `else` and `else if` --- ## if-else ```r if (condition) { # Code executed when condition is TRUE } else { # Code executed when condition is FALSE } ``` ``` ## Error in eval(expr, envir, enclos): object 'condition' not found ``` Where: - condition = logical boolean statement. --- ## Simple Example ```r year=2022 if(year>2021){ print("Virtual Flacso -> Tiago is not Travelling to Mexico") } else{ print("In person Flacso -> Please, invite me to teach this course again!!!") } ``` ``` ## [1] "Virtual Flacso -> Tiago is not Travelling to Mexico" ``` ```r year=2023 if(year==2022){ print("Virtual Flacso -> Tiago is not Travelling to Mexico") } else{ print("In person Flacso -> Please, invite me to teach this course again!!!") } ``` ``` ## [1] "In person Flacso -> Please, invite me to teach this course again!!!" ``` --- ## Multiple Conditions The statement above was not very accurate. We need more than one condition. We use `else if` statements. ```r year=2020 if(year==2022|year==2021){ print("Virtual Flacso -> Tiago is not Travelling to Mexico") } else if(year>2022){ print("In person Flacso -> Please, invite me to teach this course again!!!") } else if (year <2021){ print("In person Flacso -> but sorry Tiago, we cannot come back on time") } ``` ``` ## [1] "In person Flacso -> but sorry Tiago, we cannot come back on time" ``` Or just: ```r year="2020" if(year==2022|year==2021){ print("Virtual Flacso -> Tiago is not Travelling to Mexico") } else if(year>2022){ print("In person Flacso -> Please, invite me to teach this course again!!!") } else{ print("Your input is wrong. Or you want to come back on time") } ``` ``` ## [1] "Your input is wrong. Or you want to come back on time" ``` --- class:center, middle, inverse ## Loops --- # Loops As one quickly notes, doing any task in R can become redundant. Loops can dramatically increase the efficiency of our workflow when a task is _systematic and repeatable_. Let's see a toy example. ```r places_with_ilcss <- c("Buenos Aires", "College Park", "Mexico City") # for loop for(cities in places_with_ilcss){ #iterador # repeating print(cities) } ``` ``` ## [1] "Buenos Aires" ## [1] "College Park" ## [1] "Mexico City" ``` --- ## What is this doing? ```r cities <- places_with_ilcss[1] print(cities) ``` ``` ## [1] "Buenos Aires" ``` ```r cities <- places_with_ilcss[2] print(cities) ``` ``` ## [1] "College Park" ``` ```r cities <- places_with_ilcss[3] print(cities) ``` ``` ## [1] "Mexico City" ``` --- ## Components of a for loop **`for(item in group_of_items)`: the sequence** - item: index for each repetition. - group_of_items: object we are drawing the item from -- **Parenthesis {}** - What it is to be repeated goes here. -- **Containers** - Created outside of the loop. - Object to save all your outputs. -- --- ## A more realistic example. ```r ##devtools::install_github("apreshill/bakeoff") library(bakeoff) data("ratings_seasons") ``` We want to calculate the average number of views per season. ```r # How many Seasons? temporadas <- unique(ratings_seasons$series) temporadas ``` ``` ## [1] 1 2 3 4 5 6 7 8 ``` ```r # Average Views #Temporadas temporada_1 <- ratings_seasons[ratings_seasons$series==temporadas[1],] temporada_2 <- ratings_seasons[ratings_seasons$series==temporadas[2], ] temporada_3 <- ratings_seasons[ratings_seasons$series==temporadas[3], ] # Take the mean mean(temporada_1$viewers_7day) ``` ``` ## [1] 2.77 ``` --- ## Building our loop: Computationally Thinking. 1. Think about and specify the length of something you want to loop through. In our case, it's the number of seasons. 2. Set the code up so that every iteration only performs a manipulation on a single subset at a time. 3. Save the contents of each iteration in a new object that won't be overwritten. Here we want to think in terms of "stacking" results or concatenating them. In practice... read your code, see what is repeating, and make it general. --- ```r # Containes container <- list() for(i in 1:length(temporadas)){ # Step 1 # step 2 temp=ratings_seasons[ratings_seasons$series==temporadas[i], ] # Step 3 container[[i]] <- mean(temp$viewers_7day) } container ``` ``` ## [[1]] ## [1] 2.77 ## ## [[2]] ## [1] 3.95125 ## ## [[3]] ## [1] 5.001 ## ## [[4]] ## [1] 7.354 ## ## [[5]] ## [1] 10.0393 ## ## [[6]] ## [1] 12.311 ## ## [[7]] ## [1] 13.563 ## ## [[8]] ## [1] 9.017 ``` --- class:center, middle, inverse ## Functions --- ## What are Functions? **Everything that generates an output in R is a function.** ```r x<- c(1, 2, 3, 4, 5) # Mathematical Functions in R sum(x, na.rm = TRUE) ``` ``` ## [1] 15 ``` ```r log(x) ``` ``` ## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 ``` ```r sqrt(x) ``` ``` ## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 ``` ```r mean(x) ``` ``` ## [1] 3 ``` --- ## Inputs, outputs and arguments. As we learned in our basic math course, a function maps a input to an output. The logic is the same in R. Every function has the following components: - inputs: most of times an object you send in to the function. - outputs: the results of the function. - arguments: specify how the function works. ```r # Lets see how the mean function works ?mean ``` --- ## Creating our own functions. .pull-left[ ```r some_me <- function( argument1, argument2 ){ value <- argument1 + argument2 return(value) # "return" define the output } some_me(2,3) ``` ``` ## [1] 5 ``` ```r some_me(100,123) ``` ``` ## [1] 223 ``` ```r some_me(60,3^4) ``` ``` ## [1] 141 ``` ] .pull-right[ Components: - **name**: some_me; - **argument**: argument1 e argument2; - **body**: everything inside of {}; - **return**: output. ] --- #### Another More General Example ```r nome_da_funcao <- function(x,y,z){ ## Argumentos ### Corpo: o que a função faz out <- what the function does. ### Conclui Corpo. return(out) ## output } ## fecha a função ``` --- ## When should you write a function? <br><br><br> $$ \texttt{"You should consider writing a function whenever} \\ $$ $$ \texttt{you’ve copied and pasted a block of code more than twice"} $$ --- ### Applying a function to multiple objects. Most of times, we will use our functions across different tasks. These taks can be distinct codes, or just distinct objects. This is particular important on scrapping digital data, where we usually write a function to collect information from one page, and then scale-up to several others. There are three different ways we can scale up our functions. - Loops - apply functions - `purr::map` We will conclude today learning more about the `map` functions from the `purrr` package. --- ## First, some references We will just introduce you to the beautiful world of `purrr`. If you want to learn more, you shoudl see these tutorials. - [Jenny BC tutorial](https://jennybc.github.io/purrr-tutorial/) - [Rebeca Barter Tutorial](http://www.rebeccabarter.com/blog/2019-08-19_purrr/) - [HAdley Wickham Class](https://www.youtube.com/watch?v=bzUmK0Y07ck) --- ### Purrr `Purrr` is a tidyverse packages that introduces `map` functions. These functions are all about iteration (like loops). But in a cleaner and more consistent way. Let's see an example. We will go through three steps. - Create ten dataframes and save as a .csv - Open all of them with a loop. - Then open all of them with map. --- ## Create some dataframes ```r # create a dir dir.create("fake_data") for(i in 1:10){ d <- tibble(n=1000, norm=rnorm(n, 0, 1), unif=runif(n, 0,1)) write.csv(d, paste0("fake_data/data", i,".csv")) } ``` --- ## Open with a loop ```r # create container data <- list() # iterator path=paste0("./fake_data/", list.files("./fake_data")) # the loop for(i in 1:length(path)){ data[[i]] <- read.csv(path[[i]]) } str(data, 1) ``` ``` ## List of 10 ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ## $ :'data.frame': 1000 obs. of 4 variables: ``` --- ## Using Map ```r # simple map = output is a list l <- map(path, read.csv) # and some extensions l <- map_dfr(path, read.csv) # Check glimpse(l) ``` ``` ## Rows: 10,000 ## Columns: 4 ## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19… ## $ n <int> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000,… ## $ norm <dbl> -0.07878661, 0.69403199, 0.45043064, 0.28780269, -0.46415634, -0.… ## $ unif <dbl> 0.442120615, 0.528951428, 0.456746661, 0.980050923, 0.342971583, … ``` --- ## Maps Functions A map function is one that applies the same action/function to every element of an object (e.g. each entry of a list or a vector, or each of the columns of a data frame). The output is determined by the term that follows the `map`. As such: - `map(.x, .f)` is the main mapping function and returns a list - `map_df(.x, .f)` returns a data frame - `map_dbl(.x, .f)` returns a numeric (double) vector - `map_chr(.x, .f)` returns a character vector - `map_lgl(.x, .f)` returns a logical vector --- ### More Examples ```r # Create a List lista1 <- list(a=rnorm(1000, 0, 1), b=rnorm(1000, 1, 1), c=rnorm(1000, 10, 1)) # Apply a function to all the elements of a list map(lista1, mean) ``` ``` ## $a ## [1] 0.04345109 ## ## $b ## [1] 1.034161 ## ## $c ## [1] 10.03092 ``` ```r # same as: mean(lista1$a) ``` ``` ## [1] 0.04345109 ``` ```r mean(lista1$b) ``` ``` ## [1] 1.034161 ``` ```r mean(lista1$c) ``` ``` ## [1] 10.03092 ``` --- ### map_df ```r map_df(lista1, mean) ``` ``` ## # A tibble: 1 x 3 ## a b c ## <dbl> <dbl> <dbl> ## 1 0.0435 1.03 10.0 ``` --- ## The tilde-dot shorthand for functions To make the code more concise you can use the tilde-dot shorthand for anonymous functions. This shortcut basically allows you to work with arguments of functions inside of the map. ```r # With tilde map(lista1, ~ log(abs(sum(.x))) + 100) ``` ``` ## $a ## [1] 103.7716 ## ## $b ## [1] 106.9413 ## ## $c ## [1] 109.2134 ``` ```r # With anonymous functions map(lista1, function(x) log(abs(sum(x))) + 100) ``` ``` ## $a ## [1] 103.7716 ## ## $b ## [1] 106.9413 ## ## $c ## [1] 109.2134 ``` --- Super brief introduction to purrr. We will learn more tomorrow with our scrapping classes. And please, Check here for more materials: - [here](https://jennybc.github.io/purrr-tutorial/), - [here](http://www.rebeccabarter.com/blog/2019-08-19_purrr/), and - [here](https://www.youtube.com/watch?v=bzUmK0Y07ck). --- class:center, middle, inverse ## Gracias!!