class: center, middle, inverse, title-slide

# Introduction to R
## IPSA-Flacso Summer School
### Tiago Ventura

---

# Plans for Today

1. Workflow: R and RStudio.

2. How to Interact with R.

3. Packages and Asking for Help in R.

4. Objects and Classes.

5. Boolean Operators.

6. Data Structure.

7. Data Manipulation with Base R.

8. Importing and Exporting Data in R.

---

# What's R?

R is a versatile, open source programming/scripting language that's useful for both statistics and data science.

- Open source software under **GPL**.

- Comparable, if not superior, to commercial alternatives.

- Not just for statistics, but also **general purpose** programming.

- Is **object oriented** (= R has objects) and **functional** (= You can write functions).

- Large and growing community of peers.

---

## RStudio.

RStudio is the premier R graphical user interface (GUI) and integrated development environment (IDE) that makes R easier to use.

<img src="R_vs_RStudio_1.png" width="100%" />

.footnote[Source: [Rochelle Terman's Intro to CSS Book](https://plsc-31101.github.io/course/r-basics.html)]

---

## Understanding RStudio

<img src="rstudio.png" width="100%" />

---

## How to Interact with R?

`Open RStudio!`

---

### Command Line (bottom left panel).

You can interact with R directly using the Command Line. The symbol `>` indicates that R is ready to work!

Copy and paste the command below into your command line, and press Enter.

```r
2+2
```

```
## [1] 4
```

--

When you see the symbol `+`, it means your code is incomplete. Press Esc to get back to `>`.

```r
# Incomplete Code
incomplete<- "I am going to give you an incomplete
```

---

## Using Scripts.

Open a new script: `File` -> `New File` -> `R Script`

The script is just a plain text file, but you can send commands from the script directly to your command line.

--

#### `command + enter` (Mac) or `Ctrl + enter` (PC).

--

```r
# Run these operations in a script.
# Hash signs (#) let you comment your code.
2^2
2*2
2/2
```

---

### Why Should I go with scripts?

- More appropriate when working with long code.

- Show others all your steps.

- Allow your future self to return to your code.

---

## Other Important Tips.

#### Comments.

Use # signs to add comments within your code chunks. Help yourself in the future and write as many comments as you can in your code!

```r
# Hello all!
```

#### Errors

When the text is a legitimate error, it will be prefaced with “Error”, and R will try to explain what went wrong.

```r
plot("hello")
```

```
## Error in plot.window(...): need finite 'ylim' values
```

---

## R Packages.

--

.pull-left[

#### What are Packages?

- A set of thematic functions that someone put together for you.

- Sometimes packages also bring data.

- In the end, just a folder downloaded to your computer.

]

--

.pull-right[

A number of `packages` are supplied with the R distribution. These are known as "[base packages](https://stat.ethz.ch/R-manual/R-devel/library/base/html/00Index.html)". Others we need to install.

- Packages need to be installed only once: `install.packages()`

- They must be loaded in every R session: `library()`

]

--

---

class: center, middle

### Installing Packages Via CRAN.

```r
install.packages("devtools")
```

### Load a Package.

```r
library(devtools)
```

### Install Packages Via GitHub

```r
devtools::install_github("electorArg/polAr")
```

---

## Asking for Help in R

```r
# Specific to a function
?mean
# Help for the function mean.
help(mean)
# More General
??mean
```

.center[
<img src="ajuda.png" width="70%" />
]

---

## Asking for Help on Google.

- **Google**: name of the function + the error message you are getting.

- Remove all the information about you and your local path.

- Try to understand the solution. Copy and paste will rarely solve your problems.

---

class: center, middle

## 15 minutes Rule

Taken from [Rochelle Terman plsc-31101](https://plsc-31101.github.io/course/introduction.html).
<img src="intro_r_flacso_files/figure-html/unnamed-chunk-13-1.png" width="50%" />

---

class: center, middle, inverse

# Objects in R

---

# Objects: Definition?

In simple terms, an `object` is a bit of text that represents a specific value. Object names can only contain letters, numbers, the underscore character, and (unlike Python) the period character. Assigning values to new objects is how you create things in R.

---

class:center, middle

### `Everything that exists in R is an object.`

<br><br>

---

## Creating Objects

*(assignment operator)* `<-`

```r
# Numeric Objects
x <- 5
y <- 7
# Character Objects
nome <- "Tiago Ventura"
```

All R statements where you create objects, assignment statements, have the same form:

```r
object_name <- value
```

---

## Can I use `=` to create objects?

--

Yes. But...

--

### Best Practice

Use `<-` to create objects, and use `=` inside function calls.

```r
x<-6
mean(x=c(5, 7))
```

---

## What is an object name?

An object name is just a piece of text. Object names must start with a letter, and can only contain letters, numbers, `_` and `.`. You want your object names to be descriptive, so you’ll need a convention for multiple words.

Best Practice:

- **snake_case:** a style of writing in which each space is replaced by an underscore (`_`) character, and the first letter of each word is written in lowercase.

---

class: center, middle

## Other important commands

#### Checking my environment.

```r
ls()
```

```
## [1] "nome" "x"    "y"
```

#### Remove Objects

```r
rm(y)
```

#### Visualizing Objects

```r
print(nome)
```

```
## [1] "Tiago Ventura"
```

#### Changing Objects

```r
nome <- "Tiago Augusto Ventura"
```

---

## Object Classes

Every object in R has a **class**. The class describes what the object is.
Main classes are:

<table class="table table-striped table-hover table-condensed" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Example </th>
   <th style="text-align:left;"> Type </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> "a", "swc" </td>
   <td style="text-align:left;"> Character </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2, 3, 15 </td>
   <td style="text-align:left;"> Numeric </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1L, 2L </td>
   <td style="text-align:left;"> Integer </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FALSE, TRUE </td>
   <td style="text-align:left;"> Logical </td>
  </tr>
</tbody>
</table>

---

## Checking the Class of my Objects.

`class()`

```r
class(3)
```

```
## [1] "numeric"
```

```r
class(TRUE)
```

```
## [1] "logical"
```

```r
# "meu_numero_da_sorte" means "my lucky number"
meu_numero_da_sorte <- "13"
class(meu_numero_da_sorte)
```

```
## [1] "character"
```

```r
class(meu_numero_da_sorte==13)
```

```
## [1] "logical"
```

---

## Another Way to Go: is.class?

```r
is.numeric(2)
```

```
## [1] TRUE
```

```r
is.logical(TRUE)
```

```
## [1] TRUE
```

```r
is.character("2")
```

```
## [1] TRUE
```

```r
is.integer(1L)
```

```
## [1] TRUE
```

---

## Coercing Objects.

R allows you to easily change the class of your objects using the `as.class()` family of functions.

```r
# Create an object
num_1_5 <- c(1, 2, 3, 4, 5)
# Change the class
char_1_5 <- as.character(num_1_5)
# Checking
class(char_1_5)
```

```
## [1] "character"
```

```r
class(num_1_5)
```

```
## [1] "numeric"
```

```r
# Or
as.numeric("25")
```

```
## [1] 25
```

---

class: center, middle, inverse

# Boolean Operators in R

---

# Boolean ("logical") Operators

To complement our discussion on objects in R, let's briefly see how boolean operators work.

**Important Note**: These operators are logical. They always return TRUE or FALSE.
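A minimal sketch of the note above, using hypothetical values `a`, `b`, and `ages` that are not part of the workshop data: each comparison returns `TRUE` or `FALSE`, and comparing a whole vector returns one logical value per element.

```r
# Hypothetical values, chosen only for this illustration
a <- 5
b <- 7

a == b           # FALSE: 5 is not equal to 7
a < b            # TRUE
a == 5 & b == 5  # FALSE: both sides must be TRUE
a == 5 | b == 5  # TRUE: one side being TRUE is enough

# Comparisons are vectorized: one logical value per element
ages <- c(18, 35, 62)
over_30 <- ages > 30
over_30          # FALSE TRUE TRUE
class(over_30)   # "logical"
```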
```r
x == y # equal to
x != y # not equal to
x >= y # greater than or equal to
x <= y # less than or equal to
x > y # greater than
x < y # less than
x==1 & y==5 # "and" conditional statements
x==1 | y==5 # "or" conditional statements
```

Boolean operators will be important later on for data manipulation, creating new variables, and writing functions and loops.

---

# Challenge 1.

Let's practice a bit. What do the commands below return? Try to guess!

#### Question 1:

```r
install.packages(tidyverse)
```

#### Question 2:

```r
false <- "FALSE"
false <- as.logical(false)
class(false)
```

#### Question 3

```r
mean(x = sample(1:50, 5)) == mean(x) # TRUE or FALSE
```

---

class: middle, center, inverse

# Data Structure

---

## Vector

```r
# Numeric Vector
X <- c(1, 2.3, 4, 5, 6.78, 6:10)
X
```

```
##  [1]  1.00  2.30  4.00  5.00  6.78  6.00  7.00  8.00  9.00 10.00
```

```r
# Class
class(X)
```

```
## [1] "numeric"
```

```r
# Length
length(X)
```

```
## [1] 10
```

---

## Matrix

**Main Feature:** Rectangular and Same type.

```r
# Coerce to a matrix
x_matrix <- as.matrix(X)
# Build a Matrix.
m <- matrix(1:10, nrow=5, ncol=2)
m
```

```
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
```

```r
# Accessing a value
m[1, 1] # [rows, columns]
```

```
## [1] 1
```

---

## List

**Main Feature**: Flexible, huge drawer where you can put anything you want.

```r
# coerce to a list
x_list<- as.list(X)
# or
lista_1 <- list(X, as.matrix(X), as.character(X))
# Visualize the list.
str(lista_1)
```

```
## List of 3
##  $ : num [1:10] 1 2.3 4 5 6.78 6 7 8 9 10
##  $ : num [1:10, 1] 1 2.3 4 5 6.78 6 7 8 9 10
##  $ : chr [1:10] "1" "2.3" "4" "5" ...
```

```r
# Accessing a value: double [[]]
lista_1[[1]]
```

```
##  [1]  1.00  2.30  4.00  5.00  6.78  6.00  7.00  8.00  9.00 10.00
```

---

# Data Frame.

1. Classic Database.

2. Rectangular.

3. Works with columns with different classes.

4. Like an Excel spreadsheet.

---

## Creating a Data Frame.
```r
# Coercing
as.data.frame(X)
```

```
##        X
## 1   1.00
## 2   2.30
## 3   4.00
## 4   5.00
## 5   6.78
## 6   6.00
## 7   7.00
## 8   8.00
## 9   9.00
## 10 10.00
```

```r
# Creating manually
data <- data.frame(name=c("Tiago", "Tiago"), last_name=c("Ventura", "Ventura") , school=c("UMD", "FGV"), age=c(30,32))
data
```

```
##    name last_name school age
## 1 Tiago   Ventura    UMD  30
## 2 Tiago   Ventura    FGV  32
```

---

## Pre-Built Data Frames

```r
mtcars # built-in dataset, available without loading anything
```

```
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
```

---

## Pre-Built Data Frames from Packages

```r
## Pre-built data sets.
devtools::install_github("apreshill/bakeoff")
library(bakeoff) # Load the package
# What do I have in my environment?
ls()
```

```
##  [1] "char_1_5"            "data"                "lista_1"
##  [4] "m"                   "meu_numero_da_sorte" "nome"
##  [7] "num_1_5"             "x"                   "X"
## [10] "x_list"              "x_matrix"
```

```r
# Load the data set
data("bakers")
ls()
```

```
##  [1] "bakers"              "char_1_5"            "data"
##  [4] "lista_1"             "m"                   "meu_numero_da_sorte"
##  [7] "nome"                "num_1_5"             "x"
## [10] "X"                   "x_list"              "x_matrix"
```

```r
# Examine the object.
class(bakers)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

```r
str(bakers)
```

```
## tibble[,8] [120 × 8] (S3: tbl_df/tbl/data.frame)
##  $ series     : Factor w/ 10 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ baker_full : chr [1:120] "Annetha Mills" "David Chambers" "Edward \"Edd\" Kimber" "Jasminder Randhawa" ...
##  $ baker      : chr [1:120] "Annetha" "David" "Edd" "Jasminder" ...
##  $ age        : num [1:120] 30 31 24 45 25 51 44 48 37 31 ...
##  $ occupation : chr [1:120] "Midwife" "Entrepreneur" "Debt collector for Yorkshire Bank" "Assistant Credit Control Manager" ...
##  $ hometown   : chr [1:120] "Essex" "Milton Keynes" "Bradford" "Birmingham" ...
##  $ baker_last : chr [1:120] "Mills" "Chambers" "Kimber" "Randhawa" ...
##  $ baker_first: chr [1:120] "Annetha" "David" "Edward" "Jasminder" ...
```

---

## Accessing Information from your Data Frame

`data[rows, columns]`

**Rows**: Only using Numeric Index.

**Columns**: Using either a Numeric Index or Textual Keys.
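A minimal sketch of the `data[rows, columns]` pattern, using a small hypothetical data frame called `students` (its names and values are made up for this illustration and are not workshop data):

```r
# Hypothetical toy data frame, used only for this illustration
students <- data.frame(
  name   = c("Ana", "Luis", "Carla"),
  age    = c(25, 31, 28),
  school = c("UMD", "FGV", "UBA"),
  stringsAsFactors = FALSE
)

first_row   <- students[1, ]          # row 1, all columns
age_by_pos  <- students[, 2]          # column 2, selected by numeric index
age_by_name <- students[, "age"]      # the same column, selected by textual key
one_cell    <- students[2, "school"]  # row 2 of the column "school"

identical(age_by_pos, age_by_name)    # both selections give the same values
```

With a base `data.frame`, selecting a single column this way returns a plain vector; a tibble such as `bakers` returns a one-column table instead.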
---

**Using Numeric Index**

```r
bakers[,1] # first column
```

```
## # A tibble: 120 x 1
##    series
##    <fct>
##  1 1
##  2 1
##  3 1
##  4 1
##  5 1
##  6 1
##  7 1
##  8 1
##  9 1
## 10 1
## # … with 110 more rows
```

---

**Textual Key**

```r
bakers$hometown
```

```
##   [1] "Essex"
##   [2] "Milton Keynes"
##   [3] "Bradford"
##   [4] "Birmingham"
##   [5] "St Albans"
##   [6] "Midlothian, Scotland"
##   [7] "Manchester"
##   [8] "South Wales"
##   [9] "Midhurst, West Sussex"
##  [10] "Poynton, Cheshire"
##  [11] "Northampton"
##  [12] "Leicester"
##  [13] "Dunstable, Bedfordshire"
##  [14] "Formby, Liverpool"
##  [15] "Croydon"
##  [16] "Ongar, Essex"
##  [17] "Arlesey, Bedfordshire"
##  [18] "Kidderminster, Worcestershire"
##  [19] "London"
##  [20] "Norfolk"
##  [21] "Enfield, London"
##  [22] "West Kirby, The Wirral"
##  [23] "Sutton Coldfield"
##  [24] "Pease Pottage, West Sussex"
##  [25] "Sheffield"
##  [26] "Hillswick, Shetland Islands"
##  [27] "Wigan"
##  [28] "Leicester"
##  [29] "Tamworth, Staffordshire"
##  [30] "Windsor, Berkshire"
##  [31] "Bristol"
##  [32] "Bewbush, West Sussex"
##  [33] "Lichfield, Staffordshire"
##  [34] "Somerset"
##  [35] "Saltley, Birmingham"
##  [36] "Aldershot, Hampshire"
##  [37] "Didcot, Oxfordshire"
##  [38] "Peterborough"
##  [39] "Market Harborough, Leicestershire"
##  [40] "Teignmouth, Devon"
##  [41] "Sheffield"
##  [42] "London"
##  [43] "Grimsby, Lincolnshire"
##  [44] "Milton Keynes"
##  [45] "Melbourn, Cambridgeshire"
##  [46] "Southend, Essex"
##  [47] "Reading, Berkshire"
##  [48] "Broadstairs, Kent"
##  [49] "Ashton upon Mersey, Trafford"
##  [50] "Alkington, Shropshire"
##  [51] "Portsmouth, Hampshire"
##  [52] "London/Belfast"
##  [53] "Sneinton, Nottingham"
##  [54] "Brighton, East Sussex"
##  [55] "Poynton, Cheshire"
##  [56] "Ascot, Berkshire"
##  [57] "Barton-upon-Humber, Lincolnshire"
##  [58] "Portknockie, Moray"
##  [59] "Mill Hill, London"
##  [60] "Bracknell, Berkshire"
##  [61] "Penwortham, Lancashire"
##  [62] "Dunkeld, Perth and Kinross"
##  [63] "Great Wilbraham, Cambridgeshire"
##  [64] "Auchterarder, Perthshire"
##  [65] "London"
##  [66] "Leeds / Luton"
##  [67] "Swansea, Wales"
##  [68] "Yeadon, West Yorkshire"
##  [69] "Guildford, Surrey"
##  [70] "Manchester"
##  [71] "Woodford, London / Vilkaviškis, Lithuania"
##  [72] "Derby / Holywood, County Down"
##  [73] "South London"
##  [74] "Barton-Le-Clay, Bedfordshire"
##  [75] "Beckenham"
##  [76] "Brooke, Norfolk"
##  [77] "Bolton"
##  [78] "Cardiff"
##  [79] "Durham"
##  [80] "Erith"
##  [81] "London"
##  [82] "Rochdale"
##  [83] "Yeovil"
##  [84] "Bristol"
##  [85] "Merseyside"
##  [86] "Brentwood, Essex"
##  [87] "Crawley, West Sussex"
##  [88] "Merseyside"
##  [89] "North London"
##  [90] "Southend, Essex"
##  [91] "West Molesey, Surrey"
##  [92] "Radlett, Hertfordshire"
##  [93] "Watford, Hertfordshire"
##  [94] "Edinburgh"
##  [95] "North London"
##  [96] "London"
##  [97] "Bristol"
##  [98] "London"
##  [99] "County Tyrone"
## [100] "Newport"
## [101] "Wakefield"
## [102] "Leeds"
## [103] "Sheffield"
## [104] "London"
## [105] "Rotherham"
## [106] "London"
## [107] "West Midlands"
## [108] "Essex"
## [109] "Halifax"
## [110] "Rotherham"
## [111] "Whitby"
## [112] "Leeds"
## [113] "Durham"
## [114] "Surrey"
## [115] "Stratford-upon-Avon"
## [116] "Tenby, Wales"
## [117] "Rainham"
## [118] "Leicester"
## [119] "Somerset"
## [120] "Chester"
```

```r
# Or
bakers[,"hometown"]
```

```
## # A tibble: 120 x 1
##    hometown
##    <chr>
##  1 Essex
##  2 Milton Keynes
##  3 Bradford
##  4 Birmingham
##  5 St Albans
##  6 Midlothian, Scotland
##  7 Manchester
##  8 South Wales
##  9 Midhurst, West Sussex
## 10 Poynton, Cheshire
## # … with 110 more rows
```

---

**Row with Numeric Index**

```r
bakers[1:5, ]
```

```
## # A tibble: 5 x 8
##   series baker_full   baker    age occupation    hometown baker_last baker_first
##   <fct>  <chr>        <chr>  <dbl> <chr>         <chr>    <chr>      <chr>
## 1 1      "Annetha Mi… Annet…    30 Midwife       Essex    Mills      Annetha
## 2 1      "David Cham… David     31 Entrepreneur  Milton … Chambers   David
## 3 1      "Edward \"E… Edd       24 Debt collect… Bradford Kimber     Edward
## 4 1      "Jasminder … Jasmi…    45 Assistant Cr… Birming… Randhawa   Jasminder
## 5 1      "Jonathan S… Jonat…    25 Research Ana… St Alba… Shepherd   Jonathan
```

---

### Boolean Operators... back, back, back again.

```r
# Bakers older than 60
bakers[bakers$age>60,]
```

```
## # A tibble: 10 x 8
##    series baker_full   baker    age occupation  hometown  baker_last baker_first
##    <fct>  <chr>        <chr>  <dbl> <chr>       <chr>     <chr>      <chr>
##  1 2      "Janet Basu" Janet     63 Teacher of… Formby, … Basu       Janet
##  2 3      "Brendan Ly… Brend…    63 Recruitmen… Sutton C… Lynch      Brendan
##  3 4      "Christine … Chris…    66 Director o… Didcot, … Wallace    Christine
##  4 5      "Diana Bear… Diana     69 Women's In… Alkingto… Beard      Diana
##  5 5      "Norman Cal… Norman    66 Retired Na… Portknoc… Calder     Norman
##  6 6      "Marie Camp… Marie     66 Retired     Auchtera… Campbell   Marie
##  7 7      "Jane Beedl… Jane      61 Garden des… Beckenham Beedle     Jane
##  8 7      "Lee Banfie… Lee       67 Pastor      Bolton    Banfield   Lee
##  9 7      "Valerie \"… Val       66 Semi-retir… Yeovil    Stones     Valerie
## 10 8      "Flo Atkins" Flo       71 Retired     Merseysi… Atkins     Flo
```

```r
# Bakers from London
bakers[bakers$hometown=="London",]
```

```
## # A tibble: 8 x 8
##   series baker_full   baker    age occupation    hometown baker_last baker_first
##   <fct>  <chr>        <chr>  <dbl> <chr>         <chr>    <chr>      <chr>
## 1 2      Robert Bill… Robert    25 Photographer  London   Billington Robert
## 2 4      Kimberley W… Kimbe…    30 Psychologist  London   Wilson     Kimberley
## 3 6      Mat Riley    Mat       37 Fire fighter  London   Riley      Mat
## 4 7      Selasi Gbor… Selasi    30 Client servi… London   Gbormittah Selasi
## 5 9      Antony Amou… Antony    30 Banker        London   Amourdoux  Antony
## 6 9      Dan Beasley… Dan       36 Full-time pa… London   Beasley-H… Dan
## 7 9      Manon Lagrè… Manon     26 Software pro… London   Lagrève    Manon
## 8 9      Ruby Bhogal  Ruby      29 Project mana… London   Bhogal     Ruby
```

---

### Useful Functions to Understand your Data Frames.
```r
head(bakers) # first rows
tail(bakers) # last rows
summary(bakers) # summary statistics for each column
dim(bakers) # dimensions
tibble::glimpse(bakers) # compact overview of columns and classes
```

---

## Exporting Data Frames.

A crucial task you will perform in R is exporting your results, including a new data frame. There are several functions to do this, depending on the output format you want. Some examples:

- `write.table()` for txt

- `write.csv()` for csv

- `write.xlsx()` for xlsx (from an add-on package such as `openxlsx`)

- `save()` to export as RData

---

# Exporting as a csv.

```r
# make a fake data set
dfake <- data.frame(normal=rnorm(100, 0, 1), uniform=runif(100, 0, 1), pois=rpois(100, 10))
# write.function(data, name_to_be_saved)
write.csv(dfake, "dfake.csv")
```

---

### But.. wait.. where is my data? Where is R looking?

`R` does not intuitively know where your data is. If the data is in a special folder called "super secret search", we have to tell `R` how to get there. We do this in two ways:

1. Learn where your files are.

2. Set your **working directory** to your preferred folder: all your inputs and outputs will be there.

---

## Paths

Every time `R` is started, it looks in the same place (called the _global path_), unless it is asked to go elsewhere.

```r
# Where is R looking?
getwd()
```

```
## [1] "/home/venturat/Dropbox/Workshops/flacso_workshop_/day_1"
```

```r
# Where should R be looking instead?
setwd("/home/venturat/Downloads")
```

---

## R Projects

A super useful way to organize your files in R is through R projects. We are not going to cover this feature in the workshop, but I strongly recommend that you read about it [here](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/).

---

## Importing data

```r
# Check Dir
getwd()
```

```
## [1] "/home/venturat/Dropbox/Workshops/flacso_workshop_/day_1"
```

```r
# See what we have here.
list.files()
```

```
##  [1] "ajuda.png"              "auth_bolsonaro.png"     "bail_design.jpg"
##  [4] "bail.png"               "Bolsonarobasicmap.png"  "camara.png"
##  [7] "data_store.png"         "data.jpeg"              "dfake.csv"
## [10] "eleicoes.png"           "F1.large.jpg"           "intro_css_flacso_cache"
## [13] "intro_css_flacso.html"  "intro_css_flacso.Rmd"   "intro_r_flacso_files"
## [16] "intro_r_flacso.html"    "intro_r_flacso.R"       "intro_r_flacso.Rmd"
## [19] "libs"                   "mob_01.png"             "news_.png"
## [22] "parameters_ideo.png"    "portunol"               "portunol.png"
## [25] "R_vs_RStudio_1.png"     "redes_sociais.png"      "redes.png"
## [28] "rstudio.png"            "scott.gif"              "survey.png"
## [31] "toystore.jpg"           "tweets.png"             "twitter-api.jpg"
## [34] "waiting.gif"            "xaringan-themer.css"
```

```r
# import
dados <- read.csv("dfake.csv")
head(dados)
```

```
##   X      normal   uniform pois
## 1 1  0.61264269 0.8716483   15
## 2 2  0.96576466 0.7061860   11
## 3 3 -0.03444607 0.7349059   14
## 4 4 -0.74612240 0.1884793    4
## 5 5  0.23555598 0.7643842    6
## 6 6  0.27551826 0.2249295   16
```

---

## Descriptive Statistics

Now that we can get data into R, we want to explore and summarize what's going on.

`summary()` lets you quickly summarize the distribution of each variable in a data set.

```r
summary(mtcars)
```

```
##       mpg             cyl             disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat             wt             qsec             vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am              gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
```

---

There is a wealth of useful summary functions built into `R`.

```r
mean()
sd()
var()
range()
min()
max()
median()
quantile()
fivenum()
colMeans()
rowMeans()
table()
```

...to name a few!

---

## I know!!!

---

class: center, middle, inverse

## Tomorrow

### Tidyverse: and you will feel better! I promise!