Libraries, packages, and repositories

  • A “fresh” R installation already contains hundreds of functions.
  • Functions are organized in libraries.
  • Libraries address a certain topic or area (e.g., graphics or a specific statistical method).
  • A package is a ‘container’ for distributing and sharing libraries.
  • You can add additional packages to extend your R installation.
  • Additional packages are provided in repositories or as separate files.
  • Repositories are online storage locations for packages.
  • The most important repository for R is CRAN (the Comprehensive R Archive Network).
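
To get a feeling for how much a fresh installation already provides, the following minimal sketch lists what is attached at start-up and counts the objects that the base package makes available. It only uses functions that ship with R; the variable names are illustrative.

```r
# Packages (and other environments) currently attached to the session
attached <- search()
print(attached)

# Number of objects (mostly functions) that the base package makes available
n_base <- length(ls("package:base"))
print(n_base)
```

The exact count depends on the R version, so no fixed number is given here.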

Installing and activating new packages

  • You find a list of all CRAN packages at https://cran.r-project.org/ under “Packages”.
  • You get an overview of all functions within a package with the help() function or its short form, the ? operator.
  • You can directly install a package from CRAN with the install.packages() function.

After successful installation, add-on packages have to be activated and loaded into memory in each R session with the library() function.
Note: You only install once, but you use library() each time you restart R or RStudio.
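
The "install once, load in every session" pattern could be sketched as follows. install_if_missing() is a hypothetical helper written for this example, not a function from R or CRAN. It assumes an internet connection and a configured CRAN mirror whenever a package really has to be installed.

```r
# Hypothetical helper: install a package from CRAN only if it is not yet installed.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs internet access and a CRAN mirror
  }
  # Report (invisibly) whether the package is available now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here.
install_if_missing("stats")
library(stats)  # load the package for the current session

# For an add-on package the calls would look like this:
# install_if_missing("psych")
# library(psych)
# help(package = "psych")  # overview of the functions in the package
```

The check with requireNamespace() avoids downloading a package again each time the script runs.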

Task

Install the packages psych and tidyverse.
Then activate both packages:

install.packages(c("psych", "tidyverse"))
library(tidyverse)
library(psych)

RStudio projects

As soon as you have more than one source file and/or external data, it makes sense to start a project instead of just using single source files.

  • A project is a feature of RStudio, not of R.
  • A project is always hosted in a folder on your hard drive.
  • All scripts, data, and other files are stored in that folder.
  • When you open a project later, the working directory is automatically set to that folder.

Working directory: The place on your hard drive where R saves and loads data by default (i.e., when no other location is explicitly given). Use the getwd() and setwd() functions to get and set the working directory.
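
A minimal sketch of getwd() and setwd() in action. The temporary folder returned by tempdir() is only a stand-in for a real data folder and is an assumption of this example.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()
print(old_wd)

# Switch to another folder; tempdir() is only a stand-in for a real data folder
setwd(tempdir())
print(getwd())

# Go back to where we started
setwd(old_wd)
```

Inside a project you rarely need setwd(), because the working directory is already the project folder.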

Starting an RStudio project

You can start a project from RStudio as follows:

  1. File → New Project …
  2. Choose whether to start the project in an existing folder or to create a new, empty folder for it.
  3. Choose New R Project as the project type.
  4. Choose a directory name and start the project.

Task

  • Create a new R project with a name of your choice (e.g. ‘R_course’).
  • Copy all your R scripts related to this R course into that new project folder.
  • Close and reopen RStudio.
  • Open the project through:
    • File → Recent Projects
    • Or the project menu in the upper right corner of RStudio

Importing a data set from Excel

  • The read_excel() function from the readxl package (installed together with tidyverse, but loaded separately with library(readxl)) is used to import files created by Microsoft Excel.
  • Alternatively, RStudio provides an easy way to import data:
    File → Import Dataset → From Excel
  • But if you want a complete script that runs by itself, I recommend using the R functions.
  • Store your data within your R project folder.
  • If you do not store it there, you need to know the folder location to load it into R.

Example

library(readxl)
dat <- read_xlsx("res/cars.xlsx")
names(dat) # this function shows the variable names of a data frame
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"

 mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
18.1    6  225.0  105  2.76  3.460  20.22   1   0     3     1
14.3    8  360.0  245  3.21  3.570  15.84   0   0     3     4
24.4    4  146.7   62  3.69  3.190  20.00   1   0     4     2
22.8    4  140.8   95  3.92  3.150  22.90   1   0     4     2
19.2    6  167.6  123  3.92  3.440  18.30   1   0     4     4
17.8    6  167.6  123  3.92  3.440  18.90   1   0     4     4
16.4    8  275.8  180  3.07  4.070  17.40   0   0     3     3
17.3    8  275.8  180  3.07  3.730  17.60   0   0     3     3
15.2    8  275.8  180  3.07  3.780  18.00   0   0     3     3
10.4    8  472.0  205  2.93  5.250  17.98   0   0     3     4
10.4    8  460.0  215  3.00  5.424  17.82   0   0     3     4
14.7    8  440.0  230  3.23  5.345  17.42   0   0     3     4
32.4    4   78.7   66  4.08  2.200  19.47   1   1     4     1
30.4    4   75.7   52  4.93  1.615  18.52   1   1     4     2
33.9    4   71.1   65  4.22  1.835  19.90   1   1     4     1
21.5    4  120.1   97  3.70  2.465  20.01   1   0     3     1
15.5    8  318.0  150  2.76  3.520  16.87   0   0     3     2
15.2    8  304.0  150  3.15  3.435  17.30   0   0     3     2
13.3    8  350.0  245  3.73  3.840  15.41   0   0     3     4
19.2    8  400.0  175  3.08  3.845  17.05   0   0     3     2
27.3    4   79.0   66  4.08  1.935  18.90   1   1     4     1
26.0    4  120.3   91  4.43  2.140  16.70   0   1     5     2
30.4    4   95.1  113  3.77  1.513  16.90   1   1     5     2
15.8    8  351.0  264  4.22  3.170  14.50   0   1     5     4
19.7    6  145.0  175  3.62  2.770  15.50   0   1     5     6
15.0    8  301.0  335  3.54  3.570  14.60   0   1     5     8
21.4    4  121.0  109  4.11  2.780  18.60   1   1     4     2
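
The import in this example relies on a relative path, which R resolves against the working directory. A slightly more defensive sketch checks that the file exists before reading it. It assumes the readxl package is installed and reuses the path from the example; the message text is illustrative.

```r
# Relative paths are resolved against the working directory (see getwd()).
path <- file.path("res", "cars.xlsx")  # same location as in the example

if (file.exists(path)) {
  # readxl must be installed; read_excel() handles both .xls and .xlsx files
  dat <- readxl::read_excel(path)
  print(names(dat))
} else {
  message("File not found: ", path, " (working directory: ", getwd(), ")")
}
```

If the message appears, the file is probably stored in a different folder than the script expects.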

Task

  • Download the Excel file “cars.xlsx” from the Moodle course.
  • Save it in your project directory.
  • Import the data set and assign it to an object dat.
  • Apply the View() function to see the dataset.

Note: View() opens a new tab in RStudio with the content of a data frame (e.g. View(dat)).
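
View() needs an interactive session. A few base R functions give a quick look at a data frame directly in the console. This sketch assumes the built-in mtcars data set as a stand-in for dat, so that it runs without the Excel file.

```r
dat <- mtcars  # stand-in for the imported data (assumption)

dim(dat)       # number of rows and columns
head(dat, 3)   # first three rows
str(dat)       # type and first values of every variable
```

These calls also work in a script that runs by itself, where View() is not useful.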

Task

  • Calculate the mean of mpg (miles per gallon) for cars with 4, 6, and 8 cylinders (variable cyl).

:-)

Task - solution

  • Calculate the mean of mpg (miles per gallon) for cars with 4, 6, and 8 cylinders (variable cyl).
mean(dat$mpg[dat$cyl == 4])
mean(dat$mpg[dat$cyl == 6])
mean(dat$mpg[dat$cyl == 8])
[1] 26.66364
[1] 19.74286
[1] 15.1
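
The three separate calls can also be collapsed into one. A minimal sketch using base R's tapply(), assuming the built-in mtcars data set as a stand-in for dat:

```r
dat <- mtcars  # stand-in for the imported data (assumption)

# Apply mean() to mpg separately for each value of cyl
means <- tapply(dat$mpg, dat$cyl, mean)
print(means)
```

The result is a named vector with one mean per cylinder group, which scales better than repeating the subsetting code for every group.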