Manipulating and Plotting Data

Coding for Reproducible Research

Laura Roldan-Gomez

11/7/22

We are…


Emma Walker


Theresa Wacker


Laura Roldan-Gomez

lr480@exeter.ac.uk

Welcome again to this hybrid session!

Last week was tough!

Today…

  • A quick word on matrices

  • Working with what we learned in the previous session:

  • Data Exploration

  • Data Manipulation

  • Plotting!!!

Let’s reflect

So far you’ve…

  • Installed R and RStudio, learned its parts and how to open a script.

  • Learned there are three core data types:

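    • Numerical: 3, 15.5 (complex numbers such as 1+4i are stored as their own complex type)
    • Characters or strings: “a”, “I love R”, “3”
    • Logical: TRUE or FALSE (R treats these as 1 and 0 in arithmetic)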

  • Learned that these data types can be organised into
    • Vectors
    • Lists
    • Data frames

6. Matrices

Matrices are atomic vectors with dimensions: the number of rows and the number of columns. As with atomic vectors, the elements of a matrix must all be of the same data type.


To create an empty matrix, we need to define those dimensions:

m<-matrix(nrow=2, ncol=2)
m
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA


We can find out the dimensions of a matrix (its number of rows, then its number of columns) by using dim():

dim(m)
[1] 2 2
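A small sketch to go with dim(): nrow() and ncol() return each dimension on its own, and length() gives the total number of elements. The matrix `demo` and its values are only an illustration, not part of the session data.

```r
# An illustrative 2 x 3 matrix filled with zeros (name and values are arbitrary)
demo <- matrix(0, nrow = 2, ncol = 3)

dim(demo)     # both dimensions at once: rows, then columns
nrow(demo)    # number of rows only
ncol(demo)    # number of columns only
length(demo)  # total number of elements (rows * columns)
```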


You can check that a matrix is a vector with dimensions by using class() and typeof().

m <- matrix(c(1:3))

While class() shows that m is a matrix, typeof() shows that in this case the matrix is an integer vector (these can be character vectors, too).


class(m)
[1] "matrix" "array" 


typeof(m)
[1] "integer"
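To see the same idea with other element types, here is a minimal sketch. The object names are made up for illustration. It also shows what happens when types are mixed: because a matrix can hold only one type, R converts everything to a common one.

```r
# An integer matrix: the only attribute it carries is dim
m_int <- matrix(1:3)
attributes(m_int)

# A character matrix: still a matrix, but the underlying vector is character
m_chr <- matrix(c("a", "b", "c", "d"), nrow = 2)
class(m_chr)
typeof(m_chr)

# Mixing types: all elements are coerced to a single type (here, character)
m_mixed <- matrix(c(1, "a", TRUE, 4), nrow = 2)
typeof(m_mixed)
```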

When creating a matrix, it is important to remember that matrices are filled column-wise:

m<-matrix(1:6, nrow=2, ncol=3)
m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6


If that is not what you want, you can use the byrow argument (a logical: can be TRUE or FALSE) to specify how the matrix is filled:

m<-matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
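A quick way to compare the two fill orders is to look at the first row of each matrix. This sketch also leaves out ncol, since R can work it out from the length of the data and nrow. The names `by_col` and `by_row` are just for illustration.

```r
# Same data, two fill orders; ncol is inferred as 3 (6 values / 2 rows)
by_col <- matrix(1:6, nrow = 2)                # default: fill down the columns
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill across the rows

by_col[1, ]  # first row when filled by column: 1 3 5
by_row[1, ]  # first row when filled by row:    1 2 3
```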

You can create a matrix from a vector:

m<-sample(1:100, size=10)
m
 [1] 69  7 13 85 51 12 31 98 54 19


dim(m)<-c(2,5)
m
     [,1] [,2] [,3] [,4] [,5]
[1,]   69   13   51   31   54
[2,]    7   85   12   98   19

A lot is going on here. Let’s dissect it:

  • We generate a random integer vector using sample(). sample() in this case randomly draws 10 (size=10) numbers from 1 to 100 (1:100).

  • We assign dimensions to the vector using dim() and c(2,5), with the latter being c(rows, columns).

All of the above takes the random integer vector and transforms it into a matrix with 2 rows and 5 columns.
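Because sample() draws at random, your numbers will differ from the ones shown. The sketch below repeats the two steps with set.seed() so the draw can be reproduced. The seed value is arbitrary, and the names `v` and `original` are only for illustration.

```r
set.seed(1)                     # arbitrary seed, only so the draw is repeatable

v <- sample(1:100, size = 10)   # 10 random integers between 1 and 100
original <- v                   # keep a copy of the plain vector for comparison

dim(v) <- c(2, 5)               # c(rows, columns): v is now a 2 x 5 matrix
is.matrix(v)

# The values are unchanged; they are simply arranged column by column
v[, 1]          # first column of the matrix
original[1:2]   # first two elements of the original vector
```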

You can also bind columns and rows using cbind() and rbind():

x <- 1:3
y <- 10:12
m<-cbind(x, y)
m
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12


n<-rbind(x,y)
n
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
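cbind() also works on a matrix that already exists, which is handy for adding a column later. A minimal sketch, where `z` and `m3` are hypothetical names added for illustration:

```r
x <- 1:3
y <- 10:12
m <- cbind(x, y)    # 3 rows, 2 columns, as above

z <- 100:102        # hypothetical extra vector of the same length
m3 <- cbind(m, z)   # bind it on as a third column

dim(m3)             # now 3 rows and 3 columns
colnames(m3)        # column names are taken from the object names
```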

6.1. Matrix indexing


As with vectors, we use square brackets to retrieve elements of a matrix. Here we specify the index along each dimension (row first, then column) inside a single pair of square brackets.

m[3,2] # Note that it is [row,column].
 y 
12 
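Leaving one side of the comma empty selects everything along that dimension. The sketch below rebuilds the same `m` so it runs on its own and shows a few common patterns.

```r
x <- 1:3
y <- 10:12
m <- cbind(x, y)

m[3, 2]               # a single element: row 3, column 2
m[1, ]                # the whole first row
m[, 2]                # the whole second column
m[, "y"]              # the same column, selected by name
m[2:3, ]              # rows 2 and 3, returned as a smaller matrix
m[1, , drop = FALSE]  # keep a single row as a 1-row matrix, not a vector
```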

Let’s begin

Next week…

In Session 4 - Introduction to Statistical Analysis, you will learn…

  • Summary statistics
  • Normal and other distributions
  • Confidence Intervals
  • Random Sampling
  • Basic analysis
  • Correlation
  • t-test
  • ANOVA

Congratulations, this is the END!


This was hard, but do keep going. Join us for Session 4 - Introduction to Statistical Analysis. Hopefully, you will find it easier than today.

  • Thank YOU! For your attention and effort

  • Be a part of this! You can help us by running or assisting with a future session (you can opt into this via the feedback survey)

  • Tell us… What you liked and what you didn’t using the feedback survey

Refresher

Let’s pretend you are working with your own data.

These commands are key

  1. Load the data set
  2. Always check how R read your variables
'data.frame':   20 obs. of  7 variables:
 $ Field.Name  : chr  "Nashs Field" "Silwood Bottom" "Nursery Field" "Rush Meadow" ...
 $ Area        : num  3.6 5.1 2.8 2.4 3.8 3.1 3.5 2.1 1.9 1.5 ...
 $ Slope       : int  11 2 3 5 0 2 3 0 0 4 ...
 $ Vegetation  : chr  "Grassland" "Arable" "Grassland" "Meadow" ...
 $ Soil.pH     : num  4.1 5.2 4.3 4.9 4.2 3.9 4.2 4.8 5.7 5 ...
 $ Damp        : logi  FALSE FALSE FALSE TRUE FALSE FALSE ...
 $ Worm.density: int  4 7 2 5 6 2 3 4 9 7 ...
  3. Fix your data

If any of the variables don’t look the way you want them to, use these commands to tell R to read them properly.

  • as.logical
  • as.character
  • as.numeric
  • as.factor

Use them like so:
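The load, check and fix steps can be sketched as follows. This is an illustration under assumptions: the data frame name `fields` is made up, only the first four rows visible in the str() output are used, and the data is written to a temporary file because the real file name is not given here. Which columns to convert is also just an example.

```r
# First four rows as shown in the str() output (the real data set has 20 rows)
fields <- data.frame(
  Field.Name   = c("Nashs Field", "Silwood Bottom", "Nursery Field", "Rush Meadow"),
  Area         = c(3.6, 5.1, 2.8, 2.4),
  Slope        = c(11L, 2L, 3L, 5L),
  Vegetation   = c("Grassland", "Arable", "Grassland", "Meadow"),
  Soil.pH      = c(4.1, 5.2, 4.3, 4.9),
  Damp         = c(FALSE, FALSE, FALSE, TRUE),
  Worm.density = c(4L, 7L, 2L, 5L)
)

# 1. Load the data set (from a temporary CSV standing in for your own file)
path <- tempfile(fileext = ".csv")
write.csv(fields, path, row.names = FALSE)
fields <- read.csv(path)

# 2. Check how R read each variable
str(fields)

# 3. Fix anything that was not read the way you want
fields$Vegetation <- as.factor(fields$Vegetation)  # text categories -> factor
fields$Slope      <- as.numeric(fields$Slope)      # integer -> numeric

class(fields$Vegetation)
levels(fields$Vegetation)
```

Note that the result of each as.*() call is assigned back to the same column; calling the function without assigning the result changes nothing in the data frame.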

Create new variables (columns)

[1] 20
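A minimal sketch of adding columns with `$`. The column names `new_column` and `another_column` match the names listed later in this refresher, but what they contain here (worms per unit area, and a row counter) is an assumption made for illustration, as is the small `fields` data frame.

```r
# A cut-down, illustrative version of the data
fields <- data.frame(
  Area         = c(3.6, 5.1, 2.8, 2.4),
  Worm.density = c(4L, 7L, 2L, 5L)
)

# Assigning to a name that does not exist yet creates a new column
fields$new_column     <- fields$Worm.density / fields$Area  # hypothetical calculation
fields$another_column <- seq_len(nrow(fields))              # hypothetical row counter

length(fields$new_column)  # one value per row of the data frame
names(fields)
```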

Indexing: remember the [square brackets]?
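Data frames are indexed the same way as matrices, [row, column], and columns can also be picked out by name. A short sketch on an illustrative three-column version of the data (the name `fields` is an assumption):

```r
fields <- data.frame(
  Field.Name = c("Nashs Field", "Silwood Bottom", "Nursery Field", "Rush Meadow"),
  Area       = c(3.6, 5.1, 2.8, 2.4),
  Damp       = c(FALSE, FALSE, FALSE, TRUE)
)

fields[2, ]                            # the whole second row
fields[, "Area"]                       # one column, returned as a vector
fields$Area                            # the same column using $
fields[2, "Area"]                      # a single value: row 2 of Area
fields[1:2, c("Field.Name", "Area")]   # rows 1-2 of two chosen columns
```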

Remove the “nonsense”…

[1] "Field.Name"     "Area"           "Slope"          "Vegetation"    
[5] "Soil.pH"        "Damp"           "Worm.density"   "new_column"    
[9] "another_column"
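Two common ways to drop columns you no longer need are sketched below, on a small illustrative data frame (its contents are made up; only the column names come from the list above).

```r
fields <- data.frame(
  Area           = c(3.6, 5.1, 2.8, 2.4),
  Slope          = c(11L, 2L, 3L, 5L),
  new_column     = 1:4,
  another_column = c("a", "b", "c", "d")
)

# Option 1: assign NULL to the column
fields$another_column <- NULL

# Option 2: keep every column except the one named
fields <- fields[, names(fields) != "new_column"]

names(fields)  # only the original columns remain
```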

Change the names of columns
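Renaming can be done by assigning into names(). In this sketch the new names ("pH" and "Field") are hypothetical, chosen only to show the two approaches: matching on the old name, or using the column position.

```r
fields <- data.frame(
  Field.Name   = c("Nashs Field", "Silwood Bottom"),
  Soil.pH      = c(4.1, 5.2),
  Worm.density = c(4L, 7L)
)

# Rename by matching the old name (safer if the column order changes)
names(fields)[names(fields) == "Soil.pH"] <- "pH"

# Rename by position
names(fields)[1] <- "Field"

names(fields)
```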

Subset your data
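Subsetting keeps only the rows (and optionally columns) that meet a condition. A minimal sketch using the first four rows visible in the str() output; the data frame name and the thresholds are chosen purely for illustration.

```r
fields <- data.frame(
  Field.Name = c("Nashs Field", "Silwood Bottom", "Nursery Field", "Rush Meadow"),
  Area       = c(3.6, 5.1, 2.8, 2.4),
  Vegetation = c("Grassland", "Arable", "Grassland", "Meadow"),
  Soil.pH    = c(4.1, 5.2, 4.3, 4.9),
  Damp       = c(FALSE, FALSE, FALSE, TRUE)
)

# Rows where a logical column is TRUE (note the comma: rows first, all columns)
damp_fields <- fields[fields$Damp, ]

# subset() does the same job and can also pick columns
big_fields <- subset(fields, Area > 3, select = c(Field.Name, Area))

# Combine conditions with & (and) or | (or)
acidic_grass <- fields[fields$Vegetation == "Grassland" & fields$Soil.pH < 4.2, ]

damp_fields
big_fields
acidic_grass
```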