# Easy R for M.F.Sc students – Part 3 (Data types in R)

Welcome to the third chapter of the Easy R tutorial. Today we will learn how to create and store datasets in R for simple mathematical calculations. To do that, you first need to know what types of data R understands.

**Data Types**

R understands five main types of data (more precisely, data structures): 1. Vectors, 2. Matrices, 3. Arrays, 4. Data Frames, 5. Lists.

**1. Vectors**

A vector is a group of values arranged along a single direction (one dimension). We have already seen how to make a vector in Part 2 of the series using the concatenating function “c ( )”.

    > mydata <- c (1, 2, 3, 4)
    > mydata
    [1] 1 2 3 4

A vector can also hold just a single number. In fact, R treats a single number as a vector of length one, with or without the “c ( )” function.
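As a small sketch of this point (the object names `single` and `several` are mine, not from the tutorial), you can check the length of a vector with `length()`:

```r
# A single number is itself a vector of length 1
single <- c(7)
length(single)        # 1
identical(single, 7)  # TRUE: c(7) and 7 hold the same value

# A longer vector, as in Part 2
several <- c(1, 2, 3, 4)
length(several)       # 4
is.vector(several)    # TRUE
```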

**2. Matrices**

Matrices are data with two dimensions, so a matrix has rows and columns. This is a very handy data type, particularly when you deal with large datasets or multivariate data, and it is the one I use most of the time. To create a matrix, use the “matrix ( )” function. See the example below.
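A minimal sketch of such a call, reusing the same matrix that appears later in the Lists section of this tutorial (the object name `mymatrix` is only for illustration):

```r
# Six values arranged in 3 rows and 2 columns (filled column by column)
mymatrix <- matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
mymatrix
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

dim(mymatrix)  # 3 2  (rows, then columns)
```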

Three arguments are shown in the above example. You will get a matrix even with a single argument, “data=” (check Part 2 of the series to learn what an argument is), but that matrix will have a single column. You can try the same data without the “nrow” and “ncol” arguments. “nrow” tells R how many rows you need and “ncol” tells it how many columns you need. The “data=” argument accepts a vector, and its elements are arranged column-wise: R fills the first column and then moves to successive columns as long as data is available in the vector. If the vector has fewer values than nrow times ncol, R starts filling the remaining cells again from the first value of the vector (this is called recycling). When the vector length does not fit evenly into the matrix, R still fills the matrix but typically shows a warning message. Play around with this function to learn more.
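The two behaviours described above can be sketched as follows. The object names `values`, `onecol` and `recycled` are hypothetical, chosen only for this illustration:

```r
values <- c(1, 2, 3, 4, 5, 6)

# Only "data=" supplied: a matrix with a single column
onecol <- matrix(data = values)
dim(onecol)  # 6 1

# Short vector: c(1, 2, 3) is reused until all 3 x 2 = 6 cells are filled
recycled <- matrix(data = c(1, 2, 3), nrow = 3, ncol = 2)
recycled
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3
```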

You can also tell R to fill the matrix row-wise instead of column-wise using the optional argument “byrow= TRUE”. By default, this value is “FALSE”.
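To see the difference, here is a short sketch that builds the same six values both ways (the names `values`, `bycol` and `byrows` are mine):

```r
values <- c(1, 2, 3, 4, 5, 6)

# Default (byrow = FALSE): values run down the columns
bycol <- matrix(data = values, nrow = 3, ncol = 2)

# byrow = TRUE: values run along the rows
byrows <- matrix(data = values, nrow = 3, ncol = 2, byrow = TRUE)
byrows
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
```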

**3. Arrays**

Arrays are an extended version of matrices: they can hold more than two dimensions. The following is an example dataset with three dimensions. The length of the first dimension is 5, the length of the second is 10 and the length of the third is 2.

    > mydata <- array (data= NA, dim= c (5, 10, 2) )
    > mydata
    , , 1

         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

    , , 2

         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

The “data=” argument accepts a vector and uses it to fill the array. A vector of 100 values (5 x 10 x 2) is required to fill this array completely. I simply used “NA” (R understands NA as a missing value) to keep it easy. You can try “c (1:100)” instead of NA and see how it comes out. In R, 1:100 creates a continuous vector 1, 2, 3, 4 … 98, 99, 100.
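If you try the suggestion above, a quick way to check where the numbers end up is to pick out individual cells by their three indices. This is only a sketch, and the object name `myarray` is mine:

```r
# Fill a 5 x 10 x 2 array with the numbers 1 to 100
myarray <- array(data = 1:100, dim = c(5, 10, 2))

dim(myarray)        # 5 10 2
myarray[1, 1, 1]    # 1   (first value of the first slice)
myarray[5, 10, 1]   # 50  (last value of the first slice)
myarray[1, 1, 2]    # 51  (the second slice continues from 51)
myarray[5, 10, 2]   # 100
```

As with matrices, the first dimension (rows) is filled first, then the columns, and only then does R move on to the next slice.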

**4. Data Frames**

Data Frames are a little more difficult to understand, but you will learn them as you use them. Many standard statistical functions expect, or work most conveniently with, data in the form of a data frame (e.g., Simple Linear Regression).

Data Frames are similar to matrices in structure, but the columns of a data frame can be of different modes (e.g., numeric, logical, etc.). Learning to use a data frame is in fact practice in handling the different modes in a dataset, so that they can be subjected to the desired statistical functions.

To create a data frame, use the function “data.frame ( )” with all the data vectors separated by commas. Let’s take an example in which we have four observations of three parameters of a fish species, i.e., 1. Total Length, 2. Sex and 3. whether the fish is Mature or not.

    # First create the data vectors
    > TL <- c (32, 23, 35, 40)
    > Mature <- c (TRUE, TRUE, TRUE, FALSE)
    > Sex <- c ("Male", "Male", "Female", "Female")

    # Second create the Data Frame
    > mydata <- data.frame (TL, Sex, Mature)

    # Call mydata to view the Data Frame
    > mydata
      TL    Sex Mature
    1 32   Male   TRUE
    2 23   Male   TRUE
    3 35 Female   TRUE
    4 40 Female  FALSE

It doesn’t look complicated up to this point. But it is important to check whether each column is in the right mode. For example, Total Length should be in “numeric” mode. Sex is a categorical variable, and such data are called “Factors” in R. The third column holds “Presence / Absence” information, i.e., whether the specimen is mature or not. Such data can be represented by “TRUE” or “FALSE” and are known as “Logical” in R. To check whether the columns are in the desired mode, we can use the function “str ( )” as below:
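A sketch of that check, using the same vectors as above. One assumption of mine is labeled here: the argument `stringsAsFactors = TRUE` is passed explicitly so that Sex is stored as a Factor whichever version of R you run.

```r
TL <- c(32, 23, 35, 40)
Sex <- c("Male", "Male", "Female", "Female")
Mature <- c(TRUE, TRUE, TRUE, FALSE)

# stringsAsFactors = TRUE is my addition, so that text columns become Factors
mydata <- data.frame(TL, Sex, Mature, stringsAsFactors = TRUE)

str(mydata)
# 'data.frame':	4 obs. of  3 variables:
#  $ TL    : num  32 23 35 40
#  $ Sex   : Factor w/ 2 levels "Female","Male": 2 2 1 1
#  $ Mature: logi  TRUE TRUE TRUE FALSE
```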

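Here you can see that all three variables are in the desired mode. R also reports that Sex is a Factor (categorical variable) with two levels (Female and Male). Note that from R 4.0.0 onwards, “data.frame ( )” keeps text columns as character (“chr”) by default; to get a Factor, add the argument “stringsAsFactors= TRUE” or convert the column with “factor ( )”.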

**5. Lists**

Another form of data that R understands is the list. Essentially, a list can hold all of the above forms of data in one place. Suppose you have a number, a vector, a matrix, an array and a data frame. You can keep all of them together in a single list using the function “list ( )”. To demonstrate this, let’s repeat all the earlier exercises and put them together in a list named ‘mydata’.

    # Number
    A <- 134

    # Vector
    B <- c (1, 2, 3, 4)

    # Matrix
    C <- matrix (data= c (1,2,3,4,5,6), nrow= 3, ncol= 2)

    # Array
    D <- array (data= NA, dim= c (5, 10, 2) )

    # Creating a Data Frame
    TL <- c (32, 23, 35, 40)
    Sex <- c ("Male", "Male", "Female", "Female")
    Mature <- c (TRUE, TRUE, TRUE, FALSE)
    E <- data.frame (TL, Sex, Mature)

    # Now a list can be made as follows:
    mydata <- list (A, B, C, D, E)

    # To see the list you created, call:
    mydata
    [[1]]
    [1] 134

    [[2]]
    [1] 1 2 3 4

    [[3]]
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6

    [[4]]
    , , 1

         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

    , , 2

         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

    [[5]]
      TL    Sex Mature
    1 32   Male   TRUE
    2 23   Male   TRUE
    3 35 Female   TRUE
    4 40 Female  FALSE

In this example, the list mydata has 5 items, which are displayed above. To use or extract one of them, say the third item, call:

    mydata [[3]]
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6
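Once an item has been pulled out with double brackets, it behaves like the original object, so you can index into it directly. The sketch below uses a shorter list, which I have named `mylist`, to keep it brief. It also shows that single brackets return a list of one item rather than the item itself.

```r
# A smaller list for illustration: a number, a vector and a matrix
A <- 134
B <- c(1, 2, 3, 4)
C <- matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
mylist <- list(A, B, C)

length(mylist)       # 3 items
mylist[[3]]          # the matrix itself
mylist[[3]][2, 1]    # 2: row 2, column 1 of that matrix
class(mylist[3])     # "list": single brackets return a list of one item
```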

In the next chapter, we will see how to do basic computations using vectors and matrices in R. I hope you enjoyed this tutorial; please leave your mark on my blog with a comment. Thank you all.

Posted on January 8, 2012, in Bio-Statistics and Analysis.
