Easy R for M.F.Sc students – Part 7 (Functions and Packages)


Welcome to Chapter 7 of EasyR Tutorial. People say: “You can do many / most things in R”. This is just because R is a language and you can use this language to write your own functions. When you install R, many functions come by default.  A very simple example of a base R function is the mean ( ). It finds the average of  a group of numbers.

Base functions are useful for doing basic and established statistical techniques. But what if you want to analyse a data with a very specific method in your discipline, say “Yield Per Recruit analysis” in fisheries management. They are not generally used by common people and hence you won’t find it in any statistical softwares designed for general purpose.

In R, you can write your own function to do this job. Once made, you will never need to do it again in your life. That is the beauty of making your own functions and R is a good platform for biologists since you really do not need to be a hard core programmer. Today, we will learn to write a function. We will not do anything very complicated, but very simple and easy to understand.

1. Functions

In simple terms, function is something which takes data as input, process it and give the result as output. The input data are called “arguments”.

To demonstrate, let’s construct a function that calculate the weight of a fish for given length. The length-weight relationship of fish is given by:

W= a L^b

Here ‘W’ (weight) will be the output of our function and there will be three input arguments. 1. Length of the fish 2. The intercept (a) and slope (b) coefficients in the equation.

The general format for constructing a function is:

nameofthefunction <- function (input arguments)
{
do equations or jobs
}

Let’s name our function “LWRel”. Our input arguments are

1. Length of the fish (L)

2. a coefficient (acoeff)

3. b coefficient (bcoeff)

Now open TinnR. The best place to write functions is an editor like TinnR because it is easier to correct if mistakes happens while you write and you can avoid rewriting the same code several times. Write the following function in TinnR and source the code into R (refer to the first few chapters of my tutorial if you do not know this procedure):

LWRel <- function (L, acoeff, bcoeff)
{
W <- acoeff * ( L ^ bcoeff )
print ( W )
}

The above function takes L, acoeff and bcoeff as input, calculate the weight of the fish. I added an extra job to print the result otherwise R will not show us the result of this calculation. Now to see how this work, call the function in R with arguments as follows. For example to know the weight of a fish (Bombay Duck) with length 200 mm given that coefficients (a= 0.0000058, b=2.915):

> LWRel(200,0.0000058,2.915)
[1] 29.57539

The result is weight in grams. Now with our new function let’s estimate the weight of the fish for a range of length, say, from 10 mm to 300 mm with 5 mm interval. First create a vector of length measurements with seq ( ) function.

> lrange <- seq (10, 300, 5)

> lrange
 [1]  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95 100
[20] 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195
[39] 200 205 210 215 220 225 230 235 240 245 250 255 260 265 270 275 280 285 290
[58] 295 300

Now just use this vector instead of a single length measurement in LWRel function:

> LWRel (lrange, 0.0000058, 2.915)

 [1]  0.004769007  0.015550130  0.035969171  0.068932355  0.117283374
 [6]  0.183817283  0.271289427  0.382421650  0.519906870  0.686412579
[11]  0.884583610  1.117044381  1.386400736  1.695241503  2.046139812
[16]  2.441654229  2.884329744  3.376698641  3.921281257  4.520586673
[21]  5.177113315  5.893349512  6.671773988  7.514856319  8.425057344
[26]  9.404829545 10.456617395 11.582857678 12.785979792 14.068406020
[31] 15.432551787 16.880825905 18.415630793 20.039362685 21.754411835
[36] 23.563162692 25.467994086 27.471279382 29.575386643 31.782678775
[41] 34.095513667 36.516244325 39.047218994 41.690781283 44.449270277
[46] 47.325020644 50.320362745 53.437622726 56.679122619 60.047180430
[51] 63.544110228 67.172222226 70.933822865 74.831214889 78.866697421
[56] 83.042566030 87.361112806 91.824626421 96.435392197

Here we get the weight of the fish for all length observations. To visualise this result, we can put the output into a graph using plot ( ) function.

plot ( lrange, LWRel (lrange, 0.0000058, 2.915), xlab="Length", ylab="Weight" )

I hope this was very impressive and might have sparked many ideas. In brief, you can visualise the pattern or behaviour of a population if a defined relationship is known. What I would suggest you to do as a homework is to try the same example with a Von Bertalanffy Growth Function equation and visualise the growth curve of a species with increasing length or age of the fish.

2. Packages

Base R comes with many functions that are commonly used in statistics. But there are not many if you are looking for high level advanced statistics or when you want to visualise graphs and data in a more user friendly style. You can ofcourse make R to do things in the way you want. However, most often this  consume a lot of your time.

Now clever people have already coded functions that could be useful. A set of functions are usually compiled together and made available to public in the form of a “Package”. So essentially, if you install a package and load it in R, you can use all the functions that comes with the package. Explanation on how to use these functions and arguments would be detailed in the user manual of the package.

Find a useful package

The easiest way would be to search in google with carefully chosen keywords. Second option is to find the required package from a list  in CRAN (Click Here).

Installing a package

First, you must need an internet connection. You can install a package in R by an automated process within R or by manually downloading the ZIP file from CRAN. The best option is to install a package using the automated process.

If you choose to install using the automated process, all other dependent packages required will be downloaded and installed by itself. Go to Packages menu and then choose “Install Packages”. R will ask you to choose the nearest CRAN mirror. Choose a location very close to your place.

Next R will give another list of all packages available in the chosen CRAN mirror. Choose the required package from the list and then patiently wait for R to download and install all the packages required.

The second option to install a package is to manually download them as a ZIP file from CRAN. From “Packages ” menu in R, choose “Install package from local zip file”. It will ask you the location of the downloaded file in your computer.

The problem of this installation is that, R won’t automatically identify what extra packages (dependent) are required. You have to manually download and install each one by one.

To demonstrate with an example, let’s install a package named “msm” which denotes “Multi-state Markov and hidden Markov models in continuous time”. We are not going to do any big stats here. This package have a function named “rtnorm ( )” which help the user to choose a random number between two limits. This is what I’m interested to show you.

The CRAN page for this package is here (Click). If you scroll down, you can see a title “Reverse dependencies”. This is a list of packages on which msm package depends on. So msm do not work until all the listed packages are installed in your computer. All these packages will be installed automatically if you choose the first method. But at times, you might be forced to use the second option.

Using a package

To use the msm package in R, type:

Now all the functions in msm package is ready to use. Before we use the msm functions, let’s have a look what is available in base R and why we are opting a different function which comes with msm package.

To generate a random number from a normal distribution, the function in R is the rnorm ( ) function. To generate 10 random numbers from a normal distribution with mean=100 and standard deviation=10, use:

> rnorm (10, mean=100, sd=10)

 [1]  94.17938 101.65504 108.81036  86.14346  86.43911  76.26140  89.82155
 [8] 113.54656 101.42332  96.31000

You will get much different values from what is printed here. Generating random numbers…. it is like rolling a dice. Suppose in the above output, If I wish only to have values between 90 and 110 !!! There are a few values less than 90 and more than 110. We can write our own function to make R to choose only values between 90 and 110.

But anyway, the rtnorm ( ) function in msm package is useful in doing that task.

Let’s try using the rtnorm ( ) function.

> rtnorm (10, mean=100, sd=10, lower=90, upper=110)

 [1]  91.90775  91.09023 105.49530  92.87104 105.92710  97.77873  94.99413
 [8] 103.52435  90.80200 103.80593

First argument is the number of random numbers required (10).

Fourth and fifth argument are the limits within which the random numbers should be choosen.

This is just an example. There are thousands of packages available in R and do fantastic things. I will finish this tutorial with an example that demonstrate the use of packages, functions and loops by integrating them for different purposes.

Below, the graph was plotted using a  loop process.

The code used for plotting this graph is given below. Paste the code in TinnR and source the file in R.

Remember to install “msm”package before using this code. Otherwise you will get error.

# Remove all existing objects in R
rm(list=ls(all=TRUE))
# To load the package "msm" 
library(msm)
# Function to calculate the weight of fish for given length
LWRel<-function(L, acoeff, bcoeff)
{W <- acoeff*(L^bcoeff)}
# Vector of fish length
lrange    <-seq(10,300,5)
# Vector of fish weight 
fishweight<-LWRel(lrange,0.0000058,2.915)
# Asymptotic weight in grams 
maxweight <-150 

# Loop process to plot the graph
for(i in 1:100){                             

newfishweight<-c()
for(j in 1: length(lrange)){
newfishweight<-c(newfishweight,rtnorm(1, fishweight[j], fishweight[j]*0.25, 0, maxweight))
}

if(i==1){
plot(lrange,newfishweight,xlab="Length of Bombay Duck (mm)", ylab="Weight(grams)")}else{
points(lrange,newfishweight)
}

}

title("Length-Weight relationship of Bombay Duck")

The plotting function might confuse you. But leave it for the moment. In our next chapter, we will learn more about making graphs. Using a random number helps you to visualise patterns similar to real world situations.

My tutorial for this chapter ends here. Hope you enjoyed it and leave a comment if you like it. You can also sign up to my blogs with your email address if you would like to get notified whenever I publish a new chapter.

About Deepak George Pazhayamadom

I'm a fish biologist and a mathematical modeller. I have a wide range of research interests, mostly centered on fisheries resource management.

Posted on February 19, 2012, in Bio-Statistics and Analysis and tagged , . Bookmark the permalink. Leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: