Easy R for M.F.Sc students – Part 1 (Introduction)


Hello friends,

I assume you are all familiar with MS Excel, so let’s start with a data set in Excel. As an R beginner, the first thing I wanted to know was “Can I draw in R the same graphs that Excel draws for me?”. The answer is “More than you expect, given your commitment to learning R”. In this blog, I use very simple examples to give beginners a quick start in shifting from Excel to R.

R is free statistical software.

1. Installing ‘R’ in Windows

To install ‘R’, go to your search engine and use the keyword “CRAN” (Comprehensive R Archive Network). On the CRAN page, choose “Download R for Windows”. You will reach a webpage as in the following picture. Choose the “base” link from this page.

When you click the base link, you will reach a webpage with a link to download the latest version of R for Windows. Download the R executable file for the latest version and run it in Windows to install R on your system.

2. Create and prepare data

Let’s create a simple data set in Excel.

These are readings of Biological Oxygen Demand (BOD) at successive times. The next step is to take this data into R. There are many ways of doing this; I will explain the one I found most comfortable. First, convert your MS Excel data file to a format that is consistent and stable across many platforms. Go to the “File” menu and “Save as” a CSV file.

The next step is to import the CSV file into R. Before doing this, I recommend that you use a nice editor for working with R. Since this kind of work involves some programming, an editor for R commands makes it easier to avoid mistakes. I’m very much into an editor called “Tinn R”. It has a nice Graphical User Interface (GUI) with different colours for syntax. R has an editor by default, but it is not very pleasing to work with. So if you don’t have Tinn R on your system, please install it now.

3. Import data into R

From here onwards, we will use Tinn R to write our code. So open Tinn R with a new page. At the moment, don’t worry about the different menu items in Tinn R. To import a CSV file into R, we use the command “read.table( )”. Within the brackets, you have to give the location of your file on the computer.

From the picture above, you can see many characters with different colours. In Tinn R, the R commands are displayed in blue by default. So if a command is spelled wrong, it won’t appear in blue. Brackets will be in red. This way it is easy to spot whether you forgot to close a bracket. Now, in a fresh page of Tinn R, write the following command:

mydata <- read.table("C:\\Users\\Pazhayamadom\\Desktop\\datatrial.csv", header = TRUE, sep = ",", quote = "", dec = ".", comment.char = "")

This command tells R to read the CSV file and import it into R under the name ‘mydata’. In this command, “C:\\Users\\Pazhayamadom\\Desktop\\datatrial.csv” should be replaced by the location where you have saved your CSV file. The other, optional parts of the command are ‘header=’, ‘sep=’, ‘quote=’, ‘dec=’ and ‘comment.char=’.

The <- symbol is the assignment operator. It works like ‘equals to’, essentially telling R to call my imported file by this name.
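As a small illustration of how <- stores a value under a name, here is a minimal sketch. The names x and y are made up for this example and do not come from the tutorial's data.

```r
# Store the value 5 under the name x (x is just an illustrative name).
x <- 5

# Use the stored value in a calculation and keep the result as y.
y <- x * 2
print(y)

# Note that '==' is different: it tests for equality and gives TRUE or FALSE.
print(x == 5)
```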

Each optional part of the command is called an ‘argument’. We are not discussing them now, to avoid confusion, but they are a set of parameters to tweak how your data is read into R. The first argument of read.table( ) is always the location of your CSV file and is mandatory.

Save the Tinn R file to a desired location on your computer. The Tinn R file will have the extension “.r”, which is readable by R. Here my Tinn R file was saved under the name “Untitled1.r”.
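If you want to try the import without an Excel file at hand, the sketch below may help. It assumes R's built-in BOD data as a stand-in for the Excel sheet and writes it to a temporary CSV file first. The name csv_path is hypothetical, and the temporary file only replaces the Desktop location used above.

```r
# Stand-in for the Excel "Save as CSV" step: write R's built-in BOD data
# to a temporary CSV file. quote = FALSE keeps the header unquoted, which
# matches the quote = "" setting used when reading the file back.
csv_path <- tempfile(fileext = ".csv")
write.csv(BOD, csv_path, row.names = FALSE, quote = FALSE)

# Same call as in the tutorial; only the file location differs.
mydata <- read.table(csv_path, header = TRUE, sep = ",",
                     quote = "", dec = ".", comment.char = "")

print(mydata)       # the imported table
print(dim(mydata))  # number of rows, then number of columns
```

With your own file, only the first argument changes. The remaining arguments can stay as they are for an ordinary comma-separated file.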

Now open ‘R’. Click on ‘File’ menu and choose ‘Source R code’.

Choose the location where the Tinn R file was saved and select the file. Click ‘Open’. Now the command in the Tinn R file will be executed and the data will be imported into R. However, this won’t be very obvious. To see whether your data was imported, type “mydata” into R and press the ENTER key.
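The menu route can also be done by typing a command: R's source() function runs all the commands in a file. The sketch below is only an illustration. It assumes a throwaway script written to a temporary file (script_path is a hypothetical name), and the script copies the built-in BOD data instead of reading a CSV file.

```r
# Write a one-line script to a temporary .r file, standing in for the
# file saved from Tinn R.
script_path <- tempfile(fileext = ".r")
writeLines("mydata <- BOD", script_path)

# Run every command in the script, much like the 'Source R code' menu item.
source(script_path)

# Printing the object shows that the data is now in R.
print(mydata)
```

For your own file, you would pass its location to source() in the same double-backslash form used for the CSV file.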

Ta..da…

Your data is now live in R.

4. Plot a graph in R

Once your data is in R, it is pretty easy to plot simple graphs. But making them pretty can turn out to be more tedious than doing any descriptive statistics. Either way, the results are always worth more than the effort you put in.

To plot a graph in R, the command is plot(x, y), where the first argument holds the readings for the x-axis and the second argument holds the readings for the y-axis. So let’s plot a graph with Time on the x-axis and BOD on the y-axis. Open your Tinn R file and add the following command as the second line, after the read.table command.

plot(mydata[,1],mydata[,2])

Here the x-axis is mydata[,1] and the y-axis is mydata[,2]. This is how you tell R which columns of the data named “mydata” go on the x-axis and y-axis. Essentially, the data exists in R as a table of rows and columns (a data frame, which is indexed like a matrix), and our data consists of 6 rows and 2 columns.

mydata[,1] means choose all the observations in the first column of mydata.

mydata[1,] means choose all the observations in the first row of mydata.

mydata[1,1] means choose the observation in the first row and first column of mydata.

mydata[5,2] means choose the observation in the fifth row and second column of mydata.
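The four selections above can be tried directly. This is a minimal sketch that assumes the built-in BOD data as a stand-in for the imported file, so the column names Time and demand come from that data set and may differ from your own CSV header.

```r
# Built-in data used as a stand-in for the imported CSV file.
mydata <- BOD

print(dim(mydata))    # number of rows, then number of columns
print(mydata[, 1])    # all observations in the first column
print(mydata[1, ])    # all observations in the first row
print(mydata[1, 1])   # first row, first column
print(mydata[5, 2])   # fifth row, second column

# A column can also be picked by its name instead of its position.
print(mydata$demand)
```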

Save the Tinn R file and source the file in R exactly as we did before. You will get a graph plotted with all the points.

Now this doesn’t look very appealing, and the trend is not very clear since the points are not joined with a line. By default, the plot command gives you a scatter plot of points. To create a line graph, you need to add an optional argument to the basic plot command.

plot(mydata[,1],mydata[,2],type="l")

Here we added an extra argument, type="l", which tells R to plot a graph of type “l”. Type “l” is a line graph. You can replace this with other characters to plot different graphs.
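To compare a few graph types side by side, something like the following sketch could be used. It assumes the built-in BOD data in place of the imported file. The axis labels, the titles and the names plot_types, old_par and ty are choices made for this example only.

```r
# Built-in data used as a stand-in for the imported CSV file.
mydata <- BOD

# Three common values for the 'type' argument:
# "p" = points, "l" = lines, "b" = both points and lines.
plot_types <- c("p", "l", "b")

# Split the plotting window into one row of three panels,
# remembering the old setting so it can be restored afterwards.
old_par <- par(mfrow = c(1, 3))
for (ty in plot_types) {
  plot(mydata[, 1], mydata[, 2], type = ty,
       xlab = "Time", ylab = "BOD", main = paste("type =", ty))
}
par(old_par)
```

The xlab, ylab and main arguments are a first step towards making the graph prettier: they set the axis labels and the title.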

To find the other valid graph types for the plot command, type “?plot” in R and press the ENTER key. This will open an HTML help page for the plot command. You can get help for any R command like this, by putting a question mark in front of the command name. To get help for the read.table command, type “?read.table” and press the ENTER key.

?plot
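The same help pages can also be reached with the help() function, and args() gives a quick look at the arguments of a command without leaving the console. A short sketch:

```r
# Open the help page for a command by giving its name as text.
help("plot")
help("read.table")

# Print the list of arguments a command accepts, with their default values.
print(args(read.table))
```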

5. Play with data available in R

The BOD data we used is in fact available by default to all R users in the package “datasets”. To see the sample data in R, type “data( )” and press the ENTER key. All the data sets available in your R will be listed in a separate window. To use one, you just have to enter its name in R. Remember that R is case sensitive, so you have to enter the name of the data exactly as it appears in R.

data ( )
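As a quick check of the built-in data and of case sensitivity, the sketch below may be useful. It assumes a fresh R session in which you have not created an object called bod yourself.

```r
# List the data sets that come with the installed packages.
data()

# Print the BOD data by typing its name exactly as listed.
print(BOD)

# A compact summary of the columns and their types.
str(BOD)

# Names are case sensitive: BOD exists, while the lower-case name is a
# different name and is not one of the built-in data sets.
print(exists("BOD"))
print(exists("bod"))
```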

One last thing before I finish this tutorial: R will ask you whether to save the workspace when you attempt to close the program. Choose ‘No’ for the moment. I will address this issue in the second chapter of the series.

I hope you enjoyed my short tutorial. Please give feedback, and thank you all for your time.

Good Luck with R 🙂

About Deepak George Pazhayamadom

I'm a fish biologist and a mathematical modeller. I have a wide range of research interests, mostly centered on fisheries resource management.

Posted on December 25, 2011, in Bio-Statistics and Analysis. 2 Comments.

  1. Hi,
    I read the first part and understood 90% of it, but I have not applied it practically. Before applying it, I wanted some more guidance regarding the following:
    What is the difference between MS Excel, SPSS (I know only these) and “R”?
    Does it have wider application in our field?
    Or is it the latest of its sort?

    • Good to hear that you made an attempt to understand, and I’m even happier to know that I should improve my writing standards by 10%. MS Excel and SPSS are easy to use. You get everything at the click of a button. But they have limitations, because they don’t do anything beyond what the original programmer designed them to do. In R, you yourself are the programmer. You design your own analysis. In other words, you have more flexibility and possibilities in R.

      Learning R is not easy (it has a steep learning curve) and it is not a point-and-click application. It is a programming language, like C, C++ and Fortran, though a higher-level one. Once learned, you can do any statistics. R is free and there is no license to renew. Moreover, you can develop your own package (something like a piece of software) in R.

      R is getting more attention in biological fields now because the language is not too difficult to learn. In fisheries management, a world-class package known as “FLR” is under continuous development and use. If you have a new method of analysis that you can’t do in Excel or SPSS (the chances of doing it there are small, since you invented it), R is the best option for you to demonstrate it to the world. SPSS has an option that lets users write their own programs, but I don’t think any of our people are doing it.

      R is not the latest software. It was developed sometime in the 90s. I would say it will take a while for biologists to realize its potential. You can easily make a biologist panic by showing a very complex equation. Only a few will attempt to figure it out. The “attempt” is more important than whether they succeed or fail to understand the equation. It is only this craving that will help you learn new things. And of course, patience. So it is only because of the steep learning curve that R is less popular among people in our field. Feel free to ask if I didn’t answer you well.
