Easy R for M.F.Sc students – Part 1 (Introduction)
I assume you are all familiar with MS Excel, so let’s start with a data set in Excel. As an R beginner, the first thing I wanted to know was “Can I draw in R the same graphs that Excel draws for me?”. The answer is “More than you expect, given your commitment to learning R”. In this blog, I illustrate very simple examples to give beginners a quick start in shifting from Excel to R.
R is free statistical software.
1. Installing ‘R’ in Windows
To install ‘R’, go to your search engine and use the keyword “CRAN” (Comprehensive R Archive Network). Choose “Download R for Windows”. You will reach a webpage as in the following picture. Choose the “base” link from this page.
When you click the base link, you will reach a webpage with a link to download the latest version of R for Windows. Download the executable file and run it in Windows to install R on your system.
2. Create and prepare data
These are the readings of Biological Oxygen Demand (BOD) at successive points in time. The next step is to take this data into R. There are many ways of doing this; I will explain the one I found most comfortable. First convert your MS Excel data file to a format that stays consistent and stable across many platforms. Go to the “File” menu and choose “Save as” to save it in CSV file format.
The next step is to import the CSV file into R. Before doing this, I recommend using a good editor for working with R. Since this work involves some programming, an editor for R commands makes it easier to avoid mistakes. I’m very fond of an editor called “Tinn R”. It has a nice Graphical User Interface (GUI) with different colours for syntax. R has an editor by default, but it is not very pleasant to work with. So if you don’t have Tinn R on your system, please install it now.
3. Import data into R
From here onwards, we will use Tinn R to write our code. So open Tinn R with a new page. For the moment, don’t worry about the different menu items in Tinn R. To import a CSV file into R, we use the command “read.table()”. Within the brackets, you have to give the location of your file on the computer.
From the picture above, you can see many characters in different colours. In Tinn R, R commands are displayed in blue by default, so if a command is misspelled, it won’t appear in blue. Brackets are shown in red, which makes it easy to spot a bracket you forgot to close. Now, in a fresh page of Tinn R, write the following command:
This command tells R to read the CSV file and import it into R with the name ‘mydata’. In this command, “C:\\Users\\Pazhayamadom\\Desktop\\data.csv” should be replaced by the location where you have saved your CSV file. The other optional parts of the command are ‘header=’, ‘sep=’, ‘quote=’, ‘dec=’ and ‘comment.char=’.
The <- symbol is R’s assignment operator. It stores whatever is on its right under the name on its left, so here it essentially tells R to call my imported file by this name.
Each optional part of the command is called an ‘argument’, and we will not discuss them now, to avoid confusion. They are a set of parameters to tweak how your data is read into R. The first argument of read.table() is always the location of your CSV file and is mandatory.
Save the Tinn R file to a desired location on your computer. The Tinn R file will have the extension “.r”, which is readable by R. Here my Tinn R file was saved under the name “Untitled1.r”.
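As a minimal sketch of the import, the example below spells out one plausible set of values for the optional arguments. These values are assumptions for a comma-separated file with a header row, not necessarily the ones used in the original command. The file is written to a temporary folder from R’s built-in BOD data set, used here as a stand-in, only so that the snippet runs on any computer.

```r
# Write a small CSV to a temporary location so the example is self-contained.
# (Assumption: in practice this would be the file you saved from Excel.)
csv_path <- file.path(tempdir(), "data.csv")
write.csv(BOD, csv_path, row.names = FALSE)

# Read the CSV into R and store it under the name 'mydata'.
mydata <- read.table(csv_path,
                     header = TRUE,       # first row holds the column names
                     sep = ",",           # values are separated by commas
                     quote = "\"",        # text may be wrapped in double quotes
                     dec = ".",           # the decimal mark is a full stop
                     comment.char = "")   # no character starts a comment

# Show the imported data
mydata
```

With a real file on Windows, csv_path would be replaced by the file’s own location, written with doubled backslashes as in the path shown earlier.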
Now open ‘R’. Click on ‘File’ menu and choose ‘Source R code’.
Choose the location where the Tinn R file was saved and select the file. Click ‘Open’. The command in the Tinn R file will now be executed and the data will be imported into R. However, nothing obvious will happen on screen. To see whether your data was imported, type “mydata” into R and press the ENTER key.
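The menu step also has a typed equivalent, the source() function. The sketch below is self-contained on purpose: it creates a one-line stand-in script in a temporary folder and then sources it. The file name follows the example above, but the script content is an assumption. With a real script, you would pass its own location to source().

```r
# Create a stand-in script file (assumption: your real script would hold
# the read.table command instead of this one-line placeholder).
script_path <- file.path(tempdir(), "Untitled1.r")
writeLines("mydata <- BOD", script_path)

# Run every command in the script, like 'Source R code' in the File menu
source(script_path)

# Check that the data arrived
dim(mydata)    # number of rows and columns
head(mydata)   # first few rows
str(mydata)    # column names and types
```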
Your data is now live in R.
4. Plot a graph in R
Once your data is in R, it is pretty easy to plot simple graphs. Making them pretty, however, can turn out to be more tedious than doing any descriptive statistics. Still, the results are always worth the effort you put in.
To plot a graph in R, the command is plot(x, y), where the first argument is the readings for the x-axis and the second argument is the readings for the y-axis. So let’s plot a graph with Time on the x-axis and BOD on the y-axis. Open your Tinn R file and add the following command as the second line, after the read.table command.
Here the x-axis is mydata[,1] and the y-axis is mydata[,2]. This is how you tell R which columns of “mydata” go on the x-axis and the y-axis. The data exists in R as a data frame, a table that is indexed like a matrix, and ours consists of 6 rows and 2 columns.
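The basic plotting command described here can be sketched as follows. The first line is only a stand-in (an assumption) so that the snippet runs on its own; leave it out if mydata has already been imported.

```r
# Stand-in for the imported data; skip this line if 'mydata' already exists
mydata <- BOD

# Column 1 (Time) on the x-axis, column 2 (BOD) on the y-axis
plot(mydata[, 1], mydata[, 2])
```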
mydata[,1] means choose all the observations in the first column of mydata.
mydata[1,] means choose all the observations in the first row of mydata.
mydata[1,1] means choose the observation in the first row and first column of mydata.
mydata[5,2] means choose the observation in the fifth row and second column of mydata.
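The four indexing forms above can be tried directly in R. This is a small sketch that uses R’s built-in BOD data set as a stand-in for the imported file (an assumption, so that it runs without the CSV).

```r
# Stand-in for the imported data; skip this line if 'mydata' already exists
mydata <- BOD

mydata[, 1]    # all observations in the first column
mydata[1, ]    # all observations in the first row
mydata[1, 1]   # the observation in the first row and first column
mydata[5, 2]   # the observation in the fifth row and second column

dim(mydata)    # number of rows, then number of columns
```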
Save the Tinn R file and source the file in R exactly as we did before. You will get a graph plotted with all the points.
Now this doesn’t look very appealing, and the trend is not very clear since the points are not joined with a line. By default, the plot command gives you a chart of separate points. To create a line graph, you need to add some optional arguments to the basic plot command.
Here, we added an extra argument, type="l", which means to plot a graph of type "l". Type "l" is a line graph. You can replace this with other characters to plot different graphs.
To find the other valid graph types for the plot command, type “?plot” in R and press the ENTER key. This will open an HTML help page for the plot command. You can get help for any R command in this way by putting a question mark in front of its name; the brackets are not needed. To get help for the read.table command, type “?read.table” and press the ENTER key.
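As a sketch of how the type argument changes the picture, the snippet below draws the same data in a few different styles. The stand-in first line and the axis labels passed through xlab and ylab are assumptions added for illustration.

```r
# Stand-in for the imported data; skip this line if 'mydata' already exists
mydata <- BOD

plot(mydata[, 1], mydata[, 2], type = "l")   # line graph
plot(mydata[, 1], mydata[, 2], type = "p")   # points only (the default)
plot(mydata[, 1], mydata[, 2], type = "b")   # both points and lines

# The same line graph with readable axis labels
plot(mydata[, 1], mydata[, 2], type = "l", xlab = "Time", ylab = "BOD")
```

Each call to plot() replaces the previous graph, so run the lines one at a time to compare them.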
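For reference, a few standard ways of asking R for help are sketched below. The help pages are requested only in an interactive session, which is an assumption made so that the snippet also runs from a script without opening a viewer.

```r
# Open the help pages when working at the R prompt
if (interactive()) {
  print(help("plot"))        # same page as typing ?plot
  print(help("read.table"))  # same page as typing ?read.table
}

# Print a function's arguments without opening a help page
args(read.table)
```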
5. Play with data available in R
The BOD data we used is in fact available by default to all R users in the package “datasets”. To see the sample data in R, type “data()” and press the ENTER key. All data sets available in your R will be listed in a separate window. To use one of them, you just have to enter its name in R. Remember that R is case sensitive, so you have to enter the name of the data exactly as it appears in the list.
data()
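A short sketch of exploring the built-in data from a script is given below. The variable name available is an assumption chosen for illustration.

```r
# Collect the list of data sets that ship in the 'datasets' package
available <- data(package = "datasets")
head(available$results[, "Item"])   # names of the first few data sets

# Print a data set by typing its exact name
BOD

# Names are case sensitive
exists("BOD")   # TRUE
exists("bod")   # FALSE
```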
One last thing before I finish this tutorial: R will ask you whether to save the workspace when you close the program. Choose ‘No’ for the moment. I will address this in the second chapter of the series.
I hope you enjoyed my short tutorial. Please give feedback, and thank you all for your time.
Good Luck with R 🙂