Easy R for M.F.Sc students – Part 8 (Bar Chart)
Welcome to chapter 8 of the Easy R tutorial. Today we will learn how to plot a bar chart, one of the simplest and most widely used charts.
A bar chart is the easiest way to show which components of the data are bigger or smaller. For example, say you have frequency data from a research survey, such as the number of samples of different groups of species… 100 fish, 87 crabs, 12 shrimps… these counts will be far more eye-catching when displayed as a bar chart.
We will use the same data as in our earlier tutorials. These are the length measurements of Bombay duck specimens collected from four geographical locations in India. Click here to download the CSV file.
Source or import the data into R using TinnR. This was explained in my earlier tutorials, so I will not repeat it here. Remember to always edit the code in TinnR and source it into R instead of typing it directly into the R console. TinnR highlights the code in different colours, which helps you spot mistakes.
In the fish data, the sample size from each location was fixed (20 specimens), so a bar chart of it would be rather dull. The sex of the specimens, however, was random, so it is more interesting to use a bar chart to see how many of them are males, females and so on. First, take the sex column from the data and count it:
bducksex <- table(fish$sex)
bducksex
 F  M  U
21 48 11
The table() function counts the number of observations in each category; in other words, it gives you the frequency of factor data. Factors are qualitative variables whose categories (levels) are shared by several observations. The location, coast, sample and sex columns in our data are factors, but sex is the only one whose numbers were not fixed before sampling. The sex of a Bombay duck can't be identified unless you dissect it, so it is not possible to collect a fixed number of males and females for the study. It is therefore interesting to see how many of each sex turned up by chance. The table shows that there were 21 females, 48 males and 11 specimens whose sex could not be identified (unidentified sex). To plot a bar chart, type the following code in TinnR and source it into R:
barplot(bducksex)
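For readers who want to try table() before loading the CSV, here is a minimal sketch on a small made-up factor (the values in sex below are hypothetical, not survey data). It also rebuilds bducksex by hand from the counts printed above, which is only a convenience for following along; the tutorial itself builds it with table(fish$sex).

```r
# Hypothetical toy vector standing in for fish$sex (NOT the survey data)
sex <- factor(c("M", "F", "M", "U", "M", "F"))

# table() counts how many observations fall into each level of the factor
counts <- table(sex)
print(counts)  # F = 2, M = 3, U = 1

# Rebuild the survey counts by hand from the table printed above
# (a convenience for readers without the CSV file)
bducksex <- as.table(c(F = 21, M = 48, U = 11))
barplot(bducksex)
```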
That was easy… wasn't it? The graph shows that there were more males than any other category. The difficult part of drawing graphs is changing the default look you get in R. This graph does not have a title or axis labels. To add a title, we have to use an extra argument. Try the following now.
barplot(bducksex, main="Barplot of sex obtained from samples")
This now looks better. All we did was include an extra argument called “main=” with the title inside double quotes. Quotes are required whenever you pass text to R, for example to name something (R accepts single quotes too, but this tutorial uses double quotes). Here the text was the title of the plot. Now let's label the x and y axes.
barplot(bducksex, main="Barplot of sex obtained from samples", xlab="Sex", ylab="Frequency")
Even better now. We added two more arguments this time, one for labelling the x axis and another for the y axis. The arguments are self-explanatory: “xlab=” gives the label for the x axis and “ylab=” gives the label for the y axis. Now, what if you want to see the bars in increasing order? In that case, you have to sort the counts first. The “order()” function returns the positions that would put a vector in increasing order, so indexing the vector with those positions sorts it (the “sort()” function does the same in one step). Try the following:
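A minimal sketch of this step, assuming the counts printed earlier (F = 21, M = 48, U = 11). The bar labels passed to names.arg are hypothetical, and the tutorial's own code may differ:

```r
# Counts printed earlier in the tutorial: F = 21, M = 48, U = 11
bducksex <- as.table(c(F = 21, M = 48, U = 11))

# order() returns the positions that put the counts in increasing order;
# indexing with those positions gives the sorted table (U, F, M)
bducksex <- bducksex[order(bducksex)]

# names.arg replaces the bar names; the labels must follow the SORTED order.
# These label texts are illustrative, not taken from the tutorial.
barplot(bducksex,
        main = "Barplot of sex obtained from samples",
        xlab = "Sex", ylab = "Frequency",
        names.arg = c("Unidentified", "Female", "Male"))
```

sort(bducksex) would give the same sorted table in one call; the order() form is handy when you need to reorder something else (such as a vector of labels) in exactly the same way.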
Here, before the barplot command is used, the bducksex vector is sorted using the “order()” function.
Another argument, “names.arg=”, lets you edit the names shown under the bars on the x axis. The resulting graph is:
I added another argument called “horiz=”. T stands for TRUE, which draws the bars horizontally. If it is F or FALSE (the default), the bars are vertical, as we got before.
Please note the change in the second line of the code, where I used the rev() function to reverse the order of the bducksex vector.
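A sketch of a horizontal version, assuming the same counts as before. Note that barplot() does not swap xlab and ylab for you, so in this sketch the frequency label is moved to the x axis by hand:

```r
bducksex <- as.table(c(F = 21, M = 48, U = 11))
bducksex <- bducksex[order(bducksex)]  # increasing: U, F, M

# horiz = TRUE (T for short) lays the bars on their side.
# barplot() does not swap xlab/ylab, so the counts now belong on the x axis.
mids <- barplot(bducksex, horiz = TRUE,
                main = "Barplot of sex obtained from samples",
                xlab = "Frequency", ylab = "Sex")
```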
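A small sketch of what rev() does here, assuming the sorted counts from the earlier steps. In a horizontal bar chart the first element is drawn at the bottom, so reversing the sorted vector moves the largest bar (males) to the bottom:

```r
bducksex <- as.table(c(F = 21, M = 48, U = 11))
bducksex <- bducksex[order(bducksex)]  # increasing: U, F, M
bducksex <- rev(bducksex)              # reversed:   M, F, U

# With horiz = TRUE the first element is drawn at the bottom,
# so after rev() the largest count (males) sits at the bottom
barplot(bducksex, horiz = TRUE,
        main = "Barplot of sex obtained from samples",
        xlab = "Frequency", ylab = "Sex")
```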
Excellent 🙂 Now, what if we would like to know the proportion of each sex in each location? In that case, we have to modify the bducksex vector. Since it will hold information from more than one variable, the data can no longer be a vector; instead, it will be a matrix (a two-way table). To count each sex in each location, use the “table()” function with two factor vectors.
bducksex <- table(fish$sex, fish$location)
bducksex
    Calcutta Cochin Madras Mumbai
  F        5      4      5      7
  M        7     16     14     11
  U        8      0      1      2
In the above code, we passed two factors from the original data set to the table() function to create a matrix showing how many males, females and unidentified specimens came from each location. To plot them, just use barplot() as we practised before.
The total number of samples from each location was fixed (20), hence all the bars have the same height. However, the proportion of each sex differs between locations. But here it is not clear which colour represents which sex, so we need a legend. To add a legend, use the “legend=” argument in the barplot() function (R accepts it as a shortened form of the full argument name, “legend.text=”).
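To follow along without the CSV, this sketch types the two-way table printed above into a matrix (an assumption of convenience; table() on the real data gives these same counts) and draws the default stacked bar chart. The title is illustrative.

```r
# Two-way table printed above: rows = sex, columns = location
bducksex <- matrix(c(5,  4,  5,  7,
                     7, 16, 14, 11,
                     8,  0,  1,  2),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("F", "M", "U"),
                                   c("Calcutta", "Cochin", "Madras", "Mumbai")))

print(colSums(bducksex))  # 20 specimens per location

# Each column (location) becomes one bar, stacked from the rows (sexes)
mids <- barplot(bducksex, main = "Sex of Bombay duck by location",
                xlab = "Location", ylab = "Frequency")
```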
In the above code, I added the “legend=” argument and asked the function to use the row names of the bducksex matrix. The rownames() function reads the names in the row header of the matrix. A similar function, colnames(), reads the column names, but it will not work here. Each bar is stacked from the rows (the three sexes), while the columns are the four locations, which are already printed below the bars. So the number of column names (4) does not match the number of stacked segments (3), and the names would not correspond to the colours.
In the above graph, the legend was printed, but it was laid over the bars. Some extra code can place the legend outside the plot area; we will cover this in a later chapter, as it is beyond the scope of the present tutorial.
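A minimal sketch of the legend step, using the matrix typed in from the printed table. It spells out the full argument name legend.text; the shorter “legend=” reaches the same argument:

```r
bducksex <- matrix(c(5,  4,  5,  7,
                     7, 16, 14, 11,
                     8,  0,  1,  2),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("F", "M", "U"),
                                   c("Calcutta", "Cochin", "Madras", "Mumbai")))

# The stacked segments are the rows, so the legend uses the row names
mids <- barplot(bducksex, main = "Sex of Bombay duck by location",
                xlab = "Location", ylab = "Frequency",
                legend.text = rownames(bducksex))
```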
If you do not intend to publish the graphs in a printed journal, colour bar charts will probably look more attractive in PDFs. To add a colour for each sex, try the following code.
In the above code, I added an extra argument, “col=”, which sets the colours. We have three sex categories, so three colours were specified in double quotes. The colours were given to the argument as a vector using the c() function, which was covered in our earlier tutorials.
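A sketch of the colour step with the same matrix. The three colour names are hypothetical choices; any names listed by colors() would do. The colours are matched to the rows in order (F, M, U), and the legend picks up the same colours automatically.

```r
bducksex <- matrix(c(5,  4,  5,  7,
                     7, 16, 14, 11,
                     8,  0,  1,  2),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("F", "M", "U"),
                                   c("Calcutta", "Cochin", "Madras", "Mumbai")))

# One colour per row (sex), in row order: F, M, U (illustrative choices)
cols <- c("pink", "lightblue", "grey")

barplot(bducksex, main = "Sex of Bombay duck by location",
        xlab = "Location", ylab = "Frequency",
        col = cols, legend.text = rownames(bducksex))
```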
Wow… looks fancy. But it would be even better if the bars for each sex were placed side by side within each location. Then it is easier to see which sex is most common in each location. Try the following code:
In the above code, I added another argument, “beside=”, and set it to T (TRUE). By default, this value is FALSE, which stacks the bars on top of each other. Setting it to TRUE places the bars of each group side by side.
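A sketch of the grouped version with the same matrix and the same hypothetical colours. With beside = TRUE, barplot() returns one midpoint per bar arranged like the input matrix (3 sexes by 4 locations):

```r
bducksex <- matrix(c(5,  4,  5,  7,
                     7, 16, 14, 11,
                     8,  0,  1,  2),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("F", "M", "U"),
                                   c("Calcutta", "Cochin", "Madras", "Mumbai")))

# beside = TRUE draws the three sex bars next to each other per location
mids <- barplot(bducksex, beside = TRUE,
                main = "Sex of Bombay duck by location",
                xlab = "Location", ylab = "Frequency",
                col = c("pink", "lightblue", "grey"),
                legend.text = rownames(bducksex))

print(dim(mids))  # one bar position per cell of the matrix
```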
The legend is clear in this picture thanks to the smaller number of males and unidentified specimens from Mumbai. All locations except Calcutta have a much higher proportion of males in the collected samples; in Calcutta, the unidentified specimens (8) slightly outnumber the males (7).
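To check the statement above with numbers rather than by eye, a small sketch using prop.table() (a base R function not used elsewhere in this tutorial) turns each location's counts into proportions and picks the most frequent sex per location, again assuming the matrix typed in from the printed table:

```r
bducksex <- matrix(c(5,  4,  5,  7,
                     7, 16, 14, 11,
                     8,  0,  1,  2),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("F", "M", "U"),
                                   c("Calcutta", "Cochin", "Madras", "Mumbai")))

# margin = 2: divide each column (location) by its own total
prop <- prop.table(bducksex, margin = 2)
print(round(prop, 2))

# Row name of the largest count in each column
most_common <- rownames(bducksex)[apply(bducksex, 2, which.max)]
print(most_common)
```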
You must have understood by now that learning R is a step-by-step process. We started from a very simple one-line code. As we made the graph prettier, the code grew longer because we added more arguments. In the same way, your skill develops with time; it won't happen all at once. Start learning R with simple graphs, and you will grow as you learn. If you liked this tutorial, please give feedback and like the post. If you would like to be notified of my future posts, subscribe to this blog using your email address. Thanks for your patience. Cheers !!!