The freq.class function of TFSA


1. Download, install and open R.

2. Install the R packages lattice and TFSA (R > Packages > Install packages from local zip files). This is required only once. If a package is installed successfully, you will get a message such as “package ‘TFSA’ successfully unpacked and MD5 sums checked”.

3. Load both packages using the library function.
Example: library(lattice); library(TFSA)

The freq.class function

The freq.class function builds a frequency distribution table from a vector of data by grouping the values into class intervals. The frequency of a class interval is the number of observations that fall within that predefined interval. For example, if 20 fish in our sample are aged between 3 and 5, the frequency for the 3–5 interval is 20. The endpoints of a class interval are the lowest and highest values that a variable can take within that interval. The class interval width is the difference between the lower endpoint of an interval and the lower endpoint of the next interval.
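To make these definitions concrete, here is a minimal base R sketch that does not use TFSA at all. The ages are made up for illustration, and the convention that each interval includes its lower endpoint is an assumption on my part, not something confirmed for freq.class.

```r
# Hypothetical ages for 8 fish (illustrative values, not from TFSA)
age <- c(1, 2, 3, 3, 4, 5, 6, 7)

# Class intervals with endpoints 1, 3, 5, 7, so each class has a width of 2.
# right = FALSE closes each interval on the left: [1,3), [3,5), [5,7)
# include.lowest = TRUE also closes the last interval on the right: [5,7]
breaks <- c(1, 3, 5, 7)
classes <- cut(age, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Frequency = number of observations falling in each class interval
freq <- table(classes)
print(freq)

# Class width = difference between consecutive lower endpoints
width <- diff(breaks)
print(width)
```

Here the 3–5 class holds three fish (ages 3, 3 and 4), so its frequency is 3.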

Mandatory arguments (inputs):

'x'- A vector of data

Arguments for specifying class intervals (all three must be given together):

'll'- Lowest endpoint of class intervals

'ul'- Uppermost endpoint of class intervals

'cl'- Preferred width of the class interval.

The above arguments work only if all three of them are used together. If they are not specified, the function categorizes the data into 10 classes.
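The rule above can be sketched in base R as follows. The helper make_endpoints() is hypothetical and is not part of TFSA. The fallback to 10 equal-width classes over the data range is only my assumption about a reasonable default, so the endpoints freq.class chooses may differ.

```r
# Hypothetical helper (not part of TFSA): derive class interval endpoints
make_endpoints <- function(x, ll = NULL, ul = NULL, cl = NULL, n_default = 10) {
  supplied <- c(!is.null(ll), !is.null(ul), !is.null(cl))
  if (all(supplied)) {
    # All three given: endpoints run from ll to ul in steps of cl
    return(seq(from = ll, to = ul, by = cl))
  }
  if (any(supplied)) {
    warning("ll, ul and cl must be given together; using default classes")
  }
  # Assumed fallback: n_default equal-width classes spanning the data range
  seq(from = min(x), to = max(x), length.out = n_default + 1)
}

# Illustrative values only
x <- c(127, 150, 180, 210, 250, 299)
custom_endpoints  <- make_endpoints(x, ll = 125, ul = 300, cl = 7)
default_endpoints <- make_endpoints(x)

length(custom_endpoints) - 1   # number of custom classes
length(default_endpoints) - 1  # number of default classes
```

With ll = 125, ul = 300 and cl = 7 the range divides evenly, giving 25 classes.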

Optional arguments (inputs):

'groups'- A grouping variable (a factor or categorical variable), used if the data belong to more than one group. Default is NA

'scales'- If TRUE, the bar graph for each group is plotted on its own scale, independent of the sizes of the other groups. Default is FALSE

'density.plot'- If TRUE, a density graph is plotted instead of bar graphs. Default is FALSE

Protocol for using the freq.class function

1. Prepare your data with one variable per column (qualitative and quantitative).

2. Import your data into R using the function read.table(). Alternatively, for learning purposes, you can use the ‘fishdata’ data set that comes with the TFSA package. It has 506 observations of fish length.

> data(fishdata)
> fishdata
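If you are importing your own file instead, a minimal sketch with read.table() could look like this. The file contents are made up, and the column names Stock and SL are simply borrowed from the ‘fishgrp’ example used later in this post. The example writes a temporary file first so that it runs on its own.

```r
# Write a small hypothetical data file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("Stock SL", "Mumbai 228", "Mumbai 264", "Orissa 170"), path)

# header = TRUE: the first row holds the column names
# stringsAsFactors = TRUE: text columns become factors (useful for grouping)
mydata <- read.table(path, header = TRUE, stringsAsFactors = TRUE)

# Check that each column has the expected type
str(mydata)
```

For a real file, replace path with the location of your own data.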

3. Find the lowest and highest values of the variable using the functions min() and max().

> max(fishdata)
[1] 299
> min(fishdata)
[1] 127

4. Decide on the width of the class interval.

When deciding on the width of the class intervals, you need a compromise: the intervals should be short enough that the observations do not all fall into the same interval, but long enough that you do not end up with only one observation per interval.
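If you want a starting point for this compromise, base R ships two common rules of thumb for the number of classes: nclass.Sturges() and nclass.FD(). The sketch below converts their suggestions into an approximate width. The lengths are simulated and merely stand in for real data, and neither rule is claimed to be what freq.class uses.

```r
set.seed(1)
# Simulated lengths standing in for real data (not the TFSA fishdata)
len <- round(rnorm(506, mean = 200, sd = 30))

# Suggested number of classes from two rules shipped with R
k_sturges <- nclass.Sturges(len)  # depends on the sample size only
k_fd      <- nclass.FD(len)       # Freedman-Diaconis, uses the interquartile range

# Convert each class count into an approximate class width
data_range    <- diff(range(len))
width_sturges <- data_range / k_sturges
width_fd      <- data_range / k_fd

print(c(sturges = width_sturges, fd = width_fd))
```

Treat the result as a suggestion only, then round it to a convenient width for your data.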

Usage:

1. To get a quick result, use freq.class() with the data vector. This will categorize data into 10 classes.

> freq.class(fishdata)

 Frequency Distribution Table 

   Lower Upper Midclass Frequency
1    127   144    135.5         1
2    144   161    152.5        18
3    161   178    169.5       104
4    178   195    186.5       137
5    195   212    203.5        63
6    212   229    220.5        68
7    229   246    237.5        57
8    246   263    254.5        45
9    263   280    271.5         8
10   280   297    288.5         4

2. To customize the class intervals, use the arguments ‘ll=’, ‘ul=’ and ‘cl=’. The fish data range from 127 to 299, so let’s set the lowest endpoint to 125 and the uppermost endpoint to 300, with a class interval width of 7.

> freq.class(fishdata,ll=125,ul=300,cl=7)

 Frequency Distribution Table 

   Lower Upper Midclass Frequency
1    125   132    128.5         1
2    132   139    135.5         0
3    139   146    142.5         0
4    146   153    149.5         5
5    153   160    156.5         9
6    160   167    163.5        23
7    167   174    170.5        48
8    174   181    177.5        61
9    181   188    184.5        76
10   188   195    191.5        37
11   195   202    198.5        22
12   202   209    205.5        30
13   209   216    212.5        25
14   216   223    219.5        31
15   223   230    226.5        27
16   230   237    233.5        22
17   237   244    240.5        25
18   244   251    247.5        26
19   251   258    254.5        21
20   258   265    261.5         8
21   265   272    268.5         2
22   272   279    275.5         2
23   279   286    282.5         3
24   286   293    289.5         1
25   293   300    296.5         1
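For readers who want to see how a table with this layout can be assembled, here is a base R sketch that builds the same four columns. It uses simulated lengths, not ‘fishdata’, so the frequencies will not match the output above. Treating each interval as including its lower endpoint is again my assumption.

```r
set.seed(1)
# Simulated lengths standing in for real data (not the TFSA fishdata)
len <- round(runif(200, min = 127, max = 299))

ll <- 125; ul <- 300; cl <- 7
endpoints <- seq(ll, ul, by = cl)

# Count observations per class; intervals are [lower, upper) by assumption,
# with the last interval also including its upper endpoint
counts <- table(cut(len, breaks = endpoints, right = FALSE, include.lowest = TRUE))

# Assemble the table: one row per class interval
freq_table <- data.frame(
  Lower     = head(endpoints, -1),
  Upper     = endpoints[-1],
  Midclass  = head(endpoints, -1) + cl / 2,
  Frequency = as.vector(counts)
)
print(freq_table)
```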

3. If the data belong to different groups, this can be specified with the argument ‘groups=’.

The TFSA package comes with a second example data set called ‘fishgrp’. It contains fish lengths measured at different commercial landing centres in India. The data have two columns: one is a factor variable (landing centre) and the other is a quantitative variable (fish length).

If you are using your own data, make sure it is formatted in the same way, with one factor column and one numeric column.

> data(fishgrp)

> str(fishgrp)
'data.frame':   506 obs. of  2 variables:
 $ Stock: Factor w/ 4 levels "Calcutta","Gujarat",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ SL   : int  228 264 244 209 220 202 242 256 249 170 ...

> freq.class(fishgrp$SL,groups=fishgrp$Stock)

 Frequency Distribution Table 

   Lower Upper Midclass Calcutta Gujarat Mumbai Orissa
1    127   144    135.5        0       0      0      1
2    144   161    152.5        1       4      1     12
3    161   178    169.5       36      25      1     42
4    178   195    186.5       81      33      3     20
5    195   212    203.5       15      21     14     13
6    212   229    220.5        0      11     33     24
7    229   246    237.5        0       7     32     18
8    246   263    254.5        0      11     25      9
9    263   280    271.5        0       1      5      2
10   280   297    288.5        0       0      4      0
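A grouped table like this is essentially a cross-tabulation of class interval against group. The base R sketch below shows the idea on simulated data. The data frame only mimics the structure of ‘fishgrp’ (columns Stock and SL), and the choice of 10 equal-width classes over the observed range is an assumption.

```r
set.seed(1)
# Simulated stand-in for fishgrp; the values are invented
grp <- data.frame(
  Stock = factor(rep(c("Calcutta", "Gujarat", "Mumbai", "Orissa"), each = 50)),
  SL    = round(c(rnorm(50, 180, 10), rnorm(50, 195, 25),
                  rnorm(50, 230, 20), rnorm(50, 200, 30)))
)

# Ten equal-width classes spanning the observed range
endpoints <- seq(min(grp$SL), max(grp$SL), length.out = 11)
classes <- cut(grp$SL, breaks = endpoints, include.lowest = TRUE)

# Rows = class intervals, columns = groups
freq_by_group <- table(Class = classes, Stock = grp$Stock)
print(freq_by_group)
```

Each column sums to the number of observations in that group, which is a quick check that no value fell outside the intervals.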

4. In the graph produced by the above command, Calcutta has a frequency of 81 in one class interval. Because all groups share the same scale, this large value makes the shape of the frequency distributions for the other locations hard to see. The argument ‘scales=TRUE’ makes the graphs independent of each other, which gives a clearer picture.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,scales=TRUE)
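To see the effect of independent scales outside of TFSA, here is a sketch using the lattice package directly. I am assuming that ‘scales=TRUE’ behaves roughly like lattice's relation = "free" setting. That is a guess at the idea, not a description of how freq.class is implemented. The data are simulated, with one deliberately large group.

```r
library(lattice)

set.seed(1)
# Simulated stand-in for fishgrp; Calcutta is made much larger on purpose
grp <- data.frame(
  Stock = factor(rep(c("Calcutta", "Gujarat", "Mumbai", "Orissa"),
                     times = c(130, 40, 40, 40))),
  SL    = round(c(rnorm(130, 185, 8), rnorm(40, 195, 25),
                  rnorm(40, 235, 20), rnorm(40, 200, 30)))
)

# Shared y-axis (the lattice default): the largest group dominates every panel
p_same <- histogram(~ SL | Stock, data = grp, type = "count")

# Independent y-axis per panel: each group's shape becomes visible
p_free <- histogram(~ SL | Stock, data = grp, type = "count",
                    scales = list(y = list(relation = "free")))

# Lattice plots are objects; print() draws them
print(p_free)
```

Printing p_same instead shows the shared-scale version for comparison.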

5. If you are interested in a density plot instead of a bar plot, use the argument ‘density.plot=TRUE’.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,density.plot=TRUE)
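A comparable grouped density plot can be drawn with lattice's densityplot(). This is only a sketch of the idea on simulated data, and I have not confirmed that freq.class produces its density graph this way.

```r
library(lattice)

set.seed(1)
# Simulated stand-in for fishgrp; the values are invented
grp <- data.frame(
  Stock = factor(rep(c("Calcutta", "Gujarat", "Mumbai", "Orissa"), each = 50)),
  SL    = round(c(rnorm(50, 180, 10), rnorm(50, 195, 25),
                  rnorm(50, 230, 20), rnorm(50, 200, 30)))
)

# One kernel density panel per group; plot.points = FALSE hides the
# individual observations that are otherwise drawn along the x-axis
p_dens <- densityplot(~ SL | Stock, data = grp, plot.points = FALSE)

# Lattice plots are objects; print() draws them
print(p_dens)
```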

Please leave a comment or email if you find a bug. Thanks for reading 🙂


About Deepak George Pazhayamadom

I'm a fish biologist and a mathematical modeller. I have a wide range of research interests, mostly centered on fisheries resource management.

Posted on June 24, 2012, in TFSA.
