Category Archives: TFSA

The descriptive function in TFSA


The new tool in TFSA version 1.1 is the descriptive function. It is designed for exploring data quickly and with little R programming. However, the function assumes that the data are ‘continuous’, not ‘discrete’. The function can do three important things:

1. The function returns the following values in the form of a data frame:

  • Total number of observations
  • Mean
  • Median
  • Minimum
  • Maximum
  • Variance
  • Standard deviation
  • Range

2. The function can handle grouped data, for example when the observations come from different locations or are classified by sex.

3. The descriptive information can be visualized as a histogram, box plot or dot plot.

Some examples are provided below using the fishgrp data that comes with the TFSA package. (If you want to use your own data, import it into R first.) If R is not yet installed, see my older post on installing R.

First tell R to load the TFSA package.

library(TFSA)

Now load the fishgrp data from TFSA

data(fishgrp)

Now view the structure of fishgrp data

str(fishgrp)

'data.frame':   506 obs. of  3 variables:
 $ Stock: Factor w/ 4 levels "Calcutta","Gujarat",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Sex  : Factor w/ 2 levels "F","M": 2 2 1 1 1 1 1 2 1 2 ...
 $ SL   : int  228 264 244 209 220 202 242 256 249 170 ...

This simply tells us that the data contain two attributes, or factors (Stock and Sex), and one column of numerical observations (SL, standard length). To view the whole data set, type fishgrp and press the ENTER key. Now we will use the descriptive function in TFSA to see the descriptive statistics of standard length (SL).

descriptive(fishgrp$SL)

    N     Mean Median Minimum Maximum Variance  Std dev Range
1 506 202.0435    193     127     299 926.9684 30.44616   172

The ‘$’ symbol is used in R to choose a variable (SL) from a data set (fishgrp). The result shows a lot of information: the total number of observations is 506, the mean is 202.04, and so on. For more details on the results, type ?descriptive and press the ENTER key.
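For comparison, the same columns can be computed with base R. This is a minimal sketch, not TFSA's own code: describe_sketch is a hypothetical name, and it assumes the variance is the sample variance returned by var(). The input is only the first ten SL values printed by str() above, so the numbers differ from the full table.

```r
# Base R sketch of the summary columns (describe_sketch is hypothetical, not part of TFSA)
describe_sketch <- function(x) {
  data.frame(
    N = length(x),
    Mean = mean(x),
    Median = median(x),
    Minimum = min(x),
    Maximum = max(x),
    Variance = var(x),         # sample variance (n - 1 denominator); an assumption
    `Std dev` = sd(x),
    Range = max(x) - min(x),   # a single number, as in the table above
    check.names = FALSE        # keep the space in "Std dev"
  )
}

# First ten SL values as printed by str(fishgrp); not the full data set
sl <- c(228, 264, 244, 209, 220, 202, 242, 256, 249, 170)
res <- describe_sketch(sl)
print(res)
```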

To visualize the histogram, use the argument Histogram=TRUE

descriptive(fishgrp$SL,Histogram=TRUE)

To visualize the box plot, use the argument Boxplot=TRUE

descriptive(fishgrp$SL,Boxplot=TRUE)

To visualize the dot plot, use the argument Dotplot=TRUE

descriptive(fishgrp$SL,Dotplot=TRUE)
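If TFSA is not at hand, the three plot types have rough counterparts in base R. The sketch below is only an illustration under that assumption; it does not reproduce TFSA's own graphics, and stripchart() is just one possible way to draw a dot plot.

```r
# Base R counterparts of the three plot types (not TFSA's plotting code)
# First ten SL values as printed by str(fishgrp)
sl <- c(228, 264, 244, 209, 220, 202, 242, 256, 249, 170)

# Histogram; the returned object holds the class breaks and counts
h <- hist(sl, main = "Histogram", xlab = "Standard length")

# Box plot of the same values
boxplot(sl, main = "Box plot", ylab = "Standard length")

# Dot plot; method = "stack" piles up identical values
stripchart(sl, method = "stack", pch = 16,
           main = "Dot plot", xlab = "Standard length")
```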

To obtain descriptive statistics for each category, use the argument groups=. For example, the fishgrp data have observations for 4 different fish stocks: Calcutta, Gujarat, Mumbai and Orissa.

descriptive(fishgrp$SL,groups=fishgrp$Stock,Boxplot=TRUE)

           N     Mean Median Minimum Maximum  Variance   Std dev Range
Calcutta 133 183.3459    183     160     211  95.31887  9.763138    51
Gujarat  113 197.9115    192     148     274 775.63496 27.850224   126
Mumbai   119 232.9832    234     157     299 534.62683 23.121999   142
Orissa   141 196.8794    187     127     266 996.27822 31.563875   139
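The grouped table can be approximated in base R by splitting the vector by the factor and summarising each piece. This is a sketch under assumptions: summarise_one is a hypothetical helper, the data are made up for illustration, and TFSA may compute the table differently.

```r
# Base R sketch of per-group summaries (summarise_one is hypothetical, not TFSA code)
summarise_one <- function(x) {
  c(N = length(x), Mean = mean(x), Median = median(x),
    Minimum = min(x), Maximum = max(x),
    Variance = var(x), `Std dev` = sd(x), Range = max(x) - min(x))
}

# Made-up data for illustration only
toy <- data.frame(
  Stock = factor(c("Calcutta", "Calcutta", "Calcutta", "Mumbai", "Mumbai", "Mumbai")),
  SL    = c(180, 185, 190, 230, 240, 235)
)

# split() gives one vector per stock, sapply() summarises each,
# and t() puts the groups in rows as in the table above
by_stock <- t(sapply(split(toy$SL, toy$Stock), summarise_one))
print(by_stock)
```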

To obtain descriptive statistics of sub-categories within groups, use the argument division=. For example, the fishgrp data have observations for males and females in each stock.

descriptive(fishgrp$SL,groups=fishgrp$Stock,division=fishgrp$Sex,Boxplot=TRUE)

               N     Mean Median Minimum Maximum   Variance   Std dev Range
F in Calcutta 59 185.0847  184.0     167     207   88.66511  9.416215    40
M in Calcutta 74 181.9595  181.0     160     211   97.51888  9.875165    51
F in Gujarat  52 194.3077  185.5     148     259  788.64857 28.082887   111
M in Gujarat  61 200.9836  198.0     155     274  756.64973 27.507267   119
F in Mumbai   65 232.8000  234.0     187     299  460.72500 21.464506   112
M in Mumbai   54 233.2037  233.0     157     287  633.86338 25.176644   130
F in Orissa   62 198.8387  196.5     127     266 1218.07192 34.900887   139
M in Orissa   79 195.3418  184.0     152     251  830.15093 28.812340    99
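The row labels in this table ("F in Calcutta" and so on) suggest one label per sex-within-stock combination. A base R sketch of that idea, using made-up data and assuming nothing about how TFSA builds the labels internally:

```r
# Made-up data for illustration only
toy <- data.frame(
  Stock = factor(rep(c("Calcutta", "Mumbai"), each = 4)),
  Sex   = factor(rep(c("F", "M"), times = 4)),
  SL    = c(184, 181, 186, 183, 234, 233, 236, 231)
)

# One label per combination, e.g. "F in Calcutta"
cell <- paste(toy$Sex, "in", toy$Stock)

# Summaries per combination; mean and count shown here
cell_means <- tapply(toy$SL, cell, mean)
cell_n <- table(cell)
print(cell_means)
print(cell_n)
```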

The same information can be viewed using a histogram or dot plot by simply changing the argument. But use only one at a time.

Thanks for reading 🙂

The freq.class function of TFSA


1. Download, Install and Open R

2. Install the R packages: lattice and TFSA (R>Packages> Install packages from local zip files). This is required only once. If the package is successfully installed, you will get the message “package ‘TFSA’ successfully unpacked and MD5 sums checked”.

3. Load both packages using the library function.
Example:  library (lattice); library(TFSA)

The freq.class function

The freq.class function builds a frequency distribution table from a vector of data by grouping the values into class intervals. The frequency of a class interval is the number of observations that fall within that predefined interval. For example, if 20 fish samples in our data are aged between 3 and 5, the frequency for the 3–5 interval is 20. The endpoints of a class interval are the lowest and highest values that a variable can take within that interval. The class interval width is the difference between the lower endpoint of an interval and the lower endpoint of the next interval.
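The idea of a frequency table can be sketched in base R with cut() and table(). This is not TFSA's implementation. The ages are made up, and the sketch assumes each class includes its lower endpoint and excludes its upper one; the post does not say how freq.class treats values that fall exactly on an endpoint.

```r
# Base R sketch of a frequency table from class intervals (not TFSA's code)
ages <- c(1, 2, 3, 3, 4, 4, 4, 5, 6, 7)   # made-up ages for illustration
breaks <- seq(1, 9, by = 2)                # endpoints 1, 3, 5, 7, 9; class width 2

# right = FALSE makes each class include its lower endpoint: [1,3), [3,5), ...
classes <- cut(ages, breaks = breaks, right = FALSE)
freq <- as.vector(table(classes))

lower <- head(breaks, -1)
upper <- tail(breaks, -1)
freq_table <- data.frame(
  Lower = lower,
  Upper = upper,
  Midclass = (lower + upper) / 2,
  Frequency = freq
)
print(freq_table)
```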

Mandatory arguments (inputs):

'x'- A vector of data

Arguments for specifying class intervals (inputs, to be used together):

'll'- Lowest endpoint of class intervals

'ul'- Uppermost endpoint of class intervals

'cl'- Preferred width of the class interval.

The above arguments work only if all three of them are used together. Unless they are specified, the function categorizes the data into 10 classes.

Optional arguments (inputs):

'groups'- A factor (categorical variable) identifying the group of each observation, if the data belong to more than one group. Default is NA

'scales'- If TRUE, the bar graph for each group is plotted on its own scale, independent of the sizes of the other groups. Default is FALSE

'density.plot'- If TRUE, density graph will be plotted instead of bar graphs. Default is FALSE

Protocol to use freq.class function

1. Prepare data with variables in each column (qualitative and quantitative)

2. Import your data into R using the function read.table() or, for learning purposes, use the ‘fishdata’ that comes with the TFSA package. This data set has 506 observations of fish length.

> data(fishdata)
> fishdata

3. Find the lowest and highest values of the variable using the functions max() and min().

> max(fishdata)
[1] 299
> min(fishdata)
[1] 127

4. Decide on the width of the class interval.

In deciding on the width of the class intervals, you will have to find a compromise between having intervals short enough so that not all of the observations fall in the same interval, but long enough so that you do not end up with only one observation per interval.
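The protocol steps above (import, find the range, choose a width) can be tried without TFSA. The sketch below writes a small made-up file so that it is self-contained; the file contents, the choice of 5 classes and the use of ceiling() to round the width are all illustrative assumptions.

```r
# Write a small made-up data set to a temporary file so the sketch is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("Stock SL",
             "Calcutta 183",
             "Gujarat 192",
             "Mumbai 234",
             "Orissa 187"), tmp)

# header = TRUE reads the first line as column names;
# stringsAsFactors = TRUE turns the text column into a factor
mydata <- read.table(tmp, header = TRUE, stringsAsFactors = TRUE)
str(mydata)

# Lowest and highest values of the quantitative variable
lo <- min(mydata$SL)
hi <- max(mydata$SL)

# A candidate class width for a chosen number of classes (illustrative choice)
n_classes <- 5
width <- ceiling((hi - lo) / n_classes)
print(c(lowest = lo, highest = hi, width = width))
```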

Usage:

1. To get a quick result, use freq.class() with the data vector. This will categorize data into 10 classes.

> freq.class(fishdata)

 Frequency Distribution Table 

   Lower Upper Midclass Frequency
1    127   144    135.5         1
2    144   161    152.5        18
3    161   178    169.5       104
4    178   195    186.5       137
5    195   212    203.5        63
6    212   229    220.5        68
7    229   246    237.5        57
8    246   263    254.5        45
9    263   280    271.5         8
10   280   297    288.5         4

2. To customize the class intervals, use the arguments ‘ll=’, ‘ul=’ and ‘cl=’. The fish data range from 127 to 299, so let’s set the lowest endpoint to 125 and the highest endpoint to 300, with a class interval width of 7.

> freq.class(fishdata,ll=125,ul=300,cl=7)

 Frequency Distribution Table 

   Lower Upper Midclass Frequency
1    125   132    128.5         1
2    132   139    135.5         0
3    139   146    142.5         0
4    146   153    149.5         5
5    153   160    156.5         9
6    160   167    163.5        23
7    167   174    170.5        48
8    174   181    177.5        61
9    181   188    184.5        76
10   188   195    191.5        37
11   195   202    198.5        22
12   202   209    205.5        30
13   209   216    212.5        25
14   216   223    219.5        31
15   223   230    226.5        27
16   230   237    233.5        22
17   237   244    240.5        25
18   244   251    247.5        26
19   251   258    254.5        21
20   258   265    261.5         8
21   265   272    268.5         2
22   272   279    275.5         2
23   279   286    282.5         3
24   286   293    289.5         1
25   293   300    296.5         1
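Before choosing ‘ll’, ‘ul’ and ‘cl’, it can help to check with plain arithmetic that the classes cover every observation and end exactly on the upper endpoint. The sketch below uses only numbers from this post. It also rebuilds the endpoints of the default table shown earlier (10 classes of width 17 starting at 127), which stop at 297 although the largest observation is 299.

```r
# Values taken from the example above
ll <- 125; ul <- 300; cl <- 7
data_min <- 127; data_max <- 299      # from min() and max()

# Class endpoints and number of classes
breaks <- seq(ll, ul, by = cl)
n_classes <- length(breaks) - 1

# The classes should span the data and finish exactly at ul
covers_data <- ll <= data_min && ul >= data_max
ends_on_ul <- (ul - ll) %% cl == 0    # 175 / 7 leaves no remainder

# Endpoints of the default table shown earlier: width 17, starting at 127
default_breaks <- seq(127, by = 17, length.out = 11)
default_covers <- max(default_breaks) >= data_max

print(c(n_classes = n_classes, covers_data = covers_data,
        ends_on_ul = ends_on_ul, default_covers = default_covers))
```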

3. If the data belong to different groups, this can be specified with the argument ‘groups=’.

The TFSA package comes with a second example data set called ‘fishgrp’. It contains fish lengths measured at different commercial landing centres in India. The data have two columns: one is a factor (landing centre) and the other is a quantitative variable (fish length).

If you are using your own data, make sure it is formatted in the same way, with one variable in each column.

> data(fishgrp)

> str(fishgrp)
'data.frame':   506 obs. of  2 variables:
 $ Stock: Factor w/ 4 levels "Calcutta","Gujarat",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ SL   : int  228 264 244 209 220 202 242 256 249 170 ...

> freq.class(fishgrp$SL,groups=fishgrp$Stock)

 Frequency Distribution Table 

   Lower Upper Midclass Calcutta Gujarat Mumbai Orissa
1    127   144    135.5        0       0      0      1
2    144   161    152.5        1       4      1     12
3    161   178    169.5       36      25      1     42
4    178   195    186.5       81      33      3     20
5    195   212    203.5       15      21     14     13
6    212   229    220.5        0      11     33     24
7    229   246    237.5        0       7     32     18
8    246   263    254.5        0      11     25      9
9    263   280    271.5        0       1      5      2
10   280   297    288.5        0       0      4      0

4. In the graph above, Calcutta has a frequency of 81 in one class (178–195). Because of this peak, the shape of the frequency distributions for the other locations is hard to see. The argument ‘scales=TRUE’ makes the graphs independent of each other, which gives a clearer picture.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,scales=TRUE)
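The effect of independent scales can be seen directly in lattice, which this post loads alongside TFSA. The sketch below assumes that ‘scales=TRUE’ corresponds to lattice's "free" axis relation; the post does not say how freq.class implements it, and the data here are simulated for illustration.

```r
library(lattice)

# Simulated data: one large, narrow group and one small, wide group
set.seed(1)
toy <- data.frame(
  Stock = factor(rep(c("Calcutta", "Orissa"), times = c(80, 20))),
  SL    = c(rnorm(80, mean = 183, sd = 10), rnorm(20, mean = 197, sd = 30))
)

# Same y-axis in every panel (the lattice default)
p_same <- histogram(~ SL | Stock, data = toy, type = "count")

# Independent y-axis per panel, so the small group is not flattened
p_free <- histogram(~ SL | Stock, data = toy, type = "count",
                    scales = list(y = list(relation = "free")))

print(p_free)
```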

5. If you are interested in a density plot instead of a bar plot, use the argument ‘density.plot=TRUE’.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,density.plot=TRUE)
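A per-group density plot can also be sketched in base R with density(). This is an illustration on simulated data, assuming default kernel settings; it is not how TFSA draws its density graph.

```r
# Simulated data for illustration only
set.seed(1)
toy <- data.frame(
  Stock = factor(rep(c("Calcutta", "Mumbai"), each = 50)),
  SL    = c(rnorm(50, mean = 183, sd = 10), rnorm(50, mean = 233, sd = 23))
)

# One kernel density estimate per stock
dens <- lapply(split(toy$SL, toy$Stock), density)

# Common axis limits so that both curves fit on one plot
xr <- range(sapply(dens, function(d) range(d$x)))
yr <- range(sapply(dens, function(d) range(d$y)))

plot(NA, xlim = xr, ylim = yr, xlab = "Standard length", ylab = "Density")
for (i in seq_along(dens)) lines(dens[[i]], lty = i)
legend("topright", legend = names(dens), lty = seq_along(dens))
```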

Please leave a comment or email if you find a bug. Thanks for reading 🙂

The vbgf function of TFSA


1. Download, Install and Open R

2. Install the R packages: lattice and TFSA (R>Packages> Install packages from local zip files). This is required only once. If the package is successfully installed, you will get the message “package ‘TFSA’ successfully unpacked and MD5 sums checked”.

3. Load both packages using the library function.
Example:  library (lattice); library(TFSA)

The vbgf function

This function computes the length at age of a fish species. ‘vbgf’ stands for the von Bertalanffy Growth Function.

The VBGF is given by L(t) = L∞ * [1 - exp(-K * (t - t0))].

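Mandatory arguments (inputs):

‘a’ – The age, or a vector of ages, at which length is to be computed (as used in the examples below).

‘k’ – The growth coefficient for the species (K).

‘t0’ – The theoretical age of the species at length zero.

‘linfinity’ – The asymptotic length of the fish species (L∞).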

Usage:

If k=0.8, linfinity=230 and t0=0, then the length of fish at age 3 can be estimated as follows:

> vbgf(a=3,k=0.8,linfinity=230,t0=0)
[1] 209.1349
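The value above can be checked by writing the equation out in base R. vbgf_sketch is a hypothetical stand-in that follows the formula given earlier; it is not the source code of TFSA's vbgf.

```r
# The VBGF written out directly (vbgf_sketch is hypothetical, not TFSA's function)
vbgf_sketch <- function(a, k, linfinity, t0) {
  linfinity * (1 - exp(-k * (a - t0)))
}

# Length at age 3 with the parameters used above
len3 <- vbgf_sketch(a = 3, k = 0.8, linfinity = 230, t0 = 0)
print(round(len3, 4))
```

At a = t0 the formula gives a length of zero, and as age increases the length approaches linfinity without exceeding it.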

If the interest is to estimate length for a range of age groups, a vector of ages (e.g. 3,4,5,…10) can be used instead. Try the following code:

> vbgf(a=c(3:10),k=0.8,linfinity=230,t0=0)
[1] 209.1349 220.6247 225.7874 228.1072 229.1495 229.6178 229.8283 229.9228

The output is a vector of fish length with respect to ages from 3 to 10. The result can be visualised using the plot function in R.

> ages<-c(1:10)
> fishlength<-vbgf(a=ages,k=0.8,linfinity=230,t0=0)
> plot(ages,fishlength)

A line graph can be drawn using an extra argument to the plot function as follows:

> plot(ages,fishlength,type="l")
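For a smoother curve, the formula can be evaluated on a finer grid of ages, with the asymptotic length drawn as a reference line. This sketch uses the formula directly in base R rather than TFSA's vbgf; the step of 0.1 is an arbitrary choice.

```r
# Parameters from the example above
linfinity <- 230; k <- 0.8; t0 <- 0

# Finer age grid for a smooth curve (step size chosen arbitrarily)
ages_fine <- seq(0, 10, by = 0.1)
len_fine <- linfinity * (1 - exp(-k * (ages_fine - t0)))

plot(ages_fine, len_fine, type = "l", xlab = "Age", ylab = "Length")

# Dashed horizontal line at the asymptotic length
abline(h = linfinity, lty = 2)
```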

Hope this was useful. Thanks.

Developing TFSA package for R


I have been writing R functions since 2009 as part of my PhD in Fisheries Management. Since then I have wanted to gather the useful functions into an R package so that others can use them. I named it TFSA (Tropical Fish Stock Assessment) because my intention is to make the functions available to students who work with tropical fisheries, where length data are more reliable than age estimates. TFSA is at a very early stage and under development. A few reasons why I was inspired and motivated to develop TFSA are listed below.

1. There are a few R packages (FLR project, FSA, fishmethods, fsap) that are useful for carrying out and evaluating fish stock assessments in research and teaching. However, as I understand it, the utility of these packages in a tropical system (e.g. the Indian fisheries context) is limited, since age data are highly unreliable and difficult to standardise due to the lack of strong seasons. The only reliable data would be the size of the fish in terms of length and weight.

2. The number of R users has been increasing in recent years, and the change is dramatic in marine biology. Perhaps people are realizing the power of R in terms of its flexibility (through seminars or social networking) and through the growing number of R books. However, there is no R book written for fisheries research, despite the number of R packages on CRAN.

3. Learning R is not easy compared to more user-friendly statistical software such as Excel or SPSS. But fisheries biologists now seem to be showing more interest after realising its power and flexibility in modelling. However, the resources for learning R are very limited from the perspective of a biologist. I conducted a training program in India in April 2012, and all the participants (professors and researchers in biology) except one were unaware of R. The person who knew about the existence of R had not managed to make use of it for her research work.

I’m looking for people who can contribute to and help develop this package. Of course, they will be acknowledged in the package description.

The TFSA package is available for download. At the moment, TFSA has only the following functions:

1. vbgf – Computes the length of a fish species given the parameters (k, t0 and Linf). This is based on the von Bertalanffy Growth Function equation.

2. freq.class – Builds a frequency distribution table based on specified class intervals. Useful for building a length frequency table. The function can also handle groupings (e.g. length data from different months).

More ideas and suggestions are welcome. A tutorial on using these functions will be published soon.