# Category Archives: TFSA

## The descriptive function in TFSA

The new tool in TFSA version 1.1 is the *descriptive* function. It is designed to explore data quickly and with less R programming. However, the function assumes that the data are 'continuous', not 'discrete'. The function can do three important things:

1. The function returns the following values in the form of a data frame:

- Total number of observations
- Mean
- Median
- Minimum
- Maximum
- Variance
- Standard deviation
- Range

2. The function can handle categorical data, for example when the observations are measured at different locations or classified by sex.

3. The *descriptive* information can be visualized as a histogram, box plot or dot plot.

Some examples are provided below using the **fishgrp** data, which comes with the TFSA package. (If you want to use your own data, import them into R first.) First tell R to load the TFSA package.

How to install R? (Check my old post)

library(TFSA)

Now load the **fishgrp** data from TFSA

data(fishgrp)

Now view the structure of the **fishgrp** data with the str function, i.e. str(fishgrp).

This simply tells us that the data contain two attributes, or factors (Stock and Sex), and one column of numerical observations (SL, standard length). To view the whole data set, type **fishgrp** and press the ENTER key. Now we will use the *descriptive* function in TFSA to see the descriptive statistics of standard length (SL).

    descriptive(fishgrp$SL)
        N     Mean Median Minimum Maximum Variance  Std dev Range
    1 506 202.0435    193     127     299 926.9684 30.44616   172

The '$' symbol is used in R to choose a variable (SL) from a data set (fishgrp). The result shows a lot of information: the total number of observations is 506, the mean is 202.04, and so on. For more details on the results, type **?descriptive** and press the ENTER key.
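To make the eight reported values concrete, here is a minimal base-R sketch that builds the same kind of one-row data frame. It is not TFSA's code: `describe_sketch` is a hypothetical helper, the lengths are made up, and using the sample variance (n - 1 denominator) is an assumption.

```r
# Hypothetical helper, not part of TFSA: summarise one numeric vector.
describe_sketch <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values first
  data.frame(
    N        = length(x),
    Mean     = mean(x),
    Median   = median(x),
    Minimum  = min(x),
    Maximum  = max(x),
    Variance = var(x),              # assumption: sample variance (n - 1)
    Std.dev  = sd(x),               # square root of the variance
    Range    = max(x) - min(x)      # as in the output above: 299 - 127 = 172
  )
}

# Made-up lengths for illustration only (not the fishgrp data)
lengths_sketch <- c(180, 192, 175, 210, 201, 188, 230, 165)
stats <- describe_sketch(lengths_sketch)
print(stats)
```

Returning a data frame rather than printing text means individual values can be picked out afterwards, for example `stats$Mean`.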

To visualize the histogram, use the argument **Histogram=TRUE** (R is case sensitive, so type it exactly as shown)

descriptive(fishgrp$SL,Histogram=TRUE)

To visualize the Box plot, use the argument **Boxplot=TRUE**

descriptive(fishgrp$SL,Boxplot=TRUE)

To visualize the Dot plot, use the argument **Dotplot=TRUE**

descriptive(fishgrp$SL,Dotplot=TRUE)

To obtain descriptive statistics for each category, use the argument **groups=**. For example, the fishgrp data have observations for 4 different fish stocks, i.e. Calcutta, Gujarat, Mumbai and Orissa.

    descriptive(fishgrp$SL,groups=fishgrp$Stock,Boxplot=TRUE)
               N     Mean Median Minimum Maximum  Variance   Std dev Range
    Calcutta 133 183.3459    183     160     211  95.31887  9.763138    51
    Gujarat  113 197.9115    192     148     274 775.63496 27.850224   126
    Mumbai   119 232.9832    234     157     299 534.62683 23.121999   142
    Orissa   141 196.8794    187     127     266 996.27822 31.563875   139

To obtain descriptive statistics of sub-categories within groups, use the argument **division=**. For example, the fishgrp data have observations for males and females in each stock.

    descriptive(fishgrp$SL,groups=fishgrp$Stock,division=fishgrp$Sex,Boxplot=TRUE)
                   N     Mean Median Minimum Maximum   Variance   Std dev Range
    F in Calcutta 59 185.0847  184.0     167     207   88.66511  9.416215    40
    M in Calcutta 74 181.9595  181.0     160     211   97.51888  9.875165    51
    F in Gujarat  52 194.3077  185.5     148     259  788.64857 28.082887   111
    M in Gujarat  61 200.9836  198.0     155     274  756.64973 27.507267   119
    F in Mumbai   65 232.8000  234.0     187     299  460.72500 21.464506   112
    M in Mumbai   54 233.2037  233.0     157     287  633.86338 25.176644   130
    F in Orissa   62 198.8387  196.5     127     266 1218.07192 34.900887   139
    M in Orissa   79 195.3418  184.0     152     251  830.15093 28.812340    99

The same information can be viewed using a histogram or dot plot by simply changing the argument. But use only one at a time.
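For readers curious how per-group summaries like these can be produced without TFSA, the following is a rough base-R sketch using `aggregate()`. The data frame `toy` is made up (only its column names mirror fishgrp), and the code is not taken from the package.

```r
# Made-up data for illustration only; column names mirror fishgrp
toy <- data.frame(
  Stock = c("Mumbai", "Mumbai", "Mumbai", "Orissa", "Orissa", "Orissa"),
  Sex   = c("F", "M", "F", "M", "F", "M"),
  SL    = c(228, 264, 244, 170, 187, 196)
)

# Small summary applied within each group
summarise_sketch <- function(x) {
  c(N = length(x), Mean = mean(x), Range = max(x) - min(x))
}

# One row per stock (comparable to groups=)
by_stock <- aggregate(SL ~ Stock, data = toy, FUN = summarise_sketch)

# One row per stock and sex combination (comparable to adding division=)
by_stock_sex <- aggregate(SL ~ Stock + Sex, data = toy, FUN = summarise_sketch)

print(by_stock)
print(by_stock_sex)
```

Because the summary function returns several values, `aggregate()` stores them as a matrix in the `SL` column, so a single statistic is read as, for example, `by_stock$SL[, "Mean"]`.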

Thanks for reading 🙂

## The freq.class function of TFSA

1. Download, Install and Open R

2. Install the R packages: lattice and TFSA (R>Packages> Install packages from local zip files). This is required only once. If the package is successfully installed, you will get the message “package ‘TFSA’ successfully unpacked and MD5 sums checked”.

3. Load both packages using the library function.

Example: library (lattice); library(TFSA)

**The freq.class function**

The freq.class function builds a frequency distribution table from a vector of data by grouping the values into class intervals. The *frequency* of a class interval is the number of observations that fall within that predefined interval. So, for example, if 20 fish samples in our data are aged between 3 and 5, the frequency for the 3–5 interval is 20. The *endpoints* of a class interval are the lowest and highest values that a variable can take within that interval. *Class interval width* is the difference between the lower endpoint of an interval and the lower endpoint of the next interval.
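The definitions above can be sketched in base R with `cut()` and `table()`. This is only an illustration with made-up ages. Whether TFSA treats the lower or the upper endpoint as inclusive is not stated in this post, so the `right = FALSE` choice below is an assumption.

```r
# Made-up ages for illustration only
ages_sketch <- c(1, 2, 3, 3, 4, 4, 4, 5, 6, 7)

ll <- 1   # lowest endpoint
ul <- 7   # uppermost endpoint
cl <- 2   # class interval width

breaks <- seq(ll, ul, by = cl)    # endpoints 1, 3, 5, 7

# Assumption: intervals are [lower, upper); include.lowest = TRUE closes the last one
classes <- cut(ages_sketch, breaks = breaks, right = FALSE, include.lowest = TRUE)

freq_table <- data.frame(
  Lower     = head(breaks, -1),
  Upper     = tail(breaks, -1),
  Midclass  = head(breaks, -1) + cl / 2,   # midpoint of each interval
  Frequency = as.vector(table(classes))    # observations per interval
)
print(freq_table)
```

Every observation lands in exactly one interval, so the frequencies add up to the number of observations.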

**Mandatory arguments (inputs):**

‘x’ – A vector of data.

**Mandatory arguments for specifying class interval (inputs):**

‘ll’ – Lowest endpoint of the class intervals.

‘ul’ – Uppermost endpoint of the class intervals.

‘cl’ – Preferred width of the class interval.

These three arguments work only when all of them are supplied together. If they are not specified, the function groups the data into 10 classes.

**Optional arguments (inputs):**

‘groups’ – A categorical (factor) variable identifying the group of each observation, for data that belong to more than one group. Default is NA.

‘scales’ – If TRUE, the bar graph of each group is plotted on its own scale, independent of the size of the other groups. Default is FALSE.

‘density.plot’ – If TRUE, density graphs are plotted instead of bar graphs. Default is FALSE.

**Protocol to use freq.class function**

1. Prepare the data with one variable (qualitative or quantitative) in each column.

2. Import your data into R using the function read.table(). Alternatively, for learning purposes, you can use the ‘fishdata’ data set that comes with the TFSA package. It has 506 observations of fish length.

    > data(fishdata)
    > fishdata

3. Find the lowest and highest values of the variable using the functions min() and max().

4. Decide on the width of the class interval.

In deciding on the width of the class intervals, you have to find a compromise: the intervals should be narrow enough that the observations do not all fall into the same interval, but wide enough that you do not end up with only one observation per interval.

Usage:

1. To get a quick result, use freq.class() with the data vector. This will categorize data into 10 classes.

    > freq.class(fishdata)
    Frequency Distribution Table
       Lower Upper Midclass Frequency
    1    127   144    135.5         1
    2    144   161    152.5        18
    3    161   178    169.5       104
    4    178   195    186.5       137
    5    195   212    203.5        63
    6    212   229    220.5        68
    7    229   246    237.5        57
    8    246   263    254.5        45
    9    263   280    271.5         8
    10   280   297    288.5         4

2. To customize the class intervals, use the arguments ‘ll=’, ‘ul=’ and ‘cl=’. The fish data range from 127 to 299, so let’s set the lowest endpoint to 125 and the uppermost endpoint to 300, with a class interval width of 7.

    > freq.class(fishdata,ll=125,ul=300,cl=7)
    Frequency Distribution Table
       Lower Upper Midclass Frequency
    1    125   132    128.5         1
    2    132   139    135.5         0
    3    139   146    142.5         0
    4    146   153    149.5         5
    5    153   160    156.5         9
    6    160   167    163.5        23
    7    167   174    170.5        48
    8    174   181    177.5        61
    9    181   188    184.5        76
    10   188   195    191.5        37
    11   195   202    198.5        22
    12   202   209    205.5        30
    13   209   216    212.5        25
    14   216   223    219.5        31
    15   223   230    226.5        27
    16   230   237    233.5        22
    17   237   244    240.5        25
    18   244   251    247.5        26
    19   251   258    254.5        21
    20   258   265    261.5         8
    21   265   272    268.5         2
    22   272   279    275.5         2
    23   279   286    282.5         3
    24   286   293    289.5         1
    25   293   300    296.5         1
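The number of rows in this table follows directly from the three arguments: (300 - 125) / 7 = 25 classes. The short base-R sketch below reproduces only the Lower, Upper and Midclass columns from ll, ul and cl. It is an illustration of the arithmetic, not the package's own code.

```r
ll <- 125   # lowest endpoint
ul <- 300   # uppermost endpoint
cl <- 7     # class interval width

n_classes <- (ul - ll) / cl            # (300 - 125) / 7 = 25

lower    <- seq(ll, ul - cl, by = cl)  # 125, 132, ..., 293
upper    <- lower + cl                 # 132, 139, ..., 300
midclass <- lower + cl / 2             # 128.5, 135.5, ..., 296.5

intervals <- data.frame(Lower = lower, Upper = upper, Midclass = midclass)
print(head(intervals, 3))
print(tail(intervals, 3))
```

Choosing ll and ul so that their difference is a whole multiple of cl, as here, keeps every class the same width.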

3. If the data belong to different groups, this can be specified with the argument ‘groups=’.

The TFSA package comes with a second example data set known as ‘fishgrp’. It contains fish lengths measured at different commercial landing centres in India. The data have two columns: one is a factor variable (landing centre) and the other is a quantitative variable (fish length).

If you are using your own data, make sure it is formatted the same way, with one variable in each column.

    > data(fishgrp)
    > str(fishgrp)
    'data.frame': 506 obs. of 2 variables:
     $ Stock: Factor w/ 4 levels "Calcutta","Gujarat",..: 3 3 3 3 3 3 3 3 3 3 ...
     $ SL   : int 228 264 244 209 220 202 242 256 249 170 ...
    > freq.class(fishgrp$SL,groups=fishgrp$Stock)
    Frequency Distribution Table
       Lower Upper Midclass Calcutta Gujarat Mumbai Orissa
    1    127   144    135.5        0       0      0      1
    2    144   161    152.5        1       4      1     12
    3    161   178    169.5       36      25      1     42
    4    178   195    186.5       81      33      3     20
    5    195   212    203.5       15      21     14     13
    6    212   229    220.5        0      11     33     24
    7    229   246    237.5        0       7     32     18
    8    246   263    254.5        0      11     25      9
    9    263   280    271.5        0       1      5      2
    10   280   297    288.5        0       0      4      0
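A grouped table of this shape can be approximated in base R by cross-tabulating the class intervals against the grouping factor. The sketch below uses a made-up data frame and arbitrary breaks. It is an assumption about the general technique, not TFSA's implementation.

```r
# Made-up data for illustration only; column names mirror fishgrp
toy <- data.frame(
  Stock = c("Calcutta", "Calcutta", "Calcutta", "Mumbai", "Mumbai", "Mumbai"),
  SL    = c(170, 185, 190, 228, 244, 264)
)

breaks  <- seq(160, 280, by = 40)                   # endpoints 160, 200, 240, 280
classes <- cut(toy$SL, breaks = breaks, right = FALSE)

# Rows are class intervals, columns are groups
grouped <- table(Class = classes, Stock = toy$Stock)
print(grouped)
```

Each column of `grouped` is the frequency distribution of one group, so the column totals equal the number of observations in each group.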

4. In the above result, Calcutta has a frequency of 81 in one particular class (178–195). When all groups share the same scale, this large value makes the shape of the frequency distributions from the other locations hard to see. The argument ‘scales=TRUE’ makes the graphs independent of each other, which gives a clearer picture.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,scales=TRUE)

5. If you are interested in a density plot instead of bar plot, the argument ‘density.plot=’ can be used.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,density.plot=TRUE)

Please leave a comment or email if you find a bug. Thanks for reading 🙂

## The vbgf function of TFSA

1. Download, Install and Open R

2. Install the R packages: lattice and TFSA (R>Packages> Install packages from local zip files). This is required only once. If the package is successfully installed, you will get the message “package ‘TFSA’ successfully unpacked and MD5 sums checked”.

3. Load both packages using the library function.

Example: library (lattice); library(TFSA)

**The vbgf function**

This function computes the length at age of a fish species. ‘vbgf’ refers to the von Bertalanffy Growth Function.

The VBGF is given by $L(t) = L_{\infty}\,[1 - \exp(-K\,(t - t_{0}))]$.

**Mandatory arguments (inputs):**

‘k’ – The growth coefficient for the species (K).

‘t0’ – The age of the species at length zero.

‘linfinity’ – The asymptotic length of the fish species ($L_{\infty}$).

**Usage:**

If k=0.8, linfinity=230 and t0=0, then the length of fish at age 3 (passed as the argument ‘a’) can be estimated as follows:

    > vbgf(a=3,k=0.8,linfinity=230,t0=0)
    [1] 209.1349

If the interest is to estimate length for a range of age groups, a vector of ages (e.g. 3,4,5,…10) can be used instead. Try the following code:

    > vbgf(a=c(3:10),k=0.8,linfinity=230,t0=0)
    [1] 209.1349 220.6247 225.7874 228.1072 229.1495 229.6178 229.8283 229.9228
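As a cross-check of the equation, the same numbers can be reproduced in plain R. The function `vbgf_sketch` below is a hypothetical stand-in written directly from the formula given earlier. It is not the TFSA source, although its argument names follow the usage shown above.

```r
# Hypothetical re-implementation of the formula, not the TFSA function:
# L(t) = Linf * (1 - exp(-K * (t - t0)))
vbgf_sketch <- function(a, k, linfinity, t0) {
  linfinity * (1 - exp(-k * (a - t0)))
}

# Single age: 230 * (1 - exp(-0.8 * 3)) is roughly 209.13
len_age3 <- vbgf_sketch(a = 3, k = 0.8, linfinity = 230, t0 = 0)
print(len_age3)

# Vector of ages: arithmetic in R is vectorised, so no loop is needed
len_ages <- vbgf_sketch(a = 3:10, k = 0.8, linfinity = 230, t0 = 0)
print(round(len_ages, 4))
```

The lengths increase with age but never reach linfinity, which is what "asymptotic length" means.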

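The output is a vector of fish lengths for ages 3 to 10. The result can be visualised using the plot function in R, once the ages and the estimated lengths are stored in two objects (called ages and fishlength in the code that follows).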

A line graph can be drawn using an extra argument to the plot function as follows:

plot(ages,fishlength,type="l")
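The plot call above needs the objects `ages` and `fishlength` to exist first. A self-contained sketch is given below. To avoid depending on the package, it types in the lengths from the output shown earlier, and the axis labels are my own additions.

```r
# Ages used in the example above
ages <- 3:10

# Lengths copied from the vbgf output shown earlier
# (with TFSA loaded, these could come from the vbgf call instead)
fishlength <- c(209.1349, 220.6247, 225.7874, 228.1072,
                229.1495, 229.6178, 229.8283, 229.9228)

# type = "l" draws a line instead of points
plot(ages, fishlength, type = "l", xlab = "Age", ylab = "Length")
```

Using `type = "b"` instead would draw both the points and the connecting line, which makes the individual age groups easier to see.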

## Developing TFSA package for R

I have been writing R functions since 2009 as part of my PhD in Fisheries Management. Since then I have been longing to pull the useful functions together into an R package so that others can use them. I named it TFSA (Tropical Fish Stock Assessment) since my intention is to make the functions available to students who work with tropical fisheries, where length data are more reliable than age estimates. TFSA is at a very early stage and under development. A few reasons why I was inspired and motivated to develop TFSA are listed below.

1. There are a few R packages (FLR project, FSA, fishmethods, fsap) that are useful for carrying out and evaluating fish stock assessments in research and teaching. However, from my understanding, the utility of these packages in a tropical system (e.g. the Indian fisheries context) is limited, since age data are highly unreliable and difficult to standardise due to the lack of strong seasons. The only reliable data would be the size of the fish in terms of length and weight.

2. The number of R users has been increasing in recent years, and the change is dramatic in marine biology. Perhaps people are realizing the power and flexibility of R (through seminars or social networking) and benefiting from the growing number of R books. However, there is no R book written for fisheries research, despite the number of R packages on CRAN.

3. Learning R is not easy compared to other, more user-friendly software such as Excel or SPSS. But it seems fisheries biologists are now showing more interest after realising its power and flexibility in modelling. However, the resources for learning R from a biologist's perspective are very limited. I conducted a training program in India in April 2012, and all the participants (professors and researchers in biology) except one were unaware of R. The person who knew about the existence of R had not managed to make use of it in her research work.

I’m looking for people who can contribute and help develop this package. Of course, they will be acknowledged in the package description.

The TFSA package is available for download. At the moment, TFSA has only the following functions:

1. vbgf – Computes the length of a fish species given the parameters (k, t0 and Linf). This is based on the von Bertalanffy Growth Function equation.

2. freq.class – Builds a frequency distribution table based on specified class intervals. Useful for building a length frequency table. The function can also handle groupings (e.g. length data from different months).

More ideas and suggestions are welcome. A tutorial on using these functions will be published soon.