# Blog Archives

## Potential Fishing Zones (PFZs) – too risky for Indian fisheries?

**Background**

Googling ‘Potential Fishing Zones’ returns several links, and most of them (well, all) refer to Indian fisheries. To my knowledge, no other country currently uses remotely sensed data to provide daily fishing locations that fishermen can target to improve their profitability (by reducing the time and effort spent searching for fish shoals). These locations are provided by INCOIS (Indian National Centre for Ocean Information Services), which uses satellite data (chlorophyll concentrations and sea surface temperature) to map areas of fish aggregation (see the NEWS). The method inherently assumes that fish aggregations occur in areas with high phytoplankton density (the probability is high for small pelagic fishes, since they feed directly on phytoplankton and thus skip many levels of the trophic pyramid; see here). Since 1999, many authors have validated this assumption by demonstrating higher Catch Per Unit Effort (CPUE) at PFZs advised by INCOIS (see here, here and here).

**Pros and Cons**

The PFZ advisory mechanism in India is one of the finest examples that other countries could potentially follow, because it has a direct impact on the socio-economic status of fishermen. I highlight this because many criticisms have been raised recently about pumping taxpayers' money into research areas where no short-term benefits are visible to the public. Having said that, there is increasing evidence of declining fish populations all over the world (due to overfishing and possibly climate change). In that context, providing advice on PFZs is good for neither the fish populations nor the fishing community (unless the fish populations are highly abundant).

**State of Indian fisheries**

Based on global definitions, most fish stocks along the Indian coast can be considered ‘data-limited’ or ‘data-poor’, because the available information is limited to a time series of fish landings (note that landings are not the actual amount of fish harvested from the population, because a good proportion is discarded back to sea). Hence a fully quantitative stock assessment is not possible to determine whether a fish stock is underfished, sustainably fished, overfished or collapsed. A few fish stocks have length-frequency data, but the analysis is limited to the methods described in the FAO technical paper (Introduction to Tropical Fish Stock Assessment by Sparre and Venema, 1998). Moreover, reports on fish stock assessments appear only every 5-10 years (on average), which means the risk to a stock is not monitored on a per annum basis (even though fish landings are monitored annually). The country also lacks the expertise (more than the data availability itself) to develop or run the more complex and efficient analytical models (e.g. integrated size-structured models) currently used for length-based fish stock assessments elsewhere. In a nutshell, there is no information on the state of the fisheries (underfished, overfished or collapsed), even for the well-studied fish stocks along the Indian coast.

**PFZs – are they too risky for Indian fisheries?**

It is risky to advise on hot fishing spots when estimates of fish abundance are not available. More importantly, shoal-forming pelagic species such as the Indian Oil Sardine (*Sardinella longiceps*) and Indian Mackerel (*Rastrelliger kanagurta*) are potentially more vulnerable to PFZ implementation, as they feed directly on plankton. These species are short-lived and support recruitment-driven fisheries, yet Spawning Stock Biomass (the biomass of mature fish in the population) estimates are not available (at least to the public) due to the lack of annual stock assessments. Even if they were available, the only regulatory mechanism to control overfishing is the monsoon trawl ban (trawl fishing is closed for a fixed period of 45 days between June and August, during the breeding season), and the length of this ‘closed season’ does not change with the abundance of the fish populations.

**Are PFZs working?**

There is no information on whether PFZs actually benefit the fishermen (or deplete the fish stocks). All research effort so far has gone into improving the identification of PFZs (remote sensing) and validating them against fisheries data (using CPUE) from research vessel surveys. There are no reports on whether PFZ advisories are currently being used by fishermen to improve their profits (or whether they even rely on PFZ fishing spots!). This is a potential area of research for a Masters or PhD thesis.

**P.S: Why not ‘Potential No-take Zones (PNZs)’ instead?**

## Can we detect a trend with no data?

Well, the answer is not an absolute NO. You do require a few observations, but a long time series is not necessary. My recent paper discusses the potential application of **Self-Starting CUSUM** in fisheries research. The paper investigates whether a fishery can be managed when no historical data are available. The fish cartoon depicted above is a symbolic representation of the key messages.

1. **Number of samples**: A minimum of three observations can be enough to detect a trend.

2. **Reference point**: A reference point (a value indicating ‘OK’) is not required to detect a trend.

3. **Large fish indicator**: The proportion of large fish in the population indicates the health of the fish stock.

**What is a CUSUM?**

CUSUM refers to the Cumulative Sum control chart in ‘Statistical Process Control (SPC)’ theory. Control charts are used for monitoring and help detect whether the observations in a time series deviate consistently from a desired level of performance. The graphical version of SPC is known as a control chart, and the simplest version is the ‘Shewhart’ chart. In a Shewhart chart, data are monitored by checking whether observations cross an upper or lower limit (red lines). The example below shows pH observations in an aquarium where the desired level of performance (or reference point) is pH = 7, shown as a blue line (neither acidic nor alkaline).

The CUSUM control chart is a modified version of the Shewhart chart: it computes the deviation of each observation from the reference point and accumulates their sum up to the last observation. Hence, CUSUM has the advantage of detecting small and gradual shifts in the time series (or trends) as early as they occur (see the graph below).
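The two charts just described can be sketched in a few lines of R (the pH values below are made-up illustration data, not from any real aquarium):

```r
# Made-up pH time series drifting slowly away from the target of 7
ph  <- c(7.02, 6.98, 7.01, 7.05, 7.08, 7.12, 7.15, 7.21)
ref <- 7                         # desired level of performance (reference point)

# Shewhart: flag any single observation outside fixed control limits
upper <- 7.2
lower <- 6.8
shewhart_alarm <- ph > upper | ph < lower

# CUSUM: accumulate the deviations from the reference point
cusum <- cumsum(ph - ref)

print(shewhart_alarm)   # only the last observation crosses a limit
print(cusum)            # the cumulative sum grows steadily, revealing the drift
```

Notice that the Shewhart check raises an alarm only at the very last observation, whereas the CUSUM grows steadily from the start, which is exactly why CUSUM picks up gradual drift earlier.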

**What is a Self-Starting CUSUM?**

The SS-CUSUM works on the same principle, but a reference point is not used in the computation, i.e., you do not have to say that the desired level of performance should be pH = 7 (and hence you do not need that information to detect a trend). In SS-CUSUM, a ‘running mean’ is used instead of the reference point. The running mean is estimated from the data itself and updated as new observations are added to the time series. In the long term, the running mean approaches the reference point, so the SS-CUSUM comes to resemble a basic CUSUM (see the graphs below).

Obviously, the running mean will drive you bananas if the initial observations are not close to the reference point. However, SS-CUSUM is an excellent method when the user is confident about the distributional properties of the data, i.e., that the initial observations are not outliers.

**What is the story of my paper?**

In the context of a data-poor situation, my paper investigated whether SS-CUSUM can be used for monitoring indicators such as mean length, mean weight, etc., so that a change in the state of the fish stock can be detected as early as possible. The performance of SS-CUSUM is displayed below in the form of ‘Receiver Operator Characteristic’ (ROC) curves. The closer the apex of the ROC curve is to the upper left corner, the better the performance of SS-CUSUM in detecting the change in state of the stock.

We found that a trend can be detected even if there are only three observations in the time series, and the best performance was obtained with the Large Fish Indicators (LFIs) 🙂

## The freq.class function of TFSA

1. Download, Install and Open R

2. Install the R packages lattice and TFSA (R > Packages > Install packages from local zip files). This is required only once. If the package is successfully installed, you will get the message “package ‘TFSA’ successfully unpacked and MD5 sums checked”.

3. Load both packages using the library function.

Example: library(lattice); library(TFSA)

**The freq.class function**

The freq.class function builds a frequency distribution table from a vector of data by grouping it into class intervals. The *frequency* of a class interval is the number of observations that fall within that predefined interval. For example, if 20 fish in our sample are aged between 3 and 5, the frequency for the 3–5 interval is 20. The *endpoints* of a class interval are the lowest and highest values that a variable can take. The *class interval width* is the difference between the lower endpoint of an interval and the lower endpoint of the next interval.
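For readers curious about what happens under the hood, the same grouping logic can be reproduced in base R with cut() and table(). This is only a sketch of the idea, not the actual freq.class code, and the fish lengths below are simulated:

```r
# Sketch of the grouping logic behind a frequency table (base R only)
set.seed(1)
x <- round(runif(50, min = 127, max = 299))   # 50 simulated fish lengths

breaks <- seq(125, 300, by = 25)              # endpoints of the class intervals
freq   <- table(cut(x, breaks = breaks, right = FALSE))
print(freq)   # count of observations falling in each interval
```

freq.class does this grouping for you and additionally reports the midclass values and draws the bar graphs.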

**Mandatory arguments (inputs):**

`'x'- `

A vector of data

**Mandatory arguments for specifying class interval (inputs):**

`'ll'- `

Lowest endpoint of class intervals

`'ul'- `

Uppermost endpoint of class intervals

`'cl'- `

Preferred width of the class interval.

The above arguments work only if all three are used together. Unless they are specified, the function categorizes the data into 10 classes.

**Optional arguments (inputs):**

`'groups'- `

The grouping (factor or categorical) variable, if the data belong to more than one group. Default is NA

`'scales'- `

If TRUE, bar graphs for the groups are plotted with scales independent of their relative sizes. Default is FALSE

`'density.plot'- `

If TRUE, a density graph is plotted instead of bar graphs. Default is FALSE

**Protocol to use freq.class function**

1. Prepare data with variables in each column (qualitative and quantitative)

2. Import your data into R using the function read.table(), or alternatively, for learning purposes, use the ‘fishdata’ that comes with the TFSA package. This dataset has 506 observations of fish length.

```
> data(fishdata)
> fishdata
```

3. Find the lowest and highest values of the variables using the functions min() and max().

4. Decide on the width of the class interval.

In deciding on the width of the class intervals, you will have to find a compromise: intervals short enough that not all observations fall in the same interval, but long enough that you do not end up with only one observation per interval.

Usage:

1. To get a quick result, use freq.class() with the data vector. This will categorize data into 10 classes.

```
> freq.class(fishdata)
Frequency Distribution Table
   Lower Upper Midclass Frequency
1    127   144    135.5         1
2    144   161    152.5        18
3    161   178    169.5       104
4    178   195    186.5       137
5    195   212    203.5        63
6    212   229    220.5        68
7    229   246    237.5        57
8    246   263    254.5        45
9    263   280    271.5         8
10   280   297    288.5         4
```

2. To customize the class intervals, use the arguments ‘ll=’, ‘ul=’ and ‘cl=’. The fish data lie in the range 127–299, so let's set the lowest endpoint to 125 and the highest to 300, with a class interval of 7.

```
> freq.class(fishdata, ll=125, ul=300, cl=7)
Frequency Distribution Table
   Lower Upper Midclass Frequency
1    125   132    128.5         1
2    132   139    135.5         0
3    139   146    142.5         0
4    146   153    149.5         5
5    153   160    156.5         9
6    160   167    163.5        23
7    167   174    170.5        48
8    174   181    177.5        61
9    181   188    184.5        76
10   188   195    191.5        37
11   195   202    198.5        22
12   202   209    205.5        30
13   209   216    212.5        25
14   216   223    219.5        31
15   223   230    226.5        27
16   230   237    233.5        22
17   237   244    240.5        25
18   244   251    247.5        26
19   251   258    254.5        21
20   258   265    261.5         8
21   265   272    268.5         2
22   272   279    275.5         2
23   279   286    282.5         3
24   286   293    289.5         1
25   293   300    296.5         1
```

3. If the data belong to different groups, this can be specified with the argument ‘groups=’.

The TFSA package comes with a second example dataset called ‘fishgrp’, containing fish lengths measured at different commercial landing centres in India. The data have two columns: a factor variable (landing centre) and a quantitative variable (fish length).

If you are using your own data, make sure it is formatted in the same standard way.

```
> data(fishgrp)
> str(fishgrp)
'data.frame': 506 obs. of 2 variables:
 $ Stock: Factor w/ 4 levels "Calcutta","Gujarat",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ SL   : int 228 264 244 209 220 202 242 256 249 170 ...
> freq.class(fishgrp$SL, groups=fishgrp$Stock)
Frequency Distribution Table
   Lower Upper Midclass Calcutta Gujarat Mumbai Orissa
1    127   144    135.5        0       0      0      1
2    144   161    152.5        1       4      1     12
3    161   178    169.5       36      25      1     42
4    178   195    186.5       81      33      3     20
5    195   212    203.5       15      21     14     13
6    212   229    220.5        0      11     33     24
7    229   246    237.5        0       7     32     18
8    246   263    254.5        0      11     25      9
9    263   280    271.5        0       1      5      2
10   280   297    288.5        0       0      4      0
```

4. In the above graph, Calcutta has a frequency of 80 in one particular panel. Because of this, the true nature of the frequency distributions from the other locations is not clear. The argument ‘scales=TRUE’ makes the panels independent of each other, which shows a better picture.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,scales=TRUE)

5. If you prefer a density plot instead of a bar plot, the argument ‘density.plot=’ can be used.

> freq.class(fishgrp$SL,groups=fishgrp$Stock,density.plot=TRUE)

Please leave a comment or email if you find a bug. Thanks for reading 🙂

## Easy R for M.F.Sc students – Part 7 (Functions and Packages)

Welcome to Chapter 7 of the EasyR Tutorial. People say, “You can do most things in R”. This is because R is a language, and you can use this language to write your own functions. When you install R, many functions come by default. A very simple example of a base R function is mean(), which finds the average of a group of numbers.

Base functions are useful for basic and established statistical techniques. But what if you want to analyse data with a method very specific to your discipline, say “Yield Per Recruit analysis” in fisheries management? Such methods are not in general use, so you won't find them in statistical software designed for general purposes.

In R, you can write your own function to do this job. Once written, you will never need to write it again. That is the beauty of making your own functions, and R is a good platform for biologists since you do not need to be a hard-core programmer. Today we will learn to write a function. We will not do anything complicated, just something simple and easy to understand.

**1. Functions**

In simple terms, a function is something that takes data as input, processes it, and gives the result as output. The input data are called “arguments”.

To demonstrate, let's construct a function that calculates the weight of a fish for a given length. The length-weight relationship of fish is given by W = a·L^b.

Here ‘W’ (weight) will be the output of our function, and there will be three input arguments: the length of the fish (L), and the intercept (a) and slope (b) coefficients of the equation.

The general format for constructing a function is:

nameofthefunction <- function (input arguments) { do equations or jobs }

Let’s name our function “LWRel”. Our input arguments are

1. Length of the fish (L)

2. a coefficient (acoeff)

3. b coefficient (bcoeff)

Now open TinnR. The best place to write functions is an editor like TinnR, because it is easier to correct mistakes as you write and you avoid retyping the same code several times. Write the following function in TinnR and source the code into R (refer to the first few chapters of my tutorial if you do not know this procedure):
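The function itself does not appear in this archived copy (it was probably a screenshot). Reconstructed from the code listing at the end of this chapter, and with the print step mentioned below, it would look like this:

```r
# Calculate fish weight from length using the relationship W = a * L^b
LWRel <- function(L, acoeff, bcoeff) {
  W <- acoeff * (L ^ bcoeff)
  print(W)   # print so the result is shown when the function is called
}
```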

The above function takes L, acoeff and bcoeff as input and calculates the weight of the fish. I added an extra job to print the result, since otherwise R would not show us the result of the calculation. Now, to see how this works, call the function in R with the arguments as follows. For example, to find the weight of a fish (Bombay Duck) of length 200 mm, given the coefficients:

```
> LWRel(200, 0.0000058, 2.915)
[1] 29.57539
```

The result is the weight in grams. Now, with our new function, let's estimate the weight of the fish for a range of lengths, say from 10 mm to 300 mm at 5 mm intervals. First create a vector of length measurements with the seq() function.

```
> lrange <- seq(10, 300, 5)
> lrange
 [1]  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95 100
[20] 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195
[39] 200 205 210 215 220 225 230 235 240 245 250 255 260 265 270 275 280 285 290
[58] 295 300
```

Now just use this vector instead of a single length measurement in LWRel function:

```
> LWRel(lrange, 0.0000058, 2.915)
 [1]  0.004769007  0.015550130  0.035969171  0.068932355  0.117283374
 [6]  0.183817283  0.271289427  0.382421650  0.519906870  0.686412579
[11]  0.884583610  1.117044381  1.386400736  1.695241503  2.046139812
[16]  2.441654229  2.884329744  3.376698641  3.921281257  4.520586673
[21]  5.177113315  5.893349512  6.671773988  7.514856319  8.425057344
[26]  9.404829545 10.456617395 11.582857678 12.785979792 14.068406020
[31] 15.432551787 16.880825905 18.415630793 20.039362685 21.754411835
[36] 23.563162692 25.467994086 27.471279382 29.575386643 31.782678775
[41] 34.095513667 36.516244325 39.047218994 41.690781283 44.449270277
[46] 47.325020644 50.320362745 53.437622726 56.679122619 60.047180430
[51] 63.544110228 67.172222226 70.933822865 74.831214889 78.866697421
[56] 83.042566030 87.361112806 91.824626421 96.435392197
```

Here we get the weight of the fish for all length observations. To visualise this result, we can put the output into a graph using the plot() function.

```
plot(lrange, LWRel(lrange, 0.0000058, 2.915), xlab="Length", ylab="Weight")
```

I hope this was impressive and has sparked some ideas. In brief, you can visualise the pattern or behaviour of a population if a defined relationship is known. As homework, I suggest you try the same exercise with a Von Bertalanffy Growth Function equation and visualise the growth curve of a species with increasing length or age of the fish.
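As a starting point for that homework (a sketch only; the parameter values below are arbitrary illustrations, not estimates for any real species), the Von Bertalanffy Growth Function L(t) = L∞ · (1 − e^(−K·(t − t₀))) can be written in exactly the same way as LWRel:

```r
# Von Bertalanffy Growth Function: length at age t
# Linf = asymptotic length, K = growth coefficient, t0 = theoretical age at length 0
# (parameter values in the call below are arbitrary, for illustration only)
VBGF <- function(t, Linf, K, t0) {
  Linf * (1 - exp(-K * (t - t0)))
}

ages <- seq(0, 10, 0.5)
plot(ages, VBGF(ages, Linf = 300, K = 0.4, t0 = -0.2),
     type = "l", xlab = "Age (years)", ylab = "Length (mm)")
```

The curve rises steeply at young ages and flattens as the length approaches L∞, which is the classic growth pattern of most fishes.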

**2. Packages**

Base R comes with many functions for commonly used statistics. But there are not many if you are looking for advanced statistics, or if you want to visualise graphs and data in a more user-friendly style. You can of course make R do things the way you want; however, this often consumes a lot of your time.

Clever people have already coded functions that could be useful. A set of functions is usually compiled together and made available to the public in the form of a “package”. Essentially, if you install a package and load it in R, you can use all the functions that come with it. How to use these functions and their arguments is detailed in the package's user manual.

**Find a useful package**

The easiest way is to search Google with carefully chosen keywords. The second option is to find the required package from the list on CRAN (Click Here).

**Installing a package**

First, you need an internet connection. You can install a package either through an automated process within R or by manually downloading the ZIP file from CRAN. The best option is the automated process.

If you choose the automated process, all other required dependent packages will be downloaded and installed automatically. Go to the Packages menu and choose “Install Packages”. R will ask you to choose the nearest CRAN mirror; pick a location close to you.

Next, R will show a list of all packages available on the chosen CRAN mirror. Choose the required package from the list, then wait patiently for R to download and install everything required.

The second option is to manually download the package as a ZIP file from CRAN. From the “Packages” menu in R, choose “Install package from local zip file”. It will ask for the location of the downloaded file on your computer.

The problem with this installation is that R won't automatically identify what extra (dependent) packages are required; you have to download and install each one manually.

To demonstrate with an example, let's install a package named “msm” (“Multi-state Markov and hidden Markov models in continuous time”). We are not going to do any big statistics here. This package has a function named rtnorm() that lets the user draw random numbers between two limits. That is what I want to show you.

The CRAN page for this package is here (Click). If you scroll down, you can see the “Depends” and “Imports” fields: these list the packages that msm itself requires, so msm will not work until all of them are installed on your computer. (The “Reverse dependencies” section, by contrast, lists packages that depend on msm.) All required packages are installed automatically if you choose the first method, but at times you might be forced to use the second option.

**Using a package**

To use the msm package in R, type:

> library(msm)

Now all the functions in the msm package are ready to use. Before we use them, let's look at what is available in base R and why we opt for a different function from the msm package.

To generate random numbers from a normal distribution, base R provides the rnorm() function. To generate 10 random numbers from a normal distribution with mean = 100 and standard deviation = 10, use:

> rnorm(10, mean=100, sd=10)

You will get different values from those printed here; generating random numbers is like rolling a die. Now suppose I only want values between 90 and 110. In the output above there are a few values less than 90 and a few more than 110. We could write our own function to make R keep only values between 90 and 110.
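Such a do-it-yourself version could be a simple rejection loop (a sketch only; the function name truncated_rnorm is mine, and as we will see next, rtnorm() does this job better):

```r
# Sketch: keep drawing from rnorm() and retain only values inside the limits
# (rejection sampling; rtnorm() in the msm package does this more efficiently)
truncated_rnorm <- function(n, mean, sd, lower, upper) {
  out <- c()
  while (length(out) < n) {
    x <- rnorm(n, mean, sd)
    out <- c(out, x[x >= lower & x <= upper])   # reject values outside limits
  }
  out[1:n]
}

set.seed(7)
truncated_rnorm(10, mean = 100, sd = 10, lower = 90, upper = 110)
```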

But anyway, the rtnorm ( ) function in msm package is useful in doing that task.

Let’s try using the rtnorm ( ) function.

```
> rtnorm(10, mean=100, sd=10, lower=90, upper=110)
 [1]  91.90775  91.09023 105.49530  92.87104 105.92710  97.77873  94.99413
 [8] 103.52435  90.80200 103.80593
```

The first argument is the number of random numbers required (10). The second and third arguments are the mean and standard deviation of the distribution. The fourth and fifth arguments are the limits within which the random numbers should be chosen.

This is just an example. There are thousands of R packages available that do fantastic things. I will finish this tutorial with an example that demonstrates the use of packages, functions and loops by integrating them for different purposes.

Below, the graph was plotted using a loop process.

The code used for plotting this graph is given below. Paste the code in TinnR and source the file in R.

Remember to install the “msm” package before using this code; otherwise you will get an error.

```r
# Remove all existing objects in R
rm(list=ls(all=TRUE))

# To load the package "msm"
library(msm)

# Function to calculate the weight of fish for given length
LWRel <- function(L, acoeff, bcoeff) {W <- acoeff*(L^bcoeff)}

# Vector of fish length
lrange <- seq(10, 300, 5)

# Vector of fish weight
fishweight <- LWRel(lrange, 0.0000058, 2.915)

# Asymptotic weight in grams
maxweight <- 150

# Loop process to plot the graph
for (i in 1:100) {
  newfishweight <- c()
  for (j in 1:length(lrange)) {
    newfishweight <- c(newfishweight,
                       rtnorm(1, fishweight[j], fishweight[j]*0.25, 0, maxweight))
  }
  if (i == 1) {
    plot(lrange, newfishweight, xlab="Length of Bombay Duck (mm)",
         ylab="Weight (grams)")
  } else {
    points(lrange, newfishweight)
  }
}
title("Length-Weight relationship of Bombay Duck")
```

The plotting code might confuse you, but leave it for the moment; in our next chapter we will learn more about making graphs. Using random numbers helps you visualise patterns similar to real-world situations.

My tutorial for this chapter ends here. Hope you enjoyed it and leave a comment if you like it. You can also sign up to my blogs with your email address if you would like to get notified whenever I publish a new chapter.