Potential Fishing Zones (PFZs) – too risky for Indian fisheries?


Googling ‘Potential Fishing Zones’ will bring up several links, and most (if not all) of them refer to Indian fisheries. To my knowledge, no other country currently uses remotely sensed data to tell fishermen, on a daily basis, where to fish to improve their profitability (by reducing the time and effort spent searching for fish shoals). These locations are provided by INCOIS (Indian National Centre for Ocean Information Services), which uses satellite data on chlorophyll concentration and sea surface temperature to map areas of fish aggregation (see the NEWS). The method inherently assumes that fish aggregate in areas of high phytoplankton density. This is most likely for small pelagic fishes, since they feed directly on phytoplankton and so skip many levels of the trophic pyramid (see here). Since 1999, many authors have validated this assumption by showing a higher Catch Per Unit Effort (CPUE) in the PFZs advised by INCOIS (see here, here and here).

Pros and Cons

The PFZ advisory mechanism in India is one of the finest examples for other countries to follow, because it has a direct impact on the socio-economic status of fishermen. I highlight this because many criticisms have recently been raised about pumping taxpayers’ money into research that brings no visible short-term benefit to the public. Having said that, there is increasing evidence of declining fish populations all over the world (due to overfishing and possibly ‘climate change’). In that context, advising on PFZs is good neither for the fish populations nor for the fishing community, unless fish populations are highly abundant.

State of Indian fisheries

Based on global definitions, most fish stocks along the Indian coast can be considered ‘data-limited’ or ‘data-poor’, because the available information is limited to a time series of fish landings. Note that landings are not the actual amount of fish removed from the population, because a good proportion of the catch is discarded back to sea. Hence a fully quantitative stock assessment is not possible, and we cannot tell whether a fish stock is underfished, sustainably fished, overfished or collapsed. A few fish stocks have length-frequency data, but their analysis is limited to the methods described in the FAO technical paper ‘Introduction to tropical fish stock assessment’ (Sparre and Venema 1998). Moreover, fish stock assessment reports appear only every 5-10 years on average, so the risk to a stock is not monitored on an annual basis (even though fish landings are monitored annually). The country also lacks the expertise (even more than the data themselves) to develop or run the more complex and efficient analytical models (e.g. integrated size-structured models) that are now used for length-based stock assessments elsewhere. In a nutshell, there is no information on the state of fisheries (underfished, overfished or collapsed), even for the well-studied fish stocks along the Indian coast.

PFZs – are they too risky for Indian fisheries?

It is risky to advise fishermen on fishing hot spots when the abundance of fish populations is unknown. More importantly, shoal-forming pelagic species such as the Indian oil sardine (Sardinella longiceps) and Indian mackerel (Rastrelliger kanagurta) are potentially more vulnerable to PFZ advisories, because they feed directly on plankton. These species are short-lived and support recruitment-driven fisheries. However, estimates of their Spawning Stock Biomass (the biomass of mature fish in the population) are not available (at least to the public), because there are no annual stock assessments. Even if such estimates were available, the only regulatory mechanism to control overfishing is the monsoon trawl ban. Trawl fishing is closed for a fixed period of 45 days between June and August, during the breeding season, and the length of this ‘closed season’ does not change with the abundance of the fish populations.

Are PFZs working?

There is no information on whether PFZs actually benefit fishermen (or contribute to declining fish stocks). Research so far has focused on improving the identification of PFZs (remote sensing) and on validating them against fisheries data (CPUE) from research vessel surveys. There are no reports on whether fishermen actually use the PFZ advisories to improve their profit (or whether they rely on PFZ fishing spots at all!). This is a potential area of research for a Masters or PhD thesis.

P.S: Why not ‘Potential No-take Zones (PNZs)’ instead?

Get that paper published: hints

Here are a few things I learned after publishing my own research papers. I recommend them to students who are ready to embark on a research career. Publishing research articles is widely regarded as the main route to success in academia (see “Publish or Perish”).

Writing research papers can be extremely daunting. It is quite normal to get a rejection or a major revision for your first paper. If the reviewers give constructive comments, you still have a chance to get it published. Some useful hints are:


Can we detect a trend with no data?


Well, the answer is not an absolute NO. You do need a few observations, but a long time series is not necessary. My recent paper discusses the potential application of the Self-Starting CUSUM in fisheries research. The paper asks: “Can a fishery be managed if no historical data are available?” The fish cartoon depicted above is a symbolic representation of the key messages.

1. Number of samples: A minimum of three observations can be enough to detect a trend.

2. Reference point: Reference point (value indicating ‘OK’) is not required to detect a trend.

3. Large fish indicator: The proportion of large fish in the population indicates the health of the fish stock.

What is a CUSUM?

CUSUM refers to the Cumulative Sum control chart in ‘Statistical Process Control (SPC)’ theory. Control charts are used for monitoring and help detect whether the observations in a time series deviate consistently from a desired level of performance. The simplest control chart is the ‘Shewhart’ chart. In a Shewhart chart, the data are monitored by checking whether any observation crosses an upper or lower limit (red lines). The example below shows pH observations in an aquarium, where the desired level of performance (or reference point) is pH=7 (blue line), i.e., neither acidic nor alkaline.
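The Shewhart rule described above can be sketched in a few lines of R. This is a minimal illustration only: the pH readings and the control limits (6.5 and 7.5) are made-up values, not the data behind the graph.

```r
# Hypothetical pH readings from an aquarium (illustrative values only)
ph <- c(7.1, 6.9, 7.0, 7.2, 6.8, 7.1, 7.4, 7.5, 7.3, 7.6)

target <- 7    # reference point: neutral pH
lower  <- 6.5  # assumed lower control limit
upper  <- 7.5  # assumed upper control limit

# Shewhart rule: flag any single observation outside the control limits
shewhart_flags <- function(x, lower, upper) {
  x < lower | x > upper
}

flags <- shewhart_flags(ph, lower, upper)
which(flags)   # positions of out-of-control observations

# Draw the chart: observations, control limits (red) and reference point (blue)
plot(ph, type = "b", ylim = c(6, 8), xlab = "Observation", ylab = "pH")
abline(h = c(lower, upper), col = "red")
abline(h = target, col = "blue")
```

Only the last reading (7.6) is flagged. The gradual rise before it goes unnoticed by this rule, and that is the weakness CUSUM addresses.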

The CUSUM control chart is a modified version of the Shewhart chart. It computes the deviation of each observation from the reference point and accumulates these deviations up to the latest observation. Hence, CUSUM has the advantage of detecting small, gradual shifts in a time series (i.e., a trend) as early as they occur (see the graph below).
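To see why accumulating deviations helps, here is a minimal sketch using made-up readings that drift slowly upwards without ever crossing limits of 6.5 or 7.5. The second function is the tabular CUSUM, a common refinement that the text does not describe. The slack value k = 0.05 is an arbitrary choice for this example.

```r
# Hypothetical pH readings drifting slowly upwards (illustrative values only)
ph <- c(7.0, 7.1, 6.9, 7.0, 7.1, 7.2, 7.2, 7.3, 7.3, 7.4)
target <- 7   # reference point

# Basic CUSUM: running total of deviations from the reference point
cusum_deviation <- function(x, target) {
  cumsum(x - target)
}

# Tabular (upper) CUSUM: subtracts a slack value k at each step and resets
# to zero whenever the accumulated evidence of an upward shift disappears
upper_cusum <- function(x, target, k) {
  s <- numeric(length(x))
  prev <- 0
  for (i in seq_along(x)) {
    prev <- max(0, prev + (x[i] - target - k))
    s[i] <- prev
  }
  s
}

round(cusum_deviation(ph, target), 2)
round(upper_cusum(ph, target, k = 0.05), 2)
```

Both series climb steadily over the last few readings, even though no single reading would be flagged by a Shewhart chart with those limits.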


What is a Self-Starting CUSUM?

The SS-CUSUM works on the same principle, but no reference point is used in the computation. In other words, you do not have to state that the desired level of performance is pH=7, and hence you do not need that information to detect a trend. Instead, SS-CUSUM uses a ‘running mean’, which is estimated from the data themselves and updated whenever new observations are added to the time series. In the long term, the running mean approaches the reference point, and as a result the SS-CUSUM comes to look like a basic CUSUM (see the graphs below).
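The running-mean idea can be sketched as below. The sketch follows one common textbook formulation of the self-starting CUSUM. Each new value is standardised with the mean and standard deviation of all earlier values, and the result is converted to a normal score. It is an assumption that this matches the variant used in the paper. The slack value k = 0.5 and the series x are illustrative only.

```r
# Self-starting CUSUM sketch (one common formulation, assumed here).
# No reference point is supplied: each observation is compared with the
# running mean and SD of all previous observations.
ss_cusum <- function(x, k = 0.5) {
  n <- length(x)
  u <- rep(NA_real_, n)          # normal scores; undefined for the first two points
  for (i in seq_len(n)[-(1:2)]) {
    prev <- x[1:(i - 1)]
    m <- mean(prev)              # running mean replaces the reference point
    s <- sd(prev)                # running standard deviation
    t_stat <- sqrt((i - 1) / i) * (x[i] - m) / s   # t distribution, i - 2 df
    u[i] <- qnorm(pt(t_stat, df = i - 2))          # convert to a standard normal score
  }
  upper <- lower <- numeric(n)
  hi <- lo <- 0
  for (i in seq_len(n)) {
    if (!is.na(u[i])) {
      hi <- max(0, hi + u[i] - k)   # evidence of an upward shift
      lo <- min(0, lo + u[i] + k)   # evidence of a downward shift
    }
    upper[i] <- hi
    lower[i] <- lo
  }
  list(u = u, upper = upper, lower = lower)
}

# Hypothetical mean-length series that shifts upwards after the sixth value
x <- c(10, 12, 11, 10, 12, 11, 15, 16, 17, 18)
res <- ss_cusum(x)
round(res$upper, 2)
```

In practice you would also choose a decision interval h and raise an alarm when upper (or -lower) exceeds it. The value of h is not set in this sketch.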

Limitation of SS-CUSUM?

Obviously, the running mean will drive you bananas if the initial observations are not close to the reference point. However, SS-CUSUM is an excellent method if the user is sure about the distributional properties of the data, i.e., that the initial observations are not outliers.

What is the story of my paper?

In the context of a data-poor situation, my paper investigated ‘Whether SS-CUSUM can be used to monitor indicators such as mean length, mean weight etc., so that a change in the state of the fish stock can be detected as early as possible?’. The performance of SS-CUSUM is displayed below as ‘Receiver Operating Characteristic’ (ROC) curves. The closer the ROC curve bends towards the upper-left corner, the better SS-CUSUM performs in detecting a change in the state of the stock.
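How an ROC curve is built can be sketched as follows. For each alarm threshold h, the true-positive rate is the proportion of changed stocks that trigger an alarm. The false-positive rate is the proportion of unchanged stocks that trigger one. The inputs below are hypothetical. In a simulation study, the scores might be CUSUM values and the labels would record which simulated stocks really changed state.

```r
# Sketch: points on an ROC curve for the alarm rule "signal if score > h".
roc_points <- function(scores, changed, thresholds) {
  pts <- vapply(thresholds, function(h) {
    alarm <- scores > h
    c(fpr = mean(alarm[!changed]),   # false alarms among unchanged stocks
      tpr = mean(alarm[changed]))    # detections among changed stocks
  }, numeric(2))
  data.frame(threshold = thresholds, fpr = pts["fpr", ], tpr = pts["tpr", ])
}

# Hypothetical scores and true states for four stocks
scores  <- c(0.1, 0.4, 0.35, 0.8)
changed <- c(FALSE, FALSE, TRUE, TRUE)
roc_points(scores, changed, thresholds = c(0, 0.3, 0.5, 1))
```

Plotting tpr against fpr over many thresholds traces the curve. A method that reaches a high tpr while fpr is still low pushes the curve towards the upper-left corner.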

We found that a trend can be detected even if there are only three observations in the time series, and the best performance was obtained with the Large Fish Indicators (LFIs) 🙂

The descriptive function in TFSA

The new tool in TFSA version 1.1 is the descriptive function. It is designed to explore data quickly, with little R programming. However, the function assumes that the data are ‘continuous’, not ‘discrete’. The function can do three important things:

1. The function returns the following values in the form of a data frame:

  • Total number of observations
  • Mean
  • Median
  • Minimum
  • Maximum
  • Variance
  • Standard deviation
  • Range

2. The function can handle categorical data, say if the observations are measured from different locations or classified based on sex.

3. The descriptive information can be visualized in the form of histogram, box plot or dot plot.
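For readers without TFSA, the same summary columns can be reproduced in base R. This is a sketch for illustration, not TFSA's own code. Dropping missing values before summarising is an assumption of this sketch.

```r
# Base-R sketch of the summary columns listed above (not TFSA's implementation)
describe <- function(x) {
  x <- x[!is.na(x)]                      # assumption: ignore missing values
  data.frame(N = length(x), Mean = mean(x), Median = median(x),
             Minimum = min(x), Maximum = max(x),
             Variance = var(x), "Std dev" = sd(x),
             Range = max(x) - min(x),    # Range = Maximum - Minimum
             check.names = FALSE)        # keep the column name "Std dev"
}

# The first ten SL values of fishgrp, as listed by str(fishgrp) below
describe(c(228, 264, 244, 209, 220, 202, 242, 256, 249, 170))
```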

Some examples are provided below using the fishgrp data, which comes with the TFSA package (if you want to use your own data, import it into R first). First, tell R to load the TFSA package.

library(TFSA)

How to install R? (Check my old post)

Now load the fishgrp data from TFSA:

data(fishgrp)

Now view the structure of the fishgrp data:

str(fishgrp)

'data.frame':   506 obs. of  3 variables:
 $ Stock: Factor w/ 4 levels "Calcutta","Gujarat",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Sex  : Factor w/ 2 levels "F","M": 2 2 1 1 1 1 1 2 1 2 ...
 $ SL   : int  228 264 244 209 220 202 242 256 249 170 ...

This shows that the data contain two attributes or factors (Stock and Sex) and one column of numerical observations (SL, standard length). To view the whole data set, type fishgrp and press the ENTER key. Now we will use the descriptive function in TFSA to see the descriptive statistics of standard length (SL):

descriptive(fishgrp$SL)

    N     Mean Median Minimum Maximum Variance  Std dev Range
1 506 202.0435    193     127     299 926.9684 30.44616   172

The $ symbol is used in R to choose a variable (SL) from a data frame (fishgrp). The results show a lot of information, e.g., the total number of observations is 506 and the mean is 202.04. For more details on the results, type ?descriptive and press the ENTER key.

To visualize the histogram, use the argument HISTOGRAM=TRUE


To visualize the box plot, use the argument Boxplot=TRUE


To visualize the dot plot, use the argument Dotplot=TRUE


To obtain descriptive statistics for each category, use the argument groups=. For example, the fishgrp data have observations for 4 different fish stocks: Calcutta, Gujarat, Mumbai and Orissa.


           N     Mean Median Minimum Maximum  Variance   Std dev Range
Calcutta 133 183.3459    183     160     211  95.31887  9.763138    51
Gujarat  113 197.9115    192     148     274 775.63496 27.850224   126
Mumbai   119 232.9832    234     157     299 534.62683 23.121999   142
Orissa   141 196.8794    187     127     266 996.27822 31.563875   139

To obtain descriptive statistics of sub-categories within groups, use the argument divisions=. For example, the fishgrp data have observations for males and females in each stock.


               N     Mean Median Minimum Maximum   Variance   Std dev Range
F in Calcutta 59 185.0847  184.0     167     207   88.66511  9.416215    40
M in Calcutta 74 181.9595  181.0     160     211   97.51888  9.875165    51
F in Gujarat  52 194.3077  185.5     148     259  788.64857 28.082887   111
M in Gujarat  61 200.9836  198.0     155     274  756.64973 27.507267   119
F in Mumbai   65 232.8000  234.0     187     299  460.72500 21.464506   112
M in Mumbai   54 233.2037  233.0     157     287  633.86338 25.176644   130
F in Orissa   62 198.8387  196.5     127     266 1218.07192 34.900887   139
M in Orissa   79 195.3418  184.0     152     251  830.15093 28.812340    99

The same information can be viewed as a histogram or dot plot simply by changing the argument, but use only one plot type at a time.
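Per-group summaries like those produced by groups= and divisions= can also be sketched in base R with split(). The helper names and the toy data are hypothetical. describe() is repeated so that the block runs on its own. interaction() with sep = " in " builds labels such as "F in A", mirroring the row names in the table above.

```r
# Base-R sketch of per-group summaries (not TFSA's implementation)
describe <- function(x) {
  x <- x[!is.na(x)]
  data.frame(N = length(x), Mean = mean(x), Median = median(x),
             Minimum = min(x), Maximum = max(x), Variance = var(x),
             "Std dev" = sd(x), Range = max(x) - min(x), check.names = FALSE)
}

describe_by <- function(x, groups) {
  parts <- split(x, groups)                       # one vector per group
  out <- do.call(rbind, lapply(parts, describe))  # one summary row per group
  rownames(out) <- names(parts)
  out
}

# Toy data (hypothetical): lengths, stock and sex of five fish
len   <- c(10, 12, 20, 22, 30)
stock <- c("A", "A", "B", "B", "B")
sex   <- c("F", "M", "F", "M", "M")

describe_by(len, stock)                                              # like groups=
describe_by(len, interaction(sex, stock, sep = " in ", drop = TRUE)) # like divisions=
```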

Thanks for reading 🙂

Reading data from webpages using R

The story behind

A couple of weeks back, I wanted to test some time series models with marine fish landings data of India. I knew that the Central Marine Fisheries Research Institute (CMFRI, India) publishes the estimated values in its annual reports, and I found the data published on its website (http://www.cmfri.org.in/annual-data.html). However, compiling all of them manually was a tedious job.


I asked my friends at CMFRI for an electronic version of the data, but I was told I would have to pay for it. I thought there should be a way around this, since the data were already published.

Technical part

After a little bit of research, I found the XML R package. This package provides many approaches for both reading and creating XML (and HTML) documents, both local and accessible via HTTP or FTP. The function that reads tables from HTML webpages is readHTMLTable("The URL").

1. You have to install the XML package first. (See this post for installing R Packages)

2. Now load the package using the library function:

library(XML)

3. Read the data from the CMFRI website and name it ‘mydata’.

mydata<- readHTMLTable("http://www.cmfri.org.in/annual-data.html")

This reads the HTML tables on the page and returns them as a list (one element per table). Following is the output in R.

However, this looks dirty. We have to clean it up to obtain the data in the required format.

4. So I cropped the data using the following code:


The output is shown below:

5. Now we can save this data as a CSV file on our computer. Following is the code to do that. You have to give the path to the location where you want the data to be saved.

write.csv(cleanup.data, file = "C:\\give.location.here\\data.csv", row.names = FALSE)
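Before relying on the saved file, it is worth reading it back to check it. The sketch below writes to R's temporary folder so that it runs on any computer. The two-row data frame is a placeholder standing in for cleanup.data, not CMFRI data.

```r
# Placeholder standing in for the cleaned table: made-up numbers, NOT real landings
cleanup.data <- data.frame(Year = c(2001, 2002), Landings = c(1.5, 2.5))

# Write to a temporary folder so the example runs anywhere;
# replace out_file with your own path, as in the write.csv() call above
out_file <- file.path(tempdir(), "data.csv")
write.csv(cleanup.data, file = out_file, row.names = FALSE)

# Read the file back to check that nothing was lost or mangled
check <- read.csv(out_file)
str(check)
```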

Following is a snapshot of the CSV file opened in MS Excel.

So that is how I managed to get data from the website with no manual errors and by not paying for it. Thanks for reading. Happy Coding…


Shaded graphs in R

Sometimes it helps to visualize the complex distribution of a model as a shaded graph, using percentiles as boundaries. Following are two graphs, a dirty and a neat example of the Von Bertalanffy Growth Model, showing the growth of fish with increasing age.

Shaded graphs are easy to plot, but the trick behind them is not obvious. Essentially, this can be achieved using the polygon() function in R.

Computing the percentiles from data

Suppose we have length data from 50 fish, 10 at each age from Age 1 to Age 5:

          Age1     Age2      Age3      Age4     Age5
 [1,] 35.63813 59.81503  97.39460 102.23921 134.7052
 [2,] 32.10304 68.34917  86.97411  94.06250 118.3878
 [3,] 35.93959 61.62869  88.71316 113.95664 127.2596
 [4,] 30.73885 56.36995  95.56985 105.57710 129.5275
 [5,] 29.88816 59.83590  79.20284 113.07792 120.7841
 [6,] 31.16976 48.10858  84.04000  80.52727 136.6355
 [7,] 30.71656 47.70394  82.34205  87.58575 123.5584
 [8,] 31.05663 63.18036 101.23041 108.09523 153.9219
 [9,] 34.99453 57.75032  96.64804 102.09258 122.3556
[10,] 32.40417 57.13938  86.48527  95.42354 121.9671

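The following code generates similar data in R. The original generating values were not given, so the growth parameters (Linf = 250, K = 0.15, t0 = 0) and the noise levels below are assumptions, and the numbers will differ from the table above:

mydata <- matrix(NA, nrow = 10, ncol = 5)  # 10 fish (rows) x 5 ages (columns)
for(i in 1:5){ mydata[, i] <- rnorm(10, mean = 250 * (1 - exp(-0.15 * i)), sd = 2 * i) }  # assumed VB growth + noise
colnames(mydata)<-c("Age1", "Age2", "Age3", "Age4", "Age5")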

The quantile() function in R can be used to compute the percentiles. To compute them column-wise (for each age), use the sapply() function:

sapply(as.data.frame(mydata), quantile)

         Age1     Age2      Age3      Age4     Age5
0%   29.88816 47.70394  79.20284  80.52727 118.3878
25%  30.81829 56.56231  84.65132  94.40276 122.0642
50%  31.63640 58.78268  87.84363 102.16590 125.4090
75%  34.34694 61.18049  96.37849 107.46570 133.4108
100% 35.93959 68.34917 101.23041 113.95664 153.9219

Shading between percentile boundaries

A polygon is drawn between the boundaries, and a colour is specified for shading the region. This needs the XY co-ordinates in the right order (clockwise or anti-clockwise). Here the X-axis is the age of fish and the Y-axis shows the percentile boundaries.

To draw a polygon that shades the region between the 25% and 75% boundaries, we have to extract the Y values (one per age) for each boundary:

mynewdata25 <- sapply(as.data.frame(mydata), quantile, probs = 0.25, names = FALSE)
mynewdata50 <- sapply(as.data.frame(mydata), quantile, probs = 0.50, names = FALSE)
mynewdata75 <- sapply(as.data.frame(mydata), quantile, probs = 0.75, names = FALSE)

Each vector holds the values of the corresponding row (25%, 50% or 75%) of the percentile table above, one value per age.

Now we can first plot the graph using the 50% data, i.e., mynewdata50:

plot(1:5,mynewdata50,type="l",ylim=range(mydata),xlab="Age",ylab="Length (cm)")


Now add the shading between the boundary percentiles to the graph using polygon(). Its general form is:

polygon(x = X co-ordinates, y = Y co-ordinates, col = fill colour, border = border colour)


The rev() function in R can be used to reverse a vector:

rev(1:5)
[1] 5 4 3 2 1

The X and Y co-ordinates should follow a clockwise or anticlockwise direction, so that they enclose a loop and form a polygon, as demonstrated below.

The polygon will mask the initial plot showing the 50% line. This line can be re-plotted by adding the following command:

lines(1:5, mynewdata50)

To add a title:

title("Fish Growth")
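Putting the steps together, here is a self-contained sketch of the whole shaded graph. The data are simulated from assumed growth parameters, the percentiles are computed with apply() on the matrix, and the grey colour is an arbitrary choice. Drawing an empty frame first and the median line last avoids the masking problem mentioned above.

```r
set.seed(42)  # reproducible random numbers

# Hypothetical data: 10 fish at each age 1-5, mean length from an assumed
# von Bertalanffy curve (Linf = 250, K = 0.15, t0 = 0) plus noise
ages <- 1:5
mydata <- sapply(ages, function(a) rnorm(10, mean = 250 * (1 - exp(-0.15 * a)), sd = 2 * a))
colnames(mydata) <- paste0("Age", ages)

# 25%, 50% and 75% percentiles for each age (rows = percentiles, columns = ages)
p <- apply(mydata, 2, quantile, probs = c(0.25, 0.5, 0.75))

# Polygon vertices: along the upper boundary left to right, then back along
# the lower boundary right to left, so the outline forms a closed loop
shade_coords <- function(x, lower, upper) {
  list(x = c(x, rev(x)), y = c(upper, rev(lower)))
}
band <- shade_coords(ages, p["25%", ], p["75%", ])

plot(ages, p["50%", ], type = "n", ylim = range(mydata),
     xlab = "Age", ylab = "Length (cm)")             # empty frame first
polygon(band$x, band$y, col = "grey80", border = NA)  # shaded 25-75% band
lines(ages, p["50%", ], lwd = 2)                     # median drawn last, so it is not masked
title("Fish Growth")
```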


Installing R

R is a free statistical software.

To install ‘R’, go to your search engine and search for “CRAN” (Comprehensive R Archive Network).

Choose “Download R for Windows”.

You will reach a webpage like the one in the following picture. Choose the “base” link on this page.

When you click the base link, you will reach a webpage with a link to download R for Windows, showing the latest version number.

Download the latest R installer (an executable file) and open it by double-clicking.

Run it to install R on your system.

It may ask whether to install the 32-bit or the 64-bit version, depending on the version of Windows installed on your computer. On 64-bit Windows, the 64-bit version is usually the better choice, as it can use more memory.

Once installation is done, you will see the R icon on the desktop or in the program folder. Double-click this R icon to start working.

Organizing data for analysis in R

Many basic functions in R analyse data only if they are in the form of a “data frame” (check my chapter about data types in R). It is important to understand the arrangement of rows and columns while preparing your data for analysis.

This is not specific to R: most analytical software works only if the data are in the format discussed below.

Important steps:

1. Open a fresh file of MS Excel to prepare data.

2. Enter data only on one sheet in MS Excel.

3. Enter each variable in each column.

4. Enter each observation in a separate row.

5. Enter ‘NA’ for missing values or empty observations. Some people use zero instead, but zero does not mean that the observation is missing. However, zero can be used in presence-absence data, where presence is represented by ‘1’ and absence by ‘0’ (e.g. infected by a virus or not?).

6. Delete all columns and rows that were used at least once before but are now empty (very important).

7. Save the MS Excel file as a .txt or .csv file (CSV stands for Comma Separated Values).


Download the example data from here.

The data are in a CSV file (Comma Separated Values) and can be opened in any text editor or in MS Excel.

The attributes or qualitative variables in this data set are Location, Sampling period and Sex (categorical data). The data also have a few quantitative variables for a fish species: Total Length (TL), Standard Length (SL), Fork Length (FL) and Body Weight (BW). Both qualitative and quantitative variables are arranged in columns. Each fish is an observation, and hence the details of each fish are entered in a separate row.

1. The categories in ‘Location’ are four places from which the fish was collected (Mumbai, Madras, Calcutta and Cochin).

2. The categories in ‘Sampling period’ are 1 and 2, representing two months, i.e., October and February. This was done intentionally, since analysis is much easier with numbers than with the text itself.

3. The categories in ‘Sex’ are Male, Female and Indeterminate. As mentioned before, it is much easier to analyse the data if the categories are represented by numbers: Male-1, Female-2 and Indeterminate-3.
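Reading such a file into R and turning the number codes back into labelled categories can be sketched as follows. The three rows are hypothetical. They are written inline with read.csv(text = ...) so that the example needs no download. With the real file, you would pass its path instead.

```r
# Three hypothetical fish entered as described above:
# one row per fish, one column per variable, NA for missing values
csv_text <- "Location,Period,Sex,TL,BW
Mumbai,1,1,25.3,210
Cochin,2,2,NA,198
Madras,1,3,23.1,NA"

fish <- read.csv(text = csv_text, na.strings = "NA")

# Turn the numeric codes back into labelled categories for tables and plots
fish$Period <- factor(fish$Period, levels = 1:2, labels = c("October", "February"))
fish$Sex <- factor(fish$Sex, levels = 1:3,
                   labels = c("Male", "Female", "Indeterminate"))
str(fish)
```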

Installing packages in R

Base installation of R does not have functions that can fit models that are tailored to certain disciplines e.g. Yield per Recruit model in fisheries management. In such cases, you may require functions written and contributed by experts. These contributions are known as “Packages”. They are free to download, install and use.

Find a useful package

The easiest way is to search Google with carefully chosen keywords. A second option is to find the required package in the list of packages on CRAN.

Installing a package

You can install packages in R in two ways.

1. An automated process within R (recommended, but requires an internet connection during installation)

2. Manually install using ZIP files from CRAN website. (Download and install later)

1. Automated installation

A package may work only if some other packages are installed along with it. If you choose to install using the automated process, all other dependent packages required will be downloaded and installed by itself.

Go to the Packages menu and choose “Install Packages”. R will ask you to choose a CRAN mirror. Choose a location close to you.

List of packages available in the CRAN mirror will be displayed. Choose the required package and then press OK. Patiently wait for R to download and install all the packages.
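The same automated installation can be done from the R console with install.packages(). By default, it also installs the packages that the requested package depends on (Depends, Imports and LinkingTo). The helper install_if_missing() below is a hypothetical convenience, not part of R.

```r
# Install one package plus the packages it needs (requires internet):
# install.packages("pgirmess")

# Hypothetical helper: install only the packages that are not yet installed
install_if_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)  # downloads from the chosen CRAN mirror
  }
  invisible(missing)
}

install_if_missing(c("stats", "utils"))  # base packages: nothing to install
```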

2. Manual installation

For manual installation, download the required ZIP files from CRAN website (For Windows). The most important thing to remember here is to download ZIP files of all the dependent packages as well.

The following example shows the CRAN website details of pgirmess, a package with functions useful for data analysis in ecology. The website says that this package works only if six other R packages are installed along with it: boot, nlme, rgdal, sp, spdep and splancs. Hence you have to download the ZIP files of all these dependent packages and install them along with pgirmess.

Once the ZIP file is downloaded, go to the “Packages” menu in R and choose “Install package from local zip file”. R will ask for the location of the downloaded file on your computer. In this method of installation, R will not automatically identify the extra (dependent) packages required, so you have to download and install each of them manually, one by one.
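For the manual route, a quick check of which dependencies are still missing can be sketched as below, followed by a local installation of the downloaded ZIP files. The folder path is hypothetical. type = "win.binary" applies to Windows binary ZIP files.

```r
# Check which of pgirmess's dependencies (as listed above) are still missing
deps <- c("boot", "nlme", "rgdal", "sp", "spdep", "splancs")
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}
missing_packages(deps)  # download ZIP files for these before installing pgirmess

# Install every downloaded ZIP file from one folder in a single call.
# Hypothetical path: change it to wherever you saved the files.
zip_dir <- "C:/Users/yourname/Downloads/Rzips"
zip_files <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
if (length(zip_files) > 0) {
  install.packages(zip_files, repos = NULL, type = "win.binary")
}
```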


Draw maps in R

Use R to draw maps for your paper

Very often I have found researchers using Google Maps or low-resolution maps to show population distributions (or survey regions). They usually don’t look great in a research journal, at least when printed.

This is a step-by-step approach to drawing maps in R. I demonstrate how to use functions in the PBSmapping package of R. You can draw any map with two small pieces of R code. The following are required (assuming you have Windows):

1. R installed in your computer

2. PBSmapping – R package (and all its dependencies)

3. GSHHS Database file

I will skip step 1 and 2 since the tutorial for installations have been covered in my earlier posts (Install R, Install Package).

Import GSHHS Database

GSHHS refers to the “Global Self-consistent, Hierarchical, High-resolution Shoreline” database. It is a high-resolution shoreline data set maintained by NOAA, available to download for free. More details are available at: http://www.soest.hawaii.edu/wessel/gshhs/

The shorelines come in five resolutions:

  1. full resolution: Original data resolution.
  2. high resolution: About 80 % reduction.
  3. intermediate resolution: Another ~80 % reduction.
  4. low resolution: Another ~80 % reduction.
  5. crude resolution: Another ~80 % reduction.

They are each organized into 4 hierarchical levels:

  1. L1: boundary between land and ocean.
  2. L2: boundary between lake and land.
  3. L3: boundary between island-in-lake and lake.
  4. L4: boundary between pond-in-island and island.

STEP 1 (Download data)

Find the root directory of PBSmapping R package in your Windows. For example in my computer, the location is ‘C:\Users\Pazhayamadom\Documents\R\win-library\2.15\PBSmapping’

Download ‘GSHHS database in binary format’ (zip file) from the website.

Extract the contents into the root directory of PBSmapping.

The contents include the database files (data for the world map): gshhs_f.b, gshhs_h.b, gshhs_i.b, etc.

STEP 2 (Load PBSmapping package)

Use the following code to load the PBSmapping package in R:

library(PBSmapping)

STEP 3 (Import data into R)

To import the required data (for map), you should have

1. xlim= range of X-coordinates or Longitudes (for clipping). The range should be between 0 and 360.

2. ylim= range of Y-coordinates or Latitudes (for clipping).

3. maxLevel= maximum level of polygons to import: 1 (land), 2 (lakes on land), 3 (islands in lakes), or 4 (ponds on islands).
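Since xlim must lie between 0 and 360, longitudes that use negative values for the western hemisphere need converting first. A one-line sketch follows; the helper name is hypothetical. A region that crosses the 0° meridian would still need special handling.

```r
# Convert longitudes in -180..180 (west negative) to the 0..360 convention;
# in R, x %% 360 returns a non-negative remainder for a positive divisor
to_0_360 <- function(lon) lon %% 360

to_0_360(c(63, 93))    # India: unchanged
to_0_360(c(-80, -60))  # a western-hemisphere range becomes 280..300
```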

The importGSHHS( ) function can be used for importing data (demonstrated with Map of India)

polys <- importGSHHS("C:\\.....\\R\\win-library\\2.15\\PBSmapping\\gshhs_h.b" , xlim=c(63, 93) , ylim=c(5, 30.5) , maxLevel=4)

In the above function, there are 4 inputs or arguments separated by commas. The data file for the high-resolution map is specified as gshhs_h.b with its source path or location (use the gshhs_f.b file for full resolution). xlim gives the minimum and maximum longitudes required. Similarly, ylim gives the minimum and maximum latitudes required. maxLevel=4 means that data for lakes, islands and ponds are imported along with land.

This process may take some time to read required data from the file and you should be getting the following message:

importGSHHS status:
--> Pass 1: complete: 1493 bounding boxes within limits.
--> Pass 2: complete.
--> Clipping...

STEP 4 (Draw the map)

The final step is to plot the data which is now imported from the database file.

The plotMap ( ) function is used for this purpose.

plotMap (polys)

R has the option of saving pictures in JPEG, PNG, TIFF, Metafile, Postscript, BMP and PDF formats. I prefer Metafile for MS Word, PNG for LaTeX and Postscript for graphical works (the best choices in my experience).
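Instead of saving from the graphics window, a plot can be written straight to a file: open a file device, draw, then close the device. Below is a minimal sketch. save_plot() is a hypothetical helper, and the placeholder plot(1:10) stands in for plotMap(polys) so that the example runs without PBSmapping.

```r
# Hypothetical helper: 'draw' is any function that produces the plot,
# e.g. function() plotMap(polys)
save_plot <- function(file, draw, device = grDevices::pdf, ...) {
  device(file, ...)              # open the file-based graphics device
  on.exit(grDevices::dev.off())  # always close it, even if drawing fails
  draw()
  invisible(file)
}

# Placeholder plot so the example runs without PBSmapping or the GSHHS files
out <- save_plot(file.path(tempdir(), "map.pdf"), function() plot(1:10),
                 width = 7, height = 7)
file.exists(out)

# For a PNG (e.g. for LaTeX), assuming your R build supports png():
# save_plot("india_map.png", function() plotMap(polys), device = grDevices::png,
#           width = 2000, height = 2000, res = 300)
```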

There are many optional arguments (inputs) for the functions mentioned above. I leave this for you to explore. But a few examples are given below:

To shade the land (here in beige), use the plotMap() function with an extra argument col="beige".

plotMap (polys, col="beige")

If you would like the ocean (background) with a blue shade, use the following code:

plotMap (polys, col="beige", bg="lightblue")

You can add text, pointers and captions with other functions or extra arguments. Since they are beyond the scope of this post, please refer to the graph chapters in any R book.

Hope this was helpful. Thanks for reading the post.

P.S. – Have fun with changing the colour of your choice 🙂


Add Rivers and Borders

rivers <- importGSHHS("C:\\....\\PBSmapping\\wdb_rivers_f.b" , xlim=c(63, 93) , ylim=c(5, 30.5))
borders <- importGSHHS("C:\\....\\PBSmapping\\wdb_borders_f.b" , xlim=c(63, 93) , ylim=c(5, 30.5))

Here we import the database for rivers and borders (wdb_rivers_f.b and wdb_borders_f.b ).

addLines (rivers, col="lightblue")
addLines (borders, col="red")

The above functions add rivers and borders to the map. They work only if the map has already been plotted using the plotMap() function.

Add text into the map

You can add labels into the map using text ( ) function. Again this will work only if the map is already plotted.

text (70, 10, "Arabian Sea")

Here 70 and 10 are the x and y coordinates (longitude and latitude) on the map / graph.