Author Archives: Deepak George Pazhayamadom
A Google search for ‘Potential Fishing Zones’ returns several links, and most of them (well, all) refer to Indian fisheries. To my knowledge, no other country currently uses remotely sensed data to provide daily fishing locations that fishermen can target to improve their profitability (by reducing the time and effort spent searching for fish shoals). These locations are provided by INCOIS (Indian National Centre for Ocean Information Services), which uses satellite data (chlorophyll concentration and sea surface temperature) to map areas of fish aggregation (see the NEWS). The method inherently assumes that fish aggregate in areas of high phytoplankton density (the probability is high for small pelagic fishes since they feed directly on phytoplankton, thus skipping many levels of the trophic pyramid; see here). Since 1999, many authors have tested this assumption and reported higher Catch Per Unit Effort (CPUE) at the PFZs advised by INCOIS (see here, here and here).
Pros and Cons
The PFZ advisory mechanism in India is one of the finest examples for other countries to follow because it has a direct impact on the socio-economic status of fishermen. I highlight this because there has been much recent criticism of spending taxpayers' money on research areas where no short-term benefits are visible to the public. Having said that, there is increasing evidence of declining fish populations all over the world (due to overfishing and possibly ‘climate change’). In that context, providing advice on PFZs is good for neither the fish population nor the fishing community (unless the fish population is highly abundant).
State of Indian fisheries
Based on global definitions, most fish stocks along the Indian coast can be considered ‘data-limited’ or ‘data-poor’ because the information available is limited to a time series of fish landings (note that landings are not the actual amount of fish harvested from the population because a good proportion is discarded back to sea). Hence a fully quantitative analysis or stock assessment is not possible to determine whether a fish stock is underfished, sustainably fished, overfished or collapsed. A few fish stocks have length-frequency data, but the analysis is limited to methods described in the FAO technical paper (i.e., Introduction to tropical fish stock assessment by Per Sparre and Venema 1998). Moreover, reports on fish stock assessments appear only every 5-10 years (on average), which means that the risk to the stock is not monitored on a per annum basis (even though the fish landings are monitored annually). The country also lacks the expertise (more than the data itself) to develop or run the more complex and efficient analytical models (e.g. integrated size-structured models) that are currently used for length-based fish stock assessments elsewhere. In a nutshell, there is no information on the state of fisheries (underfished, overfished or collapsed) even for the well-studied fish stocks along the Indian coast.
PFZs – are they too risky for Indian fisheries?
It is risky to advise hot fishing spots when the abundance of the fish populations is not known. More importantly, shoal-forming pelagic species such as Indian Oil Sardine (Sardinella longiceps) and Indian Mackerel (Rastrelliger kanagurta) are potentially more vulnerable to PFZ implementation because they feed directly on plankton. These species are short-lived and support recruitment-driven fisheries, but estimates of Spawning Stock Biomass (the biomass of mature fish in the population) are not available (at least to the public) due to the lack of annual fish stock assessments. Even if they were available, the only regulatory mechanism to control overfishing is the monsoon trawl ban (trawl fishing is closed between June-August for a fixed period of 45 days during the breeding season), and the length of this ‘closed season’ does not change with the abundance of the fish populations.
Are PFZs working?
There is no information on whether PFZs are actually benefiting the fishermen (or depleting the fish stocks). All research so far has focused on improving the identification of PFZs (remote sensing) and validating them with fisheries data (using CPUE) through research vessel surveys. There are no reports on whether fishermen currently use the PFZ advisories to improve their profit (or whether they rely on PFZ fishing spots at all!). This is a potential area of research for a Masters or PhD thesis.
P.S: Why not ‘Potential No-take Zones (PNZs)’ instead?
Here are a few things I learned after publishing my own research papers. I recommend these to students who are ready to embark on a research career. Studies have shown that publishing research articles is the only route to success in academia (see “Publish or Perish“).
Writing research papers can be extremely daunting. It is quite normal to get a rejection or major revision for your first paper. If there are constructive comments from the reviewer, you still have a chance to get it published. Some useful hints are:-
Well, the answer is not an absolute NO. You do need a few observations, but a long time series is not necessary. My recent paper discusses the potential application of the Self-Starting CUSUM in fisheries research. The paper investigates, “Whether a fishery can be managed if no historical data are available?” The fish cartoon depicted above is a symbolic representation to highlight the key messages.
1. Number of samples: A minimum of three observations could help detect a trend.
2. Reference point: A reference point (a value indicating ‘OK’) is not required to detect a trend.
3. Large fish indicator: Large fish in the population indicate the health of the fish stock.
What is a CUSUM?
CUSUM refers to the Cumulative Sum control chart in ‘Statistical Process Control (SPC)’ theory. Control charts are used for monitoring and help detect whether the observations in a time series are deviating consistently from a desired level of performance. The control chart is the graphical tool of SPC and its simplest version is the ‘Shewhart’ chart. In a Shewhart chart, the data are monitored by checking whether the observations cross an upper or lower limit (red lines). The example demonstrated below shows pH observations in an aquarium where the desired level of performance (or reference point) is pH=7, shown as a blue line (neither acidic nor alkaline).
The CUSUM control chart is a modified version of the Shewhart chart: it computes the deviation of each observation from the reference point and accumulates these deviations up to the latest observation. Hence, CUSUM has the advantage of detecting a small and gradual shift (or trend) in the time series soon after it occurs (see the graph below).
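The cumulative-sum idea can be sketched in a few lines of R. The pH readings below are made up for illustration, and the sketch only accumulates raw deviations from the reference point; it does not include the control limits or allowance terms that a full CUSUM scheme would normally add.

```r
# Hypothetical pH readings from an aquarium (illustrative values only)
ph <- c(7.0, 7.1, 6.9, 7.0, 7.2, 7.1, 7.3, 7.2, 7.4, 7.3)
reference <- 7                 # desired level of performance (pH = 7)

deviation <- ph - reference    # how far each reading is from the reference
cusum <- cumsum(deviation)     # running total of those deviations

# A steadily rising (or falling) line signals a persistent shift
plot(cusum, type = "b", xlab = "Observation", ylab = "CUSUM")
abline(h = 0, lty = 2)
```

Individual readings here never stray far from 7, yet the cumulative sum climbs steadily because most deviations have the same sign.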
What is a Self-Starting CUSUM?
The SS-CUSUM works on the same principle, but a reference point is not used in the computation, i.e., you do not have to specify that the desired level of performance is pH=7 (and hence you do not need that information to detect a trend). In SS-CUSUM, a ‘running mean’ is used instead of the reference point. The running mean is estimated from the data itself and updated whenever new observations are added to the time series. In the long term, the running mean approaches the reference point and, as a result, the SS-CUSUM looks similar to a basic CUSUM (see the graphs below).
Obviously, the running mean will drive you bananas if the initial observations are not close to the reference point. However, SS-CUSUM is an excellent method if the user is confident about the distributional properties of the data, i.e., that the initial observations are not outliers.
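A simplified sketch of the running-mean idea follows, using made-up pH readings. It is an assumption-laden illustration rather than the method used in the paper: published self-starting CUSUMs typically also scale each deviation by a running standard deviation, which is left out here to keep the focus on the running mean.

```r
# Hypothetical pH readings (illustrative values only)
ph <- c(7.0, 7.1, 6.9, 7.0, 7.2, 7.1, 7.3, 7.2, 7.4, 7.3)
n <- length(ph)

# Mean of observations 1..i, updated as each new observation arrives
running_mean <- cumsum(ph) / seq_len(n)

# Compare observation i with the mean of the observations BEFORE it;
# the first observation has nothing to be compared with, so it gets 0
deviation <- c(0, ph[-1] - running_mean[-n])

ss_cusum <- cumsum(deviation)  # no reference point was needed
round(ss_cusum, 2)
```

Because the running mean is built from the earliest observations, an outlier among them distorts every later deviation, which is the weakness described above.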
What is the story of my paper?
In the context of a data-poor situation, my paper investigated ‘Whether SS-CUSUM can be used for monitoring indicators such as mean length, mean weight etc. so that a change in the state of the fish stock can be detected as early as possible?’. The performance of SS-CUSUM is displayed below in the form of ‘Receiver Operating Characteristic’ (ROC) curves. The closer the apex of the ROC curve is to the upper left corner, the better the performance of SS-CUSUM in detecting a change in the state of the stock.
The new tool in TFSA version 1.1 is the descriptive function. It is designed for exploring data quickly with little R programming. However, the function assumes that the data are ‘continuous’, not ‘discrete’. The function can do three important things. They are:-
1. The function returns the following values in the form of a data frame:
- Total number of observations
- Standard deviation
2. The function can handle categorical data, say if the observations are measured from different locations or classified based on sex.
3. The descriptive information can be visualized in the form of histogram, box plot or dot plot.
Some examples are provided below using the fishgrp data which comes with the TFSA package (if you want to use your own data, import them into R first. How to do this? Click). First tell R to load the TFSA package.
How to install R? (Check my old post)
Now load the fishgrp data from TFSA
Now view the structure of fishgrp data
This simply tells us that the data contain two attributes or factors (Stock and Sex) and one column of numerical observations (SL, Standard length). To view the whole data, type fishgrp and press the ENTER key. Now we will use the descriptive function in TFSA to see the descriptive statistics of Standard length (SL).
    descriptive(fishgrp$SL)
        N     Mean Median Minimum Maximum Variance  Std dev Range
    1 506 202.0435    193     127     299 926.9684 30.44616   172
The ‘$’ symbol is used in R to choose a variable (SL) from the data (fishgrp). The result shows a lot of information, e.g., the total number of observations is 506, the mean is 202.04, etc. For more details on the results, type ?descriptive and press the ENTER key.
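For readers without the TFSA package at hand, the same kind of summary can be sketched with base R. The function name describe, the simulated lengths and the two stock labels below are all hypothetical and are not part of TFSA; the sketch only mirrors the columns printed above.

```r
set.seed(42)                                   # reproducible made-up data
sl    <- round(rnorm(20, mean = 200, sd = 30)) # hypothetical standard lengths
stock <- rep(c("A", "B"), each = 10)           # hypothetical grouping factor

# Hypothetical helper returning one row of descriptive statistics
describe <- function(x) {
  data.frame(N = length(x), Mean = mean(x), Median = median(x),
             Minimum = min(x), Maximum = max(x),
             Variance = var(x), Std.dev = sd(x),
             Range = max(x) - min(x))
}

overall  <- describe(sl)                                   # whole sample
by_stock <- do.call(rbind, lapply(split(sl, stock), describe))  # one row per group
overall
by_stock
```

Note that the range is the maximum minus the minimum and the standard deviation is the square root of the variance, which matches the TFSA output above (299 - 127 = 172).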
To visualize the histogram, use the argument HISTOGRAM=TRUE
    descriptive(fishgrp$SL,groups=fishgrp$Stock,Boxplot=TRUE)
               N     Mean Median Minimum Maximum  Variance   Std dev Range
    Calcutta 133 183.3459    183     160     211  95.31887  9.763138    51
    Gujarat  113 197.9115    192     148     274 775.63496 27.850224   126
    Mumbai   119 232.9832    234     157     299 534.62683 23.121999   142
    Orissa   141 196.8794    187     127     266 996.27822 31.563875   139

    descriptive(fishgrp$SL,groups=fishgrp$Stock,division=fishgrp$Sex,Boxplot=TRUE)
                   N     Mean Median Minimum Maximum   Variance   Std dev Range
    F in Calcutta 59 185.0847  184.0     167     207   88.66511  9.416215    40
    M in Calcutta 74 181.9595  181.0     160     211   97.51888  9.875165    51
    F in Gujarat  52 194.3077  185.5     148     259  788.64857 28.082887   111
    M in Gujarat  61 200.9836  198.0     155     274  756.64973 27.507267   119
    F in Mumbai   65 232.8000  234.0     187     299  460.72500 21.464506   112
    M in Mumbai   54 233.2037  233.0     157     287  633.86338 25.176644   130
    F in Orissa   62 198.8387  196.5     127     266 1218.07192 34.900887   139
    M in Orissa   79 195.3418  184.0     152     251  830.15093 28.812340    99
Thanks for reading 🙂
The story behind
A couple of weeks back, I wanted to test some time series models with marine fish landings data of India. I knew that the Central Marine Fisheries Research Institute (CMFRI, India) publishes the estimated values in its annual reports. I also found the data published on their website (http://www.cmfri.org.in/annual-data.html), but compiling all of them manually was a tedious job.
I asked my friends in CMFRI for an electronic version of the data, but I was told I would have to pay for it. Since the data are already published, I thought there should be a way around this.
After a little research I found the XML R package. This package provides many approaches for both reading and creating XML (and HTML) documents, both local and accessible via HTTP or FTP. The function to read tables from HTML (webpages) is readHTMLTable("The URL").
1. You have to install the XML package first. (See this post for installing R Packages)
2. Now load the package using the library function
3. Read the data from the CMFRI website and name it ‘mydata’.
This will read the data and produce the tables in the form of a list. Following is the output in R.
4. So I cropped the data using the following code:
The output is shown below:
Following is a snap shot of the CSV file opened in MS Excel.
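The steps above can be sketched as follows. To keep the example runnable offline, it parses a tiny stand-in page with made-up numbers instead of the CMFRI website; the table contents, the choice of the first table and the output file name are all assumptions made for illustration.

```r
library(XML)   # provides htmlParse() and readHTMLTable()

# A tiny stand-in web page (made-up numbers), so the example runs offline
html <- "<html><body><table>
  <tr><th>Year</th><th>Landings</th></tr>
  <tr><td>2010</td><td>100</td></tr>
  <tr><td>2011</td><td>120</td></tr>
</table></body></html>"

doc    <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)  # a list, one item per table
mydata <- tables[[1]]                                   # keep only the table we need

# Save the cropped table as a CSV file that MS Excel can open
out_file <- file.path(tempdir(), "landings.csv")        # hypothetical file name
write.csv(mydata, out_file, row.names = FALSE)
```

For a live page, the URL can be passed to readHTMLTable() directly, as described above; which list element holds the wanted table then has to be checked by inspecting the output.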
So that is how I managed to get data from the website with no manual errors and by not paying for it. Thanks for reading. Happy Coding…
Sometimes the distribution of a model's output is easier to understand when it is drawn as a shaded graph with percentiles as the boundaries. Following are two graphs, a dirty and a neat example of the Von Bertalanffy Growth Model, showing the growth of fish with increasing age.
Computing the percentiles from data
Suppose we have data from 50 fish, 10 each from Age 1 to Age 5:
    mydata
              Age1     Age2      Age3      Age4     Age5
     [1,] 35.63813 59.81503  97.39460 102.23921 134.7052
     [2,] 32.10304 68.34917  86.97411  94.06250 118.3878
     [3,] 35.93959 61.62869  88.71316 113.95664 127.2596
     [4,] 30.73885 56.36995  95.56985 105.57710 129.5275
     [5,] 29.88816 59.83590  79.20284 113.07792 120.7841
     [6,] 31.16976 48.10858  84.04000  80.52727 136.6355
     [7,] 30.71656 47.70394  82.34205  87.58575 123.5584
     [8,] 31.05663 63.18036 101.23041 108.09523 153.9219
     [9,] 34.99453 57.75032  96.64804 102.09258 122.3556
    [10,] 32.40417 57.13938  86.48527  95.42354 121.9671
Following is a code to generate a similar data as above in R:
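One way to simulate data of this shape is sketched below. The von Bertalanffy parameters (Linf, K, t0) and the 8 % coefficient of variation are assumptions chosen only so that the numbers resemble the table above; they are not estimates from real fish.

```r
set.seed(1)                      # reproducible random numbers
ages <- 1:5

# Hypothetical von Bertalanffy growth parameters
Linf <- 250; K <- 0.15; t0 <- 0
mean_length <- Linf * (1 - exp(-K * (ages - t0)))

# 10 fish per age; the spread grows with mean length (assumed CV of 8 %)
mydata <- sapply(mean_length, function(m) rnorm(10, mean = m, sd = 0.08 * m))
colnames(mydata) <- paste0("Age", ages)
mydata
```

The result is a 10 x 5 matrix with one column per age, the same layout as the data shown above.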
The quantile() function in R can be used to compute the percentiles. To find the percentiles column-wise (for each age), use the sapply() function:
    mynewdata<-sapply(as.data.frame(mydata),FUN=quantile)
    mynewdata
             Age1     Age2      Age3      Age4     Age5
    0%   29.88816 47.70394  79.20284  80.52727 118.3878
    25%  30.81829 56.56231  84.65132  94.40276 122.0642
    50%  31.63640 58.78268  87.84363 102.16590 125.4090
    75%  34.34694 61.18049  96.37849 107.46570 133.4108
    100% 35.93959 68.34917 101.23041 113.95664 153.9219
Shading between percentile boundaries
A polygon is drawn between the boundaries and a colour is specified for shading the region. This needs the XY co-ordinates in the right order (clockwise or anti-clockwise). Here the X-axis is the age of the fish and the Y-axis holds the percentile boundaries.
To draw a polygon that shades the region between the 25% and 75% boundaries, we have to extract the Y values from the percentile table (rows 2, 3 and 4 of mynewdata).
    mynewdata25<-mynewdata[2,]
    > mynewdata25
         Age1      Age2      Age3      Age4      Age5
     30.81829  56.56231  84.65132  94.40276 122.0642
    mynewdata50<-mynewdata[3,]
    > mynewdata50
         Age1      Age2      Age3      Age4      Age5
     31.63640  58.78268  87.84363 102.16590 125.4090
    mynewdata75<-mynewdata[4,]
    > mynewdata75
         Age1      Age2      Age3      Age4      Age5
     34.34694  61.18049  96.37849 107.46570 133.4108
Now we can first plot the graph using the 50% data, i.e., mynewdata50.
Now add the shading between the boundary percentiles to the graph using the polygon() function:
polygon(X co-ordinates, Y co-ordinates, colour, border)
The rev() function in R can be used to reverse a vector:
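Putting these pieces together, a minimal sketch of the shaded plot could look like this. The simulated lengths, the grey colour and the axis labels are assumptions for illustration; the key point is that the X values run forward along the lower boundary and backward (via rev()) along the upper boundary so the polygon closes in a consistent direction.

```r
set.seed(1)
ages <- 1:5

# Made-up lengths: 10 fish per age, just to have something to plot
mydata <- sapply(ages, function(a) rnorm(10, mean = 30 * a, sd = 3 * a))
colnames(mydata) <- paste0("Age", ages)

mynewdata <- sapply(as.data.frame(mydata), FUN = quantile)
lower  <- mynewdata["25%", ]
middle <- mynewdata["50%", ]
upper  <- mynewdata["75%", ]

# Empty plot first, so the shading can be drawn underneath the median line
plot(ages, middle, type = "n", ylim = range(mynewdata),
     xlab = "Age", ylab = "Length")

# Forward along the 25% boundary, then back along the 75% boundary
polygon(x = c(ages, rev(ages)), y = c(lower, rev(upper)),
        col = "grey80", border = NA)
lines(ages, middle, lwd = 2)
```

Drawing the median line after the polygon keeps it visible on top of the shaded band.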
R is free statistical software.
Choose “Download R for Windows”.
You will reach a webpage like the one in the following picture. Choose the “base” link from this page.
Download the latest version of the R executable file and open it by double-clicking.
Run the installer to install R on your system.
It may later ask whether to install the 32-bit or 64-bit version, depending on the version of Windows installed on your computer. Usually 64-bit Windows is faster than 32-bit.
Once installation is done, you will see the R icon on the desktop or in the program folder. Double-click this R icon to start working.
Many basic functions in R analyse data only if they are in the form of a “Dataframe” (check my chapter about Data Types in R). It is important to understand the arrangement of rows and columns while preparing your data for analysis.
This is not specific to R: most analytical software works only if the data are in the format discussed below:
1. Open a fresh file of MS Excel to prepare data.
2. Enter data only on one sheet in MS Excel.
3. Enter each variable in its own column.
4. Enter each observation in a separate row.
5. Enter ‘NA’ for missing values or empty observations. Some people use zero instead, but a zero does not mean the observation is empty. However, zero can be used in presence-absence data where presence is represented by ‘1’ and absence by ‘0’ (e.g. Infected by virus or not?).
6. Delete all empty columns and rows that were used at least once before (very important).
7. Save the MS Excel sheet as a .txt or .csv file (CSV stands for Comma Separated Values).
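As a small illustration of why ‘NA’ matters, the sketch below writes a three-row CSV file with made-up values to a temporary folder and reads it back. The column names and numbers are hypothetical.

```r
# Write a tiny made-up CSV file so the example is self-contained
csv_file <- file.path(tempdir(), "fish.csv")
writeLines(c("Location,Sex,TL,BW",
             "Mumbai,1,210,95.2",
             "Cochin,2,NA,88.0",
             "Madras,3,198,NA"), csv_file)

fish <- read.csv(csv_file, na.strings = "NA")  # 'NA' cells become missing values
str(fish)                      # one column per variable, one row per fish

colSums(is.na(fish))           # number of missing values in each column
mean(fish$TL, na.rm = TRUE)    # NA is skipped; a 0 would have pulled the mean down
```

Had the missing total length been entered as 0, the mean would have been computed over three fish instead of two and would be badly biased.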
Download the example data from here: Click
The data is in a CSV (Comma Separated Values) file and can be opened in any text editor or MS Excel.
The attributes or qualitative variables in this data are Location, Sampling period and Sex (categorical data). The data have a few quantitative variables, i.e., Total Length (TL), Standard Length (SL), Fork Length (FL) and Body Weight (BW) of a fish species. Both qualitative and quantitative variables are arranged in columns. Each fish is an observation and hence the details of each fish are entered in a separate row.
1. The categories in ‘Location’ are the four places from which the fish were collected (Mumbai, Madras, Calcutta and Cochin).
2. The categories in ‘Sampling period’ are 1 and 2, representing two months, i.e., October and February. This was done intentionally since analysis is much easier with numbers than with text.
3. The categories in ‘Sex’ are Male, Female and Undeterminate. As mentioned before, it is much easier to analyse if the categories are represented by numbers: Male-1, Female-2 and Undeterminate-3.
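If the categories are stored as numbers, R can still be told what each code means. A minimal sketch with made-up codes, using the coding described above:

```r
# Hypothetical sex codes for five fish: 1 = Male, 2 = Female, 3 = Undeterminate
sex_code <- c(1, 2, 3, 2, 1)

# Convert the numeric codes into a labelled categorical variable
sex <- factor(sex_code, levels = 1:3,
              labels = c("Male", "Female", "Undeterminate"))

table(sex)   # counts per category, shown with readable labels
```

Declaring the column as a factor also stops R from treating the codes as quantities, so nobody accidentally computes a "mean sex" of 1.8.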
The base installation of R does not include functions for fitting models that are tailored to particular disciplines, e.g. the Yield per Recruit model in fisheries management. In such cases, you may need functions written and contributed by experts. These contributions are known as “Packages”. They are free to download, install and use.
Find a useful package
The easiest way is to search Google with carefully chosen keywords. The second option is to find the required package in the list on CRAN (Click Here).
Installing a package
You can install packages in R in two ways.
1. An automated process within R (recommended, but requires an internet connection during installation)
2. Manually install using ZIP files from CRAN website. (Download and install later)
1. Automated installation
A package may work only if some other packages are installed along with it. If you choose the automated process, all the dependent packages required will be downloaded and installed automatically.
Go to Packages menu and then choose “Install Packages”. R will ask you to choose the nearest CRAN mirror. Choose a location very close to your place.
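The same automated installation can also be run from the R console. The helper name install_if_missing below is hypothetical; install.packages() and its dependencies and repos arguments are standard R, and the repos value shown is the CRAN cloud mirror (any nearby mirror would do).

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))          # already installed, nothing to do
  }
  # dependencies = TRUE also fetches the packages that 'pkg' relies on
  install.packages(pkg, dependencies = TRUE, repos = repos)
  invisible(TRUE)
}

install_if_missing("stats")        # ships with R, so nothing is downloaded
# install_if_missing("pgirmess")   # would download pgirmess and its dependencies
```

The last line is commented out so that the example does not need an internet connection; remove the comment mark to perform a real installation.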
2. Manual installation
For manual installation, download the required ZIP files from the CRAN website (for Windows). The most important thing to remember here is to download the ZIP files of all the dependent packages as well.
The following example shows the CRAN website details of pgirmess, a package with functions useful for data analysis in ecology. The website says that this package works only if six other R packages are installed along with pgirmess, i.e., boot, nlme, rgdal, sp, spdep and splancs. Hence you have to download the ZIP files of all the dependent packages and install them along with pgirmess.
Once the ZIP file is downloaded, go to the “Packages” menu in R. Choose “Install package from local zip file”. It will ask you for the location of the downloaded file on your computer. In this method of installation, R won’t automatically identify the extra (dependent) packages required. You have to download and install each one manually.
Use R to draw maps for your paper
Very often I find researchers using Google Maps or low-resolution maps to show population distributions (or survey regions). These usually don’t look great in a research journal, at least when printed out.
This is a step-by-step approach to drawing maps in R. I demonstrate how to use functions in the PBSmapping package of R. You can draw any map with two small pieces of R code. The following are required (assuming you have Windows):
1. R installed in your computer
2. PBSmapping – R package (and all its dependencies)
3. GSHHS Database file
Import GSHHS Database
GSHHS refers to “Global Self-consistent, Hierarchical, High-resolution Shoreline”. It is a high-resolution shoreline data set maintained by NOAA and is available to download for free. More details are available at: http://www.soest.hawaii.edu/wessel/gshhs/
The shorelines come in five resolutions:
- full resolution: Original data resolution.
- high resolution: About 80 % reduction.
- intermediate resolution: Another ~80 % reduction.
- low resolution: Another ~80 % reduction.
- crude resolution: Another ~80 % reduction.
They are each organized into 4 hierarchical levels:
- L1: boundary between land and ocean.
- L2: boundary between lake and land.
- L3: boundary between island-in-lake and lake.
- L4: boundary between pond-in-island and island.
STEP 1 (Download data)
Find the root directory of the PBSmapping R package on your Windows computer. For example, on my computer the location is ‘C:\Users\Pazhayamadom\Documents\R\win-library\2.15\PBSmapping’
Download ‘GSHHS database in binary format’ (zip file) from the website.
Extract the contents into the root directory of PBSmapping.
The contents include database files (Data for world map) – gshhs_f.b, gshhs_h.b, gshhs_i.b ….etc.
STEP 2 (Load PBSmapping package)
Use the following code to load PBSmapping package in R
STEP 3 (Import data into R)
To import the required data (for the map), you need to specify:
1. xlim= range of X-coordinates or Longitudes (for clipping). The range should be between 0 and 360.
2. ylim= range of Y-coordinates or Latitudes (for clipping).
3. maxlevel= maximum level of polygons to import: 1 (land), 2 (lakes on land), 3 (islands in lakes), or 4 (ponds on islands).
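Because the longitudes are expected on a 0 to 360 scale, western longitudes written as negative numbers need converting first. A small sketch follows; the helper name to_360 is hypothetical, and the box around India is only a rough, assumed example.

```r
# Hypothetical helper: map longitudes from the -180..180 scale onto 0..360
to_360 <- function(lon) lon %% 360

to_360(c(-80, 68, 98))   # 80 degrees West becomes 280; eastern values are unchanged

# Rough clipping box around India (assumed values, for illustration only)
xlim <- to_360(c(68, 98))
ylim <- c(5, 38)
```

Eastern longitudes such as those for India pass through unchanged, so the conversion only matters for regions west of the Greenwich meridian.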
The importGSHHS( ) function can be used for importing data (demonstrated with Map of India)
In the above function, there are 4 inputs or arguments separated by commas. The data file for the high-resolution map is specified as gshhs_h.b with its source path or location (use the gshhs_f.b file for full resolution). xlim gives the minimum and maximum longitudes required. Similarly, ylim gives the minimum and maximum latitudes required. maxlevel=4 means that data for lakes, islands and ponds are imported along with the land.
Reading the required data from the file may take some time, and you should get the following message:
    importGSHHS status:
    --> Pass 1: complete: 1493 bounding boxes within limits.
    --> Pass 2: complete.
    --> Clipping...
STEP 4 (Draw the map)
The final step is to plot the data which is now imported from the database file.
The plotMap ( ) function is used for this purpose.
R has the option of saving pictures in JPEG, PNG, TIFF, Metafile, Postscript, BMP and PDF formats. I prefer Metafile for MS Word, PNG for LaTeX and Postscript for graphical work (the best choices in my experience).
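Pictures can also be saved from code rather than from the menu. A minimal sketch using the base R png() device follows; the file name, the image size and the simple scatter plot are placeholders, and the same pattern applies to a map by putting the plotting call between png() and dev.off().

```r
# Hypothetical output file in a temporary folder
out_file <- file.path(tempdir(), "example-plot.png")

png(out_file, width = 1600, height = 1200, res = 200)  # open the PNG device
plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "y")  # any plotting code
dev.off()                                               # close the device to write the file

file.exists(out_file)
```

A higher res value gives a sharper image for printing, at the cost of a larger file.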
There are many optional arguments (inputs) for the functions mentioned above. I leave this for you to explore. But a few examples are given below:
To shade the land (here in beige), use the plotMap() function with the extra argument col="beige".
plotMap (polys, col="beige")
If you would like the ocean (background) with a blue shade, use the following code:
plotMap (polys, col="beige", bg="lightblue")
Hope this was helpful. Thanks for reading the post.
P.S. – Have fun with changing the colour of your choice 🙂
Add Rivers and Borders
Here we import the databases for rivers and borders (wdb_rivers_f.b and wdb_borders_f.b).
The above function adds rivers and borders to the map. This works only if the map has already been plotted using the plotMap() function.
You can add labels to the map using the text() function. Again, this works only if the map has already been plotted.
text (70, 10, "Arabian Sea")
Here 70 and 10 are the x and y coordinates in the map / graph.