Organizing data for analysis in R
Many basic functions in R analyse data only when it is stored as a data frame (see my chapter on Data Types in R). It is therefore important to understand how rows and columns should be arranged while preparing your data for analysis.
This requirement is not unique to R; most analytical software works only if the data are in the format described below:
1. Open a new MS Excel file to prepare the data.
2. Enter data only on one sheet in MS Excel.
3. Enter each variable in its own column.
4. Enter each observation in a separate row.
5. Enter ‘NA’ for missing values or empty observations. Some people enter zero instead, but a zero is a recorded value and does not mean the observation is empty. Zero is appropriate, however, in presence-absence data, where presence is represented by ‘1’ and absence by ‘0’ (e.g. infected by a virus or not?).
6. Delete all empty columns and rows that were used at least once before (very important).
7. Save the MS Excel sheet as a .txt or .csv file (CSV stands for Comma Separated Values).
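The layout rules above can be sketched in R as follows. This is only an illustration: the column names (Fish, TL, Infected) and the values are made up for the example and are not taken from the example data.

```r
# Hypothetical measurements typed in the layout described above:
# one column per variable, one row per observation, NA for missing values.
csv_text <- "Fish,TL,Infected
1,12.5,0
2,NA,1
3,14.1,0"

# read.csv() returns a data frame; the string "NA" is treated as missing by default.
fish <- read.csv(text = csv_text)

class(fish)                   # the object is a data frame
is.na(fish$TL)                # flags the missing length in the second row
mean(fish$TL, na.rm = TRUE)   # missing values are skipped only when requested
sum(fish$Infected)            # with 0/1 coding, the sum counts the presences
```

Had the missing length been entered as 0 instead of NA, it would have been treated as a real measurement and pulled the mean down.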
Download the example data from here: Click
The data are in a CSV (Comma Separated Values) file, which can be opened in any text editor or in MS Excel.
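To see why any text editor can open such a file, here is a minimal sketch that writes a tiny CSV file and reads it back, first as raw text and then as a data frame. The file contents and column names are hypothetical stand-ins, not the downloaded example data; with the real file you would pass its path to read.csv() instead.

```r
# Hypothetical stand-in for the downloaded file, written to a temporary path.
path <- tempfile(fileext = ".csv")
writeLines(c("Location,TL,BW",
             "Mumbai,12.5,20.1",
             "Cochin,NA,18.4"), path)

readLines(path)          # the raw text: one line per row, commas between fields
fish <- read.csv(path)   # the same file parsed into a data frame
str(fish)                # column names and the type R assigned to each column
```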
The attributes, or qualitative variables, in this data are Location, Sampling period and Sex (categorical data). The data also have a few quantitative variables, i.e. Total Length (TL), Standard Length (SL), Fork Length (FL) and Body Weight (BW) of a fish species. Both qualitative and quantitative variables are arranged in columns. Each fish is an observation, and hence the details of each fish are entered in a separate row.
1. The categories in ‘Location’ are the four places from which the fish were collected (Mumbai, Madras, Calcutta and Cochin).
2. The categories in ‘Sampling period’ are 1 and 2, representing two months, i.e. October and February. This was done intentionally, since analysis is much easier with numbers than with the text itself.
3. The categories in ‘Sex’ are Male, Female and Undetermined. As mentioned before, analysis is much easier if the categories are represented by numbers: Male = 1, Female = 2 and Undetermined = 3.
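Numeric codes like these can be given their labels back inside R whenever readable output is needed. The sketch below uses factor() for this; the rows and the column name Period are hypothetical and only mimic the coding scheme described above.

```r
# Hypothetical rows using the numeric codes described above.
fish <- data.frame(
  Location = c("Mumbai", "Madras", "Calcutta", "Cochin"),
  Period   = c(1, 2, 1, 2),
  Sex      = c(1, 2, 3, 1)
)

# Attach the labels to the codes so R treats the columns as categories.
fish$Period <- factor(fish$Period, levels = c(1, 2),
                      labels = c("October", "February"))
fish$Sex <- factor(fish$Sex, levels = c(1, 2, 3),
                   labels = c("Male", "Female", "Undetermined"))

table(fish$Sex)       # counts per category, shown with readable labels
levels(fish$Period)   # the labels now attached to codes 1 and 2
```

Converting coded columns to factors also keeps R from treating the codes as measurements, for example by averaging them.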