Statistical Data / Variables – Introduction
(Classification of Statistical Data / Variable – Numeric vs Categorical)
What is ‘data’ or ‘variable’?
Ø Data is a set of values of qualitative or quantitative variables.
Ø In biostatistics (also in statistics) data are the individual observations.
Ø Scientific investigations involve observations on variables.
Ø The observations made on these variables are obtained in the form of ‘data’.
Ø A variable is a quantity or characteristic which can ‘vary from one individual to another’.
Ø Example: Consider the characteristic ‘weight’ of individuals and let it be denoted by the letter ‘N’. The value of ‘N’ varies from one individual to another and thus, ‘N’ is a variable.
Ø The terms ‘data’ and ‘variable’ are not exact synonyms, but they are frequently used interchangeably.
Ø Variables can also be called ‘data items’.
Ø Most statistical analyses are performed on variables.
Type of Variables in Statistics
Statistical variables can be classified based on two criteria: (I) Nature of Variables and (II) Source of Variables.
I. Classification of variable based on Nature of Variables
Ø Based on the nature of variables, statistical variables can be classified into TWO major categories: (1) Numerical and (2) Categorical.
Ø The classification chart of variables is given below:
(1). Numerical Variable
Ø Numerical variables are the measurable or countable variables.
Ø They are better called quantitative variables because they give quantitative data.
Ø Example: plant height, fruit weight, crop yield, number of petals, seeds or leaves in a plant etc.
Ø Numerical variables are further categorized into (a) Discrete variables and (b) Continuous variables.
(a) Discrete variables:
Ø Discrete variables are also called discontinuous variables.
Ø Here, the values which variables can assume are limited to whole numbers only (0, 1, 2, 3 etc.).
Ø There will be ‘gaps’ between the successive values of the variable.
Ø Example: Consider the number of petals in a flower as a discrete variable X. In a real situation, the number of petals in a flower may be 4 or 5 or 6 or any other whole number. There will not be a value such as 5 ½ petals or 4.2 petals. Such variables are called discrete variables or discontinuous variables.
Ø Example: number of brothers, number of petals etc.
(b) Continuous variables
Ø Continuous variables are those that can take any value within a certain range.
Ø There are NO ‘gaps’ between the successive values of the variable.
Ø Example: Consider the height of a plant as the variable X. In a real situation, the height of a plant may be 10 cm, 10.1 cm, 10.5 cm, 10.8 cm, 11 cm etc. Thus, between two whole numbers (here 10 and 11), there are numerous possible values. Such a variable is called a continuous variable.
Ø Examples: height, weight, length, speed etc.
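The difference between discrete and continuous variables can be illustrated with a short Python sketch. The observations are made-up illustrative values, and the helper name `is_discrete` is hypothetical. The whole-number check is only a rough heuristic: a continuous variable recorded to the nearest whole unit would also pass it.

```python
# Illustrative (made-up) observations, not real measurements.
petal_counts = [4, 5, 6, 5, 4]                      # counted  -> discrete
plant_heights_cm = [10.0, 10.1, 10.5, 10.8, 11.0]   # measured -> continuous


def is_discrete(values):
    """Rough heuristic: every observation is a whole number."""
    return all(float(v).is_integer() for v in values)


print(is_discrete(petal_counts))      # True
print(is_discrete(plant_heights_cm))  # False
```

In practice the type of a variable is decided by how it is obtained (counting versus measuring), not by inspecting the recorded values alone.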
(2). Categorical Variable
Ø Categorical variables cannot be measured numerically; they describe a quality or category.
Ø They are also called non-numerical or qualitative variables since they give qualitative data.
Ø Example: colour of flower, shape of leaves, shape of seeds etc.
Ø Categorical variables are further classified into (a) Nominal variables and (b) Ordinal variables.
(a). Nominal Variables:
Ø Nominal variables have distinct levels that have NO inherent ordering.
Ø Example: Hair colour (white, black, brown etc.), gender (male and female).
Ø In statistics, nominal measurement means assigning each observation to a named category; the categories can then be counted (example: gender of employees in an office: male 20, female 28).
(b). Ordinal Variables:
Ø Ordinal variables have levels that follow a distinct order.
Ø Example: The degree of change in fever patients after antibiotic treatment (such as: vast improvement, moderate improvement, no change, death).
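The practical difference between nominal and ordinal variables can be sketched in Python. Nominal categories can only be counted, whereas ordinal categories can also be ranked. The gender counts are taken from the example above; the list of patient outcomes and the names `outcome_order` and `rank` are hypothetical and used only for illustration.

```python
from collections import Counter

# Nominal: categories with no inherent order, so we can only count them.
# Counts follow the office example (male 20, female 28).
gender = ["male"] * 20 + ["female"] * 28
gender_counts = Counter(gender)
print(gender_counts["male"], gender_counts["female"])  # 20 28

# Ordinal: categories with a meaningful order, listed from worst to best.
outcome_order = ["death", "no change", "moderate improvement", "vast improvement"]
rank = {level: position for position, level in enumerate(outcome_order)}

# Hypothetical patient outcomes, for illustration only.
outcomes = ["no change", "vast improvement", "death", "moderate improvement"]

# Because the levels are ordered, sorting by rank is meaningful.
sorted_outcomes = sorted(outcomes, key=rank.__getitem__)
print(sorted_outcomes)
```

Sorting the gender labels in the same way would have no statistical meaning, because no level of a nominal variable ranks above another.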
II. Classification of variable based on Source of Variables
Ø Based on the source of data (variables), the data can be classified into (a) Primary Data and (b) Secondary Data
(a). Primary Data
Ø The data originally collected in the process of investigation by the investigator is called primary data.
Ø Primary data are generally considered more accurate and uniform than secondary data.
Ø Primary data involves the supervision of the investigator.
Ø Primary data collection is time-consuming and labour-intensive.
Ø Biological studies, particularly experimental studies, primarily depend on primary data.
(b). Secondary Data
Ø Secondary data is the data collected by some other person or organization for their own use.
Ø It is data that is already in existence and was collected for a purpose other than answering the question in hand (Blair M.M.).
Ø Secondary data are usually data published by the primary investigator.
Ø Getting the secondary data is advantageous since it is less expensive and less time consuming.
Ø Secondary data is frequently used in disciplines such as economics, commerce, agriculture, public health etc.
Ø Example: population census data, national mortality rate, annual rainfall, budget records etc.
Ø Research results published in reputed journals can also act as secondary data.
Sources of Secondary Data
Ø Published sources are an excellent and frequently used source of secondary data.
Ø These are the records published or maintained by government and non-governmental agencies such as the department of census, department of statistics, health department, and agriculture and fisheries department. Official publications of UN, WHO, UNEP, UNESCO etc. are also good sources of secondary data.
Ø Important sources of secondary data are summarized below:
(a). International publications: These are the regular or occasional reports of international organizations such as UN, WHO, WWF, IMF (International Monetary Fund) etc.
(b). Official publications of the state and central government: These are the publications by the state or central government on current issues or regular periodic reports. Example: Census of India, Reserve Bank Bulletin, Report on Currency and Finance etc.
(c). Committee reports: These are the reports of enquiry commissions appointed by the government. Example: Madhav Gadgil committee report, Kasturirangan committee report etc.
(d). Newspapers and magazines: These are the important review reports and articles published in reputed newspapers and magazines.
(e). Research scholars: These are the reports or results of previous research published in reputed journals.
(f). Semi-official publications: These are the publications by the semi-governmental organizations such as municipalities, provinces etc.
Ø Apart from published data, some genuine but unpublished data can also be used as a source of secondary data, with great caution.
Care to be taken before using secondary data
Ø Before using secondary data, the investigator must enquire about the following aspects of the data:
$ The reliability of the data.
$ The competency of the individual (or organization) who collected the data.
$ The suitability of the data for the particular study.
$ The quantity of the data.
$ The time of data collection (whether the data is recent or very old).
$ The original method adopted for the collection of data.