1. Statistical Techniques To Deal With Data
Data Collection
The first activity in any statistical analysis project is collecting relevant data. Data is
gathered from two kinds of sources: primary and secondary. Primary data is collected directly
by the researcher, while secondary data is obtained from outside sources that have already
collected it. Primary data is therefore original, whereas secondary data is not. Primary
sources include surveys, observations, and experiments. Secondary sources include internal
records and government-published data.
Data Categorization And Classification
Categorization means organizing the data so that insights can be drawn from it. Basic
insights can be obtained simply by listing the values in an ordered array.
For example, suppose we have the heights of 10 people:
160cm, 165cm, 155cm, 190cm, 177cm, 181cm, 179cm, 185cm, 159cm, 173cm
Arranged in an ordered array, this data looks like:
155cm, 159cm, 160cm, 165cm, 173cm, 177cm, 179cm, 181cm, 185cm, 190cm
The ordered array shows at a glance that 155cm is the shortest height and 190cm is the tallest.
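The ordering step above can be sketched in a few lines of Python. This is only an illustration using the ten heights from the example; the variable names are assumptions chosen for readability.

```python
# Heights in centimetres, taken from the example above.
heights_cm = [160, 165, 155, 190, 177, 181, 179, 185, 159, 173]

# An ordered array is the same set of values sorted in ascending order.
ordered = sorted(heights_cm)

shortest = ordered[0]   # first element of the ordered array
tallest = ordered[-1]   # last element of the ordered array

print(ordered)
print(f"shortest: {shortest} cm, tallest: {tallest} cm")
```

Once the values are sorted, the minimum and maximum can be read directly from the two ends of the array.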
Data classification is the grouping of related facts/data into categories according to
shared features. It condenses the data so that similarities and dissimilarities become easy
to see, and it helps reveal relationships between groups. Classification can be done on the
following bases:
• Geographical
• Chronological
• Qualitative
• Quantitative
• Geographical classification
Data is classified on the basis of geographical location. For example, classifying colleges
based on the state they belong to.
• Chronological classification
Data is divided on the basis of time. For example, babies born in a hospital in the current
year and in the previous year.
• Qualitative classification
Data is grouped on the basis of attributes. For example, classifying people based on
area, gender, and literacy.
• Quantitative classification
Data is organized into quantitative class intervals. For example, classifying individuals
based on their annual income.
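The four bases of classification can be illustrated with a small Python sketch. The records below are hypothetical and invented purely for illustration, as are the field names, the helper functions `classify` and `income_class`, and the class width of 300,000.

```python
from collections import defaultdict

# Hypothetical records, invented only to illustrate the four bases.
people = [
    {"name": "A", "state": "State X", "birth_year": 2023, "gender": "F", "annual_income": 250_000},
    {"name": "B", "state": "State X", "birth_year": 2024, "gender": "M", "annual_income": 480_000},
    {"name": "C", "state": "State Y", "birth_year": 2023, "gender": "F", "annual_income": 720_000},
    {"name": "D", "state": "State Y", "birth_year": 2024, "gender": "M", "annual_income": 310_000},
]

def classify(records, key_func):
    """Group record names into classes using the label returned by key_func."""
    groups = defaultdict(list)
    for record in records:
        groups[key_func(record)].append(record["name"])
    return dict(groups)

def income_class(record, width=300_000):
    """Return the class interval (as a label) that contains the record's income."""
    lower = (record["annual_income"] // width) * width
    return f"{lower}-{lower + width - 1}"

by_state = classify(people, lambda r: r["state"])        # geographical
by_year = classify(people, lambda r: r["birth_year"])    # chronological
by_gender = classify(people, lambda r: r["gender"])      # qualitative
by_income = classify(people, income_class)               # quantitative

for label, groups in [("state", by_state), ("year", by_year),
                      ("gender", by_gender), ("income", by_income)]:
    print(label, groups)
```

The same grouping helper serves all four bases. Only the function that assigns a class label to each record changes.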
Data Presentation
Data is commonly presented as a frequency distribution, in which the data is split into
mutually exclusive classes and the number of observations (the frequency) in each class is
reported.
Constructing a frequency distribution involves:
• Determining the question to be addressed
• Collecting raw data
• Organizing data (frequency distribution)
• Presenting data (Histogram)
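As a minimal sketch of the organizing and presenting steps, the Python snippet below builds a frequency distribution from the ten heights used earlier in this section. The class width of 10 cm and the text-based histogram are assumptions made for illustration, not a prescribed method.

```python
from collections import Counter

# Raw data: the ten heights (in cm) from the ordered-array example.
heights_cm = [160, 165, 155, 190, 177, 181, 179, 185, 159, 173]

CLASS_WIDTH = 10  # assumed class width in cm

def class_lower_bound(value, width=CLASS_WIDTH):
    """Return the lower bound of the class interval containing value."""
    return (value // width) * width

# Count how many observations fall into each mutually exclusive class.
counts = Counter(class_lower_bound(h) for h in heights_cm)

# Frequency table as (class label, frequency) pairs, in ascending order.
frequency_table = [
    (f"{lower}-{lower + CLASS_WIDTH - 1}", counts[lower])
    for lower in sorted(counts)
]

# A crude text histogram: one asterisk per observation.
for label, freq in frequency_table:
    print(f"{label} cm | {'*' * freq} ({freq})")
```

Each height falls into exactly one class, so the frequencies add up to the number of observations.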
For example, assume you are looking for prospective clients for your new product, an
electric bike. You want to target IT employees at certain locations in your area. From past
experience, you know that people who travel up to 10 km to their offices every day are more
likely to buy this product. Because reaching every employee in the IT park would be very
costly, you decide to run a pilot survey to get an idea of the prospective market for your
product there. You engage an executive to ask every employee arriving at the office in the
morning how much time they need to reach the office. This data can then be used to estimate
the number of potential customers for your product.