1. Analytics interview notes
Definition of population and sample
A population is any specific collection of objects of interest. A sample is any subset or
subcollection of the population, including the case where the sample consists of the
whole population, in which case it is termed a census.
Types of data
1. Qualitative data
2. Quantitative data
Types of quantitative data
Discrete data: takes distinct, separate values (for example, whole numbers). It can be counted.
Continuous data: can take any value within a range. It can be measured.
Central tendency
Measured by the mean, median, and mode.
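As a quick illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The list `order_sizes` is made-up data chosen only for the example.

```python
import statistics

# Hypothetical sample: number of items in each of seven orders.
order_sizes = [2, 3, 3, 5, 7, 8, 14]

mean_value = statistics.mean(order_sizes)      # arithmetic average
median_value = statistics.median(order_sizes)  # middle value of the sorted data
mode_value = statistics.mode(order_sizes)      # most frequent value

print(mean_value, median_value, mode_value)
```

Here the mean (6) is pulled above the median (5) by the single large value 14, which is why the median is often preferred for skewed data.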
2. Cluster Analysis
An unsupervised technique.
Aims to decompose or partition a data set into clusters such that each cluster is
similar within itself but as dissimilar as possible to other clusters.
Inter-cluster (between-groups) distance is maximized and intra-cluster (within-group)
distance is minimized.
Differs from classification in that there are no predefined classes and no training data,
and the number of clusters may not even be known in advance.
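The idea of small intra-cluster distance and large inter-cluster distance can be checked numerically. The sketch below assumes Euclidean distance and two hand-made clusters; the points and the helper `mean_distance` are illustrative, not part of the notes.

```python
import math
from itertools import combinations, product

# Hypothetical 2-D points already split into two clusters.
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

def mean_distance(pairs):
    """Average Euclidean distance over an iterable of point pairs."""
    distances = [math.dist(p, q) for p, q in pairs]
    return sum(distances) / len(distances)

# Intra-cluster: pairs of points taken from the same cluster.
intra_a = mean_distance(combinations(cluster_a, 2))
intra_b = mean_distance(combinations(cluster_b, 2))

# Inter-cluster: every point of one cluster paired with every point of the other.
inter_ab = mean_distance(product(cluster_a, cluster_b))

print(intra_a, intra_b, inter_ab)
```

A good partition shows both intra-cluster averages well below the inter-cluster average, as is the case for these points.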
Steps in cluster analysis
Formulate the problem
Select a distance measure
Select a clustering procedure
Decide on the number of clusters
Interpret and profile clusters
Assess the validity of clustering
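The steps above can be sketched in code. The notes do not name a specific procedure, so this example assumes Euclidean distance as the distance measure, k-means as the clustering procedure, and k = 2 chosen by hand. The function `k_means`, its starting centroids, and the points are all illustrative.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # simplistic start: first k points (an assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical data with two visually obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Interpreting and profiling the clusters then amounts to inspecting `centroids` and the members of each label. Validity could be assessed by comparing intra-cluster and inter-cluster distances.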
3. Mention the differences between Data Mining and Data Profiling.
Data Mining:
Data mining is the process of discovering relevant information that has not been
identified before.
In data mining, raw data is converted into valuable information.
Data Profiling:
Data profiling is done to evaluate a dataset for its uniqueness, logic, and consistency.
It cannot identify inaccurate or incorrect data values.
Define the term 'Data Wrangling' in Data Analytics.
Data Wrangling is the process wherein raw data is cleaned, structured, and enriched
into a desired usable format for better decision making. It involves discovering,
structuring, cleaning, enriching, validating, and analysing data. This process can turn
and map out large amounts of data extracted from various sources into a more useful
format. Techniques such as merging, grouping, concatenating, joining, and sorting are
used to analyse the data. The data is then ready to be used with another dataset.
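A minimal sketch of these wrangling steps using only the Python standard library. The records in `raw_sales`, the `regions` lookup table, and all field names are made up for illustration. In practice a library such as pandas is commonly used for the same operations.

```python
from collections import defaultdict

# Hypothetical raw records with messy values.
raw_sales = [
    {"store": " north ", "amount": "120.50"},
    {"store": "South", "amount": "80"},
    {"store": "north", "amount": ""},        # missing amount
    {"store": "SOUTH", "amount": "40.25"},
]
regions = {"north": "Region A", "south": "Region B"}  # lookup table to join

# Cleaning and structuring: normalize names, parse numbers, drop invalid rows.
clean = []
for row in raw_sales:
    store = row["store"].strip().lower()
    if not row["amount"]:
        continue  # validation: skip records with no amount
    clean.append({"store": store, "amount": float(row["amount"])})

# Enriching (joining): attach the region from the lookup table.
for row in clean:
    row["region"] = regions[row["store"]]

# Grouping: total amount per store.
totals = defaultdict(float)
for row in clean:
    totals[row["store"]] += row["amount"]

# Sorting: stores ordered by total, largest first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```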
What are the different types of sampling techniques used by data analysts?
Sampling is a statistical method to select a subset of data from an entire dataset
(population) to estimate the characteristics of the whole population.
There are five major types of sampling methods:
Simple random sampling
Systematic sampling
Cluster sampling
Stratified sampling
Judgmental or purposive sampling
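Four of the methods listed above can be sketched with Python's `random` module. The population of IDs 1 to 100, the sample sizes, and the two strata are assumptions made for the example. Judgmental sampling relies on the analyst's own choice of units, so it is not shown.

```python
import random

population = list(range(1, 101))   # hypothetical population: IDs 1..100
rng = random.Random(42)            # fixed seed so the sketch is repeatable

# Simple random sampling: every unit has the same chance of selection.
simple_sample = rng.sample(population, 10)

# Systematic sampling: pick a random start, then take every step-th unit.
step = len(population) // 10
start = rng.randrange(step)
systematic_sample = population[start::step]

# Cluster sampling: split into groups, then keep every unit of a few
# randomly chosen groups.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [x for group in rng.sample(clusters, 2) for x in group]

# Stratified sampling: split into strata, then sample within each stratum.
strata = {
    "low": [x for x in population if x <= 50],
    "high": [x for x in population if x > 50],
}
stratified_sample = [x for group in strata.values() for x in rng.sample(group, 5)]

print(simple_sample, systematic_sample, cluster_sample, stratified_sample, sep="\n")
```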
Explain descriptive, predictive, and prescriptive analytics.
Descriptive:
It provides insights into the past to answer "what has happened".
Uses data aggregation and data mining techniques.
Example: An ice cream company can analyze how much ice cream was sold, which flavors
were sold, and whether more or less ice cream was sold than the day before.
Predictive:
Understands the future to answer "what could happen".
Uses statistical models and forecasting techniques.
Example: An ice cream company can forecast how much ice cream is likely to be sold in
the coming days based on past sales.
Prescriptive:
Suggests various courses of action to answer "what should you do".
Uses simulation algorithms and optimization techniques to advise possible outcomes.
Example: Lower prices to increase the sale of ice creams, or produce more/fewer
quantities of a specific flavor of ice cream.
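The three kinds of analytics can be contrasted on the same ice cream data. This is a toy sketch: the sales figures and production options are made up, the "forecast" is a simple three-day average rather than a real statistical model, and the "prescription" is a nearest-option rule rather than a true optimization.

```python
import statistics

# Made-up daily ice cream sales (units) for one flavor, oldest first.
daily_sales = [100, 120, 90, 110, 130]

# Descriptive: what has happened?
total_sold = sum(daily_sales)
change_vs_previous_day = daily_sales[-1] - daily_sales[-2]

# Predictive: what could happen? Naive forecast = mean of the last three days.
forecast = statistics.mean(daily_sales[-3:])

# Prescriptive: what should you do? From a hypothetical list of production
# options, choose the quantity closest to the forecast.
production_options = [90, 105, 120, 150]
recommended_quantity = min(production_options, key=lambda q: abs(q - forecast))

print(total_sold, change_vs_previous_day, forecast, recommended_quantity)
```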
McKinsey session
Problem solving
Define problem
Problem-solving hypothesis
Hypothesis-led
Domain IP-led
Advanced Analytics
Design Thinking
Engineering
SMART: Specific
MECE (mutually exclusive, collectively exhaustive) criteria, a McKinsey term