2. Statistical Model
A statistical model is a type of mathematical model that comprises of the assumptions
undertaken to describe the data generation process.
Let us focus on the two highlighted terms above:
Type of mathematical model
Statistical model is non-deterministic unlike other mathematical models where variables
have specific values. Variables in statistical models are stochastic i.e. they have probability
distributions.
3. Assumptions
But how do those assumptions help us understand the properties or characteristics of the
true data? Simply put, these assumptions make it easy to calculate the probability of an
event.
Need of Statistical Model
The statistical model plays a fundamental role in carrying out statistical inference which
helps in making propositions about the unknown properties and characteristics of the
population as below:
• Estimation
• Confidence Interval
• Hypothesis Testing
4. Estimation
It is the central idea behind Machine Learning i.e. finding out the number which can
estimate the parameters of distribution.
Note that the estimator is a random variable in itself, whereas an estimate is a single
number which gives us an idea of the distribution of the data generation process. For
example, the mean and sigma of Gaussian distribution.
Confidence Interval
It gives an error bar around the single estimate number i.e. a range of values to signify the
confidence in the estimate arrived on the basis of a number of samples.
5. Hypothesis Testing
It is a statement of finding statistical evidence. Let’s further understand the need to
perform statistical modeling with the help of an example below:
6. Aspect Statistical Modeling Mathematical Modeling
Focus Captures relationships and patterns in data. Represents real-world situations using equations.
Data Usage Utilizes empirical data to build models. Often uses theoretical or assumed data.
Assumptions Models may rely on assumptions about data distribution. Relies on assumptions about relationships between variables.
Goal Inference, hypothesis testing, understanding relationships. Solving complex problems through mathematical equations.
Applications Predictive analytics, decision-making, hypothesis testing. Physical sciences, engineering, economic models.
Model Complexity Can handle complex real-world patterns and noise. Can represent intricate systems and interactions.
Interpretability Often provides insights into data relationships. Focuses on understanding mathematical relationships.
Variables Incorporates real data variables and interactions. Utilizes mathematical variables and constants.
Validation Involves testing against empirical data. Validates against theoretical results or experiments.
Example Linear regression, ANOVA. Differential equations, optimization models.
Statistical Modeling Vs Mathematical Modeling
7. Uses of Statistical Modeling
Statistical modeling in data science is invaluable in various contexts:
Exploratory Data Analysis: At the outset of a project, statistical models help identify
trends, outliers, and relationships within the dataset, setting the stage for further analysis.
Hypothesis Testing: When you have a research question or hypothesis, statistical models
facilitate rigorous testing, confirming or refuting assumptions.
Feature Selection: Statistical modeling aids in choosing relevant features for predictive
models, enhancing model accuracy and interpretability.
Regression Analysis: When exploring relationships between variables, regression models
reveal how one variable influences another, enabling predictions and insights.
8. Classification: Statistical models assist in classifying data into distinct categories, essential
for tasks like sentiment analysis or disease diagnosis.
Anomaly Detection: Statistical models uncover unusual patterns, anomalies, or outliers
in data, crucial for fraud detection or quality control.
Time Series Forecasting: For data with a temporal component, statistical models forecast
future values, aiding in inventory management and financial predictions.
Segmentation Analysis: Models divide data into clusters based on similarities, enhancing
customer segmentation and personalized marketing.
Predictive Modeling: In machine learning, statistical models predict outcomes based on
historical data, essential for business forecasts and decision support.