This document discusses modeling claim amounts in insurance using probability distributions and simulations. It begins with an introduction to fitting distributions to insurance claims data. The key steps are outlined as selecting an appropriate loss distribution, estimating its parameters using maximum likelihood estimation, and testing the fit using goodness of fit tests. The Pareto distribution is discussed in more detail as it is commonly used to model insurance claim amounts. The document concludes by describing how to simulate claim amounts above a deductible using the Pareto distribution and calculate reinsurance premiums.
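The simulation step described here is easy to prototype. Below is a minimal Monte Carlo sketch in Python (not from the source document; the shape, scale, and deductible values are hypothetical placeholders) that draws Pareto claim amounts and estimates the reinsurer's pure premium for the layer above a deductible:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, x_m = 2.5, 1_000.0      # hypothetical Pareto shape and scale
deductible = 5_000.0
n = 1_000_000

# Inverse-transform sampling: X = x_m * U**(-1/alpha) is Pareto(alpha, x_m)
claims = x_m * rng.uniform(size=n) ** (-1.0 / alpha)

# The reinsurer pays the excess of each claim over the deductible
excess = np.maximum(claims - deductible, 0.0)
print(f"simulated pure premium per claim: {excess.mean():,.2f}")
```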
11/04 Regular Meeting: Minority Report in Fraud Detection Classification of S... (guest48424e)
This paper proposes a new fraud detection method using multiple classifiers to handle skewed data distributions. The method partitions the minority class using oversampling and trains Naive Bayes, C4.5, and backpropagation classifiers on each partition. The classifiers are then combined using stacking and bagging. Experimental results found that stacking and bagging the classifiers achieved higher cost savings than single classifiers, with stacking-bagging performing the best. Oversampling performed better than undersampling or SMOTE for this skewed fraud detection data.
High-Dimensional Methods: Examples for Inference on Structural Effects (NBER)
This document describes a study that uses high-dimensional methods to estimate the effect of 401(k) eligibility on measures of accumulated assets. It begins by outlining the baseline model and notes areas for improvement, such as controlling for income. It then discusses using regularization like LASSO for variable selection in high-dimensional settings. The document explores more flexible specifications by generating many interaction and polynomial terms but notes the need for dimension reduction. It describes using LASSO to select important variables from a large set. The results select a parsimonious set of variables and estimate similar 401(k) effects as the baseline.
Econometrics of High-Dimensional Sparse Models (NBER)
The document discusses high-dimensional sparse econometric models where the number of predictors (p) is much larger than the sample size (n). It outlines an approach for estimating regression functions using penalization methods like the LASSO. Specifically, it discusses:
1. Using the LASSO estimator to minimize squared errors while penalizing the l1-norm of coefficients, inducing sparsity.
2. Choosing the optimal penalty level as a function of the error variance and sample size. Variants like the square-root LASSO provide a tuning-free approach.
3. Examples showing how sparse approximations can better capture patterns in population data than traditional low-dimensional approximations.
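As a rough illustration of the LASSO setup summarized above, here is a minimal Python example using scikit-learn on synthetic data with p >> n; the penalty level is only a heuristic stand-in for the theory-driven choices the document describes:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 500                            # p >> n, the sparse setting
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]     # only 5 truly active predictors
y = X @ beta + rng.standard_normal(n)

# Heuristic penalty of order sqrt(log(p) / n); the source's rule also
# scales with the error variance
alpha = np.sqrt(np.log(p) / n)
fit = Lasso(alpha=alpha).fit(X, y)
print("selected predictors:", np.flatnonzero(fit.coef_))
```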
This document provides teaching suggestions for regression models:
1) It suggests emphasizing the difference between independent and dependent variables in a regression model using examples.
2) It notes that correlation does not necessarily imply causation and gives an example of variables that are correlated but changing one does not affect the other.
3) It recommends having students manually draw regression lines through data points to appreciate the least squares criterion.
4) It advises selecting random data values to generate a regression line in Excel to demonstrate determining the coefficient of determination and F-test.
5) It suggests discussing the full and shortcut regression formulas to provide a better understanding of the concepts.
Big Data analysis involves building predictive models from high-dimensional data using techniques like variable selection, cross-validation, and regularization to avoid overfitting. The document discusses an example analyzing web browsing data to predict online spending, highlighting challenges with large numbers of variables. It also covers summarizing high-dimensional data through dimension reduction and model building for prediction versus causal inference.
The document discusses probabilistic and statistical models for outlier detection, specifically focusing on methods for extreme-value analysis of univariate and multivariate data distributions. It introduces some of the earliest and most fundamental statistical methods for outlier detection, including probabilistic tail inequalities like the Markov inequality and Chebychev inequality, which can be used to determine the probability that an extreme value should be considered anomalous. It also discusses how extreme-value analysis methods can be extended from univariate to multivariate data and how mixture models provide a probabilistic approach to identifying both outliers and extreme values.
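For instance, the Chebyshev inequality bounds the tail mass of any finite-variance distribution; a one-liner (not from the source) makes the idea concrete:

```python
def chebyshev_tail_bound(k: float) -> float:
    """Upper bound on P(|X - mu| >= k * sigma) for any finite-variance X."""
    return min(1.0, 1.0 / k ** 2)

# A point 4 standard deviations from the mean has probability at most
# 1/16 = 0.0625, whatever the underlying distribution looks like.
print(chebyshev_tail_bound(4))
```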
This paper proposes using a "shrinkage" estimator as an alternative to the traditional sample covariance matrix for portfolio optimization. The shrinkage estimator combines the sample covariance matrix with a structured "shrinkage target" using a shrinkage constant to minimize distance from the true covariance matrix. The paper finds this shrinkage estimator significantly increases the realized information ratio of active portfolio managers compared to the sample covariance matrix. An empirical study on historical stock return data confirms the shrinkage method leads to higher ex post information ratios in portfolio optimization. However, the shrinkage target assumes identical pairwise correlations that may not fully reflect market characteristics.
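scikit-learn ships a Ledoit-Wolf shrinkage estimator that illustrates the idea, though note it shrinks toward a scaled identity matrix rather than the constant-correlation target the paper uses; a minimal sketch on synthetic returns:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
returns = rng.standard_normal((60, 100))   # 60 periods, 100 assets: n < p

lw = LedoitWolf().fit(returns)
print("estimated shrinkage constant:", lw.shrinkage_)
print("covariance estimate shape:", lw.covariance_.shape)
```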
The document discusses confidence intervals and hypothesis testing. It provides examples of constructing 95% confidence intervals for a population mean using a sample mean and standard deviation. It also demonstrates how to identify the null and alternative hypotheses, determine if a test is right-tailed, left-tailed, or two-tailed, and calculate p-values to conclude whether to reject or fail to reject the null hypothesis based on a significance level of 0.05. Examples include testing claims about population proportions and means.
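A minimal scipy sketch of both tasks (the numbers are made up for illustration):

```python
import numpy as np
from scipy import stats

# 95% t-based confidence interval for a population mean
x = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
ci = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(), scale=stats.sem(x))
print("95% CI for the mean:", ci)

# Two-tailed z-test for a proportion: H0: p = 0.5 vs H1: p != 0.5
n, successes, p0 = 200, 118, 0.5
z = (successes / n - p0) / np.sqrt(p0 * (1 - p0) / n)
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < 0.05}")
```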
Maxentropic and quantitative methods in operational risk modeling (Erika G. G.)
This document presents an overview of maximum entropy approaches for operational risk modeling. It discusses challenges in modeling loss distributions with scarce data, including parameter uncertainty, poor tail fits, and an inability to separately fit the body and tails of distributions. The maximum entropy approach models the loss distribution by maximizing entropy subject to moment constraints derived from the empirical Laplace transform of the loss data. This provides a robust density reconstruction over the entire range of losses. The document provides examples of using maximum entropy to model univariate and multivariate loss distributions, including dependencies between different risk types.
In this paper, the Black-Litterman model is introduced to quantify investors' views; the safety-first portfolio model is then extended to the case where the distribution of risky asset returns is ambiguous. When short-selling of risk-free assets is allowed, the model is transformed into a second-order cone optimization problem with investor views. The ambiguity set parameters are calibrated through programming.
This document is a problem report submitted by Harini Vaidyanath to West Virginia University in partial fulfillment of the requirements for a Master of Science degree in Statistics. The 38-page report discusses approximating probability density functions for aggregate claims in collective risk models. It covers risk theory concepts, methods for obtaining aggregate claims distributions like moment generating functions and direct convolutions, and using recursive techniques like the Panjer recursion formula to compute probability functions for different classes of claim distributions. Examples and R code are provided. The goal is to recursively compute probability functions and compare to normal and translated gamma approximations.
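The Panjer recursion mentioned here is short enough to sketch in Python (the report itself provides R code; this stand-alone version assumes a compound Poisson model with a discrete severity on the non-negative integers, and the parameters are hypothetical):

```python
import numpy as np

def panjer_poisson(lam: float, sev: np.ndarray, s_max: int) -> np.ndarray:
    """Pmf of aggregate claims S under a compound Poisson(lam) frequency and
    discrete severity pmf sev[k] = P(X = k), via the Panjer recursion
    (for the Poisson case a = 0, b = lam)."""
    g = np.zeros(s_max + 1)
    g[0] = np.exp(lam * (sev[0] - 1.0))        # pgf of N evaluated at sev[0]
    for s in range(1, s_max + 1):
        j_top = min(s, len(sev) - 1)
        g[s] = sum(lam * j / s * sev[j] * g[s - j]
                   for j in range(1, j_top + 1))
    return g

# Severity on {1, 2, 3} with probabilities 0.5, 0.3, 0.2 (hypothetical)
sev = np.array([0.0, 0.5, 0.3, 0.2])
g = panjer_poisson(lam=2.0, sev=sev, s_max=40)
print(g[:5], g.sum())    # pmf of S = 0..4; total mass approaches 1
```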
Chapter 8: Hypothesis Testing
8.4: Testing a Claim About a Standard Deviation or Variance
Chapter 8: Hypothesis Testing
8.2: Testing a Claim About a Proportion
Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer... (Madhav Mishra)
This document covers probability theory and fuzzy sets and fuzzy logic, which are topics for an applied artificial intelligence unit. It discusses key concepts for probability theory including joint probability, conditional probability, and Bayes' theorem. It also covers fuzzy sets and fuzzy logic, including fuzzy set operations, types of membership functions, linguistic variables, and fuzzy propositions and inference rules. Examples are provided throughout to illustrate probability and fuzzy set concepts. The document is presented as a slideshow with explanatory text and diagrams on each slide.
The document discusses hypothesis testing and outlines the steps to conduct hypothesis testing. It defines key terms like the null hypothesis, alternative hypothesis, and level of significance. It also explains how to determine the rejection region and calculate the test statistic. An example is provided where the null hypothesis that the mean number of TVs in US homes is at least 3 is tested using sample data and the steps of hypothesis testing.
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS... (ijfls)
A support vector machine (SVM) learns the decision surface from two different classes of the input points. In several applications, some of the input points are misclassified and each is not fully allocated to either of these two groups. In this paper a bi-objective quadratic programming model with fuzzy parameters is utilized and different feature quality measures are optimized simultaneously. An α-cut is defined to transform the fuzzy model to a family of classical bi-objective quadratic programming problems. The weighting method is used to optimize each of these problems. For the proposed fuzzy bi-objective quadratic programming model, a major contribution is added by obtaining different effective support vectors due to changes in weighting values. The experimental results show the effectiveness of the α-cut with the weighting parameters in reducing the misclassification between two classes of the input points. An interactive procedure is added to identify the best compromise solution from the generated efficient solutions. The main contribution of this paper includes constructing a utility function for measuring the degree of infection with coronavirus disease (COVID-19).
AN ALTERNATIVE APPROACH FOR SELECTION OF PSEUDO RANDOM NUMBERS FOR ONLINE EXA... (cscpconf)
The document proposes an alternative approach for selecting pseudo-random numbers for online examination systems. It compares three random number generators: a procedural language random number generator, the PHP random number generator, and an atmospheric noise-based true random number generator. It tests the randomness quality of patterns generated by each using the Diehard statistical tests. The results show that the true random number generator passes all tests, while the procedural language and PHP generators fail most tests, indicating their patterns have lower randomness quality than the true random generator.
Credit card fraud detection and concept drift adaptation with delayed supervi... (Andrea Dal Pozzolo)
This document discusses methods for credit card fraud detection in the presence of concept drift and delayed labeling. It formalizes the problem and proposes learning strategies to handle feedback from investigators and delayed labels separately. Experiments on real and artificial datasets show that aggregating classifiers trained on feedback and delayed samples outperforms approaches that do not distinguish between sample types or do not aggregate classifiers. Future work will focus on adaptive aggregation and addressing sample selection bias.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Common evaluation measures in NLP and IR (Rushdi Shams)
This document discusses various evaluation measures used in information retrieval and natural language processing. It describes precision, recall, and the F1 score as fundamental measures for unranked retrieval sets. It also covers averaged precision and recall, accuracy, novelty and coverage ratios. For ranked retrieval sets, it discusses recall-precision graphs, interpolated recall-precision, precision at k, R-precision, ROC curves, and normalized discounted cumulative gain (NDCG). The document also discusses agreement measures like Kappa statistics and parses evaluation measures like Parseval and attachment scores.
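A tiny scikit-learn example (not from the source) of the three fundamental measures:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

p = precision_score(y_true, y_pred)   # TP / (TP + FP) = 4/5
r = recall_score(y_true, y_pred)      # TP / (TP + FN) = 4/5
f1 = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
print(p, r, f1)
```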
Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cut (TELKOMNIKA JOURNAL)
1) The document proposes combining the ARIMA forecasting model with fuzzy alpha cuts (FAC) to improve the interval width of crime forecasting.
2) Typically, ARIMA models provide uncertainty ranges in the form of 95% confidence intervals. However, the amount of historical data can impact these intervals.
3) The proposed approach applies FAC to the forecasting results and confidence intervals from an ARIMA model. This converts the data to triangular fuzzy numbers. Then, alpha cuts are used to calculate new lower and upper interval bounds in order to obtain a better interval width for crime forecasting.
A Maximum Entropy Approach to the Loss Data Aggregation Problem (Erika G. G.)
This document discusses using a maximum entropy approach to model loss distributions for operational risk modeling. It begins by motivating the need to accurately model loss distributions given challenges with limited datasets, including heavy tails and dependence between risks. It then provides an overview of the loss distribution approach commonly used in operational risk modeling and its limitations. The document introduces the maximum entropy approach, which frames the problem as maximizing entropy subject to moment constraints. It discusses using the Laplace transform of loss distributions to compress information into moments and how these can be estimated from sample data or fitted distributions to serve as constraints for the maximum entropy approach.
IN ORDER TO IMPLEMENT A SET OF RULES / TUTORIALOUTLET DOT COM (jorge0050)
Solve each trigonometric equation in the interval [0, 2π) by first squaring both sides: cos x = 1 + sin x. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. A. The solution set is ___. (Simplify your answer. Use a comma to separate answers as needed. Type an exact answer, using π as needed. Use integers or fractions for any numbers in the expression.) B. There is no solution on this interval.
Intro and maths behind Bayes theorem. Bayes theorem as a classifier. NB algorithm and examples of bayes. Intro to knn algorithm, lazy learning, cosine similarity. Basics of recommendation and filtering methods.
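As a concrete illustration of Bayes' theorem (the screening-test numbers below are invented for the example):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease = 0.01              # prior: 1% prevalence
p_pos_given_disease = 0.95    # sensitivity
p_pos_given_healthy = 0.10    # false-positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))   # ~0.0876: the posterior stays small
```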
Here are the key differences between supervised and unsupervised learning:
Supervised Learning:
- Uses labeled examples/data to learn. The labels provide correct answers for the learning algorithm.
- The goal is to build a model that maps inputs to outputs based on example input-output pairs.
- Common algorithms include linear/logistic regression, decision trees, k-nearest neighbors, SVM, neural networks.
- Used for classification and regression predictive problems.
Unsupervised Learning:
- Uses unlabeled data where there are no correct answers provided.
- The goal is to find hidden patterns or grouping in the data.
- Common algorithms include clustering, association rule learning, self-organizing maps.
- Used for clustering, association, and dimensionality-reduction problems.
The document discusses various techniques for data reduction including data sampling, data cleaning, data transformation, and data segmentation to break down large datasets into more manageable groups that provide better insight, as well as hierarchical and k-means clustering methods for grouping similar objects into clusters to analyze relationships in the data.
This document provides an overview of key concepts in statistics for data science, including:
- Descriptive statistics like measures of central tendency (mean, median, mode) and variation (range, variance, standard deviation).
- Common distributions like the normal, binomial, and Poisson distributions.
- Statistical inference techniques like hypothesis testing, t-tests, and the chi-square test.
- Bayesian concepts like Bayes' theorem and how to apply it in R.
- How to use R and RCommander for exploring and visualizing data and performing statistical analyses.
The Sample Average Approximation Method for Stochastic Programs with Integer ... (SSA KPI)
The document describes a sample average approximation method for solving stochastic programs with integer recourse. It approximates the expected recourse cost function using a sample average based on a sample of scenarios. It shows that as the sample size increases, the solution to the sample average approximation problem converges exponentially fast to the optimal solution of the true stochastic program. It also describes statistical and deterministic techniques for validating candidate solutions. Preliminary computational results applying this method are also mentioned.
Everything about Special Discrete Distributions (Softsasi)
Overview of Special Discrete Distributions to be Covered:
In this detailed exploration of special discrete distributions, we will cover several important distributions, each with its unique characteristics, formulas, and applications. Here's an overview of what will be discussed:
1. Bernoulli Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in trials with binary outcomes, like coin flips
2. Binomial Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Connection to Bernoulli trials
Applications in scenarios with a fixed number of independent Bernoulli trials
3. Geometric Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in modeling the number of trials needed until the first success occurs
4. Negative Binomial Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in modeling the number of trials until a specified number of failures occur
5. Poisson Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in modeling the number of events occurring in a fixed interval of time or space
6. Hypergeometric Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in sampling without replacement scenarios, like drawing items from a finite population
7. Multinomial Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in scenarios with more than two outcomes, such as rolling multiple dice or categorizing data into multiple classes
8. Pascal Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in modeling the number of trials until a specified number of successes occur
9. Zero-Inflated Poisson (ZIP) Distribution
Definition and parameters
Probability mass function (PMF)
Mean and variance
Applications in modeling count data with excessive zeros
These distributions are foundational in probability theory and statistics, providing powerful tools for analyzing and interpreting data in various fields. We will delve into each distribution, exploring its properties, formulas, mean, variance, and practical applications. Understanding these special discrete distributions is essential for anyone working with data, making predictions, or drawing conclusions from experiments and observations.
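Several of the distributions in this outline are available in scipy.stats, which makes it easy to check the PMF, mean, and variance formulas numerically; a short sketch (parameter values are arbitrary):

```python
from scipy import stats

# Poisson with rate mu = 3: PMF values, mean, and variance (both equal mu)
pois = stats.poisson(mu=3)
print([round(pois.pmf(k), 4) for k in range(5)])
print(pois.mean(), pois.var())

# Binomial(n=10, p=0.4): mean np = 4.0, variance np(1-p) = 2.4
print(stats.binom(10, 0.4).mean(), stats.binom(10, 0.4).var())

# Geometric(p=0.3): expected number of trials until the first success, 1/p
print(stats.geom(0.3).mean())

# Hypergeometric(M=50 items, n=5 successes, N=10 draws): mean N*n/M = 1.0
print(stats.hypergeom(50, 5, 10).mean())
```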
MSL 5080, Methods of Analysis for Business Operations 1.docx (madlynplamondon)
Course Learning Outcomes for Unit III
Upon completion of this unit, students should be able to:
2. Distinguish between the approaches to determining probability.
3. Contrast the major differences between the normal distribution and the exponential and Poisson distributions.
Reading Assignment
Chapter 2: Probability Concepts and Applications, pp. 32–48
Unit Lesson
Mathematical truths provide us several useful means to estimate what will happen based on factors that are given or researched. After becoming familiar with the idea of probability, one can see how mathematics makes applications in government and business possible.
Probability Distributions
To look at probability distributions, one should define a random variable as an unknown that could be any real number, including decimals or fractions. Many problems in life take any whole, fractional, or decimal value as the value of the random variable. Discrete random variables have a certain limited range of values, while continuous random variables may have an infinite range of possible values; a continuous random variable could take any value at all (Render, Stair, Hanna, & Hale, 2015).

One true tendency is that events occurring over a group of trials cluster around a middle point of values, which occurs most often and carries the highest probabilities. The probabilities then taper off to one or both sides, since events far below the middle (or zero) and far above the middle are less likely.
This middle point is called the mean or expected value, E(X):

$$E(X) = \sum_{i=1}^{n} X_i \, P(X_i)$$

where $X_i$ is the random variable value and the sum over $i = 1, \dots, n$ adds all $n$ possible values (Render et al., 2015).
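A quick check of this formula in Python (the sales figures and probabilities below are illustrative only):

```python
# Expected value and variance of a discrete random variable
xs = [0, 1, 2, 3, 4]                  # cans of paint sold in a day
ps = [0.10, 0.20, 0.35, 0.25, 0.10]   # P(X_i), summing to 1

ex = sum(x * p for x, p in zip(xs, ps))                # E(X) = 2.05
var = sum((x - ex) ** 2 * p for x, p in zip(xs, ps))   # sigma^2 = 1.2475
print(ex, var)
```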
The sum of these events can be shown as graphs. If the random variable has a discrete probability distribution (e.g., cans of paint that can be sold in a day), then the graph of events may look like this:
[Figure: bar chart of a discrete probability distribution, P(X) on the y-axis against X on the x-axis]
The bar heights show the probability for any X (that is, P(X)) along the y-axis, given the discrete number for X along the x-axis, with no fractions for discrete variables (no half-cans of paint).

The variance (σ²) is the spread of the distribution of events in a probability distribution (Render et al., 2015). The variance is interesting because a small variance may indicate that the event value will most likely be near the mean most of the time, while a large variance may show that the mean is not all that reliable a guide of what the event values will be, as the sp.
This document discusses statistical estimation and provides information about objectives, outline, statistical inference, estimation types (point and interval), confidence intervals, and sample size calculation. The key points are:
- The objectives are to describe statistical inference, differentiate between point and interval estimation, compute confidence intervals, and describe sample size calculation methods.
- Point estimation provides a single value to estimate a population parameter, while interval estimation provides a range of values that the population parameter is likely to fall within.
- Confidence intervals account for sample to sample variation and give a measure of precision for estimates. Common confidence levels are 90%, 95%, and 99%.
11/04 Regular Meeting: Minority Report in Fraud Detection Classification of S... (萍華 楊)
This paper proposes a new method for fraud detection in skewed data that uses multiple classifiers on data partitions. It compares this new method against other sampling and classification techniques on an automobile insurance fraud detection data set. The results show that the new method, which uses stacking and bagging of Naive Bayes, C4.5, and backpropagation classifiers on minority-oversampled partitions, achieves the highest cost savings compared to other sampling and single classifier approaches.
This document presents a comparative study of supervised machine learning algorithms for credit card fraud detection. The study uses a dataset from Kaggle containing over 284,000 credit card transactions, including 492 fraudulent ones. Logistic regression, K-nearest neighbors, naive Bayes, decision tree, and random forest algorithms are tested on the imbalanced dataset after oversampling with SMOTE. Random forest achieves the highest AUC of 97.11% and is determined to be the best approach for credit card fraud detection based on recall, precision, and AUC values. While random forest performs best, the document suggests future evaluation of unsupervised learning methods could produce more conclusive results.
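The oversample-then-classify pipeline is easy to reproduce; here is a minimal sketch using imbalanced-learn and scikit-learn on synthetic data standing in for the Kaggle set (the class ratio and parameters are placeholders):

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, highly imbalanced stand-in for the fraud data (~0.5% positives)
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.995], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority class on the training split only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_res, y_res)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```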
IRJET: Disease Prediction using Machine Learning (IRJET Journal)
This document discusses using machine learning techniques to predict diseases based on patient symptoms. Specifically, it proposes using naive bayes, k-nearest neighbors (KNN), and logistic regression algorithms on structured and unstructured hospital data to predict diseases like diabetes, malaria, jaundice, dengue, and tuberculosis. The system is intended to make disease prediction more accessible to end users by analyzing their symptoms without needing to visit a doctor. It aims to improve prediction accuracy by handling both structured and unstructured data using machine learning models.
This document summarizes a research paper on using support vector machines (SVM) for anomaly detection, specifically for credit card fraud detection. It discusses how SVM is a supervised machine learning technique that can handle large, high-dimensional datasets. The document provides an overview of SVM, comparing it to other techniques like neural networks and clustering. It summarizes the methodology used in the research, which applied SVM to real credit card transaction data. The results showed SVM achieved high accuracy and a low false positive rate for fraud detection. In conclusion, the document states that applying this SVM method could help banks better predict fraudulent credit card transactions.
This document discusses using support vector machines (SVM) for anomaly detection, specifically for credit card fraud detection. It contains the following key points:
1. SVM is a supervised machine learning technique that can be used for anomaly detection tasks like fraud detection. It works by mapping data to a higher dimensional space to find a hyperplane that maximizes separation between classes.
2. The document evaluates SVM for credit card fraud detection using real transaction data and compares it to other techniques like neural networks and clustering. SVM achieved higher accuracy and lower false positive rates than these other methods.
3. A theoretical comparison found that SVM requires no parameter tuning, works well in high dimensions, and has lower computational cost than neural networks and
The document discusses machine learning concepts and algorithms. It provides definitions of learning, discusses different types of learning problems including supervised, unsupervised, and reinforcement learning. It also covers key machine learning topics like generalization error, empirical risk minimization, probably approximately correct learning, model selection, and overfitting vs underfitting.
ENTROPY-COST RATIO MAXIMIZATION MODEL FOR EFFICIENT STOCK PORTFOLIO SELECTION... (cscpconf)
This paper introduces a new stock portfolio selection model in a non-stochastic environment. Following the principle of maximum entropy, a new entropy-cost ratio function is introduced as the objective function. The uncertain returns, risks and dividends of the securities are considered as interval numbers. Along with the objective function, eight different types of constraints are used in the model to convert it into a pragmatic one. Three different models have been proposed by defining the future financial market optimistically, pessimistically and in the combined form to model the portfolio selection problem. To illustrate the effectiveness and tractability of the proposed models, these are tested on a set of data from the Bombay Stock Exchange (BSE). The solution has been obtained by a genetic algorithm.
Quantitative Risk Assessment - Road Development Perspective (SUBIR KUMAR PODDER)
This document outlines an approach for quantitative risk assessment in road transport infrastructure projects using stochastic analysis with triangular distributions. It discusses determining the combined influence of parameters like project cost and traffic on economic indicators. Traditionally, risks from cost and traffic changing from base cases are analyzed separately using triangular distributions defined by minimum, most likely and maximum limits. The document proposes a method to analyze the combined influence of both parameters varying simultaneously using bivariate distributions and conditional probabilities.
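Sampling the two triangular inputs jointly, rather than one at a time, is a few lines of NumPy; the limits below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Triangular(min, mode, max) inputs varied simultaneously
cost_factor = rng.triangular(0.9, 1.0, 1.4, size=n)      # project cost overrun
traffic_factor = rng.triangular(0.7, 1.0, 1.2, size=n)   # traffic vs. base case

# A toy economic indicator improving with traffic and worsening with cost
indicator = traffic_factor / cost_factor
print("P(indicator < 0.8):", (indicator < 0.8).mean())
```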
The document discusses confidence intervals and hypothesis testing. It provides examples of constructing 95% confidence intervals for a population mean and proportion. It also demonstrates identifying the null and alternative hypotheses and interpreting the results of hypothesis tests, including calculating p-values.
Classifiers are algorithms that map input data to categories in order to build models for predicting unknown data. There are several types of classifiers that can be used including logistic regression, decision trees, random forests, support vector machines, Naive Bayes, and neural networks. Each uses different techniques such as splitting data, averaging predictions, or maximizing margins to classify data. The best classifier depends on the problem and achieving high accuracy, sensitivity, and specificity.
Predictive Modeling in Insurance in the context of (possibly) big data (Arthur Charpentier)
This document discusses predictive modeling in insurance in the context of big data. It begins with an introduction to the speaker and outlines some key concepts in actuarial science from both American and European perspectives. It then provides examples of common actuarial problems involving ratemaking, pricing, and claims reserving. The document reviews the history of actuarial models and discusses issues around statistical learning, machine learning, and their relationship to statistics. It also covers model evaluation and various loss functions used in modeling.
Loss Distributions and Simulations in General Insurance and Reinsurance
V. Pacáková, D. Brebera

V. Pacáková is with the Institute of Mathematics and Quantitative Methods, Faculty of Economics and Administration, University of Pardubice, Studentská 84, 532 10 Pardubice, Czech Republic (e-mail: Viera.Pacakova@upce.cz). D. Brebera is with the Institute of Mathematics and Quantitative Methods, Faculty of Economics and Administration, University of Pardubice, Studentská 84, 532 10 Pardubice, Czech Republic (e-mail: David.Brebera@upce.cz).
Abstract—The article presents techniques suitable for practical purposes in general insurance and reinsurance and demonstrates their application. It focuses mainly on modelling and simulation of claim amounts by means of mathematical and statistical methods using statistical software products, and it contains examples of their applications. The article also emphasizes the importance and practical use of probability models of individual claim amounts and the simulation of extreme losses in insurance practice.

Keywords—Goodness of fit tests, loss distributions, Pareto distribution, reinsurance, risk premium, simulation.
I. INTRODUCTION
ALTHOUGH empirical distribution functions can be useful tools in understanding claims data, there is always a desire to "fit" a probability distribution with reasonably good mathematical properties to the claims data.
Therefore, this paper presents the steps taken in actuarial modelling to find a suitable probability distribution for the claims data and to test the goodness of fit of the supposed distribution [1].

A good introduction to the subject of fitting distributions to losses is given by Hogg and Klugman (1984) and Currie (1993). Emphasis is on the distribution of single losses related to claims made against various types of insurance policies. These models are informative to the company and enable it to make decisions on, amongst other things, premium loading, expected profits, reserves necessary to ensure (with high probability) profitability, and the impact of reinsurance and deductibles [1]. In view of the importance of probability modelling of claim amounts for insurance practice, several actuarial book publications deal with these issues, e.g. [2], [7], [12].
The conditions under which claims arise (and data are collected) allow us to consider the claim amounts in general insurance branches to be samples from specific, very often heavy-tailed probability distributions. As probability models for claim sizes we understand probability models of the financial losses which can be suffered by individuals and paid under the contract by non-life insurance companies as a result of insurable events. Distributions used to model these costs are often called "loss distributions" [7]. Such distributions are positively skewed and very often have relatively high probabilities in the right-hand tails, so they are described as long-tailed or heavy-tailed distributions.
The distributions used in this article include the gamma, Weibull, lognormal and Pareto distributions, which are particularly appropriate for modelling insurance losses. The Pareto distribution is often used as a model for claim amounts when well-fitted tails are needed. It plays a central role in this matter and an important role in quotation in non-proportional reinsurance [15].
II. CLAIM AMOUNTS MODELLING PROCESS
We are concerned with modelling claim amounts by fitting probability distributions from selected families to a set of observed claim sizes. This modelling process is aided by the STATGRAPHICS Centurion XV statistical analytical system.
The steps of the modelling process are as follows (a short sketch of this workflow is given after the list):
1. We assume, after an exploratory analysis using graphical techniques, that the claims arise as realizations from a certain family of distributions.
2. We estimate the parameters of the selected parametric distribution by the maximum likelihood method, based on the claim amount records.
3. We test whether the selected distribution provides an adequate fit to the data using the Kolmogorov-Smirnov, Anderson-Darling or χ² test.
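To make the three steps concrete, the following is a minimal sketch of the same workflow in Python with SciPy (the paper itself uses Statgraphics Centurion XV). The synthetic claim sample and the lognormal choice are illustrative assumptions, and the K-S p-value is only approximate when the parameters are estimated from the same data.

```python
# Sketch of the three-step workflow with SciPy; synthetic data, not the paper's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
claims = rng.lognormal(mean=6.7, sigma=0.9, size=745)   # assumed claim sample

# Step 1: exploratory analysis suggests a right-skewed family (e.g. lognormal).
print("skewness:", stats.skew(claims))

# Step 2: maximum likelihood estimation of the lognormal parameters.
shape, loc, scale = stats.lognorm.fit(claims, floc=0)   # sigma, 0, exp(mu)

# Step 3: Kolmogorov-Smirnov test of the fitted model.
d_n, p_value = stats.kstest(claims, "lognorm", args=(shape, loc, scale))
print(f"K-S statistic d_n = {d_n:.4f}, p-value = {p_value:.4f}")
```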
A. Selecting the Loss Distribution
Most data in general insurance are skewed to the right, and therefore most distributions that exhibit this characteristic can be used to model the claim amounts. For this article the choice of loss distributions was made with regard to prior knowledge and experience in curve fitting, the availability of computer software, and an exploratory descriptive analysis of the data to obtain its key features. This involved finding the mean, median, standard deviation, coefficient of variation, skewness and kurtosis, using the Statgraphics Centurion XV package.
The Distribution Fitting procedure of this software fits any of 45 probability distributions (7 for discrete and 38 for continuous random variables) to a column of numeric data representing a random sample from the selected distribution. The distributions selected for our analysis are defined in Statgraphics Centurion as follows [7].
Gamma Distribution
The probability density function (PDF) has the form

f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \qquad x > 0    (1)

with parameters: shape \alpha > 0 and scale \lambda > 0.
Lognormal Distribution
The probability density function (PDF) has the form

f(x) = \frac{1}{x \sigma \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \qquad x > 0    (2)

with parameters: location \mu and scale \sigma > 0.
Weibull Distribution
The probability density function (PDF) has the form

f(x) = \frac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}}, \qquad x > 0    (3)

with parameters: shape \alpha > 0 and scale \beta > 0.
A good tool for selecting a distribution for a set of data in Statgraphics Centurion is the Density Trace procedure. This procedure provides a nonparametric estimate of the probability density function of the population from which the data were sampled. It is created by counting the number of observations that fall within a window of fixed width moved across the range of the data. The estimated density function is given by

f(x) = \frac{1}{hn} \sum_{i=1}^{n} W\!\left(\frac{x - x_i}{h}\right)    (4)

where h is the width of the window in units of X and W(u) is a weighting function. Two forms of weighting function are offered: the boxcar function and the cosine function.
The cosine function usually gives a smoother result, with the desirable value of h depending on the size of the data sample. Therefore in the application we use the cosine function

W(u) = \begin{cases} 1 + \cos(2\pi u) & \text{if } |u| < 0.5 \\ 0 & \text{otherwise} \end{cases}    (5)
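As an illustration of (4) and (5), the following sketch evaluates the density trace on a grid; the toy data and the window width h are assumptions made purely for the example.

```python
# A minimal density trace per (4) with the cosine window (5); toy inputs.
import numpy as np

def cosine_window(u):
    # W(u) = 1 + cos(2*pi*u) for |u| < 0.5, and 0 otherwise; integrates to 1.
    return np.where(np.abs(u) < 0.5, 1.0 + np.cos(2.0 * np.pi * u), 0.0)

def density_trace(x_grid, data, h):
    # f(x) = (1/(h*n)) * sum_i W((x - x_i)/h), evaluated on a grid.
    n = len(data)
    u = (x_grid[:, None] - data[None, :]) / h
    return cosine_window(u).sum(axis=1) / (h * n)

claims = np.array([24.4, 120.0, 740.0, 1200.0, 3500.0, 9000.0])  # toy data
grid = np.linspace(0.0, 10000.0, 200)
f_hat = density_trace(grid, claims, h=800.0)  # h chosen by eye here
```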
B. Parameter Estimation
We use the method of maximum likelihood (ML) to estimate the parameters of the selected loss distributions. This method can be applied in a very wide variety of situations, and the estimates obtained by ML generally have very good properties compared to estimates obtained by other methods (e.g. the method of moments or the method of quantiles). The estimates are obtained by ML estimation in the Distribution Fitting procedure of the Statgraphics Centurion XV package.
The basis for ML estimation is the Maximum Likelihood Theorem [5]: Let \mathbf{x} = (x_1, x_2, \dots, x_n) be a vector of n independent observations taken from a population with PDF f(x; \Theta), where \Theta = (\Theta_1, \Theta_2, \dots, \Theta_p)' is a vector of p unknown parameters. Define the likelihood function L(\Theta; \mathbf{x}) by

L(\Theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \Theta)    (6)

The ML estimate \hat{\Theta} = \hat{\Theta}(\mathbf{x}) is the value of \Theta which maximizes L(\Theta; \mathbf{x}).
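In practice the likelihood (6) is maximized numerically, usually by minimizing the negative log-likelihood. The sketch below does this for the gamma PDF (1); the data and starting values are arbitrary assumptions for illustration.

```python
# Numerical ML estimation for the gamma PDF (1) via the log-likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(2)
x = rng.gamma(shape=1.3, scale=900.0, size=745)   # assumed sample

def neg_log_likelihood(params):
    alpha, lam = params
    if alpha <= 0.0 or lam <= 0.0:
        return np.inf
    # log f(x) = alpha*ln(lam) - ln Gamma(alpha) + (alpha-1)*ln(x) - lam*x
    return -np.sum(alpha * np.log(lam) - gammaln(alpha)
                   + (alpha - 1.0) * np.log(x) - lam * x)

result = minimize(neg_log_likelihood, x0=[1.0, 0.001], method="Nelder-Mead")
alpha_hat, lam_hat = result.x
```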
C. Goodness of Fit Tests
Various statistical tests may be used to check the fit of a proposed model. For all tests, the hypotheses are:
H0: The selected distribution provides the correct statistical model for the claims,
H1: The selected distribution does not provide the correct statistical model for the claims.
Of the seven different tests offered by the Distribution Fitting procedure of the Statgraphics Centurion XV package we will use the following three [18]:
The chi-squared test divides the range of X into k intervals and compares the observed counts O_i (the number of data values observed in interval i) to the number expected under the fitted distribution, E_i (the number of data values expected in interval i). The test statistic is given by

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}    (7)

which is compared to a chi-squared distribution with k - p - 1 degrees of freedom, where p is the number of parameters estimated when fitting the selected distribution.
The Kolmogorov-Smirnov test (K-S test) compares the empirical cumulative distribution of the data to the fitted cumulative distribution. The test statistic is given by

d_n = \sup_x \left| F_n(x) - F(x) \right|    (8)

The empirical CDF F_n(x) is expressed as follows:

F_n(x) = \begin{cases} 0, & x \le x_{(1)} \\ j/n, & x_{(j)} < x \le x_{(j+1)}, \; j = 1, 2, \dots, n-1 \\ 1, & x > x_{(n)} \end{cases}    (9)

where the data are sorted from smallest to largest as x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}.
The Anderson-Darling test is one of the modifications of the K-S test. The test statistic is a weighted measure of the area between the empirical and fitted CDFs. It is calculated according to

A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \left[ (2i - 1) \ln z_{(i)} + (2(n - i) + 1) \ln\!\left(1 - z_{(i)}\right) \right]

where z_{(i)} = F(x_{(i)}).
In all the goodness-of-fit tests mentioned above, a small P-value leads to rejection of the hypothesis H0.
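The test statistics above are easy to compute directly once a model has been fitted. The following sketch evaluates the K-S statistic (8) and the Anderson-Darling statistic from the formula above for an assumed fitted lognormal model; the data are synthetic.

```python
# K-S statistic (8) and A^2 for a fitted model; data and fit are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.sort(rng.lognormal(6.7, 0.9, size=745))
fitted = stats.lognorm(s=0.9, scale=np.exp(6.7))   # assumed fitted model

n = len(x)
z = fitted.cdf(x)                                  # z_(i) = F(x_(i))

# K-S statistic: largest gap between the empirical and fitted CDFs.
i = np.arange(1, n + 1)
d_n = max(np.max(i / n - z), np.max(z - (i - 1) / n))

# Anderson-Darling statistic from the formula in the text.
a2 = -n - np.mean((2 * i - 1) * np.log(z) + (2 * (n - i) + 1) * np.log(1 - z))
print(f"d_n = {d_n:.4f}, A^2 = {a2:.4f}")
```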
III. PARETO MODEL OF CLAIM AMOUNTS
The Pareto distribution is commonly used to model the claim-size distribution in insurance because of its convenient properties. A Pareto random variable X has distribution function

F(x) = 1 - \left( \frac{\lambda}{\lambda + x} \right)^{\alpha}    (10)

with positive parameters \alpha and \lambda, and density function

f(x) = \frac{\alpha \lambda^{\alpha}}{(\lambda + x)^{\alpha + 1}}    (11)
When X is Pareto distributed, the mean

E(X) = \frac{\lambda}{\alpha - 1} \quad \text{for } \alpha > 1    (12)

and the variance

D(X) = \frac{\alpha \lambda^2}{(\alpha - 1)^2 (\alpha - 2)} \quad \text{for } \alpha > 2    (13)

are readily determined.
The method of moments for estimating the parameters \alpha, \lambda is then easy to apply. Equating the first two population and sample moments, we find the estimates

\tilde{\alpha} = \frac{2s^2}{s^2 - \bar{x}^2}, \qquad \tilde{\lambda} = (\tilde{\alpha} - 1)\,\bar{x}    (14)

The estimates \tilde{\alpha}, \tilde{\lambda} obtained in this way tend to have rather large standard errors, mainly because s^2 can have a very large variance when extreme claim amounts occur with high probability. We therefore prefer estimates of \alpha and \lambda obtained by the maximum likelihood method.
We denote by \hat{\alpha}, \hat{\lambda} the maximum likelihood estimates given data x_1, x_2, \dots, x_n from the Pareto distribution [12], [13]. Solving the equation f(\lambda) = 0 using the initial estimate \tilde{\lambda}, where

f(\lambda) = A - B = \frac{n}{\sum_{i=1}^{n} \ln \frac{x_i + \lambda}{\lambda}} - \frac{\sum_{i=1}^{n} \frac{1}{x_i + \lambda}}{\sum_{i=1}^{n} \frac{x_i}{\lambda (x_i + \lambda)}}    (15)

we obtain \hat{\lambda}. Substituting \hat{\lambda} into A or B we find \hat{\alpha}.
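A numerical sketch of this procedure is given below: the moment estimates (14) provide the starting point and the root of f(\lambda) from (15) is found by bracketing. The synthetic data and the assumption that the root is bracketed near the moment estimate are both illustrative, not part of the paper's computation (which used the Excel Solver).

```python
# Pareto fit: moment estimates (14), then solve f(lambda) = 0 from (15).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x = 5644.0 * (rng.uniform(size=745) ** (-1.0 / 5.6) - 1.0)  # Pareto sample

# Method of moments (14).
xbar, s2 = x.mean(), x.var(ddof=1)
alpha_mm = 2.0 * s2 / (s2 - xbar**2)
lam_mm = (alpha_mm - 1.0) * xbar

def f(lam):
    # f(lambda) = A - B as in (15).
    a = len(x) / np.sum(np.log((x + lam) / lam))
    b = np.sum(1.0 / (x + lam)) / np.sum(x / (lam * (x + lam)))
    return a - b

lam_ml = brentq(f, lam_mm / 100.0, lam_mm * 100.0)  # assumes a sign change here
alpha_ml = len(x) / np.sum(np.log((x + lam_ml) / lam_ml))  # alpha from A
```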
IV. PARETO MODEL IN REINSURANCE
Modelling the tail of the loss distribution in general insurance is one of the problem areas where obtaining a good fit to the extreme tails is of major importance. This is of particular relevance in non-proportional reinsurance when we are required to choose or price a high-excess layer [13].
The Pareto model is often used to estimate risk premiums for excess of loss treaties with high deductibles, where loss experience is insufficient and could therefore be misleading. This model is likely to remain the most important mathematical model for calculating excess of loss premiums for some years to come [19].
The Pareto distribution function of the losses X_a that exceed a known deductible a is [4], [15]
F_a(x) = 1 - \left( \frac{a}{x} \right)^{b}, \qquad x \ge a    (16)

The density function can be written

f_a(x) = \frac{b \cdot a^{b}}{x^{b+1}}, \qquad x \ge a    (17)
Throughout this paper we assume that the lower limit a is known, as will very often be the case in practice when the reinsurer receives information about all losses exceeding a certain limit.
The parameter b is the Pareto parameter and we need to estimate it. Let us consider the single losses in a given portfolio during a given period, usually one year. As we want to calculate premiums for XL treaties, we may limit our attention to the losses above a certain amount, the "observation point" OP. Of course, the OP must be lower than the deductible of the layer for which we wish to calculate the premium [4], [15].
Let the losses above this OP, X_{OP,1}, X_{OP,2}, \dots, X_{OP,n}, be independent identically Pareto distributed random variables with distribution function

F_{OP}(x) = 1 - \left( \frac{OP}{x} \right)^{b}, \qquad x \ge OP    (18)

The maximum likelihood estimate of the Pareto parameter b is given by the formula [4]

\hat{b} = \frac{n}{\sum_{i=1}^{n} \ln \frac{X_{OP,i}}{OP}}    (19)
The Pareto distribution expressed by (16) is part of the Distribution Fitting procedure in the Statgraphics Centurion XV package. This allows us to use the Pareto distribution to calculate the reinsurance risk premium. Risk premiums are usually calculated using the following equation:
risk premium = expected frequency × expected loss
The expected frequency is the average number of losses paid by the reinsurer per year. For a given portfolio we should set the OP low enough to have a sufficient number of losses to give a reasonable estimate of the frequency LF(OP).
If the frequency at the observation point OP is known, then it is possible to estimate the unknown frequency of losses exceeding any given high deductible a as

LF(a) = LF(OP) \cdot P(X_{OP} > a) = LF(OP) \cdot \left( \frac{OP}{a} \right)^{b}    (20)
The reinsurance risk premium RP can now be calculated as follows:

RP = LF(a) \cdot EXL    (21)

where

EXL = E(X_a) = \frac{a \cdot b}{b - 1}, \qquad b > 1    (22)
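Taken together, (19)-(22) give a compact pricing recipe. The sketch below strings the steps together; the loss amounts, observation point, deductible and frequency are invented inputs for illustration only.

```python
# Excess of loss pricing steps (19)-(22) with invented inputs.
import numpy as np

losses_above_op = np.array([4200.0, 5100.0, 6800.0, 9500.0, 14000.0])
op = 4000.0      # observation point
a = 8000.0       # deductible (priority) of the layer
lf_op = 0.05     # assumed annual loss frequency above OP

b = len(losses_above_op) / np.sum(np.log(losses_above_op / op))   # (19)
lf_a = lf_op * (op / a) ** b                                      # (20)
exl = a * b / (b - 1.0)                                           # (22)
rp = lf_a * exl                                                   # (21)
print(f"b = {b:.3f}, LF(a) = {lf_a:.5f}, EXL = {exl:.2f}, RP = {rp:.2f}")
```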
V. SIMULATIONS USING QUANTILE FUNCTION
The quantile function, QF, denoted by Q(p), expresses the p-quantile x_p as a function of p: x_p = Q(p), the value of x for which p = P(X \le x_p) = F(x_p).
The definitions of the QF and the CDF can be written for any pair of values (x, p) as x = Q(p) and p = F(x). These functions are simple inverses of each other, provided that they are both continuous increasing functions. Thus, we can also write Q(p) = F^{-1}(p) and F(x) = Q^{-1}(x). For sample data, the plot of Q(p) corresponds to the plot of x against p [6], [9].
We denote a set of ordered sample losses by x_{(1)}, x_{(2)}, \dots, x_{(r)}, \dots, x_{(n-1)}, x_{(n)}.
The corresponding random variables are denoted by X_{(1)}, X_{(2)}, \dots, X_{(r)}, \dots, X_{(n-1)}, X_{(n)}. Thus X_{(n)}, for example, is the random variable representing the largest observation of a sample of n. The n random variables are referred to as the n order statistics. These statistics play a major role in modelling with the quantile function Q(p).
Consider first the distribution of the largest observation X_{(n)}, with distribution function denoted F_{(n)}(x) = p_{(n)}. The probability

F_{(n)}(x) = p_{(n)} = P(X_{(n)} \le x)

is also the probability that all n independent observations on X are less than or equal to this value x, which for each one is p. By the multiplication law of probability p_{(n)} = p^n, so p = p_{(n)}^{1/n} and F(x) = p = p_{(n)}^{1/n}. Inverting F(x) to get the quantile function, we have

Q_{(n)}\!\left(p_{(n)}\right) = Q\!\left(p_{(n)}^{1/n}\right)    (23)
For the general r-th order statistic X_{(r)} the calculation becomes more difficult. The probability that the r-th largest observation is less than some value z equals

p_{(r)} = F_{(r)}(z) = P(X_{(r)} \le z)

This is also the probability that at least r of the n independent observations are less than or equal to z. The probability of s observations being less than or equal to z, where p = F(z), is given by the binomial expression

P(s \text{ observations} \le z) = \binom{n}{s} p^s (1 - p)^{n - s}

and

p_{(r)} = \sum_{s=r}^{n} \binom{n}{s} p^s (1 - p)^{n - s}

If this relation can be inverted, then we can write

p = \text{BETAINV}\!\left(p_{(r)}, r, n - r + 1\right)

From the last two expressions we get

Q_{(r)}\!\left(p_{(r)}\right) = Q\!\left(\text{BETAINV}\!\left(p_{(r)}, r, n - r + 1\right)\right)    (24)
BETAINV(\cdot) is a standard function in packages such as Excel. Thus, the quantiles of the order statistics can be evaluated directly from the quantile function Q(p) of the data. The quantile function therefore provides a natural way to simulate values for those distributions for which it is an explicit function of p [13].
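The role of Excel's BETAINV is played by the inverse CDF of the beta distribution in most statistical libraries. The sketch below evaluates the median of the r-th order statistic via (24) for an assumed Pareto quantile function; the parameter choices are illustrative.

```python
# Median of the r-th order statistic via (24); beta.ppf acts as BETAINV.
import numpy as np
from scipy import stats

alpha, lam, n = 5.6229, 5644.402, 1000   # illustrative parameter choices

def q_pareto(p):
    # Quantile function (inverse CDF) of the Pareto distribution (10).
    return lam * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

for r in (1000, 999, 998):
    p = stats.beta.ppf(0.5, r, n - r + 1)   # p for p_(r) = 0.5 in (24)
    print(f"r = {r}: median of X_(r) = {q_pareto(p):.2f}")
```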
VI. SIMULATION OF EXTREME LOSSES
In a number of applications of quantile functions in general insurance, interest focuses particularly on the extreme observations in the tails of the data. Fortunately, it is possible to simulate the observations in one tail without simulating the central values; we show here how to do this (a sketch implementing the recursion is given at the end of this section).
Consider the right-hand tail. The distribution of the largest observation has been shown to be Q(p^{1/n}). Thus the largest observation can be simulated as x_{(n)} = Q(u_{(n)}), where u_{(n)} = v_n^{1/n} and v_n is a random number from the interval [0, 1]. If we now generate a set of transformed variables by

u_{(n)} = v_n^{1/n}
u_{(n-1)} = v_{n-1}^{1/(n-1)} \cdot u_{(n)}    (25)
u_{(n-2)} = v_{n-2}^{1/(n-2)} \cdot u_{(n-1)}

where v_i, i = n, n-1, n-2, \dots, is simply a simulated set of independent uniform random variables, not ordered in any way, it will be seen from their definitions that the u_{(i)}, i = n, n-1, n-2, \dots, form a decreasing series of values with u_{(i-1)} < u_{(i)}.
In fact, the values u_{(i)} form an ordered sequence from a uniform distribution. Notice that once u_{(n)} is obtained, the relations have the general form

u_{(m)} = v_m^{1/m} \cdot u_{(m+1)}, \qquad m = n-1, n-2, \dots
The order statistics for the largest observations on X are then simulated by

x_{(n)} = Q\!\left(u_{(n)}\right)
x_{(n-1)} = Q\!\left(u_{(n-1)}\right)    (26)
x_{(n-2)} = Q\!\left(u_{(n-2)}\right)
In most simulation studies, n observations are generated and the sample is analysed m times to give an overall view of its behaviour. A technique that is sometimes used as an alternative to such simulation is to use a set of ideal observations, sometimes called a profile. Such a set of ideal observations could be the medians M_r, r = 1, 2, \dots, n.
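The sketch promised above implements the recursion (25) and the mapping (26); the Pareto quantile function is used purely as an assumed example of an explicit Q(p).

```python
# Simulate the k largest of n observations directly from Q via (25)-(26).
import numpy as np

def simulate_top_k(q, n, k, rng):
    # u_(n) = v_n^(1/n), then u_(m) = v_m^(1/m) * u_(m+1) for m = n-1, ...
    xs, u = [], 1.0
    for m in range(n, n - k, -1):
        u *= rng.uniform() ** (1.0 / m)
        xs.append(q(u))            # x_(m) = Q(u_(m)) as in (26)
    return xs                      # largest observation first

q_pareto = lambda p: 5644.402 * ((1.0 - p) ** (-1.0 / 5.6229) - 1.0)
top10 = simulate_top_k(q_pareto, n=1000, k=10, rng=np.random.default_rng(5))
```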
VII. APPLICATION OF THE THEORETICAL RESULTS
A. Data
We will carry out the practical application of the theoretical results of the previous chapters on data obtained from an unnamed Czech insurance company. We use a data set containing 745 claim amounts (in thousands of Czech crowns, CZK) from a portfolio of 26 125 policyholders in compulsory motor third-party liability insurance. In article [16] we considered 1352 claim amounts, including reserve estimates. Those estimates caused a high number of identical values in the file, which distorted the results of the analysis. The data file in this article includes only claim amounts that were actually paid out.
B. Exploratory Analysis
We start with a descriptive analysis of the sample data of the variable X, which represents the claim amounts in the whole portfolio of policies.
Table 1 Summary statistics for X
Count 745
Average 1220,95
Median 739,84
Standard deviation 1521,07
Coeff. of variation 124,581%
Minimum 24,42
Maximum 22820,0
Skewness 5,72352
Kurtosis 60,6561
Fig. 1 Box-and-Whisker plot of claim amounts data
Table 1 shows summary statistics for X. These statistics and the Box-and-Whisker plot confirm the skewed nature of the claims data. The density trace (5) for X in Fig. 2 also shows that the loss distribution in our case is skewed and long- or heavy-tailed.
Fig. 2 Density Trace for X
C. Selected Loss Distributions
The results of the exploratory analysis justify the assumption that the gamma, lognormal or Weibull distribution would give a suitable model for the underlying claims distribution. We now compare how well the different distributions fit our claims data. The best way to view the fitted distributions is through the frequency histogram. Fig. 3 shows a histogram of the data as a set of vertical bars, together with the estimated probability density functions.
From Fig. 3 it seems that the lognormal distribution follows the data best. It is hard to compare the tail fit, but clearly all the distributions show discrepancies in the middle claim intervals.
Fig. 3 Histogram and estimated loss distributions (gamma, lognormal, Weibull)
Fig. 4 Quantile-Quantile plot of selected distributions (gamma, lognormal, Weibull; the fitted lognormal defines the X-axis)
The Quantile-Quantile (Q-Q) plot shows the fraction of observations at or below X plotted versus the equivalent percentiles of the fitted distributions. In Fig. 4 the fitted lognormal distribution has been used to define the X-axis and is represented by the diagonal line. The fact that its points lie closest to the diagonal line confirms that the lognormal distribution provides the best model for the data in comparison with the other two distributions. The gamma and lognormal distributions deviate from the data at higher values of X, greater than 4000 thousand CZK. Evidently, the tails of these distributions are not fat enough.
Despite these adverse graphical results, we test whether the selected distributions fit the data adequately using the Goodness-of-Fit Tests procedure of Statgraphics Centurion XV.
The ML parameter estimates of the fitted distributions, obtained by maximizing the likelihood (6), are shown in Table 2.
Table 2 Estimated parameters of the fitted distributions
Gamma: shape = 1,30953; scale = 0,001073
Lognormal: mean = 1201,77; standard deviation = 1360,07 (log scale: mean = 6,67928, std. dev. = 0,9080)
Weibull: shape = 1,0647; scale = 1257,15
Table 3 Anderson-Darling Goodness-of-Fit Tests for X
Gamma Lognormal Weibull
A^2 13,4802 1,58638 15,1049
Modified Form 13,4802 1,58638 15,1049
P-Value <0.01 >=0.10 <0.01
Table 3 shows the results of the tests run to determine whether X can be adequately modelled by the gamma, lognormal or Weibull distribution. P-values less than 0,01 indicate that X does not come from the selected distribution with 99% confidence.
Table 4 shows the results of the chi-squared test (7) run to determine whether X can be adequately modelled by a lognormal distribution with parameters estimated by ML. Since the smallest p-value = 0,520495 amongst the tests performed is greater than or equal to 0,05, we cannot reject the hypothesis that X comes from a lognormal distribution with 95% confidence, contrary to the result of the chi-squared test in article [16].
Table 4 Chi-squared test with lognormal distribution
Upper Limit    Observed Frequency    Expected Frequency    Chi-Squared
at or below 1000,0 466 446,50 0,85
2000,0 165 182,98 1,77
3000,0 53 61,93 1,29
4000,0 28 25,53 0,24
5000,0 14 12,06 0,31
6000,0 8 6,29 0,47
7000,0 4 3,52 0,07
8000,0 2 2,09 0,00
above 8000,0 5 4,11 0,19
Chi-Squared = 5,18357 with 6 d.f. P-Value = 0,520495
Fig. 5 and Table 4 lead to the conclusion that the fitted lognormal model undervalues the losses which exceed 4000 thousand CZK. We need to find a distribution with rather more weight in the upper tail than the lognormal distribution provides.
Fig. 5 Quantile-Quantile plot for lognormal distribution of X
Fig. 6 Quantile-Quantile plot for Pareto distribution of X4000
In Fig. 6 we can see that the Pareto model (16) for the variable X4000 gives a better fit and is a considerable improvement over the lognormal model. The Pareto distribution has been used to define the X-axis in Fig. 6. The fact that the points lie close to the diagonal line confirms that this distribution provides a good model for the claim amounts above 4000 thousand CZK (4 million CZK), and we can assume that a good model for losses above this threshold is the Pareto distribution with PDF given by formula (17).
Table 5 Estimated parameters by ML
Pareto (2-Parameter)
shape = 2,80078
lower threshold = 4000,0
Table 6 K-S goodness-of-fit tests for X4000
Pareto (2-Parameter)
DPLUS 0,0985791
DMINUS 0,0758859
DN 0,0985791
P-Value 0,895778
There are 34 values ranging from 4000 to 22 820 thousand CZK. Table 5 and Table 6 show the results of fitting a 2-parameter Pareto distribution (16) to the data on X4000. The estimated parameters of the fitted distribution, obtained by the maximum likelihood method, are shown in Table 5. The result of the K-S test of whether the 2-parameter Pareto distribution fits the data adequately is shown in Table 6. Since the p-value = 0,895778 is greater than 0,05, we cannot reject the null hypothesis that the sample values of the variable X4000 come from a 2-parameter Pareto distribution with 95% confidence.
We still check the hypothesis that a distribution with a good fit to the empirical claim amounts data is the Pareto distribution in the form (10). Since this probability model is not among the distributions offered by the Distribution Fitting procedure of the Statgraphics Centurion XV package, we perform the χ² goodness-of-fit test for the Pareto distribution in the form (10) in the usual way using an Excel spreadsheet.
We suppose that the random variable X, with values given by the claim amounts in the whole portfolio of policies, is Pareto distributed with distribution function (10). Using the method of moments to estimate the parameters \alpha, \lambda of the Pareto distribution, i.e. solving equations (14), we find the moment estimates \tilde{\alpha} = 5,6229 and \tilde{\lambda} = 5644,402. Of course, the maximum likelihood estimates are preferred, so we obtain estimates of \alpha and \lambda using the more efficient maximum likelihood method. Using the initial estimate \tilde{\lambda} = 5644,402 in the non-linear equation (15) we find f(5644,402) = 0,03533; using the Solver tool of Excel we then find f(14745,8) = 8,2746 \cdot 10^{-10}, hence the maximum likelihood estimate is \hat{\lambda} = 14745,8. Substituting \hat{\lambda} into A or B in equation (15) we find \hat{\alpha} = 13,19543.
Now we can check whether the Pareto distribution with the maximum likelihood estimates provides an adequate fit to the data using the \chi^2 goodness-of-fit test. The \chi^2 statistic is computed by (7). The result of the goodness-of-fit test is presented in Table 7.
Table 7 Chi-squared test for Pareto distribution
Upper Limit    Observed Frequency    Expected Frequency    Chi-Squared
at or below 1000 466 431,5745 2,746026
2000 165 174,3427 0,500663
3000 53 74,38566 6,148315
4000 28 33,31204 0,847073
5000 14 15,5765 0,159557
6000 8 7,571378 0,024265
7000 4 3,811341 0,009338
above 7000 7 4,425835 1,497192
Total 11,93243
By comparing the calculated value \chi^2 = 11,93243 with the quantile \chi^2_{0,95} = 11,0705 on 5 degrees of freedom (since there are two estimated parameters), we have to reject the null hypothesis that the Pareto distribution with parameters estimated by maximum likelihood provides an adequate fit to the data.
Paradoxically, if we repeat the \chi^2 goodness-of-fit test for the Pareto model with parameters estimated by the method of moments, we get \chi^2 = 4,69 < \chi^2_{0,95} = 11,0705.
Therefore the Pareto distribution defined by (10) with parameters \tilde{\alpha} = 5,6229, \tilde{\lambda} = 5644,402 gives an excellent fit and is a considerable improvement over the lognormal model and over the Pareto model with parameters estimated by the maximum likelihood method.
D. Using the Results in Non-Proportional Reinsurance
Let us now suppose that the insurance company wants to reduce its technical risk by non-proportional XL reinsurance with priority (deductible) a = 8 000 (thousand CZK). We will use the Pareto distribution for the variable X4000 with the parameters from Table 5 to determine the reinsurance risk premium by (21). As OP we take the value 4000. To calculate LF(a) by (20) we need to know P(X_{OP} > a). We can use the Tail Areas pane for the fitted 2-parameter Pareto distribution in the Statgraphics Centurion XV package. The software calculates the tail areas for up to 5 critical values, which we may specify. The output indicates that the probability of obtaining a value above 8 000 for the fitted 2-parameter Pareto distribution of X4000 is 0,143509, as we can see in Table 8. The value of LF(OP) can be estimated by the relative frequency of the losses above 4000:

LF(OP) = 33/745 = 0,0443
Table 8 Tail Area for X4000 Pareto (2-Parameter) distribution
X Lower Tail Area (<) Upper Tail Area (>)
8000,0 0,856491 0,143509
Then by (20) we get

LF(a) = LF(OP) \cdot P(X_{OP} > a) = 0,0443 \cdot 0,143509 = 0,006357
So that we can use formulas (21) and (22) to calculate the reinsurance premium RP, we fit the 6 values of the variable X8000, ranging from 8 000,0 to 22 820,0, by the Pareto (2-parameter) distribution using the Statgraphics Centurion XV goodness-of-fit procedure. The results are listed in Table 9 and Table 10.
Table 9 Estimated Pareto parameters by ML
Pareto (2-Parameter)
shape = 3,74093
lower threshold = 8000,0
Table 10 K-S goodness-of-fit tests for X8000
Pareto (2-Parameter)
DPLUS 0,384835
DMINUS 0,14685
DN 0,384835
P-Value 0,339047
Table 10 shows the results of the tests run to determine whether X8000 can be adequately modelled by the 2-parameter Pareto distribution (16) with the ML estimated parameters from Table 9. Since the p-value = 0,339047 is greater than 0,05, we cannot reject the null hypothesis that the values of X8000 come from the 2-parameter Pareto distribution with 95% confidence.
From the values of the estimated parameters in Table 9 we can calculate

EXL = E(X_a) = \frac{a \cdot b}{b - 1} = \frac{8000 \cdot 3,74093}{2,74093} = 10 918,72

Then by formula (21) we get the reinsurance premium in thousands of CZK:

RP = LF(a) \cdot EXL = 0,006357 \cdot 10 918,72 = 69,41
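The arithmetic above can be re-traced in a few lines; all inputs below are the values quoted in the text and tables.

```python
# Re-tracing the pricing arithmetic with the values quoted in the text.
lf_op = 33 / 745                 # = 0.0443, relative frequency above OP = 4000
p_tail = 0.143509                # P(X_OP > 8000) from Table 8
b = 3.74093                      # Pareto parameter of X8000 from Table 9
a = 8000.0                       # deductible in thousand CZK
lf_a = lf_op * p_tail            # = 0.006357 by (20)
exl = a * b / (b - 1.0)          # = 10918.72 by (22)
print(f"RP = {lf_a * exl:.2f} thousand CZK")   # about 69.41 by (21)
```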
E. Simulation of Extreme Losses
By the result of part VII-C, the Pareto model defined by (10) with parameters \tilde{\alpha} = 5,6229, \tilde{\lambda} = 5644,402 fits the claim amounts data well. The Pareto quantile function Q(p), i.e. the inverse of the Pareto distribution function (10), has the form

Q(p) = F^{-1}(p) = \lambda \left[ (1 - p)^{-1/\alpha} - 1 \right], \qquad 0 < p < 1    (27)

which in our case is

Q(p) = 5644,402 \left[ (1 - p)^{-1/5,6229} - 1 \right]    (28)
We use the results of parts V and VI to simulate the 10 largest claim amounts by (28) for the case of 1000 claim amounts. Table 11 contains the step-by-step results of the simulation following part VI, using formulas (25) and (26).
Table 11 Steps of the simulation of the 10 largest claim amounts
v    n    u    Q(u)    Q(BETAINV(0,5))
0,235493 1000 0,99855 12 415,58 14 937,50
0,331321 999 0,99745 10 682,07 11 942,23
0,743843 998 0,99716 10 366,40 10 544,10
0,993465 997 0,99715 10 359,85 9 656,27
0,493922 996 0,99644 9 742,37 9 015,37
0,997123 995 0,99644 9 740,15 8 518,55
0,665588 994 0,99603 9 446,11 8 115,47
0,503882 993 0,99535 9 023,41 7 777,94
0,943984 992 0,99529 8 991,23 7 488,61
0,761040 991 0,99501 8 844,77 7 236,14
Fig. 7 Graphical result of the simulation of the 10 largest claim amounts
In Fig. 7 we can see the 10 simulated values of x = Q(u) and the quantiles x_{0,5}, x_{0,995}, x_{0,005} of the order statistics X_{(1000)}, X_{(999)}, \dots, X_{(991)}. The quantiles x_{0,005} and x_{0,995} give the bounds for the 10 largest values, which the Pareto distributed claim amounts would exceed with probability only 0,01.
Simulation of the p largest claim amounts in a non-life insurance portfolio is useful in non-proportional reinsurance of the LCR(p) type, when the insurance company cedes the p largest claim amounts to the reinsurer, and of the ECOMOR type, when the reinsurance company pays the losses that exceed the p-th largest value in the decreasing sequence of the claim amounts [4].
VIII. CONCLUSION
Probability models of claim amounts create the basis for solving many substantial problems in general insurance practice. When trying to fit a distribution to claims data using traditional classical methods and classical loss distributions such as the gamma, lognormal, Weibull and Pareto, as in this article, there are many situations where classical parametric distributions may not be appropriate to model the claims. Mixture distributions [10], [20] or quantile models [9] might be good tools in such cases.
In the article we mentioned several examples of the use of probability models of claim amounts in general insurance. It should be emphasized that the individual claim amount is one of the components of the total claim amount [14]. The total amount of claims in a particular time period is of fundamental importance to the proper management of an insurance company [3], [11], [17]. The key assumption in the various models for this total or aggregate claim amount is that we know the distribution describing the claim amounts together with a model for the events occurring in time.
REFERENCES
[1] O. M. Achieng, Actuarial Modeling for Insurance Claim Severity in
Motor Comprehensive Policy Using Industrial Statistical Distributions.
[Online]. Available:
http://www.actuaries.org/EVENTS/Congresses/Cape_Town/Papers/Non-
Life%20Insurance%20(ASTIN)/22_final%20paper_Oyugi.pdf
[2] P. J. Boland, Statistical and Probabilistic Methods in Actuarial Science. London: Chapman & Hall/CRC, 2007.
[3] A. Brdar Turk, A Quantitative Operational Risk Management Model,
Wseas Transactions on Business and Economics, Issue 5, Volume 6,
2009, pp. 241-253.
[4] T. Cipra, Reinsurance and Risk Transfer in Insurance (Zajištění
a přenos rizik v pojišťovnictví). Praha: Grada Publishing, 2004, ch. 11
[5] I. D. Currie, Loss Distributions. London and Edinburgh: Institute of
Actuaries and Faculty of Actuaries, 1993.
[6] W. G. Gilchrist, Statistical Modelling with Quantile Functions,
Chapman & Hall/CRC, London 2000.
[7] R. J. Gray, S. M. Pitts, Risk Modelling in General Insurance. Cambridge University Press, 2012, ch. 2.
[8] R. V. Hogg, S. A. Klugman, Loss Distributions. New York: John Wiley
& Sons, 1984.
[9] P. Jindrová, Ľ. Sipková, Statistical Tools for Modeling Claim Severity.
In: European Financial Systems 2014. Proceedings of the 11th
International Scientific Conference. Lednice, June 12-13, 2014. Brno:
Masaryk University, 2014, pp. 288-294.
[10] R. Kaas, M. Goovaerts, J. Dhaene, M. Denuit, Modern Actuarial Risk
Theory. Boston: Kluwer Academic Publishers, 2001.
[11] M. Käärik, M. Umbleja, On claim size fitting and rough estimation of
risk premiums based on Estonian traffic insurance example,
International Journal of Mathematical Models and Methods in Applied
Sciences, Issue 1, Volume 5, 2011, pp. 17-24.
[12] V. Pacáková, Applied Insurance Statistics (Aplikovaná poistná
štatistika). Bratislava: Iura Edition, 2004.
[13] V. Pacáková, B. Linda, Simulations of Extreme Losses in Non-Life
Insurance. E+M Economics and Management, ročník XII, 4/2009.
[14] V. Pacáková, Modelling and Simulation in Non-life Insurance.
Proceedings of the 5th International Conference on Applied
Mathematics, Simulation, Modelling (ASM’11), Corfu Island, Greece,
July 14-16, 2011, Published by WSEAS Press, p. 129-133.
[15] V. Pacáková, J. Gogola, Pareto Distribution in Insurance and
Reinsurance. Conference proceedings from 9th international scientific
conference Financial Management of Firms and Financial Institutions,
VŠB Ostrava, 2013. pp. 298-306.
[16] V. Pacáková, D. Brebera, Loss Distributions in Insurance Risk
management. In: Recent Advances on Economics and Business
Administration, Proceedings of the International Conference on
Economics and Statistics (ES 2015), INASE Conference, Vienna,
March 15-17, 2015. pp. 17-22.
[17] L. Přečková, Asymmetry of information during the application of the
model for valuation the sum insured in case of business interruption in
the Czech Republic, International Journal of Mathematical Models
and Methods in Applied Sciences, Issue 1, Volume 5, 2011, pp. 212-
219.
[18] Probability Distributions, On-line Manuals, StatPoint, Inc., 2005.
[19] H. Schmitter, Estimating property excess of loss risk premiums by
means of Pareto model, Swiss Re, Zürich 1997. [Online]. Available
http://www.kochpublishing.ch/data/2000_pareto_0007.pdf
[20] Y. K. Tse, Nonlife Actuarial Models. Cambridge: University
Press, 2009.
Prof. RNDr. Viera Pacáková, Ph.D. graduated in Econometrics (1970) at Comenius University in Bratislava, obtained the RNDr. degree in Probability and Mathematical Statistics at the same university in 1978 and the Ph.D. degree at the University of Economics in Bratislava in 1986, and became associate professor in Quantitative Methods in Economics in 1998 and professor in Econometrics and Operations Research at the University of Economics in Bratislava in 2006. She worked at the Department of Statistics, Faculty of Economic Informatics, University of Economics in Bratislava from 1970 to January 2011. She has been working at the Faculty of Economics and Administration in Pardubice since 2005. She has concentrated on actuarial science and the management of financial risks since 1994, in connection with actuarial education in the Slovak and Czech Republics, and has achieved considerable results in this area.
Mgr. David Brebera graduated in Mathematics in 1999 at the Faculty of Mathematics and Physics of Charles University. He worked as a senior actuary at Pojistovna Ceske sporitelny, a. s. (Insurance Company of the Czech Savings Bank) from 1999 to 2006. He has been working at the Faculty of Economics and Administration in Pardubice since 2005. He concentrates on statistics, actuarial science and the management of financial risks.