Agency Performance Prediction
Contents
Business Case
Insights Required & Business Benefits
A Bit About Domain…
Data Preprocessing
Modelling
Approach
Evaluation Metric
Outcome
Best Model Comparison
Model Interpretation & Key Challenges
Business Case
Azure Insurance Group operates property and casualty (P&C) insurance, life
insurance, and insurance brokerage companies. Azure sells policies through
direct and indirect sales channels. For indirect selling, Azure has tie-ups with
1,600+ agencies across 6 states. Azure is interested in classifying existing
agencies into predefined performance categories in a supervised predictive
framework, based on each agency's past performance. Specifically, Azure
expects to better understand which agencies are likely to bring more growth
in the Personal Line (PL) of business.
Insights Required & Business Benefits
What Insights Are Required?
Classify each agency into one of the following categories
– GROW: Business from the agency is likely to grow > 5% in 2014
– STABLE: Business from the agency is likely to stay flat, with growth in the range [-5%, 5%] in 2014
– LOSS: Business from the agency is likely to shrink > 5% (< -5% growth) in 2014
Note: Business growth is measured as the % growth in the Average Monthly Written Premium Amount achieved by the agency for a given year
Potential Business Benefits
• Improved understanding of Agency Performance - at a micro level & macro level
– How is an individual agency likely to perform?
– How are all the agencies in a state likely to perform?
• Optimized utilization of Agency Development Funds
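The GROW/STABLE/LOSS labelling rule above is simple enough to sketch directly. A minimal Python version (the function name `performance_class` is illustrative; the original pipeline was built in R):

```python
def performance_class(growth_rate: float) -> str:
    """Map a yearly % growth figure to the predefined performance category.

    GROW   : growth > 5%
    STABLE : growth in [-5%, 5%]
    LOSS   : growth < -5%
    """
    if growth_rate > 5.0:
        return "GROW"
    if growth_rate >= -5.0:
        return "STABLE"
    return "LOSS"
```

Note that the boundary values ±5% fall into STABLE, matching the closed interval [-5%, 5%] in the definition above.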
A Bit About Domain…
What is Insurance?
• A risk management tool for the customer (individual or business), allowing them to transfer the risk of financial loss to the insurance company
• In exchange for a constant stream of premiums, insurance companies offer to pay consumers a sum of money upon the occurrence of a predetermined event, such as a natural catastrophe, a car crash, or death
• Broadly, from a business perspective, insurance is classified as Life or Non-Life (General)
Insurance type hierarchy (reconstructed from the slide diagram):
• Insurance
  – Life Insurance
  – General Insurance
    • Property & Casualty Insurance
    • Medical Insurance
    • Motor Vehicle Insurance
    • Marine Insurance
    • Fire Insurance
    • Homeowner's Insurance
Data Preprocessing
What Data Was Provided By Azure?
• 213K+ observations with 49 dimensions
• Each observation represents yearly aggregated data for an agency, for a year, for a state, for a product
• Key attribute summary:
– 1624 agencies
– 11 years of time duration (2005-2015)
– 6 states
– 29 products
– 2 product lines
• No target class in the data!
Attribute Analysis
Each input attribute was assessed from 3 different angles:
• Business meaning: What does it mean?
• Domain Expertise Based Predictive Importance: Can it help in predicting agency performance?
• Sparsity: Does it have enough values?
Data Preprocessing (Cont…)
Key Preprocessing Challenges:
1. Missing Values
   – Identified and dropped highly sparse attributes
   – Missing values encoded as "99999" or "Unknown" were converted to NA during file read in R
2. Unwanted Data
   – Agencies appointed as late as 2014, for which a 2014 growth rate cannot be calculated, were removed
   – The scope of the analysis is Personal Line (PL) data; hence Commercial Line (CL) data was filtered out
3. Unavailable Data
   – New attributes were created for all Quantity and Revenue attributes, averaging them over the number of months for which data is available
4. Incomplete Data
   – 2005 and 2015 data were removed, as data were available for only 8 and 5 months respectively
5. Repeating Data
   – Agency-specific attributes were detached from the raw data, processed separately, and later merged back with the main data
6. Format of Data for Modelling
   – All Quantity and Revenue attributes were aggregated by AGENCY_ID and YEAR
   – Each important attribute was expanded with AGENCY_ID in rows and a year identifier in columns; e.g. the WrittenPremAmount column was converted to 2006_WrittenPremAmount, 2007_WrittenPremAmount, ...
7. No Target Class Present
   – A lag variable was created for Written Premium Amount
   – The growth rate for each agency was calculated for all years (2006-2014)
   – Each agency was assigned a class label based on its 2014 growth rate:
     • GROW: 2014 growth rate > 5%
     • STABLE: 2014 growth rate in the range [-5%, 5%]
     • LOSS: 2014 growth rate < -5%
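Steps 6 and 7 above (aggregate, pivot to wide format, lag, compute growth, assign labels) were done in R; the same idea can be sketched in pandas. All column values below are toy data, and the column/variable names are illustrative stand-ins for the real attributes:

```python
import pandas as pd

# Toy yearly premium data; in the real pipeline this comes from aggregating
# WrittenPremAmount by AGENCY_ID and YEAR (challenge 6, step 1).
df = pd.DataFrame({
    "AGENCY_ID": [1, 1, 1, 2, 2, 2],
    "YEAR": [2012, 2013, 2014, 2012, 2013, 2014],
    "WrittenPremAmount": [100.0, 110.0, 120.0, 200.0, 190.0, 170.0],
})

# Challenge 6, step 2: one row per agency, one column per year
# (the analogue of 2006_WrittenPremAmount, 2007_WrittenPremAmount, ...).
wide = df.pivot(index="AGENCY_ID", columns="YEAR", values="WrittenPremAmount")

# Challenge 7, steps 1-2: lag the premium by one year and compute % growth.
growth = (wide - wide.shift(axis=1)) / wide.shift(axis=1) * 100

# Challenge 7, step 3: assign the class label from the 2014 growth rate.
def label(g):
    if g > 5:
        return "GROW"
    if g >= -5:
        return "STABLE"
    return "LOSS"

target = growth[2014].apply(label)
```

Here agency 1 grew (120 - 110) / 110 ≈ +9.1% in 2014 (GROW), while agency 2 shrank (170 - 190) / 190 ≈ -10.5% (LOSS).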
Modelling - Approach
• Important features were identified using Boruta package (11 attributes dropped)
• As this is a classification problem, the following algorithms were used:
– CART
– C5.0
– Random Forest
– K Nearest Neighbours
– Artificial Neural Network
– Support Vector Machine
– GBM
– Ensemble-Stacking
• The algorithms were tried on three flavours of the data:
– ASIS Data
– ASIS Data + Range transformation
– ASIS Data + Range transformation + Important Features
• 10-fold cross-validation (repeated 3-10 times) was performed to get an initial best estimate of the
hyperparameters ("caret" package)
• One or more rounds of grid search were used to fine-tune the hyperparameter values ("caret"
package)
• Cost-sensitive learning was used in CART and SVM
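The tuning above was done with R's caret (repeated 10-fold CV followed by grid search). A rough scikit-learn equivalent of the same procedure, with a random forest as one representative model and a deliberately small illustrative grid (the stand-in data and parameter values are not from the original study):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Stand-in 3-class data; the real features come from the preprocessed agency table.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=0)

# Repeated 10-fold cross-validation, mirroring the caret step (3 repeats shown).
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)

# Grid search to fine-tune hyperparameters over the repeated CV splits.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_features": ["sqrt", "log2"]},
    cv=cv,
    scoring="f1_macro",
)
grid.fit(X, y)
```

In scikit-learn, the cost-sensitive learning mentioned for CART and SVM would map to the `class_weight` parameter of `DecisionTreeClassifier` and `SVC`.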
Modelling – Evaluation Metric
• Interesting insight:
  – Only ~40% of the agencies achieved > 0% growth in 2014
  – Of that 40%, only ~50% of the agencies grew > 5%. The same is reflected in the
    2014 growth class distribution:
    GROW 21% | STABLE 37% | LOSS 42%
• Azure is interested in identifying agencies in the GROW class as accurately as
possible
• Model evaluation metric:
  – Higher recall for the GROW class, AND
  – Optimal F1 to balance the recall-precision tradeoff
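The evaluation metric above is recall and F1 restricted to one class of interest. With scikit-learn this can be computed by passing `labels=["GROW"]` so that only the GROW class contributes (the labels below are toy values, not model output):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy ground truth and predictions; the real evaluation uses 2014 predictions.
y_true = ["GROW", "GROW", "STABLE", "LOSS", "GROW", "LOSS"]
y_pred = ["GROW", "STABLE", "STABLE", "GROW", "GROW", "LOSS"]

# One-vs-rest metrics for the GROW class only.
grow_recall = recall_score(y_true, y_pred, labels=["GROW"], average="macro")
grow_precision = precision_score(y_true, y_pred, labels=["GROW"], average="macro")
grow_f1 = f1_score(y_true, y_pred, labels=["GROW"], average="macro")
```

In this toy sample, 2 of the 3 true GROW agencies are caught (recall 2/3) and 2 of the 3 predicted GROW labels are correct (precision 2/3).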
Modelling – Best Model Comparison
Best Model vs. Baseline Model:
• In the absence of a model, or as a baseline, the best estimate of the 2014 performance
class is the MODE of the 2014 performance class attribute.
• The baseline model would predict the "LOSS" class for all agencies since, with 42% of
observations, "LOSS" is the most frequent class

Metric                 | Baseline Model | Best Predictive Model
GROW Class – Recall    | 0              | 0.80
GROW Class – Precision | 0              | 0.35
GROW Class – F1        | NA             | 0.49
Overall Accuracy       | 41.78%         | 49.20%
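The baseline column follows mechanically from the mode rule: predicting the modal class for every agency gives zero GROW recall, and accuracy equal to the modal class share. A sketch on a toy sample of 100 agencies mirroring the 21/37/42 split from the deck:

```python
from collections import Counter

# Class distribution from the deck: GROW 21%, STABLE 37%, LOSS 42% (toy n=100).
y_true = ["GROW"] * 21 + ["STABLE"] * 37 + ["LOSS"] * 42

# Baseline: always predict the most frequent (modal) class.
mode_class = Counter(y_true).most_common(1)[0][0]
y_pred = [mode_class] * len(y_true)

baseline_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
grow_recall = sum(t == p == "GROW" for t, p in zip(y_true, y_pred)) / y_true.count("GROW")
```

On the real data the modal share is 41.78% rather than an even 42%, which is exactly the baseline accuracy in the table above.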
Modelling – Model Interpretation & Key Challenges
Model Interpretation
• If an agency is likely to grow > 5% in 2014:
  – The best predictive model accurately labels it as "GROW" in 4 out of 5 cases (recall = 0.80)
• If the best predictive model has labelled an agency as "GROW":
  – In ~1 out of 3 cases the agency will actually grow > 5% in 2014 (precision = 0.35)
  – In ~2 out of 3 cases the agency will turn out to be STABLE or LOSS in 2014
Key Challenges:
• GROW is a minority class: the class distribution is imbalanced and skewed toward the "LOSS" class
• For the majority of algorithms, learning is skewed toward classifying the LOSS class correctly,
which is not what Azure is interested in
• The data has a lot of variance; it is difficult to get test data truly representative of the training data!
• There is not enough data to overcome the class imbalance and the variance in the data