7. Tuning learning parameters
• Validation: 10x stratified shuffle split on learning (90%) and validation (10%)
• Parameters to tune
– tree depth
– learning rate
– number of trees in ensemble
– scheme for filling in missing values
– number of unimportant features to exclude
• Decision
– Marginal improvement in validation score (about 0.005, with large variance)
– Biased validation scheme (because of year-to-year changes)
– Final submission: XGBoost model with default learning parameters (Occam's Razor principle): 100 trees, max depth = 3, learning rate = 0.1
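The validation scheme above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses a synthetic dataset in place of the (non-public) competition features, and scikit-learn's GradientBoostingClassifier as a portable stand-in for XGBoost — its defaults happen to match the final submission's parameters (100 trees, max depth 3, learning rate 0.1).

```python
from statistics import mean, stdev

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic placeholder for the competition data (real features are not public).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)

# 10x stratified shuffle split: 90% learning / 10% validation, as on the slide.
splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=0)

scores = []
for train_idx, val_idx in splitter.split(X, y):
    # Parameters matching the final submission: 100 trees, depth 3, lr 0.1
    # (these are also GradientBoostingClassifier's defaults).
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                       learning_rate=0.1, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(roc_auc_score(y[val_idx],
                                model.predict_proba(X[val_idx])[:, 1]))

print(f"validation AUC: {mean(scores):.4f} +/- {stdev(scores):.4f}")
```

The spread across the 10 splits is what makes a 0.005 improvement hard to trust: if the standard deviation of the fold scores is of the same order, the gain is within noise.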
8. Feature evaluation
Feature vector:
• Personal (15)
• Cards & Wealth (8)
• Activeness (8)
• Event counters (77)
• Geo (28)
Feature group     AUC change after removing group   AUC, only features from the group
Personal          -0.0322                           0.6615
Cards & Wealth    -0.0137                           0.5653
Event counters    -0.0019                           0.6738
Activeness        -0.0012                           0.6419
Geo location      -0.0004                           0.6318
Cross-validation AUC score: 0.7213 (stratified shuffle train/test split)
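The two ablation measurements in the table (AUC change after removing a group, and AUC using only that group) can be sketched as below. This is a hypothetical reconstruction, not the authors' code: the data is synthetic, the group-to-column mapping is invented for illustration, and GradientBoostingClassifier stands in for XGBoost with the same parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real feature matrix is not public.
X, y = make_classification(n_samples=1200, n_features=12, n_informative=6,
                           random_state=0)

# Hypothetical column ranges standing in for the real feature groups
# (Personal, Cards & Wealth, Activeness, Event counters, Geo).
groups = {
    "personal": set(range(0, 4)),
    "cards_wealth": set(range(4, 8)),
    "geo": set(range(8, 12)),
}

def auc_with(cols):
    """Train on the given feature columns and return held-out AUC."""
    Xs = X[:, sorted(cols)]
    Xtr, Xte, ytr, yte = train_test_split(Xs, y, test_size=0.3,
                                          stratify=y, random_state=0)
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                       learning_rate=0.1, random_state=0)
    model.fit(Xtr, ytr)
    return roc_auc_score(yte, model.predict_proba(Xte)[:, 1])

all_cols = set(range(X.shape[1]))
base = auc_with(all_cols)  # AUC of the full model

results = {}
for name, cols in groups.items():
    results[name] = {
        # Column 2 of the table: drop the group, measure the AUC delta.
        "auc_change_after_removal": auc_with(all_cols - cols) - base,
        # Column 3 of the table: train on the group alone.
        "auc_only_group": auc_with(cols),
    }
    print(name, results[name])
```

Note the asymmetry the table exposes: a group can score well on its own (Event counters, 0.6738) yet cost almost nothing when removed (-0.0019), because its signal is largely duplicated by the remaining features.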