# My Law

Published in: Technology, Economy & Finance


1. Exercise Data Preparation
2. Modeling Example
   - Business: National veterans’ organization
   - Objective: From population of lapsing donors, identify individuals worth continued solicitation.
   - Source: 1998 KDD-Cup Competition via UCI KDD Archive
3. The Story
   - A national veterans’ organization seeks to better target its solicitations for donation. By soliciting only the most likely donors, it will spend less money on solicitation efforts and have more money available for charitable concerns.
   - Solicitations involve sending a small gift to an individual together with a request for donation. Gifts include mailing labels and greeting cards.
   - Of particular interest is the class of individuals identified as lapsing donors. These individuals made their most recent donation between 12 and 24 months ago. The organization found that by predicting the response behavior of this group, it can use the resulting model to rank all 3.5 million individuals in its database.
   - The current campaign refers to a greeting card mailing sent in June 1997.
   - The source of this data is the Association for Computing Machinery’s (ACM) 1998 KDD-Cup competition.
4. Additional Data Preparation
   The raw analysis data has been reduced for the purpose of this course. A subset of slightly over 19,000 records has been selected for modeling. As will be seen, this subset was not chosen arbitrarily. In addition, the 481 fields have been reduced to 50.
   - Raw Analysis Data: 95,412 records, 481 fields
   - Final Analysis Data: 19,372 records, 50 fields
5. Analysis Data Definition: Donor master data
   - CONTROL_NUMBER: Unique donor ID
   - MONTHS_SINCE_ORIGIN: Elapsed time since first donation
   - IN_HOUSE: 1=Given to In House program, 0=Not In House donor
6. Analysis Data Definition: Demographic and other overlay data
   - OVERLAY_SOURCE: M=Metromail, P=Polk, B=both
   - DONOR_AGE: Age as of June 1997
   - DONOR_GENDER: Actual or inferred gender
   - PUBLISHED_PHONE: Published telephone listing
   - HOME_OWNER: H=homeowner, U=unknown
   - MOR_HIT: Mail order response hit rate
7. Analysis Data Definition: Demographic and other overlay data
   - CLUSTER_CODE: 54 socio-economic cluster codes
   - SES: 5 socio-economic cluster codes
   - INCOME_GROUP: 7 income group levels
   - MED_HOUSEHOLD_INCOME: Median income in \$100s
   - PER_CAPITA_INCOME: Income per capita in dollars
   - WEALTH_RATING: 10 wealth rating groups

   Note: SES is a roll-up of the socio-economic field CLUSTER_CODE.
8. Analysis Data Definition: Demographic and other overlay data
   - MED_HOME_VALUE: Median home value in \$100s
   - PCT_OWNER_OCCUPIED: Percent owner occupied housing
   - URBANICITY: U=urban, C=city, S=suburban, T=town, R=rural, ?=unknown
9. Analysis Data Definition: Census overlay data
   - PCT_MALE_MILITARY: Percent male military in block
   - PCT_MALE_VETERANS: Percent male veterans in block
   - PCT_VIETNAM_VETERANS: Percent Vietnam veterans in block
   - PCT_WWII_VETERANS: Percent WWII veterans in block
10. Analysis Data Definition: Transaction detail data
    - NUMBER_PROM_12: Number of promotions in the last 12 months
    - CARD_PROM_12: Number of card promotions in the last 12 months

    (Figure: timeline from '94 to '98 with 97NK marked.)
11. Analysis Data Definition: Transaction detail data
    - FREQ_STATUS_97NK: Frequency status, June '97
    - RECENCY_STATUS_96NK: Recency status, June '96
    - MONTHS_SINCE_LAST: Months since last donation
    - LAST_GIFT_AMT: Amount of most recent donation

    (Figure: timeline from '94 to '98 with 96NK and 97NK marked.)
12. Analysis Data Definition: RECENT transaction detail data
    - RESPONSE_PROP: Response proportion since June '94
    - RESPONSE_COUNT: Response count since June '94
    - AVG_GIFT_AMT: Average gift amount since June '94
    - RECENT_STAR_STATUS: STAR (1, 0) status since June '94

    Note: The sampling method implies that no one made a donation between 6/1996 and 6/1997. However, for a limited number of cases, the number of months since last gift is fewer than 12. This contradiction is not resolved in the data’s documentation, nor will it be resolved here.

    (Figure: timeline from '94 to '98 with 94NK and 96NK marked.)
13. Analysis Data Definition: RECENT transaction detail data
    - CARD_RESPONSE_PROP: Response proportion since June '94
    - CARD_RESPONSE_COUNT: Response count since June '94
    - CARD_AVG_GIFT_AMT: Average gift amount since June '94

    (Figure: timeline from '94 to '98 with 94NK and 96NK marked.)
14. Analysis Data Definition: LIFETIME transaction detail data
    - PROM: Total number of promotions ever
    - GIFT_COUNT: Total number of donations ever
    - AVG_GIFT_AMT: Overall average gift amount
    - PEP_STAR: STAR status ever (1=yes, 0=no)

    (Figure: timeline from '94 to '98 with 94NK and 96NK marked.)
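
Slide 3 describes using a response model to rank individuals so that only the most likely donors are solicited. The slides do not show how the ranking is applied, so the following is only a minimal Python sketch under stated assumptions: the function name `select_for_solicitation`, the `budget` parameter, and the example scores are all hypothetical, and the scores stand in for whatever predicted response probabilities a fitted model would produce.

```python
def select_for_solicitation(scores, budget):
    """Return the `budget` donor IDs with the highest predicted scores.

    scores: dict mapping a donor ID (e.g. CONTROL_NUMBER) to a predicted
            response probability. How the scores are produced is outside
            this sketch.
    budget: number of solicitations the organization can afford (assumed).
    """
    # Sort donor IDs from most to least likely responder.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [donor_id for donor_id, _ in ranked[:budget]]


# Hypothetical scores, invented for illustration only.
example_scores = {"A": 0.10, "B": 0.70, "C": 0.40}
chosen = select_for_solicitation(example_scores, budget=2)
print(chosen)
```

Ranking by score and cutting at a budget is one simple way to act on such a model; the slides do not say which selection rule the organization actually used.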
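
Slides 5, 6, and 8 list coded fields (IN_HOUSE, OVERLAY_SOURCE, HOME_OWNER, URBANICITY) together with the meaning of each code. As an illustration, the Python sketch below turns those code lists into lookup tables and decodes one record. The function name `decode_record` and the sample record are hypothetical; only the code-to-label pairs come from the slides.

```python
# Code tables copied from the data definitions in slides 5, 6 and 8.
CODE_TABLES = {
    "IN_HOUSE": {"1": "given to In House program", "0": "not In House donor"},
    "OVERLAY_SOURCE": {"M": "Metromail", "P": "Polk", "B": "both"},
    "HOME_OWNER": {"H": "homeowner", "U": "unknown"},
    "URBANICITY": {
        "U": "urban", "C": "city", "S": "suburban",
        "T": "town", "R": "rural", "?": "unknown",
    },
}


def decode_record(record):
    """Replace coded values with labels; report codes not in the tables.

    Returns (decoded_record, unknown_codes). Unlisted codes are left
    unchanged and reported rather than guessed.
    """
    decoded = dict(record)
    unknown_codes = []
    for field, table in CODE_TABLES.items():
        if field not in record:
            continue
        raw = str(record[field]).strip()
        if raw in table:
            decoded[field] = table[raw]
        else:
            unknown_codes.append((field, raw))
    return decoded, unknown_codes


# Hypothetical record, invented for illustration only.
sample = {
    "CONTROL_NUMBER": "0001",
    "IN_HOUSE": 0,
    "OVERLAY_SOURCE": "P",
    "HOME_OWNER": "H",
    "URBANICITY": "X",  # deliberately not a listed code
}
decoded, unknown_codes = decode_record(sample)
print(decoded)
print(unknown_codes)
```

Reporting unlisted codes instead of silently mapping them keeps surprises in the raw data visible before modeling.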
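
Slide 12 notes that a limited number of records have MONTHS_SINCE_LAST below 12, which contradicts the sampling method. A check like the following Python sketch could be used to find those records. The function name `flag_recent_gifts`, the 12-month threshold as a parameter, and the sample records are assumptions made for illustration; the sketch only flags records and does not resolve the contradiction, in keeping with the slides.

```python
def flag_recent_gifts(records, min_months=12):
    """Return CONTROL_NUMBERs whose MONTHS_SINCE_LAST is below min_months.

    records: iterable of dicts with CONTROL_NUMBER and MONTHS_SINCE_LAST.
    Records with a missing MONTHS_SINCE_LAST are skipped, not flagged.
    """
    flagged = []
    for record in records:
        months = record.get("MONTHS_SINCE_LAST")
        if months is None:
            continue
        if months < min_months:
            flagged.append(record["CONTROL_NUMBER"])
    return flagged


# Hypothetical records, invented for illustration only.
sample_records = [
    {"CONTROL_NUMBER": "0001", "MONTHS_SINCE_LAST": 18},
    {"CONTROL_NUMBER": "0002", "MONTHS_SINCE_LAST": 9},
    {"CONTROL_NUMBER": "0003", "MONTHS_SINCE_LAST": None},
    {"CONTROL_NUMBER": "0004", "MONTHS_SINCE_LAST": 12},
]
inconsistent = flag_recent_gifts(sample_records)
print(inconsistent)
```

Counting the flagged records before modeling gives a sense of how large the unresolved inconsistency is.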