1. CASE STUDY
Understanding the Key Drivers to Maximise Revenue Generated From
Handle – Methodology, Findings & Results
2. CONTENTS
1. Data Preparation
i. Evaluating composition of available data
ii. Combining given data sets
iii. Deriving variables for analysis
2. Data Exploration
i. Univariate Analysis
a. Categorical variables
b. Numeric variables
c. Synopsis of Findings
ii. Bivariate Analysis
a. Plots
b. Categorical variables
c. Numeric variables
d. Synopsis of Key Findings
iii. Multivariate Analysis
3. Testing Assumptions of OLS
4. Regression Model Building
i. Summary of the Iterations performed for building the Models
ii. A combined model for all three race tracks
iii. Model for Track ‘AP’ : Model with plots for Residuals & Model fit
iv. Model for Track ‘CRC’ : Model with plots for Residuals & Model fit
v. Model for Track ‘FG’ : Model with plots for Residuals & Model fit
3. 1. DATA PREPARATION
(i)EVALUATING THE COMPOSITION OF AVAILABLE DATA
The composition of the available data was evaluated first. The key checks were:
• The type of each variable (character or numeric) was identified in every data set before merging, using PROC CONTENTS.
• The extent of missing data was identified using PROC FREQ.
Table Name | Variable | # of missing data points | Total # of data points in Table | % of total missing values | Meaning of the variable
Race | conditions_of_race | 103101 | 185360 | 56% | Restrictions (or conditions) on the eligibility of horses to run. See decode table.
Race | Race_Type | 1 | 185360 | 0% |
Race | sex_restriction | 116451 | 185360 | 63% | Restrictions on the gender of horses that can run. See decode table.
Race | scheduled_surface | 182532 | 185360 | 98% | Planned surface for a race. In inclement weather, turf races are often moved to dirt.
Race | track_condition | 7279 | 185360 | 4% | Describes the condition of the surface at race time. See decode table.
Race | weather | 7151 | 185360 | 4% | Weather at race time. See decode table.
Race | grade | 183201 | 185360 | 99% | Ranking of stakes races, 1 being the highest.
Race | About_distance_indicator | 24392 | 2071 | 92% | Can be used with distance_id and distance_unit to indicate estimated length of races. Used for turf races. See decode table.
Race | track_sealed_indicator | 47203 | 185360 | 25% | Y/N on whether dirt track is "sealed". Sealing is a process of smoothing and compacting the dirt surface to make it less penetrable to rain.
Race_Distance_Conv | Race_Type | 1 | 185360 | 0% |
Race_Distance_Conv | sex_restriction | 116451 | 185360 | 63% | Restrictions on the gender of horses that can run. See decode table.
Race_Distance_Conv | scheduled_surface | 182532 | 185360 | 98% | Planned surface for a race. In inclement weather, turf races are often moved to dirt.
Race_Distance_Conv | track_condition | 7279 | 185360 | 4% | Describes the condition of the surface at race time. See decode table.
Race_Distance_Conv | weather | 7151 | 185360 | 4% | Weather at race time. See decode table.
Race_Distance_Conv | grade | 183201 | 185360 | 99% | Ranking of stakes races, 1 being the highest.
Race_Distance_Conv | track_sealed_indicator | 47203 | 185360 | 25% | Y/N on whether dirt track is "sealed". Sealing is a process of smoothing and compacting the dirt surface to make it less penetrable to rain.
Race_Distance_Conv | conditions_of_race | 103101 | 185360 | 56% | Restrictions (or conditions) on the eligibility of horses to run. See decode table.
Track | Track_Id | 2 | 811 | 0% | Track abbreviation
Track | Track_Type | 12 | 811 | 1% | Should all be T for thoroughbred
Track | State | 28 | 811 | 3% | State of operation
Track_Statistic | Location_Type | 7 | 188565 | 0% | Should all be T for track. See decode
Track_Statistic | Location | 22483 | 3980 | 83% | Should all be ON for on (vs. off)
Track_Zone | DST_YN | 1 | 167 | 1% |
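The missing-data check was run with PROC FREQ in SAS. As a rough illustration only, the same kind of count and percentage can be sketched in Python. The function name `missing_summary` and the toy records are hypothetical and are not taken from the original analysis.

```python
from typing import Dict, List, Optional


def missing_summary(rows: List[Dict[str, Optional[str]]]) -> Dict[str, dict]:
    """Count missing (None or blank) values per column and their share of all rows."""
    total = len(rows)
    columns = sorted({name for row in rows for name in row})
    summary = {}
    for name in columns:
        missing = sum(1 for row in rows if row.get(name) in (None, ""))
        summary[name] = {
            "missing": missing,
            "total": total,
            "pct_missing": round(100.0 * missing / total, 1) if total else 0.0,
        }
    return summary


# Hypothetical toy records; the real Race table has 185360 rows.
races = [
    {"track_id": "AP", "weather": "C", "grade": None},
    {"track_id": "CRC", "weather": "", "grade": None},
    {"track_id": "FG", "weather": "L", "grade": "1"},
    {"track_id": "AP", "weather": "C", "grade": None},
]

report = missing_summary(races)
for name, stats in report.items():
    print(name, stats)
```

A column such as `grade` in this toy data, with 75% of its values missing, is the kind of variable that a missing-value threshold would exclude.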
4. 1. DATA PREPARATION
(ii) COMBINING GIVEN DATA SETS
The following presents a synopsis of the manner in which the given data sets were merged/combined to arrive at a
consolidated data set for use in final analysis of the variable ‘Handle’:
ORIGINAL DATA FILES
File # | File Name | # of Observations | # of Variables
1 | Exotic_Payoff | 722873 | 15
2 | Race | 185360 | 60
3 | Race_Distance_Conv | 185360 | 61
4 | Track | 811 | 12
5 | Track_Statistic | 188565 | 11
6 | Track_Zone | 167 | 8
MERGED DATA FILES
File # | File Name | # of Observations | # of Variables | Files Combined | Primary Key
1 | Race_Combined | 185360 | 61 | Race + Race_Distance_Conv | Track_id, Race_Date, Race_Number
2 | Exotic_Race_Combi | 724278 | 68 | Exotic_Payoff + Race_Combined | Track_id, Race_Date, Race_Number
3 | TrackStat_Zone1 | 188569 | 17 | Track_Zone1 + Track_Statistic | Track_id, Country
4 | Track_Final | 189214 | 22 | Track1 + TrackStat_Zone1 | Track_id, Country, Track_Name, State
5 | CDI_0 | 875026 | 83 | Track_Final + Exotic_Race_Combi | Track_id, Race_Date, Country
• Data set ‘Track_Zone’ was modified by renaming the variable ‘Area_ID’ to ‘Country’, giving data set ‘Track_Zone1’. Similarly, data set ‘Track’ was modified by renaming ‘Area_ID’ to ‘Country’, giving data set ‘Track1’.
• In some merges, the two data sets being combined do not share exactly the same set of ‘Track_id’ values. Hence, the data files obtained after merging contain more observations than the files that went into them.
• Variables with more than 50% missing values were dropped, as no meaningful analysis was possible without adequate data.
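The increase in observation count after merging is what a full outer join produces when some keys exist in only one input. The sketch below illustrates this in Python under stated assumptions: the original merges were done in SAS, the helper names `rename_column` and `full_outer_join` are hypothetical, the toy values are made up, and each key combination is assumed to be unique in the right-hand input.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def rename_column(rows: List[Row], old: str, new: str) -> List[Row]:
    """Return copies of the rows with column `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def full_outer_join(left: List[Row], right: List[Row], keys: Sequence[str]) -> List[Row]:
    """Full outer join; assumes each key combination is unique in `right`."""
    def key_of(row: Row) -> tuple:
        return tuple(row.get(k) for k in keys)

    right_by_key = {key_of(r): r for r in right}
    matched = set()
    merged = []
    for l_row in left:
        k = key_of(l_row)
        if k in right_by_key:
            matched.add(k)
        # Left values win on any shared non-key column.
        merged.append({**right_by_key.get(k, {}), **l_row})
    # Keep right-only keys as well; this is why the row count can grow.
    for k, r_row in right_by_key.items():
        if k not in matched:
            merged.append(dict(r_row))
    return merged


# Hypothetical toy rows; the column values are made up for illustration.
track_zone = [
    {"Track_id": "AP", "Area_ID": "USA", "DST_YN": "Y"},
    {"Track_id": "FG", "Area_ID": "USA", "DST_YN": "Y"},
]
track_statistic = [
    {"Track_id": "AP", "Country": "USA", "Attendance": 5000},
    {"Track_id": "CRC", "Country": "USA", "Attendance": 3000},
]

track_zone1 = rename_column(track_zone, "Area_ID", "Country")
merged = full_outer_join(track_statistic, track_zone1, keys=("Track_id", "Country"))
print(len(track_statistic), len(track_zone1), len(merged))  # prints: 2 2 3
```

Here CRC appears only in the statistics rows and FG only in the zone rows, so the merged result has three rows although each input has two.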
5. 1. DATA PREPARATION
(iii)DERIVING VARIABLES FOR ANALYSIS
From the final data set obtained after merging given data files, the following variables were further derived:
# | Original Variable | Derived Variable | Description of the Derived Variable
1 | Race_Dt | Date_Race | The derived variable shows the date on which a race took place.
2 | Race_Dt | Yr | The derived variable shows the year in which the race took place.
3 | Race_Dt | Weekday | It shows which day of the week the race took place.
4 | Weekday | Day_of_Week | This variable renames weekday as a character variable.
5 | Weekday | Weekend_Indi | This variable is categorical and shows whether the day of race is a weekend or not.
6 | WPS_pool, Total_pool | Handle_Combi | This variable shows the Handle generated from a race.
7 | Race_Dt | Mon | This variable shows the month of the year in which the race took place.
8 | Post_time | Race_time | The variable ‘Post_time’ is converted from character to time format to derive other time related variables for analysis.
9 | Race_time | HOD | This variable shows the hour of the day in which the race took place.
10 | Race_Dt | HOL_XXX | Holiday indicators are created for each of the holidays mentioned in the list of holidays for year 2005 & 2006.
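As an illustration of how such date and time variables can be derived, the Python sketch below mirrors the table's variable names. It rests on several assumptions that the source does not confirm: the raw date is taken to be a SAS datetime value (seconds since 1 January 1960), ‘Post_time’ is taken to be an "HH:MM" string, Saturday and Sunday are treated as the weekend, HOD is taken as the clock hour, and the holiday date supplied is only an example. The function name `derive_race_fields` is hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Dict

SAS_EPOCH = datetime(1960, 1, 1)  # SAS datetime values count seconds from here
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def derive_race_fields(race_dt_seconds: float, post_time: str,
                       holidays: Dict[str, date]) -> dict:
    """Derive date, weekday, weekend, hour and holiday flags for one race."""
    race_date = (SAS_EPOCH + timedelta(seconds=race_dt_seconds)).date()
    race_time = datetime.strptime(post_time, "%H:%M").time()  # assumed format
    fields = {
        "Date_Race": race_date,
        "Yr": race_date.year,
        "Mon": race_date.month,
        "Day_of_Week": DAY_NAMES[race_date.weekday()],
        # Assumption: weekend means Saturday or Sunday.
        "Weekend_Indi": "Y" if race_date.weekday() >= 5 else "N",
        "HOD": race_time.hour,
    }
    # One 0/1 indicator per holiday name, e.g. HOL_BxD.
    for name, day in holidays.items():
        fields[name] = 1 if race_date == day else 0
    return fields


# Example: 26 December 2005 expressed as a SAS datetime value.
boxing_day_2005 = (datetime(2005, 12, 26) - SAS_EPOCH).total_seconds()
row = derive_race_fields(boxing_day_2005, "13:05",
                         {"HOL_BxD": date(2005, 12, 26)})
print(row)
```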
6. 1. DATA PREPARATION
(iii)DERIVING VARIABLES FOR ANALYSIS
Notes:
• To derive variables from ‘Race_Date’ (stored in a date-time format), it was first converted from a number format to a SAS date format.
• The data set obtained by merging the given data files was then subset to include:
only the track ids required for the analysis: CRC, AP & FG
only the relevant years: 2005 & 2006
• The variable ‘Handle_Combi’ was identified as the dependent variable.
• Continuous variables were binned after evaluating their distributions with PROC UNIVARIATE. The variables that were binned included:
Purse_usa
Minimum_claim_price
Maximum_claim_price
Attendance
Handle_combi
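A minimal sketch of quartile-based binning is shown below in Python, as one plausible way of turning a continuous variable into categories. It is not the authors' SAS code: the cut points actually used are not stated here, the purse values are hypothetical, and the function names `quartile_cuts` and `assign_bin` are introduced only for illustration.

```python
import statistics
from typing import List, Tuple


def quartile_cuts(values: List[float]) -> Tuple[float, float, float]:
    """Return the three quartile cut points (Q1, median, Q3)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def assign_bin(value: float, cuts: Tuple[float, float, float]) -> str:
    """Label a value by the quartile range it falls into."""
    q1, q2, q3 = cuts
    if value <= q1:
        return "Q1"
    if value <= q2:
        return "Q2"
    if value <= q3:
        return "Q3"
    return "Q4"


# Hypothetical purse values in USD, chosen only to illustrate the idea.
purses = [7000, 9500, 12000, 18000, 20500, 25000, 30000, 50000, 1000000]
cuts = quartile_cuts(purses)
binned = [(p, assign_bin(p, cuts)) for p in purses]
print(cuts)
print(binned)
```

Quartile cut points keep one extreme purse from stretching the bins, which matters for variables as skewed as the purse amounts described in this study.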
8. 2. DATA EXPLORATION
(i) Univariate Analysis
PROC CONTENTS was used to identify the character & numeric variables in the data set that emerged after merging the given data files. The analysis shown in the following slides is for Track Ids AP, CRC & FG and for the years 2005 & 2006.
i. Categorical variables: PROC FREQ was used to evaluate the distribution of categorical variables in the data set.
ii. Numeric variables: PROC MEANS was used to understand the characteristics of each numeric variable using
a. Measures of Central Tendency and
b. Dispersion
iii. Numeric variables: PROC UNIVARIATE was used to evaluate the:
a. Skewness & Kurtosis
b. Distribution of the variable
c. Inter Quartile Range (IQR): this was used in binning the variable for subsequent multivariate analysis
iv. Categorical & numeric variables were analysed both at an overall level and track id wise, in line with the business requirements.
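The measures listed above can be sketched in Python as follows. This is an illustration, not the SAS output: it uses simple moment-based (population) definitions of skewness and excess kurtosis, which may differ slightly from the bias-adjusted figures a statistics package reports, and the sample values and the function name `describe` are hypothetical.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Central tendency, dispersion, shape and IQR for one numeric variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Third and fourth central moments for skewness and kurtosis.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": sd,
        "skewness": m3 / sd ** 3,
        "excess_kurtosis": m4 / sd ** 4 - 3.0,
        "iqr": q3 - q1,
    }


# Hypothetical right-skewed sample, loosely resembling Handle amounts.
sample = [52000, 120000, 150000, 180000, 215000, 240000, 310000, 3100000]
for name, value in describe(sample).items():
    print(f"{name}: {value:,.2f}")
```

A mean well above the median together with a large positive skewness, as this toy sample shows, is the pattern that makes the IQR a more robust basis for binning than the mean and standard deviation.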
9. 2. DATA EXPLORATION: Univariate Analysis
(a) Categorical Variables: Wager_Type
Frequency distribution on an overall basis
wager_type Frequency Percent
3 4729 17.87
4 1090 4.12
5 5 0.02
6 104 0.39
9 41 0.15
D 1454 5.49
E 6352 24.01
M 184 0.7
Q 612 2.31
S 5633 21.29
T 6190 23.39
Z 67 0.25
Frequency distribution Track_id wise
Track_id: AP
wager_type Frequency Percent
3 1359 19.33
4 379 5.39
D 573 8.15
E 1745 24.82
S 1304 18.54
T 1647 23.42
Z 25 0.36
Track_id: CRC
wager_type Frequency Percent
3 2679 17.48
4 540 3.52
5 5 0.03
6 2 0.01
9 41 0.27
D 709 4.63
E 3751 24.48
M 184 1.2
S 3693 24.1
T 3686 24.06
Z 32 0.21
Track_id: FG
wager_type Frequency Percent
3 691 16.82
4 171 4.16
6 102 2.48
D 172 4.19
E 856 20.84
Q 612 14.9
S 636 15.49
T 857 20.87
Z 10 0.24
12. 2. DATA EXPLORATION: Univariate Analysis
(a) Categorical Variables: Track_condition
Frequency distribution on an overall basis
Track_condition Frequency Percent
FM 3707 14.18
FT 17139 65.56
GD 1928 7.38
MY 120 0.46
SF 129 0.49
SY 2571 9.84
WF 122 0.47
YL 425 1.63
Frequency distribution Track_id wise
Track_id: AP
Track_condition Frequency Percent
FM 1216 17.29
FT 4264 60.64
GD 433 6.16
MY 120 1.71
SF 129 1.83
SY 466 6.63
WF 62 0.88
YL 342 4.86
Track_id: CRC
Track_condition Frequency Percent
FM 1606 10.68
FT 10042 66.78
GD 1470 9.78
SY 1812 12.05
WF 38 0.25
YL 70 0.47
Track_id: FG
Track_condition Frequency Percent
FM 885 21.74
FT 2833 69.59
GD 25 0.61
SY 293 7.2
WF 22 0.54
YL 13 0.32
13. 2. DATA EXPLORATION: Univariate Analysis
(a) Categorical Variables: Weather
Frequency distribution on an overall basis
Weather Frequency Percent
C 14479 55.39
F 175 0.67
H 622 2.38
L 8316 31.81
O 2237 8.56
R 312 1.19
Frequency distribution Track_id wise
Track_id: AP
Weather Frequency Percent
C 4360 62
H 622 8.85
L 1703 24.22
O 117 1.66
R 230 3.27
Track_id: CRC
Weather Frequency Percent
C 8586 57.1
F 38 0.25
L 4453 29.61
O 1931 12.84
R 30 0.2
Track_id: FG
Weather Frequency Percent
C 1533 37.66
F 137 3.37
L 2160 53.06
O 189 4.64
R 52 1.28
14. 2. DATA EXPLORATION: Univariate Analysis
(a) Categorical Variables: Others
Sex_restriction
Sex_restriction Frequency Percent
B 5779 51.13
F 5523 48.87
Stakes_indicator
Stakes_indicator Frequency Percent
N 24661 93.2
Y 1800 6.8
Surface
Surface Frequency Percent
D 20936 79.12
T 5525 20.88
15. 2. DATA EXPLORATION: Univariate Analysis
(b) Numeric Variables
track_id | N Obs | Variable | Label | Minimum | Mean | Median | Maximum | Std.Dev
AP | 7034 | purse_usa | purse_usa | 9500 | 28548.9 | 25000 | 1000000 | 52990.9
AP | 7034 | minimum_claim_price | minimum_claim_price | 0 | 13264.9 | 10000 | 100000 | 17398.2
AP | 7034 | maximum_claim_price | maximum_claim_price | 0 | 14559.1 | 10000 | 100000 | 18215.5
AP | 7034 | number_of_runners | number_of_runners | 3 | 8 | 8 | 14 | 2
AP | 7034 | Handle_Combi | | 52542 | 238538 | 215058 | 3139455 | 146357
CRC | 15322 | purse_usa | purse_usa | 7000 | 23742.1 | 18000 | 2000000 | 48642.1
CRC | 15322 | minimum_claim_price | minimum_claim_price | 0 | 14076.3 | 12500 | 62500 | 12314.2
CRC | 15322 | maximum_claim_price | maximum_claim_price | 0 | 14434.1 | 12500 | 62500 | 12255.9
CRC | 15322 | number_of_runners | number_of_runners | 0 | 8 | 7 | 13 | 2
CRC | 15322 | Handle_Combi | | 0 | 138670 | 124265 | 1186000 | 74123.6
FG | 4107 | purse_usa | purse_usa | 8000 | 28401.7 | 20500 | 600000 | 39505
FG | 4107 | minimum_claim_price | minimum_claim_price | 0 | 12009.3 | 9000 | 80000 | 15375.1
FG | 4107 | maximum_claim_price | maximum_claim_price | 0 | 13804.9 | 10000 | 80000 | 15947.5
FG | 4107 | number_of_runners | number_of_runners | 0 | 8 | 8 | 13 | 2
FG | 4107 | Handle_Combi | | 0 | 199949 | 183672 | 1647365 | 99617.3
Note:
The variable ‘Attendance’, which originates only in data set ‘Track_Statistic’, shows the combined attendance at the track for the whole day, whereas the other files hold data for multiple races at a track on any given day. The merged data file will therefore not show correct numbers for ‘Attendance’, and it has thus not been included in the analysis.
16. 2. DATA EXPLORATION: Univariate Analysis
(c) Synopsis of Key Findings
The table below gives a synopsis of the Univariate analysis performed in the preceding slides for both categorical and numeric variables:
Variable | Overall | Remarks
Wager Type | Exacta & Trifecta | Exacta & Trifecta were the most common wager types across all three tracks.
Race Type | Claiming | At track CRC, Maiden Claiming was also among the most common race types, besides Claiming.
Age Restriction | 3 yo’s & up | At track FG, 4 yo’s & up was also among the most common, besides 3 yo’s & up.
Track Condition | Fast | A Fast track condition was most common across all three race tracks.
Weather | Clear | Track FG was most often Cloudy.
Purse_usa | | Tracks AP & FG had an average purse of approx. USD 28,500, while track CRC’s average purse was approx. USD 24,000. Track AP had the highest median purse_usa.
Minimum_claim_price | | Roughly similar for all three race tracks, averaging approx. USD 12,000 to 14,000.
Maximum_claim_price | | Roughly the same for all three race tracks. Also, there is little difference between the minimum & maximum claim price at any of the three race tracks.
Number_of_runners | | The average number of runners was 8 for all three race tracks.
Handle_Combi | | Both the average & the median Handle were highest for track AP.
18. 2. DATA EXPLORATION
(ii) Bivariate Analysis
(a) Plots: For both categorical and numeric variables, plots were used to assess the data graphically, identify any group patterns and detect extreme values & outliers, if any. Each categorical & numeric variable was plotted on the X-axis against the dependent variable, ‘Handle_Combi’, on the Y-axis.
(b) Categorical Variables: The Chi-Square Test was used to evaluate the association between the dependent variable and each of the categorical variables, both the existing ones and those created by binning continuous numeric variables. For this purpose, the dependent variable, ‘Handle_Combi’, was converted from a numeric variable to an ordinal variable. Refer to the tab ‘Proc Univariate_Binning’ in the linked worksheet for workings. Measures of Central Tendency & Dispersion were also used.
(c) Numeric Variables: Correlation Analysis was performed between each of the numeric variables and the dependent variable, ‘Handle_Combi’.
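The correlation analysis can be illustrated with a small Python sketch of the Pearson correlation coefficient. The source does not say which correlation measure was used, so Pearson is an assumption here, as are the toy numbers and the function name `pearson_r`.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: number of runners vs. Handle for six races.
runners = [5, 6, 7, 8, 10, 12]
handle = [110000, 125000, 140000, 160000, 210000, 260000]
print(round(pearson_r(runners, handle), 3))
```

A coefficient close to +1 or -1 indicates a strong linear relationship with Handle; a value near 0 indicates little linear relationship, although a non-linear one may still exist.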
19. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Categorical variables
Handle_Combi & Age_Restriction Handle_Combi & Grade
The values highlighted in the plot for Handle_Combi & Grade are those for which no grade is recorded. There are 26116 such missing values, around 99% of the total data. In the absence of a confirmation from the business, this variable will therefore be dropped for the purpose of analysis & model building.
21. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Categorical variables
Handle_Combi & Wager_Type Handle_Combi & Weather
The following observations in the plots above appear to be outliers:
Wager_Type= 4 Handle_Combi=3139455 Count=1
Wager_Type= E Handle_Combi=3187911 & 3206094 Count=1 each
Wager_Type= T Handle_Combi=2946736 & 3026707 Count=1 each
Wager_Type= 5 Handle_Combi= 1074715 Count= 1
Wager_Type= 3 Handle_Combi= 2062128 & 2079182 Count= 1 each
Wager_Type= S Handle_Combi= 2341293 & 2423058 Count= 1 each
Weather= Blank Handle_Combi= 0 Count=330
22. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Numeric variables
Handle_Combi & Attendance Handle_Combi & Distance_id
Some observations show a Handle although Attendance = 0. How any Handle can arise when Attendance = 0 needs to be checked.
Attendance=Blank Handle_Combi=0 Count= 299
23. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Numeric variables
Handle_Combi & Fraction_1 Handle_Combi & Fraction_2
Fraction is the split time and distance of a race. It is unclear whether Handle can be > 0 when Fraction = 0.
Fraction_1=5534 Count= 4
Fraction_2= 15140 Count= 4
24. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Numeric variables
Handle_Combi & Fraction_3 Handle_Combi & Fraction_4
Fraction_3=21840 Count=4
It is unclear whether Handle should be > 0 when Fraction = 0.
25. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Numeric variables
Handle_Combi & Fraction_5 Handle_Combi & HOD
It is unclear whether Handle should be > 0 when Fraction = 0.
26. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Numeric variables
Handle_Combi & Maximum Claim Price Handle_Combi & Minimum Claim Price
27. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Numeric variables
Handle_Combi & Month Handle_Combi & No. of Runners
28. 2. DATA EXPLORATION: Bivariate Analysis
(a) Plots: Numeric variables
Handle_Combi & Number of Tickets bet Handle_Combi & Payoff Amount
Number_of_tickets_bet= 100 Handle_Combi=3139455 Count=1
Number_of_tickets_bet= 300 Count= 6
Payoff_amount=449240 Handle_Combi= 74352 Count=1
31. 2. DATA EXPLORATION: Bivariate Analysis
(b) Categorical variables
track_id N Obs N nmiss Minimum Mean Median Maximum Sum Std Dev
AP 7034 7032 2 52542 238538 215058 3139455 1677401898 146357
CRC 15322 15322 0 0 138670 124265 1186000 2124701137 74124
FG 4107 4107 0 0 199949 183672 1647365 821191578 99617
Track_Id wise Handle for the years 2005 & 2006
Year N Obs N nmiss Minimum Mean Maximum Sum Std Dev
2005 14355 14353 2 0 178654 3139455 2564218644 116427
2006 12108 12108 0 0 170059 3060903 2059075969 104284
Year wise Handle
32. 2. DATA EXPLORATION: Bivariate Analysis
(b) Categorical variables
Holiday Nobs N Mean Std Dev Min Max
No Holiday 24797 24795 174490 112375 0 3139455
HOL_BxD 95 95 244158 118135 88287 585465
HOL_GF 58 58 209818 75809 85759 411644
HOL_NY 150 150 241862 119630 72810 645937
HOL_TGV 201 201 117320 54094 29170 283323
HOL_Vet 88 88 212197 111846 68649 585766
HOL_Lab 184 184 185184 85400 65943 445343
HOL_ID 183 183 155035 82665 46025 457547
HOL_Mem 352 352 170100 80894 49733 438551
HOL_CDM 113 113 179644 59513 87072 362010
HOL_East 94 94 177696 57118 67108 307895
HOL_SPD 103 103 154384 51668 55188 279983
HOL_SB 45 45 170669 61101 62080 343682
Handle on different Holidays for the year 2005 & 2006
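• Boxing Day (HOL_BxD) & New Year’s Day (HOL_NY) have the highest average Handle (244158 & 241862 respectively).
• However, the dispersion of Handle on these two days is also on the higher side (Std Dev of 118135 & 119630).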
37. 2. DATA EXPLORATION: Bivariate Analysis
(b) Categorical Variables
Handle & Track_Condition: Overall basis
Track_condition nmiss Minimum Mean Maximum Sum Std Dev
FM 0 47127 229414 3060903 850436130 140592
FT 0 0 165842 1647365 2842366310 88371
GD 0 27740 180584 773904 348166275 99651
MY 0 72518 180118 416176 21614199 65542
SF 0 97339 313760 910328 40475089 129318
SY 0 0 141092 557473 362746364 64893
WF 0 65195 180629 403022 22036696 71323
YL 0 78342 318714 3139455 135453550 343003
Handle & Track_Condition : Track_Id wise
Track: AP
Track_condition Minimum Mean Maximum Std Dev
FM 104925 292445 3060903 190157
FT 52542 218140 1045357 92865
GD 83694 262882 750821 105619
MY 72518 180118 416176 65542
SF 97339 313760 910328 129318
SY 61246 181344 438860 68084
WF 98683 214068 403022 70024
YL 123543 344863 3139455 374599
Track: CRC
Track_condition Minimum Mean Maximum Std Dev
FM 47127 179514 805235 90965
FT 0 135460 1186000 66663
GD 27740 155587 773904 83682
SY 33291 126181 557473 57553
WF 65195 128146 263524 47777
YL 78342 198372 505685 101064
Track: FG
Track_condition Minimum Mean Maximum Std Dev
FM 81009 233360 667579 90214
FT 55188 194819 1647365 101440
GD 135246 225033 413163 73366
SY 0 169282 384473 66152
WF 91081 177043 284815 52412
YL 179507 278785 384008 70898
38. 2. DATA EXPLORATION: Bivariate Analysis
(b) Categorical Variables
Handle & Weather: Overall basis
Weather nmiss Minimum Mean Maximum Sum Std Dev
C 0 0 182642 3060903 2644474127 111013
F 0 60135 177446 388463 31053098 69996
H 0 83538 245123 750821 152466346 97967
L 0 27740 170675 3139455 1419335293 114994
O 0 31129 143224 1186000 320391583 78758
R 0 0 178122 432213 55574166 69767
Handle & Weather : Track_Id wise
Track: AP
Weather Minimum Mean Maximum Std Dev
C 60299 243543 3060903 132301
H 83538 245123 750821 97967
L 52542 233100 3139455 196410
O 74563 202576 457547 77792
R 61246 184426 432213 65310
Track: CRC
Weather Minimum Mean Maximum Std Dev
C 0 146379 1074715 78120
F 60135 145458 279416 47698
L 27740 134054 457672 56383
O 31129 135004 1186000 76598
R 85277 157631 342570 64507
Track: FG
Weather Minimum Mean Maximum Std Dev
C 56314 212536 1647365 116815
F 65918 186319 388463 72693
L 55188 196956 1015652 86203
O 60184 190461 446041 68389
R 0 162061 334254 86450
40. 2. DATA EXPLORATION: Bivariate Analysis
(b) Categorical Variables
Handle & Location_Type: Overall basis
Location_Type nmiss Minimum Mean Maximum Sum Std Dev
F 1 86735 175072 376662 47269555 49556
I 0 0 116255 256454 40340341 39131
L 0 69928 180899 318480 6512372 64031
O 0 96663 209871 386702 7765224 84081
S 0 0 76598 175461 26579435 25944
T 1 0 178739 3139455 4477232533 111255
Handle & Location_Type : Track_Id wise
Track: AP
Location_Type Minimum Mean Maximum Std Dev
F 93507 179912 376662 51240
L 69928 180899 318480 64031
O 96663 209871 386702 84081
T 52542 240715 3139455 148876
Track: CRC
Location_Type Minimum Mean Maximum Std Dev
I 0 116255 256454 39131
S 0 76598 175461 25944
T 0 143180 1186000 73051
Track: FG
Location_Type Minimum Mean Maximum Std Dev
F 86735 164357 346263 44025
T 55188 202861 1647365 98654
44. 2. DATA EXPLORATION: Bivariate Analysis
(b). Categorical Variables: Chi-Square Test of Association
The following summarises the Chi-Square tests performed to evaluate whether the association between each independent variable and Handle (the dependent variable) is statistically significant. The results are used to build the OLS Regression model with only those independent variables whose association with Handle is significant at the 5% level. For the purpose of this test, continuous variables were binned into categorical variables on the basis of their distributions, found using PROC UNIVARIATE:
Variable P-Value Statistical association with Handle
Wager_Type <0.0001 Significant
Race_Type <0.0001 Significant
Age_Restriction <0.0001 Significant
Sex_Restriction <0.0001 Significant
Stakes_Indicator <0.0001 Significant
Surface <0.0001 Significant
Track_Condition <0.0001 Significant
Weather <0.0001 Significant
Grade <0.0001 Significant
Track_Sealed_Indicator <0.0001 Significant
Maximum_Claim_Price <0.0001 Significant
45. 2. DATA EXPLORATION: Bivariate Analysis
(b) Categorical Variables: Chi-Square Test of Association (contd….)
Variable P-Value Statistical association with Handle
Minimum_Claim_Price <0.0001 Significant
Purse <0.0001 Significant
Day of the Week <0.0001 Significant
Attendance <0.0001 Significant
All the variables listed above were thus found to have a statistically significant association with Handle, the dependent variable.
(Please refer to the hyperlinked file for detailed workings.)
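For readers unfamiliar with the test, the Python sketch below computes the Chi-Square statistic and its degrees of freedom from a contingency table of observed counts. It is an illustration under stated assumptions: the counts are hypothetical, the function name `chi_square_statistic` is introduced here, and every row and column total is assumed to be non-zero. It does not compute a p-value, which requires the chi-square distribution from a statistics package.

```python
from typing import List, Tuple


def chi_square_statistic(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2x2 table: rows = weekend N/Y, columns = low/high Handle bin.
observed = [
    [10, 20],
    [20, 10],
]
stat, dof = chi_square_statistic(observed)
# 3.841 is the 5% critical value of the chi-square distribution with 1 df.
print(round(stat, 3), dof, stat > 3.841)
```

With 1 degree of freedom, a statistic above 3.841 means the association is significant at the 5% level, which is the criterion used to carry a variable forward into the regression.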
47. 2. DATA EXPLORATION: Bivariate Analysis
(d) Synopsis of Key Findings
The following is a synopsis of the track-wise bivariate analysis of each categorical variable against Handle. For each track id, the category of each variable that gives the highest average Handle is listed:
Variable | AP | CRC | FG
Race Number | 11 | 12 | 11
Race Type | STK (Stakes) | STK (Stakes) | STK (Stakes)
Age Restriction | 3 | 4U | 3
Distance ID | 1000 | 1200 | 900
Track Condition | YL (Yielding) | YL (Yielding) | YL (Yielding)
Weather | H (Hazy) | R (Rainy) | C (Clear)
No. of Runners | 14 | 13 | 13
Location Type | T (Track) | T (Track) | T (Track)
Day of Week | Saturday | Saturday | Saturday
Month | August | January | April
Hour of the Day | July | December | March
49. 2. DATA EXPLORATION: Multivariate Analysis
A multivariate analysis of the dependent variable, ‘Handle_Combi’, against each of the numeric variables was conducted Track_Id wise.
1. Race Number
[Charts: ‘AP: Average Handle by Race #’, ‘CRC: Average Handle by Race #’ and ‘FG: Average Handle by Race #’. X-axis: Race # (1 to 15); Y-axes: Number of Races (Frequency) and Average Handle (Dollar Amount).]
For all 3 race tracks, average Handle is highest for race # 11.
The number of races falls as the race # increases at all 3 race tracks. For race # 11 in particular, very few races have been run.
Thus, the higher race numbers, which are run least often, have been generating the highest average Handle.
Identifying the reasons for the high average Handle for race # 11, in spite of the low number of races, can thus be relevant.
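Each chart in this section pairs two numbers per category: how many races fall in the category and their average Handle. A minimal Python sketch of that aggregation is given below. The rows are hypothetical, the column name `race_number` and the function name `handle_by_group` are assumptions made for illustration, and rows with a missing Handle are simply skipped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def handle_by_group(rows: List[dict], group_key: str,
                    value_key: str = "Handle_Combi") -> Dict[Tuple, dict]:
    """Count races and average Handle per (track_id, group value)."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, sum]
    for row in rows:
        value = row.get(value_key)
        if value is None:
            continue  # skip rows with missing Handle
        key = (row["track_id"], row[group_key])
        totals[key][0] += 1
        totals[key][1] += value
    return {
        key: {"n_races": count, "avg_handle": total / count}
        for key, (count, total) in sorted(totals.items())
    }


# Hypothetical rows; the real data set covers tracks AP, CRC and FG.
rows = [
    {"track_id": "AP", "race_number": 1, "Handle_Combi": 100000.0},
    {"track_id": "AP", "race_number": 1, "Handle_Combi": 200000.0},
    {"track_id": "AP", "race_number": 11, "Handle_Combi": 900000.0},
    {"track_id": "CRC", "race_number": 1, "Handle_Combi": None},
    {"track_id": "CRC", "race_number": 1, "Handle_Combi": 120000.0},
]

summary = handle_by_group(rows, "race_number")
for key, stats in summary.items():
    print(key, stats)
```

Reading the count alongside the average matters here: a category with a very high average Handle but only a handful of races, like race # 11, may be driven by a few exceptional events.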
50. 2. DATA EXPLORATION: Multivariate Analysis
2. Number of Runners
[Charts: ‘AP: Average Handle by No. of Runners’, ‘CRC: Average Handle by No. of Runners’ and ‘FG: Average Handle by No. of Runners’. X-axis: No. of Runners (1 to 13); Y-axes: No. of Races (Frequency) and Average Handle (Dollar Amount).]
For all 3 race tracks, the largest number of races has taken place with around 6-7 runners.
Beyond 6-7 runners, the number of races shows a falling trend at each of the 3 race tracks.
However, average Handle is seen to increase with a higher number of runners at all 3 race tracks.
Thus, although fewer races have taken place with more than 6-7 runners, Handle in those races has been higher.
This suggests that increasing the number of races with more than 6-7 runners can have a positive impact on Handle.
51. 2. DATA EXPLORATION: Multivariate Analysis
3. Day of the Week
[Charts: ‘AP: Average Handle by Day of the Week’, ‘CRC: Average Handle by Day of the Week’ and ‘FG: Average Handle by Day of the Week’. X-axis: Day of the Week (Mon to Sunday); Y-axes: No. of Races (Frequency) and Average Handle (Dollar Amount).]
Saturday generates the highest average Handle during the week at all 3 race tracks.
Tuesday appears to have the lowest average Handle during the week.
The number of races held is lowest on Tuesday and highest on Saturday across all 3 race tracks.
However, for track CRC, although the average Handle on Wednesday is almost as high as that on Saturday, very few races are held on Wednesday.
There is thus scope for increasing Handle on Wednesday by increasing the number of races held on that day.
On Friday & Sunday, average Handle is much lower than on Saturday, although the number of races is almost as high as on Saturday at all race tracks except CRC.
52. 2. DATA EXPLORATION: Multivariate Analysis
4. Month of the Year
[Charts: ‘AP: Average Handle by Month of the Year’, ‘CRC: Average Handle by Month of the Year’ and ‘FG: Average Handle by Month of the Year’. X-axis: Month of the Year (Jan to Dec); Y-axes: No. of Races (Frequency) and Average Handle (Dollar Amount).]
Tracks AP & FG are seen to hold races in only a few months rather than the whole year round, and their racing months do not coincide.
For track CRC, January has the highest average Handle but the lowest number of races. There is thus scope for increasing Handle in January further by increasing the number of races in that month.
Also, at track CRC, June & October have the lowest average Handle but a higher number of races than months in which average Handle is higher.
In December in particular, the number of races rises steeply over November, while average Handle rises far less steeply.
53. 2. DATA EXPLORATION: Multivariate Analysis
5. Hour of the Day (HOD)
[Charts: ‘AP: Average Handle by Hour of the Day (HOD)’, ‘CRC: Average Handle by Hour of the Day (HOD)’ and ‘FG: Average Handle by Hour of the Day (HOD)’. X-axis: HOD (1 to 9); Y-axes: No. of Races (Frequency) and Average Handle (Dollar Amount).]
There appears to be a data anomaly for track FG: no races are recorded at the 9th HOD, yet an average Handle is shown for that hour.
For track AP, average Handle increases with the hour of the day, although the number of races falls sharply after the 5th HOD.
Strikingly, for track AP, very few races at the 8th and 9th HOD generate the highest average Handle.
For track FG, the number of races falls steeply after the 4th HOD while average Handle increases. It is also unclear how so few races at the 6th HOD yield an average Handle comparable to hours with many more races.
For track CRC, the number of races falls steeply after the 4th HOD, but average Handle remains high with only a marginal drop.
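Several of these observations concern averages that rest on very few races (or, for FG at the 9th HOD, on none). One way to surface such cases systematically is to flag bins whose race count falls below a chosen minimum. The sketch below is illustrative only: the threshold of 30 races and the per-hour figures are assumptions, not values from the case-study data.

```python
def flag_sparse_bins(summary, min_races):
    """Return the bins whose average Handle rests on fewer than min_races races."""
    return sorted(b for b, (n_races, _avg) in summary.items() if n_races < min_races)

# Hypothetical {HOD: (number_of_races, average_handle)} figures.
hod_summary = {
    1: (1500, 150000.0),
    5: (900, 220000.0),
    8: (12, 380000.0),
    9: (0, 350000.0),  # an average with zero races signals a data anomaly
}
sparse = flag_sparse_bins(hod_summary, min_races=30)
print(sparse)
```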
54. 2. DATA EXPLORATION: Multivariate Analysis
6. Purse Amount
[Chart] AP: Average Handle by Purse Amount. X-axis: Purse (Dollar Amount) in brackets <=15000, 15001-30000, 30001-50000, 50001-200000, 200000+. Y-axes: Average Handle (Dollar Amount) and No. of Races (Frequency). Series: AP: Purse; AP: No. of Obs.
[Chart] CRC: Average Handle by Purse Amount. X-axis: Purse (Dollar Amount) in brackets <=15000, 15001-30000, 30001-50000, 50001-200000, 200000+. Y-axes: Average Handle (Dollar Amount) and No. of Races (Frequency). Series: CRC: Purse; CRC: No. of Obs.
[Chart] FG: Average Handle by Purse Amount. X-axis: purse brackets <=15000, 15001-30000, 30001-50000, 50001-200000, 200000+. Y-axes: Average Handle (Dollar Amount) and No. of Races (Frequency). Series: FG: Purse; FG: No. of Obs.
For all three race tracks, higher purse brackets show a higher average Handle.
The number of races falls sharply for the higher purse brackets, although tracks AP and CRC both show an increase in the number of races in the USD 15001-30000 bracket.
Overall, higher purse brackets combine a higher average Handle with a sharp fall in the number of races.
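The purse brackets used in these charts can be derived from the raw purse amount with a simple lookup. The following Python sketch assumes the bracket edges shown on the chart axes and treats each upper edge as inclusive; the function and constant names are hypothetical.

```python
from bisect import bisect_left

# Upper edges of the first four brackets; anything above the last edge is "200000+".
PURSE_EDGES = [15000, 30000, 50000, 200000]
PURSE_LABELS = ["<=15000", "15001-30000", "30001-50000", "50001-200000", "200000+"]

def purse_bracket(purse_usd):
    """Map a purse amount in USD to its bracket label (upper edges inclusive)."""
    return PURSE_LABELS[bisect_left(PURSE_EDGES, purse_usd)]

for purse in (15000, 15001, 42000, 250000):
    print(purse, purse_bracket(purse))
```

The same pattern would apply to the claim-price and attendance brackets by swapping in their edges and labels.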
55. 2. DATA EXPLORATION: Multivariate Analysis
7. Minimum Claim Price
[Chart] AP: Average Handle by Min Claim Price. X-axis: Minimum Claim Price (Dollar Amount) in brackets <=10000, 10001-30000, 30001-50000, 50001-100000, 100000+. Y-axes: Average Handle (Dollar Amount) and No. of Obs (Frequency). Series: AP: Min_ClP; AP: No. of Obs.
[Chart] CRC: Average Handle by Min Claim Price. X-axis: Minimum Claim Price (Dollar Amount) in brackets <=10000, 10001-30000, 30001-50000, 50001-100000, 100000+. Y-axes: Average Handle (Dollar Amount) and No. of Obs (Frequency). Series: CRC: Min_ClP; CRC: No. of Obs.
[Chart] FG: Average Handle by Min Claim Price. X-axis: Minimum Claim Price (Dollar Amount) in brackets <=10000, 10001-30000, 30001-50000, 50001-100000, 100000+. Y-axes: Average Handle (Dollar Amount) and No. of Obs (Frequency). Series: FG: Min_ClP; FG: No. of Obs.
For all three race tracks, average Handle is fairly constant across all brackets of the minimum claim price.
Also for all three race tracks, the number of races falls as the minimum claim price bracket increases.
Only the 100000+ bracket shows a marginal increase in the number of races.
56. 2. DATA EXPLORATION: Multivariate Analysis
8. Maximum Claim Price
[Chart] AP: Average Handle by Max Claim Price. X-axis: Maximum Claim Price (Dollar Amount) in brackets <=10000, 10001-30000, 30001-50000, 50001-100000, 100000+. Y-axes: Average Handle (Dollar Amount) and No. of Obs (Frequency). Series: AP: Max_ClP; AP: No. of Obs.
[Chart] CRC: Average Handle by Max Claim Price. X-axis: Maximum Claim Price (Dollar Amount) in brackets <=10000, 10001-30000, 30001-50000, 50001-100000, 100000+. Y-axes: Average Handle (Dollar Amount) and No. of Obs (Frequency). Series: CRC: Max_ClP; CRC: No. of Obs.
[Chart] FG: Average Handle by Max Claim Price. X-axis: Maximum Claim Price (Dollar Amount) in brackets <=10000, 10001-30000, 30001-50000, 50001-100000, 100000+. Y-axes: Average Handle (Dollar Amount) and No. of Obs (Frequency). Series: FG: Max_ClP; FG: No. of Obs.
The observations made for the minimum claim price apply equally to the maximum claim price.
57. 2. DATA EXPLORATION: Multivariate Analysis
9. Attendance
[Chart] AP: Average Handle by Attendance. X-axis: Attendance (in numbers) in brackets 0-3000, 3001-5000, 5001-7000, 7001-9000, 9001-11000, 11000+. Y-axes: Average Handle (Dollar Amount) and No. of Races (Frequency). Series: AP: Attend; AP: No. of Obs.
[Chart] CRC: Average Handle by Attendance. X-axis: Attendance (in numbers) in brackets 0-3000, 3001-5000, 5001-7000, 7001-9000, 9001-11000, 11000+. Y-axes: Average Handle (Dollar Amount) and No. of Races (Frequency). Series: CRC: Attend; CRC: No. of Obs.
[Chart] FG: Average Handle by Attendance. X-axis: Attendance (in numbers) in brackets 0-3000, 3001-5000, 5001-7000, 7001-9000, 9001-11000, 11000+. Y-axes: Average Handle (Dollar Amount) and No. of Races (Frequency). Series: FG: Attend; FG: No. of Obs.
For track AP, Handle is generated only when attendance is in the 0-3000 bracket. Is that the maximum spectator capacity of this track?
For track CRC, Handle increases with higher attendance brackets, although the number of races falls correspondingly.
For track FG, Handle is highest for the 5001-7000 attendance bracket, although the number of races in this bracket is very low. No races with attendance greater than 9000 have taken place at track FG. Is the capacity of track FG limited to 9000?
59. 3. TESTING ASSUMPTIONS OF OLS
The following assumptions of OLS could not be tested because the SAS procedures or options required for each test are not available in WPS. Hence, it could not be conclusively established whether the estimates are BLUE (Best Linear Unbiased Estimators):
i. Linearity
Linearity can also be assessed graphically from partial residual plots, but the ‘Partial’ option was not available when fitting the model with Proc Reg.
ii. Independence of Error terms
The Durbin-Watson test for the independence of the error terms was not available as an option of Proc Reg in WPS.
iii. Normality of Error terms
The ‘Normal’, ‘Histogram’ and ‘Probplot’ options were not available in Proc Reg to evaluate the normality of the error terms.
iv. Homoskedasticity
White’s Test could not be used in WPS.
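Where a built-in option is missing, some of these checks can in principle be computed outside the procedure from exported residuals. As one example, the Durbin-Watson statistic has the standard definition DW = sum((e_t - e_(t-1))^2) / sum(e_t^2). The Python sketch below assumes the residuals are supplied in time order; the function name and the sample residuals are illustrative and were not part of the original analysis.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of residuals in time order."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    # Sum of squared differences between consecutive residuals.
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return numerator / denominator

# Illustrative residuals: a perfectly alternating series.
alternating = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(alternating))
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.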
61. 4. REGRESSION MODEL BUILDING
(i) A Snapshot of the Iterations Performed
The following is a synopsis of the iterations performed to build the models. Four models were built: a combined model for all three race tracks, and a separate model for each of the tracks AP, CRC and FG:
Adjusted R-Square by iteration (Overall, AP, CRC, FG):
Iteration #   Overall   AP     CRC    FG     Description of variables included in the iteration
1.            0.56      0.68   0.43   0.51   All as-is variables, numeric in nature, were used.
2.            0.56      0.68   0.43   0.51   Variables found to be linear combinations of other variables in iteration # 1 were dropped.
3.            0.68      0.75   0.56   0.70   Dummy variables, created for categorical variables, were included along with the as-is numeric variables.
4.            0.68      0.75   0.56   0.70   Variables found to be linear combinations of other variables in iteration # 3 were dropped.
5.            0.68      0.74   0.56   0.69   Variables found to be linear combinations of other variables in iteration # 4, or insignificant @ 5%, were dropped.
6.            0.68      0.74   -      -      Variables found insignificant @ 5% in the preceding iteration were dropped.
7.            0.67      0.71   0.53   0.65   Variables with a VIF score > 10 in iteration # 6 were dropped.
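Two quantities steer these iterations: the Adjusted R-Square and the VIF score. As a sketch under the standard textbook definitions, Adjusted R-Square = 1 - (1 - R^2)(n - 1)/(n - p - 1) and VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing driver j on the remaining drivers. The function names and the inputs below are illustrative and are not the case-study values.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R-Square for a model with an intercept and n_predictors drivers."""
    return 1.0 - (1.0 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

def vif(r_square_j):
    """Variance inflation factor given R^2 of driver j regressed on the others."""
    return 1.0 / (1.0 - r_square_j)

# Illustrative inputs only.
print(adjusted_r_square(0.5, n_obs=11, n_predictors=5))
print(vif(0.5))
```

Under this definition, a VIF cut-off of 10 corresponds to a driver whose variation is more than 90% explained by the other drivers (R_j^2 > 0.9).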
62. 4. REGRESSION MODEL BUILDING
(ii) A combined model for all 3 race tracks taken together.
(a) Regression Output
Please click on the above hyperlink for detailed results and the model equation in the tab named ‘7th Iteration’.
(b) Summary of the results
Only drivers statistically significant @ 5% are shown, interpreted with respect to their impact on Handle.
SUMMARY: REGRESSION RESULTS OF THE OVERALL MODEL
Type of Characteristic  Drivers of Handle       Impact on Handle  Average change in Handle for a 1 unit change in the driver
Race                    Age_34                  -ve               8390
Race                    Age_35                  -ve               8058
Race                    Age_4U                  +ve               16835
Race                    Course_T                +ve               39050
Race                    DST                     -ve               51826
Race                    Location_F              -ve               46635
Race                    Location_I              -ve               23297
Race                    Location_L              -ve               46913
Race                    Race_ALW                -ve               6936
Race                    Race_AOC                -ve               9218
Race                    Race_DBY                -ve               222613
Race                    Race_MCL                +ve               4501
Race                    Race_MSW                +ve               4606
Race                    Race_OCS                -ve               33897
Race                    Race_SHP                +ve               40170
Race                    Race_STK                -ve               14399
Race                    State_IL                +ve               39715
Race                    Track_FM                -ve               20126
Race                    Track_GD                -ve               19918
Race                    Track_MY                -ve               20368
Race                    Track_SF                +ve               11108
Race                    Track_SY                -ve               12894
Race                    Wager_3                 -ve               83076
Race                    Wager_4                 -ve               64022
Race                    Wager_5                 -ve               111358
Race                    Wager_6                 -ve               114061
Race                    Wager_9                 -ve               66484
Race                    Wager_D                 -ve               74073
Race                    Wager_M                 -ve               94612
Race                    Wager_Q                 -ve               93753
Race                    Wager_S                 -ve               68124
Race                    Wager_T                 -ve               22168
Race                    Wager_Z                 -ve               93800
Race                    Weather_L               -ve               2824
Race                    Weather_O               -ve               8414
Race                    Weather_R               -ve               24565
Race                    number_of_runners       +ve               12287
Race                    number_of_tickets_bet   +ve               211
Race                    purse_usa               +ve               1
Race                    race_number             +ve               3302
Time                    HOL_BxD                 +ve               130563
Time                    HOL_Lab                 +ve               21833
Time                    HOL_Mem                 -ve               9719
Time                    HOL_NY                  +ve               59249
Time                    HOL_SB                  -ve               20825
Time                    HOL_SPD                 -ve               22563
Time                    HOL_TGV                 -ve               37150
Time                    HOL_Vet                 -ve               19124
Time                    Mon                     +ve               1764
Time                    WeekDay                 +ve               5603
Time                    Weekend_Indi            +ve               15326
Time                    fraction_1              +ve               17
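As an illustration of how the summary can be read, the sketch below combines a few of the signed coefficients into an expected change in Handle, holding all other drivers fixed. The three coefficient values are copied from the overall-model summary, with the sign taken from the 'Impact on Handle' column. The function name and the scenario are hypothetical, and the intercept is not needed because only a difference is computed.

```python
# Signed coefficients copied from the overall-model summary (sign from the Impact column).
COEFFICIENTS = {
    "number_of_runners": 12287,
    "Course_T": 39050,
    "Weather_R": -24565,
}

def handle_change(coefficients, deltas):
    """Expected change in Handle for the given changes in drivers, others held fixed."""
    return sum(coefficients[name] * change for name, change in deltas.items())

# Hypothetical scenario: two more runners, and the Course_T dummy switches from 0 to 1.
base_case = handle_change(COEFFICIENTS, {"number_of_runners": 2, "Course_T": 1})
# Same scenario with the Weather_R dummy also switching from 0 to 1.
with_weather_r = handle_change(
    COEFFICIENTS, {"number_of_runners": 2, "Course_T": 1, "Weather_R": 1}
)
print(base_case, with_weather_r)
```

For dummy variables, a 1 unit change means the category applies instead of the reference category, so the coefficient is the full expected difference in Handle for that category.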
63. 4. REGRESSION MODEL BUILDING
(ii) A combined model for all 3 race tracks taken together.
(c) Residual Plot
64. 4. REGRESSION MODEL BUILDING
(ii) A combined model for all 3 race tracks taken together.
(d) Model Fit
65. 4. REGRESSION MODEL BUILDING
(iii) Model for Track ‘AP’
(a) Regression Output
Please click on the above hyperlink for detailed results and the model equation in the tab named ‘7th Iteration’.
(b) Summary of the results
Only drivers statistically significant @ 5% are shown, interpreted with respect to their impact on Handle.
SUMMARY: REGRESSION RESULTS OF THE MODEL FOR TRACK 'AP'
Type of Characteristic  Drivers of Handle       Impact on Handle  Average change in Handle for a 1 unit change in the driver
Race                    Breed_QH                -ve               48267
Race                    Location_F              -ve               34365
Race                    Location_L              -ve               36642
Race                    Race_ALW                -ve               9246
Race                    Race_AOC                -ve               16294
Race                    Race_MSW                -ve               6628
Race                    Race_STK                -ve               17440
Race                    Track_SY                -ve               30309
Race                    Track_YL                +ve               29391
Race                    Wager_3                 -ve               79629
Race                    Wager_4                 -ve               71710
Race                    Wager_D                 -ve               93001
Race                    Wager_S                 -ve               72598
Race                    Wager_T                 -ve               20962
Race                    Wager_Z                 -ve               106457
Race                    Weather_R               -ve               15752
Race                    number_of_runners       +ve               16502
Race                    number_of_tickets_bet   +ve               314
Race                    purse_usa               +ve               2
Race                    race_number             +ve               5319
Time                    HOD                     +ve               3880
Time                    HOL_ID                  +ve               29700
Time                    HOL_Lab                 +ve               87228
Time                    WeekDay                 +ve               7264
Time                    Weekend_Indi            +ve               34873
68. 4. REGRESSION MODEL BUILDING
(iv) Model for Track ‘CRC’
(a) Regression Output
Please click on the above hyperlink for detailed results and the model equation in the tab named ‘6th Iteration’.
(b) Summary of the results
Only drivers statistically significant @ 5% are shown, interpreted with respect to their impact on Handle.
SUMMARY: REGRESSION RESULTS OF THE MODEL FOR TRACK 'CRC'
Type of Characteristic  Drivers of Handle       Impact on Handle  Average change in Handle for a 1 unit change in the driver
Race                    Age_4U                  +ve               106507
Race                    Course_T                +ve               35208
Race                    DistanceID_Conv_to_Fur  -ve               37
Race                    Location_S              +ve               8080
Race                    Race_MSW                +ve               8504
Race                    Race_STK                +ve               19343
Race                    Track_FM                -ve               19582
Race                    Track_GD                -ve               16025
Race                    Track_SY                -ve               9652
Race                    Track_WF                -ve               16437
Race                    Track_YL                -ve               22817
Race                    Wager_3                 -ve               67735
Race                    Wager_5                 -ve               93481
Race                    Wager_6                 -ve               113661
Race                    Wager_9                 -ve               67019
Race                    Wager_D                 -ve               49220
Race                    Wager_M                 -ve               80114
Race                    Wager_S                 -ve               54116
Race                    Wager_T                 -ve               12607
Race                    Wager_Z                 -ve               66625
Race                    Weather_F               +ve               22005
Race                    Weather_L               -ve               6765
Race                    Weather_O               -ve               4888
Race                    number_of_runners       +ve               9799
Race                    purse_usa               +ve               1
Race                    race_number             +ve               1840
Time                    HOD                     -ve               2791.4285
Time                    HOL_CDM                 +ve               38739
Time                    HOL_ID                  -ve               25209
Time                    HOL_Mem                 -ve               12591
Time                    Mon                     +ve               3053.76611
Time                    WeekDay                 +ve               5912.74561
Time                    Weekend_Indi            +ve               8358.8241
Time                    fraction_3              +ve               0.84318
Time                    fraction_4              +ve               0.26747
Time                    fraction_5              +ve               0.95467
71. 4. REGRESSION MODEL BUILDING
(v) Model for Track ‘FG’
(a) Regression Output
Please click on the above hyperlink for detailed results and the model equation in the tab named ‘6th Iteration’.
(b) Summary of the results
Only drivers statistically significant @ 5% are shown, interpreted with respect to their impact on Handle.
SUMMARY: REGRESSION RESULTS OF THE MODEL FOR TRACK 'FG'
Type of Characteristic  Drivers of Handle       Impact on Handle  Average change in Handle for a 1 unit change in the driver
Race                    Age_4U                  +ve               8298
Race                    Location_F              -ve               19754
Race                    Race_ALW                -ve               10989
Race                    Race_AOC                -ve               7139
Race                    Race_STK                -ve               13220
Race                    Track_YL                +ve               49565
Race                    Wager_3                 -ve               101465
Race                    Wager_4                 -ve               87265
Race                    Wager_6                 -ve               129220
Race                    Wager_D                 -ve               76597
Race                    Wager_Q                 -ve               97902
Race                    Wager_S                 -ve               81803
Race                    Wager_T                 -ve               16207
Race                    Wager_Z                 -ve               111527
Race                    Weather_F               -ve               10450
Race                    number_of_runners       +ve               11585
Race                    number_of_tickets_bet   +ve               725
Race                    payoff_amount           +ve               1
Race                    purse_usa               +ve               1
Race                    race_number             +ve               5897
Time                    HOD                     -ve               1792