2. About Me
• Started CEMBA in 2012, switched to part-time after year 1
• Graduated from Carlson in Spring of 2014
• VP, Team Manager at Bank of America within the IT department
• By graduation, had presented three statistics-based projects, each
projected to improve net income by $1M to $25M on an investment under $500K
Statistics can be immediately monetized!
3. Sales and Marketing
The basic idea:
A customer's purchasing should be predictable from other customers' past
purchasing
Possible independent variables for regression:
• Frequency of purchase (of any product, or of each product)
• Total purchases (normalized by corporate earnings or zip code
average income)
• Days since last purchase
• Preferred contact method
• Advertisement used
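A minimal sketch of such a regression, using synthetic data in place of real customer records (the variable names, coefficients, and noise levels below are all invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Hypothetical customer features, stand-ins for the variables above
freq = rng.poisson(5, n)                # purchase frequency
days_since = rng.integers(1, 365, n)    # days since last purchase
total_norm = rng.normal(1.0, 0.3, n)    # total purchases / zip-code income

# Synthetic response: next-period spend driven by frequency and recency
y = 50 + 12 * freq - 0.1 * days_since + 30 * total_norm + rng.normal(0, 5, n)

X = np.column_stack([freq, days_since, total_norm])
model = LinearRegression().fit(X, y)
print(model.score(X, y))  # R^2 on the training sample
```

With real data the R^2 would of course be far lower; the point is the mechanics of assembling the candidate variables into a design matrix and fitting.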
4. Sales and Marketing
• Remember: Use samples and verify on the whole!
• Use “clustering”, if you can, to identify similar customers:
http://www.jmp.com/support/help/K-Means_Clustering.shtml
http://www.jmp.com/support/help/Hierarchical_Clustering.shtml#110036
• Correlation will provide customer targets with higher sales closure
rates and, consequently, targets that are more profitable
• Acceptable p-values and large betas on “cross products” of
independent variables (i.e. y = β·x_i·x_j) could indicate product
synergies/interactions
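To illustrate the cross-product idea, here is a sketch that fits an OLS model with an interaction term on synthetic purchase data and computes the usual t-test p-values by hand (the two products, their coefficients, and the noise level are all made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 300

# Hypothetical purchase quantities for two products
x1 = rng.normal(10, 2, n)
x2 = rng.normal(8, 2, n)

# Synthetic revenue with a true interaction (synergy) between the products
y = 5 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 + rng.normal(0, 3, n)

# Design matrix: intercept, x1, x2, and the cross product x1*x2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard OLS p-values: t = beta / se, se from the residual variance
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
p = 2 * stats.t.sf(np.abs(beta / se), dof)

print(beta[3], p[3])  # cross-product beta near 0.5 with a small p-value
```

A small p-value together with a large beta on the x1*x2 column is the slide's signal that the two products sell better together.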
New York Times, February 19, 2012 (About Target):
“Psst, You in Aisle 5”
5. Project Management
The basic idea:
Actual Project Cost should be a function of, at least, Project Estimate
Possible independent variables for regression:
• Estimated project cost
• Percentage of work done by contractors and contractor hourly rate
(normalized by employee salary)
• How many silos/which silos are involved
• Expected duration (calendar time or hours of work) of the effort
• Implementing standard tools vs. customization
6. Project Management
Possible results:
• Little to no correlation between estimates and actuals
– Estimation process is a waste of money!
• Reasonable correlation
– Identify subsets where correlation is weaker than most and improve
estimation process
• High correlation
– Could provide possible areas for improvement (look for high betas)
– Could replace/augment portions of the estimation process (enter in all of the
independent variables and generate results)
– Could also mean “cooked” numbers
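A quick way to run the estimate-vs-actual check above, sketched on fabricated project data (the 20% average overrun, the noise level, and the subset cutoff are assumptions, not results from any real firm):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120

# Hypothetical project history: estimates in $K, actuals with noise and bias
estimates = rng.uniform(50, 500, n)
actuals = 1.2 * estimates + rng.normal(0, 40, n)  # 20% overrun on average

# Overall correlation between estimates and actuals
r = np.corrcoef(estimates, actuals)[0, 1]

# Check a subset where correlation may be weaker, e.g. small projects only
small = estimates < 150
r_small = np.corrcoef(estimates[small], actuals[small])[0, 1]
print(round(r, 2), round(r_small, 2))
```

Here the small-project subset shows weaker correlation than the pool as a whole, which is exactly the "improve estimation for this subset" signal from the slide.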
7. Project Management
Given reasonable or better correlation, the expected return on the project,
and identified confidence intervals:
• Avoid projects that would be taken without statistical analysis
– If the return for the project is too small to justify the undertaking given a
broad confidence interval, do not do the project
• Take on projects that normally would be skipped
– If the confidence intervals are very narrow, the estimate should be
considered “a lock” and the ROI requirements can be less stringent
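The go/no-go rule above can be sketched with a standard 95% prediction interval from a simple regression of actuals on estimates. The project history, the $250K estimate, and the $400K expected return below are all hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 80

# Hypothetical history: actual cost vs. estimate (both in $K)
est = rng.uniform(100, 400, n)
act = 1.1 * est + rng.normal(0, 20, n)

slope, intercept, *_ = stats.linregress(est, act)
resid = act - (intercept + slope * est)
s = np.sqrt(resid @ resid / (n - 2))

# 95% prediction interval for a new project estimated at $250K
x_new = 250
x_bar = est.mean()
se_pred = s * np.sqrt(1 + 1/n + (x_new - x_bar)**2 / ((est - x_bar)**2).sum())
t = stats.t.ppf(0.975, n - 2)
pred = intercept + slope * x_new
lo, hi = pred - t * se_pred, pred + t * se_pred

# Decision rule from the slide: fund only if even the worst-case cost (hi)
# still leaves an acceptable return
expected_return = 400  # hypothetical, in $K
print(round(lo), round(hi), expected_return > hi)
```

A narrow interval lets the project clear the bar even with a modest expected return; a wide interval would push `hi` past the return and kill the project.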
8. Project Management: Case Study
Implemented at a Fortune 100 Firm
• Large areas of low correlation
• The pool of independent variables was limited by data availability
and politics
• Instead of a statistician, an expensive, automated software package
was used
– No second-order variables and no cross products (software limitation)
– No discretion in p-value measurement (0.051 gets rejected just as hard as 0.651)
– High investment leads to the sunk-cost fallacy, so statistical solutions are not
being investigated and the root cause of the low correlation isn't being identified
9. Develop a New Offering
MIT Sloan Management Review, Winter 2004:
“The Seller’s Hidden Advantage”
Toyota:
Benchmarked all of its suppliers and made them all more efficient, which
made the suppliers more competitive, which resulted in better prices for
Toyota
Orica:
Developed a 20-variable model from customers' use of its explosives that
made each subsequent customer more accurate in their purchase and use of
Orica explosives
10. Develop a New Offering
IT Consulting Firm:
Benchmark your clients' IT services
• Examine common services provided by each client – this is very
different and more difficult than manufacturing!
• Build a model based on available factors:
– Number of employees, locations, costs, level of service, etc.
• Results are a great starting point, but they aren't the holy grail
– Costs that sit statistically above the benchmark prediction could indicate
a level of service not provided at other clients – but it could also mean
that there is inefficiency afoot
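The benchmarking idea can be sketched as a regression of IT cost on client factors, with residuals flagging clients sitting above the benchmark. All factors, costs, and coefficients here are fabricated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n = 40  # hypothetical client engagements

# Hypothetical benchmark factors: employees, locations, service level (1-5)
employees = rng.integers(100, 5000, n)
locations = rng.integers(1, 50, n)
service = rng.integers(1, 6, n)

# Synthetic annual IT cost in $K
cost = 0.5 * employees + 20 * locations + 200 * service + rng.normal(0, 100, n)

X = np.column_stack([employees, locations, service])
model = LinearRegression().fit(X, cost)

# A client's residual = actual cost minus benchmark prediction; a large
# positive residual flags either extra service or inefficiency
resid = cost - model.predict(X)
flagged = np.argsort(resid)[-3:]  # the three most-above-benchmark clients
print(flagged, resid[flagged].round(0))
```

The model only says these clients cost more than their factors predict; per the slide, deciding whether that is extra service or inefficiency still takes a human.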
11. Tips and Tricks
1. Find out where the business unit or company makes or spends a
great deal of money
2. Find out what data can be had
3. Build a model on a sample if data is hard to get or is large
4. Ask for funding and justify with new, interesting results
5. Use the project in this class
6. Avoid using statistics terms (95% confident, Regression, etc.)
7. Expect surprising ignorance
13. Appendix: Clustering
Clustering uses an algorithm to break a large data set into smaller,
more homogeneous data sets:
This data set splits well into two clusters – it isn’t likely that real-life
data sets will be this contrived
14. Appendix: Clustering
Regression will not paint a good picture of the data as a whole:
Splitting the data into the appropriate clusters can lead to more
accurate modeling
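A sketch of cluster-then-regress on fabricated data with two segments whose trends point in opposite directions; a pooled regression fits poorly, while per-cluster regressions fit well (the segment slopes and noise are invented):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)

# Two hypothetical customer segments with opposite responses
x1 = rng.uniform(0, 10, 100)
y1 = 2 * x1 + rng.normal(0, 1, 100)          # segment A: rising
x2 = rng.uniform(0, 10, 100)
y2 = 50 - 2 * x2 + rng.normal(0, 1, 100)     # segment B: falling

X = np.column_stack([np.concatenate([x1, x2]), np.concatenate([y1, y2])])

# One pooled regression across both segments fits poorly
pooled_r2 = LinearRegression().fit(X[:, :1], X[:, 1]).score(X[:, :1], X[:, 1])

# Cluster first, then fit a regression inside each cluster
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
cluster_r2 = []
for k in (0, 1):
    pts = X[labels == k]
    fit = LinearRegression().fit(pts[:, :1], pts[:, 1])
    cluster_r2.append(fit.score(pts[:, :1], pts[:, 1]))

print(round(pooled_r2, 2), [round(r, 2) for r in cluster_r2])
```

As on the slide, this example is contrived to split cleanly; real data will need judgment about how many clusters exist and whether they are real.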
15. Appendix: Why Brains Beat Tools
The process to implement a data mining/business intelligence tool:
1. Collect and organize data – usually in a repeatable, programmatic
(automatic) fashion
2. Purchase licenses and install and configure tool set – usually start
with a sample of the data from step 1
3. Examine results and tune tools
4. Act on results
Before any of this happens, a statistician should check whether there are
actionable relationships in the data – steps 1 and 2 are very expensive!!
16. Appendix: Why Brains Beat Tools
Possible reasons for detecting a weak relationship:
1. Software does not perform clustering
2. Software does not examine 2nd order or cross product factors
3. Software incorrectly acts on multicollinearity
4. Tool set is improperly tuned/configured
5. Data aggregation mechanism is not functioning properly
6. The data is too random
Without a statistician, all six of these reasons look the same!
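A statistician can at least tell reason 3 apart from reason 6: the variance inflation factor (VIF) flags multicollinearity directly. A sketch on fabricated predictors, two of which are nearly collinear:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 200

# Two hypothetical predictors that are nearly copies of each other
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.05, n)   # almost identical to x1
x3 = rng.normal(0, 1, n)           # an independent predictor

X = np.column_stack([x1, x2, x3])

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
def vif(X, j):
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

for j in range(3):
    print(j, round(vif(X, j), 1))
```

A common rule of thumb treats VIF above 5-10 as problematic; here x1 and x2 blow past it while x3 stays near 1, which a turnkey tool may silently mishandle.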