The First Step in Information Management
www.firstsanfranciscopartners.com
Produced	by:
MONTHLY SERIES
Brought	to	you	in	partnership	with:
March 2, 2017
Descriptive, Prescriptive and Predictive Analytics
© 2017 First San Francisco Partners www.firstsanfranciscopartners.com
Polling	Questions
§ What type	of	statistical	analyses	do	you	use	or	plan	to	use	(can	choose	multiple	answers)?
− Descriptive
− Predictive
− Prescriptive
− I	don’t	use	any	of	these
− I	don’t	know	the	difference	between	these
§ How	frequently	do	you	use	statistical	analyses	in	your	work?
− I	don’t	currently	do	any	type	of	statistical	analysis
− Less	than	once	a	week
− Once	or	a	few	times	a	week
− At	least	once	a	day
Topics	For	Today’s	Webinar
§ Overview	of	statistical	analysis	process
− Forming	a	hypothesis
− Identifying	appropriate	sources
− Proving/Disproving	the	hypothesis
§ Types	of	data	analysis
− Descriptive	data	analytics
− Predictive	data	analytics
− Prescriptive	data	analytics
§ How	these	types	compare	within	the	analytic	environment
§ Key	takeaways	and	suggested	resources
The	Process	of	Statistical	Analysis
§ Form Hypotheses
− Null: Nothing special
− Alternative: Something unique, an actionable finding, etc.
§ Identify Data Source
− Don’t go overboard!
− Collect your own, OR
− Use secondary data
§ Prove/Disprove Hypothesis
− Is Type I or Type II error worse?
− Choose confidence level
− Reject/not reject null
When resources are constrained, statistical analysis enables us to make quantitative
inferences about a population based on the amount of information we can analyze (a sample).
Step	1:	Forming	a	Hypothesis
§ In	statistical	analysis,	we	have	two	hypotheses:
− Null	hypothesis:	Claims	that	any	irregularities	in	the	sample	are	due
to	chance
− Alternative	hypothesis:	Claims	that	irregularities	in	the	sample	are	due
to	non-random	causes	(and	would	therefore	reflect	the	population)
§ What	are	you	really	looking	to	discover/prove?
− Experiment	1:
§ Null:	There	is	no	difference	in	the	amount	sold	when	comparing	salespeople	who	did	
and	did	not	receive	training.
§ Alternative:	There	is	a	difference	in	the	amount	sold	when	comparing	salespeople	who	
did	and	did	not	receive	training.	
− Experiment	2:
§ Null:	The	salespeople	who	received	training	do	not	sell	more	on	average	than	the	
salespeople	who	did	not	receive	training.
§ Alternative:	Salespeople	who	received	the	training	sell	more	on	average	than	those	who	
did	not	receive	the	training.
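Experiment 1 is a two-sided question (any difference) and Experiment 2 is a one-sided question (trained salespeople sell more). A minimal sketch of how that choice changes the p-value, assuming a large-sample normal approximation and a hypothetical test statistic of 1.8 (not a figure from this deck):

```python
from statistics import NormalDist

def p_values(z):
    """Return (two_sided, one_sided_greater) p-values for a z statistic."""
    std_normal = NormalDist()
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # Experiment 1: any difference
    one_sided = 1 - std_normal.cdf(z)             # Experiment 2: trained > untrained
    return two_sided, one_sided

# Hypothetical statistic: (trained mean - untrained mean) / standard error
two_sided, one_sided = p_values(1.8)
print(f"two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}")
```

With this hypothetical statistic and alpha = 0.05, the one-sided test would reject the null while the two-sided test would not, which is why the hypotheses should be fixed before looking at the data.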
Step	2:	Identifying	Appropriate	Sources
§ Remember,	you	don’t	need	Big	Data	for	every	decision!
§ Sometimes,	knowing	what	data	you	don’t need	is	just	as	important
as	knowing	what	you	do need.	Keep	your	end	decision	in	mind.
§ Potential	sources	of	data:
− Primary	data	− collect	new	data
§ Who	to	include:	Random	sample,	stratified	random	sample,	etc.	
§ How	many	to	include:	Sample	size	calculators	online	(free)
§ Determine the level of measurement needed for your desired analysis:
categorical (nominal), ordinal, interval, ratio
§ As	necessary,	design	a	control	group
− Secondary	data	− utilize	existing	data
§ Census	records,	syndicated	data,	government	data,	etc.
§ Consider	your	data	needs,	data	cleanliness,	cost,	etc.,	when	determining	
appropriate	sources.
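The online sample size calculators mentioned above typically apply a formula along these lines. A minimal sketch, assuming a simple random sample from a large population and a survey that estimates a proportion; the function name and default values are illustrative assumptions, not from the deck:

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence=0.95, expected_proportion=0.5):
    """Sample size n = z^2 * p * (1 - p) / e^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = z**2 * expected_proportion * (1 - expected_proportion) / margin_of_error**2
    return math.ceil(n)

# +/- 5 percentage points at 95% confidence, worst-case proportion of 0.5
print(sample_size_for_proportion(0.05))  # prints 385
```

Using 0.5 as the expected proportion is the conservative choice because it maximizes the required sample size.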
Step	3:	Proving/Disproving	the	Hypothesis
§ Establish	a	confidence	level	prior	to	analysis.
§ Confidence	levels:
1. Determine how significant a difference/irregularity must be for you
to reject the null hypothesis in favor of your alternative hypothesis.
2. Determine	how	confident	you	can	be	in	your	decision.
§ Even	with	a	high	confidence	level,	you	aren’t	always	right:	
− Type	I	error:	You	reject	the	null	hypothesis	but	shouldn’t	have.
− Type	II	error:	You	do	not	reject	the	null	hypothesis	but	should	have.
− How to decrease the likelihood of these errors: raise the confidence level (fewer Type I errors, but more Type II errors), increase
sample size (fewer Type II errors; be aware of effect size), etc.
§ Determine	which	type	of	error	is	more	detrimental	to	your	investigation	and	set	
up	your	study	accordingly.
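A Type I error rate can be seen directly by simulation. A minimal sketch, assuming two groups drawn from the same normal population (so the null hypothesis is true by construction) and a z-test with a known standard deviation; all parameter values are hypothetical:

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(alpha=0.05, trials=4000, n=20, sigma=10.0, seed=1):
    """Share of experiments that reject a true null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    std_error = (sigma**2 / n + sigma**2 / n) ** 0.5
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population: there is no real effect.
        group_a = [rng.gauss(100, sigma) for _ in range(n)]
        group_b = [rng.gauss(100, sigma) for _ in range(n)]
        z = (mean(group_a) - mean(group_b)) / std_error
        if abs(z) > z_crit:
            rejections += 1                        # a Type I error
    return rejections / trials

rate = false_positive_rate()
print(f"Type I error rate at alpha=0.05: {rate:.3f}")
```

The observed rate should land close to alpha. Lowering alpha (raising the confidence level) reduces these false positives at the cost of missing more real effects.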
Step	3:	Proving/Disproving	the	Hypothesis
Group statistics for QPctQ3:
Training       N     Mean       Std. Deviation   Std. Error Mean
No training    74    102.643    9.95482          1.15722
Training       74    106.3889   9.83445          1.14323
Levene's Test for Equality of Variances: F = 0.029, Sig. = 0.865
t-test for Equality of Means:
Variances                     t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -2.303   146       0.023             -3.74595          1.6267                  -6.96086       -0.53103
Equal variances not assumed   -2.303   145.978   0.023             -3.74595          1.6267                  -6.96087       -0.53102
§ Confidence	level	=	95%
§ Alpha	=	0.05
[Chart: Percent of 3rd Quarter Quota Sold by Trained vs. Untrained Salespeople; bars for No training and Training, y-axis from 100 to 108]
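The t statistic in the output above can be reproduced from the group summary statistics alone. A minimal sketch of the equal-variance (pooled) calculation; the function name is an assumption, and the p-value is taken from the reported output rather than recomputed, since the standard library has no t distribution:

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, std_error, diff / std_error, df

# Summary statistics from the slide: no training vs. training
diff, std_error, t_stat, df = pooled_t_test(102.643, 9.95482, 74,
                                            106.3889, 9.83445, 74)
print(f"mean difference={diff:.4f}, std. error={std_error:.4f}, "
      f"t={t_stat:.3f}, df={df}")

ALPHA = 0.05              # from the slide
REPORTED_P_VALUE = 0.023  # Sig. (2-tailed) as reported on the slide
print("reject null" if REPORTED_P_VALUE < ALPHA else "do not reject null")
```

Because 0.023 is below alpha, the null hypothesis of no difference is rejected at the 95% confidence level, and the confidence interval for the difference excludes zero.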
Types	of	Data	Analysis
§ Descriptive
− Aims to help uncover valuable insight from the data being analyzed
− Answers the question “What happened?”
§ Predictive
− Helps forecast behavior of people and markets
− Answers the question “What could happen?”
§ Prescriptive
− Suggests conclusions or actions that may be taken based on the analysis
− Answers the question “What should be done?”
§ Descriptive analysis is the simplest type, yet it is used most
often.
§ Two	types	of	descriptive	analysis:
1. Measures	of	central	tendency	(tells	us	
about	the	middle)
§ Mean	− the	average
§ Median	− the	midpoint	of	the	
responses
§ Mode	− the	response	with	the	highest	
frequency
2. Measures	of	dispersion
§ Range	− the	min,	the	max	and	the	
distance	between	the	two
§ Variance − the average squared distance
of the points from the mean
§ Standard Deviation − the square root of the variance, and the most
common/standard way of expressing
the spread of data
Customer_ID   Items Purchased   Amount Spent
29304         1                 $1.09
28308         3                 $44.43
19962         21                $218.58
30281         1                 $73.02
[Chart: Mean, Median and Mode Amounts of Items Purchased. Mean = 6.5, Median = 2, Mode = 1]
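The measures above can be computed with Python's standard `statistics` module. A minimal sketch using the four Items Purchased values from the table; the variance shown is the sample variance (dividing by n − 1), which is one of two common conventions:

```python
import statistics

# "Items Purchased" column from the customer table
items_purchased = [1, 3, 21, 1]

# Measures of central tendency
mean_items = statistics.mean(items_purchased)      # 6.5
median_items = statistics.median(items_purchased)  # 2.0
mode_items = statistics.mode(items_purchased)      # 1

# Measures of dispersion
value_range = max(items_purchased) - min(items_purchased)
sample_variance = statistics.variance(items_purchased)  # divides by n - 1
sample_std_dev = statistics.stdev(items_purchased)      # square root of the variance

print(mean_items, median_items, mode_items)
print(value_range, round(sample_variance, 2), round(sample_std_dev, 2))
```

The single customer who bought 21 items pulls the mean well above the median and mode, which is why looking at all three is useful.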
Descriptive Data	Analytics
Predictive Analysis
§ Some mistakenly assume that predictive analysis applies only to predicting
future events.
− However, in cases such as sentiment analysis, existing data (e.g., the text
of a tweet) is used to predict data that is unknown rather than in the future (whether the tweet is positive
or negative).
§ Several	of	the	models	that	can	be	used	for	predictive	analysis	are:
− Forecasting
− Simulation
− Regression
− Classification
− Clustering
Predictive Data	Analytics
Forecasting
§ Forecasting:
− Moving average technique: use the mean of prior periods to predict the next
§ Forecast for period 5 = the mean of periods 1−4
§ Forecast for period 6 = the mean of periods 2−5
− Exponential smoothing technique: similar, but more recent data points are weighted more heavily because they are more relevant
− Regression techniques
§ Use caution in forecasting: the longer the forecast horizon, the less accurate the projections.
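The two techniques above can be sketched in a few lines. This is a minimal illustration with hypothetical values; the function names, the window of 4 periods and the smoothing weight of 0.5 are assumptions, not figures from the deck:

```python
def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(history, alpha=0.5):
    """Forecast the next period, weighting recent periods more heavily.

    alpha close to 1 reacts quickly to recent values; alpha close to 0
    smooths heavily toward older values.
    """
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

periods_1_to_4 = [10, 12, 14, 16]  # hypothetical values for periods 1-4
period_5 = moving_average_forecast(periods_1_to_4)
print(period_5)                                        # 13.0
print(exponential_smoothing_forecast(periods_1_to_4))  # 14.25
```

On this rising series the exponentially smoothed forecast sits closer to the latest value than the moving average does, because the recent periods carry more weight.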
[Chart: Net Income of Store C Projected 2017-2020; x-axis: year, 2006 to 2022; y-axis: net income, $0 to $25,000]
Simulation
§ Simulation
− Queuing	models:	used	to	predict	wait	time	and	queue	length
§ Results	can	be	used	to	create	staff	schedules	in	a	way	that	reduces	inefficiencies,	etc.
− Discrete	event	model:	used	in	special	situations	when	queuing	cannot	be	used
§ Results	can	be	used	to	identify	bottlenecks,	etc.	
− Monte	Carlo	simulations:	used	to	identify	probable	outcomes	of	a	scenario
based	on	many	possible	outcomes	(uses	random	number	generation	and	many	
iterations	of	the	scenario).
§ Results	can	be	used	to	predict	the	likelihood	of	profitability	within	the	first	two	years,	etc.	
Queuing	Model	Example
[Figure: queuing model results for Scenario 1 and Scenario 2]
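A minimal sketch of a queuing model comparison, assuming the standard M/M/c model (random arrivals, exponential service times, c identical servers). The arrival rate, service rate and staffing levels are hypothetical and are not the scenarios from the slide:

```python
import math

def mmc_queue(arrival_rate, service_rate, servers):
    """Average queue length and average wait in queue for an M/M/c queue."""
    offered_load = arrival_rate / service_rate   # a = lambda / mu
    utilization = offered_load / servers         # rho = a / c
    if utilization >= 1:
        raise ValueError("unstable queue: arrivals outpace service capacity")
    # Probability that the system is empty (Erlang C derivation)
    p0_inverse = sum(offered_load**k / math.factorial(k) for k in range(servers))
    p0_inverse += offered_load**servers / (math.factorial(servers) * (1 - utilization))
    p0 = 1 / p0_inverse
    queue_length = (p0 * offered_load**servers * utilization
                    / (math.factorial(servers) * (1 - utilization) ** 2))
    wait_time = queue_length / arrival_rate      # Little's law
    return queue_length, wait_time

# Hypothetical: 40 customers/hour arrive, each cashier serves 15/hour
for label, cashiers in (("Scenario 1", 3), ("Scenario 2", 4)):
    length, wait_hours = mmc_queue(40, 15, cashiers)
    print(f"{label}: {cashiers} cashiers, avg queue {length:.2f}, "
          f"avg wait {wait_hours * 60:.1f} min")
```

Comparing the two rows shows how much one additional server shortens the line, which is the kind of result that feeds a staffing schedule.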
Monte	Carlo	Simulation	Example
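A minimal sketch of a Monte Carlo simulation for the profitability question mentioned earlier, assuming monthly revenue is normally distributed and costs are fixed. Every figure and parameter name here is hypothetical:

```python
import random

def probability_profitable(trials=10_000, months=24, startup_cost=50_000,
                           revenue_mean=12_000, revenue_sd=4_000,
                           monthly_cost=9_500, seed=42):
    """Estimate the chance that cumulative profit is positive after `months`."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    profitable_runs = 0
    for _ in range(trials):
        cumulative = -startup_cost
        for _ in range(months):
            # One random draw per month is one possible revenue outcome.
            cumulative += rng.gauss(revenue_mean, revenue_sd) - monthly_cost
        if cumulative > 0:
            profitable_runs += 1
    return profitable_runs / trials

estimate = probability_profitable()
print(f"Estimated probability of profit within two years: {estimate:.1%}")
```

Each iteration plays out one possible two-year scenario; the share of scenarios that end in profit is the estimate. More iterations give a more stable estimate, and the result is only as good as the assumed revenue distribution.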
Regression
§ Regression − generally speaking, used to understand the relationship between independent and dependent variables
§ Types of regression models:
− Logistic: used when the dependent variable is categorical (e.g., will customers shop at your store or a competitor?)
− Linear: used to identify a linear relationship between the dependent variable and at least one independent variable (e.g., daily store revenue predicted by the number of customers entering the store)
− Step-wise: used to identify a relationship between dependent/independent variables. This is done by adding/removing variables based on how those variables impact the overall strength of the model.
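A minimal sketch of simple linear regression for the store example above, using ordinary least squares with one independent variable. The customer counts and revenue figures are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical daily observations
customers = [120, 150, 180, 200, 240]       # independent variable
revenue = [2450, 3150, 3650, 4200, 4850]    # dependent variable

slope, intercept = fit_line(customers, revenue)
predicted = slope * 220 + intercept         # predicted revenue for 220 customers
print(f"revenue = {slope:.2f} * customers + {intercept:.2f}")
print(f"predicted revenue for 220 customers: {predicted:.2f}")
```

The slope reads as the expected change in daily revenue for each additional customer entering the store.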
Classification	&	Clustering
§ Classification:	used	to	assign	objects	to	
one	of	several	categories
− Sentiment	analysis	of	social	media	
postings
§ Clustering:	another	method	of	forming	
groups
− Intragroup	differences	are	minimized
− Intergroup	differences	are	maximized
− Commonly	used	to	create	and	better	
understand	customer	groups
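A minimal sketch of clustering, assuming the k-means approach on a single variable (k-means is one common clustering method; the deck does not name a specific one). The spending amounts are hypothetical:

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters by repeatedly assigning each value to
    its nearest center and moving each center to the mean of its group."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:   # assignments have stopped changing
            break
        centers = new_centers
    return centers, groups

# Hypothetical amounts spent per customer
amounts_spent = [12, 15, 18, 22, 140, 155, 160, 175]
centers, groups = kmeans_1d(amounts_spent, k=2)
print(centers)
print(groups)
```

Values inside each group sit close to their own center (small intragroup differences) while the centers themselves are far apart (large intergroup differences).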
Prescriptive Analysis
§ Decisions	can	be	formulated	from	descriptive	and	predictive	analysis
− If	I	need	to	cut	a	product	and	I	know	that	product	C	is	least	preferred	and	least	
profitable,	I	will	cut	product	C.
§ However,	prescriptive	analytics	explicitly	tell	you	the	decisions	that	should	
be	made.	This	can	be	done	using	a	variety	of	techniques:
− Linear	programming	
− Integer	programming
− Mixed	integer	programming
− Nonlinear	programming
Prescriptive Data	Analytics
Linear	Programming	Example
Problem:
                    Product A   Product B   Product C   Product D   Product E   Used    Available
Quantity to Order
Profit per Unit     $5          $3          $20         $50         $200
Storage Space       0.05        0.5         1           5           10                  1000
Selling Effort      0.25        5           0.5         2           7                   500
Minimum Order       100         15          20          60          5
Total Profit: $-
Solution:
                    Product A   Product B   Product C   Product D   Product E   Used    Available
Quantity to Order   100         15          490         60          5
Profit per Unit     $5          $3          $20         $50         $200
Storage Space       0.05        0.5         1           5           10          852.5   1000
Selling Effort      0.25        5           0.5         2           7           500     500
Minimum Order       100         15          20          60          5
Total Profit: $14,345.00
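One way to sanity-check the solution is to plug the order quantities back into the objective and the constraints. A minimal sketch using the figures from the tables; it evaluates a given plan and does not solve the linear program itself (that would need a solver), and the names are illustrative assumptions:

```python
# Data from the slide, in product order A, B, C, D, E
profit_per_unit = [5, 3, 20, 50, 200]
storage_per_unit = [0.05, 0.5, 1, 5, 10]
effort_per_unit = [0.25, 5, 0.5, 2, 7]
minimum_order = [100, 15, 20, 60, 5]
STORAGE_AVAILABLE = 1000
EFFORT_AVAILABLE = 500

def evaluate_plan(quantities):
    """Return (total_profit, storage_used, effort_used, feasible) for a plan."""
    total_profit = sum(p * q for p, q in zip(profit_per_unit, quantities))
    storage_used = sum(s * q for s, q in zip(storage_per_unit, quantities))
    effort_used = sum(e * q for e, q in zip(effort_per_unit, quantities))
    feasible = (
        storage_used <= STORAGE_AVAILABLE + 1e-9   # small tolerance for floats
        and effort_used <= EFFORT_AVAILABLE + 1e-9
        and all(q >= m for q, m in zip(quantities, minimum_order))
    )
    return total_profit, storage_used, effort_used, feasible

solution = [100, 15, 490, 60, 5]   # Quantity to Order from the solution table
profit, storage, effort, feasible = evaluate_plan(solution)
print(profit, storage, effort, feasible)
```

In this plan every product except C stays at its minimum order, and selling effort is fully used while storage has slack, so selling effort is the binding constraint.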
Comparing	the	Three	Types	of	Data	Analytics
§ Descriptive	analysis	is	most	common.
− Best	practice	to	perform	descriptive	
analyses	prior	to	prescriptive/predictive
§ Understand	that	distribution,	variance,	
skew,	etc.,	may	exclude	certain	models
§ How	to	know	which	type	of	analysis	to	
pursue:
− How	much	time	do	you	have?
− What	resources	are	available	to	you?
− How	accurate	is	your	data?	How	accurate
do	you	need	the	model/analysis	to	be?
− How	popular/accepted	is	the	model	you	are	considering?	
§ Don’t	subscribe	to	“that’s	how	we’ve	always	done	it,”	but	
remember	to	use	a	model	that	stakeholders	will	accept.
Key	Takeaways	and	Suggested	Resources
§ Gaining	meaningful	insights	from	data	requires	planning,	technical	awareness	and	consistency.
§ Statistical	analysis	isn’t	a	replacement	for	your	own	logic	(don’t	go	on	statistical	autopilot).	
§ Utilize	available	resources	(blogs,	podcasts,	articles,	webinars	and	online	courses)	to	learn	more.
− Look	for	APPLIED statistics	topics	
§ Big	data	is	not	always	required.
§ Basic	understanding	of	the	statistical
analysis	process	goes	a	long	way!	
Podcast:	Not	So	Standard	Deviations
https://soundcloud.com/nssd-podcast
Guide:	When	Predictive	Models	Fail
searchdatamanagement.techtarget.com/ezine/Business-Information/When-predictive-analytics-models-produce-false-outcomes
Book: Statistics in Plain English, Timothy C. Urdan
Closing	Q&A
Thank	you!
See	you	Thursday,	April	6		for	our	next	DIA	webinar,
Building	a	Flexible	and	Scalable	Analytics	Architecture
Catch	our	webinar	recap	next	week	here:
firstsanfranciscopartners.com/blog
John	Ladley			@jladley
john@firstsanfranciscopartners.com
Kelle	O’Neal			@kellezoneal
kelle@firstsanfranciscopartners.com
© 2016 First San Francisco Partners www.firstsanfranciscopartners.com

DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics