Methods and Analysis Techniques
1. Candidate Generation Through Co-Location Pattern Mining:
● Used the participation index (PI) to find interesting co-locations
● Participation Index: Efficient, but produces many false positives
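As an illustration of the participation index for a two-feature pattern, here is a minimal sketch. It assumes planar coordinates and a fixed neighborhood radius; the function name, the toy coordinates, and the radius are hypothetical and are not taken from the project's code.

```python
import math


def participation_index(points_a, points_b, radius):
    """Participation index of the size-2 pattern {A, B}.

    The participation ratio of a feature is the fraction of its instances
    that have at least one instance of the other feature within `radius`.
    The participation index is the minimum of the two ratios.
    """
    if not points_a or not points_b:
        return 0.0

    def has_neighbor(p, others):
        return any(math.dist(p, q) <= radius for q in others)

    ratio_a = sum(has_neighbor(p, points_b) for p in points_a) / len(points_a)
    ratio_b = sum(has_neighbor(q, points_a) for q in points_b) / len(points_b)
    return min(ratio_a, ratio_b)


# Toy data: hypothetical coordinates in arbitrary planar units.
stores_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
stores_b = [(0.0, 0.5), (20.0, 20.0)]
pi_value = participation_index(stores_a, stores_b, radius=1.0)
print(pi_value)
```

Because the index only checks whether a neighbor exists within the radius, it is cheap to compute, but it does not account for how dense the locations are overall. That is one plausible reason candidates found this way still need a statistical test.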
2. Statistical	Tests	of	Co-Location	and	Clustering	Patterns:
● Ripley’s	K	and	Cross-K against	Poisson	Complete	Spatial	Randomness
● Ripley’s K: $\hat{K}(d) = \lambda^{-1} \sum_{i \neq j} \frac{I(d_{ij} < d)}{n}$
● Cross-K: $\hat{K}(d) = \lambda_j^{-1} \sum_{i \neq j} \frac{I(d_{ij} < d)}{n_i}$
● K	Functions	:	Computationally	expensive,	very	accurate
● Autocorrelation	vs.	Business	Decision
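The two estimators can be sketched directly from their definitions. This is a naive version without edge correction; it assumes planar coordinates and a known study-region area, and the function names and toy points are hypothetical. Under complete spatial randomness the theoretical value is K(d) = πd², so estimates above that suggest clustering and estimates below it suggest dispersion.

```python
import math


def ripley_k(points, d, area):
    """Naive Ripley's K estimate at distance d (no edge correction)."""
    n = len(points)
    lam = n / area  # estimated intensity: points per unit area
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) < d
    )
    return pairs / (lam * n)


def cross_k(points_i, points_j, d, area):
    """Naive cross-K estimate: type-j points within d of each type-i point."""
    lam_j = len(points_j) / area  # intensity of the type-j points
    pairs = sum(1 for p in points_i for q in points_j if math.dist(p, q) < d)
    return pairs / (lam_j * len(points_i))


# Toy data: corners of a unit square inside an assumed study area of 4 units.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
k_all = ripley_k(corners, d=1.1, area=4.0)

type_i = [(0.0, 0.0), (1.0, 1.0)]
type_j = [(0.0, 1.0), (1.0, 0.0)]
k_cross = cross_k(type_i, type_j, d=1.1, area=4.0)
print(k_all, k_cross, math.pi * 1.1 ** 2)
```

Both functions compare every pair of points, so the cost grows quadratically with the number of locations, which is consistent with the poster's note that K functions are computationally expensive.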
3. Monte-Carlo	Simulation:
● Shuffle businesses among their location domain and calculate the K-function value each time
● Repeat 999 times; if the observed K value lies above the 95% confidence level of the simulated values, reject the null hypothesis and claim intentional clustering
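The simulation procedure can be sketched as follows. This is one possible reading of the steps, not the project's implementation: the "location domain" is taken to be a list of candidate sites, each shuffle draws a random subset of the same size as the observed set, and a one-sided test is run at a single distance d. All names, the 10 x 10 toy grid, and the assumed area are hypothetical.

```python
import math
import random


def ripley_k(points, d, area):
    """Naive Ripley's K estimate at distance d (no edge correction)."""
    n = len(points)
    lam = n / area
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) < d
    )
    return pairs / (lam * n)


def monte_carlo_clustering_test(observed, domain, d, area,
                                n_sim=999, alpha=0.05, seed=0):
    """One-sided Monte Carlo test for clustering at distance d.

    Each simulation reassigns the businesses to random sites in `domain`
    and recomputes K. The p-value is the rank of the observed K among
    the simulated values.
    """
    rng = random.Random(seed)
    k_obs = ripley_k(observed, d, area)
    k_sim = [
        ripley_k(rng.sample(domain, len(observed)), d, area)
        for _ in range(n_sim)
    ]
    exceed = sum(1 for k in k_sim if k >= k_obs)
    p_value = (1 + exceed) / (n_sim + 1)
    return k_obs, p_value, p_value <= alpha


# Toy data: a 10 x 10 grid of candidate sites (assumed area of 100 units)
# and an observed set packed into one 3 x 3 corner of the grid.
domain = [(float(x), float(y)) for x in range(10) for y in range(10)]
observed = [(float(x), float(y)) for x in range(3) for y in range(3)]
k_obs, p_value, reject = monte_carlo_clustering_test(
    observed, domain, d=1.5, area=100.0
)
print(k_obs, p_value, reject)
```

With 999 simulations the smallest attainable p-value is 1/1000, which is why 999 is a common choice: the observed value counts as the thousandth sample.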
References
Dixon, P. M. (2014). Ripley's K function. Wiley StatsRef: Statistics Reference Online.
Huang, Y., Shekhar, S., & Xiong, H. (2004). Discovering colocation patterns from spatial data sets: A general approach. IEEE Transactions on Knowledge and Data Engineering, 16(12), 1472-1485.
Metropolis, N., & Ulam, S. (1949). The Monte Carlo method. Journal of the American Statistical Association, 44(247), 335-341.
Discussion
1. Gas stations in general tend to be clustered or inconclusive, while major gas stations tend to be inconclusive or declustered. Small gas businesses lack access to extensive land choices and/or data
2. Food locations tend to cluster due to competition or complementarity
3. Food and bank locations may be clustered due to urban commercial region planning by cities
Introduction and Background
Co-Location Pattern:
A co-location is a set of spatial
features	that	frequently	occur	together	(e.g.	
Walmart	and	Subway)
Figure	1	:	Walmart	and	Subway	Co-Location
Retrieved	from	http://www.kptv.com/story/33335477/affidavit-man-pointed-gun-at-subway-manager-inside-walmart-shot-at-pumpkins
Objectives	and	Significance:
Discover	previously	unknown	business	location	
patterns	among:
1. Specific	brands	within	industries
2. Specific	brands	across	industries
3. Different	Industries
4. Same	Industries
Help small businesses choose locations by revealing the location patterns of large brands
Data	Set:
Data for the three largest cities in the US, obtained by querying the Google Places API.
Heat maps of locations are shown in Figures 2, 3, and 4. Brightness indicates density.
Figure 2 : New York City
Figure 3 : Los Angeles
Figure	4	:	Chicago
Understanding	business	location	patterns	through	co-location	pattern	mining
Jeffrey Chiu¹, Amin Vahedian Khezerlou², Xun Zhou²
Irvington High School¹ - Department of Management Sciences, The University of Iowa²
Figure	6	:	Flow	of	Analysis	in	this	project	
Figure	5	:	Example	Monte-Carlo	K	Function
Retrieved	from	resources.esri.com/help/9.3/arcgisdesktop/com/gp_toolref/spatial_statistics_tools/
Results
Figure	7	:	Ripley’s	K	Function	for	All	Gas	Stations	and	Major	Gas	Stations	Only
(a) NYC All (b) NYC Major (c) LA All (d) LA Major (e) Chicago All (f) Chicago Major
Figure 8 : Cross-K Functions within Food Industry
(a) Chipotle/Le Pain Quotidien NYC (b) Juice Press/Le Pain Quotidien NYC (c) McDonald's/BR NYC (d) Dunkin/Subway NYC (e) Dunkin/Subway Chicago (f) Starbucks/Subway LA
Further Directions
1. Find interesting co-location patterns in other cities and identify overarching patterns across cities
2. Work with economics researchers to develop economic theories behind co-location patterns
3. Submit for publication to an academic journal after further results and analysis
Figure 9 : Cross-K Function across Industries
(a) Chipotle/HSBC Bank NYC (b) Jimmy John's/Chase Chicago (c) Starbucks/Wells Fargo LA
Figure 10 : Cross-K Function of Food and Bank Types
(a) New York City (b) Los Angeles (c) Chicago
Acknowledgements
Special thanks to Amin and Dr. Zhou for their guidance on this project. Thanks to SSTP for giving me such a wonderful opportunity.
