SPATIO-TEMPORAL SIMILARITIES
BETWEEN MAJOR TOPICS ON TWITTER
Team 9
MEMBERS: HENRY, FRANCIS, ISABELLA, XING, JIANGHAO, KAI
City  Challenge  Week  CUSP
September  1,  2016
GOAL  AND  OBJECTIVES
Cluster Twitter topics (#hashtags) by time and space
Understand the patterns of the most popular topics
DATASET, TOOLS AND LIMITATIONS
Data size:
Ø 31940 tweets
Ø From June 16, 2016 to July 10, 2016
Ø Attributes: timestamp, tweet text, (lat, lon)
Tools:
Ø Carto for  visualization
Ø Scikit-­learn  for  clustering  and  PCA
Ø Pandas  for  data  manipulation
Limitations:
Ø Natural  Language  Processing  (sentiment  /  stemming…  etc)
Ø Size  of  dataset
Ø Number  of  retweet
Ø Socio-­economic  characteristic  
[Workflow diagram. Labels: 319431 tweets; select top 50 hashtags (#); time series data; spatial data; time series clustering; spatial clustering; K-means clustering; principal component analysis]
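The clustering stage named in the workflow can be sketched with scikit-learn. This is a minimal illustration, not the team's actual code: the feature matrix is synthetic, and the matrix shape, the number of clusters (4), and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the real data (assumption): one row per hashtag,
# one column per feature, e.g. a time bin or a grid cell.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(50, 56)).astype(float)

# Convert raw counts to shares so that popular and rare hashtags
# are compared on the shape of their profile, not on volume.
profiles = counts / counts.sum(axis=1, keepdims=True)

# K-means assigns each hashtag to one of n_clusters groups.
# n_clusters=4 is a hypothetical choice for this sketch.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(profiles)

# PCA projects the profiles onto two components for plotting.
pca = PCA(n_components=2)
coords = pca.fit_transform(profiles)

print(labels[:10])
print(coords[:3])
```

Normalizing each row before clustering is a design choice of this sketch; clustering raw counts would mostly separate hashtags by overall popularity.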
METHODOLOGY
Text:
Stemming, punctuation removal
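A minimal sketch of the text step, using only the Python standard library. The function names and sample tweets are hypothetical. Stemming is left out here because it needs an external library (NLTK's PorterStemmer is one option); the sketch covers only hashtag extraction, punctuation removal, and selection of the most frequent hashtags.

```python
import re
import string
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(tweet):
    """Return the lowercased hashtags in a tweet, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]

def strip_punctuation(tweet):
    """Remove ASCII punctuation and collapse runs of whitespace."""
    cleaned = tweet.translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())

def top_hashtags(tweets, n=50):
    """Return the n most frequent hashtags across all tweets."""
    counts = Counter(tag for t in tweets for tag in extract_hashtags(t))
    return [tag for tag, _ in counts.most_common(n)]

# Illustrative tweets, not taken from the dataset.
tweets = [
    "Now #Hiring in #NYC!",
    "Love the view... #nyc",
    "We're #hiring: apply today.",
    "Sunny day in #Brooklyn",
]
print(top_hashtags(tweets, n=2))
print(strip_punctuation(tweets[0]))
```

Hashtags are extracted before punctuation is stripped, since '#' is itself a punctuation character and would otherwise be lost.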
Time:
Group tweets into eight 3-hour bins for each day of the week: 8 x 7 = 56 time frames
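The time grouping can be sketched in pandas as follows. The column names, the sample rows, and the choice of Monday as day 0 are assumptions for illustration; the slide only fixes the 56 time frames.

```python
import pandas as pd

def timeframe_index(timestamps):
    """Map timestamps to 0..55: day of week (Monday=0) * 8 + 3-hour bin."""
    ts = pd.to_datetime(pd.Series(timestamps))
    return ts.dt.dayofweek * 8 + ts.dt.hour // 3

# Illustrative rows, not taken from the dataset.
df = pd.DataFrame({
    "timestamp": [
        "2016-06-20 00:30:00",  # Monday, first bin
        "2016-06-20 14:10:00",  # Monday, 12:00-15:00 bin
        "2016-06-26 23:59:00",  # Sunday, last bin
    ],
    "hashtag": ["hiring", "hiring", "nyc"],
})
df["timeframe"] = timeframe_index(df["timestamp"])

# One row per hashtag, one column per time frame (all 56 kept, zeros included).
profile = pd.crosstab(df["hashtag"], df["timeframe"]).reindex(
    columns=range(56), fill_value=0
)
print(df["timeframe"].tolist())
print(profile.shape)
```

Each row of `profile` is a 56-value weekly activity pattern for one hashtag, which is the kind of vector a time series clustering step can compare.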
Geographic:
Set up a global 10 x 10 grid and aggregate the data within each grid cell
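A possible sketch of the grid aggregation. It assumes that "10 x 10" means 10 rows by 10 columns covering the whole globe; the slide could also mean a cell size, in which case the arithmetic changes. The function name, column names, and sample points are hypothetical.

```python
import numpy as np
import pandas as pd

def grid_cell(lat, lon, n_rows=10, n_cols=10):
    """Return (row, col) indices in an n_rows x n_cols grid over the globe."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    row = np.floor((lat + 90.0) / 180.0 * n_rows).astype(int)
    col = np.floor((lon + 180.0) / 360.0 * n_cols).astype(int)
    # Points exactly on the upper edge (lat=90 or lon=180) go to the last cell.
    return np.clip(row, 0, n_rows - 1), np.clip(col, 0, n_cols - 1)

# Illustrative points, not taken from the dataset.
df = pd.DataFrame({
    "hashtag": ["nyc", "nyc", "hiring"],
    "lat": [40.73, 40.70, 51.50],
    "lon": [-73.99, -74.01, -0.12],
})
df["row"], df["col"] = grid_cell(df["lat"], df["lon"])

# Number of tweets per hashtag in each grid cell.
counts = df.groupby(["hashtag", "row", "col"]).size()
print(counts)
```

With a grid this coarse, most of a single city falls into one cell, so a study focused on one city would likely pass a tighter bounding box or a larger number of rows and columns.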
ANALYSIS
BOROUGH
CAREER
TRAVEL/NYC
#NYC
NEXT  STEPS  &  EXAMPLE
l Use  text  and  time  pattern  to  suggest  when/who  to  advertise
l Link  traffic  and  incident  updates  to  twitter  to  improve  safety
and    reduce  congestion
#Hiring
DIABETES HEAT MAP
FUTURE  WORK

City Challenge Week Presentation