More Related Content
Similar to Data Science career mixer poster
Similar to Data Science career mixer poster (20)
Data Science career mixer poster
- 1. RESEARCH POSTER PRESENTATION DESIGN
© 2015
www.PosterPresentation
Since the exploding popularity of social media
sites such as Twitter, there has been a lot of social
studies done using data mined from these sites.
Entirely spontaneous of its users, Twitter data is
truer to individuals’ expression than traditional
survey data.
MOTIVATION
OBJECTIVES
1. Gathering data (granularity issue):
• Given Date, GEOid, and Tweet Count data
(~6 million obs.)
• Finding survey data of the same granularity as
Twitter data was hard.
• Web-scraped ZCTA-level population and
median gross rent data from American
Community Survey.
METHODS
Do richer neighborhoods use Twitter more,
and is there a point in which the richest
neighborhoods use Twitter less?
RESULTS
CONCLUSIONS
Though there was an R2 of 0.03, the parameter
estimates had corresponding p-values that
suggested statistical significance, so we
concluded that wealthier neighborhoods use
Twitter more, while the wealthiest neighborhoods
use Twitter less.
The corresponding p-value for the model's F-
statistic is also significant, BUT the model itself
explains only 3% of the variability in the data.
Categorical Data Analysis
• Gather data from external sources to utilize
Twitter dataset.
• Clean, merge, and compile data for
econometric analysis
• Convert Twitter GEOid’s so that it would be
compatible with TIGER/Line Shapefiles and
Geographic Information Systems.
• Analyze data, create a model to predict
number of tweets per location.
Dept. of Economics, Sta/s/cs, and Computer Science
Tom Jeon BS, BDIC ’17 (seeking 2016 summer internships)
Clean TwiEer Data for TIGER/Line Shapefiles and Econometric Analysis
2. Cleaning data
(differing GEOid code
issue):
• Twitter and ACS
has different ways
notating location in
their datasets
• Converted GEOid
(location data) code
accordingly as
shown on the right.
3. Merging data
(missingness and
imputation issue):
• Two final datasets
• Dplyr package
Can median gross rent and population
predict the number of tweets?
WHAT I AM DOING NOW:
I want to learn as much analytical techniques as
possible so I’ve been learning from experts of different
domains.
LDA Topic Modeling
BioinformaMcs and Neural Data
Independent Study
supervised by
Brendan O’Connor