TalkingData is a mobile app marketplace company based in China that has the largest independent big data service platform. Our objective is to explore and analyze data provided by TalkingData, guided by our hypotheses about users’ demographic characteristics, and to provide useful insights that support the company’s decision-making in R&D and branding.
Talking Data: Mobile User Demographic Data Analysis
1. Group A
Ethan Jacobs
Apoorva Singu
Christopher Walker
Chun Wu
Mobile User Demographics Data Analysis
(using RStudio)
MIS 7190 Programming for Business
11/17/2016
2. Project Overview
TalkingData is a mobile app marketplace company based in
China that has the largest independent big data service
platform.
Objective:
To explore and analyze data offered by TalkingData (such as
app usage, geolocation, and mobile device properties), guided
by our hypotheses about users’ demographic characteristics,
and to provide useful insights that support the company’s
decision-making in R&D and branding.
3. Hypotheses
Hypothesis 1. Users aged 25-45 are more likely to use phone apps during lunchtime
(noon-2pm) because most companies in China allow employees a two-hour
lunch/nap break.
Hypothesis 2. Finance and business apps are used more frequently on Fridays
because users are likely to check bank accounts and make payments.
Hypothesis 3. Players of massively multiplayer online (MMO) games are likely to use
Tencent (a Chinese counterpart to Skype/Facebook) while using gaming apps because
distant teammates can voice chat with each other.
Hypothesis 4. A large majority of game players are under the age of 25
because social pressures and personal preferences shift with
age.
5. Data Exploration - Data type
Interval Discrete:
- Count of Apps Used (derived)
- Hour of the Day (derived)
Interval Continuous:
- Age (given)
Categorical Nominal:
- device_id, gender, group, phone_brand, device_model
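The two derived variables above could be produced along the following lines. This is a minimal sketch, not the group's actual code: the toy `events` table, its column names, and the UTC time zone are assumptions made for illustration.

```r
# Toy events table; the column names and values are illustrative assumptions.
events <- data.frame(
  device_id = c("d1", "d1", "d2", "d2", "d2"),
  app_id    = c("a1", "a2", "a1", "a1", "a3"),
  timestamp = c("2016-05-02 00:46:51", "2016-05-02 12:10:00",
                "2016-05-02 12:30:00", "2016-05-03 21:05:00",
                "2016-05-03 21:45:00"),
  stringsAsFactors = FALSE
)

# Hour of the Day (0-23): parse with an explicit format and time zone,
# then read the hour field of the POSIXlt object.
parsed <- as.POSIXlt(events$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
events$hour <- parsed$hour

# Count of Apps Used: number of distinct apps seen per device.
apps_per_device <- aggregate(app_id ~ device_id, data = events,
                             FUN = function(x) length(unique(x)))
names(apps_per_device)[2] <- "apps_used"

print(events[, c("device_id", "hour")])
print(apps_per_device)
```

Both results are whole numbers, which is consistent with the slide treating them as discrete variables.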
6. Data Exploration - Data Processing Methods
Data Import/Export: setwd, read.csv
Text Manipulation:
Problem: some data were collected in a different language and did not display well in R
Solution: add an external translation table, p_b_d_model_trans.csv
Dimensions, header names and classes: summary, header, class
6,218,496 observations of 15 variables
Problem: the dataset is too large to analyze as a whole
Solution: pull only the tables and columns needed to test each hypothesis
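The "pull only what we need" approach can be sketched with `read.csv` and its `colClasses` argument, where a class of `"NULL"` skips a column entirely. The file below is a small temporary CSV written by the example itself; its columns mirror the variables named on the previous slide, but the values are made up. Reading `device_id` as character is a precaution in case the IDs are too large for R's integer type, which is an assumption about the real data.

```r
# Write a tiny stand-in CSV so the example is self-contained.
path <- tempfile(fileext = ".csv")
toy <- data.frame(
  device_id = c("d1", "d2", "d3"),
  gender    = c("M", "F", "M"),
  age       = c(26L, 31L, 45L),
  group     = c("M23-26", "F29-32", "M39+")
)
write.csv(toy, path, row.names = FALSE)

# Read only the columns needed; "NULL" drops the 'group' column at import time.
users <- read.csv(
  path,
  colClasses = c(device_id = "character", gender = "character",
                 age = "integer", group = "NULL")
)

# Inspect dimensions, header names, classes, and a numeric summary.
print(dim(users))
print(names(users))
print(sapply(users, class))
print(summary(users$age))

unlink(path)
```

Skipping columns at import time keeps memory use down, which matters more as the table approaches millions of rows.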
7. Data Exploration - Data Processing Methods
Handling missing values: is.na
Handling dates and times
Problem: how to pull the day of the week (such as Monday, Sunday…) from a time stamp
Time stamp example: 2016-05-02 00:46:51 → MONDAY
Solution: use as.POSIXlt to parse the time stamp, then weekdays(as.Date()) to convert it
to Monday, Tuesday…
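# Flag the events that belong to finance apps, then keep only those rows
finance.events <- events.data$event_id %in% finance.apps$event_id
finance.only <- events.data[finance.events,]
# Convert each time stamp to its weekday name (e.g. "Monday")
converted_time <- weekdays(as.Date(as.POSIXlt(finance.only$timestamp)))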
Indexing, subsetting rows and columns
Data aggregation by groups: dataframe
Data merging/joining: merge
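The missing-value, merging, and aggregation steps listed above might be chained as in the sketch below. The two toy tables, their column names, and the brand values are made up for illustration, and `aggregate` is one possible choice for the group-wise step rather than a confirmed detail of the project.

```r
# Toy user and device tables (illustrative values only).
users <- data.frame(
  device_id = c("d1", "d2", "d3", "d4"),
  gender    = c("M", "F", "M", "F"),
  age       = c(26, NA, 45, 31)
)
brands <- data.frame(
  device_id   = c("d1", "d2", "d3"),
  phone_brand = c("Xiaomi", "Huawei", "Xiaomi")
)

# Missing values: count them, then keep only rows with a known age.
n_missing <- sum(is.na(users$age))
complete  <- users[!is.na(users$age), ]

# Merging: inner join on device_id (devices without a brand record are dropped).
joined <- merge(complete, brands, by = "device_id")

# Aggregation by group: mean age per phone brand.
mean_age_by_brand <- aggregate(age ~ phone_brand, data = joined, FUN = mean)

print(n_missing)
print(joined)
print(mean_age_by_brand)
```

`merge` defaults to an inner join; passing `all.x = TRUE` would instead keep users with no matching brand row.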
8. Data Summary
Mean age: 31.4
Median age: 29
Std. dev. age: 9.87
Mean age (male): 31.05
Median age (male): 29
Std. dev. age (male): 9.45
Mean age (female): 32.05
Median age (female): 29
Std. dev. age (female): 10.54
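Summary statistics of this kind can be computed overall and per gender in a few lines. The sketch below uses six made-up ages, so its output will not match the figures on the slide; the `users` table and its column names are assumptions.

```r
# Toy data; the real analysis would use the full user table.
users <- data.frame(
  gender = c("M", "M", "M", "F", "F", "F"),
  age    = c(22, 29, 40, 25, 29, 44)
)

# Overall mean, median, and standard deviation of age.
overall <- c(mean = mean(users$age), median = median(users$age), sd = sd(users$age))

# The same three statistics computed separately for each gender.
by_gender <- aggregate(
  age ~ gender, data = users,
  FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)

print(round(overall, 2))
print(by_gender)
```

Because the function returns three values, `by_gender$age` is a matrix with one row per gender and columns named `mean`, `median`, and `sd`.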
11. Analysis - Hypothesis 1
Hypothesis: users aged 25-45 are more likely to use phone apps during lunchtime (noon-2pm) because most companies
in China allow employees a two-hour lunch/nap break.
Findings
● This age group (25-45) is most likely
to use phone apps at 10am and 9pm.
● Of 1.2 million app usage observations,
these two time slots account for 11% of
the usage
● Phone usage starts to decrease after
10am and increases again at around
6pm
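An hourly breakdown like the one behind these findings could be computed as sketched below. The `usage` table is a made-up stand-in with one row per app usage event, and treating "noon-2pm" as hours 12 and 13 is an assumption about how the window was defined.

```r
# Toy usage events: user age and hour of day (0-23) for each event.
usage <- data.frame(
  age  = c(24, 30, 30, 41, 44, 50, 28, 35),
  hour = c(12, 10, 21, 10, 13, 10, 21, 9)
)

# Keep only the 25-45 age group.
target <- usage[usage$age >= 25 & usage$age <= 45, ]

# Count events per hour, keeping all 24 hours even when the count is zero.
hour_counts <- table(factor(target$hour, levels = 0:23))
hour_share  <- prop.table(hour_counts)

# Busiest hour(s), and the share of usage that falls between noon and 2pm.
peak_hours  <- names(hour_counts)[hour_counts == max(hour_counts)]
lunch_share <- sum(hour_share[c("12", "13")])

print(peak_hours)
print(lunch_share)
```

Comparing `lunch_share` with the share taken by the peak hours gives a direct check of the lunchtime hypothesis.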
12. Analysis - Hypothesis 2
Hypothesis: finance and business apps are used more frequently on Fridays because users are likely to check bank
accounts and make payments.
Findings
● There are no significant differences in
finance and business app usage across
the days of the week
● On average, there are 7,800 app usages per
day
● Friday finance and business app
usage is slightly more frequent
*1=Sunday, 7=Saturday
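The slides do not say how "no significant differences" was assessed. One possible check, offered here only as an illustration, is a chi-squared goodness-of-fit test of the daily counts against a uniform distribution. The counts below are invented, and the day index follows the 1 = Sunday, 7 = Saturday convention from the note above.

```r
# Build toy time stamps for one week; 2016-05-01 was a Sunday.
days       <- as.Date("2016-05-01") + 0:6
toy_counts <- c(20, 22, 21, 19, 20, 24, 21)   # made-up usages, Sunday..Saturday
timestamps <- rep(paste(days, "09:00:00"), times = toy_counts)

# POSIXlt stores the weekday as 0 = Sunday .. 6 = Saturday; add 1 to get 1..7.
parsed    <- as.POSIXlt(timestamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
day_index <- parsed$wday + 1

# Usage count per day of the week.
daily <- table(factor(day_index, levels = 1:7))

# Goodness-of-fit test: are the seven daily counts consistent with equal usage?
fit <- chisq.test(daily)

print(daily)
print(fit$p.value)
```

A large p-value means the data give no evidence that usage differs by day, even if one day (here, index 6, Friday) has the highest raw count.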
13. Analysis - Hypothesis 3
Hypothesis: players of massively multiplayer online (MMO) games are likely to use Tencent (a Chinese counterpart to
Skype/Facebook) while using gaming apps because distant teammates can voice chat with each other.
Findings
● “All MMO users are using Tencent
when playing MMO games”?
● The MMO games are developed by
and used within Tencent
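A co-usage check of this kind can be sketched as the share of MMO-playing devices that also show Tencent usage. The `app_events` table and its `category` labels are hypothetical; the real label data may group apps differently, which is likely why the 100% overlap above is questioned.

```r
# Toy app events with a hypothetical category label per event.
app_events <- data.frame(
  device_id = c("d1", "d1", "d2", "d3", "d3", "d4"),
  category  = c("MMO", "Tencent", "MMO", "Tencent", "Finance", "MMO")
)

# Devices seen using each category at least once.
mmo_devices     <- unique(app_events$device_id[app_events$category == "MMO"])
tencent_devices <- unique(app_events$device_id[app_events$category == "Tencent"])

# Share of MMO devices that also used Tencent.
share_mmo_with_tencent <- mean(mmo_devices %in% tencent_devices)

print(share_mmo_with_tencent)
```

If every MMO app in the data also carries a Tencent label, this share is 1 by construction and says nothing about voice chat behavior.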
14. Analysis - Hypothesis 4
Hypothesis: a large majority of game players are under the age of 25 because social pressures and personal
preferences shift with age.
Max: Age 26 = 926
Majority: Age 25-30
● Max: Age 26 = 4540
● Majority: Age 25-30
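The "Max" and "Majority" figures above read like the peak of an age histogram and the densest age band. A sketch of how such numbers could be derived follows; the `gamers` ages are made up, and the 25-30 band is taken from the slide.

```r
# Toy ages of users with at least one gaming app event.
gamers <- data.frame(age = c(19, 22, 24, 26, 26, 26, 27, 28, 30, 33, 41))

# Number of players at each age, and the age with the highest count.
age_counts <- table(gamers$age)
modal_age  <- as.integer(names(age_counts)[which.max(age_counts)])
modal_n    <- max(age_counts)

# Shares relevant to the hypothesis: under 25 versus the 25-30 band.
share_under_25 <- mean(gamers$age < 25)
share_25_to_30 <- mean(gamers$age >= 25 & gamers$age <= 30)

print(c(modal_age = modal_age, modal_n = modal_n))
print(round(c(under_25 = share_under_25, from_25_to_30 = share_25_to_30), 2))
```

The hypothesis holds only if `share_under_25` clearly exceeds one half.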
16. Findings
Max: Age 26 = 575
Majority: Age 26-30
Max: Age 26 = 352
Majority: Age 26-29
17. Future analysis
Clean up and organize the app_labels data better for further research
Explore more data
The data set isn’t diverse
Such as the MMO and Tencent data
Explore shopping application use
Explore location
18. References
Kaggle.com, TalkingData Mobile User Demographics: https://www.kaggle.com/c/talkingdata-mobile-user-demographics
Developer.apple.com, Apple application categories: https://developer.apple.com/app-store/categories/
Wikipedia, List of massively multiplayer online games: https://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_games
Editor's Notes
Whoever presents this part should mention that the intuition comes from the objective and the data exploration. We want to find not just information, but information that is useful for TalkingData’s R&D and branding business.