Data science to improve your life

Laura Collett

March 2017
October 2017
The headlines

…please forgive the use of a pie chart
How does this apply to me though?

How may it apply to you?
• Use your own data!
My burning questions:
• What makes me happy?
• Am I getting better at running?
• What book should I read next?

/llcollett
/running
/happiness
/recbooks

So, what makes me happy?
• I tracked my mood every day in 2017…
• To see the effect of activities on mood

Dataset
• Outcome is mood as a binary variable
• Binary or categorical predictor variables
• Looking at the effect of predictors on outcome in a logistic regression model
library(readr)
daylio <- read_csv("daylio_2017.csv")
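The slides do not show how the binary outcome and the 0/1 activity predictors were built from the export, so the following is only a sketch of one way it could be done. The column names (`mood`, `activities`), the mood labels, the " | " separator and the cut-offs for `moodbin` and `moodgreat` are all assumptions, and the data frame is a toy stand-in.

```r
# Toy stand-in for the Daylio export; column names and mood labels are assumptions.
daylio_toy <- data.frame(
  mood = c("rad", "good", "meh", "good", "bad", "rad"),
  activities = c("friends | outside", "work", "work | reading",
                 "outside | swimming", "work", "friends | climbing"),
  stringsAsFactors = FALSE
)

# Binary outcomes: 1 for a good-or-better day, and 1 for a "rad" day only.
daylio_toy$moodbin   <- as.integer(daylio_toy$mood %in% c("good", "rad"))
daylio_toy$moodgreat <- as.integer(daylio_toy$mood == "rad")

# One 0/1 indicator column per activity of interest.
for (act in c("friends", "outside", "work")) {
  daylio_toy[[act]] <- as.integer(
    vapply(strsplit(daylio_toy$activities, " | ", fixed = TRUE),
           function(x) act %in% x, logical(1))
  )
}

print(daylio_toy[, c("mood", "moodbin", "moodgreat", "friends", "outside", "work")])
```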

Descriptive statistics
• Quite optimistic in general
• Haven’t quite defined an ‘awful’ day…
• Possibly an upturn at the weekends?

Check for collinearity of variables
library(GGally)
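Only the package load survives on this slide, so here is a rough sketch of what a collinearity check on 0/1 predictors can look like. It uses base R `cor()` on simulated data; the variable names and the 0.5 threshold are illustrative assumptions, not the talk's actual check.

```r
# Toy 0/1 predictors standing in for the activity indicators (simulated, not the talk's data).
set.seed(1)
n <- 200
friends <- rbinom(n, 1, 0.4)
outside <- rbinom(n, 1, 0.5)
# 'hiking' is constructed to overlap heavily with 'outside' to show a collinear pair.
hiking <- ifelse(outside == 1, rbinom(n, 1, 0.8), 0L)
preds <- data.frame(friends, outside, hiking)

# Pairwise Pearson correlation; for 0/1 variables this is the phi coefficient.
cor_mat <- round(cor(preds), 2)
print(cor_mat)

# Flag pairs whose absolute correlation exceeds an arbitrary threshold of 0.5.
high <- which(abs(cor_mat) > 0.5 & upper.tri(cor_mat), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(cor_mat)[high[, "row"]],
                      var2 = colnames(cor_mat)[high[, "col"]])
print(flagged)

# GGally::ggpairs(preds) would draw the same pairs as a plot matrix.
```

A flagged pair is a hint to drop or merge one of the two variables before fitting the model.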

Results
model3 <- glm(moodbin ~ sophie + friends + stats + outside + swimming + climbing,
              family = binomial(link = "logit"), data = daylio)
summary(model3)

More results
gmodel3 <- glm(moodgreat ~ sophie + friends + work + outside + hikelessadventure +
                 hiking + reading + driving,
               family = binomial(link = "logit"), data = daylio)
summary(gmodel3)
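`summary()` reports logistic regression coefficients on the log-odds scale, which is hard to read as "how much happier". A common follow-up, sketched here on simulated data with made-up effect sizes, is to exponentiate the coefficients into odds ratios. The data frame `toy` and its columns are hypothetical.

```r
set.seed(42)
n <- 365
toy <- data.frame(friends = rbinom(n, 1, 0.3), work = rbinom(n, 1, 0.6))
# Simulated truth: seeing friends raises the log-odds of a good day, work lowers it.
logit_p <- 0.5 + 1.0 * toy$friends - 0.5 * toy$work
toy$moodbin <- rbinom(n, 1, plogis(logit_p))

fit <- glm(moodbin ~ friends + work, family = binomial(link = "logit"), data = toy)

# Exponentiated coefficients are odds ratios; Wald intervals come from confint.default().
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(odds_ratios, 2))
```

An odds ratio above 1 means the activity goes with higher odds of a good day, below 1 with lower odds.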

But am I getting better at running?
• Maximum Aerobic Function (MAF) method
• In a nutshell: train at your maximum aerobic heart rate in order to build aerobic base fitness and thus get faster at the same heart rate
• Multilevel regression of the effect of heart rate, slope and distance on pace, for multiple runs over time

Data
runlist <- list(aug16, aug17, aug18, aug22, aug24, aug26,
                aug28, aug30, sep13, sep15, sep16, sep17)
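The slides do not show how the per-run data frames become the single `maf` data frame used in the model. A minimal sketch of one way to stack them, assuming each run has the same columns and that `km2` and `km3` are simply distance squared and cubed (both assumptions):

```r
# Two toy runs standing in for aug16, aug17, ...; all column names are assumptions.
aug16 <- data.frame(km = c(0.5, 1.0, 1.5), hr = c(138, 142, 145), pace = c(410, 405, 415))
aug17 <- data.frame(km = c(0.5, 1.0),      hr = c(136, 141),      pace = c(408, 402))

# A named list keeps track of which rows came from which run.
runlist <- list(aug16 = aug16, aug17 = aug17)

# Add the run label as a 'date' column, then stack all runs into one data frame.
maf <- do.call(rbind, Map(function(run, label) cbind(date = label, run),
                          runlist, names(runlist)))
rownames(maf) <- NULL

# Polynomial distance terms (assuming km2 and km3 in the model are km squared and cubed).
maf$km2 <- maf$km^2
maf$km3 <- maf$km^3
print(maf)
```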

Model
library(lme4)
m1 <- lmer(pace ~ hr + slope + km + km2 + km3 + (1 + km | date), data = maf)
summary(m1)
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 date     (Intercept) 108.50   10.416
          km           81.75    9.042   0.18
 Residual              10.17    3.188
Fixed effects:
              Estimate Std. Error t value
(Intercept) 414.805250   3.262865  127.13
hr           -0.145002   0.008496  -17.07
slope        -0.100876   0.010536   -9.57
km            5.988389   2.611040    2.29
km2           0.296926   0.011730   25.31
km3          -0.017167   0.000499  -34.40
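One way to read "am I getting better?" off a model like this is to look at the per-run random intercepts: a run whose intercept sits below the average was faster than expected once the other predictors are accounted for. The sketch below uses simulated runs and a simpler random-intercept model than the one above; the simulated numbers are made up, and whether the talk read its results this way is an assumption.

```r
library(lme4)

set.seed(2017)
n_runs <- 12
per_run <- 40
# Simulated runs: later runs get a lower (faster) baseline pace, plus run-to-run noise.
run_effect <- -1.5 * seq_len(n_runs) + rnorm(n_runs, sd = 4)
sim <- data.frame(
  date = factor(rep(seq_len(n_runs), each = per_run)),
  km   = rep(seq(0.2, 8, length.out = per_run), times = n_runs),
  hr   = rnorm(n_runs * per_run, mean = 145, sd = 5)
)
sim$pace <- 420 + run_effect[as.integer(sim$date)] + 2 * sim$km -
  0.15 * (sim$hr - 145) + rnorm(nrow(sim), sd = 3)

# Random intercept per run; the talk's model also has a random slope for km.
fit <- lmer(pace ~ hr + km + (1 | date), data = sim)

# Per-run deviations from the average intercept, in run order.
run_intercepts <- ranef(fit)$date[["(Intercept)"]]
print(round(run_intercepts, 1))
```

Plotting `run_intercepts` against run order gives a quick view of whether pace at a given heart rate is trending down over time.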

Results
Two steps forward, one step back possibly…

Now, please recommend me a book
• Accessed reviews of the books I had on my list, using the Goodreads API
• Created a dataset of ratings for multiple books from multiple users (including me)
• Built a recommender system to recommend me a top 5

Goodreads API…
• Create booklist from my read (rated) and shelved (would like a recommendation on) books
books <- read_csv("goodreads_library_export.csv")
library(rgoodreads)
Sys.setenv(GOODREADS_KEY="7U8VDuR3phc4vD1WQF1g")
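# n, m, booklist and the matrices id and rrating are assumed to be defined beforehand
for (i in 1:n) {
  for (j in 1:m) {
    ri <- j + (i - 1) * m                      # ID of the review to request
    tryCatch({
      c1 <- review(ri)                         # fetch the review
      c2 <- book(gsub(".*:", "", c1$book))     # look up the book it refers to
      c2$id <- as.numeric(c2$id)
      if (c2$id %in% booklist$id) {            # keep only books on the booklist
        id[i, j] <- c2$id
        rrating[i, j] <- c1$rating
        rbooks <- data.frame(rrating[, j], id[, j])
      }
    }, error = function(e) {cat("ERROR :", conditionMessage(e), "\n")})
  }
}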
• Used the rgoodreads package to request reviews from the API, then saved them if they were for books on the booklist
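Collaborative filtering works on a users-by-books ratings matrix, so the saved reviews have to be reshaped at some point. The slides do not show that step; the following is a minimal base R sketch with made-up user and book IDs.

```r
# Toy review records: one row per (user, book, rating); IDs are made up for illustration.
reviews <- data.frame(
  user   = c("u1", "u1", "u2", "u2", "u3"),
  book   = c("b10", "b20", "b10", "b30", "b20"),
  rating = c(5, 3, 4, 5, 2),
  stringsAsFactors = FALSE
)

users <- sort(unique(reviews$user))
books <- sort(unique(reviews$book))

# Users in rows, books in columns; NA marks a book the user has not rated.
ratings <- matrix(NA_real_, nrow = length(users), ncol = length(books),
                  dimnames = list(users, books))
ratings[cbind(reviews$user, reviews$book)] <- reviews$rating
print(ratings)
```

With recommenderlab loaded, a matrix like this can be coerced with `as(ratings, "realRatingMatrix")` before it is passed to the recommender functions.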

You can tell what books are popular already…
Who doesn’t rate The Catcher in the Rye 5/5???
…actually Liz didn’t

Recommender system
library(recommenderlab)
scheme <- evaluationScheme(subreal[1:900], method = "split", train = 0.9,
                           k = 1, given = 3, goodRating = 5)
results <- evaluate(scheme, algorithms, type = "topNList",
                    n = c(1, 3, 5, 10, 15, 20))
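The `algorithms` object is not defined anywhere on the slides. Since the results compare user-based collaborative filtering with popular items, a plausible definition is sketched below; it is hypothetical, as are the list names. The example runs on `Jester5k`, a dataset that ships with recommenderlab, rather than on the talk's Goodreads ratings.

```r
library(recommenderlab)

# Jester5k ships with recommenderlab; it stands in for the talk's ratings matrix.
data(Jester5k)
ratings <- Jester5k[1:500]

scheme <- evaluationScheme(ratings, method = "split", train = 0.9,
                           k = 1, given = 3, goodRating = 5)

# Hypothetical definition of 'algorithms': the two methods named on the results slide.
algorithms <- list(
  "popular items" = list(name = "POPULAR", param = NULL),
  "user-based CF" = list(name = "UBCF", param = NULL)
)

results <- evaluate(scheme, algorithms, type = "topNList", n = c(1, 3, 5, 10))

# Average confusion-matrix metrics per list length, one table per algorithm.
print(lapply(results, avg))
```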

Results
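r_ubcf <- Recommender(real[1:200], method = "UBCF")   # train on the first 200 users
rec_ubcf <- predict(r_ubcf, real[388], n = 5)         # top 5 for user 388
recbooks_ubcf <- as.data.frame(as(rec_ubcf, "list"))
# strip the "b" from the item labels to recover numeric book IDs
recbooks_ubcf$id <- as.numeric(gsub("[b]", "", recbooks_ubcf$u388))
recbooks_ubcf <- recbooks_ubcf[c("id")]
recbooks_ubcf <- merge(booklist, recbooks_ubcf)       # join back to the booklist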
User-based collaborative filtering / Popular items
• Both algorithms are heavily influenced by what other people do, so this method will not find “hidden gems”, so to speak
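To see why the recommendations lean so heavily on other people, here is a stripped-down sketch of the idea behind user-based collaborative filtering in base R. The ratings are made up, and recommenderlab's UBCF differs in detail (for example normalisation and the choice of nearest neighbours), so treat this as an illustration of the principle only.

```r
# Toy ratings: users in rows, books in columns, NA = not rated (made-up values).
ratings <- rbind(
  me    = c(b1 = 5, b2 = 4, b3 = NA, b4 = NA),
  anna  = c(5, 4, 5, 1),
  ben   = c(4, 5, 4, 2),
  chris = c(1, 2, 1, 5)
)

# Cosine similarity between two users over the books both have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

target <- "me"
others <- setdiff(rownames(ratings), target)
sims <- sapply(others, function(u) cosine_sim(ratings[target, ], ratings[u, ]))

# Predicted rating for each unrated book: similarity-weighted mean of the others' ratings.
unrated <- colnames(ratings)[is.na(ratings[target, ])]
pred <- sapply(unrated, function(bk) {
  sum(sims * ratings[others, bk]) / sum(sims)
})
print(round(sort(pred, decreasing = TRUE), 2))
```

Every predicted score is built from other users' ratings, so a book that nobody similar has rated cannot surface.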
