IMPACT EXTEND
MEASURECAMP R CLASS
September 2018
AGENDA
01 INTRODUCTION
Who is IMPACT EXTEND, and how do we work with data?
02 WHAT MAKES R SO AWESOME?
Pros and cons of using R to extract, transform and load data, based on use cases.
03 CALCULATING, JOINING AND GROUPING DATA
Unifying and transforming data, always.
04 CREATE, WRITE AND READ FROM GOOGLE SHEET
Using R to build a free database to be used for reporting, data storage or Google Data Studio.
05 INTRODUCTION TO R MARKDOWN
Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML.
06 SCHEDULE R SCRIPTS ON YOUR MACHINE
How can you do as little as possible?
01. INTRODUCTION
Who is IMPACT EXTEND, and how do we work with data?
About me
• Copenhagen based
• Lead analyst at IMPACT EXTEND
• 2 years of working with R
• 5 years of GTM and GA work
• 2 years of random SEO and website stuff
About Rasmus
• Kickass analyst when it comes to understanding humans
• BI specialist who uses Power BI to build crazy dashboards
• Former Google Analytics class educator
• The nerd who is always curious about taking it to the next step
• He also built an entire GA validator by himself, which is quite cool
100% focus on digital commerce
Long customer relations
7 x Gazelle
AARHUS - COPENHAGEN - LISBOA
126 EMPLOYEES
ESTABLISHED IN 1998
Market leader in commerce
Established in 2018
150+ employees
Aarhus - Copenhagen - Lisbon
Part of IMPACT A/S
Clients: largest retailers in the Nordics
Focus is on data-driven marketing
OUR OFFERINGS
ATTRACT AND SELL
TRAFFIC & INSIGHTS
SERVE AND GROW
DIALOGUE & LOYALTY
DATA AND INSIGHTS
DMP & INTELLIGENCE
DIGITAL MARKETING STRATEGY
Full-service approach with combined services delivering holistic solutions to address Marketing’s primary pains and objectives with digital marketing strategies
OUR PURPOSE AS AN AGENCY
OUR APPROACH TO WORKING WITH DATA
BEHAVIORAL DATA
• User ID
• Sessions
• Cross-device
CRM DATA
• User ID
• Purchase
• Channels (web/store)
IMPRESSION DATA
• User ID
• Conversions
• Store visits
ENGAGEMENT DATA
• User ID
• Mails
• Open/click
MARKETING DB
• Data consolidation
• Segmentation
• Engagement
• LTV
Segmentation
Personalization
Dynamic content
Triggers
Who R you? And what is your experience?
And why did you come here today?
02. WHAT MAKES R SO AWESOME?
Pros and cons of using R to extract, transform and load data, based on use cases.
Extract
• Get data from APIs
• Scrape web data
• Work with normal worksheets
Transform
• Do all your calculations automatically
• Split data apart and assemble it with other data
• Do huge workloads fast, as there is no traditional GUI like Excel
Load
• Send data to databases
• Create dashboards
• Make automated reports
Get the data the way you need it
Make sure that it looks like you want it
Do whatever you need your data to do
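The three steps above can be sketched end to end in a few lines of base R. This is only a minimal illustration: the inline CSV text, the column names and the temporary output file are assumptions made for the example, standing in for a real API, worksheet or database.

```r
# Extract: read raw data (an inline CSV string stands in for an API or worksheet)
raw_csv <- "CustomerID,sessions
A1,2
A1,3
A2,1"
visits <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Transform: total sessions per customer
totals <- aggregate(sessions ~ CustomerID, data = visits, FUN = sum)

# Load: write the result to a temporary CSV file (a stand-in for a database)
out_file <- tempfile(fileext = ".csv")
write.csv(totals, out_file, row.names = FALSE)

# Read it back to confirm what was stored
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)
print(reloaded)
```

Because every step is code, the whole chain can be re-run unchanged whenever the source data changes.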
03. CALCULATING, JOINING AND GROUPING DATA
Unifying and transforming data
GENERATE FAKE DATA FROM A GITHUB REPOSITORY
install.packages("RCurl")
library(RCurl)
# Go to https://bit.ly/2PSb6FB and copy and paste the URL
url <- "thepasted url"
# Download the script as text (ssl.verifypeer = FALSE skips the certificate check)
script <- getURL(url, ssl.verifypeer = FALSE)
# Run the downloaded script; it creates the data frame ID used in the rest of the class
eval(parse(text = script))
This should give you 300 rows of data that we can use for various calculations and modifications.
WITH THE IDS WE CAN CHECK FOR DUPLICATES
This determines whether one or more users appear several times in the dataset. Once we know that the same user occurs more than once, we can aggregate the data by user.
duplicated(ID$CustomerID)
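To see what duplicated() returns, here is a small sketch on a made-up data frame. The five rows below are hypothetical and only mimic the CustomerID and GA columns of the generated data.

```r
# Hypothetical stand-in for the generated ID data frame
ID <- data.frame(
  CustomerID = c("A1", "A2", "A1", "A3", "A2"),
  GA = c("ga1", "ga2", "ga3", "ga4", "ga2"),
  stringsAsFactors = FALSE
)

# TRUE marks every occurrence of a CustomerID after its first one
dup_flags <- duplicated(ID$CustomerID)

# Is any customer present more than once?
any_repeats <- any(dup_flags)

# Which customers are repeated?
repeated_ids <- unique(ID$CustomerID[dup_flags])

# How many rows does each customer have?
rows_per_customer <- table(ID$CustomerID)
print(rows_per_customer)
```

Note that duplicated() flags only the second and later occurrences, so the first row of a repeated customer is still FALSE.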
TO UNDERSTAND HOW THIS DATA LOOKS AGGREGATED ON A USER LEVEL, IN EXCEL IT WOULD LOOK LIKE THIS
Here, the Google Analytics cookie ID is combined with the visits to the site each day. As each customer ID is connected to a GA cookie ID, we can see how many devices each user goes through within a user journey.
TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA
PIVOT BY ID WILL PRODUCE THIS
library(dplyr)
# count the distinct devices (GA cookie IDs) per customer
ID %>%
group_by(CustomerID) %>%
summarise(devices = n_distinct(GA))
To find out how many devices people are using, we can group the rows by customer ID and count the distinct Google Analytics IDs.
TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA
PIVOT BY SESSIONS WILL PRODUCE THIS
# total sessions per customer
ID %>%
group_by(CustomerID) %>%
summarise(sessions = sum(sessions))
To find out how many sessions each user had in total, you can use this.
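Both pivots can be tried on a tiny data frame to check the numbers by hand. The rows below are invented for illustration and assume the dplyr package is installed; only the column names follow the slides.

```r
library(dplyr)

# Hypothetical stand-in for the ID data frame
ID <- data.frame(
  CustomerID = c("A1", "A1", "A1", "A2"),
  GA = c("ga1", "ga1", "ga2", "ga3"),
  sessions = c(2, 1, 4, 3),
  stringsAsFactors = FALSE
)

# One row per customer: distinct devices and total sessions
per_customer <- ID %>%
  group_by(CustomerID) %>%
  summarise(devices = n_distinct(GA), total_sessions = sum(sessions))

print(per_customer)
```

Customer A1 has three rows but only two distinct GA cookie IDs, which is why n_distinct() is used for devices while sum() is used for sessions.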
JOINS
INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN
inner_join()
Returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
left_join()
Returns all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
right_join()
Returns all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
full_join()
Returns all rows and all columns from both x and y. Where there are no matching values, returns NA for the one missing.
Note: FULL OUTER JOIN can potentially return very large result sets!
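The difference between the four joins is easiest to see on two very small tables. The sketch below assumes dplyr is installed; the crm and ga data frames and their values are made up for the example.

```r
library(dplyr)

# Hypothetical CRM table (x) and Google Analytics table (y)
crm <- data.frame(UserID = c("A1", "A2", "A3"),
                  purchases = c(3, 1, 2),
                  stringsAsFactors = FALSE)
ga <- data.frame(UserID = c("A1", "A4"),
                 sessions = c(10, 5),
                 stringsAsFactors = FALSE)

inner <- inner_join(crm, ga, by = "UserID")  # only A1 is in both tables
left  <- left_join(crm, ga, by = "UserID")   # A1, A2, A3; sessions is NA for A2 and A3
right <- right_join(crm, ga, by = "UserID")  # A1, A4; purchases is NA for A4
full  <- full_join(crm, ga, by = "UserID")   # A1, A2, A3, A4

# Row counts per join type
print(c(inner = nrow(inner), left = nrow(left), right = nrow(right), full = nrow(full)))
```

Comparing the row counts against the input tables is a quick sanity check that the join type you picked keeps the users you expect.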
JOINS
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
semi_join(x, y, by = NULL, copy = FALSE, ...)
anti_join(x, y, by = NULL, copy = FALSE, ...)
x, y: tbls to join.
by: a character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, simply explicitly list the variables that you want to join). To join by different variables on x and y, use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
copy: if x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation, so you must opt into it.
suffix: if there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
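The by and suffix arguments are worth trying once on toy data. In this sketch (assuming dplyr is installed) the two tables use different key names and share a non-key column called source; all names and values are hypothetical.

```r
library(dplyr)

# Hypothetical tables: the key is called CustomerID in one and UserID in the other
crm <- data.frame(CustomerID = c("A1", "A2"),
                  source = c("store", "web"),
                  stringsAsFactors = FALSE)
ga <- data.frame(UserID = c("A1", "A2"),
                 source = c("organic", "paid"),
                 stringsAsFactors = FALSE)

# Named vector: match crm$CustomerID to ga$UserID.
# suffix disambiguates the shared, non-joined column "source".
joined <- left_join(crm, ga,
                    by = c("CustomerID" = "UserID"),
                    suffix = c(".crm", ".ga"))

print(names(joined))
```

The key column keeps its name from x, and custom suffixes make it obvious which table each source column came from.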
INNER JOIN
[Diagram: the CRM DATA and Google Analytics tables, with IDs A1, A2 and A3]
inner_join()
Returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
What does this mean?
We join the two tables and keep only the rows whose UserID is present in both.
inner_join(Dataset1, Dataset2, by = "UserID",
copy = FALSE, suffix = c(".x", ".y"))
LEFT JOIN
left_join()
Returns all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
What does this mean?
We keep every row of Dataset1 and add the columns of Dataset2 wherever the UserID matches.
left_join(Dataset1, Dataset2, by = "UserID",
copy = FALSE, suffix = c(".x", ".y"))
RIGHT JOIN
right_join()
Returns all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
What does this mean?
We keep every row of the second table and add the columns of the first table wherever the UserID matches.
FULL JOIN
full_join()
Returns all rows and all columns from both x and y. Where there are no matching values, returns NA for the one missing.
What does this mean?
We take table 1 and join it with table 2, keeping all rows from both.
04. CREATE, WRITE AND READ FROM GOOGLE SHEET
Using R to build a free database to be used for reporting, data storage or Google Data Studio.
AUTHENTICATION
• We use the googleAuthR package created by Mark Edmondson
• This allows us to generate a token which we can use to work with Google's products
# install and load googlesheets
install.packages("googlesheets")
library(googlesheets)
googlesheets::gs_auth()
CREATE A GOOGLE SHEET
gs_new(title = "impactextendrclass")
gs <- gs_title("impactextendrclass")
gs_browse(gs, ws = 1)
SHOULD LOOK SOMETHING LIKE THIS
LET'S ADD SOME DATA TO IT!
library(dplyr)
gs %>%
gs_edit_cells(ws = 1, input = ID, trim = TRUE)
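LET'S ADD SOME MORE DATA TO IT!
# Re-run the downloaded script to generate a fresh batch of data in ID
eval(parse(text = script))
# Assuming the first upload wrote a header row plus nrow(ID) data rows,
# the first free row is nrow(ID) + 2
n <- paste0("A", nrow(ID) + 2)
gs_edit_cells(gs, ws = 1, input = ID, anchor = n, byrow = FALSE,
col_names = FALSE, trim = FALSE, verbose = TRUE)
We build the anchor cell with paste0() to work out where the new data should start, so we don't overwrite the old data.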
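The anchor arithmetic can be checked without touching a sheet. The helper below is a hypothetical illustration, not part of googlesheets: it assumes the existing data starts in row 1 and may include one header row.

```r
# Hypothetical helper: first free cell below existing data in a sheet.
# n_data_rows: number of data rows already written
# has_header:  TRUE if row 1 holds column names
next_anchor <- function(n_data_rows, column = "A", has_header = TRUE) {
  first_free_row <- n_data_rows + as.integer(has_header) + 1
  paste0(column, first_free_row)
}

# 300 data rows plus a header occupy rows 1 to 301
anchor_with_header <- next_anchor(300)
anchor_no_header <- next_anchor(300, has_header = FALSE)
print(c(anchor_with_header, anchor_no_header))
```

Anchoring one row too high overwrites the last rows of the previous batch, so it is worth verifying the result against the sheet before appending.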
DOWNLOAD AND MODIFY GS DATA
# EXTRACT: download gs data
download <- gs_read(gs)
# TRANSFORM: count distinct devices per customer and session count
upload <-
download %>%
group_by(CustomerID, sessions) %>%
summarise(devices = n_distinct(GA))
# LOAD: write the result to a new worksheet
gs %>%
gs_ws_new(ws_title = "aggregated", input = upload)
WHICH SHOULD GIVE YOU THIS
There are many ways to do similar tasks, and the use cases are basically endless. For larger datasets, we recommend sending the data to BigQuery or another database that can handle more information. With BigQuery the approach is the same, except that you need to link a credit card to the account.
THE EASY USE CASE IS DATA STUDIO
IT IS ALSO REALLY COOL FOR WEB SCRAPING
05. Introduction to R Markdown
Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML
What is R Markdown?
• An adaptation of general Markdown, which is used for documentation etc.
• R Markdown makes it possible to generate different types of documents such as HTML, Word, PDF, slides etc.
• R Markdown is really easy to write and keeps formatting clean and simple
• Use the cheat sheet to play around
Example - HTML
• To make sure that our GTM setups were GDPR compliant, we wrote a script that pulled the data from GTM and ran through everything to ensure that it was set up with the right compliance rules.
• Today we have this document generated once every 6 months, and it flags any issues we need to take care of
Example - Slides
DOING VISUALIZATIONS
• To visualize anything, we need the data saved locally on our machine
• It also needs to be loaded whenever you run your document
save(upload, download, file = "data.RData")
load("data.RData")
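A quick round trip shows what save() and load() do. This sketch uses made-up upload and download data frames and writes to a temporary folder instead of the working directory, so it does not overwrite a real data.RData.

```r
# Hypothetical stand-ins for the objects created earlier in the class
upload <- data.frame(CustomerID = c("A1", "A2"), devices = c(2, 1))
download <- data.frame(CustomerID = c("A1", "A1", "A2"),
                       GA = c("ga1", "ga2", "ga3"))

# Save both objects into one file
data_file <- file.path(tempdir(), "data.RData")
save(upload, download, file = data_file)

# Simulate a fresh session: the objects are gone
rm(upload, download)

# load() restores the objects under their original names
load(data_file)
restored <- exists("upload") && exists("download")
print(restored)
```

In an R Markdown document the load() call belongs in an early chunk, since each render typically starts from an empty environment.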
MAKING TABLES
```{r table, echo=TRUE, message=FALSE, warning=FALSE}
library(ggplot2)
library(dplyr)
library(knitr)
library(kableExtra)
# Render the first rows of the aggregated data as a styled HTML table
head(upload) %>%
  kable(format = "html") %>%
  kable_styling()
```
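Outside a document, you can inspect what kable() produces for HTML output. This is a small sketch that assumes the knitr package is installed and uses a made-up upload data frame; kableExtra styling is left out to keep it minimal.

```r
library(knitr)

# Hypothetical stand-in for the aggregated data
upload <- data.frame(CustomerID = c("A1", "A2"), devices = c(2, 1))

# kable() with format = "html" returns the table as HTML markup
html_table <- kable(head(upload), format = "html")
html_text <- paste(as.character(html_table), collapse = "\n")

# Print the raw markup to see the <table> element that ends up in the report
cat(html_text)
```

Because the result is plain HTML, it can be styled further with CSS in the rendered document.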
MAKING TABLES
The cool thing here is that you can apply any HTML and CSS styling to your documents. This means that you can do basically anything that is possible within HTML and CSS.
USING GGPLOT2
library(ggplot2)
library(dplyr)
# Convert the columns so dates and numbers are plotted as such
ID$date <- as.Date(ID$date)
ID2 <- ID
ID2$pageviews <- as.numeric(ID2$pageviews)
ID2$sessions <- as.numeric(ID2$sessions)
# Total sessions per day (printed, not stored)
ID2 %>%
group_by(date) %>%
summarise(sessions = sum(sessions))
# Total pageviews per day
ID2 <- ID2 %>%
group_by(date) %>%
summarise(pageviews = sum(pageviews))
# Raw rows: several values per date
ggplot(ID,aes(x=date,y=pageviews)) + geom_line()
# Daily totals: one value per date
ggplot(ID2,aes(x=date,y=pageviews)) + geom_line()
# Number of days per pageview total
ggplot(ID2,aes(pageviews)) + geom_bar()
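The effect of aggregating before plotting can be tried on a handful of rows. This sketch assumes ggplot2 is installed; the visits data frame is invented, and base R's aggregate() is used here instead of dplyr to keep the example short.

```r
library(ggplot2)

# Hypothetical raw data: two rows on the first day, one on each of the next two
visits <- data.frame(
  date = as.Date(c("2018-09-01", "2018-09-01", "2018-09-02", "2018-09-03")),
  pageviews = c(5, 7, 3, 9)
)

# One row per day, so the line has a single value per date
daily <- aggregate(pageviews ~ date, data = visits, FUN = sum)

# Build the plot object; print(p) draws it
p <- ggplot(daily, aes(x = date, y = pageviews)) + geom_line()
print(daily)
```

Plotting the raw rows with geom_line() would connect several values on the same date, which is why the daily totals give the cleaner line.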
PLAY AROUND WITH R MARKDOWN AND PLOTS –
GOOGLE IS YOUR FRIEND FOR SEEING THE
POSSIBILITIES!
06. Schedule tasks
How can you do as little as possible?
SCHEDULA(R)
Tools → Addins → Browse Addins
Choose the file that should be executed by the scheduler.
Choose the frequency, startDate and startTime at which the file should be executed.
HOW TO STOP IT AGAIN!
• On PC: open Task Scheduler, then find and kill the process.
• On Mac: Begin Automator. Click “Applications” on the Dock of your Mac. ...
Rclass

More Related Content

What's hot

Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Serban Tanasa
 
R Visualization Assignment
R Visualization AssignmentR Visualization Assignment
R Visualization Assignment
Vassilis Kapatsoulias
 
Excel Datamining Addin Beginner
Excel Datamining Addin BeginnerExcel Datamining Addin Beginner
Excel Datamining Addin Beginner
DataminingTools Inc
 
Computing and Data Analysis for Environmental Applications
Computing and Data Analysis for Environmental ApplicationsComputing and Data Analysis for Environmental Applications
Computing and Data Analysis for Environmental Applications
Statistics Assignment Help
 
Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3
DanWooster1
 
BP208 Fabulous Feats with @Formula
BP208 Fabulous Feats with @FormulaBP208 Fabulous Feats with @Formula
BP208 Fabulous Feats with @FormulaKathy Brown
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Salah Amean
 

What's hot (9)

Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
R Visualization Assignment
R Visualization AssignmentR Visualization Assignment
R Visualization Assignment
 
Excel Datamining Addin Beginner
Excel Datamining Addin BeginnerExcel Datamining Addin Beginner
Excel Datamining Addin Beginner
 
Computing and Data Analysis for Environmental Applications
Computing and Data Analysis for Environmental ApplicationsComputing and Data Analysis for Environmental Applications
Computing and Data Analysis for Environmental Applications
 
Matlab for marketing people
Matlab for marketing peopleMatlab for marketing people
Matlab for marketing people
 
Data1
Data1Data1
Data1
 
Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3
 
BP208 Fabulous Feats with @Formula
BP208 Fabulous Feats with @FormulaBP208 Fabulous Feats with @Formula
BP208 Fabulous Feats with @Formula
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
 

Similar to Rclass

An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
Dataspora
 
Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017
Parth Khare
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
Ummiya Mohammedi
 
Bridging data analysis and interactive visualization
Bridging data analysis and interactive visualizationBridging data analysis and interactive visualization
Bridging data analysis and interactive visualization
Nacho Caballero
 
Stata cheat sheet: data processing
Stata cheat sheet: data processingStata cheat sheet: data processing
Stata cheat sheet: data processing
Tim Essam
 
R studio
R studio R studio
R studio
Kinza Irshad
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarinn5712036
 
Is your excel production code?
Is your excel production code?Is your excel production code?
Is your excel production code?
ProCogia
 
Excel Datamining Addin Advanced
Excel Datamining Addin AdvancedExcel Datamining Addin Advanced
Excel Datamining Addin Advanced
excel content
 
Excel Datamining Addin Advanced
Excel Datamining Addin AdvancedExcel Datamining Addin Advanced
Excel Datamining Addin Advanced
DataminingTools Inc
 
PHStat Notes Using the PHStat Stack Data and .docx
    PHStat Notes    Using the  PHStat Stack Data  and .docx    PHStat Notes    Using the  PHStat Stack Data  and .docx
PHStat Notes Using the PHStat Stack Data and .docx
ShiraPrater50
 
Correlation and linear regression
Correlation and linear regression Correlation and linear regression
Correlation and linear regression
Ashwini Mathur
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
Florian Uhlitz
 
Types Working for You, Not Against You
Types Working for You, Not Against YouTypes Working for You, Not Against You
Types Working for You, Not Against You
C4Media
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
Mohammed El Rafie Tarabay
 
Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
preetikumara
 
5 structured programming
5 structured programming 5 structured programming
5 structured programming hccit
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016
Spencer Fox
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
Anshik Bansal
 

Similar to Rclass (20)

An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
 
Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
 
Bridging data analysis and interactive visualization
Bridging data analysis and interactive visualizationBridging data analysis and interactive visualization
Bridging data analysis and interactive visualization
 
Stata cheat sheet: data processing
Stata cheat sheet: data processingStata cheat sheet: data processing
Stata cheat sheet: data processing
 
R studio
R studio R studio
R studio
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
 
Is your excel production code?
Is your excel production code?Is your excel production code?
Is your excel production code?
 
Excel Datamining Addin Advanced
Excel Datamining Addin AdvancedExcel Datamining Addin Advanced
Excel Datamining Addin Advanced
 
Excel Datamining Addin Advanced
Excel Datamining Addin AdvancedExcel Datamining Addin Advanced
Excel Datamining Addin Advanced
 
PHStat Notes Using the PHStat Stack Data and .docx
    PHStat Notes    Using the  PHStat Stack Data  and .docx    PHStat Notes    Using the  PHStat Stack Data  and .docx
PHStat Notes Using the PHStat Stack Data and .docx
 
Correlation and linear regression
Correlation and linear regression Correlation and linear regression
Correlation and linear regression
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
Types Working for You, Not Against You
Types Working for You, Not Against YouTypes Working for You, Not Against You
Types Working for You, Not Against You
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
 
Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
 
5 structured programming
5 structured programming 5 structured programming
5 structured programming
 
assignment 2
assignment 2assignment 2
assignment 2
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
 

Recently uploaded

一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 

Recently uploaded (20)

一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 

Rclass

  • 1. IMPACT EXTEND MEASURECAMP R CLASS September 2018
  • 2. I N T R O D U C T I O N Who is impact extend, and how do we work with data? 02 W H A T M A K E S R S O A W E S O M E ? Cons and pros against using R to Extract, Transform and load data based on usecases. 03 C A L C U L A T I N G , J O I N I N G A N D G R O U P I N G D A T A Unifying and transforming data, always. 01 AGENDA C R E A T E , W R I T E A N D R E A D F R O M G O O G L E S H E E T Using R to build a free database to be used for reporting, datastorage or Google Data Studio. 05 I N T R O D U C T I O N T O R M A R K D O W N Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML 06 S C H E D U L E R S C R I P T S O N Y O U R M A C H I N E How can you do as little as possible? 04
  • 3. Who is impact extend, and how do we work with data? 01. INTRODUCTION
  • 4. • Copenhagen based • Lead analyst at IMPACT EXTEND • 2 years in doing R • 5 years in doing GTM and GA work • 2 years in doing random SEO and Website stuff About me
  • 5. • Kickass analyst in terms of understanding humans • BI specialist within using PowerBI to do crazy dashboards • Former Google Analytics class educator • The nerd who is always curious about taking it next step …. Also he build an entire GA validator by himself which is quite cool About Rasmus
  • 6. 100% focus on digital commerce Long customer relations 7 x Gazelle A A R H U S – C O P E N H A G E N - L I S B O A 1 2 6 E M P L O Y E E S E S T A B L I S H E D I N 1 9 9 8 Market leader in commerce Established in 2018 150+ Employees Aarhus - Copenhagen - Lisbon Part of IMPACT A/S Clients: Largest retailers in the nordics Focus is on datadriven marketing
  • 7. OUR OFFERINGS ATTRACT ANDSELL TRAFFIC & INSIGHTS SERVE ANDGROW DIALOGUE & LOYALTY DATAANDINSIGHTS DMP & INTELLIGENCE DIGITAL MARKETING STRATEGY Full-service approach with combined services delivering holistic solutions to address Marketing’s primary pains and objectives with digital marketing strategies
  • 8. OUR PURPOSE AS AN AGENCY
  • 9. OUR APPROACH TO WORK WITH DATA Behavioraldata User ID Sessions Cross-device CRMDATA User ID Purchase Channels (web/store) IMPRESSIONDATA User ID Conversions Store Visits ENGAGEMENT DATA User ID Mails Open/click MARKETINGDB Dataconsolidation Segmentation Engagement LTV Segmentering Personalization Dynamisk content Triggers
  • 11. Cons and pros against using R to Extract, Transform and load data based on usecases. 02. WHAT MAKES R SO AWESOME?
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17. Extract GetDatafromAPI ScrapeWebdata Workwithnormal worksheets Transform Do all your calculations automatically Splitdataapartandassembleitwith other data Do hugeworkloads fastas thereis nota traditionGUI likeexcel Load Senddatato databases Create dashboards Makeautomatedreports Getthedatathewayyouneedit Makesurethatitlookslikeyouwantit Dowhateveryouneedyourdatatodo
  • 18. Unifying and transforming data 03. CALCULATING, JOINING AND GROUPING DATA
  • 19. GENERATE FAKE DATA FROM A GITHUB RESPORATORY install.packages("RCurl") library(RCurl) #go to https://bit.ly/2PSb6FB and copy paste the URL url <- "thepasted url" script <- getURL(url, ssl.verifypeer = FALSE) eval(parse(text = script)) This should give you 300 rows of data, that we can use to do various calculations and modifications with
  • 20. GENERATE FAKE DATA FROM A GITHUB RESPORATORY
  • 21. GENERATE FAKE DATA FROM A GITHUB REPOSITORY
  • 22. WITH THE IDS WE CAN CHECK FOR DUPLICATES. This determines whether one or more users appear more than once in the dataset. Knowing that the same user appears more than once, we can aggregate the data by user.
        duplicated(ID$CustomerID)
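As a minimal sketch of what `duplicated()` returns, the toy data frame below stands in for the generated `ID` data. Its values are made up for illustration; only the `CustomerID` and `GA` column names come from the slides.

```r
# Toy stand-in for the generated data (values are hypothetical)
ID <- data.frame(
  CustomerID = c("c1", "c2", "c1", "c3", "c2"),
  GA         = c("ga1", "ga2", "ga3", "ga4", "ga2"),
  stringsAsFactors = FALSE
)

# TRUE marks every repeat of a CustomerID that was already seen
dup_flags <- duplicated(ID$CustomerID)

# Customers that occur more than once in the data
repeat_customers <- unique(ID$CustomerID[dup_flags])

# any() answers the yes/no question: are there duplicates at all?
has_duplicates <- any(dup_flags)
```

Note that `duplicated()` flags only the second and later occurrences, so the first row of a repeated customer stays FALSE.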
  • 23. TO UNDERSTAND HOW THIS DATA LOOKS AGGREGATED ON A USER LEVEL, IN EXCEL IT WOULD LOOK LIKE THIS. Here, the Google Analytics cookie ID is combined with the visits to the site each day. As each customer ID is connected to a GA cookie ID, we can see how many devices each user goes through within a user journey.
  • 24. TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA. PIVOT BY ID WILL PRODUCE THIS
        library(dplyr)
        # group by customer and count distinct GA cookie IDs (devices)
        ID %>%
          group_by(CustomerID) %>%
          summarise(devices = n_distinct(GA))
    To find out how many devices people are using, we can group them by customer ID and count the distinct Google Analytics IDs.
  • 25. TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA. PIVOT BY SESSIONS WILL PRODUCE THIS
        library(dplyr)
        # group by customer and add up the sessions
        ID %>%
          group_by(CustomerID) %>%
          summarise(sessions = sum(as.numeric(sessions)))
    To find out how many sessions the users had in total, you can use this.
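The two pivots above can be sketched on a small, self-contained example. The data frame is a hypothetical stand-in for the generated `ID` data (the values are invented; the column names `CustomerID`, `GA` and `sessions` follow the slides), and both summaries are computed in one `summarise()` call.

```r
library(dplyr)

# Toy stand-in for the generated data (values are hypothetical)
ID <- data.frame(
  CustomerID = c("c1", "c1", "c1", "c2", "c2"),
  GA         = c("ga1", "ga2", "ga1", "ga3", "ga3"),
  sessions   = c(2, 1, 3, 4, 1),
  stringsAsFactors = FALSE
)

# One row per customer: distinct GA cookie IDs (devices) and total sessions
per_user <- ID %>%
  group_by(CustomerID) %>%
  summarise(
    devices  = n_distinct(GA),
    sessions = sum(sessions)
  )
```

Customer `c1` appears with two different GA IDs, so it counts as two devices even though it has three rows.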
  • 26. JOINS
    INNER JOIN: inner_join() returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
    LEFT JOIN: left_join() returns all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
    RIGHT JOIN: right_join() returns all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
    FULL JOIN: full_join() returns all rows and all columns from both x and y. Where there are no matching values, it returns NA for the one missing. Note: FULL OUTER JOIN can potentially return very large result sets!
  • 27. JOINS
        inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
        left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
        right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
        full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
        semi_join(x, y, by = NULL, copy = FALSE, ...)
        anti_join(x, y, by = NULL, copy = FALSE, ...)
    x, y: tbls to join.
    by: a character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, simply explicitly list the variables that you want to join). To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
    copy: If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
    suffix: If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
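To make the `by` and `suffix` arguments concrete, here is a small sketch. The two tables, their column names and the suffixes `.crm` and `.ga` are hypothetical, chosen only so that the key has a different name in each table and one non-key column name collides.

```r
library(dplyr)

# Hypothetical tables: the key is called UserID in one and CustomerID in the other
crm <- data.frame(UserID = c(1, 2, 3),
                  source = c("store", "web", "web"),
                  stringsAsFactors = FALSE)
ga  <- data.frame(CustomerID = c(2, 3, 4),
                  source = c("organic", "paid", "email"),
                  stringsAsFactors = FALSE)

# Named vector: match crm$UserID to ga$CustomerID.
# Both tables have a non-key column called "source", so the suffixes
# are appended to tell the two apart in the result.
joined <- inner_join(crm, ga,
                     by = c("UserID" = "CustomerID"),
                     suffix = c(".crm", ".ga"))
```

The key column keeps its name from `x` (`UserID`), and the colliding columns come out as `source.crm` and `source.ga`.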
  • 28. CRM DATA Google Analytics
  • 29. INNER JOIN. inner_join() returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned. What does this mean? We join the two tables and keep only the rows whose UserID is present in both.
        inner_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
  • 30. LEFT JOIN. left_join() returns all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned. What does this mean? We keep every row of the first table and add the columns of the second wherever the UserID matches.
        left_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
  • 31. RIGHT JOIN. right_join() returns all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned. What does this mean? We keep every row of the second table and add the columns of the first wherever the UserID matches.
  • 32. FULL JOIN. full_join() returns all rows and all columns from both x and y. Where there are no matching values, it returns NA for the one missing. What does this mean? We take table 1 and join it with table 2, keeping every row from both.
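The four joins can be compared side by side on two tiny tables. This is a sketch under stated assumptions: `crm` and `ga` are hypothetical stand-ins for the CRM and Google Analytics data, with invented values and a shared `UserID` key.

```r
library(dplyr)

# Hypothetical CRM and Google Analytics tables sharing a UserID key
crm <- data.frame(UserID = c(1, 2, 3), purchases = c(5, 0, 2))
ga  <- data.frame(UserID = c(2, 3, 4), sessions  = c(7, 1, 9))

inner <- inner_join(crm, ga, by = "UserID")  # users 2 and 3
left  <- left_join(crm, ga, by = "UserID")   # users 1, 2, 3; sessions is NA for user 1
right <- right_join(crm, ga, by = "UserID")  # users 2, 3, 4; purchases is NA for user 4
full  <- full_join(crm, ga, by = "UserID")   # users 1, 2, 3, 4
```

Listing `by = "UserID"` explicitly also suppresses the message that a natural join would print.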
  • 33. Using R to build a free database to be used for reporting, data storage or Google Data Studio. 04. CREATE, WRITE AND READ FROM GOOGLE SHEET
  • 34. AUTHENTICATION
    • We use the googleAuthR package created by Mark Edmondson
    • This allows us to generate a token which we can use to work with Google's products
        # install and load googlesheets
        install.packages("googlesheets")
        library(googlesheets)
        googlesheets::gs_auth()
  • 35. CREATE A GOOGLE SHEET
        gs_new(title = "impactextendrclass")
        gs <- gs_title("impactextendrclass")
        gs_browse(gs, ws = 1)
  • 37. LET'S ADD SOME DATA TO IT!
        library(dplyr)
        gs %>% gs_edit_cells(ws = 1, input = ID, trim = TRUE)
  • 39. LET'S ADD SOME MORE DATA TO IT!
        # generate a fresh batch of fake data
        eval(parse(text = script))
        # first free row: one header row plus the existing rows, then one more
        n <- paste0("A", nrow(ID) + 2)
        gs_edit_cells(gs, ws = 1, input = ID, anchor = n, byrow = FALSE, col_names = FALSE, trim = FALSE, verbose = TRUE)
    What happens is that we use the paste function to work out the cell where the new data should start, so that we do not overwrite the old data.
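The anchor calculation can be checked without touching Google Sheets. The helper `next_anchor()` below is hypothetical (it is not part of googlesheets), and it assumes the sheet holds exactly one header row followed by the data rows, all starting in column A.

```r
# Hypothetical helper: first free cell in column A, assuming the sheet
# holds one header row followed by `rows_in_sheet` data rows
next_anchor <- function(rows_in_sheet) {
  paste0("A", rows_in_sheet + 2)
}

# 300 data rows occupy sheet rows 2 to 301, so the next batch starts in row 302
anchor <- next_anchor(300)
```

If the sheet may have been edited by hand, reading it back and counting its rows is a safer basis for the anchor than the size of the data frame in memory.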
  • 40. DOWNLOAD AND MODIFY GS DATA
        # EXTRACT: download gs data
        download <- gs_read(gs)
        # TRANSFORM: count distinct GA IDs per customer and session count
        upload <- download %>%
          group_by(CustomerID, sessions) %>%
          summarise(devices = n_distinct(GA))
        # LOAD: write the result to a new worksheet
        gs %>% gs_ws_new(ws_title = "aggregated", input = upload)
  • 41. WHICH SHOULD GIVE YOU THIS. There are many ways to do similar tasks, and the use cases are basically endless. For larger datasets we recommend that you send the data to BigQuery or another database that can handle more information. With BigQuery the approach is the same, except that it requires you to link your credit card to the account.
  • 42. THE EASY USECASE IS DATASTUDIO
  • 46. IT IS ALSO REALLY COOL FOR WEBSCRAPING
  • 47. Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML 05. Introduction to R markdown
  • 48. What is R Markdown?
    • An adaptation of general Markdown, which is used for documentation etc.
    • R Markdown makes it possible to generate different types of documents such as HTML, Word, PDF, slides etc.
    • R Markdown is really easy to write with and keeps formatting clean and simple
    • Use the cheat sheet to play around
  • 49. Example - HTML
    • To make sure that our GTM setups were GDPR compliant, we wrote a script that pulled data down from GTM and then ran through everything to ensure that it was set up with the right compliance rules.
    • Today we have this document generated once every 6 months, and it flags any issues we need to take care of
  • 51. DOING VISUALIZATIONS
    • To be able to visualize anything, we need to have the data physically downloaded on our machine
    • It also needs to be loaded whenever you run your document
        save(upload, download, file = "data.RData")
        load("data.RData")
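A minimal sketch of the save and load round trip, assuming two small data frames as hypothetical stand-ins for `upload` and `download`. It writes to a temporary file rather than `data.RData` so it can be run anywhere without leaving files behind.

```r
# Hypothetical stand-ins for the `download` and `upload` objects
download <- data.frame(CustomerID = c("c1", "c1", "c2"),
                       GA = c("ga1", "ga2", "ga3"))
upload   <- data.frame(CustomerID = c("c1", "c2"),
                       devices = c(2, 1))

# Write both objects to one file (a temp file here, to avoid touching the project folder)
data_file <- file.path(tempdir(), "data.RData")
save(upload, download, file = data_file)

# Simulate a fresh session, as when the document is knitted
rm(upload, download)

# load() restores the objects under their original names and returns those names
restored <- load(data_file)
```

Because `load()` recreates the objects by name, the R Markdown document can refer to `upload` and `download` directly after the call.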
  • 52. MAKING TABLES
    • To be able to visualize anything, we need to have the data physically downloaded on our machine
    • It also needs to be loaded whenever you run your document
        save(upload, download, file = "data.RData")
        load("data.RData")
    ```{r table, echo=TRUE, message=FALSE, warning=FALSE}
    library(ggplot2)
    library(kableExtra)
    library(dplyr)
    library(knitr)
    # render the first rows as an HTML table with default Bootstrap styling
    head(upload) %>%
      kable(format = "html") %>%
      kable_styling()
    ```
  • 53. MAKING TABLES. The cool thing here is that you can apply any HTML and CSS styling to your documents, so basically anything that is possible within HTML and CSS is possible here.
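  • 56. USING GGPLOT2
        library(dplyr)
        library(ggplot2)
        # raw data, one point per row
        ggplot(ID, aes(x = date, y = pageviews)) + geom_line()
        # make sure the columns have the right types before aggregating
        ID$date <- as.Date(ID$date)
        ID2 <- ID
        ID2$pageviews <- as.numeric(ID2$pageviews)
        ID2$sessions <- as.numeric(ID2$sessions)
        # total sessions per day (printed, not stored)
        ID2 %>% group_by(date) %>% summarise(sum(sessions))
        # total pageviews per day, then plot the daily series
        ID2 <- ID2 %>% group_by(date) %>% summarise(pageviews = sum(pageviews))
        ggplot(ID2, aes(x = date, y = pageviews)) + geom_line()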
  • 60. PLAY AROUND WITH R MARKDOWN AND PLOTS – GOOGLE IS YOUR FRIEND FOR SEEING THE POSSIBILITIES!
  • 61. How can you do as little as possible? 06. Schedule tasks
  • 62. SCHEDULA(R). Tools → Addins → Browse Addins. Choose the file that should be executed by the scheduler. Choose the frequency, startDate and startTime at which the file shall be executed.
  • 65. HOW TO STOP IT AGAIN!
    • On PC: open Task Scheduler, then see and kill the process.
    • On Mac: begin Automator. Click “Applications” on the Dock of your Mac. ...