Rclass

R intro class for MeasureCamp London 2018

Published in: Data & Analytics

  1. 1. IMPACT EXTEND MEASURECAMP R CLASS September 2018
  2. 2. AGENDA
     01 INTRODUCTION: Who is IMPACT EXTEND, and how do we work with data?
     02 WHAT MAKES R SO AWESOME? Pros and cons of using R to extract, transform and load data, based on use cases.
     03 CALCULATING, JOINING AND GROUPING DATA: Unifying and transforming data, always.
     04 CREATE, WRITE AND READ FROM GOOGLE SHEET: Using R to build a free database to be used for reporting, data storage or Google Data Studio.
     05 INTRODUCTION TO R MARKDOWN: Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML.
     06 SCHEDULE R SCRIPTS ON YOUR MACHINE: How can you do as little as possible?
  3. 3. 01. INTRODUCTION Who is IMPACT EXTEND, and how do we work with data?
  4. 4. About me • Copenhagen based • Lead analyst at IMPACT EXTEND • 2 years of working with R • 5 years of GTM and GA work • 2 years of assorted SEO and website work
  5. 5. About Rasmus • Kickass analyst when it comes to understanding humans • BI specialist who uses Power BI to build crazy dashboards • Former Google Analytics class educator • The nerd who is always curious about taking it to the next step. He also built an entire GA validator by himself, which is quite cool
  6. 6. 100% focus on digital commerce; Long customer relations; 7 x Gazelle; AARHUS – COPENHAGEN - LISBOA; 126 EMPLOYEES; ESTABLISHED IN 1998; Market leader in commerce; Established in 2018; 150+ Employees; Aarhus - Copenhagen - Lisbon; Part of IMPACT A/S; Clients: Largest retailers in the Nordics; Focus is on data-driven marketing
  7. 7. OUR OFFERINGS: ATTRACT AND SELL (TRAFFIC & INSIGHTS); SERVE AND GROW (DIALOGUE & LOYALTY); DATA AND INSIGHTS (DMP & INTELLIGENCE); DIGITAL MARKETING STRATEGY. Full-service approach with combined services delivering holistic solutions to address Marketing’s primary pains and objectives with digital marketing strategies
  8. 8. OUR PURPOSE AS AN AGENCY
  9. 9. OUR APPROACH TO WORK WITH DATA
     BEHAVIORAL DATA: User ID, Sessions, Cross-device
     CRM DATA: User ID, Purchase, Channels (web/store)
     IMPRESSION DATA: User ID, Conversions, Store Visits
     ENGAGEMENT DATA: User ID, Mails, Open/click
     MARKETING DB: Data consolidation, Segmentation, Engagement, LTV
     Segmentation, Personalization, Dynamic content, Triggers
  10. 10. Who R you? And what is your experience? And why did you come here today?
  11. 11. 02. WHAT MAKES R SO AWESOME? Pros and cons of using R to extract, transform and load data, based on use cases.
  12. 12. Extract: Get data from APIs; Scrape web data; Work with normal worksheets. Get the data the way you need it.
     Transform: Do all your calculations automatically; Split data apart and assemble it with other data; Do huge workloads fast, as there is no traditional GUI like Excel. Make sure that it looks like you want it.
     Load: Send data to databases; Create dashboards; Make automated reports. Do whatever you need your data to do.
  13. 13. 03. CALCULATING, JOINING AND GROUPING DATA Unifying and transforming data
  14. 14. GENERATE FAKE DATA FROM A GITHUB REPOSITORY
      install.packages("RCurl")
      library(RCurl)
      # go to https://bit.ly/2PSb6FB and copy-paste the URL
      url <- "thepasted url"
      # note: ssl.verifypeer = FALSE turns off certificate verification
      script <- getURL(url, ssl.verifypeer = FALSE)
      # run the downloaded script, which creates the data
      eval(parse(text = script))
      This should give you 300 rows of data that we can use for various calculations and modifications
  15. 15. GENERATE FAKE DATA FROM A GITHUB REPOSITORY
  16. 16. GENERATE FAKE DATA FROM A GITHUB REPOSITORY
  17. 17. WITH THE IDS WE CAN CHECK FOR DUPLICATES This determines whether any users appear more than once in the dataset. Once we know that the same user occurs more than once, we can aggregate the data by user
      duplicated(ID$CustomerID)
  18. 18. TO UNDERSTAND HOW THIS DATA LOOKS AGGREGATED ON A USER LEVEL, IN EXCEL IT WOULD LOOK LIKE THIS Here, the Google Analytics cookie ID is combined with the visits to the site each day. As each customer ID is connected to a GA cookie ID, we can see how many devices each user goes through within a user journey
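A minimal sketch of the duplicate check, run on a small made-up stand-in for the generated `ID` data frame. Only the `CustomerID` column name comes from the slides; the values are illustrative. `duplicated()` flags the second and later occurrences of a value, so the flagged IDs are the users that appear more than once.

```r
# Toy stand-in for the generated data; the values are illustrative only.
ID <- data.frame(
  CustomerID = c("c1", "c2", "c1", "c3", "c2", "c1"),
  stringsAsFactors = FALSE
)

# TRUE marks every repeat occurrence of a CustomerID (never the first one).
is_repeat <- duplicated(ID$CustomerID)

# Customers that occur more than once in the data.
repeat_customers <- unique(ID$CustomerID[is_repeat])

# Number of rows per customer.
rows_per_customer <- table(ID$CustomerID)

print(is_repeat)
print(repeat_customers)
print(rows_per_customer)
```

Because `duplicated()` never flags the first occurrence, `sum(is_repeat)` counts the extra rows rather than the number of repeat customers.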
  19. 19. TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA. PIVOT BY ID WILL PRODUCE THIS
      # count distinct devices (GA cookie IDs) per customer
      ID %>% group_by(CustomerID) %>% summarise(devices = n_distinct(GA))
      To find out how many devices people are using, we can group the rows by customer ID and count the distinct Google Analytics IDs
  20. 20. TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA. PIVOT BY SESSIONS WILL PRODUCE THIS
      # total sessions per customer
      ID %>% group_by(CustomerID) %>% summarise(sessions = sum(sessions))
      To find out how many sessions the users had in total, you can use this
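The two pivots can be sketched in one `summarise()` call. The data below is a made-up stand-in for `ID`; the column names `CustomerID`, `GA` and `sessions` are the ones used on the slides, and the values are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for ID; the values are made up.
ID <- tibble(
  CustomerID = c("c1", "c1", "c1", "c2", "c2", "c3"),
  GA         = c("ga1", "ga2", "ga1", "ga3", "ga3", "ga4"),
  sessions   = c(2, 1, 3, 1, 4, 2)
)

# One row per customer: distinct GA cookie IDs (devices) and total sessions.
per_customer <- ID %>%
  group_by(CustomerID) %>%
  summarise(
    devices  = n_distinct(GA),
    sessions = sum(sessions)
  )

print(per_customer)
```

Counting with `n_distinct()` rather than `n()` matters here: a customer who returns on the same device adds rows but not devices.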
  21. 21. JOINS
      INNER JOIN: inner_join() returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
      LEFT JOIN: left_join() returns all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
      RIGHT JOIN: right_join() returns all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
      FULL JOIN: full_join() returns all rows and all columns from both x and y. Where there are no matching values, it returns NA for the missing side. Note: FULL OUTER JOIN can potentially return very large result sets!
  22. 22. JOINS
      inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
      left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
      right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
      full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
      semi_join(x, y, by = NULL, copy = FALSE, ...)
      anti_join(x, y, by = NULL, copy = FALSE, ...)
      x, y: tbls to join
      by: a character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, simply explicitly list the variables that you want to join). To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
      copy: If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
      suffix: If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
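A short sketch of the `by` and `suffix` arguments, plus the two filtering joins from the signature list. The tables `crm` and `ga` and all their values are hypothetical; they only illustrate a key column that is named differently in each table and a non-key column that exists in both.

```r
library(dplyr)

# Hypothetical tables: the key is called UserID in one and user in the other,
# and both carry a non-key column named value.
crm <- tibble(UserID = c("u1", "u2", "u3"), value = c(100, 250, 80))
ga  <- tibble(user   = c("u2", "u3", "u4"), value = c(5, 9, 2))

# A named vector in `by` matches crm$UserID to ga$user;
# `suffix` disambiguates the two value columns.
joined <- inner_join(crm, ga,
                     by = c("UserID" = "user"),
                     suffix = c(".crm", ".ga"))

# Filtering joins keep only columns from x.
in_both  <- semi_join(crm, ga, by = c("UserID" = "user"))  # crm rows with a match
only_crm <- anti_join(crm, ga, by = c("UserID" = "user"))  # crm rows without one

print(joined)
print(only_crm)
```

Listing the key explicitly also avoids the natural-join default, which here would try to join on `value` because it is the only shared column name.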
  23. 23. CRM DATA Google Analytics
  24. 24. INNER JOIN inner_join() returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned. What does this mean? We join the two tables and keep only the rows whose UserID is present in both.
      inner_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
  25. 25. LEFT JOIN left_join() returns all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned. What does this mean?
      left_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
  26. 26. RIGHT JOIN right_join() returns all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned. What does this mean?
  27. 27. FULL JOIN full_join() returns all rows and all columns from both x and y. Where there are no matching values, it returns NA for the missing side. What does this mean? We take table 1 and join it with table 2, keeping every row from both
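To make the difference between the four joins concrete, here is a sketch on two tiny made-up tables. `Dataset1` and `Dataset2` reuse the names from the slides, but the columns `revenue` and `sessions` and all values are assumptions chosen so that one user is missing on each side.

```r
library(dplyr)

# u1 exists only in Dataset1, u4 only in Dataset2; u2 and u3 are in both.
Dataset1 <- tibble(UserID = c("u1", "u2", "u3"), revenue  = c(100, 250, 80))
Dataset2 <- tibble(UserID = c("u2", "u3", "u4"), sessions = c(5, 9, 2))

inner <- inner_join(Dataset1, Dataset2, by = "UserID")  # only matching users
left  <- left_join(Dataset1, Dataset2, by = "UserID")   # every row of Dataset1
right <- right_join(Dataset1, Dataset2, by = "UserID")  # every row of Dataset2
full  <- full_join(Dataset1, Dataset2, by = "UserID")   # every row of both

# Row counts show which users each join keeps.
print(c(inner = nrow(inner), left = nrow(left),
        right = nrow(right), full = nrow(full)))

# Unmatched rows are kept with NA in the columns from the other table.
print(left)
print(right)
```

Comparing `nrow()` before and after a join is a cheap sanity check: an unexpected increase usually means the key is not unique in one of the tables.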
  28. 28. 04. CREATE, WRITE AND READ FROM GOOGLE SHEET Using R to build a free database to be used for reporting, data storage or Google Data Studio.
  29. 29. AUTHENTICATION • We use the googleAuthR package created by Mark Edmondson • This allows us to generate a token which we can use to work with Google's products
      # install and load googlesheets
      install.packages("googlesheets")
      library(googlesheets)
      googlesheets::gs_auth()
  30. 30. CREATE A GOOGLE SHEET
      gs_new(title = "impactextendrclass")
      gs <- gs_title("impactextendrclass")
      gs_browse(gs, ws = 1)
  31. 31. SHOULD LOOK SOMETHING LIKE THIS
  32. 32. LET'S ADD SOME DATA TO IT!
      library(dplyr)
      gs %>% gs_edit_cells(ws = 1, input = ID, trim = TRUE)
  33. 33. LET'S ADD SOME DATA TO IT!
  34. 34. LET'S ADD SOME MORE DATA TO IT!
      eval(parse(text = script))
      # first free row: the header row plus nrow(ID) data rows are already written
      n <- paste0("A", nrow(ID) + 2)
      gs_edit_cells(gs, ws = 1, input = ID, anchor = n, byrow = FALSE, col_names = FALSE, trim = FALSE, verbose = TRUE)
      We build the anchor cell from the number of rows already written, so the new data is appended below the old data instead of overwriting it
  35. 35. DOWNLOAD AND MODIFY GS DATA: EXTRACT, TRANSFORM, LOAD
      # download gs data
      download <- gs_read(gs)
      # aggregate: distinct devices per customer and session count
      upload <- download %>% group_by(CustomerID, sessions) %>% summarise(devices = n_distinct(GA))
      # write the result to a new worksheet
      gs %>% gs_ws_new(ws_title = "aggregated", input = upload)
  36. 36. WHICH SHOULD GIVE YOU THIS There are many ways to do similar tasks, and the use cases are basically endless. For larger datasets we recommend that you send the data to BigQuery or another database that can handle more data. With BigQuery the approach is the same, except that you have to link a credit card to the account
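The anchor arithmetic can be checked without touching a Google Sheet. The sketch below assumes the first write put a header in row 1 followed by the data rows; `next_anchor()` is a hypothetical helper written for this illustration, not a googlesheets function.

```r
# Hypothetical helper: the cell where the next block of rows should start,
# given how many data rows are already in the sheet.
next_anchor <- function(n_rows_written, header = TRUE, column = "A") {
  # Rows in use = data rows + optional header row; the next free row is one below.
  first_free_row <- n_rows_written + as.integer(header) + 1L
  paste0(column, first_free_row)
}

# 300 data rows under a header occupy rows 1..301, so appending starts at A302.
anchor <- next_anchor(300)
print(anchor)

# Without a header row the same data occupies rows 1..300.
print(next_anchor(300, header = FALSE))
```

The resulting string is what would be passed as `anchor` to `gs_edit_cells()`, together with `col_names = FALSE` so the header is not written a second time.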
  37. 37. THE EASY USECASE IS DATASTUDIO
  38. 38. THE EASY USECASE IS DATASTUDIO
  39. 39. THE EASY USECASE IS DATASTUDIO
  40. 40. THE EASY USECASE IS DATASTUDIO
  41. 41. IT IS ALSO REALLY COOL FOR WEBSCRAPING
  42. 42. 05. Introduction to R Markdown Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML
  43. 43. What is R Markdown? • An adaptation of general Markdown, which is used for documentation etc. • R Markdown makes it possible to generate different types of documents such as HTML, Word, PDF and slides • R Markdown is really easy to write and keeps formatting clean and simple • Use the cheat sheet to play around
  44. 44. Example - HTML • To make sure that our GTM setups were GDPR compliant, we wrote a script that pulled the data from GTM and ran through everything to check that it was set up with the right compliance rules. • Today this document is generated once every 6 months, and it flags any issues we need to take care of
  45. 45. Example - Slides
  46. 46. DOING VISUALIZATIONS • To be able to visualize anything we need to have the data saved locally on our machine • It also needs to be loaded whenever you run your document
      save(upload, download, file = "data.RData")
      load("data.RData")
  47. 47. MAKING TABLES • To be able to visualize anything we need to have the data saved locally on our machine • It also needs to be loaded whenever you run your document
      save(upload, download, file = "data.RData")
      load("data.RData")
```{r table, echo=TRUE, message=FALSE, warning=FALSE}
library(ggplot2)
library(kableExtra)
library(dplyr)
library(knitr)

# render the first rows as a styled HTML table
head(upload) %>%
  kable(format = "html") %>%
  kable_styling()
```
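A small sketch of the save/load round trip, using made-up stand-ins for `upload` and `download` and a temporary file instead of `data.RData` in the working directory (both are assumptions for illustration). `load()` restores the objects under their original names and returns those names.

```r
# Made-up stand-ins for the objects created earlier in the session.
upload   <- data.frame(CustomerID = c("c1", "c2"), devices = c(2L, 1L))
download <- data.frame(CustomerID = c("c1", "c1", "c2"),
                       GA = c("ga1", "ga2", "ga3"))

# Write both objects to one file; tempdir() keeps the example self-contained.
path <- file.path(tempdir(), "data.RData")
save(upload, download, file = path)

# Simulate a fresh session, as when the R Markdown document is knitted.
rm(upload, download)

# load() recreates the objects and returns the names it restored.
loaded <- load(path)
print(loaded)
print(upload)
```

In an R Markdown document the `load()` call belongs in an early chunk, because knitting runs in a clean session that does not see objects from the interactive workspace.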
  48. 48. MAKING TABLES The cool thing here is that you can do any html and css styling to your documents. This means that you can do basically anything that is possible within HTML and CSS
  49. 49. USING GGPLOT2
  50. 50. USING GGPLOT2
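  51. 51. USING GGPLOT2
      library(dplyr)
      library(ggplot2)
      # convert the date column first so the x axis is a real date scale
      ID$date <- as.Date(ID$date)
      # raw data: one point per row
      ggplot(ID, aes(x = date, y = pageviews)) + geom_line()
      ID2 <- ID
      ID2$pageviews <- as.numeric(ID2$pageviews)
      ID2$sessions <- as.numeric(ID2$sessions)
      # prints total sessions per day (result is not stored)
      ID2 %>% group_by(date) %>% summarise(sum(sessions))
      # aggregate to one row per day before plotting
      ID2 <- ID2 %>% group_by(date) %>% summarise(pageviews = sum(pageviews))
      ggplot(ID2, aes(x = date, y = pageviews)) + geom_line()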
  52. 52. USING GGPLOT2 ggplot(ID,aes(x=date,y=pageviews)) + geom_line()
  53. 53. USING GGPLOT2 ggplot(ID2,aes(x=date,y=pageviews)) + geom_line()
  54. 54. USING GGPLOT2 ggplot(ID2,aes(pageviews)) + geom_bar()
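The aggregate-then-plot step can be sketched end to end on a few made-up rows. The column names `date` and `pageviews` follow the slides; the values, and the assumption that both columns arrive as character strings, are illustrative.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for ID with several rows per day, stored as text.
ID <- tibble(
  date      = c("2018-09-01", "2018-09-01", "2018-09-02",
                "2018-09-03", "2018-09-03"),
  pageviews = c("10", "5", "7", "3", "4")
)

# Convert the types, then collapse to one row per day.
daily <- ID %>%
  mutate(date = as.Date(date), pageviews = as.numeric(pageviews)) %>%
  group_by(date) %>%
  summarise(pageviews = sum(pageviews))

# With one value per date, geom_line() draws a single clean line.
p <- ggplot(daily, aes(x = date, y = pageviews)) + geom_line()

print(daily)
print(p)
```

Plotting the unaggregated rows with `geom_line()` connects several points that share a date, which is what produces the jagged vertical segments in the raw-data plot.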
  55. 55. PLAY AROUND WITH R MARKDOWN AND PLOTS – GOOGLE IS YOUR FRIEND FOR SEEING THE POSSIBILITIES!
  56. 56. 06. Schedule tasks How can you do as little as possible?
  57. 57. SCHEDULA(R) Tools → Addins → Browse Addins. Choose the file that should be executed by the scheduler. Choose the frequency, start date and start time at which the file should be executed.
  58. 58. HOW TO STOP IT AGAIN! • On PC: open Task Scheduler, then find and kill the process. • On Mac: open Automator. Click “Applications” on the Dock of your Mac. ...
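Scheduling and unscheduling can also be done from code. The sketch below assumes the packages behind the RStudio addins are taskscheduleR (Windows) and cronR (Mac/Linux); the slides do not name them. The task id `rclass_report`, the daily frequency and the 07:00 start time are placeholders. Both steps are wrapped in functions so that nothing is scheduled just by running the block.

```r
# Sketch only: the wrappers below are hypothetical helpers, and the task id,
# frequency and start time are placeholder values.
schedule_daily <- function(script_path, task_id = "rclass_report") {
  if (.Platform$OS.type == "windows") {
    # Registers a job in the Windows Task Scheduler.
    taskscheduleR::taskscheduler_create(
      taskname  = task_id,
      rscript   = script_path,
      schedule  = "DAILY",
      starttime = "07:00"
    )
  } else {
    # Builds an Rscript command line and adds it to the user's crontab.
    cmd <- cronR::cron_rscript(script_path)
    cronR::cron_add(command = cmd, frequency = "daily",
                    at = "07:00", id = task_id)
  }
}

# Removing the job by its id stops all future runs.
unschedule <- function(task_id = "rclass_report") {
  if (.Platform$OS.type == "windows") {
    taskscheduleR::taskscheduler_delete(taskname = task_id)
  } else {
    cronR::cron_rm(id = task_id)
  }
}
```

A call such as `schedule_daily("/path/to/report.R")` would register the job and `unschedule()` would remove it again; the path is a placeholder.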
