This document provides an agenda and overview for a class on using R to work with data. The class covers topics like calculating, joining, and grouping data in R; using R to build databases in Google Sheets; and introducing R Markdown for automating reporting. Specific sessions will demonstrate generating fake data from GitHub, data transformations with dplyr, different types of joins, uploading/downloading from Google Sheets, and creating dashboards in DataStudio.
2. AGENDA
01 INTRODUCTION
Who is impact extend, and how do we work with data?
02 WHAT MAKES R SO AWESOME?
Pros and cons of using R to extract, transform and load data, based on use cases.
03 CALCULATING, JOINING AND GROUPING DATA
Unifying and transforming data, always.
04 CREATE, WRITE AND READ FROM GOOGLE SHEETS
Using R to build a free database to be used for reporting, data storage or Google Data Studio.
05 INTRODUCTION TO R MARKDOWN
Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML.
06 SCHEDULE R SCRIPTS ON YOUR MACHINE
How can you do as little as possible?
3. 01. INTRODUCTION
Who is impact extend, and how do we work with data?
4. About me
• Copenhagen based
• Lead analyst at IMPACT EXTEND
• 2 years of working with R
• 5 years of GTM and GA work
• 2 years of random SEO and website work
5. About Rasmus
• Kickass analyst in terms of understanding humans
• BI specialist, using PowerBI to do crazy dashboards
• Former Google Analytics class educator
• The nerd who is always curious about taking it to the next step
… Also, he built an entire GA validator by himself, which is quite cool
6. Market leader in commerce
100% focus on digital commerce | Long customer relations | 7 x Gazelle
ESTABLISHED IN 1998 | 126 EMPLOYEES | AARHUS – COPENHAGEN – LISBOA
Established in 2018 | 150+ Employees | Aarhus – Copenhagen – Lisbon
Part of IMPACT A/S | Clients: largest retailers in the Nordics | Focus is on data-driven marketing
7. OUR OFFERINGS
ATTRACT AND SELL: TRAFFIC & INSIGHTS
SERVE AND GROW: DIALOGUE & LOYALTY
DATA AND INSIGHTS: DMP & INTELLIGENCE
DIGITAL MARKETING STRATEGY
Full-service approach with combined services delivering holistic solutions to address Marketing’s primary pains and objectives with digital marketing strategies.
9. OUR APPROACH TO WORK WITH DATA
Behavioral data: User ID, Sessions, Cross-device
CRM data: User ID, Purchase, Channels (web/store)
Impression data: User ID, Conversions, Store visits
Engagement data: User ID, Mails, Open/click
Marketing DB: Data consolidation, Segmentation, Engagement, LTV
Segmentation, Personalization, Dynamic content, Triggers
11. 02. WHAT MAKES R SO AWESOME?
Pros and cons of using R to extract, transform and load data, based on use cases.
17. Extract
• Get data from an API
• Scrape web data
• Work with normal worksheets
Transform
• Do all your calculations automatically
• Split data apart and assemble it with other data
• Do huge workloads fast, as there is no traditional GUI like Excel
Load
• Send data to databases
• Create dashboards
• Make automated reports
Get the data the way you need it. Make sure that it looks like you want it. Do whatever you need your data to do.
19. GENERATE FAKE DATA FROM A GITHUB REPOSITORY
install.packages("RCurl")
library(RCurl)
#go to https://bit.ly/2PSb6FB and copy-paste the URL
url <- "the pasted URL"
script <- getURL(url, ssl.verifypeer = FALSE)
eval(parse(text = script))
This should give you 300 rows of data that we can use for various calculations and modifications.
22. WITH THE IDs WE CAN CHECK FOR DUPLICATES
This is to determine whether the same user appears more than once in the dataset. Knowing that we have the same user more than once, we can aggregate data by user.
duplicated(ID$CustomerID)
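As a minimal, self-contained sketch (the toy CustomerID values are assumptions, not the workshop data), duplicated() returns a logical vector that can be summed or used to filter:

```r
# hypothetical customer table where customer 2 appears twice
ID <- data.frame(CustomerID = c(1, 2, 2, 3),
                 GA = c("a", "b", "c", "d"))

dup <- duplicated(ID$CustomerID)   # TRUE only at the repeated row
sum(dup)                           # number of repeat rows
unique(ID$CustomerID[dup])         # customers that occur more than once
```

Summing the logical vector is a quick sanity check before aggregating per user.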
23. TO UNDERSTAND HOW THIS DATA LOOKS AGGREGATED ON A USER LEVEL, IN EXCEL IT WOULD LOOK LIKE THIS
Here, the Google Analytics cookie ID is assembled with the visits to the site each day. As each customer ID is connected to a GA cookie ID, we can actually see how many devices each user goes through within a user journey.
24. TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA
PIVOT BY ID WILL PRODUCE THIS
To find out how many devices people are using, we can group them by customer ID and count the distinct Google Analytics IDs:
#group by customer and count distinct devices
ID %>%
group_by(CustomerID) %>%
summarise(devices = n_distinct(GA))
25. TO DO THE SAME, DPLYR HAS SOME GREAT WAYS OF WORKING WITH DATA
PIVOT BY SESSIONS WILL PRODUCE THIS
To find out how many sessions the users had in total, you can sum the sessions column per customer:
#group by customer and total the sessions
ID %>%
group_by(CustomerID) %>%
summarise(sessions = sum(sessions))
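The same two aggregations can be checked with base R on a toy table (column names taken from the slides; the values are assumptions):

```r
# hypothetical data: customer 1 was seen on two devices, 5 sessions in total
ID <- data.frame(CustomerID = c(1, 1, 2),
                 GA = c("a", "b", "c"),
                 sessions = c(3, 2, 4))

# devices per customer, like summarise(devices = n_distinct(GA))
tapply(ID$GA, ID$CustomerID, function(g) length(unique(g)))
# total sessions per customer, like summarise(sessions = sum(sessions))
tapply(ID$sessions, ID$CustomerID, sum)
```

Running both against the same toy data is an easy way to convince yourself the dplyr pipeline does what you expect.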
26. JOINS
INNER JOIN | LEFT JOIN | RIGHT JOIN | FULL JOIN
inner_join(): return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
left_join(): return all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
right_join(): return all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
full_join(): return all rows and all columns from both x and y. Where there are no matching values, returns NA for the one missing. Note: FULL OUTER JOIN can potentially return very large result sets!
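To make the four behaviours concrete, here is a self-contained sketch using base R's merge(), whose all.x/all.y flags mirror the dplyr verbs (the toy tables and their values are assumptions):

```r
# two hypothetical tables sharing a UserID key
x <- data.frame(UserID = c(1, 2, 3), sessions  = c(5, 2, 7))
y <- data.frame(UserID = c(2, 3, 4), purchases = c(1, 0, 3))

merge(x, y, by = "UserID")                # inner: UserIDs 2 and 3 only
merge(x, y, by = "UserID", all.x = TRUE)  # left: UserIDs 1-3, NA purchases for 1
merge(x, y, by = "UserID", all.y = TRUE)  # right: UserIDs 2-4, NA sessions for 4
merge(x, y, by = "UserID", all = TRUE)    # full: UserIDs 1-4, NAs on both sides
```

With dplyr loaded, the equivalent calls are inner_join(x, y, by = "UserID"), left_join(), right_join() and full_join().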
27. JOINS
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
semi_join(x, y, by = NULL, copy = FALSE, ...)
anti_join(x, y, by = NULL, copy = FALSE, ...)
x, y: tbls to join.
by: a character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, simply explicitly list the variables that you want to join). To join by different variables on x and y, use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
copy: if x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation, so you must opt into it.
suffix: if there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
29. INNER JOIN
inner_join(): return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
What does this mean? We join the two tables on the rows where the UserID is present in both.
inner_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
30. LEFT JOIN
left_join(): return all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
What does this mean? We keep every row of table 1 and attach the matching columns from table 2.
left_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
31. RIGHT JOIN
right_join(): return all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
What does this mean? We keep every row of table 2 and attach the matching columns from table 1.
right_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
32. FULL JOIN
full_join(): return all rows and all columns from both x and y. Where there are no matching values, returns NA for the one missing.
What does this mean? We take table 1 and join it with table 2, keeping every row from both.
full_join(Dataset1, Dataset2, by = "UserID", copy = FALSE, suffix = c(".x", ".y"))
33. 04. CREATE, WRITE AND READ FROM GOOGLE SHEETS
Using R to build a free database to be used for reporting, data storage or Google Data Studio.
34. AUTHENTICATION
• We use the googleAuthR package created by Mark Edmondson
• This allows us to generate a token which we can use to work with Google's products
#install and load googlesheets
install.packages("googlesheets")
library(googlesheets)
googlesheets::gs_auth()
35. CREATE A GOOGLE SHEET
#create a sheet, register it by title, and open it in the browser
gs_new(title = "impactextendrclass")
gs <- gs_title("impactextendrclass")
gs_browse(gs, ws = 1)
39. LET'S ADD SOME MORE DATA TO IT!
eval(parse(text = script))
n <- paste("A", nrow(ID), sep = "")
gs_edit_cells(gs, ws = 1, input = ID, anchor = n, byrow = FALSE,
col_names = FALSE, trim = FALSE, verbose = TRUE)
What happens is that we use the paste() function to build the cell anchor where the new data should start, so we don't break the old data.
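The anchor arithmetic depends on how the sheet was first written; as a self-contained sketch (the row count and header offset are assumptions), building an A1-style anchor string looks like this:

```r
# hypothetical: 300 data rows already sit under a header row,
# so rows 2..301 are occupied and the next free row is 302
existing_rows <- 300
anchor <- paste0("A", existing_rows + 2)
anchor  # "A302"
```

Check the anchor against your actual sheet before writing, since an off-by-one here overwrites the last row of old data.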
40. DOWNLOAD AND MODIFY GS DATA
EXTRACT | TRANSFORM | LOAD
#download gs data
download <- gs_read(gs)
upload <- download %>%
group_by(CustomerID, sessions) %>%
summarise(devices = n_distinct(GA))
gs %>%
gs_ws_new(ws_title = "aggregated", input = upload)
41. WHICH SHOULD GIVE YOU THIS
There are many ways to do similar tasks, and the use cases are basically endless. For larger datasets we recommend sending the data to BigQuery or another database that can handle more information. With BigQuery the approach is the same, except that it requires you to link your credit card to the account.
47. 05. INTRODUCTION TO R MARKDOWN
Automate your reporting framework by leveraging R Markdown, Shiny and simple HTML.
48. What is R Markdown?
• An adaptation of general Markdown, which is used for documentation etc.
• R Markdown makes it possible to generate different types of documents such as HTML, Word, PDF, slides etc.
• R Markdown is really easy to write with and keeps formatting clean and simple
• Use the cheat sheet to play around
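To make the bullets concrete, a minimal R Markdown file might look like this (the title, output format and chunk contents are illustrative; knitting it produces an HTML document):

````markdown
---
title: "Example report"
output: html_document
---

## Sessions per customer

```{r, echo = FALSE, message = FALSE}
library(dplyr)
load("data.RData")
head(upload)
```
````

The YAML header controls the output format, so switching `html_document` to `word_document` or `pdf_document` regenerates the same report in another format.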
49. Example - HTML
• To make sure that our GTM setups were GDPR compliant, we wrote a script that took data down from GTM and then ran through everything to ensure it was set up with the right compliance rules.
• Today we have this document generated once every 6 months, and it will flag any issues we need to take care of.
51. DOING VISUALIZATIONS
• To be able to visualize anything, we need to have the data physically downloaded on our machine
• It also needs to be loaded whenever you run your document
save(upload, download, file = "data.RData")
load("data.RData")
52. MAKING TABLES
save(upload, download, file = "data.RData")
load("data.RData")
```{r table, echo=TRUE, message=FALSE, warning=FALSE}
library(ggplot2)
library(kableExtra)
library(dplyr)
library(knitr)
head(upload) %>%
kable() %>%
kable_styling()
```
53. MAKING TABLES
The cool thing here is that you can apply any HTML and CSS styling to your documents. This means that you can do basically anything that is possible within HTML and CSS.
60. PLAY AROUND WITH R MARKDOWN AND PLOTS – GOOGLE IS YOUR FRIEND FOR SEEING THE POSSIBILITIES!
61. 06. SCHEDULE TASKS
How can you do as little as possible?
62. SCHEDULA(R)
Tools → Addins → Browse Addins
Choose the file that should be executed, then choose the frequency, start date and start time at which the file shall be executed.
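An addin like this is provided by the taskscheduleR package on Windows (cronR plays the same role on macOS/Linux), so the same schedule can also be created from code; in this sketch the task name, script path and time are assumptions:

```r
# schedule myscript.R to run daily at 09:00 (hypothetical path and name);
# guarded so the sketch is a no-op when taskscheduleR is not installed
if (requireNamespace("taskscheduleR", quietly = TRUE)) {
  taskscheduleR::taskscheduler_create(
    taskname  = "daily_report",
    rscript   = "C:/scripts/myscript.R",
    schedule  = "DAILY",
    starttime = "09:00"
  )
}
```

Creating the task from code rather than the addin makes the schedule itself reproducible and versionable.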
65. HOW TO STOP IT AGAIN!
• On PC: open Task Scheduler to see and kill the process.
• On Mac: begin with Automator. Click "Applications" on the Dock of your Mac. ...