Datasets using R-Studio
Usha Rani Singh
Datasets for cars
A dataset is a collection of related information that can be
analyzed to derive results
A dataset can hold information in various forms, so it isn't
always straightforward for the analyst to extract the data and
present it to the business
Preparing Dataset for cars
Preparing and analyzing the dataset is very important for any
threat information, which helps to provide accurate data
We have to consider the data that provides the most value or is
most relevant to the problem
Categorize the data into regression, classification, clustering,
and ranking
It is difficult to establish a data collection mechanism, because
data is scattered across various forms and departments
We have to make the data consistent
The data sample should be reduced while still containing the
required information
Preparing Dataset for cars
We have to clean the data so that processing is faster and more
accurate
Complex datasets have to be decomposed into multiple parts
Data normalization has to be performed to improve the quality
of the data
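The cleaning and normalization steps above can be sketched in base R. This is only an illustration under stated assumptions: it uses the built-in `mtcars` data as a stand-in for the cars dataset, the missing value is injected artificially, and min-max scaling is just one possible choice of normalization.

```r
# Minimal sketch of cleaning and normalizing a car dataset (base R only).
# `mtcars` ships with R; the NA injected below is artificial, for illustration.
cars_raw <- mtcars[, c("mpg", "hp", "wt")]
cars_raw$hp[3] <- NA                      # simulate a missing value

# Cleaning: drop rows that contain missing values
cars_clean <- na.omit(cars_raw)

# Normalization: rescale each column to the [0, 1] range (min-max scaling)
min_max <- function(v) (v - min(v)) / (max(v) - min(v))
cars_norm <- as.data.frame(lapply(cars_clean, min_max))

summary(cars_norm)
```

After rescaling, every column runs from 0 to 1, so no single variable dominates a later distance-based analysis merely because of its units.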
R Studio for dataset
Pie Diagram for data set
ggplot_1 for dataset
ggplot_2 for data set
ggplot_3 for dataset
Dataset used
Thank you!
Week 10 – Analysing Data sets in RapidMiner
The data sets used for this week's analysis relate to the CSRIC
best practices:
The CSRIC Best Practices Search Tool allows you to search
CSRIC's collection of Best Practices using a variety of criteria
including Network Type, Industry Role, Keywords, Priority
Levels, and BP Number. The Communications Security,
Reliability and Interoperability Council's (CSRIC) mission is to
provide recommendations to the FCC to ensure, among other
things, optimal security and reliability of communications
systems, including telecommunications, media, and public
safety. CSRIC’s members focus on a range of public safety and
homeland security-related communications matters, including:
(1) the reliability and security of communications systems and
infrastructure, particularly mobile systems; (2) 911, Enhanced
911 (E911), and Next Generation 911 (NG911); and (3)
emergency alerting.
The CSRIC's recommendations will address the prevention and
remediation of detrimental cyber events, the development of
best practices to improve overall communications reliability,
the availability and performance of communications services
and emergency alerting during natural disasters, terrorist
attacks, cyber security attacks or other events that result in
exceptional strain on the communications infrastructure, the
rapid restoration of communications services in the event of
widespread or major disruptions and the steps communications
providers can take to help secure end-users and servers.
I have used RapidMiner to analyze the data set:
The statistical view of various names, types and attributes
related to the data set.
Visualization of public safety vs prioritization
Overall prioritization pie chart
Bar graph comparing various network types and internet/data
CustomerID,Gender,Age,Annual Income (k$),Spending Score
Mall Customer Segmentation Data Analysis.pptx
Mall Customer Segment Data Analysis using RFM
Vivek Ijjagiri
Mall Customer Segmentation data
Mall Customer Segment analysis data using RFM
Problem Solving
When we want to increase sales, plan marketing spend, or
formulate a new promotion, as retail marketers we have to be
careful about how we segment and target customers. It would be
a waste of time and money if, for example, we launched an ad
campaign aimed broadly at a lot of customers. Such untargeted
marketing and advertising is not likely to have a high
conversion rate and may even hurt our company's value.
Retailers now use sophisticated strategies to segment their
customers and target their marketing efforts to these segments.
RFM analysis is one such popular customer segmentation technique
that can help retailers maximize the return on their advertising
Why RFM?
Improves customer segmentation for marketing and is widely used
for surveys.
Superior to and simpler than other methods (CHAID and
logistic regression).
Focuses on transaction information and delivering better
marketing to customers.
What is RFM?
R => Recency
F => Frequency
M => Monetary
How are we using the RFM and target customers?
Simple we score the customers based on the RFM from high to
The greater the score, the more likely the customer is to buy a
product or take a new offer or promotion.
It’ll help us identify customers that are most likely to respond
to a new offer or promotion.
Identifying the most valuable RFM segments can capitalize on
chance relationships in the data used for this analysis.
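One way to turn RFM values into scores is sketched below. The customer table, the `score` helper, the 1-to-5 scale and the simple sum `RFM_total` are all assumptions made for illustration; the slides do not say which scoring scheme was used.

```r
# Hypothetical RFM table: one row per customer
rfm <- data.frame(
  CustomerID = 1:10,
  recency    = c(5, 40, 12, 90, 3, 60, 25, 7, 120, 33),   # days since last purchase
  frequency  = c(12, 2, 8, 1, 15, 3, 5, 10, 1, 4),        # number of purchases
  monetary   = c(80, 20, 55, 10, 95, 30, 45, 70, 15, 35)  # average spend
)

# Score a vector from 1 (worst) to n_bins (best) using rank-based bins
score <- function(v, n_bins = 5, higher_is_better = TRUE) {
  r <- rank(if (higher_is_better) v else -v, ties.method = "first")
  ceiling(r * n_bins / length(v))
}

rfm$R_score <- score(rfm$recency, higher_is_better = FALSE)  # recent buyers score high
rfm$F_score <- score(rfm$frequency)
rfm$M_score <- score(rfm$monetary)
rfm$RFM_total <- rfm$R_score + rfm$F_score + rfm$M_score

# Customers ordered from the highest to the lowest combined score
rfm[order(-rfm$RFM_total), ]
```

Recency is scored in reverse because a smaller number of days since the last purchase is better, while larger frequency and monetary values are better.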
Mall Customer Segment analysis data using RFM
Recency: Recency is the most important predictor. Customers who
have purchased a product recently are more likely to purchase
again from your store/mall than those who have not purchased
recently.
Frequency: The second most important factor is how frequently
these customers purchase from you. The higher the frequency,
the higher the chance of them purchasing again.
Monetary: The third factor is the amount of money these
customers have spent on purchases. Customers who have spent
more are more likely to purchase again than those who have
spent less.
How are we going to calculate RFM?
To implement the RFM analysis, we need to further process the
data set with the following steps:
Find the most recent purchase date for each ID and calculate the
days from it to now (or some other reference date), to get the
Recency data
Count the number of transactions of a customer, to get the
Frequency data
Sum the amount of money a customer spent and divide it by
Frequency, to get the average amount per transaction, that is
the Monetary data.
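The three steps above can be sketched in base R as follows. The transaction log, the `Amount` column and the reference date `as_of` are hypothetical; only `CustomerID`, `InvoiceNo` and `InvoiceDate` are names taken from the slides' code.

```r
# Hypothetical transaction log
tx <- data.frame(
  CustomerID  = c("A", "A", "A", "B", "B", "C"),
  InvoiceNo   = c("i1", "i2", "i3", "i4", "i5", "i6"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-11-20", "2011-12-05",
                          "2011-10-10", "2011-10-15", "2011-12-08")),
  Amount      = c(20, 30, 10, 100, 50, 75)
)
as_of <- as.Date("2011-12-10")   # assumed reference ("now") date

# Days between each transaction and the reference date
tx$days_ago <- as.numeric(as_of - tx$InvoiceDate)

# Recency: days since the most recent purchase, per customer
recency   <- tapply(tx$days_ago, tx$CustomerID, min)
# Frequency: number of distinct invoices, per customer
frequency <- tapply(tx$InvoiceNo, tx$CustomerID, function(x) length(unique(x)))
# Total spend, used to derive the average amount per transaction
total     <- tapply(tx$Amount, tx$CustomerID, sum)

rfm <- data.frame(
  CustomerID = names(recency),
  recency    = as.vector(recency),
  frequency  = as.vector(frequency),
  monetary   = as.vector(total / frequency)
)
rfm
```

Customer A, for example, last bought 5 days before the reference date, has 3 invoices, and spent 60 in total, which gives a Monetary value of 20 per transaction.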
Problem Solving
Make sure we have the following libraries to proceed with the
data analysis; if the libraries are not found in your R Studio,
install those packages.
Load and examine data
> Mall_Customers <- fread('data.csv')
> glimpse(Mall_Customers)
Ijjagiri, Vivek (IV) - This is like a transposed version of print:
columns run down the page, and data runs across. This makes it
possible to see every column in a data frame. It's a little like str
applied to a data frame but it tries to show you as much data as
possible. (And it always shows the underlying data, even when
applied to a remote data source.)
View Data
Data Cleanup
> Mall_Customers<- Mall_Customers%>%
mutate(Quantity = replace(Quantity, Quantity<=0, NA),
UnitPrice = replace(UnitPrice, UnitPrice<=0, NA))
> Mall_Customers<- Mall_Customers%>%
Recode Variables
> df_data <- df_data %>%
InvoiceDate=as.Date(InvoiceDate, '%m/%d/%Y
%H:%M'), CustomerID=as.factor(CustomerID),
> df_data <- df_data %>%
mutate(total_dolar = Quantity*UnitPrice)
> glimpse(df_data)
> summary(df_data)
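The cleanup and recoding steps can be sketched end to end in base R. The sample rows are invented, dropping incomplete rows is an assumption about the step that is cut off in the slides, and the column is spelled `total_dollar` here where the slides use `total_dolar`.

```r
# Hypothetical raw rows, with the column names used in the slides
df_data <- data.frame(
  InvoiceNo   = c("536365", "536366", "536367", "536368"),
  CustomerID  = c(17850, 17850, 13047, 13047),
  InvoiceDate = c("12/1/2010 8:26", "12/1/2010 8:28",
                  "12/2/2010 9:01", "12/3/2010 10:15"),
  Quantity    = c(6, -1, 3, 2),
  UnitPrice   = c(2.55, 3.39, 0, 4.25)
)

# Cleanup: non-positive quantities and prices become NA
df_data$Quantity[df_data$Quantity <= 0]   <- NA
df_data$UnitPrice[df_data$UnitPrice <= 0] <- NA
df_data <- df_data[complete.cases(df_data), ]  # assumed: drop incomplete rows

# Recode: parse the date, treat the ID as a factor, add a line total
df_data$InvoiceDate  <- as.Date(df_data$InvoiceDate, format = "%m/%d/%Y %H:%M")
df_data$CustomerID   <- as.factor(df_data$CustomerID)
df_data$total_dollar <- df_data$Quantity * df_data$UnitPrice

str(df_data)
```

Two of the four sample rows survive: the one with a negative quantity and the one with a zero price are dropped.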
Calculate RFM
> df_RFM <- df_data %>%
group_by(CustomerID) %>%
frequency=n_distinct(InvoiceNo), monitery=
> summary(df_RFM)
Calculate RFM
> kable(head(df_RFM))
K-means clustering is one of the simplest and most popular
unsupervised machine learning algorithms.
The objective of K-means is simple: group similar data points
together and discover underlying patterns.
To achieve this objective, K-means looks for a fixed number (k)
of clusters in a dataset.
A cluster refers to a collection of data points aggregated
together because of certain similarities.
In other words, the K-means algorithm identifies k centroids,
and then allocates every data point to the nearest centroid,
while keeping the clusters as compact as possible (minimizing
the within-cluster sum of squares).
K Means Clustering Algorithm
1. Specify the number of clusters K.
2. Initialize centroids by first shuffling the dataset and then
randomly selecting K data points for the centroids without
replacement.
3. Keep iterating until there is no change to the centroids, i.e.
the assignment of data points to clusters isn't changing.
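The algorithm can be tried with the `kmeans` function from R's built-in `stats` package. This is a minimal sketch on invented data: the two point clouds, the column names and the choice of K = 2 are assumptions for illustration, not the deck's actual customer data.

```r
set.seed(42)  # k-means starts from random centroids, so fix the seed

# Hypothetical, already-scaled customer features: two well-separated groups
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2)
)
colnames(pts) <- c("recency", "frequency")

# K = 2 clusters; nstart = 10 repeats the random initialization
# and keeps the fit with the lowest within-cluster sum of squares
fit <- kmeans(pts, centers = 2, nstart = 10)

fit$centers         # final centroids
table(fit$cluster)  # cluster sizes
fit$tot.withinss    # total within-cluster sum of squares
```

Because the result depends on the random starting centroids, running several starts with `nstart` is a common way to avoid a poor local solution.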
K Means clustering algorithm
Recency – How recently did the customer purchase?
> Customer_Purchase_Recency <- df_RFM$recency
> hist(Customer_Purchase_Recency, main = 'Recency')
Frequency – How often do they purchase?
> Customer_Purchase_Frequency <- df_RFM$frequency
> hist(Customer_Purchase_Frequency, main = 'Frequency')
Monetary Value – How much do they spend?
> Customer_Purchase_Monitery <- df_RFM$monitery
> hist(Customer_Purchase_Monitery, main = 'Monetary',
breaks=50 )
Monetary Log
Because the data is skewed, we use log scale to normalize
> MoniteryLog <- log(df_RFM$monitery)
> hist(MoniteryLog, main ='MoniteryLog')
Ijjagiri, Vivek (IV) - This function is a mix of the functions
hclust and dist. hcluster(x, method = "euclidean", link =
"complete") = hclust(dist(x, method = "euclidean"), method =
"complete"). It uses half as much memory, as it doesn't store
the distance matrix.
For more details, see the documentation of hclust and Dist.
> DataFrame_Clustering <- df_RFM
> DataFrame_CustomerID <-
> row.names(DataFrame_Clustering) <- DataFrame_CustomerID
> DataFrame_CustomerID <- NULL
> DataFrame_Clustering <- scale(DataFrame_Clustering)
> summary(DataFrame_Clustering )
> d <- dist(DataFrame_Clustering)
> c <- hclust(d, method = 'ward.D2')
> plot(c)
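The clustering steps above can be run end to end on a small example. The six customers below are invented, and cutting the tree into two groups with `cutree` is an extra step added for illustration; the function calls themselves (`scale`, `dist`, `hclust`) are the ones used in the slides.

```r
# Hypothetical RFM table; CustomerID becomes the row names, as in the slides
df_RFM <- data.frame(
  CustomerID = c("c1", "c2", "c3", "c4", "c5", "c6"),
  recency    = c(2, 5, 3, 80, 95, 90),
  frequency  = c(12, 10, 11, 1, 2, 1),
  monetary   = c(50, 45, 55, 10, 12, 8)
)
features <- df_RFM[, c("recency", "frequency", "monetary")]
row.names(features) <- df_RFM$CustomerID

features_scaled <- scale(features)      # zero mean, unit variance per column
d  <- dist(features_scaled)             # Euclidean distances between customers
hc <- hclust(d, method = "ward.D2")     # Ward's minimum-variance linkage

groups <- cutree(hc, k = 2)             # cut the tree into 2 clusters
groups
```

Calling `plot(hc)` on the result draws the dendrogram; here the recent, frequent buyers (c1 to c3) and the lapsed ones (c4 to c6) end up in separate branches.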
Ijjagiri, Vivek (IV) - A dendrogram is a diagram that shows the
hierarchical relationship between objects. It is most commonly
created as an output from hierarchical clustering. The main use
of a dendrogram is to work out the best way to allocate objects
to clusters. The dendrogram below shows the hierarchical
clustering of six observations shown on the scatterplot to the
left. (Dendrogram is often miswritten as dendogram.)
Plotting with less data
Customer segmentation process can be performed using various
clustering algorithms.
We focused on k-means clustering in R.
The algorithm is quite simple to implement. However,
representing data in the correct format and interpreting results
is the difficult part.
RFM Analysis can segment customers, design offers,
promotions specific to audience and produce products based on
customer profile and interests.
Thank you
Any Questions
# section 3.3 Statistical Methods for Evaluation
# section 3.3.1 Hypothesis Testing
# generate random observations from the two populations
x <- rnorm(10, mean=100, sd=5) # normal distribution centered at 100
y <- rnorm(20, mean=105, sd=5) # normal distribution centered at 105
# Student's t-test
t.test(x, y, var.equal=TRUE) # run the Student's t-test
# obtain t value for a two-sided test at a 0.05 significance level
qt(p=0.05/2, df=28, lower.tail= FALSE)
# Welch's t-test
t.test(x, y, var.equal=FALSE) # run the Welch's t-test
# Wilcoxon Rank-Sum Test
wilcox.test(x, y, conf.int = TRUE) # argument name assumed; also reports a confidence interval
# section 3.3.6 ANOVA
offers <- sample(c("offer1", "offer2", "nopromo"), size=500,
                 replace=TRUE) # drawing 500 from 3 values needs replacement
# Simulated 500 observations of purchase sizes on the 3 offer options
# (sd=30 for offer1 and offer2 is assumed, matching the nopromo group)
purchasesize <- ifelse(offers=="offer1", rnorm(500, mean=80, sd=30),
                ifelse(offers=="offer2", rnorm(500, mean=85, sd=30),
                rnorm(500, mean=40, sd=30)))
# create a data frame of offer option and purchase size
offertest <- data.frame(offer=as.factor(offers),
                        purchase_amt=purchasesize)
# display a summary of offertest where offer="offer1"
summary(offertest[offertest$offer=="offer1",])
# display a summary of offertest where offer="offer2"
summary(offertest[offertest$offer=="offer2",])
# display a summary of offertest where offer="nopromo"
summary(offertest[offertest$offer=="nopromo",])
# fit ANOVA test
model <- aov(purchase_amt ~ offer, data=offertest)
summary(model)
# Tukey's Honest Significant Difference (HSD) on all
# pair-wise tests for difference of means
TukeyHSD(model)
Lesson 2
© 2015 Pearson Education, Inc. Publishing as Prentice Hall
Chapter 4
“It is a set of beliefs that one party holds
about the other and how these beliefs are
formed from the interactions of […]
individuals as they engage in tasks
associated with an IT service” (Day 2007)
It is a multifaceted interaction of people
and processes.
It is complex. Different expectations and
accountabilities may lead to lack of trust.
It tends to cluster into patterns (e.g., IT is
a necessary evil; IT is a support but not a
partner; business and IT are partners).
IT has to keep proving itself.
The business is often disengaged from IT
Business expectations of IT change
Business assumptions of IT tend to cluster.
The relationship is affected by the
interaction of many people and
processes at multiple levels.
Clarity is often lacking around
expectations and accountabilities.
There are many “disconnects”
between the two groups.
Interpersonal Interaction
Expertise – the ability to support a technical
recommendation and have up-to-date knowledge.
Financial awareness – the ability to
identify the value of IT in terms of ROI
and total cost of ownership.
Execution – the ability to understand
the business, develop a vision and
operationalize strategies.
Find ways to develop business knowledge in
all IT staff.
Link IT’s success criteria to business metrics.
Make business value an explicit criterion in all
IT decisions.
Ensure effective execution in all IT activities.
Credibility is the belief that others can be
counted on to do what they say they will do.
It is built by:
Keeping agreements.
Acting with integrity, honesty and openness.
Being responsive (e.g., delivering on time
and under budget).
Communicate frequently and explicitly.
Pay attention to the “little things”.
Utilize external cues to credibility.
Assess all business touch points.
Professionalism - can be developed by five
sets of attitudes and behaviors:
on the job)
good organization.
job well)
Nontechnical communication
The ability to translate and interpret needs,
not only from business to technology and
vice versa, but also between business units.
Social skills
The ability to build mutual understanding, to
enable all parties to get comfortable with one
another and to uncover hidden assumptions.
Management of politics and conflict
The ability to understand the role of politics
and how it can affect IT work (i.e.,
addressing conflict and using it to deliver
creative solutions).
Expect professionalism.
Promote a wide variety of social interactions
at all levels.
Develop “soft skills” in IT staff.
The most important way to build trust is through
effective governance:
Integrated planning, defined accountabilities,
and clarity of roles and responsibilities are key
aspects of effective governance.
Effective governance addresses the business's
expectations of its IT function.
Design governance for clarity and
Mandate the relationship.
Design IT for business expectations.
Business-IT relationships are complex, with
interactions of many types, at many levels,
and between both individuals and across
functional and organizational entities.
Four major components are needed to
build a strong business-IT relationship:
competence, credibility, interpersonal skills,
and trust.
Chapter 5
Communication is a key social element of
the organizational alignment between IT
and business.
One of the most important skills IT staff
needs to develop is how to communicate
effectively with businesses.
Good communication is essential for:
trust and partnerships between
the business and IT
perceptions of IT
of the business
Principle 1: The effectiveness of communication
is measured by its outcomes.
Principle 2: Communication is social behavior.
Principle 3: Shared knowledge improves
Principle 4: Mature organizations have better
Communication should be measured by its
outcomes rather than our intentions.
Communication can get distorted through
filters such as politics, culture, and
personal points of view.
Communication not only transmits ideas;
it also negotiates relationships.
How you say what you mean is just as
important as what you say.
IT staff and managers need to become
aware of the power of different linguistic
styles in communication situations.
The more IT staff
learns about the
business, the better
Shared knowledge is
the beginning of the
“virtuous circle”.
Shared Knowledge
Mutual Understanding
and “Common Sense”
Strong organizational practices support and
reinforce good interpersonal communication.
Mature IT organizations embed appropriate
communication at the operational and
strategic level.
“You can’t be a partner unless
you’re a mature IT organization”
The changing nature of IT work:
IT work has become more complex over
time. Multiple cultures, different political
contexts, various time zones, and virtual
contacts make communication more
Hiring practices:
IT skills are changing to become more
consultative and collaborative, rather
than focused exclusively on technology.
“IT organizations can no longer support smart,
super-talented but socially disruptive people”
IT and business organization
IT staff is expected to play a “knowledge
broker” role, not only between IT and
business but also between business units.
Thus, business silos can make this
communication challenging.
Nature and frequency of
Formal interactions improve communication,
but communication should not exclusively
occur in formal interactions (e.g., through IT
Many IT staff are motivated by the desire
to be right rather than the desire to
communicate effectively.
“We definitely need a ‘we’ attitude in IT,
rather than ‘us-them’ attitude”
Translation: A four-step process
Impact of
DBA CAPSTONE TEMPLATEThe pages in this template are correctl.docx
DB3.1 Mexico corruptionDiscuss the connection between pol.docx
DB3.1 Mexico corruptionDiscuss the connection between pol.docxDB3.1 Mexico corruptionDiscuss the connection between pol.docx
DB3.1 Mexico corruptionDiscuss the connection between pol.docx
DB2Pepsi Co and Coke American beverage giants, must adhere to th.docx
DB2Pepsi Co and Coke American beverage giants, must adhere to th.docxDB2Pepsi Co and Coke American beverage giants, must adhere to th.docx
DB2Pepsi Co and Coke American beverage giants, must adhere to th.docx
DB1 What Ive observedHave you ever experienced a self-managed .docx
DB1 What Ive observedHave you ever experienced a self-managed .docxDB1 What Ive observedHave you ever experienced a self-managed .docx
DB1 What Ive observedHave you ever experienced a self-managed .docx
DB Response 1I agree with the decision to search the house. Ther.docx
DB Response 1I agree with the decision to search the house. Ther.docxDB Response 1I agree with the decision to search the house. Ther.docx
DB Response 1I agree with the decision to search the house. Ther.docx
DB Response prompt ZAKChapter 7, Q1.Customers are expecting.docx
DB Response prompt  ZAKChapter 7, Q1.Customers are expecting.docxDB Response prompt  ZAKChapter 7, Q1.Customers are expecting.docx
DB Response prompt ZAKChapter 7, Q1.Customers are expecting.docx
DB Topic of Discussion Information-related CapabilitiesAnalyze .docx
DB Topic of Discussion Information-related CapabilitiesAnalyze .docxDB Topic of Discussion Information-related CapabilitiesAnalyze .docx
DB Topic of Discussion Information-related CapabilitiesAnalyze .docx
DB Instructions Each reply must be 250–300 words with a minim.docx
DB Instructions Each reply must be 250–300 words with a minim.docxDB Instructions Each reply must be 250–300 words with a minim.docx
DB Instructions Each reply must be 250–300 words with a minim.docx
DB Defining White Collar CrimeHow would you define white co.docx
DB Defining White Collar CrimeHow would you define white co.docxDB Defining White Collar CrimeHow would you define white co.docx
DB Defining White Collar CrimeHow would you define white co.docx

Recently uploaded

Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10
Electric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger HuntElectric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger Hunt
How to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in useHow to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in use
Celine George
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
Kalna College
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
David Douglas School District
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
Educational Technology in the Health Sciences
Educational Technology in the Health SciencesEducational Technology in the Health Sciences
Educational Technology in the Health Sciences
Iris Thiele Isip-Tan
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxxSimple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17
Celine George
BPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end examBPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end exam
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
Kalna College
Observational Learning
Observational Learning Observational Learning
Observational Learning
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Henry Hollis
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
Mohammad Al-Dhahabi
Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.

Recently uploaded (20)

Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10
Electric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger HuntElectric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger Hunt
How to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in useHow to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in use
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Educational Technology in the Health Sciences
Educational Technology in the Health SciencesEducational Technology in the Health Sciences
Educational Technology in the Health Sciences
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxxSimple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17
BPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end examBPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end exam
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
Observational Learning
Observational Learning Observational Learning
Observational Learning
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.

Datasets using R-StudioUsha Rani Singh.docx

  • 1. Datasets using R-Studio. Usha Rani Singh. Datasets for cars: A dataset is a collection of related information that can be analyzed to derive outputs. The dataset contains information in various forms, so it isn't straightforward for the analyst to extract the data and present it to the business. Preparing Dataset for cars:
  • 2. Preparing and analyzing the dataset is very important for any threat information, which helps to provide accurate data. We have to consider the data that provide the most value or are most relevant to the problem. Categorize the data by task: regression, classification, clustering, or ranking. It is difficult to establish a data collection mechanism because the data is scattered across various forms and departments. We have to make the data consistent. The data sample should be reduced while still containing the required information. Preparing Dataset for cars: We have to clean the data so that processing is faster and more accurate. Complex datasets have to be decomposed into multiple parts. Data normalization has to be performed to improve the quality of the data. R Studio for dataset
  • 3. Pie Diagram for data set ggplot_1 for dataset ggplot_2 for data set ggplot_3 for dataset
  • 5. Week 10 – Analysing Data sets in RapidMiner. The data sets used for this week's analysis relate to the CSRIC best practices. The CSRIC Best Practices Search Tool allows you to search CSRIC's collection of Best Practices using a variety of criteria including Network Type, Industry Role, Keywords, Priority Levels, and BP Number. The Communications Security, Reliability and Interoperability Council's (CSRIC) mission is to provide recommendations to the FCC to ensure, among other things, optimal security and reliability of communications systems, including telecommunications, media, and public safety. CSRIC’s members focus on a range of public safety and homeland security-related communications matters, including: (1) the reliability and security of communications systems and infrastructure, particularly mobile systems; (2) 911, Enhanced 911 (E911), and Next Generation 911 (NG911); and (3) emergency alerting.
  • 6. The CSRIC's recommendations will address: the prevention and remediation of detrimental cyber events; the development of best practices to improve overall communications reliability; the availability and performance of communications services and emergency alerting during natural disasters, terrorist attacks, cyber security attacks or other events that result in exceptional strain on the communications infrastructure; the rapid restoration of communications services in the event of widespread or major disruptions; and the steps communications providers can take to help secure end-users and servers. I have used RapidMiner to analyze the data set: The statistical view of various names, types and attributes related to the data set. Visualization of public safety vs prioritization
  • 7. Overall prioritization pie chart. Bar graph comparing various network types and internet/data usage. customer-segmentation-data: Mall_Customers.csv
CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
6,Female,22,17,76
7,Female,35,18,6
  • 19. Mall Customer Segment Data Analysis using RFM. Vivek Ijjagiri. Agenda: Introduction; Mall Customer Segmentation data; Mall Customer Segment analysis data using RFM; Problem Solving;
  • 20. Clustering; Conclusion; References. Introduction: When we plan marketing spend to increase sales, or formulate a new promotion, as retail marketers we have to be careful about how we segment and target customers. It would be a waste of time and money if, for example, we launched an ad campaign aimed at a large, undifferentiated group of customers. Such untargeted marketing and advertising is not likely to have a high conversion rate and may even hurt our company's value. Retailers now use sophisticated strategies to segment their customers and target their marketing efforts at these segments. RFM analysis is one such popular customer segmentation technique that can help retailers maximize the return on their marketing investments. Why RFM? It improves customer segmentation for marketing and is widely used for surveys. It is considered superior to and simpler than other methods (CHAID and
  • 21. logistic regression). It focuses on transaction information and on delivering better marketing to customers. What is RFM? R => Recency, F => Frequency, M => Monetary. How are we using RFM to target customers? We simply score the customers on R, F, and M from high to low. The greater the score, the more likely the customer is to buy a product or take up a new offer or promotion. This helps us identify the customers who are most likely to respond to a new offer or promotion. Identifying the most valuable RFM segments can capitalize on chance relationships in the data used for this analysis.
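The scoring idea above can be sketched in base R. This is only an illustration under stated assumptions: the customer values are invented, the helper `score_bins` and the choice of three bins are hypothetical, and summing the three scores is one of several ways to combine them, not necessarily the author's.

```r
# Toy RFM table; the values are invented purely for illustration.
rfm <- data.frame(
  CustomerID = c("A", "B", "C", "D", "E", "F"),
  recency    = c(5, 40, 200, 12, 90, 365),  # days since last purchase
  frequency  = c(12, 4, 1, 8, 2, 1),        # number of transactions
  monetary   = c(80, 35, 20, 60, 25, 10)    # average spend per transaction
)

# Hypothetical helper: rank-based score from 1 (worst) to n_bins (best).
score_bins <- function(x, n_bins = 3, higher_is_better = TRUE) {
  r <- rank(if (higher_is_better) x else -x, ties.method = "first")
  as.integer(ceiling(r * n_bins / length(x)))
}

# A small recency (a recent purchase) should earn a high score.
rfm$R_score <- score_bins(rfm$recency, higher_is_better = FALSE)
rfm$F_score <- score_bins(rfm$frequency)
rfm$M_score <- score_bins(rfm$monetary)
rfm$RFM_score <- rfm$R_score + rfm$F_score + rfm$M_score

# Customers sorted from highest to lowest combined score.
print(rfm[order(-rfm$RFM_score), ])
```

Customers at the top of the sorted table would be the first candidates for a new offer or promotion.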
  • 22. Mall Customer Segment analysis data using RFM. Recency: Recency is the most important predictor. Customers who have purchased a product recently are more likely to purchase again from your store/mall than those who have not. Frequency: The second most important factor is how frequently these customers purchase from you. The higher the frequency, the higher the chance of them purchasing again. Monetary: The third factor is the amount of money these customers have spent on purchases. Customers who have spent more are more likely to purchase again than those who have spent less. How are we going to calculate RFM? To implement the RFM analysis, we need to process the data set further with the following steps: Find the most recent purchase date for each customer ID and calculate the number of days from it to now (or to some other reference date) to get the Recency data. Count the number of transactions of each customer to get the Frequency data. Sum the amount of money a customer spent and divide it by
  • 23. Frequency to get the average amount per transaction; that is the Monetary data. Problem Solving: Make sure the following libraries are available before proceeding with the data analysis; if they are not found in your R Studio, install those packages. library(data.table) library(dplyr) library(ggplot2) library(tidyr) library(knitr) library(rmarkdown) Load and examine data: > Mall_Customers <- fread('data.csv') > glimpse(Mall_Customers)
  • 24. Ijjagiri, Vivek (IV) - This is like a transposed version of print: columns run down the page, and data runs across. This makes it possible to see every column in a data frame. It's a little like str applied to a data frame but it tries to show you as much data as possible. (And it always shows the underlying data, even when applied to a remote data source.) View Data. Data Cleanup or Wrangle: > Mall_Customers <- Mall_Customers %>% mutate(Quantity = replace(Quantity, Quantity<=0, NA), UnitPrice = replace(UnitPrice, UnitPrice<=0, NA))
  • 25. > Mall_Customers <- Mall_Customers %>% drop_na() Recode Variables: > df_data <- Mall_Customers %>% mutate(InvoiceNo=as.factor(InvoiceNo), StockCode=as.factor(StockCode), InvoiceDate=as.Date(InvoiceDate, '%m/%d/%Y %H:%M'), CustomerID=as.factor(CustomerID), Country=as.factor(Country)) > df_data <- df_data %>% mutate(total_dolar = Quantity*UnitPrice) > glimpse(df_data) > summary(df_data) Calculate RFM: > df_RFM <- df_data %>% group_by(CustomerID) %>% summarise(recency=as.numeric(as.Date("2012-01-01") - max(InvoiceDate)), frequency=n_distinct(InvoiceNo), monitery=sum(total_dolar)/n_distinct(InvoiceNo))
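The RFM calculation above can be checked by hand on a tiny transaction log. The following is a minimal base-R sketch, assuming an invented six-row table whose column names mirror the slides; `rfm_one` is a hypothetical helper, and the reference date 2012-01-01 is the one used in the slides.

```r
# Toy transaction log; column names mirror the slides, values are invented.
tx <- data.frame(
  CustomerID  = c("A", "A", "A", "B", "B", "C"),
  InvoiceNo   = c("i1", "i1", "i2", "i3", "i4", "i5"),
  InvoiceDate = as.Date(c("2011-12-01", "2011-12-01", "2011-12-20",
                          "2011-06-15", "2011-09-01", "2011-01-10")),
  Quantity    = c(2, 1, 3, 10, 1, 5),
  UnitPrice   = c(5.0, 10.0, 4.0, 1.5, 20.0, 2.0)
)
tx$total <- tx$Quantity * tx$UnitPrice
ref_date <- as.Date("2012-01-01")  # same reference date as in the slides

# Hypothetical helper: recency, frequency and monetary for one customer.
rfm_one <- function(d) {
  n_invoices <- length(unique(d$InvoiceNo))
  data.frame(
    CustomerID = d$CustomerID[1],
    recency    = as.numeric(ref_date - max(d$InvoiceDate)),  # days
    frequency  = n_invoices,                                 # distinct invoices
    monetary   = sum(d$total) / n_invoices                   # average per invoice
  )
}

rfm <- do.call(rbind, lapply(split(tx, tx$CustomerID), rfm_one))
rownames(rfm) <- NULL
print(rfm)
```

Customer A, for example, has two distinct invoices totalling 32, the last one 12 days before the reference date, so the row reads recency 12, frequency 2, monetary 16.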
  • 26. > summary(df_RFM) Calculate RFM: > kable(head(df_RFM)) K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. The objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset. A cluster refers to a collection of data points aggregated together because of certain similarities. In other words, the K-means algorithm identifies k centroids and then allocates every data point to the cluster of the nearest centroid, while keeping the within-cluster distances to the centroids as small as possible.
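The description above can be tried out with base R's `kmeans()`. This is a sketch on synthetic data, not the author's analysis: the two customer groups, their means, and the choice of `centers = 2` are assumptions made for illustration.

```r
set.seed(42)  # k-means starts from random centroids, so fix the seed

# Two synthetic, well-separated groups of customers (invented values).
toy <- rbind(
  cbind(recency   = rnorm(20, mean = 10, sd = 2),
        frequency = rnorm(20, mean = 15, sd = 2)),
  cbind(recency   = rnorm(20, mean = 200, sd = 10),
        frequency = rnorm(20, mean = 2, sd = 0.5))
)

# Scale first so that no single variable dominates the distance.
toy_scaled <- scale(toy)

# nstart = 10 repeats the random initialisation and keeps the best run.
fit <- kmeans(toy_scaled, centers = 2, nstart = 10)

print(fit$centers)         # one centroid per cluster (in scaled units)
print(table(fit$cluster))  # cluster sizes
print(fit$tot.withinss)    # total within-cluster sum of squares
```

`tot.withinss` is the quantity k-means tries to minimise; comparing it across several values of `centers` is a common way to pick k.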
  • 27. K Means Clustering Algorithm: 1. Specify the number of clusters K. 2. Initialize the centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement. 3. Keep iterating (assign each point to its nearest centroid, then recompute each centroid as the mean of its points) until there is no change to the centroids, i.e., the assignment of data points to clusters isn’t changing. Recency – How recently did the customer purchase? > Customer_Purchase_Recency <- df_RFM$recency > hist(Customer_Purchase_Recency, main = 'Recency') Frequency – How often do they purchase? > Customer_Purchase_Frequency <- df_RFM$frequency > hist(Customer_Purchase_Frequency, main = 'Frequency')
  • 28. Monetary Value – How much do they spend? > Customer_Purchase_Monitery <- df_RFM$monitery > hist(Customer_Purchase_Monitery, main = 'Monetary', breaks = 50) Monetary Log: Because the data is skewed, we use a log scale to normalize it. > MoniteryLog <- log(df_RFM$monitery) > hist(MoniteryLog, main = 'MoniteryLog') Ijjagiri, Vivek (IV) - 17/topics/hcluster Ijjagiri, Vivek (IV) - This function is a mix of function hclust and function dist. hcluster(x, method = "euclidean", link = "complete") = hclust(dist(x, method = "euclidean"), method = "complete"). It uses half as much memory, as it doesn't store the distance matrix. For more details, see the documentation of hclust and Dist. Clustering: > DataFrame_Clustering <- as.data.frame(df_RFM)
  • 29. > DataFrame_CustomerID <- DataFrame_Clustering$CustomerID > row.names(DataFrame_Clustering) <- DataFrame_CustomerID > DataFrame_Clustering$CustomerID <- NULL > DataFrame_Clustering <- scale(DataFrame_Clustering) > summary(DataFrame_Clustering) Clustering: > d <- dist(DataFrame_Clustering) > c <- hclust(d, method = 'ward.D2') > plot(c) Ijjagiri, Vivek (IV) - A dendrogram is a diagram that shows the hierarchical relationship between objects. It is most commonly created as an output from hierarchical clustering. The main use of a dendrogram is to work out the best way to allocate objects to clusters. The dendrogram below shows the hierarchical clustering of six observations shown on the scatterplot to the left. (Dendrogram is often miswritten as dendogram.) Plotting with less data
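A dendrogram of every customer is usually unreadable, which is presumably why the slides plot with less data. The following sketch shows one way to do that, assuming a synthetic stand-in for the RFM table; the sample size of 30 and the cut into three groups are arbitrary choices for illustration.

```r
set.seed(1)

# Synthetic stand-in for the RFM table (300 invented customers).
rfm <- data.frame(
  recency   = rexp(300, rate = 1 / 60),
  frequency = rpois(300, lambda = 4) + 1,
  monetary  = rlnorm(300, meanlog = 3, sdlog = 0.5)
)
rownames(rfm) <- paste0("cust", seq_len(nrow(rfm)))

# Keep only a random subset so the dendrogram labels stay readable.
idx   <- sample(nrow(rfm), 30)
small <- scale(rfm[idx, ])

# Same steps as in the slides: distance matrix, Ward linkage, plot.
tree <- hclust(dist(small), method = "ward.D2")
plot(tree, main = "Dendrogram of 30 sampled customers", cex = 0.7)

# Cut the tree into a fixed number of groups and count the members.
groups <- cutree(tree, k = 3)
print(table(groups))
```

`cutree()` turns the tree into cluster labels, which can then be joined back to the customer IDs held in the row names.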
  • 30. Plotting with less data. Conclusion: The customer segmentation process can be performed using various clustering algorithms. We focused on k-means clustering in R. The algorithm is quite simple to implement. However, representing the data in the correct format and interpreting the results are the difficult parts.
  • 31. RFM Analysis can segment customers, design offers, promotions specific to audience and produce products based on customer profile and interests. References Shubhankar Rawat (May 2019), Mall Customers Segmentation — Using Machine Learning retrieved from using-machine-learning-274ddf5575d5 What is market segmentation, Different types explained retrieved from management/brand/what-is-market-segmentation/ Bradley, P. S., Bennett, K. P., & Demiriz, A. (2000). Constrained k-means clustering (Technical Report MSR-TR- 2000-65). Microsoft Research, Redmond, WA. K means clustering, AlindGupta retrieved from Thank you Any Questions
  • 32. RcodeProject.R
##########################################
# section 3.3 Statistical Methods for Evaluation
##########################################

##########################################
# section 3.3.1 Hypothesis Testing
##########################################

# generate random observations from the two populations
x <- rnorm(10, mean=100, sd=5) # normal distribution centered at 100
y <- rnorm(20, mean=105, sd=5) # normal distribution centered at 105

# Student's t-test
t.test(x, y, var.equal=TRUE) # run the Student's t-test

# obtain t value for a two-sided test at a 0.05 significance level
qt(p=0.05/2, df=28, lower.tail=FALSE)

# Welch's t-test
t.test(x, y, var.equal=FALSE) # run the Welch's t-test
  • 33. # Wilcoxon Rank-Sum Test
wilcox.test(x, y, conf.int=TRUE)

##########################################
# section 3.3.6 ANOVA
##########################################
offers <- sample(c("offer1", "offer2", "nopromo"), size=500, replace=TRUE)

# Simulated 500 observations of purchase sizes on the 3 offer options
purchasesize <- ifelse(offers=="offer1", rnorm(500, mean=80, sd=30),
                ifelse(offers=="offer2", rnorm(500, mean=85, sd=30),
                       rnorm(500, mean=40, sd=30)))

# create a data frame of offer option and purchase size
offertest <- data.frame(offer=as.factor(offers), purchase_amt=purchasesize)

# display a summary of offertest where offer="offer1"
summary(offertest[offertest$offer=="offer1",])
# display a summary of offertest where offer="offer2"
summary(offertest[offertest$offer=="offer2",])
# display a summary of offertest where offer="nopromo"
summary(offertest[offertest$offer=="nopromo",])

# fit ANOVA test (use the factor column of the data frame)
model <- aov(purchase_amt ~ offer, data=offertest)
summary(model)
  • 34. # Tukey's Honest Significant Difference (HSD) on all
# pair-wise tests for difference of means
TukeyHSD(model)
Lesson 2. © 2015 Pearson Education, Inc. Publishing as Prentice Hall. Chapter 4. “It is a set of beliefs that one party holds about the other and how these beliefs are formed from the interactions of […] individuals as they engage in tasks associated with an IT service” (Day 2007). It is a multifaceted interaction of people and processes.
  • 35. It is complex. Different expectations and accountabilities may lead to lack of trust. It tends to cluster into patterns (e.g., IT is a necessary evil; IT is a support but not a partner; business and IT are partners). IT has to keep proving itself. The business is often disengaged from IT work. Business expectations of IT change continually. Business assumptions of IT tend to cluster. The relationship is affected by the interaction of many people and processes at multiple levels. Clarity is often lacking around expectations and accountabilities. There are many “disconnects” between the two groups.
  • 36. Trust; Credibility; Competence; Value; Interpersonal Interaction. Expertise – the ability to support a technical recommendation and have up-to-date knowledge. Financial awareness – the ability to identify the value of IT in terms of ROI and total cost of ownership. Execution – the ability to understand the business, develop a vision and operationalize strategies. Find ways to develop business knowledge in all IT staff.
  • 37. Link IT’s success criteria to business metrics. Make business value an explicit criterion in all IT decisions. Ensure effective execution in all IT activities. Credibility is the belief that others can be counted on to do what they say they will do. It is built by: Keeping agreements. Acting with integrity, honesty and openness. Being responsive (e.g., delivering on time and under budget). Communicate frequently and explicitly. Pay attention to the “little things”. Utilize external cues to credibility. Assess all business touch points.
  • 38. Professionalism - can be developed by five sets of attitudes and behaviors: on the job) good organization. job well) Nontechnical communication: The ability to translate and interpret needs, not only from business to technology and vice versa, but also between business units. Social skills: The ability to build mutual understanding, to
  • 39. enable all parties to get comfortable with one another and to uncover hidden assumptions. Management of politics and conflict: The ability to understand the role of politics and how they can affect the IT work (i.e., addressing conflict and using it to deliver creative solutions). Expect professionalism. Promote a wide variety of social interactions at all levels. Develop “soft skills” in IT staff. The most important way to build trust is through effective governance: Integrated planning, defined accountabilities,
  • 40. and clarity of roles and responsibilities are key aspects of effective governance. Effective governance addresses the business’ expectations of its IT function. Design governance for clarity and transparency. Mandate the relationship. Design IT for business expectations. Business-IT relationships are complex, with interactions of many types, at many levels, and both between individuals and across functional and organizational entities. Four major components are needed to build a strong business-IT relationship: competence, credibility, interpersonal skills, and trust. Chapter 5
  • 41. Communication is a key social element of the organizational alignment between IT and business. One of the most important skills IT staff need to develop is how to communicate effectively with businesses. Good communication is essential for: trust and partnerships between the business and IT perceptions of IT of the business
  • 42. Principle 1: The effectiveness of communication is measured by its outcomes. Principle 2: Communication is social behavior. Principle 3: Shared knowledge improves communication. Principle 4: Mature organizations have better communication. Communication should be measured by its outcomes rather than our intentions. Communication can get distorted through filters such as politics, culture, and personal points of view. Communication not only transmits ideas; it also negotiates relationships. How you say what you mean is just as
  • 43. important as what you say. IT staff and managers need to become aware of the power of different linguistic styles in communication situations. The more IT staff learns about the business, the better communication becomes. Shared knowledge is the beginning of the “virtuous circle”. THE VIRTUOUS COMMUNICATION CYCLE: Shared Knowledge → Increased Communication → Mutual Understanding and “Common Sense” → Implementation Success
  • 44. Strong organizational practices support and reinforce good interpersonal communication. Mature IT organizations embed appropriate communication at the operational and strategic level. “You can’t be a partner unless you’re a mature IT organization” The changing nature of IT work: IT work has become more complex over time. Multiple cultures, different political contexts, various time zones, and virtual contacts make communication more challenging. Hiring practices: IT skills are changing to become more consultative and collaborative, rather than focused exclusively on technology. “IT organizations can no longer support smart,
  • 45. super-talented but socially disruptive people” IT and business organization structures: IT staff is expected to play a “knowledge broker” role, not only between IT and business but also between business units. Thus, business silos can make this communication challenging. Nature and frequency of communication: Formal interactions improve communication, but communication should not exclusively occur in formal interactions (e.g., through IT governance). Attitude:
  • 46. Many IT staff are motivated by the desire to be right rather than the desire to communicate effectively. “We definitely need a ‘we’ attitude in IT, rather than ‘us-them’ attitude” Translation: A four-step process. Business Impact of Technology Issues; Business Technology Issues; IT Solutions; Business