Project for System Analysis and Design (IS-6410).
By performing customer segmentation, the new analytics system can achieve three objectives:
1. Distinguish loyal customers from one-time visitors and perform heat-map analysis of their browsing patterns.
2. Understand customer demographics and focus on the most profitable segments.
3. Empower the Marketing department to make better strategic decisions about online ads and campaigns.
This document discusses using predictive analytics and segmentation analysis on a telecom customer dataset. It performed three types of segmentation - demographic, customer status, and customer usage. For each segmentation, it identified 4 clusters and described the characteristics of customers in each cluster. It then performed a cross-cluster analysis to find associations between the different segmentations that could provide business insights. For example, it found that valuable young adults tended to be cosmopolitan users who make frequent international calls. The document also discusses the benefits of predictive analytics, including gaining competitive advantage and improving operations. It provides an insurance case study as an example and maps how predictive analytics helps insurers achieve outcomes such as growing their business and strengthening fraud detection.
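The cluster-based segmentation described above can be sketched with a small k-means routine. This is a minimal, single-file illustration, not the document's actual method; the customer features below (age, monthly call minutes) are hypothetical, chosen only to show the idea of grouping customers into segments.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Very small k-means: assign each point to its nearest centroid,
    then recompute centroids, repeating for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the centroid with the smallest squared distance to p.
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster (keep old if empty).
        centroids = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer features: (age, monthly_minutes) -- illustrative only.
customers = [(22, 300), (25, 320), (24, 310), (60, 40), (65, 35), (58, 50)]
centroids, clusters = kmeans(customers, k=2)
```

A real segmentation of the kind described (4 clusters per dimension) would run the same loop with `k=4` over the actual demographic, status, or usage attributes.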
This document provides an introduction to big data analytics and data science, covering topics such as the growth of data, what big data is, the emergence of big data tools, traditional and new data management architectures including data lakes, and big data analytics. It also discusses roles in data science including data scientists and data visualization.
The document discusses various data reduction strategies including attribute subset selection, numerosity reduction, and dimensionality reduction. Attribute subset selection aims to select a minimal set of important attributes. Numerosity reduction techniques like regression, log-linear models, histograms, clustering, and sampling can reduce data volume by finding alternative representations like model parameters or cluster centroids. Dimensionality reduction techniques include discrete wavelet transformation and principal component analysis, which transform high-dimensional data into a lower-dimensional representation.
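The dimensionality-reduction technique mentioned above, principal component analysis, can be illustrated in miniature. This sketch projects 2-D points onto their first principal component using the closed-form eigendecomposition of a 2x2 covariance matrix; it is an illustration of the idea, not a general-purpose PCA.

```python
import math

def pca_1d(points):
    """Project 2-D points onto their first principal component,
    reducing two correlated attributes to a single derived attribute."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Covariance matrix entries.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]] (closed form for 2x2).
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # Corresponding eigenvector, normalised (axis-aligned when sxy == 0).
    if sxy:
        vx, vy = lam - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [(x - mx) * vx + (y - my) * vy for x, y in points]

data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
scores = pca_1d(data)  # one derived value per point instead of two attributes
```

For high-dimensional data, the same idea generalises: keep only the top few eigenvectors of the covariance matrix and represent each record by its projections onto them.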
The key components of a data warehouse are the source data component, data staging component, data storage component, information delivery component, meta-data component, and management and control component. The source data component includes production data, internal data, archived data, and external data. The data staging component involves extracting, transforming through processes like handling synonyms and homonyms, and loading the data. The information delivery component provides access and reports to different user types from novice to senior executives.
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3: Preprocessing - Salah Amean
The chapter contains:
Data Preprocessing: An Overview,
Data Quality,
Major Tasks in Data Preprocessing,
Data Cleaning,
Data Integration,
Data Reduction,
Data Transformation and Data Discretization,
Summary.
Data Mining, KDD Process, Data mining functionalities, Characterization,
Discrimination,
Association,
Classification,
Prediction,
Clustering,
Outlier analysis, Data Cleaning as a Process
The document discusses data warehouses and their advantages. It describes the different views of a data warehouse including the top-down view, data source view, data warehouse view, and business query view. It also discusses approaches to building a data warehouse, including top-down and bottom-up, and steps involved including planning, requirements, design, integration, and deployment. Finally, it discusses technologies used to populate and refresh data warehouses like extraction, cleaning, transformation, load, and refresh tools.
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
This document introduces data science, big data, and data analytics. It discusses the roles of data scientists, big data professionals, and data analysts. Data scientists use machine learning and AI to find patterns in data from multiple sources to make predictions. Big data professionals build large-scale data processing systems and use big data tools. Data analysts acquire, analyze, and process data to find insights and create reports. The document also provides examples of how Netflix uses data analytics, data science, and big data professionals to optimize content caching, quality, and create personalized streaming experiences based on quality of experience and user behavior analysis.
Web mining uses data mining techniques to extract information from web documents and services. It involves web content mining of page content and search results, web structure mining of hyperlink structures, and web usage mining of server logs to find user access patterns. Data mining techniques like classification, clustering, and association rule mining can be applied to web data to discover useful patterns and information.
Clustering for Stream and Parallelism (DATA ANALYTICS) - Dheeraj Pachauri
The document summarizes information about a group project involving data stream clustering. It lists the group members and then discusses key concepts related to data stream clustering like requirements for algorithms, common algorithm types and steps, prototypes and windows. It also touches on outliers and applications of clustering.
Information technology has led us into an era in which producing, sharing, and using information is part of everyday life, often without our being aware of it: it is now almost impossible not to leave a digital trail of many of our daily actions, for example through digital content such as photos, videos, and blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). In addition, with the "internet of things" we see a growing number of devices such as watches, bracelets, and thermostats that can connect to the network and therefore generate large data streams. This explosion of data explains the emergence of the term Big Data: data produced in large quantities, at remarkable speed, and in different formats, which requires processing technologies and resources far beyond conventional data management and storage systems. It is immediately clear that (1) storage models based on the relational model and (2) processing systems based on stored procedures and grid computation are not applicable in these contexts. Regarding point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when managing big data, variability, that is, the lack of a fixed structure, is also a significant problem. This has driven the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."
These databases are distributed, open source, horizontally scalable, free of a predetermined schema (key-value, column-oriented, document-based, and graph-based), easily replicable, relax the ACID guarantees, and can handle large amounts of data. They are typically integrated with processing tools based on the MapReduce paradigm proposed by Google. MapReduce, together with the open-source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2). The relational model taught in basic database design courses has many limitations compared to the demands posed by new applications, which use NoSQL databases to store data and MapReduce to process large amounts of it.
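The MapReduce paradigm mentioned above can be sketched with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce phases, not a distributed implementation; in Hadoop each phase would run across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all counts emitted for one word."""
    return key, sum(values)

docs = ["big data big value", "big data tools"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts["big"] == 3, counts["data"] == 2
```

The appeal of the model is that map and reduce are independent per key, so the framework can parallelise both phases freely across a cluster.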
Course Website http://pbdmng.datatoknowledge.it/
Contact me for further information and to download the material.
DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics - DATAVERSITY
Data analysis can be divided into descriptive, prescriptive and predictive analytics. Descriptive analytics aims to help uncover valuable insight from the data being analyzed. Prescriptive analytics suggests conclusions or actions that may be taken based on the analysis. Predictive analytics focuses on the application of statistical models to help forecast the behavior of people and markets.
This webinar will compare and contrast these different data analysis activities and cover:
- Statistical Analysis – forming a hypothesis, identifying appropriate sources and proving / disproving the hypothesis
- Descriptive Data Analytics – finding patterns
- Predictive Analytics – creating models of behavior
- Prescriptive Analytics – acting on insight
- How the analytic environment differs for each
Online analytical processing (OLAP) allows users to easily extract and analyze data from different perspectives. It originated in the 1970s and was formalized in 1993, with OLAP cubes organizing numeric facts by dimensions to enable fast analysis. OLAP provides operations like roll-up, drill-down, slice, and dice to analyze aggregated data across multiple systems. It offers advantages over relational databases for consistent reporting and analysis.
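The OLAP operations named above can be illustrated on a toy fact table. The sketch below implements roll-up (aggregating away dimensions) and slice (fixing one dimension to a value) over plain Python dictionaries; the dimensions and sales figures are hypothetical.

```python
from collections import defaultdict

# A toy fact table: (region, product, quarter) -> sales. Names are illustrative.
facts = {
    ("East", "Phone", "Q1"): 100, ("East", "Phone", "Q2"): 120,
    ("East", "Laptop", "Q1"): 80,  ("West", "Phone", "Q1"): 90,
    ("West", "Laptop", "Q1"): 70,  ("West", "Laptop", "Q2"): 60,
}

def roll_up(facts, keep):
    """Roll-up: aggregate sales upward by dropping dimensions not in `keep`."""
    out = defaultdict(int)
    for dims, sales in facts.items():
        out[tuple(d for i, d in enumerate(dims) if i in keep)] += sales
    return dict(out)

def slice_cube(facts, axis, value):
    """Slice: keep only the cells where one dimension equals a fixed value."""
    return {dims: s for dims, s in facts.items() if dims[axis] == value}

by_region = roll_up(facts, keep={0})            # totals per region
q1_only = slice_cube(facts, axis=2, value="Q1")  # the Q1 sub-cube
```

Drill-down is the inverse of roll-up (returning to finer-grained cells), and dice generalises slice to a condition on several dimensions at once.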
The document provides an introduction to data analytics, including defining key terms like data, information, and analytics. It outlines the learning outcomes which are the basic definition of data analytics concepts, different variable types, types of analytics, and the analytics life cycle. The analytics life cycle is described in detail and involves problem identification, hypothesis formulation, data collection, data exploration, model building, and model validation/evaluation. Different variable types like numerical, categorical, and ordinal variables are also defined.
This presentation gives the idea about Data Preprocessing in the field of Data Mining. Images, examples and other things are adopted from "Data Mining Concepts and Techniques by Jiawei Han, Micheline Kamber and Jian Pei "
This document outlines a presentation on web mining. It begins with an introduction comparing data mining and web mining, noting that web mining extracts information from the world wide web. It then discusses the reasons for and types of web mining, including web content, structure, and usage mining. The document also covers the architecture and applications of web mining, challenges, and provides recommendations.
The document discusses frequent itemset mining methods. It describes the Apriori algorithm which uses a candidate generation-and-test approach involving joining and pruning steps. It also describes the FP-Growth method which mines frequent itemsets without candidate generation by building a frequent-pattern tree. The advantages of each method are provided, such as Apriori being easily parallelized but requiring multiple database scans.
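The Apriori candidate generation-and-test loop described above can be sketched in a few lines. This miniature version joins frequent (k-1)-itemsets into k-itemset candidates and prunes by support; it omits the subset-based pruning and other optimisations of the full algorithm, and the basket data is invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Tiny Apriori sketch: generate candidate itemsets level by level
    (join step) and keep only those meeting min_support (prune step)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions containing every item in the itemset.
        return sum(itemset <= t for t in transactions)

    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent, k = {}, 1
    while level:
        level = [c for c in level if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        level = list({a | b for a, b in combinations(level, 2) if len(a | b) == k})
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "eggs"}]
freq = apriori(baskets, min_support=2)
```

Note the multiple passes over `transactions` inside `support`: that repeated scanning is exactly the cost that FP-Growth avoids by compressing the data into a frequent-pattern tree.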
This document discusses big data mining. It defines big data as large volumes of structured and unstructured data that are difficult to process using traditional methods due to their size. It describes the characteristics of big data including volume, variety, velocity, variability, and complexity. It also discusses challenges of big data such as data location, volume, hardware resources, and privacy. Popular tools for big data mining include Hadoop, Apache S4, Storm, Apache Mahout, and MOA. Hadoop is an open source software framework that allows distributed processing of large datasets across clusters of computers. Common algorithms for big data mining operate at the model and knowledge levels to discover patterns and correlations across distributed data sources.
This document provides an overview of application trends in data mining. It discusses how data mining is used for financial data analysis, customer analysis in retail and telecommunications, biological data analysis, scientific research, intrusion detection, and more. It also outlines statistical and visualization techniques used in data mining as well as privacy and security considerations. The document concludes by encouraging the reader to explore additional self-help tutorials on data mining tools and techniques.
This document provides an overview of big data in various industries. It begins by defining big data and explaining the three V's of big data - volume, variety, and velocity. It then discusses examples of big data in digital marketing, financial services, and healthcare. For digital marketing, it discusses database marketers as pioneers of big data and how big data is transforming digital marketing. For financial services, it discusses how big data is used for fraud detection and credit risk management. It also provides details on algorithmic trading and how it crunches complex interrelated big data. Overall, the document outlines how big data is being leveraged across industries to improve operations, increase revenues, and achieve competitive advantages.
The document is a chapter from a textbook on data mining written by Akannsha A. Totewar, a professor at YCCE in Nagpur, India. It provides an introduction to data mining, including definitions of data mining, the motivation and evolution of the field, common data mining tasks, and major issues in data mining such as methodology, performance, and privacy.
It is an introduction to Data Analytics, its applications in different domains, the stages of Analytics project and the different phases of Data Analytics life cycle.
I deeply acknowledge the sources from which I could consolidate the material.
Data mining is an important part of business intelligence and refers to discovering interesting patterns from large amounts of data. It involves applying techniques from multiple disciplines like statistics, machine learning, and information science to large datasets. While organizations collect vast amounts of data, data mining is needed to extract useful knowledge and insights from it. Some common techniques of data mining include classification, clustering, association analysis, and outlier detection. Data mining tools can help organizations apply these techniques to gain intelligence from their data warehouses.
The document discusses frequent pattern mining and the Apriori algorithm. It introduces frequent patterns as frequently occurring sets of items in transaction data. The Apriori algorithm is described as a seminal method for mining frequent itemsets via multiple passes over the data, generating candidate itemsets and pruning those that are not frequent. Challenges with Apriori include multiple database scans and large number of candidate sets generated.
Why BI?
Performance management
Identify trends
Cash flow trend
Fine-tune operations
Sales pipeline analysis
Future projections
Business forecasting
Decision Making Tools
Convert data into information
How to Think?
What happened?
What is happening?
Why did it happen?
What will happen?
What do I want to happen?
Business intelligence (BI) systems allow companies to gather, store, access, and analyze corporate data to aid in decision-making. These systems illustrate intelligence in areas like customer profiling, market research, and product profitability. A hotel franchise uses BI to compile statistics on metrics like occupancy and room rates to analyze performance and competitive position. Banks also use BI to determine their most profitable customers and which customers to target for new products.
Customer analytics. Turn big data into big valueJosep Arroyo
BIRT Analytics is a customer analytics solution that allows companies to gain valuable insights from big data. It integrates data from multiple sources, analyzes large volumes of data, and provides clear and granular customer information. Tools allow users to explore data, identify patterns, profile customers, and forecast trends. Advanced analytics help optimize marketing, identify cross-sell opportunities, and understand customer behavior. The solution aims to help companies understand customer needs and adapt strategies based on real customer data.
This document discusses using Microsoft Excel 2013 and Microsoft Access to create an offers bank decision support system (DSS). It proposes a 4 phase approach: 1) Create a database and star schema using Access, 2) Fill the database with data by defining dimensions and measures and retrieving data in Excel, 3) Create a dashboard in Excel, 4) Analyze past trends and predict future trends using data mining. The document also provides background on business intelligence solutions and reviews literature on using BI to turn raw data into meaningful business insights.
Data Architecture Process in a BI environmentSasha Citino
The document discusses the role of data architects in a business intelligence (BI) environment. It begins with an introduction to the author and their experience. It then provides an overview of what BI is and how it relates to data warehousing. The main roles and responsibilities of data architects are then outlined, including dimensional modeling, integration design, and defining architecture standards. Finally, it describes the typical steps in the data architecture process, from requirements gathering to data profiling, conceptual modeling, and physical design.
1. The document discusses Business Intelligence and analytics using Oracle BI Foundation Suite. It provides an overview of the different components, capabilities, and features of Oracle BI including the BI Server, presentation layer, data warehousing, ETL processes, and end users.
2. It describes the different modules of Oracle BI including dashboards, KPIs, reports, predictive analysis, and graphical OLAP. It also discusses the hardware and software components needed for a complete Oracle BI solution.
3. Screenshots are provided showing how to create a database connection in Oracle BI, indicating how users can access and work with data through the presentation layer.
This document provides an agenda and overview for a data warehousing training session. The agenda covers topics such as data warehouse introductions, reviewing relational database management systems and SQL commands, and includes a case study discussion with Q&A. Background information is also provided on the project manager leading the training.
Business intelligence environments involve collecting data from various sources, transforming and organizing it using tools like ETL, and storing it in data warehouses or marts. This data is then analyzed using OLAP and reporting tools to provide useful information for business decisions. Setting up an effective BI environment requires understanding business requirements, defining processes, determining data needs, integrating data sources, and selecting appropriate tools and techniques. Careful planning and skilled people are needed to ensure the BI environment supports organizational goals.
The document provides an overview of data warehousing and data mining. It discusses what a data warehouse is, how it is structured, and how it can help organizations make better decisions by integrating data from multiple sources and facilitating online analytical processing (OLAP). It also covers key components of a data warehousing architecture like the data manager, data acquisition, metadata repository, and middleware that connect the data warehouse to operational databases and analytical tools.
The document provides an overview of data warehousing, decision support, online analytical processing (OLAP), and data mining. It discusses what data warehousing is, how it can help organizations make better decisions by integrating data from various sources and making it available for analysis. It also describes OLAP as a way to transform warehouse data into meaningful information for interactive analysis, and lists some common OLAP operations like roll-up, drill-down, slice and dice, and pivot. Finally, it gives a brief introduction to data mining as the process of extracting patterns and relationships from data.
[Notes] Customer 360 Analytics with LEO CDPTrieu Nguyen
Part 1: Why should every business need to deploy a CDP ?
1. Big data is the reality of business today
2. What are technologies to manage customer data ?
3. The rise of first-party data and new technologies for Digital Marketing
4. How to apply USPA mindset to build your CDP for data-driven business
Part 2: How to use LEO CDP for your business
1. Core functions of LEO CDP for marketers and IT managers
2. Data Unification for Customer 360 Analytics
3. Data Segmentation
4. Customer Personalization
5. Customer Data Activation
Part 3: Case study in O2O Retail and Ecommerce
1. How to build customer journey map for ecommerce and retail
2. How to do customer analytics to find ideal customer profiles
The ideal customer profile in a B2B context
The ideal customer profile in a B2C context
3. Manage product catalog for customer personalization
4. Monitoring Data of Customer Experience (CX Analytics)
CX Data Flow
CX Rating plugin is embedded in the website, to collect feedback data
An overview of CX Report
A CX Report in a customer profile
5. Monitoring data with real-time event tracking reports
Event Data Flow
Summary Event Data Report
Event Data Report in a Customer Profile
Part 4: How to setup an instance of LEO CDP for free
1. Technical architecture
2. Server infrastructure
3. Setup middlewares: Nginx, ArangoDB, Redis, Java and Python
Network requirements
Software requirements for new server
ArangoDB
Nginx Proxy
SSL for Nginx Server
Java 8 JVM
Redis
Install Notes for Linux Server
Clone binary code for new server
Set DNS hosts for LEO CDP workers
4. Setup data for testing and system verification
Part 5: Summary all key ideas
Data science is being applied to solve a wide variety of problems across many industries. It uses techniques from many fields like statistics, machine learning, and data mining to analyze large amounts of data and extract useful insights. While technical skills are important for data scientists, soft skills like communication, collaboration, and problem solving are also critical for effectively applying data science and ensuring business value. Many organizations are now using data science for applications like customer segmentation, predictive modeling, marketing attribution, and performance management.
The document discusses Magento Business Intelligence and how it helps merchants overcome common data and analytics challenges. It provides an overview of MBI's platform capabilities like data connection, consolidation, transformation, warehousing, analysis and visualization. It also outlines the Essentials and Pro tiers, included features, pricing and examples of how MBI has helped companies like Truly Experiences and Guideboat improve marketing ROI and identify qualified leads through data-driven insights.
MarketView Marketing Database Platform | Data Services, Inc.Data Services, Inc.
Data Services' MarketView Data Management & Analytics Platform provides direct & data-driven marketers with a 360 degree view of their US & int'l marketing databases with advanced tools for database segmentation, customer/data analytics, data visualization, business intelligence, campaign management, cross-product analysis, marketing channel affinity reporting as well as a seamless connection to Data Services DSIemail Broadcasting Platform as well as integration with 3rd party platforms for CRM, ESP, eCommerce, Marketing Automation and more, all in a seamless online platform requiring no software or application to download.
Business intelligence- Components, Tools, Need and Applicationsraj
As part of the research project for the course Technical Foundations of Information Systems at the University of Illinois, our team worked on the topic, Business Intelligence. The presentation focuses on what is Business Intelligence, its various components, latest tools, the need of BI as well as applications of this technology. This project deals with the latest development of BI technologies (hardware or software) and includes comprehensive literature survey from Journals, and the Internet.
Analytics & Data Strategy 101 by Deko DimeskiDeko Dimeski
- Understand why each company needs solid analytics and data strategy & capabilities
- Typical data problems each company experiences, regardless of the scale
- Core competences and roles
- Analytics products and artefacts
- Analytics Usecases
Business intelligence (BI) refers to technologies and applications used to analyze data and present information to help corporate executives, managers, and other business users make informed decisions. Examples of how BI is used include hotels analyzing occupancy rates and revenue, banks determining profitable customers, and telecom companies providing targeted data access. BI provides insights into customer behavior, market trends, internal operations, and more to support strategic decision-making. The future of BI includes greater use of real-time analysis to provide up-to-date insights for time-sensitive business decisions.
The document proposes a finger gesture-based rating system using computer vision and cloud computing. The system would allow customers to provide ratings for products and services by holding up a corresponding number of fingers, from 1 to 5. Computer vision techniques would recognize the gesture and record the rating in a collective database in the cloud. This universal database could then provide aggregated rating data across multiple companies for improved analytics. The system aims to provide a more efficient and engaging way for customers to submit feedback compared to traditional rating methods.
This document is about Data Warehouse Tools such as:
OLAP (On – line Analytical Processing)
OLTP (On – Line Transaction Processing)
Business Intelligence
Driving Force
Data Mart
Meta Data
1. IS - 6410 - System Analysis and Design
Group Project 2
Divya Bhatia
Poojya Reddy
Aditya Ekawade
Siddharth Suresh
Aditya Kannan
2. IS6410- Analysis & Design Customer Segmentation Report
Team Organisation Report
Team Member | Skill Set | IT Interest Areas
Aditya Ekawade | Web technologies (HTML, JavaScript, React, PHP, Java), UI, SEO | Web Development, Digital Marketing
Siddharth Suresh | IT Security, R, Statistics, Data Visualization | Data Analytics, Business Intelligence
Divya Bhatia | Software Automation, R, Data Visualization, Statistics | Data Science
Poojya Reddy | Scripting, DevOps, Build Engineering, Business Analysis (Technical + Functional) | DevOps Development, Digital Marketing and Analytics
Aditya Kannan | Java, MySQL, Hadoop Ecosystem, Power BI | Data Engineering, Data Warehousing, Consulting
Scrum Role | Team Member
Scrum Master | Aditya Kannan
Product Owner | Aditya Ekawade, Poojya Reddy
Developers | Divya Bhatia, Siddharth Suresh
Table of Contents

Project Selection And Requirements Analysis Report
Executive Summary
Detailed Requirements
High Level Scope Definition
Use Case Diagram
Use case narratives
Project Plan
Work Breakdown Structure
GANTT Chart
CoCoMo Estimation
Burndown Chart
Sprint Planning
Analysis Document
Logical Entity Relation Diagram
Data Flow Diagram
DFD Level 0
Activity Diagram
CRUD Matrix
Buy vs Build Analysis
Design and Prototype Document
Architecture/Platform Choices
Data Storage Platform
Data Processing Platform
Physical Entity Relation Diagram
Physical Data Flow Diagram
Mock-ups
References
Project Selection And Requirements Analysis Report
Executive Summary
Since our inception five years ago, our company, Trendzzz4u.com, has lived by one motto: a flawless vision for what is upcoming in fashion. We strive to exceed customer expectations at every step of the shopping journey on our website, and this loyalty has taken us from a small-scale, part-time online retailer to a mid-tier e-commerce retailer.
Our website currently offers 15,000+ products in clothing and accessories for men and women. With a planned business expansion to 40,000+ products through strategic partnerships with suppliers over the next two years, scalability in managing our website data is the biggest challenge we will face.
Our in-house analytics department currently runs an in-house data warehouse that consumes data from our inventory and CRM systems. Using this warehouse, our product managers obtain actionable insights and make decisions based on weekly reports. The current size of the warehouse is 2 TB. With the targeted increase in the product catalog, we expect data growth of close to 10 TB per year. If we continue with our current data warehouse approach, integration with the supplier source systems will be a problem, and working on them independently will create many data silos.

We would also restrict ourselves to working only on lagging data, since it is difficult to apply modern statistical analysis such as association rule mining and classification directly on the data warehouse. This would prevent us from tracking user buying and browsing patterns, working with unstructured data, and performing customer segmentation on that data. With the changing dynamics in analytics, we need to shift our existing data warehouse to highly scalable cloud storage such as Amazon S3 and build a data lake for analysis. ETL processing should be replaced with modern MapReduce algorithms or agile, in-memory, open-source data processing frameworks such as Apache Spark or Kafka. Separating storage and compute is essential with such a huge influx of data.
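The shift from ETL to MapReduce-style processing described above can be illustrated with a toy example. Below is a minimal sketch of the map/shuffle/reduce pattern in plain Python; the event records are invented for illustration, and in the proposed system this logic would run on a framework such as Apache Spark over data in Amazon S3 rather than in a single process:

```python
from collections import defaultdict

# Toy clickstream records; in the proposed system these would be
# read from the S3 data lake rather than hard-coded.
events = [
    {"user": "u1", "page": "/shoes"},
    {"user": "u2", "page": "/shoes"},
    {"user": "u1", "page": "/bags"},
    {"user": "u3", "page": "/shoes"},
]

# Map step: emit a (page, 1) pair for every event.
mapped = [(e["page"], 1) for e in events]

# Shuffle/reduce step: group by key and sum the counts per page.
page_views = defaultdict(int)
for page, count in mapped:
    page_views[page] += count

print(dict(page_views))  # {'/shoes': 3, '/bags': 1}
```

On Spark, the same computation would be expressed with `map` and `reduceByKey` over a distributed dataset, which is what allows storage (S3) and compute to scale independently.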
By performing customer segmentation, the following three objectives can be achieved with the implementation of this new analytics system:
1. Track the difference between loyal customers and visitors, and perform heat-map analysis of their browsing patterns.
2. Understand customer demographics and focus on highly profitable segments.
3. Empower our Marketing department to make better strategic decisions on online ads and campaigns.
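As a sketch of objective 1, loyal customers can be separated from visitors with simple rules once the relevant metrics are computed per customer. The field names and thresholds below are illustrative assumptions, not final requirements; the real cutoffs would come out of the segmentation analysis:

```python
def segment_customer(orders_last_year: int, sessions_last_month: int) -> str:
    """Assign a coarse segment label to one customer record.

    Thresholds here are placeholders for illustration only.
    """
    if orders_last_year >= 5:
        return "loyal"
    if orders_last_year >= 1:
        return "occasional"
    if sessions_last_month > 0:
        return "visitor"  # browses but has not bought
    return "inactive"

print(segment_customer(orders_last_year=8, sessions_last_month=2))  # loyal
print(segment_customer(orders_last_year=0, sessions_last_month=4))  # visitor
```

In practice, this rule-based pass would complement the statistical segmentation (e.g. clustering) performed by the analysts.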
End users for our new system would be:
1. Marketing Department users
2. Product Managers
3. Data Analysts
High Level Scope Definition
User Stories and Acceptance Criteria

User story: As an Analyst, I want to load data from the database so that I can analyse it.
Acceptance criteria: Data is available in the database. The analyst has the correct credentials and access level for the database.

User story: As an Analyst, I want to analyse the data so that I can segregate it into different customer segments.
Acceptance criteria: Data is loaded from the database.

User story: As an Analyst, I want to clean the data so that the data is made consistent.
Acceptance criteria: Data is loaded from the database. The data may be structured or unstructured, and either can be cleaned.

User story: As an Analyst, I want to segment the data so that the marketing team can use these segments to lay out different marketing strategies.
Acceptance criteria: The data has enough segments and variety to be broken down. Marketing strategies are created based on the segments identified.

User story: As a Marketing Team member, I want to pull reports based on segments so that I can lay out different marketing strategies.
Acceptance criteria: Data is available per segment for reports to be created. Identified segments can be mapped to different strategies.

User story: As a Marketing Team member, I want to identify different customer segments so that each segment can be handled with a different promotional strategy.
Acceptance criteria: The data has different segments and variety. Identified segments can be mapped to different promotional strategies.
User story: As a Marketing Team member, I want to track campaigns so that I will know which ones have reached their goal.
Acceptance criteria: Data is available for the customers who have interacted with the various campaigns.

User story: As a Marketing Team member, I want to send various promotions to customers so that more customers are acquired.
Acceptance criteria: The marketing team has access to send promotions.

User story: As a Customer, I want to receive promotions so that I can avail of them.
Acceptance criteria: The customer has internet access to receive the various forms of promotions.

User story: As a Customer, I want to interact with the campaigns so that I can accept the promotion.
Acceptance criteria: The customer receives promotions.
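The "clean the data" story above can be sketched as follows. The specific rules shown (trimming whitespace, lower-casing the key, dropping empty and duplicate records) are assumptions about what cleaning will involve, chosen only to illustrate the flow:

```python
def clean_records(records):
    """Normalise raw customer records and drop unusable ones."""
    seen = set()
    cleaned = []
    for rec in records:
        email = rec.get("email", "").strip().lower()
        if not email or email in seen:
            continue  # skip records with no usable key, and exact duplicates
        seen.add(email)
        cleaned.append({"email": email, "name": rec.get("name", "").strip()})
    return cleaned

raw = [
    {"email": " A@x.com ", "name": "Ann "},
    {"email": "a@x.com", "name": "Ann"},   # duplicate after normalisation
    {"email": "", "name": "no key"},       # unusable record
]
print(clean_records(raw))  # [{'email': 'a@x.com', 'name': 'Ann'}]
```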
Use Case Diagram
Use case narratives
Narrative - 1
Use case name: Analyze Data
Last revised: March 13, 2017, by Poojya Reddy and Aditya Kannan
Description (purpose): This use case describes how data is analyzed.
Actors: Analyst
Pre-condition: Data is loaded from the database.
Post-condition: Cleaned data along with customer segments.
Other business rules: none.
Basic success flow:
1. Analyst has access to the data loaded from the database.
2. As part of the data analysis, the analyst first cleans the data.
3. After data cleaning, customer segments are created which can be used to identify different customers.
Variations in success flows:
2. Data loaded from the database is already clean.
3. Data is insufficient to create segments, there are too few data points, or one particular segment dominates the dataset.
Alternate paths (extensions/exceptions):
1. a1 Data is not loaded correctly from the database.
   a2 Analyst cannot access the data.
   b1 Analyst does not have the correct access level to view the data.
   b2 Analyst cannot access the data.
2. a1 Data cleaning fails due to inconsistent data, junk values, too few data points, etc.
3. a1 There are too few data points to create customer segments, or the data set is of only one particular type.
   a2 Use case terminates and needs to be restarted.
Related use cases: Clean Data, Customer Segmentation
Narrative - 2
Use case name: Load Data
Last revised: March 13, 2017, by Divya Bhatia and Siddharth Suresh
Description (purpose): This use case describes how the data required for analysis can be loaded from the database.
Actors: Analyst, AWS system
Pre-condition: An existing database and valid credentials for the analyst.
Post-condition: Data is loaded from the database.
Other business rules: none.
Basic success flow:
1. Analyst logs into the database with valid credentials.
2. Database validates the user's credentials and access type, and allows the analyst to log in.
3. Analyst can view the data and load it into memory (via various data source systems such as CRM, operational systems, and external data providers) to work on it.
Variations in success flows:
1. Credentials can be of various types, such as Administrator, User, or Team access.
3. Connect the database to external sources.
Alternate paths (extensions/exceptions):
1. a1 Credentials entered are incorrect, which does not allow the analyst to log in.
   a2 Loading the database fails.
   a3 Analyst is redirected to the login page.
2. a1 Credentials have a different access level than required, which does not allow the analyst to log in.
   a2 Loading the database fails.
   a3 Analyst is redirected to the login page.
3. a1 Loading the database fails.
   a2 Use case terminates and needs to be restarted.
Related use cases: (none listed)
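A toy walk-through of the load-data flow above, using an in-memory SQLite database in place of the real warehouse and a hard-coded access table in place of the real credential system (both are stand-ins for illustration, not the planned implementation):

```python
import sqlite3

# Stand-in for the real access-control system.
ACCESS_LEVELS = {"analyst1": "read"}

def load_data(user: str):
    """Validate access, then load the customers table into memory."""
    if ACCESS_LEVELS.get(user) != "read":
        # Alternate paths 1 and 2: bad or insufficient credentials.
        raise PermissionError(f"{user} lacks read access")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Delhi"), (2, "Pune")])
    rows = conn.execute("SELECT id, city FROM customers ORDER BY id").fetchall()
    conn.close()
    return rows

print(load_data("analyst1"))  # [(1, 'Delhi'), (2, 'Pune')]
```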
Narrative - 3
Use case name: Identify Segments
Last revised: March 13, 2017, by Aditya Kannan and Aditya Ekawade
Description (purpose): This use case describes how segments can be identified from a marketing perspective.
Actors: Marketing team
Pre-condition: Marketing team has access to reports created by the analyst.
Post-condition: Customer segments identified by the marketing team.
Other business rules: none.
Basic success flow:
1. Marketing team has access to reports created by the analyst.
2. Segments are identified based on the reports created by the analyst.
Variations in success flows:
1. Reports have insufficient data.
2. Data is insufficient to create segments, there are too few data points, or one particular segment dominates the dataset.
Alternate paths (extensions/exceptions):
1. a1 Marketing team does not have access to reports created by the analyst.
   a2 Marketing team cannot access the reports.
2. a1 There are too few data points to create customer segments, or the data set is of only one particular type.
   a2 Use case terminates and needs to be restarted.
Related use cases: (none listed)
Narrative - 4
Use case name: Pull Reports
Last revised: March 13, 2017, by Divya Bhatia and Siddharth Suresh
Description (purpose): This use case describes how the marketing team can pull reports created by the analyst.
Actors: Marketing team, AWS system
Pre-condition: Marketing team has access to reports created by the analyst.
Post-condition: Reports can be viewed by the marketing team.
Other business rules: none.
Basic success flow:
1. Marketing team has access to reports created by the analyst.
2. Marketing team can view and make edits to the reports.
3. Data for the reports is pulled from the AWS system.
Variations in success flows:
1. Reports have no data.
Alternate paths (extensions/exceptions):
1. a1 Marketing team does not have access to reports created by the analyst.
   a2 Marketing team cannot access the reports.
2. a1 Marketing team cannot make edits or use filters on the reports.
   a2 Use case terminates and needs to be restarted.
3. a1 AWS system is down and data cannot be pulled.
   a2 Use case terminates and needs to be restarted.
Related use cases: (none listed)
Narrative -5
Use case name (should describe the goal- active
verb)
Interacts with campaign
Last revised March 13, 2017 by Divya Bhatia
March 13, 2017 by Poojya Reddy
Description (purpose) This use case describes the
interaction of customer with a
campaign.
Actors (that could invoke use case) Customer
Pre-condition Customer received a promotion from
the marketing team.
Post-condition Customer interacted with the
promotion.
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team sends promotions to the customer.
2. Customer responds to the promotion.
3. The marketing team tracks the customer's interaction with the promotion and compares it
against the campaign goals set by the team.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. Customer does not respond to the promotion.
2. Marketing team sends multiple promotions to the same customer.
Alternate paths (Extensions/ Exceptions)
2. a1 Customer does not interact with the promotions sent.
   a2 Use case terminates.
3. a1 No interaction by the user results in no data generation, hence the marketing team
cannot track the campaign.
List Related use case names Track Campaigns
Narrative -6
Use case name (should describe the goal- active
verb)
Send Promotions
Last revised March 13, 2017 by Siddharth Suresh
March 13, 2017 by Aditya Ekawade
Description (purpose) This use case describes the types of
promotions the marketing team sends.
Actors (that could invoke use case) Marketing team
Pre-condition Marketing team has access to send
promotions.
Post-condition Marketing team sends promotions.
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team sends various forms of promotions, such as emails, loyalty programs,
coupons, social media ads, and paid ads.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. Team sends only emails or loyalty program promotion to the customer.
2. Team sends coupons and media ads to the user based on interactions with the
campaigns.
Alternate paths (Extensions/ Exceptions)
1. a1 Marketing team is unable to gather any data about customers and no promotions
are sent.
a2. Use case terminates.
List Related use case names Email marketing
Loyalty program
Send Coupon
Social Media
Display/Paid Ads
Narrative -7
Use case name (should describe the goal- active
verb)
Track Campaigns
Last revised March 13, 2017 by Poojya Reddy
March 13, 2017 by Aditya Ekawade
Description (purpose) This use case describes how
marketing team can track campaigns.
Actors (that could invoke use case) Marketing team
Pre-condition NA
Post-condition Marketing team could successfully
track campaigns
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team tracks each campaign that users interact with.
2. Tracked campaigns are compared against the goals set for the campaign.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. No user interacts with the campaign.
Alternate paths (Extensions/ Exceptions)
1. a1 There is no data to track and compare with the expected goals as no user interacts
with the campaign.
   a2. Use case terminates.
2. a1 There are no expected goals for comparison.
List Related use case names Interacts with campaigns
Goals completed
Narrative -8
Use case name (should describe the goal-
active verb)
Send Coupons
Last revised March 13, 2017 by Divya Bhatia
March 13, 2017 by Aditya Kannan
Description (purpose) This use case describes the interaction of
the marketing team and the customer with a coupon.
Actors (that could invoke use case) Marketing team,Customer
Pre-condition Marketing team has access to send
promotions; customer can receive
promotions.
Post-condition Marketing team sends promotions via
coupons.
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team sends promotions via coupons.
2. Customer responds to the promotional coupon, either by using it or by requesting updates
about it.
3. The marketing team tracks the customer's interaction with the coupon and compares it
against the campaign goals set by the team.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. Customer does not respond to the promotional coupon.
2. Marketing team sends multiple promotions to the same customer.
Alternate paths (Extensions/ Exceptions)
1. a1 Marketing team does not send any promotions.
a2. Use case terminates.
2. a1 Customer does not interact with the promotions sent.
a2 Use case terminates.
List Related use case names Send Promotions
Project Plan
Work Breakdown Structure
A WBS is a hierarchical and incremental decomposition of the project into phases, deliverables,
and work packages. It is a tree structure showing the subdivision of effort required to achieve
an objective, for example a program, project, or contract. [2] In a project or contract, the WBS is
developed by starting with the end objective and successively subdividing it into components
that are manageable in terms of size, duration, and responsibility (e.g., systems, subsystems,
components, tasks, subtasks, and work packages), which together include all steps necessary
to achieve the objective.
The diagram below shows the WBS of the entire customer segmentation project. The project is
divided into 5 modules:
1. Customer Survey
2. Create E-Commerce Website
3. Set Hadoop Environment
4. Data Engineering
5. Analyze Data & Reporting
Customer Survey: The main focus of this module is to prepare, send, and analyze
questionnaires for potential customers. The questionnaires are designed to capture the
demographics of respondents and the types of devices they use. The purpose of this phase is to
use this data to estimate the success rate of reaching potential customers with
targeted promotions.
Create E-Commerce Website: This module includes searching for and acquiring an
e-commerce website template that is readily available in the market, deciding between cloud
and web hosting (web hosting was chosen for our project), purchasing a web domain, installing
the e-commerce template on the server, getting the website up and running, and finally
generating the website logs.
Set Hadoop Environment: The operations during this phase include creating login credentials
in AWS, purchasing the EMR and S3 services, installing the necessary software on EC2, and
finally testing the Hadoop clusters.
Data Engineering: The Data Engineering phase is responsible for ingesting the log data
from the web server into the EMR node clusters, converting the unstructured data
into structured data using MapReduce, and storing the structured data in a
relational database.
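The map step of this conversion can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the web server emits Apache combined-log-format lines, and the field names and sample line below are hypothetical.

```python
import re

# Assumed log layout: Apache combined log format (an assumption, not confirmed
# by the project); the sample line below is fabricated for illustration.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def map_log_line(line):
    """Map step: turn one unstructured log line into a structured record."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed lines are dropped by the mapper
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    return record

sample = ('203.0.113.7 - - [13/Mar/2017:10:12:01 -0600] '
          '"GET /products/shoes HTTP/1.1" 200 5120')
print(map_log_line(sample))
```

In an actual EMR job the same parsing logic would run inside the mapper, with the reducer aggregating the structured records before they are written to the relational database.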
Analyze Data & Reporting: This is the final phase of the project, which helps the marketing team
create targeted promotions. The data is loaded from the relational database for the analysts to
perform data analysis and identify the various customer segments. The identified customer
segments are provided to the marketing team in the form of reports. The marketing team then
performs its own analysis and comes up with campaign strategies and targeted promotions.
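As an illustration of the segmentation step, the sketch below clusters customers on a single behavioral feature with a minimal one-dimensional k-means. The feature (monthly spend) and its values are hypothetical; a production analysis would use richer features and a library implementation.

```python
def kmeans_1d(values, k, iterations=20):
    """Minimal 1-D k-means (assumes k >= 2): returns centroids and clusters."""
    values = sorted(values)
    # Seed centroids with evenly spaced points from the sorted data.
    centroids = [values[i * (len(values) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer: a low-spend and a high-spend segment.
spend = [12, 15, 18, 200, 210, 220, 14, 205]
centroids, clusters = kmeans_1d(spend, k=2)
print(centroids)  # one centroid per identified segment
```

Each cluster then becomes a customer segment that the analyst can describe in a report for the marketing team.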
GANTT Chart
A GANTT chart is a good way to keep track of the various activities undertaken during the
project. However, we are restricting our chart to the planning phase only, which is the entire
endeavor of the class project.
CoCoMo Estimation
Based on the definitions of each of the development modes, we have classified our
project as a semi-detached project: a software project that is intermediate in
both size and complexity. Our team consists of individuals with mixed experience levels,
and our project deals with a mix of rigid and less-than-rigid requirements.
The equations for the Effort (E) and Development Time (D) in this model are:
E = 3.0 * (KLOC)^1.12    D = 2.5 * (E)^0.35
Simple Average Complex
Inputs: Member Login 3 6
Member registration 3
Outputs: Send Promotions 4 4
Inquiries: Pull reports 3 37
Analyze Data 10
Identify Segments 8
Track Campaigns 8
Interacts with campaigns 8
Files: Reports 8 8
Interfaces: Application server to database 10 20
User to application server 10
Total 75
Calculating the Adjusted Function Points -
The adjusted function point count, denoted FP, is given by the formula:
FP = total UFP * (0.65 + (0.01 * Total complexity adjustment value)), or
FP = total UFP * (Complexity adjustment factor)
The total complexity adjustment value is computed from responses to questions called
complexity weighting factors, shown in the table below:
Table: Adjusted Function Points
Number  Complexity Weighting Factor               Value
1       Backup and recovery                       2
2       Data communications                       2
3       Distributed processing                    2
4       Performance critical                      5
5       Existing operating environment            4
6       Online Data Entry                         3
7       Input transaction over multiple screens   1
8       Master files updated online               3
9       Information domain values complex         5
10      Internal processing complex               4
11      Code designed for reuse                   5
12      Software Deployment                       4
13      Application designed for change           4
Total complexity adjustment value                 44
Calculating the Source Lines of Code (SLOC) -
· Total Unadjusted Function Points (UFP) = 75
· Product Complexity Adjustment (PC) = 0.65 + (0.01 * 44) = 1.09
· Total Adjusted Function Points (FP) = UFP * PC = 75 * 1.09 = 81.75
· Language Factor (LF) for the programming languages used, assumed as = 25
· Source Lines of Code (SLOC) = FP * LF = 81.75 * 25 = 2043.75
Estimating the Effort and Development Time -
The programmer productivity and the development time are as follows:
· KDSI = 2.044 KLOC
· Effort = 3 * (2.044)^1.12 = 6.68 person-months
· Development Time = 2.5 * (6.68)^0.35 = 4.86 months
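The full estimation chain can be checked with a short script. It applies the standard adjustment formula VAF = 0.65 + 0.01 × (total complexity adjustment value), which for a value of 44 works out to 1.09, together with the semi-detached COCOMO coefficients quoted above.

```python
def cocomo_semidetached(ufp, tca, language_factor):
    """COCOMO effort/schedule estimate from function points (semi-detached mode)."""
    vaf = 0.65 + 0.01 * tca        # value adjustment factor
    fp = ufp * vaf                 # adjusted function points
    sloc = fp * language_factor    # estimated source lines of code
    kloc = sloc / 1000.0
    effort = 3.0 * kloc ** 1.12    # effort in person-months
    duration = 2.5 * effort ** 0.35  # development time in months
    return fp, sloc, effort, duration

fp, sloc, effort, duration = cocomo_semidetached(ufp=75, tca=44, language_factor=25)
print(f"FP={fp:.2f} SLOC={sloc:.0f} Effort={effort:.2f} pm Time={duration:.2f} months")
```

The language factor of 25 is the project's own assumption for the mix of languages used; swapping in a different factor only rescales the SLOC and effort figures.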
Burndown Chart
After understanding the scope of the project, we estimated the deliverables of the class project
to be equivalent to 90 hours of work, with 2 hours of work to be completed daily, thereby
completing the project in 45 days.
The burndown chart below shows the rate of work completed from inception to completion.
CRUD Matrix:

Entities \ Processes   Load Data   Perform Data Analysis   Build Customer Segmentation Dashboard   Build Strategy System
Data Lake              R           R                       R                                       R
Reports                R           CRUD                    CRUD                                    RU
Campaign Log file      R           R                       R                                       CRUD
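The matrix can also be expressed as a small lookup structure, which makes the access rules easy to enforce in code. The process and entity names come from the matrix above; the helper function itself is an illustrative sketch, not part of the designed system.

```python
# CRUD matrix from the design: entity -> process -> allowed operations.
CRUD_MATRIX = {
    "Data Lake": {
        "Load Data": "R", "Perform Data Analysis": "R",
        "Build Customer Segmentation Dashboard": "R", "Build Strategy System": "R",
    },
    "Reports": {
        "Load Data": "R", "Perform Data Analysis": "CRUD",
        "Build Customer Segmentation Dashboard": "CRUD", "Build Strategy System": "RU",
    },
    "Campaign Log file": {
        "Load Data": "R", "Perform Data Analysis": "R",
        "Build Customer Segmentation Dashboard": "R", "Build Strategy System": "CRUD",
    },
}

def may(process, entity, operation):
    """True if the process may apply the C/R/U/D operation to the entity."""
    return operation in CRUD_MATRIX[entity][process]

print(may("Perform Data Analysis", "Reports", "U"))  # analysts may update reports
print(may("Load Data", "Reports", "D"))              # the loader may not delete them
```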
Buy vs Build Analysis
For our project, we need 4 machines, each with a minimum of 8 GB RAM, running to process our
website logs. An in-house cluster setup would increase maintenance costs, and for big data
processing scalability is the biggest worry, as we never know the size of the incoming data. So
after careful analysis and meetings with the current IT systems team and stakeholders, we have
decided to go ahead with the buy option.
Amazon Web Services (AWS) offers EMR (Elastic MapReduce), an on-cloud Hadoop framework
for processing vast amounts of data in a cost-effective and fast way. EMR provides an option
to scale nodes and clusters dynamically. AWS also offers 99.99% uptime, and any cluster can
be spun up in under 2 minutes. We calculated the estimated cost of using the EC2 and EMR
services with the AWS calculator; the cost is around $60 per month. Below is the snapshot from
the AWS calculator.
Further, if we need to separate computing and storage as we progress with big data, we can opt
for Amazon S3 for cloud storage and create a data pipeline between S3 and Amazon EMR.
The cost of using S3, as per the AWS calculator, is $266 for storing 10 TB of data.
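The S3 figure can be sanity-checked with simple arithmetic. The per-GB rate below is an assumption based on 2017-era S3 standard-storage pricing; actual AWS rates vary by region and date.

```python
# Assumed rate: S3 standard storage, ~$0.026 per GB-month (2017-era, region-dependent).
RATE_PER_GB_MONTH = 0.026

def s3_monthly_cost(terabytes, rate=RATE_PER_GB_MONTH):
    """Estimated monthly S3 storage cost for the given data volume."""
    return terabytes * 1024 * rate  # 1 TB = 1024 GB

print(f"${s3_monthly_cost(10):.2f}")  # roughly $266 for 10 TB, matching the calculator
```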
Design and Prototype Document
Architecture/ Platform Choices
1. The above diagram depicts our ‘to-be’ system for applying customer segmentation.
2. The process would start by generating the logs from our website (trendzzz4u.com).
The logs would consist of clickstream data and browsing data.
3. Using the logs generated, the data would be ingested into the AWS cloud for data
processing.
4. AWS would be the infrastructure platform for deploying, processing, and applying
analytics on the log data.
5. Unstructured data would be converted to a structured format for data analysis.
Data Storage Platform:
1. Amazon S3.
2. Amazon EFS
3. MySQL DB Instance
Data Processing Platform:
Amazon EMR: A comprehensive Hadoop package provided by Amazon, consisting of
Hive, Sqoop, Flume, MapReduce, and HBase. This is the main processing engine for our
application; the business logic would reside here.
Mock-ups
The diagrams below depict the mock-up screens of the dashboards for the Analyst and the
Marketer. The diagrams cover the following use cases:
● Analyzing data and creating reports by the Analyst.
● Pulling the reports, sending promotions, and tracking the campaigns by the Marketer.
These UI mock-ups were designed using Adobe Experience Design (XD), focusing on the
principles of utility and usability.
The dashboards will be created in such a way that the Analysts and Marketers can spend more
time doing what they do best and less time learning these interfaces.
Mockup screen for data analysis.
References:
1. For a general understanding of all concepts - Dr. Ramachandran, Vandana, all the lecture
slides.
2. For all references regarding services offered by Amazon -
https://aws.amazon.com/ , accessed February 10, February 17, March 13, March 14, and March 15, 2017.
3. To understand the writing style in the executive summary - Faulkner, Jennifer, published
September 17, 2015, https://www.proposify.biz/blog/executive-summary , accessed March 18,
2017.
4. To estimate CoCoMo -
http://people.cs.ksu.edu/~padmaja/Project/CostEstimate.htm , accessed March 19, 2017.
5. For use case narratives and high-level scope definition -
Dr. Ramachandran, Vandana, s3_IS6410-Requirements.pptx, 23rd January 2017.
Tools used:
6. For all diagrams (use case, ERDs, DFDs, software architecture, WBS) -
https://www.lucidchart.com/documents#docs?folder_id=home&browser=icon&sort=saved-desc
7. For creating UI mockups -
the design for the header on the Analyst's dashboard is based on Power BI
( https://powerbi.microsoft.com/en-us/ ), and the software used was Adobe XD.