1. The data cleaning process involved fixing incorrect negative values and missing data in tables. Queries were run to identify issues and make corrections.
2. An assessment of total sales showed overall sales of $84 million with profits of $9.4 million. Highest sales were on Saturdays and member type V bought the most.
3. An analysis of member behavior found that on average members bought 8 items worth $83.50 per visit, with 6 unique items. The most active members visited over 5,000 times and spent over $50,000.
The document summarizes the results of analyzing sales data from Sam's Club stores. Key findings include:
- Total sales across all stores were $64.6 million, with the top-selling store bringing in $5.8 million.
- Sales were highest on Sundays ($13.7 million) and Saturdays ($9.3 million).
- The most common member type, "V", accounted for $6 million in sales.
- On average, members visited stores 2.1 times and spent $81.81 per visit, purchasing 8 items.
- Peak shopping hours were between 3-6pm on weekdays and earlier on weekends.
This document provides an overview of analytic functions in Oracle SQL. It begins by introducing aggregate functions such as SUM, COUNT, MAX, and MIN, which are used to group and summarize data. It then explains analytic functions, also known as windowing functions, which allow calculations over sets of rows defined in a window. Several common analytic functions like SUM, RANK, DENSE_RANK, and ROW_NUMBER are demonstrated. The document also covers windowing clauses, lag/lead functions, and using analytic functions to calculate rolling totals. Overall, the document serves as a high-level introduction to analytic SQL functions and how they can be used to analyze and summarize data in more flexible ways compared to traditional aggregate functions.
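To make the difference from plain aggregates concrete, here is a minimal Python sketch of what two of the named analytic functions compute. It is an illustration only, not Oracle's implementation: the helper names and the sample rows are hypothetical, and the running total behaves like a window of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

```python
from itertools import groupby
from operator import itemgetter


def running_total(rows, partition_key, order_key, value_key):
    """Emulate SUM(value) OVER (PARTITION BY ... ORDER BY ... ROWS UNBOUNDED PRECEDING)."""
    out = []
    # Sort by partition first, then by the ordering column inside each partition.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in group:
            total += row[value_key]
            # Every input row is kept; an aggregate would collapse the group to one row.
            out.append({**row, "running_total": total})
    return out


def dense_rank(values):
    """Emulate DENSE_RANK() OVER (ORDER BY value DESC): ties share a rank, no gaps."""
    ranks = {v: i for i, v in enumerate(sorted(set(values), reverse=True), start=1)}
    return [ranks[v] for v in values]


# Hypothetical sample data.
sales = [
    {"store": "A", "day": 1, "amount": 10},
    {"store": "A", "day": 2, "amount": 5},
    {"store": "B", "day": 1, "amount": 7},
]
for row in running_total(sales, "store", "day", "amount"):
    print(row)
print(dense_rank([30, 10, 30, 20]))
```

The point of the sketch is that each input row survives with an extra computed column, which is what separates an analytic function from a GROUP BY aggregate.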
Quick iteration and reusability of metric calculations for powerful data exploration.
At Looker, we want to make it easier for data analysts to service the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc data requests and not enough time is spent building, experimenting, and embellishing a robust model of the business. Worse yet, business users are starving for data, but are forced to make important decisions without access to data that could guide them in the right direction. Looker addresses both of these problems with a YAML-based modeling language called LookML.
This paper walks through a number of data modeling examples, demonstrating how to use LookML to generate, alter, and update reports—without the need to rewrite any SQL. With LookML, you build your business logic, defining your important metrics once and then reusing them throughout a model—allowing quick, rapid iteration of data exploration, while also ensuring the accuracy of the SQL that’s generated. Small updates are quick and can be made immediately available to business users to manipulate, iterate, and transform in any way they see fit.
Predict Repeat Shoppers with H2O and Spark
airisData
1. The document discusses predicting repeat shoppers of online stores using gradient boosted machines in H2O and Spark.
2. It describes the motivation for the repeat shopper problem: targeting repeat purchasers, who are more lucrative customers.
3. Feature generation steps are outlined using user and seller data like monthly purchase counts, category interactions, and seller similarity metrics to create a large sparse matrix for modeling.
This document discusses using Ruby to perform multidimensional data analysis on relational databases. It introduces Mondrian, an open-source OLAP engine that allows for multidimensional analysis on top of SQL databases using the MDX query language. A new Ruby gem called mondrian-olap will integrate Mondrian and provide a Ruby DSL and ActiveRecord-like query interface for defining OLAP schemas and performing analytical queries on relational data in a simpler way than SQL. Examples show how to write multidimensional queries in MDX and the Ruby interface to analyze sales data across dimensions like time, products, and customers.
A checkout service for users to leave location-specific feedback and gain rewards.
Users are able to choose between an express or full-review option to leave valuable information about their experience.
Businesses are given the option to place specific survey questions and reward users by offering targeted coupons/discounts, facilitating a low-cost B2C relationship.
The document discusses key concepts related to planning profit for retailers, including cost, retail price, operating income, cost of goods sold, gross margin, operating expenses, and net operating profit. It provides examples of calculating gross sales, customer returns and allowances, net sales, and gross sales based on net sales and customer return percentage. The document also discusses evaluating buyer performance based on sales, inventory, and margin results. Finally, it discusses using profit and loss statements from a merchandiser's perspective.
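The sales calculations mentioned above can be sketched in a few lines of Python. This is a minimal illustration under the usual retail-math assumption that the customer return percentage is expressed as a share of gross sales; the function names are hypothetical and not taken from the summarized document.

```python
def net_sales(gross_sales, returns_and_allowances):
    """Net sales = gross sales minus customer returns and allowances."""
    return gross_sales - returns_and_allowances


def gross_sales_from_net(net, return_pct):
    """Recover gross sales from net sales and a return percentage.

    Assumes return_pct is a percentage of gross sales, so
    net = gross * (1 - return_pct / 100).
    """
    if not 0 <= return_pct < 100:
        raise ValueError("return_pct must be in [0, 100)")
    return net / (1 - return_pct / 100)


# Hypothetical figures: 1,000 in gross sales with 8% returned.
print(net_sales(1000.0, 80.0))
print(gross_sales_from_net(920.0, 8.0))
```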
The document summarizes a group project presentation on customer relationship management (CRM) analytics. It discusses how CRM gives insights into customers and sales/service, and how CRM analytics helps monitor customer service and generate leads. It then describes exploring a retail dataset to analyze customer purchase behavior and segment customers into groups based on recency, frequency, and monetary value of transactions to identify the most profitable customers. K-means clustering is used to segment customers and predict future revenue. Customer lifetime value is also calculated.
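As a rough illustration of the recency, frequency, and monetary (RFM) step, the sketch below derives the three values per customer from a list of transactions. The tuple layout, the function name, and the sample data are assumptions made for the example; the clustering step itself (K-means) is not shown.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """Compute RFM values per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: the reference date used to measure recency.
    """
    table = {}
    for customer_id, purchase_date, amount in transactions:
        entry = table.setdefault(
            customer_id, {"last": purchase_date, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], purchase_date)  # most recent purchase
        entry["frequency"] += 1                            # number of transactions
        entry["monetary"] += amount                        # total spend
    return {
        cid: {
            "recency_days": (as_of - e["last"]).days,
            "frequency": e["frequency"],
            "monetary": e["monetary"],
        }
        for cid, e in table.items()
    }


# Hypothetical transactions.
tx = [
    ("c1", date(2024, 1, 5), 20.0),
    ("c1", date(2024, 1, 20), 30.0),
    ("c2", date(2024, 1, 10), 15.0),
]
print(rfm_table(tx, date(2024, 1, 31)))
```

The resulting three-column table is the kind of input a clustering algorithm would then group into customer segments.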
Production statistics
Here you can see the number of stores with and without coupons, the number of stores with coupons in production, and in QA stages. The same real-time data is presented in percentages and also can be searched by the store's name to view the specific data for each one.
Coupons auto-apply statistics by merchants
Here you will find per-store data, along with metrics such as:
- Success events / Number of configs in production ratio
- The percentage ratio of stores with show events to stores with coupons in production
- The percentage ratio of stores with start events to stores with show events
- The percentage ratio of stores with success events to stores with start events
- The line graph shows the whole picture clearly and with all details.
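The store-level percentage ratios listed above form a simple funnel (production, show, start, success). A minimal sketch of how they could be computed follows; the function names, dictionary keys, and counts are hypothetical and do not reflect the product's actual data model.

```python
def pct(numerator, denominator):
    """Percentage ratio, returning 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def auto_apply_funnel(in_production, with_show, with_start, with_success):
    """Each argument is a count of stores that reached that stage."""
    return {
        "show_vs_production_pct": pct(with_show, in_production),
        "start_vs_show_pct": pct(with_start, with_show),
        "success_vs_start_pct": pct(with_success, with_start),
    }


# Hypothetical counts of stores at each stage.
print(auto_apply_funnel(in_production=200, with_show=100, with_start=50, with_success=20))
```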
Email: sales@besttoolbars.net
Project A has helped more than 15 of its portfolio companies with setting up Data Warehouses and BI infrastructure in general. Doing so, we developed quite some understanding of the technical and organisational aspects of providing an organisation with data. However, we often struggled with defining the right KPIs that were relevant for different operational teams and for other use cases such as recommendation, segmentation and prediction.
In this talk, I will share some of our best practices for metrics and attributes of customers, transactions and marketing touchpoints that we developed over the last 5 years. In addition to that, I will talk about the technical aspect of consistency and correctness and about http://commerce-reporting.com, an initiative to share knowledge about reporting best practices.
This document analyzes Warby Parker's marketing funnels using SQL to calculate conversion rates. It summarizes the results of analyzing Warby Parker's style quiz and home try-on feature funnel. Key findings include that giving customers more home try-on options (5 pairs vs 3 pairs) increased purchase rates and focusing on getting more people through the full funnel could increase overall purchases.
5 metrics to strengthen your multichannel sales strategy
devin simon
For many eCommerce companies, measuring multichannel sales performance is a challenge. And as your organisation expands to new sales channels like marketplaces, your own brand web stores, and social commerce channels, the complexity for measuring their performance increases.
So, you may end up being confused on which KPIs to track, or end up tracking every known KPI out there. To avoid this, we have listed some of the most prominent KPIs that can help you critically analyse your multichannel sales strategy:
Contents
Phase 1: Design Concepts
Project Description
Use Cases
Data Dictionary
High Level Design Components
Detailed Design: Checkout
Diagrams
Design Analysis
Detailed Design: Product Research
Diagrams
Design – Using Pseudocode
Product Profit
Phase 2: Sequential Logic Structures
Design
Product Profit
Phase 3: Problem Solving with Decisions
Safe Discount
Return Customer Bonus
Applying Discounts
Phase 4: Problem Solving with Loops
Total order
Problems to Solve
Calculate Profits
Rock, Paper, Scissors
Number Guessing Game
Phase 5: Using Abstractions in Design
Seeing Abstractions
Refactoring
Phase 1: Design Concepts
Project Description
Although we may be late to the game, we will nevertheless join the world of e-commerce to sell our fantastic product on the Internet. To do so, we need a Web site that will allow for commerce and sales. To be quick about it, we require the following:
· Searchable inventory and shopping pages
· A shopping cart
· A place for customers to register when they make purchases
· A checkout process to make the purchase
Within this main process, there are a bunch of other needs that must be met, as follows:
· We want to track the date of the last purchase a customer makes so we can offer incentives and discounts based on the last time they shopped.
· We will offer sales based on the number of different items that a person purchases.
· We will also give discounts for bulk orders, that is, when a person buys many of the same item.
In addition to the sales features, the solution must provide the ability to manage and research the sales of products. It must include the following:
· Must be able to add, update and remove product inventory in real time on the site
· Needs to have research capabilities to determine how well a product is selling, such as the following:
· How often the item is viewed, added to shopping carts, and then purchased
· How a price change affects sales and profit
Use Cases
From the description above, we can relate this to the following use cases, which describe how the user will interact with our system. Each use case is a set of screens that the users would interact with to accomplish something they need on the site.
In addition to the customer’s activity, the solution will allow Sales Analysts to manage and research product sales.
Data Dictionary
Variable Name | Type | Description
todaysDate | Date | Today’s date, when the program is running
creationDate | Date | The date the customer created their account
priorPurchases | Integer | Number of purchases this customer has made in the past
lastPurchaseDate | Date | The date of the last purchase the customer made
lineItemPrice | Array | The price of each line item the customer has added to the cart
lineItemQuantity | Array | The quantity of each line item the customer has added to the cart
membershipLevel | Integer | The account nature of the customer: 1 – Guest, 2 – Registered, 3 – Preferred
totalPurchaseAmount | Double | T.
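The two line-item arrays in the data dictionary suggest how an order total could be accumulated with a loop. The sketch below is one possible reading, not the project's own pseudocode: the Python function name is hypothetical, and it assumes the two arrays are parallel (same length, same order).

```python
def total_order(line_item_price, line_item_quantity):
    """Sum price * quantity over parallel line-item arrays."""
    if len(line_item_price) != len(line_item_quantity):
        raise ValueError("price and quantity arrays must have the same length")
    total = 0.0
    for price, quantity in zip(line_item_price, line_item_quantity):
        total += price * quantity
    return total


# Hypothetical cart: one item at 10.00 and four items at 2.50.
print(total_order([10.0, 2.5], [1, 4]))
```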
Predicting online user behaviour using deep learning algorithms
Armando Vieira
We propose a robust classifier to predict buying intentions based on user behaviour within a large e-commerce website. In this work we compare traditional machine learning techniques with the most advanced deep learning approaches. We show that both Deep Belief Networks and Stacked Denoising Auto-Encoders achieved a substantial improvement by extracting features from high dimensional data during the pre-train phase. They also prove more convenient for dealing with severe class imbalance.
Company segmentation - an approach with R
Casper Crause
We classify companies based on how their stocks trade using their daily stock returns (percentage movement from one day to the next). This analysis will help your organization determine which companies are related to each other (for example, competitors or companies with similar attributes).
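The daily return used as the classification input is just the percentage movement between consecutive closing prices. A minimal sketch follows, written in Python rather than R; the function name and the prices are hypothetical.

```python
def daily_returns(closing_prices):
    """Percentage change from each day's close to the next day's close."""
    return [
        100.0 * (today - previous) / previous
        for previous, today in zip(closing_prices, closing_prices[1:])
    ]


# Hypothetical closing prices for three consecutive days.
print(daily_returns([100.0, 110.0, 99.0]))
```

Series of such returns, one per company, are what a similarity or clustering method would then compare.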
How to understand Clickbank statistics. A simple way to approach Clickbank product selection is to understand Clickbank statistics. Learn how to read these statistics for better conversion and profit.
Google Partners Hangout: Lexical Analysis Part 2
Boost Media
This SlideShare includes tips on how marketers can uncover language learnings and insights based on current messaging strategies. By using lexical analysis, we learned how to gain competitive advantages and actionable insights.
Being involved in performance audits on systems of every size, from start-up sites hacked together overnight to ginormous applications built by world-recognized brand companies, I’ve seen a lot of interesting (and sometimes very unique) performance issues in every level of the stack: code, architecture, databases (sometimes all of the above). But there are a few particular, very “Performance 101”, issues that (unfortunately) appear in a lot of code bases. In this talk I'll present the most common database-related performance bottlenecks that can happen in most applications.
This document provides instructions for using SampleNet merchandising software. It allows users to enter sample details and invoices, generate sample orders, track sample statuses in reports, and analyze development costs by customer. The software is intended for sampling departments to efficiently manage the sample development process for garment and textile companies.
A Market Basket Analysis of bakery shop data using the Apriori algorithm and association rule mining. Application and benefits of market basket analytics in retail management.
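Association rules are usually judged by support and confidence, which can be computed directly from a list of baskets. The sketch below shows only these two measures, not the Apriori candidate-generation procedure; the function names and the bakery baskets are made up for illustration.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in baskets if wanted <= set(basket)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base


# Hypothetical bakery transactions.
baskets = [
    {"bread", "coffee"},
    {"bread"},
    {"coffee", "cake"},
    {"bread", "coffee", "cake"},
]
print(support(baskets, ["bread", "coffee"]))
print(confidence(baskets, ["bread"], ["coffee"]))
```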
Web analytics involves measuring, collecting, analyzing, and reporting internet data to understand and optimize website usage. It can be used as a tool for business and market research as well as improving website effectiveness. Web analytics applications can also help measure the results of traditional advertising campaigns by estimating how traffic to a website changes after a new campaign launch. Key metrics include the number of visitors and page views, which provide information about traffic and popularity trends useful for market research.
• Designed relational database, performed data modeling and profiling, and created table and view structures on MS SQL Server
• Developed complex SQL code such as table-level check constraint, Triggers, computed columns, Indexes and Views
• Utilized Tableau to create analytical dashboards on sales and retailer information to support vital decisions and strategic planning of the system.
This document describes an online gift selling system that aims to address limitations of manual and existing online systems. The proposed system would allow global customers to purchase gifts online, help maximize store profits, and provide dynamic product information to customers. It would develop an easy-to-use, interactive website for online gift selling and ordering. The system would maintain customer records and support online payment processing and order tracking.
This document discusses various concepts related to merchandise management, including:
- Types of buying systems for staple and fashion merchandise
- Factors to consider when determining order quantities
- The relationship between inventory investment and product availability
- Components of inventory like cycle stock and buffer stock
- Methods for forecasting demand and calculating order points
- Merchandise budget planning and open-to-buy monitoring
- Allocating merchandise to stores and analyzing performance using ABC analysis and sell-through rates
- Evaluating vendors using a weighted average approach
- Using the retail inventory method to track inventory costs and values
A search term report provides information on an ad campaign's performance for specific search terms. It can be generated for all campaigns or individual campaigns. The report shows metrics like impressions, clicks, conversions and costs for each search term. Segmenting the report by time, devices, networks or conversions allows advertisers to analyze search term performance in different contexts. Identifying top performing and irrelevant search terms allows advertisers to optimize keywords and negative keywords to improve click-through rates and reduce costs.
The document describes a user profiling engine that predicts whether online shoppers will purchase items and what items will be bought. It analyzes an e-commerce clickstream dataset containing user sessions and purchases. A random forest classifier is used to predict buys based on features like the number of item clicks, the item buy-to-click ratio, popular items, and time of day. The best model score was 45,821 by using these informative features without overfitting. Proper feature selection is important for accurately determining buyer behavior.
This document discusses using a sales funnel to calculate the number of leads required to achieve a target revenue. It explains how to determine the current conversion rates between leads, prospects, and customers. With this information, businesses can calculate the approximate number of leads needed to generate a specific revenue goal. The sales funnel is a useful tool for evaluating a company's sales process and identifying opportunities to support growth.
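The lead calculation described above can be worked backwards through the funnel. The sketch below is a minimal example under stated assumptions: a fixed average sale value per customer and two stage conversion rates (lead to prospect, prospect to customer). All names and figures are hypothetical.

```python
import math


def leads_needed(target_revenue, avg_sale_value, lead_to_prospect, prospect_to_customer):
    """Work backwards from a revenue target to the number of leads required.

    Conversion rates are fractions between 0 and 1. Each stage is rounded up
    because a fractional customer, prospect, or lead is not meaningful.
    """
    if min(avg_sale_value, lead_to_prospect, prospect_to_customer) <= 0:
        raise ValueError("sale value and conversion rates must be positive")
    customers = math.ceil(target_revenue / avg_sale_value)
    prospects = math.ceil(customers / prospect_to_customer)
    return math.ceil(prospects / lead_to_prospect)


# Hypothetical: 100,000 target, 1,000 per sale, 25% of leads become
# prospects, and 50% of prospects become customers.
print(leads_needed(100_000, 1_000, 0.25, 0.5))
```

With these made-up inputs the funnel needs 100 customers, hence 200 prospects, hence 800 leads; improving either conversion rate lowers the lead count proportionally.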
Sam’s Club Sales Review
Data Cleaning Process
In order to begin the data cleaning process, I took the top 1000 results from each table
to better assess the data for errors.
select top 1000 *
from storeinformation
select top 1000 *
from memberindex
select top 1000 *
from store_visits
When assessing the data, I first noticed incorrect negative values in the tender
amount, total unit cost, and total visit amount columns of the store visits table.
I then ran queries to find them and took the absolute value of each column to make
all values positive.
--Find negative values in tender amount
select *
from store_visits
where tender_amt<0
--Correct negative values in tender amount
update store_visits
set tender_amt=ABS(tender_amt)
where tender_amt<0
--Find values that have a negative total unit cost
select *
from store_visits
where tot_unit_cost<0
--Correct negative values in total unit cost
update store_visits
set tot_unit_cost=ABS(tot_unit_cost)
where tot_unit_cost<0
--Find values with a negative total visit amount
select *
from store_visits
where total_visit_amt<0
--Correct negative values for total visit amount
update store_visits
set total_visit_amt=ABS(total_visit_amt)
where total_visit_amt<0
--Find incorrect values in membership_nbr
select *
from memberindex
where (membership_nbr=999)
--Find incorrect values in district_nbr
select district_nbr
from storeinformation
where (district_nbr=0)
--Set zero district numbers to 999
update storeinformation
set district_nbr=999
where (district_nbr=0)
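The absolute-value correction for negative amounts described above can be tried out on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module instead of SQL Server, and the four sample rows and the helper `count_negative` are hypothetical, not taken from the Sam's Club data.

```python
import sqlite3

# Hypothetical miniature of store_visits with only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits (visit_nbr integer primary key, tender_amt real)"
)
conn.executemany(
    "insert into store_visits values (?, ?)",
    [(1, 25.0), (2, -40.5), (3, 0.0), (4, -3.25)],
)

def count_negative(conn, column):
    # The column name is interpolated into the SQL, so pass trusted names only.
    return conn.execute(
        f"select count(*) from store_visits where {column} < 0"
    ).fetchone()[0]

negatives_before = count_negative(conn, "tender_amt")

# Same correction as in the text: flip the sign of negative amounts only.
conn.execute(
    "update store_visits set tender_amt = abs(tender_amt) where tender_amt < 0"
)

negatives_after = count_negative(conn, "tender_amt")
amounts = [
    row[0]
    for row in conn.execute(
        "select tender_amt from store_visits order by visit_nbr"
    )
]
print(negatives_before, negatives_after, amounts)
```

Counting the negative rows before and after the update is a cheap way to confirm that the correction touched exactly the rows it was meant to.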
After fixing the incorrect values, I also noticed that many columns have missing values.
The qualify organization, delivery type, and align sub division columns need to be
updated so that the missing values show as null or 0.
--Missing data in the align sub division number table
select *
from storeinformation
where len(align_sub_division_nbr)=0
--Update the store information table
update storeinformation
set align_sub_division_nbr='X'
where len(align_sub_division_nbr)=0
Data Quality Assessment Documented
Entity Integrity
For the first part of the data quality assessment, I chose to check the entity integrity of each table. For
the member index table, I ran two queries to discover whether there were any null or duplicate
key values.
Queries
--Check if the member index table has entity integrity
select *
from member_index
where membership_nbr is null
select membership_nbr, count(*)
from member_index
group by membership_nbr
having count(*)>1
Running both queries produced no results, meaning that the member index table does have entity integrity.
Next, I chose to check the store visits table for entity integrity. Again I ran two queries to
discover whether there were any null or duplicate key values.
Queries
--check if the store visits table has entity integrity
select *
from store_visits
where visit_nbr is null
select visit_nbr, count(*)
from store_visits
group by visit_nbr
having count(*)>1
Running both queries produced no results, meaning that the store visits table has entity
integrity.
Next, I chose to check the store information table for entity integrity. Again I ran two
queries to discover whether there were any null or duplicate key values.
Queries
--Check if the store information table has entity integrity
select *
from store_information
where store_nbr is null
select store_nbr, count(*)
from store_information
group by store_nbr
having count(*)>1
Running both queries produced no results, meaning the store information table does have entity
integrity.
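The same pair of checks (null keys, then duplicated keys) is repeated for each table, so it can be wrapped in one helper. This is a minimal sketch using Python's sqlite3 module rather than SQL Server. The function name `entity_integrity_problems`, the sample rows, and the `state` column are assumptions made for illustration only.

```python
import sqlite3

def entity_integrity_problems(conn, table, key):
    """Return (null_key_count, duplicated_key_values) for one table and key column.

    table and key are interpolated into the SQL text, so pass trusted names only.
    """
    null_count = conn.execute(
        f"select count(*) from {table} where {key} is null"
    ).fetchone()[0]
    duplicates = [
        row[0]
        for row in conn.execute(
            f"select {key} from {table} where {key} is not null "
            f"group by {key} having count(*) > 1 order by {key}"
        )
    ]
    return null_count, duplicates

# Hypothetical sample data: one duplicated and one null membership number.
conn = sqlite3.connect(":memory:")
conn.execute("create table member_index (membership_nbr integer, member_type text)")
conn.executemany(
    "insert into member_index values (?, ?)",
    [(101, "V"), (102, "W"), (102, "W"), (None, "X")],
)
conn.execute("create table store_information (store_nbr integer, state text)")
conn.executemany(
    "insert into store_information values (?, ?)", [(1, "OH"), (2, "FL")]
)

member_problems = entity_integrity_problems(conn, "member_index", "membership_nbr")
store_problems = entity_integrity_problems(conn, "store_information", "store_nbr")
print(member_problems, store_problems)
```

A table passes the check when the helper reports zero null keys and an empty list of duplicates, which matches the "no results" outcome described above.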
Referential Integrity
Next, I chose to check the tables’ relationships for referential integrity. I
ran queries on the tables to make sure the values in the foreign key field
match an existing value in the primary table.
Store Information and Store Visits
--Check if the store information table has referential integrity with the store visits table
select store_nbr
from store_information
where store_nbr not in (select store_nbr from store_visits);
Running the query produced results, meaning that the store information table does not
have referential integrity with the store visits table
select *
from store_information
insert into store_information values (9999,'unknown', 'unknown', 'unknown', 'unknown', 'unknown',
'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown',
'unknown', 'unknown','unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown',
'unknown', 'unknown')
To fix this issue of referential integrity, I created a dummy record in the primary table
and updated the unmatched foreign key values to the dummy value.
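One point worth checking: a referential integrity violation is a foreign key value with no matching primary key, so the orphan search would normally start from store_visits and look for store numbers missing from store_information. The query above runs in the other direction and lists stores that have no visits. The sketch below shows the child-to-parent check and the dummy-record repair. It assumes store_visits.store_nbr is the foreign key, uses Python's sqlite3 module instead of SQL Server, and runs on hypothetical two-column tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table store_information (store_nbr integer primary key, store_name text);
create table store_visits (visit_nbr integer primary key, store_nbr integer);
insert into store_information values (1, 'store one'), (2, 'store two');
insert into store_visits values (10, 1), (11, 2), (12, 7), (13, 7);
""")

# Foreign key values in store_visits with no matching row in store_information.
# NOT EXISTS is used instead of NOT IN so a null key cannot hide the orphans.
ORPHAN_QUERY = """
select distinct v.store_nbr
from store_visits v
where not exists (
    select 1 from store_information s where s.store_nbr = v.store_nbr
)
"""
orphans_before = [row[0] for row in conn.execute(ORPHAN_QUERY)]

# Repair as described in the text: add a dummy parent, then re-point orphans.
conn.execute("insert into store_information values (9999, 'unknown')")
conn.execute("""
update store_visits
set store_nbr = 9999
where not exists (
    select 1 from store_information s
    where s.store_nbr = store_visits.store_nbr
)
""")

orphans_after = [row[0] for row in conn.execute(ORPHAN_QUERY)]
dummy_visits = conn.execute(
    "select count(*) from store_visits where store_nbr = 9999"
).fetchone()[0]
print(orphans_before, orphans_after, dummy_visits)
```

Re-running the orphan query after the update confirms that every visit now points at an existing store record.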
Member Index and Store Visits
--Check if the member index table has referential integrity with the store visits table
select membership_nbr
from member_index
where membership_nbr not in (select membership_nbr from store_visits);
Running the query produced results, meaning that the member index table does not have
referential integrity with the store visits table
insert into member_index values (9999,'unknown', 'unknown', 'unknown', 'unknown', 'unknown',
'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown',
'unknown', 'unknown')
To fix this issue of referential integrity, I created a dummy record in the primary table
and updated the unmatched foreign key values to the dummy value.
Data Analysis Process
To start analyzing the data, I first wanted to see the information available in the store
visits, store information and member index tables.
select *
from storevisits
select *
from storeinformation
select *
from memberindex
Overall Assessment of Total Sales
In order to get an assessment of overall total sales, I took a general approach by
showing the total number of items and unique items, the total unit cost, and the total sales
of all members combined. Taking a narrower approach, I then looked at total sales
on each day of the week for each individual store.
--Overall Summary of Total Sales
select sum(total_visit_amt) as [total sales], sum(tot_unit_cost) as [total unit cost],
sum(tot_unique_itm_cnt) as [total number of unique items],
sum(tot_scan_cnt) as [total number of items purchased]
from store_visits
--Summary of Total Sales listed by day of the week and store number
select sum(total_visit_amt) as [total sales], store_nbr, transaction_date as [dayweek]
from storevisits
group by store_nbr, transaction_date
order by store_nbr, dayweek
In order to see which week days Sam’s Club sells the most product, I took the overall
total sales and used the transaction date to group the data by day of the week
--Summary of Total Sales each day of the week
select sum(total_visit_amt) as [total sales], transaction_date as [dayweek]
from storevisits
group by transaction_date
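Grouping on transaction_date gives one row per calendar date. To roll those dates up into the seven days of the week, the weekday has to be derived from the date first. The sketch below does this in SQLite through Python's sqlite3 module, using `strftime('%w', ...)`, which returns '0' for Sunday through '6' for Saturday. It assumes transaction_date holds ISO dates, and the four sample rows are hypothetical. In SQL Server the comparable expression would be `datename(weekday, transaction_date)`.

```python
import sqlite3

# strftime('%w') numbers the days 0-6 starting from Sunday.
WEEKDAY_NAMES = [
    "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits ("
    "visit_nbr integer primary key, transaction_date text, total_visit_amt real)"
)
conn.executemany(
    "insert into store_visits values (?, ?, ?)",
    [
        (1, "2000-01-22", 100.0),  # Saturday
        (2, "2000-01-29", 150.0),  # Saturday
        (3, "2000-01-30", 80.0),   # Sunday
        (4, "2000-01-31", 20.0),   # Monday
    ],
)

rows = conn.execute("""
    select strftime('%w', transaction_date) as dayweek,
           sum(total_visit_amt) as total_sales
    from store_visits
    group by dayweek
    order by total_sales desc
""").fetchall()

# Translate the weekday numbers into names for reporting.
sales_by_weekday = [(WEEKDAY_NAMES[int(dow)], total) for dow, total in rows]
print(sales_by_weekday)
```

With this grouping, the two Saturdays in the sample collapse into a single Saturday total instead of appearing as separate dates.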
In order to look at the total sales of each member type, I took the overall total sales
and sorted the results by member type
--Summary of Total sales by membership types
select member_type, sum(total_visit_amt) as [total sales]
from memberindex m join storevisits s on m.membership_nbr=s.membership_nbr
group by member_type
I also thought it would be interesting to calculate each store’s profit by
subtracting each store’s total unit cost from its total sales.
--List of each stores profit
select store_nbr, sum(total_visit_amt) as [total sales], sum(tot_unit_cost) as [total unit cost],
sum(total_visit_amt)-sum(tot_unit_cost) as [store profit]
from store_visits
group by store_nbr
Assessment of Member Buying Behavior
To assess member buying behavior, I first broke the data down by individual member,
looking at the average number of items bought and the average amount spent per visit. I also
thought it would be important to include the average number of unique items bought to compare
with the average of total items bought.
--Typical Purchase patterns of the members per visit
select membership_nbr, avg(tot_scan_cnt) as [avgitemsbought],
avg(total_visit_amt) as [avgamountspent],
avg(tot_unique_itm_cnt) as [avg number of unique items]
from store_visits
group by membership_nbr
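The per-member averages can be checked by hand on a few rows. This sketch uses Python's sqlite3 module rather than SQL Server, and the three visits for two made-up members are hypothetical. It averages the unique item count per visit, which is the figure the paragraph above compares against average items bought.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits ("
    "visit_nbr integer primary key, membership_nbr integer, "
    "tot_scan_cnt integer, tot_unique_itm_cnt integer, total_visit_amt real)"
)
conn.executemany(
    "insert into store_visits values (?, ?, ?, ?, ?)",
    [
        (1, 501, 10, 8, 100.0),
        (2, 501, 6, 4, 50.0),
        (3, 502, 2, 2, 20.0),
    ],
)

# One row per member: average items, average spend, average unique items per visit.
member_patterns = conn.execute("""
    select membership_nbr,
           avg(tot_scan_cnt) as avgitemsbought,
           avg(total_visit_amt) as avgamountspent,
           avg(tot_unique_itm_cnt) as avguniqueitems
    from store_visits
    group by membership_nbr
    order by membership_nbr
""").fetchall()
print(member_patterns)
```

Because group by already returns one row per membership number, no distinct keyword is needed.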
I then created a breakdown of member visits by day of the week by sorting the number of
visits each member took by transaction date
--Member visits breakdown by the day of the week
select membership_nbr, transaction_date, count(visit_nbr) as [number of visits]
from store_visits
group by transaction_date, membership_nbr
order by transaction_date
To get a summary of member visits by hours during a day, I calculated the total
transaction time and grouped the result by individual transaction date, including the
number of visitors who visited Sam’s Club stores each day. I also thought it would be
interesting to order the data from greatest to least transaction time to discover any
member visit patterns.
--Summary of member visits breakdown by hours during a day
select max(transaction_time)-min(transaction_time) as [total transaction time],
transaction_date, count(visit_nbr) as [number of visitors]
from store_visits
group by transaction_date
--Summary of member visits breakdown by greatest to least transaction time
select max(transaction_time)-min(transaction_time) as [total transaction time],
transaction_date, count(visit_nbr) as [number of visitors]
from store_visits
group by transaction_date
order by [total transaction time] desc
When looking for the characteristics of the most active members, I decided to filter the
data to members who have accrued total sales over $50,000 and have visited Sam’s Club
over 5,000 times. I then looked at the average number of items bought for these members.
--Purchasing pattern of the customer who have visited and spent the most at Sam's Club
select membership_nbr, avg(tot_scan_cnt) as [avgitemsbought],
sum(total_visit_amt) as [totalamountspent], count(visit_nbr) as [number of visits]
from store_visits
group by membership_nbr
having sum(total_visit_amt)>50000 and count(visit_nbr)>5000
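The two cut-offs in the having clause can be passed in as parameters, which makes it easy to experiment with other definitions of "most active". This is a minimal sketch using Python's sqlite3 module rather than SQL Server. The function name `most_active` and the sample visits are hypothetical, and the thresholds in the usage line are scaled down to fit the toy data.

```python
import sqlite3

def most_active(conn, min_total, min_visits):
    """Members whose total spend and visit count both exceed the given cut-offs."""
    return conn.execute(
        """
        select membership_nbr,
               avg(tot_scan_cnt) as avgitemsbought,
               sum(total_visit_amt) as totalamountspent,
               count(visit_nbr) as number_of_visits
        from store_visits
        group by membership_nbr
        having sum(total_visit_amt) > ? and count(visit_nbr) > ?
        order by membership_nbr
        """,
        (min_total, min_visits),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits ("
    "visit_nbr integer primary key, membership_nbr integer, "
    "tot_scan_cnt integer, total_visit_amt real)"
)
conn.executemany(
    "insert into store_visits values (?, ?, ?, ?)",
    [
        (1, 501, 1, 100.0), (2, 501, 1, 120.0), (3, 501, 4, 80.0),  # frequent, high spend
        (4, 502, 9, 500.0),                                         # high spend, one visit
        (5, 503, 2, 10.0), (6, 503, 2, 10.0),
        (7, 503, 2, 10.0), (8, 503, 2, 10.0),                       # frequent, low spend
    ],
)

active = most_active(conn, min_total=100, min_visits=2)
print(active)
```

Only the member who clears both thresholds is returned. A member with high spend but few visits, or many visits but low spend, is filtered out.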
Summary of Total Sales and Buying Behavior
Overall Assessment of Store Sales
1.
--A.Summary of Total Sales
--Looking at the summary of total sales for Sam's Club, the results show total sales
of around 84 million dollars at a total cost of around 75 million dollars. Sam's Club sold 84,200
items to reach these sales numbers, with 61,000 of those items being unique. Overall,
Sam's Club saw a total profit of $9,438,845.35.
--When analyzing the list of total sales by day of the week and store number, I noticed
that all but 4 stores saw their highest sales on January 29th, 2000.
--Looking at total sales for each store, we are able to see that store 18 had the highest
total sales at $6,980,721.18 and store 3 had the lowest total sales at $2,961,961.83.
--C.Summary of Total Sales Breakdowns
--Total sales per day across all of Sam’s Club reached as high as $4,929,361.73,
occurring on January 29th, 2000.
--When differentiating total sales by member type, it is apparent that members with
member type V bought the most, with total sales of $31,187,297.64. Members with member
type Z bought the least, with total sales of only $12,300.37. V, W, X, A, E, D, 3, Y, 1, Z would
be the order of total sales by member type from greatest to least.
--D.Useful Insights and Additional Analysis
--Store 18 saw the most profit at $834,940.19, while store 3 saw
the least profit at $309,654.24. I also thought that it would be interesting
to see which states had the highest total sales to see where Sam’s Club is most popular. The
state of Ohio had the most total sales by far, almost doubling the total sales for the
state of Florida, which was second on the list.
--What is the most popular type of payment method?
I decided to look at the number of refunds to assess what percentage of purchases Sam’s Club can
expect to be refunded. With 46,818 of the 1,007,961 member visits including purchase
refunds, Sam’s Club should expect to see a product refund rate of about 5%.
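The arithmetic behind the refund rate can be worked through directly from the two counts quoted above. The variable names in this short Python sketch are illustrative only.

```python
# Counts quoted in the text above.
refund_visits = 46_818
total_visits = 1_007_961

# Share of member visits that included a purchase refund.
refund_rate = refund_visits / total_visits
refund_rate_pct = round(refund_rate * 100, 2)
print(f"{refund_rate_pct}%")
```

The unrounded figure is a little over 4.6%, which rounds to the "about 5%" stated in the text.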
Assessment of Member Buying Behavior
2.
--A.Summary of Typical Purchasing Patterns
--Customers who visited Sam's Club the most on average only bought one item per visit
with a wide variation per customer in the average amount spent.
--Overall, across the 1,007,961 member visits, Sam's Club members on average bought 8 items at an average
spending amount of $83.50 per visit. It seems that many members do not frequently purchase
duplicates of the items, as 6 out of the 8 items purchased are unique.
--B.Summary of Member Visits Breakdown
When looking at the member visits breakdown by day of the week, I selected membership
number, transaction date which I used to group the count of visits. I then ordered the
data by transaction date to see the day to day breakdown.
When looking at the member visits breakdown, it would be expected that the amount of
transaction time would correlate with the number of visitors, but this is not the case.
This may suggest that other factors, such as the experience of the employees at Sam’s Club,
have an effect on transaction time.
--C.Characteristics of Most Active Members
--When filtering the data to find the most active members at Sam's Club, I came up with
14 members who have visited over 5,000 times and have spent over 50,000 dollars.
--Looking at these members, we are able to see that they always pay cash and have either a V or W
member code.
--D.Additional Analysis and Useful Insights
--It might also be important to retrieve information about each store to look at the
effect that management or location might have on total sales. I also thought it would be
interesting to look at the number of elite status members and their total sales. There are
1,327 of these members at Sam’s Club, who have accumulated total sales of $120,483.11.