MSA 8110 Final Project
PRIORITY 1: Assess and Clean Data
Step 1: Imported Data Tables from UA_SAMSCLUB into ES611926
Tables Imported from UA_SAMSCLUB    Imported to ES611926
dbo.Store_Visits                    sc.Store_Visits
dbo.Member_Index                    sc.member_index
dbo.store_information               sc.store_information
Step 2: Assess data for entity integrity, referential integrity and overall data integrity
Entity Integrity
Requirements
- All PK entries are unique (no two rows share the same values on the key attribute(s)), and no part of the PK may be null.
Queries to Use: Check for Nulls in PK Columns
select *
from SC.STORE_VISITS
where visit_nbr is null;
select *
from SC.MEMBER_INDEX
where membership_NBR is null;
select *
from SC.STORE_INFORMATION
where STORE_NBR is null;
No nulls were found in the PK column of any of the three tables.
Queries to Use: Check for Unique Identifiers in PK Column
select visit_nbr,
count(*)
from sc.store_visits
group by visit_nbr
having count(*)>1;
select membership_nbr,
count (*)
from sc.member_index
group by membership_nbr
having count (*)>1;
select store_nbr, count (*)
from sc.Store_Information
group by store_nbr
having count(*)>1;
select visit_nbr,
store_nbr,
membership_nbr, count (*)
from SC.STORE_Visits
group by visit_nbr,
store_nbr, membership_nbr
having count (*)>1;
No duplicate identifiers were found in the PK column of any of the three tables.
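The null and duplicate checks above can also be combined into a single pass per table. The following is a minimal sketch, not the method used in this project: it runs on Python's built-in sqlite3 module with a made-up four-row table standing in for sc.store_visits, so the table contents are assumptions for illustration only. The SQL inside the string uses only constructs (count, case, count(distinct ...)) that T-SQL also supports.

```python
import sqlite3

# Toy stand-in for sc.store_visits; every row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table store_visits (visit_nbr integer, store_nbr integer)")
conn.executemany(
    "insert into store_visits values (?, ?)",
    [(1, 18), (2, 18), (2, 15), (None, 20)],  # one repeated key and one null key
)

# One pass over the table: total rows, null keys, and distinct non-null keys.
total, null_keys, distinct_keys = conn.execute(
    """
    select count(*),
           sum(case when visit_nbr is null then 1 else 0 end),
           count(distinct visit_nbr)
    from store_visits
    """
).fetchone()

# Rows beyond the first occurrence of each non-null key are duplicates.
duplicate_rows = total - null_keys - distinct_keys
print("rows:", total, "null keys:", null_keys, "duplicate rows:", duplicate_rows)
```

Entity integrity holds for the key column only when both the null count and the duplicate count come back as 0.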
Referential Integrity
Requirements
A dependent table's FK must have either a null entry or a matching entry in the related table. Even though an attribute may not have a corresponding attribute, it is impossible to have an invalid entry.
STORE_VISITS & MEMBER_INDEX
select membership_nbr
from sc.store_visits
where membership_nbr not in (select membership_nbr from sc.member_index);
Result: 193,898 rows returned. The same membership_nbr values repeat across different visit_nbr values, but none of them is found in the Member_Index table.
STORE_VISITS & STORE_INFORMATION
select store_nbr
from sc.store_visits
where store_nbr not in (select store_nbr from sc.store_information);
Result: Referential integrity holds, as no rows were returned.
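The NOT IN form used above is safe here because the entity integrity check found no null membership_nbr in the member index. As a hedged illustration of why that precondition matters, the sketch below uses Python's sqlite3 module with invented rows (including a deliberately null key, which the real table does not have) and compares NOT IN with a NOT EXISTS anti-join. It also counts orphaned rows separately from distinct orphaned members, since the 193,898 rows reported above contain repeated membership numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table member_index (membership_nbr integer);
    create table store_visits (visit_nbr integer, membership_nbr integer);
    -- Invented data: the null key is hypothetical and only there to show the pitfall.
    insert into member_index values (100), (200), (null);
    insert into store_visits values (1, 100), (2, 300), (3, 300), (4, 200);
    """
)

# NOT IN compares against every subquery value; a null makes the test unknown.
not_in = conn.execute(
    """
    select visit_nbr from store_visits
    where membership_nbr not in (select membership_nbr from member_index)
    """
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so nulls do not hide orphans.
not_exists = conn.execute(
    """
    select v.visit_nbr from store_visits v
    where not exists (select 1 from member_index m
                      where m.membership_nbr = v.membership_nbr)
    order by v.visit_nbr
    """
).fetchall()

# Orphaned visit rows versus distinct orphaned membership numbers.
orphan_rows, orphan_members = conn.execute(
    """
    select count(*), count(distinct v.membership_nbr) from store_visits v
    where not exists (select 1 from member_index m
                      where m.membership_nbr = v.membership_nbr)
    """
).fetchone()

print("NOT IN:", not_in)
print("NOT EXISTS:", not_exists)
print("orphan rows:", orphan_rows, "distinct orphan members:", orphan_members)
```

With the null present, NOT IN reports no orphans at all, while NOT EXISTS still finds visits 2 and 3.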
Data Integrity
In a relational database, a condition in which the data in the DB comply with all entity and referential integrity constraints.
Query used: checking for blank values
select *
from sc.store_visits
where len(member_code)=0;
Result: 4 rows with unique membership_nbr values but no member_code. All other columns without null indicators tested successfully for values.
Proposed Solution: Fill member_code with dummy values where the code has been left blank.
Query Used: Checking for (-) values where inappropriate
A) select visit_nbr, total_visit_amt, refund_code from sc.store_visits where total_visit_amt<=0;
B) select visit_nbr, total_visit_amt, refund_code from sc.store_visits where refund_code=0 and
total_visit_amt<=0;
C) select visit_nbr, membership_nbr, total_visit_amt, refund_code, member_code from sc.store_visits
where refund_code=1 and total_visit_amt>=0;
A) Result: Refund_code = 1 when total_visit_amt <= 0, which implies a credit back to the customer has been made; therefore a (-) value is expected on refund rows.
B) Result: No rows returned.
C) Result: 12,302 rows returned with (+) values where refund_code = 1.
Suggested update: set total_visit_amt to 0 on rows where total_visit_amt > 0 and refund_code = 1. Update to 0 instead of changing to a (-) value, as not enough information is available to know exactly how the credit, if applicable, was applied.
Data Not Used: sales_tax, tender_amt, tot_unit_cost
Reason for Unused Data: not enough data on the functional relations between columns to correct the data; therefore these columns are excluded from the analysis.
Step 3: Clean/Scrub the Data
Query Used: To Insert Dummy Value for missing Member_Code
update sc.store_visits
set member_code = 99999
where len(member_code)=0;
Query Used: To correct refund rows carrying incorrect (+) values
update sc.store_visits
set total_visit_amt=0
where refund_code=1 and total_visit_amt>0;
Note: Please see Step 2 C for explanation of why this query was used.
Query Used: To Delete Membership_nbr from store_visits not found in member_index
delete from sc.store_visits
where membership_nbr not in (select membership_nbr from sc.member_index);
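Because the three cleaning statements modify and delete rows permanently, it can help to run them as one transaction and compare the affected row counts with the counts found during assessment (4 blank codes, 12,302 refund rows, 193,898 orphaned visits). The sketch below is an assumption-laden illustration and not the procedure followed in this project: it uses Python's sqlite3 module, five invented rows, SQLite's length() in place of T-SQL's len(), and a text dummy code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table member_index (membership_nbr integer primary key);
    create table store_visits (
        visit_nbr integer primary key, membership_nbr integer,
        member_code text, refund_code integer, total_visit_amt real);
    -- Invented rows: one blank code, one positive refund, one orphaned member.
    insert into member_index values (100), (200);
    insert into store_visits values
        (1, 100, 'A', 0, 25.0),
        (2, 100, '',  0, 10.0),
        (3, 200, 'B', 1, 40.0),
        (4, 200, 'B', 1, -15.0),
        (5, 300, 'C', 0, 5.0);
    """
)

steps = [
    ("blank member_code",
     "update store_visits set member_code = '99999' where length(member_code) = 0"),
    ("refund rows with positive amount",
     "update store_visits set total_visit_amt = 0 "
     "where refund_code = 1 and total_visit_amt > 0"),
    ("orphan visits",
     "delete from store_visits "
     "where membership_nbr not in (select membership_nbr from member_index)"),
]

affected = {}
with conn:  # commits if every statement succeeds, rolls back if one raises
    for label, sql in steps:
        affected[label] = conn.execute(sql).rowcount

remaining = conn.execute(
    "select count(*), sum(total_visit_amt) from store_visits"
).fetchone()
print(affected)
print("rows left and their total:", remaining)
```

If an affected count differs from the assessment count, that is a signal to stop and re-check the WHERE clause before committing.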
Priority 2: Data Analysis: Step4: Data Analysis usingSQL
TASK(s): Analyze and Report:
1) Overall AssessmentofStore Sales:
a) Summaryof Total Sales
b) Summaryof Total Salesbyeach store
c) Summaryof Total Salesbreakdowns:
i) Day of Week
ii) MembershipType
d) Performanyadditional analysisandprovide useful insights
2) Assessmentofmemberbuying behavior:
a) Summaryof typical purchase patterns(amount&# of items) bymemberpervisit
b) Summaryof membervisitbreakdowns:
i) Day of Week
ii) Hours duringDay
c) ID the characteristicsof the most active (insalesandfrequentvisits) members
3) Performany additional analysis and provide useful
Task 1
a) Total Sales Query:
select sum(total_visit_amt) as Sales
from sc.store_visits
where total_visit_amt>=0 and refund_code!=1;
b) Total Sales by Store:
select store_nbr, sum(total_visit_amt) as TOTAL_SALES
from sc.store_visits
where total_visit_amt>=0 and refund_code!=1
group by store_nbr
order by total_sales desc;
c) Total Sales Break Down
i) Total Sales Break Down by Day of the Week:
select datepart(weekday, transaction_date) as weekday, sum(total_visit_amt) as TotalSales
from sc.STORE_VISITS
where total_visit_amt>=0
group by datepart(weekday, transaction_date)
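The number that datepart(weekday, ...) returns depends on the session's SET DATEFIRST value: with the us_english default of 7, day 1 is Sunday, while with DATEFIRST 1, day 1 is Monday. The source does not state which setting was in effect, so the day labels in this report hold only under the assumption that the mapping used was 1 = Monday. The sketch below (plain Python, with a hypothetical helper named tsql_weekday that mimics the T-SQL rule) prints both mappings for January 2000, the month this report says the data covers, and counts how often each weekday occurs in that month.

```python
from collections import Counter
from datetime import date, timedelta

NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def tsql_weekday(d: date, datefirst: int = 7) -> int:
    """Mimic datepart(weekday, d) for a SET DATEFIRST value (1 = Monday .. 7 = Sunday)."""
    return (d.isoweekday() - datefirst) % 7 + 1


january_2000 = [date(2000, 1, 1) + timedelta(days=i) for i in range(31)]

# How many times each weekday occurs in the month.
occurrences = Counter(NAMES[d.weekday()] for d in january_2000)

for datefirst in (7, 1):
    labels = {tsql_weekday(d, datefirst): NAMES[d.weekday()] for d in january_2000}
    print("DATEFIRST", datefirst, [labels[n] for n in range(1, 8)])
print(dict(occurrences))
```

Saturday, Sunday and Monday each occur five times in January 2000 and the other weekdays four times, so raw per-weekday totals for those three days include one extra day of trading.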
ii) Total Sales Break Down by Member_Type:
Note: First, build a combined table of store_visits & member_index. Query used:
select a.MEMBER_TYPE, b.total_visit_amt
into #Sales
from sc.member_index a right join sc.store_visits b on a.membership_nbr=b.membership_nbr
group by a.member_type, b.total_visit_amt;
select distinct member_type, sum(total_visit_amt) as TotSales
from #Sales
where total_visit_amt>=0
group by member_type;
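One thing to watch in the two-step approach above: grouping the intermediate table by member_type and total_visit_amt keeps only one row per distinct pair, so two visits of the same member type with the same amount are merged before the sum is taken. The sketch below illustrates the effect with Python's sqlite3 module and three invented visits, and shows a single-step aggregation as one possible alternative. It is not the query used to produce the results in this report, and a left join from store_visits stands in for the right join in the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table member_index (membership_nbr integer, member_type text);
    create table store_visits (
        visit_nbr integer, membership_nbr integer, total_visit_amt real);
    -- Invented data: member 100 makes two separate visits of the same amount.
    insert into member_index values (100, 'V'), (200, 'W');
    insert into store_visits values (1, 100, 50.0), (2, 100, 50.0), (3, 200, 20.0);
    """
)

# Two-step version: the inner GROUP BY merges the two identical 50.0 visits.
two_step = dict(conn.execute(
    """
    select s.member_type, sum(s.total_visit_amt)
    from (select a.member_type, b.total_visit_amt
          from store_visits b
          left join member_index a on a.membership_nbr = b.membership_nbr
          group by a.member_type, b.total_visit_amt) as s
    where s.total_visit_amt >= 0
    group by s.member_type
    """
).fetchall())

# Single-step version: every visit row contributes to the sum.
one_step = dict(conn.execute(
    """
    select a.member_type, sum(b.total_visit_amt)
    from store_visits b
    left join member_index a on a.membership_nbr = b.membership_nbr
    where b.total_visit_amt >= 0
    group by a.member_type
    """
).fetchall())

print("two-step:", two_step)
print("one-step:", one_step)
```

On the toy data the two-step version reports 50.0 for type V while the single-step version reports 100.0, which is the true total of that member type's visits.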
d) Additional Analysis Queries:
Top Performing Sale Days Grouped by Member_Type:
select datepart(weekday, b.transaction_date) as weekday, a.MEMBER_TYPE, b.total_visit_amt
into #Sales1
from sc.member_index a right join sc.store_visits b on a.membership_nbr=b.membership_nbr
group by b.transaction_date, a.member_type, b.total_visit_amt order by weekday;
select distinct weekday, member_type, sum(total_visit_amt) as TotSales
from #Sales1
where total_visit_amt>=0
group by weekday, member_type
order by weekday, totsales desc;
Member Type and Sales broken down by store:
select datepart(weekday, b.transaction_Date) as weekday, b.store_nbr, a.MEMBER_TYPE,
count(a.membership_nbr) as #Members, b.total_visit_amt
into #Sales2
from sc.member_index a right join sc.store_visits b on a.membership_nbr=b.membership_nbr
group by a.member_type, b.transaction_Date, b.store_nbr, b.total_visit_amt, a.membership_nbr;
select distinct store_nbr, member_type, sum(#members) as #type, sum(total_visit_amt) as Spent
from #sales2
where total_visit_amt>=0
group by store_nbr, member_type
order by store_nbr;
Task 2
a) Typical Purchase Patterns (amount & # of items) by members per visit
Create a Temp Table:
select datepart(weekday, transaction_date) as weekday,(transaction_time/1000000) as [HOUR],
visit_nbr, total_visit_amt, tot_scan_cnt
into #Sales3
from sc.store_visits
group by datepart(weekday, transaction_date), (transaction_time/1000000), visit_nbr,
total_visit_amt, tot_scan_cnt
Checking for Table:
select * from #sales3
Count of Total Visit_nbr, average sales per visit, average # of items:
select count(visit_nbr) as TotalVisits, avg(total_visit_amt) as avgsale_visit,
avg(tot_scan_cnt) as avgitemcnt
from #sales3
where total_visit_amt>=0;
Note: Above Query written to exclude returns/negative sale values
Count # of Distinct Membership_nbr and Average visit per member:
select datepart(weekday, transaction_date) as weekday,(transaction_time/1000000) as [HOUR],
membership_nbr, count(visit_nbr) as #Visit, total_visit_amt, tot_scan_cnt
into #Sales4
from sc.store_visits
group by datepart(weekday, transaction_date), (transaction_time/1000000), membership_nbr,
total_visit_amt, tot_scan_cnt
select * from #sales4
Displays # Visits Per Member
select distinct membership_nbr, sum(#Visit) as Visits
into #temp0
from #sales4
where total_visit_amt>=0
group by membership_nbr;
Displays Average Visit Per Member
select count(membership_nbr) as #members, avg(visits) as AvgVisit from #temp0
b) Provide Summary of Member Visit Breakdowns:
Create WeekDay Table:
select distinct datepart(weekday,transaction_date) as weekday
into #day
from sc.store_visits;
Checking for Table:
select * from #day;
Create a #members table to store all members
select distinct membership_nbr
into #members
from sc.member_index;
Checking for Table:
select * from #members
Create a combined #day_members table
select weekday, membership_nbr
into #day_members
from #day cross join #members;
Checking for Table:
select * from #Day_members
Create a #Dataset table by left joining #day_members with the #Sales4 table:
select a.weekday, b.membership_nbr, b.Hour, sum(b.#visit) as TotVisit, sum(b.total_visit_amt)
as TotSale, sum(b.tot_scan_cnt) as TotItem
into #Dataset
from #Day_members a left join #Sales4 b
on a.weekday=b.weekday and a.membership_nbr=b.membership_nbr
group by a.weekday, b.Hour, b.membership_nbr
order by a.weekday;
Check for Table:
select * from #Dataset;
Update Query:
select * from #dataset
update #dataset
set totSale=0
where totSale<0
update #dataset
set totItem=0
where totSale=0
i) By Day of Week:
select Weekday, count(membership_nbr) as #Members,Sum(totVisit) as Total_Visits,
sum(totSale) Total_Sales, sum(TotItem) as Total_Items
from #Dataset
group by weekday
ii) By Hours During a Day:
select distinct Hour, count(membership_nbr) as #Members,sum(totvisit) as Total_visit,
sum(totsale) as Total_sales, sum(totitem) as total_items
from #Dataset
group by Hour
order by Hour desc
c) ID the characteristics of the most active (in sales and in frequent visits) members
QUERY to ID TOP 15 based on Total_sales and #VISITS:
select top 15 membership_nbr, count(hour) as #Visits, sum(TotSale) as TOTAL_SALES, sum(TotItem)
as TOTAL_ITEMCNT
from #dataset
group by membership_nbr
order by TOTAL_SALES desc, #Visits desc;
d) Avg Sale and Item by Hour and Day
select distinct Hour, avg(totsale) as AvgSale, avg(totitem) as AvgItems
from #Dataset
group by Hour
order by Hour desc
Task 1 Results
a) Total Sales = $64,623,769.65
b) Store_Nbr & Total Sales
Store_Nbr  Total Sales
18         5,791,990.04
15         4,756,625.64
24         4,530,115.62
27         4,382,872.87
19         4,281,760.27
21         3,867,094.50
28         3,730,117.50
20         3,558,149.57
26         3,480,692.58
29         3,425,846.25
5          3,381,878.74
16         3,267,488.93
22         3,184,074.35
25         2,875,560.22
4          2,598,886.30
17         2,573,647.71
3          2,486,413.71
23         2,450,554.85
c) Day of Week & Total Sales
Day     Total Sales
1 Mon   9,854,511.58
2 Tues  9,068,451.60
3 Wed   6,688,436.60
4 Thur  6,696,184.47
5 Fri   7,348,921.78
6 Sat   9,259,135.96
7 Sun   13,685,584.17
Member Type & Total Sales
M_Type  Total Sales
E       219,401.10
Y       14,519.50
G       13,487.12
3       12,517.61
H       1,953.13
A       502,865.01
W       6,985,452.22
X       6,067,036.65
1       11,754.68
D       56,829.09
V       6,044,250.25
Z       9,010.12
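The breakdowns in (b) and (c) can be cross-checked against the overall total in (a) by summing each table. The sketch below does this in plain Python with exact decimal arithmetic, using only the figures reported above. The source does not explain any difference it reveals, so the result is best read as a prompt to re-run the breakdown queries rather than as a correction.

```python
from decimal import Decimal

total_sales = Decimal("64623769.65")  # Task 1 (a)

by_store = [
    "5791990.04", "4756625.64", "4530115.62", "4382872.87", "4281760.27",
    "3867094.50", "3730117.50", "3558149.57", "3480692.58", "3425846.25",
    "3381878.74", "3267488.93", "3184074.35", "2875560.22", "2598886.30",
    "2573647.71", "2486413.71", "2450554.85",
]
by_weekday = [
    "9854511.58", "9068451.60", "6688436.60", "6696184.47",
    "7348921.78", "9259135.96", "13685584.17",
]
by_member_type = [
    "219401.10", "14519.50", "13487.12", "12517.61", "1953.13", "502865.01",
    "6985452.22", "6067036.65", "11754.68", "56829.09", "6044250.25", "9010.12",
]


def total(values):
    """Sum a list of decimal strings without floating-point rounding."""
    return sum((Decimal(v) for v in values), Decimal("0"))


store_sum = total(by_store)
weekday_sum = total(by_weekday)
member_type_sum = total(by_member_type)

for label, value in [("store", store_sum), ("weekday", weekday_sum),
                     ("member type", member_type_sum)]:
    print(f"{label:12} sum={value:>14,}  gap to total={total_sales - value:>14,}")
```

The store figures add up to the overall total exactly, while the weekday figures add up to 62,601,226.16 (2,022,543.49 short) and the member type figures to roughly 19.9 million.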
d) Top Sale Days organized by Member Type
Day       M_Type  Sales
7 Sunday  V       $5,102,319.46
7 Sunday  W       $3,729,507.83
2         X       $1,511,115.64
5         A       $154,879.90
6         E       $117,923.12
5         D       $13,376.19
7         G       $4,435.72
6         Y       $3,794.28
7         1       $3,452.14
4         3       $2,980.68
3         Z       $2,386.06
2         H       $1,150.99
Most frequent visitor grouped by store, ordered by # of member type
Store  Type  #Type   Sale
18     V     46,508  $2,923,557.90
15     W     26,895  $1,882,085.31
20     X     9,078   $478,001.79
23     A     4,976   $51,262.87
22     3     139     $9,231.70
26     E     112     $22,792.14
26     D     66      $9,769.09
24     Y     62      $5,914.94
21     G     42      $5,105.56
24     Z     38      $4,466.81
26     1     38      $6,869.71
22     H     6       $1,762.71
Task 2 Results
a) Summary of Purchase Patterns
Visits   Sales           Items
789,943  $64,623,769.65  6,609,868
Average Sale and Items per Visit:
Total_Visit  Sales   Items
789,943      $81.81  8
#Members  AvgVisit/Member
282,316   2
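The averages above can be reproduced from the totals with simple division. The sketch below is plain Python and uses only the reported totals. It rests on two assumptions that the source does not confirm: that the item and visit counts are integer columns (T-SQL's avg() over an integer column returns a truncated integer), and that all 789,943 visits belong to the 282,316 members counted.

```python
from decimal import Decimal

total_sales = Decimal("64623769.65")
visits = 789_943
items = 6_609_868
members = 282_316

avg_sale = total_sales / visits      # average sale per visit
avg_items = items / visits           # average items per visit
visits_per_member = visits / members # average visits per member

print("avg sale per visit:", round(avg_sale, 2))
print("avg items per visit:", round(avg_items, 2))
print("avg visits per member:", round(visits_per_member, 2))

# Integer division mirrors what avg() returns for an integer column in T-SQL.
print("truncated:", items // visits, visits // members)
```

Under those assumptions the reported 8 items and 2 visits are truncated values of roughly 8.37 and 2.80; casting the column to a decimal type before averaging would keep the fractional part.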
b) Member Visit Breakdowns
i) By Day of the Week
Totals:
Day          #members  TotVisit  TotSales        TotItems
1 Monday     92,924    128,331   $9,854,511.58   1,029,430
2 Tuesday    88,252    121,760   $9,068,451.60   919,459
3 Wednesday  66,089    89,672    $6,688,436.60   686,271
4 Thursday   63,927    87,209    $6,696,184.47   679,817
5 Friday     69,778    95,767    $7,348,921.78   762,536
6 Saturday   84,691    116,792   $9,259,135.96   976,163
7 Sunday     125,017   174,532   $13,685,584.17  1,494,974
ii) By Hour of the Day (totals)
Hour  #members  TotVisit  TotSales       TotItems
21    92        117       $53,223.52     7,995
20    16,397    20,376    $2,073,567.65  194,454
19    35,547    47,182    $3,890,335.06  399,504
18    40,493    55,487    $4,202,097.93  442,749
17    50,964    69,393    $5,234,699.21  547,431
16    59,041    81,558    $6,162,529.00  639,720
15    62,606    87,344    $6,557,986.47  678,389
14    66,350    92,813    $7,081,434.92  742,840
13    67,057    94,649    $7,082,559.66  757,477
12    64,054    91,749    $6,699,729.65  714,873
11    58,605    81,479    $6,115,617.03  657,350
10    45,158    59,939    $4,575,482.35  472,832
9     17,473    22,240    $1,836,507.65  181,723
8     4,970     6,712     $704,855.45    73,829
7     1,585     2,429     $292,911.61    32,536
6     171       382       $23,438.63     3,097
5     77        162       $7,525.51      924
4     36        49        $6,669.96      924
2     2         3         $54.90         3
c) Characteristics of the most active members in terms of frequency and sales
The queries run in Task 1 and Task 2 reveal that member_types W, X and V are the most active members in terms of sales and frequency of visits. They are most dominant in terms of sales and visits in store numbers 18, 15 & 20. However, while the data shows the majority of members visit and spend the most on Day 1 and Day 7, which holds true for member_types W & V, member_type X makes the greatest number of visits and spends the most on Day 2. As no member_type dominated Day 1, Day 1 revealed itself to be the most diverse in member_type, with the second highest # in sales and frequency of visits. Lastly, visit frequency and sales were at their max during 13:00 & 14:00.
d) Average Sale by Hour, Average Sale by Day
Hour  AvgSales  AvgItems
21    $578.52   86
20    $126.46   11
19    $109.44   11
18    $103.77   10
17    $102.71   10
16    $104.38   10
15    $104.75   10
14    $106.73   11
13    $105.62   11
12    $104.60   11
11    $104.35   11
10    $101.32   10
9     $105.11   10
8     $141.82   14
7     $184.80   20
6     $137.07   18
5     $97.73    12
4     $185.28   25
2     $27.45    1
Day          AvgSales  TotItems
1 Monday     $106.05   1,029,430
2 Tuesday    $102.76   919,459
3 Wednesday  $101.20   686,271
4 Thursday   $104.75   679,817
5 Friday     $105.32   762,536
6 Saturday   $109.33   976,163
7 Sunday     $109.47   1,494,974
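When reading the hourly averages, it helps to set them beside the number of visits behind them. The sketch below is plain Python and takes TotVisit and TotSales for three hours from the by-hour totals table in (b) ii, dividing sales by visits. These per-visit figures are not the same measure as the AvgSales column, which appears to average grouped #Dataset rows rather than individual visits; that reading is an assumption based on the queries shown in this report.

```python
from decimal import Decimal, ROUND_HALF_UP

# hour: (TotVisit, TotSales) copied from the by-hour totals table
hour_totals = {
    21: (117, "53223.52"),
    14: (92_813, "7081434.92"),
    13: (94_649, "7082559.66"),
}

per_visit = {
    hour: (Decimal(sales) / visits).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    for hour, (visits, sales) in hour_totals.items()
}

for hour, (visits, _) in hour_totals.items():
    print(f"hour {hour}: {visits:>6,} visits, sales per visit = {per_visit[hour]}")
```

Hour 21 shows by far the highest sales per visit, but it rests on 117 visits compared with more than 90,000 in each of the peak hours, so it is better treated as a small-sample outlier than as a trend.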
Priority 3:Task 3: Summary of Highlights and Takeaways based on Task 1 & Task 2 Data Analysis
Note to Reader:
 Salesdata beinganalyzedfortoday’smeetingisforthe monthandyear of January,2000 and shouldbe used
incombinationwithdataforthe yearsfollowingtoforecastsale volumesforthe monthof January.
 Functionsusingtotal_visit_amouthave beencalculatedtoexclude negativevalues,andItem_cnthasbeen
updatedto0 for saiditemsreturned.
 Analysismayverydue tolack of clarityand access toadditional detailsregardingthe dataprovided.
Disclaimer:The data displayhasbeenderivedfrom fictitiousdatabase andisinnoway a true representationof Sam’s
Club.
Major Highlights:
Day          Total Sales Per Day     Member_type best sale day
1 Monday     $9,854,511.58           Day 1: Most diverse in terms of member_type
2 Tuesday    $9,068,451.60           Type_X: $1,511,115.64; Type_H: $1,150.99
3 Wednesday  $6,688,436.60 (low)     Type_Z: $2,386.06
4 Thursday   $6,696,184.47           Type_3: $2,980.68
5 Friday     $7,348,921.78           Type_D: $13,376.19; Type_A: $154,879.90
6 Saturday   $9,259,135.96           Type_Y: $3,794.28
7 Sunday     $13,685,584.17 (high)   Type_W: $3,729,507.83; Type_V: $5,102,319.46; Type_1: $3,452.14; Type_G: $4,435.72
Top Performing Stores ordered by #M_Type
Store#  M_Type  #M_Type  Sales
18      V       46,508   $2,923,557.90
15      W       26,895   $1,882,085.31
Totals/Day
Day  #members  TotVisit  TotSales        TotItems
1    92,924    128,331   $9,854,511.58   1,029,430
2    88,252    121,760   $9,068,451.60   919,459
3    66,089    89,672    $6,688,436.60   686,271
4    63,927    87,209    $6,696,184.47   679,817
5    69,778    95,767    $7,348,921.78   762,536
6    84,691    116,792   $9,259,135.96   976,163
7    125,017   174,532   $13,685,584.17  1,494,974
High Traffic Hour and Largest Sale Producer
Hour  #Members  #Visits  Total Sales    #Items
14    66,350    92,813   $7,081,434.92  742,840
13    67,057    94,649   $7,082,559.66  757,477
Note: Despite hours #13 & #14 producing the greatest sales and # of visits, the averages in Task 2d show that hour #21 produces the highest average sale value ($578.52) and item count (86).
Major Findings:
Analysis revealed Sam's Club's highest-traffic and highest-producing stores to be #18 and #15, with Monday and Sunday, and hours 13:00 and 14:00, producing the best results in the same terms. Lastly, examination of the averages table in Task 2d also demonstrated that hour #21 produced the highest average sale and item count. Data collected from Tasks 1 and 2 confirms the consistency between query results for top and bottom performers.
Looking solely at the top performers, the high-traffic days, hours and stores can be deduced. It can be expected that Monday and Sunday will see the greatest number of visitors with the highest grossing sales, with peak hours across the board being 13:00 and 14:00. Lastly, stores #18 and #15 can expect to see the bulk of visits and sales, with the remainder spread out among the remaining stores.
The top performer data also allows a basic stereotype of the visiting members to be derived. Further analysis reveals the bulk of visitors to belong to member_type V, W or X. Member_types V and W spend the most on Sunday, while member_type X spends the most on Tuesdays. Interestingly, these member_types each frequent different stores: V makes the greatest number of visits to store #18, W to store #15, and X to #20. Given these attributes, the spending patterns of these members can, as a general assumption, be forecasted.
This data alone, while sufficient to roughly forecast visiting and spending patterns for the year 2001, is not sufficient to answer why and how these attributes are being influenced. Nor does the data provided give enough insight into the member_types or products being offered to allow for more efficient marketing and inventory management.
Takeaways:
To assess the cause of frequency, sales performance, item type purchased and membership_type selected, additional data which has not been relayed through the given Sam's Club data is desired.
Additional data desired to develop a more concise analysis of the top and bottom performers would be:
- Store location (zip code, state)
- Unique item identifier and description (to analyze type and cost of products being sold)
- Description of member_type (e.g. wholesaler, retail, etc.)
- Socioeconomic classification of member based on zip code or surveys
If the above data points could be obtained along with the following year's data for the month of January, a better understanding of the why and how behind each data field could be developed. The result would be not only the ability to provide more concise forecasting results for frequency, sales and inventory management, but also the ability to understand and market more efficiently per member_type and level in each area.

More Related Content

Similar to MSA 8110 Final Project Key Data Analysis

Exto Reward Points for Magento 2 (user guide)
Exto Reward Points for Magento 2 (user guide)Exto Reward Points for Magento 2 (user guide)
Exto Reward Points for Magento 2 (user guide)Exto.io
 
Porfolio of Setfocus work
Porfolio of Setfocus workPorfolio of Setfocus work
Porfolio of Setfocus workKevinPSF
 
Online Gift swelling Gallery
Online Gift swelling GalleryOnline Gift swelling Gallery
Online Gift swelling GalleryAmit Pawara
 
Market Basket Analysis in SAS
Market Basket Analysis in SASMarket Basket Analysis in SAS
Market Basket Analysis in SASAndrew Kramer
 
Company segmentation - an approach with R
Company segmentation - an approach with RCompany segmentation - an approach with R
Company segmentation - an approach with RCasper Crause
 
Sap fscm configuratiom material: STRIVE ERP TECHNOLOGIES
Sap fscm configuratiom material: STRIVE ERP TECHNOLOGIESSap fscm configuratiom material: STRIVE ERP TECHNOLOGIES
Sap fscm configuratiom material: STRIVE ERP TECHNOLOGIESstrive erp technologies
 
SQL for Business Problems.pptx
SQL for Business Problems.pptxSQL for Business Problems.pptx
SQL for Business Problems.pptxMustafa Ahmed
 
Final Project Python - Elyada Wigati Pramaresti.pptx
Final Project Python - Elyada Wigati Pramaresti.pptxFinal Project Python - Elyada Wigati Pramaresti.pptx
Final Project Python - Elyada Wigati Pramaresti.pptxElyada Wigati Pramaresti
 
Counting -cycle_count_using_mobile_application
Counting  -cycle_count_using_mobile_applicationCounting  -cycle_count_using_mobile_application
Counting -cycle_count_using_mobile_applicationmanitenkasi
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docxcelenarouzie
 
Write a banking program that simulates the operation of your local ba.docx
 Write a banking program that simulates the operation of your local ba.docx Write a banking program that simulates the operation of your local ba.docx
Write a banking program that simulates the operation of your local ba.docxajoy21
 
Exto Store Credit for Magento 2 (user guide)
Exto Store Credit for Magento 2 (user guide)Exto Store Credit for Magento 2 (user guide)
Exto Store Credit for Magento 2 (user guide)Exto.io
 
Tn shaw 107 data warehousing problem set
Tn shaw 107 data warehousing problem setTn shaw 107 data warehousing problem set
Tn shaw 107 data warehousing problem setTejNarayanShaw2
 
Final Project SQL - Elyada Wigati Pramaresti.pptx
Final Project SQL - Elyada Wigati Pramaresti.pptxFinal Project SQL - Elyada Wigati Pramaresti.pptx
Final Project SQL - Elyada Wigati Pramaresti.pptxElyada Wigati Pramaresti
 
CONTROLLING PPT.pptx
CONTROLLING PPT.pptxCONTROLLING PPT.pptx
CONTROLLING PPT.pptxsaphana9800
 

Similar to MSA 8110 Final Project Key Data Analysis (20)

1 z1 051
1 z1 0511 z1 051
1 z1 051
 
Exto Reward Points for Magento 2 (user guide)
Exto Reward Points for Magento 2 (user guide)Exto Reward Points for Magento 2 (user guide)
Exto Reward Points for Magento 2 (user guide)
 
Belajar SQL
Belajar SQLBelajar SQL
Belajar SQL
 
Porfolio of Setfocus work
Porfolio of Setfocus workPorfolio of Setfocus work
Porfolio of Setfocus work
 
POL API for E-Commerce Loyalty
POL API for E-Commerce LoyaltyPOL API for E-Commerce Loyalty
POL API for E-Commerce Loyalty
 
Online Gift swelling Gallery
Online Gift swelling GalleryOnline Gift swelling Gallery
Online Gift swelling Gallery
 
Market Basket Analysis in SAS
Market Basket Analysis in SASMarket Basket Analysis in SAS
Market Basket Analysis in SAS
 
Company segmentation - an approach with R
Company segmentation - an approach with RCompany segmentation - an approach with R
Company segmentation - an approach with R
 
Sap fscm configuratiom material: STRIVE ERP TECHNOLOGIES
Sap fscm configuratiom material: STRIVE ERP TECHNOLOGIESSap fscm configuratiom material: STRIVE ERP TECHNOLOGIES
Sap fscm configuratiom material: STRIVE ERP TECHNOLOGIES
 
SQL for Business Problems.pptx
SQL for Business Problems.pptxSQL for Business Problems.pptx
SQL for Business Problems.pptx
 
Final Project Python - Elyada Wigati Pramaresti.pptx
Final Project Python - Elyada Wigati Pramaresti.pptxFinal Project Python - Elyada Wigati Pramaresti.pptx
Final Project Python - Elyada Wigati Pramaresti.pptx
 
Counting -cycle_count_using_mobile_application
Counting  -cycle_count_using_mobile_applicationCounting  -cycle_count_using_mobile_application
Counting -cycle_count_using_mobile_application
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
 
Write a banking program that simulates the operation of your local ba.docx
 Write a banking program that simulates the operation of your local ba.docx Write a banking program that simulates the operation of your local ba.docx
Write a banking program that simulates the operation of your local ba.docx
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Exto Store Credit for Magento 2 (user guide)
Exto Store Credit for Magento 2 (user guide)Exto Store Credit for Magento 2 (user guide)
Exto Store Credit for Magento 2 (user guide)
 
Tn shaw 107 data warehousing problem set
Tn shaw 107 data warehousing problem setTn shaw 107 data warehousing problem set
Tn shaw 107 data warehousing problem set
 
Dwh training 1
Dwh training 1Dwh training 1
Dwh training 1
 
Final Project SQL - Elyada Wigati Pramaresti.pptx
Final Project SQL - Elyada Wigati Pramaresti.pptxFinal Project SQL - Elyada Wigati Pramaresti.pptx
Final Project SQL - Elyada Wigati Pramaresti.pptx
 
CONTROLLING PPT.pptx
CONTROLLING PPT.pptxCONTROLLING PPT.pptx
CONTROLLING PPT.pptx
 

MSA 8110 Final Project Key Data Analysis

  • 1. MSA 8110 Final Project PRIORITY 1: Assessand Clean DataStep1: Imported Data Tablesfrom UA_SAMSCLUB into ES611926 Tables Imported from UA_SAMSCLUB dbo.Store_Visits sc.Store_Visits dbo.Member_Index sc.member_index dbo.store_information sc.store_information Imported to ES611926 Step 2: Assessdata for entity integrity,referential integrityandoverall data integrity Entity Integrity Requirements -All PKentriesare unique andcannothave the same valuesonkey attribute(s) andnopart of the PK may be null Queries to Use: Check for Nulls in PK Columns select * from SC.STORE_VISITS where visit_nbr is null; select * from SC.MEMBER_INDEX where membership_NBR is null; select * from SC.STORE_INFORMATION where STORE_NBR is null; No Null Found for all three columns in this table’s queries Queries to Use: Check for Unique Identifiers in PK Column select visit_nbr, count(*) from sc.store_visits group by visit_nbr having count(*)>1; select membership_nbr, count (*) from sc.member_index group by membership_nbr having count (*)>1; select store_nbr, count (*) from sc.Store_Information group by store_nbr having count(*)>1; select visit_nbr, store_nbr, membership_nbr, count (*) from SC.STORE_Visits group by visit_nbr, store_nbr, membership_nbr having count (*)>1; No Duplicate Identifiers found in PK Column for all three columns in this table’s queries Referential Integrity Requirements A dependenttable’sFKmusthave eitheranull entryor a matchingentryinthe relatedtable.Eventhoughanattribute maynothave a correspondingattributeitis impossible tohave aninvalidentry. 
STORE_VISITS & MEMBER_INDEX STORE_VISITS & STORE_INFORMATION select membership_nbr from sc.store_visits where membership_nbr not in (select membership_nbr from sc.member_index); select store_nbr from sc.store_visits where store_nbr not in (select store_nbr from sc.store_information); Result: 193,898 rows returned with duplicate ID numbers matching different Visit_nbr but Membership_nbr not located in Member_Index table Result: Referential Integrity exists as no rows returned
  • 2. Data Integrity In a relational database,aconditioninwhichthe datainthe DB complywithall entityand referentialintegrityconstraints. Query used: checking for blank values select * from sc.store_visits where len(member_code)=0; Result: 4 rows with unique membership_nbr’s but no member_code. All other columns with no null indicators tested successfully for values Proposed Solution: Fill Member_code wth dummy values where code has been left blank Query Used: Checking for (-} values where inappropriate A) select visit_nbr, total_visit_amt, refund_code from sc.store_visits where total_visit_amt<=0; B) select visit_nbr, total_visit_amt, refund_code from sc.store_visits where refund_code=0 and total_visit_amt<=0; C) select visit_nbr, membership_nbr, total_visit_amt, refund_code, member_code from sc.store_visits where refund_code=1 and total_visit_amt>=0; A) Results: Refund Code =1 when total_visit_amt <=0 implies a credit back to the customer has been made therefore a (-) value should be inserted. B) Result: No rows populated C) Result: 12,302 rows populated with (+) values when refund code >0; Suggested update: to 0 where rows with total_visit_amt>0 and refund_code=1. Update to 0 instead of changing to (-) value as not enough information is available to know exactly how the credit if applicable was applied. Data Not Used: sales_tax, tender_amt, tot_unit_cost Reason for Un-Used Data: not enough data on functional relations between columns to correct data, therefore data is being excluded from analysis. Step 3: Clean/Scrub the Data Query Used: To Insert Dummy Value for missing Member_Code update sc.store_visits set member_code = 99999 where len(member_code)=0; Query Used: To Correct incorrect (-) Values update sc.store_visits set total_visit_amt=0 where refund_code=1 and total_visit_amt>0; Note: Please see Step 2 C for explanation of why this query was used. 
Query Used: To Delete Membership_nbr from store_visits not found in member_index delete from sc.store_visits where membership_nbr not in (select membership_nbr from sc.member_index);
  • 3. Priority 2: Data Analysis: Step4: Data Analysis usingSQL TASK(s): Analyze and Report: 1) Overall AssessmentofStore Sales: a) Summaryof Total Sales b) Summaryof Total Salesbyeach store c) Summaryof Total Salesbreakdowns: i) Day of Week ii) MembershipType d) Performanyadditional analysisandprovide useful insights 2) Assessmentofmemberbuying behavior: a) Summaryof typical purchase patterns(amount&# of items) bymemberpervisit b) Summaryof membervisitbreakdowns: i) Day of Week ii) Hours duringDay c) ID the characteristicsof the most active (insalesandfrequentvisits) members 3) Performany additional analysis and provide useful Task 1 a) Total Sales Query: select sum(total_visit_amt) as Sales from sc.store_visits where total_visit_amt>=0 or refund_code!=1; b) Total Sales by Store: select distinct(store_nbr), sum(total_visit_amt) as TOTAL_SALES from sc.store_visits where total_visit_amt>=0 and refund_code!=1 group by store_nbr order by total_sales desc; c) Total Sales Break Down i) Total Sales Break Down by Day of the Week: select datepart(weekday, transaction_date) as weekday, sum(total_visit_amt) as TotalSales from sc.STORE_VISITS where total_visit_amt>=0 group by datepart(weekday, transaction_date) ii) Total Sales Break Down by Member_Type: Note: First a combinedtable ofstore_visits &member_index.QueryUsed: select a.MEMBER_TYPE, b.total_visit_amt into #Sales from sc.member_index a right join sc.store_visits b on a.membership_nbr=b.membership_nbr group by a.member_type, b.total_visit_amt; select distinct member_type, sum(total_visit_amt) as TotSales from #Sales where total_visit_amt>=0 group by member_type; d) Additional Analysis Queries: Top Performing Sale Days Grouped by Member_Type: select datepart(weekday, b.transaction_date) as weekday, a.MEMBER_TYPE, b.total_visit_amt into #Sales1 from sc.member_index a right join sc.store_visits b on a.membership_nbr=b.membership_nbr group by b.transaction_date, a.member_type, b.total_visit_amt order by weekday; select 
distinct weekday, member_type, sum(total_visit_amt) as TotSales from #Sales1 where total_visit_amt>=0 group by weekday, member_type order by weekday, totsales desc;
  • 4. Member Type and Sales broken down by store: select datepart(weekday, b.transaction_Date) as weekday, b.store_nbr, a.MEMBER_TYPE, count(a.membership_nbr) as #Members, b.total_visit_amt into #Sales2 from sc.member_index a right join sc.store_visits b on a.membership_nbr=b.membership_nbr group by a.member_type, b.transaction_Date, b.store_nbr, b.total_visit_amt, a.membership_nbr; select distinct store_nbr, member_type, sum(#members) as #type, sum(total_visit_amt) as Spent from #sales2 where total_visit_amt>=0 group by store_nbr, member_type order by store_nbr; Task 2 a) Typical Purchase Patterns (amount& # of items) bymembersper visit Create a Temp Table: select datepart(weekday, transaction_date) as weekday,(transaction_time/1000000) as [HOUR], visit_nbr, total_visit_amt, tot_scan_cnt into #Sales3 from sc.store_visits group by datepart(weekday, transaction_date), (transaction_time/1000000), visit_nbr, total_visit_amt, tot_scan_cnt Checking for Table: select * from #sales3 Count of Total Visit_nbr, average sales per visit, average # of items: select count(visit_nbr) as TotalVisits, avg(total_visit_amt) as avgsale_visit, avg(tot_scan_cnt) as avgitemcnt from #sales3 where total_visit_amt>=0; Note: Above Query written to exclude returns/negative sale values Count # of Distinct Membership_nbr and Average visit per member: select datepart(weekday, transaction_date) as weekday,(transaction_time/1000000) as [HOUR], membership_nbr, count(visit_nbr) as #Visit, total_visit_amt, tot_scan_cnt into #Sales4 from sc.store_visits group by datepart(weekday, transaction_date), (transaction_time/1000000), membership_nbr, total_visit_amt, tot_scan_cnt select * from #sales4 Displays # Visits Per Member select distinct membership_nbr, sum(#Visit) as Visits into #temp0 from #sales4 where total_visit_amt>=0 group by membership_nbr; Displays Average Visit Per Member select count(membership_nbr) as #members, avg(visits) as AvgVisit from #temp0
  • 5. b) Provide Summary of Member Visits Breakdowns:
Create WeekDay Table:
select distinct datepart(weekday,transaction_date) as weekday
into #day
from sc.store_visits;
Checking for Table:
select * from #day;
Create a #members table to store all members:
select distinct membership_nbr
into #members
from sc.member_index;
Checking for Table:
select * from #members
Create a combined #day_members table:
select weekday, membership_nbr
into #day_members
from #day cross join #members;
Checking for Table:
select * from #Day_members
Create a #Dataset table by left joining #day_members with the #Sales5 table:
select a.weekday, b.membership_nbr, b.Hour, sum(b.#visit) as TotVisit,
sum(b.total_visit_amt) as TotSale, sum(b.tot_scan_cnt) as TotItem
into #Dataset
from #Day_members a left join #Sales5 b
on a.weekday=b.weekday and a.membership_nbr=b.membership_nbr
group by a.weekday, b.Hour, b.membership_nbr
order by a.weekday;
Check for Table:
select * from #Dataset;
Update Query:
select * from #dataset
update #dataset set totSale=0 where totSale<0
update #dataset set totItem=0 where totSale=0
i) By Day of Week:
select Weekday, count(membership_nbr) as #Members, Sum(totVisit) as Total_Visits,
sum(totSale) Total_Sales, sum(TotItem) as Total_Items
from #Dataset
group by weekday
ii) By Hours During a Day:
select distinct Hour, count(membership_nbr) as #Members, sum(totvisit) as Total_visit,
sum(totsale) as Total_sales, sum(totitem) as total_items
from #Dataset2
group by Hour
order by Hour desc
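The cross join followed by a left join, as used above to build #Dataset, can be illustrated on a small scale. This is a sketch under stated assumptions: the tables #d, #m and #v and their rows are hypothetical stand-ins for #day, #members and the sales temp table. Unlike the query above, it selects the member key from the scaffold side (a) and wraps the sum in isnull, so that day/member combinations with no visits appear with a total of 0 rather than NULL.

```sql
-- Hypothetical stand-ins for #day, #members and the sales temp table
create table #d (weekday int);
create table #m (membership_nbr int);
create table #v (weekday int, membership_nbr int, total_visit_amt decimal(10,2));

insert into #d values (1), (2);
insert into #m values (100), (200);
insert into #v values (1, 100, 40.00), (1, 100, 10.00), (2, 200, 75.00);

-- The cross join builds every weekday/member combination;
-- the left join then attaches sales where they exist.
select a.weekday,
       a.membership_nbr,
       isnull(sum(b.total_visit_amt), 0) as TotSale
from (select weekday, membership_nbr from #d cross join #m) as a
left join #v as b
       on a.weekday = b.weekday
      and a.membership_nbr = b.membership_nbr
group by a.weekday, a.membership_nbr
order by a.weekday, a.membership_nbr;

drop table #d;
drop table #m;
drop table #v;
```

With the sample rows, the result has four rows: (1, 100, 50.00), (1, 200, 0), (2, 100, 0) and (2, 200, 75.00). Whether zero-visit combinations should be kept depends on the question being asked; counting members per day, for instance, would need to ignore them.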
  • 6. c) ID the characteristics of the most active (in sales and in frequent visits) members
QUERY to ID TOP 15 based on Total_sales and #VISITS:
select top 15 membership_nbr, count(hour) as #Visits, sum(sales) as TOTAL_SALES,
sum(itemcnt) as TOTAL_ITEMCNT
from #dataset
group by membership_nbr
order by TOTAL_SALES desc, #Visits desc;
d) Avg Sale and Item by Hour and Day
select distinct Hour, avg(totsale) as AvgSale, avg(totitem) as AvgItems
from #Dataset2
group by Hour
order by Hour desc
  • 7. Task 1 Results
a) Total Sales = $64,623,769.65
b) Store_Nbr & Total Sales
Store_Nbr   Total Sales
18          5791990.04
15          4756625.64
24          4530115.62
27          4382872.87
19          4281760.27
21          3867094.50
28          3730117.50
20          3558149.57
26          3480692.58
29          3425846.25
5           3381878.74
16          3267488.93
22          3184074.35
25          2875560.22
4           2598886.30
17          2573647.71
3           2486413.71
23          2450554.85
c) Day of Week & Total Sales
Day       Total Sales
1 Mon     9,854,511.58
2 Tues    9,068,451.60
3 Wed     6,688,436.60
4 Thur    6,696,184.47
5 Fri     7,348,921.78
6 Sat     9,259,135.96
7 Sun     13,685,584.17
MemberType & Total Sales
M_Type   Total Sales
E        219,401.10
Y        14,519.50
G        13,487.12
3        12,517.61
H        1,953.13
A        502,865.01
W        6,985,452.22
X        6,067,036.65
1        11,754.68
D        56,829.09
V        6,044,250.25
Z        9,010.12
d) Top Sale Days organized by MemberType
Day        M_Type   Sales
7 Sunday   V        $5,102,319.46
7 Sunday   W        $3,729,507.83
2          X        $1,511,115.64
5          A        $154,879.90
6          E        $117,923.12
5          D        $13,376.19
7          G        $4,435.72
6          Y        $3,794.28
7          1        $3,452.14
4          3        $2,980.68
3          Z        $2,386.06
2          H        $1,150.99
  • 8. Most frequent visitor grouped by store, ordered by #MemberType
Store   Type   #Type    Sale
18      V      46508    $ 2,923,557.90
15      W      26895    $ 1,882,085.31
20      X      9078     $ 478,001.79
23      A      4976     $ 51,262.87
22      3      139      $ 9,231.70
26      E      112      $ 22,792.14
26      D      66       $ 9,769.09
24      Y      62       $ 5,914.94
21      G      42       $ 5,105.56
24      Z      38       $ 4,466.81
26      1      38       $ 6,869.71
22      H      6        $ 1,762.71
Task 2 Results
a) Summary of Purchase Patterns
Visits    Sales            Items
789,943   $64,623,769.65   6,609,868
Average Sale and Items per Visit:
Total_Visit   Sales        Items
789,943       $81.808142   8
#Members   AvgVisit/Members
282316     2
b) Member Visit Breakdowns
i) By Day of the Week Totals:
Day           #members   TotVisit   TotSales          TotItems
1 Monday      92,924     128,331    $ 9,854,511.58    1,029,430
2 Tuesday     88,252     121,760    $ 9,068,451.60    919,459
3 Wednesday   66,089     89,672     $ 6,688,436.60    686,271
4 Thursday    63,927     87,209     $ 6,696,184.47    679,817
5 Friday      69,778     95,767     $ 7,348,921.78    762,536
6 Saturday    84,691     116,792    $ 9,259,135.96    976,163
7 Sunday      125,017    174,532    $ 13,685,584.17   1,494,974
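The per-visit averages in Task 2 a) can be cross-checked from the reported totals. The snippet below is a worked arithmetic check only; it uses the three totals printed in the summary table and no database tables. It suggests that the reported item average of 8 is the truncated form of roughly 8.37 items per visit.

```sql
-- Worked check using the totals reported in Task 2 a)
select cast(64623769.65 / 789943 as decimal(10,2)) as avg_sale_per_visit,   -- 81.81
       cast(6609868.0   / 789943 as decimal(10,2)) as avg_items_per_visit;  -- 8.37
```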
  • 9. ii) By Hour of the Day (totals)
Hour   #members   TotVisit   TotSales         TotItems
21     92         117        $ 53,223.52      7,995
20     16,397     20,376     $ 2,073,567.65   194,454
19     35,547     47,182     $ 3,890,335.06   399,504
18     40,493     55,487     $ 4,202,097.93   442,749
17     50,964     69,393     $ 5,234,699.21   547,431
16     59,041     81,558     $ 6,162,529.00   639,720
15     62,606     87,344     $ 6,557,986.47   678,389
14     66,350     92,813     $ 7,081,434.92   742,840
13     67,057     94,649     $ 7,082,559.66   757,477
12     64,054     91,749     $ 6,699,729.65   714,873
11     58,605     81,479     $ 6,115,617.03   657,350
10     45,158     59,939     $ 4,575,482.35   472,832
9      17,473     22,240     $ 1,836,507.65   181,723
8      4,970      6,712      $ 704,855.45     73,829
7      1,585      2,429      $ 292,911.61     32,536
6      171        382        $ 23,438.63      3,097
5      77         162        $ 7,525.51       924
4      36         49         $ 6,669.96       924
2      2          3          $ 54.90          3
c) Characteristics of the most active members in terms of frequency and sales
The queries run in Task 1 and Task 2 reveal that member_types W, X and V are the most active members in terms of sales and frequency of visits. They are most dominant in terms of sales and visits in store numbers 18, 15 & 20. However, while the data shows that the majority of members visit and spend the most on Day 1 and Day 7, which holds true for member_types W & V, member_type X makes the greatest number of visits and spends the most on Day 2. As no member_type dominated Day 1, Day 1 revealed itself to be the most diverse in member_type, with the second highest # in sales and frequency of visits. Lastly, visit frequency and sales were at their max during 13:00 & 14:00.
  • 10. d) Average Sale by Hour, Average Sale by Day
Hour   AvgSales   AvgItems
21     $ 578.52   86
20     $ 126.46   11
19     $ 109.44   11
18     $ 103.77   10
17     $ 102.71   10
16     $ 104.38   10
15     $ 104.75   10
14     $ 106.73   11
13     $ 105.62   11
12     $ 104.60   11
11     $ 104.35   11
10     $ 101.32   10
9      $ 105.11   10
8      $ 141.82   14
7      $ 184.80   20
6      $ 137.07   18
5      $ 97.73    12
4      $ 185.28   25
2      $ 27.45    1
Day           AvgSales   TotItems
1 Monday      $ 106.05   1029430
2 Tuesday     $ 102.76   919459
3 Wednesday   $ 101.20   686271
4 Thursday    $ 104.75   679817
5 Friday      $ 105.32   762536
6 Saturday    $ 109.33   976163
7 Sunday      $ 109.47   1494974
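The AvgSales figures above appear to equal TotSales divided by #members from the Task 2 b) tables, rather than sales divided by visits. This is an inference from the printed numbers, not something the slides state. The arithmetic check below uses three reported rows; it may also help explain why hour 21 shows such a high average, since it is computed over only 92 member rows.

```sql
-- Worked check: reported TotSales / #members for three rows of Task 2 b)
select cast(53223.52   / 92    as decimal(10,2)) as hour21_avg_sale,  -- 578.52
       cast(2073567.65 / 16397 as decimal(10,2)) as hour20_avg_sale,  -- 126.46
       cast(9854511.58 / 92924 as decimal(10,2)) as day1_avg_sale;    -- 106.05
```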
  • 11. Priority 3:Task 3: Summary of Highlights and Takeaways based on Task 1 & Task 2 Data Analysis Note to Reader:  Salesdata beinganalyzedfortoday’smeetingisforthe monthandyear of January,2000 and shouldbe used incombinationwithdataforthe yearsfollowingtoforecastsale volumesforthe monthof January.  Functionsusingtotal_visit_amouthave beencalculatedtoexclude negativevalues,andItem_cnthasbeen updatedto0 for saiditemsreturned.  Analysismayverydue tolack of clarityand access toadditional detailsregardingthe dataprovided. Disclaimer:The data displayhasbeenderivedfrom fictitiousdatabase andisinnoway a true representationof Sam’s Club. Major Highlights: Day Total SalesPer Day: high, low Member_type best sale day Top Performing store ordered by #M_Type Store# M_Type #M_Type Sales 18 V 46,508 $2,923,557.90 15 W 26,895 $1,882,085.31 Totals/Day Day #members TotVisit TotSales TotItems 1 92,924 128,331 $ 9,854,511.58 1,029,430 2 88,252 121,760 $ 9,068,451.60 919,459 3 66,089 89,672 $ 6,688,436.60 686,271 4 63,927 87,209 $ 6,696,184.47 679,817 5 69,778 95,767 $ 7,348,921.78 762,536 6 84,691 116,792 $ 9,259,135.96 976,163 7 125,017 174,532 $ 13,685,584.17 1,494,974 High Traffic Hour and Largest Sale Producer Hour #Members #Visits Total Sales #Items 14 66,350 92,813 $ 7,081,434.92 742,840 13 67,057 94,649 $ 7,082,559.66 757,477 Note: Despite Hour #13& #14producing thegreatest sales and #visits. Averages in Task2d show thatHour #21 produces thehighestaveragesale value 578.52anditem count 86. 
1 Monday $9,854,511.58 Day 1: Most diversein terms of member_type 2 Tuesday $9,068,451.60 Type_X: $1,511,115.64 Type_H: $1,150.99 3 Wednesday $6,688,436.60 Type_Z: #2,386.06 4 Thursday $6,696,184.47 Type_3: $2,980.68 5 Friday $7,348,921.78 Type_D: $13,376.19 Type_A: $154,879.90 6 Saturday $9,259,135.96 Type_Y: #3,794.28 7 7 Sunday $13,685,584.17 Type_W: $3,729,507.83 Type_V: $5,102,319.26 Type_1: $3,452.14 Type_G: $4,425.72 Major Findings: AnalysisrevealedSam’sClub’shighesttrafficandhighestproducerstorestobe #18 and #15 withMonday and Sunday,andhours 13:00 and14:00 producingthe bestresultsinthe same terms.Lastly,examinationalso demonstratedfromthe averagestable inTask2dthathour #21 producedthe highestaverage sale anditemcount.Data collectedfromtasks1 and 2 confirmsthe consistencybetweenqueryresultsfortopandbottomperformers. Lookingsolelyatthe topperformers the hightrafficdays,hoursandstorescan be deduced. Itcan be expectedthat Monday andSunday will see the greatestnumberof visitorswiththe highestgrossingsales,withpeakhoursacrossthe board being13:00 and 14:00. Lastly,store #18 and #15 can expecttosee the bulkof visitsandsaleswiththe remainder spreadout amongthe remainingstores. The top performerdataalsoallowsfora basicstereotype the visitingmemberstobe derived. Furtheranalysis revealsthe bulkof visitortobelongtomember_type V,Wor X.Member_type V andW spendthe most onSunday, while member_type Xspendsthe moston Tuesdays.Interestingly,thesemember_typeseachfrequentdifferentstores: V makesthe greatestnumberof visitstostore #18, W to store #15, and X to #20. Giventhese attributesitcanbe a general assumptionaboutthe spendingpatternsof thesememberscanbe forecasted. 
Thisdata alone,while sufficienttoroughlyforecastvisitingandspendingpatternsforthe yearof 2001, isnot sufficient to bestanswerthe questionastowhy andhow these attributes are beinginfluence.Nordoesthe dataprovidedgive enoughinsightintothe member_typesorproductsbeingofferedtoallow formore efficientmarketingandinventory management.
  • 12. Takeaways:
To assess the cause of frequency, sales performance, item type purchased and membership_type selected, additional data that has not been relayed through the given Sam's Club data is desired. Additional data desired to develop a more concise analysis of the top and bottom performers would be:
 Store location (zip code, state)
 Unique item identifier and description (to analyze type and cost of products being sold)
 Description of member_type (e.g. wholesaler, retail, etc.)
 Socioeconomic classification of member based on zip code or surveys
If the above data points could be obtained along with following-year data for the month of January, a better understanding of the why and how behind each data field could be developed. The result would be not only the ability to provide more concise forecasting results for frequency, sales and inventory management, but also the ability to understand and market more efficiently per member_type and level in each area.