dataVISIONS is built with novel machine learning algorithms combined with deep data mining guided by fraud concepts, in response to a simple but profound question: "What should the pricing strategy be to stop eCommerce fraud, improve cyber security, reduce money laundering, analyze call center behavior, etc.?" What segmentation techniques can be applied toward those goals?
American Statistical Association October 23 2009 Presentation Part 1
Data Visions Big Data Visual Analytics Tool
1. dataVISIONS: Big Data Visual Analytics Tool: VERSION 2
•Predictive analytics provides an entry into Retail Analytics, for example, finding the propensity of buying.
•Very few variables get selected out of 500 or so, hence it NEEDS a holistic approach. Rank ordering does not guarantee the largest possible dollar amount made from historical buying patterns...
•Propensity of buying needs to be looked at through pricing strategy. Evaluate its effects through as many variables as possible, because of its correlation with the hottest topic in retail: return rate analysis. dataVISIONS is cost effective compared to a similar new SAS tool.
•Furthermore, Big Data Retail Analytics needs Visual Analytics that is prospective, not retrospective. It also must unearth good questions, hypotheses and interpretations that one must be able to see!
2. Big Data Analytics Vision: Visualize or not IS the Question!
•Customers are not comfortable with just numbers or with interpretation that differs from person to person.
•A visualization tool is needed among the company's assets.
•It helps with the practical application of statistics and stops the endless peeling of layers of data!
•Current visualization tools on the market lack the combination of statistical methods needed to meet the demands of data mining (segmentation appreciation). Several examples are provided below:
3. Pricing Strategy: Gender Effects on eCommerce: New pricing shows clear segmentation: spatial statistics in combination with regression shows very strong price sensitivity (very bottom right pane).
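Slide 3 credits regression (together with spatial statistics) for revealing price sensitivity. One common regression-based measure of price sensitivity is the log-log price elasticity: the ordinary least squares slope of ln(quantity) on ln(price), estimated separately for each segment such as gender. The sketch below is an illustrative assumption, not the dataVISIONS method; the (segment, price, quantity) row layout and the function names are hypothetical, and the spatial-statistics component is not modeled.

```python
import math
from collections import defaultdict


def elasticity(prices, quantities):
    """OLS slope of ln(quantity) on ln(price): the log-log price elasticity."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def elasticity_by_segment(rows):
    """rows: iterable of (segment, price, quantity) tuples (hypothetical schema).
    Returns {segment: elasticity}, one regression per segment."""
    groups = defaultdict(lambda: ([], []))  # segment -> (prices, quantities)
    for segment, price, qty in rows:
        groups[segment][0].append(price)
        groups[segment][1].append(qty)
    return {seg: elasticity(p, q) for seg, (p, q) in groups.items()}


# Usage with made-up demand curves: "F" is far more price sensitive than "M".
rows = [("F", p, 1000 * p ** -2) for p in (5, 10, 20, 40)]
rows += [("M", p, 1000 * p ** -0.5) for p in (5, 10, 20, 40)]
print(elasticity_by_segment(rows))  # about {'F': -2.0, 'M': -0.5}
```

A more negative slope marks a more price-sensitive segment: an elasticity of -2 means a 1% price increase cuts quantity by roughly 2%.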
4. Pricing Strategy: Gender Effects on eCommerce: Same method: market need for crisp segmentation visualization: low, medium (center), high and very strong price-sensitive customers (very bottom right pane).
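Slide 4 splits customers into low, medium, high and very strong price sensitivity. The deck does not say how the tier boundaries are drawn, so the minimal sketch below assumes each customer already has a numeric price-sensitivity score (higher means more sensitive) and uses quartile cut points as the boundaries; the function name and the quartile rule are hypothetical.

```python
import bisect
import statistics

TIERS = ["low", "medium", "high", "very strong"]


def tier_customers(scores):
    """scores: {customer_id: price-sensitivity score}, higher = more sensitive.
    Assigns each customer to one of four tiers using quartile cut points
    (an illustrative assumption, not the deck's rule)."""
    cuts = statistics.quantiles(scores.values(), n=4)  # three cut points
    return {cid: TIERS[bisect.bisect_right(cuts, s)] for cid, s in scores.items()}


# Usage: eight customers with scores 1..8 fall two per tier.
tiers = tier_customers({i: float(i) for i in range(1, 9)})
print(tiers)
```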
5. What we learned so far…
APPLICATION OF SPATIAL STATISTICS SIMULATION SHOWS MANY SELLER STORES ON ECOMMERCE WEBSITES THAT NEED TO BE TURNED OFF WITH THE NEW PRICE.
THE MAJORITY OF THEM ARE AT VERY LOW PRICES; THEY ARE LIKELY STOLEN CONSUMER ELECTRONICS PRODUCTS SOLD TO MAKE A QUICK BUCK IN ECOMMERCE. WHAT WILL BE THE CONSUMER PROPENSITY TO BUY FROM THIS ECOMMERCE WEBSITE IN THE FUTURE?
THE COMPANY NEEDS TO INSTITUTE A NEW PRICING POLICY BECAUSE THE WORKING HYPOTHESIS IS: “IN BOTH GENDERS, NEW PRICING REMOVES SOME SELLER STORES THAT COST MONEY TO SERVE AND ADDS TO FRAUD REDUCTION/SECURITY”
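Slide 5 observes that most seller stores to be turned off sell at very low prices, likely stolen consumer electronics. A simple screen in that spirit flags sellers whose median price for a product falls well below the marketplace median for the same product. This is a hedged sketch, not the spatial statistics simulation used in the deck; the (seller_id, product_id, price) schema, the function name and the 50% `ratio` threshold are hypothetical.

```python
import statistics
from collections import defaultdict


def flag_underpriced_sellers(listings, ratio=0.5):
    """listings: iterable of (seller_id, product_id, price) tuples (hypothetical).
    Flags a seller if its median price for any product is below
    `ratio` x the marketplace median price for that product."""
    by_product = defaultdict(list)
    by_seller_product = defaultdict(list)
    for seller, product, price in listings:
        by_product[product].append(price)
        by_seller_product[(seller, product)].append(price)

    market_median = {p: statistics.median(v) for p, v in by_product.items()}
    flagged = set()
    for (seller, product), prices in by_seller_product.items():
        if statistics.median(prices) < ratio * market_median[product]:
            flagged.add(seller)
    return flagged


# Usage with made-up phone listings: s4 sells far below the market median (490).
listings = [("s1", "phone", 500), ("s2", "phone", 520),
            ("s3", "phone", 480), ("s4", "phone", 150)]
print(flag_underpriced_sellers(listings))  # {'s4'}
```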
6. eCommerce Actionable Findings
•The old price was an easy entry for any seller. The buyer cloud from the new-price simulation allows segmentation and identification of price-sensitive and price-insensitive customers at various levels.
•The eCommerce company needs a strategy that serves both low-price sellers and buyers well. New pricing also segments high-value seekers, who want to pay more.
•Social media behavior provides more information on why this strategy will serve well…
7. Pricing Strategy VS Social Media (photo sharing = yes) behavior interaction simulation: old price, Gender=Male
8. Pricing Strategy VS Social Media (photo sharing = yes) behavior interaction simulation: new price, Gender=Male
9. Pricing Strategy VS Social Media (photo sharing = no) behavior interaction simulation: old price, Gender=Male
10. Pricing Strategy VS Social Media (photo sharing = no) behavior interaction simulation: new price, Gender=Male
11. Actionable Findings Learned
•Photo sharing is an important social media behavior; dataVISIONS methods remove clutter and find price-sensitive customers even among those who do not engage in photo sharing.
•Some seller stores drop out in the new pricing simulation, indicating some fraud reduction. This in turn improves buyer return to the eCommerce website.
•dataVISIONS removes clutter and shows that this data comes from very price-sensitive customers (Y-axis).
•A happy eCommerce customer will have low Recency (time taken to the next purchase) and will not ask for product returns at the company's cost.
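Slide 11 describes a happy customer as one with low Recency, which the deck defines as the time taken to the next purchase, and no product returns. A minimal sketch of that rule, assuming Recency is measured as the average gap in days between consecutive purchases; the function names and the 30-day `max_gap_days` threshold are hypothetical.

```python
from datetime import date


def mean_gap_days(purchase_dates):
    """Average days between consecutive purchases (the deck's 'Recency':
    time taken to the next purchase). None if fewer than two purchases."""
    ds = sorted(purchase_dates)
    if len(ds) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(ds, ds[1:])]
    return sum(gaps) / len(gaps)


def is_happy(purchase_dates, returns, max_gap_days=30):
    """Low Recency and no returns; max_gap_days is an illustrative assumption."""
    gap = mean_gap_days(purchase_dates)
    return gap is not None and gap <= max_gap_days and returns == 0


# Usage: purchases 10 and 20 days apart give an average gap of 15 days.
purchases = [date(2012, 1, 1), date(2012, 1, 11), date(2012, 1, 31)]
print(mean_gap_days(purchases), is_happy(purchases, returns=0))  # 15.0 True
```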
12. Actionable Findings Learned
•Product recommendation on social media is one way a retailer can count on return purchases. More recommendations could mean a higher Net Promoter Score (NPS).
•The next two slides show the value of the Net Promoter Score in eCommerce. NPS is the difference between Promoters and Detractors, and the same number can be arrived at in multiple ways (whenever Detractors >= Promoters, NPS <= 0). Hence predictive analytics will NEVER use it!
•dataVISIONS removes clutter and shows that this data comes from very price-sensitive customers.
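The point that the same NPS can be reached in multiple ways is easy to check numerically. The sketch below uses the standard NPS definition on a 0-10 survey scale (Promoters answer 9-10, Detractors 0-6, NPS = % Promoters minus % Detractors); the deck's mock-up data may code NPS differently (slide 20 refers to "Net Promoter Score (3 & 4)"), so the scale here is an assumption.

```python
def nps(scores):
    """Net Promoter Score from 0-10 survey answers (standard definition):
    percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / n


# Two very different customer bases produce the same score of 0:
polarized = [10, 10, 0, 0]  # half promoters, half detractors
lukewarm = [8, 8, 8, 8]     # all passives
print(nps(polarized), nps(lukewarm))  # 0.0 0.0
```

Because a polarized base and an indifferent base collapse to the same number, NPS alone carries too little information to serve as a predictor, which is the slide's argument.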
13. Male: Net Promoter Score <= 0 (more buyers gave bad reviews of the product), old price simulation
14. Male: Net Promoter Score <= 0, new price simulation: at least some bad reviews by buyers are gone from very price-sensitive customers, who could become return purchasers.
15. CYBER SECURITY AND RETAIL VARIABLES INTERACTION
•Cyber security is an important concern that permeates every aspect of the US corporate system:
•http://www.businessweek.com/articles/2012-08-02/the-cost-of-cyber-crime
•Billions of $ are being poured in unsuccessfully!
•As consumers switch to mobile apps, there will be phenomenal growth in fraudulent bills paid through app hacking, because it is very easy to push a button on a mobile device inadvertently. Plus, today's busy global lifestyle makes it so much easier…
•Retailers need to pay a lot more attention to preventing it prospectively and grab market share NOW!
16. Cyber security: the algorithm encourages spikes to form, but the 7 observations on the right refuse to do so; a just-in-time price or product will sell very quickly due to “right sizing”. Seller’s original price: the new pricing strategy is unable to stop this. Low R-square fraud patterns: male gender, Net Promoter Score <= 0, 4 weeks of a month (same data source as above: male gender).
17. How does pricing strategy interact with cyber security? One seller from the slide above occurred twice in the male gender. There are only three spikes out of 20 observations below. The R-square pattern shows fraud hits in the data. This seller’s pattern over 8 months is shown below: male gender, Net Promoter Score <= 0.
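Slides 16 and 17 reason in terms of how many observations form price spikes (for example, three spikes out of 20). The deck does not define a spike, so the sketch below assumes a common rule of thumb: a point more than `k` population standard deviations above the series mean, with `k = 2` as a hypothetical default. It only illustrates counting spikes, not the dataVISIONS algorithm.

```python
import statistics


def count_spikes(series, k=2.0):
    """Count points more than k population standard deviations above the mean.
    The threshold k is an illustrative assumption, not the deck's rule."""
    mu = statistics.mean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return 0  # a flat series has no spikes
    return sum(1 for x in series if x > mu + k * sd)


# Usage: 20 made-up weekly prices, 17 at the usual level and 3 spikes.
prices = [10] * 17 + [50, 50, 50]
print(count_spikes(prices))  # 3
```

A large share of spikes inflates the standard deviation and can mask itself, so a robust variant (median and MAD) may be preferable on real data.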
18. Cyber security VS Online Pricing Strategy: the method is designed to smooth out large pricing spikes; the complicated Aberrant Online Selling Pattern (AOSP) on the same data above shows exactly the same patterns. The hypothesis in slide 5 is rejected!
19. The female gender is not the victim here. New pricing helps with the female gender only. Hypothesis: “the female gender has better awareness of a consumer electronics product such as phones in the eCommerce space”. Net Promoter Score <= 0 and only one-time sellers were found here. Low NPS is not from fraud; perhaps over-advertisement is the culprit here… The product return rate should be lower than for the male gender.
20. How does pricing strategy interact with Net Promoter Score (3 & 4) and Recency (500, or 50% of the time likely to visit this eCommerce website) of a male buyer for cyber security? The new pricing strategy kicks in at observation #7, and all sellers participated only once: there is no difference between old and new price (A/B testing)… This explains why buyer Recency is only 50%.
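Slide 20 reports no difference between old and new price in an A/B test, without naming the metric or the test. A minimal sketch, assuming the metric is a purchase conversion rate under each price and using a standard two-sided two-proportion z-test with pooled variance; the function name and the example counts are hypothetical.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test (pooled variance).
    Group A = old price, group B = new price. Returns (z, p_value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all or nothing converted in both groups: no evidence
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Usage: identical conversion under both prices gives z = 0 and p = 1.
print(two_proportion_z(50, 100, 50, 100))  # (0.0, 1.0)
```

A large p-value, as in this usage, is what "no difference between old and new price" would look like under this assumed test.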
21. How does seasonality interact with online fraud? Mock-up data of a holiday sale: only visualization from combined methods shows the behavior of fraud (right): one product at a really high price, one at a low price and two at the expected price. This way no one suspects anything wrong, which explains the change in statistics from left (no fraud) to right at that time (male gender). Program this and find thousands like it in the database, saving millions of $ prospectively!
(Left and right panes: mean normalized + moving average)
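The panes on slide 21 are labeled "mean normalized + moving average". The deck does not give formulas, so the sketch below assumes the common definition of mean normalization, (x - mean) / (max - min), followed by a trailing simple moving average; the function names and the default window of 3 are hypothetical.

```python
def mean_normalize(series):
    """(x - mean) / (max - min), the common 'mean normalization' (assumed here)."""
    mu = sum(series) / len(series)
    spread = max(series) - min(series)
    if spread == 0:
        return [0.0 for _ in series]  # a flat series normalizes to zeros
    return [(x - mu) / spread for x in series]


def moving_average(series, window=3):
    """Trailing simple moving average; the first window - 1 points are dropped."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


# Usage with made-up holiday prices: one very high, one very low, rest expected.
prices = [100, 100, 400, 20, 100, 100]
print(moving_average(mean_normalize(prices)))
```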
22. One known online fraudster’s sale was added: the highest variation explained differs by 20% in the right pane (R-sq: fraud). Finding more patterns like this in the database will make millions of $ for the company. But this data mining pattern came from a question that became obvious only after visualizing the spike pattern in the center of the right pane.
(Left and right panes: large price spike normalized; how the pattern changes: no fraud pattern changes!)
23. RETAIL BANKING RISK ANALYTICS: ANTI MONEY LAUNDERING (AML)
•Retail banking is a superpower of the US economy; needless to say, billions were spent to bail out this sector to stabilize the US economy (2007-10).
•Banks provide loans to retail customers and make money based on loan origination rate, interest rate, etc.
•The risk/retail banking paradigm is shifting; pricing needs to be looked at through the prism of online and social media behavior.
•We need to find profitable customers who have working life left and will go through more life-changing events, hence creating retail demand. These customers must NEVER churn from your business!!
•The customer segments here are: 1. female gender, 2. online ads imp = 0, 3. TV ads imp = 0, 4. online photo sharing = 0, 5. leader in providing mortgages and home equity lines of credit to consumers = 0. The segmentation below shows the following from data mining:
24. Pattern 1: Business question: have the AML customers churned, or are they still with the bank? No de-differencing, and R-square is the same for linear and quadratic equations (Loan Amount is the Y-axis).
26. Pattern 3: The genetic algorithm also has no effect on the coefficients!
27. Pattern 4: The first three patterns above do not change. One expects these customers to have churned. It is nice to confirm the interpretation VISUALLY! When the anti-mutation rate brings shrinkage in the coefficients, it confirms continuation of the same pattern as above and keeps the over-segmentation rate low. Customer churn is not just inevitable; it has already happened! Catching these customers for AML will be difficult.
28. CALL CENTER ANALYTICS: FRAUD AND WASTE
A) THE VALUE PROPOSITION (VPE) OF A CALL CENTER IS “LOAD BALANCING”, OR ROUTING THE MAJORITY OF CALLS TO THE MOST PRODUCTIVE CALL CENTER SALES AGENTS.
B) THAT MEANS THE AGENTS WITH THE HIGHEST SALES CONVERSION RATE (SRC).
C) A SALES AGENT CAN EASILY TAKE A SALES CALL AND ENTER IT IN THE SYSTEM AS A NON-SALE CALL IF THE SALE DID NOT MATERIALIZE, TO KEEP SRC HIGH. THIS IS FRAUDULENT ACTIVITY THAT DEFEATS THE LOAD BALANCING CONCEPT.
D) THEN THERE ARE CALL CENTER AGENTS WHO TAKE A VERY LONG SALES CONVERSION TIME. THIS IS WASTE, BECAUSE CUSTOMERS USUALLY DO NOT WAIT 30 MINUTES ON THE PHONE TO BUY A PRODUCT IN THE NEXT 30 MINUTES.
E) SALES AGENTS WITH SIMILAR WAIT AND CALL TIMES ARE VERY SUSPICIOUS, BECAUSE A CALL SALE TIME > WAIT TIME RESULTS IN A REFERRAL FOR TRAINING. SUCH A SALES AGENT IS DOING THEIR BEST TO AVOID NEGATIVE EFFECTS ON PERFORMANCE. PLUS, MORE TIME MEANS ADDITIONAL PRAISE FOR A PRODUCT THAT IT MAY NOT LIVE UP TO, TRIGGERING RETURNS AND LOSS OF WARRANTY $.
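Conditions B, D and E on slide 28 can be expressed as simple per-agent rules. The sketch below is a hedged illustration, not the deck's equation: the `AgentStats` fields, the 30-minute `long_call_min` cutoff (taken from the example in D) and the 10% `similar_tol` tolerance for "similar" wait and call times are assumptions. Condition C (miscoding a sale as a non-sale) cannot be detected from these aggregates alone.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    # Hypothetical per-agent aggregates; field names are illustrative.
    agent_id: str
    calls: int            # sales calls handled
    sales: int            # calls recorded as sales
    avg_wait_min: float   # average customer wait time, minutes
    avg_call_min: float   # average sales call time, minutes


def src(a: AgentStats) -> float:
    """Sales conversion rate (the deck's SRC): recorded sales / calls."""
    return a.sales / a.calls if a.calls else 0.0


def review_flags(a: AgentStats, long_call_min=30.0, similar_tol=0.1):
    """Flag condition D (very long conversion time) and condition E
    (call time within similar_tol of wait time). Thresholds are assumptions."""
    flags = []
    if a.avg_call_min >= long_call_min:
        flags.append("D")
    if a.avg_wait_min > 0 and abs(a.avg_call_min - a.avg_wait_min) <= similar_tol * a.avg_wait_min:
        flags.append("E")
    return flags


# Usage: a1 keeps call time just above wait time; a2 takes very long calls.
a1 = AgentStats("a1", calls=100, sales=40, avg_wait_min=10.0, avg_call_min=10.5)
a2 = AgentStats("a2", calls=50, sales=10, avg_wait_min=5.0, avg_call_min=35.0)
print(src(a1), review_flags(a1), review_flags(a2))  # 0.4 ['E'] ['D']
```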
29. CALL CENTER ANALYTICS: looking for condition E, because it is fraud and waste: the company may lose warranty $ and could end up paying for return shipping $. No_seasonal R-sq coefficients are lower than seasonal ones (good) because sales occur at Christmas. The bad news is that the seasonal linear and quadratic coefficients are similar!
30. Similar linear and quadratic R-sq means the call center agent is avoiding a training referral, and the company could end up incurring additional $ for these sales later. The mathematical equation developed catches the agent in action; a very low coefficient pattern means reviewing all sales made by this agent in the call center after sending the agent for training.
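Slides 24, 29 and 30 compare R-square for linear and quadratic fits and read "similar" values as a signal. A minimal sketch of that comparison, assuming ordinary least-squares polynomial fits via NumPy's `polyfit`; the function names are hypothetical, and the deck's own equation is not reproduced here.

```python
import numpy as np


def r_squared(x, y, degree):
    """Coefficient of determination of a least-squares polynomial fit.
    y must not be constant (total sum of squares would be zero)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def linear_quadratic_gap(x, y):
    """R-squared gained by adding a quadratic term. A gap near zero is the
    'similar linear and quadratic R-sq' pattern discussed in the deck."""
    return r_squared(x, y, 2) - r_squared(x, y, 1)


# Usage: a straight-line series shows no gap; a curved one shows a clear gap.
x = np.arange(10)
print(linear_quadratic_gap(x, 3 * x + 2), linear_quadratic_gap(x, x ** 2))
```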
31. BUSINESS VISION APPENDIX
FRAUD, WASTE AND ABUSE HAVE CAUGHT UP WITH RETAIL: IN ECOMMERCE, IN BRICK AND MORTAR STORES AND EVERYWHERE.
PROPENSITY OF BUYING IS THE CORNERSTONE OF RETAIL PREDICTIVE ANALYTICS. EVEN EXCRUCIATING ANALYSIS OF THE TOP 2% DECILE STILL SHOWS VARIANCE BETWEEN PREDICTED AND OBSERVED PURCHASE $. MUST REVIEW PRICING STRATEGY!
RETAILERS HATE TO SEE PRODUCTS RETURNED DUE TO THE POOR SHAPE OF A STOLEN PRODUCT OR EXAGGERATED ADVERTISEMENT. A SMALL COMPANY IN THE BAY AREA IS WILLING TO FORK OUT MILLIONS OF $ FOR A PRICEY SAS TOOL.
THE SOCIAL MEDIA REVOLUTION IS SUCH THAT ONE NEGATIVE COMMENT CAN WASH AWAY THOUSANDS OF $ IN ADVERTISEMENT SPEND AND GOODWILL.
AML HAS ITS ORIGIN IN INSURANCE AND REQUIRES COMPETENCY IN BANKING LOAN ORIGINATION DATA, HEALTHCARE DATA AND CAR INSURANCE DATA. THAT’S WHY EVEN THE TOP 5 CONSULTING COMPANIES HAVE A LOWER PRESENCE IN IT; IT IS HARD TO FIND SMES IN ALL THREE AREAS.
32. DOUBLE CHECK CONSULTING: dataVISIONS
PRATIBHA SINHA: MS IN PHYSICS, BIHAR UNIVERSITY; MBA IN INTERNATIONAL MARKETING FROM IGNU IN PATNA.
HER CORPORATE HIGHLIGHT IS BEING A RECOGNIZED ECOMMERCE EXPERT, WITH EXPERIENCE AT DIGITAL RIVER, SYMANTEC (NORTON PRODUCT) AND PACIFIC GAS AND ELECTRIC COMPANY IN THE BAY AREA. AFTER CORPORATE WORK, SHE ENJOYS EXPERIMENTING WITH INDIAN AND CHINESE SPICES.
NAVIN SINHA HAS AN MS IN AGRICULTURAL STATISTICS AND STATISTICAL GENETICS, AND AN MBA IN DECISION SCIENCES. HE IS THE AUTHOR OF 12 PEER-REVIEWED PAPERS AND ONE US PATENT.
HIS CORPORATE HIGHLIGHT IS EXPERIENCE FROM SEVERAL BILLION-$ COMPANIES SUCH AS DSM FOOD SPECIALTY (6TH LARGEST EUROPEAN COMPANY), BEST BUY, WIPRO, UNITEDHEALTH GROUP AND VERISK HEALTH. NAVIN IS A RESPECTED FRAUD AND DATA MINING EXPERT IN INSURANCE AND SIMILAR VERTICALS. NAVIN ENJOYS APPLYING MATHEMATICAL GENETICS CONCEPTS TO BREED NEW VARIETIES OF TOMATO WHEN NOT WORKING ON CORPORATE PROJECTS.
33. CONCLUSIONS
•The dataVISIONS Big Data Visual Analytics Tool was built by mocking up retail and banking data from Navin and Pratibha Sinha’s corporate experiences.
•Invited by the American Statistical Association to speak on Cancer Data Mining (YouTube: 2009). Pratibha Sinha is an eCommerce expert.
•The tool achieves its objectives: unearthing hypotheses and unexpected data mining patterns in various dynamic US corporate systems. Do you know of another tool that does all this???
•Flexible to share growing pains to help build a customized visualization tool for a company; learning and collaboration will only improve dataVISIONS!
•Navin Sinha is an award-winning poet from Utah State University (1998); he brought that level of creativity and imagination to come up with the dataVISIONS big data visual analytics tool. The material presented here is a Very Small Sample of Methods.
•Disclaimer: According to CA laws, this is proprietary technical marketing material of Navin Sinha and Pratibha Sinha (952-905-6636). They are not liable for unauthorized use.
•VPE: “Something for the money, and more for the satisfaction!”