Using Predictive analytics for large scale E-commerce
Optimizing Machine learning runs
Background:
In our last paper we compared two alternative machine-learning techniques from
the Apache Mahout stable: Apache Spark's spark-itemsimilarity and its
Apache Hadoop MapReduce counterpart. We saw that Apache Spark was better
both qualitatively and quantitatively, even for moderately sized sites.
In this paper, we look at how to further improve the efficiency of these runs
without compromising on quality. We determine how the two algorithms we
studied last time perform when run on all available data and when run only on
success data. In the e-commerce domain, we define success data as the subset of
the total data that we heuristically believe does not include noise.
Data Gathering and setup:
Relevant clickstream data was collected. It captures user behavior,
namely views and buys. Based on this, predictive analytics for item similarity was
run using Apache Spark and the Apache Hadoop MapReduce Log Likelihood job in
both cases (i.e. all data and only success data). The data set we used contains the
following information:
1. Total data points (ALL DATA) = 110 million records of clickstream data
(views, buys, and add-to-carts)
2. Total data points (SUCCESS DATA) = 22 million records of clickstream
data from users who are categorized as buyers (bought at least 1 item)
3. 70 / 30 split between training data and test data (i.e. we split the data set
in #1 in the 70 / 30 ratio, used 70% of the data to create
recommendations, and used the remaining 30% to test)
4. Total buyers (unique people who bought) = 300 K
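The success-data definition and the 70 / 30 split in the list above can be sketched as follows. This is a minimal illustration, not the pipeline we ran: the Event record, its field names, the action labels, and the random (rather than, say, time-based) split are all assumptions made for the example.

```python
import random
from collections import namedtuple

# Hypothetical clickstream record; field names and action labels are assumptions.
Event = namedtuple("Event", ["user_id", "item_id", "action"])

def success_subset(events):
    """Keep only events from users who bought at least one item."""
    buyers = {e.user_id for e in events if e.action == "buy"}
    return [e for e in events if e.user_id in buyers]

def train_test_split(events, train_fraction=0.7, seed=42):
    """Shuffle a copy of the events and split it into training and test sets."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy data: u1 and u3 are buyers, u2 only views.
events = [
    Event("u1", "i1", "view"),
    Event("u1", "i1", "buy"),
    Event("u1", "i2", "view"),
    Event("u2", "i2", "view"),
    Event("u2", "i3", "view"),
    Event("u2", "i1", "view"),
    Event("u3", "i1", "add_to_cart"),
    Event("u3", "i3", "buy"),
    Event("u3", "i2", "view"),
    Event("u3", "i4", "view"),
]

success_events = success_subset(events)
train, test = train_test_split(events)
```

Here success_events keeps the seven events from u1 and u3, and the split yields seven training events and three test events.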
We believe the above sample is representative of a mid-sized e-commerce
company. We ran this sample first on all data and then again on only
success data (defined above). We employed two algorithms (i.e. LLR and Spark)
to compare the effect of running on success data only, as against all data.
The analysis of our runs is described below.
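For readers unfamiliar with LLR, the sketch below computes the standard log-likelihood ratio statistic for a 2x2 co-occurrence table, which is the usual meaning of LLR in item-similarity work. It is an illustration under that assumption, not a reproduction of the jobs we ran; the function names and the meaning assigned to the four counts are chosen for the example.

```python
import math

def x_log_x(x):
    """Return x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users who interacted with both item A and item B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))

independent = llr(10, 10, 10, 10)    # no association between A and B
associated = llr(100, 5, 5, 1000)    # A and B co-occur far more than chance
```

A score near zero means the two items co-occur about as often as chance would predict; a large score marks a pair worth recommending together.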
Analysis:
We gathered the following data:
1. Number of incorrect recommendations (i.e. Number of products we
recommended that users did not buy) – False positives
2. Number of correct product recommendations (i.e. Number of products
that users bought that we recommended) – True positives
3. Total recommendations
4. Users who bought products that we recommended.
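The four quantities above can be computed from per-user recommendation sets and per-user purchases in the test data. The sketch below assumes both are available as dictionaries mapping a user id to a set of item ids; that layout and the function name are assumptions for illustration.

```python
def evaluate(recommended, purchased):
    """Count true/false positives over all users.

    recommended: dict of user_id -> set of recommended item ids
    purchased:   dict of user_id -> set of item ids bought in the test data
    """
    true_positives = 0
    false_positives = 0
    users_with_hit = 0
    for user, recs in recommended.items():
        bought = purchased.get(user, set())
        hits = recs & bought                      # recommended and bought
        true_positives += len(hits)
        false_positives += len(recs - bought)     # recommended, not bought
        if hits:
            users_with_hit += 1
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "total_recommendations": true_positives + false_positives,
        "users_with_hit": users_with_hit,
    }

# Toy example: u3 received a recommendation but bought nothing.
recommended = {"u1": {"a", "b", "c"}, "u2": {"a", "d"}, "u3": {"e"}}
purchased = {"u1": {"a", "x"}, "u2": {"a", "d"}}
metrics = evaluate(recommended, purchased)
```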
Observations:
1. Total recommendations:
The LLR algorithm on ALL data yields far more recommendations
than any other variant. Restricting LLR to success data drastically
reduces the number of recommendations it yields. In the case of Spark,
however, using only success data does not drastically reduce
the number of recommendations.
2. Number of correct product recommendations (True Positives):
The LLR algorithm on success data yields the most correct
recommendations, followed closely by Spark on success data, and then by LLR
and Spark on ALL data. Using only success data with LLR drastically
improves the quality of the recommendations it yields. Even though
the LLR algorithm on ALL data yielded the most recommendations in the
previous graph, the quality of those recommendations was poor, as
this graph shows.
3. Number of incorrect product recommendations (False Positives):
As expected given its low number of true positives, LLR on ALL data
produces significantly more false positives than the other variants. Thus,
although this variant yields the most recommendations, most of them
are useless. We also notice that LLR on SUCCESS data produces more false
positives than Spark does, and that Spark on SUCCESS data produces the fewest
false positives, which is the desired outcome.
4. Accuracy / Precision
As seen from the above graph, when the results are taken holistically as the
ratio of true positives (useful recommendations) to false positives (useless
recommendations), the Spark algorithm on SUCCESS data comes out a
clear winner.
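The ratio used here, TP / FP, is closely related to precision, TP / (TP + FP): if p is the precision, the ratio equals p / (1 - p), so both measures rank variants in the same order. The sketch below checks this on placeholder counts. The variant names and numbers are invented for illustration and are not our measured results.

```python
def precision(tp, fp):
    """Share of recommendations that were bought: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else 0.0

def tp_fp_ratio(tp, fp):
    """Ratio of useful to useless recommendations: TP / FP."""
    return tp / fp if fp else float("inf")

# Placeholder (true positives, false positives) counts, NOT measured results.
counts = {
    "variant_a": (30, 70),
    "variant_b": (45, 55),
    "variant_c": (20, 180),
}

by_precision = sorted(counts, key=lambda v: precision(*counts[v]), reverse=True)
by_ratio = sorted(counts, key=lambda v: tp_fp_ratio(*counts[v]), reverse=True)
```

Both orderings put variant_b first, then variant_a, then variant_c.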
Inference:
Hence we conclude that using only success data, which is only a fifth of the total
data, yields better-quality results with both LLR and Spark. The quality
improvement (percentage improvement) from running only on SUCCESS data is
significant for LLR, much more so than for Spark. Overall, Spark behaves
consistently irrespective of whether it is run on ALL data or SUCCESS data, with
the quality of Spark on SUCCESS data being marginally better. Since the success
data set is significantly smaller, and the time taken to run these algorithms is
directly proportional to the size of the data set, running Spark on SUCCESS
data yields the best results.
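As a rough worked example of the saving, the sketch below applies the linear-scaling assumption stated above to the two data-set sizes from our setup. The baseline of 100 time units is a hypothetical figure, not a measured run time.

```python
ALL_RECORDS = 110_000_000      # ALL DATA, from the setup section
SUCCESS_RECORDS = 22_000_000   # SUCCESS DATA, from the setup section

def estimated_runtime(baseline_runtime, baseline_records, records):
    """Scale a run time linearly with the number of input records."""
    return baseline_runtime * records / baseline_records

fraction = SUCCESS_RECORDS / ALL_RECORDS

# Hypothetical baseline: a run on ALL data that takes 100 time units.
success_runtime = estimated_runtime(100.0, ALL_RECORDS, SUCCESS_RECORDS)
```

Under this assumption, a run on success data takes about a fifth of the time of a run on all data.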
- Avinash Shenoi
Founder & Director - Instaclique
avinash@niyuj.com
