Using Predictive Analytics for Large-Scale E-commerce
Optimizing Machine Learning Runs
Background:
In our last paper we compared two alternative machine-learning techniques from
the Apache Mahout stable: spark-itemsimilarity, which runs on Apache Spark, and
its counterpart that runs on Apache Hadoop's MapReduce. We saw that Apache Spark
was better both qualitatively and quantitatively, even for moderately sized sites.
In this paper, we look at how we can further optimize the efficiency of these runs
without compromising quality. We examine how the two algorithms we studied last
time perform when run on all available data versus only on success data. In the
e-commerce domain, success data is defined as a subset of the total data that we
heuristically believe does not include noise.
Data Gathering and setup:
Relevant clickstream data was collected. This captures user behavior, namely
views and buys. Based on this, predictive analytics for item similarity was run
using Apache Spark and Apache Hadoop MapReduce, with Log Likelihood Ratio (LLR)
as the similarity measure, in both cases (i.e. on all data and on success data
only). The data set we used contains the following information (a short sketch of
the data preparation follows the list):
1. Total data points (ALL DATA) = 110 million records of clickstream data
(views, buys, and add-to-carts)
2. Total data points (SUCCESS DATA) = 22 million records of clickstream data
from users who are categorized as buyers (bought at least one item)
3. 70/30 split between training data and test data (i.e. we split the data set
in #1 in a 70/30 ratio; we used 70% of the data to create recommendations and
the remaining 30% to test)
4. Total buyers (unique people who bought) = 300K
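As a minimal sketch of this data preparation (items 1–3 above), assuming the
clickstream sits in a CSV with hypothetical columns userId, itemId, and action,
the success-data filter and the 70/30 split could look roughly like the following
in Spark (Scala). The paths, column names, and action labels are illustrative
assumptions, not our actual schema.

```scala
import org.apache.spark.sql.SparkSession

object PrepareSuccessData {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("prepare-success-data").getOrCreate()
    import spark.implicits._

    // ALL DATA (#1): every clickstream record (views, buys, add-to-carts)
    val clicks = spark.read
      .option("header", "true")
      .csv("hdfs:///clickstream/all_data.csv")   // illustrative path

    // 70/30 split of the data set in #1: 70% to build recommendations, 30% held out for testing
    val Array(train, test) = clicks.randomSplit(Array(0.7, 0.3), seed = 42L)

    // SUCCESS DATA (#2): all events from users who bought at least one item
    val buyers      = clicks.filter($"action" === "buy").select("userId").distinct()
    val successData = clicks.join(buyers, Seq("userId"))

    // Persist the prepared sets for the similarity runs (illustrative output paths)
    train.write.option("header", "true").csv("hdfs:///clickstream/train_70")
    test.write.option("header", "true").csv("hdfs:///clickstream/test_30")
    successData.write.option("header", "true").csv("hdfs:///clickstream/success_data")

    spark.stop()
  }
}
```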
We believe the above sample is representative of a mid-sized e-commerce company.
We then ran this sample considering all data, and again with only success data
(defined above). We employed the two algorithms (i.e. the MapReduce Log Likelihood
job, referred to as LLR below, and spark-itemsimilarity, referred to as SPARK) to
compare the effect of running on only success data versus all data. The analysis
of our runs is described below.
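Both variants score item co-occurrences with the log-likelihood ratio. For
reference, here is a minimal Scala sketch of that score, following the standard
G-test formulation on which Mahout's Log Likelihood similarity is based; the
helper names are ours, for illustration only.

```scala
object Llr {

  // x * ln(x), with the convention that 0 * ln(0) = 0
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized entropy term used by the G-test
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  /** Log-likelihood ratio (G^2) for the 2x2 co-occurrence table of items A and B:
   *  k11 = users who interacted with both A and B
   *  k12 = users who interacted with A but not B
   *  k21 = users who interacted with B but not A
   *  k22 = users who interacted with neither
   */
  def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy    = entropy(k11 + k12, k21 + k22)
    val columnEntropy = entropy(k11 + k21, k12 + k22)
    val matrixEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb tiny negative values from floating-point rounding
    math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy))
  }
}
```

Items that score a high LLR against a given item are the candidates each run
surfaces as recommendations for that item.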
Analysis:
We gathered the following data (a sketch of how these counts can be computed
appears after the list):
1. Number of incorrect recommendations (i.e. the number of products we
recommended that users did not buy) – False positives
2. Number of correct product recommendations (i.e. the number of products that
users bought that we had recommended) – True positives
3. Total recommendations
4. Users who bought products that we recommended
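Below is a minimal sketch of how these four counts can be derived, assuming we
have per-user recommendation sets and the items each user actually bought in the
30% test split; the types and names are illustrative, not our production code.

```scala
// Aggregate evaluation counts across all users
case class Eval(truePositives: Long, falsePositives: Long,
                totalRecommendations: Long, buyersCovered: Long)

def evaluate(recommendations: Map[String, Set[String]],   // userId -> recommended item IDs
             testBuys: Map[String, Set[String]]): Eval = { // userId -> item IDs bought in the test set
  recommendations.toSeq.foldLeft(Eval(0L, 0L, 0L, 0L)) { case (acc, (user, recs)) =>
    val bought = testBuys.getOrElse(user, Set.empty[String])
    val tp     = recs.count(bought.contains)               // recommended AND bought (true positives)
    val fp     = recs.size - tp                            // recommended but NOT bought (false positives)
    Eval(
      acc.truePositives        + tp,
      acc.falsePositives       + fp,
      acc.totalRecommendations + recs.size,
      acc.buyersCovered        + (if (tp > 0) 1L else 0L)  // user bought at least one recommended item
    )
  }
}
```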
Observations:
1. Total recommendations:
We clearly see that the LLR algorithm on ALL data yields far more recommendations
than any other variant. Using only success data with LLR drastically reduces the
number of recommendations the algorithm yields. In the case of Spark, however,
using only success data does not drastically reduce the number of
recommendations.
2. Number of correct product recommendations (True Positives):
We clearly see that the LLR algorithm on success data yields the most correct
recommendations, followed closely by Spark on success data, and then by LLR and
Spark on ALL data. Using only success data with LLR drastically improves the
quality of the recommendations the algorithm yields. Even though, in the previous
graph, LLR on ALL data yielded the most recommendations, the quality of those
recommendations was poor, as this graph shows. LLR on SUCCESS data yields far
better results, followed closely by SPARK on SUCCESS data.
3. Number of incorrect product recommendations (False Positives):
As expected, given its low true-positive count, the false-positive count for LLR
on ALL data is significantly higher than for the other variants. Thus, although
that variant yields the most recommendations, most of them are useless. We also
notice that the false-positive rate of LLR on SUCCESS data is higher than that of
SPARK, and that SPARK on SUCCESS data has the lowest false-positive rate, which
is what is desired.
4. Accuracy / Precision
As seen from the above graph, when the results are taken holistically and the
ratio of true positives (useful recommendations) to false positives (useless
recommendations) is computed, the SPARK algorithm on SUCCESS data comes out the
clear winner.
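For clarity, the ratio used here relates to standard precision as
precision = TP / (TP + FP) = 1 / (1 + FP/TP), where TP and FP are the counts
defined in the Analysis section. Since precision increases monotonically with the
TP-to-FP ratio, ranking the four variants by either measure gives the same
ordering.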
Inference:
We therefore conclude that using only success data, which is just a fifth of the
total data, yields better-quality results for both LLR and SPARK. The quality
improvement (percentage improvement) for LLR is significant when it is run only
on SUCCESS data, compared with SPARK. Overall, SPARK behaves consistently
irrespective of whether it is run on ALL data or SUCCESS data, with the quality
of SPARK on SUCCESS data being marginally better. Since the success data set is
significantly smaller, and the time taken to run these algorithms is directly
proportional to the size of the data set, running SPARK on SUCCESS data yields
the best results.
- Avinash Shenoi
Founder & Director - Instaclique
avinash@niyuj.com
