In our last paper we compared two alternative machine-learning techniques from
the Apache Mahout stable: spark-itemsimilarity, which runs on Apache Spark, and
its counterpart built on Apache Hadoop MapReduce. We saw that Apache Spark was
better both qualitatively and quantitatively, even for moderately sized sites.
In this paper, we look at how we can further optimize the efficiency of these runs
without compromising quality. We determine how the two algorithms we studied
last time perform when run on all available data versus only success data. In the
e-commerce domain, success data is defined as a subset of the total data that we
heuristically believe does not include noise.
Using Predictive Analytics for Large-scale E-commerce
Optimizing Machine-learning Runs
Data Gathering and Setup:
Relevant clickstream data was collected. This constitutes user behavior, namely
views and buys. Based on this, predictive analytics for item similarity was run
using Apache Spark and Apache Hadoop MapReduce, with Log Likelihood in both
cases (i.e., all data and success data only). The data set we used contains the
following information:
1. Total data points (ALL DATA) = 110 million records of clickstream data
(views, buys, and add-to-carts)
2. Total data points (SUCCESS DATA) = 22 million records of clickstream data
from users who are categorized as buyers (bought at least 1 item)
3. 70/30 split between training data and test data (i.e., we split the data set
in #1 in a 70/30 ratio, used 70% of the data to create recommendations, and
used the remaining 30% to test)
4. Total buyers (unique people who bought) = 300 K
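The success-data filter and the 70/30 split described above can be sketched as
follows; the record format and all names are illustrative, not our actual pipeline:

```python
import random

# Hypothetical minimal record format: (user_id, item_id, action),
# where action is "view", "buy", or "add_cart".
events = [
    ("u1", "i1", "view"), ("u1", "i2", "buy"),
    ("u2", "i1", "view"), ("u2", "i3", "view"),
    ("u3", "i2", "view"), ("u3", "i2", "buy"),
]

# SUCCESS DATA: keep every event from users who bought at least one item.
buyers = {user for user, _, action in events if action == "buy"}
success_data = [e for e in events if e[0] in buyers]

# 70/30 split into training and test sets, as in item #3 above.
random.seed(42)                      # reproducible shuffle
shuffled = events[:]
random.shuffle(shuffled)
cut = int(len(shuffled) * 0.7)
train, test = shuffled[:cut], shuffled[cut:]
```

In this toy sample, u2 never buys, so all of u2's events are dropped from the
success data, mirroring the roughly 5x reduction we saw from 110 million to 22
million records.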
We believe the above sample is representative of a mid-sized e-commerce
company. We then ran this sample considering all data, and then again with only
success data (defined above). We employed both algorithms (i.e., LLR on Hadoop
MapReduce and spark-itemsimilarity) to compare the effect of running with only
success data as against all data. The analysis of our runs is described below.
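For reference, the Log Likelihood similarity used in both runs is Dunning's
log-likelihood ratio over a 2x2 co-occurrence table. A minimal sketch of that
statistic (not Mahout's actual code) looks like this:

```python
import math

def xlogx(x: float) -> float:
    # x * ln(x), with the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts: float) -> float:
    # Unnormalized Shannon entropy over raw counts
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)

def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio for a 2x2 co-occurrence table:
    k11 = users who interacted with both item A and item B
    k12 = users with A but not B
    k21 = users with B but not A
    k22 = users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# Perfectly correlated items score high; independent items score 0.
print(llr(10, 0, 0, 10))   # strong association between A and B
print(llr(5, 5, 5, 5))     # no association
```

The score rewards items whose co-occurrence is surprising given each item's
overall popularity, which is why filtering out noisy non-buyer events changes
the rankings it produces.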
Analysis:
We gathered the following data:
1. Number of incorrect recommendations (i.e., the number of products we
recommended that users did not buy) – false positives
2. Number of correct recommendations (i.e., the number of products that users
bought that we recommended) – true positives
3. Total recommendations
4. Users who bought products that we recommended
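These counts can be gathered, in sketch form, by comparing each user's
recommendation list against the purchases observed in the 30% test split; all
data and names below are hypothetical:

```python
# Hypothetical per-user data: recommendations produced from the training
# split, and actual purchases observed in the 30% test split.
recommended = {"u1": {"i1", "i2", "i3"}, "u2": {"i4"}, "u3": {"i5", "i6"}}
bought      = {"u1": {"i2"},             "u2": {"i7"}, "u3": {"i5", "i6"}}

true_pos = false_pos = total_recs = 0
buyers_we_hit = 0   # users who bought at least one recommended product
for user, recs in recommended.items():
    hits = recs & bought.get(user, set())
    true_pos += len(hits)                            # metric #2
    false_pos += len(recs - bought.get(user, set())) # metric #1
    total_recs += len(recs)                          # metric #3
    if hits:
        buyers_we_hit += 1                           # metric #4
```

Each loop iteration scores one user; the set intersection gives that user's
correct recommendations and the set difference gives the incorrect ones.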
Observations:
1. Total recommendations:
We clearly see that the LLR algorithm on ALL data yields far more
recommendations than any other variant. Using only success data with LLR
drastically reduces the number of recommendations the algorithm yields. In the
case of Spark, however, using only success data does not drastically reduce the
number of recommendations.
2. Number of correct product recommendations (True Positives):
We clearly see that the LLR algorithm on SUCCESS data yields the most correct
recommendations, followed closely by Spark on SUCCESS data, and then by LLR and
Spark on ALL data. Using only success data with LLR drastically improves the
quality of the recommendations the algorithm yields. Even though, in the
previous graph, the LLR algorithm on ALL data yielded the most recommendations,
the quality of those recommendations was poor, as this graph shows. The LLR
algorithm on SUCCESS data yields far better results, followed closely by the
Spark algorithm on SUCCESS data.
3. Number of incorrect product recommendations (False Positives):
As expected, as a consequence of its low true-positive count, the false-positive
count of LLR run on ALL data is significantly higher than that of the other
variants. Thus we can see that although this variant yields the most
recommendations, most of them are useless. We also notice that the
false-positive rate of LLR on SUCCESS data is higher than that of Spark, and
that Spark on SUCCESS data has the lowest false-positive rate, which is what is
desired.
4. Accuracy / Precision
As seen from the above graph, when taken holistically and the ratio of true
positives (useful recommendations) to false positives (useless recommendations)
is computed, the Spark algorithm on SUCCESS data comes out the clear winner.
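The ratio used here is a precision-style metric; a minimal sketch with purely
illustrative counts (not our measured numbers):

```python
def precision(true_pos: int, false_pos: int) -> float:
    # Fraction of issued recommendations that users actually bought
    return true_pos / (true_pos + false_pos)

# Hypothetical counts for two variants; the numbers are illustrative only,
# chosen to mirror the pattern in the graphs above.
llr_all       = precision(true_pos=120, false_pos=880)  # many recs, low hit rate
spark_success = precision(true_pos=110, false_pos=190)  # fewer recs, high hit rate
```

A variant that issues fewer recommendations can still win on this metric if a
larger share of them turn into purchases.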
Inference:
Hence we conclude that using only success data, which is only a fifth of the
total data, yields better-quality results for both LLR and Spark. The quality
improvement (percentage improvement) from running only on SUCCESS data is
significant for LLR as compared to Spark. Overall, we see that Spark behaves
consistently regardless of whether it is run on ALL data or SUCCESS data, with
the quality of Spark on SUCCESS data being marginally better. Since the data
set is significantly smaller, and the time taken to run these algorithms is
directly proportional to the size of the data set, running Spark on SUCCESS
data yields the best results.
- Avinash Shenoi
Founder & Director - Instaclique
avinash@niyuj.com