Predictive analytics will boom
your E-Commerce business
Background:
In this paper we compare two alternative item-similarity implementations
from the Apache Mahout stable: spark-itemsimilarity, which runs on Apache
Spark, and its counterpart that runs on Apache Hadoop MapReduce. We compare
them both qualitatively and quantitatively in the context of two ecommerce
stores with different behaviour, to determine which one is more effective
and efficient in a given context.
Subjects:
• The subjects under test are two ecommerce stores.
• Sample Store 1:
– Traffic: Approximately 3 M unique visitors per month
– Transactions: 2500 transactions daily
• Sample Store 2:
– Traffic: Approximately 1 M unique visitors per month
– Transactions: 250 transactions daily
Data Gathering and Setup:
• Relevant clickstream data was collected for both subjects. It captures two
kinds of user behaviour: views and buys. From this data, item-similarity
predictions were computed with both the Apache Spark and the Apache Hadoop
MapReduce implementations, each using the log-likelihood ratio as its
similarity measure. The subjects were observed for one week to gather both
quantitative and qualitative results.
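Both runs score item pairs with the log-likelihood ratio (LLR). As a rough illustration only, the sketch below computes Dunning's LLR for a pair of items from a 2x2 table of user counts, modelled on the entropy formulation used by Mahout's LogLikelihood class. The input shape (a dict mapping each user ID to the set of items they viewed or bought) and the helper name contingency_counts are hypothetical; the real Mahout jobs add further steps, such as limiting how many similar items are kept per item, that are not shown here.

```python
import math
from typing import Dict, Set, Tuple


def x_log_x(x: int) -> float:
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def unnormalized_entropy(*counts: int) -> float:
    """Return N * H for the distribution given by raw counts, where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 table of user counts.

    Rows: user did / did not interact with item A.
    Columns: user did / did not interact with item B.
    k11 = A and B, k12 = A only, k21 = B only, k22 = neither.
    """
    row_entropy = unnormalized_entropy(k11 + k12, k21 + k22)
    column_entropy = unnormalized_entropy(k11 + k21, k12 + k22)
    matrix_entropy = unnormalized_entropy(k11, k12, k21, k22)
    # Clamp tiny negative results caused by floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


# Hypothetical helper: the input shape (user ID -> set of item IDs) is an assumption.
def contingency_counts(
    user_items: Dict[str, Set[str]], a: str, b: str
) -> Tuple[int, int, int, int]:
    """Build the (k11, k12, k21, k22) table for items a and b."""
    k11 = k12 = k21 = k22 = 0
    for items in user_items.values():
        has_a, has_b = a in items, b in items
        if has_a and has_b:
            k11 += 1
        elif has_a:
            k12 += 1
        elif has_b:
            k21 += 1
        else:
            k22 += 1
    return k11, k12, k21, k22


if __name__ == "__main__":
    # Toy, made-up click stream: user -> items viewed or bought.
    user_items = {
        "u1": {"shoes", "socks"},
        "u2": {"shoes", "socks"},
        "u3": {"shoes", "socks", "hat"},
        "u4": {"hat"},
        "u5": {"scarf"},
        "u6": {"hat", "scarf"},
    }
    counts = contingency_counts(user_items, "shoes", "socks")
    print(counts, round(log_likelihood_ratio(*counts), 3))  # prints (3, 0, 0, 3) 8.318
```

A higher score means the co-occurrence of the two items is less likely to be due to chance; an item-similarity job would then keep the highest-scoring items as each item's "similar items".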
Quantitative Analysis:
• We gathered data for both stores and plotted the
following data points hourly over the one-week period.
The hourly granularity explains the peaks and troughs
in the plots: activity drops at night and peaks during the day.
• Total products viewed (blue)
• Products with recommendations available from Apache
Hadoop MapReduce log likelihood (LLR) (red)
• Products with recommendations available from Apache
Spark (Spark) (grey)
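The three series above amount to counting, for each hour, the distinct products viewed and how many of those had recommendations available from each engine. A minimal sketch, assuming (hypothetically) that view events arrive as (timestamp, product ID) pairs and that each engine's output has been loaded as the set of product IDs with at least one recommendation; the function name and input shapes are illustrative, not taken from the original setup.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


# Hypothetical helper: input shapes are assumptions for illustration.
def hourly_coverage(
    views: Iterable[Tuple[datetime, str]],
    has_recs: Dict[str, Set[str]],
) -> Dict[datetime, Dict[str, int]]:
    """Per hour: distinct products viewed, and how many of them each engine covers."""
    viewed_by_hour: Dict[datetime, Set[str]] = defaultdict(set)
    for ts, product in views:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        viewed_by_hour[hour].add(product)

    result: Dict[datetime, Dict[str, int]] = {}
    for hour in sorted(viewed_by_hour):
        viewed = viewed_by_hour[hour]
        row = {"viewed": len(viewed)}
        for engine, covered in has_recs.items():
            # Viewed products for which this engine has recommendations.
            row[engine] = len(viewed & covered)
        result[hour] = row
    return result


if __name__ == "__main__":
    # Toy, made-up events.
    views = [
        (datetime(2024, 1, 1, 9, 15), "p1"),
        (datetime(2024, 1, 1, 9, 40), "p2"),
        (datetime(2024, 1, 1, 10, 5), "p3"),
    ]
    has_recs = {"LLR": {"p1", "p2", "p3"}, "Spark": {"p1"}}
    for hour, row in hourly_coverage(views, has_recs).items():
        print(hour, row)
```

Counting distinct products (rather than view events) matches the wording "total products for which we have recommendations"; counting events instead would weight popular products more heavily.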
Observations:
[Figure panels: Sample store 1 and Sample store 2]
• In the case with fewer transactions and fewer visitors
(Sample store 2), Spark yields far fewer results (i.e.
recommendations) than in the case with a large number of
transactions and more traffic (Sample store 1). In Sample
store 1, the total product views, the total products with
recommendations from LLR and the total products with
recommendations from Spark are almost identical, which shows
that both Spark and LLR provide recommendations for almost
every product that is viewed. In Sample store 2, the total
product views and the total products with recommendations
from LLR are almost identical, but the products with
recommendations from Spark lag significantly behind.
• Inference:
• Hence we conclude that, quantitatively, when there is a large
number of transactions, Spark and LLR are almost equivalent in
the number of recommendations they yield. With fewer
transactions, LLR provides recommendations for noticeably more
of the viewed products than Spark does.
Qualitative Analysis:
• We gathered data for both stores and plotted the
following data points hourly for a one-week period.
• Total products bought (purple)
• Products recommended by Apache Hadoop MapReduce
log likelihood (LLR) that were bought (blue)
• Products recommended by Apache Spark (Spark) that
were bought (grey)
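One way to derive these series is to count purchases per hour and, for each engine, how many of those purchases were of products the engine had recommended. The source does not state the exact matching rule, so this sketch assumes (hypothetically) that a purchase counts for an engine when the bought product appears in that engine's set of recommended products; the function name and input shapes are also illustrative. It additionally returns each engine's overall share of purchases for the week, a single number summarising how close its line sits to the "total bought" line.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


# Hypothetical helper: the matching rule and input shapes are assumptions.
def recommended_purchase_share(
    purchases: Iterable[Tuple[datetime, str]],
    recommended: Dict[str, Set[str]],
) -> Tuple[Dict[datetime, Counter], Dict[str, float]]:
    """Return (per-hour purchase counters, overall share of purchases each engine recommended)."""
    per_hour: Dict[datetime, Counter] = {}
    totals: Counter = Counter()
    for ts, product in purchases:
        # Bucket each purchase into the hour it happened in.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts = per_hour.setdefault(hour, Counter())
        counts["bought"] += 1
        totals["bought"] += 1
        for engine, recs in recommended.items():
            if product in recs:
                counts[engine] += 1
                totals[engine] += 1
    # Avoid dividing by zero when there were no purchases at all.
    share = {
        engine: (totals[engine] / totals["bought"]) if totals["bought"] else 0.0
        for engine in recommended
    }
    return per_hour, share


if __name__ == "__main__":
    # Toy, made-up purchases.
    purchases = [
        (datetime(2024, 1, 1, 9, 5), "p1"),
        (datetime(2024, 1, 1, 9, 30), "p2"),
        (datetime(2024, 1, 1, 11, 0), "p3"),
    ]
    per_hour, share = recommended_purchase_share(
        purchases, {"Spark": {"p1", "p2", "p3"}, "LLR": {"p1"}}
    )
    print(share)
```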
Observations:
[Figure panels: Sample store 1 and Sample store 2]
In Sample store 1, the total number of products bought and the number of products that were
recommended by Spark and then bought are almost equal, which suggests that most purchases were of
products recommended by Spark. Bought products that LLR had recommended, however, lag significantly
behind.
In Sample store 2, the gap between the total products bought and the products that were recommended
(by Spark or LLR) and then bought is wider than in Sample store 1, which suggests that most purchases
were of products recommended by neither Spark nor LLR. Spark still does marginally better than LLR,
but the two are comparable, and both diverge from the products that were actually bought.
Inference:
Hence we conclude that, qualitatively, when there is a large number of transactions Spark is
significantly better than LLR: almost all products that are bought were recommended by Spark,
whereas LLR lags significantly behind.
When there are fewer transactions, Spark is still marginally better than LLR qualitatively, but the
products that are actually bought largely differ from those recommended by either Spark or LLR.
Author:
Avinash Shenoi
Founder & Director at Instaclique
Contact at: avinash@niyuj.com