User Profiling Engine
Predicting Potential Buyers on E-Commerce Websites & Analysing Buyer Behaviour
Objective
Predict whether a user session will end in a purchase.
If a purchase is predicted, also predict which items will be bought in that session.
Data Set
The YOOCHOOSE dataset contains a collection of sessions from a retailer. Each session
records the click events that the user performed in it.
Some sessions also contain buy events, meaning that the session ended with the user
buying something from the web shop.
The data was collected over several months in 2014 and reflects the clicks and
purchases made by users of an online retailer in Europe.
Challenge Link: http://2015.recsyschallenge.com/challenge.html
Data Set Schema
Clicks Data Set
Session ID – the ID of the session. A session contains one or more clicks.
Timestamp – the time when the click occurred.
Item ID – the unique identifier of the clicked item.
Category – the context of the click. The value "S" indicates a special offer, "0" indicates a
missing value, a number from 1 to 12 indicates a real category identifier, and any other
number indicates a brand.
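The Category rules above can be sketched as a small lookup function. This is an illustrative sketch only: the function name and the returned labels are assumptions made for this example, and the "unknown" fallback covers values the schema description does not mention.

```python
def category_kind(raw: str) -> str:
    """Map a raw Category value to one of the contexts described in the schema."""
    value = raw.strip()
    if value == "S":
        return "special_offer"
    if value == "0":
        return "missing"
    if value.isdigit() and 1 <= int(value) <= 12:
        return "category"
    if value.isdigit():
        # Any other number is described as a brand identifier.
        return "brand"
    # Hypothetical fallback for values the schema does not describe.
    return "unknown"
```

Collapsing the raw field into these few kinds keeps the feature small, while the original value can still be kept alongside it when the exact category or brand matters.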
Data Set Schema
Buys Data Set
Session ID – the ID of the session. A session contains one or more buy events.
Timestamp – the time when the buy occurred, in the format YYYY-MM-DDThh:mm:ss.SSSZ.
Item ID – the unique identifier of the item that was bought.
Price – the price of the item, which may be represented as an integer.
Quantity – the number of units bought in this buy event.
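A minimal sketch of reading one buy record into typed fields, assuming the five fields arrive comma-separated in the order listed above and that the IDs are integers. The `BuyEvent` type, the `parse_buy_row` name, the delimiter, and the sample row in the usage note are assumptions for illustration, not part of the dataset description.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BuyEvent(NamedTuple):
    session_id: int
    timestamp: datetime
    item_id: int
    price: int
    quantity: int


def parse_buy_row(line: str) -> BuyEvent:
    """Parse 'session,timestamp,item,price,quantity' (assumed comma-separated)."""
    session_id, ts, item_id, price, quantity = line.strip().split(",")
    # The trailing 'Z' in YYYY-MM-DDThh:mm:ss.SSSZ denotes UTC.
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return BuyEvent(int(session_id), when, int(item_id), int(price), int(quantity))
```

For example, `parse_buy_row("1,2014-04-06T18:44:58.314Z,1001,250,2")` (a made-up row) yields an event whose `timestamp.hour` is 18 and whose `quantity` is 2.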
Data Set Schema
Test Data Set
Session ID – the ID of the session. A session contains one or more clicks.
Timestamp – the time when the click occurred.
Item ID – the unique identifier of the clicked item.
Category – the context of the click. The value "S" indicates a special offer, "0" indicates a
missing value, a number from 1 to 12 indicates a real category identifier, and any other
number indicates a brand.
Data Set Analysis
Approach
Major Features Extraction
Buys Per Click
Items with a high ratio of buys to clicks in the training set are more likely
to be bought in the test sessions as well.
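One way to compute this ratio is sketched below. It assumes each click and each buy event counts once (quantity is ignored) and that items never clicked in training get no ratio at all; the function name and inputs are hypothetical.

```python
from collections import Counter


def buys_per_click(click_items, buy_items):
    """Return {item_id: buys / clicks} for every item clicked in training.

    click_items: iterable of item IDs, one entry per click event.
    buy_items:   iterable of item IDs, one entry per buy event.
    """
    clicks = Counter(click_items)
    buys = Counter(buy_items)
    # Counter returns 0 for items that were clicked but never bought.
    return {item: buys[item] / n_clicks for item, n_clicks in clicks.items()}
```

With clicks on items `[1, 1, 1, 1, 2, 2, 3]` and buys of `[1, 1, 2]`, items 1 and 2 both get a ratio of 0.5 and item 3 gets 0.0.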
Item Popularity
Globally popular items in the training data are more likely to be bought
in the test data as well. Only items purchased more than 10 times are
taken into consideration.
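The popularity filter can be sketched as a purchase count with a threshold. The threshold of 10 comes from the slide; the function name, the `min_buys` parameter, and counting one purchase per buy event are assumptions.

```python
from collections import Counter


def popular_items(buy_items, min_buys=10):
    """Return {item_id: purchase_count} for items bought more than min_buys times."""
    counts = Counter(buy_items)
    return {item: n for item, n in counts.items() if n > min_buys}
```

The comparison is strict, so an item bought exactly 10 times is left out, matching "more than 10 times".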
Hour Of Item Click
In the training data, hours 8-17 of the day show a higher
buys/click rate than the other hours. Hours are encoded
as integers from 0 to 23.
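A rough sketch of how the hour feature and the per-hour buys/click rate could be derived. It assumes click timestamps use the same YYYY-MM-DDThh:mm:ss.SSSZ format given for buys, and that each click has already been labelled with whether it led to a buy; that labelling step and all names here are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed to apply to clicks as well as buys


def hour_of_click(timestamp: str) -> int:
    """Return the hour of day (0 to 23) for a timestamp string."""
    return datetime.strptime(timestamp, TS_FORMAT).hour


def buy_rate_by_hour(labelled_clicks):
    """labelled_clicks: iterable of (timestamp, bought) pairs, bought being a bool.

    Returns {hour: buys / clicks} for every hour that has at least one click.
    """
    clicks = defaultdict(int)
    buys = defaultdict(int)
    for timestamp, bought in labelled_clicks:
        hour = hour_of_click(timestamp)
        clicks[hour] += 1
        buys[hour] += int(bought)
    return {hour: buys[hour] / clicks[hour] for hour in clicks}
```

Plotting the returned rates against the hour is one way to check an observation such as the 8-17 window on a given training split.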
Item Clicks
An item that receives
more clicks within
a session is more likely
to be bought by the user.
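Counting clicks per item within each session could look like the sketch below. The input shape, a sequence of (session ID, item ID) pairs, and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict


def item_click_counts(clicks):
    """clicks: iterable of (session_id, item_id) pairs, one per click event.

    Returns {session_id: Counter({item_id: number_of_clicks})}.
    """
    per_session = defaultdict(Counter)
    for session_id, item_id in clicks:
        per_session[session_id][item_id] += 1
    return per_session
```

For a session's counter, `most_common(1)` gives the most-clicked item, which under the hypothesis above is the strongest purchase candidate in that session.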
Classification
Random Forest Classifier
A group of decision trees, each built on randomly sampled
input records.
Can build complex decision regions.
Combining the trees' outputs helps reduce overfitting.
Most importantly, it can handle
imbalanced datasets.
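Two of the ideas listed above, sampling records at random for each tree and combining the trees' outputs, can be illustrated on their own. This is not a full random forest (no trees are trained and no feature subsampling is done); it only sketches bootstrap sampling and majority voting, and both function names are made up for the example.

```python
import random
from collections import Counter


def bootstrap_sample(records, rng):
    """Draw len(records) records with replacement, as done once per tree."""
    return [rng.choice(records) for _ in records]


def majority_vote(predictions):
    """Return the label predicted by most trees.

    Ties go to the label that appears first in the predictions.
    """
    return Counter(predictions).most_common(1)[0][0]
```

Because each tree sees a different resample, the individual trees make different errors, and the vote averages some of those errors out.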
Evaluation Measure
The evaluation measures the ability to predict both aspects: whether a session ends with a
buy event, and which items were bought. Define the following:
S_l – sessions in the submitted solution file
S – all sessions in the test set
s – a session in the test set
S_b – sessions in the test set which end with a buy
A_s – predicted bought items in session s
B_s – actual bought items in session s
Then the score of a solution, as defined by the challenge, is:
$$\mathrm{Score}(S_l) = \sum_{s \in S_l} \begin{cases} \dfrac{|S_b|}{|S|} + \dfrac{|A_s \cap B_s|}{|A_s \cup B_s|} & \text{if } s \in S_b \\ -\dfrac{|S_b|}{|S|} & \text{otherwise} \end{cases}$$
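A sketch of the scoring rule in code, assuming the rule published for the RecSys Challenge 2015: each submitted session that really ends with a buy earns the share of buy sessions in the test set plus the Jaccard overlap between predicted and actual items, and each submitted session without a buy loses that same share. The function name and the dictionary inputs are hypothetical.

```python
def solution_score(predicted, actual, n_test_sessions):
    """Score a submission under the assumed challenge rule.

    predicted:       {session_id: set of predicted items} for submitted sessions.
    actual:          {session_id: set of bought items} for sessions ending in a buy.
    n_test_sessions: total number of sessions in the test set.
    """
    buy_ratio = len(actual) / n_test_sessions
    score = 0.0
    for session_id, predicted_items in predicted.items():
        bought = actual.get(session_id)
        if bought is None:
            # Predicted a buy for a session that had none: penalty.
            score -= buy_ratio
        else:
            union = predicted_items | bought
            jaccard = len(predicted_items & bought) / len(union) if union else 0.0
            score += buy_ratio + jaccard
    return score
```

With 10 test sessions of which 2 end in a buy, predicting `{"a", "b"}` for a session that bought `{"a"}` earns 0.2 + 0.5, while a wrongly predicted session costs 0.2, for a total of 0.5.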
Results
Best score obtained: 45821
The following features were used to train the model:
Item clicks
Buys/click
Popular items
Hour of Day
Month Of Year
Conclusion
The choice of features strongly influences the prediction score. Not all features help in
determining buyer behaviour, and some may even be detrimental.
Features that take too many distinct values may cause overfitting; it is better to bin their
values into fewer groups.
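Binning a many-valued feature could be done as sketched below. The bin edges used in the usage note (8 and 18, giving "before 8", "8 to 17", and "18 onwards" for the hour feature) are an illustrative choice based on the 8-17 observation, not the binning actually used in the project.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return the index of the bin that value falls into.

    edges must be sorted in ascending order. Bin 0 holds values below edges[0],
    bin i holds edges[i-1] <= value < edges[i], and the last bin holds values
    at or above edges[-1].
    """
    return bisect_right(edges, value)
```

For example, with `edges = [8, 18]`, hours 0 to 7 map to bin 0, hours 8 to 17 to bin 1, and hours 18 to 23 to bin 2, turning 24 values into 3.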
Thanks
Submitted By -
Utkarsh Agarwal (201301184)
Viplav Sanghvi (201505573)
Lalit Kundu (201201062)
Mentor -
Anurag Tyagi

Ire Major Project - User profiling engine
