CODREGA
ECKOVATION MACHINE LEARNING CHALLENGE 7: AVITO DEMAND PREDICTION
TEAM
DIVYANSHU GOEL
00851202716
TARUSHEE KUMAR
35551202716
RAHUL SRIVASTAVA
02651202716
MANAV GUPTA
01751202716
W E L C O M E
Avito’s challenge is to predict demand for
an online advertisement based on its full
description (title, description, images,
etc.), its context (geographically where it
was posted, similar ads already posted)
and historical demand for similar ads in
similar contexts.
Note: Since the dataset was too large, all
the work was done on Google Cloud.
D A T A S E T
Provided by Avito, Russia’s largest classified
advertisements website.
Size of dataset = 80 GB.
D A T A S E T
• item_id - Ad id.
• user_id - User id.
• region - Ad region.
• city - Ad city.
• parent_category_name - Top-level ad category as classified by Avito's ad model.
• category_name - Fine-grained ad category as classified by Avito's ad model.
• param_1 - Optional parameter from Avito's ad model.
• param_2 - Optional parameter from Avito's ad model.
• param_3 - Optional parameter from Avito's ad model.
• title - Ad title.
• description - Ad description.
• price - Ad price.
• item_seq_number - Ad sequential number for user.
• activation_date - Date ad was placed.
• user_type - User type.
• image - Id code of image.
• image_top_1 - Avito's classification code for the image.
O B J E C T I V E S
1. DATA ANALYSIS: Features are analyzed and visualized for data refining.
2. DATA REFINING: Unimportant features are removed and the remaining ones are converted to integers.
3. MODEL CREATION: Different models were created to test accuracy.
4. ML ALGORITHMS: Algorithms were applied to increase accuracy.
D A T A V I S U A L I S A T I O N
• There are a lot of cheap items.
• Deal Probability reduces as …
• Low prices have higher deal_probabili…
C A T E G O R I S A T I O N
• region = 28
• city = 1022
• parent_category_name = 9
• category_name = 47
• user_type = 3
• image_top_1 = 2774
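The figures above read as the number of distinct values in each categorical column (an interpretation, since the slide does not state it). A minimal pandas sketch of how such counts could be obtained; the small frame below is made-up data, not the Avito set:

```python
import pandas as pd

# Hypothetical miniature stand-in for the training data.
sample = pd.DataFrame({
    "region": ["A", "B", "A", "C"],
    "city": ["x", "y", "x", "z"],
    "user_type": ["Private", "Shop", "Company", "Private"],
})

categorical_columns = ["region", "city", "user_type"]

# nunique() counts the distinct non-null values in each column.
unique_counts = sample[categorical_columns].nunique()
print(unique_counts)
```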
R E F I N I N G
• Null values in price were replaced by the mean price of the corresponding category.
• The image column contains the image ID of the ad and hence was dropped after the images were joined to the final dataset file.
• The images were resized from their different original sizes to 32x32 pixels.
• The images were converted to black and white.
• Approximately 50 GB of images were reduced to 11 GB, with each image stored as an array of length 1024 (32 x 32) in a pickle file.
• Rows which do not have images were given 0 as their pixel information.
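Two of the refining steps above can be sketched as follows, under stated assumptions: the grouping column for the mean is taken to be category_name (the slides only say "categorical means"), and the image is assumed to be already resized to a 32x32 single-channel array. The function names and the sample rows are illustrative, not the team's code.

```python
import numpy as np
import pandas as pd

IMAGE_SIDE = 32
VECTOR_LENGTH = IMAGE_SIDE * IMAGE_SIDE  # 1024 values per image


def fill_price_with_group_mean(df, group_column="category_name"):
    """Replace null prices with the mean price of the row's group."""
    # transform("mean") ignores nulls and returns one value per row.
    group_mean = df.groupby(group_column)["price"].transform("mean")
    filled = df.copy()
    filled["price"] = filled["price"].fillna(group_mean)
    return filled


def image_to_vector(pixels):
    """Flatten a 32x32 single-channel image; use zeros when it is missing."""
    if pixels is None:
        return np.zeros(VECTOR_LENGTH, dtype=np.uint8)
    return np.asarray(pixels, dtype=np.uint8).reshape(VECTOR_LENGTH)


# Made-up sample rows.
ads = pd.DataFrame({
    "category_name": ["cars", "cars", "phones", "phones"],
    "price": [100.0, None, 30.0, 50.0],
})
ads = fill_price_with_group_mean(ads)

no_image = image_to_vector(None)
some_image = image_to_vector(np.full((IMAGE_SIDE, IMAGE_SIDE), 255, dtype=np.uint8))
```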
R E F I N I N G
• description was not analysed due to time constraints and was dropped. Had it been analysed:
  • Stop words would have been removed.
  • Each word in description would have been tokenized.
  • The most common words would have been removed.
  • Dummies would have been created for each word.
• user_type contains 3 unique values (Private, Shop and Company), hence dummies were created.
  • The original user_type column was dropped.
  • The Shop dummy was dropped.
• item_id was unique for every row and hence was dropped.
• Null values in param_1, param_2 and param_3 were given a unique value of their own (missing).
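The user_type and param_* steps above might look like this in pandas. This is a sketch on made-up rows; the use of get_dummies is an assumption about how the dummies were built.

```python
import pandas as pd

# Made-up sample rows.
ads = pd.DataFrame({
    "user_type": ["Private", "Shop", "Company", "Private"],
    "param_1": ["a", None, "b", None],
    "param_2": [None, None, "c", "d"],
    "param_3": [None, "e", None, None],
})

# Null parameters get their own explicit category.
param_columns = ["param_1", "param_2", "param_3"]
ads[param_columns] = ads[param_columns].fillna("missing")

# One indicator column per user type.
dummies = pd.get_dummies(ads["user_type"]).astype(int)

# Drop the original column and the Shop indicator, keeping Private and Company.
ads = pd.concat(
    [ads.drop(columns="user_type"), dummies.drop(columns="Shop")],
    axis=1,
)
```

Dropping one indicator avoids a redundant column: a row with 0 in both Private and Company can only be a Shop.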
R E F I N I N G
We tried to translate the data from Russian to English using the
Google Translate API. The data was not translated because the API
becomes paid after a limited number of translations and because of
time constraints.
P R E – P R O C E S S I N G
• Every distinct string value was assigned a unique integer ID.
• These IDs were stored in a dictionary and later written to a JSON file for future mapping of the data.
• The columns changed were user_type (Private and Company), region, city, category_name, image_top_1 and parent_category_name.
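The string-to-ID mapping described above can be sketched with the standard library alone. The helper name, the first-seen ordering of IDs and the sample values are assumptions made for illustration.

```python
import json


def build_id_mapping(values):
    """Assign consecutive integer IDs to distinct strings in first-seen order."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


regions = ["Moscow", "Kazan", "Moscow", "Omsk"]  # made-up sample values
region_ids = build_id_mapping(regions)
encoded = [region_ids[name] for name in regions]

# Serialising the dictionary keeps the mapping available for later decoding.
mapping_json = json.dumps({"region": region_ids}, ensure_ascii=False)
restored = json.loads(mapping_json)["region"]
```

Keeping one dictionary per column in the JSON file lets the integer columns be mapped back to the original strings later.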
P R E – P R O C E S S I N G
The final data-frame was made with:
1. 15,03,424 rows x 1,040 columns
2. 8.5 GB CSV File
3. 11.2 GB Feather File
The data was too large to handle at once, so it was split into 15 CSV files of approx. 566 MB each,
containing 1,00,000 rows each.
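A split into fixed-size row chunks can be sketched with the chunksize parameter of pandas read_csv. The in-memory CSV, the tiny chunk size and the output file name in the comment are placeholders, not the team's actual files.

```python
import io

import pandas as pd

ROWS_PER_FILE = 3  # the slides use 1,00,000 rows per file; kept tiny here

# Made-up CSV with 8 data rows, held in memory instead of on disk.
csv_text = "item_id,price\n" + "\n".join(f"{i},{i * 10}" for i in range(8))

chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=ROWS_PER_FILE):
    chunks.append(chunk)
    # On real data each chunk would be written out instead, e.g.
    # chunk.to_csv(f"part_{len(chunks)}.csv", index=False)  # hypothetical name

chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

Reading in chunks means only one chunk has to fit in memory at a time; the last chunk holds whatever rows remain.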
L I N E A R – R E G R E S S I O N
MODEL INITIALIZATION, TRAIN, SCORE & ROOT MEAN SQUARE ERROR:
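The code for this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the same four steps using NumPy ordinary least squares on synthetic data. The library, the data and every name below are assumptions; this is not the team's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 rows, 3 features, one continuous target.
features = rng.random((200, 3))
true_weights = np.array([0.2, -0.1, 0.3])
target = features @ true_weights + 0.25 + rng.normal(0.0, 0.01, size=200)

train_x, test_x = features[:150], features[150:]
train_y, test_y = target[:150], target[150:]


def add_intercept(x):
    """Prepend a column of ones so the model learns a bias term."""
    return np.hstack([np.ones((x.shape[0], 1)), x])


# Train: solve the least-squares problem for the coefficients.
coefficients, *_ = np.linalg.lstsq(add_intercept(train_x), train_y, rcond=None)

# Score: R^2 and root mean square error on the held-out rows.
predictions = add_intercept(test_x) @ coefficients
residuals = test_y - predictions
rmse = float(np.sqrt(np.mean(residuals ** 2)))
r_squared = float(
    1.0 - np.sum(residuals ** 2) / np.sum((test_y - test_y.mean()) ** 2)
)
print(rmse, r_squared)
```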
D N N - R E G R E S S O R
• MODEL INITIALIZATION
• TRAINING
• EVALUATION
• SAVE MODEL
T H A N K S F O R W A T C H I N G
CODREGA@GMAIL.COM

Avito Demand Prediction Challenge - Kaggle
