SlideShare a Scribd company logo
1 of 9
Data Mining Cup 2013
Task 2
The Uni_Budapest_Te_1 solution
Gábor Simon
Data Preparation
Several numeric variables with missing values
Missing values were very informative
Solution:
• Discretize numeric variables into categories
• Missing values will be a separate category
Modeling
Machine learning library: Weka 3.7
Algorithm: Stochastic Gradient Descent
• Offline learning: pre-train on Task 1 data
• Online learning: keep learning during evaluation
5 pre-trained models packaged in our solution
Modeling
Transactions at different points in the session are
very different
To exploit this, use separate models for different
parts of sessions
Modeling
Transactions at different points in the session are
very different
Evaluation
Performance test on Task 1 data
Train on 70 % of sessions, evaluate on 30 %
0
10000
20000
30000
40000
50000
60000
70000
1 2 3 4, 5, 6 7+
Number of instances
Evaluation
Performance test on Task 1 data
50%
55%
60%
65%
70%
75%
80%
1 2 3 4, 5, 6 7+
Benchmark Model
Evaluation
Good predictive power for first few steps
After that, slightly better than benchmark
50%
55%
60%
65%
70%
75%
80%
1 2 3 4, 5, 6 7+
Benchmark Model
Summary
The key points of our solution:
• Recode numerical variables with missing values
into categories
• Use pretrained – but updateable – models
• Use different models for different parts of
sessions

More Related Content

Similar to DMC2013 Task 2

Webinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep LearningWebinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep LearningLucidworks
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Aspiring Minds | Automata
Aspiring Minds | Automata Aspiring Minds | Automata
Aspiring Minds | Automata Aspiring Minds
 
Talk at VL/HCC13
Talk at VL/HCC13Talk at VL/HCC13
Talk at VL/HCC13Rui Pereira
 
20160423EdinburghPresentationV1
20160423EdinburghPresentationV120160423EdinburghPresentationV1
20160423EdinburghPresentationV1Jan Hellings
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Lviv Startup Club
 
03 machine learning workflow v2
03 machine learning workflow v203 machine learning workflow v2
03 machine learning workflow v2Anne Starr
 
Agile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodologyAgile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodologyDavut Çulha
 
Business processeffortestimation
Business processeffortestimationBusiness processeffortestimation
Business processeffortestimationDavut Çulha
 
Agile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodologyAgile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodologyDavut Çulha
 
Sequencing ofusecases
Sequencing ofusecasesSequencing ofusecases
Sequencing ofusecasesDavut Çulha
 
Training thecustomer
Training thecustomerTraining thecustomer
Training thecustomerDavut Çulha
 

Similar to DMC2013 Task 2 (20)

Webinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep LearningWebinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep Learning
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Aspiring Minds | Automata
Aspiring Minds | Automata Aspiring Minds | Automata
Aspiring Minds | Automata
 
Talk at VL/HCC13
Talk at VL/HCC13Talk at VL/HCC13
Talk at VL/HCC13
 
20160423EdinburghPresentationV1
20160423EdinburghPresentationV120160423EdinburghPresentationV1
20160423EdinburghPresentationV1
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
 
03 machine learning workflow v2
03 machine learning workflow v203 machine learning workflow v2
03 machine learning workflow v2
 
Culha methodology
Culha methodologyCulha methodology
Culha methodology
 
Agile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodologyAgile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodology
 
Agile bpm
Agile bpmAgile bpm
Agile bpm
 
Business processeffortestimation
Business processeffortestimationBusiness processeffortestimation
Business processeffortestimation
 
Agile bpsdm
Agile bpsdmAgile bpsdm
Agile bpsdm
 
Keeping history
Keeping historyKeeping history
Keeping history
 
Iteration base
Iteration baseIteration base
Iteration base
 
Agile bpsdm
Agile bpsdmAgile bpsdm
Agile bpsdm
 
Agile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodologyAgile businessprocessdevelopmentmethodology
Agile businessprocessdevelopmentmethodology
 
Sequencing ofusecases
Sequencing ofusecasesSequencing ofusecases
Sequencing ofusecases
 
Training thecustomer
Training thecustomerTraining thecustomer
Training thecustomer
 
Site inspection
Site inspectionSite inspection
Site inspection
 

Recently uploaded

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

DMC2013 Task 2

  • 1. Data Mining Cup 2013 Task 2 The Uni_Budapest_Te_1 solution Gábor Simon
  • 2. Data Preparation Several numeric variables with missing values Missing values were very informative Solution: • Discretize numeric variables into categories • Missing values will be a separate category
  • 3. Modeling Machine learning library: Weka 3.7 Algorithm: Stochastic Gradient Descent • Offline learning: pre-train on Task 1 data • Online learning: keep learning during evaluation 5 pre-trained models packaged in our solution
  • 4. Modeling Transactions at different points in the session are very different To exploit this, use separate models for different parts of sessions
  • 5. Modeling Transactions at different points in the session are very different
  • 6. Evaluation Performance test on Task 1 data Train on 70 % of sessions, evaluate on 30 % 0 10000 20000 30000 40000 50000 60000 70000 1 2 3 4, 5, 6 7+ Number of instances
  • 7. Evaluation Performance test on Task 1 data 50% 55% 60% 65% 70% 75% 80% 1 2 3 4, 5, 6 7+ Benchmark Model
  • 8. Evaluation Good predictive power for first few steps After that, slightly better than benchmark 50% 55% 60% 65% 70% 75% 80% 1 2 3 4, 5, 6 7+ Benchmark Model
  • 9. Summary The key points of our solution: • Recode numerical variables with missing values into categories • Use pretrained – but updateable – models • Use different models for different parts of sessions