Kaggle <- BikeShare
Taposh Dutta Roy
Jan 26th 2015
Presented at YHAT, Oakland, CA
Contents
• About Bikeshare
• Data
• Tools
• R
• Factor Engineering
• Matrix
• Random Forest
• Neural Nets
About Bike Share
Competition: http://www.kaggle.com/c/bike-sharing-demand
Challenge:
Forecast use of a city’s bike share system
Data:
UCI Machine Learning Repository
Publication:
Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.
Data
The goal is to predict count, either as the sum of separate
predictions for casual and registered, or directly.
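A minimal sketch of the two options, using made-up rows and a deliberately trivial "model" (the mean per hour of day). The rows and the helper names `fit_hourly_mean`, `predict_by_sum`, and `predict_direct` are assumptions for illustration, not the approach used in the talk.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (hour, casual, registered); count = casual + registered.
rows = [
    (8, 5, 60), (8, 7, 80), (8, 3, 70),
    (14, 30, 40), (14, 20, 50),
]

def fit_hourly_mean(rows, target):
    """Fit a trivial stand-in model: the mean of target(row) for each hour."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(target(row))
    return {hour: mean(values) for hour, values in groups.items()}

# Option 1: model casual and registered separately, then add the predictions.
casual_model = fit_hourly_mean(rows, lambda r: r[1])
registered_model = fit_hourly_mean(rows, lambda r: r[2])

def predict_by_sum(hour):
    return casual_model[hour] + registered_model[hour]

# Option 2: model the total count directly.
count_model = fit_hourly_mean(rows, lambda r: r[1] + r[2])

def predict_direct(hour):
    return count_model[hour]

for hour in (8, 14):
    print(hour, predict_by_sum(hour), predict_direct(hour))
```

With a mean-based model the two options coincide, because the mean of a sum is the sum of the means. With nonlinear models the two generally give different predictions, which is why the choice matters.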
Data Fields
datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor holiday
weather -
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - total number of rentals
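A sketch of reading these fields into typed values. The two sample rows are made up, and the timestamp format `%Y-%m-%d %H:%M:%S` is an assumption about how the file is laid out; `parse_row` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime

SEASONS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}

# Made-up sample rows using the field names listed above.
SAMPLE_CSV = (
    "datetime,season,holiday,workingday,weather,temp,atemp,"
    "humidity,windspeed,casual,registered,count\n"
    "2011-01-01 08:00:00,1,0,0,1,9.84,14.395,75,0.0,1,7,8\n"
    "2011-06-15 17:00:00,2,0,1,2,32.8,37.12,49,11.0014,120,150,270\n"
)

INT_FIELDS = ("season", "holiday", "workingday", "weather",
              "casual", "registered", "count")
FLOAT_FIELDS = ("temp", "atemp", "humidity", "windspeed")

def parse_row(raw):
    """Convert one CSV row (all strings) into typed Python values."""
    row = {"datetime": datetime.strptime(raw["datetime"], "%Y-%m-%d %H:%M:%S")}
    for name in INT_FIELDS:
        row[name] = int(raw[name])
    for name in FLOAT_FIELDS:
        row[name] = float(raw[name])
    row["season_name"] = SEASONS[row["season"]]
    return row

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]

# Sanity check implied by the field descriptions: count = casual + registered.
for row in rows:
    assert row["count"] == row["casual"] + row["registered"]
```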
Data
datetime - hourly date + timestamp
Predefined Factors:
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor holiday
weather -
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
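The season and weather codes are labels rather than quantities (weather 4 is not "twice" weather 2), so one common way to handle them is as categorical factors. A sketch of one-hot encoding them follows; `one_hot` and `encode_factors` are hypothetical helpers, and this is not necessarily how the talk treated these columns.

```python
SEASON_LEVELS = (1, 2, 3, 4)
WEATHER_LEVELS = (1, 2, 3, 4)

def one_hot(value, levels, prefix):
    """Return {prefix_level: 0 or 1} indicator columns for a categorical code."""
    if value not in levels:
        raise ValueError(f"unknown {prefix} code: {value!r}")
    return {f"{prefix}_{level}": int(value == level) for level in levels}

def encode_factors(row):
    """Expand season and weather codes; pass the binary flags through unchanged."""
    encoded = {"holiday": row["holiday"], "workingday": row["workingday"]}
    encoded.update(one_hot(row["season"], SEASON_LEVELS, "season"))
    encoded.update(one_hot(row["weather"], WEATHER_LEVELS, "weather"))
    return encoded

# Made-up example row: fall, not a holiday, a working day, misty weather.
example = encode_factors({"season": 3, "holiday": 0, "workingday": 1, "weather": 2})
print(example)
```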
Data - Continuous
Data
Workday busy hours
Tools
Weka
R
Python
H2O + R
Vowpal Wabbit
Using R
Feature Engineering
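The slide text carries only the title, so the following is a sketch of one common kind of feature engineering for this data, not the talk's actual features: deriving calendar parts from the datetime field. The timestamp format and the helper name `datetime_features` are assumptions.

```python
from datetime import datetime

def datetime_features(stamp):
    """Derive simple calendar features from an hourly timestamp string."""
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return {
        "year": ts.year,
        "month": ts.month,
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),
    }

features = datetime_features("2011-01-01 08:00:00")
print(features)
```

The hour feature in particular lets a model pick up patterns such as busy hours on working days.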
Citations
Feature-Weighted Linear Stacking
Joseph Sill, Gabor Takacs, Lester Mackey, and David Lin
Combining Predictions for Accurate Recommender Systems
Michael Jahrer, Andreas Töscher, Robert Legenstein

Kaggle bikeshare Competition - Part 1

Editor's Notes

  1. Would really love to build a framework that recognizes patterns based on the type of data and creates complex factors out of the box. Data characteristics to consider: size; sparse vs. dense; tall and skinny (more records) vs. wide (more columns).
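A rough sketch of how such data characteristics could be profiled for a numeric table stored as a list of rows. The function name `profile`, the 0.5 density cutoff, and the `tall_ratio` threshold are arbitrary assumptions chosen for illustration, not part of any existing framework.

```python
def profile(matrix, tall_ratio=10.0):
    """Summarize size, sparsity, and shape of a list-of-rows numeric table."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    cells = n_rows * n_cols
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    density = nonzero / cells if cells else 0.0
    if n_cols and n_rows / n_cols >= tall_ratio:
        shape = "tall and skinny"  # many more records than columns
    elif n_rows and n_cols / n_rows >= tall_ratio:
        shape = "wide"             # many more columns than records
    else:
        shape = "balanced"
    return {
        "rows": n_rows,
        "cols": n_cols,
        "density": density,
        "kind": "sparse" if density < 0.5 else "dense",
        "shape": shape,
    }

tall = profile([[1, 0]] * 40)        # 40 records, 2 columns
wide = profile([[0] * 30 + [1]])     # 1 record, 31 columns
print(tall)
print(wide)
```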