WEKA: A MODERN APPLICATION
OF DATA MINING TECHNIQUES
SEAN, ROB, PRATIK, RHODRI, AL, VASANTI, MINGHAO
What is WEKA?
• Desktop application for machine learning & data mining
• Open-source, Java-based tool
• Offers commonly used algorithms to model data
• Developed at the University of Waikato, New Zealand
What is Data Mining & Machine Learning?
• Data Mining :
• Searching for patterns in data
• Finding value in data
• Machine Learning:
• Developing models that a computer can build and apply automatically
• Using computers to model data and predict a likely outcome
Features of WEKA
• Pre-process data
• Classification & Clustering
• Association rules
• 3D visualisation
Choosing the Dataset
• Public datasets:
•data.gov.uk
•kaggle.com, e.g. the Titanic dataset
•UCI Machine Learning Repository
• A dataset that could provide insight into a real-world scenario
• A dataset with properties that would model effectively in WEKA
Capital Bikeshare
Picture: Alejandro Castro, flickr, creative commons
• Bike-share system in Washington, D.C. and the surrounding area
• https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
The Objective
• Investigate factors affecting bike-share usage
• Could this data be used to predict how busy or quiet a bike-share system will be on a given day?
Dataset fields
• Record index
• Time information
•Date, day of the week, whether the day is a holiday, whether the day is a working day, month, year, season (1-4: spring/summer/autumn/winter)
• Weather information
•weather description (four categories, ranging roughly from good to bad)
•normalized values for temperature, ‘feels like’ temperature, humidity and windspeed
• Totals
•counts of bikes rented by registered users and by casual users
•total count of bikes rented that day (casual + registered)
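Pre-processing
• Remove fields that don’t help prediction
•indexes, sub-totals etc. (the casual and registered counts add up to the total being predicted, so keeping them would give the answer away)
• Filters
• Discretize - categorises numeric values into discrete ranges (bins)
• ClassBalancer - re-weights instances so that each class carries the same total weight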
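The two filters can be illustrated with a short Python sketch. This is not WEKA's code: it assumes plain equal-width binning for Discretize (WEKA's filter has further options) and, for ClassBalancer, that each instance of class c gets weight N / (k × n_c), so every class has the same total weight while the overall weight still sums to N. The function names `equal_width_bins` and `balance_weights` are hypothetical.

```python
from collections import Counter

def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width intervals (0-based)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        if width == 0:
            bins.append(0)  # all values identical: everything goes in one bin
        else:
            # min() keeps the maximum value inside the last bin
            bins.append(min(int((v - lo) / width), n_bins - 1))
    return bins

def balance_weights(labels):
    """Weight instances so every class has the same total weight,
    while the sum of all weights stays equal to the number of instances."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[c]) for c in labels]
```

For example, three "busy" days and one "quiet" day receive weights of 2/3 each and 2.0 respectively, so both classes carry a total weight of 2.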
Data Visualisation
Basic terminology for evaluating classifiers
•True positive (tp): an instance is correctly predicted to belong to the given class
•True negative (tn): an instance is correctly predicted not to belong to the given class
•False positive (fp): an instance is incorrectly predicted to belong to the given class
•False negative (fn): an instance is incorrectly predicted not to belong to the given class
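The four counts can be computed directly from a list of actual and predicted labels. A minimal sketch follows; `confusion_counts` is a hypothetical helper, and the "busy"/"quiet" labels are illustrative only.

```python
def confusion_counts(actual, predicted, positive):
    """Count tp, tn, fp, fn for one class; `positive` is 'the given class'."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly predicted to belong to the class
        elif p != positive and a != positive:
            tn += 1  # correctly predicted not to belong
        elif p == positive:
            fp += 1  # predicted to belong, but does not
        else:
            fn += 1  # predicted not to belong, but does
    return tp, tn, fp, fn
```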
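Explanation of Statistics
• Precision: tp / (tp + fp), the fraction of instances predicted to be in the class that really are in it
• Recall: tp / (tp + fn), the fraction of instances really in the class that are predicted to be in it
• F-measure: 2 × Precision × Recall / (Precision + Recall), the harmonic mean of precision and recall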
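The three statistics follow directly from the counts. In this sketch, returning 0.0 when a ratio is undefined (e.g. no positive predictions) is an assumption; tools differ in how they report that case.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

With tp = 8, fp = 2 and fn = 0, precision is 0.8, recall is 1.0 and the F-measure is about 0.889.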
Algorithms explored
Tree based:
• J48 - WEKA's implementation of the C4.5 decision-tree algorithm; it uses a tree structure to make decisions.
•Performed very well on our dataset
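C4.5, which J48 implements, chooses the attribute to split on using the gain ratio: the information gain of a split divided by the entropy of the split itself. The sketch below computes this for a nominal attribute only; numeric thresholds, pruning and missing-value handling, which J48 also performs, are left out, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature, labels):
    """Information gain of splitting on a nominal feature, divided by the split information."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Expected entropy of the class after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature)  # penalises attributes with many distinct values
    return gain / split_info if split_info > 0 else 0.0
```

An attribute that separates "busy" from "quiet" days perfectly scores 1.0, while one unrelated to the class scores 0.0.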
Algorithms explored
Rule based:
• ZeroR - the simplest classification method: it relies only on the target (class) and ignores all predictors, always predicting the most frequent class.
•Performed poorly on our dataset; useful mainly as a baseline for comparison
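ZeroR's behaviour for a nominal class fits in a few lines. This is an illustrative sketch, not WEKA's class; for a numeric class WEKA's ZeroR predicts the mean instead, which is not covered here.

```python
from collections import Counter

class ZeroR:
    """Baseline classifier: ignores all attributes and always predicts
    the most frequent class seen during training."""

    def fit(self, labels):
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        return [self.prediction for _ in instances]
```

Any useful model should beat the accuracy this baseline achieves, which is simply the share of the majority class.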
Algorithms explored
Naïve Bayes
•This is a probabilistic classifier based on Bayes' theorem, which models the relationship between features and class labels, assuming the features are independent of each other given the class.
•This classifier can handle missing values by ignoring them when calculating the conditional probabilities.
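A minimal sketch for nominal attributes, assuming add-one (Laplace) smoothing and `None` as the marker for a missing value. WEKA's NaiveBayes also models numeric attributes, which this sketch does not attempt; `train_nb` and `predict_nb` are hypothetical names.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count classes, and attribute values per class; None (missing) is skipped."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute index -> distinct observed values
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            if v is None:
                continue  # missing value: ignored, as described above
            value_counts[(i, y)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen

def predict_nb(model, row):
    """Pick the class maximising P(class) * product of P(value_i | class).
    Uses add-one smoothing; missing or never-observed attributes are skipped."""
    class_counts, value_counts, values_seen = model
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, nc in class_counts.items():
        score = nc / n  # prior P(class)
        for i, v in enumerate(row):
            if v is None or i not in values_seen:
                continue
            counts = value_counts[(i, c)]
            score *= (counts[v] + 1) / (sum(counts.values()) + len(values_seen[i]))
        if score > best_score:
            best, best_score = c, score
    return best
```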
Testset Division
Training and testing sets:
-Training data is used for building an ML model
-Testing data is used for measuring the performance of an ML model
(Figures: “Supplying testing set in Weka”; “Separate training and testing”)
Testset Division
Cross Validation:
-Gives a more reliable estimate of performance on unseen data, which helps to detect overfitting
-Every instance is used for testing exactly once, so the estimate depends less on one particular split
•Includes:
-Splitting the original dataset into k equal parts (folds)
-Setting one fold aside, training on the remaining k-1 folds and measuring performance on the held-out fold
-Repeating the process k times, holding out a different fold each time, and averaging the k results
•10-fold cross-validation: k = 10
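The procedure above can be sketched in Python. This version shuffles once with a fixed seed and is not stratified (WEKA stratifies folds for a nominal class); `k_fold_indices`, `cross_val_accuracy` and the `fit_predict` callback are hypothetical names.

```python
import random

def k_fold_indices(n, k, seed=1):
    """Shuffle instance indices, split them into k folds of (nearly) equal size,
    and yield (train_indices, test_indices) once per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_val_accuracy(xs, ys, fit_predict, k=10, seed=1):
    """Average accuracy over k folds.
    fit_predict(train_x, train_y, test_x) must return predictions for test_x."""
    scores = []
    for train, test in k_fold_indices(len(ys), k, seed):
        preds = fit_predict([xs[j] for j in train], [ys[j] for j in train],
                            [xs[j] for j in test])
        correct = sum(p == ys[j] for p, j in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / k
```

Passing a majority-class predictor as `fit_predict` gives the cross-validated ZeroR baseline to compare other classifiers against.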
Testset Division
Percentage split
-Randomly splits the dataset into a training partition and a testing partition each time a model is evaluated.
Dividing original dataset into testing and training
For example:
If we have 100 instances and would like to use 66% as the training set and 34% as the test set with a percentage split, the model is trained on 66 instances and tested on the remaining 34.
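The 66%/34% example can be reproduced with a small sketch that shuffles with a fixed seed so the split is repeatable. The function name, the seed and the rounding rule are assumptions for illustration, not WEKA's exact behaviour.

```python
import random

def percentage_split(instances, train_percent=66.0, seed=1):
    """Shuffle a copy of the data, then put the first train_percent % into
    the training set and the rest into the test set."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    cut = int(round(len(data) * train_percent / 100))
    return data[:cut], data[cut:]
```

With 100 instances this yields 66 training and 34 test instances, matching the example.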
What is Clustering?
• Finding the class labels and the number of classes directly from the data (in contrast to classification, where they are given).
• It is unsupervised learning: we explore the data to find structure in it.
What is clustering for?
● Grouping items with similar properties together into clusters.
● For example, to support data-driven decisions such as classifying T-shirts as “small”, “medium” and “large”.
Clustering types:
Some popular Clustering Algorithms
•K-means clustering (disjoint sets)
•EM clustering (probabilistic)
•Cobweb clustering (hierarchical)
KMeans: Iterative distance-based clustering (disjoint sets)
1. Specify k, the desired number of clusters
2. Choose k points at random as cluster centers
3. Assign all instances to their closest cluster center
4. Calculate the centroid (i.e., mean) of instances in each cluster
5. These centroids are the new cluster centers
6. Continue until the cluster centers don’t change
Minimizes the total squared distance from instances to their cluster centers. The result is only a local minimum, so it depends on the initial choice of centers.
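Steps 1 to 6 translate almost line for line into Python. This sketch assumes numeric points given as tuples and squared Euclidean distance; WEKA's SimpleKMeans adds options such as the distanceFunction and the initialisation method, which are not reproduced here.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, seed=1, max_iter=100):
    """Random initial centers, assign to nearest center, recompute centroids,
    repeat until the centers stop changing (or max_iter is reached)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # step 2: k random points as initial centers
    assignment = []
    for _ in range(max_iter):
        # Step 3: index of the closest center for every point
        assignment = [min(range(k), key=lambda c: squared_distance(p, centres[c]))
                      for p in points]
        new_centres = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # step 4: centroid (mean) of each cluster
                new_centres.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's center
        if new_centres == centres:  # step 6: centers no longer change
            break
        centres = new_centres  # step 5: centroids become the new centers
    return centres, assignment
```

Running it with several seeds and keeping the result with the lowest total squared distance is a common way to reduce the effect of a poor starting choice.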
K-means in Weka
•Note parameters:
• numClusters
•distanceFunction
How can we tell the right number of clusters?
In general, this is an unsolved problem
Clustering is subjective
•Use the AddCluster unsupervised attribute filter
•Hard to evaluate clustering
Trying to cluster into seasons
Using K-means clustering with k=4, we wanted to see whether the data falls into clusters that correspond to the four seasons
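One way to check whether clusters match the seasons is to label each cluster with the most common season among its members and count the instances that disagree. This is a simplified majority-vote sketch in the spirit of WEKA's classes-to-clusters evaluation, not WEKA's exact procedure; the function name and the toy data are hypothetical.

```python
from collections import Counter

def classes_to_clusters(clusters, classes):
    """Label each cluster with its most common class; return the mapping and
    the fraction of instances whose class differs from their cluster's label."""
    members = {}
    for c, y in zip(clusters, classes):
        members.setdefault(c, []).append(y)
    mapping = {c: Counter(ys).most_common(1)[0][0] for c, ys in members.items()}
    wrong = sum(mapping[c] != y for c, y in zip(clusters, classes))
    return mapping, wrong / len(classes)
```

Note that majority voting can map two clusters to the same season, which would itself indicate that some seasons do not separate well.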
Observations
• We found that winter and summer months have separated into two
distinct clusters.
• The autumn and spring months have not separated so well.
• The visualisation also shows an overall trend of more users in the summer months than in the winter months.
• This is not surprising, since these months are warmer and people are more likely to choose to rent bikes.
Possible Improvements
• Data accuracy
• Uncontrollable outside factors, e.g. road closures, new cycle paths, tube strikes etc.
• Growing popularity of the scheme over time may affect results.
• Data precision
• Imprecise measurements and subjective categories (e.g. the weather description) are generalisations; more exact measurements are needed.
• Variable factors, e.g. “temperature or weather”, differ depending on the exact location.
• The data itself is always changing, so it is only an indicator of some relationships.
• Different people, e.g. tourists, may have different attitudes.
• Different locations yield different results: weather varies across continents.
Evaluation of best approach
• J48 - easy to visualise
• ZeroR is a bad idea for our dataset
Overall: the best approach is to analyse the data with several different WEKA modules and compare the results, to focus efforts and find the best solution.
• Graphs of properties: can indicate which factors matter most for classification
• Classification algorithms: to build a model
• Testing the model is also crucial.
Conclusions based on data
• Dataset suitability - probably more suited to classification than
clustering
• Some prediction was possible
• External factors - other changes in the transport network, cycling
for health, city events
• Other possible analyses: usage by hour, casual users
• Applications: Smart cities & planning - effective bikeshare provision

More Related Content

What's hot

Bitcoin - Introduction to Virtual Currency / Cryptocurrency
Bitcoin - Introduction to Virtual Currency / CryptocurrencyBitcoin - Introduction to Virtual Currency / Cryptocurrency
Bitcoin - Introduction to Virtual Currency / CryptocurrencySwaminath Sam
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine LearningScaleway
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detectionvineeta vineeta
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detectionkalpesh1908
 
Credit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research PaperCredit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research PaperGarvit Burad
 
Cryptocurrencies: Issues, Challenges and Way Forward
Cryptocurrencies: Issues, Challenges and Way ForwardCryptocurrencies: Issues, Challenges and Way Forward
Cryptocurrencies: Issues, Challenges and Way ForwardVinod Kashyap
 
Credit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning AlgorithmsCredit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning Algorithmsankit panigrahy
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learningwgyn
 
Presentation on security feature of atm (2)
Presentation on security feature of atm (2)Presentation on security feature of atm (2)
Presentation on security feature of atm (2)Siya Agarwal
 
Cryptocurrency - Digital Currency
Cryptocurrency - Digital CurrencyCryptocurrency - Digital Currency
Cryptocurrency - Digital CurrencySameer Satyam
 
Paper currency recognigation with counterfeit detection using image processing
Paper currency recognigation with counterfeit detection using image processingPaper currency recognigation with counterfeit detection using image processing
Paper currency recognigation with counterfeit detection using image processingmeghanaaandy
 
Cryptocurrency
CryptocurrencyCryptocurrency
CryptocurrencyMZain17
 

What's hot (20)

Bitcoin - Introduction to Virtual Currency / Cryptocurrency
Bitcoin - Introduction to Virtual Currency / CryptocurrencyBitcoin - Introduction to Virtual Currency / Cryptocurrency
Bitcoin - Introduction to Virtual Currency / Cryptocurrency
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
 
Cryptocurrency
CryptocurrencyCryptocurrency
Cryptocurrency
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detection
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detection
 
Credit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research PaperCredit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research Paper
 
Cryptocurrencies: Issues, Challenges and Way Forward
Cryptocurrencies: Issues, Challenges and Way ForwardCryptocurrencies: Issues, Challenges and Way Forward
Cryptocurrencies: Issues, Challenges and Way Forward
 
Credit card fraud dection
Credit card fraud dectionCredit card fraud dection
Credit card fraud dection
 
Credit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning AlgorithmsCredit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning Algorithms
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learning
 
Presentation on security feature of atm (2)
Presentation on security feature of atm (2)Presentation on security feature of atm (2)
Presentation on security feature of atm (2)
 
Cryptocurrency - Digital Currency
Cryptocurrency - Digital CurrencyCryptocurrency - Digital Currency
Cryptocurrency - Digital Currency
 
Cryptocurrency
CryptocurrencyCryptocurrency
Cryptocurrency
 
BITCOIN- A Presentation.
BITCOIN- A Presentation.BITCOIN- A Presentation.
BITCOIN- A Presentation.
 
Cryptography
CryptographyCryptography
Cryptography
 
Payment Card System Overview
Payment Card System OverviewPayment Card System Overview
Payment Card System Overview
 
Paper currency recognigation with counterfeit detection using image processing
Paper currency recognigation with counterfeit detection using image processingPaper currency recognigation with counterfeit detection using image processing
Paper currency recognigation with counterfeit detection using image processing
 
Fraud detection
Fraud detectionFraud detection
Fraud detection
 
Cryptocurrency
CryptocurrencyCryptocurrency
Cryptocurrency
 
Public key Infrastructure (PKI)
Public key Infrastructure (PKI)Public key Infrastructure (PKI)
Public key Infrastructure (PKI)
 

Viewers also liked

Mining dynamic social networks from public news articles for company value pr...
Mining dynamic social networks from public news articles for company value pr...Mining dynamic social networks from public news articles for company value pr...
Mining dynamic social networks from public news articles for company value pr...Pratik Doshi
 
Sesión mat resolvemos problemas de equilibrio copia
Sesión mat resolvemos problemas de equilibrio   copiaSesión mat resolvemos problemas de equilibrio   copia
Sesión mat resolvemos problemas de equilibrio copiaSOTO ZOTITO
 
Weka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule GenerationWeka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule Generationrsathishwaran
 
Classification and Clustering Analysis using Weka
Classification and Clustering Analysis using Weka Classification and Clustering Analysis using Weka
Classification and Clustering Analysis using Weka Ishan Awadhesh
 
Webquest shantall
Webquest shantallWebquest shantall
Webquest shantallShantall0
 
Web and Social Computing - Presentation Week8
Web and Social Computing - Presentation Week8Web and Social Computing - Presentation Week8
Web and Social Computing - Presentation Week8Matthew Courtney
 
final final copy of BIKE SHARE IN SAN JOSE
final final copy of BIKE SHARE IN SAN JOSEfinal final copy of BIKE SHARE IN SAN JOSE
final final copy of BIKE SHARE IN SAN JOSEKenneth Rosales
 
Visualising Bike Share (#geomob 21 October 2010)
Visualising Bike Share (#geomob 21 October 2010)Visualising Bike Share (#geomob 21 October 2010)
Visualising Bike Share (#geomob 21 October 2010)CASA, UCL
 
WEKA:Output Knowledge Representation
WEKA:Output Knowledge RepresentationWEKA:Output Knowledge Representation
WEKA:Output Knowledge Representationweka Content
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKAbutest
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKAbutest
 
WEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek AhamedWEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek AhamedShareek Ahamed
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using wekarathorenitin87
 
WEKA:Data Mining Input Concepts Instances And Attributes
WEKA:Data Mining Input Concepts Instances And AttributesWEKA:Data Mining Input Concepts Instances And Attributes
WEKA:Data Mining Input Concepts Instances And Attributesweka Content
 
Acc 560 week 9 quiz – strayer new
Acc 560 week 9 quiz – strayer newAcc 560 week 9 quiz – strayer new
Acc 560 week 9 quiz – strayer newninfaames
 

Viewers also liked (17)

Mining dynamic social networks from public news articles for company value pr...
Mining dynamic social networks from public news articles for company value pr...Mining dynamic social networks from public news articles for company value pr...
Mining dynamic social networks from public news articles for company value pr...
 
Sesión mat resolvemos problemas de equilibrio copia
Sesión mat resolvemos problemas de equilibrio   copiaSesión mat resolvemos problemas de equilibrio   copia
Sesión mat resolvemos problemas de equilibrio copia
 
Weka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule GenerationWeka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule Generation
 
Classification and Clustering Analysis using Weka
Classification and Clustering Analysis using Weka Classification and Clustering Analysis using Weka
Classification and Clustering Analysis using Weka
 
Webquest shantall
Webquest shantallWebquest shantall
Webquest shantall
 
Web and Social Computing - Presentation Week8
Web and Social Computing - Presentation Week8Web and Social Computing - Presentation Week8
Web and Social Computing - Presentation Week8
 
final final copy of BIKE SHARE IN SAN JOSE
final final copy of BIKE SHARE IN SAN JOSEfinal final copy of BIKE SHARE IN SAN JOSE
final final copy of BIKE SHARE IN SAN JOSE
 
Amazon
AmazonAmazon
Amazon
 
Visualising Bike Share (#geomob 21 October 2010)
Visualising Bike Share (#geomob 21 October 2010)Visualising Bike Share (#geomob 21 October 2010)
Visualising Bike Share (#geomob 21 October 2010)
 
WEKA:Output Knowledge Representation
WEKA:Output Knowledge RepresentationWEKA:Output Knowledge Representation
WEKA:Output Knowledge Representation
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKA
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKA
 
WEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek AhamedWEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek Ahamed
 
Data Mining using Weka
Data Mining using WekaData Mining using Weka
Data Mining using Weka
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using weka
 
WEKA:Data Mining Input Concepts Instances And Attributes
WEKA:Data Mining Input Concepts Instances And AttributesWEKA:Data Mining Input Concepts Instances And Attributes
WEKA:Data Mining Input Concepts Instances And Attributes
 
Acc 560 week 9 quiz – strayer new
Acc 560 week 9 quiz – strayer newAcc 560 week 9 quiz – strayer new
Acc 560 week 9 quiz – strayer new
 

Similar to Weka bike rental

01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data MiningValerii Klymchuk
 
Machinr Learning and artificial_Lect1.pdf
Machinr Learning and artificial_Lect1.pdfMachinr Learning and artificial_Lect1.pdf
Machinr Learning and artificial_Lect1.pdfSaketBansal9
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfAschalewAyele2
 
DataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptxDataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptxAkash527744
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onwordSulman Ahmed
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptxNIKHILGR3
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needGibDevs
 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiVijay Susheedran C G
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introductionNeeraj Tewari
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new pptSalford Systems
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluationeShikshak
 
Machine learning algorithms for data mining
Machine learning algorithms for data miningMachine learning algorithms for data mining
Machine learning algorithms for data miningAshikur Rahman
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 

Similar to Weka bike rental (20)

01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
Machinr Learning and artificial_Lect1.pdf
Machinr Learning and artificial_Lect1.pdfMachinr Learning and artificial_Lect1.pdf
Machinr Learning and artificial_Lect1.pdf
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
 
DataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptxDataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptx
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptx
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in Chennai
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introduction
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new ppt
 
DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
 
machine learning
machine learningmachine learning
machine learning
 
unit 1.pptx
unit 1.pptxunit 1.pptx
unit 1.pptx
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Machine learning algorithms for data mining
Machine learning algorithms for data miningMachine learning algorithms for data mining
Machine learning algorithms for data mining
 
Lecture2 (1).ppt
Lecture2 (1).pptLecture2 (1).ppt
Lecture2 (1).ppt
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
02 Related Concepts
02 Related Concepts02 Related Concepts
02 Related Concepts
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 

Recently uploaded

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

Weka bike rental

  • 1. WEKA: A MODERN APPLICATION OF DATA MINING TECHNIQUES SEAN,ROB,PRATIK,RHODRI,AL, VASANTI,MINGHAO
  • 2. What is WEKA? • Desktop application for machine learning & data mining • Open-source, Java-based tool • Offers commonly used algorithms for modelling data • Developed at the University of Waikato, New Zealand
  • 3. What is Data Mining & Machine Learning? • Data Mining: • Searching for patterns in data • Finding value in data • Machine Learning: • Building models from data that a computer can apply to new cases • Using computational resources to model data and predict a likely outcome
  • 4. Features of WEKA • Pre-process data • Classification & Clustering • Association rules • 3D visualisation
  • 5. Choosing the Dataset • Public datasets: • data.gov.uk • kaggle.com (e.g. the Titanic dataset) • UCI Machine Learning Repository • A dataset that could provide insight into a real-world scenario • One that would model effectively in WEKA: several attributes
  • 6. Capital Bikeshare Picture: Alejandro Castro, Flickr, Creative Commons • Bike-share system in Washington, D.C., and the surrounding area • https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
  • 7. The Objective • Investigate the factors affecting bike-share usage • Could this data be used to predict how busy or quiet a bike-share system will be on a given day?
  • 8. Dataset fields • Record index • Time information • Date, day of the week, whether the day is a holiday, whether the day is a working day, month, year, season (1-4: spring/summer/autumn/winter) • Weather information • Weather description (four categories, ordered roughly from good to bad) • Normalised values for temperature, ‘feels like’ temperature, humidity and wind speed • Totals • Counts of bikes rented by registered and casual users • Total count of bikes rented that day (registered + casual)
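  • 9. Pre-processing • Remove fields that don’t help prediction • Indexes, sub-totals etc. (the registered and casual counts add up to the total, so keeping them would give the answer away) • Filters • Discretize: groups numeric values into discrete ranges (bins) • ClassBalancer: re-weights instances so that every class carries the same total weight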
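To make the two filters concrete, here is a minimal Python sketch of what they do to the data. It is not Weka's code: `equal_width_bins` assumes equal-width binning (what Weka's unsupervised Discretize does by default), and `class_balancer_weights` assumes each class should receive the same total weight while all weights still sum to the number of instances. Both function names and the sample values are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, num_bins):
    """Map each numeric value to a bin index 0..num_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    bins = []
    for v in values:
        if width == 0:
            bins.append(0)  # all values identical: everything falls in one bin
        else:
            # The maximum value would land in bin num_bins, so clamp it to the last bin
            bins.append(min(int((v - lo) / width), num_bins - 1))
    return bins


def class_balancer_weights(labels):
    """Give each class the same total weight; weights still sum to len(labels)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # Each class receives n / k total weight, shared equally among its instances
    return [n / (k * counts[label]) for label in labels]


# Hypothetical normalised temperatures and busy/quiet labels, for illustration only
print(equal_width_bins([0.0, 0.25, 0.5, 0.75, 1.0], 4))  # → [0, 1, 2, 3, 3]
print(class_balancer_weights(["busy", "busy", "busy", "quiet"]))
```

With three "busy" days and one "quiet" day, each busy instance gets weight 2/3 and the quiet one gets 2, so both classes count equally when a classifier is trained.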
  • 11. Basic terminology for understanding the evaluation of classifiers • True positive (tp): an instance is correctly predicted to belong to the given class • True negative (tn): an instance is correctly predicted not to belong to the given class • False positive (fp): an instance is incorrectly predicted to belong to the given class • False negative (fn): an instance is incorrectly predicted not to belong to the given class
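  • 12. Explanation of Statistics • Precision: tp / (tp + fp), the fraction of instances predicted to be in the class that really are • Recall: tp / (tp + fn), the fraction of the class's instances that were found • F-measure: 2 × Precision × Recall / (Precision + Recall), the harmonic mean of the two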
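The three statistics follow directly from the tp/fp/fn counts defined on the previous slide. This small Python sketch computes them for one class; the counts are made-up numbers, and returning 0.0 when a ratio is undefined is a choice made here rather than something Weka necessarily does.

```python
def precision(tp, fp):
    """Fraction of instances predicted as the class that really belong to it."""
    return tp / (tp + fp) if tp + fp else 0.0  # 0.0 when undefined (choice made here)


def recall(tp, fn):
    """Fraction of instances of the class that were predicted as the class."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts for a "busy" class, for illustration only
print(precision(40, 10), recall(40, 40))  # → 0.8 0.5
print(f_measure(40, 10, 40))
```

Because the F-measure is a harmonic mean, it stays close to the smaller of the two values, so a classifier cannot score well by maximising only precision or only recall.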
  • 13. Algorithms explored Tree based: • J48 (Weka's implementation of the C4.5 decision tree): this classifier uses a tree structure to make decisions • Performed very well on our dataset
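C4.5, which J48 implements, chooses the attribute to split on by ranking candidates by gain ratio. The sketch below shows only that scoring step for nominal attributes; it leaves out numeric thresholds, pruning and missing-value handling, so it is an illustration of the split criterion rather than a working tree learner. The function names and example attribute are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting on a nominal attribute, divided by split info."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by branch size
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split info penalises attributes with many distinct values
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical "working day" attribute that perfectly separates busy and quiet days
print(gain_ratio(["yes", "yes", "no", "no"], ["busy", "busy", "quiet", "quiet"]))  # → 1.0
```

The tree is grown by picking the highest-scoring attribute, splitting the data on it, and repeating the process inside each branch.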
  • 14. Algorithms explored Rule based: • ZeroR: the simplest classification method, which relies only on the target and ignores all predictors (it always predicts the most common class) • Not good for our dataset
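A ZeroR-style baseline fits in a few lines, which shows why it cannot capture any effect of weather or date. This Python sketch handles a nominal target only (for a numeric target Weka's ZeroR predicts the mean instead); the class name, `accuracy` helper and data are hypothetical.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier: ignores every attribute and predicts the majority class."""

    def fit(self, rows, labels):
        # Only the target column is used; `rows` is accepted but ignored
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.prediction for _ in rows]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


model = ZeroR().fit([[0.2], [0.8], [0.5]], ["busy", "quiet", "busy"])
print(model.predict([[0.1], [0.9]]))  # → ['busy', 'busy']
```

Its accuracy simply equals the share of the majority class, which makes it a useful yardstick: a real classifier that cannot beat it has learned nothing from the attributes.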
  • 15. Algorithms explored Naïve Bayes • A probabilistic classifier based on Bayes' theorem that models the relationship between features and class labels, assuming the features are independent of each other given the class • It can handle missing values by ignoring them when calculating the conditional probabilities
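The missing-value behaviour described above can be seen in a minimal naive Bayes for nominal attributes. In this Python sketch `None` marks a missing value and is skipped both when counting and when predicting; the Laplace (+1) smoothing, the class name and the data are assumptions for illustration, and numeric attributes would need a density estimate that this sketch omits.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for nominal attributes; None marks a missing value."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[i][label][value] = how often attribute i took `value` within `label`
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per attribute
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                if v is None:
                    continue  # missing values are skipped when counting
                self.counts[i][y][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for y, cy in self.class_counts.items():
            score = math.log(cy / self.n)  # log prior P(y)
            for i, v in enumerate(row):
                if v is None:
                    continue  # missing value: attribute left out of the product
                seen = sum(self.counts[i][y].values())
                vocab = len(self.values[i] | {v})
                # Laplace smoothing so unseen values do not zero the product
                score += math.log((self.counts[i][y][v] + 1) / (seen + vocab))
            if score > best_score:
                best, best_score = y, score
        return best


rows = [["sunny", "yes"], ["sunny", "no"], ["rain", "yes"], ["rain", None]]
labels = ["busy", "busy", "quiet", "quiet"]
print(CategoricalNaiveBayes().fit(rows, labels).predict(["sunny", None]))  # → busy
```

Summing log-probabilities rather than multiplying raw probabilities avoids numeric underflow when there are many attributes.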
  • 16. Testset Division Training and testing sets: - Training data is used to build an ML model - Testing data is used to measure the performance of an ML model Supplying a test set in Weka: separate training and testing data
  • 17. Testset Division Cross-validation: - Gives a more reliable estimate of performance on unseen data and helps to detect overfitting - The estimate does not depend on one particular train/test split • Procedure: - Split the original dataset into k roughly equal parts (folds) - Hold one fold out, train on the remaining k-1 folds and measure performance on the held-out fold - Repeat k times, holding out a different fold each time, and combine the results • 10-fold cross-validation: k = 10
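The fold procedure can be sketched in plain Python. This version shuffles the instances, deals them into k folds, and averages per-fold accuracy; unlike Weka's default it is not stratified by class, and `train_fn` (a hypothetical callback that trains a model and returns a predict function) stands in for whichever classifier is being evaluated.

```python
import random


def k_fold_indices(n, k, seed=1):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # randomise order before folding
    # Deal indices into k folds whose sizes differ by at most one
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def cross_validate(rows, labels, train_fn, k=10, seed=1):
    """Average accuracy over k folds; `train_fn` returns a predict(row) function."""
    scores = []
    for train, test in k_fold_indices(len(rows), k, seed):
        predict = train_fn([rows[j] for j in train], [labels[j] for j in train])
        correct = sum(predict(rows[j]) == labels[j] for j in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

After cross-validation has produced the performance estimate, the final model is normally trained on the full dataset.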
  • 18. Testset Division Percentage split - Randomly splits the dataset into training and testing partitions Dividing the original dataset into testing and training For example: with 100 instances and a 66% percentage split, 66 instances are used for training and the remaining 34 for testing
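The 100-instance example works out as follows in a short Python sketch. The function name, rounding rule and fixed seed are assumptions made here; fixing the seed makes the split repeatable, while changing it gives a different random partition.

```python
import random


def percentage_split(rows, train_percent=66, seed=1):
    """Shuffle a copy of the data and cut it into training and testing partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


train, test = percentage_split(range(100), 66)
print(len(train), len(test))  # → 66 34
```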
  • 19. What is Clustering? • Finding the class labels and the number of classes directly from the data (in contrast to classification) • It is unsupervised learning: we explore the data to find structure in it What is clustering for? • Grouping items with similar properties together into clusters • For example, using data to decide how to group T-shirts into “small”, “medium” and “large” sizes
  • 22. Some popular clustering algorithms • K-means clustering (disjoint sets) • EM clustering (probabilistic) • Cobweb clustering (hierarchical)
  • 23. K-means: iterative distance-based clustering (disjoint sets) 1. Specify k, the desired number of clusters 2. Choose k points at random as cluster centers 3. Assign every instance to its closest cluster center 4. Calculate the centroid (i.e. the mean) of the instances in each cluster 5. These centroids become the new cluster centers 6. Repeat from step 3 until the cluster centers no longer change This minimizes the total squared distance from instances to their cluster centers, but only to a local minimum, so the result depends on the initial centers.
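The six steps map directly onto a short implementation. This Python sketch uses Euclidean distance and picks k random instances as the starting centers; Weka's SimpleKMeans offers other distance functions and initialisation methods, so treat this as an illustration of the algorithm rather than of Weka's code. The function names are hypothetical and `math.dist` needs Python 3.8 or later.

```python
import math
import random


def k_means(points, k, seed=1, max_iter=100):
    """Lloyd's k-means on tuples of numbers; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: k random instances as initial centers
    assignments = None
    for _ in range(max_iter):
        # Step 3: assign each instance to its closest center (Euclidean distance)
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_assign == assignments:
            break  # step 6: assignments (and hence centers) no longer change
        assignments = new_assign
        # Steps 4-5: move each center to the mean of its instances
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers, assignments


def total_squared_distance(points, centers, assignments):
    """The quantity k-means tries to minimise."""
    return sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignments))
```

Running it with several seeds and keeping the result with the lowest total squared distance is a common way to reduce the dependence on the starting centers.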
  • 24. K-means in Weka • Note the parameters: • numClusters • distanceFunction How can we tell the right number of clusters? In general, this is an unsolved problem; clustering is subjective
  • 25. • Use the AddCluster unsupervised attribute filter to add each instance's cluster as a new attribute • Clustering is hard to evaluate
  • 26. Trying to cluster into seasons Using K-means clustering with k = 4, we wanted to see whether the data falls into clusters that correspond to the four seasons
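One way to check whether the four clusters line up with the seasons is to cross-tabulate cluster numbers against the known season of each day. The sketch below does that in Python and summarises the table with purity (the share of days whose season is the most common one in their cluster). Purity is a simple measure swapped in here for illustration; Weka's "Classes to clusters evaluation" option offers a related but different check. Function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_season_table(assignments, seasons):
    """Cross-tabulate cluster number against season label."""
    table = defaultdict(Counter)
    for cluster, season in zip(assignments, seasons):
        table[cluster][season] += 1
    return table


def purity(assignments, seasons):
    """Share of instances whose season is the most common season in their cluster."""
    table = cluster_season_table(assignments, seasons)
    majority_total = sum(counts.most_common(1)[0][1] for counts in table.values())
    return majority_total / len(seasons)


# Hypothetical cluster assignments for six days
clusters = [0, 0, 1, 1, 1, 2]
seasons = ["winter", "winter", "summer", "summer", "spring", "autumn"]
print(purity(clusters, seasons))
```

A purity near 1 means each cluster is dominated by one season; mixed rows in the table show which seasons the clustering fails to separate.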
  • 27. Observations • Winter and summer months separated into two distinct clusters. • Autumn and spring months did not separate as well. • The visualisation also shows the overall trend of more users in the summer months than in the winter months. • This is not surprising: these months are warmer, and people are more likely to choose to rent bikes in warm weather.
  • 28. Possible Improvements • Data accuracy • Uncontrollable outside factors, e.g. road closures, new cycle paths, transit strikes • Growing popularity of the scheme over time may affect results • Data precision • Bad measurements and subjective judgements (e.g. the weather description) are generalised; exact measurements would be needed • Variable factors, e.g. temperature or weather, differ depending on the exact location • The data itself is always changing, so it is only an indicator of some relationships • Different people, e.g. tourists, may have different attitudes • Different locations yield different results: weather varies across continents
  • 29. Evaluation of best approach • J48: easy to visualise • ZeroR is a bad idea for our dataset Overall: the best approach is to analyse several different WEKA modules and compare the results, to focus efforts and find the best solution. • Graphs of attributes can indicate the most important factors for classification • Classification algorithms: to build a model • Testing the model is also crucial
  • 30. Conclusions based on data • Dataset suitability - probably more suited to classification than clustering • Some prediction was possible • External factors - other changes in the transport network, cycling for health, city events • Other possible analysis: usage by hour, casual users • Applications: Smart cities & planning - effective bikeshare provision