WEKA: A MODERN APPLICATION
OF DATA MINING TECHNIQUES
SEAN, ROB, PRATIK, RHODRI, AL, VASANTI, MINGHAO
What is WEKA?
• Desktop application for machine learning & data mining
• Open-source, Java-based tool
• Offers commonly used algorithms to model data
• Developed at the University of Waikato, New Zealand
What is Data Mining & Machine Learning?
• Data Mining:
  • Searching for patterns in data
  • Finding value in data
• Machine Learning:
  • Developing models that computers can use
  • Using computational resources to model data and predict a likely outcome
Features of WEKA
• Pre-process data
• Classification & Clustering
• Association rules
• 3D visualisation
Choosing the Dataset
• Public datasets:
  • data.gov.uk
  • kaggle.com, such as the Titanic dataset
  • UCI Machine Learning Repository
• A dataset that could provide insight into a real-world scenario
• A dataset that would model effectively in WEKA, with several properties (attributes)
Capital Bikeshare
Picture: Alejandro Castro, flickr, creative commons
• Bike-share system in Washington DC and surrounding area
• https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
The Objective
• Investigate factors affecting bike-share usage
• Could this data be used to predict how busy or quiet a bike share
system may be on a given day?
Dataset fields
• Record index
• Time information
  • Date, day of the week, whether the day is a holiday, whether the day is a working day, month, year, season (1-4: spring/summer/autumn/winter)
• Weather information
  • Weather description (four categories, ranging roughly from good to bad)
  • Normalized values for temperature, ‘feels like’ temperature, humidity and wind speed
• Totals
  • Counts of bikes rented by registered and by casual users
  • Total count of bikes rented that day
Pre-processing
• Remove fields which don’t help prediction
  • Indexes, sub-totals, etc.
• Filters
  • Discretize: converts numeric attributes into discrete categories
  • ClassBalancer: re-weights instances so that each class has the same total weight
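To make the two filters concrete, here is a minimal Python sketch of the ideas behind them. It is not WEKA's code: it assumes equal-width binning for discretization (the slides do not say which Discretize settings were used), and the function names and sample values are made up for illustration.

```python
def equal_width_bins(values, bins=3):
    """Map each numeric value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def class_balance_weights(labels):
    """Weight each instance so that every class ends up with the same total weight."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    # The overall sum of weights stays equal to the number of instances.
    per_class_total = len(labels) / len(counts)
    return [per_class_total / counts[label] for label in labels]


# Hypothetical daily rental counts and class labels, for illustration only.
print(equal_width_bins([0, 1, 2, 3, 4, 5, 6], bins=3))
print(class_balance_weights(["busy", "quiet", "quiet", "quiet"]))
```

With one "busy" day and three "quiet" days, the single busy instance receives three times the weight of each quiet one, so both classes carry equal weight in training.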
Data Visualisation
Basic terminology to understand evaluation of classifiers
• True positive (tp): An instance is correctly predicted to belong to the given class
• True negative (tn): An instance is correctly predicted not to belong to the given class
• False positive (fp): An instance is incorrectly predicted to belong to the given class
• False negative (fn): An instance is incorrectly predicted not to belong to the given class
Explanation of Statistics
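• Precision: tp / (tp + fp), the fraction of instances predicted to be in the class that really belong to it
• Recall: tp / (tp + fn), the fraction of instances in the class that were correctly predicted
• F-measure: 2 × precision × recall / (precision + recall), the harmonic mean of precision and recall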
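As a small worked example, the three statistics can be computed from the tp, fp and fn counts using their standard definitions. The function name and the counts below are hypothetical and are not results from the bike-share experiments.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) for one class from its counts."""
    # Precision: of everything predicted as the class, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly in the class, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```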
Algorithms explored
Tree based:
• J48 - WEKA's implementation of the C4.5 decision tree; this classifier uses a tree structure to make decisions.
  • Performs very well on our dataset
Algorithms explored
Rule based:
• ZeroR - the simplest classification method: it relies only on the target and ignores all predictors, always predicting the most frequent class.
  • Performs poorly on our dataset
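The idea behind ZeroR is small enough to sketch in a few lines of Python. This is an illustration of the majority-class technique, not WEKA's implementation; the class name, method names and labels are assumptions made for the example.

```python
from collections import Counter


class ZeroRSketch:
    """Majority-class baseline: ignores all predictors."""

    def fit(self, labels):
        # Remember the most frequent class label seen in training.
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        # Every row gets the same answer, whatever its attribute values are.
        return [self.prediction for _ in rows]


# Hypothetical training labels and two unseen rows.
model = ZeroRSketch().fit(["quiet", "busy", "quiet", "quiet"])
print(model.predict([{"temp": 0.8}, {"temp": 0.1}]))
```

Because it never looks at the attributes, a classifier like this is mainly useful as a baseline that other algorithms should beat.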
Algorithms explored
Naïve Bayes
• This is a probabilistic classifier based on Bayes' theorem, which analyses the relationship between features and class labels.
• This classifier can handle missing values by ignoring them during calculation of the conditional probabilities.
Testset Division
Training and testing set:
- Training data is used for building an ML model
- Testing data is used for measuring the performance of an ML model
Figures: supplying a testing set in Weka; separate training and testing sets
Testset Division
Cross-validation:
- Helps to detect overfitting
- Gives a more general estimate of how the predictions will perform on unseen data
• Includes:
- Splitting the original dataset into k equal parts (folds)
- Setting one fold aside, training on the remaining k-1 folds, and measuring performance on the fold that was set aside
- Repeating the process k times, setting aside a different fold each time
• 10-fold cross-validation: k = 10
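The fold bookkeeping described above can be sketched in plain Python. This only shows how the indices are divided; it makes no attempt to reproduce WEKA's own fold assignment, and the function name and seed are assumptions for the example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n instances."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 10-fold cross-validation over 100 instances: each round trains on 90, tests on 10.
for round_no, (train, test) in enumerate(k_fold_indices(100, 10), start=1):
    print(round_no, len(train), len(test))
```

Every instance is used for testing exactly once, and the k performance figures are typically averaged to give the final estimate.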
Testset Division
Percentage split
- Randomly splits your dataset into a training partition and a testing partition each time you evaluate a model.
Dividing the original dataset into testing and training sets
For example:
If we have a dataset of 100 instances and split it 66% for training and 34% for testing using percentage split, 66 instances are used for training and 34 for testing.
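The 66% / 34% example can be checked with a short Python sketch. The function name, seed and rounding rule are assumptions for illustration and may differ from how WEKA performs the split internally.

```python
import random


def percentage_split(instances, train_percent=66, seed=1):
    """Shuffle the instances, then cut them into training and testing partitions."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# 100 stand-in instances, represented here simply by their index.
train, test = percentage_split(range(100), train_percent=66)
print(len(train), len(test))  # prints "66 34"
```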
What is Clustering?
• Finding the class labels and the number of classes directly from the
data (in contrast to classification).
• It is unsupervised learning: we want to explore the data to find some structure in it.
What is clustering for?
● Grouping items with similar properties together into clusters.
● The clusters can then support decisions based on the data, e.g. for defining “small”, “medium” and “large” T-shirt sizes.
Clustering types:
Some popular Clustering Algorithms
• K-means clustering (disjoint sets)
• EM clustering (probabilistic)
• Cobweb clustering (hierarchical)
K-means: iterative distance-based clustering (disjoint sets)
1. Specify k, the desired number of clusters
2. Choose k points at random as cluster centers
3. Assign all instances to their closest cluster center
4. Calculate the centroid (i.e., mean) of instances in each cluster
5. These centroids are the new cluster centers
6. Repeat steps 3-5 until the cluster centers don’t change
Minimizes the total squared distance from instances to their cluster centers.
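The six steps above can be sketched in plain Python as follows. This is an illustrative implementation of the generic algorithm using squared Euclidean distance, not WEKA's clusterer; the function names, the seed, the iteration cap and the sample points are all assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster a list of numeric tuples into k groups; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                      # step 2: random initial centers
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                                 # step 3: assign to closest center
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = []
        for c, members in enumerate(clusters):           # steps 4-5: centroids become centers
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[c])           # keep the old center if a cluster is empty
        if new_centers == centers:                       # step 6: stop once nothing changes
            break
        centers = new_centers
    return centers, clusters


# Two obvious groups of made-up 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(pts, k=2)
print(sorted(centers))
```

The result can depend on the random starting centers, which is one reason clustering tools let you set a seed.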
K-means in Weka
• Note parameters:
  • numClusters
  • distanceFunction
How can we tell the right number of clusters?
In general, this is an unsolved problem: clustering is subjective
• Use the AddCluster unsupervised attribute filter
• Hard to evaluate clustering
Trying to cluster into seasons
Using K-means clustering with k=4, we wish to see whether the data falls into clusters that correspond to the seasons
Observations
• We found that the winter and summer months separated into two distinct clusters.
• The autumn and spring months did not separate so well.
• From the visualisation we also see the overall trend of more users in the summer months compared to the winter ones.
• This is not surprising, since the summer months are hotter and people are more likely to choose to rent bikes.
Possible Improvements
• Data accuracy
  • Uncontrollable outside factors, e.g. road closures, cycle paths built, tube strikes etc.
  • As popularity increases, this may affect results.
• Data precision
  • Bad measurements and subjective opinions (e.g. weather) are generalised; exact calculations are needed.
  • Variable factors, e.g. “temperature or weather”, differ depending on exact location.
• The data itself is always changing: it is only an indicator of some relationships.
  • Different people, e.g. tourists, may have different attitudes.
  • Different locations yield different results: weather is variable across continents.
Evaluation of best approach
• J48 - easy to visualise
• ZeroR is a bad choice for our dataset
Overall: the best approach is to analyse several different WEKA modules and compare results to focus efforts and find the best solution.
• Graphs of properties: can indicate most important factors to be classified
• Classification algorithms: to build a model
• Testing the model is also crucial.
Conclusions based on data
• Dataset suitability - probably more suited to classification than
clustering
• Some prediction was possible
• External factors - other changes in the transport network, cycling
for health, city events
• Other possible analyses: usage by hour, casual users
• Applications: Smart cities & planning - effective bikeshare provision

Weka bike rental

  • 1. WEKA: A MODERN APPLICATION OF DATA MINING TECHNIQUES SEAN,ROB,PRATIK,RHODRI,AL, VASANTI,MINGHAO
  • 2. What is WEKA? • Desktop application for machine learning & data mining • Open source Java based tool • Offers commonly used algorithms to model data. • University of Waikato, New Zealand
  • 3. What is Data Mining & Machine Learning? • Data Mining : • Searching for patterns in data • Finding value in data • Machine Learning: • Developing models which computational resources can use • Using computational resources to model data to predict a likely outcome.
  • 4. Features of WEKA • Pre-process data • Classification & Clustering • Association rules • 3D visualisation
  • 5. Choosing the Dataset • Public datasets: •data.gov.uk •kaggle.com: such as Titanic dataset •UCI Machine Learning Repository • Dataset which could provide insight to a real world scenario • Would model effectively in WEKA: several properties
  • 6. Capital Bikeshare Picture: Alejandro Castro, flickr, creative commons • Bike-share system in Washington DC and surrounding area • https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
  • 7. The Objective • Investigate factors affecting bike-share usage • Could this data be used to predict how busy or quiet a bike share system may be on a given day?
  • 8. Dataset fields • Record index • Time information • Date, day of the week, whether day is holiday, whether day is working day, month, year, season (1-4 spring/summer/autumn/winter) • Weather information • weather description (separated into four distinct categories, ordered roughly from good to bad) • normalized values for temperature, ‘feels like’ temperature, humidity and windspeed • Totals • counts for bikes rented by registered and casual users • total count for bikes rented that day
  • 9. Pre-processing • Remove fields which don’t help prediction • indexes, sub-totals etc. • Filters • Discretize - categorise numeric values into discrete ranges • ClassBalancer - re-weights instances so that classes are more evenly represented
  • 11. Basic terminology to understand evaluation of classifiers • True positive (tp): An instance is correctly predicted to belong to the given class • True negative (tn): An instance is correctly predicted not to belong to the given class • False positive (fp): An instance is incorrectly predicted to belong to the given class • False negative (fn): An instance is incorrectly predicted not to belong to the given class
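
The four terms above can be counted directly from a list of actual and predicted labels. The following is a minimal sketch in plain Python, not WEKA code; the function name and the "busy"/"quiet" labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count tp, tn, fp, fn for one class (the 'positive' class)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # correctly predicted to belong to the class
        elif p != positive and a != positive:
            tn += 1          # correctly predicted not to belong
        elif p == positive:
            fp += 1          # predicted to belong, but does not
        else:
            fn += 1          # predicted not to belong, but does
    return tp, tn, fp, fn


# Hypothetical labels for five days (illustrative only, not from the dataset).
actual = ["busy", "busy", "quiet", "quiet", "busy"]
predicted = ["busy", "quiet", "quiet", "busy", "busy"]
counts = confusion_counts(actual, predicted, positive="busy")
```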
  • 12. Explanation of Statistics • Precision: • Recall: • F-measure:
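
The formulas on this slide did not survive in the transcript. As a hedged sketch, the standard definitions of the three statistics in terms of tp, fp and fn are given below; the function name and the example counts are assumptions for illustration.

```python
def precision_recall_f_measure(tp, fp, fn):
    """Standard per-class statistics computed from raw counts.

    precision = tp / (tp + fp)   share of predicted members that are correct
    recall    = tp / (tp + fn)   share of actual members that were found
    F-measure = 2 * precision * recall / (precision + recall)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative counts only (not results from the bike-share data).
precision, recall, f_measure = precision_recall_f_measure(tp=40, fp=10, fn=20)
```

The F-measure is the harmonic mean of precision and recall, so it is only high when both are high.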
  • 13. Algorithms explored Graph based: • J48 - This classifier uses a tree structure to make decisions. • Performs very well on our dataset
  • 14. Algorithms explored Rule based: • ZeroR - the simplest classification method: it relies only on the target (always predicting its most frequent value) and ignores all predictors. • Not good for our dataset
  • 15. Algorithms explored Naïve Bayes • This is a probabilistic classifier based on Bayes' theorem which analyses the relationship between features and class labels. • This classifier can handle missing values by ignoring them during calculation of the conditional probabilities.
  • 16. Testset Division Training and Testing set: - Training data is used for building an ML model - Testing data is used for measuring the performance of an ML model • Supplying testing set in Weka • Separate training and testing
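
To show why ZeroR is only useful as a baseline, here is a minimal sketch of the idea in plain Python. It is not WEKA's implementation; the class name, method names and labels are hypothetical.

```python
from collections import Counter


class ZeroRSketch:
    """Always predicts the most frequent class seen during training."""

    def fit(self, labels):
        # Only the target is inspected; predictors are never looked at.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        return [self.majority_] * n_instances


# Hypothetical class labels (illustrative only).
labels = ["busy", "quiet", "busy", "busy", "quiet"]
model = ZeroRSketch().fit(labels)
predictions = model.predict(3)

# Accuracy on the training labels equals the share of the majority class.
accuracy = sum(p == y for p, y in zip(model.predict(len(labels)), labels)) / len(labels)
```

Any classifier that uses the predictors should beat this accuracy; if it does not, the predictors carry little information.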
  • 17. Testset Division Cross Validation: - To overcome the problem of overfitting - Makes the predictions more general • Includes: - Splitting the original dataset into k equal parts (folds) - Setting one fold aside, training on the remaining k-1 folds and measuring performance on the held-out fold - Repeating the process k times, holding out a different fold each time. • 10-fold cross-validation: k = 10
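
The fold procedure above can be sketched as follows. This is a simplified illustration in plain Python that only produces index splits; it does not stratify folds by class, and the function name and seed parameter are assumptions.

```python
import random


def k_fold_indices(n_instances, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]                      # the fold set aside
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 10-fold cross-validation over 100 instances: 90 for training, 10 for testing.
splits = list(k_fold_indices(n_instances=100, k=10))
```

Every instance is used for testing exactly once, and the k performance figures are typically averaged.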
  • 18. Testset Division Percentage split - Randomly splits your dataset into training and testing partitions each time you evaluate a model. Dividing original dataset into testing and training. For example: if we have a dataset of 100 instances and split 66% as training and 34% as test set using percentage split, 66 instances are used for training and 34 for testing.
  • 19. What is Clustering? • Finding the class labels and the number of classes directly from the data (in contrast to classification). • It is unsupervised learning: we want to explore the data to find some structure in it. What is clustering for? ● Grouping items with similar properties together into clusters. ● For example, to apply machine learning approaches to make decisions based on data, e.g. for classifying T-shirts as “small”, “medium” and “large”.
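
The 66%/34% example can be checked with a short sketch. This is plain Python, not WEKA's own splitting code; the function name, the rounding rule and the seed parameter are assumptions.

```python
import random


def percentage_split(instances, train_percent, seed=0):
    """Shuffle the instances, then cut them into training and testing parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# 100 instances with a 66% split give 66 training and 34 testing instances.
train, test = percentage_split(range(100), train_percent=66)
```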
  • 22. Some popular Clustering Algorithms •K- means clustering (disjoint sets) •EM clustering (probabilistic) •Cobweb clustering (hierarchical)
  • 23. KMeans: Iterative distance-based clustering (disjoint sets) 1. Specify k, the desired number of clusters 2. Choose k points at random as cluster centers 3. Assign all instances to their closest cluster center 4. Calculate the centroid (i.e., mean) of instances in each cluster 5. These centroids are the new cluster centers 6. Continue until the cluster centers don’t change Minimizes the total squared distance from instances to their cluster centers.
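
The six steps above can be sketched in plain Python as follows. This is an illustration of the iteration under simple assumptions (squared Euclidean distance, points as tuples, a fixed random seed), not WEKA's SimpleKMeans; all names and the sample points are made up.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iterations=100):
    """Return (centers, clusters) after iterating until centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # step 2: random initial centers
    for _ in range(max_iterations):
        # Step 3: assign every instance to its closest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Steps 4-5: the centroid (mean) of each cluster becomes its new center.
        new_centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:               # step 6: stop when nothing moves
            break
        centers = new_centers
    return centers, clusters


# Two well-separated groups of made-up 2-D points; step 1 is choosing k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(points, k=2)
```

On data this cleanly separated the result does not depend on the starting centers; on real data, different random starts can give different clusterings.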
  • 24. K-means in Weka •Note parameters: • numClusters •distanceFunction How can we tell the right number of clusters? In general, this is an unsolved problem Clustering is subjective
  • 25. •Use the AddCluster unsupervised attribute filter •Hard to evaluate clustering
  • 26. Trying to cluster into seasons Using K-means clustering, with k=4, we wish to see if the data falls into the clusters based on the seasons
  • 27. Observations • We found that winter and summer months have separated into two distinct clusters. • The autumn and spring months have not separated so well. • From the visualisation we also see the overall trend of more users in the summer months compared to winter ones. • This is not surprising since these months are hotter and people are more likely to choose to rent bikes.
  • 28. Possible Improvements • Data accuracy • Uncontrollable outside factors, e.g. road closures, cycle paths built, tube strikes etc. • As popularity increases, results may be affected. • Data precision • Bad measurements and subjective opinions (weather) are generalised; exact calculations are needed. • Variable factors, e.g. “temperature or weather”, differ depending on exact location. • The data itself is always changing: it is only an indicator of some relationships. • Different people, e.g. tourists, may have different attitudes • Different locations yield different results: weather is variable across continents.
  • 29. Evaluation of best approach • J48 - easy to visualise • ZeroR is a bad idea for our dataset Overall: the best approach is to analyse several different WEKA modules and compare results to focus efforts and find the best solution. • Graphs of properties: can indicate the most important factors to be classified • Classification algorithms: to build a model • Testing the model is also crucial.
  • 30. Conclusions based on data • Dataset suitability - probably more suited to classification than clustering • Some prediction was possible • External factors - other changes in the transport network, cycling for health, city events • Other possible analysis: usage by hour, casual users • Applications: Smart cities & planning - effective bikeshare provision