WEKA- TOOL FOR DATA MINING




                       SUBMITTED BY :
                       DIVYA HAMIRWASIA
                       10BM60025
INTRODUCTION
Waikato Environment for Knowledge Analysis (WEKA) is a free, open-source data mining tool.
Data mining is the transformation of large amounts of data into meaningful patterns and
rules, the results of which can be used to make important business decisions. The ultimate goal
of data mining is to create a model that improves the way you read and interpret both your
existing data and your future data. WEKA was developed at the University of Waikato (New
Zealand) and was first implemented in its modern form in 1997. It is distributed under the GNU
General Public License (GPL). The software is written in Java and includes a GUI for
interacting with data files and producing visual results.

Advantages of WEKA include:

       free availability under the GNU General Public License
       portability, since it is fully implemented in the Java programming language and thus runs
       on almost any modern computing platform
       a comprehensive collection of data preprocessing and modeling techniques
       ease of use due to its graphical user interfaces

REGRESSION ANALYSIS
Linear regression is an approach to modeling the relationship between a scalar dependent
variable y and one or more explanatory variables denoted X.
The data set used here is meant to establish a relationship between the number of people
employed and:
       the percentage price deflation
       the GNP in millions of dollars
       the number of unemployed in thousands
       the number of people employed by the military
       the number of people over 14
       the year
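
Written out, a linear model of this kind has the general form below. The symbols \beta_i (coefficients) and \varepsilon (error term) are notation introduced here for illustration and do not appear in the original report; x_1 through x_6 stand for the six explanatory variables listed above, and y for the number of people employed.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_6 x_6 + \varepsilon
```

Fitting the model means estimating the coefficients \beta_0, ..., \beta_6 from the data.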

REGRESSION IN WEKA:
Load the data from the Preprocess tab: click Open file, browse to the target folder, and
select the required .arff file. After selecting the file, your WEKA Explorer should look
similar to the screenshot:
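
For readers unfamiliar with the .arff format, the sketch below shows roughly what such a file looks like and how its header can be read. The relation name, attribute names, and data row are hypothetical placeholders, not the contents of the file used in this report, and the parser is a minimal illustration that handles only simple, unquoted numeric attributes rather than the full ARFF format.

```python
# Minimal sketch of the ARFF layout: a header declaring attributes, then data rows.
# All names and numbers below are hypothetical placeholders.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation employment_example
@attribute deflator numeric
@attribute unemployed numeric
@attribute employed numeric
@data
100.0,200.0,60000.0
"""


def parse_arff(text):
    """Return (relation, attribute_names, rows) for a simple numeric ARFF file."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([float(value) for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, attributes, rows)
```

WEKA reads this format directly, so no parsing is needed when working in the Explorer; the sketch is only meant to show how the header and data sections fit together.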
To create the model, click on the Classify tab. The first step is to select the model we want to
build, so WEKA knows how to work with the data, and how to create the appropriate model:
        Click the Choose button, then expand the functions branch.
        Select the LinearRegression leaf.
Now that the desired model has been chosen, we have to tell WEKA which data it should use to
build and evaluate the model. Though it may be obvious to us that we want to use the data we
supplied in the ARFF file, the Test options panel offers several choices, some more advanced
than what we need here. Besides Use training set, the other three choices are Supplied test set,
where the model is evaluated on a separate data set that you supply; Cross-validation, where
WEKA splits the supplied data into folds, repeatedly trains on all folds but one, tests on the
held-out fold, and averages the evaluation results; and Percentage split, where WEKA trains on
a given percentage of the supplied data and tests on the remainder. These other choices are
more useful with other kinds of models. With regression, we can simply choose Use training
set. This tells WEKA to build our desired model from the data set we supplied in our ARFF
file and to evaluate it on that same data.
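
To make the Cross-validation option more concrete, here is a minimal sketch of how a data set can be partitioned into k folds, each fold serving once as the test set. The function name `k_fold_indices` is hypothetical, and the sketch deliberately omits shuffling; it illustrates the general technique and is not WEKA's own implementation.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    # Distribute rows as evenly as possible: the first n_rows % k folds get one extra row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every row appears in exactly one test fold, so each row is used for evaluation once and for training in the remaining rounds.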
Select number of people employed as the dependent variable, then click Start. The result is as
follows:
The resulting regression equation is:
Number of people employed =
        206.3701 * percentage price deflation
      - 1.2427 * number of people unemployed
      - 0.5971 * number of people employed by the military
      + 0.3079 * number of people over 14
      + 13699.5644
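
The equation can be applied to a new row of data by multiplying each input by its coefficient and adding the intercept. The sketch below does this with the coefficients reported above; the variable names and the input values are hypothetical, chosen only to make the arithmetic easy to follow, and the units of each input are assumed to match those of the original data set.

```python
# Coefficients copied from the regression result reported above.
COEFFICIENTS = {
    "price_deflation": 206.3701,
    "unemployed": -1.2427,
    "military": -0.5971,
    "over_14": 0.3079,
}
INTERCEPT = 13699.5644


def predict_employed(row):
    """Apply the fitted linear equation to one row of inputs."""
    return INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)


# Hypothetical inputs, not taken from the data set used in this report.
example = {"price_deflation": 1.0, "unemployed": 10.0, "military": 10.0, "over_14": 100.0}
prediction = predict_employed(example)
print(round(prediction, 4))
```

The signs of the coefficients carry the interpretation: in this model, higher unemployment and larger military employment are associated with a lower predicted number of people employed.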


CLUSTER ANALYSIS
Clustering groups the rows of a data set so that patterns in the data can be identified. It is
most useful when the data set is well defined and a general pattern needs to be found in it.
The data set used for this clustering example comes from a BMW dealership. The dealership
has kept track of how people walk through the dealership and the showroom, which cars they
look at, and how often they ultimately make purchases. It hopes to mine this data for patterns,
using clusters to determine whether certain customer behaviors emerge. There are 100 rows of
data in this sample.

CLUSTERING IN WEKA
Load the data into WEKA from the bmw.arff file. To do so, click on the Preprocess tab and then
click the Open file button. Select the target folder and then the needed file. Once the file is
opened, all the attributes will be listed as follows:
Next, click on the Cluster tab. Click Choose and select SimpleKMeans from the choices
that appear.




Finally, we adjust the settings of the clustering algorithm by clicking SimpleKMeans.
The only setting that needs to be changed is the numClusters field, which determines
how many clusters are created. We set this value to 5 here.
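
For intuition about what a k-means clusterer does with the requested number of clusters, the sketch below implements the basic algorithm: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat. This is a generic illustration, not WEKA's SimpleKMeans implementation; the initialization from the first k rows is a simplifying assumption, and the 0/1 rows are hypothetical rather than taken from bmw.arff.

```python
import math


def k_means(points, k, iterations=20):
    """Basic k-means (Lloyd's algorithm); returns (centroids, assignments)."""
    # Simplifying assumption: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical 0/1 records; the columns are illustrative only.
data = [(1, 1, 0), (0, 0, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]
centroids, assignments = k_means(data, k=2)
print(centroids, assignments)
```

With 0/1 attributes, each centroid coordinate can be read as the fraction of the cluster's rows that have that attribute set, which is the kind of reading used in the interpretation of the clusters.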
Click Start to run the clustering. The result is as follows:
INTERPRETATION OF THE RESULT:
Each cluster shows us a type of behavior in our customers, from which we can begin to draw
some conclusions:
        Cluster 0: The people in this group appear to wander around the dealership, looking
        at cars parked outside on the lots, but they trail off when it comes to entering the
        dealership and, worst of all, they don't purchase anything.
        Cluster 1: This group tends to walk straight to the M5s, ignoring the 3-series
        cars and the Z4. However, their purchase rate is not high, only 52 percent. This
        is a potential problem and could be a focus for improvement for the dealership, perhaps
        by sending more salespeople to the M5 section.
        Cluster 2: This group isn't statistically relevant, and we can't draw any good
        conclusions from its behavior. (This happens sometimes with clusters and may indicate
        that you should reduce the number of clusters you've created.)
        Cluster 3: This group always ends up purchasing a car and always ends up financing it.
        Here the data shows us some interesting things: these customers appear to walk around
        the lot looking at cars, then turn to the computer search available at the dealership.
        Ultimately, they tend to buy M5s or Z4s (but never 3-series). This cluster tells the
        dealership that it should consider making its search computers more prominent around
        the lots (outdoor search computers?), and perhaps making the M5 or Z4 much more
        prominent in the search results. Once these customers have decided to purchase a
        vehicle, they always qualify for financing and complete the purchase.
        Cluster 4: This group always looks at the 3-series and never looks at the much more
        expensive M5. They walk right into the showroom, choosing not to walk around the lot,
        and tend to ignore the computer search terminals. While 50 percent get to the financing
        stage, only 32 percent ultimately finish the transaction. The dealership could draw the
        conclusion that these customers, looking to buy their first BMWs, know exactly what kind
        of car they want (the 3-series entry-level model) and are hoping to qualify for
        financing to be able to afford it. The dealership could possibly increase sales to this
        group by relaxing its financing standards or by reducing the 3-series prices.

Another interesting way to examine the data in these clusters is to inspect it visually. To do
this, right-click on the Result List section of the Cluster tab and then click Visualize Cluster
Assignments. Change the X axis to M5 (Num), the Y axis to Purchase (Num), and the Color
to Cluster (Nom). This shows in a chart how the clusters are grouped in terms of who
looked at the M5 and who purchased one. At the point X=1, Y=1 (those who looked at M5s
and made a purchase), the only clusters represented are 1 and 3. The only clusters at the point
X=0, Y=0 are 4 and 0. This matches the results above: clusters 1 and 3 were buying the M5s,
while cluster 0 wasn't buying anything, and cluster 4 was only looking at the 3-series. The
figure shows the visual cluster layout for our example.
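
The same check can be made without a chart by listing which clusters occur at each (M5, Purchase) point. The sketch below does this for a handful of rows; the rows are hypothetical, made up so that they agree with the pattern described above, and are not taken from bmw.arff. The function name `clusters_by_point` is likewise an illustration only.

```python
from collections import defaultdict


def clusters_by_point(rows):
    """Map each (m5, purchase) pair to the sorted cluster labels seen at that point."""
    seen = defaultdict(set)
    for m5, purchase, cluster in rows:
        seen[(m5, purchase)].add(cluster)
    return {point: sorted(labels) for point, labels in seen.items()}


# Hypothetical (m5, purchase, cluster) rows, not real data from the dealership file.
rows = [(1, 1, 1), (1, 1, 3), (1, 1, 3), (0, 0, 0), (0, 0, 4), (1, 0, 1), (0, 1, 4)]
table = clusters_by_point(rows)
print(table)
```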