Data mining technique using WEKA

IT for Business Intelligence

Submitted By: Siddharth Verma, 10BM60086

WEKA
Data mining isn't solely the domain of big companies and expensive software. In fact, there is a
free piece of software that covers much of the same ground as the expensive commercial packages:
WEKA. WEKA is a product of the University of Waikato (New Zealand) and was first implemented in
its modern form in 1997. It is released under the GNU General Public License (GPL). The software
is written in the Java™ language and contains a GUI for interacting with data files and producing
visual results (think tables and curves). Because it is Java-based, if we don't have a JRE
installed on our computer, we should download the WEKA version that bundles the JRE.

To load data into WEKA, we have to put it into a format that it understands. WEKA's preferred
method for loading data is the Attribute-Relation File Format (ARFF), where we first define the
type of data being loaded and then supply the data itself.
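
To make the ARFF layout concrete, here is a minimal sketch that writes a small ARFF text from Python. The helper name to_arff, the relation name, the reduced attribute list and the two data rows are all hypothetical placeholders chosen for illustration; they are not taken from the actual Colleges file.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, type) pairs, where type is "numeric",
    "string", or a list of the allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    # Header: one @attribute line per column, declaring its type.
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ",".join(kind) + "}"
        lines.append("@attribute " + name + " " + kind)
    # Data section: one comma-separated line per instance.
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical attribute subset and placeholder rows, for illustration only.
attributes = [
    ("School_Type", ["LibArts", "Univ"]),
    ("SAT", "numeric"),
    ("Acceptance", "numeric"),
]
rows = [
    ["LibArts", 1300, 25],
    ["Univ", 1250, 40],
]
arff_text = to_arff("colleges", attributes, rows)
print(arff_text)
```

The header declares each column and its type, and everything after @data is one instance per line. Attribute names that contain spaces or special characters would need quoting, which this sketch does not handle.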

When we start WEKA, the GUI Chooser pops up as shown in the figure.

It offers four ways to work with WEKA and our data:

       Explorer
       Experimenter
       Knowledge Flow
       Simple CLI



REGRESSION
Regression is the easiest technique to use, but is also probably the least powerful. In effect,
regression models all fit the same general pattern. There are a number of independent
variables, which, when taken together, produce a result — a dependent variable. The
regression model is then used to predict the result of an unknown dependent variable, given
the values of the independent variables.
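
As a minimal sketch of this general pattern (not WEKA's own implementation), the following fits a linear model by ordinary least squares and then predicts a dependent value from new independent values. The function names and the synthetic data, generated from y = 1 + 2*x1 + 3*x2, are assumptions made purely for illustration.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    n = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefficients, x):
    """Apply intercept + sum(b_i * x_i) to one row of independent values."""
    return coefficients[0] + sum(b * v for b, v in zip(coefficients[1:], x))


# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefficients = fit_linear_regression(X, y)
print([round(c, 6) for c in coefficients])
print(round(predict(coefficients, [3, 2]), 6))
```

Because the synthetic data lie exactly on a plane, the fitted coefficients recover the generating values up to floating-point error. Real data would leave residual error.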

We will perform regression on the Colleges data, comparing colleges on the basis of the
following attributes:

   School: Name of each school
   School_Type: Coded 'LibArts' for liberal arts and 'Univ' for university
   SAT: Median combined Math and Verbal SAT score of students
   Acceptance: % of applicants accepted
   $/Student: Money spent per student in dollars
   Top 10%: % of students in the top 10% of their high school graduating class
   %PhD: % of faculty at the institution who have PhD degrees
   Grad%: % of students at the institution who eventually graduate

To create our regression model, start WEKA, then choose the Explorer. In the Explorer screen,
select the Preprocess tab. Click the Open file button and select the ARFF file. Once the file is
loaded, the Explorer window looks as shown below.




The left section of the Explorer window lists all of the columns in the data (Attributes) and
shows the number of rows of data supplied (Instances). Selecting a column makes the right
section of the Explorer window show information about the data in that column. For example,
selecting the SAT column in the left section changes the right section to show statistical
information about that column. Finally, clicking the Visualize All button gives a visual way of
examining the data.
To create the model, click on the Classify tab. The first step is to select the model we want to
build, so WEKA knows how to work with the data, and how to create the appropriate model:
    1. Click the Choose button, then expand the functions branch.
    2. Select the LinearRegression leaf.




This tells WEKA that we want to build a regression model.
Next we choose a test option, which tells WEKA how to evaluate the model. Though it may seem
obvious that we want to use the data we supplied in the ARFF file, there are three other
choices: Supplied test set, where we supply a separate set of data on which to test the model;
Cross-validation, where WEKA repeatedly builds a model on all but one subset (fold) of the
supplied data, tests it on the held-out fold, and averages the results; and Percentage split,
where WEKA builds the model on a percentage of the supplied data and tests it on the remainder.
For this regression, we can simply choose Use training set.
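
The way Percentage split and Cross-validation divide the supplied instances can be sketched as follows. This is an illustrative sketch of the general techniques, not WEKA's code; the function names, the seed and the example sizes are assumptions.

```python
import random


def percentage_split(n_instances, train_percent, seed=1):
    """Shuffle instance indices, then cut them into train and test parts."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = round(n_instances * train_percent / 100)
    return indices[:cut], indices[cut:]


def cross_validation_folds(n_instances, k, seed=1):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical sizes: 50 instances with a 66% split, 10 instances with 5 folds.
train_part, test_part = percentage_split(50, 66)
print(len(train_part), len(test_part))
for train_fold, test_fold in cross_validation_folds(10, 5):
    print(sorted(test_fold))
```

In both schemes the instances used for testing are kept out of the data the model is trained on, which is what separates them from Use training set.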




Next, in the LinearRegression options we choose no attribute selection method, so that the
contribution of each attribute to the regression is reported.
The last step in creating our model is to choose the dependent variable. Right below the test
options there is a combo box for this; each numeric attribute can be chosen in turn. Here we
choose SAT as the dependent variable. To create our model, click Start. The figure below shows
the output window.




INTERPRETATION OF THE RESULT:

SAT = …….. + (30.6632 * School) + (-1.245 * School type) + (0.0609 * Acceptance)
      + (0.0341 * Top) + (0.064 * 10%) + (0.1479 * PHD%) - 1089.2569

Interpreting the pattern our model generated, we can draw the following conclusions:
     SAT is most closely tied to School: School has the largest coefficient shown in the model
        (30.6632), so SAT score and choice of school are the most strongly linked.
     School type does not matter much: School_Type takes one of two values, so its coefficient
        (-1.245) is the difference in predicted SAT between the two school types, which is
        small on the scale of SAT scores.
     SAT score shows no relation with money spent per student: $/Student does not appear in
        the equation. WEKA leaves out columns that do not help in creating a good model, so
        this regression model is telling us that money spent per student does not help predict
        SAT score.
CLUSTERING
Clustering is the task of assigning a set of objects into groups (called clusters) so that the
objects in the same cluster are more similar (in some sense or another) to each other than to
those in other clusters. WEKA offers clustering capabilities not only as standalone schemes, but
also as filters and classifiers.

To begin with clustering, we will use the Places data set. It rates places on the nine criteria
used by the Places Rated Almanac:

Climate & Terrain, Housing, Health Care & Environment, Crime, Transportation, Education, The
Arts, Recreation and Economics

To create clustering, start WEKA, then choose the Explorer. In the Explorer screen, select
the Preprocess tab. Click the Open file button and select the ARFF file. Once the file is loaded,
the Explorer window looks as shown below.




To perform clustering, select the "Cluster" tab in the Explorer and click the "Choose" button.
This brings up a drop-down list of the available clustering algorithms; here we select
"SimpleKMeans". Next, click the text box to the right of the "Choose" button to open the pop-up
window shown in the figure below, where the clustering parameters can be edited.
In the pop-up window we enter 5 as the number of clusters (instead of the default value of 2)
and leave the value of "seed" as is. The seed is used to generate a random number, which in
turn is used for making the initial assignment of instances to clusters.
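
To show how the number of clusters and the seed enter the algorithm, here is a minimal sketch of plain k-means. It is not WEKA's SimpleKMeans implementation: the function names, the choice of starting centroids by random sampling, the seed value and the toy points are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=10, max_iterations=100):
    """Return (centroids, assignments) for a plain k-means run."""
    rng = random.Random(seed)
    # The seed fixes which instances serve as the starting centroids (assumption).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Toy data: two well-separated groups of hypothetical 2-D points.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

Running the sketch with a different seed changes the starting centroids. On well-separated data like this the final grouping is the same, but on real data different seeds can lead to different clusters.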




Once the options have been specified, we can run the clustering algorithm. Here we make sure
that in the "Cluster Mode" panel, the "Use training set" option is selected, and we click "Start".
We can right click the result set in the "Result list" panel and view the results of clustering in a
separate window.

We can choose the cluster number and any of the other attributes for each of the three
different dimensions available (x-axis, y-axis, and color).
Different combinations of choices will result in a visual rendering of different relationships
within each cluster.

INTERPRETING THE RESULT:

Each cluster shows us a type of place, from which we can begin to draw some conclusions:

Cluster 0: Places in this cluster have the best transportation and also excel in education. They are
rich in arts and recreation as well.

Cluster 1: These places have the least crime. Relatively low utility bills, property taxes and mortgage
payments make them favorable places to live.

Cluster 2: These places have the highest violent crime and property crime rates, and their residents pay
the highest utility bills, property taxes and mortgage payments. However, they have the best climate and
terrain and the best health care and environment. Transport facilities are also good, and the places are
rich in arts and recreation, with a variety of top-rated venues.

Cluster 3: These places have the lowest utility bills, property taxes and mortgage payments, which makes
them favorable. Transport facilities are in bad shape, and there are few or no avenues for arts and
recreation. They have the lowest average household income of all the clusters.

Cluster 4: These places have a high crime rate. Health care and environment are the worst among all the
clusters, and so are education facilities. They have the highest average household income of all.

Weka term paper(siddharth 10 bm60086)

  • 1. Data mining technique using WEKA IT for Business Intelligence Submitted By:- Siddharth Verma 10BM60086
  • 2. WEKA Data mining isn't solely the domain of big companies and expensive software. In fact, there's a piece of software that does almost all the same things as these expensive pieces of software — the software is called WEKA. WEKA is the product of the University of Waikato (New Zealand) and was first implemented in its modern form in 1997. It uses the GNU General Public License (GPL). The software is written in the Java™ language and contains a GUI for interacting with data files and producing visual results (think tables and curves). It's Java-based, so if we don't have a JRE installed on your computer, download the WEKA version that contains the JRE, as well.To load data into WEKA, we have to put it into a format that will be understood. WEKA's preferred method for loading data is in the Attribute-Relation File Format (ARFF), where we can define the type of data being loaded, then supply the data itself. When we start WEKA, the GUI chooser pops up as shown in figure It lets us choose four ways to work with WEKA and our data. The four ways are  Explorer  Experimenter  Knowledge Flow  Simple CLI REGRESSION Regression is the easiest technique to use, but is also probably the least powerful. In effect, regression models all fit the same general pattern. There are a number of independent variables, which, when taken together, produce a result — a dependent variable. The regression model is then used to predict the result of an unknown dependent variable, given the values of the independent variables. We will perform Regression on the Colleges data comparing them on basis of various attributes. Various attributes are:-  School: Contains the name of each school
  • 3. School_Type: coded 'LibArts' for liberal arts and 'Univ' for university. SAT: median combined Math and Verbal SAT score of students. Acceptance: % of applicants accepted. $/Student: money spent per student in dollars. Top 10%: % of students in the top 10% of their high school graduating class. %PhD: % of faculty at the institution who have PhD degrees. Grad%: % of students at the institution who eventually graduate. To create our regression model, start WEKA, then choose the Explorer. In the Explorer screen, select the Preprocess tab. Select the Open File button and select the ARFF file. After selecting the file, the Explorer window looks as shown below. The left section of the Explorer window outlines all of the columns in the data (Attributes) and the number of rows of data supplied (Instances). When we select a column, the right section of the Explorer window gives information about the data in that column of the data set. For example, when we select the SAT column in the left section, the right section changes to show additional statistical information about that column. Finally, there's a visual way of examining the data, which can be viewed by clicking the Visualize All button.
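To make the ARFF idea concrete, here is a minimal Python sketch of how the attributes listed above might be declared in an ARFF file. The relation name `colleges`, the helper names, the chosen attribute types, and the single data row are assumptions made for illustration; they are not taken from the actual Colleges file. Names containing spaces or symbols such as `%` are quoted, since `%` otherwise starts a comment in ARFF.

```python
def quote(token):
    """Quote an ARFF token unless it is purely letters, digits, or underscores."""
    text = str(token)
    if text.replace("_", "").isalnum():
        return text
    return "'" + text.replace("'", "\\'") + "'"


def format_value(value):
    """Write numbers as-is and quote other values where needed."""
    if isinstance(value, (int, float)):
        return str(value)
    return quote(value)


def to_arff(relation, attributes, rows):
    """Build ARFF text from (name, type) pairs and a list of data rows."""
    lines = ["@relation " + quote(relation), ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # A list of allowed values becomes a nominal attribute.
            kind = "{" + ",".join(quote(v) for v in kind) + "}"
        lines.append("@attribute " + quote(name) + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(format_value(v) for v in row))
    return "\n".join(lines) + "\n"


# Attribute names follow the list above; the types are assumptions.
COLLEGE_ATTRIBUTES = [
    ("School", "string"),
    ("School_Type", ["LibArts", "Univ"]),
    ("SAT", "numeric"),
    ("Acceptance", "numeric"),
    ("$/Student", "numeric"),
    ("Top 10%", "numeric"),
    ("%PhD", "numeric"),
    ("Grad%", "numeric"),
]

# Made-up placeholder row, NOT a record from the real data set.
PLACEHOLDER_ROWS = [["Example College", "LibArts", 1200, 40, 20000, 50, 80, 85]]

arff_text = to_arff("colleges", COLLEGE_ATTRIBUTES, PLACEHOLDER_ROWS)
print(arff_text)
```

The header section declares each column and its type, and everything after `@data` is one comma-separated row per instance.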
  • 4. To create the model, click on the Classify tab. The first step is to select the model we want to build, so WEKA knows how to work with the data, and how to create the appropriate model: 1. Click the Choose button, then expand the functions branch. 2. Select the LinearRegression leaf. This tells WEKA that we want to build a regression model.
  • 5. Though it may be obvious to us that we want to use the data we supplied in the ARFF file, WEKA offers other test options besides the one we'll be using. The other three choices are Supplied test set, where we can supply a different set of data to test the model; Cross-validation, which lets WEKA build models on subsets of the supplied data and average their results; and Percentage split, where WEKA sets aside a percentage of the supplied data for testing the model. With regression, we can simply choose Use training set. We also choose no attribute selection method, so that each attribute's contribution to the regression is reported. The last step in creating our model is to choose the dependent variable; we try the numerical attributes one by one.
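As an illustration of how a percentage split and cross-validation partition the supplied data, here is a minimal Python sketch. The function names and the seed are hypothetical, and the shuffling does not reproduce WEKA's own randomisation; the sketch only shows which instances end up in the training and test parts.

```python
import random


def percentage_split(instances, train_percent, seed=1):
    """Shuffle the instances, then cut them into a training and a test part."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


def cross_validation_folds(instances, k, seed=1):
    """Yield k (train, test) pairs; every instance is tested exactly once."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Instance indices stand in for rows of the data set.
train, test = percentage_split(range(50), 66)
print(len(train), len(test))

for fold_train, fold_test in cross_validation_folds(range(10), 5):
    print(sorted(fold_test))
```

With Use training set, by contrast, the same rows are used both to build and to evaluate the model, so no partition is needed.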
  • 6. Right below the test options, there's a combo box that lets us choose the dependent variable. We choose SAT as the dependent variable. To create our model, click Start. The figure below shows the output window. INTERPRETATION OF THE RESULT: SAT = ……..+ (30.6632 * School) + (-1.245 * School type) + (0.0609 * Acceptance) + (0.0341 * Top) + (0.064 * 10%) + (0.1479 * PHD%) - 1089.2569 Interpreting the patterns and conclusions that our model generated, we see the following. SAT affects choice of school: WEKA tells us that SAT score affects the choice of school the most. School type does not matter: since school type is a simple two-valued attribute, we can use its coefficient from the regression model to determine its effect on the SAT score. SAT score has no correlation with money spent per student: WEKA will only use columns that statistically contribute to the accuracy of the model. It will throw out and ignore columns that don't help in creating a good model. So this regression model is telling us that money spent per student has no relationship with SAT score.
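To show how a regression equation of this shape is fitted and then used for prediction, here is a minimal Python sketch. It is an ordinary least-squares fit with a single independent variable, which is a simplification of what WEKA's LinearRegression does with many attributes. The function names and the toy numbers are assumptions for illustration and are not drawn from the Colleges data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(coefficients, intercept, values):
    """Evaluate intercept + sum(coefficient * value) over the named attributes."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


# Made-up toy data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = predict({"x": slope}, intercept, {"x": 5})
print(slope, intercept, prediction)
```

Each coefficient states how much the predicted value changes when that attribute increases by one unit while the others stay fixed, which is how the coefficients in the WEKA output are read.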
  • 7. CLUSTERING Clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. WEKA offers clustering capabilities not only as standalone schemes, but also as filters and classifiers. To begin with clustering, we will use the Places data set. The data set rates places on the nine criteria used by the Places Rated Almanac: Climate & Terrain, Housing, Health Care & Environment, Crime, Transportation, Education, The Arts, Recreation, and Economics. To create the clustering, start WEKA, then choose the Explorer. In the Explorer screen, select the Preprocess tab. Select the Open File button and select the ARFF file. After selecting the file, the Explorer window looks as shown below. To perform clustering, select the "Cluster" tab in the Explorer and click on the "Choose" button. This results in a drop-down list of available clustering algorithms. In this case, select "SimpleKMeans". Next, click on the text box to the right of the "Choose" button to get the pop-up window shown in the figure below, for editing the clustering parameters.
  • 8. In the pop-up window, we enter 5 as the number of clusters (instead of the default value of 2) and we leave the value of "seed" as is. The seed value is used in generating a random number which is, in turn, used for making the initial assignment of instances to clusters. Once the options have been specified, we can run the clustering algorithm. Here we make sure that in the "Cluster Mode" panel, the "Use training set" option is selected, and we click "Start". We can right-click the result set in the "Result list" panel and view the results of clustering in a separate window. We can choose the cluster number and any of the other attributes for each of the three different dimensions available (x-axis, y-axis, and color).
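The roles of the number of clusters and the seed can be seen in a minimal Python sketch of k-means. This is a plain textbook version written for illustration, not WEKA's SimpleKMeans implementation; the function names, the seed value, and the six toy points are assumptions. The seed only decides which instances serve as the initial centroids.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=10, max_iterations=100):
    """Return (centroids, assignments) after iterating until assignments settle."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seeded choice of starting centroids
    assignments = None
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Made-up toy points forming two well-separated groups.
TOY_POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, assignments = k_means(TOY_POINTS, 2, seed=10)
print(centroids)
print(assignments)
```

Running it again with the same seed gives the same result, which is why leaving the seed unchanged makes a clustering run repeatable.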
  • 9. Different combinations of choices will result in a visual rendering of different relationships within each cluster. INTERPRETING THE RESULT: Each cluster shows us a type of place, from which we can begin to draw some conclusions. Cluster 0: Transportation facilities are the best in this place, and it excels in education as well. It is also rich in arts and recreation. Cluster 1: This place has the least crime. Relatively low utility bills, property taxes, and mortgage payments make it a favorable place to live in. Cluster 2: This place has the highest violent crime rate and property crime rate. People in this place have to pay the highest utility bills, property taxes, and mortgage payments, but it has the best climate and terrain and the best health care and environment too. Transport facilities are also good. It is rich in arts and recreation, with various venues that are the best in their categories. Cluster 3: This place has the lowest utility bills, property taxes, and mortgage payments and is therefore favorable. Transport facilities are in bad shape. Few or no avenues for arts and recreation are available. It has the lowest average household income among all. Cluster 4: This place has a high crime rate. Health care and environment are the worst among all places, and so are education facilities. It has the highest average household income among all.