Weka is machine learning software written in Java that can be used for data mining tasks. It contains tools for data pre-processing, classification, regression, clustering, association rules, and feature selection. The document discusses loading data formats into Weka, performing basic preprocessing and classification, attribute selection using filter and wrapper approaches, and filtering datasets using supervised and unsupervised filters. Examples of these techniques are demonstrated.
Weka is a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand. It is a collection of machine learning algorithms for data mining tasks, and it is open-source software issued under the GNU General Public License.
This presentation gives a simple introduction to performing basic data analysis using WEKA. Since it includes guiding screenshots, it should be easy for beginners to follow.
2. Weka
• The software: Waikato Environment for Knowledge Analysis
– Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• The bird: an endemic bird of New Zealand
3. Outline
• ARFF format and loading files into Weka
• Basic preprocessing and classifier Demo
• Attribute selection & Demo
• Filtering datasets & Demo
5. Attribute-Relation File Format (ARFF)
• Two distinct sections
– Header & Data
• Four data types supported
– numeric
– <nominal-specification>
– string
– date [<date-format>]
• E.g.: DATE "yyyy-MM-dd HH:mm:ss"
(http://www.cs.waikato.ac.nz/ml/weka/arff.html)
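To make the two sections concrete, here is an abbreviated ARFF file for the weather dataset that ships with Weka (only the first four of its 14 instances shown), with nominal and numeric attributes declared in the header:

```
% Weather data (the toy dataset distributed with Weka), abbreviated
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
```

Lines starting with `%` are comments; the last attribute (`play`) is treated as the class by default.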
6. Converting Files to ARFF
• Weka has converters for the following file formats:
– Spreadsheet files with extension .csv.
– C4.5’s native file format with extensions .names and .data.
– Serialized instances with extension .bsi.
– LIBSVM format files with extension .libsvm.
– SVM-Light format files with extension .dat.
– XML-based ARFF format files with extension .xrff.
(Witten, Frank & Hall, 2011)
27. Why Feature Selection
• Not all the features contained in the datasets of a classification problem are useful
• Redundant or irrelevant features may even reduce the classification performance
• Eliminating noisy and unnecessary features can
– Improve classification performance
– Make learning and executing processes faster
– Simplify the structure of the learned models
28. Feature Selection
• Two categories of feature selection
– Wrapper approaches:
• Conduct a search for the best feature subset using the learning algorithm itself as part of the evaluation function
• A feature selection algorithm exists as a wrapper around a learning algorithm
– Filter approaches:
• Independent of a learning algorithm
• Argued to be computationally less expensive and more general
• By considering the performance of the selected feature subset on a particular learning algorithm, wrappers can usually achieve better results than filter approaches
30. Filter: one example
• One algorithm that falls into the filter approach: the FOCUS algorithm
– Exhaustively examines all subsets of features, selecting the minimal subset of features that is sufficient to determine the label value for all instances in the training set.
– May introduce the MIN-FEATURES bias.
– For example, in a medical diagnosis task, a set of features describing a patient might include the patient’s social security number (SSN). When FOCUS searches for the minimum set of features, it will pick the SSN as the only feature needed to uniquely determine the label. Given only the SSN, any induction algorithm is expected to generalize very poorly.
(Kohavi & John, 1997)
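The MIN-FEATURES bias is easy to reproduce. Below is a minimal Python sketch of the FOCUS idea (not Weka code): exhaustively search feature subsets, smallest first, for one that determines the label on a tiny hypothetical patient dataset. Because each SSN is unique, the search returns the SSN column alone:

```python
from itertools import combinations

def focus(instances, labels):
    """Sketch of the FOCUS idea: examine feature subsets, smallest
    first, and return the first one sufficient to determine the
    label for every training instance."""
    n = len(instances[0])
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            seen = {}
            consistent = True
            for row, label in zip(instances, labels):
                key = tuple(row[i] for i in subset)
                if seen.setdefault(key, label) != label:
                    consistent = False  # same projection, conflicting labels
                    break
            if consistent:
                return subset
    return tuple(range(n))

# Hypothetical patient data: feature 0 is an SSN (unique per patient),
# feature 1 is the symptom that actually generalizes.
X = [(111, "cough"), (222, "cough"), (333, "fever"), (444, "fever")]
y = ["sick", "sick", "healthy", "healthy"]
print(focus(X, y))  # -> (0,): the SSN alone "determines" every training label
```

The symptom column would also be consistent with the labels, but the exhaustive search encounters the SSN column first, exactly the pathology the slide describes.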
31. Searching Attribute Space
• The size of the search space for n features is 2^n, so it is impractical to search the whole space exhaustively in most situations
• Single Feature Ranking
– A relaxed version of feature selection that only requires computing the relative importance of the features and subsequently sorting them
– Computationally cheap, but the combination of the top-ranked features may be a redundant subset
• Feature Subset Ranking, such as
– Greedy Algorithms
– Genetic Algorithm (GA)
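Single feature ranking fits in a few lines. The sketch below (illustrative, not Weka's implementation) scores each feature independently, using absolute Pearson correlation with the class as one of many possible per-feature merit measures, and sorts best-first:

```python
def rank_features(X, y):
    """Single feature ranking sketch: score each feature independently
    (absolute Pearson correlation with the class) and sort best-first."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        sa = sum((u - ma) ** 2 for u in a) ** 0.5
        sb = sum((v - mb) ** 2 for v in b) ** 0.5
        return cov / (sa * sb) if sa and sb else 0.0
    scores = [(abs(corr([row[j] for row in X], y)), j)
              for j in range(len(X[0]))]
    return sorted(scores, reverse=True)  # (score, feature index) pairs

# Feature 0 tracks the class; feature 1 is noise.
X = [(1.0, 3.0), (2.0, 1.0), (3.0, 4.0), (4.0, 1.0)]
y = [0, 0, 1, 1]
print([j for _, j in rank_features(X, y)])  # -> [0, 1]
```

Note the caveat from the slide: two top-ranked features can each correlate strongly with the class yet be redundant with each other, which per-feature scoring cannot detect.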
32. WEKA Attribute Selection Function
• Two ways to do attribute selection:
– Normally done by searching the space of attribute subsets, evaluating each one (Feature Subset Ranking)
• By combining one attribute subset evaluator and one search method
– A potentially faster but less accurate approach is to evaluate the attributes individually and sort them, discarding attributes that fall below a chosen cutoff point (Single Feature Ranking)
• By using one single-attribute evaluator and the ranking method
33. Two Wrapper Methods in Weka
• ClassifierSubsetEval
– Uses a classifier, specified in the object editor as a parameter, to evaluate sets of attributes on the training data or on a separate holdout set.
• WrapperSubsetEval
– Also uses a classifier to evaluate attribute sets, but employs cross-validation to estimate the accuracy of the learning scheme for each set.
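To make the wrapper idea concrete, here is a small self-contained Python sketch (not Weka's code): a feature subset is scored by the cross-validated accuracy of a learner — a trivial 1-NN classifier stands in for whatever scheme the wrapper is built around — and a greedy forward search uses that score to pick attributes:

```python
def knn_accuracy(X, y, features, folds=3):
    """Wrapper-style subset evaluation: cross-validated accuracy of a
    tiny 1-NN learner restricted to the given features. Folds are taken
    round-robin; production code would shuffle the instances first."""
    if not features:
        return 0.0
    idx = list(range(len(X)))
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        train = [i for i in idx if i not in test]
        for i in test:
            # predict with the nearest training instance (1-NN)
            nearest = min(train, key=lambda j: sum(
                (X[i][k] - X[j][k]) ** 2 for k in features))
            correct += (y[nearest] == y[i])
    return correct / len(X)

def forward_select(X, y):
    """Greedy forward search over attribute subsets, scored by the
    wrapper evaluation above; stops when no addition improves it."""
    remaining, chosen, best = set(range(len(X[0]))), [], 0.0
    while remaining:
        score, feat = max((knn_accuracy(X, y, chosen + [f]), f)
                          for f in remaining)
        if score <= best:
            break
        chosen.append(feat)
        remaining.discard(feat)
        best = score
    return chosen, best

# Feature 0 separates the classes; feature 1 is pure noise.
X = [(0.0, 1.0), (0.1, 2.0), (0.2, 3.0), (1.0, 1.0), (1.1, 2.0), (1.2, 3.0)]
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y))  # -> ([0], 1.0)
```

This is also why wrappers are computationally expensive: every candidate subset triggers a full cross-validation of the learner, whereas a filter scores features without ever training it.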
49. Filtering Algorithms
• There are two kinds of filters
– Supervised: take advantage of the class information. A class must be assigned; the default behavior uses the last attribute as the class.
– Unsupervised: the class is not taken into consideration.
• Both unsupervised and supervised filters have
– Attribute filters, which work on the attributes in the datasets, and
– Instance filters, which work on the instances
50. Unsupervised Attribute Filters
• Including operations for
– Adding and removing attributes
– Changing values
– Converting attributes from one form to another
– Converting multi-instance data into single-instance format
– Working with time series data
– Randomizing
51. (Witten, Frank & Hall, 2011)
This one will be used in the Demo.
68. Set the “attributeIndex” to 2 (the “temperature” attribute) and the “nominalIndices” to 1 (which means to remove all the instances with label (-inf-68.2]).
71. Then when you do the classification, it will be based on the filtered datasets, as shown here.
72. Resources
• Weka official website: http://www.cs.waikato.ac.nz/ml/weka/
• Two Weka tutorials on YouTube:
– https://www.youtube.com/user/WekaMOOC
– https://www.youtube.com/user/rushdishams/videos
• Book: Data Mining: Practical Machine Learning Tools and Techniques. Please refer to http://www.cs.waikato.ac.nz/ml/weka/book.html for more details.
73. References
• Frank, E. Machine Learning with WEKA. Retrieved April 05, 2014, from http://www.cs.waikato.ac.nz/ml/weka/documentation.html
• Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 315–333.
• Reservoir sampling. Retrieved April 05, 2014, from http://en.wikipedia.org/wiki/Reservoir_sampling
• Witten, I. H., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques (Third Edition). Morgan Kaufmann.
• Xue, B., Zhang, M., & Browne, W. N. (2012). Single feature ranking and binary particle swarm optimisation based feature subset ranking for feature selection. In Proceedings of the Thirty-fifth Australasian Computer Science Conference, Volume 122, Melbourne, Australia.