Business Rule Learning with Interactive Selection of Association Rules - RuleML 2014 challenge

Stanislav Vojíř

Presentation of a RuleML2014 Challenge paper

Business Rule Learning
with Interactive Selection
of Association Rules
Stanislav Vojíř, Přemysl Václav Duben and Tomáš Kliegr
Department of Information and Knowledge Engineering
University of Economics, Prague

Relevant paper
Learning Business Rules
with Association Rule Classifiers
Tomáš Kliegr, Jaroslav Kuchař, Davide Sottara, Stanislav Vojíř

Motivation
• 2 possible scenarios
  • Automatic model creation
    • Data mining of rules (without support)
    • Rule pruning
  • User-managed model creation
    • Selection of rules gained from data mining
    • Manual rule input

Rule base preparation
1. Data preparation
2. Association rule mining, rule selection
3. Classification model testing
4. Ruleset editing
5. Classification model testing

Data preparation
• Data set for data mining (CSV file, MySQL source)
  • Import configuration (encoding, separators, primary key)
  • (Training and testing dataset)
• Preprocessing
  • Columns in data set => attributes for data mining
  • Numerical columns => intervals, bins of values
  • Categorical columns => bins of values
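
The preprocessing of a numerical column into intervals can be sketched in Python. This is a minimal illustration, not EasyMiner's implementation: the function names `equidistant_bins` and `to_interval`, the choice of equal-width intervals, and the sample `age` values are all assumptions.

```python
from typing import List, Tuple

def equidistant_bins(values: List[float], n_bins: int) -> List[Tuple[float, float]]:
    """Split the observed value range into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]

def to_interval(value: float, bins: List[Tuple[float, float]]) -> str:
    """Map a numeric value to the label of the interval that contains it."""
    for i, (lo, hi) in enumerate(bins):
        last = i == len(bins) - 1
        # Intervals are half-open [lo, hi); the last one also includes hi
        # so that the maximum value is covered.
        if lo <= value < hi or (last and value == hi):
            close = "]" if last else ")"
            return f"[{lo:g};{hi:g}{close}"
    raise ValueError(f"{value} lies outside the binned range")

# Hypothetical values of the "age" column from the imported data set.
ages = [20, 25, 30, 41, 58, 60]
age_bins = equidistant_bins(ages, 4)
# The new attribute used for data mining holds interval labels, not numbers.
age_attribute = [to_interval(a, age_bins) for a in ages]
```

Categorical columns can be handled the same way by mapping groups of values to one bin label instead of computing interval boundaries.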

Data mining of association rules
• GUHA procedure ASSOC
• Interactive data mining task configuration using rule pattern
  • Attributes with fixed values, dynamic binning wildcard…
  • Interest measures (not only confidence, support)
  • Support for disjunctions, negations, brackets
• Selection of rules into rule clipboard
  • Classification model testing
  • Export of rules into knowledge base
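
Two of the interest measures named above, support and confidence, can be computed for a single rule as follows. This is a plain-Python sketch over a hypothetical five-row table, not the ASSOC procedure itself; the helper names `matches` and `support_and_confidence` and the rule `salary(low) => rating(C)` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

def matches(row: Row, condition: Row) -> bool:
    """True when the row contains every attribute=value pair of the condition."""
    return all(row.get(attr) == val for attr, val in condition.items())

def support_and_confidence(rows: List[Row], antecedent: Row,
                           consequent: Row) -> Tuple[float, float]:
    """Return (relative support, confidence) of antecedent => consequent."""
    both = sum(1 for r in rows if matches(r, antecedent) and matches(r, consequent))
    covered = sum(1 for r in rows if matches(r, antecedent))
    support = both / len(rows)                       # rows satisfying the whole rule
    confidence = both / covered if covered else 0.0  # share of covered rows that are correct
    return support, confidence

# Hypothetical preprocessed rows (values are already bins or intervals).
rows = [
    {"age": "[20;30)", "salary": "low", "rating": "C"},
    {"age": "[20;30)", "salary": "low", "rating": "C"},
    {"age": "[20;30)", "salary": "high", "rating": "A"},
    {"age": "[30;40)", "salary": "low", "rating": "C"},
    {"age": "[30;40)", "salary": "low", "rating": "A"},
]
support, confidence = support_and_confidence(rows, {"salary": "low"}, {"rating": "C"})
```

Here three of the five rows satisfy the whole rule and four rows satisfy the antecedent, so the support is 3/5 and the confidence is 3/4.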

Classification model testing
• Using training dataset or testing dataset with columns with the same names
• Rules in DRL form => testing using Drools Expert
• Conflict resolution
  • Confidence
  • Support
  • First fired rule
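
The three conflict resolution options can be illustrated with a small rule-based classifier. The slides do not say how the model tester implements them, so this sketch assumes the simplest reading: when several rules match a row, pick the one with the highest confidence, the one with the highest support, or the first one in rule order. The `Rule` class, the `classify` function, and the sample rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Rule:
    antecedent: Dict[str, str]  # attribute -> required value
    consequent: str             # predicted class
    confidence: float
    support: float

def classify(row: Dict[str, str], rules: List[Rule],
             strategy: str = "confidence") -> Optional[str]:
    """Return the class chosen by the given conflict resolution strategy."""
    fired = [r for r in rules
             if all(row.get(k) == v for k, v in r.antecedent.items())]
    if not fired:
        return None  # no rule covers this row
    if strategy == "first":
        winner = fired[0]
    elif strategy == "confidence":
        winner = max(fired, key=lambda r: r.confidence)
    elif strategy == "support":
        winner = max(fired, key=lambda r: r.support)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return winner.consequent

# Hypothetical rule set; all three rules match the client below.
rules = [
    Rule({"salary": "low"}, "C", confidence=0.75, support=0.60),
    Rule({"age": "[30;40)"}, "B", confidence=0.70, support=0.65),
    Rule({"age": "[30;40)", "salary": "low"}, "A", confidence=0.90, support=0.10),
]
client = {"age": "[30;40)", "salary": "low"}
predictions = {s: classify(client, rules, s) for s in ("first", "confidence", "support")}
```

With these sample rules each strategy selects a different class for the same row, which is why the choice of conflict resolution affects the measured accuracy of the model.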

Ruleset editing
• Not only selection of rules gained from data mining results
• Rule editing using interactive editor
  • Antecedent => Rule condition
  • Consequent => Rule body
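
The mapping from an association rule to a business rule can be sketched by rendering the antecedent as the `when` part and the consequent as the `then` part of a Drools DRL rule. The exact DRL produced by EasyMiner is not shown in the slides; the fact type `Client`, the setter naming, and the function `to_drl` are assumptions used only to illustrate the mapping.

```python
from typing import Dict

def to_drl(name: str, antecedent: Dict[str, str], consequent: Dict[str, str],
           fact_type: str = "Client") -> str:
    """Render one association rule as DRL text (illustrative format only)."""
    # Antecedent => rule condition: one constraint per attribute.
    conditions = ", ".join(f'{attr} == "{val}"' for attr, val in antecedent.items())
    # Consequent => rule body: one setter call per attribute (hypothetical setters).
    actions = "\n".join(
        f'    $fact.set{attr.capitalize()}("{val}");'
        for attr, val in consequent.items()
    )
    return (
        f'rule "{name}"\n'
        "when\n"
        f"    $fact : {fact_type}({conditions})\n"
        "then\n"
        f"{actions}\n"
        "end"
    )

drl = to_drl("low_salary_rating", {"salary": "low"}, {"rating": "C"})
print(drl)
```

A rule entered manually in the editor would go through the same rendering step as a rule selected from the mining results, so both kinds can be tested together.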

Software components summary
• EasyMiner
  • Interactive data mining system
  • PHP, JavaScript + Joomla!-based CMS (reports support)
  • Based on LISp-Miner system
    • C++, C#.NET
    • GUHA procedure ASSOC
• EasyMinerCenter
  • New component for background knowledge management
  • PHP
  • Data saved in RDF form (using ARC2 Store)

Software components summary
• Business rules editor
  • JavaScript
• Model tester
  • Java EE application based on Drools Expert component

Future work
• New data mining backend
• Support for rule pruning
• Work with background knowledge base

Demo
• Example dataset
  • 7 columns (age, salary, district, amount, payments, duration, rating)
  • 6181 rows
• Demo screencast
  • http://easyminer.eu/screencasts

Try it yourself! EasyMiner.eu
• For more information, please visit the web: http://easyminer.eu
  • Screencasts
  • Demo
  • Technical information and papers
