This document discusses various approaches to measuring the interestingness of patterns discovered during data mining. It describes objective interestingness measures based only on the data, like conciseness, generality, reliability, peculiarity and diversity. Subjective measures take into account user knowledge and expectations, evaluating novelty and surprisingness. Semantic measures consider pattern semantics and explanations, focusing on utility and actionability. The document also discusses limitations of typical objective measures like support and confidence, and outlines subjective approaches involving user impressions at different levels of knowledge granularity.
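To make the objective measures concrete, the following Python sketch computes support, confidence, and lift for a candidate rule over a handful of made-up transactions; the lift line illustrates the limitation noted above, since a rule can have high confidence yet lift near 1 (no real association). The transactions, items, and rule are illustrative assumptions, not from the source document.

# A minimal sketch of objective interestingness measures for an
# association rule A -> B over market-basket transactions.
# The transactions below are invented purely for illustration.

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent), estimated from the data."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Ratio of observed co-occurrence to what independence would predict.
    Lift near 1 means the rule is uninteresting even if confidence is high."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

A, B = {"milk"}, {"bread"}
print(support(A | B, transactions))    # 0.6
print(confidence(A, B, transactions))  # 0.75
print(lift(A, B, transactions))        # 0.9375 -> slightly below independence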
A data warehouse is a database that collects and manages data from various sources to provide business insights. It contains consolidated historical data kept separately from operational databases. A data warehouse helps executives analyze data to make strategic decisions. Data mining extracts valuable patterns and knowledge from large amounts of data through techniques like classification, clustering, and neural networks. It is used along with data warehouses for applications like churn analysis, fraud detection, and market segmentation.
This document discusses decision tree induction and attribute selection measures. It describes common measures like information gain, gain ratio, and Gini index that are used to select the best splitting attribute at each node in decision tree construction. It provides examples to illustrate information gain calculation for both discrete and continuous attributes. The document also discusses techniques for handling large datasets like SLIQ and SPRINT that build decision trees in a scalable manner by maintaining attribute value lists.
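As a hedged illustration of the attribute selection measures described here, the Python sketch below computes entropy and information gain for one hypothetical split; the 9/5 class counts are textbook-style toy figures assumed for this sketch, not numbers from the document.

import math

def entropy(class_counts):
    """Shannon entropy of a class distribution, in bits."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Entropy reduction achieved by splitting the parent node into the
    given children (each child is a list of per-class counts)."""
    total = sum(parent_counts)
    weighted_child_entropy = sum(
        (sum(child) / total) * entropy(child) for child in child_counts_list
    )
    return entropy(parent_counts) - weighted_child_entropy

# Hypothetical node with 9 positive / 5 negative examples, split into
# three branches by some discrete attribute (counts are illustrative).
parent = [9, 5]
children = [[2, 3], [4, 0], [3, 2]]
print(information_gain(parent, children))  # ~0.246 bits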
The document discusses query processing and query optimization in database management systems. It contains the following key points:
1. A modern DBMS receives user queries, translates them into an internal representation for data access, and efficiently produces meaningful results.
2. The query processor checks queries for errors, generates an equivalent relational algebra expression for data access, and forwards it to the query optimizer.
3. The query optimizer generates various execution plans and selects the most efficient one, i.e., the plan that takes the least time and fewest resources. It uses techniques like eliminating Cartesian products, pushing selections and projections down the plan tree, etc.
OLAP (Online Analytical Processing) is used for complex queries over large datasets to discover patterns and trends. OLAP queries are run over a data warehouse and use multidimensional models like MOLAP with pre-computed data cubes or ROLAP with a star schema. Key OLAP concepts include slicing and dicing the data cube and drill-down and roll-up operations. Data mining techniques like association rule mining can be used to discover relationships between items in transaction data beyond what is found with frequent itemset mining alone.
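The following sketch (using pandas on an invented sales table, both assumptions of this example) mimics the cube operations named above: slice fixes one dimension, dice restricts several, roll-up aggregates away a level of detail, and drill-down restores it.

import pandas as pd

# A tiny fact table standing in for a sales cube; the data is invented.
sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "quarter": ["Q1", "Q1", "Q2", "Q1", "Q2", "Q2"],
    "region":  ["East", "West", "East", "West", "East", "West"],
    "amount":  [100, 150, 120, 170, 130, 160],
})

# Slice: fix region = "East", leaving a sub-cube over (year, quarter).
east = sales[sales["region"] == "East"]

# Dice: restrict several dimensions at once.
dice = sales[(sales["region"] == "East") & (sales["quarter"] == "Q1")]

# Roll-up: aggregate from (year, quarter) up to yearly totals.
rollup = sales.groupby("year")["amount"].sum()

# Drill-down is the inverse: return to the finer (year, quarter) view.
drilldown = sales.groupby(["year", "quarter"])["amount"].sum()
print(rollup, drilldown, sep="\n")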
This document provides an overview of classification techniques. It defines classification as assigning records to predefined classes based on their attribute values. The key steps are building a classification model from training data and then using the model to classify new, unseen records. Decision trees are discussed as a popular classification method that uses a tree structure with internal nodes for attributes and leaf nodes for classes. The document covers decision tree induction, handling overfitting, and performance evaluation methods like holdout validation and cross-validation.
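A minimal sketch of the two evaluation methods mentioned, using scikit-learn and its bundled iris data; the dataset, split ratio, and random seeds are illustrative choices, not details from the document.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Holdout validation: train on one split, test on the held-out remainder.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# 10-fold cross-validation: every record is used for testing exactly once.
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=10)
print("cross-validation accuracy:", scores.mean())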
This document discusses database system concepts and architecture. It covers data models and their categories, including conceptual, physical and implementation models. It describes the history of data models such as network, hierarchical, relational, object-oriented and object-relational models. It also discusses schemas, instances, states, the three-schema architecture, data independence, DBMS languages, interfaces, utilities, centralized and client-server architectures, and classifications of DBMSs.
Data Warehouse: Dimensional Model: Snowflake Schema. In the snowflake schema, dimensions are present in normalized form in multiple related tables.
The snowflake structure materializes when the dimensions of a star schema are detailed and highly structured, having several levels of relationships, and the child tables have multiple parent tables.
Data mining involves multiple steps in the knowledge discovery process including data cleaning, integration, selection, transformation, mining, and pattern evaluation. It has various functionalities including descriptive mining to characterize data, predictive mining for inference, and different mining techniques like classification, association analysis, clustering, and outlier analysis.
This document discusses decision trees, a classification technique in data mining. It defines classification as assigning class labels to unlabeled data based on a training set. Decision trees generate a tree structure to classify data, with internal nodes representing attributes, branches representing attribute values, and leaf nodes holding class labels. An algorithm is used to recursively split the data set into purer subsets based on attribute tests until each subset belongs to a single class. The tree can then classify new examples by traversing it from root to leaf.
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da... (Simplilearn)
The document discusses decision trees and how they work. It begins with explaining what a decision tree is - a tree-shaped diagram used to determine a course of action, with each branch representing a possible decision. It then provides examples of using a decision tree to classify vegetables and animals based on their features. The document also covers key decision tree concepts like entropy, information gain, leaf nodes, decision nodes, and the root node. It demonstrates how a decision tree is built by choosing splits that maximize information gain. Finally, it presents a use case of using a decision tree to predict loan repayment.
This document provides an overview of data mining concepts and techniques. It defines data mining as the extraction of interesting and useful patterns from large amounts of data. The document outlines several potential applications of data mining, including market analysis, risk analysis, and fraud detection. It also describes the typical steps involved in a data mining process, including data cleaning, pattern evaluation, and knowledge presentation. Finally, it discusses different data mining functionalities, such as classification, clustering, and association rule mining.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE (IJDKP)
The document discusses a proposed students' performance prediction system using multi-agent data mining techniques. It aims to predict student performance with high accuracy and help low-performing students. The system uses ensemble classifiers like Adaboost.M1 and LogitBoost and compares their prediction accuracy to the single classifier C4.5 decision tree. Experimental results showed SAMME boosting improved prediction accuracy over C4.5 and LogitBoost.
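The comparison described above can be approximated with scikit-learn, as in the sketch below; the synthetic data stands in for a student-performance dataset, and an entropy-based tree is used as a rough stand-in for C4.5 (which scikit-learn does not implement), with plain AdaBoost standing in for the boosting variants named in the abstract.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a student-performance dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Single entropy-based tree (a rough proxy for C4.5).
single_tree = DecisionTreeClassifier(criterion="entropy", random_state=0)

# Boosted ensemble of shallow trees (AdaBoost, as in Adaboost.M1).
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, random_state=0)

for name, clf in [("single tree", single_tree), ("AdaBoost", boosted)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())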
The document discusses data modeling, which involves creating a conceptual model of the data required for an information system. There are three types of data models - conceptual, logical, and physical. A conceptual data model describes what the system contains, a logical model describes how the system will be implemented regardless of the database, and a physical model describes the implementation using a specific database. Common elements of a data model include entities, attributes, and relationships. Data modeling is used to standardize and communicate an organization's data requirements and establish business rules.
The document discusses decision tree algorithms. It begins with an introduction and example, then covers the principles of entropy and information gain used to build decision trees. It provides explanations of key concepts like entropy, information gain, and how decision trees are constructed and evaluated. Examples are given to illustrate these concepts. The document concludes with strengths and weaknesses of decision tree algorithms.
This document provides an overview of the 3-tier data warehouse architecture. It discusses the three tiers: the bottom tier contains the data warehouse server which fetches relevant data from various data sources and loads it into the data warehouse using backend tools for extraction, cleaning, transformation and loading. The bottom tier also contains the data marts and metadata repository. The middle tier contains the OLAP server which presents multidimensional data to users from the data warehouse and data marts. The top tier contains the front-end tools like query, reporting and analysis tools that allow users to access and analyze the data.
This document introduces data mining concepts and techniques. It defines data mining as the process of discovering interesting patterns from large amounts of data. The document outlines several data mining functionalities including classification, clustering, association rule mining, and outlier detection. It also discusses popular data mining algorithms, major issues in data mining, and provides a brief history of the data mining field and community.
Mapping cardinality describes the number of entities in one set that can be associated with entities in another set via a relationship. There are four types of mapping cardinality: one-to-one, where each entity is associated with exactly one entity in the other set; one-to-many, where an entity can be associated with multiple entities but each entity can only be associated with one entity; many-to-one, the inverse of one-to-many; and many-to-many, where each entity can be associated with multiple entities in the other set.
This document discusses different types of schemas used in multidimensional databases and data warehouses. It describes star schemas, snowflake schemas, and fact constellation schemas. A star schema contains one fact table connected to multiple dimension tables. A snowflake schema is similar but with some normalized dimension tables. A fact constellation schema contains multiple fact tables that can share dimension tables. The document provides examples and comparisons of each schema type.
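To make the star-versus-snowflake distinction concrete, here is a small DDL sketch run through Python's sqlite3 for self-containment; the table and column names are invented for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Star schema: one denormalized dimension table per dimension.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT        -- category folded into the dimension
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product_star(product_id),
    amount REAL
);

-- Snowflake schema: the product dimension is normalized, so the
-- category becomes its own related table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
""")
print("schemas created")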
This document provides an overview of key aspects of data preparation and processing for data mining. It discusses the importance of domain expertise in understanding data. The goals of data preparation are identified as cleaning missing, noisy, and inconsistent data; integrating data from multiple sources; transforming data into appropriate formats; and reducing data through feature selection, sampling, and discretization. Common techniques for each step are outlined at a high level, such as binning, clustering, and regression for handling noisy data. The document emphasizes that data preparation is crucial and can require 70-80% of the effort for effective real-world data mining.
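As one concrete example of the smoothing techniques listed, the sketch below applies equal-frequency binning with smoothing by bin means; the price values are toy numbers assumed for illustration.

import numpy as np

# Hypothetical noisy attribute values, sorted for binning.
prices = np.sort(np.array([4, 8, 15, 21, 21, 24, 25, 28, 34], dtype=float))

# Equal-frequency binning into 3 bins, then smoothing by bin means.
bins = np.split(prices, 3)
smoothed = np.concatenate([np.full(len(b), b.mean()) for b in bins])
print(smoothed)
# [ 9.  9.  9. 22. 22. 22. 29. 29. 29.]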
Data preprocessing techniques
See my Paris applied psychology conference paper here
https://www.slideshare.net/jasonrodrigues/paris-conference-on-applied-psychology
or
https://prezi.com/view/KBP8JnekVH9LkLOiKY3w/
The document discusses various data reduction strategies including attribute subset selection, numerosity reduction, and dimensionality reduction. Attribute subset selection aims to select a minimal set of important attributes. Numerosity reduction techniques like regression, log-linear models, histograms, clustering, and sampling can reduce data volume by finding alternative representations like model parameters or cluster centroids. Dimensionality reduction techniques include discrete wavelet transformation and principal component analysis, which transform high-dimensional data into a lower-dimensional representation.
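A minimal sketch of the PCA-based dimensionality reduction mentioned above, using scikit-learn on synthetic data that is 10-dimensional on the surface but intrinsically three-dimensional; all data and parameters are illustrative.

import numpy as np
from sklearn.decomposition import PCA

# 200 records with 10 correlated attributes (synthetic: a 3-D latent
# signal projected into 10 dimensions, plus a little noise).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = base @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

# Keep just enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # (200, 10) -> (200, 3)
print(pca.explained_variance_ratio_)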
The document discusses exploratory data analysis and provides examples of how it can be used. It summarizes two case studies: one where an energy utility detected billing fraud by analyzing meter reading patterns, and another where month of birth was found to correlate with exam scores for students in Tamil Nadu. The document then outlines the exploratory data analysis process and provides a high-level overview of U.S. and Indian birth date patterns identified through analysis of large datasets.
KDD is the process of automatically extracting hidden patterns from large datasets. It involves data cleaning, reduction, exploration, modeling, and interpretation to discover useful knowledge. The goal is to gain a competitive advantage by providing improved services through understanding of the data.
This document discusses heterogeneous database systems. It defines a heterogeneous database system as an automated or semi-automated system that integrates disparate database management systems to present a unified query interface to users. It discusses issues in multi-database query processing such as query support, cost, translation and change adaptation. The architecture involves individual databases, wrapper methods, a mediator and query processing/optimization. Database integration involves schema integration through a bottom-up design approach and the conversion of local schemas to a global schema.
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN... (vtunotesbysree)
This document contains solved questions and answers from a past data warehousing and data mining exam. It includes questions on operational data stores, extract transform load (ETL) processes, online transaction processing (OLTP) vs online analytical processing (OLAP), data cubes, and data pre-processing approaches. The responses provide detailed explanations and examples for each topic.
This document discusses big data storage challenges and solutions. It describes the types of data that need to be stored, including structured, semi-structured, and unstructured data. Optimal storage solutions are suggested based on data type, including using Cassandra, HBase, HDFS, and MongoDB. The document also introduces WSO2 Storage Server and how the WSO2 platform supports big data through features like clustering and external indexes. Tools for summarizing big data are discussed, including MapReduce, Hive, Pig, and WSO2 BAM for publishing, analyzing, and visualizing big data.
This document defines a data warehouse as a collection of corporate information derived from operational systems and external sources to support business decisions rather than operations. It discusses the purpose of data warehousing to realize the value of data and make better decisions. Key components like staging areas, data marts, and operational data stores are described. The document also outlines the evolution of data warehouse architectures and best practices for implementation.
This document discusses the key components of a database: forms are used to enter information which is then stored in tables to organize the data; queries can be used to select specific parts of tables or forms; and reports generate summaries of selected information from forms.
CV covering the experience and skills gained during my Masters, Honours, and Bachelor of Science studies, as well as experience from involvement in environmental projects with various consulting companies in South Africa.
PepsiCo Celebrates Black History Month: African Americans who Broke Barriers ... (Leigha Landry Wanczowski)
For Black History Month, we recognize 6 African Americans who broke barriers at PepsiCo and helped make diversity and inclusion a way of life in our business.
Clean technology - Raw materials and Commercialisation - Canada - November 2016 (Paul Young CPA, CGA)
This presentation looks at raw materials that support energy storage, examining the sources of those raw materials as part of the commercialisation process for energy storage.
This document provides information about upcoming events and schedule changes at the school. It is organized by date and includes: a PD day with no school on Monday; adjusted lunch and flex times and evening parent-teacher interviews on Tuesday through Thursday; classes starting at 9:30am on Friday with lunch at 12:06pm. Future dates are also noted like ski trips, no school days, early dismissal, and spring break. The weekly menu and announcements about sports teams, draws and nominations are also included.
The document discusses animal anatomy, including the definition of anatomy and its division into macroscopic and microscopic anatomy, as well as special, comparative, and veterinary anatomy. It also covers anatomical terminology, the structure and arrangement of bones, joints, and anatomical systems such as osteology, arthrology, and others.
The document discusses the evolution of currency, from physical money to digital money and Bitcoin. It explains how Satoshi Nakamoto created Bitcoin and the blockchain to support the cryptocurrency's network in a decentralized way. It presents potential applications of blockchain beyond cryptocurrencies, such as smart contracts and decentralized, transparent public records.
Canada delivers third-straight monthly trade surplus as exports hit new high (Paul Young CPA, CGA)
This presentation looks at merchandise trade including exports, imports and trade imbalance.
The presentation focuses on finished goods vs. raw materials, as well as the key sectors that drive exports.
The good news is that commodity prices are rebounding, which has helped areas like energy!
Techquisition - Don't Be Disrupted. Be Disruption. (Paul Cuatrecasas)
The document discusses how technological disruption is affecting all industries, forcing even traditional companies to become "technology companies" through acquisitions. It provides numerous examples of large, non-tech companies acquiring technology startups and software companies across many industries in order to gain new capabilities, defend against disruption, and drive growth. The pace of technology M&A is accelerating as digital transformation becomes mission critical for survival in today's business environment.
Lesson 27 for Grade 7 - Bar and Pie Charts, Their Objects and Propert... (VsimPPT)
Download available at http://vsimppt.com.ua/
-------
Lesson 27 for Grade 7 - Bar and pie charts, their objects and properties. Creating and formatting bar and pie charts in a spreadsheet environment. Analyzing data presented in a chart.
Lesson 28 for Grade 3 - Comparing Texts with Misleading and Truthful Information.... (VsimPPT)
Download available at http://vsimppt.com.ua/
-------
Lesson 28 for Grade 3 - Comparing texts with misleading and truthful information. Searching for false statements in texts (based on information from other subjects).
This document provides an introduction to association rule mining. It begins with an overview of association rule mining and its application to market basket analysis. It then discusses key concepts like support, confidence and interestingness of rules. The document introduces the Apriori algorithm for mining association rules, which works in two steps: 1) generating frequent itemsets and 2) generating rules from frequent itemsets. It provides examples of how Apriori works and discusses challenges in association rule mining like multiple database scans and candidate generation.
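To ground the two-step process described, here is a compact sketch of step 1, the level-wise frequent-itemset pass of Apriori; the transactions and the absolute minimum-support threshold are illustrative assumptions.

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support = 3  # absolute count, chosen for illustration

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori pass: grow candidates one item at a time,
    keeping only itemsets whose support meets the threshold."""
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items
             if sum(s <= t for t in transactions) >= min_support}
    all_frequent = set(level)
    while level:
        # Candidate generation: union pairs of frequent k-itemsets
        # and keep only those of size k + 1.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(next(iter(level))) + 1}
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_support}
        all_frequent |= level
    return all_frequent

for s in sorted(frequent_itemsets(transactions, min_support), key=len):
    print(set(s))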
This document provides an introduction to Bayesian belief networks and naive Bayesian classification. It defines key probability concepts like joint probability, conditional probability, and Bayes' rule. It explains how Bayesian belief networks can represent dependencies between variables and how naive Bayesian classification assumes conditional independence between variables. The document concludes with examples of how to calculate probabilities and classify new examples using a naive Bayesian approach.
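Continuing the naive Bayes thread, the sketch below classifies a new example by multiplying the class prior with per-attribute conditional probabilities, under the conditional-independence assumption; the tiny weather-style training set is invented for illustration, and a real implementation would add Laplace smoothing to avoid the zero probabilities visible here.

from collections import Counter

# Illustrative training data: (outlook, windy) -> play
data = [
    ("sunny", "no",  "yes"), ("sunny", "yes", "no"),
    ("rain",  "no",  "yes"), ("rain",  "yes", "no"),
    ("sunny", "no",  "yes"), ("rain",  "no",  "yes"),
]

def naive_bayes(example, data):
    """Score each class by P(class) * prod_i P(attr_i = value_i | class)."""
    classes = Counter(row[-1] for row in data)
    scores = {}
    for c, n_c in classes.items():
        score = n_c / len(data)  # prior P(class)
        rows_c = [row for row in data if row[-1] == c]
        for i, value in enumerate(example):
            matches = sum(row[i] == value for row in rows_c)
            score *= matches / n_c  # likelihood P(attr_i | class)
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = naive_bayes(("sunny", "no"), data)
print(label, scores)  # 'yes' wins; 'no' collapses to zero without smoothing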
The document summarizes seven key theorems of electrical circuits: 1) the superposition theorem, 2) Thevenin's theorem, 3) Norton's theorem, 4) maximum power transfer to a resistive load, 5) the reciprocity theorem, 6) the star-delta and delta-star transformations, and 7) dual circuits. It explains each theorem and how they can be applied to analyze and simplify electrical circuits.
Research design decisions and be competent in the process of reliable data co... (Stats Statswork)
Research design may be described as the researcher's scheme for outlining the flow of his project. It is on the basis of the research design that the researcher goes about gathering data to answer his research question. It enables the researcher to prioritize his work, create better questionnaires, and arrive at conclusions with greater clarity. Statswork offers statistical services tailored to customers' requirements. When you order statistical services at Statswork, we promise you the following: always on time, outstanding customer support, and high-quality subject matter experts.
Learn More: http://bit.ly/2S312hb
Why Statswork?
Plagiarism Free | Unlimited Support | Prompt Turnaround Times | Subject Matter Expertise | Experienced Bio-statisticians & Statisticians | Statistics Across Methodologies | Wide Range Of Tools & Technologies Supports | Tutoring Services | 24/7 Email Support | Recommended by Universities
Contact Us:
Website: www.statswork.com/
Email: info@statswork.com
United Kingdom: +44-1143520021
India: +91-4448137070
WhatsApp: +91-8754446690
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Interestingness measures for multi level association rules (Alexander Decker)
This document discusses measures for evaluating the interestingness of multi-level association rules mined from hierarchical datasets. It proposes two approaches - diversity-based and peculiarity-based measures. Diversity measures how rules differ significantly from each other and is measured using variance and Shannon entropy. Peculiarity determines how far rules are from other rules using distance measures. It enhances an existing distance measure to account for the hierarchical relationships between items. The enhanced measure calculates the diversity of the symmetric difference between rules instead of just the cardinality. This allows it to better measure the difference between rules containing hierarchically related items.
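As a sketch of the diversity idea, variance and Shannon entropy can be computed over a set of rule-interestingness values; the confidence values below are invented for illustration.

import math

# Hypothetical confidence values for a set of mined rules.
rule_confidences = [0.91, 0.88, 0.35, 0.62, 0.90]

mean = sum(rule_confidences) / len(rule_confidences)
variance = sum((x - mean) ** 2 for x in rule_confidences) / len(rule_confidences)

# Shannon entropy of the normalized value distribution: higher entropy
# means the rules' interestingness values are more evenly spread out.
total = sum(rule_confidences)
probs = [x / total for x in rule_confidences]
entropy = -sum(p * math.log2(p) for p in probs)

print(f"variance = {variance:.4f}, entropy = {entropy:.4f} bits")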
Interestingness Measures In Rule Mining: A Valuation (IJERA Editor)
In data mining it is normally desirable that discovered knowledge possess characteristics such as accuracy, comprehensibility, and interestingness. The vast majority of data mining algorithms generate patterns that are accurate and reliable, but they might not be interesting. Interestingness measures are used to find the truly interesting rules that will help the user make decisions in exceptional states of affairs. A variety of interestingness measures for rule mining have been suggested by researchers in the field of data mining. In this paper we carry out a valuation of these interestingness measures and classify them from numerous perspectives.
There are four main types of research data based on collection methods:
1) Observational data collected through observation
2) Experimental data collected through intervention to measure change
3) Simulation data generated by imitating real-world processes using models
4) Derived data created by transforming existing data points
Data collection involves gathering information systematically to answer research questions. It is required for academic research, ongoing projects, and developing new products/services. Data can be qualitative, quantitative, or mixed. It can also be primary data collected directly or secondary data obtained from other sources. The type of data determines the appropriate collection method to use.
Understanding_Users_and_Tasks_CA1_N00147768 (Stephen Norman)
This document is a paper submitted by Stephen Norman for his MSc program in User Experience Design. The paper discusses and compares three user experience evaluation methods - ethnographic field studies, card sorting, and A/B testing. It presents these methods within a three-dimensional framework of qualitative vs. quantitative, attitudinal vs. behavioral, and context of use. The paper then discusses how research goals should align with different stages of a product's development lifecycle and be informed by the appropriate evaluation methods.
This document provides an overview of different research methodologies discussed in the context of a feminist perspective. It discusses three main feminist methodologies: feminist empiricism, standpoint epistemology, and post-modernism. The key similarities between these methodologies are a dedication to understanding women's lives and perspectives, moving beyond a masculine focus, and incorporating both men's and women's voices to advance knowledge. The document also briefly outlines some differences between the methodologies in terms of their relationship between experience and theory.
Research Methodology Of The Research Approach (Jessica Howard)
This section discusses the research methodology used in the study. It outlines two main types of research methods: quantitative and qualitative. Quantitative research uses numerical data that can be mathematically analyzed, while qualitative research uses non-numerical data to understand experiences. The study will use qualitative methods as the research involves many social variables best explored through qualitative approaches. Data will be collected through interviews and analyzed thematically to understand perceptions.
Re-mining Positive and Negative Association Mining Results (Gurdal Ertek)
Positive and negative association mining are well-known and extensively studied data mining techniques to analyze market basket data. Efficient algorithms exist to find both types of association, separately or simultaneously. Association mining is performed by operating on the transaction data. Despite being an integral part of the transaction data, the pricing and time information has not been incorporated into market basket analysis so far, and additional attributes have been handled using quantitative association mining. In this paper, a new approach is proposed to incorporate price, time, and domain-related attributes into data mining by re-mining the association mining results. The underlying factors behind positive and negative relationships, as indicated by the association rules, are characterized and described through the second data mining stage, re-mining. The applicability of the methodology is demonstrated by analyzing data from the apparel retailing industry, where price markdown is an essential tool for promoting sales and generating increased revenue.
https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-positive-and-negative-association-mining-results/
This document discusses different methods of data collection. It defines data collection as the process of systematically gathering and measuring information on variables of interest in order to answer research questions and test hypotheses. The two main types of data are qualitative and quantitative. Qualitative data is non-numerical, descriptive data often in the form of words, while quantitative data is numerical and can be mathematically computed. Common qualitative methods include interviews and focus groups, while quantitative methods include surveys, experiments, and observational studies. The document also discusses mixed methods research, which combines qualitative and quantitative approaches.
This document discusses various methods of data collection for research. It begins by defining data collection as the process of systematically gathering and measuring information to answer research questions and test hypotheses. It then covers the main types of data (qualitative, quantitative, and mixed methods) and their characteristics. A number of specific primary and secondary data collection methods are outlined, including questionnaires, interviews, observations, experiments, and existing records/documents. Key points about ensuring high quality data collection through appropriate instruments and instructions are also made.
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... (theijes)
Feature selection is considered as a problem of global combinatorial optimization in machine learning, which reduces the number of features, removes irrelevant, noisy and redundant data. However, identification of useful features from hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of features, multiclass categories involved and the usually small sample size. In order to improve the prediction accuracy and to avoid incomprehensibility due to the number of features different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming to not only provide a comprehensive presentation but also discuss challenges and various performance parameters. The techniques are generally classified into three; filter, wrapper and hybrid.
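A minimal scikit-learn sketch contrasting the filter and wrapper families surveyed here, with SelectKBest as the filter and recursive feature elimination as the wrapper; the synthetic dataset and parameter choices are assumptions of this example.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, RFE, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Filter: rank features by a statistic computed independently of any model.
filt = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("filter picks:", sorted(filt.get_support(indices=True)))

# Wrapper: repeatedly fit a model and drop the weakest features.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("wrapper picks:", sorted(wrap.get_support(indices=True)))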
Selection of Articles using Data Analytics for Behavioral Dissertation Resear... (PhD Assistance)
Digital interventions are intended to sustain positive change in health-related outcomes, including psychological, educational, behavioral, environmental, and social outcomes. These interventions may be delivered through any digital device, such as a phone or computer, and made gainful for the provider. Testing a digital intervention can yield complex, large-scale datasets that contain usage data. If analyzed properly, this data provides invaluable detail about how users interact with these interventions and informs knowledge of engagement. This paper recommends an innovative framework for the process of analyzing usage associated with a digital intervention.
PhD Assistance is an academic dissertation writing service and consulting support company established in 2001, specializing in providing PhD assignments, PhD dissertation writing help, statistical analyses, and programming services to students in the USA, UK, Canada, UAE, Australia, New Zealand, Singapore, and many more.
Website Visit: https://bit.ly/3dANXUD
Contact Us:
UK NO: +44-1143520021
India No: +91-8754446690
Email: info@phdassistance.com
Developing of climate data for building simulation with future weather condit... (Rasmus Madsen)
Today, climate models are used frequently to describe past, current, or future climate conditions, in particular for building simulation. A research study of how future climate change will affect the indoor environment and buildings' energy use in a Danish context has been conducted. To fulfil this research study, information on how climate models are developed is needed as well. The study takes an objective, descriptive approach drawing on both Danish and global research on the topic. The information gathered from the publications is evaluated with respect to indicators of the quality of the journals as well as of the authors. The method used for development of the Danish design reference year is not clear, and to gain full knowledge of how climate change will affect building simulation in a Danish context, further research is needed. Development of a new Danish weather file will require both descriptive and analytical research.
This document discusses data architecture and management for data analytics. It begins by defining data architecture and explaining that it is composed of models, policies, and standards that govern how data is collected, stored, integrated, and used. Various factors influence data architecture design, including enterprise requirements, technology drivers, economics, business policies, and data processing needs. The document then outlines three levels of data architecture specification - the logical level, physical level, and implementation level. It also discusses primary and secondary sources of data, with primary sources including observation, surveys, and experiments, and secondary sources including internal sources like sales reports and accounting data as well as external sources.
The document discusses quantitative research methods. It begins by defining quantitative data as pieces of information that can be counted, often from large random samples. Both qualitative and quantitative methods are then described as complementary approaches. Key points about quantitative research include: it aims to determine relationships between variables; designs are descriptive or experimental; it focuses on numbers, logic and objectivity rather than divergent reasoning; and characteristics include using structured instruments, representative large samples, reliability, clearly defined questions, and numerical data. The strengths are broader generalization while weaknesses include less detail and flexibility.
Selecting the correct Data Mining Method: Classification & InDaMiTe-R (IOSR Journals)
This document describes an intelligent data mining assistant called InDaMiTe-R that aims to help users select the correct data mining method for their problem and data. It presents a classification of common data mining techniques organized by the goal of the problem (descriptive vs predictive) and the structure of the data. This classification is meant to model the human decision process for selecting techniques. The document then describes InDaMiTe-R, which uses a case-based reasoning approach to recommend techniques based on past user experiences with similar problems and data. An example of its use is provided to illustrate how it extracts problem metadata, gets user restrictions, recommends initial techniques, and learns from the user's evaluations to improve future recommendations. A small evaluation
The document discusses data mining and knowledge discovery in databases. It defines data mining as the nontrivial extraction of implicit and potentially useful information from large amounts of data. With huge increases in data collection and storage, data mining aims to analyze data and discover patterns that can provide insights and knowledge about businesses and the real world. The data mining process involves selecting, preprocessing, transforming, and analyzing data to extract hidden patterns and relationships, which are then interpreted and evaluated.
Similar to Probabilistic Interestingness Measures - An Introduction with Bayesian Belief Networks
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD (Adnan Masood)
Spark is a unified framework for big data analytics. Spark provides one integrated API for use by developers, data scientists, and analysts to perform diverse tasks that would have previously required separate processing engines such as batch analytics, stream processing and statistical modeling. Spark supports a wide range of popular languages including Python, R, Scala, SQL, and Java. Spark can read from diverse data sources and scale to thousands of nodes.
In this presentation we discuss Microsoft HDInsight offering of Spark. Azure HDInsight, Microsoft’s managed Hadoop and Spark cloud service that runs the Hortonworks Data Platform. Spark for Azure HDInsight offers customers an enterprise-ready Spark solution that’s fully managed, secured, and highly available and made simpler for users with compelling and interactive experiences.
Data science with Windows Azure - A Brief Introduction (Adnan Masood)
Data Science with Windows Azure is an introduction to HDInsight and Hadoop offerings from Microsoft's machine learning and big data cloud platform. This was presented at the Microsoft Data Science Group – Tampa Analytics Professionals.
Restructuring Technical Debt - A Software and System Quality Approach (Adnan Masood)
An Agile software architecture overview of the technical debt metaphor… the idea is that developers sometimes accept compromises in a system in one dimension (e.g., modularity) to meet an urgent demand in some other dimension (e.g., a deadline), and that such compromises incur a "debt" on which "interest" has to be paid and whose "principal" should be repaid at some point for the long-term health of the project. (ACM)
System Quality Attributes for Software Architecture (Adnan Masood)
Software quality attributes are the benchmarks that describe a system's intended behavior. These slides give an overview of what some of these attributes are and how to evaluate them.
The document discusses Agile project management and Scrum frameworks. It describes Agile as a collection of values and principles for software development that emphasize individuals, working software, customer collaboration, and responding to change. Scrum is presented as a framework that uses short iterations called sprints, daily stand-up meetings, and emphasizes self-organizing cross-functional teams to deliver working software increments for customer feedback. Effective Scrum teams are autonomous, strive for continuous improvement, and have members with diverse skills who collaborate well.
Bayesian Networks - A Brief Introduction (Adnan Masood)
- A Bayesian network is a graphical model that depicts probabilistic relationships among variables. It represents a joint probability distribution over variables in a directed acyclic graph with conditional probability tables.
- A Bayesian network consists of a directed acyclic graph whose nodes represent variables and edges represent probabilistic dependencies, along with conditional probability distributions that quantify the relationships.
- Inference using a Bayesian network allows computing probabilities like P(X|evidence) by taking into account the graph structure and probability tables.
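A small sketch of the inference mentioned in the last point: computing P(X | evidence) by enumeration over a two-node network, with hypothetical conditional probability tables chosen only for illustration.

# Two-node network Rain -> WetGrass with illustrative CPTs.
p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass=T | Rain)

# Joint distribution via the chain rule of the network:
# P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
def joint(rain, wet):
    pr = p_rain if rain else 1 - p_rain
    pw = p_wet_given_rain[rain] if wet else 1 - p_wet_given_rain[rain]
    return pr * pw

# Inference by enumeration: P(Rain=T | WetGrass=T)
numerator = joint(True, True)
evidence = joint(True, True) + joint(False, True)
print(numerator / evidence)  # 0.18 / (0.18 + 0.08) ~ 0.692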
Web API or WCF - An Architectural Comparison (Adnan Masood)
ASP.NET Web API is a framework that makes it easy to build HTTP services that reach a broad range of clients, including browsers and mobile devices. The new ASP.NET Web API is a continuation of the previous WCF Web API project. WCF was originally created to enable SOAP-based services and other related bindings. However, for simpler RESTful or RPC-ish services (think clients like jQuery), ASP.NET Web API is a good choice.
In this meeting we discussed what you need to understand as an architect to implement a service-oriented architecture using WCF or the ASP.NET Web API. With code samples, we elaborate on WCF Web API's transition to ASP.NET Web API and the respective constructs, such as Service vs. Web API controller, Operation vs. Action, URI templates vs. ASP.NET routing, message handlers, formatters, and operation handlers vs. filters and model binders. Web API offers support for a modern HTTP programming model with full support for ASP.NET routing, content negotiation, custom formatters, model binding and validation, filters, and query composition; it is easy to unit test and offers improved Inversion of Control (IoC) via DependencyResolver.
You will walk away with a sample set of services that run on Silverlight, Windows Forms, WPF, Windows Phone and ASP.NET.
SOLID Principles of Refactoring Presentation - Inland Empire User Group (Adnan Masood)
Abstract: SOLID is a mnemonic acronym coined by Robert C. Martin (aka Uncle Bob) referring to a collection of design principles of object-oriented programming and design. By using these principles, developers are much more likely to create a system that more maintainable and extensible. SOLID can be used to remove code smells by refactoring. In this session, you will learn about the following SOLID principles with code examples demonstrating the corresponding refactoring.
S – Single Responsibility Principle – An Object should have only one reason to change.
O – Open/Closed Principle – A software entity (module, library, routine) should be closed to any modification but open to extension
L – Liskov Substitution Principle – Derived classes should be substitutable for the base classes
I – Interface Segregation Principle – Having more fine grained interfaces over fat interfaces
D – Dependency Inversion Principle – Depending on abstractions, not concrete implementations.
Brief bibliography of interestingness measure, bayesian belief network and ca... (Adnan Masood)
This document provides a brief bibliography of papers related to interestingness measures, Bayesian belief networks, and causal inference. It lists over 60 references published between 1961 and 2012 on topics such as outlier detection, probabilistic graphical models, sensitivity analysis of Bayesian networks, rule interestingness measures, and applications of Bayesian networks in domains like fraud detection and credit risk evaluation. The references are grouped into sections on learning Bayesian networks from data, sensitivity analysis, outlier detection, rule interestingness measures, and applications.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief Networks
1. Probabilistic Interestingness Measures
An Introduction with Bayesian Belief Networks
Adnan Masood (adnan@nova.edu)
Doctoral Candidate, Nova Southeastern University
scis.nova.edu/~adnan
2. Introduction
Interestingness measures play an important role in data mining regardless of the kind of patterns being mined. Good measures should select and rank patterns according to their potential interest to the user. Good measures should also reduce the time and space cost of the mining process. (Geng & Hamilton, 2007)
Measuring the interestingness of discovered patterns is an active and important area of data mining research. Although much work has been conducted in this area, so far there is no widespread agreement on a formal definition of interestingness in this context. Based on the diversity of definitions presented to date, interestingness is perhaps best treated as a very broad concept, which emphasizes conciseness, coverage, reliability, peculiarity, diversity, novelty, surprisingness, utility, and actionability. (Geng & Hamilton, 2007)
3. Overview of interestingness measures
A Survey of Interestingness Measures for Knowledge Discovery (Ken McGarry, 2005)
4. Interestingness Measures & the Ranking
Ref: A Survey of Interestingness Measures for Knowledge Discovery (Ken McGarry, 2005)
6. Interestingness Measures and Expert-Based Quality
Two categories:
Objective (D, M): computed from the data only
Subjective (U): depends on the user's hypotheses, goals, and domain knowledge; hard to formalize (novelty)
Quality Measures in Data Mining (Fabrice Guillet & Howard J. Hamilton, 2007)
7. Interestingness Measure - Definition
i(X→Y) = f(n, nx, ny, nxy)
General principles:
Semantics and readability for the user
Value increases with rule quality
Sensitivity to equiprobability (inclusion)
Statistical likelihood (confidence in the measure itself)
Noise resistance, stability over time
Surprisingness, nuggets?
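As a concrete reading of this definition, the sketch below computes three standard measures as functions f of the four counts named on the slide (n, nx, ny, nxy); the example numbers are assumptions for illustration.

```python
# A minimal sketch, assuming the four counts from the slide:
# n (total transactions), nx (matching X), ny (matching Y),
# nxy (matching both). Support, confidence, and lift are standard
# definitions, not code from the cited papers.

def support(n, nx, ny, nxy):
    return nxy / n

def confidence(n, nx, ny, nxy):
    return nxy / nx

def lift(n, nx, ny, nxy):
    # Ratio of observed co-occurrence to what independence would
    # predict; equals 1 when X and Y are independent.
    return (nxy / n) / ((nx / n) * (ny / n))

# Example: 1000 transactions, X in 200, Y in 500, both in 180.
print(support(1000, 200, 500, 180),     # 0.18
      confidence(1000, 200, 500, 180),  # 0.90
      lift(1000, 200, 500, 180))        # 1.8
```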
8. Principle
Statistics are computed on the data D (transactions) for each rule R = X→Y
Interestingness measure i(R, D, H) = degree of satisfaction of the hypothesis H in D, independently of the user U
9. Properties in the Literature
Properties of i(X→Y) = f(n, nx, ny, nxy)
[Piatetsky-Shapiro 1991] (strong rules):
(P1) i = 0 if X and Y are independent
(P2) i increases with the number of examples nxy
(P3) i decreases with the premise count nx (or the conclusion count ny)
[Major & Mangano 1993]:
(P4) i increases with nxy when confidence (nxy/nx) is held constant
[Freitas 1999]:
(P5) asymmetry (i(X→Y) ≠ i(Y→X))
Small disjuncts (nuggets)
[Tan et al. 2002], [Hilderman & Hamilton 2001] and [Gras et al. 2004]
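A quick way to internalize P1-P3 is to check them numerically on Piatetsky-Shapiro's own rule-interest measure RI = nxy - nx*ny/n (listed later under "Rule Interest"); the counts below are illustrative assumptions.

```python
# A numerical check of Piatetsky-Shapiro's rule interest
# RI = nxy - nx*ny/n against properties P1-P3 above.
# The property statements come from the slide; the counts are made up.

def rule_interest(n, nx, ny, nxy):
    return nxy - nx * ny / n

n, nx, ny = 1000, 200, 500
# P1: zero when X and Y are independent (nxy = nx*ny/n = 100)
assert rule_interest(n, nx, ny, 100) == 0
# P2: increases with the number of examples nxy
assert rule_interest(n, nx, ny, 150) > rule_interest(n, nx, ny, 120)
# P3: decreases as the premise count nx grows (ny, nxy fixed)
assert rule_interest(n, 300, ny, 150) < rule_interest(n, 200, ny, 150)
print("P1-P3 hold for rule interest on this example")
```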
10. Selected Properties
Inclusion and equiprobability: reference value 0, with an interval of security
Independence: reference value 0, with an interval of security
Bounded maximum value: comparability, global threshold, inclusion
Non-linearity
Noise resistance: interval of security around independence and equiprobability
Sensitivity: to N (nuggets), dilation (likelihood)
Frequency p(X) vs. the cardinal nx
Reinforcement by similar rules (contrapositive, negative rule, …)
[Smyth & Goodman 1991], [Kodratoff 2001], [Gras et al. 2001], [Gras et al. 2004]
11. Interestingness Measure Classifying Criteria
These interestingness measures can be categorized into three classes: objective, subjective, and semantics-based.
Objective Measure: An objective measure is based only on the raw data. No knowledge about the user or application is required. Most objective measures are based on theories in probability, statistics, or information theory. Conciseness, generality, reliability, peculiarity, and diversity depend only on the data and patterns, and thus can be considered objective.
12. Interestingness Measure Classifying Criteria
Subjective Measure: A subjective measure takes into account both the data and the user of these data. To define a subjective measure, access to the user's domain or background knowledge about the data is required. This access can be obtained by interacting with the user during the data mining process or by explicitly representing the user's knowledge or expectations. In the latter case, the key issue is the representation of the user's knowledge, which has been addressed by various frameworks and procedures for data mining [Liu et al. 1997, 1999; Silberschatz and Tuzhilin 1995, 1996; Sahar 1999]. Novelty and surprisingness depend on the user of the patterns, as well as the data and patterns themselves, and hence can be considered subjective.
13. Interestingness Measure Classifying Criteria
Semantic Measure: A semantic measure considers the semantics and explanations of the patterns. Because semantic measures involve domain knowledge from the user, some researchers consider them a special type of subjective measure [Yao et al. 2006]. Utility and actionability depend on the semantics of the data, and thus can be considered semantic. Utility-based measures, where the relevant semantics are the utilities of the patterns in the domain, are the most common type of semantic measure. To use a utility-based approach, the user must specify additional knowledge about the domain. Unlike subjective measures, where the domain knowledge is about the data itself and is usually represented in a format similar to that of the discovered pattern, the domain knowledge required for semantic measures does not relate to the user's knowledge or expectations concerning the data. Instead, it represents a utility function that reflects the user's goals. This function should be optimized in the mined results. For example, a store manager might prefer association rules that relate to high-profit items over those with higher statistical significance.
16. Conciseness
A pattern is concise if it contains relatively few attribute-value pairs, while a set of patterns is concise if it contains relatively few patterns. A concise pattern or set of patterns is relatively easy to understand and remember and thus is added more easily to the user's knowledge (set of beliefs). Accordingly, much research has been conducted to find a minimum set of patterns, using properties such as monotonicity [Padmanabhan and Tuzhilin 2000] and confidence invariance [Bastide et al. 2000].
17. Generality/Coverage
A pattern is general if it covers a relatively large subset of a dataset. Generality (or coverage) measures the comprehensiveness of a pattern, that is, the fraction of all records in the dataset that matches the pattern. If a pattern characterizes more information in the dataset, it tends to be more interesting [Agrawal and Srikant 1994; Webb and Brain 2002]. Frequent itemsets are the most studied general patterns in the data mining literature. An itemset is a set of items, such as some items from a grocery basket. An itemset is frequent if its support, the fraction of records in the dataset containing the itemset, is above a given threshold [Agrawal and Srikant 1994].
The best-known algorithm for finding frequent itemsets is the Apriori algorithm [Agrawal and Srikant 1994], sketched below. Some generality measures can form the bases for pruning strategies; for example, the support measure is used in the Apriori algorithm as the basis for pruning itemsets. For classification rules, Webb and Brain [2002] gave an empirical evaluation showing how generality affects classification results. Generality frequently coincides with conciseness because concise patterns tend to have greater coverage.
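The following is a compact, illustrative sketch of Apriori's level-wise search, assuming transactions are represented as Python sets; it omits the optimized candidate generation and hashing of the original algorithm.

```python
# A compact Apriori sketch. Illustrative only; see Agrawal and
# Srikant [1994] for the full algorithm and its pruning details.
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    items = {i for t in transactions for i in t}
    # L1: frequent single items
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Candidate generation: unions of frequent (k-1)-itemsets
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

baskets = [{"nappies", "babyfood"}, {"nappies", "beer"},
           {"nappies", "babyfood", "beer"}, {"babyfood"}]
print(apriori(baskets, min_support=0.5))
```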
18. Reliability
A pattern is reliable if the relationship described by the pattern occurs in a high percentage of applicable cases. For example, a classification rule is reliable if its predictions are highly accurate, and an association rule is reliable if it has high confidence. Many measures from probability, statistics, and information retrieval have been proposed to measure the reliability of association rules [Ohsaki et al. 2004; Tan et al. 2002].
19. Peculiarity
A pattern is peculiar if it is far away from other discovered patterns according to some distance measure. Peculiar patterns are generated from peculiar data (or outliers), which are relatively few in number and significantly different from the rest of the data [Knorr et al. 2000; Zhong et al. 2003]. Peculiar patterns may be unknown to the user, and hence interesting.
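As an illustration of where peculiar data comes from, here is a minimal sketch of distance-based outlier detection in the spirit of Knorr et al. [2000]; the distance threshold, neighbour fraction, and data points are all assumptions.

```python
# A minimal sketch of distance-based outlier (peculiar data)
# detection: a record is flagged when fewer than `min_frac` of the
# other records lie within distance `d`. Parameters are illustrative.
import math

def distance_outliers(points, d, min_frac):
    out = []
    for p in points:
        near = sum(math.dist(p, q) <= d for q in points if q is not p)
        if near / (len(points) - 1) < min_frac:
            out.append(p)
    return out

data = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]
print(distance_outliers(data, d=3.0, min_frac=0.5))  # [(10, 10)]
```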
20. Diversity
A pattern is diverse if its elements differ significantly from each other, while a set of patterns is diverse if the patterns in the set differ significantly from each other. Diversity is a common factor for measuring the interestingness of summaries [Hilderman and Hamilton 2001]. According to a simple point of view, a summary can be considered diverse if its probability distribution is far from the uniform distribution. A diverse summary may be interesting because, in the absence of any relevant knowledge, a user commonly assumes that the uniform distribution will hold in a summary. According to this reasoning, the more diverse the summary is, the more interesting it is. We are unaware of any existing research on using diversity to measure the interestingness of classification or association rules.
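One simple way to operationalize "far from uniform" is to score a summary's class counts by their divergence from the uniform distribution; using KL divergence here is an illustrative choice, not the specific measure of Hilderman and Hamilton [2001].

```python
# A minimal sketch of a diversity score for a summary: the KL
# divergence of its class distribution from uniform. The choice of
# KL divergence is an illustrative assumption.
import math

def diversity(counts):
    n = sum(counts)
    k = len(counts)
    # KL(p || uniform) = sum p_i * log(p_i * k); 0 iff p is uniform.
    return sum((c / n) * math.log((c / n) * k) for c in counts if c > 0)

print(diversity([25, 25, 25, 25]))  # 0.0 -> uniform, uninteresting
print(diversity([97, 1, 1, 1]))     # large -> diverse, hence interesting
```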
21. Novelty
A pattern is novel to a person if he or she did not know it before and is not able to infer it from other known patterns. No known data mining system represents everything that a user knows, and thus novelty cannot be measured explicitly with reference to the user's knowledge. Similarly, no known data mining system represents what the user does not know, and therefore novelty cannot be measured explicitly with reference to the user's ignorance. Instead, novelty is detected by having the user either explicitly identify a pattern as novel [Sahar 1999] or notice that a pattern cannot be deduced from, and does not contradict, previously discovered patterns. In the latter case, the discovered patterns are being used as an approximation to the user's knowledge.
22. Surprisingness
A pattern is surprising (or unexpected) if it contradicts a person's existing knowledge or expectations [Liu et al. 1997, 1999; Silberschatz and Tuzhilin 1995, 1996]. A pattern that is an exception to a more general pattern which has already been discovered can also be considered surprising [Bay and Pazzani 1999; Carvalho and Freitas 2000]. Surprising patterns are interesting because they identify failings in previous knowledge and may suggest an aspect of the data that needs further study. The difference between surprisingness and novelty is that a novel pattern is new and not contradicted by any pattern already known to the user, while a surprising pattern contradicts the user's previous knowledge or expectations.
23. Utility
A pattern is of utility if its use by a person contributes to reaching a goal. Different people may have divergent goals concerning the knowledge that can be extracted from a dataset. For example, one person may be interested in finding all sales with high profit in a transaction dataset, while another may be interested in finding all transactions with large increases in gross sales. This kind of interestingness is based on user-defined utility functions in addition to the raw data [Chan et al. 2003; Lu et al. 2001; Yao et al. 2004; Yao and Hamilton 2006].
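A minimal sketch of a utility-based score follows: the same counting used for support, but weighted by an assumed per-item profit table, so a rare high-profit itemset can outrank a frequent low-profit one.

```python
# A minimal sketch of a utility-based measure: weight each matching
# transaction by a user-supplied utility (profit) instead of just
# counting it. The profit figures and items are illustrative
# assumptions, not data from the cited papers.
profit = {"caviar": 9.0, "bread": 0.3, "milk": 0.5}

def utility(itemset, transactions):
    """Total profit contributed by `itemset` across the
    transactions that contain it."""
    return sum(sum(profit[i] for i in itemset)
               for t in transactions if itemset <= t)

baskets = [{"bread", "milk"}, {"bread", "milk"}, {"caviar", "bread"}]
# Frequent but low-utility vs. rare but high-utility:
print(utility({"bread", "milk"}, baskets))  # 1.6
print(utility({"caviar"}, baskets))         # 9.0
```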
24. Actionability
A pattern is actionable (or applicable) in some domain if it enables decision making about future actions in this domain [Ling et al. 2002; Wang et al. 2002]. Actionability is sometimes associated with a pattern selection strategy. So far, no general method for measuring actionability has been devised; existing measures depend on the application. For example, Ling et al. [2002] measured actionability as the cost of changing the customer's current condition to match the objectives, whereas Wang et al. [2002] measured actionability as the profit that an association rule can bring.
26. Objective Interestingness Measures
Problems:
nappies ⇒ babyfood
nappies ⇒ beer
We can reasonably expect that sales of baby food and nappies occur together frequently, so the first rule, however strong, is mere common sense; it is the unexpected second rule that is potentially interesting.
27. Limits of Support
Support: supp(X→Y) = freq(X ∪ Y)
Measures the generality of the rule
A minimum support threshold (e.g., 10%) reduces the complexity of mining
But a specific rule (low support) that is valid (high confidence) has high potential for novelty/surprise
28. Limits of Confidence
Confidence: conf(X→Y) = P(Y|X) = freq(X ∪ Y)/freq(X)
Measures the validity/logical aspect of the rule (inclusion)
A minimal confidence threshold (e.g., 90%) reduces the number of extracted rules
Interestingness ≠ validity: confidence does not detect independence
Independence: X and Y are independent when P(Y|X) = P(Y)
If P(Y) is high, this yields a nonsense rule with high support
Ex: nappies → beer (supp = 20%, conf = 90%) is meaningless if supp(beer) = 90%
[Guillaume et al. 1998], [Lallich et al. 2004]
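The sketch below reproduces this failure mode with illustrative counts: the rule's confidence is 90%, yet its lift is about 1, revealing that the antecedent and consequent are (nearly) independent.

```python
# A minimal sketch of the independence problem on this slide: the
# rule nappies -> beer can show 90% confidence and still be
# meaningless when beer already appears in 90% of transactions.
# Lift exposes this where confidence does not. Counts are made up.
n, n_nappies, n_beer, n_both = 1000, 222, 900, 200

confidence = n_both / n_nappies                          # ~0.90
lift = (n_both / n) / ((n_nappies / n) * (n_beer / n))   # ~1.0
print(f"confidence = {confidence:.2f}")  # looks impressive
print(f"lift = {lift:.2f}")              # ~1 -> X and Y independent
```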
29. Limits of the Support-Confidence Pair
In practice, a high support threshold (10%) combined with a high confidence threshold (90%) yields rules that are valid and general: common sense, but not novelty. These measures are efficient but insufficient to capture quality.
30. Subjective interestingness measures
Unexpected (What’s interesting?):
Same condition, but different consequences
Different conditions, but same consequence
31. Subjective Interestingness Measures
General impression: gi(<S1, …, Sm>) [support, confidence]
↓
Reasonably precise concept: rpc(<S1, …, Sm → V1, …, Vg>) [support, confidence]
↓
Precise knowledge: pk(<S1, …, Sm → V1, …, Vg>) [support, confidence]
Analyzing the Subjective Interestingness of Association Rules (Bing Liu et al., 2000)
33. Objective Measures: Examples of Quality Criteria
Criteria of interestingness [Hussain 2000]:
Objective:
Generality (ex: support)
Validity (ex: confidence)
Reliability (ex: high generality and validity)
Subjective:
Common sense: reliable, but already known
Actionability: utility for decision making
Novelty: previously unknown
Surprise (unexpectedness): contradiction with expectations
34. Association Rules
Association rules [Agrawal et al. 1993]: market-basket analysis, unsupervised learning; algorithms plus two measures (support and confidence).
Problems:
An enormous number of raw rules
Little semantics in the support and confidence measures
The user needs help selecting the best rules
39. Subjective Measures: Other Subjective Measures
Projected Savings (the KEFIR system's interestingness) [Matheus & Piatetsky-Shapiro 1994]
Fuzzy Matching Interestingness Measure [Liu et al. 1996]
General Impressions [Liu et al. 1997]
Logical Contradiction [Padmanabhan & Tuzhilin 1997]
Misclassification Costs [Freitas 1999]
Vague Feelings (Fuzzy General Impressions) [Liu et al. 2000]
Anticipation [Roddick and Rice 2001]
Interestingness [Shekar & Natarajan 2001]
40. Subjective Measures: Classification

# | Interestingness Measure | Year | Application | Foundation | Scope | Subjective Aspects | User's Knowledge Representation
1 | Matheus and Piatetsky-Shapiro's Projected Savings | 1994 | Summaries | Utilitarian | Single rule | Unexpectedness | Pattern deviation
2 | Klemettinen et al.'s Rule Templates | 1994 | Association rules | Syntactic | Single rule | Unexpectedness & actionability | Rule templates
3 | Silberschatz and Tuzhilin's Interestingness | 1995 | Format independent | Probabilistic | Rule set | Unexpectedness | Hard & soft beliefs
4 | Liu et al.'s Fuzzy Matching Interestingness Measure | 1996 | Classification rules | Syntactic distance | Single rule | Unexpectedness | Fuzzy rules
5 | Liu et al.'s General Impressions | 1997 | Classification rules | Syntactic | Single rule | Unexpectedness | GI, RPK
6 | Padmanabhan and Tuzhilin's Logical Contradiction | 1997 | Association rules | Logical, statistical | Single rule | Unexpectedness | Beliefs X→Y
7 | Freitas' Attribute Costs | 1999 | Association rules | Utilitarian | Single rule | Actionability | Cost values
8 | Freitas' Misclassification Costs | 1999 | Association rules | Utilitarian | Single rule | Actionability | Cost values
9 | Liu et al.'s Vague Feelings (Fuzzy General Impressions) | 2000 | Generalized association rules | Syntactic | Single rule | Unexpectedness | GI, RPK, PK
10 | Roddick and Rice's Anticipation | 2001 | Format independent | Probabilistic | Single rule | Temporal dimension | Probability graph
11 | Shekar and Natarajan's Interestingness | 2002 | Association rules | Distance | Single rule | Unexpectedness | Fuzzy-graph-based taxonomy
41. List of Interestingness Measures (cont.)
Monodimensional (e+, e−)
Support [Agrawal et al. 1996]
Ralambondrainy [Ralambondrainy 1991]
Bidimensional – Inclusion
Descriptive-Confirm [Kodratoff 1999]
Sebag & Schoenauer [Sebag & Schoenauer 1988]
Examples/negative-examples ratio (*)
Bidimensional – Inclusion – Conditional Probability
Confidence [Agrawal et al. 1996]
Wang index [Wang et al. 1988]
Laplace (*)
Bidimensional – Analogous Rules
Descriptive Confirmed-Confidence [Kodratoff 1999] (*)
42. List of Interestingness Measures (cont.)
Tridimensional – Analogous Rules
Causal Support [Kodratoff 1999]
Causal Confidence [Kodratoff 1999] (*)
Causal Confirmed-Confidence [Kodratoff 1999]
Least Contradiction [Azé & Kodratoff 2004] (*)
Tridimensional – Linear – Independence
Pavillon index [Pavillon 1991]
Rule Interest [Piatetsky-Shapiro 1991] (*)
Pearl index [Pearl 1988], [Acid et al. 1991], [Gammerman & Luo 1991]
Correlation [Pearson 1896] (*)
Loevinger index [Loevinger 1947] (*)
Certainty factor [Tan & Kumar 2000]
Rate of connection [Bernard & Charron 1996]
Interest factor [Brin et al. 1997]
Top spin (*)
Cosine [Tan & Kumar 2000] (*)
Kappa [Tan & Kumar 2000]
43. List of Interestingness Measures (cont.)
Tridimensional – Nonlinear – Independence
Chi-squared distance
Logarithmic lift [Church & Hanks 1990] (*)
Predictive association [Tan & Kumar 2000] (Goodman & Kruskal)
Conviction [Brin et al. 1997b]
Odds ratio [Tan & Kumar 2000]
Yule's Q [Tan & Kumar 2000]
Yule's Y [Tan & Kumar 2000]
Jaccard [Tan & Kumar 2000]
Klösgen [Tan & Kumar 2000]
Interestingness [Gray & Orlowska 1998]
Mutual information ratio (uncertainty) [Tan et al. 2002]
J-measure [Smyth & Goodman 1991], [Goodman & Kruskal 1959] (*)
Gini [Tan et al. 2002]
General measure of rule interestingness [Jaroszewicz & Simovici 2001] (*)
44. List of Interestingness Measures (cont.)
Quadridimensional – Linear – Independence
Lerman index of similarity [Lerman 1981]
Index of Implication [Gras 1996]
Quadridimensional – Likelihood (conditional probability?) of Dependence
Probability of error of Chi² (*)
Intensity of Implication [Gras 1996] (*)
Quadridimensional – Inclusion – Dependence – Analogous Rules
Entropic Intensity of Implication [Gras 1996] (*)
TIC [Blanchard et al. 2004] (*)
Others
Surprisingness (*) [Freitas 1998]
+ rules of exception [Duval et al. 2004]
+ rule distance, similarity [Dong & Li 1998]
45. Belief-Based Interestingness Measure
Using a belief system is also the approach adopted by Padmanabhan and Tuzhilin for discovering exception rules that contradict belief rules.
Consider a belief X → Y and a rule A → B, where both X and A are conjunctions of atomic conditions and both Y and B are single atomic conditions on boolean attributes. A rule A → B is unexpected with respect to the belief X → Y on the dataset D if the following conditions hold:
1. B and Y logically contradict each other.
2. X ∧ A holds on a statistically large subset of tuples in D.
3. X ∧ A → B holds and, since B and Y logically contradict each other, it follows that X ∧ A → ¬Y also holds.
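A minimal sketch of checking these three conditions over a toy dataset follows; the boolean records, the belief, the rule, and the 10% threshold for "statistically large" are all illustrative assumptions.

```python
# A minimal sketch of Padmanabhan and Tuzhilin's three unexpectedness
# conditions for a rule A -> B against a belief X -> Y, over boolean
# records. Conditions are modelled as Python predicates; the data and
# the 10% "statistically large" threshold are illustrative.

def unexpected(rule_a, rule_b, belief_x, belief_y, data, min_frac=0.1):
    # 1. B and Y logically contradict each other (checked pointwise).
    contradicts = all(rule_b(r) != belief_y(r) for r in data)
    # 2. X AND A holds on a statistically large subset of the data.
    xa = [r for r in data if belief_x(r) and rule_a(r)]
    large = len(xa) >= min_frac * len(data)
    # 3. X AND A -> B holds on that subset (so X AND A -> not-Y follows).
    holds = bool(xa) and all(rule_b(r) for r in xa)
    return contradicts and large and holds

# Belief: professionals are low credit risk; rule: young professionals
# are high risk. Records are dicts of boolean attributes.
data = [{"prof": True, "young": True, "high_risk": True}] * 2 + \
       [{"prof": True, "young": False, "high_risk": False}] * 8

print(unexpected(rule_a=lambda r: r["young"],
                 rule_b=lambda r: r["high_risk"],
                 belief_x=lambda r: r["prof"],
                 belief_y=lambda r: not r["high_risk"],
                 data=data))  # True: the rule is unexpected
```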
46. Unexpectedness and the Interestingness Measures
Silberschatz and Tuzhilin used the term unexpectedness in the context of interestingness measures for pattern evaluation. They classify such measures into objective (data-driven) and subjective (user-driven) measures. According to them, from the subjective point of view, a pattern is interesting if it is:
Actionable: the end user can act on it to his or her advantage.
Unexpected: the end user is surprised by such findings.
As the authors point out, actionability is subtle and difficult to capture; they propose instead to capture it through unexpectedness, arguing that unexpected patterns are those that lead the domain expert to take action.
49. Interestingness Measures and Bayesian Belief Networks
In the framework presented by Silberschatz and Tuzhilin, the unexpectedness of a discovered pattern is evaluated against a belief system held by the user: the more the pattern disagrees with the belief system, the more unexpected it is.
There are two kinds of beliefs. On one hand, hard beliefs are always true and cannot be changed; detecting a pattern that contradicts one means that something is wrong with the data used to find that pattern. On the other hand, soft beliefs are those the user is willing to change in the light of new evidence. Each soft belief is assigned a degree specifying how confident the user is in it. The authors proposed five approaches to assigning such degrees: the Bayesian, Dempster-Shafer, frequency, Cyc, and statistical approaches.
Silberschatz and Tuzhilin claim that the Bayesian approach is the most appropriate for defining the degree of beliefs, even though any of the other approaches they define can be used.
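As a sketch of the Bayesian option, a soft belief's degree can be treated as a probability and revised with Bayes' rule when contradicting evidence arrives; the likelihood values below are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the Bayesian approach to soft-belief degrees:
# the degree is a probability, updated with new evidence E via
# Bayes' rule. Likelihood values are illustrative assumptions.

def update_degree(prior, p_e_given_belief, p_e_given_not_belief):
    """Posterior degree of belief after observing evidence E."""
    num = p_e_given_belief * prior
    den = num + p_e_given_not_belief * (1.0 - prior)
    return num / den

degree = 0.8  # initial confidence in a soft belief
# A batch of mined patterns that contradict the belief is unlikely
# if the belief is true, so the degree drops:
degree = update_degree(degree, p_e_given_belief=0.1,
                       p_e_given_not_belief=0.7)
print(f"updated degree = {degree:.2f}")  # ~0.36
```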
50. Conclusion and Future Work
Quality is a multidimensional concept:
Subjective (expert opinion): interest changes with the knowledge of the decision-maker; knowledge extraction should serve the decision-maker's objectives
Objective (data and rules): interest assessed on the data itself: inclusion, independence, imbalance, nuggets, robustness, …
What is a good index? (the ingredients of quality)
The "hybrid" interestingness:
Paradox detection
Detecting change over time
Bayesian belief networks
51. References & Bibliography
[Agrawal et al., 1993] R. Agrawal, T. Imielinski and A. Swami. Mining association rules between sets of items in large databases. Proc. of ACM SIGMOD'93, 1993, p. 207-216
[Azé & Kodratoff, 2001] J. Azé et Y. Kodratoff. Evaluation de la résistance au bruit de quelques mesures d'extraction de règles
d'association. Extraction des connaissances et apprentissage 1(4), 2001, p. 143-154
[Azé & Kodratoff, 2001] J. Azé et Y. Kodratoff. Extraction de « pépites » de connaissances dans les données : une nouvelle approche et
une étude de sensibilité au bruit. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Bayardo & Agrawal, 1999] R.J. Bayardo et R. Agrawal. Mining the most interesting rules. Proc. of the 5th Int. Conf. on Knowledge
Discovery and Data Mining, 1999, p.145-154.
[Bernadet 2000] M. Bernardet. Basis of a fuzzy knowledge discovery system. Proc. of Principles of Data Mining and Knowledge
Discovery, LNAI 1510, pages 24-33. Springer, 2000.
[Bernard et Charron 1996] J.-M. Bernard et C. Charron. L’analyse implicative bayésienne, une méthode pour l’étude des dépendances
orientées. I. Données binaires, Revue Mathématique Informatique et Sciences Humaines (MISH), vol. 134, 1996, p. 5-38.
[Berti-Equille 2004] L. Berti-équille. Etat de l'art sur la qualité des données : un premier pas vers la qualité des connaissances. Rapport
d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Blanchard et al. 2001] J. Blanchard, F. Guillet, et H. Briand. L'intensité d'implication entropique pour la recherche de règles de
prédiction intéressantes dans les séquences de pannes d'ascenseurs. Extraction des Connaissances et Apprentissage (ECA), Hermès
Science Publication, 1(4):77-88, 2002.
[Blanchard et al. 2003] J. Blanchard, F. Guillet, F. Rantière, H. Briand. Vers une Représentation Graphique en Réalité Virtuelle pour
la Fouille Interactive de Règles d’Association. Extraction des Connaissances et Apprentissage (ECA), vol. 17, n°1-2-3, 105-118, 2003.
Hermès Science Publication. ISSN 0992-499X, ISBN 2-7462-0631-5
[Blanchard et al. 2003a] J. Blanchard, F. Guillet, H. Briand. Une visualisation orientée qualité pour la fouille anthropocentrée de
règles d’association. In Cognito - Cahiers Romans de Sciences Cognitives. A paraître. ISSN 1267-8015
[Blanchard et al. 2003b] J. Blanchard, F. Guillet, H. Briand. A User-driven and Quality oriented Visualiation for Mining Association
Rules. In Proc. Of the Third IEEE International Conference on Data Mining, ICDM’2003, Melbourne, Florida, USA, November 19 - 22,
2003.
[Blanchard et al., 2004] J. Blanchard, F. Guillet, R. Gras, H. Briand. Mesurer la qualité des règles et de leurs contraposées avec le taux
informationnel TIC. EGC2004, RNTI, Cépaduès. 2004 A paraître.
[Blanchard et al., 2004a] J. Blanchard, F. Guillet, R. Gras, H. Briand. Mesure de la qualité des règles d'association par l'intensité
d'implication entropique. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Breiman et al. 1984] L. Breiman, J. Friedman, R. Olshen and C. Stone. Classification and Regression Trees. Chapman & Hall, 1984.
[Briand et al. 2004] H. Briand, M. Sebag, G. Gras et F. Guillet (eds). Mesures de Qualité pour la fouille de données. Revue des
Nouvelles Technologies de l’Information, RNTI, Cépaduès, 2004. A paraître.
[Brin et al., 1997] S. Brin, R. Motwani and C. Silverstein. Beyond Market Baskets: Generalizing Association Rules to Correlations. In
Proceedings of SIGMOD’97, pages 265-276, AZ, USA, 1997.
[Brin et al., 1997b] S. Brin, R. Motwani, J. Ullman et S. Tsur. Dynamic itemset counting and implication rules for market basket data.
Proc. of the Int. Conf. on Management of Data, ACM Press, 1997, p. 255-264.
52. References & Bibliography
[Church & Hanks, 1990] K. W. Church et P. Hanks. Word association norms, mutual information and lexicography. Computational
Linguistics, 16(1), 22-29, 1990.
[Clark & Boswell 1991] Peter Clark and Robin Boswell. Rule Induction with CN2: Some Recent Improvements. In Proceedings of the European Working Session on Learning (EWSL-91), 1991.
[Dong & Li, 1998] G. Dong and J. Li. Interestingness of Discovered Association Rules in terms of Neighborhood-Based Unexpectedness.
In X. Wu, R. Kotagiri and K. Korb, editors, Proc. of 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD `98),
Melbourne, Australia, April 1998.
[Duval et al. 2004] B. Duval, A. Salleb, C. Vrain. Méthodes et mesures d’intérêt pour l’extraction de règles d’exception. Rapport
d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Fleury 1996] L. Fleury. Découverte de connaissances pour la gestion des ressources humaines. Thèse de doctorat, Université de Nantes,
1996.
[Frawley & Piatetsky-Shapiro 1992] Frawley W. Piatetsky-Shapiro G. and Matheus C., « Knowledge discovery in databases: an
overview », AI Magazine, 14(3), 1992, pages 57-70
[Freitas, 1998] A. A. Freitas. On Objective Measures of Rule Suprisingness. In J. Zytkow and M. Quafafou, editors, Proceedings of the
Second European Conference on the Principles of Data Mining and Knowledge Discovery (PKDD `98), pages 1-9, Nantes, France,
September 1998.
[Freitas, 1999] A. Freitas. On rule interestingness measures. Knowledge-Based Systems Journal 12(5-6), 1999, p. 309-315.
[Gago & Bento, 1998 ] P. Gago and C. Bento. A Metric for Selection of the Most Promising Rules. PKDD’98, 1998.
[Gray & Orlowska, 1998] B. Gray and M. E. Orlowska. Ccaiia: Clustering Categorical Attributes into Interesting Association Rules. In
X. Wu, R. Kotagiri and K. Korb, editors, Proc. of 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD `98), pages
132 43, Melbourne, Australia, April 1998.
[Goodman & Kruskal 1959] L. A. Goodman and W. H. Kruskal. Measures of Association for Cross Classifications. II: Further Discussion and References. Journal of the American Statistical Association, ??? 1959.
[Gras et al. 1995] R. Gras, H. Briand and P. Peter. Structuration sets with implication intensity. Proc. of the Int. Conf. On Ordinal and
Symbolic Data Analysis - OSDA 95. Springer, 1995.
[Gras, 1996] R. Gras et coll.. L'implication statistique - Nouvelle méthode exploratoire de données. La pensée sauvage éditions, 1996.
[Gras et al. 2001] R. Gras, P. Kuntz, et H. Briand. Les fondements de l'analyse statistique implicative et quelques prolongements pour la
fouille de données. Mathématiques et Sciences Humaines : Numéro spécial Analyse statistique implicative, 1(154-155) :9-29, 2001.
[Gras et al. 2001b] R. Gras, P. Kuntz, R. Couturier, et F. Guillet. Une version entropique de l'intensité d'implication pour les corpus
volumineux. Extraction des Connaissances et Apprentissage (ECA), Hermès Science Publication, 1(1-2) :69-80, 2001.
[Gras et al. 2002] R. Gras, F. Guillet, et J. Philippe. Réduction des colonnes d'un tableau de données par quasi-équivalence entre
variables. Extraction des Connaissances et Apprentissage (ECA), Hermès Science Publication, 1(4) :197-202, 2002.
[Gras et al. 2004] R. Gras, R. Couturier, J. Blanchard, H. Briand, P. Kuntz, P. Peter. Quelques critères pour une mesure de la qualité des
règles d’association. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Guillaume et al. 1998] S. Guillaume, F. Guillet, J. Philippé. Improving the discovery of associations Rules with Intensity of
implication. Proc. of 2nd European Symposium Principles of data Mining and Knowledge Discovery, LNAI 1510, p 318-327. Springer
1998.
[Guillaume 2002] S. Guillaume. Discovery of Ordinal Association Rules. M.-S. Cheng, P. S. Yu, B. Liu (Eds.), Proc. of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2002, LNCS 2336, pages 322-327. Springer 2002.
53. References & Bibliography
[Guillet et al. 1999] F. Guillet, P. Kuntz, et R. Lehn. A genetic algorithm for visualizing networks of association rules. Proc. the 12th Int.
Conf. On Industrial and Engineering Appl. of AI and Expert Systems, LNCS 1611, pages 145-154. Springer 1999
[Guillet 2000] F. Guillet. Mesures de qualité de règles d’association. Cours DEA-ECD. Ecole polytechnique de l’université de Nantes.
2000.
[Hilderman & Hamilton, 1998] R. J. Hilderman and H. J. Hamilton. Knowledge Discovery and Interestingness Measures: A Survey. (KDD '98), ??? New York, 1998.
[Hilderman & Hamilton, 2001] R. Hilderman and H. Hamilton. Knowledge Discovery and Measures of Interest. Kluwer Academic Publishers, 2001.
[Hussain et al. 2001] F. Hussain, H. Liu, E. Suzuki and H. Lu. Exception Rule Mining with a Relative Interestingness Measure. ???
[Jaroszewicz & Simovici, 2001] S. Jaroszewicz et D.A. Simovici. A general measure of rule interestingness. Proc. of the 7th Int. Conf.
on Knowledge Discovery and Data Mining, L.N.C.S. 2168, Springer, 2001, p. 253-265
[Klemettinen et al. 1994] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen and A. I. Verkamo. Finding Interesting Rules from
Large Sets of Discovered Association Rules. In N. R. Adam, B. K. Bhargava and Y. Yesha, editors, Proc. of the Third International Conf. on
Information and Knowledge Management``, pages 401-407, Gaitersburg, Maryland, 1994.
[Kodratoff, 1999] Y. Kodratoff. Comparing Machine Learning and Knowledge Discovery in Databases:An Application to Knowledge
Discovery in Texts. Lecture Notes on AI (LNAI)-Tutorial series. 2000.
[Kuntz et al. 2000] P.Kuntz, F.Guillet, R.Lehn and H.Briand. A User-Driven Process for Mining Association Rules. In D. Zighed, J.
Komorowski and J.M. Zytkow (Eds.), Principles of Data Mining and Knowledge Discovery (PKDD2000), Lecture Notes in Computer
Science, vol. 1910, pages 483-489, 2000. Springer.
[Kodratoff, 2001] Y. Kodratoff. Comparing machine learning and knowledge discovery in databases: an application to knowledge
discovery in texts. Machine Learning and Its Applications, Paliouras G., Karkaletsis V., Spyropoulos C.D. (eds.), L.N.C.S. 2049, Springer,
2001, p. 1-21.
[Kuntz et al. 2001] P. Kuntz, F. Guillet, R. Lehn and H. Briand. A user-driven process for mining association rules. Proc. of Principles of
Data Mining and Knowledge Discovery, LNAI 1510, pages 483-489. Springer, 2000.
[Kuntz et al. 2001b] P. Kuntz, F. Guillet, R. Lehn, et H. Briand. Vers un processus d'extraction de règles d'association centré sur
l'utilisateur. In Cognito, Revue francophone internationale en sciences cognitives, 1(20) :13-26, 2001.
[Lallich et al. 2004] S. Lallich et O. Teytaud . Évaluation et validation de l’intérêt des règles d’association. Rapport d’activité du groupe
gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Lehn et al. 1999] R.Lehn, F.Guillet, P.Kuntz, H.Briand and J. Philippé. Felix : An interactive rule mining interface in a kdd process. In
P. Lenca (editor), Proc. of the 10th Mini-Euro Conference, Human Centered Processes, HCP’99, pages 169-174, Brest, France, September
22-24, 1999.
[Lenca et al. 2004] P. Lenca, P. Meyer, B. Vaillant, P. Picouet, S. Lallich. Evaluation et analyse multi-critères des mesures de qualité des
règles d’association. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Lerman et al. 1981] I. C. Lerman, R. Gras et H. Rostam. Elaboration et évaluation d’un indice d’implication pour les données binaires.
Revue Mathématiques et Sciences Humaines, 75, p. 5-35, 1981.
[Lerman, 1981] I. C. Lerman. Classification et analyse ordinale des données. Paris, Dunod 1981.
[Lerman, 1993] I. C. Lerman. Likelihood linkage analysis classification method, Biochimie 75, p. 379-397, 1993.
[Lerman & Azé 2004] I. C. Lerman and J. Azé. Indice probabiliste discriminant de vraisemblance du lien pour des données volumineuses. Rapport d'activité du groupe gafoQualité de l'AS GafoDonnées. A paraître dans [Briand et al. 2004].
54. References & Bibliography
[Liu et al., 1999] B. Liu, W. Hsu, L. Mun et H. Lee. Finding interesting patterns using user expectations. IEEE Transactions on
Knowledge and Data Engineering 11, 1999, p. 817-832.
[Loevinger, 1947] J. Loevinger. A systemic approach to the construction and evaluation of tests of ability. Psychological monographs,
61(4), 1947.
[Mannila & Pavlov, 1999] H. Mannila and D. Pavlov. Prediction with Local Patterns using Cross-Entropy. Technical Report,
Information and Computer Science, University of California, Irvine, 1999.
[Matheus & Piatetsky-Shapiro, 1996] C. J. Matheus and G. Piatetsky-Shapiro. Selecting and Reporting what is Interesting: The KEFIR Application to Healthcare Data. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (eds), Advances in Knowledge Discovery and Data Mining, p. 401-419, 1996. AAAI Press/MIT Press.
[Meo 2000] R. Meo. Theory of dependence values. ACM Transactions on Database Systems 5(3), p. 380-406, 2000.
[Padmanabhan et Tuzhilin, 1998] B. Padmanabhan et A. Tuzhilin. A belief-driven method for discovering unexpected patterns. Proc.
Of the 4th Int. Conf. on Knowledge Discovery and Data Mining, 1998, p. 94-100.
[Pearson, 1896] K. Pearson. Mathematical contributions to the theory of evolution. III. regression, heredity and panmixia. Philosophical
Transactions of the Royal Society, vol. A, 1896.
[Piatetsky-Shapiro, 1991] G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, Piatetsky-Shapiro G., Frawley W.J. (eds.), AAAI/MIT Press, 1991, p. 229-248
[Popovici, 2003] E. Popovici. Un atelier pour l'évaluation des indices de qualité. Mémoire de D.E.A. E.C.D., IRIN/Université
Lyon2/RACAI Bucarest, Juin 2003
[Ritschard & al., 1998] G. Ritschard, D. A. Zighed and N. Nicoloyannis. Maximiser l`association par agrégation dans un tableau croisé.
In J. Zytkow and M. Quafafou, editors, Proc. of the Second European Conf. on the Principles of Data Mining and Knowledge Discovery
(PKDD `98), Nantes, France, September 1998.
[Sebag et Schoenauer, 1988] M. Sebag et M. Schoenauer. Generation of rules with certainty and confidence factors from incomplete
and incoherent learning bases. Proc. of the European Knowledge Acquisition Workshop (EKAW'88), Boose J., Gaines B., Linster M.
(eds.), Gesellschaft für Mathematik und Datenverarbeitung mbH, 1988, p. 28.1-28.20.
[Shannon & Weaver, 1949] C.E. Shannon et W. Weaver. The mathematical theory of communication. University of Illinois Press,
1949.
[Silberschatz & Tuzhilin, 1995] Avi Silberschatz and Alexander Tuzhilin. On Subjective Measures of Interestingness in Knowledge Discovery. (KDD '95), ???, 1995.
[Smyth & Goodman, 1991] P. Smyth et R.M. Goodman. Rule induction using information theory. Knowledge Discovery in Databases,
Piatetsky- Shapiro G., Frawley W.J. (eds.), AAAI/MIT Press, 1991, p. 159-176
[Tan & Kumar 2000] P. Tan, V. Kumar. Interestingness Measures for Association Patterns : A Perspective. Workshop tutorial (KDD
2000).
[Tan et al., 2002] P. Tan, V. Kumar et J. Srivastava. Selecting the right interestingness measure for association patterns. Proc. of the 8th
Int. Conf. on Knowledge Discovery and Data Mining, 2002, p. 32-41.