In this project, we investigated the use of association rules to extract useful knowledge from raw ontological data. To this end, we proposed an approach for converting the graph representation into transactional data. We then used different technological solutions, such as the FP-growth algorithm and Hadoop, to improve the performance of frequent itemset extraction. Our code is available on GitHub: https://github.com/8-chems/OntologyMiner
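As a sketch of the graph-to-transaction idea, the snippet below turns hypothetical RDF-style triples into per-subject transactions and then mines frequent itemsets by naive support counting (standing in for FP-growth, which computes the same result more efficiently). The triples, the "predicate=object" item encoding, and the threshold are all illustrative assumptions, not the project's actual pipeline:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (subject, predicate, object) triples standing in for ontological data.
triples = [
    ("patient1", "hasSymptom", "fever"),
    ("patient1", "hasSymptom", "cough"),
    ("patient2", "hasSymptom", "fever"),
    ("patient2", "hasSymptom", "cough"),
    ("patient3", "hasSymptom", "fever"),
]

# Step 1: graph -> transactions. Each subject becomes one transaction whose
# items are its "predicate=object" pairs.
grouped = defaultdict(set)
for s, p, o in triples:
    grouped[s].add(f"{p}={o}")
transactions = list(grouped.values())

# Step 2: frequent-itemset extraction by naive support counting
# (FP-growth would return the same itemsets without enumerating all combos).
def frequent_itemsets(transactions, min_support):
    counts = defaultdict(int)
    for t in transactions:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
    n = len(transactions)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

freq = frequent_itemsets(transactions, min_support=0.6)
```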
Study of relevancy, diversity, and novelty in recommender systems - Chemseddine Berbague
In the next slides, we present our approach to handling conflicting recommendation quality objectives in recommender systems using a genetic-based clustering algorithm. In our approach, we studied users' tendencies toward diversity and proposed a pairwise similarity measure to quantify them. We then used the new similarity within a fitness function to create overlapping clusters and to produce recommendations balanced in terms of diversity and relevancy.
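One plausible reading of such a diversity-aware pairwise measure can be sketched as follows. The cosine base similarity, the catalogue-coverage proxy for diversity tendency, and the blending weight `alpha` are all assumptions made for illustration, not the measure actually proposed in the slides:

```python
import math

CATALOGUE_SIZE = 100  # hypothetical total number of items (assumption)

def cosine_sim(u, v):
    # Standard cosine similarity over two users' rating dictionaries.
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(r * r for r in u.values())) * math.sqrt(sum(r * r for r in v.values()))
    return num / den

def diversity_tendency(u):
    # Crude proxy: the share of the catalogue a user has explored.
    return len(u) / CATALOGUE_SIZE

def pairwise_similarity(u, v, alpha=0.7):
    # Blend rating agreement with agreement in diversity tendency;
    # alpha is an illustrative weight, not a tuned value.
    return alpha * cosine_sim(u, v) + (1 - alpha) * (1 - abs(diversity_tendency(u) - diversity_tendency(v)))
```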
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm - aciijournal
In this paper we attempt to solve the automatic clustering problem by concurrently optimizing multiple objectives: automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying search strategy. This evolutionary technique aims to reach a global best solution rather than a local one, even on larger datasets. The exploration and exploitation steps built into the proposed method detect the number of clusters automatically, find an appropriate partitioning of the data, and approach near-optimal values of the CVIs. Twelve datasets of varying complexity are used to validate the performance of the proposed algorithm. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance gains.
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO... - cscpconf
Optimization problems are increasingly being solved using computational intelligence. One issue that can be addressed in this context is attribute subset selection and its evaluation. This paper presents a computational intelligence technique for this optimization problem using a proposed model called the Modified Genetic Search Algorithm (MGSA), which avoids poor regions of the search space using merit and scaled fitness variables, detecting and deleting bad candidate chromosomes and thereby reducing the number of chromosomes carried into subsequent generations. The paper also shows that Rotation Forest ensembles are useful for feature selection: the base classifier is multinomial logistic regression integrated with Haar wavelets as the projection filter, and the rank of each feature is reproduced with 10-fold cross-validation. It discusses the main findings, concludes with the promising results of the proposed model, and explores the combination of MGSA for optimization with Naive Bayes classification. The results obtained with MGSA are validated mathematically using Principal Component Analysis. The goal is to improve the accuracy and quality of breast cancer diagnosis with robust machine learning algorithms. Compared to other works surveyed in the literature, the experimental results achieved in this paper are better, with statistical inference.
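A generic genetic feature-subset-selection loop in the spirit described (chromosomes encode attribute subsets; low-fitness chromosomes are culled each generation) might look as follows. The toy merit function and the set of "informative" attributes are invented for illustration and do not reproduce MGSA itself:

```python
import random

random.seed(0)
N_FEATURES = 10
INFORMATIVE = {0, 1, 4}  # hypothetical "truly useful" attributes

def fitness(chrom):
    # Toy merit: reward informative attributes, penalise subset size
    # (a stand-in for MGSA's merit and scaled-fitness variables).
    hits = sum(1 for i, bit in enumerate(chrom) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(chrom)

def evolve(pop_size=30, generations=40, keep=0.5):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        pop = pop[: int(pop_size * keep)]       # delete bad candidate chromosomes
        while len(pop) < pop_size:              # refill via crossover + mutation
            a, b = random.sample(pop[:10], 2)
            cut = random.randrange(1, N_FEATURES)
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:
                child[random.randrange(N_FEATURES)] ^= 1
            pop.append(child)
    return max(pop, key=fitness)

best = evolve()
```

In a real wrapper method the merit function would be replaced by cross-validated accuracy of a classifier trained on the selected subset.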
Heuristics for the Maximal Diversity Selection Problem - IJMER
The problem of selecting k items from among a given set of N items such that the 'diversity' among the k items is maximal is a classical problem with applications in many areas, such as forming committees, jury selection, product testing, surveys, plant breeding, ecological preservation, and capital investment. A suitably defined distance metric is used to measure diversity. However, this is a hard problem, and computing the optimal solution is intractable. In this paper we present an experimental evaluation of two approximation algorithms (heuristics) for the maximal diversity selection problem.
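A common heuristic for this problem is the greedy max-min construction: seed with the farthest pair, then repeatedly add the item whose minimum distance to the already-chosen set is largest. The paper's two heuristics are not specified here, so the sketch below is a generic example with toy 2-D points and Euclidean distance:

```python
from itertools import combinations

def greedy_max_diversity(items, k, dist):
    # Greedy max-min heuristic: seed with the farthest pair, then repeatedly
    # add the item maximising its minimum distance to the chosen set.
    a, b = max(combinations(range(len(items)), 2),
               key=lambda p: dist(items[p[0]], items[p[1]]))
    chosen = {a, b}
    while len(chosen) < k:
        nxt = max((i for i in range(len(items)) if i not in chosen),
                  key=lambda i: min(dist(items[i], items[j]) for j in chosen))
        chosen.add(nxt)
    return sorted(chosen)

# Toy 2-D points; Euclidean distance serves as the diversity metric.
points = [(0, 0), (0.1, 0), (5, 5), (0, 5), (5, 0)]
euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
sel = greedy_max_diversity(points, 3, euclid)
```

The near-duplicate point `(0.1, 0)` is never selected, which is exactly the behaviour a diversity heuristic should exhibit.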
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH - cscpconf
Due to the intangible nature of software, accurate and reliable software effort estimation is a challenge in the software industry. Very accurate estimates of software development effort cannot be expected because of the inherent uncertainty in software development projects and the complex and dynamic interaction of the factors that influence them. Heterogeneity exists in software engineering datasets because the data comes from diverse sources. It can be reduced by defining relationships between the data values, classifying them into different clusters. This study focuses on how combining clustering and regression techniques can reduce the loss of predictive efficiency caused by data heterogeneity. A clustered approach creates subsets of data with a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than the other regression techniques used in the analysis.
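The cluster-then-regress idea can be sketched on a toy heterogeneous dataset: k-means splits the data into more homogeneous subsets, and a closed-form ridge model is fit per cluster. Everything below (the two synthetic project populations, k=2, the penalty `lam`) is an illustrative assumption, not the study's actual setup:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge regression: w = (X'X + lam*I)^(-1) X'y,
    # with an unpenalised intercept column.
    Xb = np.c_[np.ones(len(X)), X]
    I = np.eye(Xb.shape[1])
    I[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + lam * I, Xb.T @ y)

def ridge_predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

def kmeans(X, k, iters=50, seed=0):
    # Plain Lloyd's algorithm.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels

# Two hypothetical project populations with different effort behaviour.
rng = np.random.default_rng(1)
X1 = rng.uniform(0, 5, (100, 1))
y1 = 3.0 * X1[:, 0] + 5 + rng.normal(0, 0.5, 100)
X2 = rng.uniform(5, 10, (100, 1))
y2 = -2.0 * X2[:, 0] + 60 + rng.normal(0, 0.5, 100)
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])

# One global ridge model vs. one ridge model per k-means cluster.
w_global = ridge_fit(X, y)
global_mse = float(((ridge_predict(w_global, X) - y) ** 2).mean())

labels = kmeans(X, 2)
errs = []
for j in (0, 1):
    w = ridge_fit(X[labels == j], y[labels == j])
    errs.append(((ridge_predict(w, X[labels == j]) - y[labels == j]) ** 2).mean())
clustered_mse = float(np.mean(errs))
```

Because each cluster is internally homogeneous, the per-cluster models fit far better than a single global model, which is the study's central observation.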
Engineering Research Publication
Best International Journals, High Impact Journals,
International Journal of Engineering & Technical Research
ISSN : 2321-0869 (O) 2454-4698 (P)
www.erpublication.org
A HYBRID COA-DEA METHOD FOR SOLVING MULTI-OBJECTIVE PROBLEMS - ijcsa
The Cuckoo Optimization Algorithm (COA) was developed for solving single-objective problems and cannot be used directly for multi-objective problems. This paper therefore develops a multi-objective cuckoo optimization algorithm based on Data Envelopment Analysis (DEA) that can obtain efficient Pareto frontiers. The algorithm uses the CCR model of DEA with its output-oriented approach. In the proposed hybrid method, the selection criterion for the next iteration is higher efficiency, so the profit function of the COA is replaced by the efficiency value obtained from DEA. The algorithm is compared with other methods on several test problems. The results show that using the combined COA and DEA approach for multi-objective problems increases both the speed and the accuracy of the generated solutions.
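Independent of the DEA machinery, the Pareto-efficiency filter at the heart of any such multi-objective method can be sketched as follows (minimisation convention; the objective points are illustrative):

```python
def dominates(a, b):
    # a dominates b (minimisation): no worse in every objective,
    # strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # Keep only the non-dominated points: the efficient frontier.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
front = pareto_front(pts)
```

In the hybrid method described above, this dominance test is effectively replaced by DEA efficiency scores, which additionally rank solutions along the frontier.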
Comparison of PCA, ICA, and LDA in Face Recognition - ijdmtaiir
Face recognition is used in a wide range of applications and has become in recent years one of the most successful applications of image analysis and understanding. Different research groups, using different statistical methods, have reported contradictory results when comparing the principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA) algorithms proposed in recent years. The goal of this paper is to compare and analyze the three algorithms and conclude which is best. The FERET dataset is used for consistency.
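Of the three, PCA is the simplest to sketch: project mean-centred face vectors onto the top right-singular vectors of the data matrix (the "eigenfaces" basis). The random matrix below merely stands in for a real face dataset such as FERET:

```python
import numpy as np

def pca_project(X, n_components):
    # PCA via SVD of the mean-centred data matrix: rows of Vt are the
    # principal axes ("eigenfaces" when X holds face images as vectors).
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))      # 20 hypothetical "face" vectors of dimension 50
Z, comps, mean = pca_project(X, 5)
```

Recognition then reduces to nearest-neighbour matching in the low-dimensional projection `Z`; ICA and LDA differ in how the projection basis is chosen, not in this overall pipeline.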
A Genetic Algorithm on Optimization Test Functions - IJMERJOURNAL
ABSTRACT: Genetic Algorithms (GAs) have become increasingly useful over the years for solving combinatorial problems. Though they are generally accepted to be good performers among metaheuristic algorithms, most work has concentrated on applications of GAs rather than their theoretical justification. In this paper, we examine and justify the suitability of Genetic Algorithms for solving complex, multi-variable, and multi-modal optimization problems. To do this, a simple Genetic Algorithm was used to optimize four standard optimization test functions, namely the Rosenbrock, Schwefel, Rastrigin, and Shubert functions. These functions are benchmarks for testing how well an optimization procedure approaches a global optimum. We show that the method converges quickly to the global optima and that the optimal values found for the Rosenbrock, Rastrigin, Schwefel, and Shubert functions are zero (0), zero (0), -418.9829, and -14.5080 respectively.
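A minimal GA of the kind described, applied to the Rastrigin benchmark, can be sketched as follows. The operator choices (elitism, arithmetic crossover, decaying Gaussian mutation) are illustrative assumptions and not necessarily those of the paper:

```python
import math
import random

def rastrigin(x):
    # Rastrigin benchmark; global minimum f(0, ..., 0) = 0.
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def simple_ga(f, dim=2, pop_size=60, gens=150, bound=5.12, seed=42):
    rng = random.Random(seed)
    pop = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(pop_size)]
    for g in range(gens):
        pop.sort(key=f)
        elite = pop[: pop_size // 5]                         # elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)                      # parent selection
            child = [(ai + bi) / 2 for ai, bi in zip(a, b)]  # arithmetic crossover
            sigma = 0.3 * bound * (1 - g / gens)             # decaying Gaussian mutation
            child = [min(bound, max(-bound, ci + rng.gauss(0, sigma))) for ci in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=f)

best = simple_ga(rastrigin)
```

Because Rastrigin is highly multi-modal, the mutation schedule matters: broad early mutation explores between basins, while the decay lets the population settle into one.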
The pertinent single-attribute-based classifier for small datasets classific... - IJECEIAES
Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used in such cases thanks to its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based-heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. SAB-HR applies a feature selection method that uses the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository show that the SAB-HR classifier significantly outperforms the classical OneR classifier on small datasets. In addition, using the H-Ratio as the criterion for selecting the single attribute was more effective than traditional criteria such as Information Gain (IG) and Gain Ratio (GR).
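For reference, the classical OneR baseline that SAB-HR is compared against picks the single attribute whose one-level rule makes the fewest training errors. A sketch on toy data (the H-Ratio measure itself is not reproduced here):

```python
from collections import Counter

def one_r(X, y):
    # OneR: for each attribute build a rule mapping each value to its majority
    # class; keep the attribute whose rule makes the fewest training errors.
    best_attr, best_rule, best_err = None, None, float("inf")
    for a in range(len(X[0])):
        rule = {}
        for v in {row[a] for row in X}:
            labels = [yi for row, yi in zip(X, y) if row[a] == v]
            rule[v] = Counter(labels).most_common(1)[0][0]
        err = sum(rule[row[a]] != yi for row, yi in zip(X, y))
        if err < best_err:
            best_attr, best_rule, best_err = a, rule, err
    return best_attr, best_rule

# Toy data: attribute 0 predicts the class perfectly, attribute 1 does not.
X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
attr, rule = one_r(X, y)
```

SAB-HR keeps this single-attribute structure but swaps the attribute-selection criterion from training error to the H-Ratio measure.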
Comparison of methods for combination of multiple classifiers that predict b... - IJERA Editor
Predictive analysis includes techniques from data mining that analyze current and historical data and make predictions about the future. Predictive analytics is used in actuarial science, financial services, retail, travel, healthcare, insurance, pharmaceuticals, marketing, telecommunications, and other fields. Predicting patterns can be treated as a classification problem, and combining different classifiers gives better results. We study and compare three methods for combining multiple classifiers. Naive Bayesian networks perform classification based on conditional probability; they are efficient and easy to interpret, but assume that the predictors are independent. Tree-augmented naive Bayes (TAN) constructs a maximum weighted spanning tree that maximizes the likelihood of the training data to perform classification; this tree structure eliminates the independence assumption of naive Bayesian networks. The behavior-knowledge space method works in two phases and can provide very good performance when large and representative datasets are available.
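The conditional-probability classification that naive Bayesian networks perform can be sketched with a categorical naive Bayes using Laplace smoothing (toy data; the independence assumption baked in below is exactly what TAN later relaxes):

```python
import math
from collections import Counter, defaultdict

def nb_train(X, y):
    # Per-class counts for P(class) and P(value | attribute, class).
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)
    vocab = defaultdict(set)
    for row, c in zip(X, y):
        for a, v in enumerate(row):
            value_counts[(a, c)][v] += 1
            vocab[a].add(v)
    return class_counts, value_counts, vocab

def nb_predict(model, row):
    # Pick the class maximising log P(class) + sum log P(value | class),
    # with Laplace smoothing to handle unseen attribute values.
    class_counts, value_counts, vocab = model
    n = sum(class_counts.values())
    scores = {}
    for c, cc in class_counts.items():
        lp = math.log(cc / n)
        for a, v in enumerate(row):
            lp += math.log((value_counts[(a, c)][v] + 1) / (cc + len(vocab[a])))
        scores[c] = lp
    return max(scores, key=scores.get)

# Toy weather-style data.
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool"), ("rain", "hot")]
y = ["no", "no", "yes", "yes", "yes"]
model = nb_train(X, y)
```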
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP... - cscpconf
In search-based test data generation, the problem of test data generation is reduced to one of function minimization or maximization. Traditionally, for branch testing, the problem has been formulated as a minimization problem. In this paper we define an alternate maximization formulation and experimentally compare it with the minimization formulation. We use a genetic algorithm as the search technique and, in addition to the usual genetic operators, employ the path-prefix strategy for branch ordering together with memory and elitism. Results indicate no significant difference in performance or coverage between the two approaches, and either could be used for test data generation when coupled with the path-prefix strategy, memory, and elitism.
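The reduction to function optimisation works through branch distances: a fitness of zero means the target branch is taken. The sketch below shows both formulations for a hypothetical predicate `x * x == 49`; the constant `K` in the maximisation form is an arbitrary illustrative choice:

```python
def branch_distance(a, b):
    # Branch distance for the predicate "a == b": zero iff the branch is taken.
    return abs(a - b)

def fitness_min(x):
    # Minimisation formulation: drive the distance for a hypothetical
    # target branch "x * x == 49" down to zero.
    return branch_distance(x * x, 49)

def fitness_max(x, K=10_000):
    # Alternate maximisation formulation of the same search problem.
    return K - fitness_min(x)

# Exhaustive search stands in here for the genetic algorithm's search.
best = min(range(-100, 101), key=fitness_min)
```

Both formulations rank candidate inputs identically, which is consistent with the paper's finding that neither offers a significant advantage.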
Predicting an Applicant Status Using Principal Component, Discriminant and Lo... - inventionjournals
Constructing a classification model for a particular task is important in machine learning. A classification process assigns objects to predefined groups or classes based on a number of observed attributes related to those objects. The artificial neural network is one classification algorithm that can be used in many application areas. This paper investigates the potential of the feed-forward neural network architecture for classifying medical datasets. The migration-based differential evolution (MBDE) algorithm is chosen and applied to the feed-forward neural network to enhance the learning process, and the network's learning is validated in terms of convergence rate and classification accuracy. In this paper, the MBDE algorithm with various migration policies is proposed for classification problems in medical diagnosis.
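MBDE extends classic differential evolution with migration between subpopulations; the underlying DE/rand/1/bin generation step it builds on can be sketched as follows. The sphere objective stands in for network training error, and all parameters are illustrative (the migration policies themselves are not reproduced):

```python
import random

def de_step(pop, f, F=0.8, CR=0.9, rng=random):
    # One generation of classic DE/rand/1/bin, the base algorithm that
    # migration-based DE extends with migration between subpopulations.
    nxt = []
    for i, x in enumerate(pop):
        a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
        mutant = [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        jrand = rng.randrange(len(x))
        trial = [mutant[j] if (rng.random() < CR or j == jrand) else x[j]
                 for j in range(len(x))]
        nxt.append(trial if f(trial) <= f(x) else x)
    return nxt

# Toy objective standing in for feed-forward network training error.
sphere = lambda x: sum(v * v for v in x)
rng = random.Random(0)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(30)]
for _ in range(100):
    pop = de_step(pop, sphere, rng=rng)
best = min(pop, key=sphere)
```

When training a network this way, each candidate vector encodes the full set of connection weights and `f` evaluates the network's loss on the training data.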
Credit scoring categorizes customers based on various characteristics to evaluate their creditworthiness. Increasingly, machine learning techniques are being deployed for customer segmentation, classification, and scoring. In this talk, we discuss various machine learning techniques that can be used for credit risk applications. Through a case study built in R, we illustrate the nuances of working with practical datasets that include categorical and numerical data, different techniques for evaluating and exploring customer profiles, visualizing high-dimensional datasets, and machine learning techniques for customer segmentation.
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ... - QuantUniversity
As the complexity of AI and machine learning processes increases, robust data pipelines need to be developed for industrial-scale model development and deployment. In regulated industries such as finance and healthcare, where automated decision making is increasingly used, tracking the design of experiments from inception to deployment is critical to ensuring a robust process. Model life-cycle management solutions have been proposed to track experiments, design robust experiments for hyperparameter tuning, optimize and select models, and monitor them. The number of choices and parameters that need to be tracked makes it significantly challenging to trace experiments and address reproducibility concerns.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes, primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
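The core blockchain idea here, an append-only, hash-chained log in which tampering with any recorded change invalidates every later hash, can be sketched as follows (this is a generic illustration, not QuTrack's actual API or data model):

```python
import hashlib
import json

def record_change(chain, entry):
    # Append-only, hash-chained log: each block commits to the previous
    # block's hash, so edits to history become detectable.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"entry": entry, "prev": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    chain.append(block)
    return chain

def verify(chain):
    # Recompute every hash and check each block points at its predecessor.
    for i, block in enumerate(chain):
        body = {"entry": block["entry"], "prev": block["prev"]}
        h = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if h != block["hash"] or (i > 0 and block["prev"] != chain[i - 1]["hash"]):
            return False
    return True

# Hypothetical model-change entries for an experiment audit trail.
chain = []
record_change(chain, {"model": "credit-risk-v1", "lr": 0.01})
record_change(chain, {"model": "credit-risk-v2", "lr": 0.005})
```

In a full system the entries would capture hyperparameters, dataset versions, and deployment events, giving auditors a tamper-evident record of the model life cycle.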
Automating Data Science over a Human Genomics Knowledge Base - Vaticle
Radouane Oudrhiri, the CTO of Eagle Genomics, will talk about how Eagle Genomics is building a platform for automating data science over a human genomics knowledge base. Rad will dive into the architecture of the Eagle Genomics platform and discuss how Grakn serves as the knowledge-base foundation of the system. Rad will also give a brief history of databases and semantic expressiveness, and explain how Grakn fits into the big picture.
# Radouane Oudrhiri, CTO, Eagle Genomics
Radouane has extensive experience in leading world-class software and data-intensive system developments in different industries, from Telecom to Healthcare, Nuclear, Automotive, and Financial services. Radouane is a Lean/Six Sigma Master Black Belt with a speciality in high-tech, IT and software engineering, and he is recognised as a leader and early adopter of Lean/Six Sigma and DFSS for IT and software. He is a fellow of the Royal Statistical Society (RSS) and member of the ISO Technical Committee (TC69: Applications of Statistical Methods), where he is co-author of the Lean & Six Sigma Standard (ISO 18404) as well as the new standard under development (Design for Six Sigma). He is also part of the newly formed international Group on Big Data (nominated by BSI as the UK representative/expert). Radouane has also been Chair of the working group on Measurement Systems for Automated Processes/Systems within the ISPE (International Society for Pharmaceutical Engineering).
Agile analytics : An exploratory study of technical complexity managementAgnirudra Sikdar
The thesis reviewed various case studies to determine the types of modelling, choices of algorithm, and types of analytical approaches, and to identify the various complexities arising from these cases. From these reviews, procedures have been proposed to improve efficiency and manage the various types of complexity from an agile methodological perspective. The focus was mostly on customer segmentation and clustering, with the purpose of bridging Big Data and Business Intelligence using analytics.
Innovations in technology have revolutionized financial services to the extent that large financial institutions like Goldman Sachs are claiming to be technology companies! It is no secret that technological innovations like data science and AI are fundamentally changing how financial products are created, tested and delivered. While it is exciting to learn about the technologies themselves, there is very little guidance available on how companies and financial professionals should retool and gear themselves towards the upcoming revolution.
In this master class, we will discuss key innovations in Data Science and AI and connect applications of these novel fields in forecasting and optimization. Through case studies and examples, we will demonstrate why now is the time you should invest to learn about the topics that will reshape the financial services industry of the future!
AI in Finance
Adopting Data Science and Machine Learning in the financial enterpriseQuantUniversity
Financial firms are taking AI and machine learning seriously to augment traditional investment decision making. Alternative datasets, text analytics, cloud computing, and algorithmic trading are game changers for many firms who are adopting technology at a rapid pace. As more and more open-source technologies penetrate enterprises, quants and data scientists have a plethora of choices for building, testing and scaling quantitative models. Even though there are multiple solutions and platforms available to build machine learning solutions, challenges remain in adopting machine learning in the enterprise. In this talk we will illustrate a step-by-step process to enable replicable AI/ML research within the enterprise using QuSandbox.
[UPDATE] Udacity webinar on Recommendation SystemsAxel de Romblay
A 1h webinar on RecSys for the Udacity NanoDegree Program "How to become a Data Scientist" : https://in.udacity.com/course/data-scientist-nanodegree--nd025.
The link to the ipynb : https://www.kaggle.com/axelderomblay/udacity-workshop-on-recommendation-systems
Applying soft computing techniques to corporate mobile security systemsPaloma De Las Cuevas
Corporate workers increasingly use their own devices for work purposes, in a trend that has come to be called the "Bring Your Own Device" (BYOD) philosophy, and companies are starting to include it in their policies. For this reason, corporate security systems need to be redefined and adapted to these emerging behaviours by the corporate Information Technology (IT) department. This work proposes applying soft-computing techniques to help the Chief Security Officer (CSO) of a company (in charge of the IT department) improve the security policies.
The actions performed by company workers under a BYOD situation will be treated as events: an action or set of actions yielding a response. Some of those events might cause non-compliance with some corporate policies, and it would then be necessary to define a set of security rules (action, consequence). Furthermore, processing the extracted knowledge will allow the rules to be adapted.
Intro to Data Science for Non-Data ScientistsSri Ambati
Erin LeDell and Chen Huang's presentations from the Intro to Data Science for Non-Data Scientists Meetup at H2O HQ on 08.20.15
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Similar to Ontologies mining using association rules (20)
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms like PageRank are often run over Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting primitives for graph : SHORT REPORT / NOTES
Ontologies mining using association rules
1. Presented by:
ChemsEddine Berbague STIC May. 2015
Supervisor: Pr. Seridi Hassina
Co-supervisor: Dr. Beldjoudi Samia
Jury members:
Dr. Hariati Mehdi
Dr. Mendjel Mehdi
Master Project Presentation
«Association Rules Mining: Ontological Approach»
University of Badji Mokhtar-Annaba
Computer Science Department
2. 2
• The work presented in the next slides builds on
and extends the work of:
▫ Claudia Marinica, 2010 (Association Rule Interactive Post-processing
using Rule Schemas and Ontologies - ARIPSO).
Note
5. 5
Context and project focus
This project is about two main tasks:
• Knowledge extraction.
• Ontology enrichment.
• Focus: mining ontologies using association rules to extract useful
knowledge.
6. 6
Knowledge extraction: general scheme and definition
"...extracting from data original information, previously
unknown, and potentially useful."
[fayyad et al.,1996]
8. 8
STEP 1: generate frequent item-sets
▫ Reduce the number of item-sets using a support threshold.
▫ Reduce the number of comparisons: the FP-Growth algorithm, calculating supports with advanced data structures such as hash sets.
STEP 2: generate association rules
• APRIORI is one of the best-known algorithms used for association rules
extraction. It identifies frequent sets in transactional datasets.
Data mining: association rules algorithm
11. 11
The steps of the APRIORI algorithm
Step 1
Scan the transactional dataset to get the support of
each 1-item-set.
Step 2
Use L(k−1) join L(k−1) to generate the set of
candidate k-item-sets.
Step 3
Scan the transactional dataset to get the support of
each candidate k-item-set, then filter the candidates against
min_sup to get the set L(k) of frequent k-item-sets.
Step 4
Repeat steps 2-3 until the set of candidates is empty.
Step 5
For every frequent item-set E, generate
all non-empty proper subsets s.
Step 6
For each non-empty subset s of E,
generate the rule "s => (E−s)" if the
confidence C [support(E) / support(s)] >
min_conf
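The steps above can be sketched in plain Python. This is an illustrative reconstruction under simplifying assumptions, not the project's actual code; the `apriori` function and its parameter names are chosen here:

```python
from itertools import combinations

def apriori(transactions, min_sup, min_conf):
    """Plain-Python sketch of the APRIORI steps (illustrative, not optimized)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: support of each 1-item-set, filtered against min_sup
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_sup}
    frequent = set(level)

    # Steps 2-4: join L(k-1) with itself, count supports, filter, repeat
    k = 2
    while level:
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if support(c) >= min_sup}
        frequent |= level
        k += 1

    # Steps 5-6: rules s => (E - s) with confidence support(E) / support(s)
    rules = []
    for E in frequent:
        if len(E) < 2:
            continue
        for r in range(1, len(E)):
            for s in map(frozenset, combinations(E, r)):
                conf = support(E) / support(s)
                if conf >= min_conf:
                    rules.append((set(s), set(E - s), conf))
    return frequent, rules
```

For example, `apriori([["milk", "bread"], ["milk", "butter"], ["milk", "bread", "butter"], ["bread", "butter"]], 0.5, 0.6)` yields the frequent pairs and rules such as {milk} => {bread}. The repeated dataset scans in `support` are exactly the cost that the later FP-Growth and Hadoop slides attack.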
12. 12
• APRIORI is limited in the different steps of the extraction process:
Large number of extracted rules.
Semantically meaningless confidence and support measures.
User help is required to extract the targeted rules.
The worst-case complexity of the algorithm is exponential in the number of items.
APRIORI: limitations
13. 13
• Advantages: unsupervised technique, readable results, complete sets.
• Limits: large volume and low quality of the extracted rules:
• statistically invalid:
▫ Onions => Bread
• redundant:
▫ R1: X, Y => Z [c]; X => Y [c1]; X => Z [c2]
▫ c1 > c or c2 > c => R1 is redundant
• already known to the expert:
▫ X => Y (the rule can be deduced from the context)
• useless for the expert:
▫ X => Y (the rule is semantically meaningless, such as "apple implies skirt")
• Difficulty of manual analysis.
• Complexity of the algorithm (exponential in the worst case).
• Needs:
▫ Eliminate the uninteresting rules.
▫ Target the high-quality rules.
Data mining: association rules — the problem
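The redundancy criterion above can be checked mechanically. The helper below is a hypothetical sketch (the `redundant` function and its rule encoding are assumptions made here, not part of the project):

```python
def redundant(rules):
    """Flag rules redundant under the slide's criterion: X, Y => Z [c] is
    redundant when X => Y [c1] or X => Z [c2] exists with c1 > c or c2 > c.
    `rules` maps (antecedent, consequent) frozenset pairs to a confidence."""
    flagged = set()
    for (ante, cons), c in rules.items():
        if len(ante) < 2:
            continue  # only composite antecedents can be simplified
        for x in ante:
            rest = ante - {x}
            # Simpler rules built from a single antecedent item X
            simpler = [(frozenset({x}), rest), (frozenset({x}), cons)]
            if any(rules.get(r, 0.0) > c for r in simpler):
                flagged.add((ante, cons))
    return flagged

# Toy rule base: R1: X, Y => Z [0.5] and X => Y [0.7], so R1 is redundant
rules = {
    (frozenset({"X", "Y"}), frozenset({"Z"})): 0.5,
    (frozenset({"X"}), frozenset({"Y"})): 0.7,
}
```

Calling `redundant(rules)` on this toy base flags R1 and leaves X => Y untouched.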
14. « An ontology is an explicit and formal
specification of a shared
conceptualization » [Gruber, 1993]
Knowledge engineering:
the ontologies
« Introducing an ontology in an
information system reduces the
conceptual and terminological confusion
and offers a shared understanding that
enhances communication,
sharing, interpretation, and
possible re-use » [Gandon, 2006]
Formal definition:
O = {C, G, I, P}
C = Concepts: elements of the domain.
G = Graph of concepts: the is-a relation.
I = Instances: individuals of a concept.
P = Properties: relations between concepts.
[Figure: example food taxonomy — Food product → Fruit (grape → red grape, green grape; apple; pear), Dairy product (milk, cheese, butter), Meat (chicken, beef)]
15. 15
« The semantic web is an extension of the current web in which information is represented
semantically, allowing machines and users to work better
together »
[Berners-Lee et al., 2001]
• Knowledge representation languages:
▫ RDF, OWL, ...
▫ OWL-DL is based on description logic and can be defined by an accurate and
decidable formalism.
• Reasoning engines:
▫ Services: classification of concepts, coherence testing, and instantiation testing.
▫ FaCT, Racer, Pellet, ...
▫ Querying language: SPARQL.
Knowledge engineering: semantic web
16. 16
• Increase the use of ontologies in the association rules
extraction process:
• Convert the ontologies into a transactional dataset.
▫ Benefit from the semantic richness to improve the quality of the association
rules.
▫ Reduce the complexity of the classical association rules algorithms.
Objectives
17. 17
I. A new method to extract transactional information from
ontologies.
II. An application to extract, validate, and
visualize association rules.
III. Using the Hadoop framework to extract frequent item-sets.
IV. Experiments on the NiceTag ontology.
Contribution
19. 19
• FP-Growth identifies all frequent item-sets without generating candidate item-sets.
• A two-step approach:
▫ Step 1: Build a compact data structure named the FP-tree. This step requires two
scans of the dataset.
▫ Step 2: Extract frequent item-sets directly from the FP-tree.
Complexity problem: the FP-Growth algorithm
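As a rough illustration of the two steps, here is a compact pure-Python FP-Growth. It is an educational sketch, not the implementation used in the project; all names are chosen here:

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a parent link, a count, and child nodes."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(transactions, min_count):
    # Pass 1: count single items and keep only the frequent ones
    counts = defaultdict(int)
    for t in transactions:
        for i in t:
            counts[i] += 1
    freq = {i for i, c in counts.items() if c >= min_count}
    root, links = Node(None, None), defaultdict(list)
    # Pass 2: insert transactions, items sorted by descending frequency
    for t in transactions:
        path = sorted((i for i in t if i in freq), key=lambda i: (-counts[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                links[i].append(node.children[i])  # header-table link
            node = node.children[i]
            node.count += 1
    return root, links

def fpgrowth(transactions, min_count, suffix=frozenset()):
    """Recursively mine frequent item-sets without candidate generation."""
    _, links = build_tree(transactions, min_count)
    result = {}
    for item, nodes in links.items():
        support = sum(n.count for n in nodes)
        itemset = suffix | {item}
        result[itemset] = support
        # Conditional pattern base: the prefix paths leading to `item`
        cond = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            cond.extend([path] * n.count)
        result.update(fpgrowth(cond, min_count, itemset))
    return result
```

For example, `fpgrowth([["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "b", "d"]], 2)` returns every frequent item-set with its absolute support count.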
22. 22
• Features of interesting rules [Silberschats et Tuzhilin,1995]:
▫ Novelty: rules unexpected for the expert.
▫ Actionability: useful rules that allow an expert to take decisions.
• Quality metrics [Freitas,1999]:
▫ Objective measures.
▫ Subjective measures.
• Objective metrics (data-based)
[Piatetsky-shapiro,1991; Guillet and Hamilton,2007]:
▫ Data-based statistical indicators of the significance of association rules.
▫ Advantage: unsupervised quality metrics are easy to apply.
▫ Disadvantage: not adequate for personalized criteria.
Quality problem: metrics
23. 23
• Models [klementtinen et al., 1994]
• Principle: the expert defines expectations against which the association rules are selected.
• Representing the expert's expectations:
• inclusive pattern (IP) and restrictive pattern (RP).
• Selection technique: syntactic.
• Example:
▫ (IP) Fruit, Dairy products => Meat
▫ (RP) Pear, Dairy products => Meat
▫ R1: Pear, Milk => Pork
▫ R2: Apple, Milk => Chicken
▫ R3: Beef, Milk => Grape
• Only R2 is selected: it matches the inclusive pattern and not the restrictive one.
Quality problem: models
[Figure: the example food taxonomy again — Fruit (grape → red grape, green grape; apple; pear), Dairy product (milk, cheese, butter), Meat (chicken, beef)]
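The syntactic selection in the example can be sketched as follows. The taxonomy dictionary and helper names are assumptions made for illustration, mirroring the food taxonomy on this slide:

```python
# Assumed toy taxonomy (child -> parent) mirroring the food ontology example
PARENT = {
    "apple": "fruit", "pear": "fruit", "grape": "fruit",
    "milk": "dairy", "cheese": "dairy",
    "chicken": "meat", "beef": "meat", "pork": "meat",
    "fruit": "food", "dairy": "food", "meat": "food",
}

def ancestors(item):
    """All concepts generalizing `item` in the taxonomy."""
    out = set()
    while item in PARENT:
        item = PARENT[item]
        out.add(item)
    return out

def matches(rule_side, pattern_side):
    """Each pattern concept must generalize (or equal) a distinct rule item."""
    items = list(rule_side)
    def assign(p_idx, used):
        if p_idx == len(pattern_side):
            return True
        for k, it in enumerate(items):
            if k not in used and pattern_side[p_idx] in ancestors(it) | {it}:
                if assign(p_idx + 1, used | {k}):
                    return True
        return False
    return assign(0, frozenset())

def select(rules, inclusive, restrictive):
    """Keep rules matching the inclusive pattern but not the restrictive one."""
    keep = []
    for ante, cons in rules:
        inc = matches(ante, inclusive[0]) and matches(cons, inclusive[1])
        res = matches(ante, restrictive[0]) and matches(cons, restrictive[1])
        if inc and not res:
            keep.append((ante, cons))
    return keep
```

Running `select` on R1-R3 with the patterns (IP) Fruit, Dairy => Meat and (RP) Pear, Dairy => Meat keeps only R2, as on the slide.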
24. 24
Quality problem: the post-processing technique
I. Association rules extraction using the classical method.
II. Knowledge model: enrichment of a model by an expert.
III. Post-processing phase (ARIPSO [Claudia Marinica. 2010]): apply
pruning/selection models.
25. 25
• Previous approaches make only limited use of ontologies.
• Using filtering models is a hard process that depends on the presence
of an expert.
Conclusion
26. 26
Outline
1. Introduction
2. Existing approaches
3. Proposed approach: ontological approach, description logic, semantic web and ontologies
4. Application
5. Conclusion
27. 27
Ontological layers: T-Box & A-Box
T-Box:
• Define axioms
• Infer and classify concepts
• Infer associations
• Test equivalence, implication, and satisfiability
A-Box:
• Attribute assertions, concept assertions, association assertions
• Consistency verification, satisfiability verification
• Get/search, instance verification, training, coherence testing
Other services: identity evaluation, homonymy handling, searching the text.
Reasoning:
« Extracting from a knowledge base means uncovering hidden information »
29. 29
• There exist many syntaxes to represent an ontology, among them:
▫ Manchester OWL Syntax
▫ OWL/XML
▫ OWL Functional Syntax
▫ RDF/XML
▫ Turtle
▫ LaTeX
• The OWL API makes it possible to query the ontology in different ways.
Ontologies: representation syntaxes
<owl:Class rdf:ID="Lait">
<rdfs:subClassOf
rdf:resource="&food;PotableLiquid"/>
<rdfs:label xml:lang="en">milk</rdfs:label>
<rdfs:label xml:lang="fr">lait</rdfs:label>
</owl:Class>
30. 30
• Exploit the semantic richness to:
▫ Extract transactions:
Step 1: extract a T-Box model.
Step 2: extract an A-Box model.
▫ Apply an extraction algorithm to generate the association rules.
How do we achieve this task?
Association rules extraction: the ontological approach
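One way to picture the T-Box/A-Box extraction is to flatten property assertions into per-subject transactions. This is a hypothetical sketch: the triples and the "-" inverse-direction convention mirror the tables on the following slides, not the project's actual extractor:

```python
# Hypothetical mini-ontology as (subject, property, object) triples
TBOX = [
    ("patient", "p1", "disease"), ("patient", "p2", "drug"),
    ("patient", "p5", "person"), ("disease", "p7", "drug"),
]
ABOX = [
    ("pat10", "p1", "disease12"), ("pat10", "p2", "drug23"),
    ("pat12", "p1", "disease12"), ("pat12", "p2", "drug24"),
]

def to_transactions(triples):
    """One transaction per subject; items are <property, object> pairs,
    with a '-' suffix marking the inverse direction on the object's row."""
    tx = {}
    for s, p, o in triples:
        tx.setdefault(s, set()).add((p, o))
        tx.setdefault(o, set()).add((p + "-", s))  # inverse link
    return tx
```

Applied to `TBOX` this yields concept-level transactions (e.g. the "patient" row contains `("p1", "disease")` and the "disease" row contains `("p1-", "patient")`); applied to `ABOX` it yields instance-level transactions, ready for any of the frequent item-set algorithms.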
31. 31
Ontological approach: general scheme
1. Ontology manager: T-Box extraction and A-Box extraction produce a T-Box table and an A-Box table.
2. Transactions extraction: concepts-based filtering and instances-based filtering turn these tables into transactions.
3. Association rules extraction: the user chooses an algorithm (APRIORI, FP-tree, or Hadoop) to produce the association rules and frequent item-sets.
4. Validation and visualisation of the rules, with user filtering.
32. 32
T-Box layer (1)
ID | Item-sets
patient | <p1, disease>, <p2, drug>, <p3, cardiologist>, <p4, gynecologist>, <p5, person>
disease | <p6-, symptom>, <p1-, patient>, <p7, drug>
doctor | <p8-, cardiologist>, <p9-, gynecologist>, <p10, person>
symptom | <p6, disease>
drug | <p2-, patient>, <p7-, disease>
cardiologist | <p8, doctor>, <p3-, patient>
gynecologist | <p9, doctor>, <p4-, patient>
person | <p5-, patient>, <p10-, doctor>
[Figure: ontology graph linking patient, doctor, disease, symptom, drug, person, cardiologist, and gynecologist through properties p1-p10]
33. 33
A-Box layer (2)
ID | Item-sets
Pat 10 | <p1, disease 12>, <p2, drug 23>, <p3, cardiolo x>, <p5, person>
Pat 12 | <p1, disease 12>, <p2, drug 24>, <p3, cardiolo x>, <p5, person>
doct 23 | <p8-, cardiolo>, <p10, person>
symptom 45 | <p6, disease 12>
[Figure: the same ontology graph, instantiated at the instance level]
34. 34
Frequent item-sets extraction using Hadoop
• Input: the files of the ontology (the transactional context).
• MAP: identify all possible k-item-sets.
• REDUCE: calculate the support of each k-item-set.
• Output: the result files.
Using Hadoop to extract frequent item-sets
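A local, Hadoop-free simulation of this MAP/REDUCE round might look like the following. It is illustrative only; the function names are chosen here, and a real job would run the same logic as a Mapper/Reducer over HDFS files:

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, k):
    """MAP: emit (k-item-set, 1) for every k-item-set in one transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1

def reduce_phase(pairs):
    """REDUCE: sum the counts emitted for each k-item-set (its support count)."""
    counts = defaultdict(int)
    for itemset, c in pairs:
        counts[itemset] += c
    return dict(counts)

# Local simulation of one MapReduce round for k = 2
transactions = [["milk", "bread"], ["milk", "butter"], ["milk", "bread", "butter"]]
pairs = [kv for t in transactions for kv in map_phase(t, 2)]
support = reduce_phase(pairs)  # e.g. support[("bread", "milk")] == 2
```

Because the map outputs are independent per transaction, Hadoop can shard the transactions across nodes and shuffle the (item-set, 1) pairs to reducers by key.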
35. 35
Ontological approach: steps of association rules extraction
1. Input: the transactions and a support threshold.
2. Generate the frequent item-sets (Apriori, FP-tree, or Hadoop).
3. From the set of frequent item-sets, generate the association rules using a multi-threaded process.
4. The user filters the set of association rules down to a sub-set of interesting rules.
44. 44
• Association rules mining suffers from two main issues:
▫ The complexity of data processing.
▫ The quality of the association rules.
• The semantic web can be successfully exploited to improve the quality of
association rules.
• In this project:
• We extracted a transactional dataset from an ontology.
• We applied different frequent item-sets extraction techniques.
• We implemented a visual application to mine ontologies for association
rules.
Conclusion
46. 46
• [Claudia Marinica. 2010] Association Rule Interactive Post-processing using Rule Schemas and Ontologies -
ARIPSO.
• [fayyad et al.,1996]: Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From data mining to
knowledge discovery in databases. AI Magazine, 17:37 – 54, 1996.
• [gandon,2006]: Fabien Gandon. Ontologies informatiques, May 2006.
• [gruber,1993]: Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In
Nicola Guarino and Roberto Poli, editors, Formal Ontology in Conceptual Analysis and Knowledge
Representation. Kluwer Academic Publishers, 1993.
• [berners-lee et al.,2001]: Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web - a new form of
web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American,
2001.
• [bayardo et al.,1999]: Roberto J. Bayardo Jr., Rakesh Agrawal, and Dimitrios Gunopulos. Constraint-based rule
mining in large, dense databases. ICDE ’99: Proceedings of the 15th International Conference on Data
Engineering, pages 188–197, 1999
• [liu et al.,1999]: Bing Liu, Wynne Hsu, and Yiming Ma. Pruning and summarizing the discovered associations. In
KDD ’99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data
mining, pages 125–134.ACM, 1999.
REFERENCES
47. 47
• [Srikant et Agrawal,1996]: Ramakrishnan Srikant and Rakesh Agrawal. Mining quantitative association rules in
large relational tables. In Proceedings of the 1996 ACM SIGMOD international conference on Management of
data, pages 1–12, 1996.
• [Silberschats et Tuzhilin,1995] : Abraham Silberschatz and Alexander Tuzhilin. On subjective measures of
interestingness in knowledge discovery. Knowledge Discovery and Data Mining (KDD), pages 275–281, 1995.
• [Piatetsky-shapiro,1991]: G. Piatetsky-Shapiro. Knowledge Discovery in Databases, chapter Discovery, Analysis,
and Presentation of Strong Rules, pages 229–248. AAAI/MIT Press, 1991.
• [Guillet and Hamilton,2007]: F. Guillet and H. Hamilton. Quality Measures in Data Mining. Studies in
Computational Intelligence, 2007
• [klementtinen et al., 1994]: Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, and A. Inkeri
Verkamo. Finding interesting rules from large sets of discovered association rules. International Conference on
Information and Knowledge Management (CIKM), pages 401–407, 1994
• [Burdick, 2005]: Doug Burdick, Manuel Calimlim, Jason Flannick, Johannes Gehrke, and Tomi Yiu. Mafia: A
maximal frequent itemset algorithm. IEEE Transactions on Knowledge and Data Engineering, 17(11):1490–1504,
2005
REFERENCES
48. 48
• [J. Zaki et al, 2002]: Mohammed J. Zaki and Ching J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of SIAM'02, 2002.
• [R. Agrawal et al, 1994]: Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules.
Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487–499, 1994.
• [J. Han et al, 2000]: Jiawei Han and Jian Pei. Mining frequent patterns by pattern-growth: methodology and
implications. ACM SIGKDD Explorations Newsletter, Special issue on Scalable data mining algorithms,
2000(2):14–20, 2.
• [Hadoop]: Apache Software Foundation. (2010). Hadoop. Retrieved from https://hadoop.apache.org
References