Artificial Intelligence
for Automating Data Analysis
Manuel Martín Salvador
Smart Technology Research Centre

27th November 2013
Outline
1. Data and KDD Process
2. Support for Analysts
3. Prior Knowledge
4. Types of IDAs
5. Future Directions
6. References
Presentation based on the paper by Serban et al. “A survey of intelligent assistants for data analysis” 2013
http://dx.doi.org/10.1145/2480741.2480748
Data

Many domains: biology, geography,
telecommunications, sales, process industry...
Structured and non-structured
Single source and multiple sources
Imperfect data: missing values, outliers...
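KDD process
0. Goal?
1. Selection: Raw Data → Target Data
2. Preprocessing: Target Data → Preprocessed Data
3. Transformation: Preprocessed Data → Transformed Data
4. Data Mining: Transformed Data → Patterns
5. Interpretation / Evaluation: Patterns → Knowledge
Refining (iterating back over the steps)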
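The five steps can be sketched as a chain of functions. This is only an illustrative toy, not taken from the slides or the survey: the function names, the sensor records and the z-score threshold are all assumptions made for the example.

```python
from statistics import mean, pstdev

def select(raw, columns):
    """1. Selection: keep only the attributes relevant to the goal."""
    return [{c: row.get(c) for c in columns} for row in raw]

def preprocess(target):
    """2. Preprocessing: drop records that contain missing values."""
    return [row for row in target if all(v is not None for v in row.values())]

def transform(clean, column):
    """3. Transformation: standardise one numeric attribute (z-score)."""
    values = [row[column] for row in clean]
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def mine(z_scores, threshold=1.5):
    """4. Data mining: a toy 'pattern', the indices of unusually high values."""
    return [i for i, z in enumerate(z_scores) if z > threshold]

def evaluate(patterns, clean):
    """5. Interpretation / evaluation: map patterns back to readable records."""
    return [clean[i] for i in patterns]

# Hypothetical raw data: one record has a missing reading, one is an outlier.
raw_data = [
    {"id": 1, "temp": 20.0, "note": "ok"},
    {"id": 2, "temp": None, "note": "sensor off"},
    {"id": 3, "temp": 21.0, "note": "ok"},
    {"id": 4, "temp": 19.0, "note": "ok"},
    {"id": 5, "temp": 20.5, "note": "ok"},
    {"id": 6, "temp": 45.0, "note": "ok"},
]

target = select(raw_data, ["id", "temp"])
clean = preprocess(target)
z = transform(clean, "temp")
knowledge = evaluate(mine(z), clean)
print(knowledge)
```

Refining would correspond to going back and changing an earlier step (for example the threshold or the selected columns) after looking at the result.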
Starting a KDD process
Problems: Lack of guidance
Increasing number of techniques
Large volumes of data

Novice Analysts
Overwhelmed
Trial and error

Advanced Analysts
Comfort zone
No further exploration
Supporting analysts
Single step of KDD process: Hints and advice for data selection;
support in choosing a suitable algorithm and parameters.
Multiple steps of KDD process: Help regarding the sequence of
operators and their parameters.
Graphical Design of KDD workflows: GUIs for interactively building
the process manually.
Automatic KDD workflow generation: Based on the data and
description of their task, the users receive a set of possible scenarios
for solving a problem.
Explanations: The rationale behind a decision or a result allows the
user to reason about the aid provided.
Prior knowledge

Meta-data of the input dataset:
Data properties such as number of attributes, amount of
missing values, or information-theoretic measures.
Meta-data of operators:
External (inputs, outputs, preconditions and effects) and
Internal (structure and performance).
Case base: Set of successful prior data analysis workflows.
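A minimal sketch of how dataset meta-data of this kind could be computed. The function names, the chosen measures (class entropy as the information-theoretic example) and the sample rows are assumptions for illustration, not the meta-features of any particular IDA.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def dataset_meta_data(rows, class_attribute):
    """Compute a few simple data properties of a dataset given as dicts."""
    attributes = list(rows[0].keys())
    cells = [row[a] for row in rows for a in attributes]
    missing = sum(1 for v in cells if v is None)
    return {
        "n_instances": len(rows),
        "n_attributes": len(attributes) - 1,  # class attribute not counted
        "missing_ratio": missing / len(cells),
        "class_entropy": entropy([row[class_attribute] for row in rows]),
    }

# Hypothetical dataset with two missing cells and a balanced class.
rows = [
    {"x1": 1.0, "x2": None, "label": "a"},
    {"x1": 2.0, "x2": 0.5, "label": "b"},
    {"x1": None, "x2": 0.7, "label": "a"},
    {"x1": 4.0, "x2": 0.9, "label": "b"},
]
meta = dataset_meta_data(rows, "label")
print(meta)
```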
Types of IDAs
Intelligent Discovery Assistant (IDA): A system that supports the user in the data analysis process.

1. Expert Systems: Apply rules defined by human experts to suggest useful techniques.

Diagram: Experts define the rules; the user interacts with the expert system through Q&A and receives a ranking of useful techniques.
REX [Gale 1986]: linear regression.
SPRINGEX [Raes 1992]: multivariate and non-parametric statistics.
Statistical Navigator [Raes 1992]: multivariate causal analysis and
classification.
KENS [Hand 1987], NONPAREIL [Hand 1990] and LMG [Hand 1990]: manual
exploration of rules.
Consultant-2 [Craw et al. 1992]: first IDA for machine learning algorithms.
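The general idea of an expert system for technique selection can be sketched as a small rule base applied to the user's answers. The rules, weights and question names below are invented for illustration and are not the rules of REX, SPRINGEX or any other system listed above.

```python
# Hypothetical expert rules: (condition on the answers, technique, weight).
RULES = [
    (lambda a: a["target"] == "numeric", "linear regression", 2),
    (lambda a: a["target"] == "categorical", "decision tree", 2),
    (lambda a: a["normal_data"] is False, "non-parametric test", 1),
    (lambda a: a["n_variables"] > 1, "multivariate analysis", 1),
]

def rank_techniques(answers):
    """Fire every rule whose condition holds and rank techniques by score."""
    scores = {}
    for condition, technique, weight in RULES:
        if condition(answers):
            scores[technique] = scores.get(technique, 0) + weight
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))

# Answers a user might give during the Q&A session (made up).
answers = {"target": "numeric", "normal_data": False, "n_variables": 3}
ranking = rank_techniques(answers)
print(ranking)
```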
2. Meta-Learning Systems: Automatically learn such rules from prior data analysis runs.

Diagram: Training: meta-data of datasets and evaluations of algorithms are stored in a meta-database, from which the meta-learner builds a model. Prediction: for a new dataset and user preferences, the meta-learning system returns advice or a ranking of algorithms.
StatLog [Michie et al. 1994]: A decision tree model is built for each algorithm
predicting whether or not it is applicable on a new dataset.
The Data Mining Advisor [Giraud-Carrier 2005]: A k-NN algorithm is trained to
predict algorithm performance on a new dataset.
NOEMON [Kalousis et al. 2001]: Pairwise models are built and stored in a
knowledge base. Scores based on wins/ties/losses are obtained for each
algorithm in order to create a ranking.
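A rough sketch of nearest-neighbour meta-learning, loosely in the spirit of a k-NN advisor: the accuracies observed on the most similar past datasets are averaged to rank algorithms for a new one. The meta-database contents, the two meta-features and the function name are made-up assumptions, and the features are left unscaled for brevity.

```python
import math

# Hypothetical meta-database: (meta-features, observed accuracy per algorithm).
# Meta-features here are (number of attributes, ratio of missing values).
META_DATABASE = [
    ((10, 0.00), {"tree": 0.80, "knn": 0.90, "bayes": 0.70}),
    ((12, 0.05), {"tree": 0.82, "knn": 0.88, "bayes": 0.72}),
    ((200, 0.30), {"tree": 0.75, "knn": 0.55, "bayes": 0.78}),
    ((180, 0.25), {"tree": 0.77, "knn": 0.60, "bayes": 0.80}),
]

def rank_algorithms(new_meta, k=2):
    """Rank algorithms by their mean accuracy on the k most similar datasets."""
    nearest = sorted(
        META_DATABASE, key=lambda entry: math.dist(entry[0], new_meta)
    )[:k]
    algorithms = nearest[0][1].keys()
    predicted = {
        a: sum(perf[a] for _, perf in nearest) / k for a in algorithms
    }
    return sorted(predicted, key=predicted.get, reverse=True)

print(rank_algorithms((11, 0.02)))    # small, clean dataset
print(rank_algorithms((190, 0.28)))   # wide dataset with many missing values
```

A real system would normalise the meta-features before measuring distance, since otherwise the attribute count dominates.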
3. Case-Based Reasoning Systems: Find and adapt workflows that were successful in
similar cases.
Diagram (components): Experts, Operators, Meta-data, Case base, Case-based reasoner, Workflow editor, User, Workflow.
CITRUS [Engels 1996]: A case base of operators and workflows was created by
experts. The most similar case is returned based on user needs and data statistics.
MiningMart [Morik et al. 2004]: A case base of workflows in an XML-based
language is available online. Cases are described in an ontology. It offers a
three-tier graphical editor: case, concept and relation editors.
The Hybrid Data Mining Assistant [Charest et al. 2008]: Combines CBR with the
expert rules of expert systems. Apart from meta-features, the case base
includes user satisfaction ratings which are used for case ranking.
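The retrieve-and-adapt cycle can be sketched as follows. The case base, the two-value meta-feature vectors, the distance threshold and the adaptation rule are all hypothetical; ranking by a stored satisfaction rating only mirrors the general idea described above, not any system's actual procedure.

```python
import math

# Hypothetical case base: meta-features of a past dataset, the workflow that
# worked for it, and a user satisfaction rating.
CASE_BASE = [
    {"meta": (0.0, 0.1), "workflow": ["normalise", "knn"], "rating": 4},
    {"meta": (0.4, 0.8), "workflow": ["impute", "discretise", "naive_bayes"], "rating": 5},
    {"meta": (0.5, 0.7), "workflow": ["impute", "decision_tree"], "rating": 3},
]

def retrieve(new_meta, max_distance=0.3):
    """Return similar cases, best rated first, then closest first."""
    candidates = [
        c for c in CASE_BASE if math.dist(c["meta"], new_meta) <= max_distance
    ]
    return sorted(
        candidates,
        key=lambda c: (-c["rating"], math.dist(c["meta"], new_meta)),
    )

def adapt(workflow, has_missing_values):
    """Adapt a retrieved workflow: add imputation if the new data needs it."""
    if has_missing_values and "impute" not in workflow:
        return ["impute"] + workflow
    return list(workflow)

best = retrieve((0.45, 0.75))[0]
print(best["workflow"])
print(adapt(retrieve((0.05, 0.1))[0]["workflow"], has_missing_values=True))
print(retrieve((5.0, 5.0)))  # no similar case at all
```

The last call returns an empty list, which is the situation where a case-based system has nothing to offer for a dataset unlike any it has seen.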
4. Planning-Based Data Analysis Systems: Use AI planners to generate and rank valid
data analysis workflows.
Diagram (components): Experts, Ontology, Dataset, User, Planner, Plans, Ranker, Ranking of plans.
AIDE [Amant et al. 1998]: Multi-level planning based on hierarchical task
network planning. A plan library contains subproblems and primitive operators.
IDEA [Bernstein et al. 2005]: Meta-data is encoded in an ontology. Valid plans
are ranked by user preferences.
NExT [Bernstein et al. 2007]: CBR-extension of IDEA approach. Firstly, it
retrieves the most suitable cases and then uses the planner for filling gaps.
KDDVM [Diamantini et al. 2009]: A directed graph of operators is iteratively built
using a custom algorithm. The operators are chosen from an ontology.
RDM [Zakova et al. 2010]: A two-planner system that uses an ontology formed
of knowledge (datasets, constraints...), algorithms and KDD tasks.
eLico-IDA [Kietz et al. 2009]: An ontology with operators and their effects is
queried for creating tasks that are sent to the HTN planner. A second ontology is
used to rank the resulting plans.
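The generate-then-rank idea can be sketched with a plain breadth-first forward search over operators described by preconditions and effects. This is a deliberately simplified stand-in, not HTN planning and not the algorithm of any system above; the operators, their conditions and the cost table are invented for the example.

```python
from collections import deque

# Hypothetical operator descriptions: name -> (preconditions, effects).
OPERATORS = {
    "impute_missing": ({"raw"}, {"no_missing"}),
    "normalise": ({"no_missing"}, {"normalised"}),
    "train_knn": ({"no_missing", "normalised"}, {"model"}),
    "train_tree": ({"no_missing"}, {"model"}),
}

def plans(initial, goal, max_length=3):
    """Enumerate valid operator sequences that reach the goal state."""
    found = []
    queue = deque([(frozenset(initial), [])])
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            found.append(steps)
            continue
        if len(steps) == max_length:
            continue
        for name, (pre, eff) in OPERATORS.items():
            # Apply an operator only if it is applicable and adds something new.
            if name not in steps and pre <= state and not eff <= state:
                queue.append((state | eff, steps + [name]))
    return found

def rank(candidate_plans, cost):
    """Rank plans by total cost, a stand-in for user preferences."""
    return sorted(candidate_plans, key=lambda p: sum(cost[s] for s in p))

valid = plans({"raw"}, {"model"})
# Assumed preference: this user considers tree training expensive.
cost = {"impute_missing": 1, "normalise": 1, "train_knn": 1, "train_tree": 3}
for plan in rank(valid, cost):
    print(plan)
```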
5. Workflow Composition Environments: Facilitate manual workflow creation and testing.

Diagram (components): Dataset, Operators, User, Workflow editor (inside the Workflow Composition Environment), Workflow.
Canvas-Based Tools: IBM SPSS Modeler, SAS Enterprise Miner, Weka,
RapidMiner or KNIME.
Scripting-Based Tools: MATLAB, R or Python.
Future directions

Cold start problem: A new dataset is not similar to any of the previous cases.
Adaptivity: Current IDAs are not able to adapt the workflows in the presence of
new data.
Predictive models: To predict the effects of the operators given the input data.
Reduce expert dependency: Self-maintenance of case bases.
Combination of approaches: CBR + expert rules, CBR + planning...
Scalability: To deal with large repositories of operators and case bases.
Beware of automatic things!

Thanks
You can get these slides at
http://slideshare.net/draxus
msalvador@bournemouth.ac.uk
References
AMANT, R. AND COHEN, P. 1998. Interaction with a mixed-initiative system for exploratory data analysis. Knowl. Based Syst. 10, 5, 265–273.
BERNSTEIN, A. AND DAENZER, M. 2007. The NExT system: Towards true dynamic adaptations of semantic web service compositions. In The Semantic Web:
Research and Applications, Lecture Notes in Computer Science, vol. 4519, Springer, 739–748.
BERNSTEIN, A., PROVOST, F., AND HILL, S. 2005. Toward intelligent assistance for a data mining process: An ontology-based approach for cost-sensitive
classification. IEEE Trans. Knowl. Data Eng. 17, 4, 503–518.
CHAREST, M., DELISLE, S., CERVANTES, O., AND SHEN, Y. 2008. Bridging the gap between data mining and decision support: A case-based reasoning and
ontology approach. Intell. Data Anal. 12, 1–26.
CRAW, S., SLEEMAN, D., GRANER, N., AND RISSAKIS, M. 1992. Consultant: Providing advice for the machine learning toolbox. In Proceedings of the Annual
Technical Conference on Expert Systems (ES). 5–23.
DIAMANTINI, C., POTENA, D., AND STORTI, E. 2009b. Ontology-driven KDD process composition. In Advances in Intelligent Data Analysis VIII, Lecture Notes in
Computer Science, vol. 5772, Springer, 285–296.
ENGELS, R. 1996. Planning tasks for knowledge discovery in databases: Performing task-oriented user guidance. In Proceedings of the ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining (KDD). 170–175.
GALE, W. 1986. Rex review. In Artificial Intelligence and Statistics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA. 173–227.
GIRAUD-CARRIER, C. 2005. The data mining advisor: Meta-learning at the service of practitioners. In Proceedings of the International Conference on Machine
Learning and Applications (ICMLA). 113–119.
HAND, D. 1987. A statistical knowledge enhancement system. J. Royal Stat. Soc. Series A (General) 150, 4, 334–345.
HAND, D. 1990. Practical experience in developing statistical knowledge enhancement systems. Ann. Math. Artif. Intell. 2, 1, 197–208.
KALOUSIS, A. AND HILARIO, M. 2001. Model selection via meta-learning: A comparative study. Int. J. Artif. Intell. Tools 10, 4, 525–554.
KIETZ, J., SERBAN, F., BERNSTEIN, A., AND FISCHER, S. 2009. Towards cooperative planning of data mining workflows. In Proceedings of the ECML-PKDD
Workshop on Service-Oriented Knowledge Discovery. 1–12.
MICHIE, D., SPIEGELHALTER, D., AND TAYLOR, C. 1994. Machine Learning, Neural and Statistical Classification. Ellis Horwood, Upper Saddle River, NJ.
MORIK, K. AND SCHOLZ, M. 2004. The MiningMart approach to knowledge discovery in databases. In Intelligent Technologies for Information Analysis, N.
Zhong, and J. Liu, Eds., Springer, 47–65.
RAES, J. 1992. Inside two commercially available statistical expert systems. Stat. Comput. 2, 2, 55–62.
ZAKOVA, M., KREMEN, P., ZELEZNY, F., AND LAVRAC, N. 2010. Automating knowledge discovery workflow composition through ontology-based planning. IEEE
Trans. Autom. Sci. Eng. 8, 2, 253–264.
Acknowledgements
Satellite: http://commons.wikimedia.org/wiki/File:GPS_Satellite_NASA_art-iif.jpg
Industry: http://commons.wikimedia.org/wiki/File:Industry_Texas.jpg
DNA: http://commons.wikimedia.org/wiki/File:DNA_Double_Helix.png
Table: http://www.iconarchive.com/show/ravenna-3d-icons-by-double-j-design/Database-Table-icon.html
Car: http://en.wikipedia.org/wiki/File:Jurvetson_Google_driverless_car_trimmed.jpg
Twitter: http://www.flickr.com/photos/recampaign/5623528621/
Multiple sources: http://www.flickr.com/photos/inl/7895742584/
Thermometer: http://commons.wikimedia.org/wiki/File:Digital_thermometer.jpg
Traffic Control: http://commons.wikimedia.org/wiki/File:Air_Traffic_Control,_Abraham_Lincoln_CVN-72.jpg
Question Mark: http://commons.wikimedia.org/wiki/File:Question_mark_road_sign,_Australia.jpg
Noise: http://www.flickr.com/photos/benleto/3223155821/
Outliers: http://commons.wikimedia.org/wiki/File:Diagrama_de_caixa_com_outliers_and_whisker.png
Bowling: http://en.wikipedia.org/wiki/File:Lawn_Bowling_-_Tim_Mason1.jpg
Baby: http://www.flickr.com/photos/107489497@N06/10671592736/
Library: http://commons.wikimedia.org/wiki/File:Interior_view_of_Stockholm_Public_Library.jpg
Back to the future car: http://lowrider-girl.deviantart.com/art/Back-To-The-Future-206312200
Coquette Icon Set: http://dryicons.com
Roboto font: http://developer.android.com/design/style/typography.html

More Related Content

What's hot

Data mining - Process, Techniques and Research Topics
Data mining - Process, Techniques and Research TopicsData mining - Process, Techniques and Research Topics
Data mining - Process, Techniques and Research Topics
Techsparks
 
Paper id 26201475
Paper id 26201475Paper id 26201475
Paper id 26201475
IJRAT
 
What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?
Seval Çapraz
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
Sơn Còm Nhom
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
Kimberley Mitchell
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
MohammadAsharAshraf
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in python
UmmeSalmaM1
 
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data ScienceCOVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science
Vibhuti Mandral
 
GTU GeekDay Data Science and Applications
GTU GeekDay Data Science and ApplicationsGTU GeekDay Data Science and Applications
GTU GeekDay Data Science and Applications
Kürşat İNCE
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Edureka!
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
Pouria Amirian
 
Data Science using Python
Data Science using PythonData Science using Python
Data Science using Python
ShapeMySkills Pvt Ltd
 
Machine learning in action at Pipedrive
Machine learning in action at PipedriveMachine learning in action at Pipedrive
Machine learning in action at Pipedrive
André Karpištšenko
 
Machine learning
Machine learningMachine learning
Machine learning
Navdeep Asteya
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
Shanmugasundaram M
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
Srinath Perera
 
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceIntroduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
Ferdin Joe John Joseph PhD
 
data mining
data miningdata mining
data mining
Geet chopra
 
Data Science tutorial for beginner level to advanced level | Data Science pro...
Data Science tutorial for beginner level to advanced level | Data Science pro...Data Science tutorial for beginner level to advanced level | Data Science pro...
Data Science tutorial for beginner level to advanced level | Data Science pro...
IQ Online Training
 
Data! Data! Data! I Can't Make Bricks Without Clay!
Data! Data! Data! I Can't Make Bricks Without Clay!Data! Data! Data! I Can't Make Bricks Without Clay!
Data! Data! Data! I Can't Make Bricks Without Clay!
Turi, Inc.
 

What's hot (20)

Data mining - Process, Techniques and Research Topics
Data mining - Process, Techniques and Research TopicsData mining - Process, Techniques and Research Topics
Data mining - Process, Techniques and Research Topics
 
Paper id 26201475
Paper id 26201475Paper id 26201475
Paper id 26201475
 
What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in python
 
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data ScienceCOVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science
 
GTU GeekDay Data Science and Applications
GTU GeekDay Data Science and ApplicationsGTU GeekDay Data Science and Applications
GTU GeekDay Data Science and Applications
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Data Science using Python
Data Science using PythonData Science using Python
Data Science using Python
 
Machine learning in action at Pipedrive
Machine learning in action at PipedriveMachine learning in action at Pipedrive
Machine learning in action at Pipedrive
 
Machine learning
Machine learningMachine learning
Machine learning
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
 
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceIntroduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
 
data mining
data miningdata mining
data mining
 
Data Science tutorial for beginner level to advanced level | Data Science pro...
Data Science tutorial for beginner level to advanced level | Data Science pro...Data Science tutorial for beginner level to advanced level | Data Science pro...
Data Science tutorial for beginner level to advanced level | Data Science pro...
 
Data! Data! Data! I Can't Make Bricks Without Clay!
Data! Data! Data! I Can't Make Bricks Without Clay!Data! Data! Data! I Can't Make Bricks Without Clay!
Data! Data! Data! I Can't Make Bricks Without Clay!
 

Viewers also liked

Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management
inscit2006
 
Expertise Networks
Expertise NetworksExpertise Networks
Expertise Networks
Joel Alleyne
 
Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...
Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...
Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...
Universidad de los Llanos
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
Devakumar Jain
 
Key Expert Systems Concepts
Key Expert Systems ConceptsKey Expert Systems Concepts
Key Expert Systems Concepts
Harmony Kwawu
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
Amritanshu Mehra
 

Viewers also liked (6)

Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management
 
Expertise Networks
Expertise NetworksExpertise Networks
Expertise Networks
 
Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...
Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...
Doctoral Thesis Proposal: An Automatic Knowledge Discovery Strategy In Biomed...
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Key Expert Systems Concepts
Key Expert Systems ConceptsKey Expert Systems Concepts
Key Expert Systems Concepts
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
 

Artificial Intelligence for Automating Data Analysis

  • 1. Artificial Intelligence for Automating Data Analysis Manuel Martín Salvador Smart Technology Research Centre 27th November 2013
  • 2. Outline: 1. Data and KDD Process; 2. Support for Analysts; 3. Prior Knowledge; 4. Types of IDAs; 5. Future Directions; 6. References. Presentation based on the paper by Serban et al. “A survey of intelligent assistants for data analysis” 2013 http://dx.doi.org/10.1145/2480741.2480748
  • 3. Data. Many domains: biology, geography, telecommunications, sales, process industry... Structured and non-structured. Single source and multiple sources. Imperfect data: missing values, outliers...
  • 8-13. KDD process (built up one step per slide). 0. Goal? Raw Data → 1. Selection → Target Data → 2. Preprocessing → Preprocessed Data → 3. Transformation → Transformed Data → 4. Data Mining → Patterns → 5. Interpretation / Evaluation → Knowledge. Slide 13 adds a "Refining" label to the diagram.
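
The KDD steps can be read as a chain of functions, each consuming the output of the previous one. The following is a minimal sketch under stated assumptions, not something shown in the slides: the function names, the toy records and the "above the mean" pattern are all hypothetical and only illustrate how the stages connect.

```python
from statistics import mean

def select(raw, columns):
    # 1. Selection: keep only the columns relevant to the goal (target data).
    return [{c: row.get(c) for c in columns} for row in raw]

def preprocess(target):
    # 2. Preprocessing: here, simply drop records with missing values.
    return [row for row in target if all(v is not None for v in row.values())]

def transform(rows, column):
    # 3. Transformation: min-max scale one numeric column to [0, 1].
    values = [row[column] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]

def mine(values):
    # 4. Data mining: a toy "pattern", flagging values above the mean.
    threshold = mean(values)
    return [v > threshold for v in values]

def evaluate(patterns):
    # 5. Interpretation / evaluation: fraction of flagged records.
    return sum(patterns) / len(patterns)

def kdd(raw, columns, column):
    # Chain the five steps; in practice the result may send the analyst
    # back to an earlier step (refining).
    return evaluate(mine(transform(preprocess(select(raw, columns)), column)))

# Hypothetical raw data with one missing reading.
raw_data = [
    {"id": 1, "temp": 10.0},
    {"id": 2, "temp": None},
    {"id": 3, "temp": 20.0},
    {"id": 4, "temp": 40.0},
]
knowledge = kdd(raw_data, columns=["temp"], column="temp")
```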
  • 14. Starting a KDD process. Problems: lack of guidance; increasing number of techniques; large volumes of data. Novice analysts: overwhelmed; trial and error. Advanced analysts: comfort area; no further exploration.
  • 15. Supporting analysts Single step of KDD process: Hints and advice for data selection; support in choosing a suitable algorithm and parameters. Multiple steps of KDD process: Help regarding the sequence of operators and their parameters. Graphical Design of KDD workflows: GUIs for interactively building the process manually. Automatic KDD workflow generation: Based on the data and description of their task, the users receive a set of possible scenarios for solving a problem. Explanations: The rationale behind a decision or a result allows the user to reason about the aid provided.
  • 20. Prior knowledge Meta-data of the input dataset: Data properties such as number of attributes, amount of missing values, or information-theoretic measures. Meta-data of operators: External (inputs, outputs, preconditions and effects) and Internal (structure and performance). Case base: Set of successful prior data analysis workflows.
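The dataset meta-data mentioned on this slide can be illustrated with a minimal Python sketch. It assumes a dataset given as a list of rows with the target in the last column and missing values encoded as `None`; the function name `dataset_meta_features` and the chosen measures are illustrative assumptions, not taken from the slides or from any particular IDA.

```python
import math
from collections import Counter


def dataset_meta_features(rows, target_index=-1):
    """Compute a few simple meta-features of a tabular dataset.

    Assumptions of this sketch: rows is a non-empty list of equal-length
    lists, missing values are None, and one column holds the target.
    """
    if not rows:
        raise ValueError("dataset is empty")
    n_rows = len(rows)
    n_cols = len(rows[0])
    # Fraction of cells (attributes and target) that are missing.
    missing = sum(1 for row in rows for value in row if value is None)
    # Information-theoretic measure: Shannon entropy (bits) of the target.
    labels = [row[target_index] for row in rows if row[target_index] is not None]
    counts = Counter(labels)
    entropy = 0.0
    for count in counts.values():
        p = count / len(labels)
        entropy -= p * math.log2(p)
    return {
        "n_rows": n_rows,
        "n_attributes": n_cols - 1,  # every column except the target
        "missing_ratio": missing / (n_rows * n_cols),
        "target_entropy": entropy,
    }
```

A dictionary like this is the kind of input that the assistants described on the following slides could consume when comparing a new dataset with earlier ones.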
  • 21. Types of IDAs Intelligent Discovery Assistant (IDA): System that supports the user in the data analysis process. 1. Expert Systems: Apply rules defined by human experts to suggest useful techniques. [Diagram: expert system. Labels shown: Experts, Rules, Expert System, User, Q&A, Ranking of useful techniques.]
  • 22. Types of IDAs 1. Expert Systems (examples): REX [Gale 1986]: linear regression. SPRINGEX [Raes 1992]: multivariate and non-parametric statistics. Statistical Navigator [Raes 1992]: multivariate causal analysis and classification. KENS [Hand 1987], NONPAREIL [Hand 1990] and LMG [Hand 1990]: manual exploration of rules. Consultant-2 [Craw et al. 1992]: first IDA for machine learning algorithms.
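The idea of an expert system that applies hand-written rules and returns a ranking of techniques can be sketched as follows. The rules, scores, and the meta-data keys `target_type` and `n_rows` are hypothetical and do not reproduce the rule base of REX, Consultant-2, or any other system named above.

```python
# Each rule: (description, condition on dataset meta-data, technique, score).
# All rules and scores below are made up for illustration.
RULES = [
    ("target is numeric",
     lambda m: m["target_type"] == "numeric", "linear regression", 2),
    ("target is categorical",
     lambda m: m["target_type"] == "categorical", "decision tree", 2),
    ("target is categorical and fewer than 100 rows",
     lambda m: m["target_type"] == "categorical" and m["n_rows"] < 100,
     "naive Bayes", 1),
]


def rank_techniques(meta, rules=RULES):
    """Fire every rule whose condition holds and rank techniques by score.

    Returns a list of (technique, score, fired rule descriptions), so the
    descriptions can double as a simple explanation for the user.
    """
    scores = {}
    fired = {}
    for description, condition, technique, score in rules:
        if condition(meta):
            scores[technique] = scores.get(technique, 0) + score
            fired.setdefault(technique, []).append(description)
    # Highest score first; ties broken alphabetically for determinism.
    ranking = sorted(scores, key=lambda t: (-scores[t], t))
    return [(t, scores[t], fired[t]) for t in ranking]
```

Returning the fired rules alongside each suggestion is one simple way to provide the kind of explanation that lets the user reason about the advice.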
  • 23. Types of IDAs 2. Meta-Learning Systems: Automatically learn rules for suggesting useful techniques from prior data analysis runs, instead of relying on rules defined by human experts. [Diagram: meta-learning system, with a training phase and a prediction phase. Labels shown: Meta-data of datasets, Evaluations of algorithms, Meta-database, Meta-learner, Model, New dataset, User preferences, Meta-Learning System, Advice/Ranking of algorithms.]
  • 24. Types of IDAs 2. Meta-Learning Systems (examples): StatLog [Michie et al. 1994]: A decision tree model is built for each algorithm, predicting whether or not it is applicable to a new dataset. The Data Mining Advisor [Giraud-Carrier 2005]: A k-NN algorithm is trained to predict algorithm performance on a new dataset. NOEMON [Kalousis et al. 2001]: Pairwise models are built and stored in a knowledge base. Scores based on wins/ties/losses are obtained for each algorithm in order to create a ranking.
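A k-NN style meta-learner can be sketched in a few lines of Python. This is only an illustration of the general idea, not the Data Mining Advisor's actual implementation: the meta-database entries, the three meta-features (rows, attributes, missing ratio), and the accuracy figures are all invented for the example.

```python
import math

# Hypothetical meta-database: (meta-features, past accuracy per algorithm).
# Meta-features here are (n_rows, n_attributes, missing_ratio).
META_DATABASE = [
    ((100.0, 5.0, 0.0), {"decision tree": 0.80, "naive Bayes": 0.85}),
    ((120.0, 6.0, 0.1), {"decision tree": 0.78, "naive Bayes": 0.83}),
    ((5000.0, 40.0, 0.0), {"decision tree": 0.92, "naive Bayes": 0.75}),
    ((6000.0, 35.0, 0.2), {"decision tree": 0.90, "naive Bayes": 0.70}),
]


def rank_algorithms(new_meta, meta_database=META_DATABASE, k=2):
    """Rank algorithms by their mean accuracy on the k most similar datasets.

    Similarity is plain Euclidean distance on raw meta-features; a real
    system would normally scale the features first.
    """
    neighbours = sorted(
        meta_database, key=lambda entry: math.dist(entry[0], new_meta)
    )[:k]
    totals = {}
    for _, evaluations in neighbours:
        for algorithm, accuracy in evaluations.items():
            totals.setdefault(algorithm, []).append(accuracy)
    predicted = {a: sum(v) / len(v) for a, v in totals.items()}
    # Best predicted accuracy first; ties broken alphabetically.
    return sorted(predicted.items(), key=lambda item: (-item[1], item[0]))
```

For a small new dataset such as `(110.0, 5.0, 0.05)`, the two nearest entries are the two small datasets, so the ranking reflects how the algorithms performed there.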
  • 25. Types of IDAs 3. Case-Based Reasoning Systems: Find and adapt workflows that were successful in similar cases. [Diagram: case-based reasoning system. Labels shown: Experts, Operators, Case base, Meta-data, Case-based reasoner, Workflow editor, User, Workflow.]
  • 26. Types of IDAs 3. Case-Based Reasoning Systems (examples): CITRUS [Engels 1996]: A case base of operators and workflows was created by experts. The most similar case is returned based on user needs and data statistics. MiningMart [Morik et al. 2004]: A case base of workflows in an XML-based language is available online. Cases are described in an ontology. It offers a three-tier graphical editor: case, concept and relation editors. The Hybrid Data Mining Assistant [Charest et al. 2008]: Combines CBR with the expert rules of expert systems. Apart from meta-features, the case base includes user satisfaction ratings, which are used for case ranking.
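Case retrieval with user satisfaction ratings can be sketched as below. The case base, the two meta-features (missing ratio, fraction of categorical attributes), and the scoring formula `similarity * rating` are assumptions made for this example; the slides do not state how any of the named systems combine similarity and ratings.

```python
import math

# Hypothetical case base: each case stores dataset meta-features, the
# workflow that was used, and a user satisfaction rating in [0, 1].
CASE_BASE = [
    {"meta": (0.0, 1.0), "workflow": ["discretise", "naive Bayes"], "rating": 0.6},
    {"meta": (0.3, 0.0),
     "workflow": ["impute missing values", "normalise", "k-NN"], "rating": 0.9},
    {"meta": (0.25, 0.0),
     "workflow": ["drop incomplete rows", "decision tree"], "rating": 0.5},
]


def retrieve_cases(query_meta, case_base=CASE_BASE):
    """Return the cases ordered by similarity to the query, weighted by rating."""
    scored = []
    for case in case_base:
        distance = math.dist(case["meta"], query_meta)
        similarity = 1.0 / (1.0 + distance)  # 1 for identical meta-features
        scored.append((similarity * case["rating"], case))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case for _, case in scored]
```

The top case's workflow would then be handed to the user (or a workflow editor) for adaptation to the new dataset.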
  • 27. Types of IDAs 4. Planning-Based Data Analysis Systems: Use AI planners to generate and rank valid data analysis workflows. [Diagram: planning-based system. Labels shown: Experts, Ontology, Dataset, User, Planner, Plans, Ranker, Ranking of plans.]
  • 28. Types of IDAs 4. Planning-Based Data Analysis Systems (examples, part 1 of 2): AIDE [Amant et al. 1998]: Multi-level planning based on hierarchical task network planning. A plan library contains subproblems and primitive operators. IDEA [Bernstein et al. 2005]: Meta-data is encoded in an ontology. Valid plans are ranked by user preferences. NExT [Bernstein et al. 2007]: CBR extension of the IDEA approach. First, it retrieves the most suitable cases and then uses the planner to fill the gaps.
  • 29. Types of IDAs 4. Planning-Based Data Analysis Systems (examples, part 2 of 2): KDDVM [Diamantini et al. 2009]: A directed graph of operators is iteratively built using a custom algorithm. The operators are chosen from an ontology. RDM [Zakova et al. 2010]: A two-planner system that uses an ontology formed of knowledge (datasets, constraints...), algorithms and KDD tasks. eLico-IDA [Kietz et al. 2009]: An ontology with operators and their effects is queried for creating tasks that are sent to the HTN planner. A second ontology is used to rank the resulting plans.
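The generate-and-rank idea behind planning-based assistants can be sketched with operators described by preconditions and effects. Note that this sketch uses a plain breadth-first forward search, not the hierarchical task network (HTN) planning or ontology queries used by the systems above; the operators, properties, and costs are hypothetical.

```python
from collections import deque

# Hypothetical operators: preconditions and effects are sets of data properties.
OPERATORS = {
    "impute missing values": {"pre": set(), "add": {"no_missing"}, "cost": 1},
    "drop incomplete rows": {"pre": set(), "add": {"no_missing"}, "cost": 2},
    "normalise": {"pre": {"no_missing"}, "add": {"normalised"}, "cost": 1},
    "train k-NN": {"pre": {"no_missing", "normalised"}, "add": {"model"}, "cost": 3},
    "train decision tree": {"pre": {"no_missing"}, "add": {"model"}, "cost": 2},
}


def generate_plans(initial, goal, operators=OPERATORS, max_length=4):
    """Enumerate every operator sequence (up to max_length) that reaches goal."""
    plans = []
    queue = deque([(frozenset(initial), [])])
    while queue:
        state, plan = queue.popleft()
        if goal <= state:
            plans.append(plan)
            continue
        if len(plan) == max_length:
            continue
        for name, op in operators.items():
            # Applicable if preconditions hold and the operator adds something new.
            if op["pre"] <= state and not op["add"] <= state:
                queue.append((state | op["add"], plan + [name]))
    return plans


def rank_plans(plans, operators=OPERATORS):
    """Rank valid plans by total operator cost (a stand-in for user preferences)."""
    return sorted(plans, key=lambda p: (sum(operators[n]["cost"] for n in p), p))
```

Calling `rank_plans(generate_plans(set(), {"model"}))` returns every valid workflow found, cheapest first; a real ranker would use richer criteria such as predicted accuracy or run time.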
  • 30. Types of IDAs 5. Workflow Composition Environments: Facilitate manual workflow creation and testing. [Diagram: workflow composition environment. Labels shown: Dataset, Operators, User, Workflow editor, Workflow Composition Environment, Workflow.]
  • 31. Types of IDAs 5. Workflow Composition Environments (examples): Canvas-Based Tools: IBM SPSS Modeler, SAS Enterprise Miner, Weka, RapidMiner or KNIME. Scripting-Based Tools: MATLAB, R or Python.
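In a scripting-based tool, composing a workflow by hand amounts to chaining operators in code. The following minimal Python sketch shows the idea with two toy operators; the function names and the row-based data layout are assumptions of this example and do not correspond to any specific library.

```python
from functools import reduce


def drop_incomplete(rows):
    """Preprocessing operator: remove rows that contain a missing value (None)."""
    return [row for row in rows if None not in row]


def scale_first_column(rows):
    """Transformation operator: divide the first column by its maximum."""
    largest = max(row[0] for row in rows)
    return [[row[0] / largest] + row[1:] for row in rows]


def compose(*steps):
    """Build a workflow that applies the given operators from left to right."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


# A manually composed two-step workflow.
workflow = compose(drop_incomplete, scale_first_column)
```

Everything here is decided by the user: which operators to use and in which order. That manual effort is what the other four types of IDA try to reduce.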
  • 32. Future directions Cold start problem: A new dataset is not similar to any of the previous cases. Adaptivity: Current IDAs are not able to adapt the workflows in the presence of new data. Predictive models: To predict the effects of the operators given the input data. Reduce expert dependency: Self-maintenance of case bases. Combination of approaches: CBR + expert rules, CBR + planning... Scalability: To deal with large repositories of operators and case bases.
  • 38. Beware of automatic things!
  • 39. Thanks. These slides are available at http://slideshare.net/draxus Contact: msalvador@bournemouth.ac.uk
  • 40. References AMANT, R. AND COHEN, P. 1998. Interaction with a mixed-initiative system for exploratory data analysis. Knowl. Based Syst. 10, 5, 265–273. BERNSTEIN, A. AND DAENZER, M. 2007. The NExT system: Towards true dynamic adaptations of semantic web service compositions. In The Semantic Web: Research and Applications, Lecture Notes in Computer Science, vol. 4519, Springer, 739–748. BERNSTEIN, A., PROVOST, F., AND HILL, S. 2005. Toward intelligent assistance for a data mining process: An ontology-based approach for cost-sensitive classification. IEEE Trans. Knowl. Data Eng. 17, 4, 503–518. CHAREST, M., DELISLE, S., CERVANTES, O., AND SHEN, Y. 2008. Bridging the gap between data mining and decision support: A case-based reasoning and ontology approach. Intell. Data Anal. 12, 1–26. CRAW, S., SLEEMAN, D., GRANER, N., AND RISSAKIS, M. 1992. Consultant: Providing advice for the machine learning toolbox. In Proceedings of the Annual Technical Conference on Expert Systems (ES). 5–23. DIAMANTINI, C., POTENA, D., AND STORTI, E. 2009b. Ontology-driven KDD process composition. In Advances in Intelligent Data Analysis VIII, Lecture Notes in Computer Science, vol. 5772, Springer, 285–296. ENGELS, R. 1996. Planning tasks for knowledge discovery in databases: Performing task-oriented user-guidance. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 170–175. GALE, W. 1986. Rex review. In Artificial Intelligence and Statistics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA. 173–227. GIRAUD-CARRIER, C. 2005. The data mining advisor: Meta-learning at the service of practitioners. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA). 113–119. HAND, D. 1987. A statistical knowledge enhancement system. J. Royal Stat. Soc. Series A (General) 150, 4, 334–345. HAND, D. 1990. Practical experience in developing statistical knowledge enhancement systems. Ann. Math. Artif. Intell. 2, 1, 197–208. KALOUSIS, A. AND HILARIO, M. 2001. Model selection via meta-learning: A comparative study. Int. J. Artif. Intell. Tools 10, 4, 525–554. KIETZ, J., SERBAN, F., BERNSTEIN, A., AND FISCHER, S. 2009. Towards cooperative planning of data mining workflows. In Proceedings of the ECML-PKDD Workshop on Service-Oriented Knowledge Discovery. 1–12. MICHIE, D., SPIEGELHALTER, D., AND TAYLOR, C. 1994. Machine Learning, Neural and Statistical Classification. Ellis Horwood, Upper Saddle River, NJ. MORIK, K. AND SCHOLZ, M. 2004. The MiningMart approach to knowledge discovery in databases. In Intelligent Technologies for Information Analysis, N. Zhong and J. Liu, Eds., Springer, 47–65. RAES, J. 1992. Inside two commercially available statistical expert systems. Stat. Comput. 2, 2, 55–62. ZAKOVA, M., KREMEN, P., ZELEZNY, F., AND LAVRAC, N. 2010. Automating knowledge discovery workflow composition through ontology-based planning. IEEE Trans. Autom. Sci. Eng. 8, 2, 253–264.
  • 41. Acknowledgements Satellite: http://commons.wikimedia.org/wiki/File:GPS_Satellite_NASA_art-iif.jpg Industry: http://commons.wikimedia.org/wiki/File:Industry_Texas.jpg DNA: http://commons.wikimedia.org/wiki/File:DNA_Double_Helix.png Table: http://www.iconarchive.com/show/ravenna-3d-icons-by-double-j-design/Database-Table-icon.html Car: http://en.wikipedia.org/wiki/File:Jurvetson_Google_driverless_car_trimmed.jpg Twitter: http://www.flickr.com/photos/recampaign/5623528621/ Multiple sources: http://www.flickr.com/photos/inl/7895742584/ Thermometer: http://commons.wikimedia.org/wiki/File:Digital_thermometer.jpg Traffic Control: http://commons.wikimedia.org/wiki/File:Air_Traffic_Control,_Abraham_Lincoln_CVN-72.jpg Question Mark: http://commons.wikimedia.org/wiki/File:Question_mark_road_sign,_Australia.jpg Noise: http://www.flickr.com/photos/benleto/3223155821/ Outliers: http://commons.wikimedia.org/wiki/File:Diagrama_de_caixa_com_outliers_and_whisker.png Bowling: http://en.wikipedia.org/wiki/File:Lawn_Bowling_-_Tim_Mason1.jpg Baby: http://www.flickr.com/photos/107489497@N06/10671592736/ Library: http://commons.wikimedia.org/wiki/File:Interior_view_of_Stockholm_Public_Library.jpg Back to the future car: http://lowrider-girl.deviantart.com/art/Back-To-The-Future-206312200 Coquette Icon Set: http://dryicons.com Roboto font: http://developer.android.com/design/style/typography.html