SlideShare a Scribd company logo
1 of 11
Download to read offline
Manuel Martín Salvador 
@draxus 
msalvador@bournemouth.ac.uk 
OpenML workshop 
Eindhoven 21/10/2014
Background 
● MSc. Computer Engineering 
● Master in Soft Computing and Intelligent Systems 
Currently 
● PhD Student – Automatic and adaptive pre-processing for building 
predictive models 
● Teaching – Data Mining lab
Data preparation and pre-processing
Data preparation and pre-processing
Data preparation and pre-processing 
Labour 
intensive 
tasks 
(up to 80% of 
a data mining 
process)
Automating pre-processing 
A lot of available techniques 
No free lunch 
Multiple combinations 
Order of pre-processing methods matters 
No semantic → some approaches use ontologies 
Meta-learning → needs a good database of 
experiments
Scientific workflow platforms and 
repositories with experiments 
Software Repository Applications 
DiscoveryNet (inactive) - 
Kepler - Various 
Taverna MyExperiment (open) Bioinformatics 
Pegasus - Various 
Galaxy - Biomedical 
Pipeline Pilot Accelrys (commercial) 
* MLComp (“open”) Machine Learning 
Weka,MOA,R,RapidMiner OpenML (open) Machine Learning
OpenML statistics 
Datasets: 1042 
Tasks: 3025 
Flows: 640 
Runs: 31540 
Valid: 24410 
With errors: 7130 
Datasets: 300 
Individual components: 136 
Paired components: 635 
“Flow size”: 1 – 8198 
2 – 12178 
3 – 1993 
4 – 1533 
5 – 502 
6 – 6 
4500 
4000 
3500 
3000 
2500 
2000 
1500 
1000 
500 
0 
Distribution of components 
1600 
1400 
1200 
1000 
800 
600 
400 
200 
0 
Distribution of datasets 
Only 3 Weka filters: 
Principal Components, Discretize, PLSFilter
TO DO 
How to increase the number of pre-processing methods in OpenML? 
- The only way right now is using FilteredClassifier in Weka 
- What about R, MOA, RapidMiner? 
Improving flow representation 
- Right now is difficult to see how components are connected 
- Clear distinction of parameters 
- What about including Weka flows (XML based) and ADAMS flows? 
- PMML support? 
Statistics for available data, tasks, flows and runs 
Flow recommendation system for a given dataset 
[dataset, data characteristics, prediction accuracy, flow_id] 
Flow validation before executing it 
[dataset, data characteristics, flow characteristics, failure]
A little bit further 
Adapting flows while processing data streams 
- Detecting changes in data characteristics 
- Locally checking input/output in each flow component 
- Change propagation 
- Reducing cost of adaptation
Photos CC by Cristina Granados 
Visit us! 
Data Science Institute @ Bournemouth University

More Related Content

What's hot

Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningPruet Boonma
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 
Materials Informatics and Python
Materials Informatics and PythonMaterials Informatics and Python
Materials Informatics and PythonShintaro Fukushima
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningKoundinya Desiraju
 
IEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databases
IEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databasesIEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databases
IEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databasesIEEEFINALYEARSTUDENTPROJECTS
 
Disease prediction using machine learning
Disease prediction using machine learningDisease prediction using machine learning
Disease prediction using machine learningJinishaKG
 
Graph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational DataGraph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational DataBenjamin Bengfort
 
Data repository for sensor network a data mining approach
Data repository for sensor network  a data mining approachData repository for sensor network  a data mining approach
Data repository for sensor network a data mining approachijdms
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knimeGreg Landrum
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and BlockchainKan Yuenyong
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach  Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach IJECEIAES
 
Computer Assisted Data Analysis (Hands-on Practice)
Computer Assisted Data Analysis (Hands-on Practice)Computer Assisted Data Analysis (Hands-on Practice)
Computer Assisted Data Analysis (Hands-on Practice)Dr. Amjad Ali Arain
 
Outlier detection for high dimensional data
Outlier detection for high dimensional dataOutlier detection for high dimensional data
Outlier detection for high dimensional dataParag Tamhane
 
activelearning.ppt
activelearning.pptactivelearning.ppt
activelearning.pptbutest
 

What's hot (20)

Journals analysis ppt
Journals analysis pptJournals analysis ppt
Journals analysis ppt
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 
Materials Informatics and Python
Materials Informatics and PythonMaterials Informatics and Python
Materials Informatics and Python
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Machine learning
Machine learning Machine learning
Machine learning
 
resume_MH
resume_MHresume_MH
resume_MH
 
My
MyMy
My
 
IEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databases
IEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databasesIEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databases
IEEE 2014 JAVA DATA MINING PROJECTS Searching dimension incomplete databases
 
Disease prediction using machine learning
Disease prediction using machine learningDisease prediction using machine learning
Disease prediction using machine learning
 
Graph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational DataGraph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational Data
 
Data repository for sensor network a data mining approach
Data repository for sensor network  a data mining approachData repository for sensor network  a data mining approach
Data repository for sensor network a data mining approach
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knime
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach  Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach
 
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
 
Computer Assisted Data Analysis (Hands-on Practice)
Computer Assisted Data Analysis (Hands-on Practice)Computer Assisted Data Analysis (Hands-on Practice)
Computer Assisted Data Analysis (Hands-on Practice)
 
Outlier detection for high dimensional data
Outlier detection for high dimensional dataOutlier detection for high dimensional data
Outlier detection for high dimensional data
 
activelearning.ppt
activelearning.pptactivelearning.ppt
activelearning.ppt
 

Viewers also liked

Presentación Día de la Libertad del Software 2011
Presentación Día de la Libertad del Software 2011Presentación Día de la Libertad del Software 2011
Presentación Día de la Libertad del Software 2011Manuel Martín
 
Introducción a GNU/Linux
Introducción a GNU/LinuxIntroducción a GNU/Linux
Introducción a GNU/LinuxManuel Martín
 
Minería de secuencias de datos
Minería de secuencias de datosMinería de secuencias de datos
Minería de secuencias de datosManuel Martín
 
Minería de secuencias de datos
Minería de secuencias de datosMinería de secuencias de datos
Minería de secuencias de datosManuel Martín
 
2012-07-03 Fisser Voogt Bom IFIP Taaltreffers
2012-07-03 Fisser Voogt Bom IFIP Taaltreffers2012-07-03 Fisser Voogt Bom IFIP Taaltreffers
2012-07-03 Fisser Voogt Bom IFIP TaaltreffersPetra Fisser
 
Operaciones Colectivas en MPI
Operaciones Colectivas en MPIOperaciones Colectivas en MPI
Operaciones Colectivas en MPIManuel Martín
 
Artificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisManuel Martín
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...Manuel Martín
 

Viewers also liked (9)

Presentación Día de la Libertad del Software 2011
Presentación Día de la Libertad del Software 2011Presentación Día de la Libertad del Software 2011
Presentación Día de la Libertad del Software 2011
 
Introducción a GNU/Linux
Introducción a GNU/LinuxIntroducción a GNU/Linux
Introducción a GNU/Linux
 
Minería de secuencias de datos
Minería de secuencias de datosMinería de secuencias de datos
Minería de secuencias de datos
 
Decompiladores
DecompiladoresDecompiladores
Decompiladores
 
Minería de secuencias de datos
Minería de secuencias de datosMinería de secuencias de datos
Minería de secuencias de datos
 
2012-07-03 Fisser Voogt Bom IFIP Taaltreffers
2012-07-03 Fisser Voogt Bom IFIP Taaltreffers2012-07-03 Fisser Voogt Bom IFIP Taaltreffers
2012-07-03 Fisser Voogt Bom IFIP Taaltreffers
 
Operaciones Colectivas en MPI
Operaciones Colectivas en MPIOperaciones Colectivas en MPI
Operaciones Colectivas en MPI
 
Artificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data Analysis
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...
 

Similar to Quick presentation for the OpenML workshop in Eindhoven 2014

Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPaige_Roberts
 
AzureML TechTalk
AzureML TechTalkAzureML TechTalk
AzureML TechTalkUdaya Kumar
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)Arnab Biswas
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Agile data science with scala
Agile data science with scalaAgile data science with scala
Agile data science with scalaAndy Petrella
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Anubhav Jain
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run GraphVaticle
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013MLconf
 

Similar to Quick presentation for the OpenML workshop in Eindhoven 2014 (20)

Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
AzureML TechTalk
AzureML TechTalkAzureML TechTalk
AzureML TechTalk
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
An Analytics Platform for Connected Vehicles
An Analytics Platform for Connected VehiclesAn Analytics Platform for Connected Vehicles
An Analytics Platform for Connected Vehicles
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Agile data science with scala
Agile data science with scalaAgile data science with scala
Agile data science with scala
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013
 

More from Manuel Martín

Automatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datosAutomatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datosManuel Martín
 
Modelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri NetsModelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri NetsManuel Martín
 
Brand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspectiveBrand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspectiveManuel Martín
 
Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...Manuel Martín
 
Improving transport timetables usability for mobile devices
Improving transport timetables usability for mobile devicesImproving transport timetables usability for mobile devices
Improving transport timetables usability for mobile devicesManuel Martín
 
Towards Automatic Composition of Multicomponent Predictive Systems
Towards Automatic Composition of Multicomponent Predictive SystemsTowards Automatic Composition of Multicomponent Predictive Systems
Towards Automatic Composition of Multicomponent Predictive SystemsManuel Martín
 
Online Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyOnline Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyManuel Martín
 
Handling concept drift in data stream mining
Handling concept drift in data stream miningHandling concept drift in data stream mining
Handling concept drift in data stream miningManuel Martín
 
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaAndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaManuel Martín
 
Presentacion Taller de Introducción a Linux SFD2010
Presentacion Taller de Introducción a Linux SFD2010Presentacion Taller de Introducción a Linux SFD2010
Presentacion Taller de Introducción a Linux SFD2010Manuel Martín
 
Presentación Gnome 3.0 en Granada
Presentación Gnome 3.0 en GranadaPresentación Gnome 3.0 en Granada
Presentación Gnome 3.0 en GranadaManuel Martín
 
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaAndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaManuel Martín
 
Pintando gráficas con Python
Pintando gráficas con PythonPintando gráficas con Python
Pintando gráficas con PythonManuel Martín
 
Charla de Introducción a Git
Charla de Introducción a GitCharla de Introducción a Git
Charla de Introducción a GitManuel Martín
 
Taller de introducción a Google App Engine
Taller de introducción a Google App EngineTaller de introducción a Google App Engine
Taller de introducción a Google App EngineManuel Martín
 
Presentación AndalucíaPeople en las BYMC6
Presentación AndalucíaPeople en las BYMC6Presentación AndalucíaPeople en las BYMC6
Presentación AndalucíaPeople en las BYMC6Manuel Martín
 
Taller de introducción al sistema operativo GNU/Linux
Taller de introducción al sistema operativo GNU/LinuxTaller de introducción al sistema operativo GNU/Linux
Taller de introducción al sistema operativo GNU/LinuxManuel Martín
 

More from Manuel Martín (19)

Hogar (Des)Conectado
Hogar (Des)ConectadoHogar (Des)Conectado
Hogar (Des)Conectado
 
Automatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datosAutomatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datos
 
Modelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri NetsModelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri Nets
 
Brand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspectiveBrand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspective
 
Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...
 
Improving transport timetables usability for mobile devices
Improving transport timetables usability for mobile devicesImproving transport timetables usability for mobile devices
Improving transport timetables usability for mobile devices
 
Towards Automatic Composition of Multicomponent Predictive Systems
Towards Automatic Composition of Multicomponent Predictive SystemsTowards Automatic Composition of Multicomponent Predictive Systems
Towards Automatic Composition of Multicomponent Predictive Systems
 
Online Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyOnline Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case Study
 
Handling concept drift in data stream mining
Handling concept drift in data stream miningHandling concept drift in data stream mining
Handling concept drift in data stream mining
 
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaAndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
 
Presentacion Taller de Introducción a Linux SFD2010
Presentacion Taller de Introducción a Linux SFD2010Presentacion Taller de Introducción a Linux SFD2010
Presentacion Taller de Introducción a Linux SFD2010
 
Presentación Gnome 3.0 en Granada
Presentación Gnome 3.0 en GranadaPresentación Gnome 3.0 en Granada
Presentación Gnome 3.0 en Granada
 
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaAndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
 
Pintando gráficas con Python
Pintando gráficas con PythonPintando gráficas con Python
Pintando gráficas con Python
 
Charla de Introducción a Git
Charla de Introducción a GitCharla de Introducción a Git
Charla de Introducción a Git
 
Taller de introducción a Google App Engine
Taller de introducción a Google App EngineTaller de introducción a Google App Engine
Taller de introducción a Google App Engine
 
Presentación AndalucíaPeople en las BYMC6
Presentación AndalucíaPeople en las BYMC6Presentación AndalucíaPeople en las BYMC6
Presentación AndalucíaPeople en las BYMC6
 
Web 2.0 y Empresa 2.0
Web 2.0 y Empresa 2.0Web 2.0 y Empresa 2.0
Web 2.0 y Empresa 2.0
 
Taller de introducción al sistema operativo GNU/Linux
Taller de introducción al sistema operativo GNU/LinuxTaller de introducción al sistema operativo GNU/Linux
Taller de introducción al sistema operativo GNU/Linux
 

Recently uploaded

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 

Recently uploaded (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 

Quick presentation for the OpenML workshop in Eindhoven 2014

  • 1. Manuel Martín Salvador @draxus msalvador@bournemouth.ac.uk OpenML workshop Eindhoven 21/10/2014
  • 2. Background ● MSc. Computer Engineering ● Master in Soft Computing and Intelligent Systems Currently ● PhD Student – Automatic and adaptive pre-processing for building predictive models ● Teaching – Data Mining lab
  • 3. Data preparation and pre-processing
  • 4. Data preparation and pre-processing
  • 5. Data preparation and pre-processing Labour intensive tasks (up to 80% of a data mining process)
  • 6. Automating pre-processing A lot of available techniques No free lunch Multiple combinations Order of pre-processing methods matters No semantic → some approaches use ontologies Meta-learning → needs a good database of experiments
  • 7. Scientific workflow platforms and repositories with experiments Software Repository Applications DiscoveryNet (inactive) - Kepler - Various Taverna MyExperiment (open) Bioinformatics Pegasus - Various Galaxy - Biomedical Pipeline Pilot Accelrys (commercial) * MLComp (“open”) Machine Learning Weka,MOA,R,RapidMiner OpenML (open) Machine Learning
  • 8. OpenML statistics Datasets: 1042 Tasks: 3025 Flows: 640 Runs: 31540 Valid: 24410 With errors: 7130 Datasets: 300 Individual components: 136 Paired components: 635 “Flow size”: 1 – 8198 2 – 12178 3 – 1993 4 – 1533 5 – 502 6 – 6 4500 4000 3500 3000 2500 2000 1500 1000 500 0 Distribution of components 1600 1400 1200 1000 800 600 400 200 0 Distribution of datasets Only 3 Weka filters: Principal Components, Discretize, PLSFilter
  • 9. TO DO How to increase the number of pre-processing methods in OpenML? - The only way right now is using FilteredClassifier in Weka - What about R, MOA, RapidMiner? Improving flow representation - Right now is difficult to see how components are connected - Clear distinction of parameters - What about including Weka flows (XML based) and ADAMS flows? - PMML support? Statistics for available data, tasks, flows and runs Flow recommendation system for a given dataset [dataset, data characteristics, prediction accuracy, flow_id] Flow validation before executing it [dataset, data characteristics, flow characteristics, failure]
  • 10. A little bit further Adapting flows while processing data streams - Detecting changes in data characteristics - Locally checking input/output in each flow component - Change propagation - Reducing cost of adaptation
  • 11. Photos CC by Cristina Granados Visit us! Data Science Institute @ Bournemouth University