Interactive and reproducible data analysis with the open-source KNIME Analytics Platform
Greg Landrum
Paper CINF44 from the 2018 ACS Spring Meeting in New Orleans
1.
Interactive and reproducible data analysis with the open-source KNIME Analytics Platform
Greg Landrum, Ph.D.
KNIME AG
@dr_greg_landrum
ACS New Orleans, 19 March 2018
© 2018 KNIME AG. All Rights Reserved.
2.
Topics
• A brief intro to KNIME
– The company
– The software
• Context: some data analysis problems we’re trying to help with using workflows
• A case study of reproducible interactive data analysis in KNIME
3.
KNIME, the company
• KNIME AG founded in 2008
• Offices in Zürich (HQ), Konstanz, Berlin, and Austin
• 40+ employees
• Maintainer of the Open Source KNIME Analytics Platform
– comprehensive data loading, processing, analysis, modeling platform
– visual frontend
– open: to all sorts of data, other tools (R and Python, etc.), various user personas
– 20+ open source releases since 2006
– open source.
• KNIME Server
– 14 commercial product releases since 2008
• KNIME cloud offerings
4.
The KNIME® Analytics Platform
5.
Visual KNIME Workflows
• Nodes perform tasks on data
• Workflows combine nodes to model data flow
[Figure: a node with its input(s), outputs, and status indicator; the statuses shown are Not Configured, Idle, Executed, and Error]
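The node and workflow idea on this slide can be illustrated with a few lines of Python. This is only a toy sketch of a dataflow graph, not how KNIME is implemented; the `Node` class, the `Status` enum, and the three example nodes are all hypothetical names chosen to mirror the slide's vocabulary.

```python
from enum import Enum


class Status(Enum):
    # Status names borrowed from the slide; the semantics here are an assumption.
    NOT_CONFIGURED = "not configured"
    IDLE = "idle"
    EXECUTED = "executed"
    ERROR = "error"


class Node:
    """A task plus the upstream nodes whose outputs feed it."""

    def __init__(self, name, task=None, inputs=()):
        self.name = name
        self.task = task
        self.inputs = list(inputs)
        self.status = Status.IDLE if task is not None else Status.NOT_CONFIGURED
        self.output = None

    def execute(self):
        # An executed node returns its cached output instead of recomputing.
        if self.status is Status.EXECUTED:
            return self.output
        if self.status is Status.NOT_CONFIGURED:
            raise RuntimeError(f"{self.name} is not configured")
        try:
            upstream = [node.execute() for node in self.inputs]
            self.output = self.task(*upstream)
            self.status = Status.EXECUTED
        except Exception:
            self.status = Status.ERROR
            raise
        return self.output


# A three-node "workflow": read rows, sort them, keep the first two.
reader = Node("reader", lambda: [3, 1, 2])
sorter = Node("sorter", sorted, inputs=[reader])
top = Node("top", lambda rows: rows[:2], inputs=[sorter])
result = top.execute()
```

Because every intermediate output stays attached to the node that produced it, each step can be inspected after the run, which is the property the later slides rely on for interactive review.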
6.
Over 2000 native and embedded nodes included:
• Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd party, ...
• Transformation: Row, Column, Matrix; Text, Image; Networks; Time Series; Java, Python; Community / 3rd party, ...
• Analysis & Mining: Statistics, Machine Learning, Data Mining, Web Analytics, Text Mining, Network Analysis, Social Media Analysis, R, Weka, Python, Community / 3rd party, ...
• Visualization: R, Python, JFreeChart, JavaScript, Community / 3rd party, ...
• Deployment: via BIRT; PMML, XML, JSON; Databases, Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd party, ...
• Big Data: Hive, Impala, HDFS; Vertica, Teradata/Aster; Spark, MLlib; Community / 3rd party, ...
7.
Free E-Learning Course: Web Page
• Hands-on e-learning course
• Data Access, ETL, Analytics, Control Structures, Visualization
• Around 50 small units
• … with exercises
• … and with solutions on the EXAMPLES server
• Final exercises to test your knowledge!
https://www.knime.org/knime-introductory-course
8.
The KNIME Software Ecosystem
Deployment:
- to Applications
- to Humans
Collaboration:
- Best Practices
- Sharing Expertise
Automation:
- Scheduling
- (Model) Management
Diagram components: KNIME Analytics Platform, KNIME Supported Extensions, KNIME Extensions, Partner Extensions, Community Extensions, KNIME Server
9.
KNIME Server
• Shared Repositories
• Access Management
• Web Enablement
• Flexible Execution
10.
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
11.
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data sources
• Collaboration between users with different sophistication levels
• Deployment
• Just staying organized
12.
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data sources
• Collaboration between users with different sophistication levels
• Deployment
• Just staying organized
I think workflows can help with these
13.
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data sources
• Collaboration between users with different sophistication levels
• Deployment
• Just staying organized
I think KNIME can help with these
14.
Interactive data analysis and modeling
• Fairly often the whole process of data preprocessing, analysis, and modeling can’t be (or shouldn’t be) fully automated.
• We want/need a human in the loop
• Would be lovely if this weren’t painful
Interactive
15.
Repeatability and reproducibility
• I can reproduce what I did before or repeat the same process with a different data set/method
• You can do the same thing
• Not necessarily talking about strict reproducibility (out to the 15th decimal place), but if we miss that we should be able to discover where deviations come from
• Would be lovely if this weren’t painful
Reproducible
16.
The need to use multiple tools and multiple data sources
• There is no one-size-fits-all solution (or “one-stop shop”)
• We’re inevitably going to be using more than one piece of software and working with data from more than one source.
• Would be lovely if this weren’t painful
Open
17.
Collaboration between users with different sophistication levels
• Some personae [1]:
– The scripter/programmer: “I’ve got this great new method you should try”
– The tool user: “I’ll use software, but there’s no way I’m writing code”
– The “stakeholder”: “Those folks are doing useful stuff and I need their results, but I don’t have time to learn some complex new piece of software.”
• Would be lovely if enabling collaboration between these different personae wasn’t painful
1. Yes, these are stereotypes
Collaborative
18.
Deployment
• Once I’ve built something I’d like to make it available to my colleagues
– Sharing models
– Sharing methods
– Sharing results
• Would be lovely if this weren’t painful
Deployable
19.
Just staying organized
• I can usually remember where my scripts are
• There’s no way I can remember where yours are
• It would be lovely if it weren’t painful to find stuff
Findable
20.
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data sources
• Collaboration between users with different sophistication levels
• Deployment
• Just staying organized
21.
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data sources
• Collaboration between users with different sophistication levels
• Deployment
• Just staying organized
I think workflows can help with these
22.
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data sources
• Collaboration between users with different sophistication levels
• Deployment
• Just staying organized
I think KNIME can help with these
23.
The case study: HTS hit list triage
24.
Background
• The problem: Processing a hit list from a high-throughput phenotypic screen for malaria.
– Clean up the hit list
– Suggest compounds to be sent to a validation assay
• Data source: 2014 Teach-Discover-Treat challenge http://www.tdtproject.org/challenge-1---malaria-hts.html
• Additional info:
– https://github.com/sriniker/TDT-tutorial-2014
– Riniker et al. https://f1000research.com/articles/6-1136/v2
25.
Approach we’ll take: cleanup
• Remove “ugly” molecules:
– PAINS filters [1,2]: containing substructures that are likely to interfere with/have interfered with the assay.
– “Rapid elimination of swill” (REOS) [3]: Too big, complicated or greasy.
• Don’t want to apply these filters mindlessly, so we should always look at the results and allow manual rescue
1. Baell, J. B. & Holloway, G. A. J. Med. Chem. 53, 2719–40 (2010).
2. http://rdkit.blogspot.ch/2015/08/curating-pains-filters.html
3. Walters, W. P. & Namchuk, M. Nat. Rev. Drug Discov. 2, 259–66 (2003).
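The cleanup logic described on this slide (flag, review, optionally rescue) might be sketched in plain Python as follows. It assumes the descriptors and PAINS substructure matches have already been computed by a cheminformatics toolkit; the property windows in `LIMITS`, the `Hit` class, the compound names, and the alert name are all illustrative assumptions rather than values from the talk or the workflow.

```python
from dataclasses import dataclass

# Illustrative REOS-style property windows. The cutoffs are assumptions made
# for this sketch; substitute whatever limits your project has agreed on.
LIMITS = {
    "mol_weight": (200.0, 500.0),
    "logp": (-5.0, 5.0),
    "h_donors": (0, 5),
    "h_acceptors": (0, 10),
    "rot_bonds": (0, 8),
}


@dataclass
class Hit:
    name: str
    props: dict               # precomputed descriptors, keyed like LIMITS
    pains_alerts: tuple = ()  # names of matched PAINS substructures, if any


def flag(hit):
    """Return the reasons a hit would be filtered (an empty list means clean)."""
    reasons = [f"PAINS:{alert}" for alert in hit.pains_alerts]
    for key, (low, high) in LIMITS.items():
        value = hit.props[key]
        if not low <= value <= high:
            reasons.append(f"{key}={value} outside [{low}, {high}]")
    return reasons


def triage(hits, rescued=frozenset()):
    """Split hits into kept and removed; names in `rescued` are always kept."""
    kept, removed = [], []
    for hit in hits:
        reasons = flag(hit)
        if reasons and hit.name not in rescued:
            removed.append((hit, reasons))  # keep the reasons for human review
        else:
            kept.append(hit)
    return kept, removed


# Hypothetical hit list: one clean compound, one oversized and greasy compound,
# and one PAINS match that a reviewer has decided to rescue by hand.
clean = {"mol_weight": 320.4, "logp": 2.1, "h_donors": 1, "h_acceptors": 4, "rot_bonds": 3}
big = {"mol_weight": 712.9, "logp": 6.3, "h_donors": 2, "h_acceptors": 9, "rot_bonds": 12}
hits = [
    Hit("cmpd-1", clean),
    Hit("cmpd-2", big),
    Hit("cmpd-3", dict(clean), pains_alerts=("quinone_A",)),
]
kept, removed = triage(hits, rescued={"cmpd-3"})
```

Returning the reasons alongside each removed compound is what makes the review step possible: a person can read why something was flagged before deciding whether to add it to `rescued`.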
26.
Approach we’ll take: selection for validation
• We want good coverage of the chemical space of the HTS actives, but would ideally also like to learn something from the validation results
• Approach:
– Start with a diverse subset of the cleaned actives
– Pick neighbors of each of these so that we have some SAR information in the results
https://github.com/sriniker/TDT-tutorial-2014
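One way to picture the two-stage selection is the sketch below: a greedy MaxMin pick for the diverse subset, followed by a nearest-neighbor lookup around each pick. It is a pure-Python stand-in, not the nodes the workflow uses; fingerprints are assumed to be plain sets of on-bit indices, and the five toy fingerprints are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fps, n_picks, first=0):
    """Greedy MaxMin: repeatedly add the item farthest from all picked items."""
    picked = [first]
    # Distance from every item to its nearest already-picked item.
    min_dist = [1.0 - tanimoto(fp, fps[first]) for fp in fps]
    while len(picked) < min(n_picks, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[nxt]))
    return picked


def neighbors(fps, seed, k, exclude=()):
    """Indices of the k items most similar to `seed`, skipping `exclude`."""
    skip = set(exclude) | {seed}
    ranked = sorted(
        (i for i in range(len(fps)) if i not in skip),
        key=lambda i: tanimoto(fps[seed], fps[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy fingerprints: two pairs of close analogs and one singleton.
fps = [
    {1, 2, 3, 4},    # 0
    {1, 2, 3, 5},    # 1, close to 0
    {10, 11, 12},    # 2
    {10, 11, 13},    # 3, close to 2
    {20, 21},        # 4
]
seeds = maxmin_pick(fps, 2)
selection = {s: neighbors(fps, s, 1, exclude=seeds) for s in seeds}
```

The diverse seeds give coverage of chemical space, and the neighbors attached to each seed are what supply the SAR information mentioned on the slide.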
27.
The workflows
• Download (with data) from EXAMPLES folder in KNIME itself
28.
Deploying it
• Both workflows are built from a series of wrapped metanodes. Each of these becomes a separate page in a Web Portal app when the workflow is copied to the KNIME server.
Deployable
Collaborative
29.
Cleanup workflow (part 1)
30.
Cleanup workflow (part 1)
Interactive
31.
Cleanup workflow (part 1)
Reproducible
32.
Cleanup workflow (part 1)
Interactive
33.
Cleanup workflow (part 1)
34.
Cleanup workflow (part 1)
Interactive
35.
Cleanup workflow (part 2)
36.
Cleanup workflow (part 2)
37.
The output (in Excel)
38.
Selection workflow
39.
Selection workflow
Interactive
40.
Selection workflow
Interactive
41.
Selection workflow
Interactive
42.
Selection workflow
Interactive
43.
The output (in Excel)
44.
Reviewing…
• Discussed today:
– Interactive data analysis and modeling
– Repeatability and reproducibility
– Collaboration between users with different sophistication levels
– Deployment
• For the 40 minute version of this presentation:
– Just staying organized
– The need to use multiple tools and multiple data sources
45.
US Roadshow
46.
KNIME is hiring!
• Software developers (Java and/or JavaScript)
• Data scientists (Application Scientists)
• Director of product marketing
Positions open in Austin, Berlin, Konstanz, and Zürich
More info: https://www.knime.com/careers
47.
Backups
48.
Validating Reproducibility
• Built-in support for validating that results generated from one run to the next are the same
• Can be automated across multiple workflows or groups of workflows
49.
Validating Reproducibility
• Built-in support for validating that results generated from one run to the next are the same
• Can be automated across multiple workflows or groups of workflows
50.
Validating Reproducibility
• Built-in support for validating that results generated from one run to the next are the same
• Can be automated across multiple workflows or groups of workflows
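The kind of run-to-run check these slides describe can be illustrated outside KNIME with a small comparison function. The sketch below is not KNIME's built-in mechanism; it assumes each run's results are available as a dictionary keyed by row ID, and the function name, tolerance defaults, and example values are hypothetical. Floating-point values are compared with a tolerance rather than exactly, in line with the earlier point that strict reproducibility to the 15th decimal place is not the goal.

```python
import math


def compare_runs(reference, rerun, rel_tol=1e-9, abs_tol=0.0):
    """Compare two result tables (row key -> {column: value}).

    Returns a list of (row, column, message) tuples, one per deviation, so
    that a mismatch can be traced to the exact cell it came from.
    """
    deviations = []
    for key in sorted(set(reference) | set(rerun)):
        if key not in reference or key not in rerun:
            deviations.append((key, None, "row missing from one run"))
            continue
        ref_row, new_row = reference[key], rerun[key]
        for col in sorted(set(ref_row) | set(new_row)):
            a, b = ref_row.get(col), new_row.get(col)
            if isinstance(a, float) and isinstance(b, float):
                same = math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            else:
                same = a == b
            if not same:
                deviations.append((key, col, f"{a!r} != {b!r}"))
    return deviations


# Hypothetical results from two runs of the same analysis.
reference = {
    "cmpd-1": {"pIC50": 6.2, "label": "active"},
    "cmpd-2": {"pIC50": 5.1, "label": "inactive"},
}
rerun = {
    "cmpd-1": {"pIC50": 6.2 + 1e-13, "label": "active"},  # numerical noise only
    "cmpd-2": {"pIC50": 5.4, "label": "inactive"},        # a real change
}
deviations = compare_runs(reference, rerun)
```

Here the tiny difference in `cmpd-1` is tolerated while the changed value for `cmpd-2` is reported with its row and column, which is the information needed to track down where a deviation entered the pipeline.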
51.
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.