© 2018 KNIME AG. All Rights Reserved.
Interactive and reproducible data analysis with the open-source KNIME Analytics Platform
Greg Landrum, Ph.D.
KNIME AG
@dr_greg_landrum
ACS New Orleans
19 March 2018
Topics
• A brief intro to KNIME
– The company
– The software
• Context: some data analysis problems we’re trying to help with using workflows
• A case study of reproducible interactive data analysis in KNIME
KNIME, the company
• KNIME AG founded in 2008
• Offices in Zürich (HQ), Konstanz, Berlin, and Austin
• 40+ employees
• Maintainer of the open-source KNIME Analytics Platform
– comprehensive data loading, processing, analysis, and modeling platform
– visual frontend
– open: to all sorts of data, to other tools (R, Python, etc.), and to various user personas
– 20+ open-source releases since 2006
– open source.
• KNIME Server
– 14 commercial product releases since 2008
• KNIME cloud offerings
The KNIME® Analytics Platform
Visual KNIME Workflows
• Nodes perform tasks on data. Each node has input(s), outputs, and a status: Not Configured, Idle, Executed, or Error
• Workflows combine nodes to model data flow
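To make the node/workflow idea concrete, here is a minimal Python sketch of a workflow as a chain of nodes, each carrying one of the four statuses listed above. The names `Node`, `Workflow`, `configure`, `execute`, and `run` are hypothetical and only illustrate the concept; this is not KNIME's API, and real KNIME workflows are general graphs rather than the simple linear chain assumed here.

```python
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    NOT_CONFIGURED = "Not Configured"
    IDLE = "Idle"          # configured and ready to run
    EXECUTED = "Executed"
    ERROR = "Error"


class Node:
    """Hypothetical node: applies one task to the data on its input."""

    def __init__(self, name: str, task: Optional[Callable[[list], list]] = None):
        self.name = name
        self.task = task
        self.output: Optional[list] = None
        self.status = Status.IDLE if task is not None else Status.NOT_CONFIGURED

    def configure(self, task: Callable[[list], list]) -> None:
        self.task = task
        self.output = None
        self.status = Status.IDLE

    def execute(self, data: list) -> Optional[list]:
        if self.status is Status.NOT_CONFIGURED:
            return None  # an unconfigured node cannot run
        try:
            self.output = self.task(data)
            self.status = Status.EXECUTED
        except Exception:
            self.output = None
            self.status = Status.ERROR
        return self.output


class Workflow:
    """Hypothetical linear workflow: each node's output feeds the next node."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def run(self, data: list) -> Optional[list]:
        for node in self.nodes:
            data = node.execute(data)
            if node.status is not Status.EXECUTED:
                # Stop at the first node that did not execute;
                # downstream nodes are left untouched.
                return None
        return data
```

For example, `Workflow([Node("Read", lambda _: [3, 1, 2]), Node("Sort", sorted)]).run([])` returns `[1, 2, 3]` and leaves both nodes in the Executed state; a node whose task raises ends up in the Error state and halts the chain.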
Over 2000 native and embedded nodes included:
• Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd party, ...
• Big Data: Hive, Impala, HDFS, Vertica, Teradata/Aster, Spark, MLlib, Community / 3rd party, ...
• Transformation: Row, Column, Matrix; Text, Image, Networks, Time Series; Java, Python; Community / 3rd party, ...
• Analysis & Mining: Statistics, Machine Learning, Data Mining, Web Analytics, Text Mining, Network Analysis, Social Media Analysis, R, Weka, Python, Community / 3rd party, ...
• Visualization: R, Python, JFreeChart, JavaScript, Community / 3rd party, ...
• Deployment: via BIRT; PMML, XML, JSON; Databases, Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd party, ...
Free E-Learning Course: Web Page
• Hands-on e-learning course
• Data Access, ETL, Analytics, Control Structures, Visualization
• Around 50 small units
• … with exercises
• … and with solutions on the EXAMPLES server
• Final exercises to test your knowledge!
https://www.knime.org/knime-introductory-course
The KNIME Software Ecosystem
• KNIME Analytics Platform
• KNIME Extensions, KNIME Supported Extensions, Partner Extensions, Community Extensions
• KNIME Server
• Deployment: to applications, to humans
• Collaboration: best practices, sharing expertise
• Automation: scheduling, (model) management
KNIME Server
• Shared repositories
• Access management
• Web enablement
• Flexible execution
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data sources
• Collaboration between users with different sophistication levels
• Deployment
• Just staying organized
I think workflows can help with these
I think KNIME can help with these
Interactive data analysis and modeling
• Fairly often the whole process of data preprocessing, analysis, and modeling can’t (or shouldn’t) be fully automated.
• We want/need a human in the loop
• Would be lovely if this weren’t painful
Interactive
Repeatability and reproducibility
• I can reproduce what I did before, or repeat the same process with a different data set/method
• You can do the same thing
• Not necessarily talking about strict reproducibility (out to the 15th decimal place), but if we miss that, we should be able to discover where the deviations come from
• Would be lovely if this weren’t painful
Reproducible
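The "not out to the 15th decimal place, but find where deviations come from" idea can be sketched as a tolerance-based comparison of two runs. This is a minimal Python sketch, assuming each run's results are a list of rows (dicts keyed by column name) in the same row order; the function name `compare_runs` and its default tolerances are hypothetical choices, not KNIME settings.

```python
import math
from typing import Dict, List, Tuple, Union

Value = Union[float, int, str, None]
Row = Dict[str, Value]


def compare_runs(
    previous: List[Row],
    current: List[Row],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-12,
) -> List[Tuple[int, str, Value, Value]]:
    """Return (row index, column, previous value, current value) for every
    cell that differs between two runs of the same analysis.

    Pairs of floats are compared with math.isclose, so tiny floating-point
    noise is tolerated; all other values must match exactly.
    """
    if len(previous) != len(current):
        raise ValueError(f"row count changed: {len(previous)} -> {len(current)}")
    deviations = []
    for i, (old_row, new_row) in enumerate(zip(previous, current)):
        # Check the union of columns so added or dropped columns are reported too.
        for column in sorted(set(old_row) | set(new_row)):
            old, new = old_row.get(column), new_row.get(column)
            if isinstance(old, float) and isinstance(new, float):
                same = math.isclose(old, new, rel_tol=rel_tol, abs_tol=abs_tol)
            else:
                same = old == new
            if not same:
                deviations.append((i, column, old, new))
    return deviations
```

An empty result means the rerun matched within tolerance; a non-empty one points at the exact rows and columns that moved. Loosening or tightening `rel_tol` is the knob for how strict "reproducible" should be.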
The need to use multiple tools and multiple data sources
• There is no one-size-fits-all solution (or “one-stop shop”)
• We’re inevitably going to be using more than one piece of software and working with data from more than one source.
• Would be lovely if this weren’t painful
Open
Collaboration between users with different sophistication levels
• Some personae:¹
– The scripter/programmer: “I’ve got this great new method you should try”
– The tool user: “I’ll use software, but there’s no way I’m writing code”
– The “stakeholder”: “Those folks are doing useful stuff and I need their results, but I don’t have time to learn some complex new piece of software.”
• Would be lovely if enabling collaboration between these different personae weren’t painful
¹ Yes, these are stereotypes
Collaborative
Deployment
• Once I’ve built something, I’d like to make it available to my colleagues
– Sharing models
– Sharing methods
– Sharing results
• Would be lovely if this weren’t painful
Deployable
Just staying organized
• I can usually remember where my scripts are
• There’s no way I can remember where yours are
• It would be lovely if it weren’t painful to find stuff
Findable
The case study: HTS hit list triage
Background
• The problem: processing a hit list from a high-throughput phenotypic screen for malaria.
– Clean up the hit list
– Suggest compounds to be sent to a validation assay
• Data source: 2014 Teach-Discover-Treat challenge http://www.tdtproject.org/challenge-1---malaria-hts.html
• Additional info:
– https://github.com/sriniker/TDT-tutorial-2014
– Riniker et al. https://f1000research.com/articles/6-1136/v2
Approach we’ll take: cleanup
• Remove “ugly” molecules:
– PAINS filters¹,²: molecules containing substructures that are likely to interfere (or to have interfered) with the assay.
– “Rapid elimination of swill” (REOS)³: too big, too complicated, or too greasy.
• We don’t want to apply these filters mindlessly, so we should always look at the results and allow manual rescue
1. Baell, J. B. & Holloway, G. A. J. Med. Chem. 53, 2719–40 (2010).
2. http://rdkit.blogspot.ch/2015/08/curating-pains-filters.html
3. Walters, W. P. & Namchuk, M. Nat. Rev. Drug Discov. 2, 259–66 (2003).
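The cleanup step (flag, review, rescue) can be sketched as follows. This minimal Python sketch assumes the descriptors and the PAINS substructure matches have already been computed upstream (in the workflow this would be done by cheminformatics nodes); the property names, the alert names, and especially the `REOS_RANGES` values are illustrative assumptions, not the thresholds used in the talk. The point it shows is that removed compounds keep their reasons, and a manually rescued compound is kept even if flagged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Illustrative REOS-style property ranges (inclusive). These values are an
# assumption for this sketch; tune them to your own project rules.
REOS_RANGES: Dict[str, Tuple[float, float]] = {
    "mol_wt": (200, 500),
    "logp": (-5, 5),
    "hbd": (0, 5),
    "hba": (0, 10),
    "formal_charge": (-2, 2),
    "rot_bonds": (0, 8),
    "heavy_atoms": (15, 50),
}


@dataclass
class Hit:
    compound_id: str
    props: Dict[str, float]                                # precomputed descriptors
    pains_alerts: List[str] = field(default_factory=list)  # matched PAINS patterns (hypothetical names)


def flag_reasons(hit: Hit) -> List[str]:
    """List every reason a hit would be removed (an empty list means keep)."""
    reasons = [f"PAINS: {alert}" for alert in hit.pains_alerts]
    for prop, (low, high) in REOS_RANGES.items():
        value = hit.props.get(prop)
        if value is None:
            reasons.append(f"REOS: {prop} missing")
        elif not low <= value <= high:
            reasons.append(f"REOS: {prop}={value} outside [{low}, {high}]")
    return reasons


def triage(hits: List[Hit], rescued: Set[str]) -> Tuple[List[Hit], Dict[str, List[str]]]:
    """Split hits into kept and removed, honoring manual rescues.

    Removed compounds are returned with their reasons so a chemist can
    review them; any compound ID in `rescued` is kept even if flagged.
    """
    kept: List[Hit] = []
    removed: Dict[str, List[str]] = {}
    for hit in hits:
        reasons = flag_reasons(hit)
        if not reasons or hit.compound_id in rescued:
            kept.append(hit)
        else:
            removed[hit.compound_id] = reasons
    return kept, removed
```

Returning the reasons rather than silently dropping rows is what makes the "look at the results" step possible: the reviewer sees why each compound was removed and adds its ID to `rescued` to bring it back on the next run.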
Approach we’ll take: selection for validation
• We want good coverage of the chemical space of the HTS actives, but would ideally also like to learn something from the validation results
• Approach:
– Start with a diverse subset of the cleaned actives
– Pick neighbors of each of these so that we have some SAR information in the results
https://github.com/sriniker/TDT-tutorial-2014
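The two-step selection (diverse subset, then neighbors) can be sketched in plain Python. This minimal sketch assumes fingerprints have already been computed and are represented as sets of on-bit indices; it uses greedy MaxMin picking on Tanimoto similarity, which is one common way to choose a diverse subset but is an assumption here, not a confirmed description of the workflow's method. The function names and the `min_sim` cutoff are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_pick(fps: Dict[str, Fingerprint], n_picks: int, seed_id: str) -> List[str]:
    """Greedy MaxMin diversity picking.

    Starting from `seed_id`, repeatedly add the compound whose most similar
    already-picked compound is least similar to it (ties broken by ID).
    """
    picks = [seed_id]
    # Highest similarity of each remaining candidate to any pick so far.
    nearest = {cid: tanimoto(fp, fps[seed_id]) for cid, fp in fps.items() if cid != seed_id}
    while len(picks) < n_picks and nearest:
        next_id = min(nearest, key=lambda cid: (nearest[cid], cid))
        picks.append(next_id)
        del nearest[next_id]
        for cid in nearest:
            nearest[cid] = max(nearest[cid], tanimoto(fps[cid], fps[next_id]))
    return picks


def pick_neighbors(
    fps: Dict[str, Fingerprint], picks: List[str], n_neighbors: int, min_sim: float
) -> Dict[str, List[Tuple[str, float]]]:
    """For each diverse pick, return up to `n_neighbors` of the most similar
    not-yet-chosen compounds with similarity >= min_sim (SAR context)."""
    chosen = set(picks)
    result: Dict[str, List[Tuple[str, float]]] = {}
    for pid in picks:
        scored = sorted(
            ((cid, tanimoto(fps[pid], fp)) for cid, fp in fps.items() if cid not in chosen),
            key=lambda item: (-item[1], item[0]),
        )
        neighbors = [(cid, sim) for cid, sim in scored if sim >= min_sim][:n_neighbors]
        chosen.update(cid for cid, _ in neighbors)  # each compound is suggested once
        result[pid] = neighbors
    return result
```

Each neighbor is assigned to at most one diverse pick, so no compound is suggested twice. Raising `min_sim` gives tighter SAR series at the cost of some picks getting no neighbors at all.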
The workflows
• Download (with data) from the EXAMPLES folder in KNIME itself
Deploying it
• Both workflows are built from a series of wrapped metanodes. Each of these becomes a separate page in a Web Portal app when the workflow is copied to the KNIME Server.
Deployable, Collaborative
Cleanup workflow (part 1): workflow screenshots, with steps labeled Interactive and Reproducible
Cleanup workflow (part 2): workflow screenshots
The output (in Excel)
Selection workflow: workflow screenshots, with steps labeled Interactive
The output (in Excel)
Reviewing…
• Discussed today:
– Interactive data analysis and modeling
– Repeatability and reproducibility
– Collaboration between users with different sophistication levels
– Deployment
• For the 40-minute version of this presentation:
– Just staying organized
– The need to use multiple tools and multiple data sources
US Roadshow
KNIME is hiring!
• Software developers (Java and/or JavaScript)
• Data scientists (Application Scientists)
• Director of product marketing
Positions open in Austin, Berlin, Konstanz, and Zürich
More info: https://www.knime.com/careers
Backups
Validating Reproducibility
• Built-in support for validating that the results generated are the same from one run to the next
• Can be automated across multiple workflows or groups of workflows
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by
KNIME.com AG under license from KNIME GmbH, and are registered in the United States.
KNIME® is also registered in Germany.

Is one enough? Data warehousing for biomedical research
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data front
 
Large scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataLarge scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent data
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knime
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databases
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
 

Recently uploaded

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 

Recently uploaded (20)

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 

Interactive and reproducible data analysis with the open-source KNIME Analytics Platform

  • 1. © 2018 KNIME AG. All Rights Reserved.
    Interactive and reproducible data analysis with the open-source KNIME Analytics Platform
    Greg Landrum, Ph.D., KNIME AG (@dr_greg_landrum)
    ACS New Orleans, 19 March 2018
  • 2. Topics
    • A brief intro to KNIME
      – The company
      – The software
    • Context: some data analysis problems we’re trying to help with using workflows
    • A case study of reproducible interactive data analysis in KNIME
  • 3. KNIME, the company
    • KNIME AG founded in 2008
    • Offices in Zürich (HQ), Konstanz, Berlin, and Austin
    • 40+ employees
    • Maintainer of the open-source KNIME Analytics Platform
      – comprehensive data loading, processing, analysis, and modeling platform
      – visual frontend
      – open: to all sorts of data, to other tools (R, Python, etc.), and to various user personas
      – 20+ open-source releases since 2006
      – open source
    • KNIME Server
      – 14 commercial product releases since 2008
    • KNIME cloud offerings
  • 4. The KNIME® Analytics Platform
  • 5. Visual KNIME Workflows
    • Nodes perform tasks on data
    • Workflows combine nodes to model data flow
    • Each node has input(s), outputs, and a status: Not Configured, Idle, Executed, or Error
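The node/workflow model on this slide can be illustrated with a toy Python sketch. This is not KNIME's API. The `Node` class, the `execute` function, and the `Status` enum are hypothetical and only mimic the four states named on the slide. A node must be configured before it can run. It runs only after all of its inputs have executed. A task that raises an exception puts the node into the Error state. The sketch assumes Python 3.9+ (for `graphlib`).

```python
from enum import Enum
from graphlib import TopologicalSorter
from typing import Callable, Optional


class Status(Enum):
    """Hypothetical stand-ins for the four node states shown on the slide."""
    NOT_CONFIGURED = "Not Configured"
    IDLE = "Idle"            # configured and ready to execute
    EXECUTED = "Executed"
    ERROR = "Error"


class Node:
    """Hypothetical node: performs one task on the output(s) of its input nodes."""

    def __init__(self, name: str, task: Callable, inputs: tuple = ()):
        self.name = name
        self.task = task
        self.inputs = tuple(inputs)           # names of upstream nodes
        self.settings: Optional[dict] = None  # None means "not configured"
        self.output = None
        self.status = Status.NOT_CONFIGURED

    def configure(self, **settings) -> None:
        self.settings = settings
        self.status = Status.IDLE


def execute(workflow: dict) -> None:
    """Execute nodes in data-flow order (every node after all of its inputs)."""
    graph = {name: node.inputs for name, node in workflow.items()}
    for name in TopologicalSorter(graph).static_order():
        node = workflow[name]
        upstream = [workflow[i] for i in node.inputs]
        # Skip nodes that are not configured or whose inputs have not executed
        if node.settings is None or any(u.status is not Status.EXECUTED for u in upstream):
            continue
        try:
            node.output = node.task(*(u.output for u in upstream), **node.settings)
            node.status = Status.EXECUTED
        except Exception:
            node.output = None
            node.status = Status.ERROR


# Usage: a three-node workflow (reader -> row filter -> sorter)
wf = {
    "read": Node("read", lambda values: list(values)),
    "filter": Node("filter", lambda rows, min_value: [r for r in rows if r >= min_value],
                   inputs=("read",)),
    "sort": Node("sort", sorted, inputs=("filter",)),
}
wf["read"].configure(values=[5, 1, 4, 2])
wf["filter"].configure(min_value=2)
wf["sort"].configure()
execute(wf)
print(wf["sort"].output, wf["sort"].status)   # [2, 4, 5] Status.EXECUTED
```

Because execution order comes from the graph rather than from the order in which nodes were added, a node that fails does not break the whole run. The node is marked Error, and its downstream nodes stay Idle until the problem is fixed.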
  • 6. Over 2000 native and embedded nodes included:
    • Analysis & Mining: Statistics, Machine Learning, Data Mining, Web Analytics, Text Mining, Network Analysis, Social Media Analysis; R, Weka, Python; Community / 3rd party, ...
    • Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd party, ...
    • Transformation: Row, Column, Matrix; Text, Image, Networks, Time Series; Java, Python; Community / 3rd party, ...
    • Visualization: R, Python, JFreeChart, JavaScript; Community / 3rd party, ...
    • Deployment: via BIRT; PMML, XML, JSON; Databases, Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd party, ...
    • Big Data: Hive, Impala, HDFS; Vertica, Teradata/Aster; Spark, MLlib; Community / 3rd party, ...
  • 7. Free E-Learning Course: Web Page
    • Hands-on e-learning course
    • Data Access, ETL, Analytics, Control Structures, Visualization
    • Around 50 small units
      – … with exercises
      – … and with solutions on the EXAMPLES server
    • Final exercises to test your knowledge!
    https://www.knime.org/knime-introductory-course
  • 8. The KNIME Software Ecosystem
    • KNIME Analytics Platform, plus:
      – KNIME Supported Extensions
      – KNIME Extensions
      – Partner Extensions
      – Community Extensions
    • KNIME Server:
      – Deployment: to Applications, to Humans
      – Collaboration: Best Practices, Sharing Expertise
      – Automation: Scheduling, (Model) Management
  • 9. KNIME Server
    • Shared Repositories
    • Access Management
    • Web Enablement
    • Flexible Execution
  • 10–13. Some data analysis problems
    • Interactive data analysis and modeling
    • Repeatability and reproducibility
    • The need to use multiple tools and multiple data sources
    • Collaboration between users with different sophistication levels
    • Deployment
    • Just staying organized
    I think workflows, and KNIME in particular, can help with these.
  • 14. Interactive data analysis and modeling [Interactive]
    • Fairly often the whole process of data preprocessing, analysis, and modeling can’t (or shouldn’t) be fully automated
    • We want/need a human in the loop
    • It would be lovely if this weren’t painful
  • 15. Repeatability and reproducibility [Reproducible]
    • I can reproduce what I did before, or repeat the same process with a different data set/method
    • You can do the same thing
    • Not necessarily talking about strict reproducibility (out to the 15th decimal place), but if we miss that, we should be able to discover where the deviations come from
    • It would be lovely if this weren’t painful
  • 16. The need to use multiple tools and multiple data sources [Open]
    • There is no one-size-fits-all solution (or “one-stop shop”)
    • We’re inevitably going to be using more than one piece of software and working with data from more than one source
    • It would be lovely if this weren’t painful
  • 17. Collaboration between users with different sophistication levels [Collaborative]
    • Some personae¹:
      – The scripter/programmer: “I’ve got this great new method you should try”
      – The tool user: “I’ll use software, but there’s no way I’m writing code”
      – The “stakeholder”: “Those folks are doing useful stuff and I need their results, but I don’t have time to learn some complex new piece of software.”
    • It would be lovely if enabling collaboration between these different personae weren’t painful
    ¹ Yes, these are stereotypes.
  • 18. Deployment [Deployable]
    • Once I’ve built something, I’d like to make it available to my colleagues
      – Sharing models
      – Sharing methods
      – Sharing results
    • It would be lovely if this weren’t painful
  • 19. Just staying organized [Findable]
    • I can usually remember where my scripts are
    • There’s no way I can remember where yours are
    • It would be lovely if it weren’t painful to find stuff
  • 23. The case study: HTS hit list triage
  • 24. Background
    • The problem: processing a hit list from a high-throughput phenotypic screen for malaria
      – Clean up the hit list
      – Suggest compounds to be sent to a validation assay
    • Data source: 2014 Teach-Discover-Treat challenge, http://www.tdtproject.org/challenge-1---malaria-hts.html
    • Additional info:
      – https://github.com/sriniker/TDT-tutorial-2014
      – Riniker et al., https://f1000research.com/articles/6-1136/v2
  • 25. Approach we’ll take: cleanup
    • Remove “ugly” molecules:
      – PAINS filters [1, 2]: molecules containing substructures that are likely to interfere (or have interfered) with the assay
      – “Rapid elimination of swill” (REOS) [3]: molecules that are too big, too complicated, or too greasy
    • We don’t want to apply these filters mindlessly, so we should always look at the results and allow manual rescue
    References:
    1. Baell, J. B. & Holloway, G. A. J. Med. Chem. 53, 2719–40 (2010).
    2. http://rdkit.blogspot.ch/2015/08/curating-pains-filters.html
    3. Walters, W. P. & Namchuk, M. Nat. Rev. Drug Discov. 2, 259–66 (2003).
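The "flag, look, then rescue" idea can be sketched in plain Python. It assumes that descriptors and PAINS matches have already been computed upstream, for example with a cheminformatics toolkit such as RDKit, which ships PAINS filter definitions. The property names and ranges in `REOS_RANGES` are illustrative assumptions in the spirit of REOS. They are not the thresholds used in the talk's workflows. `Compound`, `flag_reasons`, and `triage` are hypothetical names. The key design choice is that nothing is dropped silently: every flagged compound carries its reasons, and IDs in `rescued_ids` are kept after manual review.

```python
from dataclasses import dataclass, field

# Illustrative REOS-style property ranges (inclusive). These values are an
# assumption for this sketch, not the thresholds used in the talk's workflows.
REOS_RANGES = {
    "mol_weight": (200.0, 500.0),
    "logp": (-5.0, 5.0),
    "h_donors": (0, 5),
    "h_acceptors": (0, 10),
    "formal_charge": (-2, 2),
    "rot_bonds": (0, 8),
    "heavy_atoms": (15, 50),
}


@dataclass
class Compound:
    compound_id: str
    props: dict                                     # precomputed descriptors keyed as in REOS_RANGES
    pains_hits: list = field(default_factory=list)  # names of matched PAINS alerts (precomputed)


def flag_reasons(cpd: Compound) -> list:
    """Return human-readable reasons why a compound would be filtered out."""
    reasons = [f"PAINS: {alert}" for alert in cpd.pains_hits]
    for prop, (low, high) in REOS_RANGES.items():
        value = cpd.props[prop]
        if not low <= value <= high:
            reasons.append(f"REOS: {prop}={value} outside [{low}, {high}]")
    return reasons


def triage(compounds, rescued_ids=frozenset()):
    """Split compounds into kept IDs and flagged IDs (with reasons).

    Flagged compounds are returned with their reasons so a person can review
    them; any ID in `rescued_ids` is kept even if it was flagged.
    """
    kept, flagged = [], {}
    for cpd in compounds:
        reasons = flag_reasons(cpd)
        if reasons and cpd.compound_id not in rescued_ids:
            flagged[cpd.compound_id] = reasons
        else:
            kept.append(cpd.compound_id)
    return kept, flagged
```

In a typical round, `triage` runs once, a chemist reviews the `flagged` dictionary, and then `triage` runs again with the rescued IDs. Because the same inputs and the same rescue list always give the same result, the manual step stays reproducible.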
  • 26. Approach we’ll take: selection for validation
    • We want good coverage of the chemical space of the HTS actives, but would ideally also like to learn something from the validation results
    • Approach:
      – Start with a diverse subset of the cleaned actives
      – Pick neighbors of each of these so that we have some SAR information in the results
    https://github.com/sriniker/TDT-tutorial-2014
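The two-step selection (diverse seeds, then neighbors of each seed) can be sketched as follows. The slide does not say which diversity algorithm the workflow uses. Greedy MaxMin picking on Tanimoto distance is assumed here as one common choice; RDKit provides an implementation as `MaxMinPicker`. Fingerprints are toy stand-ins stored as Python sets of on-bit indices. The function names and example data are hypothetical.

```python
import random


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def maxmin_pick(fps: dict, n_picks: int, seed: int = 0) -> list:
    """Greedy MaxMin diversity selection.

    Start from a random compound, then repeatedly add the compound whose
    nearest already-picked compound is farthest away (distance = 1 - Tanimoto).
    """
    if n_picks <= 0 or not fps:
        return []
    ids = sorted(fps)
    picks = [random.Random(seed).choice(ids)]
    # distance from every compound to its nearest pick so far
    nearest = {i: 1.0 - tanimoto(fps[i], fps[picks[0]]) for i in ids}
    while len(picks) < min(n_picks, len(ids)):
        candidates = [i for i in ids if i not in picks]
        best = max(candidates, key=lambda i: nearest[i])
        picks.append(best)
        for i in ids:
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fps[i], fps[best]))
    return picks


def pick_neighbors(fps: dict, seeds: list, k: int) -> dict:
    """For each seed, take its k most similar compounds that are not yet selected."""
    selected = set(seeds)
    neighbors = {}
    for seed_id in seeds:
        ranked = sorted(
            (i for i in fps if i not in selected),
            key=lambda i: (-tanimoto(fps[seed_id], fps[i]), i),
        )
        neighbors[seed_id] = ranked[:k]
        selected.update(neighbors[seed_id])
    return neighbors


# Toy fingerprints: two clusters of "actives" (hypothetical on-bit sets)
fps = {
    "a1": frozenset({1, 2, 3, 4}), "a2": frozenset({1, 2, 3, 5}), "a3": frozenset({1, 2, 3, 4, 6}),
    "b1": frozenset({10, 11, 12, 13}), "b2": frozenset({10, 11, 12, 14}),
}
seeds = maxmin_pick(fps, n_picks=2)
print(seeds, pick_neighbors(fps, seeds, k=1))
```

The seeds give coverage of the chemical space, one per cluster in this toy example. Each seed's near neighbors are what let the validation results say something about SAR.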
  • 27. The workflows
    • Download (with data) from the EXAMPLES folder in KNIME itself
  • 28. Deploying it [Deployable, Collaborative]
    • Both workflows are built from a series of wrapped metanodes. Each of these becomes a separate page in a Web Portal app when the workflow is copied to the KNIME Server.
  • 29. Cleanup workflow (part 1)
  • 30. Cleanup workflow (part 1) [Interactive]
  • 31. Cleanup workflow (part 1) [Reproducible]
  • 32. Cleanup workflow (part 1) [Interactive]
  • 33. Cleanup workflow (part 1)
  • 34. Cleanup workflow (part 1) [Interactive]
  • 35. Cleanup workflow (part 2)
  • 36. Cleanup workflow (part 2)
  • 37. The output (in Excel)
  • 38. Selection workflow
  • 39. Selection workflow [Interactive]
  • 40. Selection workflow [Interactive]
  • 41. Selection workflow [Interactive]
  • 42. Selection workflow [Interactive]
  • 43. The output (in Excel)
  • 44. Reviewing…
    • Discussed today:
      – Interactive data analysis and modeling
      – Repeatability and reproducibility
      – Collaboration between users with different sophistication levels
      – Deployment
    • For the 40-minute version of this presentation:
      – Just staying organized
      – The need to use multiple tools and multiple data sources
  • 45. US Roadshow
  • 46. KNIME is hiring!
    • Software developers (Java and/or JavaScript)
    • Data scientists (Application Scientists)
    • Director of product marketing
    Positions open in Austin, Berlin, Konstanz, and Zürich
    More info: https://www.knime.com/careers
  • 47. Backups
  • 48–50. Validating Reproducibility
    • Built-in support for validating that results generated from one run to the next are the same
    • Can be automated across multiple workflows or groups of workflows
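KNIME's own built-in validation mechanism is not reproduced here. The Python sketch below only illustrates the underlying idea. It compares a reference output table with a re-run and, in the spirit of the point on slide 15, reports where any deviations occur instead of giving a bare pass/fail. Several assumptions are made: tables are represented as lists of dicts, floats are compared with a tolerance via `math.isclose`, and `compare_tables` and `validate_all` are hypothetical names.

```python
import math


def compare_tables(expected, actual, rel_tol=1e-9, abs_tol=0.0):
    """List every cell where two result tables disagree.

    Tables are lists of rows (dicts keyed by column name). Pairs of floats are
    compared with a tolerance; all other values must match exactly.
    """
    if len(expected) != len(actual):
        return [("row count", len(expected), len(actual))]
    problems = []
    for row_idx, (exp_row, act_row) in enumerate(zip(expected, actual)):
        if exp_row.keys() != act_row.keys():
            problems.append((row_idx, "columns", sorted(exp_row), sorted(act_row)))
            continue
        for col, exp_val in exp_row.items():
            act_val = act_row[col]
            if isinstance(exp_val, float) and isinstance(act_val, float):
                same = math.isclose(exp_val, act_val, rel_tol=rel_tol, abs_tol=abs_tol)
            else:
                same = exp_val == act_val
            if not same:
                problems.append((row_idx, col, exp_val, act_val))
    return problems


def validate_all(runs, **tolerances):
    """Check several workflows at once: runs maps name -> (reference, rerun).

    Returns only the workflows whose outputs deviate, with the deviations.
    """
    report = {}
    for name, (reference, rerun) in runs.items():
        diffs = compare_tables(reference, rerun, **tolerances)
        if diffs:
            report[name] = diffs
    return report


reference = [{"id": "A", "score": 0.1 + 0.2}, {"id": "B", "score": 1.0}]
rerun = [{"id": "A", "score": 0.3}, {"id": "B", "score": 1.5}]
print(compare_tables(reference, rerun))   # [(1, 'score', 1.0, 1.5)]
```

The tolerance lets `0.1 + 0.2` match `0.3` (no need for agreement "out to the 15th decimal place"), while a real change in row 1 is reported with its row, column, and both values, which is where the search for the cause of a deviation starts.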
  • 51. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.