KNIME is an open source platform for data analytics and processing. It allows users to visually create data workflows using predefined nodes to perform tasks like data mining, analysis, transformation and more. KNIME can integrate data from multiple sources and interface with languages like R, Python and SQL. It provides a graphical user interface for building, executing and monitoring analytics workflows.
2. What is KNIME?
• KNIME stands for Konstanz Information Miner.
• Developed at the University of Konstanz, Germany, in 2004-2006; initially focused on pharmaceutical research.
• KNIME is an open source platform for analytical data modelling and processing.
• KNIME allows users to visually create data flows (or pipelines).
• Written in Java and built on the Eclipse SDK platform.
• Modular platform for building and executing workflows using predefined components, called nodes.
• Core functionality is available for standard tasks such as data mining, analysis, and manipulation.
• GUI-based, with scripting integration.
• An especially powerful aspect of KNIME is its ability to integrate data from multiple sources.
• KNIME also offers extensions that allow it to interface with R, Python, Java, and SQL.
3. KNIME DATA ANALYTICS LIFECYCLE
Read data (multiple sources) → Extract, Transform, Load (ETL) → Data analytics or predictive analysis → Reporting and/or injection
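The lifecycle (read data from several sources, ETL, analytics, reporting) can be sketched in plain Python. This is only an illustration of the stages, not how KNIME executes workflows; the two CSV strings, the filtering rule, and every function name are hypothetical stand-ins chosen for the example.

```python
import csv
import io

# Hypothetical raw inputs standing in for two separate data sources.
SOURCE_A = "id,amount\n1,10.5\n2,4.0\n"
SOURCE_B = "id,amount\n3,7.25\n"

def read_data(text):
    """Read: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and drop rows with non-positive amounts."""
    out = []
    for row in rows:
        amount = float(row["amount"])
        if amount > 0:
            out.append({"id": int(row["id"]), "amount": amount})
    return out

def load(repository, rows):
    """Load: append cleaned rows into a single in-memory repository."""
    repository.extend(rows)
    return repository

def analyze(repository):
    """Analytics: a simple aggregate over the loaded data."""
    total = sum(r["amount"] for r in repository)
    return {"rows": len(repository), "total": total}

def report(summary):
    """Reporting: render the result as a line of text."""
    return f"rows={summary['rows']} total={summary['total']:.2f}"

repo = []
for source in (SOURCE_A, SOURCE_B):
    load(repo, transform(read_data(source)))
print(report(analyze(repo)))  # rows=3 total=21.75
```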
6. Node Status and Operations
• A node is the smallest programming unit in KNIME.
• Each node serves a dedicated task.
• After being created, a node needs settings to execute its task; this phase is called configuration.
• After configuration, a node needs to be executed to actually carry out the assigned task.
7. Node Status and Operations
• A node can be in one of 3 states:
Idle: The node is not yet configured and cannot be executed with its current settings.
Configured: The node has been set up correctly and may be executed at any time.
Executed: The node has been successfully executed. Results may be viewed and used in downstream nodes.
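The three node states and the configure/execute steps behave like a small state machine. The sketch below is a toy Python model under that reading; it is not KNIME's Java node API, and the `Node` class, its methods, and the `reset` behaviour are hypothetical choices made for illustration.

```python
from enum import Enum

class NodeState(Enum):
    IDLE = "idle"              # not yet configured; cannot be executed
    CONFIGURED = "configured"  # settings are valid; may be executed at any time
    EXECUTED = "executed"      # results available to downstream nodes

class Node:
    """Toy model of a node's lifecycle (hypothetical, not KNIME's API)."""

    def __init__(self, name, task):
        self.name = name
        self.task = task          # callable(settings, data) -> result
        self.settings = None
        self.result = None
        self.state = NodeState.IDLE

    def configure(self, settings):
        # Configuration phase: store the settings the task needs.
        self.settings = settings
        self.result = None        # new settings invalidate earlier results
        self.state = NodeState.CONFIGURED

    def execute(self, data):
        if self.state is NodeState.IDLE:
            raise RuntimeError(f"{self.name}: configure the node before executing it")
        self.result = self.task(self.settings, data)
        self.state = NodeState.EXECUTED
        return self.result

    def reset(self):
        # Discard results but keep the settings: back to the configured state.
        if self.state is NodeState.EXECUTED:
            self.result = None
            self.state = NodeState.CONFIGURED

# Usage: a filter-like node that keeps values >= a configured minimum.
row_filter = Node("Row Filter", lambda s, rows: [r for r in rows if r >= s["min"]])
print(row_filter.state)                 # NodeState.IDLE
row_filter.configure({"min": 3})
print(row_filter.execute([1, 5, 3, 2]))  # [5, 3]
print(row_filter.state)                 # NodeState.EXECUTED
```

Changing the settings clears old results, which mirrors the idea that downstream nodes should only see results produced with the current configuration.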
8. Node Status and Operations
[Figure: a Partitioning node with its input and output ports and its status indicator; the statuses shown are Not Configured, Idle, Executed, and Error.]
12. KNIME WORKFLOW
• KNIME provides a large repository of easy-to-use, modular nodes for data preprocessing, data fusion, and data transformation.
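A workflow chains such nodes so that each one's output feeds the next. The Python sketch below chains three steps named after the categories above; treating "data fusion" as a key-based inner join is an assumption (it is one common form of fusion), and all function names and sample tables are hypothetical.

```python
def preprocess(rows):
    """Preprocessing: drop rows that contain missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fuse(left, right, key):
    """Data fusion as an inner join: combine rows sharing the same `key` value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            merged.update(rrow)  # right-hand columns overwrite same-named left columns
            joined.append(merged)
    return joined

def transform(rows, key, value):
    """Transformation: sum `value` per distinct `key`."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[value]
    return totals

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": None}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12},
          {"id": 3, "total": 5}, {"id": 4, "total": 9}]

# The "workflow": preprocess both inputs, fuse them, then transform the result.
result = transform(fuse(preprocess(customers), preprocess(orders), "id"), "name", "total")
print(result)  # {'Ada': 42, 'Cy': 5}
```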
13. DATA ACCESS
• Databases: MySQL and any JDBC-accessible database (e.g. Oracle, DB2, SQL Server).
• Files: CSV, TXT, Excel, Word, PDF, images, and texts.
• Web and cloud: web services, Twitter, Google.
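Database access of this kind typically means sending a SQL query to the database and reading back a result table. KNIME does this through JDBC from Java; the sketch below instead uses Python's built-in `sqlite3` with an in-memory database as a stand-in, so the table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a JDBC source such as MySQL or Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, city) VALUES (?, ?)",
    [(1, "Konstanz"), (2, "Berlin"), (3, "Konstanz")],
)

# Let the database do the aggregation and read back only the small result table.
rows = conn.execute(
    "SELECT city, COUNT(*) AS n FROM customers GROUP BY city ORDER BY city"
).fetchall()
conn.close()
print(rows)  # [('Berlin', 1), ('Konstanz', 2)]
```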
15. KNIME STATISTICS
• Linear correlation and dependency measures.
• Many nodes also support standard statistics such as count, sum, and mean.
• The “Statistics” node provides basic measures of distribution.
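The measures named above are straightforward to compute. The sketch below shows count, sum, mean, sample standard deviation, min, and max, plus Pearson's linear correlation coefficient; the function names are hypothetical and it does not reproduce the exact output columns of KNIME's Statistics node.

```python
import math

def describe(values):
    """Basic distribution measures for a list of numbers (assumes at least 2 values)."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sample variance uses the n - 1 denominator.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return {"count": n, "sum": total, "mean": mean,
            "std": math.sqrt(var), "min": min(values), "max": max(values)}

def pearson(xs, ys):
    """Pearson's linear correlation coefficient r, which lies in [-1, 1]."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(describe(x))
print(pearson(x, y))  # close to 1.0: y is an exact positive linear function of x
```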
16. KNIME MACHINE LEARNING
• Data partitioning and multiple folds (cross-validation).
• Base KNIME supports the most widely used machine learning algorithms.
• These are extended through partner implementations and scripting languages (R, Python, Weka, etc.).
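Partitioning into multiple folds is what k-fold cross-validation relies on: each row lands in exactly one test fold, and the model is trained on the remaining folds. The sketch below generates such index splits; the function name, seeded shuffle, and fold-size rule are assumptions for illustration, not KNIME's partitioning nodes.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seeded shuffle gives reproducible folds
    # Spread the remainder so fold sizes differ by at most one row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print(len(train), len(test))  # prints 6 4, then 7 3, then 7 3
```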
17. KNIME REPORTING
• Generates reports in office document formats, PDF, and HTML.
• Based on BIRT (Business Intelligence and Reporting Tools), part of the Eclipse framework.
• Native part of the KNIME workbench.
• Extends data visualization capabilities.
• Automatic distribution by email, or publication to websites.
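At its simplest, an HTML report renders a result table as markup that can be emailed or published. The sketch below shows only that idea using Python's standard `html.escape`; it is not BIRT or the KNIME reporting extension, and `html_report` and the sample data are hypothetical.

```python
from html import escape

def html_report(title, header, rows):
    """Render a table as a minimal standalone HTML page; all values are HTML-escaped."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return (f"<!DOCTYPE html><html><head><title>{escape(title)}</title></head>"
            f"<body><h1>{escape(title)}</h1>"
            f"<table><tr>{head}</tr>{body}</table></body></html>")

report = html_report("Sales by city", ["City", "Total"], [("Konstanz", 42), ("Berlin", 5)])
print(report)
```

Escaping every cell keeps data values such as `<` or `&` from breaking the page layout.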
18. DATA ANALYTICS USE CASES
[Diagram labels: Process Mapping, Process Analysis, Ideas]
DATA AGGREGATION
• Combine data from different sources, local or remote.
• ETL data into a single repository for querying and analytics.
BUSINESS INTELLIGENCE
• Data intelligence and reporting over large aggregated datasets.
• Automated, reusable workflows for standardized reporting.
PREDICTIVE ANALYTICS
• Insight across very large datasets.
KNIME ANALYTICS
• Advantage of being a data-agnostic aggregator.
• Ability to work through very large datasets with modest hardware.
• Access to complex algorithms through easy-to-use tools.
19. KNIME ADVANTAGES
• KNIME's core architecture allows processing of data volumes limited only by the available hard disk space, not by the available RAM. For example, KNIME allows analysis of 300 million customer addresses, 20 million cell images, and 10 million molecular structures.
• Additional plugins allow the integration of methods for text mining, image mining, and time series analysis.
• KNIME integrates various other open-source projects, e.g. machine learning algorithms from Weka, the R statistics project, ImageJ, and the Chemistry Development Kit.
• KNIME is implemented in Java but also allows wrappers that call other code, in addition to providing nodes that run Java, Python, Perl, and other code fragments.
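The general reason disk space rather than RAM can become the limit is that data is processed as a stream of rows instead of being loaded into memory all at once. The Python sketch below illustrates only that principle with a one-pass mean over CSV rows; it is not KNIME's internal table-caching mechanism, and `streaming_mean` and the sample data are hypothetical.

```python
import csv
import io

def streaming_mean(lines, column):
    """Compute a column mean in one pass, keeping only a running count and sum in memory."""
    count = 0
    total = 0.0
    for row in csv.DictReader(lines):  # DictReader pulls one row at a time from the iterator
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")

# In practice `lines` could be a file handle, e.g. open("big.csv", newline="");
# io.StringIO stands in for that file here.
data = io.StringIO("id,value\n1,2.0\n2,4.0\n3,9.0\n")
print(streaming_mean(data, "value"))  # 5.0
```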