
Big Data with KNIME is as easy as 1, 2, 3, ...4!

This presentation shows how we re-engineered a legacy workflow to run partially on Hadoop from within the KNIME Analytics Platform, dramatically reducing its execution time.
It also shows how easy it was to move the ETL part of the workflow to Hadoop using the KNIME big data access nodes and in-database processing nodes.
As of October 21, 2015, the KNIME big data nodes are certified by Cloudera, Hortonworks, and MapR.

Published in: Data & Analytics

  1. Big Data Science is just a Click Away! Rosaria Silipo, KNIME.com. Copyright © 2015 KNIME.com AG
  2. Variety, Volume, Velocity
     Variety:
     • integrating heterogeneous data (and tools)
     Volume:
     • from small files...
     • ...to distributed data repositories (Hadoop)
     • bring the tools to the data
     Velocity:
     • from distributing heavy computations...
     • ...to real-time scoring of millions of records/sec.
  3. Every Minute…
  4. IoT
  5. The Challenge
  6. Energy Usage Prediction from Smart Meters Data
     • Read Smart Meter Energy Data (176 million rows)
     • Clean Up and Aggregate total Energy Usage by hour, day, week, month, year
     • Calculate Behavioral Measures for each Smart Meter
     • Cluster Smart Meters with Similar Behavior (k-Means)
     • Predict Energy Usage in Clustered Smart Meters (Auto-Regressive Time Series Prediction)
     Workflows: Workflow 1, Workflow 2, Workflow 3
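
The aggregation step on this slide can be sketched in plain Python. This is only an illustration of summing usage per meter and time bucket, not the KNIME workflow itself; the sample readings, the `GRANULARITY` table, and the `total_usage` helper are hypothetical, and the weekly bucket is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample readings: (meter_id, ISO timestamp, kWh).
readings = [
    ("m1", "2015-01-01T00:00:00", 0.5),
    ("m1", "2015-01-01T00:30:00", 0.25),
    ("m1", "2015-01-02T00:00:00", 1.0),
    ("m2", "2015-02-01T12:00:00", 2.0),
]

# strftime patterns that truncate a timestamp to each granularity.
GRANULARITY = {
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def total_usage(rows, granularity):
    """Sum kWh per (meter_id, time bucket) for the chosen granularity."""
    pattern = GRANULARITY[granularity]
    totals = defaultdict(float)
    for meter_id, timestamp, kwh in rows:
        bucket = datetime.fromisoformat(timestamp).strftime(pattern)
        totals[(meter_id, bucket)] += kwh
    return dict(totals)


daily = total_usage(readings, "day")
monthly = total_usage(readings, "month")
```

On 176 million rows this kind of loop is exactly the part that is worth pushing down to the database or Hadoop, which is what the rest of the deck describes.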
  7. Workflow 1: PrepareData (~ 2 days)
  8. Big Data
  9. Big Data Support
     • KNIME Big Data Access Nodes
       – preconfigured connectors
       – in-database processing
     • Big Data Platforms
       – HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, Actian, any big data platform really!
     • Spark MLlib integration (coming soon)
     • Streaming Executor (coming soon)
  10. Hadoop Sandboxes
      • Hortonworks: http://hortonworks.com/products/hortonworks-sandbox/
      • Cloudera: http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html
      • Virtual Box: https://www.virtualbox.org/
      • VMWare Player: http://www.vmware.com/
  11. … as easy as 1, 2, 3, … 4
      1. Access Big Data
      2. Select Table
      3. In-DB Processing
      4. Into KNIME
  12. 1. Database Connector (Access Big Data)
      Generic Database Connector
      – Can connect to any JDBC source
      – Register new JDBC driver via preferences page
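
A generic JDBC connection needs a connection URL in addition to the registered driver. As a small sketch, the helper below builds a URL in the `jdbc:hive2://host:port/database` form used by HiveServer2. The function name, the placeholder host, and the choice of Hive are assumptions for illustration; port 10000 and the `default` database are the usual HiveServer2 defaults, but a given cluster may differ.

```python
def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# "sandbox.example.com" is a placeholder host, not one named in the slides.
url = hive_jdbc_url("sandbox.example.com")
```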
  13. 1. Register JDBC Driver (Access Big Data)
      Open KNIME and go to File -> Preferences
      Increase connection timeout for long-running retrieval operations
  14. 1. Dedicated Connectors (Access Big Data)
      Dedicated pre-configured connectors
      – Bundling necessary JDBC drivers
      – Easy to use
      – DB-specific behavior/capability
      Some dedicated connectors are part of the open source KNIME Analytics Platform; some belong to the commercial KNIME Big Data Extension.
      Note: works for most Hadoop Hive installations, including Hortonworks free
  15. 2. Data Table Selection (Select Table)
  16. 3. In-Database Processing (In-DB Processing)
      • Filter rows and columns
      • Join tables/queries (similar settings as Joiner node)
      • Sort your data
      • Write your own query
      • Aggregate* your data (similar settings as GroupBy node)
      * Database GroupBy node exposes DB-specific aggregation methods
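
One way to picture in-database processing is that each step only rewrites a SQL query, and the database does the work when the final query runs. The sketch below assumes that model: each hypothetical helper (`row_filter`, `group_by`, `sorter`) wraps the incoming query as a subquery. SQLite from the Python standard library stands in for Hive or Impala, and the `meter_data` table and its columns are invented for the example.

```python
import sqlite3


def row_filter(query, condition):
    """Wrap the incoming query and keep only rows matching the condition."""
    return f"SELECT * FROM ({query}) AS t WHERE {condition}"


def group_by(query, key, aggregation):
    """Wrap the incoming query and aggregate it per key."""
    return f"SELECT {key}, {aggregation} FROM ({query}) AS t GROUP BY {key}"


def sorter(query, column):
    """Wrap the incoming query and sort its rows by one column."""
    return f"SELECT * FROM ({query}) AS t ORDER BY {column}"


# Table selection yields the initial query; nothing is executed yet.
query = "SELECT * FROM meter_data"
query = row_filter(query, "kwh >= 0")
query = group_by(query, "meter_id", "AVG(kwh) AS avg_kwh")
query = sorter(query, "meter_id")

# Stand-in database with a few sample rows (one invalid negative reading).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meter_data (meter_id TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO meter_data VALUES (?, ?)",
    [("m1", 1.0), ("m1", 3.0), ("m2", 4.0), ("m2", -1.0)],
)

# Only now does the database run the composed query.
result = conn.execute(query).fetchall()
```

Because the steps only build up a query string, no intermediate data leaves the database until the final fetch.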
  17. 3. Queries for average Measures (In-DB Processing)
  18. 3. Average Monthly Values (In-DB Processing)
  19. 4. Import Data from Database (Into KNIME): < 30 min (steps 1, 2, 3, 4)
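
The point of importing last is that only the aggregated result crosses the wire. The following sketch computes average monthly values inside the database and fetches just those rows. It uses SQLite as a stand-in, so `strftime` here is SQLite's SQL function (Hive and Impala have their own date functions), and the table, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meter_data (meter_id TEXT, ts TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO meter_data VALUES (?, ?, ?)",
    [
        ("m1", "2015-01-01 00:00:00", 1.0),
        ("m1", "2015-01-15 00:00:00", 3.0),
        ("m1", "2015-02-01 00:00:00", 5.0),
        ("m2", "2015-01-01 00:00:00", 2.0),
    ],
)

# Average usage per meter and month, computed by the database.
monthly_sql = """
    SELECT meter_id, strftime('%Y-%m', ts) AS month, AVG(kwh) AS avg_kwh
    FROM meter_data
    GROUP BY meter_id, strftime('%Y-%m', ts)
    ORDER BY meter_id, month
"""

raw_rows = conn.execute("SELECT COUNT(*) FROM meter_data").fetchone()[0]
monthly = conn.execute(monthly_sql).fetchall()  # the only rows that are imported
```

With real smart meter data the gap between the raw row count and the imported row count is far larger than in this toy table.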
  20. New Big Data Platform? No problem! Just change the connector node!
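
The same idea can be sketched in code: if the processing logic depends only on a generic connection, switching platforms means switching the function that opens the connection. The example assumes a Python DB-API 2.0 connection and plain SQL that avoids dialect-specific functions; `average_usage` and `sqlite_connector` are hypothetical names, and SQLite stands in for a real big data platform.

```python
import sqlite3

# Plain SQL with no dialect-specific functions, so it can move between platforms.
QUERY = "SELECT meter_id, AVG(kwh) FROM meter_data GROUP BY meter_id ORDER BY meter_id"


def average_usage(connect):
    """Run the same aggregation on whatever DB-API connection `connect` returns."""
    conn = connect()
    try:
        cursor = conn.cursor()
        cursor.execute(QUERY)
        return cursor.fetchall()
    finally:
        conn.close()


def sqlite_connector():
    """Stand-in connector: an in-memory SQLite database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE meter_data (meter_id TEXT, kwh REAL)")
    conn.executemany(
        "INSERT INTO meter_data VALUES (?, ?)",
        [("m1", 1.0), ("m1", 3.0), ("m2", 4.0)],
    )
    return conn


# Swapping platforms would mean passing a different connector function here.
result = average_usage(sqlite_connector)
```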
  21. Other Useful Database Nodes
      • Drop table
        – missing table handling
        – cascade option
      • Execute any SQL statement (executes several queries separated by ; and new line)
      • Manipulate existing queries
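
Running several statements separated by `;` and a new line can be sketched as a split followed by one execution per statement. This is a naive illustration, not how the KNIME node is implemented: `split_statements` is a hypothetical helper, SQLite is a stand-in database, and the split would break on a string literal that itself contains `;` followed by a line break.

```python
import sqlite3

# Three statements, each terminated by ';' and a new line.
script = (
    "DROP TABLE IF EXISTS monthly_avg;\n"
    "CREATE TABLE monthly_avg (meter_id TEXT, avg_kwh REAL);\n"
    "INSERT INTO monthly_avg VALUES ('m1', 2.0);\n"
)


def split_statements(sql_script):
    """Split on ';' followed by a new line, dropping empty fragments."""
    parts = sql_script.split(";\n")
    return [part.strip() for part in parts if part.strip()]


conn = sqlite3.connect(":memory:")
for statement in split_statements(script):
    conn.execute(statement)

rows = conn.execute("SELECT * FROM monthly_avg").fetchall()
```

The `DROP TABLE IF EXISTS` form is one way to handle a missing table without raising an error.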
  22. KNIME Big Data Extension
  23. KNIME Big Data Extension
      • KNIME Big Data Access Nodes
        – preconfigured connectors
        – HDFS File Handling
        – Hive/Impala Loader
      • Big Data Platforms
        – HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, Actian, SAP Hana (to be), …
      • Spark MLlib integration (coming soon)
      • Streaming Executor (coming soon)
  24. HDFS File Handling
      • KNIME & Extensions -> KNIME File Handling Nodes
      • HDFS Connection and HDFS File Permission nodes
  25. Hive/Impala Loader
      • Upload a KNIME data table to Hive/Impala
  26. KNIME Big Data Extension: Download and Install
      KNIME.com Extension Store
      License Required!
      Installation Instructions: http://tech.knime.org/installation-instructions
      Product Description: http://www.knime.org/knime-big-data-extension
  27. License on KNIME Store
      http://tech.knime.org/knime-store
      30-day trial license available with special Promotion Code
      education@knime.com
  28. References
      • Whitepaper “KNIME opens the Doors to Big Data”: http://www.knime.org/files/big_data_in_knime_1.pdf
      • Blog Post “Integrating Big data is as Easy as 1,2,3, … 4”: http://www.knime.org/blog/integrating-big-data-is-as-easy-as-1-2-3-4
      • The Big Data Extension Product Description: http://www.knime.org/knime-big-data-extension
  29. Thank You!
      • education@knime.com
      • Twitter: @KNIME
      • LinkedIn Group: KNIME
      • KNIME Blog: http://www.knime.org/blog
