Moving and Transforming Data with Pentaho Data Integration
a.k.a. KETTLE
Welcome!
• Software engineer
• rbouman@pentaho.com
Pentaho
• Business Intelligence & Analytics
• Full stack
• Open core
– GPLv2, Apache 2.0
– Enterprise and OEM licenses
• Java-based
• Web front-ends
The Pentaho Stack
• Data Integration / ETL
• Big Data / NoSQL
• Data Modeling
• Reporting
• OLAP / Analysis
• Data Visualization
• Dashboarding
• Data Mining / Predictive Analysis
• (Mobile) Delivery
• Bursting, Scheduling, Self Service
Full Stack BI & BA

[Diagram: Sources, ETL, Data Warehouse, Models, Blending, Instant Analytics, Reports, OLAP, Visualization, Dashboards, Mining]
Information: A well-prepared, well-presented meal
Extraction: Catching the right data and hauling it in
Transformation: Disgusting job of validating & cleaning data
Loading: Store manageable units for later use
Pentaho Data Integration
• Kettle
– Extract, Transform, Load
– Blending
– Instant Analytics
• Changing input to desired output
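To make "changing input to desired output" concrete, here is a minimal extract, transform, load sketch in plain Python. It is not Kettle code; the CSV content, the column names and the customer table are all made up for illustration, and the validation rule (drop rows with an unparseable date) is just one possible choice.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw input; names and values are invented for this sketch.
RAW_CSV = """name,signup_date,country
 Alice ,2013-11-14,nl
Bob,not-a-date,NL
carol,2013-11-15,de
"""

def extract(text):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalize fields, drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            date = datetime.strptime(row["signup_date"].strip(), "%Y-%m-%d").date()
        except ValueError:
            continue  # invalid date: reject the row
        clean.append((
            row["name"].strip().title(),
            date.isoformat(),
            row["country"].strip().upper(),
        ))
    return clean

def load(rows, conn):
    """Load: store the cleaned rows in a table for later use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (name TEXT, signup_date TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT name, signup_date, country FROM customer ORDER BY name"
).fetchall()
```

In Kettle the same three stages would typically be separate steps in one transformation rather than three function calls.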
Kettle Architecture
• Data Integration Engine
– Job Engine: runs jobs (.kjb); a job can call transformations
– Transformation Engine: runs transformations (.ktr)
• Tools and Utilities
– Develop: Spoon
– Launch (jobs): Kitchen, Carte
– Launch (transformations): Pan
• Repository (RDBMS)
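As a rough illustration of the launch tools, the sketch below builds command lines for Pan (transformations) and Kitchen (jobs) from Python. It assumes the Linux launcher scripts pan.sh and kitchen.sh are on the PATH and uses the -file, -level and -param: options; check these against the documentation of your Kettle version. The file paths and the parameter name are hypothetical.

```python
def _launch_command(tool, path, log_level, params):
    """Build an argument list for a Kettle command-line launcher."""
    cmd = [tool, f"-file={path}", f"-level={log_level}"]
    for name, value in (params or {}).items():
        # Named parameters are passed as -param:NAME=VALUE
        cmd.append(f"-param:{name}={value}")
    return cmd

def pan_command(ktr_path, log_level="Basic", params=None):
    """Command line that runs one transformation (.ktr) with Pan."""
    return _launch_command("pan.sh", ktr_path, log_level, params)

def kitchen_command(kjb_path, log_level="Basic", params=None):
    """Command line that runs one job (.kjb) with Kitchen."""
    return _launch_command("kitchen.sh", kjb_path, log_level, params)

# Hypothetical file names and parameter, for illustration only.
pan_cmd = pan_command("/etl/load_talks.ktr", params={"YEAR": "2013"})
kitchen_cmd = kitchen_command("/etl/nightly.kjb", log_level="Minimal")
```

Either list can be handed to subprocess.run(cmd, check=True) to actually start the tool; that call is left out here so the sketch runs without a Kettle installation.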
Jobs & Transformations
• Jobs
– Synchronous workflow of job entries (tasks)
• Transformations
– Stepwise parallel & asynchronous processing of a record stream
• Distributed
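The difference between the two execution models can be sketched in plain Python. This is not how Kettle is implemented; it is a simplified model in which each transformation step runs in its own thread and passes rows to the next step through a bounded queue, while a job simply runs its entries one after another. All function names here are invented for the sketch.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the record stream

def feed(rows, outbox):
    """Input step: push source rows into the stream, then signal the end."""
    for row in rows:
        outbox.put(row)
    outbox.put(DONE)

def run_step(func, inbox, outbox):
    """One step: read rows as they arrive, emit zero or more rows for each."""
    while True:
        row = inbox.get()
        if row is DONE:
            outbox.put(DONE)
            return
        for out in func(row):
            outbox.put(out)

def run_transformation(rows, step_funcs):
    """Start all steps at once; rows stream through the hops between them."""
    hops = [queue.Queue(maxsize=100) for _ in range(len(step_funcs) + 1)]
    threads = [threading.Thread(target=feed, args=(rows, hops[0]))]
    for i, func in enumerate(step_funcs):
        threads.append(
            threading.Thread(target=run_step, args=(func, hops[i], hops[i + 1]))
        )
    for t in threads:
        t.start()
    result = []
    while True:  # drain the last hop while the steps are still running
        row = hops[-1].get()
        if row is DONE:
            break
        result.append(row)
    for t in threads:
        t.join()
    return result

def run_job(entries):
    """Job: run entries strictly in order, stop at the first failure."""
    for entry in entries:
        if not entry():
            return False
    return True

def upper(row):
    return [row.upper()]

def keep_long(row):
    return [row] if len(row) > 3 else []

log = []

def load_entry():
    log.append(run_transformation(["ab", "kettle", "spoon"], [upper, keep_long]))
    return True

def notify_entry():
    log.append("done")
    return True

job_ok = run_job([load_entry, notify_entry])
```

The second job entry only starts after the transformation called by the first one has finished, whereas inside the transformation both steps are active at the same time.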
Sources and Destinations
• RDBMS (> 40)
• NoSQL / Big Data
• OLAP (Mondrian, Palo, XML/A)
• Web (REST, SOAP, XML, JSON …)
• Files (CSV, Fixed, Excel …)
• ERP (SAP, Salesforce, OpenERP)
• ...way Too Many To Mention™!
Transformations
• String & Date manipulation
• Data Validation / Business Rules
• Lookup / Join
• Calculation, Statistics
• Cryptography
• Decisions, Flow control
• Scripting
• ... > 150, excluding plugins
Demo
• Input: NLUUG Program Webpage
• ETL
• Output: dim_room, dim_track, dim_speaker, dim_company, fact_talk
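The output of the demo is a small star schema: dimension tables plus a fact table that refers to them. The sketch below shows the general idea of loading such a schema with SQLite in Python. Only the table names come from the slide; every column name and all sample rows are assumptions made up for illustration, and only two of the four dimensions are shown (dim_track and dim_company would be handled the same way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are hypothetical; the slide only names the tables.
conn.executescript("""
CREATE TABLE dim_room    (room_key    INTEGER PRIMARY KEY, room_name    TEXT UNIQUE);
CREATE TABLE dim_speaker (speaker_key INTEGER PRIMARY KEY, speaker_name TEXT UNIQUE);
CREATE TABLE fact_talk   (title TEXT, room_key INTEGER, speaker_key INTEGER);
""")

def lookup_or_insert(conn, table, key_col, name_col, name):
    """Return the surrogate key for a dimension value, adding it if missing."""
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {name_col} = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} ({name_col}) VALUES (?)", (name,))
    return cur.lastrowid

# Made-up rows standing in for what would be scraped from the program page.
talks = [
    ("Talk A", "Room 1", "Speaker X"),
    ("Talk B", "Room 1", "Speaker Y"),
    ("Talk C", "Room 2", "Speaker X"),
]

for title, room, speaker in talks:
    room_key = lookup_or_insert(conn, "dim_room", "room_key", "room_name", room)
    speaker_key = lookup_or_insert(
        conn, "dim_speaker", "speaker_key", "speaker_name", speaker
    )
    conn.execute(
        "INSERT INTO fact_talk VALUES (?, ?, ?)", (title, room_key, speaker_key)
    )
conn.commit()

room_count = conn.execute("SELECT COUNT(*) FROM dim_room").fetchone()[0]
speaker_count = conn.execute("SELECT COUNT(*) FROM dim_speaker").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM fact_talk").fetchone()[0]
```

Each distinct room and speaker ends up as one dimension row, and every talk becomes one fact row pointing at them. In Kettle, a step such as Combination lookup/update plays a loosely comparable role to lookup_or_insert.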
Demo Transformation
Business Model
• Open core
– Majority is open source
– Give and take
• Enterprise Edition
– Extra features, Support, Services
Community
• Plugins
– Pentaho Marketplace
• Code contributions
• Applications & Solutions
• Tutorials, Support
Marketplace
Community Meetups
Pentaho Software
• Community Edition Binaries
– community.pentaho.com (release)
– ci.pentaho.com (development)
• Source code
– github.com/pentaho
• Enterprise Edition Evaluation
– pentaho.com/testdrive
Resources
• Online documentation
– infocenter.pentaho.com (manual)
– wiki.pentaho.com (community wiki)
• Issue tracker
– jira.pentaho.com
• Community Support
– forums.pentaho.com
– Freenode IRC: ##pentaho
Books
Thank You
Join the conversation. You can find us on:
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)