Pentaho Data Integration
Best Practices
Matt Casters
Chief Data Integration, Kettle Founder
#PCM14
© 2014, Pentaho. All Rights Reserved.
About this session
Today You Will Learn
Best practices when working with Pentaho Data Integration
Agenda
❯Working with PDI
❯Design patterns
❯Demo fun stuff
Working with PDI
Working with PDI: Naming
❯Provide meaningful names for steps and job entries
❯Do not hesitate to use special characters
Working with PDI: Naming
❯Avoid environment-specific names
• Instead of: Test Database, France MySQL, East Coast Cluster
• Prefer: CRM, WWW, Cluster
Working with PDI: Naming
❯Keep your environment tidy
• Folders can have sub-folders!
❯Use naming conventions for everything
• Database tables and fields
• Directories
• Server names
Working with PDI: Naming
❯Use a corporate standard
❯Verify and enforce periodically
❯Use rules to validate repository imports
• Database names
• Notes and descriptions
Working with PDI: Tidy up!
❯Limit the number of steps or job entries
Working with PDI: Tidy up!
❯Enable grid size 32 or 16
❯Prevents accidental move of step or entry
Working with PDI: Parameters
❯Explicit use of variables
❯Easier testing
❯Make re-use a breeze
Working with PDI: Variables
❯Environment-specific: kettle.properties
❯Prefer ${SOLUTION_HOME}
❯Avoid ${Internal.Transformation.Filename.Directory}
❯Configure step copies with variables
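To make the variable advice concrete, here is a minimal Python sketch (not PDI code) of how `${...}` references can be resolved from a kettle.properties-style file. The property names `SOLUTION_HOME` and `STAGING_DB_HOST`, their values, and the file path are hypothetical; PDI performs this substitution itself at run time.

```python
import re

# Hypothetical kettle.properties content for one environment (values are assumptions).
KETTLE_PROPERTIES = """
# test environment
SOLUTION_HOME=/opt/etl/solution
STAGING_DB_HOST=db-test.example.com
"""

def parse_properties(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def resolve(template, props):
    """Replace each ${NAME} with its value; leave unknown names untouched."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)),
                  template)

props = parse_properties(KETTLE_PROPERTIES)
path = resolve("${SOLUTION_HOME}/staging/load_customers.ktr", props)
print(path)
```

Because the transformation only refers to `${SOLUTION_HOME}`, moving it to another environment means swapping the properties file, not editing the transformation.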
Working with PDI: Logging
❯Log everything!
❯Measurement is management
❯Use the Pentaho audit mart
❯Learn about all the possible logging features
Working with PDI: Mappings
❯Mapping vs Simple Mapping step
❯Realize this is a macro
❯Use completely different field names
❯Avoid renaming or removing fields
Working with PDI: Metadata Injection
❯Avoid manual population of dialogs
❯Whenever you need dynamic ETL
❯5.1 supports data streaming
❯Example:
• Stage 50 different files with one transformation
Working with PDI: Performance
❯Transformations are networks
❯Network speed is limited to the slowest part
❯The slowest step is indicated while running in Spoon
❯Slow steps have a full input and empty output buffer
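The buffer rule above can be sketched in a few lines of Python. This is an illustration only, not a PDI API: the `StepMetrics` type, the step names, and the buffer size of 10000 rows are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class StepMetrics:
    name: str
    input_rows: int    # rows waiting in the step's input buffer
    output_rows: int   # rows waiting in the step's output buffer

def find_bottleneck(steps, buffer_size=10000):
    """Return the step with the fullest input and emptiest output buffer, or None."""
    def pressure(step):
        return (step.input_rows - step.output_rows) / buffer_size
    candidate = max(steps, key=pressure, default=None)
    if candidate is None or pressure(candidate) <= 0:
        return None
    return candidate

# Hypothetical snapshot of a running transformation.
steps = [
    StepMetrics("Read orders", 0, 10000),
    StepMetrics("Lookup customer", 10000, 0),
    StepMetrics("Write to table", 0, 0),
]
bottleneck = find_bottleneck(steps)
print(bottleneck.name)
```

Rows pile up in front of the slow step and nothing accumulates behind it, so the step with a full input buffer and an empty output buffer is the one to tune first.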
Working with PDI: Performance
❯First re-write, re-think, re-organize
❯Parallelize work
❯End-to-end data pipelining
❯Do work where it's the fastest (ELT)
Working with PDI: Lifecycle
❯Automate export of repositories
❯Use import rules to validate quality
❯Always use version control for file-based setups
Design patterns
Design patterns: Loops
❯Use the Job Executor or Transformation Executor steps
❯Much easier since version 5
❯Demonstration: process lots of small files
DEMONSTRATION
Design patterns: Queues
❯Process buffer
❯Facilitates parallelism
❯Forces process logging best practice
❯Only way to process recurring files
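One way to picture the queue pattern is a small status table that every file passes through. The sketch below uses Python with an in-memory SQLite database; the table name `file_queue`, its columns, the status values, and the worker name are all hypothetical and not part of PDI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE file_queue (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'PENDING',  -- PENDING, RUNNING, DONE, FAILED
        worker TEXT
    )
""")

def enqueue(filename):
    """Add one arriving file to the queue."""
    conn.execute("INSERT INTO file_queue (filename) VALUES (?)", (filename,))
    conn.commit()

def claim_next(worker):
    """Mark the oldest pending entry as RUNNING and return (id, filename), or None."""
    row = conn.execute(
        "SELECT id, filename FROM file_queue "
        "WHERE status = 'PENDING' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE file_queue SET status = 'RUNNING', worker = ? WHERE id = ?",
                 (worker, row[0]))
    conn.commit()
    return row

def finish(entry_id, ok):
    """Record the outcome, which doubles as the process log."""
    conn.execute("UPDATE file_queue SET status = ? WHERE id = ?",
                 ("DONE" if ok else "FAILED", entry_id))
    conn.commit()

# The same file name can arrive again later; each arrival is its own queue entry.
for name in ["sales.csv", "stock.csv", "sales.csv"]:
    enqueue(name)

while (entry := claim_next("worker-1")) is not None:
    entry_id, filename = entry
    finish(entry_id, ok=True)   # real processing would happen before this call

statuses = [r[0] for r in conn.execute("SELECT status FROM file_queue ORDER BY id")]
print(statuses)
```

With a single connection the claim step is safe as written; several concurrent workers would need the select and update wrapped in one transaction or a locking scheme.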
Design patterns: Load balancing
❯On a step level: PDI EE 5.x built-in
❯Balance jobs and transformations with Carte
❯Set up a Carte cluster
❯Use a queue
❯Interrogate Carte web services
DEMONSTRATION
Design patterns: Watchdog
❯“Who watches the watchmen?”
❯Simple recipe:
• On success increment a counter
• Periodically verify that the counter is advancing
• Take action when counter is not advancing
Design patterns: Watchdog
❯Schema
• Main Job: Job → Success? → Increase counter (Counter)
• Watchdog: Validate that counter increased
• Action: Action when not increased
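The watchdog recipe can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the counter lives in memory here, whereas a real setup would more likely keep it in a database row or a file, and the `Watchdog` class and `main_job` function are hypothetical names.

```python
class Watchdog:
    """Checks that a success counter keeps advancing between two checks."""

    def __init__(self, read_counter, on_stalled):
        self.read_counter = read_counter   # callable returning the current counter
        self.on_stalled = on_stalled       # callable invoked when nothing advanced
        self.last_seen = read_counter()

    def check(self):
        """Return True if the counter advanced since the previous check."""
        current = self.read_counter()
        advanced = current > self.last_seen
        self.last_seen = current
        if not advanced:
            self.on_stalled(current)
        return advanced

# Hypothetical in-memory counter standing in for a persistent one.
counter = {"value": 0}
alerts = []

def main_job(succeed):
    """Stand-in for the main job: increments the counter only on success."""
    if succeed:
        counter["value"] += 1

watchdog = Watchdog(lambda: counter["value"], lambda v: alerts.append(v))

main_job(succeed=True)
first = watchdog.check()     # counter moved from 0 to 1
main_job(succeed=False)
second = watchdog.check()    # counter still 1, so the action fires
print(first, second, alerts)
```

The watchdog never needs to know why the main job stopped succeeding; a counter that stands still covers failures, hangs, and a scheduler that never started the job.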
Design patterns: Auto-recovery
❯Auto-skip:
• Use anything incremental
• Add incremental ID to source tables if missing
❯Auto-cleanup:
• Increment run ID after successful job
• Remove run ID from target table at start of job
❯Database recovery:
• Job and transformation level transactions (PDI 5 EE)
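The auto-cleanup bullets can be illustrated with a small Python and SQLite sketch. The table names `run_control` and `target`, the per-row commits, and the simulated failure are assumptions chosen to show the idea; they are not how PDI itself stores run IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_control (run_id INTEGER NOT NULL)")
conn.execute("INSERT INTO run_control VALUES (1)")
conn.execute("CREATE TABLE target (run_id INTEGER NOT NULL, value TEXT)")
conn.commit()

def run_job(rows, fail_after=None):
    """Load rows under the current run ID; advance the ID only on success."""
    run_id = conn.execute("SELECT run_id FROM run_control").fetchone()[0]
    # Auto-cleanup: remove leftovers of a previous failed attempt with this run ID.
    conn.execute("DELETE FROM target WHERE run_id = ?", (run_id,))
    conn.commit()
    for i, value in enumerate(rows):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("simulated failure mid-load")
        conn.execute("INSERT INTO target VALUES (?, ?)", (run_id, value))
        conn.commit()   # commit per row so a failure leaves partial data behind
    conn.execute("UPDATE run_control SET run_id = run_id + 1")
    conn.commit()

try:
    run_job(["a", "b", "c"], fail_after=2)   # leaves 'a' and 'b' behind
except RuntimeError:
    pass

run_job(["a", "b", "c"])                     # rerun cleans up, then loads fully
loaded = conn.execute("SELECT run_id, value FROM target ORDER BY value").fetchall()
next_run_id = conn.execute("SELECT run_id FROM run_control").fetchone()[0]
print(loaded, next_run_id)
```

Because the run ID only advances after success, a failed run and its rerun share the same ID, so the delete at the start of the job removes exactly the partial load and nothing else.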
Fun stuff...
a short demo to wrap up
Summary
To take away:
❯Best practices improve quality, simplify, save time
❯Use this presentation as a checklist
❯Do regular audits of your data integration work
