Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)

Presentation for the 2013 NLUUG seminar, November 21, 2013

Published in: Technology

  1. 1. Moving and Transforming Data with Pentaho Data Integration a.k.a. KETTLE
  2. 2. Welcome! • Software engineer • rbouman@pentaho.com
  3. 3. Pentaho • Business Intelligence & Analytics • Full stack • Open core – GPLv2, Apache 2.0 – Enterprise and OEM licenses • Java-based • Web front-ends
  4. 4. The Pentaho Stack • Data Integration / ETL • Big Data / NoSQL • Data Modeling • Reporting • OLAP / Analysis • Data Visualization • Dashboarding • Data Mining / Predictive Analysis • (Mobile) Delivery • Bursting, Scheduling, Self Service
  5. 5. Full Stack BI & BA (diagram: Sources, ETL, Data Warehouse, Models, Blending, Instant Analytics, Reports, OLAP, Visualization, Dashboards, Mining)
  6. 6. Information: A well-prepared, well-presented meal
  7. 7. Extraction: Catching the right data and hauling it in
  8. 8. Transformation: Disgusting job of validating & cleaning data
  9. 9. Loading: Store manageable units for later use
  10. 10. Loading: Store manageable units for later use
  11. 11. Pentaho Data Integration • Kettle – Extract, Transform, Load – Blending – Instant Analytics • Changing input to desired output
  12. 12. Kettle Architecture • Data Integration Engine – Job Engine: Job (.kjb), calls Transformation – Transformation Engine: Transformation (.ktr) • Tools and Utilities – Launch: Kitchen, Carte – Launch: Pan – Develop: Spoon • Repository (RDBMS)
  13. 13. Jobs & Transformations • Jobs – Synchronous workflow of job entries (tasks) • Transformations – Stepwise parallel & asynchronous processing of a record stream • Distributed
  14. 14. Sources and Destinations • RDBMS (> 40) • NoSQL / Big Data • OLAP (Mondrian, Palo, XML/A) • Web (REST, SOAP, XML, JSON …) • Files (CSV, Fixed, Excel …) • ERP (SAP, Salesforce, OpenERP) • ...way Too Many To Mention™!
  15. 15. Transformations • String & Date manipulation • Data Validation / Business Rules • Lookup / Join • Calculation, Statistics • Cryptography • Decisions, Flow control • Scripting • ...> 150, excluding plugins
  16. 16. Demo • Input: NLUUG Program Webpage • ETL • Output: dim_room, dim_track, dim_speaker, dim_company, fact_talk
  17. 17. Demo Transformation
  18. 18. Business Model • Open core – Majority is open source – Give and take • Enterprise Edition – Extra features, Support, Services
  19. 19. Community • Plugins – Pentaho Marketplace • Code contributions • Applications & Solutions • Tutorials, Support
  20. 20. Marketplace
  21. 21. Community Meetups
  22. 22. Pentaho Software • Community Edition Binaries – community.pentaho.com (release) – ci.pentaho.com (development) • Source code – github.com/pentaho • Enterprise Edition Evaluation – pentaho.com/testdrive
  23. 23. Resources • Online documentation – infocenter.pentaho.com (manual) – wiki.pentaho.com (community wiki) • Issue tracker – jira.pentaho.com • Community Support – forums.pentaho.com – Freenode IRC: ##pentaho
  24. 24. Books
  25. 25. Thank You • Join the conversation. You can find us on: blog.pentaho.com, @Pentaho, Facebook.com/Pentaho
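
Slide 13 contrasts jobs (a synchronous workflow of job entries) with transformations (steps that process a record stream in parallel and asynchronously). The Python sketch below only illustrates that contrast. It is not Kettle code and does not use Kettle's API. Every name in it (`run_job`, `run_transformation`, `clean_name`, `drop_empty_room`, the sample rows and job entry names) is hypothetical, and the design of one thread per step with a bounded queue between steps is an assumption made for the illustration.

```python
import queue
import threading

_END = object()  # sentinel that marks the end of the row stream


def run_job(entries):
    """Job-style execution: run entries one after another (synchronously).

    `entries` is a list of (name, callable) pairs; each callable returns
    True on success. Returns the name of the first failing entry, or None.
    """
    for name, entry in entries:
        if not entry():
            return name
    return None


def _feed(rows, outbox):
    """Input step: push every source row into the stream, then the sentinel."""
    for row in rows:
        outbox.put(row)
    outbox.put(_END)


def _run_step(func, inbox, outbox):
    """One step: read rows as they arrive, apply func, pass results on."""
    while True:
        row = inbox.get()
        if row is _END:
            outbox.put(_END)  # tell the next step the stream is finished
            return
        result = func(row)
        if result is not None:  # a step returns None to drop a row
            outbox.put(result)


def run_transformation(rows, steps, buffer_size=100):
    """Transformation-style execution: all steps run at the same time,
    each in its own thread, connected by bounded queues (assumed design)."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(steps) + 1)]
    threads = [threading.Thread(target=_feed, args=(rows, queues[0]))]
    threads += [
        threading.Thread(target=_run_step, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(steps)
    ]
    for thread in threads:
        thread.start()

    output = []
    while True:  # collect rows while the steps are still running
        row = queues[-1].get()
        if row is _END:
            break
        output.append(row)
    for thread in threads:
        thread.join()
    return output


# Hypothetical steps and sample rows, loosely modelled on the demo's talk data.
def clean_name(row):
    return {**row, "speaker": row["speaker"].strip().title()}


def drop_empty_room(row):
    return row if row["room"] else None


talks = [
    {"speaker": "  alice  ", "room": "A"},
    {"speaker": "bob", "room": ""},
    {"speaker": "carol jones", "room": "B"},
]
result = run_transformation(talks, [clean_name, drop_empty_room])

# Hypothetical job: the second entry fails, so the third never runs.
failed = run_job([
    ("load dimensions", lambda: True),
    ("load facts", lambda: False),
    ("send report", lambda: True),
])
```

In this sketch a row can reach the last step while the first step is still reading input, which is the sense in which the processing is stepwise parallel. The job runner, by contrast, does not start an entry until the previous one has returned. Error handling inside steps is left out to keep the example short.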
