Pentaho Data Integration Introduction

A gentle and short introduction into Pentaho Data Integration a.k.a. Kettle

Published in: Technology

Pentaho Introduction – Matt Casters
Matt Casters
<ul>
  <li>Chief of Data Integration at Pentaho
    <ul>
      <li>Lead development</li>
      <li>Project manager</li>
      <li>Community liaison</li>
    </ul>
  </li>
  <li>Kettle Project Founder</li>
  <li>Author of Pentaho Kettle Solutions
    <ul>
      <li>Published by Wiley</li>
      <li>650 pages</li>
    </ul>
  </li>
</ul>
Pentaho Data Integration for BI
Business Intelligence! That's what we do.
Pentaho Data Integration – Kettle
KETTLE: Kettle Extraction, Transportation, Transformation, Loading Environment
Pentaho Data Integration – Extraction
<ul>
  <li>Extract data from:
    <ul>
      <li>35+ database types
        <ul>
          <li>MySQL, PostgreSQL, SQLite, ...</li>
          <li>Oracle, SQL Server, etc.</li>
        </ul>
      </li>
      <li>Text files</li>
      <li>XML files</li>
      <li>XLS files</li>
      <li>Xbase files (dBase, FoxPro, etc.)</li>
      <li>File system information</li>
      <li>Generated data</li>
      <li>MS Access files</li>
      <li>LDAP</li>
      <li>Geo-data</li>
      <li>...</li>
    </ul>
  </li>
</ul>
Pentaho Data Integration – Transportation
<ul>
  <li>Transportation of data
    <ul>
      <li>Engine-based data transfer (no code generator)</li>
      <li>Very flexible pathways:
        <ul>
          <li>splitting</li>
          <li>partitioning</li>
          <li>merging</li>
          <li>joining</li>
          <li>duplicating</li>
          <li>clustering (MPP)</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
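The pathways listed above can be pictured with a small sketch. This is plain Python and not Kettle's engine; the function names (`distribute`, `duplicate`, `partition`, `merge`) and the sample rows are hypothetical, chosen only to show how rows might be routed between steps.

```python
import zlib
from itertools import chain


def distribute(rows, n):
    """Split a row stream round-robin across n outputs."""
    outputs = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        outputs[i % n].append(row)
    return outputs


def duplicate(rows, n):
    """Send a copy of every row to each of n outputs."""
    rows = list(rows)
    return [list(rows) for _ in range(n)]


def partition(rows, key, n):
    """Route rows so that equal key values always land in the same output."""
    outputs = [[] for _ in range(n)]
    for row in rows:
        # crc32 gives a hash that is stable between runs, unlike hash() on str.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % n
        outputs[bucket].append(row)
    return outputs


def merge(streams):
    """Concatenate several row streams back into one."""
    return list(chain.from_iterable(streams))


if __name__ == "__main__":
    rows = [{"id": i, "region": r}
            for i, r in enumerate(["eu", "us", "eu", "asia", "us"])]
    print(distribute(rows, 2))
    print(partition(rows, "region", 3))
```

Round-robin splitting balances load, while partitioning by key keeps related rows together, which matters for steps such as grouping or lookups that must see all rows for a key.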
Pentaho Data Integration – Transformation
<ul>
  <li>Flexibly transform data
    <ul>
      <li>Looking up data
        <ul>
          <li>databases</li>
          <li>files</li>
          <li>memory...</li>
        </ul>
      </li>
      <li>Calculating</li>
      <li>Scripting
        <ul>
          <li>JavaScript, SQL, RegExp</li>
        </ul>
      </li>
      <li>Splitting</li>
      <li>Mapping</li>
      <li>Selecting</li>
      <li>Filtering</li>
      <li>Pivoting ...</li>
    </ul>
  </li>
</ul>
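As a rough illustration of the operations above, the following sketch chains an in-memory lookup, a calculation, a filter, and a field selection over a stream of rows. It is plain Python rather than a Kettle transformation, and the field names (`order_id`, `customer_id`, `quantity`, `unit_price`) are made up for the example.

```python
def transform(orders, customers_by_id):
    """Yield enriched order rows; orders without a known customer are dropped."""
    for order in orders:
        # Looking up data: in-memory lookup keyed on customer_id.
        customer = customers_by_id.get(order["customer_id"])
        # Filtering: skip rows whose lookup found nothing.
        if customer is None:
            continue
        # Calculating: derive a new field from existing ones.
        total = order["quantity"] * order["unit_price"]
        # Selecting / mapping: keep and rename only the fields we want.
        yield {
            "order_id": order["order_id"],
            "customer": customer["name"],
            "total": total,
        }


if __name__ == "__main__":
    customers = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
    orders = [
        {"order_id": 10, "customer_id": 1, "quantity": 3, "unit_price": 5},
        {"order_id": 11, "customer_id": 9, "quantity": 1, "unit_price": 7},
        {"order_id": 12, "customer_id": 2, "quantity": 2, "unit_price": 4},
    ]
    for row in transform(orders, customers):
        print(row)
```

Writing the transformation as a generator keeps it row-at-a-time, which loosely mirrors how a streaming engine passes rows from one step to the next.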
Pentaho Data Integration – Loading
<ul>
  <li>Load data into a target format
    <ul>
      <li>Database loads</li>
      <li>Data warehouse population</li>
      <li>Partitioned loading</li>
      <li>Bulk loading</li>
      <li>Parallel loading</li>
      <li>Clustering</li>
    </ul>
  </li>
</ul>
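To make the idea of a batched database load concrete, here is a minimal sketch that reads delimited text and inserts it into a table in batches. It uses Python's standard `csv` and `sqlite3` modules, not Kettle; the `customers` table, its columns, and the sample data are assumptions made for the example.

```python
import csv
import io
import sqlite3


def load_rows(conn, rows, batch_size=1000):
    """Insert rows into a hypothetical customers table in batches.

    Returns the number of rows loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    loaded = 0
    batch = []
    for row in rows:
        batch.append((int(row["id"]), row["name"], row["country"]))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", batch)
            loaded += len(batch)
            batch = []
    if batch:
        # Flush the final, possibly partial, batch.
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", batch)
        loaded += len(batch)
    conn.commit()
    return loaded


SAMPLE = "id,name,country\n1,Alice,BE\n2,Bob,US\n3,Chen,CN\n"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    reader = csv.DictReader(io.StringIO(SAMPLE))
    print(load_rows(conn, reader, batch_size=2))
```

Batching trades a little memory for fewer round trips to the database; dedicated bulk loaders go further by bypassing row-by-row inserts altogether.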
Pentaho Data Integration – Environment
<ul>
  <li>Full GUI called “Spoon” to edit every option in Kettle
    <ul>
      <li>Drag & Drop</li>
      <li>Debugger</li>
      <li>Rich GUI</li>
    </ul>
  </li>
  <li>Command line tools
    <ul>
      <li>execute jobs</li>
      <li>execute transformations</li>
    </ul>
  </li>
  <li>Web server
    <ul>
      <li>clustering</li>
      <li>remote execution</li>
    </ul>
  </li>
  <li>Programming API for Java</li>
  <li>Plugin ecosystem</li>
  <li>...</li>
</ul>
Pentaho Data Integration – Community
<ul>
  <li>Paying Pentaho customers</li>
  <li>Large and small corporations
    <ul>
      <li>All possible sectors</li>
    </ul>
  </li>
  <li>Lone rangers & hobbyists</li>
  <li>All regions on Earth</li>
  <li>Meet on our forum: 40,000+ posts in 10,000 threads in 4 years</li>
  <li>Use our JIRA case tracking system</li>
  <li>Download more than 10,000 copies of Kettle per month</li>
</ul>
http://www.ohloh.net/projects/3624?p=Kettle
http://www.softpedia.com/progClean/Kettle-Clean-80094.html
Pentaho Data Integration – Use cases
<ul>
  <li>Load data from text files and store it in a database</li>
  <li>Export data from a database to a text file or to other databases</li>
  <li>Data migration between database applications</li>
  <li>Exploration of data in existing databases (tables, views, etc.)</li>
  <li>Information improvement using lookups</li>
  <li>Data cleaning</li>
  <li>Application integration</li>
  <li>Data warehouse population</li>
  <li>Report data generation</li>
  <li>...</li>
</ul>
Pentaho Data Integration – Adoption
<ul>
  <li>Wide range of production deployments
    <ul>
      <li>Small and medium-sized companies</li>
      <li>Large enterprises</li>
    </ul>
  </li>
  <li>Rapid product evolution
    <ul>
      <li>Driven by Pentaho investment</li>
      <li>Includes significant community contributions
        <ul>
          <li>“Contribution-friendly” architecture</li>
          <li>Natural fit for additional data sources, targets and transformations</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
Pentaho Data Integration – Adoption
<ul>
  <li>Most deployed open source data integration solution (independent study by Mark Madsen of Third Nature and the BeyeNETWORK)</li>
  <li>Download the free study at pentaho.com</li>
</ul>
Big Data
Pentaho – Big Data
<ul>
  <li>Enabling BI on top of big data</li>
  <li>From terabytes to petabytes</li>
  <li>Big data stored in Hadoop (MapReduce) / HDFS / Hive</li>
  <li>Reduces complexity for developers</li>
  <li>Leverages standard components like Pentaho Data Integration</li>
  <li>Drag & drop creation of map and reduce transformations</li>
  <li>Cooperation with Apache</li>
  <li>Presentation + demo: http://vimeo.com/14641559</li>
</ul>
Pentaho Data Integration – Links
<ul>
  <li>Homepage: http://kettle.pentaho.org</li>
  <li>Forum: http://forums.pentaho.org/forumdisplay.php?f=69</li>
  <li>Case tracker: http://jira.pentaho.org/browse/PDI</li>
  <li>Continuous Integration Server: http://ci.pentaho.com/job/Kettle</li>
  <li>Wiki: http://wiki.pentaho.org/display/EAI</li>
  <li>IRC channel: ##pentaho (on Freenode)</li>
  <li>Mailing list: http://groups.google.com/group/kettle-developers</li>
  <li>My blog: http://www.ibridge.be</li>
  <li>My contact details: mcasters at pentaho dot org</li>
</ul>
Pentaho Books
Q&A
Thank you for listening!
