Pentaho Data Integration in Data Warehouse.
Open-source Pentaho provides business intelligence (BI) and data warehousing solutions at a fraction of the cost of proprietary solutions.
Pentaho Data Integration (PDI) provides the Extract, Transform, and Load (ETL) capabilities that facilitate capturing, cleansing, and storing data in a uniform and consistent format that is accessible and relevant to end users and IoT technologies.
By Muhammad Ayaz Farid Shah.
03446940736.
MSCS.
2. What is business intelligence?
▪ Business intelligence is the process of transforming business data
into information and knowledge using computer-based techniques, thus
enabling users to make effective, fact-based decisions.
3. Business Intelligence: The need of the hour.
▪ What insightful decisions can I make from an ocean of data, and how
quickly can I make them? (End-to-end BI solution)
▪ How can I integrate heterogeneous data feeds into a common platform
for analysis? (ETL)
▪ How can I interpret raw data in the best possible manner? (Data
discovery)
▪ Can I predict the future trajectory of my business? (ML, predictive analytics)
▪ What is the best way to share the data? (Visualization, reporting)
▪ How can I monitor the dynamics of changing trends? (Dashboards)
4. BI is essentially intended for the following three things.
▪ Interpreting data precisely and concisely.
▪ Identifying new opportunities.
▪ Implementing an effective strategy to gain a competitive edge.
5. BI Existing Solutions.
▪ Large BI vendors: IBM, SAP, Microsoft, Oracle
▪ New breed: Pentaho, QlikTech, Logi, Alteryx
▪ Data integration: Informatica
▪ Analytics: RapidMiner
6. Why Pentaho?
▪ One-stop solution for all business analytics needs.
▪ Low integration time and infrastructure cost.
▪ Has community support.
▪ Easily scalable.
▪ Virtually unlimited visualizations and data sources.
▪ And much more.
7. About Pentaho
▪ Pentaho was founded in 2004 in Orlando, USA.
▪ Recognized leader in business analytics and data integration.
▪ Subscription-based business model.
▪ Achieved critical mass:
Over 1200 commercial customers
Over 10,000 production deployments.
Over 185 countries.
Download the Pentaho BI suite from the website:
www.pentaho.com or www.sourceforge.net
8. What is Pentaho and what does it offer?
▪ It is a business intelligence system.
It offers:
▪ Analytics
▪ Visual data integration
▪ Reports
▪ Dashboards
▪ Data mining
▪ ETL
11. Data integration Challenges
▪ Data is everywhere.
▪ Data is inconsistent.
Records are different in each system.
▪ Performance issues.
Running queries to summarize data takes a long time.
▪ Data is never all in the data warehouse.
Excel sheets, new applications.
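To make the "records are different in each system" challenge concrete, here is a minimal Python sketch of what an integration step has to do before records from two systems can be compared. The two source layouts (a CRM export and a billing export) and all field names are hypothetical and only serve the illustration; PDI would express the same mapping with graphical steps rather than code.

```python
from datetime import datetime


def from_crm(record):
    """Map a hypothetical CRM record (FullName, SignupDate as DD/MM/YYYY)."""
    return {
        "name": record["FullName"].strip().title(),
        "signup": datetime.strptime(record["SignupDate"], "%d/%m/%Y").date().isoformat(),
    }


def from_billing(record):
    """Map a hypothetical billing record (first, last, created as YYYY-MM-DD)."""
    return {
        "name": f"{record['first']} {record['last']}".strip().title(),
        "signup": record["created"],
    }


# The same customer, stored differently in each system.
crm_row = {"FullName": " jane DOE ", "SignupDate": "05/03/2021"}
billing_row = {"first": "JANE", "last": "doe", "created": "2021-03-05"}

# After mapping, both records share one uniform layout and can be matched.
print(from_crm(crm_row) == from_billing(billing_row))
```

Once both sources are mapped to the same layout, matching and de-duplicating records becomes a simple comparison.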
12. What is Kettle?
▪ Batch data integration and processing tool written in Java.
▪ Exists to retrieve, process, and load data.
▪ ETL (Extract, Transform, and Load).
▪ Extracts data from various data sources.
▪ Transforms data
▪ From being optimized for transactions.
▪ To being optimized for reporting and analysis.
▪ Synchronizes the data coming from different databases.
▪ Cleanses data to remove errors.
▪ Loads data into the data warehouse.
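The extract, transform, and load stages listed above can be sketched in a few lines of Python. This is not how Kettle is implemented; it is a hand-coded illustration of the same idea, using only the standard library. The CSV content, the table name sales_by_customer, and the function names are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical export from a transactional system (note the messy names).
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,alice,5.25
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse customer names and summarize amounts per customer."""
    totals = {}
    for row in rows:
        customer = row["customer"].strip().lower()
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, summary):
    """Load: write the reporting-friendly summary into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_customer "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_customer VALUES (?, ?)", summary)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run all three stages and return the loaded rows."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT customer, total FROM sales_by_customer ORDER BY customer"
    ).fetchall()
```

Calling run_etl(sqlite3.connect(":memory:")) runs the whole pipeline against an in-memory database. The transform step is where data moves from a transaction-oriented shape (one row per order) to a reporting-oriented shape (one row per customer).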
13. Why do I need ETL?
▪ ETL tools save time and money when developing a data warehouse by
removing the need for hand coding.
▪ It is very difficult for database administrators to connect different
brands of databases without using an external tool.
▪ ETL is the heart and soul of business intelligence (BI).
▪ ETL tools provide a graphical environment for data integration, migration, and
synchronization.
▪ Drag and drop graphical components to execute the desired task, saving
time and effort.
14. ETL
▪ The criteria used to compare ETL tools fall into the following categories.
▪ TCO (Total cost of ownership).
Open-source products are typically free to use, but companies still
need to pay for support, training, and consulting.
▪ Risk. (Going over budget, running over schedule, or not meeting the customers' requirements.)
▪ Ease of use. (A good GUI also reduces the time needed to learn and use the tool.)
▪ Support. (Nowadays all software products have support.)
▪ Speed. (Pentaho Kettle is faster)
▪ Data Quality. (Data quality checks are fast and available as features in the GUI.)
▪ Monitoring. (Pentaho Kettle has practical monitoring tools. )
▪ Connectivity. (ETL tools transfer data to a very wide variety of Database systems, XML, and web
services.)
15. What is Kettle good for?
▪ Loading data into an RDBMS.
▪ Syncing two data sources.
▪ Processing data retrieved from multiple sources and pushed to multiple
destinations.
▪ Graphical manipulation of data.
▪ It has a very easy-to-use GUI.
17. Larger picture
Kettle is 10 years old.
It joined Pentaho about 7 years ago.
It is open source, at version 4.4.
BI suite:
▪ Reporting
▪ Analytics
▪ Dashboards
▪ ML (Machine Learning)
18. Kettle Tools
▪ Spoon (Lets you design transformations and jobs that can be run with
the Kettle tools.)
▪ Kitchen (Executes jobs designed in Spoon, stored as XML or in a database repository.)
▪ Pan (Executes transformations designed in Spoon, stored as XML or in a
database repository.)
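Because Pan and Kitchen are command-line programs, they are typically called from schedulers or scripts. The sketch below only builds the argument list and does not run anything. It assumes the Linux launcher scripts pan.sh and kitchen.sh and the -file= and -level= options; check these against the documentation of your installed version. The file path is a hypothetical placeholder.

```python
import shlex


def build_kettle_command(tool, file_path, level="Basic"):
    """Build an argument list for Pan (transformations) or Kitchen (jobs).

    Script names and option spellings are assumptions for a Linux install.
    """
    scripts = {"pan": "pan.sh", "kitchen": "kitchen.sh"}
    if tool not in scripts:
        raise ValueError(f"unknown Kettle tool: {tool}")
    return [scripts[tool], f"-file={file_path}", f"-level={level}"]


# Hypothetical transformation file saved from Spoon as XML (.ktr).
cmd = build_kettle_command("pan", "/opt/etl/load_sales.ktr")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run from the Kettle installation folder once the paths match your environment.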
19. Most common uses of Kettle
▪ Data warehouse and data mart loads.
▪ Data integration. (Changing input into the desired output.)
▪ Data cleansing.
▪ Data migration.
▪ Data export.
▪ Etc.
20. Pentaho Data Integration
▪ Transportation of data.
▪ Splitting
▪ Partitioning
▪ Merging
▪ Joining
▪ Duplicating
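The row operations listed above can be illustrated with plain Python lists of dictionaries standing in for row streams. This is a conceptual sketch, not PDI's implementation; the function names and the sample fields used with them are hypothetical.

```python
def split_rows(rows, predicate):
    """Splitting: route each row to one of two streams."""
    matched, rest = [], []
    for row in rows:
        (matched if predicate(row) else rest).append(row)
    return matched, rest


def partition_rows(rows, key, n):
    """Partitioning: distribute rows over n buckets by an integer key."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[row[key] % n].append(row)
    return buckets


def merge_rows(*streams):
    """Merging: append streams that share the same layout."""
    return [row for stream in streams for row in stream]


def join_rows(left, right, key):
    """Joining: inner join of two streams on a common key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def duplicate_rows(rows):
    """Duplicating: send an independent copy of every row to two streams."""
    return [dict(r) for r in rows], [dict(r) for r in rows]
```

For example, split_rows(orders, lambda r: r["amount"] >= 50) separates large orders from small ones, and join_rows(orders, customers, "cust") attaches customer details to each order.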
22. Steps for downloading and installing Pentaho
▪ Step 1: Download Java from https://download.oracle.com
▪ Step 2: Download Pentaho from https://sourceforge.net
▪ Step 3: Create a new folder on the C: drive and name it after the
version of Pentaho.
▪ Step 4: Extract the Pentaho archive into this new folder.
▪ Step 5: Go to My Computer -> Properties -> Advanced system
settings -> Environment Variables -> New, enter the variable name JAVA_HOME,
and set its value to the Java installation folder.
▪ Step 6: Check the JRE in CMD by typing echo %JAVA_HOME%
▪ Step 7: From the Pentaho folder, run spoon.bat as Administrator.
23. Step 1: Download Java
▪ Download Java from https://download.oracle.com
24. Step 2: Download Pentaho
▪ Download Pentaho from https://sourceforge.net
25. Step 3: Create a new folder
▪ Create a new folder on the C: drive and name it after the version of Pentaho.
26. Step 4: Extract Pentaho
▪ Extract the Pentaho archive into this new folder.
27. Step 5: Environment variables
▪ Go to My Computer -> Properties -> Advanced system settings -> Environment
Variables -> New, enter the variable name JAVA_HOME, and set its value to the Java installation folder.
28. Step 6: Check the JRE
▪ Check the JRE in CMD by typing echo %JAVA_HOME%
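The echo check only shows the value of the variable. As a slightly stronger check, the following Python sketch also verifies that a Java executable exists under that folder. It assumes the usual layout where the launcher lives in a bin subfolder; the function name check_java_home is made up for this example.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return (ok, message) describing whether JAVA_HOME looks usable."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return False, "JAVA_HOME is not set"
    bin_dir = Path(java_home) / "bin"
    # Accept both the Windows (java.exe) and Unix (java) launcher names.
    if not any((bin_dir / name).is_file() for name in ("java", "java.exe")):
        return False, f"no java executable under {bin_dir}"
    return True, f"JAVA_HOME looks valid: {java_home}"


ok, message = check_java_home()
print(message)
```

If the message reports a problem, revisit the environment variable step before launching Spoon.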
29. Step 7: Run Pentaho
▪ From the Pentaho folder, run spoon.bat as Administrator.