Welcome to the webinar on

Business Intelligence and Big Data Analytics
with Pentaho
Presented by

&

www.compulinkacademy.com

www.ellicium.com
Contents
1. An Introduction to Pentaho
2. Overview of Pentaho technology stack
3. Pentaho ETL
4. Data Exploration using Pentaho
5. Big Data with Pentaho
6. Getting started with Pentaho
Welcome to Open source world
Open-source software is computer software whose source code is made
available under a license in which the copyright holder grants the
rights to study, change and distribute the software to anyone and
for any purpose. Open-source software is very often developed in a
public, collaborative manner.
Reporting
• Actuate BIRT
• Jasper Reports
• Pentaho
• Open Reports

Analysis
• JPivot
• Mondrian / Pentaho
• PALO

ETL Tools
• Clover ETL
• Enhydra Octopus
• Talend
• Kettle / Pentaho

BI Platforms
• Jasper
• Pentaho
• SpagoBI

Data Mining / Statistics
• Weka / Pentaho
• R

Databases
• Derby
• Ingres
• MySQL
• PostgreSQL

You already use it!!!
• Napster
• Amazon reviews
• YouTube
• Linux

What it means for BI and analytics
A report by the Standish Group states that adoption of open-source software models has resulted
in savings of about $60 billion per year to consumers.
Welcome to Pentaho!!!!
• Commercial open source alternative for business intelligence (BI)
• Founded in 2004 by five founders
• Management: proven BI and open source veterans from Business Objects,
Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS
• Pioneer in commercial open source BI
• Large referenceable customer base, wide range of BI/DW deployments
• Offers a suite of open source Business Intelligence (BI) products called
Pentaho Business Analytics, providing data integration, OLAP services,
reporting, dashboarding, data mining and ETL capabilities
Pentaho customers
What analysts are saying about Pentaho
Pentaho is the only open source company featured in the Ovum Decision Matrix
for Business Intelligence. "Pentaho is one of the few vendors that provide a direct
integration into Hadoop and NoSQL databases, allowing users to analyse and visualize
NoSQL data alongside traditional data sources."

Forrester recognized Pentaho as the sole "Strong Performer". "Pentaho provides an
impressive Hadoop data integration tool." Pentaho was cited for its rich functionality
and extensive integration with Apache Hadoop, and for providing certified integration
with distributions from Cloudera, EMC Greenplum and Hortonworks.

Passionned's Business Intelligence Tools Survey highlighted the completeness of the
Pentaho product suite compared to other vendors, as well as Pentaho's significant
cost savings from pricing products per deployment, not per user. Pentaho earned a
recommendation as a complete enterprise solution.

Pentaho was included in Gartner's Magic Quadrant for Business Intelligence Platforms.
The report offers the analyst firm's insights on business intelligence
vendors who meet an inclusion threshold based on annual sales, capabilities, and
customer survey responses.
Pentaho Licensing
The current version of the Pentaho BI Platform will be distributed under
the terms of the GNU General Public License (GPL).
Under the GPL, if you intend to distribute GPL-licensed code to your
customers as part of other software you have created, you may, depending
on that software, be required to license it under the GPL as well.
Companies that wish to distribute the Pentaho BI Platform have the option
of purchasing a commercial license from Pentaho Corporation. A
commercial license would exempt you from GPL obligations.
The GNU General Public License (GPL) is the most widely used free
software license, which guarantees end users the freedoms to use, study,
share and modify the software. Derived works can only be distributed
under the same license terms.
Pentaho BI Enterprise Edition
Overview of Pentaho Stack
Pentaho BI Stack
Delivering Value in Different Deployment Models
Coexistence with traditional proprietary BI
•Minimize risk/exposure with consolidated vendors
•Prove technology and services internally
•Explore the relationship benefits of a transparent model without
software lock-in
Co-deployment with traditional proprietary BI
•Leverage existing investments
•Pragmatically “use what works”
•Reduce overall TCO by incorporating commercial open source
Replacement of traditional proprietary BI
•Upgrade BI capabilities
•Reduce TCO
•Capitalize on the opportunity of a “disruption” (software upgrade,
license change, etc.) in your BI environment
Pentaho ETL
Pentaho Kettle ETL
• Pentaho Data Integration (PDI, also called Kettle) is the Pentaho component responsible
for Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently
used in data warehouse environments, PDI can also be used for other purposes:
  • Migrating data between applications or databases
  • Exporting data from databases to flat files
  • Bulk-loading data into databases
  • Data cleansing
  • Integrating applications
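To make the extract, transform and load stages concrete, here is a minimal sketch in plain Python. It is not PDI and does not use any Pentaho API; the `sales` table, its columns and the sample rows are hypothetical and exist only for illustration. It covers two of the uses listed above: exporting from a database to a flat file and basic data cleansing.

```python
import csv
import io
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) source table."""
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()


def transform(rows):
    """Transform: tidy up names and drop rows that have no amount."""
    cleaned = []
    for row_id, name, amount in rows:
        if amount is None:
            continue  # data cleansing: skip incomplete records
        cleaned.append((row_id, name.strip().title(), round(amount, 2)))
    return cleaned


def load(rows, out):
    """Load: write the cleaned rows to a flat file (CSV)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "amount"])
    writer.writerows(rows)


def run_etl():
    # In-memory source database created only for this example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "  alice ", 10.5), (2, "BOB", None), (3, "carol", 7.25)],
    )
    out = io.StringIO()
    load(transform(extract(conn)), out)
    conn.close()
    return out.getvalue()


if __name__ == "__main__":
    print(run_etl())
```

In PDI each of these functions would roughly correspond to a step in a transformation, wired together graphically instead of in code.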
Pentaho Kettle ETL
[Diagram: Kettle step types: Input Step, Output Step, Transformation Step, Lookup Step,
Join Step, Mapping Step, Job Step, DW Step, Big Data Step]
Pentaho Kettle ETL
Spoon
• GUI that allows you to design transformations and jobs.
• Transformations and jobs are saved either as XML files or in a Kettle
database repository.
• Spoon ships as both a shell script and a batch file, so it can be used
in heterogeneous environments.
Pan
• A program that executes transformations designed in Spoon, stored as XML or in a
database repository.
• Transformations are usually scheduled in batch mode to run automatically at regular
intervals.
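As a rough illustration of batch scheduling with Pan, the sketch below builds a Pan command line and wraps it in a crontab entry. The option names (`-file`, `-level`, `-param:`) are the Linux-style Pan options as recalled from the PDI documentation and should be checked against your PDI version. The script path, the `.ktr` path and the `RUN_DATE` parameter are hypothetical.

```python
import shlex


def build_pan_command(ktr_path, level="Basic", params=None, pan_script="./pan.sh"):
    """Return the argument list for running one transformation file with Pan.

    Option names are assumptions based on the PDI docs; verify them locally.
    """
    cmd = [pan_script, f"-file={ktr_path}", f"-level={level}"]
    # Named parameters are passed one per option, sorted for a stable command line.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


def crontab_line(schedule, cmd):
    """Format a crontab entry that runs the command on the given schedule."""
    return f"{schedule} {shlex.join(cmd)}"


if __name__ == "__main__":
    # Hypothetical transformation, run every night at 02:00.
    command = build_pan_command("/etl/load_sales.ktr", params={"RUN_DATE": "2014-01-31"})
    print(crontab_line("0 2 * * *", command))
```

Building the command as a list and quoting it with `shlex.join` avoids shell-quoting mistakes when paths or parameter values contain spaces.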
Carte
• Simple web server for executing transformations and jobs remotely.
• Accepts XML containing the transformation to execute and the execution
configuration.
• Lets you remotely monitor, start and stop transformations and jobs.
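Remote monitoring and control through Carte happens over HTTP. The sketch below only builds the request URLs and does not contact a server. The servlet paths (`/kettle/transStatus/`, `/kettle/startTrans/`, `/kettle/stopTrans/`) are as recalled from the PDI documentation and should be treated as assumptions to verify against your version. The host, port and transformation name are hypothetical.

```python
from urllib.parse import urlencode

# Servlet paths recalled from the PDI documentation; treat them as assumptions.
CARTE_ACTIONS = {
    "status": "/kettle/transStatus/",
    "start": "/kettle/startTrans/",
    "stop": "/kettle/stopTrans/",
}


def carte_url(host, port, action, trans_name):
    """Build the URL for monitoring, starting or stopping a transformation."""
    if action not in CARTE_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # Ask for an XML response so the result is easy to parse.
    query = urlencode({"name": trans_name, "xml": "Y"})
    return f"http://{host}:{port}{CARTE_ACTIONS[action]}?{query}"


if __name__ == "__main__":
    # Hypothetical Carte server and transformation name.
    print(carte_url("localhost", 8081, "status", "load sales"))
```

A real client would send these requests with HTTP basic authentication, using whatever credentials the Carte server is configured with.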
Data Exploration using Pentaho
Pentaho Dashboards
Many ways to design Pentaho dashboards
Pentaho Dashboards
What is CDE?
* CDE is a plug-in for the Pentaho BI Server, contributed and maintained by Pentaho partner
Webdetails.
* We create dashboards using this tool.
* Community Dashboard Editor (CDE) was created to simplify the creation, editing and rendering
of dashboards.
* CDE is a very powerful and complete tool, combining the front end with data sources and custom
components in a seamless way.
CDE has 3 major components:
* Layout
* Components
* Data Sources
CDE was developed based on the MVC-2 architecture of Advanced Java.
Overview of Pentaho CDE
Exploring Big data with Pentaho
Main Big Data Technologies

Hadoop
• Low cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest

NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types
  • Document stores
  • Key Value stores
  • Graph databases

Analytic Databases (Analytic RDBMS)
• Optimized for bulk-load and fast aggregate query workloads
• Types
  • Column-oriented
  • MPP
  • In-memory
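A small sketch of why a column-oriented layout suits aggregate queries: the same data is held once row by row and once column by column, and a total is computed from each. The table, field names and values are hypothetical, and this shows only the layout idea, not how any particular analytic database is implemented.

```python
# Row-oriented layout: each record is stored together.
ROWS = [
    {"region": "east", "product": "kettle", "revenue": 120.0},
    {"region": "west", "product": "spoon", "revenue": 80.0},
    {"region": "east", "product": "spoon", "revenue": 200.0},
]


def to_columns(rows):
    """Pivot row records into a column-oriented layout (one list per field)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_revenue_row_store(rows):
    """Aggregate over rows: every full record is visited to read one field."""
    return sum(row["revenue"] for row in rows)


def total_revenue_column_store(columns):
    """Aggregate over columns: only the 'revenue' column is read."""
    return sum(columns["revenue"])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(total_revenue_row_store(ROWS), total_revenue_column_store(cols))
```

Both functions return the same total. The difference is in how much data must be read: the column layout touches one contiguous list, which is the property that column-oriented stores exploit for fast aggregates.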
What makes Pentaho different for big data
Ingestion / Manipulation / Integration
Scheduling
Modeling

Would you rather do this? … OR THIS?
Pentaho Big Data Integration
Pentaho is integrated with Hadoop at many levels

• Traditional ETL - Graphical designer to visually build transformations that read and write data
in Hadoop from/to anywhere and transform the data on the way. No coding required.
  • HBase Read/Write
  • Hive, Hive2 SQL Query and Write
  • Impala SQL Query and Write
  • Support for Avro file format and Snappy compression
• Data Orchestration - Graphical designer to visually build and schedule jobs that orchestrate
processing, data movement and most aspects of operationalizing your data preparation.
  • HDFS Copy files
  • MapReduce Job Execution
  • Pig Script Execution
  • Amazon EMR Job Execution
  • Oozie integration
  • Sqoop Import/Export
  • Pentaho MapReduce Execution
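For readers new to MapReduce, the job type mentioned above follows a map, shuffle and reduce pattern. The sketch below runs that pattern in a single Python process on a word-count example. It uses neither Hadoop nor Pentaho, and the function names are illustrative only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data with Pentaho"]))
```

On a cluster, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle. Only the two user-supplied functions change from job to job.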
Pentaho Big Data Integration
• Pentaho MapReduce - Graphical designer to visually build MapReduce jobs and run
them in the cluster. As a simple, point-and-click alternative to writing Hadoop
MapReduce programs in Java or Pig, Pentaho exposes a familiar ETL-style user
interface.
• Traditional Reporting - All data sources supported above can be used directly or
blended with other data to drive our pixel-perfect reporting engine. The reports can
be secured, parameterized and published to the web. The reports can be mashed up
with other Pentaho visualizations to create dashboards.
• Web Based Interactive Reporting - Pentaho's Metadata layer leverages data stored in
Hive, Hive2 and Impala for WYSIWYG, interactive, self-service reporting.
• Pentaho Analyzer - Leverage your data stored in Impala or Hive2 for interactive visual
analysis with drill through, lasso filtering, zooming, and attribute highlighting for
greater insight.
Getting started with Pentaho
• Download Pentaho from http://community.pentaho.com/
• Download MySQL from http://dev.mysql.com/downloads/mysql/
• Download CDE from www.webdetails.pt/ctools/cde.html
Read the installation instructions on the following blog:
• http://pentaho-bi-suite.blogspot.in/2013/04/installation-ofpentaho-bi-server.html
• We have a Pentaho installation guide available. Please request
the guide at: info@ellicium.com
Thank you !!!
Contact us for customized Pentaho training at
info@compulinkacademy.com
info@ellicium.com
Or call Sameer on +91-8793334411

More Related Content

What's hot

Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Roland Bouman
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
Jos van Dongen
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BIEdureka!
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLJonathan Levin
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
Dr Anjan Krishnamurthy
 
Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6skonda
 
Introduction To Pentaho Analysis
Introduction To Pentaho AnalysisIntroduction To Pentaho Analysis
Introduction To Pentaho Analysis
pentaho Content
 
Pentaho Partner Program Info
Pentaho Partner Program InfoPentaho Partner Program Info
Pentaho Partner Program Info
Sharmila Wijeyakumar
 
ETL tool evaluation criteria
ETL tool evaluation criteriaETL tool evaluation criteria
ETL tool evaluation criteria
Asis Mohanty
 
Pentaho interview question and answers
Pentaho interview question and answersPentaho interview question and answers
Pentaho interview question and answers
enrollmy training
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP Overview
Alex Meadows
 
Informatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonInformatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools Comparison
Roberto Espinosa
 
Ikenstudiolive
IkenstudioliveIkenstudiolive
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
snehal parikh
 
Maharshi_Amin_416
Maharshi_Amin_416Maharshi_Amin_416
Maharshi_Amin_416mamin1411
 
Webinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence IntroWebinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence Intro
SpagoWorld
 
Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016
Embarcadero Technologies
 
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
Gina Pabalan
 

What's hot (20)

Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Pentaho etl-tool
Pentaho etl-toolPentaho etl-tool
Pentaho etl-tool
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BI
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETL
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
 
Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6
 
Introduction To Pentaho Analysis
Introduction To Pentaho AnalysisIntroduction To Pentaho Analysis
Introduction To Pentaho Analysis
 
Pentaho Partner Program Info
Pentaho Partner Program InfoPentaho Partner Program Info
Pentaho Partner Program Info
 
ETL tool evaluation criteria
ETL tool evaluation criteriaETL tool evaluation criteria
ETL tool evaluation criteria
 
Pentaho interview question and answers
Pentaho interview question and answersPentaho interview question and answers
Pentaho interview question and answers
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP Overview
 
Informatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonInformatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools Comparison
 
Ikenstudiolive
IkenstudioliveIkenstudiolive
Ikenstudiolive
 
ETL
ETLETL
ETL
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Maharshi_Amin_416
Maharshi_Amin_416Maharshi_Amin_416
Maharshi_Amin_416
 
Webinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence IntroWebinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence Intro
 
Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016
 
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
 

Viewers also liked

Advanced ETL2 Pentaho
Advanced ETL2  Pentaho Advanced ETL2  Pentaho
Advanced ETL2 Pentaho Sunny U Okoro
 
Jenkins Peru Meetup Docker Ecosystem
Jenkins Peru Meetup Docker EcosystemJenkins Peru Meetup Docker Ecosystem
Jenkins Peru Meetup Docker Ecosystem
Mario IC
 
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) ItalyClustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
Giovanni Toraldo
 
Scaling Jenkins with Docker and Kubernetes
Scaling Jenkins with Docker and KubernetesScaling Jenkins with Docker and Kubernetes
Scaling Jenkins with Docker and Kubernetes
Carlos Sanchez
 
Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho valex_haro
 
NGINX Plus PLATFORM For Flawless Application Delivery
NGINX Plus PLATFORM For Flawless Application DeliveryNGINX Plus PLATFORM For Flawless Application Delivery
NGINX Plus PLATFORM For Flawless Application Delivery
Ashnikbiz
 
Introduction to docker swarm
Introduction to docker swarmIntroduction to docker swarm
Introduction to docker swarm
Walid Ashraf
 
Migración de datos con OpenERP-Kettle
Migración de datos con OpenERP-KettleMigración de datos con OpenERP-Kettle
Migración de datos con OpenERP-Kettle
raimonesteve
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
Ashnikbiz
 
Docker Ecosystem - Part II - Compose
Docker Ecosystem - Part II - ComposeDocker Ecosystem - Part II - Compose
Docker Ecosystem - Part II - Compose
Mario IC
 
Indic threads pune12-accelerating computation in html 5
Indic threads pune12-accelerating computation in html 5Indic threads pune12-accelerating computation in html 5
Indic threads pune12-accelerating computation in html 5
IndicThreads
 
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
Puppet
 
Docker Ecosystem: Engine, Compose, Machine, Swarm, Registry
Docker Ecosystem: Engine, Compose, Machine, Swarm, RegistryDocker Ecosystem: Engine, Compose, Machine, Swarm, Registry
Docker Ecosystem: Engine, Compose, Machine, Swarm, Registry
Mario IC
 
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
Carlos Sanchez
 
Building a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and DockerBuilding a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and Docker
Wellington Marinho
 
Docker swarm introduction
Docker swarm introductionDocker swarm introduction
Docker swarm introduction
Evan Lin
 
Load Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINXLoad Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINX
NGINX, Inc.
 
Continuous Integration (Jenkins/Hudson)
Continuous Integration (Jenkins/Hudson)Continuous Integration (Jenkins/Hudson)
Continuous Integration (Jenkins/Hudson)Dennys Hsieh
 

Viewers also liked (20)

Advanced ETL2 Pentaho
Advanced ETL2  Pentaho Advanced ETL2  Pentaho
Advanced ETL2 Pentaho
 
Tao zhang
Tao zhangTao zhang
Tao zhang
 
Jenkins Peru Meetup Docker Ecosystem
Jenkins Peru Meetup Docker EcosystemJenkins Peru Meetup Docker Ecosystem
Jenkins Peru Meetup Docker Ecosystem
 
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) ItalyClustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
 
Scaling Jenkins with Docker and Kubernetes
Scaling Jenkins with Docker and KubernetesScaling Jenkins with Docker and Kubernetes
Scaling Jenkins with Docker and Kubernetes
 
Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho
 
NGINX Plus PLATFORM For Flawless Application Delivery
NGINX Plus PLATFORM For Flawless Application DeliveryNGINX Plus PLATFORM For Flawless Application Delivery
NGINX Plus PLATFORM For Flawless Application Delivery
 
Introduction to docker swarm
Introduction to docker swarmIntroduction to docker swarm
Introduction to docker swarm
 
Migración de datos con OpenERP-Kettle
Migración de datos con OpenERP-KettleMigración de datos con OpenERP-Kettle
Migración de datos con OpenERP-Kettle
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
 
Docker Ecosystem - Part II - Compose
Docker Ecosystem - Part II - ComposeDocker Ecosystem - Part II - Compose
Docker Ecosystem - Part II - Compose
 
Indic threads pune12-accelerating computation in html 5
Indic threads pune12-accelerating computation in html 5Indic threads pune12-accelerating computation in html 5
Indic threads pune12-accelerating computation in html 5
 
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
 
Docker Ecosystem: Engine, Compose, Machine, Swarm, Registry
Docker Ecosystem: Engine, Compose, Machine, Swarm, RegistryDocker Ecosystem: Engine, Compose, Machine, Swarm, Registry
Docker Ecosystem: Engine, Compose, Machine, Swarm, Registry
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
 
Building a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and DockerBuilding a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and Docker
 
Docker swarm introduction
Docker swarm introductionDocker swarm introduction
Docker swarm introduction
 
Load Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINXLoad Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINX
 
Continuous Integration (Jenkins/Hudson)
Continuous Integration (Jenkins/Hudson)Continuous Integration (Jenkins/Hudson)
Continuous Integration (Jenkins/Hudson)
 

Similar to Business Intelligence and Big Data Analytics with Pentaho

Introduction To Pentaho
Introduction To PentahoIntroduction To Pentaho
Introduction To Pentaho
DataminingTools Inc
 
Pentaho Roadmap 2011
Pentaho Roadmap 2011Pentaho Roadmap 2011
Pentaho Roadmap 2011
Datalytics
 
Hadoop uk user group meeting final
Hadoop uk user group meeting finalHadoop uk user group meeting final
Hadoop uk user group meeting finalSkills Matter
 
Accelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersAccelerating AI Adoption with Partners
Accelerating AI Adoption with Partners
Sri Ambati
 
Webinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence IntroWebinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence Intro
SpagoWorld
 
Getting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning GuideGetting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning Guide
Intel IT Center
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
OW2
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
DataWorks Summit
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
DataWorks Summit
 
Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?
Swiss Big Data User Group
 
Querona Presentation 2018
Querona Presentation 2018Querona Presentation 2018
Querona Presentation 2018
Synergo!
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Kinetica
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021
Mobcoder
 
Pentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and HadoopPentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and Hadoop
Mark Kromer
 
Eligotech presents @ Data Donderdag on 24 April 2014
Eligotech presents @ Data Donderdag on 24 April 2014Eligotech presents @ Data Donderdag on 24 April 2014
Eligotech presents @ Data Donderdag on 24 April 2014
Paul Broekhoven
 
Business Intelligence Tool Jaspersoft
Business Intelligence Tool JaspersoftBusiness Intelligence Tool Jaspersoft
Business Intelligence Tool Jaspersoft
Asgar Hussain Inamdar
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_SuiteRobin Fong 方俊强
 

Similar to Business Intelligence and Big Data Analytics with Pentaho (20)

Introduction To Pentaho
Introduction To PentahoIntroduction To Pentaho
Introduction To Pentaho
 
Pentaho Roadmap 2011
Pentaho Roadmap 2011Pentaho Roadmap 2011
Pentaho Roadmap 2011
 
Hadoop uk user group meeting final
Hadoop uk user group meeting finalHadoop uk user group meeting final
Hadoop uk user group meeting final
 
4AA6-4492ENW
4AA6-4492ENW4AA6-4492ENW
4AA6-4492ENW
 
Accelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersAccelerating AI Adoption with Partners
Accelerating AI Adoption with Partners
 
Webinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence IntroWebinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence Intro
 
Getting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning GuideGetting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning Guide
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?
 
Querona Presentation 2018
Querona Presentation 2018Querona Presentation 2018
Querona Presentation 2018
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021
 
Pentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and HadoopPentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and Hadoop
 
Eligotech presents @ Data Donderdag on 24 April 2014
Eligotech presents @ Data Donderdag on 24 April 2014Eligotech presents @ Data Donderdag on 24 April 2014
Eligotech presents @ Data Donderdag on 24 April 2014
 
Big Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - PentahoBig Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - Pentaho
 
Business Intelligence Tool Jaspersoft
Business Intelligence Tool JaspersoftBusiness Intelligence Tool Jaspersoft
Business Intelligence Tool Jaspersoft
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suite
 


Business Intelligence and Big Data Analytics with Pentaho

  • 1. Welcome to the webinar on Business Intelligence and Big Data Analytics with Pentaho. Presented by www.compulinkacademy.com & www.ellicium.com
  • 2. Contents: (1) An Introduction to Pentaho; (2) Overview of Pentaho technology stack; (3) Pentaho ETL; (4) Data Exploration using Pentaho; (5) Big Data with Pentaho; (6) Getting started with Pentaho
  • 3. Welcome to the open source world. Open-source software is computer software with its source code made available and licensed with a license in which the copyright holder provides the rights to study, change and distribute the software to anyone and for any purpose. Open-source software is very often developed in a public, collaborative manner. You already use it: Napster, Amazon reviews, YouTube, Linux. Open source tools by category. Reporting: Actuate BIRT, Jasper Reports, Pentaho, Open Reports. Analysis: JPivot, Mondrian / Pentaho, PALO. ETL tools: Clover ETL, Enhydra Octopus, Talend, Kettle / Pentaho. BI platforms: Jasper, Pentaho, SpagoBI. Data mining / statistics: Weka / Pentaho, R. Databases: Derby, Ingres, MySQL, PostgreSQL. What it means for BI and analytics: a report by the Standish Group states that adoption of open-source software models has resulted in savings of about $60 billion per year to consumers.
  • 4. Welcome to Pentaho! • Commercial open source alternative for business intelligence (BI), founded in 2004 by five founders • Management: proven BI and open source veterans from Business Objects, Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS • Pioneer in commercial open source BI, with a large referenceable customer base and a wide range of BI/DW deployments • It offers a suite of open source business intelligence products called Pentaho Business Analytics, providing data integration, OLAP services, reporting, dashboarding, data mining and ETL capabilities
  • 6. What analysts are saying about Pentaho. Ovum: Pentaho is the only open source company featured in the Ovum Decision Matrix for Business Intelligence. "Pentaho is one of the few vendors that provide a direct integration into Hadoop and NoSQL databases, allowing users to analyse and visualize NoSQL data alongside traditional data sources." Forrester: recognized Pentaho as the sole "Strong Performer". "Pentaho provides an impressive Hadoop data integration tool." Pentaho was cited for its rich functionality and extensive integration with Apache Hadoop, and for providing certified integration with distributions from Cloudera, EMC Greenplum and Hortonworks. Passionned: the Business Intelligence Tools Survey highlighted the completeness of the Pentaho product suite compared to other vendors, as well as Pentaho's significant cost savings from pricing products per deployment rather than per user. Pentaho earned a recommendation as a complete enterprise solution. Gartner: Pentaho was included in the Magic Quadrant for Business Intelligence Platforms. The report offers the analyst firm's insights on business intelligence vendors that meet an inclusion threshold based on annual sales, capabilities, and customer survey responses.
  • 7. Pentaho Licensing. The current version of the Pentaho BI Platform is distributed under the terms of the GNU General Public License (GPL). The GPL is the most widely used free software license; it guarantees end users the freedoms to use, study, share and modify the software, and derived works can only be distributed under the same license terms. Under the GPL, if you intend to distribute GPL-licensed code to your customers as part of other software you have created, you may, depending on the software you have created, be required to release that code under the GPL as well. Companies that wish to distribute the Pentaho BI Platform have the option of purchasing a commercial license from Pentaho Corporation. A commercial license would exempt you from GPL obligations.
  • 11. Delivering Value in Different Deployment Models. Coexistence with traditional proprietary BI: • Minimize risk/exposure with consolidated vendors • Prove technology and services internally • Explore the relationship benefits of a transparent model without software lock-in. Co-deployment with traditional proprietary BI: • Leverage existing investments • Pragmatically "use what works" • Reduce overall TCO by incorporating commercial open source. Replacement of traditional proprietary BI: • Upgrade BI capabilities • Reduce TCO • Capitalize on the opportunity of a "disruption" (software upgrade, license change, etc.) in your BI environment
  • 13. Pentaho Kettle ETL. Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes: • Migrating data between applications or databases • Exporting data from databases to flat files • Loading data in bulk into databases • Data cleansing • Integrating applications
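To make the extract, transform and load stages on this slide concrete, here is a minimal sketch of the "export from a database to a flat file, with cleansing" use case. It is plain Python and does not use PDI or any Kettle API; the `customers` table, its columns and the cleansing rules are hypothetical, chosen only for illustration. In PDI the same flow would be drawn graphically as input, transformation and output steps.

```python
import csv
import io
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) source table."""
    return conn.execute(
        "SELECT id, name, city FROM customers ORDER BY id"
    ).fetchall()


def transform(rows):
    """Transform: trim whitespace, normalise case, drop rows with no name."""
    cleaned = []
    for cust_id, name, city in rows:
        name = (name or "").strip()
        if not name:
            continue  # data cleansing: skip incomplete records
        cleaned.append((cust_id, name.title(), (city or "").strip().upper()))
    return cleaned


def load(rows, out):
    """Load: write the cleaned rows to a flat (CSV) file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "city"])
    writer.writerows(rows)


def run_etl():
    """Build a small in-memory source database and run the three stages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "  asha rao ", "pune"), (2, None, "mumbai"), (3, "ravi KUMAR", " delhi ")],
    )
    out = io.StringIO()
    load(transform(extract(conn)), out)
    conn.close()
    return out.getvalue()


if __name__ == "__main__":
    print(run_etl())
```

Keeping the three stages as separate functions mirrors how an ETL tool separates input, transformation and output steps, so each stage can be replaced or tested on its own.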
  • 14. Pentaho Kettle ETL step types: Big Data, Input, Output, Transformation, Lookup, Join, DW, Job, Mapping
  • 15. Pentaho Kettle ETL tools. Spoon: • A GUI for designing transformations and jobs • Transformations and jobs are saved as XML files or stored in a Kettle database repository • Spoon ships as both a shell script and a batch file, so it can be used in heterogeneous environments. Pan: • A program that executes transformations designed in Spoon, read from XML or from a database repository • Transformations can be scheduled in batch mode to run automatically at regular intervals. Carte: • A simple web server for executing transformations and jobs remotely • Accepts an XML document that contains the transformation to execute and the execution configuration • Allows you to remotely monitor, start and stop transformations and jobs
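As a sketch of how a Pan run might be scripted for batch scheduling, the helper below only assembles a command line and does not execute anything. It assumes the Linux launcher `pan.sh` and the `-file`, `-level` and `-param:` options as they appear in the PDI command-line documentation; verify them against your PDI version. The transformation path and parameter name are hypothetical.

```python
def build_pan_command(ktr_path, log_level="Basic", params=None, pan_script="./pan.sh"):
    """Assemble an argument list for running one transformation with Pan.

    Assumptions (check against your PDI version): the launcher is pan.sh and it
    accepts -file=<path>, -level=<logging level> and -param:<name>=<value>.
    """
    command = [pan_script, f"-file={ktr_path}", f"-level={log_level}"]
    # Sort parameters so the generated command is reproducible.
    for name, value in sorted((params or {}).items()):
        command.append(f"-param:{name}={value}")
    return command


if __name__ == "__main__":
    # Hypothetical transformation file and parameter, for illustration only.
    cmd = build_pan_command("/etl/load_sales.ktr", params={"RUN_DATE": "2014-01-31"})
    print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` from a script that cron or another scheduler invokes at regular intervals, which is one way to get the batch-mode behaviour the slide describes.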
  • 19. Pentaho Dashboards Many ways to design Pentaho dashboards
  • 20. Pentaho Dashboards: what is CDE? • The Community Dashboard Editor (CDE) is a plug-in for the Pentaho BI Server, contributed and maintained by Pentaho partner Webdetails • It is the tool we use to create dashboards • CDE was created to simplify the creation, editing and rendering of dashboards • CDE is a powerful and complete tool that combines the front end with data sources and custom components in a seamless way • CDE has three major components: Layout, Components and Data Sources • CDE was developed on the MVC-2 architecture of Advanced Java
  • 22. Exploring Big data with Pentaho
  • 23. Main Big Data Technologies. Hadoop: • Low cost, reliable scale-out architecture • Distributed computing • Proven success in Fortune 500 companies • Exploding interest. NoSQL databases: • Huge horizontal scaling and high availability • Highly optimized for retrieval and appending • Types: document stores, key-value stores, graph databases. Analytic databases: • Optimized for bulk-load and fast aggregate query workloads • Types: column-oriented, MPP, in-memory
  • 24. What makes Pentaho different for big data: ingestion / manipulation / integration, scheduling, modeling. Would you rather do this ... or this?
  • 25. Pentaho Big Data Integration. Pentaho is integrated with Hadoop at many levels. • Traditional ETL: graphical designer to visually build transformations that read and write data in Hadoop from/to anywhere and transform the data on the way, with no coding required • HBase read/write • Hive and Hive2 SQL query and write • Impala SQL query and write • Support for the Avro file format and Snappy compression • Data orchestration: graphical designer to visually build and schedule jobs that orchestrate processing, data movement and most aspects of operationalizing your data preparation • HDFS file copy • MapReduce job execution • Pig script execution • Amazon EMR job execution • Oozie integration • Sqoop import/export • Pentaho MapReduce execution
  • 26. Pentaho Big Data Integration (continued). • Pentaho MapReduce: graphical designer to visually build MapReduce jobs and run them in the cluster. As a simple, point-and-click alternative to writing Hadoop MapReduce programs in Java or Pig, Pentaho exposes a familiar ETL-style user interface • Traditional reporting: all data sources listed above can be used directly or blended with other data to drive the pixel-perfect reporting engine. Reports can be secured, parameterized and published to the web, and can be mashed up with other Pentaho visualizations to create dashboards • Web-based interactive reporting: Pentaho's metadata layer leverages data stored in Hive, Hive2 and Impala for WYSIWYG, interactive, self-service reporting • Pentaho Analyzer: leverage your data stored in Impala or Hive2 for interactive visual analysis with drill-through, lasso filtering, zooming and attribute highlighting for greater insight
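For readers new to the model that Pentaho MapReduce wraps in a graphical designer, the sketch below shows the map, shuffle and reduce phases of a word count. It runs in a single Python process and uses neither Hadoop nor Pentaho; the function names are illustrative only, and a real job would distribute the same phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data with Pentaho", "Big Data ETL"]))
```

In a graphical tool the mapper and reducer would each be designed as a transformation, so the logic inside `map_phase` and `reduce_phase` is what gets expressed as steps rather than code.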
  • 28. Getting started with Pentaho. • Download Pentaho from http://community.pentaho.com/ • Download MySQL from http://dev.mysql.com/downloads/mysql/ • Download CDE from www.webdetails.pt/ctools/cde.html • Read the installation instructions at http://pentaho-bi-suite.blogspot.in/2013/04/installation-ofpentaho-bi-server.html • A Pentaho installation guide is also available; request it at info@ellicium.com
  • 29. Thank you! Contact us for customized Pentaho training at info@compulinkacademy.com or info@ellicium.com, or call Sameer on +91-8793334411