Pentaho 7.0 aims to bridge the gap between data preparation and analytics by allowing analytics from anywhere in the data pipeline. It brings analytics into data prep workflows, enables sharing analytics during prep, and improves reporting. It also provides enhanced support for big data technologies like Spark, Hadoop security, and metadata injection to automate data onboarding. A demo shows the ability to visually inspect data during prep to identify issues. Analysts say this allows more collaboration between business and IT and accelerates insights.
What's New in Pentaho 7.0?
1. What’s New in Pentaho 7.0?
Pedro Martins, Head of Implementation
Kleyson Rios, Solutions Architect
2. Data and analytics users spend the majority of their time either preparing data for analysis or waiting for data to be prepared for them.
According to the Gartner Market Guide for Self-Service Data Preparation Analytics…
3. Today’s data landscape is littered with disparate tools and disjointed processes. How do you get ahead with that?
Disparate tools, disparate problems
[Diagram: Business Analytics (Business) and Data Prep / ETL / Data Engineering (IT) as separate silos]
4. 7.0 is about unlocking the data divide between business and IT: leveraging the Pentaho platform to drive the business with usable, accurate, and accessible analytics across the organization.
Bridging the gap between data preparation and analytics
[Diagram: Business Analytics (Business) and Data Prep / ETL / Data Engineering (IT) connected across the divide]
5. Pentaho’s platform today
A data integration and business analytics platform that can access, prepare, blend, and analyze any structured or unstructured data.
6. Pentaho’s future: a platform that fully enables a healthy data ecosystem
1. Alleviates disparate tools and complexity.
2. Makes analytics accessible at any stage of the data pipeline.
3. On top of governance, security, big data ecosystem support, and the required foundations of a blended data world.
7. Analyze data anywhere in the data pipeline
Pentaho 7.0
Bridging the gap between data preparation and analytics with a Visual Data Experience from anywhere in the data pipeline:
• Bringing Analytics into Data Prep
• Sharing Analytics during Data Prep
• Reporting Enhancements
On top of governance, security, and Big Data ecosystem support for a blended data world:
• Spark
• Metadata Injection
• Hadoop Security
• Support for Kafka, Avro, Parquet
• Admin Simplification
8. A Visual Data Experience from Anywhere in the Data Pipeline
Pentaho 7.0
9. Imagine if you were able to access analytics from anywhere within the data pipeline.
10. Bringing analytics into data prep
Visualize data in-flight, without switching in and out of tools.
11. Bringing analytics into data prep
Access to tables, visualizations, charts, graphs, or ad hoc analysis during data prep.
12. Bringing analytics into data prep
Identify missing or incorrect data during the data prep process.
13. Bringing analytics into data prep
Publish data sources to the business, and get data to the business faster.
14. Shrinking the gap between data preparation & business analytics
1. ETL developers and data prep staff can easily spot check analytics without switching in and out of tools.
2. Creates a more collaborative process between business and IT, shortening the cycle from data to analytics.
15. A Visual Data Experience in the words of Sears Holdings Corp.
“The ability to spot check and visualize our data throughout its lifecycle allows for a much more informative and streamlined data-driven decision making process to create more reliability, while reducing costs.”
– Meir Kornfield, Director, Product Management and Business Intelligence, Sears Holdings Corp.
17. 7.0 makes big data operational
• Operationalize high performance pipelines with Spark integration
• Protect data assets with expanded Hadoop security
• Automate onboarding with enhanced metadata injection
18. Spark potential
Potential and Growth
• Faster processing than MapReduce
• Drives real-time & intelligent big data applications at scale
Market Challenges
• Skill barriers – Spark requires specialized developer skills
• Somewhat lacking in enterprise maturity – memory management, multi-user access, etc.
• Effective integration with broader data architectures is challenging
19. Current state: Pentaho and Spark
• Execute Spark applications in PDI jobs
• Supports existing Java and Scala code from core Spark libraries
20. Intuitive coordination of high performance pipelines
Challenge: Hard to manage multiple Spark applications and multiple programming languages AND operationalize them in data pipelines with full flexibility.
7.0 Expands Spark Orchestration
• Coordinate and schedule Spark applications for Hortonworks and Cloudera
• Operationalize streaming, machine learning, and core Spark techniques within jobs
• Choice of programming language, incl. Python
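The coordination idea above — one job scheduling several Spark applications, in more than one language — can be sketched generically. This is a hedged illustration in Python, not PDI's actual Spark Submit job entry (which is configured visually); the application names, dates, and flags below are hypothetical.

```python
# Illustrative only: how an orchestrator might assemble spark-submit
# command lines for the Spark applications in one pipeline job.
# Application names and arguments are made up for this sketch.

def build_spark_submit(app_path, master="yarn", deploy_mode="cluster",
                       app_args=None):
    """Assemble a spark-submit command line for one pipeline step."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    cmd.append(app_path)
    if app_args:
        cmd.extend(app_args)
    return cmd

# A Python Spark application and a packaged Scala/Java one, side by side:
pyspark_step = build_spark_submit("fraud_model.py",
                                  app_args=["--date", "2016-11-01"])
jar_step = build_spark_submit("etl-core.jar")

# A real orchestrator would then run each step in order, e.g.:
# subprocess.run(pyspark_step, check=True)
```

The point of the sketch is that the pipeline, not the developer, owns the submission details, so streaming, machine learning, and core Spark steps can be scheduled uniformly regardless of the language each application is written in.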
21. Remove skill barriers to using Spark
Challenge: Spark requires specialized developer skills. Need an easy way to integrate Spark data with other data processes.
7.0 Adds SQL on Spark Connectivity
• PDI access to SQL on Spark for rapid data prep and queries – on Hortonworks and Cloudera
• Improves productivity by using existing IT data skill sets on Spark
• Accelerates time to value in big data pipeline projects
22. More secure clusters, better big data governance, and reduced risk
Challenge: Protect key enterprise big data assets against intrusion and reduce the risk of security breaches.
Expanded Hadoop Security
• Secure multi-user access to the cluster via updated Kerberos integration, enabling user-level tracking by mapping PDI users to Hadoop users
• Compatibility with Sentry to enforce user authorization rules governing access to specific Hadoop data assets
23. Accelerate data onboarding with Metadata Injection
What happens when data sources proliferate?
[Diagram: 100+ data sources feeding one Read → Transform → Write flow into a data target]
Example use cases:
• Migrating 100+ tables between databases
• Ingesting 100+ data sources into Hadoop
• Allowing end users to onboard data themselves
25. DRIVE HUNDREDS OF JOBS WITH 1 TEMPLATE
Accelerate data onboarding with Metadata Injection
[Diagram: 100+ data sources driving a single Read → Transform → Write template]
• Pass metadata in at run time to generate jobs on the fly
• Reduced development time, cost, and risk
26. Rapidly automate and scale big data onboarding
Challenge: IT teams spend too much time coding ingestion and processing jobs for a wide variety of big data sources.
Metadata Injection Expansion
• Expands options for auto-generated data flows by allowing metadata to be passed to a wider array of PDI steps at runtime
• Increases IT productivity when building out many data migration and onboarding processes
• Now works with 30+ additional PDI steps
• Includes compatibility with Hadoop, HBase, NoSQL, JSON, XML, and Analytic DB steps
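The one-template, many-sources pattern behind metadata injection can be sketched generically. This is a conceptual illustration in Python, not PDI's actual ETL Metadata Injection step; the table names, column names, and catalog below are all hypothetical.

```python
# Conceptual sketch of metadata injection: one Read -> Transform -> Write
# template, configured at run time from per-source metadata instead of
# being hand-built 100+ times. All names here are made up.

def run_template(metadata, rows):
    """A generic transformation template driven by injected metadata."""
    # "Transform" step: keep only the columns the metadata asks for.
    selected = [{col: row[col] for col in metadata["columns"]} for row in rows]
    # "Write" step: return what would be loaded into the named target.
    return metadata["target_table"], selected

# Per-source metadata, e.g. loaded from a catalog, spreadsheet, or database:
sources = [
    {"target_table": "dw_customers", "columns": ["id", "name"]},
    {"target_table": "dw_orders", "columns": ["id", "total"]},
]

sample_rows = [{"id": 1, "name": "a", "total": 9.5}]
# One template, one loop, as many generated jobs as there are sources:
results = [run_template(meta, sample_rows) for meta in sources]
```

Scaling the `sources` list from two entries to a hundred changes nothing in the template, which is why the approach reduces development time and risk as sources proliferate.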
28. Simplified configuration, deployment, and administration of Pentaho
Pentaho 7.0 reduces time to insight by making it faster and easier to configure, deploy, and manage DI and BI services across the development-to-production environments that support the data lifecycle, with no licensing impact.
• Configure and deploy faster
• Simplify administration
29. Pentaho 7.0 in Action
Use Case: Retail bank needs to reduce costs and risk related to credit card fraud with a repeatable business process.
• Orchestrate workflow across components and integrate data in one end-to-end pipeline
• Fewer tools and fewer new skills needed
• Differentiated solution in the market for visual inspection of data at any step in the prep process
• Data prep cycle time and time to insight accelerated
Pipeline: Coordinate fraud model creation on Spark → ingest new transactions to Hadoop via Kafka → access modeled data for analysis via SQL on Spark → visually inspect data set for quality and completeness → collaboratively share results with the business.
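The five stages of the use case can be strung together as one end-to-end flow. This is a hedged sketch: each stage is a stub standing in for real Spark, Kafka, or PDI work, and the stage names and payload are illustrative only.

```python
# Sketch of the fraud use case as a single orchestrated pipeline.
# Each stage is a placeholder; in Pentaho the stages would be PDI job
# entries (Spark submission, Kafka ingestion, SQL on Spark, etc.).
executed = []

def stage(name):
    """Return a stub stage that records its name and passes data along."""
    def run(payload):
        executed.append(name)
        return payload
    return run

pipeline = [
    stage("coordinate_fraud_model_on_spark"),
    stage("ingest_transactions_via_kafka"),
    stage("query_modeled_data_sql_on_spark"),
    stage("visually_inspect_data_quality"),
    stage("share_results_with_business"),
]

payload = {"transactions": []}
for step in pipeline:
    payload = step(payload)
```

The design point the slide makes is that all five stages live in one pipeline definition, so the bank gets a repeatable process rather than five hand-offs between separate tools.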
31. Analysts & Press on Pentaho 7.0:
“No other platform lets IT and the business collaborate in this way, at such an early stage in the process.”
– Adrian Bridgwater, TechTarget
“So this looks to be a major enhancement which is really setting out the Pentaho stall as a BI vendor of choice at the Enterprise level, with integrated capability which is easier to use and more powerful out of the box than the comparable offerings in the marketplace, which are still reliant on skilled technicians to unite and enact the solutions.”
– David Norris, Bloor Research
“As Hadoop can be a challenge around security, Pentaho is expanding its Hadoop data security integration to promote better Big Data governance, protecting clusters from intruders.”
– Bev Terrell, SiliconANGLE