Explore how data integration (or “mashups”) can maximize analytic value and help business teams create streamlined data pipelines that enable ad-hoc analytic inquiries. You’ll learn why businesses are increasingly focused on blending data on demand and at the source, the concrete analytic advantages this approach delivers, and the types of architecture required to deliver trusted, blended data. We provide a checklist to assess your data integration needs and capabilities, and review some real-world examples of how blending various data types has created significant analytic value and concrete business impact.
2. We Enable the Modern Data-driven Business
Modern, Cohesive Business Analytics and Data Integration Platform
• Full spectrum of analytics for all key roles
• Embeddable, cloud-ready analytics
• Broadest and deepest big data integration
Innovation Through an Open Heritage
• Open, pluggable, purpose-built for the future
• Sustained leadership in the big data ecosystem
Business Momentum
• Over 1,500 commercial customers
• Over 10,000 production deployments
4. Background
“Much of the value from big data will come from ‘mashing up’ proprietary data with external and open data.”
McKinsey Global Institute, “10 IT-enabled Business Trends for the Decade Ahead,” 2013
8. Background
Proportion Utilizing Unstructured Data From:
Social Media: 66%
Internet of Things: 65%
Mobile Device Data: 58%
“When individual sources include automated and/or manual inputs, originate from disparate systems with different architectures, and are subject to different levels of governance, an effective integration process is essential.”
From “Delivering Governed Data For Analytics At Scale,” Forrester Consulting, 2015
9. The most powerful insights come from blending data on demand and at the source
10. On Demand and At the Source
Architected & Trusted Approach
• Design with full knowledge of underlying systems and constraints
• Use the most efficient point of processing
• Provide fast access and avoid unnecessary staging
• Maintain governance rules
• Preserve semantics and auditability
11. Where Does Data Integration Add Value?
Business Intelligence and Data Warehousing
“Effective decisions depend on aggregated, calculated, and time-series data values in a DW – data and data structure that wouldn’t exist without data integration”
Builds New and Valuable Data Sets
“Similar to a value-adding process in manufacturing, DI collects raw material (data from source systems) and assembles it into a product (new data sets)”
360-Degree Views of Business Entities
“Success in sales and service often depends on complete views of each customer, which are typically assembled with data integration tools and techniques”
From “Ten Ways Data Integration Provides Business Value,” Philip Russom, TDWI, 2011
12. Data Readiness Checklist
Do I Need Data Integration Capabilities?
1. Do I need to blend several different data sources?
2. Is my data cleansed and modeled?
3. Do I want to enrich my data with new data sources?
4. Have I already captured all the data I need?
5. Will my data sources change in 6, 12, or 18 months?
6. Do I need ad-hoc and drill-down analytic capabilities?
All Signs Point to Data Integration
14. Blending Web Analytics and Support Data
Business Question:
Am I supporting all of the right browsers for my web app?
User: Software Product Manager
Data sources:
• Google Analytics web visits via API
• Flat file of historical product support requests
Blended by region and browser
Finding: Android visits, but we don’t support Android yet
15. Blending Machine and Production Data
Business Question:
What facility temperature is optimal for manufacturing output?
User: Operations Manager
Data sources:
• HVAC sensor data in Hadoop, accessed via Hive
• Production quotas and actuals from the data warehouse
Blended by facility and time
Finding: Cold temperature ranges associated with higher production across almost all facilities
See detailed mashup videos: pentaho.com/blend-of-the-week
16. Caterpillar
Delivering a 360-Degree View of Equipment
Business Challenge
• Identify opportunities for maintenance and fuel savings in industrial equipment operations
• Predict equipment breakdowns to avoid downtime
• Extend fleet-level insights to equipment operators
17. Caterpillar
Delivering a 360-Degree View of Equipment
Pentaho Benefits
• Blend sensor data with customer data and more into a unified analytics service
• Operationalize predictive ‘useful life’ models in the data workflow
• Provide a revenue-generating offering to customers that drives substantial fuel and maintenance savings
18. Entity 360 Marine Asset Intelligence
[Architecture diagram: local machine and server data, fleet data via satellite, and cross-department operations data pass through data integration into data marts and a business analytics server, under data governance. Consumers: a business user (COO) reporting on operations and efficiency; end users viewing dashboards and reports on machine performance; a data scientist doing data mining and predictive work.]
19. British Telecom
Protecting Against Cyber Threats
Business Challenge
• Launch a new service to market: BT Assure Cyber, an enterprise solution for cyber security insights across many data types
• Previously, BT Assure Cyber could only integrate relational data sources, not big data sources
20. British Telecom
Protecting Against Cyber Threats
Pentaho Benefits
• Native support for Hadoop in an enterprise environment
• Ability to integrate telemetry data from sensors, security controls and advanced detection tools
• Reduced detection time of cyber threats from weeks to seconds
21. In Closing
Next Steps
Explore more mashup examples:
www.pentaho.com/blend-of-the-week
Take a look at Pentaho in the 2016 Gartner Business Analytics Magic Quadrant
Key Takeaways
Teams are taking “data mashups” to new heights
Blend data on demand and at the source
Data integration can maximize analytic value
Pentaho is an end-to-end business analytics and data integration platform. In particular, a major area of focus for our platform has been helping customers integrate Big Data sources into their architecture and analyze that blend of traditional and emerging data. At the same time, we provide analytics that are highly embeddable, so they fit seamlessly into existing applications and processes. Much of our success in these areas is due both to early innovation in Big Data (I believe we’ve been working with Hadoop for 5 to 6 years) and to the open architecture and standards our platform is built on.
On-demand & at the source
Example:
Quality of service changes in real time dependent on the network: was the customer able to connect, to hear, to remain connected without being dropped, etc.?
You can easily create architected, blended views across both the traditional Call Detail Records in the warehouse and the network data streaming into a Big Data/NoSQL store (MongoDB in this example) without sacrificing the governance or performance you expect.
These blended views give analysts and customer call centers accurate, up-to-the-minute information in near real time, so they can determine the best action to take.
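The blended view described above amounts to a join between warehouse records and network events, evaluated when the question is asked rather than staged in advance. The Python sketch below illustrates only that idea under stated assumptions: the in-memory lists stand in for the warehouse and MongoDB, and the field names (`call_id`, `customer`, `dropped`) and the helper `dropped_call_summary` are hypothetical, not part of Pentaho or any real schema.

```python
# Hypothetical Call Detail Records, standing in for rows read from the warehouse.
cdrs = [
    {"call_id": 1, "customer": "A", "duration_s": 120},
    {"call_id": 2, "customer": "A", "duration_s": 15},
    {"call_id": 3, "customer": "B", "duration_s": 300},
]

# Hypothetical network quality events, standing in for documents in a NoSQL store.
network_events = [
    {"call_id": 1, "connected": True, "dropped": False},
    {"call_id": 2, "connected": True, "dropped": True},
    {"call_id": 3, "connected": True, "dropped": False},
]


def dropped_call_summary(cdrs, events):
    """Join CDRs to network events on call_id; count calls and drops per customer."""
    # Index events by call_id so each CDR lookup is a dictionary access.
    events_by_call = {event["call_id"]: event for event in events}
    summary = {}
    for cdr in cdrs:
        stats = summary.setdefault(cdr["customer"], {"calls": 0, "dropped": 0})
        stats["calls"] += 1
        event = events_by_call.get(cdr["call_id"])
        # A CDR with no matching network event counts as a call but not as a drop.
        if event is not None and event["dropped"]:
            stats["dropped"] += 1
    return summary


print(dropped_call_summary(cdrs, network_events))
# {'A': {'calls': 2, 'dropped': 1}, 'B': {'calls': 1, 'dropped': 0}}
```

Because the summary is computed from whatever the two sources hold at call time, rerunning it after new network events arrive gives a current answer without first copying the data into a separate store.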
Goal: understand browser usage for my products and decide which browsers to support.
Blend Google Analytics data: visitors by region and browser.
Blend support data: historical support requests by browser.
What browsers do people use to come to our website, and which browsers are we getting support requests on?
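The comparison in these notes can be sketched as an outer join keyed on region and browser. In this minimal Python sketch, the `visits` list stands in for a Google Analytics API response and the CSV string stands in for the flat file of support requests; all names and counts are hypothetical. Flagging a browser that brings visitors but has no support history is one possible reading of the “Android visits, but we don’t support Android yet” finding, not the method used in the original demo.

```python
import csv
import io
from collections import Counter

# Hypothetical sessions per (region, browser), standing in for a Google Analytics API response.
visits = [
    {"region": "EMEA", "browser": "Chrome", "sessions": 500},
    {"region": "EMEA", "browser": "Android Browser", "sessions": 120},
    {"region": "AMER", "browser": "Chrome", "sessions": 800},
    {"region": "AMER", "browser": "Firefox", "sessions": 200},
]

# Hypothetical flat file of historical support requests, one row per ticket.
SUPPORT_CSV = """ticket_id,region,browser
101,EMEA,Chrome
102,AMER,Firefox
103,AMER,Firefox
"""


def blend_by_region_browser(visits, support_csv):
    """Return {(region, browser): {"sessions": n, "support_requests": m}}."""
    sessions = Counter()
    for row in visits:
        sessions[(row["region"], row["browser"])] += row["sessions"]
    reader = csv.DictReader(io.StringIO(support_csv))
    requests = Counter((row["region"], row["browser"]) for row in reader)
    # Outer join: keep keys from either source; a Counter returns 0 for missing keys.
    keys = sorted(set(sessions) | set(requests))
    return {k: {"sessions": sessions[k], "support_requests": requests[k]} for k in keys}


def support_gaps(blended):
    """Browsers that have visitors but no support requests in any region."""
    visited = {browser for (_, browser), v in blended.items() if v["sessions"] > 0}
    with_requests = {browser for (_, browser), v in blended.items() if v["support_requests"] > 0}
    return sorted(visited - with_requests)


blended = blend_by_region_browser(visits, SUPPORT_CSV)
print(support_gaps(blended))  # ['Android Browser']
```

An outer join is used so that a browser appearing in only one source (traffic with no tickets, or tickets with no recent traffic) still shows up in the blended result.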
We have heard anecdotally that temperature affects production output. Is there a temperature that maximizes production?
We want to look at temperatures by building on an hourly basis.
The corporate data warehouse holds production quotas and actuals.
Does temperature correlate with output?
Hadoop holds the HVAC data, exposed via Hive (a relational layer on Hadoop); the corporate data warehouse data is read directly from a table.
Connect to this transformation from Pentaho and auto-generate a model.
Look at broad ranges: ‘hot’ and ‘cold’.
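The temperature analysis joins two sources on facility and hour, then buckets readings into broad ‘hot’ and ‘cold’ ranges. The Python sketch below shows that shape under stated assumptions: tuples stand in for rows from Hive and the data warehouse, the 20 °C cut-off and every value are invented for illustration, and `output_by_temperature_band` is a hypothetical helper, not a Pentaho function.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hourly HVAC readings (facility, hour, temperature in degrees C),
# standing in for rows queried from Hive.
hvac = [
    ("Plant1", "2016-03-01T08", 17.5),
    ("Plant1", "2016-03-01T09", 24.0),
    ("Plant2", "2016-03-01T08", 18.0),
    ("Plant2", "2016-03-01T09", 26.5),
]

# Hypothetical hourly production actuals (facility, hour, units),
# standing in for rows read from the data warehouse.
production = [
    ("Plant1", "2016-03-01T08", 110),
    ("Plant1", "2016-03-01T09", 95),
    ("Plant2", "2016-03-01T08", 210),
    ("Plant2", "2016-03-01T09", 190),
]

COLD_THRESHOLD_C = 20.0  # Hypothetical cut-off between the 'cold' and 'hot' ranges.


def output_by_temperature_band(hvac, production, threshold=COLD_THRESHOLD_C):
    """Join on (facility, hour) and average production per facility and temperature band."""
    temps = {(facility, hour): temp for facility, hour, temp in hvac}
    bands = defaultdict(list)
    for facility, hour, units in production:
        temp = temps.get((facility, hour))
        if temp is None:
            continue  # Inner join: skip hours with no matching sensor reading.
        band = "cold" if temp < threshold else "hot"
        bands[(facility, band)].append(units)
    return {key: mean(values) for key, values in sorted(bands.items())}


for (facility, band), avg_units in output_by_temperature_band(hvac, production).items():
    print(facility, band, avg_units)
```

Grouping by facility as well as by band mirrors the slide’s comparison “across almost all facilities”; with real data you would likely want more than two bands and a check on how many hours fall into each.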
BT
BT Assure Cyber offers comprehensive and fully integrated cyber security for large organisations with complex security needs including the UK Ministry of Defence and other corporate and government customers. UK Government Communications HQ (GCHQ) and other sources claim cyber breaches cost the global economy hundreds of billions of dollars annually.
Previously BT Assure Cyber could only integrate relational data sources, which meant that its customers’ big data sources were not being harvested to detect potential security breaches. By embedding Pentaho, event data and telemetry from a rich variety of data sources including business systems, sensors, traditional security controls and advanced detection tools are all integrated and analysed. Incidents that previously would have taken days or weeks to investigate and respond to can now be identified and acted upon immediately.
The Disruptive Insight
BT knew that data variety and unpredictability were a mounting problem, but as an ‘Oracle shop’ with a relational heritage, they could not see an obvious way to solve it: customers all have different data landscapes depending on their individual needs, and BT must cater to all of them. BT started to tackle this by introducing the Hadoop framework. However, relying on Hadoop’s native, immature tools like Sqoop, Flume, Oozie and Kafka to integrate and analyse the data proved extremely difficult, time-consuming and risky.
Pentaho convinced BT that its big data integration and analytics platform combined with a metadata approach was the only way to reliably assimilate and blend data from so many different and unpredictable sources. BT recognised that Pentaho’s visual tools provided much faster time to value with lower risk, requiring fewer specialised development resources than native Hadoop tools.
Why the Customer Chose Pentaho
Pentaho’s ability to handle data variety and uncertainty in a range of customer scenarios was central to winning this deal. Pentaho successfully proved its viability and offered references in many other high-stakes, complex and secure customer use cases in industries like financial services and energy.
Getting the Deal Done
The single most important factor was Pentaho’s native support for Hadoop in an enterprise environment. Combined with Pentaho’s commercial flexibility to licence the platform in line with how BT charges its customers – by data volumes ingested – the ROI case finally became too compelling to resist.