Data Mashups
for Analytics
Bringing Everything Together for
Actionable Insights
Ben Hopkins
Sr. Product Marketing Manager, Pentaho
We Enable the Modern Data-driven Business
Modern, Cohesive Business Analytics and Data Integration Platform
• Full spectrum of analytics for all key roles
• Embeddable, cloud-ready analytics
• Broadest and deepest big data integration
Innovation Through an Open Heritage
• Open, pluggable, purpose-built for the future
• Sustained leadership in big data ecosystem
Business Momentum
• Over 1,500 commercial customers
• Over 10,000 production deployments
Agenda
① Background
② Approaches to Data Blending
③ The Role of Data Integration
④ Real World Examples & Success
Background
Much of the value from big data
will come from
“mashing up” proprietary data
with external and open data.
McKinsey Global Institute
10 IT-enabled Business Trends
for the Decade Ahead, 2013
Poll Results from pentaho.com
Background
Proportion Utilizing Unstructured Data From:
Social Media: 66%
Internet of Things: 65%
Mobile Device Data: 58%
“When individual sources include automated
and/or manual inputs, originate from disparate
systems with different architectures, and are
subject to different levels of governance, an
effective integration process is essential.”
From “Delivering Governed Data For Analytics At Scale,” Forrester Consulting, 2015
The most powerful insights come
from blending data on demand
and at the source
On Demand and At the Source
Architected & Trusted Approach
• Designed with full knowledge of
underlying systems and constraints
• Utilize most efficient point of
processing
• Provide fast access, avoid
unnecessary staging
• Maintain governance rules
• Preserve semantics, auditability
Where Does Data Integration Add Value?
Business Intelligence
and Data Warehousing
“Effective decisions depend on aggregated,
calculated, and time-series data values in a DW
– data and data structure that wouldn’t exist
without data integration”
Builds New and
Valuable Data Sets
“Similar to a value-adding process in
manufacturing, DI collects raw material (data
from source systems) and assembles it into a
product (new data sets)”
360-Degree Views of
Business Entities
“Success in sales and service often depends on
complete views of each customer, which are
typically assembled with data integration tools
and techniques”
From “Ten Ways Data Integration Provides Business Value,” Philip Russom, TDWI, 2011
Data Readiness Checklist
Do I Need Data Integration Capabilities?
1. Do I need to blend several different data sources?
2. Is my data cleansed and modeled?
3. Do I want to enrich my data with new data sources?
4. Have I already captured all the data I need?
5. Will my data sources change in 6, 12, or 18 months?
6. Do I need ad-hoc and drill-down analytic capabilities?
All Signs Point to
Data Integration
Data Blending
Examples &
Success Stories
Blending Web Analytics and Support Data
Business Question:
Am I supporting all of the right browsers for my web app?
Blended by
region and
browser
Software Product
Manager
Google Analytics
web visits via API
Flat file of historical
product support
requests
Android visits,
but we don’t
support yet
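The blend on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not how Pentaho implements it: the sample rows, field names, and function names are all hypothetical, and in practice the visits would come from the Google Analytics API and the requests from the flat file. It shows a full outer join on the (region, browser) key, then lists browsers that have traffic but no support history.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the two sources on the slide.
web_visits = [
    {"region": "EMEA", "browser": "Chrome", "visits": 1200},
    {"region": "EMEA", "browser": "Android", "visits": 300},
    {"region": "APAC", "browser": "Chrome", "visits": 800},
    {"region": "APAC", "browser": "Android", "visits": 450},
]
support_requests = [
    {"region": "EMEA", "browser": "Chrome"},
    {"region": "EMEA", "browser": "Chrome"},
    {"region": "APAC", "browser": "Chrome"},
]


def blend_by_region_and_browser(visits, requests):
    """Full outer join of visit totals and request counts on (region, browser)."""
    blended = defaultdict(lambda: {"visits": 0, "requests": 0})
    for row in visits:
        blended[(row["region"], row["browser"])]["visits"] += row["visits"]
    for row in requests:
        blended[(row["region"], row["browser"])]["requests"] += 1
    return dict(blended)


def browsers_with_visits_but_no_requests(blended):
    """Browsers that show traffic somewhere but no support requests there."""
    return sorted({browser for (region, browser), totals in blended.items()
                   if totals["visits"] > 0 and totals["requests"] == 0})


blended = blend_by_region_and_browser(web_visits, support_requests)
for (region, browser), totals in sorted(blended.items()):
    print(region, browser, totals["visits"], totals["requests"])
print(browsers_with_visits_but_no_requests(blended))
```

An outer join is used here so that a browser present in only one source still appears in the result; an inner join would hide exactly the gap the product manager is looking for.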
Blending Machine and Production Data
Business Question:
What facility temperature is optimal for manufacturing output?
HVAC sensor data in
Hadoop accessed
via Hive
Production quotas
and actuals from
data warehouse
Blended by
facility and
time
Operations
Manager
Cold temperature ranges associated with
higher production across almost all facilities
See detailed mashup videos:
pentaho.com/blend-of-the-week
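The facility-and-time blend above can be sketched as follows. This is an illustrative outline in plain Python under stated assumptions, not the Pentaho transformation itself: the sample readings, the 68-degree cut-off between 'cold' and 'hot', and every function name are hypothetical. It averages sensor readings per facility and hour, joins them to production rows on that key, and compares quota attainment (actual divided by quota) across the two broad temperature bands.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

COLD_BELOW = 68.0  # hypothetical cut-off between the 'cold' and 'hot' bands

# Hypothetical rows: (facility, timestamp, temperature) from the HVAC sensors.
hvac_readings = [
    ("Plant A", "2016-03-01T08:05", 64.0),
    ("Plant A", "2016-03-01T08:35", 66.0),
    ("Plant A", "2016-03-01T09:10", 74.0),
    ("Plant B", "2016-03-01T08:20", 72.0),
    ("Plant B", "2016-03-01T09:40", 65.0),
]
# Hypothetical rows: (facility, hour, quota, actual) from the data warehouse.
production = [
    ("Plant A", "2016-03-01T08:00", 100, 110),
    ("Plant A", "2016-03-01T09:00", 100, 90),
    ("Plant B", "2016-03-01T08:00", 80, 72),
    ("Plant B", "2016-03-01T09:00", 80, 88),
]


def hour_bucket(timestamp):
    """Truncate an ISO timestamp string to the start of its hour."""
    return datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)


def hourly_avg_temp(readings):
    """Average temperature per (facility, hour)."""
    grouped = defaultdict(list)
    for facility, timestamp, temperature in readings:
        grouped[(facility, hour_bucket(timestamp))].append(temperature)
    return {key: mean(values) for key, values in grouped.items()}


def attainment_by_band(readings, production_rows):
    """Inner join on (facility, hour), then mean actual/quota per temperature band."""
    temps = hourly_avg_temp(readings)
    ratios = defaultdict(list)
    for facility, hour, quota, actual in production_rows:
        key = (facility, hour_bucket(hour))
        if key in temps:  # production hours without sensor data are dropped
            band = "cold" if temps[key] < COLD_BELOW else "hot"
            ratios[band].append(actual / quota)
    return {band: mean(values) for band, values in ratios.items()}


print(attainment_by_band(hvac_readings, production))
```

Bucketing both sources to the hour before joining matters because sensors report at irregular times while production figures are recorded per period; without a shared grain the keys would rarely match.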
Caterpillar
Delivering a 360-Degree View of Equipment
Business Challenge
• Identify opportunities for maintenance
and fuel savings in industrial equipment
operations
• Predict equipment breakdowns to avoid
downtime
• Extend fleet-level insights to equipment
operators
Caterpillar
Delivering a 360-Degree View of Equipment
Pentaho Benefits
• Blend sensor data with customer data
and more into unified analytics service
• Operationalize predictive ‘useful life’
models in the data workflow
• Provide a revenue-generating offering
to customers that drives substantial
fuel and maintenance savings
Entity 360 Marine Asset Intelligence
Business User (COO)
Reporting on
Operations and
Efficiency
End Users
Dashboards and
Reports on Machine
Performance
Business
Analytics
Server
Data
Marts
Data Scientist
Data Mining and
Predictive Data
Governance
Local Machine
and Server
Data
Fleet Data via
Satellite
Cross
Department
Operations Data
Data
Integration
Data
Integration
British Telecom
Protecting Against Cyber Threats
Business Challenge
• Launch new service to market: BT
Assure Cyber, an enterprise
solution for cyber security insights
across many data types
• Previously BT Assure Cyber could
only integrate relational data
sources and not big data sources
British Telecom
Protecting Against Cyber Threats
Pentaho Benefits
• Native support for Hadoop in an
enterprise environment
• Ability to integrate telemetry data
from sensors, security controls and
advanced detection tools
• Reduced detection time of cyber
threats from weeks to seconds
In Closing
Next Steps
• Explore more mashup examples:
www.pentaho.com/blend-of-the-week
• Take a look at Pentaho in the 2016 Gartner
Business Analytics Magic Quadrant
Key Takeaways
• Teams are taking “data mashups” to new heights
• Blend data on demand and at the source
• Data integration can maximize analytic value
Questions
and Discussion
Thank You


Editor's Notes

  • #3 Pentaho is an end-to-end business analytics and data integration platform. In particular, a major area of focus for our platform has been helping customers integrate Big Data sources into their architecture and analyze that blend of traditional and emerging data. At the same time, we also provide analytics software that is highly embeddable, in that it fits seamlessly into existing applications and processes. A lot of our success in these areas has been due both to early innovation in Big Data – I believe we’ve been working with Hadoop for 5 to 6 years – and to the open architecture and standards that our platform is built on.
  • #10 On-demand & at the source
  • #11 Example: Quality of service changes in real time depending on the network: was the customer able to connect, to hear, and to remain connected without being dropped? You can easily create architected, blended views across both the traditional Call Detail Records in the warehouse and the network data streaming into a Big Data/NoSQL store (MongoDB in this example) without sacrificing the governance or performance you expect. These blended views give analysts and customer call centers accurate, up-to-the-minute information in near real time to determine the best action to take.
  • #15 Understand browser usage for my products: what direction should we take on supported browsers? Blend two sources: Google Analytics (visitors by region and browser) and Support (historical support requests by browser). What browsers do people use to come to our website, vs. what browsers are we getting support requests on?
  • #16 Anecdotally, we heard that temperature dictates product output; is there an appropriate temperature to maximize production? We want to look at temperatures by building on an hourly basis. The corporate data warehouse holds production quotas and actuals. Does temperature correlate with output? Hadoop has the HVAC data, exposed via Hive (a relational layer on Hadoop); the corporate data warehouse is read from the table directly. Connect to this transformation from Pentaho and auto-generate a model. We are looking at broad ranges: ‘hot’ and ‘cold’.
  • #20 BT Assure Cyber offers comprehensive and fully integrated cyber security for large organisations with complex security needs, including the UK Ministry of Defence and other corporate and government customers. UK Government Communications HQ (GCHQ) and other sources claim cyber breaches cost the global economy hundreds of billions of dollars annually. Previously BT Assure Cyber could only integrate relational data sources, which meant that its customers’ big data sources were not being harvested to detect potential security breaches. By embedding Pentaho, event data and telemetry from a rich variety of data sources, including business systems, sensors, traditional security controls and advanced detection tools, are all integrated and analysed. Incidents that previously would have taken days or weeks to investigate and respond to can now be identified and acted upon immediately.
    The Disruptive Insight: BT knew that data variety and unpredictability was a mounting problem, but as an ‘Oracle shop’ with a relational heritage, they could not see an obvious way to solve it. Customers all have different data landscapes depending on their individual needs, and BT must cater to all. BT started to tackle this by introducing the Hadoop framework. However, relying on Hadoop’s native, immature tools like Sqoop, Flume, Oozie and Kafka to integrate and analyse the data proved massively difficult, time-consuming and risky. Pentaho convinced BT that its big data integration and analytics platform, combined with a metadata approach, was the only way to reliably assimilate and blend data from so many different and unpredictable sources. BT recognised that Pentaho’s visual tools provided much faster time to value with lower risk, requiring fewer specialised development resources than native Hadoop tools.
    Why the Customer Chose Pentaho: Pentaho’s ability to handle data variety and uncertainty in a range of customer scenarios was central to winning this deal. Pentaho successfully proved its viability and offered references in many other high-stakes, complex and secure customer use cases in industries like financial services and energy.
    Getting the Deal Done: The single most important factor was Pentaho’s native support for Hadoop in an enterprise environment. Combined with Pentaho’s commercial flexibility to licence the platform in line with how BT charges its customers (by data volumes ingested), the ROI case finally became too compelling to resist.