Big Data for BI - Beyond the Hype - Pentaho
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
BI for Big Data
Beyond the Hype
Pentaho Mission
The Future of Analytics: Big Data Exploration without Boundaries
Modern, unified data integration and business
analytics platform
• Native integration into big data ecosystem
• Embeddable, cloud-ready analytics
Fast and Broad Innovation
• Open source development model
Critical mass achieved
• Over 1,000 commercial customers
• Over 10,000 production deployments
Ian Fyfe
Big Data Solutions Engineering, Pentaho
Ian brings over 20 years of experience in the business analytics software market
with roles spanning consulting services, pre-sales engineering, product
management and product marketing. Ian started his career by co-founding a
business intelligence startup and has worked at Business Objects, Informix,
Epiphany, PeopleSoft and Jaspersoft.
Common Use Cases
The Value of Big Data for our Customers
Big opportunities
Improve operational effectiveness
• Machines/sensors: predict failures, network attacks
• Financial risk management: reduce fraud, increase security
Reduce data warehouse cost
• Integrate new data sources without increased database cost
• Provide online access to ‘dark data’
Drive incremental revenue
• Predict customer behavior across all channels
• Understand and monetize customer behavior
Example Use Cases Today
Transactional
• Fraud detection
• Financial services / stock markets
Sub-Transactional
• Weblogs
• Social/online media
• Telecoms events
Non-Transactional
• Web pages, blogs, etc.
• Documents
• Physical events
• Application events
• Machine events
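Weblogs, one of the sub-transactional sources listed above, are only loosely structured, and each entry usually has to be parsed and enriched (for example, with metadata about the URL and the user's location) before it is useful for analysis. The sketch below is a minimal Python version of that step. It assumes Apache Common Log Format input. The `URL_METADATA` and `IP_LOCATION` tables and the `parse_and_enrich` function are hypothetical stand-ins for a product catalog and a geo-IP service, not part of any Pentaho or Hadoop API.

```python
"""Parse web server log lines and enrich them with URL and location metadata.

Assumes Apache Common Log Format. The lookup tables are hypothetical stand-ins
for a product catalog and a geo-IP service.
"""
import re
from typing import Dict, Optional

# host ident authuser [timestamp] "METHOD path PROTOCOL" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical metadata about URLs (e.g. which book a page describes).
URL_METADATA: Dict[str, Dict[str, str]] = {
    "/books/123": {"page_type": "product", "title": "Hadoop Basics"},
    "/books/123/reviews": {"page_type": "reviews", "title": "Hadoop Basics"},
}

# Hypothetical IP-prefix-to-country table standing in for a geo-IP lookup.
IP_LOCATION: Dict[str, str] = {"203.0.113.": "GB", "198.51.100.": "FR"}


def locate(ip: str) -> str:
    # Return the country of the first matching prefix, or "unknown".
    for prefix, country in IP_LOCATION.items():
        if ip.startswith(prefix):
            return country
    return "unknown"


def parse_and_enrich(line: str) -> Optional[Dict[str, str]]:
    match = CLF_PATTERN.match(line)
    if match is None:
        # Badly structured lines are common in weblogs; skip rather than fail.
        return None
    record = match.groupdict()
    # Join with URL metadata, defaulting to a generic page type.
    record.update(URL_METADATA.get(record["path"], {"page_type": "other"}))
    record["country"] = locate(record["ip"])
    return record


if __name__ == "__main__":
    line = ('203.0.113.7 - - [10/Oct/2013:13:55:36 +0000] '
            '"GET /books/123/reviews HTTP/1.1" 200 5120')
    print(parse_and_enrich(line))
    print(parse_and_enrich("garbled entry"))  # None
```

Skipping unparseable lines, instead of raising an error, keeps a batch job running over billions of entries. A production pipeline would probably count or quarantine those lines instead of silently dropping them.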
Click Stream Analytics
From buying patterns to revenue
Business Challenge
• Monetize buying patterns hidden in billions of
data points
• Quickly analyze multi-channel click stream data
Pentaho Benefits
• Reduced ETL time to analyze blended data
from Hadoop, HBase & data warehouse
• Use of big data analytics to grow revenue from
targeted campaigns
Device Data Analytics
Big Data for Fortune 100 Enterprise Storage provider
Business Challenge
• Affordably scale machine data from storage
devices for customer support app
• Predict device failure
• Enhance product performance
Pentaho Benefits
• Easy-to-use ETL & analysis for Hadoop, HBase
& Oracle data sources
• 15x cost improvement
• Stronger performance against customer SLAs
Healthcare
Embedded Pentaho to improve
patient care & compliance through
analysis of unstructured digital pen
data stored in CouchDB
Online Retailer
Understanding the buying patterns
of 5 million users from click stream
data stored in Hadoop & HBase
Gaming
Better monetization of premium
game features through analyzing
large volumes of player data
stored in MongoDB & Infobright
Social Commerce
Better campaign performance
through monitoring social media,
page clicks and email marketing
data stored in HP Vertica
Travel & Entertainment
Helping thousands of travel
partners like expedia.co.uk and
thomascook.fr improve promotional
targeting using HBase and Hadoop
Mobile & Digital Media
Embedded Pentaho to measure
massive volumes of mobile and
event data generated by mobile
devices and stored in MongoDB
Innovative Organizations Use Pentaho
to Unlock Value from Big Data Stores
Pentaho Embedded Analytics
New Revenue Stream in Eight Weeks
Business Challenge
• Gain new revenue source from add-on
module with reporting, analysis & dashboards
• Get to market fast to differentiate
Pentaho Benefits
• Easy to embed & brand
• Broad capabilities result in new revenue stream
• Increased functionality & compelling
visualizations
Embedded Analytics
Pentaho Uniquely Positioned to Win
Dashboard Framework
Dashboard Designer
Why We Win in Embedded:
• Architectural ‘sweet spot’ for Pentaho
platform
• Flexible pricing, adaptable to fit partner
pricing
• Open source and innovation
• Fastest time-to-market for embedded
analytics
Continued Leadership:
• Cloud & multi-tenancy ease-of-use
• Simplified REST services for ISVs
• BI Platform SDK enhancements – deep
solution examples, tutorials and training
• Continued focus on standards and
extensibility
Big Data Technologies
BI Strengths and Weaknesses
The Current Solutions
[Chart: Gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; annotated 10%]
Current database solutions are designed for
structured data.
• Optimized to answer known questions quickly
• Schemas dictate form/context
• Difficult to adapt to new data types and new
questions
• Expensive at petabyte scale
Main Big Data Technologies
Hadoop
• Low-cost, reliable
scale-out architecture
• Distributed computing
• Proven success in
Fortune 500 companies
• Exploding interest
NoSQL Databases
• Huge horizontal scaling
and high availability
• Highly optimized for
retrieval and appending
• Types
• Document stores
• Key Value stores
• Graph databases
Analytic RDBMS
• Optimized for bulk-load
and fast aggregate
query workloads
• Types
• Column-oriented
• MPP
• In-memory
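The column-oriented analytic databases listed above are described as "optimized for bulk-load and fast aggregate query workloads". The minimal Python sketch below shows why column orientation helps with aggregate queries. It stores the same hypothetical sales records two ways, row by row and column by column. An aggregate over one field only needs that field's column. In a real column store, that means reading far less data from disk. Python lists don't reproduce the I/O saving; they only show the layout. All names and data here are illustrative.

```python
"""Toy comparison of row-oriented and column-oriented layouts.

Hypothetical data; illustrates the layout idea behind column stores,
not the storage engine of any particular analytic database.
"""
from typing import Dict, List, Tuple

# Row-oriented: each record is stored together, as in a typical OLTP database.
Row = Tuple[str, str, float]  # (region, product, revenue)

rows: List[Row] = [
    ("EMEA", "disk", 120.0),
    ("APAC", "disk", 80.0),
    ("EMEA", "tape", 45.5),
    ("AMER", "disk", 200.0),
]

# Column-oriented: each attribute is stored as its own contiguous array.
columns: Dict[str, list] = {
    "region": [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "revenue": [r[2] for r in rows],
}


def total_revenue_row_store(data: List[Row]) -> float:
    # Every full record is visited even though only one field is needed.
    return sum(record[2] for record in data)


def total_revenue_column_store(cols: Dict[str, list]) -> float:
    # Only the "revenue" column is touched; in a real column store the
    # other columns' blocks would not be read at all.
    return sum(cols["revenue"])


def revenue_by_region_column_store(cols: Dict[str, list]) -> Dict[str, float]:
    # A GROUP BY aggregate reads just the two columns it references.
    totals: Dict[str, float] = {}
    for region, revenue in zip(cols["region"], cols["revenue"]):
        totals[region] = totals.get(region, 0.0) + revenue
    return totals


if __name__ == "__main__":
    print(total_revenue_row_store(rows))        # 445.5
    print(total_revenue_column_store(columns))  # 445.5
    print(revenue_by_region_column_store(columns))
```

Both layouts give the same answer. The difference is how much data has to be scanned to get it. Analytic databases also compress each column, which works well because the values in a column share a type and often repeat.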
Hadoop Core Components
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
❯ Massive redundant storage across a commodity
cluster
MAPREDUCE
❯ Map: distribute a computational problem
across a cluster
❯ Reduce: collect the answers to all the
sub-problems and combine them
MANY DISTROS AVAILABLE
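The map and reduce steps can be sketched in a few lines of Python. This is a hypothetical single-process word count, not Hadoop's Java API. The names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are illustrative. On a real cluster, map and reduce tasks run in parallel on different nodes, and the framework performs the shuffle that groups values by key between the two phases.

```python
"""In-process sketch of the MapReduce model (word count).

Hypothetical, single-machine illustration; real Hadoop jobs run map and
reduce tasks in parallel across the cluster and read input from HDFS.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: turn one input record into zero or more (key, value) pairs.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values emitted for the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine the partial results for one key into a final answer.
    return key, sum(values)


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    # Each split stands in for an HDFS block processed by its own map task.
    mapped = (pair for split in splits for line in split for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = [
        ["big data for BI", "beyond the hype"],
        ["big data platform challenges"],
    ]
    counts = run_job(splits)
    print(counts["big"], counts["data"], counts["hype"])  # 2 2 1
```

Because each reduce call only sees the values for one key, different keys can be reduced on different machines. That independence is what lets the model scale out across a commodity cluster.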
Major Hadoop Utilities
• Apache Hive: SQL-like language and metadata repository
• Apache Pig: high-level language for expressing data analysis programs
• Apache HBase: the Hadoop database; random, real-time read/write access
• Sqoop: integrating Hadoop with RDBMS
• Oozie: server-based workflow engine for Hadoop activities
• Hue: browser-based desktop interface for interacting with Hadoop
• Flume: distributed service for collecting and aggregating log and event data
• Apache Whirr: library for running Hadoop in the cloud
• Apache ZooKeeper: highly reliable distributed coordination service
Hadoop & Databases
“The working conditions can
be shocking”
ETL Developer
Big Data Platform Challenges
Challenges
1. Somewhat immature
2. Lack of tooling
3. Steep technical learning curve
4. Hiring qualified people
5. Availability of enterprise-ready products and tools
6. High latency (Hadoop)
7. Running inside the cluster
Challenges
WOULD YOU RATHER DO THIS?
Scheduling
Modeling
Ingestion / Manipulation /
Integration
… OR THIS?
Investigating
BI & Big Data Solutions
Questions to Ask
Business Drivers
1. Mandate to reduce EDW costs?
2. Clear use case that you need to solve?
3. Do you have access to technical skill set?
Technical
1. Do you have more than one kind of big data store, for example Hadoop as well as HBase,
MongoDB or Cassandra?
2. Would you prefer to use the same tool for big data stores in addition to your traditional relational
data stores?
3. Are you ok waiting minutes or even hours to access your big data?
4. Are you ok using a spreadsheet-like interface to access and analyze your data?
5. Do you need complete BI capabilities, including reporting, interactive visualization, and predictive
analytics?
6. Do you need to enrich your big data with data from outside of the big data platform?
7. Is the big data you want to analyze bigger than the amount of memory you have available?
http://blog.pentaho.com/tag/ian-fyfe/
Demo
Data Ingestion
Manipulation
Integration
Enterprise &
Ad Hoc Reporting
Data Discovery
Visualization
Predictive Analytics
Complete Big Data Analytics &
Visual Data Management
Relational, Hadoop, NoSQL, Analytic Databases
Pentaho Big Data Analytics
Open
Discussion
Thank You
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics
JOIN THE CONVERSATION. YOU CAN FIND US ON:
Editor's Notes
* Not many companies have transactional data that qualifies as Big Data. Credit card companies and financial services companies are about the only ones.
* With stock market data, we are talking about every stock trade and the bid and ask prices between the trades, for every stock on multiple markets, over a significant time period.
For many other companies, the Big Data is sub-transactional: it is the events that lead up to transactions.
* Weblogs are semi-structured or poorly structured. Consider the number of weblog entries created as you look for a book online: researching 5-10 books and reading reviews and comments. You might generate 1,000 entries and may or may not buy a book, so there are potentially lots of entries for no transaction. We also want to enrich this data with metadata about the URLs and information about the location of the user.
* In an online game or world, every interaction between participants and the system, and between participants themselves, is logged. An individual participant might generate more than 1 million events for their single monthly transaction.
* A single phone call or text message generates many events within a telecoms company.
TAKE-AWAYS
Pentaho has many big data customers across a range of industries and big data platforms.
TAKE-AWAYS
Pentaho provides complete integrated DI+BI for every leading big data platform.
Big Data solutions are not databases. They don’t provide the capabilities that BI toolsets expect of a database.
Hadoop also has high latency: even the smallest possible query takes much longer to execute than it would on a database.
Hadoop is optimized for executing very intensive data processing tasks on very large amounts of data. It is not optimized for quick queries. Some Hadoop experts recommend configuring the workloads so that Hadoop jobs take an hour or more. This conflicts with OLAP performance criteria of 5-10 seconds per query.
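Dividing the one-hour job length mentioned above by the 5-10 second OLAP target shows how large the gap is:

```latex
\frac{3600\ \mathrm{s}}{10\ \mathrm{s}} = 360
\qquad\text{and}\qquad
\frac{3600\ \mathrm{s}}{5\ \mathrm{s}} = 720
```

So a job sized the way those experts recommend runs roughly 360 to 720 times longer than a single interactive OLAP query is expected to take.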
There are database implementations within the Hadoop world, such as Hive and HBase. Unfortunately, developers who are used to working with data transformation tools find that productivity within the Hadoop environment is not what they are used to.
TAKE-AWAYS
The better choice is obviously visual development