3. BIG DATA: WHY NOW?
x2: digital data globally doubles every two years [1]
Figures shown: 90%, 10-50%, 70%, >30%, 85%
Statements shown:
• of Top 500 enterprises will fail to exploit Big Data [2]
• cost reduction in production through Big Data exploitation [4]
• of all IT investment in 2015 will be Big Data driven [2]
• of all data is unstructured and cannot be handled with traditional analytics tools [1]
• of enterprises have no formal concept for data management [5]
[1] IDC Predictions 2012
[2] Gartner, Predicts 2012
[3] Wikibon 2012, Big Data Market Size and Vendor Revenues
[4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity
[5] Economist Intelligence Unit 2011, Big data: Harnessing a game-changing asset
T-Systems | Big Data
14.11.2013
2
4. THE BI ECOSYSTEM ACCORDING TO FORRESTER
5. THE 2012 GARTNER HYPE CYCLE FOR BIG DATA
IN-MEMORY ANALYTICS APPROACHING MAINSTREAM ADOPTION
7. HADOOP VS IN-MEMORY ANALYTICS
In-memory analytics (IMA) is the Ferrari: sexy and very fast, but with limited luggage space.
Hadoop (with Impala) is a fleet of MPVs: good performance and capacity, easy to drive, affordable.
Hadoop (without Impala) is a fleet of long-haul trucks: moderate performance and excellent capacity, but it needs a specialist driver’s license and drives overnight.
How fast do you want your delivery made? What is being delivered? How much do you want to spend? Do you have specialist drivers?
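The four questions above can be read as a small decision procedure. The following is a minimal sketch only: the function name, the boolean inputs and the order of the checks are assumptions made for illustration, not rules stated on the slide.

```python
from enum import Enum


class Platform(Enum):
    IN_MEMORY = "in-memory analytics (the Ferrari)"
    HADOOP_WITH_IMPALA = "Hadoop with Impala (the MPV fleet)"
    HADOOP_BATCH = "Hadoop without Impala (the long-haul trucks)"


def suggest_platform(needs_fast_delivery: bool,
                     fits_in_memory: bool,
                     can_afford_premium: bool,
                     has_specialist_drivers: bool) -> Platform:
    """Hypothetical mapping of the slide's four questions to a platform."""
    # Ferrari: only when speed matters, the load is small and the budget allows.
    if needs_fast_delivery and fits_in_memory and can_afford_premium:
        return Platform.IN_MEMORY
    # Trucks: overnight delivery is acceptable and specialists are available.
    if not needs_fast_delivery and has_specialist_drivers:
        return Platform.HADOOP_BATCH
    # MPVs: the affordable, easy-to-drive middle ground for everything else.
    return Platform.HADOOP_WITH_IMPALA


if __name__ == "__main__":
    print(suggest_platform(True, False, False, False).value)
```

In practice the answers are rarely plain yes/no, so a sketch like this is best treated as a conversation starter rather than a sizing tool.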
Some Hadoop Improvements
• Thanks to its ecosystem of contributors and distributions, Hadoop keeps getting easier to use, e.g. Cloudera’s Impala, Microsoft’s HDInsight, MapR’s Drill, Hortonworks’ Stinger Initiative
• With Cloudera’s Hadoop offering, when you buy the trucks they throw in the MPVs for free
• Hadoop 2.0 brings YARN, graph analysis and stream processing
• With the speed of improvements in HDFS/HBase/Hive/YARN, the gap between batch and real-time/low-latency processing is going to narrow fairly soon, e.g. from Hive 0.10 to 0.11, the new ORC file format (the successor to RCFile) brings a performance boost of >10x
8. HADOOP INNOVATION #1: MUCH CHEAPER STORAGE
SAN Storage
• Cost: $2 - $10/Gigabyte
• $1M gets: 0.5 Petabytes, 200,000 IOPS, 8 Gbyte/sec
• Software by HDS, bundled with hardware by HDS
NAS File Servers
• Cost: $1 - $5/Gigabyte
• $1M gets: 1 Petabyte, 200,000 IOPS, 10 Gbyte/sec
• Software by NetApp, bundled with hardware by NetApp
Local Storage
• Cost: <$0.50/Gigabyte
• $1M gets: 10 Petabytes, 400,000 IOPS, 250 Gbyte/sec
• Software by open source Hadoop ecosystem, hardware self-assembled
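The "$1M gets" figures above can be cross-checked against the quoted per-gigabyte prices with a few lines of arithmetic. This is a minimal sketch: the class and function names are made up for illustration, and decimal units (1 Petabyte = 1,000,000 Gigabytes) are an assumption, since the slide does not say which convention it uses.

```python
from dataclasses import dataclass

GB_PER_PB = 1_000_000  # decimal units; an assumption, not stated on the slide


@dataclass(frozen=True)
class StorageTier:
    name: str
    petabytes_per_million_usd: float  # capacity bought with $1M (from the slide)
    iops: int
    gbytes_per_sec: float

    def usd_per_gigabyte(self) -> float:
        """Implied price per gigabyte when $1M buys the stated capacity."""
        return 1_000_000 / (self.petabytes_per_million_usd * GB_PER_PB)


TIERS = [
    StorageTier("SAN Storage", 0.5, 200_000, 8),
    StorageTier("NAS File Servers", 1, 200_000, 10),
    StorageTier("Local Storage (Hadoop)", 10, 400_000, 250),
]


def capacity_ratio(a: StorageTier, b: StorageTier) -> float:
    """How many times more capacity tier a buys than tier b for the same money."""
    return a.petabytes_per_million_usd / b.petabytes_per_million_usd


if __name__ == "__main__":
    for tier in TIERS:
        print(f"{tier.name}: ${tier.usd_per_gigabyte():.2f}/GB")
```

Under these assumptions the implied prices sit at the low end of the SAN and NAS ranges and below the $0.50 bound for local storage, so the slide's two sets of numbers are consistent with each other.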
9. HADOOP INNOVATION #2: STORE FIRST, QUESTIONS LATER
Technology Solution: Legacy BI
• Business Problem: Backward-looking analysis, using data out of business applications
• Selected Vendors: SAP Business Objects, IBM Cognos, MicroStrategy
• Data Type/Scalability: Structured; limited (2 – 3 TB in RAM)
Technology Solution: High performance BI
• Business Problem: Quasi-real-time, in-memory analysis, using data out of business applications
• Selected Vendors: Oracle Exadata, SAP HANA
• Data Type/Scalability: Structured; limited (1 PB in RAM)
Technology Solution: "Hadoop" Ecosystem
• Business Problem: Batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources
• Selected Vendors: Cloudera Hadoop, Hortonworks Hadoop, Microsoft Hadoop
• Data Type/Scalability: Structured or unstructured; quasi unlimited (20 – 30 PB)
• Complex Event Processing
10. GARTNER HYPE CYCLE FOR ANALYTIC APPLICATIONS
A GREAT STARTING POINT FOR BI AND BIG DATA USE CASES
11. IMPLEMENTING HADOOP TO GENERATE PROFIT
SELECTED USE CASES
Intelligent News Discovery
• Research and analysis of video, audio and online print
• Semantic analyses and results visualization
Security Analytics
• Print queue analysis for confidential and/or sensitive documents
• Email analysis
• Comprehensive monitoring of unlimited data volumes and types
Metro Traffic Diagnostics
• Analysis of traffic situations
• Improved planning and local resident satisfaction
• Big event optimisation
Efficient Fleet Management
• Driving tips for drivers
• Competitive advantage thanks to cost reductions
• Lower fuel consumption and CO2 emissions
• Better planning of routes and cargo loads
Smarter Energy Management
• Optimized use of resources for all energy sources
• Future utilisation forecasts
• Feeds into customer-specific pricing
Campaign Analytics
• Monitoring of marketing campaigns
• Consideration of all sources and formats
• Efficient campaign management
Smarter Procurement
• Transparency across all suppliers and prices
• Stronger negotiating position in purchasing
• Efficient cashflow management
12. HADOOP USE CASES BY BUSINESS FUNCTION
Business functions: Marketing & Sales; Product Development & Research; Product Service & Support; Distribution & Logistics; Finance & Controlling
Use cases:
• Online Marketing Campaign Optimization
• Using Online Forums for Product Development & Sentiment Analysis
• Production Optimization using Sensor Data and Machine 2 Machine Communication
• Supply Chain Optimization controlling own and OEM production capacity
• Customer Individual Discounts for products on websites and call centers (multi factor, real time)
• Predictive Maintenance & Prediction (Combat unwanted production stops)
• Truck transportation optimization (transport order navigational data, combined with traffic data)
• Financial Simulation and Scenario Calculations
• Production Planning for Seasonal Goods (multi factor)
• Road Charge Optimization (real time adaptation of fees according to current traffic)
• Big Data for Point of Sales Optimization/Cross Selling
• Big Data for Point of Sales Optimization/Cross Selling
• Competitive Analysis using Online Press, Social Media with Scraping and Text Analysis
• Social Media Usage for Macro/Micro Trend analysis
• Massive Parallel Processing for Drug Testing in Pharma
• CERN number crunching for test data (40GB/sec)
• Financial Simulation and Scenario Calculations
• Online Fraud Detection (Credit Card transactions, etc.)
• Risk Controlling (Market Risk/Value at Risk)
• Customer Churn Analysis for Prepaid Telco business (behavior based)
• Detection of unknown financial risk (e.g. for real estate loans)
• Optimize Target Group Marketing for online banking based on trading/depot transactions
13. WHAT ARE THE PRE-REQUISITES FOR AN EFFECTIVE VALUE
DERIVED FROM HADOOP?
Foundation is a Data Strategy
• Map Data to Business Value – which data is required to deliver on a value statement or answer a fundamental business question?
• Categorise critical vs non-critical Data – critical data is not only the data identified in the Business Value question above, but also data that could or should have long-term (potential) value and is typically used across multiple business processes or a value chain. Master Data Management is a key activity here
• Define your Data Ecosystem – not only the technology but also the processes and responsibilities matched to roles, and three core capabilities: data, insight and action
• Data Governance
  • Define the appropriate Data Roles in the organisation: the governance structure must be federated, with a central governing body addressing the most important, common data, while most of the data is managed locally in the lines of business
  • Improve Data Quality
  • Improve Data Accessibility
14. SOME NEW ROLES IN DATA/ANALYTICS
THE COMING OF AGE OF DATA IN THE ENTERPRISE
The Data Scientist
The Chief Data Officer
Data Hygienist/Data Steward
Data Explorer
Business Solution Architect/Domain Expert
Campaign Expert
Data Security Officer
50% Big Data talent gap expected until 2018 [4]
[4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity
15. MANY ORGANISATIONS RESEMBLE THIS TODAY
HOW DOES HADOOP COMPLEMENT EXISTING INVESTMENTS IN
BUSINESS INTELLIGENCE?
[Diagram: the existing BI stack]
• Business Intelligence Tools and analytical applications: Reporting, Dashboard, OLAP, Data Mining
• Data Warehouse Appliance, Data Mart, Cube
• Data integration ETL
• Existing data sources: Transactional OLTP DBMS; Business Applications (ERP, CRM, etc.)
16. HADOOP COMPLEMENTS EXISTING BI INVESTMENT
[Diagram: the existing BI stack extended with Hadoop]
• Business Intelligence Tools and analytical applications: Reporting, Dashboard, OLAP, Data & Text Mining, Predictive Analytics, Complex event processing
• Structured and unstructured data
• Data Warehouse Appliance, Data Mart, Cube
• Operational Intelligence: real-time data processing and analysis
• Static data; flowing data
• Data integration ETL
• Existing data sources: Transactional OLTP DBMS; Business Applications (ERP, CRM, etc.)
• New data sources: Hadoop, NoSQL, Log-Data, Cloud, SaaS
17. HOW USE CASE SEGMENTATION DRIVES SOLUTION DESIGN
AND TECHNOLOGY SELECTION
USE CASE and POTENTIAL TOOL
• Real-time reporting of SAP OLTP data, including joins and data transformations
  Potential tool: SAP HANA
• Summarise unstructured data logs (scheduled)
  Potential tool: HADOOP MAP/REDUCE
• Real-time reporting of summarised data logs, with joins to other non-OLTP data
  Potential tool: IMPALA
• Near real-time reporting of social media data
  Potential tool: IMPALA + HADOOP MAP/REDUCE (scheduled to collect recent social media data)
• Real-time reporting of recent OLTP data joined with recent social media data
  Potential tool: HANA + HADOOP MAP/REDUCE (scheduled to collect recent social media data and load into HANA)
• Image analysis processing (scheduled)
  Potential tool: HADOOP MAP/REDUCE (scheduled job runs sophisticated analysis of video files and stores results in a structured file)
• Image analysis reporting
  Potential tool: IMPALA (to report on results file)
• Predictive analysis reporting (comparing OLTP and non-OLTP data)
  Potential tool: HANA + HADOOP MAP/REDUCE (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
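The segmentation above is essentially a lookup from use case to candidate tools. A minimal sketch of that idea follows; the dictionary keys, the `ToolChoice` type and the `potential_tools` function are hypothetical names chosen for illustration, while the tool assignments are taken from the slide.

```python
from typing import NamedTuple, Tuple


class ToolChoice(NamedTuple):
    tools: Tuple[str, ...]
    scheduled_job: bool  # True when the slide mentions a scheduled MapReduce job


# Keys are made-up identifiers; values mirror the slide's use case list.
TOOL_BY_USE_CASE = {
    "realtime_oltp_reporting": ToolChoice(("SAP HANA",), False),
    "summarise_unstructured_logs": ToolChoice(("Hadoop MapReduce",), True),
    "realtime_reporting_on_log_summaries": ToolChoice(("Impala",), False),
    "near_realtime_social_media_reporting":
        ToolChoice(("Impala", "Hadoop MapReduce"), True),
    "realtime_oltp_joined_with_social_media":
        ToolChoice(("SAP HANA", "Hadoop MapReduce"), True),
    "image_analysis_processing": ToolChoice(("Hadoop MapReduce",), True),
    "image_analysis_reporting": ToolChoice(("Impala",), False),
    "predictive_analysis_reporting":
        ToolChoice(("SAP HANA", "Hadoop MapReduce"), True),
}


def potential_tools(use_case: str) -> ToolChoice:
    """Return the candidate tools for a use case segment, or fail loudly."""
    try:
        return TOOL_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"no segment defined for use case {use_case!r}") from None


if __name__ == "__main__":
    choice = potential_tools("near_realtime_social_media_reporting")
    print(" + ".join(choice.tools), "| scheduled job:", choice.scheduled_job)
```

Failing on an unknown use case, rather than returning a default tool, reflects the slide's point that the segmentation should drive the technology choice and not the other way round.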
18. HOW USE CASE SEGMENTATION DRIVES SOLUTION DESIGN
AND TECHNOLOGY SELECTION
19. SUMMARY
Data Volumes are here to stay
Hadoop is getting more powerful, more realtime and easier to use
Hadoop is not your Big Data answer – it is part of your BI and Big Data ecosystem
An Enterprise Data Strategy and Data Governance are critical to success
Make sure you have two conversations in your enterprise
• A Business Conversation about the business value of your BI Ecosystem
• An IT Conversation to ensure your IT Organisation understands the new world of BI and the shortcomings, strengths and roles of the component technologies
“What matters is how — and why — vastly more data leads to vastly greater value creation.
Designing and determining those links is typically in the province of top management”
but needs to be facilitated by the IT Organisation in Business terms
20. A PARTING THOUGHT
HADOOP (AND BIG DATA) IS 4 Vs, NOT JUST 3
ANALYTICS creates VALUE
Value comes from knowing more than the rest
22. AGENDA
Where are we with Big Data and Hadoop at the end of 2013?
What is the disruptive innovation in Hadoop?
What are target use cases, horizontally and telco-specific?
How do you start realizing value from Hadoop today?
What are the prerequisites for an effective value derived from Hadoop?
How does Hadoop complement existing investments in business intelligence?
How use case segmentation drives solution design and technology selection
23. LEARNING THE LANGUAGE OF BIG DATA
ZooKeeper, MATLAB, Greenplum, Talend, Ruby, Redis, Shep, InfoChimps, HBase, Jaspersoft, C++, Java, Pig, Platfora, Hive, Continuity, MapReduce, NoSQL, Aster, Hadoop, Tableau, Kafka, MongoDB, GoPivotal, Python, Nutch, Neo4j, Cassandra, Avro, Pentaho, Riak, R, Skytree, Splunk, Karmasphere Studio, HDFS, Chukwa, CouchDB, JRuby