
Capturing big value in big data

Published in: Technology, Business

  1. 1. This document is offered compliments of BSP Media Group. www.bspmediagroup.com All rights reserved.
  2. 2. HADOOP Capturing Big Value in Big Data T-Systems | Big Data 14.11.2013 1
  3. 3. BIG DATA: WHY NOW?
     Headline figures shown on the slide: x2, 90%, 10-50%, 70%, >30%, 85%
     • Digital data globally doubles every two years [1]
     • Share of Top 500 enterprises that will fail to exploit Big Data [2]
     • Cost reduction in production through Big Data exploitation [4]
     • Share of all IT investment in 2015 that will be Big Data driven [2]
     • Share of all data that is unstructured and cannot be handled with traditional analytics tools [1]
     • Share of enterprises that have no formal concept for data management [5]
     Sources:
     [1] IDC Predictions 2012
     [2] Gartner, Predicts 2012
     [3] Wikibon 2012, Big Data Market Size and Vendor Revenues
     [4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity
     [5] Economist Intelligence Unit 2011, Big data: Harnessing a game-changing asset
  4. 4. THE BI ECOSYSTEM ACCORDING TO FORRESTER
  5. 5. THE 2012 GARTNER HYPE CYCLE FOR BIG DATA: IN-MEMORY ANALYTICS APPROACHING MAINSTREAM ADOPTION
  6. 6. POSITIONING HADOOP, NOVEMBER 2013: HADOOP APPROACHING MAINSTREAM ADOPTION
  7. 7. HADOOP VS IN-MEMORY ANALYTICS
     • In-memory analytics (IMA) is the Ferrari: sexy, very fast, but with limited luggage space.
     • Hadoop (with Impala) is a fleet of MPVs: good performance and capacity, easy to drive, affordable.
     • Hadoop (without Impala) is a fleet of long-haul trucks: moderate performance, excellent capacity, needs a specialist driver's license and drives overnight.
     Questions to ask: How fast do you want your delivery made? What is being delivered? How much do you want to spend? Do you have specialist drivers?
     Some Hadoop improvements
     • With its ecosystem of contributors and distributions, Hadoop is becoming steadily easier to use, e.g. Cloudera's Impala, Microsoft's HDInsight, MapR's Drill, Hortonworks' Stinger Initiative.
     • With Cloudera's Hadoop offering, when you buy the trucks they throw in the MPVs for free.
     • Hadoop 2.0 brings YARN, graph analysis and stream processing.
     • With the speed of improvements in HDFS/HBase/Hive/YARN, the gap between batch and real-time/low-latency processing is going to narrow fairly soon, e.g. from Hive 0.10 to 0.11, with the new ORC file format, there is a performance boost of >10x.
  8. 8. HADOOP INNOVATION #1: MUCH CHEAPER STORAGE
     • SAN Storage
       - Cost: $2 - $10/Gigabyte
       - $1M gets: 0.5 Petabytes, 200,000 IOPS, 8 Gbyte/sec
       - Software by HDS, bundled with hardware by HDS
     • NAS File Servers
       - Cost: $1 - $5/Gigabyte
       - $1M gets: 1 Petabyte, 200,000 IOPS, 10 Gbyte/sec
       - Software by NetApp, bundled with hardware by NetApp
     • Local Storage
       - Cost: <$0.50/Gigabyte
       - $1M gets: 10 Petabytes, 400,000 IOPS, 250 Gbyte/sec
       - Software by open source Hadoop ecosystem, hardware self-assembled
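The per-gigabyte prices on slide 8 can be checked against its "$1M gets" capacities with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, decimal units (1 PB = 1,000,000 GB) are an assumption, and the only inputs are figures quoted on the slide.

```python
# Illustrative arithmetic only. Prices and capacities are the figures quoted
# on the slide; the decimal unit convention below is an assumption.
GB_PER_PB = 1_000_000  # assumed: 1 petabyte = 1,000,000 gigabytes


def petabytes_for_budget(budget_usd: float, usd_per_gb: float) -> float:
    """Return how many petabytes a budget buys at a given price per gigabyte."""
    return budget_usd / usd_per_gb / GB_PER_PB


def implied_usd_per_gb(budget_usd: float, petabytes: float) -> float:
    """Return the price per gigabyte implied by a budget and a capacity."""
    return budget_usd / (petabytes * GB_PER_PB)


budget = 1_000_000  # the slide's "$1M"

san_pb = petabytes_for_budget(budget, 2.0)  # low end of the SAN price range
nas_pb = petabytes_for_budget(budget, 1.0)  # low end of the NAS price range
local_price = implied_usd_per_gb(budget, 10)  # slide: $1M gets 10 petabytes

print(f"SAN at $2/GB: {san_pb} PB")
print(f"NAS at $1/GB: {nas_pb} PB")
print(f"Local storage, 10 PB for $1M: ${local_price:.2f}/GB")
```

Under these assumptions the SAN and NAS capacities match the low end of their quoted price ranges, and the local-storage figure works out to roughly $0.10 per gigabyte, which is consistent with the slide's "<$0.50" bound.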
  9. 9. HADOOP INNOVATION #2: STORE FIRST, QUESTIONS LATER
     Columns: Legacy BI, High performance BI, "Hadoop" Ecosystem
     Business Problem
     • Legacy BI: backward-looking analysis, using data out of business applications
     • High performance BI: quasi-real-time, in-memory analysis, using data out of business applications
     • "Hadoop" Ecosystem: batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources
     Technology Solution
     • Complex Event Processing
     Data Type/Scalability
     • "Hadoop" Ecosystem: structured or unstructured; quasi unlimited (20 – 30 PB)
     • Other columns: structured, limited (2 – 3 TB in RAM); structured, limited (1 PB in RAM)
     Selected Vendors
     • Legacy BI: SAP Business Objects, IBM Cognos, MicroStrategy
     • High performance BI: Oracle Exadata, SAP HANA
     • "Hadoop" Ecosystem: Cloudera Hadoop, Hortonworks Hadoop, Microsoft Hadoop
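One way to read "store first, questions later" is as schema-on-read: records are kept in their raw form, and structure is imposed only when a question is asked. The following is a minimal in-memory sketch of that idea, not Hadoop code. The names `raw_store`, `ingest` and `ask` are hypothetical, the sample records are invented for illustration, and a Python list stands in for files landed in HDFS.

```python
import json
from typing import Any, Callable, Dict, List

# Assumption: a plain list stands in for raw files landed in a data store.
raw_store: List[str] = []


def ingest(record: str) -> None:
    """Store a record as-is: no schema and no validation at write time."""
    raw_store.append(record)


def ask(field: str, predicate: Callable[[Any], bool]) -> List[Dict[str, Any]]:
    """Apply a question at read time, parsing each raw record on the fly.

    Records that are not JSON objects, or that lack the field, are skipped
    rather than rejected, so differently shaped sources can coexist.
    """
    hits = []
    for raw in raw_store:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured text: kept in the store, ignored by this question
        if isinstance(doc, dict) and field in doc and predicate(doc[field]):
            hits.append(doc)
    return hits


# Store first: three records from different (invented) sources and shapes.
ingest('{"source": "crm", "customer": "A", "spend": 120}')
ingest('{"source": "twitter", "text": "love the new tariff"}')
ingest("free-form note with no structure")

# Questions later: each question is defined only after the data has landed.
big_spenders = ask("spend", lambda value: value > 100)
tweets = ask("source", lambda value: value == "twitter")
print(big_spenders)
print(tweets)
```

The design point is that `ingest` never rejects data, so a new question can be asked of old records without reloading them.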
  10. 10. GARTNER HYPE CYCLE FOR ANALYTIC APPLICATIONS: A GREAT STARTING POINT FOR BI AND BIG DATA USE CASES
  11. 11. Implementing HADOOP to generate profit: selected Use Cases
      • Intelligent News Discovery
        - Research and analysis of video, audio and online print
        - Semantic analyses and results visualization
      • Security Analytics
        - Print Queue analysis for Confidential and/or sensitive documents
        - Email Analysis
        - Comprehensive monitoring of unlimited data volumes and types
      • Metro Traffic Diagnostics
        - Analysis of traffic situations
        - Improved planning and local resident satisfaction
        - Big Event optimisation
      • Efficient Fleet Management
        - Driving tips for drivers
        - Competitive advantage thanks to cost reductions
        - Lower fuel consumption and CO2 emissions
        - Better planning of routes and cargo loads
      • Smarter Energy Management
        - Optimized use of resources for all energy sources
        - Future utilisation forecasts
        - Feeds into customer-specific pricing
      • Campaign Analytics
        - Monitoring of marketing campaigns
        - Consideration of all sources and formats
        - Efficient campaign management
      • Smarter Procurement
        - Transparency across all suppliers and prices
        - Stronger negotiating position in purchasing
        - Efficient cashflow management
  12. 12. HADOOP USE CASES BY BUSINESS FUNCTION
      Business functions: Marketing & Sales; Product Development & Research; Product Service & Support; Distribution & Logistics; Finance & Controlling
      Use cases:
      • Online Marketing Campaign Optimization
      • Using Online Forums for Product Development & Sentiment Analysis
      • Production Optimization using Sensor Data and Machine 2 Machine Communication
      • Supply Chain Optimization controlling own and OEM production capacity
      • Customer Individual Discounts for products on websites and call centers (multi factor, real time)
      • Predictive Maintenance & Prediction (Combat unwanted production stops)
      • Truck transportation optimization (transport order navigational data, combined with traffic data)
      • Financial Simulation and Scenario Calculations
      • Production Planning for Seasonal Goods (multi factor)
      • Road Charge Optimization (real time adaptation of fees according to current traffic)
      • Big Data for Point of Sales Optimization/Cross Selling
      • Competitive Analysis using Online Press, Social Media with Scraping and Text Analysis
      • Social Media Usage for Macro/Micro Trend analysis
      • Massive Parallel Processing for Drug Testing in Pharma
      • CERN number crunching for test data (40GB/sec)
      • Online Fraud Detection (Credit Card transactions, etc.)
      • Risk Controlling (Market Risk/Value at Risk)
      • Customer Churn Analysis for Prepaid Telco business (behavior based)
      • Detection of unknown financial risk (e.g. for real estate loans)
      • Optimize Target Group Marketing for online banking based on trading/depot transactions
  13. 13. WHAT ARE THE PREREQUISITES FOR DERIVING EFFECTIVE VALUE FROM HADOOP?
      The foundation is a Data Strategy
      • Map data to business value: which data is required to deliver on a value statement or answer a fundamental business question?
      • Categorise critical vs non-critical data: critical data is not only the data identified in the business value question above, but also data that could or should have long-term (potential) value and is typically used across multiple business processes or a value chain. Master Data Management is a key activity here.
      • Define your data ecosystem: not only the technology but also the processes and the responsibilities matched to roles, plus three core capabilities: data, insight and action.
      • Data Governance
        - Define the appropriate data roles in the organisation.
        - The governance structure must be federated, with a central governing body addressing the most important, common data, and most of the data managed locally in the lines of business.
      Improve Data Quality
      Improve Data Accessibility
  14. 14. SOME NEW ROLES IN DATA/ANALYTICS: THE COMING OF AGE OF DATA IN THE ENTERPRISE
      • The Data Scientist
      • The Chief Data Officer
      • Data Hygienist/Data Steward
      • Data Explorer
      • Business Solution Architect/Domain Expert
      • Campaign Expert
      • Data Security Officer
      50% Big Data talent gap expected by 2018 [4]
      [4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity
  15. 15. HOW DOES HADOOP COMPLEMENT EXISTING INVESTMENTS IN BUSINESS INTELLIGENCE? MANY ORGANISATIONS RESEMBLE THIS TODAY
      Diagram layers:
      • Business Intelligence: tools and analytical applications (Reporting, Dashboard, OLAP, Data Mining)
      • Data Warehouse, Appliance, Data Mart, Cube
      • Data integration: ETL
      • Transactional: OLTP, DBMS
      • Business Applications: ERP, CRM, etc.
      • Existing data sources
  16. 16. HADOOP COMPLEMENTS EXISTING BI INVESTMENT
      Diagram labels:
      • Business Intelligence: tools and analytical applications (Reporting, Dashboard, OLAP, Data & Text Mining, Predictive Analytics, Complex event processing)
      • Structured and unstructured data
      • Data Warehouse, Appliance, Data Mart, Cube
      • Data integration: ETL
      • Transactional: OLTP, DBMS
      • Business Applications: ERP, CRM, etc.
      • Existing data sources
      • Operational Intelligence: real-time data processing and analysis
      • Static data, flowing data
      • Hadoop, NoSQL, Log-Data, Cloud, SaaS
      • New data sources
  17. 17. HOW USE CASE SEGMENTATION DRIVES SOLUTION DESIGN AND TECHNOLOGY SELECTION
      • Use case: Real-time Reporting of SAP OLTP data, including joins and data transformations
        Potential tool: SAP HANA
      • Use case: Summarise Unstructured DATA LOGS (scheduled)
        Potential tool: HADOOP MAP/REDUCE
      • Use case: Realtime reporting of Summarised Data Logs, with Joins to other NON OLTP Data
        Potential tool: IMPALA
      • Use case: Near Realtime reporting of Social Media Data
        Potential tool: IMPALA + HADOOP MAP/REDUCE (scheduled to collect recent Social Media Data)
      • Use case: Realtime reporting of recent OLTP data joined with recent Social Media Data
        Potential tool: HANA + HADOOP MAP/REDUCE (scheduled to collect recent Social Media Data and load into HANA)
      • Use case: Image Analysis Processing (scheduled)
        Potential tool: HADOOP MAP/REDUCE (scheduled job runs sophisticated analysis of Video files and stores results in a structured file)
      • Use case: Image Analysis Reporting
        Potential tool: IMPALA (to report on results file)
      • Use case: Predictive Analysis Reporting (comparing OLTP & NON OLTP DATA)
        Potential tool: HANA + HADOOP MAP/REDUCE (scheduled to collect & transfer applicable Historic or relevant Non OLTP Data to HANA)
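Slide 17 names Hadoop MapReduce as the potential tool for summarising unstructured data logs. As a rough illustration of the map, shuffle and reduce phases behind such a job, here is a single-process Python sketch. It does not use the Hadoop API; the log layout (timestamp, level, message) and all function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (level, 1) for each line shaped like 'timestamp LEVEL message'."""
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:  # skip lines that do not carry a level field
            yield parts[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key to produce the summary."""
    return {key: sum(values) for key, values in groups.items()}


def summarise(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the given log lines."""
    return reduce_phase(shuffle(map_phase(lines)))


# Hypothetical sample log lines, invented for this illustration.
sample_log = [
    "2013-11-14T10:00:01 ERROR disk full",
    "2013-11-14T10:00:02 INFO job started",
    "2013-11-14T10:00:03 ERROR disk full",
    "malformed",
]

summary = summarise(sample_log)
print(summary)
```

In a real cluster the map and reduce steps would run in parallel across many nodes on a schedule, and the resulting summary file is the kind of structured output that the next row of the slide proposes querying with Impala.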
  18. 18. HOW USE CASE SEGMENTATION DRIVES SOLUTION DESIGN AND TECHNOLOGY SELECTION
  19. 19. SUMMARY
      • Data volumes are here to stay.
      • Hadoop is getting more powerful, more real-time and easier to use.
      • Hadoop is not your Big Data answer; it is part of your BI and Big Data ecosystem.
      • An Enterprise Data Strategy and Data Governance are critical to success.
      • Make sure you have two conversations in your enterprise:
        - A business conversation about the business value from your BI ecosystem
        - An IT conversation to ensure your IT organisation understands the new world of BI, and the shortcomings, strengths and roles of the component technologies
      “What matters is how – and why – vastly more data leads to vastly greater value creation. Designing and determining those links is typically in the province of top management” but needs to be facilitated by the IT organisation in business terms.
  20. 20. A PARTING THOUGHT: HADOOP (AND BIG DATA) IS 4 V'S, NOT JUST 3
      ANALYTICS creates VALUE: value comes from knowing more than the rest.
  21. 21. Backup
  22. 22. AGENDA
      • Where are we with Big Data and Hadoop at the end of 2013?
      • What is the disruptive innovation in Hadoop?
      • What are target use cases, horizontally and telco-specific?
      • How do you start realizing value from Hadoop today?
      • What are the prerequisites for deriving effective value from Hadoop?
      • How does Hadoop complement existing investments in business intelligence?
      • How does use case segmentation drive solution design and technology selection?
  23. 23. LEARNING THE LANGUAGE OF BIG DATA
      Terms shown: ZooKeeper, Matlab, Greenplum, Talend, Ruby, Redis, Shep, InfoChimps, HBase, Jaspersoft, C++, Java, Pig, Platfora, Hive, Continuity, MapReduce, NoSQL, Aster, Hadoop, Tableau, Kafka, MongoDB, GoPivotal, Python, Nutch, Neo4j, Cassandra, Avro, Pentaho, Riak, R, Skytree, Splunk, Karmasphere Studio, HDFS, Chukwa, CouchDB, JRuby
  24. 24. LEARNING THE LANGUAGE OF BIG DATA
