IBM Big Data Platform Nov 2012


Presented by Reto Cavegn at the 4th meeting: We would like to present IBM's view on Big Data, what the market requires, and which products and strategies have evolved out of these requirements. Further, we will present some reference projects to show which use cases customers are working on today and which challenges they are trying to solve with Big Data. We will round off with some challenges and lessons we have learned.

Published in: Technology

  1. 1. IBM Big Data Platform: Overview and Use Cases. Reto Cavegn, Information Management Tech-Sales, IBM. 23, 2012. © 2012 IBM Corporation
  2. 2. What is "Big"?
       Unit             SI       Binary
       kilobyte (kB)    10^3     2^10
       megabyte (MB)    10^6     2^20
       gigabyte (GB)    10^9     2^30
       terabyte (TB)    10^12    2^40
       petabyte (PB)    10^15    2^50
       exabyte (EB)     10^18    2^60
       zettabyte (ZB)   10^21    2^70
     - 2009 Internet: 500 exabytes
     - 2012 global data: 2.7 zettabytes (IDC)
     - 2^64 - 1 grains of rice = 922’337’000’000 t
     - 2010 global rice production: 672’017’598 t
  3. 3. The Characteristics of Big Data
     - Volume: cost-efficiently processing the growing Volume (50x, 35 ZB, 2010 to 2020)
     - Velocity: responding to the increasing Velocity (30 billion RFID sensors and counting)
     - Variety: collectively analyzing the broadening Variety (80% of the world's data is unstructured)
     - Veracity: establishing the Veracity of big data sources (1 in 3 business leaders don't trust the information they use to make decisions)
  4. 4. There are Many Use Cases for Big Data
     - Know Everything about your Customer: social media customer sentiment analysis, promotion optimization, segmentation, customer profitability analysis, click-stream analysis, CDR processing, multi-channel interaction analysis, loyalty program analytics, churn prediction
     - Innovate New Products at Speed and Scale: social media product/brand sentiment, brand strategy, market analysis, RFID tracking & analysis, transaction analysis to create insight-based product/service offerings
     - Run Zero Latency Operations: smart grid/meter management, distribution load forecasting, sales reporting, inventory & merchandising optimization, options trading, ICU patient monitoring, disease surveillance, transportation network optimization, store performance
     - Instant Awareness of Risk and Fraud: multimodal surveillance, cyber security, fraud modeling & detection, risk modeling & management, regulatory reporting
     - Exploit Instrumented Assets: environmental analysis, experimental research, network analytics, asset management and predictive issue resolution, website analytics, IT log analysis
  5. 5. Leveraging Big Data Requires Multiple Platform Capabilities
     - Understand and navigate federated big data sources: Federated Discovery and Navigation
     - Manage & store huge volume of any data: Hadoop File System, MapReduce
     - Structure and control data: Data Warehousing
     - Manage streaming data: Stream Computing
     - Analyze unstructured data: Text Analytics Engine
     - Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
  6. 6. Business-centric Big Data enables you to start with a critical business pain and expand the foundation for future requirements
     - "Big data" isn't just a technology; it's a business strategy for capitalizing on information resources
     - Getting started is crucial
     - Success at each entry point is accelerated by products within the Big Data platform
     - Build the foundation for future requirements by expanding further into the big data platform
  7. 7. Entry point 1: Unlock Big Data
     Customer Need
     - Understand existing data sources
     - Expose the data within existing content management and file systems for new uses, without copying the data to a central location
     - Search and navigate big data from federated sources
     Value Statement
     - Get up and running quickly and discover and retrieve relevant big data
     - Use big data sources in new information-centric applications
     Customer examples
     - Procter & Gamble: connect employees with a 360° view of big data sources
  8. 8. Airbus put 50 new planes in the air without additional 24x7 support personnel
     Capabilities utilized: InfoSphere Data Explorer (Vivisimo)
     - Deliver airplanes without adding FTEs
     - Securely leverage web-based supply-chain visibility
     - Securely access repositories across the enterprise
     - Reduce average AOG resolution time from 50 min to 15 min
     - Compliance regulations in 150 countries; compliance costs reduced globally by 5-25%
  9. 9. Entry point 2: Analyze Raw Data
     Customer Need
     - Ingest data as-is into Hadoop and derive insight from it
     - Process large volumes of diverse data within Hadoop
     - Combine insights with the data warehouse
     - Low-cost ad-hoc analysis with Hadoop to test new hypotheses
     Value Statement
     - Gain new insights from a variety and combination of data sources
     - Overcome the prohibitively high cost of converting unstructured data sources to a structured format
     - Extend the value of the data warehouse by bringing in new types of data and driving new types of analysis
     - Experiment with analysis of different data combinations to modify the analytic models in the data warehouse
     Customer examples
     - Financial Services Regulatory Org: managed additional data types and integrated them with their existing data warehouse
  10. 10. Vestas optimizes capital investments based on 2.5 petabytes of information
     Capabilities utilized: BigInsights, Hadoop System, Data Warehousing
     - Model the weather to optimize placement of turbines, maximizing power generation and longevity
     - Reduce time required to identify placement of a turbine from weeks to hours
     - Incorporate 2.5 PB of structured and semi-structured information flows
     - Data volume expected to grow to 6 PB
  11. 11. Cisco turns to IBM big data for intelligent infrastructure management
     - Optimize building energy consumption with centralized monitoring and control of building monitoring system
     - Automates preventive and corrective maintenance of building corrective systems
     - Uses Streams, InfoSphere BigInsights and Cognos for: log analytics, energy bill forecasting, energy consumption optimization, detection of anomalous usage, presence-aware energy management, policy enforcement
  12. 12. Entry point 3: Simplify your Warehouse
     Customer Need
     - Business users are hampered by the poor analytics performance of a general-purpose enterprise warehouse: queries take hours to run
     - Enterprise data warehouse is encumbered by too much data for too many purposes
     - Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it
     - IT needs to reduce the cost of maintaining the data warehouse
     Value Statement
     - Speed and simplicity for deep analytics (Netezza)
     - 100s to 1000s of users/second for operational analytics (IBM Smart Analytics System)
     Customer examples
     - Catalina Marketing: executing 10x the amount of predictive workloads with the same staff
  13. 13. Catalina Marketing increased coupon redemption rates by 30% while running 70x more queries on 5x data
     Capabilities utilized: IBM Netezza
     - Delivering personalized coupons to shoppers in real time
     - Store and access 400B market basket records to provide a personalized experience
     - "Because of (Netezza's) in-database technology, we believe we'll be able to do 600 predictive models per year (10X as many as before) with the same staff." Eric Williams, CIO and executive VP
  14. 14. Entry point 5: Analyze Streaming Data
     Customer Need
     - Harness and process streaming data sources
     - Select valuable data and insights to be stored for further processing
     - Quickly process and analyze perishable data, and take timely action
     Diagram: Streaming Data Sources, Streams Computing, Action
     Value Statement
     - Significantly reduced processing time and cost: process and then store what's valuable
     - React in real time to capture opportunities before they expire
     Customer examples
     - Ufone: Telco Call Detail Record (CDR) analytics for customer churn prevention
  15. 15. KTH Swedish Royal Institute of Technology: Reducing Traffic Congestion
     Capabilities utilized: Stream Computing
     - Deployed real-time Smarter Traffic system to predict and improve traffic flow
     - Analyzes streaming real-time data gathered from cameras at entry/exit to the city, GPS data from taxis and trucks, and weather information
     - Predicts best time and method to travel, such as when to leave to catch a flight at the airport
     Significant benefits
     - Enables analysis and prediction of traffic faster and more accurately than ever before
     - Provides new insight into mechanisms that affect a complex traffic system
     - Smarter, more efficient, and more environmentally friendly traffic
  16. 16. Eurovision
  17. 17. Architecture: daily report, InfoSphere BigInsights
  18. 18. EuroBuzz: real time (after contest)
  19. 19. EuroBuzz: real time (72 hours before the contest). Winner: Sweden
  20. 20. The Platform Advantage
     - The platform provides benefit as you move from an entry point to a second and third project
     - Shared components and integration between systems lower deployment costs
     - Key points of leverage:
       - Reuse text analytics across Streams and Hadoop
       - HDFS connectors between Streams and Hadoop
       - Common integration, metadata and governance across all engines
       - Accelerators built across multiple engines: common analytics, models, and visualization
     Diagram layers:
     - Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
     - IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management; Accelerators; Hadoop System, Stream Computing, Data Warehouse
     - Information Integration & Governance
  21. 21. Big Data Accelerators Make it Easier than Ever to Build Big Data Applications
     IBM Accelerator for Social Data Analytics
     - B2C businesses
     - Sample applications: customer acquisition / retention, customer segmentation or micro-segmentation, marketing campaign optimization, lead generation, brand management or surveillance
     - Ships with BigInsights v2 and Streams v3
     IBM Accelerator for Machine Data Analytics
     - Cross-industry: manufacturing, oil & gas, energy and utility, healthcare, travel and transportation, CPG, retail, etc.
     - Operational efficiency monitoring, security incident investigation, proactive maintenance, troubleshooting, outage prevention, efficiency tracking, etc.
     - Ships with BigInsights v2
     IBM Accelerator for Telco Event Data Analytics
     - Telcos
     - Campaign management, real-time promotion, fraud detection, service assurance and network monitoring
     - Ships with Streams v3, but works with BigInsights or PureData for Analytics (a.k.a. Netezza)
  22. 22. Big data made simple: Everyone can develop and leverage big data
     Unlock the value within data:
     - Enable all roles of an organization to access and collaboratively leverage the value of the data
     - Bring all relevant data together for analysis, eliminating silos
     Roles:
     - Administrators: ..., manage, and optimize data analysis operations
     - Business Users: ... get real-time reports and analysis based on data inside as well as outside the enterprise (web, social media etc.)
     - Business Executives: ... offer personalized price promotions to different customer segments in real time
     - Business Analysts: ... analyze social media buzz for the new services/offerings to gauge initial success and any course correction needed
     - Developers: ... develop new apps and detailed algorithms in response to user and business requirements
     - Business Development: ... find and deliver new mechanisms to monetize network traffic and partner with upstream content providers
     - Data Scientists: ... analyze subscriber usage patterns in real time and combine that with the profile for delivering promotional or retention offers
     Familiar and effective concepts used in new ways make big data consumable:
     - Each role can create applications
     - Spreadsheet-style interface to analyze data
     - Apps and "App Store" to build reusable applications
     - Dashboards and visualization
     Diagram labels: GPS, External Data
  23. 23. Giving People the Right Tools & Info is Essential
  24. 24. Where to Find More Information
     - Free book in PDF format: Harness the Power of Big Data: The IBM Big Data Platform
     - Free Download of InfoSphere BigInsights from
     - InfoSphere BigInsights Tech Enablement Wiki
     - InfoSphere BigInsights Information Center
     - InfoSphere Streams Information Center
     - InfoSphere Streams Wiki Home
  25. 25. Reto Cavegn, Senior IT Specialist, IBM Software Group. IBM Switzerland Ltd., Vulkanstrasse 106, P.O. Box, CH-8010 Zürich. Mobile +41 79 201 5650
  26. 26. THINK
  27. 27. BigInsights Backup Slides
  28. 28. BigInsights enterprise edition components (IBM InfoSphere BigInsights)
     Diagram areas: Visualization & Discovery, Applications & Development, Administration, Integration, Advanced Analytic Engines, Workload Optimization, Runtime, Management, Data Store, File System
     Components named in the diagram: BigSheets, Dashboard & Visualization, Apps, Workflow, Text Analytics, MapReduce, Pig & Jaql, Hive, Admin Console, Monitoring, JDBC, Netezza, DB2, Streams, DataStage, Guardium, Platform Computing, Cognos, Text Processing Engine & Extractor Library (AQL + HIL), R, Adaptive Algorithms, Integrated Installer, Enhanced Security, Splittable Text Compression, Adaptive MapReduce, ZooKeeper, Flexible Scheduler, Oozie, Jaql, HCatalog, Lucene, Pig, Hive, Index, Security, Audit & History, Lineage, Flume, Sqoop, HBase, Column Store, HDFS
     Legend: Open Source, IBM
  29. 29. BigInsights 2.0 includes the latest open source versions
     Open source levels across distributions:
       Component   BigInsights 2.0   HortonWorks HDP 1.1   MapR 2.0   Greenplum HD 1.1   Cloudera CDH3u5   Cloudera CDH4*
       Hadoop      1.0.3             1.0.3                 0.20.2     1.0.0              0.20.2            2.0.0*
       HBase       0.94.0            0.92.1                0.92.1     0.90.4             0.90.6            0.92.1
       Hive        0.9.0             0.9.0                 0.9.0      0.7.1              0.7.1             0.8.1
       Pig         0.10.1            0.9.2                 0.10.0     0.9.1              0.8.1             0.9.2
       Zookeeper   3.4.3             3.3.4                 X          3.3.3              3.3.5             3.4.3
       Oozie       3.2.0             3.1.3                 3.1.0      X                  2.3.2             3.1.3
       Avro        1.6.3             X                     X          X                  X                 X
       Flume       0.9.4             1.2.0                 1.2.0      X                  0.9.4             1.1.0
       Sqoop       1.4.1             1.4.2                 1.4.1      X                  1.3.0             1.4.1
       HCatalog    0.4.0             0.4.0                 X          X                  X                 X
     BigInsights continues to offer the most proven, stable versions of Apache Hadoop components.
     *Cloudera CDH4 Hadoop 2.0 includes MapReduce 2.0, which Cloudera states is "not yet considered stable".
  30. 30. Social Data Analytics Accelerator (included in BigInsights and Streams)
     What does it do?
     - Provides the ability to analyze large volumes of various types of social media data with real-time processing
     Why should you care? It enables clients to easily obtain insights necessary for:
     - Effective/targeted marketing campaigns
     - Timely product/marketing decisions
     - Gaining competitive intelligence
     - Building customer retention and new customer acquisition programs
     Example application: Movie Campaign Effectiveness
     - Large movie studio wants to understand reaction to movie commercials around events (e.g., Super Bowl)
     - Over 30 million social media consumer profiles built and used in the analysis
     - Real-time summary of insights correlated with the airing of the commercial
  31. 31. Machine Data Analytics Accelerator (included in BigInsights)
     What does it do? Provides the ability to ingest, parse and extract a wide variety of machine data
     - Faceted search enables easy navigation and discovery
     - Visualization enables easy analysis of the data
     Why should you care? It enables clients to gain insights, beyond what was traditionally possible, into operations, customer experience, transactions and behavior, processing machine data in minutes instead of days and weeks. With these insights, clients can:
     - Proactively plan to increase operational efficiency
     - Troubleshoot problems and investigate security incidents
     - Monitor end-to-end infrastructure to avoid service degradation or outages
     Example application: Facilities Management
     - Use real-time data from building devices such as meters, sensors and motion detectors to monitor and manage power usage
  32. 32. Telecommunications Event Data Analytics Accelerator (included in Streams)
     What does it do? Provides a full application for transformation and analytics of telephone company call and event detail records
     - Revenue assurance and fraud detection in real time
     Why should you care? Enables telecommunications companies to gain billing insights based on services, vendors and business lines. With these insights, telco companies can:
     - Create service differentiation
     - Strengthen customer loyalty and reduce churn
     - Provide targeted services
     - Personalized billing
     - High-quality customer experience
     Example application: Asian telco company
     - Real-time mediation and analysis of 6B CDRs per day
     - Data processing time reduced from 12 hrs to 1 sec
     - Hardware cost reduced to 1/8th
  33. 33. And Watson as an alternative way forward
     In February 2011, an IBM supercomputer called Watson, which was built for deep question answering leveraging Big Data and natural language processing, beat the two all-time champions of the popular U.S. question-and-answer game show "Jeopardy!". Since winning Jeopardy!, IBM has focused the Watson team on leveraging this technology to solve our clients' real-world problems.
     IBM Watson client inquiries follow 5 different use cases (use case: overview; sample inquiries):
     - Diagnosis & Action: improve effectiveness of front-line workers (e.g. doctors, mechanics, financial advisors) focused on a single case for a single client. Sample inquiries: payer + provider for patient diagnosis and treatment (e.g. Wellpoint); vehicle diagnosis and maintenance.
     - Contact Center: improve effectiveness of contact centers (or self-service portals) by managing knowledge bases and incorporating client data. Sample inquiries: banking and telco contact centers and tech help desks for improved contact centers; Ask IBM Watson.
     - R&D Support: accelerate and reduce the cost of research and development by uncovering rare insights that may solve research problems. Sample inquiries: pharma, chemical, refining research and development.
     - Process Optimization: identify areas for improvement in overall processes by analyzing unstructured data supporting process steps and output. Sample inquiries: reducing congestive heart failure readmissions (e.g. Seton Healthcare).
     - Fraud / Risk Management: identify early signs of fraud or best practices for managing risk in order to lower overall liability and costs of doing business. Sample inquiries: additional evidence and research into potential contractor fraud; advanced insight re: risk of investment.
     "Watson is going to revolutionize many, many industries and it will fundamentally change the way we interact with computers & machines." John Kelly, SVP & Head of IBM Research
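
The figures quoted on slide 2 (SI versus binary units, the chessboard rice legend) and on slide 32 (6B CDRs per day) can be checked with a few lines of arithmetic. The sketch below is not part of the presentation. It assumes an average grain mass of 50 mg, a value the slides do not state but which reproduces the 922’337’000’000 t figure. All function and constant names are hypothetical.

```python
# Back-of-the-envelope checks for figures quoted on slides 2 and 32.
# ASSUMPTION (not in the slides): one grain of rice weighs about 50 mg.

GRAIN_MASS_MG = 50          # assumed average grain mass in milligrams
MG_PER_TONNE = 10**9        # 1 tonne = 10^9 mg
SECONDS_PER_DAY = 86_400

PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]


def si_bytes(step: int) -> int:
    """Bytes in the step-th SI unit (step 1 = kilobyte = 10^3)."""
    return 10 ** (3 * step)


def binary_bytes(step: int) -> int:
    """Bytes in the step-th binary unit (step 1 = 2^10 = 1024)."""
    return 2 ** (10 * step)


def rice_tonnes(squares: int = 64) -> int:
    """Whole tonnes of rice for 2^squares - 1 grains (chessboard legend)."""
    grains = 2**squares - 1
    return grains * GRAIN_MASS_MG // MG_PER_TONNE


def years_of_production(tonnes: int, annual_tonnes: int = 672_017_598) -> float:
    """How many years of 2010 global rice production the total represents."""
    return tonnes / annual_tonnes


def cdrs_per_second(cdrs_per_day: int = 6_000_000_000) -> float:
    """Average call detail records per second for a daily volume."""
    return cdrs_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for step, prefix in enumerate(PREFIXES, start=1):
        gap = binary_bytes(step) / si_bytes(step) - 1
        print(f"{prefix}byte: SI 10^{3 * step}, binary 2^{10 * step}, "
              f"binary unit is {gap:.1%} larger")
    total = rice_tonnes()
    print(f"rice on the chessboard: {total:,} t")
    print(f"years of 2010 production: {years_of_production(total):.0f}")
    print(f"average CDR rate: {cdrs_per_second():,.0f} per second")
```

Under the 50 mg assumption, the chessboard total corresponds to well over a thousand years of 2010 rice production, and 6B CDRs per day averages out to tens of thousands of records per second, which is why slide 32 positions the workload as a stream computing case.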