Using Hadoop for Cognitive Analytics

  1. Using Hadoop for Cognitive Analytics. Pedro Desouza, Ph.D., Associate Partner, Big Data & Analytics Center of Competence, IBM Global Business Services. June 29, 2016.
  2. Outline. © 2016 IBM Corporation, Global Business Services.
     • Metro Pulse: Enhancing Decision-Making Processes With Hyperlocal Data Dashboards
     • Use Cases in Multiple Industries
     • Geographic Hierarchies, External Metrics, and Mapping Representation
     • Integration of External and Customer-Specific Metrics
     • Solution Architecture
     • Technological Components
     • Micro Services for Data Ingestion and Curation
  3. Improving Decision-Making Accuracy by Combining Business Metrics with Hyperlocal Data. Demand forecasts, marketing campaigns, distribution plans, and many other business decisions are usually based on aggregate levels of data that don't precisely consider the context in which the business operates. Hyperlocal data (weather, social media sentiment, economics, events, subway stations, demographics, other points of interest, and thousands of other metrics together in a single repository) supplies that context for each store. Combining each store's business metrics with hyperlocal data provides insights via visual inspection and advanced analytics, so business decisions can be made on the precise hyperlocal context of each store (figure: stores in London).
  4. Improving Forecast Accuracy with External Data. Figure: actuals of a retail store compared with a traditional forecast (neural net, ARIMA, …) and with a forecast based on a neural network with external data, which achieved 23.9% better accuracy. Source: Riemer, M., Vempaty, A., Calmon, F., Heath, F., Hull, R., and Khabiri, E., "Correcting Forecasts with Multifactor Neural Attention," Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016, JMLR: W&CP volume 48, http://jmlr.org/proceedings/papers/v48/riemer16.pdf (IBM T. J. Watson Research Center).
  5. Retail Use Case: Identification of Low and High Performers. Stores are grouped with similar stores in locations with similar hyperlocal contexts; the same color on the map means a similar context considering all external metrics. The analysis runs per category ("All Products", "Electronics", "Cosmetics", …). Each group (Groups 1 to 4) has a top performer and a baseline, and the potential revenue increase of group $k$ is $\text{Rev Inc}_{G_k} = \sum_i \Delta_i$. Micro-segmentation plus external metrics gives higher accuracy for root-cause analysis and revenue increase.
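A minimal Python sketch of the group-level calculation, assuming each store has already been assigned to a micro-segment (the clustering on hyperlocal context is not shown) and assuming Δi is the revenue gap between store i and its group's top performer. The slide does not define Δi, and the function and field names are hypothetical.

```python
from collections import defaultdict


def revenue_gaps(stores):
    """Per-group potential revenue increase (hypothetical helper).

    stores: iterable of (store_id, group_id, revenue) tuples, where group_id is
    the micro-segment of stores with a similar hyperlocal context (assumed given).
    Assumption: delta_i = top performer's revenue - revenue of store i.
    """
    by_group = defaultdict(list)
    for store_id, group_id, revenue in stores:
        by_group[group_id].append((store_id, revenue))

    result = {}
    for group_id, members in by_group.items():
        top_revenue = max(revenue for _, revenue in members)
        # Gap of every store to the best store in its own context group.
        deltas = {store_id: top_revenue - revenue for store_id, revenue in members}
        result[group_id] = {"deltas": deltas, "rev_inc": sum(deltas.values())}
    return result
```

Comparing each store only against stores in a similar context is what makes the gap meaningful: a low performer is judged against peers facing the same weather, demographics, and events.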
  6. Population Movement Analytics. Example: a store in Dallas, with the percentage of its visits coming from each surrounding region (15%, 20%, 7%, 9%, 12%, 5%, 18%), based on buyers' home locations obtained via anonymous app-use analysis. One region is close to the store but generates few visits: why? The map also marks a potential location for a new store. Other use cases:
     • Advertisement: population demographics; where people are and where they go.
     • Marketing campaigns: interests of each region (% of visits); population density.
     • City planning: traffic growth; precise routes; emergency services.
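The visit percentages come from a simple aggregation. This sketch assumes a hypothetical input of one home-region identifier per anonymized visit; the threshold rule for flagging nearby regions with few visits is an illustrative assumption, not the method behind the slide.

```python
from collections import Counter


def visit_share_by_home_region(visits):
    """Percentage of a store's visits by the visitor's home region.

    visits: iterable of home-region identifiers, one per anonymized visit
    (hypothetical input format).
    """
    counts = Counter(visits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: 100.0 * n / total for region, n in counts.items()}


def underperforming_regions(shares, nearby, threshold):
    """Nearby regions whose share of visits falls below `threshold` percent."""
    return sorted(r for r in nearby if shares.get(r, 0.0) < threshold)
```

A region that is geographically close yet appears in this list is the "close, but few visits" case worth investigating.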
  7. Telecommunication Use Cases: Quality of Service (Tower Location). The map overlays affluent houses, scored by lifetime revenue (LTR: high, medium, low), with network congestion (high, medium, low). Intuition suggests one position for a new tower, but the position that maximizes LTR is the ideal one. Congestion can also be event-driven: a famous band's free show on Saturday, 9-11 PM, will push a tower over capacity, so a mobile base antenna should be scheduled for the event.
  8. Use cases are countless… City analytics use cases by industry:
     • Banking and Finance: branch segmentation / new market opportunities; cash demand forecasting; promotion customization; staffing mix / specialty account services; customer churn; ATM kiosk-to-location ratio optimization.
     • Retail: uncaptured opportunity; assortment optimization; out of stock; demand forecasting; dynamic pricing; promotion effectiveness.
     • Insurance: risk management and pricing optimization; portfolio suitability; demand forecasting; staffing mix / specialty account services; damage forecasting.
     • Consumer Packaged Goods: product mix; out of stock; visibility; expansion opportunity; customer churn; promotion effectiveness.
     • Travel and Transportation: booking traffic forecasting based on POIs; service relative pricing model; promotion customization; amenity mix; cancellation forecasting.
     • Telecommunications: customer churn; package/service offering optimization; coverage optimization; new product demand; device repair services; service outage forecasting.
  9. Geographic Hierarchy, External Metrics, and Polygons. Most cities publish files with the boundaries of their sub-regions represented as polygons. These form a geographic hierarchy, e.g., for New York:
     • Level 0: New York
     • Level 1: Manhattan, Brooklyn, Queens
     • Level 2: Soho, Midtown, Rockaways, Central, Southern, Eastern
   Each node is represented by one or more polygons, each a list of vertices: ((lat lon, lat lon, …, lat lon)) for a single polygon, or ((lat lon, lat lon, …, lat lon), (lat lon, lat lon, …, lat lon)) for a node made of two polygons (figure: Rockaways and Central, Polygons 1 to 3). The domain of an external metric data point is defined either by coordinates (the temperature at (x, y) is 72 °F) or by a node of the hierarchy (it's raining in Queens, so it's raining in all polygons under Queens).
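A minimal sketch of the hierarchy and of propagating a node-level data point (such as rain in Queens) to every polygon beneath that node. The `Node` class, the `(node name, polygon index)` keys, and the toy square coordinates are assumptions for illustration; only the place names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the geographic hierarchy (hypothetical structure)."""
    name: str
    children: list = field(default_factory=list)
    polygons: list = field(default_factory=list)  # each polygon: list of (lat, lon)

    def polygons_under(self):
        """Yield (node name, polygon index) for every polygon at or below this node."""
        for i in range(len(self.polygons)):
            yield (self.name, i)
        for child in self.children:
            yield from child.polygons_under()


def apply_node_metric(node, metric, value, store):
    """Attach a node-level data point to all polygons under the node."""
    for key in node.polygons_under():
        store.setdefault(key, {})[metric] = value
    return store


def square(x, y):
    """Toy unit square standing in for real boundary coordinates."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


new_york = Node("New York", children=[
    Node("Manhattan", children=[Node("Soho", polygons=[square(0, 0)]),
                                Node("Midtown", polygons=[square(0, 1)])]),
    Node("Queens", children=[Node("Rockaways", polygons=[square(2, 0), square(3, 0)])]),
])
queens = new_york.children[1]
metrics = apply_node_metric(queens, "raining", True, {})
```

Here `metrics` holds the rain flag only for the two Rockaways polygons, while Soho and Midtown are untouched.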
  10. Associating External and Internal Contexts (IBM Metro Pulse Solution).
     • External/public context, the same for all customers: external metrics, events, news…; the geographic hierarchy; polygons.
     • Internal/customer-specific context, easily replaced for any customer: prime entities (stores, towers, ATMs…); customer-specific metrics; customer hierarchies (product, sales…).
   The coordinates of any customer's prime entities can instantly leverage the external context associated with the polygons.
  11. Fundamental Polygon Functions.
     1) point_in_polygon("Point X", "Polygon P") returns 1 if the point lies in the polygon and 0 otherwise, e.g., point_in_polygon("A", "Pol 2") = 1 and point_in_polygon("B", "Pol 3") = 0. Data quality: all prime entities and points of interest must belong to one and only one polygon in each geographic hierarchy.
     2) polygons_intersection("Polygon P", "Polygon Q") returns 1 if the polygons intersect and 0 otherwise, e.g., polygons_intersection("Pol 1", "Pol 2") = 1 and polygons_intersection("Pol 1", "Pol 3") = 0. Data quality: no two polygons under the same hierarchy may intersect on any point other than on their edges or vertices.
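A sketch of point_in_polygon using the standard ray-casting (even-odd) test, plus the one-and-only-one membership rule as a data-quality check. Coordinates are treated as planar, points exactly on an edge are not handled specially, and the helper names other than point_in_polygon are hypothetical. polygons_intersection is not shown, since a robust version is best left to a geometry library.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: return 1 if `point` lies inside `polygon`, else 0.

    point: (x, y); polygon: list of (x, y) vertices, implicitly closed.
    Points exactly on an edge may be classified either way.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return 1 if inside else 0


def containing_polygons(point, polygons):
    """Names of all polygons (dict name -> vertices) that contain the point."""
    return [name for name, poly in polygons.items() if point_in_polygon(point, poly)]


def check_single_membership(entities, polygons):
    """Data-quality rule: each entity must fall in exactly one polygon.

    Returns {entity name: [matching polygon names]} for every violating entity.
    """
    violations = {}
    for name, point in entities.items():
        matches = containing_polygons(point, polygons)
        if len(matches) != 1:
            violations[name] = matches
    return violations
```

An entity matching zero polygons points to a gap in the boundary files; one matching two or more points to overlapping polygons, which the intersection rule is meant to prevent.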
  12. External Data Normalization via a Reference Polygon. The original values of "Metric 1" are based on a set of polygons that don't match the reference polygons (Pol 1 to Pol 4), so they are normalized onto the reference polygons. Different types of metrics (e.g., counts, temperature) require different aggregation methods.
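One common way to move a metric onto reference polygons is areal-weighted interpolation; the slide does not say which method Metro Pulse uses, so this is an assumed technique. Counts are split in proportion to the overlapped area, while intensive metrics such as temperature are averaged with area weights. The overlap areas are assumed to be precomputed with a geometry library.

```python
def normalize_to_reference(values, overlap, kind):
    """Re-express a metric defined on source polygons on the reference polygons.

    values:  {source polygon: metric value}
    overlap: {(source polygon, reference polygon): overlap area} (assumed precomputed)
    kind:    "count" (split by overlapped area) or "intensive" (area-weighted mean)
    Assumes each source polygon is fully covered by the reference polygons.
    """
    source_area = {}
    for (src, _ref), area in overlap.items():
        if area > 0:
            source_area[src] = source_area.get(src, 0.0) + area

    totals, weights = {}, {}
    for (src, ref), area in overlap.items():
        if area <= 0:
            continue
        if kind == "count":
            # Extensive metric: each reference polygon gets its share of the source total.
            totals[ref] = totals.get(ref, 0.0) + values[src] * area / source_area[src]
        elif kind == "intensive":
            # Intensive metric: accumulate an area-weighted sum and its weights.
            totals[ref] = totals.get(ref, 0.0) + values[src] * area
            weights[ref] = weights.get(ref, 0.0) + area
        else:
            raise ValueError(f"unknown metric kind: {kind!r}")

    if kind == "intensive":
        return {ref: totals[ref] / weights[ref] for ref in totals}
    return totals
```

Splitting preserves the overall total for counts, whereas summing temperatures would be meaningless, which is why the two kinds need different aggregation.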
  13. Metro Pulse High-Level Architecture. External data from cities all over the world, together with geographic boundaries, polygons, and hierarchies, enters the external data landing zone of the IBM Data Lake and is curated into the Global Enriched City Repository. The repository feeds three kinds of deployments:
     • On the cloud: an analytics workbench gold copy and one analytics workbench per customer (Customers A, B, F, …), each combining that customer's specific data with the cities relevant to it.
     • On premise: analytics workbenches for Customers G, J, …, each with its customer-specific data.
     • DaaS: the cities relevant to Customers K, L, Z, …, for customers interested in external data only.
  14. Metro Pulse Architecture (Version 2.1).
     • GBS Data Lake: external data by city (weather, Twitter, census, …) and geographical borders, polygons, and hierarchies arrive in a landing zone and are curated into the Metro Pulse Global City Repository, which serves power users through a REST API and other consumers through DaaS.
     • Ingestion layer: customer-specific data by city/site (POS, ATMs, cell towers, …) arrives as files and tables via SFTP or direct connections.
     • Metro Pulse Analytical Workbench gold copy (one deployment per customer): a customer-specific city repository; core analytics (size of prize, movement analytics, news analysis, …, modeling, enhanced forecast); a parameters repository; a sandbox for data scientists; and a performance layer.
     • Access services: visualization for business users, a REST API for power users, and DaaS.
  15. Data Lake and Analytics Workbench Data Flow. Raw internal data arrives from the customer's site via SFTP, and raw external data arrives alongside it; both pass through a staging node into the Hadoop cluster (HDFS and HBase), where each is cleaned, validated, and converted to tabular form. Spark combines them into integrated data and derived data, which are published as consumable data (Cassandra, with cached published data in Redis) and served through Node.js to D3… for visualization. Data samples feed the sandboxes, where new core analytics and new analytics are developed and their results published in production. Users' additional data and users' databases at the customer's site can also be brought in. The micro services are reusable not only for other customers, but also for other solutions.
  16. Micro Services for Data Ingestion and Curation. Data from the sources (RDBMS, structured files, unstructured data) flows through an ingestion engine (get data, copy data to the Hadoop edge node, store raw data) and a curation engine (prepare raw data, curate data, transform/enrich data) into analytic persistence (Hadoop, HBase, Cassandra, Redis, …) as a conformed/polyglot data store. The micro services are:
     1) Error & exception processing
     2) Configuration setup
     3) Audit, balance, & control
     4) Transport data from source to edge node
     5) Convert data formats
     6) Copy/move data to Hadoop
     7) Preprocessing service
     8) Technical data validation (TDQ)
     9) Source delta processing
     10) Persist raw data
     11) Catalog raw data
     12) Profile data
     13) Cross-file analysis
     14) Causality analysis
     15) Target load service
     16) Business data validation
     17) Merge / match
     18) Manage keys
     19) Reference data lookup
     20) Transform data
     21) Enrich data
     22) Archive
     23) Purge
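Service 9, source delta processing, can be sketched as a comparison of two keyed snapshots of a source extract, so that only changed records move downstream. Comparing snapshots is an assumption (log-based change data capture is another option), and both function names are hypothetical.

```python
def source_delta(previous, current):
    """Compare two snapshots keyed by a business key.

    previous, current: {key: record} dictionaries (e.g. two daily extracts).
    Returns (inserts, updates, deletes), each a {key: record} dictionary.
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    return inserts, updates, deletes


def apply_delta(target, inserts, updates, deletes):
    """Apply a delta to a target table (a dict here) and return it."""
    target.update(inserts)
    target.update(updates)
    for key in deletes:
        target.pop(key, None)
    return target
```

Applying the delta to a copy of the previous snapshot reproduces the current one, which is a cheap audit/balance check in the spirit of service 3.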
  17. Loading the Geographic Hierarchy to HBase. Each level of the hierarchy is loaded into its own HBase table (L0 to L3). Each table has a Data column family (Name, Desc, History, …) and a Polygons column family (P1, P2, P3, …, PN), and row keys join the path from the root with colons:
     • Table L0: London ("London is…", "Great Britain…"), Paris ("Paris is…", "Continental Europe…")
     • Table L1: London:Central, London:North, Paris:Central, …
     • Table L2: London:Central:Kensington, London:Central:Buckingham, …
     • Table L3: London:Central:Kensington:Notting Barns, …
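A sketch of how one hierarchy node could be turned into an HBase put. The table naming (L0 to L3) and the colon-joined row keys follow the slide; the column qualifiers and the polygon serialization are assumptions. Because HBase sorts rows lexicographically by key, keys that start with the city name keep each city's sub-regions contiguous, so a prefix scan retrieves them together.

```python
def hbase_put_for_node(path, desc, polygons):
    """Build the (table, row key, cells) triple for one hierarchy node.

    path: names from the root down to the node, e.g. ["London", "Central", "Kensington"].
    polygons: list of polygons, each a list of (lat, lon) vertices.
    Column qualifiers ("Data:Name", "Data:Desc", "Polygons:P1", ...) are assumptions.
    """
    table = f"L{len(path) - 1}"          # one table per hierarchy level
    row_key = ":".join(path).encode()     # e.g. b"London:Central:Kensington"
    cells = {b"Data:Name": path[-1].encode(), b"Data:Desc": desc.encode()}
    for i, polygon in enumerate(polygons, start=1):
        # Serialize in the "((lat lon, lat lon, ...))" form shown on slide 9.
        vertices = ", ".join(f"{lat} {lon}" for lat, lon in polygon)
        cells[f"Polygons:P{i}".encode()] = f"(({vertices}))".encode()
    return table, row_key, cells
```

The resulting triple could be handed to any HBase client; with happybase, for example, `connection.table(table).put(row_key, cells)`.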
  18. Ingesting External Data via Flume. On the Metro Pulse Global Repository, a Flume server runs one agent per data source (tweets, weather, news, …), pulling data from the Internet into the Global City Repository. The same set of agents runs on the edge node of each Metro Pulse Analytical Workbench, writing to that workbench's Hadoop data nodes (HDFS).
     • Agents can be optimally configured according to the characteristics of each data source.
     • Each agent writes to a different HDFS folder: no conflicts, good for parallel execution.
     • Each source is captured as an HBase column family.
     • One data source per agent: easy to add new sources.
     • Easy to broadcast the same data to multiple customers, and easy to add new customers.
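The one-source-per-agent layout can be captured by generating one Flume properties file per data source. This sketch assumes an HTTP source on a given port, a memory channel, and an HDFS sink; the agent, source, and folder names are hypothetical, and a real feed (such as Twitter) would need its own source type.

```python
def flume_agent_config(agent, source, port, hdfs_root):
    """Render a Flume properties file for one agent handling one data source."""
    src, ch, sink = f"{source}_src", f"{source}_ch", f"{source}_sink"
    lines = [
        f"{agent}.sources = {src}",
        f"{agent}.channels = {ch}",
        f"{agent}.sinks = {sink}",
        # Assumed HTTP source; swap in the source type that matches the real feed.
        f"{agent}.sources.{src}.type = http",
        f"{agent}.sources.{src}.port = {port}",
        f"{agent}.sources.{src}.channels = {ch}",
        f"{agent}.channels.{ch}.type = memory",
        f"{agent}.sinks.{sink}.type = hdfs",
        f"{agent}.sinks.{sink}.channel = {ch}",
        # Each source gets its own folder, so agents never write to the same files.
        f"{agent}.sinks.{sink}.hdfs.path = {hdfs_root}/{source}/%Y-%m-%d",
        f"{agent}.sinks.{sink}.hdfs.fileType = DataStream",
        # The %Y-%m-%d escapes need a timestamp; use the agent's local clock.
        f"{agent}.sinks.{sink}.hdfs.useLocalTimeStamp = true",
    ]
    return "\n".join(lines) + "\n"
```

Each generated file would then run as its own agent, e.g. `flume-ng agent --conf conf --conf-file tweets.properties --name tweets_agent`, so adding a source means adding a file rather than editing a shared one.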
  19. Performance Layer. Views such as V_Transaction, V_Level_Entity, V_Polygon_Entity, V_Size_of_Prize, … are stored in Cassandra, and a cache manager keeps frequently requested views (e.g., V_Level_Entity and V_Size_of_Prize) in Redis. When the API calls Get_View("XYZ"), the cache manager proceeds as follows:
     • If "XYZ" is in Redis, return "XYZ".
     • Otherwise, get "XYZ" from Cassandra, return it to the API, and load it into Redis.
   Eviction policy: least recently used. Redis gives sub-second latency and high throughput for dashboards (small files); Cassandra gives high throughput for large files (DaaS).
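The cache manager logic is the cache-aside pattern with LRU eviction. In this in-process sketch an `OrderedDict` stands in for Redis and a plain dictionary stands in for Cassandra (both assumptions); in a real deployment Redis itself would enforce the eviction policy, e.g. with `maxmemory-policy allkeys-lru`.

```python
from collections import OrderedDict


class CacheManager:
    """Cache-aside view lookup: an LRU cache (Redis stand-in) in front of a
    backing store (Cassandra stand-in)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # e.g. {"V_Size_of_Prize": ...}
        self.capacity = capacity
        self.cache = OrderedDict()
        self.misses = 0

    def get_view(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            return self.cache[name]
        self.misses += 1
        value = self.backing_store[name]  # cache miss: read from the backing store
        self.cache[name] = value          # load into the cache
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used view
        return value
```

The sketch loads the view into the cache before returning it; the slide returns first and loads afterwards, which has the same effect for the caller.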
  20. Sample of Visualization Objects in D3.js (figure).