Harnessing Big Data in Real-Time

Published in: Technology

Harnessing Big Data in Real-Time

  1. Harnessing Big Data in Real-Time. John Schitka, SAP, Big Data
  2. Information Processing Is at a Critical Inflection Point
      Real-time business requirements:
      • Sales: real-time bonus calculations for consumers
      • Customer Service: customer overdue credit calculation by product area
      • Finance and Operations: iterative period-end closing with new postings into accounts constantly
      • Manufacturing: new ATP strategies; MRP run for individual ATP check and instant re-planning
      Impact on business: slow response times, usability challenges, lack of adaptability.
      Impact on IT: high latency, complexity, high cost of solutions.
      Diagram labels (data stores): Transactional Datastore, Data Warehouse, Sensor Data, Mobile Data, Archives, Social & Text, Geo-Spatial.
      Diagram labels (workloads): Location Intelligence, Order Processing, Operational Reporting, RT Risk & Fraud, Trend Analysis, Sentiment Analytics, Predictive Analytics, Pattern Recognition.
      Diagram labels (processing steps): Analyze, ETL, Staging, Collect, Clean (Data Quality), Transact, Aggregate, Summarize, Communicate, Monitor, Predict, Planning.
      Point optimization is no longer enough for real-time business.
      © 2014 SAP AG or an SAP affiliate company. All rights reserved.
  3. H2: the Power of HANA and Hadoop
      Instant platform: SAP HANA. Infinite store: Hadoop.
      Real-time predictive analytics: SAP Analytical Applications, BI and Infinite Insight.
      Combine INSTANT results with INFINITE store.
  4. Open Hadoop Strategy
      Stages: Acquire, Accelerate, Analyze, Visualize and Act.
      Diagram labels: Big Data Science Services; SAP HANA Platform; SAP Data Services; Data Connectors; Sybase IQ; SAP HANA; Geospatial; Predictive; Text Analysis; SQL; XS Engine; R; Industry/LOB Apps; Custom Apps; Analytic Apps.
  5. SAP Big Data Solutions Architecture
      SAP HANA platform: Processing Engine; Application Function Libraries & Data Models; Database Services (OLTP + OLAP); Extended Application Services; Integration Services; Unified Administration; Application Development.
      Applications: Custom Apps, Mobile Apps, Big Data Apps, ERP Apps, SAP Analytics.
      Data ingestion and acquisition: SAP SLT / Rep Server, SAP Data Services, SAP SQL Anywhere, SAP ESP, Hadoop Adapter.
      Data sources: Web / Sensor, Call Center, SAP ERP, BW, other data sources.
      Connected stores: SAP IQ (Smart Data Access, transfer datasets); Hadoop and Hive.
      Hadoop role: large-scale data capture, generating analytical datasets, training and validating predictive models.
  6. SAP in the Modern Data Architecture
      Sources: OLTP, ERP and CRM systems; documents and emails; web logs and click streams; social networks; machine-generated data; sensor data; geo-location data.
      Data systems: EDW, MPP, RDBMS, HANA; Hadoop layers for Governance & Integration, Security, Operations, Data Access, Data Management.
      Applications: statistical analysis; BI / reporting and ad hoc analysis; interactive web and mobile applications; enterprise applications.
      Operations tools: provision, manage and monitor. Dev and data tools: build and test.
  7. Explosion of Biological Health Information Has Surpassed Human Cognitive Capacity
      Image sizes: 1 GB for a 3D CT scan; 150 MB for a 3D MRI; 30 MB for an X-ray; 120 MB for mammograms. Medical image archives grow 20-40% annually.
      Genomic data: 800 MB per genome; 300 TB+ for 200 cancer genomes; 200 TB+ for all known variants; 15 PB+ in the Broad & Sanger DB.
      [Chart: facts per decision (axis ticks 5, 10, 100, 1000) from 1990 to 2020, with layers for decisions by clinical phenotype, structural genetics, functional genetics, and proteomics and other effector molecules.]
      Source: The Strategic Application of Information Technology in Health Care Organizations (Third Edition, 2011) by John P. Glaser and Claudia Salzberg.
  8. Example: Data Analysis of Cancer Genome. Goal: analytics for personalized medicine.
  9. MKI Design Decisions to Improve Speed of Processing. Use Hadoop for pre-processing; SAP HANA for advanced analytics.
  10. Results: Deliver Personalized Results More Quickly (SAP HANA + Hadoop + R for advanced analytics)
      "Genomic DNA analysis in real-time will transform how we enable comprehensive patient care to fight against cancer. SAP HANA will be the mission critical and reliable data platform to make real-time cancer analytics into a reality. Separately, our internal technical comparison demonstrated that SAP HANA outperforms a traditional disk-based system by factor of 408,000 when performing other types of data analysis."
      Yukihisa Kato, Director & Executive Officer, CTO, Research and Development Center, MITSUI KNOWLEDGE INDUSTRY CO., LTD.
      Benefits:
      • Accelerated predictive and correlation analysis with in-memory processing
      • Reduced time to detect variant DNA
      • Optimized treatment plans based on DNA mutations
      Headline figures: 408,000x faster than traditional disk-based systems in the PoC; 216x faster DNA analysis results, from 2-3 days to 20 minutes.
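The 216x figure on slide 10 can be checked against the quoted run times. The sketch below is only arithmetic; it assumes the comparison uses the 3-day upper bound of the "2-3 days" range, since that is the bound that reproduces 216 (the 2-day bound gives 144).

```python
MINUTES_PER_DAY = 24 * 60

def speedup(before_days: float, after_minutes: float) -> float:
    """Return how many times shorter the new run time is than the old one."""
    return before_days * MINUTES_PER_DAY / after_minutes

# Both ends of the quoted "2-3 days" range, compared with 20 minutes.
low = speedup(2, 20)
high = speedup(3, 20)
print(low, high)
```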
  11. Generic Pattern 1: Machine Data Insight (SAP HANA Platform for Big Data)
      Transform high-volume, high-velocity data into high-value data. Enable real-time analytics.
      Data sources: SAP enterprise data; non-SAP enterprise data; mobile data; machine data (sensors, SCADA, machine logs, etc.).
      HANA in-memory engines: Transactional, Analytical, Planning & Simulation, Graph, Predictive Analysis, Spatial.
      Tiered storage (time critical to less time sensitive): HANA in memory; extended storage (SAP IQ); large low-cost data platform (Hadoop) via Smart Data Access for historical data, offline batch processes, model training, etc.
      Data movement: stream processing, real-time replication, synchronization. Output: dashboards and reporting in real time.
      Use cases:
      • Energy optimization
      • Predictive maintenance
      • Remote asset management
      • Supply/demand forecast
      • Inventory management
      • Route optimization
      Prototypical machine data case: a real-time data stream (billions of events/day); millions of events/day correlated with enterprise data; real-time operations, analysis and actions enabled.
  12. Generic Pattern 2: Customer Insight (SAP HANA Platform for Big Data)
      Process high-volume, high-variety, high-velocity data, offline and in real time, enabling real-time analytics and actionable insight.
      Data sources: SAP enterprise data; non-SAP enterprise data; click-stream data; social data.
      HANA in-memory engines: Transactional, Analytical, Planning & Simulation, Graph, Predictive Analysis, Spatial.
      Tiered storage (time critical to less time sensitive): HANA in memory; extended storage (SAP IQ); large low-cost data platform (Hadoop) via Smart Data Access for historical data, offline batch processes, model training, etc.
      Data movement: stream processing, real-time replication, synchronization. Output: dashboards and reporting in real time; real-time offers informed by historical data.
      Use cases:
      • Customer behavior
      • Customer segmentation
      • Customer loyalty
      • Customer churn
      • Online consumer habits
      • Campaign performance
      • Predictive maintenance
      Prototypical customer behavior analysis case: terabytes of data/month; millions of events/day correlated with enterprise data; actionable insight for targeted applications.
  13. HANA Integration with SAP Sybase IQ and Hadoop
      Real-time insights by managing and analyzing ALL enterprise data: fluid integration.
      SAP HANA (in memory): HANA table, HANA extended table.
      SAP Sybase IQ (petascale): IQ table, with policy-based automatic data movement from HANA.
      Hadoop (massively parallel): Hive and HDFS/MapReduce tables.
      Integration paths: Smart Data Access (SDA), ODBC, Java UDF, ETL.
  14. SAP HANA and Hadoop Integration (ETL) with SAP Data Services
      • GUI for design and development
      • High-performance reading from and loading into Hadoop
      • Extended optimizer: HiveQL and Pig aware
      • MapReduce pushdown
      • Text Data Processing (entity extraction)
  15. Loading Data from Hadoop into Your Database (SAP Data Services)
      1. Based on the target, SAP Data Services translates queries into Hive Query Language (HQL) for Hive, or into a Pig script for HDFS.
      2. Hive/Pig converts the queries to MapReduce jobs.
      3. Result data files are generated on HDFS.
      4. SAP Data Services uses multiple threads to process the data from Hive/Pig.
      5. Optional transforms: data quality operations.
      6. Results are loaded into the database.
      Diagram labels: Job Process; HQL Generator; Pig Generator; Hive result set; HDFS FileReader; Process Data; Database Loader; ODBC/JDBC driver; MapReduce (M/R) jobs that join tables, order and filter data, apply functions, and perform text data processing.
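Steps 4 to 6 on slide 15 (read result files with multiple threads, optionally transform, load into a database) can be sketched in plain Python. This is an illustration of the pattern only, not SAP Data Services code: the tab-separated file layout, the `product_qty` table, the `clean` rule, and SQLite as the target database are all assumptions made for the example.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def read_part(path: Path) -> list[tuple[str, int]]:
    """Step 4: parse one result file (assumed tab-separated: product, quantity)."""
    rows = []
    for line in path.read_text().splitlines():
        product, qty = line.split("\t")
        rows.append((product, int(qty)))
    return rows

def clean(row: tuple[str, int]) -> tuple[str, int]:
    """Step 5 (optional transform): a trivial, hypothetical data-quality rule."""
    product, qty = row
    return (product.strip().upper(), qty)

def load(part_files, conn) -> None:
    """Steps 4-6: read parts in parallel, transform, then bulk-load."""
    conn.execute("CREATE TABLE IF NOT EXISTS product_qty (product TEXT, qty INTEGER)")
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Files are parsed in worker threads; inserts run on the calling thread.
        for rows in pool.map(read_part, part_files):
            conn.executemany(
                "INSERT INTO product_qty VALUES (?, ?)", [clean(r) for r in rows]
            )
    conn.commit()

def demo():
    """Write two small stand-in result files, load them, and aggregate."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for i, text in enumerate(["bolt\t3\nnut \t5", "bolt\t2"]):
            part = Path(tmp) / f"part-{i:05d}"
            part.write_text(text)
            parts.append(part)
        conn = sqlite3.connect(":memory:")
        load(parts, conn)
        return conn.execute(
            "SELECT product, SUM(qty) FROM product_qty GROUP BY product ORDER BY product"
        ).fetchall()

print(demo())
```

Keeping the inserts on the calling thread avoids sharing one SQLite connection across threads; only the file parsing is parallel.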
  16. Rapid Data Provisioning with Data Virtualization
      Without virtualization: the application issues SELECT from DB(x), SELECT from DB(y) and SELECT from Hive separately, merges the results itself, and uses a separate modeling environment per source.
      With SAP HANA virtual tables: the application runs one SQL script in a single modeling and development environment; HANA handles data-type mapping and compensates for functions missing in the remote database.
      Currently supported databases: SAP ASE, Oracle 12c, MS SQL Server v11, SAP IQ, Hadoop/Hive, Teradata.
  17. SAP HANA Smart Data Access Capability
      Data virtualization for on-premise and hybrid cloud environments.
      Benefits:
      • Enables access to remote data just like a "local" table
      • Provides SAP HANA to SAP HANA queries
      • Smart query processing, including query decomposition with predicate push-down and functional compensation
      • Supports data-location-agnostic development
      • No special syntax to access heterogeneous data sources
      • Non-disruptive evolution
      Heterogeneous data sources:
      • SAP HANA to Hadoop (Hive)
      • SAP HANA to Teradata
      • SAP HANA to SAP HANA
      • SAP HANA to SAP ASE, Oracle 12c, Microsoft SQL Server v11
      • SAP HANA to SAP IQ
      Diagram: SAP HANA (transactional + analytical) holds HANA tables and virtual tables that point to Teradata, Hadoop, SAP HANA, ASE, IQ, MS SQL Server and Oracle.
  18. Smart Data Access with Virtual Tables
      Location transparency of remote data is enabled by creating a local virtual table that maps to an existing object at the remote data source.
      Example DDL:
          CREATE VIRTUAL TABLE my_schema.my_table AT remote_source.catalog.schema.object
      The remote table's data types and column definitions are used to create the virtual table. When the virtual table is created, the HANA system catalog is updated with the local column names and data types, the remote names and data types, index information, and so on.
      Diagram: in SAP HANA, a virtual table sits alongside regular tables and maps to a remote object (table or view) listed in the remote system's catalog.
  19. Example: Smart Data Access. Steps for Creating and Using Virtual Tables
      1. Create the table in Hive.
      2. On SAP HANA, create a DSN, e.g. "hive1".
      3. With SAP HANA Studio or a DDL command, create a remote source:
          CREATE REMOTE SOURCE HIVE1 ADAPTER "hiveodbc" CONFIGURATION 'DSN=hive1' WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=dftest;password=dftest';
      4. Using a DDL command, create a virtual table for Hive:
          CREATE VIRTUAL TABLE "HIVE1_PRODUCT" AT "HIVE1"."default"."default"."product";
      5. Execute a query on the virtual table:
          SELECT * FROM HIVE1_PRODUCT;
      6. Clean up by dropping the remote source; CASCADE also drops dependent objects such as the virtual table:
          DROP REMOTE SOURCE HIVE1 CASCADE;
  20. Query Federation Between SAP HANA and Multiple Data Stores (Including Hadoop)
      Flow: users run a query from BI and analytics software from SAP; query federation splits the query, each data store executes its part, and the results are consolidated.
      Data stores: in-memory analytic engine; disk-based data warehouse analytic engine (SAP Sybase IQ); and/or Hadoop.
      Hadoop components: data storage (Hadoop Distributed File System), job management, computation engines, Hive, HBase, and others.
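The split, execute and consolidate flow on slide 20 can be sketched with three in-memory SQLite databases standing in for the in-memory store, the disk-based warehouse, and Hadoop. This is a toy model under stated assumptions, not how SAP HANA implements federation: the `sales` table, the store names, and the `federated_total` helper are hypothetical. The filter is sent to each store so that only a partial sum travels back, which is the idea behind predicate push-down.

```python
import sqlite3

def make_store(rows):
    """Create an in-memory database standing in for one remote data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def federated_total(stores, region):
    """Split: send the same filtered aggregate to every store.
    Execute: each store computes its partial sum locally.
    Consolidate: add the partial sums into one result."""
    partials = []
    for conn in stores:
        (partial,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
            (region,),
        ).fetchone()
        partials.append(partial)
    return sum(partials)

in_memory = make_store([("EMEA", 10), ("APJ", 7)])
warehouse = make_store([("EMEA", 5)])
hadoop = make_store([("EMEA", 1), ("EMEA", 2), ("APJ", 4)])

print(federated_total([in_memory, warehouse, hadoop], "EMEA"))
```

The caller sees a single function call and a single answer, mirroring the "one SQL script" idea: where each row physically lives is hidden behind the federation layer.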
  21. Example: UDF to Invoke a MapReduce Job in Hadoop
      Creating a UDF that returns results as a table, for example:
          CREATE VIRTUAL FUNCTION word_count() RETURNS TABLE (word NVARCHAR(60), count INT) PACKAGE "SYSTEM"."WORD_COUNT" CONFIGURATION 'sap.hana.hadoop.mapper=com.sap.hadoop.examples.WordCountMapper;sap.hana.hadoop.reducer=com.sap.hadoop.examples.WordCountReducer;sap.hana.hadoop.input=/path/to/input' AT 'HS1'
      When the UDF is created, we specify the package WORD_COUNT, a JAR file that contains the Java MapReduce program that calculates the word count.
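For readers unfamiliar with what a word-count MapReduce job does, here is a minimal single-process sketch of the map, shuffle and reduce phases. It is not the Java program packaged in WORD_COUNT on slide 21; the function names are illustrative, and lower-casing words is an assumption made for the example.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return (word, sum(counts))

def word_count(lines):
    """Run map, shuffle and reduce over the input lines; return sorted rows."""
    pairs = chain.from_iterable(mapper(line) for line in lines)
    return sorted(reducer(word, counts) for word, counts in shuffle(pairs).items())

print(word_count(["big data", "Big insight"]))
```

The result has the same shape as the table the UDF declares: one row per word, with a word column and a count column.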
  22. SAP/Hadoop ETL Rationalization (Loading Data Faster)
      ► Low-latency ingestion of data from operational systems
      ► Tiered storage model offers partitioning into time-critical and less time-sensitive data during ingestion
      ► On-the-fly transformation for time-critical data can be performed in memory using HANA
      ► Off-load pre-processing of data to the Hadoop platform
      Diagram callouts: parallel load of valuable data; load, then transform at scale (MR, Pig, Java); Falcon.
      Diagram labels: Hortonworks Data Platform (data reservoir); SAP HANA (real-time analytics, interactive data exploration and application platform) with Federated Smart Data Access, OLAP Engine, Predictive Engine, Spatial Engine, and Application Logic & Rendering (XS); data orchestration services; batch feeds from transactional systems, databases, flat files and batch data feeds.
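The tiered-storage bullet on slide 22 describes partitioning data into time-critical and less time-sensitive sets during ingestion. A minimal sketch of such a routing rule follows. The 24-hour window, the record layout, and the `route` function are assumptions for illustration; the slides do not say how the split is decided.

```python
from datetime import datetime, timedelta

def route(records, now, hot_window=timedelta(hours=24)):
    """Split records into (time_critical, less_time_sensitive) by event age.

    Records no older than hot_window would go to the in-memory tier;
    older ones would go to the low-cost store. The window is hypothetical.
    """
    hot, cold = [], []
    for record in records:
        target = hot if now - record["ts"] <= hot_window else cold
        target.append(record)
    return hot, cold

now = datetime(2014, 6, 1, 12, 0)
records = [
    {"id": 1, "ts": now - timedelta(minutes=5)},
    {"id": 2, "ts": now - timedelta(days=30)},
    {"id": 3, "ts": now - timedelta(hours=23)},
]
hot, cold = route(records, now)
print([r["id"] for r in hot], [r["id"] for r in cold])
```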
  23. Big Data Interactive Data Exploration
      ► Interactive high-performance analytics and visualization
      ► Agile modeling and shorter turnaround on reports and dashboards
      ► Exploration of data in memory and interactively with Hadoop
      ► Uniform data science experience on in-memory and multi-terabyte data sets
      Diagram labels: Hortonworks Data Platform with Hive (interactive SQL), HCatalog (late-binding schemas), and science through scalable stats and analysis (SAS, ML, custom); SAP HANA (real-time analytics, interactive data exploration and application platform) with Federated Smart Data Access, OLAP Engine, Predictive Engine, Spatial Engine, and Application Logic & Rendering (XS); visualization and reporting; data orchestration services; batch feeds from transactional systems, databases, flat files and batch data feeds.
  24. SAP/Hadoop Real-Time Stream Processing
      ► Real-time ingestion from operational systems, sensors and smart devices
      ► Pattern detection, anomaly detection and streaming analytics on data in flight
      ► Scalable storage for offline model tuning and data science
      ► Instant visibility across operations and corporate functions
      Diagram callouts: app events and mobile location data flow from mobile and online apps into the platform for analysis; Storm.
      Real-time data acquisition: SAP ESP, SAP Replication Server, SAP SLT (streaming data events; replicated data tables from transactional applications).
      Diagram labels: Hortonworks Data Platform (data reservoir); SAP HANA (real-time analytics, interactive data exploration and application platform) with Federated Smart Data Access, OLAP Engine, Predictive Engine, Spatial Engine, and Application Logic & Rendering (XS); visualization and reporting; data orchestration services; batch feeds from transactional systems, databases, flat files and batch data feeds.
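Slide 24 mentions anomaly detection on data in flight. One simple way to do this, shown here only as an illustration and not as the method used by SAP ESP or Storm, is a running z-score: keep a running mean and variance with Welford's online algorithm and flag a reading that lies more than a chosen number of standard deviations from the mean. The class name, the threshold of 3, and the warm-up of 5 readings are assumptions.

```python
import math

class StreamingAnomalyDetector:
    """Flag readings far from the running mean, one event at a time."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # allowed distance in standard deviations
        self.warmup = warmup        # readings to observe before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, x):
        """Return True if x is anomalous versus earlier readings, then fold x in."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) > self.threshold * std
        # Welford's update: constant memory, no stored history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = StreamingAnomalyDetector()
readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.0]
flags = [detector.observe(r) for r in readings]
print(flags)
```

Note that a flagged reading is still folded into the statistics, which widens the tolerance afterwards; a production detector might exclude or down-weight flagged values instead.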
  25. SAP/Hadoop Real-Time Insights and Models
      Capabilities:
      ► Real-time data ingestion
      ► Real-time response
      ► Closed-loop analytics
      Applications:
      ► Real-time recommendation applications
      ► Inline predictive analytics for transactional applications
      ► Smart mobile applications
      Real-time data acquisition: SAP ESP, SAP Replication Server, SAP SLT (streaming data events; replicated data tables from transactional applications).
      Diagram labels: Hortonworks Data Platform (data reservoir) with Storm; mobile apps and online apps; SAP HANA (real-time analytics, interactive data exploration and application platform) with Federated Smart Data Access, OLAP Engine, Predictive Engine, Spatial Engine, and Application Logic & Rendering (XS); visualization and reporting; data orchestration services; batch feeds from transactional systems, databases, flat files and batch data feeds.
  26. Customer Examples
      • Gain competitive advantage by becoming a solution provider rather than an equipment manufacturer. Requires predictive analytics and algorithms to forecast equipment health.
      • Impact gross transaction value: optimize offerings from sellers with buyer demand in the eBay economy by finding signals within 50+ PB of noise daily.
  27. SAP Big Data Practice: Refining Data into Industry Insights
      • Leading experts in 26 industries and 12 lines of business
      • A global data science team that knows how to turn your data into relevant insights
      • Design Thinking experts trained to help you see new opportunities in your business
      • Consulting and services with the experience to make your project successful
  28. Data Science: Delivering on Your Business Imperatives. Every company deserves a "data scientist". Achieve breakthrough results for your top business priorities with data science.
  29. SAP Lumira: Visualizing Big Data (Self-Service for Analysts)
      • Unleash analyst creativity: provides the freedom to understand your data, personalize it, and create beautiful content
      • Download and install on your desktop in less than 5 minutes
      • Insight from many data sources: combine, manipulate and enrich data to apply it to your business scenarios
      • Self-service visualizations and analytics to tell your story
      • Optimized for SAP HANA for real-time analysis of detailed data
  30. Thank you. Contact information: John Schitka, john.schitka@sap.com, @johnschitka, www.sap.com/bigdata, facebook.com/sapanalytics, twitter.com/#!/@sapinmemory
