SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)

9,420 views

Published on

In this presentation I argue that the future of data management may see a split between (1) real-time in-memory systems such as SAP HANA for most enterprise workloads (2) disk-based free and open-source Apache Hadoop for certain specialized big data uses.

The presentation starts with a definition of what is intended by the term big data, then talks about SAP HANA and Apache Hadoop from the perspective of suitability for enterprise use with a special concentration on Hadoop. (The basics of SAP HANA were covered in the immediately preceding session). This is followed by a description of currently available SAP support for Apache Hadoop in SAP BI 4.0 and SAP Data Services / EIM. Due to time constraints I did not discuss Apache Hadoop support built into Sybase IQ.

Published in: Technology

SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)

  1. 1. HANA & Hadoop forBig Data ManagementWill Gardella, Senior DirectorSAP Applied Research - Big Data Programwilliam.gardella@sap.com
  2. 2. Safe Harbor StatementThe information in this presentation is confidential and proprietary to SAP and may not be disclosed withoutthe permission of SAP. This presentation is not subject to your license agreement or any other service orsubscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in thisdocument or any related presentation, or to develop or release any functionality mentioned therein. Thisdocument, or any related presentation and SAPs strategy and possible future developments, products and orplatforms directions and functionality are all subject to change and may be changed by SAP at any time forany reason without notice. The information on this document is not a commitment, promise or legal obligationto deliver any material, code or functionality. This document is provided without a warranty of any kind, eitherexpress or implied, including but not limited to, the implied warranties of merchantability, fitness for aparticular purpose, or non-infringement. This document is for informational purposes and may not beincorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, exceptif such damages were caused by SAP intentionally or grossly negligent.All forward-looking statements are subject to various risks and uncertainties that could cause actual resultsto differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in makingpurchasing decisions. © 2012 SAP AG. All rights reserved. 2
  3. 3. SAP REAL-TIME DATA PLATFORMA GAME-CHANGER SAP real time data platform Open APIs and Protocols Federated Access Common Landscape Common Design & Transactional In-Memory Analytics Environment Environment Mobile Data Modelling Data Data EDW Data Management Management Management Management Information Management & Real-Time Data Movement
  4. 4. Traditional data management approaches are changing 1980s / 1990s Today 100101 011010 100101 ?© 2012 SAP AG. All rights reserved. 4
  5. 5. What is Big Data?The 3 + 1 V’s Data Volume in Whole WorldVolume Structured DataExplosion in the Location- Automobilesamount of data based Data Machine DataVarietyMultiple data formats;non-structured data boom Mobile Click Stream 7.9 ! Zettabytes IMHO, it’s great! Text DataVelocityFast collection, processing Point of Sale Socialand consumption Network Customer Data RFIDValue 1.8 Zettabytes Smart MeterKeep everything, not onlyhigh value data 2011 2015 Future© 2012 SAP AG. All rights reserved. 5
  6. 6. HANA for Big DataKey Characteristics: in-memory, row & column, real timeHow’s HANA for Big Data?Volume: Billions of recordsVariety: Text processing & searchVelocity: Real timeValue: High value data© 2012 SAP AG. All rights reserved. 6
  7. 7. Data management today: systems optimize for speed or capacity Most enterprise systems today not too large for real time Speed Near 0 marginal storage cost Size© 2012 SAP AG. All rights reserved. 7
  8. 8. New storage and processing techniques required Real-time queries High value data Targeted data read In-memory Columnar Row Distributed Batch queries Flexible data sets All data read© 2012 SAP AG. All rights reserved. 8
  9. 9. Building an IT landscape for Big Data Business Intelligence Insight Discovery Real-time OperationsPresent BI Tools Analytic Tools, Custom Data Business Analysis Applications Applications & ProcessesProcess Information Views Data Mining / Predictive Analysis Store EDW / Data Marts Analytic Data Warehouse Real-time Database ETL, Data Quality Text Analysis Real-time Loading Ingest© 2012 SAP AG. All rights reserved. 9
  10. 10. Hadoop
  11. 11. What is Apache Hadoop?Apache Hadoop is open source software that enables reliable, scalable, distributed computing onclusters of inexpensive serversReliable Software is fault tolerant, it expects and handles hardware and software failuresScalable Designed for massive scale of processors, memory, and local attached storageDistributed Handles replication. Offers massively parallel programming model, MapReduceHadoop framework handles: partitioning, scheduling, dispatch, execution, communication, failurehandling, monitoring, reporting and more© 2012 SAP AG. All rights reserved. 11
  12. 12. The Apache Hadoop technology familylogical view* Non-Relational DB Scripting Machine Learning Fine-grained data handling Hive HBase Pig Mahout “Data warehouse” that provides SQL Column oriented, schema-less, distributed Platform for manipulating and Machine learning libraries for interface. Data structure is projected ad database modeled after Google’s analyzing large data sets. recommendations, clustering, hoc onto unstructured underlying data BigTable. Random realtime read/write Scripting language for analysts classification and itemsets MapReduce Hadoop Common  Parallel programming  Large block data handling HDFS MapReduce (e.g. 64MB) Distributes & replicates data across Distributes & monitors tasks, restarts machines failed work * For simplicity, mappings to servers is omitted© 2012 SAP AG. All rights reserved. 12
  13. 13. What does Hadoop bring to the table?Cost efficient data storage and processing for large volumes of structured, semi-structured, andunstructured data such as web logs, machine data, text data, call data records (CDRs), audio, video dataBatch ProcessingWhere fast response times are less critical than reliability and scalabilityComplex Information ProcessingEnable heavily recursive algorithms, machine learning, & queries that cannot be easily expressed in SQLLow Value Data ArchiveData stays available, though access is slowerPost-hoc AnalysisMine raw data that is either schema-less or where schema changes over time© 2012 SAP AG. All rights reserved. 13
  14. 14. Example: Retail Point of Sales Demo Scenario http://youtu.be/HmmPje38e1k SAP BusinessObjects Explorer 18 Million Visitor Session Records 9 TB Web Logs 1.1 Billion POS Records HANA Hadoop© 2012 SAP AG. All rights reserved. 14
  15. 15. Apache Hadoop bottom lineStrengths Weaknesses+ Huge data volumes - Not efficient at small scale+ Unstructured data - Real time is best case challenging, typically not possible+ Reliable - Requires skilled engineering, operation and analyst resources+ Scalable - Hiring qualified talent+ Lowest cost - Less mature than SQL+ Open source - Governance+ No hardware lock in - Lack of user role support in access model+ Batch processing© 2012 SAP AG. All rights reserved. 15
  16. 16. Hadoop & Enterprise Information Management Business Intelligence Insight Discovery Real-time OperationsPresent BI Tools Analytic Tools, Custom Data Business Analysis Applications Applications & ProcessesProcess Information Views Data Mining / Predictive Analysis Store EDW / Data Marts Analytic Data Warehouse Real-time A Database ETL, Data Quality Text Analysis Real-time Loading Ingest© 2012 SAP AG. All rights reserved. 16
  17. 17. IT administrator: Extract, transform, and load data quickly Metadata Modeler Repository Server Open Hub Data Load Database Engine BW Designer and Management Console SAP Data Integrator SAP HANA or SAP Sybase IQ Any Source© 2012 SAP AG. All rights reserved. 17
  18. 18. Loading data from Hadoop into your database1. Based on target, SAP Data Services translates queries into: o Hive Query Language (HQL)  Hive SAP Data Services o Pig script  HDFS Job Process Database Process 5 Data Loader 62. Hive/Pig converts queries to Map/Reduce jobs 1 HQL Pig HDFS ODBC/3. Result data files are generated on the Generator Generator 4 FileReader JDBC driver HDFS system Text data processing4. SAP Data Services use multiple threads to process data from Hive/Pig M/R 2 M/R Hive5. Optional transforms: Data quality Result operations HDFS set Join tables, order / filter data, apply 3 functions6. Load results into database© 2012 SAP AG. All rights reserved. 18
  19. 19. SAP Data Services: Simple GUI build and run ETL process Push down to Hadoop through HiveSQL or Pig Scripts Bulk load into EDW© 2012 SAP AG. All rights reserved. 19
  20. 20. Processing text to extract relevant data from Hadoop1 Use SAP Data Services to extract:  Core entities (who, what, when, where, etc.)  Domains (voice of customer, public sector, enterprise, etc)  Sentiment analysis (strong positive, weak positive, neutral, weak negative, strong negative)2 Perform transformations  Map text into pre-defined structures  Cleanse, match, de-duplicate data3 Load results quickly into EDW  Map text to structure© 2012 SAP AG. All rights reserved. 20
  21. 21. Hadoop Analytics Business Intelligence B Insight Discovery Real-time OperationsPresent BI Tools Analytic Tools, Custom Data Business Analysis Applications Applications & ProcessesProcess Information Views Data Mining / Predictive Analysis Store EDW / Data Marts Analytic Data Warehouse Real-time Database ETL, Data Quality Text Analysis Real-time Loading Ingest© 2012 SAP AG. All rights reserved. 21
  22. 22. Business Analyst: Viewing data in Hadoop using GUI tools Automatically generates HiveQL statements that are executed on a Hadoop cluster© 2012 SAP AG. All rights reserved. 22
  23. 23. SAP BusinessObjects BI: Hadoop for Business Analysts Common user experience for all front-end tools Empower all analysts, enable all workflows Web Intelligence Crystal Reports Dashboards Explorer Best access method for each specific data source High performance, feature rich, secure Universe Access Direct Access All data sources Extract, define, & manipulate metadata HADOOP SAP BW Sybase SAP HANA 3rd party Files Web databases databases Services HIVE© 2012 SAP AG. All rights reserved. 23
  24. 24. Simple Tools for the BI administrator to define data access Build a Data Foundation against a Hive schema ■ Draw joins between Hive tables, aliases, derived tables, Hive views and Hive partitioned tables© 2012 SAP AG. All rights reserved. 24
  25. 25. Data Scientist: Flexibility is of the essence Chooses the variables that offer the most promise 100101 011010 100101 Chooses the best tool based on data mining technique Chooses best analysis engine based on algorithm & data© 2012 SAP AG. All rights reserved. 25
  26. 26. Data Scientist: Open integration is key M/R algorithmsIn-database In-database Analytics Analytics© 2012 SAP AG. All rights reserved. 26
  27. 27. Building an IT landscape for Big Data Business Intelligence Insight Discovery Real-time Operations Engage BI Tools Analytic Tools, Custom Data Business Analysis Applications Applications & Processes Process Information Views Data Mining / Predictive Analysis Store Real-time EDW / Data Marts Analytic Data Warehouse Database Ingest ETL, Data Quality Text Analysis Real-time Loading© 2012 SAP AG. All rights reserved. 27
  28. 28. Questions?william.gardella@sap.com
  29. 29. © 2012 SAP AG. All rights reserved.No part of this publication may be reproduced or transmitted in any form or for any purpose without the express Google App Engine, Google Apps, Google Checkout, Google Data API, Google Maps, Google Mobile Ads,permission of SAP AG. The information contained herein may be changed without prior notice. Google Mobile Updater, Google Mobile, Google Store, Google Sync, Google Updater, Google Voice, Google Mail, Gmail, YouTube, Dalvik and Android are trademarks or registered trademarks of Google Inc.Some software products marketed by SAP AG and its distributors contain proprietary software components ofother software vendors. INTERMEC is a registered trademark of Intermec Technologies Corporation.Microsoft, Windows, Excel, Outlook, PowerPoint, Silverlight, and Visual Studio are registered trademarks of Wi-Fi is a registered trademark of Wi-Fi Alliance.Microsoft Corporation. Bluetooth is a registered trademark of Bluetooth SIG Inc.IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System Motorola is a registered trademark of Motorola Trademark Holdings LLC.z10, z10, z/VM, z/OS, OS/390, zEnterprise, PowerVM, Power Architecture, Power Systems, POWER7,POWER6+, POWER6, POWER, PowerHA, pureScale, PowerPC, BladeCenter, System Storage, Storwize, Computop is a registered trademark of Computop Wirtschaftsinformatik GmbH.XIV, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, AIX, Intelligent Miner, WebSphere,Tivoli, Informix, and Smarter Planet are trademarks or registered trademarks of IBM Corporation. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned herein as well as their respective logos areLinux is the registered trademark of Linus Torvalds in the United States and other countries. trademarks or registered trademarks of SAP AG in Germany and other countries.Adobe, the Adobe logo, Acrobat, PostScript, and Reader are trademarks or registered trademarks of Adobe Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, WebSystems Incorporated in the United States and other countries. Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business ObjectsOracle and Java are registered trademarks of Oracle and its affiliates. is an SAP company.UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group. Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and servicesCitrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase Inc.registered trademarks of Citrix Systems Inc. Sybase is an SAP company.HTML, XML, XHTML, and W3C are trademarks or registered trademarks of W3C®, World Wide Web Crossgate, m@gic EDDY, B2B 360°, and B2B 360° Services are registered trademarks of Crossgate AGConsortium, Massachusetts Institute of Technology. in Germany and other countries. Crossgate is an SAP company.Apple, App Store, iBooks, iPad, iPhone, iPhoto, iPod, iTunes, Multi-Touch, Objective-C, Retina, Safari, Siri, All other product and service names mentioned are the trademarks of their respective companies. Dataand Xcode are trademarks or registered trademarks of Apple Inc. contained in this document serves informational purposes only. National product specifications may vary.IOS is a registered trademark of Cisco Systems Inc. The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG.RIM, BlackBerry, BBM, BlackBerry Curve, BlackBerry Bold, BlackBerry Pearl, BlackBerry Torch, BlackBerryStorm, BlackBerry Storm2, BlackBerry PlayBook, and BlackBerry App World are trademarks or registeredtrademarks of Research in Motion Limited. © 2012 SAP AG. All rights reserved. 29
  30. 30. © 2012 SAP AG. Alle Rechte vorbehalten.Weitergabe und Vervielfältigung dieser Publikation oder von Teilen daraus sind, zu welchem Zweck und in Google App Engine, Google Apps, Google Checkout, Google Data API, Google Maps, Google Mobile Ads,welcher Form auch immer, ohne die ausdrückliche schriftliche Genehmigung durch SAP AG nicht gestattet. Google Mobile Updater, Google Mobile, Google Store, Google Sync, Google Updater, Google Voice,In dieser Publikation enthaltene Informationen können ohne vorherige Ankündigung geändert werden. Google Mail, Gmail, YouTube, Dalvik und Android sind Marken oder eingetragene Marken von Google Inc.Die von SAP AG oder deren Vertriebsfirmen angebotenen Softwareprodukte können Softwarekomponenten INTERMEC ist eine eingetragene Marke der Intermec Technologies Corporation.auch anderer Softwarehersteller enthalten. Wi-Fi ist eine eingetragene Marke der Wi-Fi Alliance.Microsoft, Windows, Excel, Outlook, und PowerPoint sind eingetragene Marken der Microsoft Corporation. Bluetooth ist eine eingetragene Marke von Bluetooth SIG Inc.IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System Motorola ist eine eingetragene Marke von Motorola Trademark Holdings, LLC.z10, z10, z/VM, z/OS, OS/390, zEnterprise, PowerVM, Power Architecture, Power Systems, POWER7,POWER6+, POWER6, POWER, PowerHA, pureScale, PowerPC, BladeCenter, System Storage, Storwize, Computop ist eine eingetragene Marke der Computop Wirtschaftsinformatik GmbH.XIV, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, AIX, Intelligent Miner, WebSphere,Tivoli, Informix und Smarter Planet sind Marken oder eingetragene Marken der IBM Corporation. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA und weitere im Text erwähnte SAP-Produkte und Dienstleistungen sowie die entsprechendenLinux ist eine eingetragene Marke von Linus Torvalds in den USA und anderen Ländern. Logos sind Marken oder eingetragene Marken der SAP AG in Deutschland und anderen Ländern.Adobe, das Adobe-Logo, Acrobat, PostScript und Reader sind Marken oder eingetragene Marken von Business Objects und das Business-Objects-Logo, BusinessObjects, Crystal Reports, Crystal Decisions,Adobe Systems Incorporated in den USA und/oder anderen Ländern. Web Intelligence, Xcelsius und andere im Text erwähnte Business-Objects-Produkte und Dienstleistungen sowie die entsprechenden Logos sind Marken oder eingetragene Marken der Business Objects Software Ltd.Oracle und Java sind eingetragene Marken von Oracle und/oder ihrer Tochtergesellschaften. Business Objects ist ein Unternehmen der SAP AG.UNIX, X/Open, OSF/1 und Motif sind eingetragene Marken der Open Group. Sybase und Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere und weitere im Text erwähnte Sybase-Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame und MultiWin sind Marken oder Produkte und -Dienstleistungen sowie die entsprechenden Logos sind Marken oder eingetragene Marken dereingetragene Marken von Citrix Systems, Inc. Sybase Inc. Sybase ist ein Unternehmen der SAP AG.HTML, XML, XHTML und W3C sind Marken oder eingetragene Marken des W3C®, World Wide Web Crossgate, m@gic EDDY, B2B 360°, B2B 360° Services sind eingetragene Marken der Crossgate AG inConsortium, Massachusetts Institute of Technology. Deutschland und anderen Ländern. Crossgate ist ein Unternehmen der SAP AG.Apple, App Store, iBooks, iPad, iPhone, iPhoto, iPod, iTunes, Multi-Touch, Objective-C, Retina, Safari, Siri Alle anderen Namen von Produkten und Dienstleistungen sind Marken der jeweiligen Firmen. Die Angaben imund Xcode sind Marken oder eingetragene Marken der Apple Inc. Text sind unverbindlich und dienen lediglich zu Informationszwecken. Produkte können länderspezifische Unterschiede aufweisen.IOS ist eine eingetragene Marke von Cisco Systems Inc. Die in dieser Publikation enthaltene Information ist Eigentum der SAP. Weitergabe und Vervielfältigung dieserRIM, BlackBerry, BBM, BlackBerry Curve, BlackBerry Bold, BlackBerry Pearl, BlackBerry Torch, BlackBerry Publikation oder von Teilen daraus sind, zu welchem Zweck und in welcher Form auch immer, nur mitStorm, BlackBerry Storm2, BlackBerry PlayBook und BlackBerry App World sind Marken oder eingetragene ausdrücklicher schriftlicher Genehmigung durch SAP AG gestattet.Marken von Research in Motion Limited. © 2012 SAP AG. All rights reserved. 30

×