
SAP Data Hub – What is it, and what’s new? (Sefan Linders)

SAP Inside Track talk by Sefan Linders

Data Hub – What is it, and what’s new?
It’s a year after the launch of Data Hub, and the new 2.3 version has just been released. We’ll take a look at the new metadata features, discuss several use cases, and consider its position within the ever-growing data domain.

Published in: Business

  1. SAP HANA Data Management Suite (Public)
     Sefan Linders, Big Data Warehouse Architect, Customer Innovation & Enterprise Platform
     November 2018
  2. Legal disclaimer
     Public. © 2018 SAP SE or an SAP affiliate company. All rights reserved.
     The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP’s strategy and possible future developments, products, and platforms, directions, and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or noninfringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP’s willful misconduct or gross negligence. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
  3. What problem are we addressing?
     • Business users need to have all the data relevant to their decision, and they need to trust the security and accuracy of their data
     • Businesses need to harness the power of all their data (business and new data types) and to anticipate and influence business outcomes
     • Businesses need to provide all users with the right information in context at the right moment for the task at hand
  4. Today: Data sprawl, impossible to govern, security complexity
     Diagram labels: Supply Chain | Finance | HR | Manufacturing | Sales | Connected Assets | Third-party | Finance and Planning | Visualization Tools | Statistical Analytics | Spreadsheets | SAP BusinessObjects | Decision Intelligence Systems | Tactical reports | Functional reports | Strategic reports | Innovation apps | BW
  5. Business situation and implications
     • Most enterprises now have data in 6-8 clouds
     • Data has become less accessible due to the proliferation of cloud-based solutions and applications built by business units, further fragmenting the data landscape
     • Companies’ understanding of their customers, suppliers, and products has been in decline, caused by data being inaccessible
     • Substantial legal risks due to lack of governance, e.g. GDPR
     • Difficulty of operationalizing data science use in everyday business processes
  6. Business situation and implications (same points as the previous slide, plus)
     • More trusted data
     • More connected, intelligent data
     • More cloud and architecture flexibility
  7. Vision: Common data model, all data used by everyone, simple
     SAP HANA Data Management Suite: In-Memory Data Management | Single logical data model across entire organization | Data Flow Modeling and Control | Insights from powerful analytics engines
     Diagram labels: Third-party | Finance and Planning | Visualization Tools | Statistical Analytics | Spreadsheets | SAP BusinessObjects | Decision Intelligence Systems | Supply Chain | Finance | HR | Manufacturing | Sales | Connected Assets | Tactical reports | Functional reports | Strategic reports | Innovation apps | BW
  8. SAP HANA Data Management Suite
     Trusted Data | Connected, Intelligent Data | Cloud Architecture Flexibility
     Consumers: SAP Intelligent Enterprise Suite | SAP Leonardo and SAP Analytics Cloud | Third-Party Applications
     Capabilities: In-memory transaction & analytics | Data discovery & governance | Data orchestration & integration | Data cleansing & enrichment | Data storage & compute
     Components: SAP HANA | SAP Data Hub | SAP Enterprise Architecture Designer | SAP Big Data Services
     Third-Party Services & Products: Spark | Hadoop | Third-party Databases | Third-party Data Management
     Hybrid Cloud Management | SAP Cloud Platform
     Data: Business Data | Cloud Application Data | IoT | Spatial | Social | Image
     Deployment: On Premises | Multi-Cloud | Hybrid
     SAP Add-On API Services & Products: SAP HANA Spatial services | SAP HANA Blockchain service | SAP HANA Streaming Analytics | Other SAP Cloud Platform and SAP Leonardo Services | SAP EIM Solutions
  9. SAP HANA Data Management Suite: Common capabilities today and tomorrow
     • Development platform for applications that need analytics on real-time transactions
     • Harmonized UX across administration and development tools
     • Data governance, anonymization, and pipeline flow to protect and refine data across the landscape
     • Modelling across business, data, and technology
     • Applied AI to automate data operations and pre-defined business application scenarios
     • In-memory multi-model analytics and data processing on a distributed computing framework
     • Common metadata catalog, business models, and comprehensive data governance
     Diagram labels: On Premise | Hybrid | Multi-cloud | Today | Future | Seamless Cloud Service
  10. SAP HANA Data Management Suite Roadmap (columns labeled 2018 and 2019)
     First column:
     • SAP HANA and SAP Data Hub engine integration: Shared capabilities in SAP HANA & SAP Data Hub: spatial data, SQL, Graph, Doc store and common SQL
     • Connectivity: Automatic connectivity between HANA and Data Hub
     • Lifecycle management / DevOps / deployment: Data Hub as a Service (beta)
     • Tooling / UX: Consistent navigation through HDMS tooling
     • Meta data model and content repository: Common Meta Data Catalog across HDMS and 3rd party stores and data orchestration with end-to-end lineage
     • Security and system enablement: Enhanced secure connections between HDMS components (hybrid, multicloud, on-premises)
     Second column:
     • SAP HANA and SAP Data Hub engine integration: Extension of shared capabilities in HANA & Data Hub: spatial (adv), graph & doc data types, loading of parquet or OCR files
     • Data tiering: Data Tiering as cloud service with BDS integration & HANA Native storage extension
     • Lifecycle management / DevOps / deployment: Scenario based HDMS deployment of HANA & Data Hub in SAP Cloud Platform; Cross-cloud federation support; One Backup, recovery, and High Availability approach; Common Lifecycle handling (content lifecycle, platform lifecycle, e.g. upgrade) across all HDMS components and engines; Further deployment options for cloud providers & data center; Deeper EAD integration with meta data catalog and lineage
     • Data Science: Data Hub to execute pipelines using common ML libraries (PAL, APL) with HANA, consume additional ML frameworks (Leonardo, 3rd party services, etc.); Common custom ML operators for TF & R serving deployed by Data Hub and consumed in HANA
     • Tooling / UX: Alignment and harmonization of tooling
     • Meta data model and content repository: Partner ecosystem for SAP HANA Data Management Suite content
     • Security and system enablement: Streamlined user management, authorizations and authentications across full logical DW managed by HDMS
     The SAP HANA Data Management Suite roadmap follows a ‘cloud first’ strategy. Relevant capabilities will be available in on-premises versions at later dates.
  11. Multiple Patterns from One Architecture
     Cloud and architecture flexibility: Cloud freedom for data systems, applications, and system development
     Patterns: Big Data Warehouse | Leonardo Platform | S/4HANA Expansion | Spatial Analytics | Analytics Data Mart
     Diagram labels: SAP BW/4HANA | SAP Leonardo | SAP S/4HANA | SAP HANA | Earth Observation Analysis | SAP Cloud Platform | Spatial | SAP Data Hub | Business Intelligence Tools | Big Data services from SAP | SAP EA Designer
  13. SAP Enterprise Architecture Designer: Architecture and design
     Key capabilities:
     ▪ Strategy: Define the business strategy with common business architecture standards to build a plan to act
     ▪ Design: Create business and technical architecture using industry-standard models to define the implementation
     ▪ Implementation: Align development with strategy and design to drive or represent the implementation
     ▪ Consume: Communicate understanding and drive action across all stakeholders
     Deployment: Cloud | On premise
     Diagram labels: Developer | Business user | Architect | Knowledge worker | Strategy | Design | Implementation | Landscape | Big Data | Databases | Requirements | Capabilities | Processes
  15. SAP Data Hub: What’s New
  16. What is SAP Data Hub? The Lego Analogy
     Disparate data landscapes:
     • Streams: live data feed (e.g. audio, video, Twitter)
     • Events: alert/notification (e.g. IoT)
     • Semi-structured: JSON, XML
     • Structured: RDBMS, CRM, ERP, legacy, file, etc.
     • Unstructured: PPTs, Word documents, video, audio, image
     Platform services: Information Catalog | Monitoring & Scheduling | Orchestration | Pipelines
     Pipeline building blocks: Stream | Subscribe | Ingest | Validate | Transform | Enrich | Compute | Machine Learning | Mask | Custom Code | Image Processing | Compute | Refine | Publish | Trigger Action
     Data consumption: Intelligent apps | Automated processes
     Deployment labels: On-Premises | Cloud | Hybrid
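The Lego analogy on this slide can be sketched in a few lines of plain Python: each building block is a small function over a stream of records, and a pipeline is simply the blocks snapped together in order. This is an illustration only, not SAP Data Hub code; the stage names `validate`, `enrich`, and `mask`, the helper `build_pipeline`, and the sample sensor records are all assumptions made up for the example.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def validate(records: Iterable[Record]) -> Iterator[Record]:
    """Drop records that lack a numeric 'temperature' reading."""
    for record in records:
        if isinstance(record.get("temperature"), (int, float)):
            yield record


def enrich(records: Iterable[Record]) -> Iterator[Record]:
    """Add a derived Fahrenheit value to each record."""
    for record in records:
        yield {**record, "fahrenheit": record["temperature"] * 9 / 5 + 32}


def mask(records: Iterable[Record]) -> Iterator[Record]:
    """Hide all but the last two characters of the device identifier."""
    for record in records:
        yield {**record, "device_id": "***" + str(record["device_id"])[-2:]}


def build_pipeline(*stages: Stage) -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Chain stages so the output of one becomes the input of the next."""
    def run(source: Iterable[Record]) -> Iterator[Record]:
        stream = source
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


# Hypothetical sample events; the second one fails validation.
events = [
    {"device_id": "sensor-0017", "temperature": 20},
    {"device_id": "sensor-0042", "temperature": None},
]

pipeline = build_pipeline(validate, enrich, mask)
published = list(pipeline(events))
print(published)
```

Because every stage has the same shape (records in, records out), stages can be reordered or swapped without touching the others, which is the point of the building-block picture.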
  17. Release Cycle: SAP Data Hub version 2.3
     Diagram labels: SAP Data Hub 1.4 | SAP Vora 2.2 | Innovation | SAP Data Hub 2.3
     Release scope:
     • Lean deployment and installation with a complete containerized setup ready for any deployment
     • Unified User Experience in one modeling environment
     • Introducing Metadata Explorer and Cataloging
     • Unifying SAP Vora & SAP Data Hub release cycle with a synchronized delivery
     Motivation: Enables enterprises to build scalable data-driven applications rapidly
  18. Release Theme: SAP Data Hub version 2.3
     Deployment & Consumption:
     • Deployment on cloud environments with managed Kubernetes
     • Individual SAP Data Hub Applications
     • All components are containerized
     User Experience:
     • Unified Modeling Tool for Workflows, Pipelines and Data Transforms
     • Self Service Data Preparation with SAP Agile Data Preparation
     • Comprehensive Monitoring & Diagnostic Framework
     Metadata Governance:
     • Information Catalog to discover, define and understand sources
     • Search for Metadata attributes and Tags
     • Automated Metadata Crawling for SAP HANA, Cloud Stores, & SAP Vora
     Data Integration & Processing:
     • Enhanced Connectivity (Databases, Big Data Stores, Cloud native Technologies)
     • Data Integration into SAP S/4HANA, SAP Cloud solutions (Hybris, etc.), Master Data management
     • Data Quality Management
  19. Deployment & Consumption: Cloud deployments and decoupling of Hadoop & HANA
     Simplified deployment of SAP Data Hub in cloud and on-premise environments
     • All components are fully containerized and delivered as Docker images, including SAP HANA
        • Removes the prerequisite of installing the SAP HANA database and XS advanced
        • Removes the prerequisite of setting up a Hadoop cluster
     • Data processing is decoupled from storage platforms (any supported cloud stores). All runtime execution now occurs in Kubernetes
     • Deployable on the most popular managed Kubernetes environments*. Supports:
        • managed Kubernetes services of the major cloud providers (i.e. AWS, Microsoft Azure, Google Cloud Platform),
        • private cloud, and
        • on-premise installations
     * See Product Availability Matrix for detailed version dependencies
  20. User Experience: Introducing Launchpad, a fresh new-look UI
     One central entry point to all services and applications (SAP Data Hub v2.3):
     • Connection Management
     • Monitoring
     • Metadata Explorer
     • Modeler
  21. Metadata Governance: SAP Data Hub Metadata Explorer
     A centralized location for browsing connections | monitoring | metadata catalog | searching datasets | publications | labels
  22. Data Integration & Processing: The unified connectivity framework
     The connectivity framework (Flowagent) serves as the underlying infrastructure, with the goal of rapidly growing and enhancing the native connectivity and integration functionalities.
     Diagram labels: SAP Data Hub Metadata & Applications | SAP Data Hub Connectivity Framework (Flowagent) | Metadata Extractor | Adapter | HDFS, BW4HANA, Oracle, S3, …
     1. Metadata Services (Browsing, Profiling, Data Preview)
        • Hadoop (HDFS)
        • Cloud object storage (AWS S3, GCP GCS, Azure Data Lake, WASB)
        • Oracle*, ABAP/ODP*, OData*
     2. Connection Operators (Consumer, Producer)
        • HDFS, S3, GCS, ADL, WASB
        • Oracle**, ABAP/ODP**, OData**
        • Support for custom adapters
     3. Spark code generation
        • HDFS
     * Profiling is planned for a future release
     ** Producer is planned for a future release
  23. User Experience: Workflows definition
     Orchestration (external):
     ▪ SAP BW Process Chain: Trigger execution of a process chain on a BW system
     ▪ Data Transfer (BW): Transfer data from a BW system into Vora tables (created on the fly)
     ▪ Data Services: Execute remote Data Services jobs (demo)
     ▪ SAP HANA Flowgraph: Trigger execution of a HANA flowgraph using the SDI REST API (XSC)
     ▪ Spark / Hadoop: Submit Spark jobs, Hive queries, etc. to Hadoop clusters
     Execution (internal):
     ▪ Pipeline: Start a pipeline on a local or remote SAP Data Hub Pipeline engine, then wait for the pipeline to complete (or, if so configured, continue immediately)
     ▪ Data Transform: Run relational transformations (join, union, filter, etc.) on structured data (tables, CSV, Parquet, etc.)
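To make the relational transformations named under Data Transform concrete, here is a minimal sketch of filter, join, and union over CSV data using only the Python standard library. It does not use or imitate any SAP Data Hub API; the helper functions (`read_csv`, `filter_rows`, `inner_join`, `union_all`) and the sample orders and customers tables are assumptions invented for the illustration.

```python
import csv
import io

# Hypothetical sample data standing in for two structured sources.
ORDERS_CSV = """order_id,customer_id,amount
1,c1,120
2,c2,40
3,c1,75
"""

CUSTOMERS_CSV = """customer_id,country
c1,NL
c2,DE
"""


def read_csv(text):
    """Parse CSV text into a list of row dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def inner_join(left, right, key):
    """Hash join: index the right side by key, then match each left row."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def union_all(first, second):
    """Concatenate two row sets with the same columns, keeping duplicates."""
    return first + second


orders = read_csv(ORDERS_CSV)
customers = read_csv(CUSTOMERS_CSV)

large_orders = filter_rows(orders, lambda row: int(row["amount"]) >= 75)
small_orders = filter_rows(orders, lambda row: int(row["amount"]) < 75)

joined = inner_join(large_orders, customers, "customer_id")
recombined = union_all(large_orders, small_orders)
print(joined)
```

The hash join keeps the sketch short; a real transformation engine would also handle typed columns, other join types, and data too large to hold in memory.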
  24. Data Integration & Processing: Predefined connectivity snapshot
     Section headings: Connectivity | Connectivity via Flowagent | DQMm | Leonardo ML | SAP Vora | Spark / Hadoop
     Listed storage connections:
     - Azure Data Lake (ADL)
     - Local File System (file)
     - Google Cloud Storage (GCS)
     - HDFS
     - Amazon S3
     - Azure Storage Blob (WASB)
     - WebHDFS
     Spark / Hadoop:
     - Spark
     - Spark SQL
     - PySpark
     - Hive
  25. Data Integration & Processing: Data processing
     Subengines:
     ▪ Develop and compile new operators locally using SDKs
     ▪ Register and run custom operators in an available pipeline subengine
     Process / Command Executors:
     ▪ Run a process within a pipeline and feed it a continuous stream
     ▪ Run a shell command for each message that arrives within a pipeline
     Programming Operators:
     ▪ Write and run custom scripts for data manipulation within a pipeline
     ▪ Build reusable operators in different programming languages
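The two executor styles on this slide differ in how often the external program is started. The sketch below shows that difference with Python's standard `subprocess` module; it is not the SAP Data Hub operator API. The function names and the small uppercase command are hypothetical, and the child process is the current Python interpreter so the example does not depend on a particular shell.

```python
import subprocess
import sys

# Hypothetical external command: reads stdin and writes it back in upper case.
UPPERCASE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_command_per_message(messages, command):
    """Command-executor style: start the command once for every message."""
    outputs = []
    for message in messages:
        result = subprocess.run(
            command, input=message, capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout.strip())
    return outputs


def run_process_on_stream(messages, command):
    """Process-executor style: start the command once and pipe all messages in.

    For brevity the messages are sent as one newline-separated batch through
    communicate(); a long-running pipeline would write to stdin incrementally.
    """
    payload = "".join(message + "\n" for message in messages)
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as process:
        stdout, _ = process.communicate(payload)
    return stdout.splitlines()


messages = ["hello", "data hub"]
per_message = run_command_per_message(messages, UPPERCASE)
streamed = run_process_on_stream(messages, UPPERCASE)
print(per_message, streamed)
```

Starting a process per message is simpler and isolates failures, while a single long-lived process avoids repeated start-up cost when messages arrive frequently.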
  26. Thank you.
  27. © 2018 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they should not be relied upon in making purchasing decisions.
     SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. See https://www.sap.com/copyright for additional trademark information and notices. www.sap.com/contactsap
