Benefits of Transferring Real-Time Data to Hadoop at Scale

Today’s Big Data teams demand solutions designed for Big Data that are optimized, secure, and adaptable to changing workload requirements. Working together, Hortonworks, IBM, and Attunity have designed an integrated solution that transfers large volumes of data to a platform that can handle rapid ingest, processing and analysis of data of all types from all sources, at scale.

https://hortonworks.com/webinar/benefits-transferring-real-time-data-hadoop-scale-ibm-hortonworks-attunity/

Published in: Technology

  1. © 2015 IBM Corporation Welcome to the Waitless World © 2017 IBM Corporation Benefits of Transferring Real-time Data to Hadoop at Scale
  2. Guest Speakers • Ali Bajwa, Principal Partner Solutions Engineer, Hortonworks • Steve Roberts, Offering Manager, Power Systems Big Data & Analytics Solutions, IBM • Dan Potter, VP of Product Management & Marketing, Attunity
  3. The New Way of Business Is Fueled by Connected Data • Development: connected customers, vehicles, devices; socially crowd-sourced requirements; digital design and analysis; digital prototypes and tests (simulations) • Manufacturing: connected factories, sensors, devices; human-robotic interaction; 3D-printing on demand • Distribution: connected trucks, inventory; location, traffic, weather-aware distribution; real-time inventory visibility; dynamic rerouting • Marketing/Sales: connected customers, devices; omni-channel demand sensing; real-time recommendations • Service: connected assets; remote service monitoring & delivery; predictive maintenance; OTA updates
  4. Technology Trends: Shifting the Data Paradigm • Artificial Intelligence • Internet of Things • Cloud Computing • Streaming Data • Industrial Internet • Connected business • Consumer devices • Smart devices • Autonomy • Prescriptive analytics • SaaS/PaaS applications • Ephemeral use cases • Operational efficiency • Collaboration • Real-time applications • Targeted retail • Recommendations • Industrial applications
  5. Hortonworks: Enabling the Modern Data Architecture • Our durable and reliable mission continues… • Make Hadoop an enterprise-viable data platform • Bring all data under management (all sources and types) • Enable pre- and post-transaction analysis • Hortonworks' consistent and continuous track record of innovation
  6. Powering the Modern Data Architecture • DATA AT REST • DATA IN MOTION • ACTIONABLE INTELLIGENCE • COMPLETE DATA LIFECYCLE MANAGEMENT • RUN CONTAINERIZED APPLICATIONS CONCURRENTLY • EDGE • CLOUD • ON-PREMISES • HOLISTIC MANAGEMENT, GOVERNANCE AND SECURITY • MULTI-WORKLOADS • MULTI-TYPE • MULTI-TIER • Data Science • SQL Query Engine
  7. Hortonworks Value: Platform Flexibility • Sensors/Sources: constrained; high-latency; localized context • Cloud / Data Centers: on-premise and cloud; low-latency; global context
  8. Hortonworks DataFlow and Analytics Reference Platform • Diagram labels: Applications; Edge/Sensor/3rd Party; Data Flow and Streaming; Analytics and Data Science; Field Data Capture; Office, Datacenter or Cloud; Industrial Protocols such as OPC; Files / Other Unstructured Data; Video; IoT Gateways; PLC / RTU; SCADA, DCS, Historians; Hortonworks Data Platform; SQL; Hortonworks DataFlow; Data Flow Management; Message Queues; Stream Processing; In-stream Analytics; NoSQL; Machine Learning; Resource Management; Distributed File Storage; Structured Data Sets • Location 1 … Location N, each with: Time Series Storage; Data Acquisition; Event Processing
  9. Complementing Attunity and IBM Ecosystem • Diagram labels: Applications; Edge/Sensor/3rd Party; Data Flow and Streaming; Analytics and Data Science; Field Data Capture; Office, Datacenter or Cloud; Industrial Protocols such as OPC; Files / Other Unstructured Data; Video; IoT Gateways; PLC / RTU; SCADA, DCS, Historians; Hortonworks Data Platform; SQL; Hortonworks DataFlow; Data Flow Management; Message Queues; IBM Stream Computing; In-stream Analytics; NoSQL; Machine Learning; Resource Management; Distributed File Storage; Structured Data Sets • Location 1 … Location N, each with: Time Series Storage; Data Acquisition; Event Processing • IBM Bluemix • IBM Spectrum Scale • IBM Watson • IBM Big SQL • IBM also resells HDP and HDF • DATA INGESTION
  10. Next Chapter: Announcing Hortonworks DataPlane Service • Enabling the Modern Data Architecture • Hortonworks DataPlane Service: a common set of services that: ⬢ Supports enterprise deployment strategy and the move to the cloud ⬢ Addresses compliance and regulatory requirements for the enterprise ⬢ Eliminates policy silos and ensures security & governance move with the data ⬢ Simplifies data asset management and provides access for analysts and data scientists ⬢ Is extensible to new services: a services enablement layer brings new offerings to market rapidly
  11. Enterprise Data Science at Scale: The Power of Data Science for your Enterprise • Enterprise-Grade: leverage enterprise-grade security, governance and operations • Tools: enhance productivity by enabling data scientists to use their favorite tools, technologies and libraries • Deployment: compress the time to insight by deploying models into production faster • Data: build more robust models by using all the data in the data lake
  12. Bringing it all Together: Powering Modern Data Applications • DATA-AT-REST: HDP® Hortonworks Data Platform, powered by Apache Hadoop® • DATA-IN-MOTION: HDF™ Hortonworks DataFlow, powered by Apache™ NiFi • DATA INGESTION • IBM Analytics → Hortonworks Resell: IBM DSX, IBM BigSQL • IBM Analytics → Re-sell: BigInsights' existing customers migrated to HDP; IBM resells HDP & HDF • IBM Systems → Co-Sell: IBM Power Systems (Compute), IBM Spectrum Scale (Storage)
  13. A new era of computing has emerged • Centralized Mainframes • Cognitive Era • E-Business • Distributed Computing • Smarter Planet • Office Productivity • Client/Server • Personal Computer • Data Warehousing • Big Data & Predictive Analytics • Cognitive • Data • Insight • Context • Transactional Database • Business Intelligence • Big Data & Analytics • Actionable Insight in context • Reporting • Cloud
  14. REINVENTING COMPUTING FOR DATA-INTENSIVE AND COGNITIVE WORKLOADS • Accelerated compute and storage delivered on prem, in the cloud or via Watson • Power Systems is now part of Cognitive Systems
  15. Open to the core for true differentiation in performance & cost • 315+ OpenPOWER members across 31 countries • Ecosystem-driven Customer Choice • Growing ecosystem of OpenPOWER Servers • Growing ecosystem of OpenPOWER Innovation
  16. Power Systems S822LC for Big Data - Not Just Another Intel Server • Linux by Red Hat: Red Hat 7.2 Linux OS • Mellanox: InfiniBand/Ethernet connectivity in and out of server • HGST: optional NVMe adapters • Alpha Data with Xilinx FPGA: optional CAPI accelerator • Broadcom: optional PCIe adapters • QLogic: optional Fibre Channel PCIe • Samsung: SSDs & NVMe • Hynix, Samsung, Micron: DDR4 • NVIDIA: Tesla K80 GPU accelerator • IBM: POWER8 CPU
  17. Available until Dec 31, 2017!
  18. TCO at Scale with HDP on Power Systems with Elastic Storage Server • Up to 3X reduction of storage and compute infrastructure moving to Power Systems and Elastic Storage Server vs commodity scale-out x86 • More flexible and scalable vs EMC Isilon using IBM Spectrum Scale • Position for future growth, avoid hitting the data center wall with cluster sprawl • Diagram: Scale Compute Nodes (IBM Power Systems; only Hadoop services and HDFS client) connected over InfiniBand (RDMA) / 40 GigE / 10 GigE to storage; Scale Storage as Required • Legend: ESS = Elastic Storage Server (powered by Spectrum Scale and Power Systems); C = Spectrum Scale Client; HDP = Hortonworks Data Platform
  19. Artificial Intelligence and Cognitive Applications • Machine Learning • Deep Learning (Neural Networks) • The deeper you go, the more value you gain, and the more you know
  20. simple machine learning deep learning
  21. accident risk rate 90% inspection times 10X number of inspections
  22. enterprise-ready software distribution built on open source tools for ease of development performance faster training times for data scientists +
  23. Acceleration training: days become hours • 9 Days → 4 Hours • Recognition, Shape, Attenuation, Boundary • 54x learning runs with POWER8 (4 Hours each) • What will you do? Iterate more and create more accurate models? Create more models? Both? • IBM S822LC for HPC
  24. Data Integration for Modern Analytics • Data Ingestion Patterns
  25. Data Integration for Modern Analytics
  26. Modern Data Ingest • Diagram labels: METADATA; HIVE OPTIMIZED; STREAM OPTIMIZED; CHANGE DATA CAPTURE; CLOUD; ON PREM; WAREHOUSE; MAINFRAME; RDBMS; SAP • CDC (log-based) for high performance, low latency and low impact • Single platform for all key enterprise systems • Hive-optimized for HDP and Stream-optimized for HDF • Point-and-Click with NO coding and NO agents
  27. Real-Time Data Integration: Flexible Real-Time Options • In Memory and File Optimized Data Transport • Streaming Change Data Capture (CDC) – Apply transactions sequentially – Stream batched changes – Integrate with DW native loaders to ingest and merge – Stream changes to Kafka message brokers • Modes shown in diagram: Transactional CDC; Batch CDC; Data Warehouse Ingest-Merge; Message Encoded CDC
  28. Simplify Data Integration: Streamlined Process • Zero Footprint Architecture – CDC identifies source updates by scanning change logs – No software agents required on sources or targets – Minimal administrative tasks • Log-based CDC • Source-specific optimization • Sources: Hadoop, Files, RDBMS, EDW, Mainframe • Targets: Hadoop, Files, RDBMS, EDW, Kafka
  29. Simplify Data Integration: Go Agile with Automation – No manual coding – Automated end-to-end – Optimized and configurable • Target schema creation • Heterogeneous data type mapping • Batch to CDC transition • DDL change propagation • Filtering • Transformations • Sources: Hadoop, Files, RDBMS, EDW, Mainframe • Targets: Hadoop, Files, RDBMS, EDW, Kafka
  30. Simplify Data Integration: Guided User Experience – Intuitive web-based GUI – Drag and drop, wizard-assisted configuration steps – Consistent process for all sources and targets
  31. Universal Data Integration • SOURCES: RDBMS (Oracle, SQL Server, DB2 iSeries, DB2 z/OS, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix); DW (Exadata, Teradata, Netezza, Vertica); HADOOP (Hortonworks, Cloudera, MapR); MAINFRAME (DB2 for z/OS, IMS/DB, VSAM, SQL/MP, Enscribe, RMS); CLOUD (AWS RDS, Salesforce, Snowflake); SAP (ECC on Oracle, ECC on SQL, ECC on DB2, SAP HANA) • TARGETS: RDBMS (Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix); DW (Microsoft PDW, Exadata, Teradata, Netezza, Vertica, Sybase IQ, Amazon Redshift, Actian Vector, SAP HANA); HADOOP (Hortonworks, Cloudera, MapR, Pivotal, Amazon EMR); NOSQL (MongoDB); CLOUD (Amazon RDS, Amazon Redshift, Amazon EMR, Google Cloud SQL, Google Cloud Dataproc, Azure SQL DW, Azure SQL DB); STREAMING (Azure Event Hubs, Kafka, MapR)
  32. Feeding the Data Lake with Attunity Replicate • Fortune 100 auto maker • 4500 applications • DB2 MF, SQL, Oracle • Results: • Consolidating massive data volumes for global analytics • Hadoop Data Lake with Kafka • Minimizing labor and cost • Realizing faster insights and competitive advantage
  33. The results are impressive! 3x faster than alternative solutions
  34. Q&A
  35. REFERENCE CHARTS
  36. Hortonworks HDP 3X POWER8 Price-Performance Guarantee. IBM Power Systems guarantees the Power S822LC for Big Data system built with POWER8 delivers at least a 3X price-performance advantage vs. x86-based results when running a customer application/workload with Tez/Hive LLAP on Hortonworks HDP under the conditions noted below. A Worker Node is a server carrying out the HDP query functions, with one Worker Node per server. 3X price-performance means that the customer's documented throughput performance on the cluster of S822LC for Big Data Worker Nodes divided by the price of the cluster of Worker Nodes will be at least 3 times higher than the customer's documented throughput performance on the cluster of x86-based Worker Nodes divided by the price of the cluster of x86 Worker Nodes. EX: If queries per second on the cluster of S822LC Worker Nodes are 30,000 and 10,000 on the cluster of x86-based Worker Nodes, while the price of the S822LC Worker Node cluster is $10,000, and the price of the x86-based Worker Node cluster is $10,000, then the Throughput Performance Per Price would be exactly 3 times higher and the guarantee would be met. Notes: 1. Client's Power S822LC for BD Worker Nodes and the x86 Worker Nodes must be running at similar utilization rates of at least 50%, using the same software stack as described in Note #4, and must be configured similarly. 2. Client's Power S822LC for BD performance cannot be constrained by the I/O subsystem. Specifically, the I/O subsystem on the Power S822LC for BD Worker Node must achieve I/O bandwidth and operations per second greater than or equal to those of the x86 Worker Node. 3. Client's Power S822LC for BD Worker Node's physical memory must be the same as or greater than the physical memory on the x86 Worker Node. 4. Applicable software stack is Tez/Hive LLAP on HDP 2.6 or later for both the Power S822LC and x86-based Worker Nodes. 5. Client is responsible for demonstrating comparable real-world representative workload between the Power S822LC for BD Worker Node and the x86 Worker Node through the use of the IBM-provided tools and comparable tools on x86 systems. 6. 3X guarantee is based on a list price for x86 servers from Dell, Cisco, HP or Lenovo based on E5-2600 v4 or earlier processor technology and the IBM S822LC for Big Data. The IBM Power S822LC for Big Data servers (22-core/2.89 GHz) used as Worker Nodes must be purchased from IBM or an authorized IBM Business Partner prior to September 30, 2017. The guarantee period is valid for three (3) months from the date of purchase. The x86-based Worker Nodes must be comparably configured branded servers from Cisco, Dell, HP, or Lenovo and the client is responsible for all Hortonworks licenses. 3X throughput performance per price means that the customer's documented throughput performance on the cluster of Power S822LC for BD Worker Nodes based on either queries, operations or transactions per second divided by the price of the cluster of Worker Nodes will be at least 3 times higher than the customer's same documented throughput performance on the cluster of x86 Worker Nodes divided by the price of said cluster of x86 Worker Nodes. Remediation: IBM will provide additional performance optimization and tuning services consistent with IBM Best Practices, at no charge. If unable to reach the guaranteed level of price-performance, IBM will provide additional equally configured Worker Nodes to those already purchased to reach the guaranteed level of price-performance. Only Available until Dec 31, 2017!
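
The price-performance definition on slide 36 (throughput divided by cluster price, compared across the two clusters) can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the function names `price_performance` and `guarantee_met` are hypothetical and not part of any IBM or Hortonworks tool, and the inputs are the figures from the slide's own example.

```python
def price_performance(throughput: float, cluster_price: float) -> float:
    """Throughput (e.g. queries per second) per unit of cluster price."""
    if cluster_price <= 0:
        raise ValueError("cluster_price must be positive")
    return throughput / cluster_price


def guarantee_met(power_throughput: float, power_price: float,
                  x86_throughput: float, x86_price: float,
                  factor: float = 3.0) -> tuple[bool, float]:
    """Return (met, ratio), where ratio is the Power cluster's
    price-performance divided by the x86 cluster's price-performance.

    Hypothetical helper: it only restates the slide's definition and
    ignores the eligibility conditions listed in the notes.
    """
    x86_pp = price_performance(x86_throughput, x86_price)
    if x86_pp <= 0:
        raise ValueError("x86 throughput must be positive")
    ratio = price_performance(power_throughput, power_price) / x86_pp
    return ratio >= factor, ratio


# Figures from the slide's example: 30,000 vs 10,000 queries per second,
# with both clusters priced at $10,000.
met, ratio = guarantee_met(30_000, 10_000, 10_000, 10_000)
print(met, ratio)
```

With equal cluster prices the ratio reduces to the throughput ratio, which is why the slide's example lands on exactly 3 times.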
