The Path to Digital Transformation

In the past, many organizations invested heavily in the development and growth of enterprise data warehouses (EDWs). Today, the EDW is often at or over capacity. As EDWs hit their limits, many forward-looking organizations are shifting data warehouse processing functions to Hadoop for its scalability and economic benefits.

With a robust EDW offload solution, an organization can accelerate ETL processing, work easily with a wide range of new data sources and formats, and make better use of existing EDW investments.

Thought leaders from Dell EMC, Cloudera and Syncsort discuss how best to begin a big data journey by taking control of all data, controlling costs, and identifying the first use case, so you can move forward with confidence to transform your business.

Published in: Technology

  1. 1. The Road to Digital Transformation: Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution. December 2016
  2. 2. Armando Acosta, Dell EMC; Sean Anderson, Cloudera; Mark Muncy, Syncsort; Ted Arden, Dell EMC
  3. 3. Dell - Internal Use - Confidential. The digital transformation will cause disruption. Business leaders see a chaotic, uncertain future ahead: 48% don’t know what their industry will look like in 3 years; 78% feel threatened by digital startups; 45% fear they may become obsolete in 3-5 years. Source: Digital Transformation Index, October 2016. Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands and business leaders’ plans to succeed in the connected future.
  4. 4. Businesses still have a huge opportunity to get this right. This is how leaders plan to leap ahead: 73% say a centralized tech strategy needs to be a priority; 72% plan to expand their software development capabilities; 66% are incentivized to invest in IT infrastructure and digital skills leadership. Source: Digital Transformation Index, October 2016.
  5. 5. Leaders agreed the following digital business attributes are imperatives to success: predictively spot new opportunities; demonstrate transparency and trust; deliver unique and personalized experiences; innovate in agile ways; operate in real time. Big Data and Analytics will be at the core of enabling all these attributes. Source: Digital Transformation Index, October 2016.
  6. 6. Data-driven organizations are more effective: 50% greater revenue growth for businesses that leverage data effectively. But 44% of organizations do not know how to start. Become data-driven. A journey begins with a single step: align IT / business goals; improve operational efficiency; transform your organization. Data from Dell Global Technology Adoption Index, November 2015.
  7. 7. Align business and IT. Organizational goals: empower end users; control costs; improve outcomes. Dell helps by: utilizing ALL data to deliver deeper insights and enhanced data-driven decision making; reducing TCO and seamlessly integrating with existing investments to enable greater ROI; providing secure anywhere, anytime access to data and analytics for improved productivity.
  8. 8. Ted Arden, Dell EMC
  9. 9. Traditional tools are not working. #1 challenge: organizations cite TCO as the biggest obstacle to data integration tools. 70% of all data warehouses are performance and capacity constrained (*Gartner). Data integration and transformation drive a majority of the EDW capacity (80%). Dell accelerates time to value by lowering data transformation costs and improving performance by augmenting the Enterprise Data Warehouse (EDW). The Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution reduces Hadoop deployment to weeks, lets teams develop Hadoop ETL jobs within hours, and helps them become fully productive within days after deployment.
  10. 10. Too many workloads in the EDW: modernize the data pipeline with Hadoop. Traditional data pipeline: disparate data sources; data staging tool (extract and load data, clean and parse data); enterprise data warehouse + ETL (data transformation jobs); business reporting and query. The results: longer data transformation job times; not meeting SLAs for business reporting; slow ad hoc query; too costly to scale. Modern data pipeline: disparate data sources; Hadoop + ETL (data transformation jobs: clean, parse, transform); enterprise data warehouse; business reporting and query. The results: reduced data transformation job times; improved SLAs for business reporting; fast ad hoc query; scales economically.
  11. 11. Customer value. Solution stack, components and customer value: Dell Services Reference Architecture, ETL Offload (PE R730XD, Networking): faster deployment, from months to weeks. Hadoop distribution (Cloudera 5.9): data management and security. Data transformation (Syncsort DMX-h version 9.1): convert SQL jobs into native Hadoop execution. Deployment, business application: build operational efficiency with Hadoop. No other vendor offers this solution.
  12. 12. Dell data solutions drive operational efficiency. Control costs: reduce data warehouse administrative costs up to 76%. Improve productivity: transform data 60% faster for analysis. Simplify ongoing operations: develop and design complex data transformation jobs up to 54% faster.
  13. 13. Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution. Primary use case: scale-out solution to optimize data management, processing and analytics. Solution benefits: • Integrates easily with Hadoop® • No coding necessary for easy deployment • No need for expertise on Apache Pig™, Hive™, and Sqoop™ • Closes the skills gap using Syncsort. Differentiation: • Reduces EDW admin costs up to 76% (1) • Transforms data 60% faster for analysis (2) • Designs transformation jobs up to 54% faster (3). Software: Cloudera™ Enterprise, Syncsort™ DMX-h™. Pod network: 2x Dell EMC Networking S4048 10GbE pod switches, 1x S3124 iDRAC switch. Cluster network: 2x Dell EMC Networking S6000 40GbE cluster switches. Data nodes: 10x Dell EMC PowerEdge R730xd with 3.5” drives (48 TB), or 10x PowerEdge R730xd with 2.5” drives (24 TB), or 20x PowerEdge FC630 / FD332 (32 TB). Infrastructure nodes: 1x Dell EMC PowerEdge™ R630 admin node, 3x PowerEdge R730xd name nodes, 1x PowerEdge R730xd edge node; or 1x PowerEdge FC630 admin node, 3x PowerEdge FC630 name nodes, 1x PowerEdge FC630 edge node. (1) Cost advantages report. (2) Performance advantages report. (3) Design advantages report. [Rack and switch port diagram omitted.]
  14. 14. Operational Efficiency: From use case to action. 1. Connect: operational data sources (source); extract, transform, load. 2. Analyze: enterprise data warehouse; relational database management; data mart; business reporting and query. 3. Act: preventive maintenance; IT resource capacity and utilization; operational process improvement; business process cost optimization; cyber security analytics; improved forecasting; compliance and reporting. Compute + data steps: parse, clean, translate, sort, aggregate, group. Services: • Management • Infrastructure • Security • Dell Financial Services
  15. 15. Sean Anderson, Cloudera
  16. 16. © Cloudera, Inc. All rights reserved. Traditional Monolithic Analytic Databases: no cloud elasticity or cloud storage integration; rigid data model with tightly coupled storage/compute; limited to SQL, with data movement necessary; static sizing.
  17. 17. Challenges Across the Business. Roles: Enterprise Architect; IT/DBA; Security Team & Data Steward; SQL Developer & Business Analyst. Existing Systems Hitting Their Limits: • How long does it take to bring in more data/use cases? And what would the cost be? • What is your process for scaling today? • What is your plan for cloud? Missed SLAs & Overloaded Bottleneck: • How much time do you spend troubleshooting vs. developing new uses? • How long does it take to deliver on business requests? Limited Data & Insights of Latent Value: • What limits on users, data, and time period exist? • How long does it take to get new reports/data? • Are you able to run actionable real-time analysis? Meet Compliance Needs & Protect Data: • How do you manage siloed security & governance across workloads and systems? • Is sensitive data available for analysis?
  18. 18. Cloudera’s Analytic Database Solution (multi-storage, multi-environment). Navigator Optimizer: identify, offload, & optimize workloads to Hadoop. Hue: intelligent SQL editor. Navigator: audit, lineage, encryption, key management, & policy lifecycles. BI Partners: integration with the leading BI tools. Impala: interactive query engine for BI & SQL analytics. Hive-on-Spark: large-scale ETL & batch processing engine.
  19. 19. The DCC Rule. Duplication: improve performance by easily detecting workload duplication and recommending top queries to optimize. Compatibility: reduce development time by leveraging existing query compatibilities with Hadoop tools and get guidance for query rewrites. Complexity: maximize your optimization opportunities by exposing complex access patterns that make the best use of Hadoop’s architecture.
  20. 20. Cloudera Navigator Optimizer: Unlock Your Best Hadoop Strategy, Instantly. Active data optimization for Hadoop to save you time and money: • Instant workload insights • Intelligent optimization guidance • Reduce Hadoop workload development effort
  21. 21. Mark Muncy, Syncsort
  23. 23. Goals of the Modern Data Architecture • Centralize all your data: collect raw data from every source within the enterprise, regardless of complexity. Only when you are able to collect and retain all your data can you see the full picture. • Turn raw data into insight: cleanse, blend and transform your data, and give it context and meaning so decision makers can execute. • Maintain governance, compliance and security standards: increase consistency and confidence in decision making by preserving the confidentiality, integrity and availability of information. Protect data from unauthenticated and unauthorized access. • Eliminate complexities within IT: your Modern Data Architecture should automate and optimize your data needs, keep pace with the evolution of technology, and homogenize platforms and infrastructures. Syncsort Confidential and Proprietary - do not copy or distribute
  24. 24. Shift Data and ELT Workloads out of Data Warehouses
  25. 25. Simplify Big Data Integration with Syncsort. Access: get best-in-class data ingestion capabilities for Hadoop (mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more). Integrate: single interface for streaming and batch processes; single data pipeline for all enterprise data, batch or streaming. Comply: secure data access, data governance and lineage; seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry. Simplify: design once, deploy anywhere, and insulate your organization from a rapidly changing ecosystem; future-proof your applications for new compute frameworks, on premises or in the cloud.
  26. 26. Simplify Big Data Integration with Syncsort. Access: get best-in-class data ingestion capabilities for Hadoop (mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more).
  27. 27. Access: Bring ALL Enterprise Data Securely to the Data Lake. Build your enterprise data hub: • Collect virtually any data, from mainframe to relational, cloud and NoSQL sources • Batch & streaming sources • Access, re-format and load data directly into Hive & Parquet. No staging required! • Pull hundreds of tables at once into your data hub, whole DB schemas in one invocation • Load more data into Hadoop in less time
  28. 28. Access: Get Your Database Data into Hadoop, at the Press of a Button (DMX DataFunnel™) • Pull multiple data sources and funnel them into your data lake: extract and move whole DB schemas in one invocation • One-step data movement, auto-generating jobs • Process multiple funnels in parallel on your edge node or from data nodes ‒ Leverages DMX-h high speed data engine via DTL ‒ Generated applications can be imported into GUI • In-flight transformations ‒ Filtering, funnel dependency ordering, mixed source/target, data type filtering, table exclusion/inclusion
  29. 29. Simplify Big Data Integration with Syncsort. Access: get best-in-class data ingestion capabilities for Hadoop (mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more). Integrate: single interface for streaming and batch processes; single data pipeline for all enterprise data, batch or streaming.
  30. 30. Integrate: Achieve the Fastest Path from Raw Data to Insight • Prepare data on-the-fly • Load into Hadoop without staging • Write directly into Big Data formats (Parquet, Hive, etc.) • Connect fast to NoSQL databases (Cassandra, HBase, etc.) • Cloud connectivity: Amazon AWS, Google Cloud Platform, Microsoft Azure • Get the fastest, most efficient data joins and sorts • Dynamic planning/optimization at runtime. Feed business intelligence and visualization: • Create Tableau & QlikView files with one click • Fastest parallel loads to Amazon Redshift, Greenplum, Netezza, Oracle, Teradata & Vertica
  31. 31. Integrate: Single Interface for Streaming & Batch. Simplify streaming data integration with a single tool for designing both streaming and batch jobs: • Kafka, Spark, Apache NiFi, HDF • Combine legacy batch and cutting-edge streaming data sources • Easy development in GUI, with no need to write Scala, C or Java code
  32. 32. Simplify Big Data Integration with Syncsort. Access: get best-in-class data ingestion capabilities for Hadoop (mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more). Integrate: single interface for streaming and batch processes; single data pipeline for all enterprise data, batch or streaming. Comply: secure data access, data governance and lineage; seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.
  33. 33. Comply: Secure, Manage & Monitor Your Cluster • Kerberos-secured clusters – Authenticated browsing – Authenticated sampling • Apache Sentry security certified • Cloudera Manager – Deploy DMX-h across cluster – Monitor DMX-h jobs
  34. 34. Comply: Get Governance, Metadata and Lineage • Metadata and data lineage for Hive, Avro and Parquet through HCatalog • Metadata lineage export from DMX – Simplify audits, analytics dashboards, metrics – Integrate with enterprise metadata repositories • Cloudera Navigator certified integration – Extends HCatalog metadata – HDFS, YARN, Spark and other metadata – Lineage, tagging – Business and structural metadata
  35. 35. Simplify Big Data Integration with Syncsort. Access: get best-in-class data ingestion capabilities for Hadoop (mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more). Integrate: single interface for streaming and batch processes; single data pipeline for all enterprise data, batch or streaming. Comply: secure data access, data governance and lineage; seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry. Simplify: design once, deploy anywhere, and insulate your organization from a rapidly changing ecosystem; future-proof your applications for new compute frameworks, on premises or in the cloud.
  36. 36. Simplify: Design Once, Deploy Anywhere. Intelligent Execution: insulate your organization from the underlying complexities of Hadoop. Single GUI, execute anywhere! • Use existing ETL skills • No need to worry about mappers, reducers, big side or small side of joins, and so on • Automatic optimization for best performance, load balancing, etc. • No changes or tuning required, even if you change execution frameworks • Future-proof job designs for emerging compute frameworks, e.g. Spark
  37. 37. Simplify: Reclaim days of valuable time. Using the Dell | Cloudera | Syncsort solution for Hadoop, an entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time than a Hadoop expert. Jobs compared: fact dimension load with type 2 SCD (Load); data validation and pre-processing (Validate); vendor mainframe file integration (Int.). Cut development time in half: 8.3 days vs. 3.8 days. Source: http://en.community.dell.com/techcenter/blueprints/m/resources
  38. 38. Thank You
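
Slide 14 names the transformation steps an offloaded ETL job typically performs: parse, clean, translate, sort, aggregate, group. The following is a minimal plain-Python sketch of those steps on a tiny in-memory extract. It is not how DMX-h or Hadoop runs them; the sample data, column names, and function names are all hypothetical and chosen only to make the sequence concrete.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw extract; columns and values are invented for illustration.
RAW = """region,amount,currency
east, 100.50 ,USD
west,200,usd
east,,USD
west,50.25,USD
"""

def parse(text):
    """Parse: turn delimited text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: trim whitespace and drop rows that have no amount."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["amount"]:
            cleaned.append(row)
    return cleaned

def translate(rows):
    """Translate: normalize codes to upper case and convert amounts to floats."""
    return [
        {
            "region": row["region"].upper(),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        }
        for row in rows
    ]

def aggregate(rows):
    """Sort by region, group, and compute a total amount per region."""
    totals = defaultdict(float)
    for row in sorted(rows, key=lambda r: r["region"]):
        totals[row["region"]] += row["amount"]
    return dict(totals)

def run_pipeline(text):
    """Chain the steps in the order the slide lists them."""
    return aggregate(translate(clean(parse(text))))

if __name__ == "__main__":
    print(run_pipeline(RAW))
```

In the modern pipeline described on slide 10, work of this kind runs on the Hadoop cluster rather than inside the EDW, so the warehouse receives already-transformed data.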
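
Slide 37 lists a "fact dimension load with type 2 SCD" as one of the benchmarked jobs. For readers unfamiliar with the term, a Type 2 slowly changing dimension keeps history by closing the current row and appending a new version whenever a tracked attribute changes. The sketch below shows that idea in plain Python under stated assumptions: the `DimRow` fields, the `apply_scd2` helper, and the customer/city example are hypothetical and are not taken from the benchmark or from DMX-h.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class DimRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_scd2(dim: List[DimRow], customer_id: int, new_city: str, as_of: date) -> List[DimRow]:
    """Return a new dimension list with a Type 2 change applied."""
    result = []
    seen_current = False
    changed = False
    for row in dim:
        if row.customer_id == customer_id and row.valid_to is None:
            seen_current = True
            if row.city != new_city:
                # Close the old version instead of overwriting it.
                result.append(replace(row, valid_to=as_of))
                changed = True
                continue
        result.append(row)
    if changed or not seen_current:
        # Append the new current version (also covers brand-new customers).
        result.append(DimRow(customer_id, new_city, as_of))
    return result

if __name__ == "__main__":
    dim = [DimRow(1, "Austin", date(2016, 1, 1))]
    dim = apply_scd2(dim, 1, "Dallas", date(2016, 12, 1))
    for row in dim:
        print(row)
```

Applying the same value twice leaves the dimension unchanged, which keeps reloads of the same source data from creating spurious versions.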
