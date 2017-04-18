1 © Hortonworks Inc. 2011 –2016. All Rights Reserved1 © Hortonworks Inc. 2011 –2017. All Rights Reserved
Hortonworks EDW Optimization Solution Components • Hadoop: Scalable Storage and Compute • Hive LLAP: High Performance SQL Data Mart ‒ Fast, scalable SQL analytics ‒ Intelligent in-memory caching
Hortonworks EDW Optimization Solution Components Syncsort High-Perfo...
EDW Optimization: Fast BI on Hadoop • The Problem: – Proprietary EDW...
EDW Optimization: ETL Offload • The Problem: – EDWs consume between ...
EDW Optimization: Active Archive • The Problem: – Increasing data vo...
Powering the Connected Data Platform With EDW Optimization Paige Roberts Big Data Product Manager @RobertsPaige
Q / A Learn More: • EDW Optimization with HDP – http://hortonworks.c...
Thank You
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights

  Goals of the Modern Data Architecture • Centralize all your data • Turn raw data into insights • Maintain governance, compliance, and security standards • Eliminate complexities within IT
  Syncsort Strategic Focus on Big Data & Hadoop • Leverage Syncsort Technology Innovations & Mainframe Heritage ‒ Light footprint ‒ Self-tuning engine ‒ Single install, no 3rd-party dependencies ‒ World-class data processing, mainframe expertise • Ongoing Contributions to the Open Source Community ‒ JIRA: MAPREDUCE-2454, MAPREDUCE-4807, MAPREDUCE-4049, MAPREDUCE-5455, HIVE-8347, SQOOP-1272, PARQUET-134 ‒ Spark-packages and more! • Strong Partnerships with Strategic Big Data & Hadoop Players
  Syncsort: High Performance Import from Existing Sources. Syncsort Benefits: • Connect to virtually any data source, including mainframe and MPP databases. • Move data into and out of Hadoop up to 6x faster without the need for manual scripts. • Develop ETL processes without writing code. • Automatically optimize Hadoop performance and scalability for ETL operations. • Fully certified and integrated with Hortonworks Data Platform and HDF. • Secure: Kerberos, Ranger, Sentry.
  Bring ALL Enterprise Data Securely to the Data Lake • Collect virtually any data, from mainframe to relational, cloud, and NoSQL sources • Access, re-format, and load data directly into Hive & Hadoop file formats. No staging required! • Batch & streaming sources • Pull hundreds of tables at once into your data hub; whole DB schemas in one invocation • Load more data into Hadoop in less time
  DMX DataFunnel™: Get Your Database Data into Hadoop at the Press of a Button • Pull multiple data sources and funnel them into your data lake • Extract and map whole DB schemas in one invocation • Extract from multiple data sources: DB2/z, Netezza, Oracle, Teradata,… • One-step data movement: auto-generates jobs, auto-generates Hive target tables, and updates Hive statistics • Process multiple funnels in parallel on your edge node or from data nodes ‒ Leverages the DMX-h high-speed data engine via DTL ‒ Generated applications can be imported into the GUI • In-flight transformations ‒ Filtering, funnel dependency ordering, mixed source/target, data type filtering, table exclusion/inclusion
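The slide's "table exclusion/inclusion" and "auto-generating Hive target tables" steps can be pictured with a small sketch. This is not DMX DataFunnel's implementation or API; the type map, the `datalake` database name, the ORC storage choice, and the sample schema are all assumptions made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical, simplified source-to-Hive type mapping (an assumption, not a vendor mapping).
TYPE_MAP = {
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def select_tables(tables, include=("*",), exclude=()):
    """Keep tables that match any include pattern and no exclude pattern."""
    return [
        t for t in tables
        if any(fnmatch(t, p) for p in include)
        and not any(fnmatch(t, p) for p in exclude)
    ]


def hive_ddl(table, columns, database="datalake"):
    """Build a Hive CREATE TABLE statement; unmapped source types fall back to STRING."""
    cols = ",\n  ".join(
        f"`{name}` {TYPE_MAP.get(src_type.upper(), 'STRING')}"
        for name, src_type in columns
    )
    return (
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{table}` (\n  {cols}\n) STORED AS ORC"
    )


# Made-up source schema: table name -> list of (column, source type).
schema = {
    "customers": [("id", "INTEGER"), ("name", "VARCHAR")],
    "orders": [("id", "BIGINT"), ("placed_at", "TIMESTAMP")],
    "tmp_scratch": [("payload", "CLOB")],
}

selected = select_tables(schema, exclude=("tmp_*",))
statements = [hive_ddl(t, schema[t]) for t in selected]

for stmt in statements:
    print(stmt + ";\n")
```

Running the filter before generating DDL means excluded tables never produce a target table, which mirrors the "one invocation" idea: one schema description in, one set of Hive statements out.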
  Intelligent Execution Layer: Design Once, Deploy Anywhere. Intelligent Execution: insulate your people from the underlying complexities of Hadoop. One interface to design jobs to run on: ‒ Single Node, Cluster ‒ MapReduce, Spark, Future Platforms ‒ Windows, Unix, Linux ‒ On-Premise, Cloud ‒ Batch, Streaming • Use existing ETL skills. • No worries about mappers, reducers, big side, small side, and so on. • Automatic optimization for best performance, load balancing, etc. • No changes or tuning required, even if you change execution frameworks • Future-proof job designs for emerging compute frameworks, e.g. Spark
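The "design once, deploy anywhere" idea can be sketched generically: the job is declared once, and interchangeable runners decide how it executes. The job steps, runner names, and sample records below are hypothetical and do not represent DMX-h's engine; a thread pool merely stands in for a distributed framework.

```python
from concurrent.futures import ThreadPoolExecutor

# The job is declared once, as an ordered list of record-level steps.
JOB = [
    lambda r: {**r, "amount": float(r["amount"])},   # parse the amount field
    lambda r: {**r, "large": r["amount"] > 100.0},   # flag large amounts
]


def apply_job(job, record):
    """Run every step of the job on one record."""
    for step in job:
        record = step(record)
    return record


def run_single_node(job, records):
    """Sequential runner: processes records one after another."""
    return [apply_job(job, r) for r in records]


def run_partitioned(job, records, workers=4):
    """Parallel runner: same job, records spread across worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: apply_job(job, r), records))


records = [{"id": 1, "amount": "250.0"}, {"id": 2, "amount": "12.5"}]
single = run_single_node(JOB, records)
parallel = run_partitioned(JOB, records)
print(single == parallel)
```

Because the job definition never mentions how it is executed, swapping the runner does not require touching the job, which is the property the slide describes.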
  Insurance: Easy Access to ALL Data for Better Analytics • Challenge: Needed hard-to-access operational data for advanced analytics • Solution: ‒ Quickly load ~1000 database tables into HDP with the click of a button ‒ Access & integrate complex Mainframe VSAM files and data from DB2/z, Oracle & SQL Server ‒ Track changes & keep data up to date • Benefits: ‒ Insight: Better and faster analytics ‒ Agility: Reclaim development time; single tool to ingest, detect changes, and populate the data lake ‒ Compliance: Build audit trails, keep HDP data lake current ‒ Productivity: No need for deep understanding of Hadoop
  Hotel Chain: Ease of Use, Timely & Up-to-Date Reporting • Challenge: More timely collection & reporting on room availability, event bookings, inventory, and other hotel data from 4,000+ properties globally • Solution: ‒ Near real-time reporting ‒ DMX-h consumes property updates from Kafka every 10s ‒ DMX-h processes data on HDP, loading to TD every 30 min ‒ Deployed on Google Cloud Platform • Benefits: ‒ Time to Value: DMX-h ease of use drastically cut development time ‒ Agility: Reports updated every 30 minutes vs every 24 hours ‒ Productivity: Leveraging ETL team for Hadoop (Spark), visual understanding of data pipeline ‒ Insight: Up-to-date data = better business decisions = happier customers
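The two cadences on this slide (consume every 10s, load every 30 min) imply that each load carries 180 polls' worth of updates. A minimal simulation of that buffering pattern follows; the class, record names, and simulated clock are hypothetical, and no real Kafka or warehouse client is involved.

```python
POLL_SECONDS = 10        # from the slide: updates consumed every 10s
LOAD_SECONDS = 30 * 60   # from the slide: load every 30 min


class MicroBatchBuffer:
    """Accumulates polled records and flushes them as one load per interval."""

    def __init__(self, load_interval=LOAD_SECONDS):
        self.load_interval = load_interval
        self.pending = []    # records polled since the last load
        self.last_load = 0   # simulated time of the last load, in seconds
        self.loads = []      # each entry is the batch that one load would carry

    def on_poll(self, now, records):
        self.pending.extend(records)
        if now - self.last_load >= self.load_interval:
            self.loads.append(list(self.pending))
            self.pending.clear()
            self.last_load = now


buf = MicroBatchBuffer()
for tick in range(1, 361):  # simulate one hour of 10-second polls
    buf.on_poll(tick * POLL_SECONDS, [f"update-{tick}"])

polls_per_load = LOAD_SECONDS // POLL_SECONDS
print(polls_per_load, len(buf.loads))
```

Over a simulated hour the buffer flushes twice, matching the slide's move from one report refresh per 24 hours to one every 30 minutes.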
  Leading Media Company: Accelerate New Business Initiatives • Challenge: Build a scalable platform to support new business initiatives & scale for double-digit data growth, while reducing escalating EDW & ELT costs • Solution: ‒ Shift data storage & processing out of the EDW into Hadoop ‒ Migrate 500+ SQL ELT workloads to DMX-h on HDP • Benefits: ‒ Agility: Scalable architecture to deploy new business initiatives – analyze more set-top box data, blend website user activity data, etc. ‒ Cost: Millions of dollars in savings from EDW, including SQL tuning & maintenance costs ‒ Productivity: ETL developers can stop coding & tuning, and get up & running on Hadoop quickly
