Making Enterprise Big Data Small with Ease

Every division in an organization builds its own database to keep track of its business. As the organization grows, those individual databases grow as well. The data in each database can become siloed, cut off from the data held in the other databases.

https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/

Published in: Technology

  1. Making Enterprise Big Data Small with Ease
     Nagapriya Tiruthani, Offering Manager, IBM Db2 Big SQL
     Roni Fontaine, Director, Product Marketing, Hortonworks
     March 28, 2018
     © Hortonworks Inc. 2011–2018. All rights reserved.
  2. Presenters
     • Nagapriya Tiruthani, Offering Manager, IBM Db2 Big SQL
     • Roni Fontaine, Director of Product Marketing, HDP, Operations and Cloud
  3. Agenda
     • Data today and its benefits
     • Db2 Big SQL Use cases
     • Db2 Big SQL’s Federation and how it works
     • How Hive can Help
     • Db2 Big SQL and Hive Working Together
     • Resources / Q & A
  4. Rules of the Game have Changed
     • Need to accommodate volume, variety and velocity of data
     • Mobile and IoT are driving need for more real-time, accurate information
     • Increased adoption of advanced analytics and self-service discovery
     • Need for agile data services with high levels of security
     © 2018 IBM Corporation
  5. Benefits of Bringing all the Data Together
     • Business Intelligence
     • Extending a warehouse to include outside data
     • Prototyping new applications using disparate data
     • Handling mergers and acquisitions as well as staged migrations between databases
     • Warehouse or ODS maintenance (trickle-feed)
     • Business Integration
     • Real-time, targeted access to multiple operational databases
     • Replicate/consolidate distributed data to a central store
     • Portal development: access disparate data and make it available for Web deployment
     • Warehouse Augmentation
     • Capacity Relief
     • Queryable archive
     • Multi-temperate business models
     • Migrate dimensioned data from 3rd party RDBMS
     • Discovery & Exploration
  6. Db2 Big SQL Use Cases
     • SQL
     • Ad-hoc queries, data preparation
     • Federation
     • Transactional with fast lookups
     • High performance
     • Machine learning
     • Complex SQL, Deep analytics, Many users
     • Application portability
  7. Federation: Eliminating the Information Bottleneck
     Diagram labels: Db2 Big SQL; Data lakes; Object stores; NoSQL; RDBMS
  8. What does Db2 Big SQL Federation get you?
     • Transparent: appears to be one source; programmers don’t need to know how or where data is stored
     • Heterogeneous: accesses data from diverse sources
     • High Function: full query support against all data; capabilities of sources as well
     • Autonomous: non-disruptive to data sources, existing applications, systems
     • High Performance: optimization of distributed queries
     Diagram labels: SQL tools & applications; Db2 Big SQL; Data sources: Oracle, SQL Server, Teradata, DB2, SAP Hana, WebHDFS, S3, others
  9. Federation – Virtualize Heterogeneous Data
     • Db2 Big SQL queries heterogeneous systems in a single query
     • Only SQL-on-Hadoop that virtualizes more than 10 different data sources: RDBMS, NoSQL, HDFS or Object Store
     Diagram labels: MS SQL Server; Netezza (PDA); Oracle; PostgreSQL; Teradata; DB2 LUW, Db2z, DB2 on i; Informix; WebHDFS; Object Store (S3); Hive; HBase; HDFS; Hortonworks Data Platform (HDP); Db2 Big SQL; NoSQL; ML Model
  10. Optimizer Flow for federated queries
      Pushdown analysis is only part of the optimizer’s job.
      1. Rewrite user queries (syntactic transformations)
      2. Select best local execution plan
      3. Generate correct SQL for remote query fragments (relational sources)
      Diagram labels: Parse Query; Rewrite Query; Cost-based Plan Selection; Remote SQL Generation; Pushdown Analysis (relational nicknames); Code Generation, local query portions
  11. “Pushdown” of Query Operations
      • Db2 Big SQL decides whether some or all parts of a query can be "pushed-down", i.e. processed at the remote data source(s).
      • Pushdown-ability depends on:
        • availability of needed functionality at remote source
        • server options (example: is collating sequence at Federated server and remote source the same?)
      • Example: A remote source that can handle an equality predicate, but not count(*)....
      Diagram:
      • Application: Select count(*) from t1 where col = 27
      • IBM database: Select count(*) from t1 where col = 27
      • Non-IBM database: Select ‘1’ from t1 where col = 27
  12. Actual Pushdown is Cost Based
      • Just because processing can be pushed down doesn't mean it will be. The decision is influenced by estimates of rows processed/returned.
      • Consider a join of two nicknames ORA.T1 and ORA.T2 on a single remote source that is "nearly" a Cartesian product. It may be better to do the join at the Db2 Big SQL engine to avoid retrieval of many rows.
      • Retrieving (10,000 + 25) rows to do a local join is probably faster than retrieving a (10,000 * 25) = 250,000-row remote join result.
      Diagram: ORA.T1 (25 rows); ORA.T2 (10,000 rows); Select … from ORA.T1, ORA.T2 where T1.a = T2.b;
  13. Rich Capabilities that Bring Data Together
      ✓ Easily access information on demand
      ✓ Combine data in Hadoop with disparate sources to form a data lake
      ✓ Quickly extend your data warehouse by enriching it
      Column headings: Connect; Query; Monitor; Data Placement
      • Quick access to Data value
      • Common Framework
      • ODBC/JDBC
      • Spark integration enables new data sources
      • Connect all data sources in single query
      • Intelligent Query Routing
      • Cost-based optimizer
      • SQL pushdown
      • Local data caching
      • ANSI-compliant SQL
      • Easily define & manage through a common UI
      • Simple point & click to discover and query
      • Monitor and visualize active queries
      • Schema conversion when moving data
      • Bulk data copy to Hadoop
      • Filtered subsets of data
  14. IBM Db2 Big SQL: One engine for all enterprise needs on Hadoop
      ✓ Executes complex queries with high performance
      ✓ Combine data in Hadoop with disparate sources to form a data lake
      ✓ Enables reusing applications and skills
      Column headings: Query; Augment; Monitor; Security; Performance
      • Intelligent Query Routing
      • Cost-based optimizer
      • SQL pushdown
      • ANSI-compliant SQL
      • Quick access to Data value
      • Query with open source Hadoop file formats
      • Access data in RDBMS & NoSQL data sources
      • Operationalize ML models
      • Spark integration for in-memory data exchange
      • Local data caching for federated queries using MQTs
      • Automatic memory manager manages queries to completion
      • Manage workloads and prioritize
      • Audit queries
      • Simple point & click to discover and query
      • Monitor and visualize active queries
      • Granular SQL level access control for row filtering and column masking
      • Define policies in Ranger for centralized security management
      • Create MQTs to cache data aggregate for fast response
      • Enable elastic boost to maximize resource consumption and parallel execution
  15. To Summarize
      Db2 Big SQL is the best SQL over Hadoop engine for complex analytical workloads
      • With Db2 Big SQL, users can focus on what they want to do, and not worry about how it is executed
      • Data Scientists/Business Analysts can be 3-4 times more productive using Db2 Big SQL compared to Spark SQL.
      • Proof points:
        • Able to successfully run all 99 TPC-DS queries @ 100TB in 4-concurrent streams
        • Performance leadership
        • Uses fewer cluster resources
        • Simpler configuration with mature self-tuning and workload management features
  16. How Hive Helps
  17. HDP 2.6: A Major Milestone for Apache Hive
      Major Improvements:
      • Hive LLAP Now GA
      • ACID MERGE
      • SQL: All 99 TPC-DS out-of-the-box with only trivial rewrites
      • Hive View 2.0: Great Features for DBAs
      • Diagnostics: Tez UI Total Timeline View
      • Hive OLAP Indexes powered by Druid
      At a High Level:
      • 1200+ features, improvements and bug fixes in Hive since HDP 2.5.
      • 400+ of these from outside of Hortonworks.
      Chart (HDP 2.6 Improvements): 820 From Hortonworks; 413 From Community
      Highlights: Hive LLAP GA; SQL MERGE; All TPC-DS Queries
  18. HDP 2.6 Makes Hadoop Data Management a Reality with SQL MERGE
      • Hive implements ANSI-standard SQL MERGE.
      • MERGE makes data maintenance 8x simpler with 5x higher performance.
      • Legacy Hive or Spark approaches don’t protect applications against dirty reads or partial failures.
      Complexity of a Type 2 SCD Update with and without MERGE:
      Technique      | Number of Queries | Number of Full Table Scans | Isolation | Applications Protected From Partial Failures
      Hive MERGE     | 1                 | 1                          | Yes       | Yes
      Old Techniques | 8                 | 5                          | No        | No
  19. Using Hive & Db2 Big SQL
  20. How Db2 Big SQL Complements Hive
      • Application Portability & Integration
        • Data shared with Hadoop ecosystem
        • Comprehensive file format support
        • Superior enablement of IBM and Third Party software
      • Performance
        • Modern MPP runtime
        • Powerful SQL query rewriter
        • Cost based optimizer
        • Optimized for concurrent user throughput
        • Results not constrained by memory
      • Federation
        • Distributed requests to multiple data sources within a single SQL statement
        • Main data sources supported: DB2, Teradata, Oracle, Netezza, Informix, SQL Server
      • Enterprise Features
        • Advanced security/auditing
        • Resource and workload management
        • Self tuning memory management
        • Comprehensive monitoring
      • Rich SQL
        • Comprehensive SQL Support
        • IBM SQL PL compatibility
        • Extensive Analytic Functions
  21. Breaking Things Down: Where IBM Db2 Big SQL Shines
      • Run Oracle, DB2 or Netezza Workloads on Hadoop
      • Federate Hadoop and Non-Hadoop Data
      • Complex SQL Workloads
  22. Breaking Things Down: Where Apache Hive Shines
      • Fast SQL That Scales from Terabytes to Petabytes
      • Easy to Keep Data Fresh with ACID MERGE
      • Join Historical and Streaming Data in Real Time
  23. Resources
  24. Resources
      • HWX Db2 Big SQL Web Page: https://hortonworks.com/partners/ibm-bigsql/
      • Db2 Big SQL Solutions Sheet: https://2xbbhjxc6wk3v21p62t8n4d4-wpengine.netdna-ssl.com/wp-content/uploads/2017/08/IBM-Big-SQL-Solution-Sheet_final.pdf
      • Db2 Big SQL Data Sheet: https://2xbbhjxc6wk3v21p62t8n4d4-wpengine.netdna-ssl.com/wp-content/uploads/2017/08/IBM-Big-SQL-Datasheet_final.pdf
      • Db2 Big SQL Resources
        • Db2 Big SQL Web Page/Sandbox: https://www.ibm.com/us-en/marketplace/big-sql
        • Db2 Big SQL Master Class Videos: https://www.youtube.com/playlist?list=PL7FnN5oi7Ez9itAnZ6rs9A30YYjVB1wN_
        • Db2 Big SQL Blog: https://hortonworks.com/blog/big-sql-apache-hadoop-across-enterprise/
  25. Thank you
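
The pushdown example on slide 11 (a remote source that can handle an equality predicate but not count(*)) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Db2 Big SQL is implemented: the functions `plan_count_query` and `run_plan` and the capability names `"equality_predicate"` and `"count_star"` are hypothetical, and an in-memory SQLite database stands in for the remote source.

```python
import sqlite3


def plan_count_query(table, column, value, capabilities):
    """Plan `SELECT count(*) FROM table WHERE column = value` for one remote source.

    `capabilities` is a hypothetical set of feature names the source supports.
    Returns the SQL to send plus the work left for the federated engine.
    """
    predicate = f"{column} = {int(value)}"
    if {"equality_predicate", "count_star"} <= capabilities:
        # Everything can be pushed down; the source returns the final count.
        return {"remote_sql": f"SELECT count(*) FROM {table} WHERE {predicate}",
                "local_filter": None, "local_count": False}
    if "equality_predicate" in capabilities:
        # The slide's case: push the filter, count the returned rows locally.
        return {"remote_sql": f"SELECT '1' FROM {table} WHERE {predicate}",
                "local_filter": None, "local_count": True}
    # Nothing can be pushed down: fetch the column, filter and count locally.
    return {"remote_sql": f"SELECT {column} FROM {table}",
            "local_filter": int(value), "local_count": True}


def run_plan(conn, plan):
    """Send the remote fragment, then finish the query at the federated engine."""
    rows = conn.execute(plan["remote_sql"]).fetchall()
    if plan["local_filter"] is not None:
        rows = [r for r in rows if r[0] == plan["local_filter"]]
    if plan["local_count"]:
        return len(rows)
    return rows[0][0]  # the remote source already computed count(*)


# Stand-in "remote source" with three rows where col = 27.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(27,), (27,), (5,), (27,), (8,)])

for caps in ({"equality_predicate", "count_star"}, {"equality_predicate"}, set()):
    plan = plan_count_query("t1", "col", 27, caps)
    print(plan["remote_sql"], "->", run_plan(conn, plan))
```

All three plans return the same count; they differ only in how much work is done at the remote source and how many rows travel back to the federated engine.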

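The row arithmetic on slide 12 can be checked with a short sketch. It assumes, as the slide does, that cost is driven by the number of rows retrieved from the remote source; the function names and the `join_selectivity` parameter are hypothetical, and a real cost-based optimizer weighs many more factors.

```python
def rows_shipped_local_join(left_rows, right_rows):
    """Join at the federated engine: both tables are retrieved in full."""
    return left_rows + right_rows


def rows_shipped_remote_join(left_rows, right_rows, join_selectivity):
    """Join pushed down: only the join result is retrieved.

    join_selectivity is the fraction of the Cartesian product that survives
    the join predicate (1.0 means a full Cartesian product).
    """
    return round(left_rows * right_rows * join_selectivity)


def choose_join_site(left_rows, right_rows, join_selectivity):
    """Pick the site that ships fewer rows back to the federated engine."""
    local = rows_shipped_local_join(left_rows, right_rows)
    remote = rows_shipped_remote_join(left_rows, right_rows, join_selectivity)
    return "local" if local < remote else "remote"


# Slide 12: ORA.T1 has 25 rows, ORA.T2 has 10,000 rows.
print(rows_shipped_local_join(25, 10_000))        # 10,000 + 25 rows
print(rows_shipped_remote_join(25, 10_000, 1.0))  # 10,000 * 25 rows
print(choose_join_site(25, 10_000, 1.0))          # near-Cartesian join
print(choose_join_site(25, 10_000, 0.0001))       # highly selective join
```

With a near-Cartesian join the local plan ships far fewer rows, which matches the slide's conclusion; with a highly selective join the pushed-down plan wins instead.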