
Big Data Testing: Automate the Testing of Hadoop, NoSQL & DWH without Writing Code

Testing of Hadoop, NoSQL and Data Warehouses Visually
-----------------------------------------------------------------------------
We just made automated data testing really easy. Automate your Big Data testing visually, with no programming needed.

See how to automate Hadoop, NoSQL and Data Warehouse testing visually, without writing any SQL or HQL. See how QuerySurge, the leading Big Data testing solution, gives novices and non-technical team members a fast & easy way to be productive immediately, while speeding up testing for team members skilled in SQL/HQL.

This webinar is geared towards:
- Big Data & Data Warehouse Architects, ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors

You will learn how to:
• Improve your Data Quality
• Accelerate your data testing cycles
• Reduce your costs & risks
• Realize a huge ROI

Published in: Software

  1. Automated Big Data Testing without Writing Code: Testing of Hadoop and Data Warehouses Visually. Presenters: Bill Hayduk, CEO/President, RTTS; Jeff Bocarsly, PhD, Chief Architect, QuerySurge/RTTS. built by QuerySurge™
  2. Presentation Topics
      • Testing a Data Warehouse
      • Testing Big Data
      • Current Data Testing Strategies
      • About QuerySurge
      • Demo
  3. About RTTS: Facts
      • Founded: 1996
      • Headquarters: New York
      • Customer profile: Fortune 1000; 600+ customers
      • Strategic Partners: IBM, Microsoft, HP, Oracle, Teradata, HortonWorks, Cloudera, Amazon Web Services
      • Software: QuerySurge
      RTTS is the leading provider of software & data quality for critical business systems.
  4. Data Quality Issues
      • “70% of enterprises have either deployed or are planning to deploy big data projects and programs this year” - analyst firm IDG
      • “46% of companies cite data quality as a barrier for adopting Business Intelligence products.” - InformationWeek
      • “Poor data quality is a primary reason for 40% of all business initiatives failing to achieve their targeted benefits.” - analyst firm Gartner
  5. The Executive Office & Critical Data
      CxOs are using Business Intelligence & Analytics to make critical business decisions, with the assumption that the underlying data is fine.
      “The average organization loses $14.2 million annually through poor Data Quality.” - Gartner
      Diagram labels: Business Intelligence (BI) software; potential problem areas; ETL; Data Architecture; Flat Files.
  6. Data Warehouse Testing
  7. Data Warehouse: the Marketplace
      “The data warehousing market will see a compound annual growth rate of 11.5% …to reach a total of $13.2 billion in revenue.” - consulting specialist The 451 Group
      Data Warehouse software vendors, per analyst firm Gartner’s Magic Quadrant for Data Warehouse Database Management Systems (quadrants shown: Leaders, Challengers).
  8. Testing the Data Warehouse: the ETL process
      Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target Data Warehouse
  9. Testing the Data Warehouse: Test Entry Points
      Recommended functional test strategy: test every entry point in the system (feeds, databases, internal messaging, front-end transactions).
      The goal: provide rapid localization of data issues between points.
      Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart → Business Intelligence software, with test entry points marked along the flow.
  10. Big Data Testing
  11. Big Data Vendors
      Big Data technology & services market will grow at a 26.4% CAGR to $41.5 billion through 2018, or about 6x the growth rate of the overall IT market. - Analyst firm IDC
  12. Basic Hadoop Architecture
      • MapReduce: the processing part that manages the programming jobs (a.k.a. Task Tracker).
      • HDFS (Hadoop Distributed File System): stores data on the machines (a.k.a. Data Node).
      • Cluster: each machine in the diagram pairs a Task Tracker with a Data Node. Add more machines for scaling, from 1 to 100 to 1,000.
      • Name Node: coordination for HDFS. Inserts and extraction are communicated through the Name Node.
      • Diagram caption: accepts jobs, assigns tasks, identifies failed machines.
  13. Hive
      Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Hive provides a mechanism to query the data using a SQL-like language called HiveQL that interacts with the HDFS files:
      • create
      • insert
      • update
      • delete
      • select
      Diagram: HiveQL sits on top of MapReduce (Task Tracker) and HDFS (Data Node).
  14. 2 Use Cases: (1) Hadoop, Data Warehouse; (2) NoSQL, Hadoop, Data Warehouse
  15. Use Case #1: Data Warehouse & Hadoop
      Recommended functional test strategy: test every entry point in the system (feeds, databases, internal messaging, front-end transactions).
      The goal: provide rapid localization of data issues between points.
      Diagram labels: Source Data; ETL; Source Hadoop; ETL Process; Target DWH; Business Intelligence software; test entry points marked along the flow.
  16. Use Case #2: MongoDB, Hadoop, Data Warehouse
      Diagram labels: Source Data; Ingestion; Relational DB & Data Warehousing; BI, Analytics & Reporting; five test entry points marked along the flow.
  17. 2 Prevalent Data Testing Strategies
      1) Stare & Compare (also known as sampling)
      2) Minus Queries
  18. Strategy #1: Stare & Compare
      • Review Mapping Document (business rules, data flow mapping, data movement requirements)
      • Write Tests in SQL editor
      • Execute 2 Tests: 1 at Source & 1 at Target
      • Dump results to 2 Excel files
      • Compare results by eye (‘Stare & Compare’ or ‘sampling’)
      Issue with Stare & Compare: impossible to visually compare billions of data sets.
      Result: usually less than 1% of data is compared.
      Example: a current QuerySurge customer has:
      • a single test with 100 million rows & 200 columns
      • = 20 billion data sets
      • the client has > 7,000 total tests
  19. Data Testing Strategy #2: Minus Queries
      MINUS QUERIES subtract one result set from another result set to show the difference.
      • Write 2 MINUS queries in SQL editor
      • Execute MINUS queries 2x
      Minus Query #1: Table_1 MINUS Table_2 → Result Set #1
      Minus Query #2: Table_2 MINUS Table_1 → Result Set #2
      ISSUES with MINUS QUERIES
      • MINUS QUERIES need to be executed 2x (Source MINUS Target; Target MINUS Source)
      • Result sets may not be accurate when dealing with duplicate rows of data
      • No historical data from past testing: audit and regulatory issues
      • Processing of minus queries puts pressure on the servers
      • Double execution means 2x testing time and resource utilization
      • Potential for false positives (bad data could exist on both sides of an ETL leg)
  20. Data Testing Strategies
      A fundamental issue with both current strategies: the assumption that all team members understand and can write SQL or HQL code.
  21. About QuerySurge™
  22. What is QuerySurge™?
      The collaborative Big Data Testing solution that finds bad data & provides a holistic view of your data’s health.
  23. The QuerySurge advantage
      • Automate the entire testing cycle: automate kickoff, tests, comparison, auto-emailed results
      • Create Tests easily with no programming: ensures minimal time & effort to create tests / obtain results
      • Test across different platforms: data warehouse, Hadoop, NoSQL, database, flat file, XML
      • Collaborate with team: Data Health dashboard, shared tests & auto-emailed reports
      • Verify more data & do it quickly: verifies up to 100% of all data up to 1,000x faster
      • Integrate for Continuous Delivery: integrates with most Build, ETL & QA management software
  24. Collaboration
      • Testers: functional testing, regression testing, result analysis
      • Developers / DBAs: unit testing, result analysis
      • Data Analysts: review, analyze data; verify mapping failures
      • Operations teams: monitoring, result analysis
      • Managers: oversight, result analysis
      Share information on the
  25. QuerySurge™ Architecture
      • Web-based…
      • Installs on... Linux
      • Connects to… …or any other JDBC compliant data source
      Diagram labels: QuerySurge Controller; QuerySurge Server; QuerySurge Agents; Flat Files.
  26. How QuerySurge Works
      • QS pulls data from data sources
      • QS pulls data from target data store
      • QS compares data quickly
      • QS generates reports, audit trails
      Outputs: Reports, Data Health Dashboard, auto emails.
      Supported inputs:
      • Data Stores: Databases, Data Warehouses, Data Marts
      • Flat Files: Fixed Width, Delimited, Excel
      • Big Data stores: Hadoop, NoSQL
      • XML
      Diagram labels: Source Data; Target Data; Data Warehouses; SQL; HQL.
  27. All QuerySurge™ Modules: Design Library, Scheduling, Deep-Dive Reporting, Run Dashboard, Query Wizards, Data Health Dashboard
  28. QuerySurge™ Modules
      Design Library
      • Create Query Pairs (source & target SQLs)
      • Great for team members skilled with SQL
      Scheduling
      • Build groups of Query Pairs
      • Schedule Test Runs
  29. QuerySurge™ Modules
      Deep-Dive Reporting
      • Examine and automatically email test results
      Run Dashboard
      • View real-time execution
      • Analyze real-time results
  30. QuerySurge Test Management Connectors
      Integration with leading Test Management Solutions:
      • HP ALM (Quality Center)
      • Microsoft Team Foundation Server
      • IBM Rational Quality Manager
      What the connectors do:
      • Drive QuerySurge execution from your Test Management Solution
      • Outcome results (Pass/Fail/etc.) are returned from QuerySurge to your Test Management Solution
      • Results are linked in your Test Management Solution so that you can click directly into detailed QuerySurge results
  31. QuerySurge & DevOps: Continuous Delivery & Integration
      • Automated Launch: Data Integration/ETL solutions, Test Management solutions, Automated Build solutions (and many others…)
      • Automated Testing: QuerySurge™
      • Automated Reporting: email report
  32. Introducing the new
      We just made data testing REALLY EASY! No programming needed. Testing Big Data Visually.
  33. Recent Survey: Data Experts
      From a recent poll¹ of: Big Data Experts, Data Warehouse Architects, Solution Architects, ETL Architects
      Our Question: What % of columns in your projects have no transformations at all?
      Consensus Answer: 80% of data columns have no transformation at all
      Why is this important?
      ¹ Poll conducted by RTTS on targeted LinkedIn groups
  34. Fast and Easy. No programming needed. (QuerySurge™ Modules)
      Compare by Table, Column & Row
      • Perform 80% of all data tests
      • Automatically generates SQL & HQL code
      • Opens up testing to novice & non-technical team members
      • Speeds up testing for skilled SQL coders
      • Provides a huge Return-On-Investment
  35. QuerySurge™ Modules: 3 Types of Data Comparison Wizards
      • Column-Level Comparison: great for Big Data stores and Data Warehouses.
      • Table-Level Comparison: great for Data Migrations and Database Upgrades.
      • Row Count Comparison: great for all of them: Big Data stores, Data Warehouses, Data Migrations and Database Upgrades.
      The wizards also provide you with automated features for filtering (‘Where’ clause) and sorting (‘Order By’ clause).
  36. Uses: Tests the columns that have no transformations, which means it tests approximately 80% of your data store without you writing any SQL code
      Tests: Big Data, Data Warehouses
      Value added: novice or non-technical: no coding needed, productive immediately; experienced user: saves time
  37. (we picked Column-Level Comparison)
  38. Uses: Verifies data loads when no transformation occurs
      Tests: data migrations, upgrades
      Value added: novice or non-technical: no coding needed; experienced user: saves time
  39. Use: Verify that the number of rows from the source matches the number from the target
      Tests: Big data, data warehouse, data migration, database upgrades, data interfaces
      Value added: novice: no coding needed; experienced user: saves time
  40. Training Courses and Consulting (slide dated 10/15/2015)
      Data Warehouse Testing
      • Data Warehouse & ETL Testing Fundamentals (1 day)
      • Fundamentals of QuerySurge (1 day)
      • Introduction to SQL for QuerySurge (1 day)
      • Advanced SQL techniques for QuerySurge (1 day)
      Big Data Testing
      • Big Data And ETL Testing Fundamentals
      • Introduction To Big Data Testing Using Hive And HQL
      Consulting
      RTTS, the software quality experts (and developer of QuerySurge), provides consulting solutions to the challenges of Big Data & Data Warehouse / ETL Testing:
      • Jumpstart 2-week program: combines training courses, mentoring, consulting
      • Staff Augmentation: add additional RTTS resources to your team
      • Outsourcing: RTTS can perform all testing, including planning, design, execution
  41. Free Trials and Proof of Concept
      (1) Trial in the Cloud of QuerySurge™, including self-learning tutorial that works with sample data, for 3 days
      (2) Downloaded Trial of QuerySurge™, including self-learning tutorial with sample data or your data, for 15 days
      (3) Proof of Concept of QuerySurge™, which includes our team of experts assisting you for 30 days
      For more information on (1), (2) and (3), go to http://www.querysurge.com/compare-trial-options
  42. QuerySurge Demo
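
The two-directional minus queries described on slide 19 can be sketched as follows. This is a minimal illustration, not code from the presentation: it assumes an in-memory SQLite database, where the MINUS operator is spelled EXCEPT, and the tables `table_1` and `table_2` with columns `id` and `amount` are hypothetical stand-ins for a source and a target.

```python
import sqlite3

def minus(conn, left, right):
    """Rows in `left` that are absent from `right` (SQLite spells MINUS as EXCEPT)."""
    # Table names are interpolated directly, so only pass trusted identifiers.
    query = f"SELECT id, amount FROM {left} EXCEPT SELECT id, amount FROM {right}"
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, amount REAL);  -- hypothetical source
    CREATE TABLE table_2 (id INTEGER, amount REAL);  -- hypothetical target
    INSERT INTO table_1 VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO table_2 VALUES (1, 10.0), (2, 25.0), (3, 30.0), (3, 30.0);
""")

# Both directions are needed: each query reports only one side of a mismatch.
result_set_1 = minus(conn, "table_1", "table_2")  # source MINUS target
result_set_2 = minus(conn, "table_2", "table_1")  # target MINUS source

# Row counts, to expose what the set-based queries hide.
source_rows = conn.execute("SELECT COUNT(*) FROM table_1").fetchone()[0]
target_rows = conn.execute("SELECT COUNT(*) FROM table_2").fetchone()[0]

print(result_set_1, result_set_2, source_rows, target_rows)
```

In this sample data the changed amount for `id` 2 shows up in both result sets, but the duplicated row for `id` 3 in the target appears in neither, because the set operator discards duplicates. Only the row counts (3 versus 4) reveal it, which is the duplicate-row issue the slide lists.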

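
The row count and column-level comparisons described on slides 35 to 39 can be approximated in a few lines. The sketch below is an assumption-laden illustration and not QuerySurge's implementation: the helper names `row_count_matches` and `column_mismatches`, the SQLite database, and the `source_customers` and `target_customers` tables are all hypothetical. It generates the SQL for an untransformed column from the table and column names, then compares the two result sets as multisets so that duplicate rows are counted.

```python
import sqlite3
from collections import Counter

def row_count_matches(conn, source, target):
    """Row count comparison: do source and target hold the same number of rows?"""
    def count(table):
        # Identifiers are interpolated directly; pass trusted names only.
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return count(source) == count(target)

def column_mismatches(conn, source, target, column):
    """Column-level comparison for a column that has no transformation."""
    src = Counter(conn.execute(f"SELECT {column} FROM {source}").fetchall())
    tgt = Counter(conn.execute(f"SELECT {column} FROM {target}").fetchall())
    # Counter subtraction keeps only positive counts, so duplicates are not lost.
    return {"missing_in_target": src - tgt, "unexpected_in_target": tgt - src}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_customers (id INTEGER, name TEXT);
    CREATE TABLE target_customers (id INTEGER, name TEXT);
    INSERT INTO source_customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO target_customers VALUES (1, 'Ann'), (2, 'Bob'), (2, 'Bob');
""")

counts_ok = row_count_matches(conn, "source_customers", "target_customers")
diff = column_mismatches(conn, "source_customers", "target_customers", "name")
print(counts_ok, diff)
```

Here the row counts match (3 and 3) even though the load is wrong, while the column-level check reports one missing value and one unexpected duplicate. That is one reason to run both kinds of comparison rather than relying on row counts alone.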