
BigData Testing by Shreya Pal


Agile Testing Alliance hosted its 16th Meetup in Pune on 9th Dec, 2017. Shreya Pal was one of the speakers at the meetup and gave an insightful session on BigData Testing. All rights belong to the author.

Published in: Technology


  1. Shreya Pal: Bigdata Testing
  2. Shreya Pal: Bigdata Testing. Presented as part of the ATA Pune 16th Meetup on 9th Dec, 2017.
  3. What is Bigdata?
  4. Why not the old approach? Traditional relational databases like Oracle, MySQL, and SQL Server cannot be used for big data, since most of the data will be in an unstructured format.
     • Variety of data – data can be in the form of images, video, pictures, text, audio, etc. This could be military records, surveillance videos, biological records, genomic data, research data, and so on. Such data cannot be stored in the row-and-column format of an RDBMS.
     • Volume – the volume of data stored in big data systems is huge. This data needs to be processed fast, which requires parallel processing; parallel processing of RDBMS data would be extremely expensive and inefficient.
     • Data creation velocity – traditional databases cannot handle the velocity at which large volumes of data are created. Example: 6,000 tweets are created every second, and 510,000 comments are created every minute. Traditional databases cannot store or retrieve data at this velocity.
  5. New Class of Systems
  6. Testing Big Data Applications
  7. High Level Architecture: Data Ingestion → Data Storage → Data Processing → Data Consumption
  8. Data Ingestion: Full/Incremental load, Multi-source integration, Checksum validation, CDC
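One of the ingestion checks listed on this slide, checksum validation, can be sketched in Python: compute a checksum of the source file and of the landed copy, and fail the load if they differ. This is a minimal illustration; the function names and file paths are hypothetical, not from the talk.

```python
import hashlib

def md5_checksum(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_ingestion(source_path, landed_path):
    """A landed file passes only if its checksum matches the source file's."""
    return md5_checksum(source_path) == md5_checksum(landed_path)
```

Streaming in chunks keeps memory flat even for the large files typical of big data loads; in practice the same idea is often applied per HDFS block or per partition rather than per file.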
  9. Data Storage: Compression, Archival, File format, Purging
  10. Data Processing: Data quality, Data harmonization, Data transformation and aggregation, Data standardization
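The transformation-and-aggregation check on this slide is often done by re-computing the aggregate independently from the raw records and comparing it with the pipeline's output. A minimal Python sketch, assuming rows are plain dicts (the field names here are hypothetical):

```python
from collections import defaultdict

def reaggregate(rows, key, value):
    """Independently re-compute the aggregation the pipeline claims to apply."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

def validate_aggregation(raw_rows, processed_rows, key, value):
    """Compare pipeline output against the independent re-computation."""
    expected = reaggregate(raw_rows, key, value)
    actual = {row[key]: row[value] for row in processed_rows}
    return expected == actual
```

At production scale the re-computation would itself run on the cluster (e.g. as a HiveQL query), but the test design is the same: two independent paths to the same aggregate must agree.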
  11. Data Consumption: Metrics and object validation, Dashboard validation, Reports validation, Other device compatibility
  12. What Else? • Infrastructure Testing • Performance Testing • Security Testing • Functional Testing
  13. Infrastructure Testing: On-premise setup, Connectivity between nodes, Cloud setup, External integrations
  14. Performance Testing: Data processing speed, Memory/CPU utilization, Dashboard rendering, Data load performance, Sub-system performance
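The data-processing-speed item on this slide can be checked at its simplest by timing one processing step against an agreed SLA. A minimal Python sketch (the step function and SLA threshold are placeholders, not from the talk):

```python
import time

def measure_step(step, payload, sla_seconds):
    """Run one processing step, time it, and check the result against the SLA."""
    start = time.perf_counter()
    result = step(payload)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= sla_seconds
```

Real big data performance tests would aggregate such timings across many runs and nodes, and also track the memory/CPU utilization the slide mentions, but the pass/fail shape is the same: measured latency versus an agreed threshold.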
  15. Security Testing: Authentication, Role-based authorization, Single sign-on, Access to name node and data node, Encryption
  16. Functional Testing: KPI calculations, Data quality rules, Data aggregation rules, Source-to-target mapping
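Source-to-target mapping validation, listed on this slide, boils down to applying each mapping rule to the source rows and comparing the result with the target rows. A minimal Python sketch, assuming rows are dicts and each rule is a function of the source row (all names here are hypothetical):

```python
def validate_mapping(source_rows, target_rows, mapping):
    """mapping: target column -> rule(source_row) giving the expected value.

    Returns a list of (row_index, column, expected, actual) mismatches;
    an empty list means the mapping held for every row.
    """
    mismatches = []
    for i, (src, tgt) in enumerate(zip(source_rows, target_rows)):
        for column, rule in mapping.items():
            expected = rule(src)
            if tgt.get(column) != expected:
                mismatches.append((i, column, expected, tgt.get(column)))
    return mismatches
```

Returning the full mismatch list rather than a boolean makes defect triage easier: the tester sees exactly which rows and columns violated which rule.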
  17. Bigdata vs Traditional
  18. Properties: Data
      Traditional database testing:
      • Tester works with structured data
      • Testing approach is well defined and time-tested
      • Tester has the option of a "Sampling" strategy done manually or an "Exhaustive Verification" strategy via an automation tool
      Big data testing:
      • Tester works with both structured and unstructured data
      • Testing approach requires focused R&D effort
      • A "Sampling" strategy in big data is a challenge
  19. Properties: Validation Tools
      Traditional database testing:
      • Tester uses either Excel-based macros or UI-based automation tools
      • Testing tools can be used with basic operating knowledge and little training
      Big data testing:
      • No defined tools; the range is vast, from programming tools like MapReduce to HiveQL
      • Requires a specific set of skills and training to operate the testing tools; the tools are in their nascent stage and may gain new features over time
  20. Challenges
  21. Data volume
  22. Variety
  23. Technology Landscape
  24. Thanks. LinkedIn: Blog: -