How to Test Big Data Systems | QualiTest Group

Big Data is perceived as a huge amount of data and information, but it is a lot more than this. Big Data may be said to be a whole set of approaches, tools, and methods for processing large volumes of unstructured as well as structured data. The three parameters on which Big Data is defined, i.e. Volume, Variety, and Velocity, describe how you have to process an enormous amount of data in different formats at different rates.

QualiTest is the world’s second largest pure-play software testing and QA company. Testing and QA is all that we do! Visit us at: www.QualiTestGroup.com

  1. How to Test Big Data Systems

  2. The definition of Big Data
     - Big Data is perceived as a huge amount of data and information
     - However, it is a lot more than this
     - Big Data may be said to be a whole set of approaches, tools, and methods for processing large volumes of unstructured as well as structured data
     - Big Data is defined on three parameters, which describe how you have to process an enormous amount of data in different formats at different rates

  3. The three parameters on which Big Data is defined: Volume, Variety, and Velocity

  4. Testing Big Data can be quite a challenge for organizations
     - Traditional analysis techniques have certain limitations when dealing with such large and complex data sets
     - Testing Big Data is especially challenging for organizations with very little knowledge of what to test and how to test it
     - There are certain basic aspects of Big Data processing, and on that basis further testing procedures can be determined

  5. Aspects of Big Data Testing

  6. Risk of failing
     - Failure in Big Data Testing can have negative consequences; it may result in:
       - Production of poor-quality data
       - Delays in testing
       - Increased cost of testing
     - Big Data Testing can be performed in two ways: functional and non-functional testing
     - Strong test data and test environment management are required to ensure error-free processing of data

  7. Functional Testing
     - Functional Testing is performed in three stages:
       - Pre-Hadoop Process Testing
       - MapReduce Process Validation
       - Extract-Transform-Load Process Validation and Report Testing
  8. Pre-Hadoop Process Testing
     - HDFS stands for Hadoop Distributed File System
     - HDFS lets you store huge amounts of data on a cluster of machines
     - When the data is extracted from various sources such as web logs, social media, RDBMS, etc., and uploaded into HDFS, an initial stage of testing is carried out

  9. Initial stage of Testing
     - Verification of the data acquired from the original source to check whether it is corrupted or not (a minimal verification sketch follows this list)
     - Validation that data files were uploaded into the correct HDFS location
     - Checking of the file partitioning and copying of the files to different data nodes
     - Determination of the complete set of data to be checked
     - Verification that the source data is in sync with the data uploaded into HDFS
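
A minimal sketch of the corruption check above, assuming the `hdfs` command-line client is on the PATH; the file paths are hypothetical. It streams the HDFS copy back and compares MD5 digests against the original source file:

```python
import hashlib
import subprocess

CHUNK = 1 << 20  # stream in 1 MB blocks

def md5_local(path):
    """MD5 digest of the original source file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()

def md5_hdfs(hdfs_path):
    """MD5 digest of the HDFS copy, streamed back via `hdfs dfs -cat`."""
    h = hashlib.md5()
    proc = subprocess.Popen(["hdfs", "dfs", "-cat", hdfs_path],
                            stdout=subprocess.PIPE)
    for block in iter(lambda: proc.stdout.read(CHUNK), b""):
        h.update(block)
    if proc.wait() != 0:
        raise RuntimeError("could not read %s from HDFS" % hdfs_path)
    return h.hexdigest()

# Hypothetical paths: one web-log file and its landing location in HDFS.
src = md5_local("/data/weblogs/access.log")
dst = md5_hdfs("/user/etl/landing/weblogs/access.log")
assert src == dst, "upload corrupted: digests differ"
```

The same loop can be run over every file in the landing directory to cover the location and synchronicity checks as well.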
  10. MapReduce Process Validation
      - MapReduce is a data processing model used to condense massive amounts of data into practical, aggregated, compact data packets; its validation covers:
      - Testing of the business logic, first on a single node and then on a set of multiple nodes (a single-node simulation follows this list)
      - Validation of the MapReduce process to ensure the correct generation of "key-value" pairs
      - Validation of the aggregation and consolidation of data after the "reduce" operation
      - Comparison of the generated output with the input files to make sure the output meets all the requirements
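
One way to exercise the business logic on a single node before touching the cluster is to simulate the map / shuffle-sort / reduce cycle in plain Python over a tiny input with a known expected aggregate. A sketch, assuming a hypothetical two-field web-log layout:

```python
from itertools import groupby
from operator import itemgetter

def mapper(record):
    """Emit one (page, 1) key-value pair per web-log line."""
    _, page = record.split()  # hypothetical layout: "<client-ip> <page>"
    yield page, 1

def reducer(key, values):
    """Aggregate every value emitted for one key."""
    yield key, sum(values)

def run_local(records):
    """Single-node simulation of the map / shuffle-sort / reduce cycle."""
    pairs = sorted(kv for r in records for kv in mapper(r))
    result = {}
    for key, group in groupby(pairs, key=itemgetter(0)):
        for k, v in reducer(key, (v for _, v in group)):
            result[k] = v
    return result

# Known input, known expected aggregate: the business logic is correct
# only if the page counts match exactly.
logs = ["10.0.0.1 /home", "10.0.0.2 /home", "10.0.0.1 /cart"]
assert run_local(logs) == {"/home": 2, "/cart": 1}
```

Only once this in-process test passes does it make sense to run the same mapper and reducer as a job across multiple nodes.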
  11. Extract-Transform-Load Process Validation and Report Testing
      - ETL stands for Extract, Transform, Load
      - This is the last stage of testing in the queue: data generated by the previous stage is first unloaded and then loaded into the downstream repository, i.e. the Enterprise Data Warehouse (EDW), where reports are generated or a transactional system analysis is done for further processing

  12. Purposes of ETL Process Validation & Report Testing
      - To check the correct application of transformation rules
      - Inspection of data aggregation to ensure there is no distortion of data and that it is fully loaded into the target system (a sketch of these two checks follows this list)
      - To ensure there is no data corruption, by comparing target data with the HDFS file system data
      - Validation that reports include the required data and that all indicators are displayed correctly
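
A sketch of the aggregation and transformation checks, assuming two hypothetical CSV extracts: one taken from the HDFS side before the load and one taken from the EDW target table after it. The file names, column names, and the country-code rule are illustrative assumptions:

```python
import csv

def totals(path, amount_col):
    """Row count and summed amount for one extract."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return len(rows), sum(float(r[amount_col]) for r in rows)

src_rows, src_sum = totals("hdfs_extract.csv", "amount")
tgt_rows, tgt_sum = totals("edw_extract.csv", "amount")

assert src_rows == tgt_rows, "row count drifted during the load"
assert abs(src_sum - tgt_sum) < 0.01, "aggregation distorted the data"

# Transformation-rule check (hypothetical rule: country codes upper-cased).
with open("edw_extract.csv", newline="") as f:
    assert all(r["country"].isupper() for r in csv.DictReader(f)), \
        "transformation rule not applied"
```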
  13. Non-Functional Testing
      - Hadoop processes large chunks of data of varying variety and at varying speeds
      - Hence it becomes imperative to perform architectural testing of Big Data systems to ensure the success of the projects in question
      - This non-functional testing is performed in two ways:
        1. Performance Testing
        2. Failover Testing

  14. Performance Testing
      - Performance Testing covers:
        - Job completion time
        - Memory utilization
        - Data throughput of Big Data systems
      - The main objective of performance testing is not restricted to acknowledging application performance, but extends to improving the performance of the Big Data system as a whole

  15. Performance Testing Process
      - Obtain the performance metrics of Big Data systems, e.g. response time, maximum data processing capacity, speed of data consumption, etc. (a timing sketch for job completion time follows this list)
      - Determine the conditions that cause performance problems, i.e. assess performance-limiting conditions
      - Verification of the speed with which MapReduce operations (sorts, merges) are executed
      - Verification of the storage of data at different nodes
      - Test JVM parameters such as heap size, GC collection algorithms, etc.
      - Test the values for connection timeout, query timeout, etc.
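
A sketch of capturing job completion time, together with a derived throughput figure, timed around a Hadoop streaming run. The jar name, HDFS paths, input size, and the 30-minute SLA are all assumptions made for illustration:

```python
import subprocess
import time

def run_job(cmd, input_bytes):
    """Run one batch job, returning completion time and MB/s throughput."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    return elapsed, input_bytes / elapsed / 1e6

elapsed, mbps = run_job(
    ["hadoop", "jar", "hadoop-streaming.jar",
     "-input", "/user/etl/landing/weblogs",
     "-output", "/user/etl/out/pageviews",
     "-mapper", "mapper.py", "-reducer", "reducer.py"],
    input_bytes=10 * 1024 ** 3,  # assumed 10 GB input set
)
print("completion time %.1fs, throughput %.1f MB/s" % (elapsed, mbps))
assert elapsed < 1800, "job exceeded the 30-minute SLA"
```

Repeating the run with different heap sizes or GC algorithms turns the same harness into a JVM-parameter test.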
  16. Failover Testing
      - Failover testing is done to verify seamless processing of data in case of failure of data nodes
      - It validates the recovery process and the processing of data when it is switched to other data nodes
      - Two types of metrics are observed during this testing (a sketch of both follows this list):
        1. Recovery Time Objective (RTO)
        2. Recovery Point Objective (RPO)
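
Both metrics reduce to simple timestamp arithmetic. A sketch with illustrative timestamps, taken as if a data node had been deliberately killed in a test cluster; the five-minute RTO and one-minute RPO targets are assumed:

```python
from datetime import datetime

def rto_seconds(failure_at, restored_at):
    """Recovery Time Objective: how long processing was unavailable."""
    return (restored_at - failure_at).total_seconds()

def rpo_seconds(last_safe_write, failure_at):
    """Recovery Point Objective: how much of the data window was at risk."""
    return (failure_at - last_safe_write).total_seconds()

# Illustrative timestamps captured around the induced node failure.
failure = datetime(2014, 6, 1, 10, 0, 0)
restored = datetime(2014, 6, 1, 10, 2, 30)
last_write = datetime(2014, 6, 1, 9, 59, 45)

assert rto_seconds(failure, restored) <= 300, "RTO target (5 min) missed"
assert rpo_seconds(last_write, failure) <= 60, "RPO target (1 min) missed"
```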
  17. Big Data Testing Process

  18. Conclusion
      - Many big firms, including cloud enablers and various project management tool platforms, are using Big Data
      - The main challenge such organizations face today is how to test Big Data and how to improve the performance and processing power of Big Data systems
      - The testing described above is performed to ensure that all is working well: the data extracted and processed is undistorted and in sync with the original data
      - Big Data processing can be batch, real-time, or interactive
      - Hence, when dealing with such huge amounts of data, Big Data testing becomes imperative as well as inevitable

  19. www.QualiTestGroup.com Thank You!
