Testing “Big Data” can mean big time investment; several hours often spent just realize you made a simple typo. You fix the typo and then wait another couple hours for your script to hopefully this time run to completion. Even if the Big Data script or program ran to completion are you sure your data analysis is correct? Getting programs to run to completion and to assure functional accuracy per the requirements are some of the biggest hidden problems in big data today.
During this overview presentation we will first introduce unit and functional testing techniques and high level concepts to consider in the Hadoop Ecosystem. The second half of the presentation we will explore real testing examples using tools such as PigUnit, JUnit for UDF testing, BeeTest and Hive limited test data set testing.