• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development
 

Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development

on

  • 831 views

Around the currently available large piles of Big Data, there's happening quite a mixed gathering: Business Engineers define which insightswould be precious, Analysts build models, Hadoop programmers ...

Around the currently available large piles of Big Data, there's happening quite a mixed gathering: Business Engineers define which insightswould be precious, Analysts build models, Hadoop programmers tame the flood of data, and Operations people setup machines and networks. It's exactly the interplay of all participants which is central to project success. This setup together with the distributed nature of processing poses new challenges to well-established models of assuring software artifact quality: How can non-programmers define acceptance criteria? How can functionalities be tested which depend on cluster execution, orchestration of, e.g., different hadoop jobs without delaying the development process? Which data selection is suited best for simulating the live environment? How can intermediate results in arbitrary serialization formats be inspected?

In this talk, experiences and best practices from approaching these problems in a large-scale log data analysis project will be presented. At 1&1, our team develops hadoop applications which process roughly 1 billion log events (~1 TB) per day. We will give an overview of the hard- and software setup of our quality assurance environment, which includes FitNesse as a wiki-style acceptance testing framework.Starting from a comparison with existing test frameworks like MRUnit, we will explain how we automate the parameterized deployment of our applications, choose test data sampling strategies, perform workflow management and orchestration of jobs / applications, and use Pig for inspection of intermediate results and definition of final acceptance criteria. Our conclusion is that test-driven development in the field of Big Data requires adaption of existing paradigms, but is crucial for maintaining high quality standards for the resulting applications.

Statistics

Views

Total Views
831
Views on SlideShare
728
Embed Views
103

Actions

Likes
2
Downloads
0
Comments
0

11 Embeds 103

http://dmitrykan.blogspot.fi 44
http://dmitrykan.blogspot.com 20
http://dmitrykan.blogspot.ru 11
https://twitter.com 8
http://www.blogger.com 6
http://dmitrykan.blogspot.de 4
http://dmitrykan.blogspot.nl 3
http://dmitrykan.blogspot.com.es 3
http://dmitrykan.blogspot.ca 2
http://news.google.com 1
http://dmitrykan.blogspot.in 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment