The document discusses PigUnit, a library for unit testing Pig scripts. It allows running Pig scripts from JUnit tests, overriding script variables with test values, inspecting variable values, and making STORE statements no-ops. The document provides examples of simple Pig scripts, test programs using PigUnit, and more complex testing scenarios where PigUnit can rewrite scripts to load test input from temporary files.
2. dachisgroup.com
What is PigUnit?
• Not really a *Unit framework.
• A library that you can use within your JUnit tests. It allows you to:
  • Run your Pig scripts from within your JUnit tests.
  • Override variables in your Pig scripts so that they get values from your JUnit tests rather than reading external sources such as HDFS.
  • Inspect the values of your Pig script variables.
  • Make your STORE statements into no-ops so that your Pig scripts run without side effects.
® 2011 Dachis Group.
Simple Pig Script
minutes_and_goals = LOAD 'minutes_and_goals' USING BinStorage() AS (
    name: chararray,
    team: chararray,
    minutes: long,
    goals: long
);
top_goal_scorers = FILTER minutes_and_goals BY goals >= $MIN_GOALS;
minutes_per_goal_unsorted = FOREACH top_goal_scorers
    GENERATE name, minutes/goals AS minutes_per_goal;
minutes_per_goal = ORDER minutes_per_goal_unsorted BY minutes_per_goal;
STORE minutes_per_goal INTO 'minutes_per_goal' USING BinStorage();
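When writing the expected rows for a test of this script, it can help to work the arithmetic out by hand first. The following is a minimal plain-Java sketch of what the script computes. It does not use Pig or PigUnit; the class name, the sample players, and the MIN_GOALS value of 2 are all made up for illustration. It assumes that dividing two long fields truncates as in Java, and that rows are compared as strings of the form (name,value).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of PigUnit: mirrors the script's
// FILTER, FOREACH and ORDER steps on in-memory data.
public class MinutesPerGoalSketch {
    /** One input row, reduced to the fields the script actually uses. */
    static final class Player {
        final String name;
        final long minutes;
        final long goals;

        Player(String name, long minutes, long goals) {
            this.name = name;
            this.minutes = minutes;
            this.goals = goals;
        }
    }

    /** Returns the expected rows as "(name,minutes_per_goal)" strings. Assumes minGoals >= 1. */
    static List<String> expectedOutput(List<Player> players, long minGoals) {
        List<Player> kept = new ArrayList<>();
        for (Player p : players) {
            if (p.goals >= minGoals) {           // FILTER ... BY goals >= $MIN_GOALS
                kept.add(p);
            }
        }
        // ORDER ... BY minutes_per_goal (ascending); long / long truncates.
        kept.sort(Comparator.comparingLong((Player p) -> p.minutes / p.goals));
        List<String> rows = new ArrayList<>();
        for (Player p : kept) {
            rows.add("(" + p.name + "," + (p.minutes / p.goals) + ")");
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Player> players = new ArrayList<>();
        players.add(new Player("Ann", 900, 10));
        players.add(new Player("Bob", 800, 4));
        players.add(new Player("Cy", 500, 1));   // dropped when MIN_GOALS is 2
        players.add(new Player("Dee", 950, 3));
        System.out.println(expectedOutput(players, 2));
    }
}
```

With these sample values, Dee's 950 minutes over 3 goals truncates to 316, which is the kind of detail that is easy to get wrong when typing expected rows by hand.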
More Complex Pig Script
(reads two input files)
players = LOAD 'minutes_and_goals' USING BinStorage() AS (
    name: chararray,
    team: chararray,
    minutes: long,
    goals: long
);
teams = LOAD 'team_goals' USING BinStorage() AS (
    name: chararray,
    goals: long
);
player_and_team = JOIN players BY team, teams BY name;
percent_of_team_goals_unsorted = FOREACH player_and_team
    GENERATE players::name, teams::name,
        (players::goals * 100) / teams::goals
        AS percent_of_team_goals;
percent_of_team_goals = ORDER percent_of_team_goals_unsorted
    BY percent_of_team_goals DESC, teams::name;
STORE percent_of_team_goals INTO 'percent_of_team_goals' USING BinStorage();
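The join, the percentage calculation, and the two-key sort in this script can be checked against a small hand-worked case. The sketch below is plain Java with no Pig or PigUnit dependency. The class name and sample data are hypothetical, and it assumes that each team name appears once in the teams input and that long arithmetic truncates as in Java.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PigUnit: mirrors the script's
// JOIN, FOREACH and ORDER steps on in-memory data.
public class PercentOfTeamGoalsSketch {
    /** One joined output row. */
    static final class Row {
        final String player;
        final String team;
        final long percent;

        Row(String player, String team, long percent) {
            this.player = player;
            this.team = team;
            this.percent = percent;
        }
    }

    /**
     * players: each entry is {name, team, goals}; teamGoals: team name to total goals.
     * Returns rows as "(player,team,percent)" strings.
     */
    static List<String> expectedOutput(List<String[]> players, Map<String, Long> teamGoals) {
        List<Row> rows = new ArrayList<>();
        for (String[] p : players) {
            Long total = teamGoals.get(p[1]);
            if (total == null) {
                continue;                        // inner JOIN drops unmatched players
            }
            rows.add(new Row(p[0], p[1], Long.parseLong(p[2]) * 100 / total));
        }
        // ORDER ... BY percent_of_team_goals DESC, teams::name
        rows.sort(Comparator.comparingLong((Row r) -> r.percent).reversed()
                .thenComparing((Row r) -> r.team));
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            out.add("(" + r.player + "," + r.team + "," + r.percent + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> teamGoals = new HashMap<>();
        teamGoals.put("Reds", 20L);
        teamGoals.put("Blues", 10L);
        List<String[]> players = new ArrayList<>();
        players.add(new String[] {"Ann", "Reds", "10"});
        players.add(new String[] {"Bob", "Blues", "5"});
        players.add(new String[] {"Dee", "Reds", "4"});
        System.out.println(expectedOutput(players, teamGoals));
    }
}
```

In this sample Ann and Bob both score 50 percent of their team's goals, so the secondary sort key (team name) decides their order.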
Methods on PigTest
Iterator<Tuple> getAlias(String alias);
Iterator<Tuple> getAlias(); // Fetches value of last variable used in a STORE command
void override(String alias, String query);
void unoverride(String alias);
void assertOutput(String[] expected);
void assertOutput(String alias, String[] expected);
void assertOutput(File expected);
void assertOutput(String alias, File expected);
void assertOutput(String aliasInput, String[] input, String alias, String[] expected);
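There is no simple way to override the values of multiple input variables! The only method that accepts test input, the five-argument assertOutput(), takes a single input alias.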
Method override() Saves the Day!
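import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.pig.PigServer;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.pigunit.Cluster;
import org.apache.pig.pigunit.PigTest;

public class InputMocker {
    protected PigTest test;
    protected PigServer pigServer;
    protected Cluster cluster;
    // Temporary input files created so far; deleted again by cleanup().
    protected List<String> overrideFiles;

    public InputMocker(PigTest test, PigServer pigServer, Cluster cluster) {
        this.test = test;
        this.pigServer = pigServer;
        this.cluster = cluster;
        this.overrideFiles = new ArrayList<String>();
    }

    // Replaces the LOAD statement for one alias with a LOAD of the given test rows.
    // Call once per input alias to mock several inputs.
    public void mockInputAlias(String alias, String[] input) throws Exception {
        // Register the script so that the alias's schema can be looked up.
        test.runScript();
        StringBuilder sb = new StringBuilder();
        Schema.stringifySchema(sb, pigServer.dumpSchema(alias), DataType.TUPLE);
        // Write the test rows to a temporary file named after the alias.
        String destination = alias + "-pigunit-input-overridden.txt";
        overrideFiles.add(destination);
        cluster.copyFromLocalFile(input, destination, true);
        // Point the alias at the temporary file, keeping its original schema.
        test.override(alias,
            String.format("%s = LOAD '%s' AS %s;", alias, destination, sb.toString()));
    }

    // Deletes every temporary input file created by mockInputAlias().
    public void cleanup() throws Exception {
        for (String overrideFile : overrideFiles) {
            cluster.delete(new Path(overrideFile));
        }
    }
}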
Allows You to Rewrite Pig Script
players = LOAD 'minutes_and_goals' USING BinStorage() AS (
    name: chararray,
    team: chararray,
    minutes: long,
    goals: long
);
teams = LOAD 'team_goals' USING BinStorage() AS (
    name: chararray,
    goals: long
);
The test input data is written to temporary files, and the Pig script is rewritten to read those files:
players = LOAD 'players-pigunit-input-overridden.txt' AS (
    name: chararray,
    team: chararray,
    minutes: long,
    goals: long
);
teams = LOAD 'teams-pigunit-input-overridden.txt' AS (
    name: chararray,
    goals: long
);
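The rewritten LOAD statements above are produced by plain string formatting. The sketch below isolates that step in stdlib-only Java so it can be inspected without a Pig installation. The class and method names are hypothetical; the file-name suffix and the format string are the ones used by InputMocker. The inputLine helper assumes that a LOAD without a USING clause reads tab-delimited text, which is Pig's default.

```java
// Hypothetical helper, not part of PigUnit: builds the strings that
// InputMocker passes to override() and writes to its temporary files.
public class OverrideQuerySketch {
    /** Name of the temporary file that holds the test rows for an alias. */
    static String destinationFor(String alias) {
        return alias + "-pigunit-input-overridden.txt";
    }

    /**
     * Replacement LOAD statement for an alias. The schema argument is the
     * parenthesized field list, e.g. "(name: chararray,goals: long)".
     */
    static String loadStatement(String alias, String schema) {
        return String.format("%s = LOAD '%s' AS %s;", alias, destinationFor(alias), schema);
    }

    /** One tab-delimited input row, assuming the default load format. */
    static String inputLine(String... fields) {
        return String.join("\t", fields);
    }

    public static void main(String[] args) {
        System.out.println(loadStatement("teams", "(name: chararray,goals: long)"));
        System.out.println(inputLine("Reds", "20"));
    }
}
```

Because each alias gets its own file and its own override, the same approach extends to any number of input aliases.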
Test Program - Initialization
public void testSamplePigScript2() throws Exception {
    PigServer pigServer = new PigServer(ExecType.LOCAL);
    Cluster cluster = new Cluster(pigServer.getPigContext());
    String[] args = new String[] {};
    PigTest test = new PigTest(
        "/Users/clintmiller/blah/sampleScript2.pig",
        args, pigServer, cluster);
    InputMocker mocker = new InputMocker(test, pigServer, cluster);