SlideShare a Scribd company logo
1 of 11
dachisgroup.com




Dachis Group
Las Vegas 2012




  Pig Unit Testing


     Clint Miller
     Pigout Hackday, Austin TX
     May 11, 2012
® 2011 Dachis Group.
dachisgroup.com




What is PigUnit?

  • Not really a *Unit framework.
  • Library that you can use within your JUnit tests that allows you to
                 • Run your Pig scripts from within your JUnit tests.
                 • Override variables in your Pig scripts so that they get values from your JUnit
                   tests rather than reading external sources such as HDFS.
                 • Inspect the values of your Pig script variables.
                 • Make your STORE statements into no-ops so that your Pig scripts run
                   without side effects.




® 2011 Dachis Group.
dachisgroup.com




Simple Pig Script

  minutes_and_goals = LOAD 'minutes_and_goals' USING BinStorage() AS (
              name: chararray,
              team: chararray,
              minutes: long,
              goals: long
            );

  top_goal_scorers = FILTER minutes_and_goals BY goals >= $MIN_GOALS;

  minutes_per_goal_unsorted = FOREACH top_goal_scorers
                 GENERATE name, minutes/goals AS minutes_per_goal;

  minutes_per_goal = ORDER minutes_per_goal_unsorted BY minutes_per_goal;

  STORE minutes_per_goal INTO 'minutes_per_goal' USING BinStorage();




® 2011 Dachis Group.
dachisgroup.com




Simple Test Program

     public void testSamplePigScript() throws Exception {
       String[] args = {
          "MIN_GOALS=20"
       };

         PigTest test = new PigTest("/Users/clintmiller/blah/sampleScript.pig", args);

         String[] input = {
            "BenzematReal Madridt2165t20",
            "RonaldotReal Madridt3264t45",
            "FalcaotAtletico Madridt2852t23",
            "MessitBarcelonat3177t50",
            "XavitBarcelonat2079t10",
            "HiguaintReal Madridt1641t22",
            "SancheztBarcelonat1678t12"
         };

         String[] expectedOutput = {
            "(Messi,63)",
            "(Ronaldo,72)",
            "(Higuain,74)",
            "(Benzema,108)",
            "(Falcao,124)"
         };

         test.assertOutput("minutes_and_goals", input, "minutes_per_goal", expectedOutput);
     }

® 2011 Dachis Group.
dachisgroup.com




More Complex Pig Script
(reads two input files)


  players = LOAD 'minutes_and_goals' USING BinStorage() AS (
          name: chararray,
          team: chararray,
          minutes: long,
          goals: long
        );

  teams = LOAD 'team_goals' USING BinStorage() AS (
        name: chararray,
        goals: long
      );

  player_and_team = JOIN players BY team, teams BY name;

  percent_of_team_goals_unsorted = FOREACH player_and_team
                    GENERATE players::name, teams::name,
                         (players::goals * 100) / teams::goals
                         AS percent_of_team_goals;

  percent_of_team_goals = ORDER percent_of_team_goals_unsorted
                BY percent_of_team_goals DESC, teams::name;

  STORE percent_of_team_goals INTO 'percent_of_team_goals' USING BinStorage();




® 2011 Dachis Group.
dachisgroup.com




Methods on PigTest

  Iterator<Tuple> getAlias(String alias);

  Iterator<Tuple> getAlias(); // Fetches value of last variable used in a STORE command

  void override(String alias, String query);

  void unoverride(String alias);

  void assertOutput(String[] expected);

  void assertOutput(String alias, String[] expected);

  void assertOutput(File expected);

  void assertOutput(String alias, File expected);

  void assertOutput(String aliasInput, String[] input, String alias, String[] expected);



                            There is no simple way to override the
                            values of multiple input variables!




® 2011 Dachis Group.
dachisgroup.com




Method override() Saves the Day!
   public class InputMocker {
       protected PigTest test;
       protected PigServer pigServer;
       protected Cluster cluster;
       protected List<String> overrideFiles;

         public InputMocker(PigTest test, PigServer pigServer, Cluster cluster) {
           this.test = test;
           this.pigServer = pigServer;
           this.cluster = cluster;
           this.overrideFiles = new ArrayList<String>();
         }

         public void mockInputAlias(String alias, String[] input) throws Exception {
           test.runScript();

             StringBuilder sb = new StringBuilder();
             Schema.stringifySchema(sb, pigServer.dumpSchema(alias), DataType.TUPLE);

             String destination = alias + "-pigunit-input-overridden.txt";
             overrideFiles.add(destination);

             cluster.copyFromLocalFile(input, destination, true);
             test.override(alias,
                      String.format("%s = LOAD '%s' AS %s;", alias, destination, sb.toString()));
         }

         public void cleanup() throws Exception {
           for (String overrideFile: overrideFiles) {
              cluster.delete(new Path(overrideFile));
           }
         }
     }


® 2011 Dachis Group.
dachisgroup.com




Allows You to Rewrite Pig Script
  players = LOAD 'minutes_and_goals' USING BinStorage() AS (
          name: chararray,
          team: chararray,
          minutes: long,
          goals: long
        );

  teams = LOAD 'team_goals' USING BinStorage() AS (
        name: chararray,
        goals: long
      );




                                                      Test input data written to temp files and Pig script rewritten
                                                      to read those files.


                                       players = LOAD ’players-pigunit-input-overridden.txt’ AS (
                                               name: chararray,
                                               team: chararray,
                                               minutes: long,
                                               goals: long
                                             );

                                       teams = LOAD ’teams-pigunit-input-overridden.txt’ AS (
                                             name: chararray,
                                             goals: long
                                           );


® 2011 Dachis Group.
dachisgroup.com




Test Program - Initialization
     public void testSamplePigScript2() throws Exception {
       PigServer pigServer = new PigServer(ExecType.LOCAL);
       Cluster cluster = new Cluster(pigServer.getPigContext());

        String[] args = new String[] {};

        PigTest test = new PigTest
          ("/Users/clintmiller/blah/sampleScript2.pig",
           args, pigServer, cluster);

        InputMocker mocker = new InputMocker(test, pigServer, cluster);




® 2011 Dachis Group.
dachisgroup.com




Test Program – Overriding Inputs
       String[] players = {
          "BenzematReal Madridt2165t20",
          "RonaldotReal Madridt3264t45",
          "FalcaotAtletico Madridt2852t23",
          "MessitBarcelonat3177t50",
          "XavitBarcelonat2079t10",
          "HiguaintReal Madridt1641t22",
          "SancheztBarcelonat1678t12"
       };

       String[] teams = {
          "Barcelonat112",
          "Real Madridt117",
          "Atletico Madridt52"
       };

       mocker.mockInputAlias("players", players);
       mocker.mockInputAlias("teams", teams);




® 2011 Dachis Group.
dachisgroup.com




Test Program – Testing Results
         String[] percentOfTeamGoals = {
            "(Falcao,Atletico Madrid,44)",
            "(Messi,Barcelona,44)",
            "(Ronaldo,Real Madrid,38)",
            "(Higuain,Real Madrid,18)",
            "(Benzema,Real Madrid,17)",
            "(Sanchez,Barcelona,10)",
            "(Xavi,Barcelona,8)"
         };

         test.assertOutput("percent_of_team_goals", percentOfTeamGoals);

         mocker.cleanup();
     }




® 2011 Dachis Group.

More Related Content

What's hot

PHP 7 – What changed internally? (Forum PHP 2015)
PHP 7 – What changed internally? (Forum PHP 2015)PHP 7 – What changed internally? (Forum PHP 2015)
PHP 7 – What changed internally? (Forum PHP 2015)Nikita Popov
 
MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...
MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...
MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...MongoDB
 
Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...
Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...
Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...Johannes Schildgen
 
Pig Hands On November
Pig Hands On NovemberPig Hands On November
Pig Hands On NovemberRyan Bosshart
 
Terraform for fun and profit
Terraform for fun and profitTerraform for fun and profit
Terraform for fun and profitBram Vogelaar
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) BigDataEverywhere
 
JJUG CCC 2011 Spring
JJUG CCC 2011 SpringJJUG CCC 2011 Spring
JJUG CCC 2011 SpringKiyotaka Oku
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04Krishna Sankar
 
Grails: a quick tutorial (1)
Grails: a quick tutorial (1)Grails: a quick tutorial (1)
Grails: a quick tutorial (1)Davide Rossi
 
Python Ireland Nov 2010 Talk: Unit Testing
Python Ireland Nov 2010 Talk: Unit TestingPython Ireland Nov 2010 Talk: Unit Testing
Python Ireland Nov 2010 Talk: Unit TestingPython Ireland
 
轻量级文本工具集
轻量级文本工具集轻量级文本工具集
轻量级文本工具集March Liu
 
Comparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsComparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsLucas Jellema
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Qiangning Hong
 
Pig Introduction to Pig
Pig Introduction to PigPig Introduction to Pig
Pig Introduction to PigChris Wilkes
 

What's hot (20)

PHP 7 – What changed internally? (Forum PHP 2015)
PHP 7 – What changed internally? (Forum PHP 2015)PHP 7 – What changed internally? (Forum PHP 2015)
PHP 7 – What changed internally? (Forum PHP 2015)
 
MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...
MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...
MongoDB World 2019: Creating a Self-healing MongoDB Replica Set on GCP Comput...
 
Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...
Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...
Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggreg...
 
Java and xml
Java and xmlJava and xml
Java and xml
 
R code
R codeR code
R code
 
Pig Hands On November
Pig Hands On NovemberPig Hands On November
Pig Hands On November
 
Terraform for fun and profit
Terraform for fun and profitTerraform for fun and profit
Terraform for fun and profit
 
GORM
GORMGORM
GORM
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
 
JJUG CCC 2011 Spring
JJUG CCC 2011 SpringJJUG CCC 2011 Spring
JJUG CCC 2011 Spring
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04
 
Grails: a quick tutorial (1)
Grails: a quick tutorial (1)Grails: a quick tutorial (1)
Grails: a quick tutorial (1)
 
Python Ireland Nov 2010 Talk: Unit Testing
Python Ireland Nov 2010 Talk: Unit TestingPython Ireland Nov 2010 Talk: Unit Testing
Python Ireland Nov 2010 Talk: Unit Testing
 
轻量级文本工具集
轻量级文本工具集轻量级文本工具集
轻量级文本工具集
 
Comparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsComparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statements
 
Perl object ?
Perl object ?Perl object ?
Perl object ?
 
zinno
zinnozinno
zinno
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010
 
Pig Introduction to Pig
Pig Introduction to PigPig Introduction to Pig
Pig Introduction to Pig
 
Living with garbage
Living with garbageLiving with garbage
Living with garbage
 

Similar to Unit testing pig

Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGMatthew McCullough
 
Python client api
Python client apiPython client api
Python client apidreampuf
 
Jakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheelJakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheeltcurdt
 
Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."
Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."
Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."sjabs
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraDeependra Ariyadewa
 
Speed Things Up with Transients
Speed Things Up with TransientsSpeed Things Up with Transients
Speed Things Up with TransientsCliff Seal
 
Php unit the-mostunknownparts
Php unit the-mostunknownpartsPhp unit the-mostunknownparts
Php unit the-mostunknownpartsBastian Feder
 
Java best practices
Java best practicesJava best practices
Java best practicesRay Toal
 
생산적인 개발을 위한 지속적인 테스트
생산적인 개발을 위한 지속적인 테스트생산적인 개발을 위한 지속적인 테스트
생산적인 개발을 위한 지속적인 테스트기룡 남
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+ConFoo
 
New in MongoDB 2.6
New in MongoDB 2.6New in MongoDB 2.6
New in MongoDB 2.6christkv
 
Charla EHU Noviembre 2014 - Desarrollo Web
Charla EHU Noviembre 2014 - Desarrollo WebCharla EHU Noviembre 2014 - Desarrollo Web
Charla EHU Noviembre 2014 - Desarrollo WebMikel Torres Ugarte
 
Apache Commons - Don\'t re-invent the wheel
Apache Commons - Don\'t re-invent the wheelApache Commons - Don\'t re-invent the wheel
Apache Commons - Don\'t re-invent the wheeltcurdt
 

Similar to Unit testing pig (20)

Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUG
 
Apache Cassandra and Go
Apache Cassandra and GoApache Cassandra and Go
Apache Cassandra and Go
 
Python client api
Python client apiPython client api
Python client api
 
Jakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheelJakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheel
 
Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."
Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."
Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and Cassandra
 
Anti patterns
Anti patternsAnti patterns
Anti patterns
 
Speed Things Up with Transients
Speed Things Up with TransientsSpeed Things Up with Transients
Speed Things Up with Transients
 
What is new in Java 8
What is new in Java 8What is new in Java 8
What is new in Java 8
 
Php unit the-mostunknownparts
Php unit the-mostunknownpartsPhp unit the-mostunknownparts
Php unit the-mostunknownparts
 
Java best practices
Java best practicesJava best practices
Java best practices
 
생산적인 개발을 위한 지속적인 테스트
생산적인 개발을 위한 지속적인 테스트생산적인 개발을 위한 지속적인 테스트
생산적인 개발을 위한 지속적인 테스트
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
 
New in MongoDB 2.6
New in MongoDB 2.6New in MongoDB 2.6
New in MongoDB 2.6
 
Jersey Guice AOP
Jersey Guice AOPJersey Guice AOP
Jersey Guice AOP
 
Charla EHU Noviembre 2014 - Desarrollo Web
Charla EHU Noviembre 2014 - Desarrollo WebCharla EHU Noviembre 2014 - Desarrollo Web
Charla EHU Noviembre 2014 - Desarrollo Web
 
Apache Commons - Don\'t re-invent the wheel
Apache Commons - Don\'t re-invent the wheelApache Commons - Don\'t re-invent the wheel
Apache Commons - Don\'t re-invent the wheel
 
Amazon elastic map reduce
Amazon elastic map reduceAmazon elastic map reduce
Amazon elastic map reduce
 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pig
 
A Test of Strength
A Test of StrengthA Test of Strength
A Test of Strength
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Unit testing pig

  • 1. dachisgroup.com Dachis Group Las Vegas 2012 Pig Unit Testing Clint Miller Pigout Hackday, Austin TX May 11, 2012 ® 2011 Dachis Group.
  • 2. dachisgroup.com What is PigUnit? • Not really a *Unit framework. • Library that you can use within your JUnit tests that allows you to • Run your Pig scripts from within your JUnit tests. • Override variables in your Pig scripts so that they get values from your JUnit tests rather than reading external sources such as HDFS. • Inspect the values of your Pig script variables. • Make your STORE statements into no-ops so that your Pig scripts run without side effects. ® 2011 Dachis Group.
  • 3. dachisgroup.com Simple Pig Script minutes_and_goals = LOAD 'minutes_and_goals' USING BinStorage() AS ( name: chararray, team: chararray, minutes: long, goals: long ); top_goal_scorers = FILTER minutes_and_goals BY goals >= $MIN_GOALS; minutes_per_goal_unsorted = FOREACH top_goal_scorers GENERATE name, minutes/goals AS minutes_per_goal; minutes_per_goal = ORDER minutes_per_goal_unsorted BY minutes_per_goal; STORE minutes_per_goal INTO 'minutes_per_goal' USING BinStorage(); ® 2011 Dachis Group.
  • 4. dachisgroup.com Simple Test Program public void testSamplePigScript() throws Exception { String[] args = { "MIN_GOALS=20" }; PigTest test = new PigTest("/Users/clintmiller/blah/sampleScript.pig", args); String[] input = { "BenzematReal Madridt2165t20", "RonaldotReal Madridt3264t45", "FalcaotAtletico Madridt2852t23", "MessitBarcelonat3177t50", "XavitBarcelonat2079t10", "HiguaintReal Madridt1641t22", "SancheztBarcelonat1678t12" }; String[] expectedOutput = { "(Messi,63)", "(Ronaldo,72)", "(Higuain,74)", "(Benzema,108)", "(Falcao,124)" }; test.assertOutput("minutes_and_goals", input, "minutes_per_goal", expectedOutput); } ® 2011 Dachis Group.
  • 5. dachisgroup.com More Complex Pig Script (reads two input files) players = LOAD 'minutes_and_goals' USING BinStorage() AS ( name: chararray, team: chararray, minutes: long, goals: long ); teams = LOAD 'team_goals' USING BinStorage() AS ( name: chararray, goals: long ); player_and_team = JOIN players BY team, teams BY name; percent_of_team_goals_unsorted = FOREACH player_and_team GENERATE players::name, teams::name, (players::goals * 100) / teams::goals AS percent_of_team_goals; percent_of_team_goals = ORDER percent_of_team_goals_unsorted BY percent_of_team_goals DESC, teams::name; STORE percent_of_team_goals INTO 'percent_of_team_goals' USING BinStorage(); ® 2011 Dachis Group.
  • 6. dachisgroup.com Methods on PigTest Iterator<Tuple> getAlias(String alias); Iterator<Tuple> getAlias(); // Fetches value of last variable used in a STORE command void override(String alias, String query); void unoverride(String alias); void assertOutput(String[] expected); void assertOutput(String alias, String[] expected); void assertOutput(File expected); void assertOutput(String alias, File expected); void assertOutput(String aliasInput, String[] input, String alias, String[] expected); There is no simple way to override the values of multiple input variables! ® 2011 Dachis Group.
  • 7. dachisgroup.com Method override() Saves the Day! public class InputMocker { protected PigTest test; protected PigServer pigServer; protected Cluster cluster; protected List<String> overrideFiles; public InputMocker(PigTest test, PigServer pigServer, Cluster cluster) { this.test = test; this.pigServer = pigServer; this.cluster = cluster; this.overrideFiles = new ArrayList<String>(); } public void mockInputAlias(String alias, String[] input) throws Exception { test.runScript(); StringBuilder sb = new StringBuilder(); Schema.stringifySchema(sb, pigServer.dumpSchema(alias), DataType.TUPLE); String destination = alias + "-pigunit-input-overridden.txt"; overrideFiles.add(destination); cluster.copyFromLocalFile(input, destination, true); test.override(alias, String.format("%s = LOAD '%s' AS %s;", alias, destination, sb.toString())); } public void cleanup() throws Exception { for (String overrideFile: overrideFiles) { cluster.delete(new Path(overrideFile)); } } } ® 2011 Dachis Group.
  • 8. dachisgroup.com Allows You to Rewrite Pig Script players = LOAD 'minutes_and_goals' USING BinStorage() AS ( name: chararray, team: chararray, minutes: long, goals: long ); teams = LOAD 'team_goals' USING BinStorage() AS ( name: chararray, goals: long ); Test input data written to temp files and Pig script rewritten to read those files. players = LOAD ’players-pigunit-input-overridden.txt’ AS ( name: chararray, team: chararray, minutes: long, goals: long ); teams = LOAD ’teams-pigunit-input-overridden.txt’ AS ( name: chararray, goals: long ); ® 2011 Dachis Group.
  • 9. dachisgroup.com Test Program - Initialization public void testSamplePigScript2() throws Exception { PigServer pigServer = new PigServer(ExecType.LOCAL); Cluster cluster = new Cluster(pigServer.getPigContext()); String[] args = new String[] {}; PigTest test = new PigTest ("/Users/clintmiller/blah/sampleScript2.pig", args, pigServer, cluster); InputMocker mocker = new InputMocker(test, pigServer, cluster); ® 2011 Dachis Group.
  • 10. dachisgroup.com Test Program – Overriding Inputs String[] players = { "BenzematReal Madridt2165t20", "RonaldotReal Madridt3264t45", "FalcaotAtletico Madridt2852t23", "MessitBarcelonat3177t50", "XavitBarcelonat2079t10", "HiguaintReal Madridt1641t22", "SancheztBarcelonat1678t12" }; String[] teams = { "Barcelonat112", "Real Madridt117", "Atletico Madridt52" }; mocker.mockInputAlias("players", players); mocker.mockInputAlias("teams", teams); ® 2011 Dachis Group.
  • 11. dachisgroup.com Test Program – Testing Results String[] percentOfTeamGoals = { "(Falcao,Atletico Madrid,44)", "(Messi,Barcelona,44)", "(Ronaldo,Real Madrid,38)", "(Higuain,Real Madrid,18)", "(Benzema,Real Madrid,17)", "(Sanchez,Barcelona,10)", "(Xavi,Barcelona,8)" }; test.assertOutput("percent_of_team_goals", percentOfTeamGoals); mocker.cleanup(); } ® 2011 Dachis Group.