dachisgroup.com




Dachis Group
Las Vegas 2012




  Pig Unit Testing


     Clint Miller
     Pigout Hackday, Austin TX
     May 11, 2012
® 2011 Dachis Group.




What is PigUnit?

  • Not really a *Unit framework.
  • Library that you can use within your JUnit tests that allows you to
                 • Run your Pig scripts from within your JUnit tests.
                 • Override variables in your Pig scripts so that they get values from your JUnit
                   tests rather than reading external sources such as HDFS.
                 • Inspect the values of your Pig script variables.
                 • Make your STORE statements into no-ops so that your Pig scripts run
                   without side effects.








Simple Pig Script

  minutes_and_goals = LOAD 'minutes_and_goals' USING BinStorage() AS (
              name: chararray,
              team: chararray,
              minutes: long,
              goals: long
            );

  top_goal_scorers = FILTER minutes_and_goals BY goals >= $MIN_GOALS;

  minutes_per_goal_unsorted = FOREACH top_goal_scorers
                 GENERATE name, minutes/goals AS minutes_per_goal;

  minutes_per_goal = ORDER minutes_per_goal_unsorted BY minutes_per_goal;

  STORE minutes_per_goal INTO 'minutes_per_goal' USING BinStorage();
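To make the script's semantics concrete, here is a minimal plain-Java sketch of the same three steps (filter, divide, sort). It does not use Pig at all; the class and method names (`MinutesPerGoalSketch`, `minutesPerGoal`, `samplePlayers`) are hypothetical, and it assumes `MIN_GOALS` is greater than zero so the division is always defined. Because both fields are `long`, the division truncates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MinutesPerGoalSketch {

    /** One input row, mirroring the script's schema (team is not needed here). */
    static final class Player {
        final String name;
        final long minutes;
        final long goals;

        Player(String name, long minutes, long goals) {
            this.name = name;
            this.minutes = minutes;
            this.goals = goals;
        }
    }

    /** Hand-rolled stand-in for FILTER, FOREACH and ORDER; assumes minGoals > 0. */
    static List<String> minutesPerGoal(List<Player> players, long minGoals) {
        List<Player> kept = new ArrayList<>();
        for (Player p : players) {
            if (p.goals >= minGoals) {          // FILTER ... BY goals >= $MIN_GOALS
                kept.add(p);
            }
        }
        // ORDER ... BY minutes_per_goal (ascending); long / long truncates.
        kept.sort(Comparator.comparingLong((Player p) -> p.minutes / p.goals));

        List<String> out = new ArrayList<>();
        for (Player p : kept) {
            out.add("(" + p.name + "," + (p.minutes / p.goals) + ")");
        }
        return out;
    }

    /** The seven player rows used as test data in this deck. */
    static List<Player> samplePlayers() {
        return Arrays.asList(
            new Player("Benzema", 2165, 20),
            new Player("Ronaldo", 3264, 45),
            new Player("Falcao", 2852, 23),
            new Player("Messi", 3177, 50),
            new Player("Xavi", 2079, 10),
            new Player("Higuain", 1641, 22),
            new Player("Sanchez", 1678, 12));
    }

    public static void main(String[] args) {
        for (String row : minutesPerGoal(samplePlayers(), 20)) {
            System.out.println(row);
        }
    }
}
```

With `MIN_GOALS=20`, Xavi and Sanchez are filtered out and the remaining five rows come back sorted from fewest to most minutes per goal.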








Simple Test Program

     public void testSamplePigScript() throws Exception {
       String[] args = {
          "MIN_GOALS=20"
       };

         PigTest test = new PigTest("/Users/clintmiller/blah/sampleScript.pig", args);

         String[] input = {
            "Benzema\tReal Madrid\t2165\t20",
            "Ronaldo\tReal Madrid\t3264\t45",
            "Falcao\tAtletico Madrid\t2852\t23",
            "Messi\tBarcelona\t3177\t50",
            "Xavi\tBarcelona\t2079\t10",
            "Higuain\tReal Madrid\t1641\t22",
            "Sanchez\tBarcelona\t1678\t12"
         };

         String[] expectedOutput = {
            "(Messi,63)",
            "(Ronaldo,72)",
            "(Higuain,74)",
            "(Benzema,108)",
            "(Falcao,124)"
         };

         test.assertOutput("minutes_and_goals", input, "minutes_per_goal", expectedOutput);
     }
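Each input string stands for one row whose fields are separated by tab characters, which is the default delimiter of Pig's `PigStorage` loader. Hand-typing the `\t` escapes is easy to get wrong, so a small helper can build the rows instead. This is a sketch only; `TabDelimitedRows` and `row` are hypothetical names and not part of PigUnit.

```java
class TabDelimitedRows {

    /** Joins the field values with a single tab between each pair. */
    static String row(Object... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] input = {
            row("Benzema", "Real Madrid", 2165, 20),
            row("Messi", "Barcelona", 3177, 50)
        };
        for (String line : input) {
            System.out.println(line);
        }
    }
}
```

The resulting array can be passed wherever the test expects a `String[]` of input rows.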





More Complex Pig Script
(reads two input files)


  players = LOAD 'minutes_and_goals' USING BinStorage() AS (
          name: chararray,
          team: chararray,
          minutes: long,
          goals: long
        );

  teams = LOAD 'team_goals' USING BinStorage() AS (
        name: chararray,
        goals: long
      );

  player_and_team = JOIN players BY team, teams BY name;

  percent_of_team_goals_unsorted = FOREACH player_and_team
                    GENERATE players::name, teams::name,
                         (players::goals * 100) / teams::goals
                         AS percent_of_team_goals;

  percent_of_team_goals = ORDER percent_of_team_goals_unsorted
                BY percent_of_team_goals DESC, teams::name;

  STORE percent_of_team_goals INTO 'percent_of_team_goals' USING BinStorage();
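As a cross-check on what this script computes, the join and percentage can be sketched in plain Java without Pig. The names below (`PercentOfTeamGoalsSketch`, `percentOfTeamGoals`, `sample`) are hypothetical, the data is the deck's own test data, and the sketch assumes every team has a non-zero goal total. As in the script, `long` arithmetic truncates, and players whose team has no matching row are dropped, as an inner join would do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PercentOfTeamGoalsSketch {

    /** One joined output row. */
    static final class Row {
        final String player;
        final String team;
        final long percent;

        Row(String player, String team, long percent) {
            this.player = player;
            this.team = team;
            this.percent = percent;
        }
    }

    /** players rows are {name, team, goals}; teamGoals maps team name to total goals. */
    static List<String> percentOfTeamGoals(String[][] players, Map<String, Long> teamGoals) {
        List<Row> rows = new ArrayList<>();
        for (String[] p : players) {
            Long total = teamGoals.get(p[1]);
            if (total == null) {
                continue;                        // inner JOIN: no matching team row
            }
            long goals = Long.parseLong(p[2]);
            rows.add(new Row(p[0], p[1], goals * 100 / total));
        }
        // ORDER ... BY percent_of_team_goals DESC, teams::name
        rows.sort(Comparator.comparingLong((Row r) -> r.percent)
                .reversed()
                .thenComparing((Row r) -> r.team));

        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            out.add("(" + r.player + "," + r.team + "," + r.percent + ")");
        }
        return out;
    }

    /** Runs the sketch on the deck's player and team data. */
    static List<String> sample() {
        String[][] players = {
            {"Benzema", "Real Madrid", "20"}, {"Ronaldo", "Real Madrid", "45"},
            {"Falcao", "Atletico Madrid", "23"}, {"Messi", "Barcelona", "50"},
            {"Xavi", "Barcelona", "10"}, {"Higuain", "Real Madrid", "22"},
            {"Sanchez", "Barcelona", "12"}
        };
        Map<String, Long> teams = new LinkedHashMap<>();
        teams.put("Barcelona", 112L);
        teams.put("Real Madrid", 117L);
        teams.put("Atletico Madrid", 52L);
        return percentOfTeamGoals(players, teams);
    }

    public static void main(String[] args) {
        for (String row : sample()) {
            System.out.println(row);
        }
    }
}
```

Falcao and Messi both land on 44 percent; the secondary sort key on team name puts Atletico Madrid ahead of Barcelona.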








Methods on PigTest

  Iterator<Tuple> getAlias(String alias);

  Iterator<Tuple> getAlias(); // Fetches value of last variable used in a STORE command

  void override(String alias, String query);

  void unoverride(String alias);

  void assertOutput(String[] expected);

  void assertOutput(String alias, String[] expected);

  void assertOutput(File expected);

  void assertOutput(String alias, File expected);

  void assertOutput(String aliasInput, String[] input, String alias, String[] expected);



                            There is no simple way to override the
                            values of multiple input variables!








Method override() Saves the Day!
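   import java.util.ArrayList;
   import java.util.List;

   import org.apache.hadoop.fs.Path;
   import org.apache.pig.PigServer;
   import org.apache.pig.data.DataType;
   import org.apache.pig.impl.logicalLayer.schema.Schema;
   import org.apache.pig.pigunit.Cluster;
   import org.apache.pig.pigunit.PigTest;

   // Overrides any number of input aliases, one mockInputAlias() call per alias.
   public class InputMocker {
       protected PigTest test;
       protected PigServer pigServer;
       protected Cluster cluster;
       protected List<String> overrideFiles;

       public InputMocker(PigTest test, PigServer pigServer, Cluster cluster) {
           this.test = test;
           this.pigServer = pigServer;
           this.cluster = cluster;
           this.overrideFiles = new ArrayList<String>();
       }

       public void mockInputAlias(String alias, String[] input) throws Exception {
           // Register the script so the PigServer knows the alias and its schema.
           test.runScript();

           // Render the alias's schema as text for the AS clause.
           StringBuilder sb = new StringBuilder();
           Schema.stringifySchema(sb, pigServer.dumpSchema(alias), DataType.TUPLE);

           // Write the test rows to a file and remember it for cleanup().
           String destination = alias + "-pigunit-input-overridden.txt";
           overrideFiles.add(destination);
           cluster.copyFromLocalFile(input, destination, true);

           // Replace the alias's LOAD statement with one that reads the test file.
           test.override(alias,
               String.format("%s = LOAD '%s' AS %s;", alias, destination, sb.toString()));
       }

       // Deletes every file written by mockInputAlias().
       public void cleanup() throws Exception {
           for (String overrideFile : overrideFiles) {
               cluster.delete(new Path(overrideFile));
           }
       }
   }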






Allows You to Rewrite Pig Script
  players = LOAD 'minutes_and_goals' USING BinStorage() AS (
          name: chararray,
          team: chararray,
          minutes: long,
          goals: long
        );

  teams = LOAD 'team_goals' USING BinStorage() AS (
        name: chararray,
        goals: long
      );




  Test input data is written to temp files, and the Pig script is rewritten
  to read those files:

  players = LOAD 'players-pigunit-input-overridden.txt' AS (
          name: chararray,
          team: chararray,
          minutes: long,
          goals: long
        );

  teams = LOAD 'teams-pigunit-input-overridden.txt' AS (
        name: chararray,
        goals: long
      );
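The rewritten statement is plain string formatting, so it can be sketched without Pig. The helper below is hypothetical (`OverrideQuerySketch`, `destinationFor` and `overrideQuery` are illustration names); it reuses the file-name suffix and format string from the deck's InputMocker. The schema text is passed in as an argument because its exact rendering comes from Pig at run time, and the value used in `main` is an assumption about what that rendering looks like.

```java
class OverrideQuerySketch {

    /** File name convention taken from the deck: alias plus a fixed suffix. */
    static String destinationFor(String alias) {
        return alias + "-pigunit-input-overridden.txt";
    }

    /** Builds the replacement LOAD statement for one alias. */
    static String overrideQuery(String alias, String schema) {
        return String.format("%s = LOAD '%s' AS %s;", alias, destinationFor(alias), schema);
    }

    public static void main(String[] args) {
        // Assumed schema text; the real string is produced by Pig.
        String schema = "(name: chararray,goals: long)";
        System.out.println(overrideQuery("teams", schema));
    }
}
```

Dropping the `USING BinStorage()` clause matters here: the replacement statement falls back to Pig's default loader, which reads plain delimited text rather than binary files.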






Test Program - Initialization
     public void testSamplePigScript2() throws Exception {
       PigServer pigServer = new PigServer(ExecType.LOCAL);
       Cluster cluster = new Cluster(pigServer.getPigContext());

        String[] args = new String[] {};

        PigTest test = new PigTest(
           "/Users/clintmiller/blah/sampleScript2.pig",
           args, pigServer, cluster);

        InputMocker mocker = new InputMocker(test, pigServer, cluster);








Test Program – Overriding Inputs
       String[] players = {
          "Benzema\tReal Madrid\t2165\t20",
          "Ronaldo\tReal Madrid\t3264\t45",
          "Falcao\tAtletico Madrid\t2852\t23",
          "Messi\tBarcelona\t3177\t50",
          "Xavi\tBarcelona\t2079\t10",
          "Higuain\tReal Madrid\t1641\t22",
          "Sanchez\tBarcelona\t1678\t12"
       };

       String[] teams = {
          "Barcelona\t112",
          "Real Madrid\t117",
          "Atletico Madrid\t52"
       };

       mocker.mockInputAlias("players", players);
       mocker.mockInputAlias("teams", teams);








Test Program – Testing Results
         String[] percentOfTeamGoals = {
            "(Falcao,Atletico Madrid,44)",
            "(Messi,Barcelona,44)",
            "(Ronaldo,Real Madrid,38)",
            "(Higuain,Real Madrid,18)",
            "(Benzema,Real Madrid,17)",
            "(Sanchez,Barcelona,10)",
            "(Xavi,Barcelona,8)"
         };

         test.assertOutput("percent_of_team_goals", percentOfTeamGoals);

         mocker.cleanup();
     }



