Pig, Making Hadoop Easy

Engineer, Hacker, Author
Aug. 17, 2010

Editor's Notes

  1. How many have used Pig? How many have looked at it and have a basic understanding of it?
  2. Demo script:

     Show the group query first. Talk about:
       - load and schema (none, declared, from data)
       - data types
       - data sources need not be from HDFS or even from files
       - parallel clause, how parallelism is determined on maps
       - how grouping works in Pig Latin

     So far what I’ve shown you is a simple join/group query. Now let’s look at something less straightforward in SQL.

     Often people want to group data a number of different ways. Look at the multiquery script:
       - Note how there’s a branch in the logic now

     Often you want to operate on the result of each record in a previous statement. Look at the top5 query:
       - Note nested foreach allows you to operate on each record coming out of group by
       - Since the result of group by is a bag in each record, you can apply operators to that bag
       - Currently supported: order, distinct, filter, limit
       - Use of flatten at the end
       - Use of positional parameters

     There will always be logic you need to write that you can’t get from Pig Latin. This is where rich support for UDFs comes in. Look at the session query:
       - Note registering the UDF
       - The UDF is now called like any other Pig builtin function (in fact, Pig builtins are implemented as UDFs)

     Look at SessionAnalysis.java:
       - Class name is the UDF name
       - Input to a UDF is always a Tuple; this avoids the need to declare expected input, but means the UDF has to check what it gets
       - Talk about how projection of bags works
       - Talk about how EvalFunc is templatized on return type

     Also easy to write load and store functions to fit your data needs.
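
     The demo scripts themselves are not part of these notes. As a rough illustration only, a group query, a multiquery branch, and a top-5 nested foreach might look something like the sketch below. The input path, field names, aliases, and output paths are all hypothetical, and this is not the script shown in the talk.

```pig
-- Hypothetical input: the path and field names are assumptions for illustration.
-- With no USING clause, LOAD reads tab-delimited text via the default PigStorage.
clicks = LOAD 'clicks.tsv' AS (user:chararray, url:chararray, ts:long);

-- Group query: each output record holds the key (named "group") and a bag
-- containing every input record that shares that key.
-- PARALLEL requests a number of reduce tasks; map-side parallelism comes from the input splits.
by_user = GROUP clicks BY user PARALLEL 10;
counts  = FOREACH by_user GENERATE group AS user, COUNT(clicks) AS n;
STORE counts INTO 'out/clicks_per_user';

-- Multiquery: a second grouping of the same input, so the logic branches.
by_url     = GROUP clicks BY url;
url_counts = FOREACH by_url GENERATE group AS url, COUNT(clicks) AS n;
STORE url_counts INTO 'out/clicks_per_url';

-- Top 5 per user: the nested foreach applies ORDER and LIMIT to the bag
-- inside each grouped record, then FLATTEN turns the bag back into rows.
-- Positional parameters ($0 for the key, $1 for the bag) could be used
-- in place of the names "group" and "clicks".
top5 = FOREACH by_user {
    sorted = ORDER clicks BY ts DESC;
    latest = LIMIT sorted 5;
    GENERATE group AS user, FLATTEN(latest.(url, ts));
};
STORE top5 INTO 'out/top5_per_user';
```

     Because all three STORE statements sit in one script, the input only needs to be loaded once, which is the point of the multiquery branch in the notes above.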