Apache PIG introduction


Published in: Technology

  1. APACHE PIG Jackson Oliveira @cyber_jso
  2. A High-Level Analysis Platform
  3. Which can be plugged into Hadoop
  4. How does it work?
  5. How does it work?
  6. What is the point of using Pig?!
  7. MR is not difficult in theory...
  8. But the reality can be different...
  9. We want it easy to understand
       Users = LOAD 'myfile.txt' USING PigStorage('\t') AS (name, age);
       Filtered = FILTER Users BY age >= 18 AND age <= 25;
       Pages = LOAD 'pages' AS (user, url);
       Joined = JOIN Filtered BY name, Pages BY user;
       Grouped = GROUP Joined BY url;
       Summed = FOREACH Grouped GENERATE group, COUNT(Joined) AS clicks;
       Sorted = ORDER Summed BY clicks DESC;
  10. Also easy to extend (UDFs)...
  11. It takes care of the execution plan for you
  12. When to use Apache Pig?
  13. If you want things done faster
  14. An active community
  15. You might need to rethink complicated things
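
To make the dataflow of the script on slide 9 concrete, here is a minimal plain-Python sketch of the same steps (filter, join, group, count, order). It is not how Pig executes the script, which it compiles into MapReduce jobs on Hadoop. The sample rows and the function name `top_urls` are assumptions made up for illustration; they do not come from the slides.

```python
from collections import Counter

# Hypothetical sample rows standing in for the two inputs on slide 9.
users = [("alice", 22), ("bob", 31), ("carol", 18), ("dave", 17)]
pages = [
    ("alice", "/home"), ("alice", "/news"), ("bob", "/home"),
    ("carol", "/home"), ("carol", "/news"), ("carol", "/home"),
    ("dave", "/news"),
]


def top_urls(users, pages):
    """Mirror the Pig script step by step on in-memory lists."""
    # Filtered = FILTER Users BY age >= 18 AND age <= 25;
    filtered = [(name, age) for name, age in users if 18 <= age <= 25]
    # Joined = JOIN Filtered BY name, Pages BY user;  (inner join)
    joined = [
        (name, age, user, url)
        for name, age in filtered
        for user, url in pages
        if name == user
    ]
    # Grouped = GROUP Joined BY url;
    # Summed = FOREACH Grouped GENERATE group, COUNT(Joined) AS clicks;
    clicks = Counter(url for _, _, _, url in joined)
    # Sorted = ORDER Summed BY clicks DESC;
    return sorted(clicks.items(), key=lambda pair: pair[1], reverse=True)


result = top_urls(users, pages)
print(result)  # [('/home', 3), ('/news', 2)]
```

Each Python statement corresponds to one relation in the Pig script, which is the point the slide makes: the script reads as a sequence of named data transformations instead of hand-written map and reduce functions.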