Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data Analytics using Pig

Scope of This Presentation: Big Data Analytics via Pig
ᗍ Introduction to Big Data and Hadoop
ᗍ Introduction to Pig
ᗍ Hadoop Pig Architecture
ᗍ Big Data Analytics via Pig
ᗍ Big Data & Hadoop Job Trends
ᗍ Big Data & Hadoop Course Syllabus
Get Started with BIG Data & Hadoop

Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
Managing data at this scale is very difficult.

Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing such big data is becoming a problem for all of us.

Hadoop can be used to process such huge data easily.
We will see how, but first, let’s understand what Hadoop is.

Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop Characteristics:
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
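The simple programming model Hadoop uses is MapReduce. The following pure-Python sketch runs its three phases (map, shuffle/sort, reduce) for a word count inside a single process. It is only an illustration of the idea. It does not use any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data flow"]))
    # {'big': 2, 'clusters': 1, 'data': 2, 'flow': 1}
```

On a real cluster, the map and reduce functions run in parallel on many machines, and the framework performs the shuffle between them.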

Hadoop Ecosystem
ᗍ Flume: imports or exports unstructured or semi-structured data
ᗍ Sqoop: imports or exports structured data
ᗍ Apache Oozie: workflow
ᗍ HDFS (Hadoop Distributed File System): distributed storage
ᗍ Pig Latin: data analysis
ᗍ Hive: data warehouse (DW) system
ᗍ HBase
ᗍ MapReduce framework
ᗍ Other YARN frameworks (MPI, Giraph)
ᗍ YARN: cluster resource management

Need for Pig
ᗍ Java is not a preferred language for many data analysts
ᗍ As a rule of thumb, about 200 lines of Java correspond to about 10 lines of Pig Latin
ᗍ Many built-in operations are available for common data operations such as join, grouping and filtering

Where to use Pig?
Pig is a data flow language, so it is best suited to:
ᗍ Quickly changing data processing requirements
ᗍ Processing data from multiple channels
ᗍ Quick hypothesis testing
ᗍ Time-sensitive data refreshes
ᗍ Data profiling using sampling
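Pig Latin's SAMPLE operator keeps roughly a given fraction of a relation's records. The pure-Python sketch below shows the same idea applied to data profiling. Each record is kept independently with probability `fraction`, and summary statistics are then computed on the sample alone. The `sample` and `profile` helpers and the fixed seed (added for reproducibility) are illustrative assumptions, not part of Pig.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def sample(records: Sequence[dict], fraction: float, seed: int = 42) -> List[dict]:
    """Keep each record independently with probability `fraction` (0.0 to 1.0)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so the same sample is drawn on every run
    return [record for record in records if rng.random() < fraction]


def profile(records: Sequence[dict], field: str) -> Dict[str, float]:
    """Summarize one numeric field: count, min, max and mean."""
    values = [record[field] for record in records]
    if not values:
        return {"count": 0}
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


if __name__ == "__main__":
    # Hypothetical page data; profile a ~10% sample instead of the full set.
    pages = [{"url": f"page{i}", "pagerank": i / 100} for i in range(100)]
    print(profile(sample(pages, 0.1), "pagerank"))
```

The size of the sample varies around `fraction * len(records)` and is not exact. That trade-off is what makes sampling cheap at scale.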

What is Pig?
ᗍ Pig is an open-source platform built around a data flow language called Pig Latin
ᗍ Pig Latin is used to express queries and data manipulation operations as simple scripts
ᗍ Pig converts these scripts into a sequence of underlying MapReduce jobs

Let’s Internalize Pig
Let’s find the users who, overall, visit highly ranked pages.

Visits
User    URL                  Time
John    www.cbn.com          7:00
John    www.trap.com         7:05
John    www.myblog.com       9:00
John    www.flickr.com       9:05
Linda   cnn.com/index.htm    11:00

Pages
Page URL          Page Rank
www.cbn.com       0.9
www.flickr.com    0.9
www.myblog.com    0.6
www.trap.com      0.3

Internalizing Pig
ᗍ Load Visits (user, url, time)
ᗍ Load Pages (url, pagerank)
ᗍ Join the two relations on url = url
ᗍ Group by user
ᗍ Compute the average pagerank for each user
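To make the data flow above concrete, here is a pure-Python sketch that runs the same steps on the Visits and Pages sample data. In Pig Latin these steps would use LOAD, JOIN, GROUP, and FOREACH ... GENERATE with the built-in AVG function. This Python version only mirrors their logic in memory. The function names are hypothetical. The 0.5 cut-off for "highly ranked" is also an assumption, because the example does not define one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Visits (user, url, time) and Pages (url, pagerank) sample data from the example.
VISITS: List[Tuple[str, str, str]] = [
    ("John", "www.cbn.com", "7:00"),
    ("John", "www.trap.com", "7:05"),
    ("John", "www.myblog.com", "9:00"),
    ("John", "www.flickr.com", "9:05"),
    ("Linda", "cnn.com/index.htm", "11:00"),
]
PAGES: List[Tuple[str, float]] = [
    ("www.cbn.com", 0.9),
    ("www.flickr.com", 0.9),
    ("www.myblog.com", 0.6),
    ("www.trap.com", 0.3),
]


def average_pagerank_per_user(visits: Iterable[Tuple[str, str, str]],
                              pages: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    rank_by_url = dict(pages)
    # Join on url = url. This is an inner join, so visits to URLs missing
    # from Pages (Linda's cnn.com/index.htm) are dropped.
    joined = [(user, rank_by_url[url]) for user, url, _time in visits if url in rank_by_url]
    # Group by user.
    ranks_by_user: Dict[str, List[float]] = defaultdict(list)
    for user, rank in joined:
        ranks_by_user[user].append(rank)
    # Compute the average pagerank of each group.
    return {user: sum(ranks) / len(ranks) for user, ranks in ranks_by_user.items()}


def highly_ranked_visitors(averages: Dict[str, float], threshold: float = 0.5) -> List[str]:
    # HYPOTHETICAL cut-off: the example does not define "highly ranked".
    return sorted(user for user, avg in averages.items() if avg > threshold)


if __name__ == "__main__":
    averages = average_pagerank_per_user(VISITS, PAGES)
    print(averages)
    print(highly_ranked_visitors(averages))  # ['John']
```

John's average works out to (0.9 + 0.3 + 0.6 + 0.9) / 4 = 0.675. Linda's only visit has no matching page, so the join drops her.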

Pig in Industry
Since Pig is a data flow language, it is naturally suited to:
ᗍ Data factory operations:
• Data is typically brought from multiple servers into HDFS
• Pig is then used to clean and preprocess that data
ᗍ Quick prototyping: it helps data analysts and researchers prototype their theories quickly
ᗍ Extensibility: because Pig is extensible, data analysts can run programs written in scripting languages such as Python or Ruby (for example, as user-defined functions) efficiently against large data sets
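Cleaning data is a typical preprocessing step, and Pig allows this kind of logic to be written in a scripting language and registered as a user-defined function (UDF). Below is a plain-Python function of the sort that could be registered for URL clean-up. The function name and its normalization rules are hypothetical: it lower-cases the host, drops a leading `www.`, removes a trailing `index.htm`/`index.html`, and strips trailing slashes. The Pig-specific registration and schema declaration are not shown.

```python
from urllib.parse import urlsplit


def canonicalize_url(raw: str) -> str:
    """HYPOTHETICAL cleaning rule: reduce a URL to a canonical host + path."""
    url = raw.strip()
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    for index_page in ("/index.htm", "/index.html"):
        if path.lower().endswith(index_page):
            path = path[: -len(index_page)]
            break
    return host + path.rstrip("/")


if __name__ == "__main__":
    raw_urls = ["www.cbn.com", "cnn.com/index.htm", "HTTP://WWW.Flickr.com/"]
    print([canonicalize_url(u) for u in raw_urls])
```

Running every URL in both relations through a rule like this before the join lets visits that differ only in spelling still match their page.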

Ways to Run Pig
ᗍ Grunt Mode:
• Pig’s interactive shell
• Very useful for syntax checking, testing and ad-hoc data exploration
ᗍ Script Mode:
• Runs a set of instructions from a file
• Similar to running a SQL script file
ᗍ Embedded Mode:
• Executes Pig programs from within a Java program
• Suitable for creating Pig scripts on the fly
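Embedded mode, as described here, drives Pig from a Java program. To keep one language across these examples, the idea of building a Pig script on the fly is sketched below in Python, as a function that assembles Pig Latin text from parameters. The LOAD/USING PigStorage/FILTER/STORE statements are ordinary Pig Latin. The helper name, the paths, and the (url, pagerank) schema are hypothetical. The generated text still has to be passed to Pig, for example by saving it to a file for script mode.

```python
from typing import Tuple


def build_filter_script(input_path: str, output_path: str,
                        threshold: float, delimiter: str = ",") -> str:
    """Assemble a small Pig Latin script that keeps pages ranked above `threshold`.

    Inputs are interpolated as-is (no quote escaping), so only trusted
    values should be passed in.
    """
    statements: Tuple[str, ...] = (
        f"pages = LOAD '{input_path}' USING PigStorage('{delimiter}') "
        "AS (url:chararray, pagerank:double);",
        f"good_pages = FILTER pages BY pagerank > {threshold};",
        f"STORE good_pages INTO '{output_path}';",
    )
    return "\n".join(statements)


if __name__ == "__main__":
    print(build_filter_script("/data/pages", "/data/good_pages", 0.5))
```

Generating the script text this way makes it easy to vary paths or thresholds per run without keeping many near-identical script files.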

Modes of Pig
Each of these ways of running Pig can execute in one of the following modes:
Local
ᗍ The entire Pig job runs as a single JVM process
ᗍ Data is read from and stored to the local (Linux) file system
MapReduce
ᗍ The Pig job runs as a series of MapReduce jobs
ᗍ Input and output paths are treated as HDFS paths
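The `pig` launcher selects the execution mode with `-x` (`local` or `mapreduce`) and takes the script file with `-f`. The Python sketch below builds that command line and can optionally run it. It assumes the `pig` launcher is installed and on the PATH, and the script file name is hypothetical. Newer Pig releases accept other `-x` values as well. This sketch only allows the two modes described above.

```python
import shutil
import subprocess
from typing import List

VALID_MODES = ("local", "mapreduce")


def pig_command(script_path: str, mode: str = "local") -> List[str]:
    """Build the command line that runs a Pig script in the given execution mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # -x selects the execution mode; -f names the Pig Latin script file.
    return ["pig", "-x", mode, "-f", script_path]


def run_pig(script_path: str, mode: str = "local") -> int:
    """Run the script and return Pig's exit code (requires `pig` on PATH)."""
    if shutil.which("pig") is None:
        raise FileNotFoundError("the 'pig' launcher was not found on PATH")
    return subprocess.run(pig_command(script_path, mode), check=False).returncode


if __name__ == "__main__":
    print(pig_command("visits.pig"))               # develop on local files
    print(pig_command("visits.pig", "mapreduce"))  # same script on HDFS data
```

The same script can move from local testing to the cluster just by changing the mode, as long as its input and output paths are valid for that mode.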

Pig Components
1. Pig Latin: used to express data flows
2. Execution environments:
• Distributed execution on a Hadoop cluster
• Local execution in a single JVM

Pig Programs Execution
ᗍ Pig is a layer on top of MapReduce
ᗍ It parses and optimizes a Pig script, then turns its transformations into a series of MapReduce jobs
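As an example of turning a transformation into a MapReduce job, the pure-Python sketch below simulates a reduce-side join, one standard way to run a JOIN as MapReduce. The map step tags each record with its source relation and emits it under the join key. The shuffle then groups records by key, and the reduce step pairs each Visits record with every Pages record that has the same key. The function names are hypothetical. This is not Pig's own code, and Pig's planner can choose other join strategies.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, ...]
Tagged = Tuple[str, Record]  # (source relation name, record)


def map_tagged(relation: str, records: Iterable[Record],
               key_index: int) -> Iterator[Tuple[str, Tagged]]:
    """Map: emit (join key, (source tag, record)) for every input record."""
    for record in records:
        yield record[key_index], (relation, record)


def shuffle(pairs: Iterable[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Shuffle: bring every tagged record with the same join key together."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_join(tagged_records: List[Tagged]) -> List[Record]:
    """Reduce: pair every Visits record with every Pages record for one key."""
    visits = [rec for tag, rec in tagged_records if tag == "visits"]
    pages = [rec for tag, rec in tagged_records if tag == "pages"]
    return [v + p for v, p in product(visits, pages)]


def reduce_side_join(visits: Iterable[Record], pages: Iterable[Record]) -> List[Record]:
    # Visits is keyed on its 2nd field (url), Pages on its 1st field (url).
    mapped = list(map_tagged("visits", visits, 1)) + list(map_tagged("pages", pages, 0))
    joined: List[Record] = []
    for _key, tagged in shuffle(mapped).items():
        joined.extend(reduce_join(tagged))
    return joined


if __name__ == "__main__":
    visits = [("John", "www.cbn.com", "7:00"), ("Linda", "cnn.com/index.htm", "11:00")]
    pages = [("www.cbn.com", "0.9"), ("www.trap.com", "0.3")]
    print(reduce_side_join(visits, pages))
```

A later group-and-average step would then be a second job reading this job's output. That is why one Pig script can become a series of MapReduce jobs.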

Job Trends – Hadoop

Why SkillSpeed?
ᗍ Course curriculum from industry experts
ᗍ Instructor-led live virtual sessions
ᗍ Lifetime access to course content via LMS
ᗍ 100% placement assistance
ᗍ 24x7 support

Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to MapReduce
Module 4: Advanced MapReduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion

Corporate Partners

Contact Us
Lines open 24/7. To know more about the course, please contact:
IND: +91-90660-20904
USA: 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com
