Pig Latin: Data Model with Load and Store Functions
The Pig Latin data model has two kinds of data types:
1. Single-value (atomic) data types:
each holds exactly one atomic value. Pig's atomic types include int, long, float, double, chararray (character array, i.e. a string) and bytearray. A field is one such single data value.
If a field's type is not declared explicitly, Pig treats it as bytearray by default.
2. Complex (non-atomic) data types:
map, tuple and bag.
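A minimal sketch of declaring atomic types in a LOAD schema. It assumes a hypothetical tab-separated file '/pig/scores.tsv'; the file name and all field names are illustrative only. The last field is deliberately left untyped, so Pig gives it the default type bytearray.

```pig
-- Hypothetical tab-separated input: /pig/scores.tsv
-- student: chararray (string), roll_no: int (32-bit), total: long (64-bit),
-- percentage: float, average: double; raw_note has no declared type,
-- so Pig assigns it the default type bytearray.
scores = LOAD '/pig/scores.tsv' USING PigStorage('\t')
         AS (student:chararray, roll_no:int, total:long,
             percentage:float, average:double, raw_note);

-- Prints the schema; raw_note should be reported as bytearray.
DESCRIBE scores;
```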
Rupak Roy
Data Model: Complex (Non-Atomic) Data Types
Complex Pig data types in detail
1. Map: a set of key-value pairs, for example:
['key1'#'Value1', 'key2'#'Value2']
'#' separates each key from its value.
The key (key1) must be a chararray (character array) and must be unique within the map.
The value (Value1) can be of any data type.
2. Tuple: an ordered collection of fields, where each field can be of any data type. Because the fields are ordered, a field can be referenced by its position.
Example: ('Ryan', 22, 'St.JohnsSchool', 'NewAvenue')
We will work through this example in the next chapter.
3. Bag: an unordered collection of tuples.
A bag is written with curly braces { }.
Example:
{('Ryan', 22, 'St.JohnsSchool', 'NewAvenue'), ('Bob', 23, 'St.EdmundSchool', 'Downtown'), ('Alica', 22, 'Don bosco', 'ParkAvenue')}
A bag can also be a field inside a tuple of a relation (an inner bag; the relation itself is the outer bag).
Example: ('Bob', 23, {(9834514, 'bob@gmail.com')})
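The three complex types can also be declared in a LOAD schema. The sketch below assumes a hypothetical tab-separated file '/pig/student_details.txt' written in PigStorage's text format, where a tuple is written in parentheses, a bag in curly braces and a map in square brackets; the path, field names and sample line are illustrative assumptions.

```pig
-- Hypothetical input line (<TAB> stands for a tab character):
-- Bob<TAB>(St.EdmundSchool,Downtown)<TAB>{(9834514,bob@gmail.com)}<TAB>[city#London,grade#A]
students = LOAD '/pig/student_details.txt'
           AS (name:chararray,
               school_info:tuple(school:chararray, location:chararray),
               contacts:bag{t:tuple(phone:long, email:chararray)},
               extra:map[]);

-- Tuple fields are referenced with '.', map values are looked up with '#'.
info = FOREACH students GENERATE name, school_info.school, extra#'city';
DUMP info;
```

Because the map is declared as map[] without a value type, the looked-up values are bytearray.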
Now let's assign data types while loading data into Pig.
Start Pig in local mode from the operating-system shell (not from the Grunt prompt):
$ pig -x local
grunt> data = LOAD '/pig/student.csv' AS (name:chararray,
age:bytearray, school:chararray, location:chararray);
grunt> DESCRIBE data;
data: {name: chararray,age: bytearray,school: chararray,location: chararray}
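The same statements can also be saved in a script file and run non-interactively. A minimal sketch, assuming a hypothetical script name load_students.pig; the USING PigStorage(',') clause is added here on the assumption that student.csv is comma-separated.

```pig
-- load_students.pig (hypothetical file name)
-- Run from the operating-system shell with:  pig -x local load_students.pig
data = LOAD '/pig/student.csv' USING PigStorage(',')
       AS (name:chararray, age:bytearray, school:chararray, location:chararray);
DESCRIBE data;
```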
Load and Store
In the Grunt shell, run the following commands:
grunt> data = LOAD '/pig/student.csv';
grunt> DESCRIBE data;
Output:
Schema for data unknown.
The schema (the structure of the data) is unknown because we have not told Pig how to interpret the fields. Now let's declare a schema:
grunt> data = LOAD '/pig/student.csv' AS (name, age, school,
location);
grunt> DESCRIBE data;
Output:
data: {name: bytearray,age: bytearray,school: bytearray,location: bytearray}
Because we did not declare any data types, every field defaults to bytearray. When such a field is later used in an expression, Pig casts it implicitly to the type the context requires.
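A short sketch of that implicit casting, assuming student.csv is comma-separated and its second column holds whole numbers; the relation name adults and the threshold 18 are illustrative. Since age has no declared type and is compared with an int constant, Pig should cast it to int before filtering.

```pig
data = LOAD '/pig/student.csv' USING PigStorage(',')
       AS (name, age, school, location);

-- age is bytearray here; comparing it with the int constant 18
-- makes Pig cast it to int implicitly.
adults = FILTER data BY age >= 18;
DUMP adults;
```

Values that cannot be converted (for example a header line) become null, and FILTER drops those rows.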
Pig loads data only into the fields declared in the schema. If the schema declares fewer fields than the data contains, the remaining fields are not loaded.
For example:
grunt> data = LOAD '/pig/student.csv' AS (name, age, school);
Here the schema stops at the school field, so the next field, location, is not loaded.
----------------------------------------------------------------
What if we declare more fields than the file actually has?
For example:
grunt> data = LOAD '/pig/student.csv' AS (name, age, school, location,
abc_extrafield);
Then the abc_extrafield column will contain NULL values.
----------------------------------------------------------
By default, Pig loads data as a tab-delimited file. If no tab is found in a record, Pig treats the whole record as a single field: it loads the entire line into the first field and leaves the other fields null.
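The sketch below contrasts the three cases on the same file, assuming student.csv is comma-separated with exactly four values per line (name, age, school, location); the relation names are illustrative.

```pig
-- Fewer schema fields than data fields: location is not loaded.
three = LOAD '/pig/student.csv' USING PigStorage(',') AS (name, age, school);

-- More schema fields than data fields: abc_extrafield is null in every tuple.
five = LOAD '/pig/student.csv' USING PigStorage(',')
       AS (name, age, school, location, abc_extrafield);

-- Default tab-delimited loader on a comma-separated file: the whole line
-- lands in name, while age, school and location are null.
whole = LOAD '/pig/student.csv' AS (name, age, school, location);

DUMP three;
DUMP five;
DUMP whole;
```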
So what if the data is not tab-delimited? Then we use the PigStorage function.
PigStorage is Pig's built-in (and default) load/store function; it is most commonly used to load text data by splitting each line on an arbitrary delimiter.
Suppose student.csv is a comma-separated file. Then:
grunt> data = LOAD '/pig/student.csv' USING PigStorage(',') AS
(name: chararray, age: int, school: chararray, location: chararray);
PigStorage also handles tab-delimited files:
grunt> data = LOAD '/pig/student.csv' USING PigStorage('\t') AS
(name: chararray, age: int, school: chararray, location: chararray);
Because tab is PigStorage's default delimiter, you can also simply write PigStorage() with no argument:
grunt> data = LOAD '/pig/student.csv' USING PigStorage() AS
(name: chararray, age: int, school: chararray, location: chararray);
Storing the Pig Output
To write the output physically to the file system (HDFS in MapReduce mode, the local file system in local mode), use STORE:
grunt> STORE data INTO '/pig/output/data';
By default, STORE writes the output as tab-delimited text.
---------------------------------------------------
Another important operator in Pig is DUMP.
DUMP displays the contents of a relation (for example, intermediate results) on the screen without writing any output to the file system.
This makes DUMP very useful for debugging.
To use DUMP, type:
grunt> DUMP data;
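A minimal end-to-end sketch, assuming the comma-separated student.csv used earlier: inspect the relation with DUMP first, then STORE it as comma-separated text. The output path /pig/output/students_csv is hypothetical. STORE fails if the output directory already exists, so the sketch first removes it with Grunt's rmf command, which does not complain when the path is missing.

```pig
data = LOAD '/pig/student.csv' USING PigStorage(',')
       AS (name:chararray, age:int, school:chararray, location:chararray);

-- Check the relation on screen first; nothing is written.
DUMP data;

-- STORE refuses to overwrite an existing directory, so clear it first.
rmf /pig/output/students_csv

-- Write comma-separated text instead of the default tab-separated text.
STORE data INTO '/pig/output/students_csv' USING PigStorage(',');
```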
Load and Store in HDFS (cluster mode)
First, start Pig in MapReduce mode from the operating-system shell:
$ pig -x mapreduce
Load:
grunt> data = LOAD
'hdfs://localhost:9000/pigdata/student.csv' USING
PigStorage(',') AS (name:chararray, age:bytearray,
school:chararray, location:chararray);
Store:
grunt> STORE data INTO
'hdfs://localhost:9000/pigoutput/' USING PigStorage(',');
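Once the STORE has finished, the result can be inspected without leaving Grunt. A sketch, assuming the same NameNode address hdfs://localhost:9000 as above; STORE writes one or more part files into the output directory, so a wildcard is used instead of guessing exact file names.

```pig
-- List the files that STORE wrote into the output directory.
fs -ls hdfs://localhost:9000/pigoutput/

-- Print the comma-separated records from every part file.
fs -cat hdfs://localhost:9000/pigoutput/part-*
```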
Next
We will learn about casting in Pig and referencing fields by position.