Detailed examples of Apache Pig relational operators (part II), including FOREACH, FILTER, JOIN, COGROUP, UNION and much more.
Let me know if anything is required. Happy to help.
Ping me google #bobrupakroy.
Talk soon!
2. Relational Operator: FOREACH
As the name suggests, FOREACH does something for each record of a relation. It is similar to a for loop, which repeats the same steps on every iteration.
Example: select a few columns
grunt> a = FOREACH dataTransaction GENERATE $0, $1, $2;
It can also be used for arithmetic operations, such as:
grunt> A = FOREACH dataTransaction GENERATE $0, ($3 + $4) AS S;
or, using field names instead of positions:
grunt> a = FOREACH dataTransaction GENERATE $0, (TransAmt1 + TransAmt2) AS S;
Rupak Roy
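3. grunt> B = FOREACH A GENERATE $1/100;
or, naming the result so that later statements can refer to it:
grunt> b = FOREACH A GENERATE ($1/100) AS D;
The conditional (bincond) operator, condition ? value_if_true : value_if_false, can then label each record:
grunt> C = FOREACH b GENERATE ((D > 50) ? 'above' : 'below');
or, nested to handle three cases:
grunt> C = FOREACH b GENERATE ((D == 50) ? 'Equal' : ((D > 50) ? 'above' : 'down'));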
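To make the per-record behaviour concrete, here is a minimal plain-Python sketch of the same three steps (sum two fields, divide by 100, label with a nested conditional). It is not Pig code; the sample rows and variable names are invented for illustration.

```python
# Hypothetical rows standing in for a relation: (id, amount1, amount2).
records = [
    ("T1", 2000, 3000),
    ("T2", 4000, 8000),
    ("T3", 900, 1000),
]

# Like FOREACH ... GENERATE $0, ($1 + $2) AS S: one output row per input row.
summed = [(r[0], r[1] + r[2]) for r in records]

# Like FOREACH ... GENERATE ($1 / 100) AS D.
scaled = [s / 100 for _, s in summed]

# Like the nested bincond: (D == 50) ? 'Equal' : ((D > 50) ? 'above' : 'down').
labels = ["Equal" if d == 50 else ("above" if d > 50 else "down") for d in scaled]

print(summed)
print(labels)
```

Each list has exactly as many entries as there are input rows, which is the defining property of FOREACH: it transforms records and never drops them.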
4. Relational Operator: FILTER
FILTER selects the tuples that satisfy a condition. Put simply, it removes unwanted records based on your requirements.
Examples:
grunt> F = FILTER dataTransaction BY TransAmt1 > 500;
or
grunt> F1 = FILTER dataTransaction BY (($4 + $5) / 100) > 2;
or
grunt> F2 = FILTER dataTransaction BY $6 == 'Nunavut';
or
grunt> F3 = FILTER dataTransaction BY $1 MATCHES 'Car.*';
-- returns all the names that start with 'Car'
or
grunt> F4 = FILTER dataTransaction BY NOT ($1 MATCHES 'Car.*');
-- returns all the names that do not start with 'Car'
5. Relational Operator: FILTER (continued)
grunt> F5 = FILTER dataTransaction BY CustomerName MATCHES 'Ca.*s';
-- keeps the records whose name starts with 'Ca' and ends with 's'. In a regular expression, . stands for any single character and * means "zero or more times", so .* allows any number of characters between 'Ca' and 's'.
or
grunt> F6 = FILTER dataTransaction BY CustomerName MATCHES '.*(nica|los).*';
-- here the .* on each side allows any number of characters before and after 'nica' or 'los':
-- nica matches Monica Federle
-- los matches Carlos Daly
-- MATCHES is case-sensitive and must match the whole field, not just part of it.
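The patterns above can be tried out in plain Python, where `re.fullmatch` mirrors the whole-string behaviour of MATCHES. This is only a sketch: the `matches` helper and the name list are assumptions for illustration (two of the names appear in the slides, the other two are invented).

```python
import re

# Illustrative customer names; "Carl Jones" and "Rick Hansen" are invented.
names = ["Carlos Daly", "Monica Federle", "Carl Jones", "Rick Hansen"]

def matches(pattern, value):
    # The pattern must cover the entire string, as with MATCHES.
    return re.fullmatch(pattern, value) is not None

starts_with_car = [n for n in names if matches(r"Car.*", n)]
not_car = [n for n in names if not matches(r"Car.*", n)]
ca_to_s = [n for n in names if matches(r"Ca.*s", n)]
nica_or_los = [n for n in names if matches(r".*(nica|los).*", n)]

print(starts_with_car)
print(not_car)
print(ca_to_s)
print(nica_or_los)
```

Note that `matches(r"Car", "Carlos Daly")` is false: without the trailing `.*` the pattern would have to equal the whole name.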
6. Relational Operator: JOIN
The JOIN operator combines two or more datasets based on a key that they have in common.
Joins can be of 3 types:
1. Self-join
2. Inner join
3. Outer join: left join, right join and full join
7. Self-join
A self-join joins a table with itself.
Let's understand this with the help of an example:
-- Load the same dataset under two different aliases:
grunt> join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:bytearray, ProductName:chararray);
grunt> join11 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:bytearray, ProductName:chararray);
8. -- Perform the self-join using the JOIN operator
grunt> selfjoin = JOIN join1 BY Transaction_ID, join11 BY Transaction_ID;
grunt> DUMP selfjoin;
9. Inner join
An inner join is also known as an equijoin. It returns rows only when both tables have a match on the common key.
-- Load the second dataset
grunt> join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:bytearray, Department:chararray);
grunt> innerjoin = JOIN join1 BY Transaction_ID, join2 BY Transaction_ID;
grunt> DUMP innerjoin;
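As a rough illustration of what the inner join and the self-join return, the following plain-Python sketch joins two small lists of tuples on their second field. The rows and the `inner_join` helper are invented for illustration; they are not the contents of join1.csv or join2.csv.

```python
# Hypothetical rows:
# join1 = (CustomerName, Transaction_ID, ProductName)
# join2 = (CustomerName, Transaction_ID, Department)
join1 = [("Ann", "T1", "Chair"), ("Bob", "T2", "Desk"), ("Cid", "T3", "Lamp")]
join2 = [("Ann", "T1", "Furniture"), ("Cid", "T3", "Lighting"), ("Dee", "T4", "Office")]

def inner_join(left, right, key=1):
    # Index the right side by key, then emit left + right for every match.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [l + r for l in left for r in index.get(l[key], [])]

innerjoin = inner_join(join1, join2)   # only T1 and T3 exist on both sides
selfjoin = inner_join(join1, join1)    # every row pairs with itself

for row in innerjoin:
    print(row)
```

As in the Pig result, each output row carries all the fields of the left row followed by all the fields of the matching right row.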
10. Outer Join
Left outer join: returns all rows from the left table, even if there is no match in the right table. From the right table it takes only the values that match the left table; where there is no match, the right-hand fields are null.
grunt> leftouter = JOIN join1 BY Transaction_ID LEFT OUTER, join2 BY Transaction_ID;
Right outer join: the opposite of a left outer join. It returns all the rows from the right table, even if there are no matches in the left table, and takes from the left table only the values that match the right table.
grunt> rightouter = JOIN join1 BY Transaction_ID RIGHT OUTER, join2 BY Transaction_ID;
11. Outer Join (continued)
Full outer join: returns all the rows from both tables. Rows are combined where the keys match; a row with no match in the other table is still returned, with nulls for the missing side.
grunt> fullouter = JOIN join1 BY Transaction_ID FULL OUTER, join2 BY Transaction_ID;
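The difference between the three outer joins is easiest to see on a tiny example. The sketch below is plain Python, not Pig; the rows and the `outer_join` helper are invented for illustration, and `None` plays the role of Pig's null.

```python
# Hypothetical rows, invented for illustration:
# join1 = (CustomerName, Transaction_ID, ProductName)
# join2 = (CustomerName, Transaction_ID, Department)
join1 = [("Ann", "T1", "Chair"), ("Bob", "T2", "Desk"), ("Cid", "T3", "Lamp")]
join2 = [("Ann", "T1", "Furniture"), ("Cid", "T3", "Lighting"), ("Dee", "T4", "Office")]

def outer_join(left, right, how, key=1):
    """Join two lists of tuples on field `key`; how is 'left', 'right' or 'full'."""
    pad_left = (None,) * len(left[0])
    pad_right = (None,) * len(right[0])
    out, matched = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        matched.update(hits)
        if hits:
            out.extend(l + right[i] for i in hits)
        elif how in ("left", "full"):
            out.append(l + pad_right)      # unmatched left row, nulls on the right
    if how in ("right", "full"):
        # unmatched right rows, nulls on the left
        out.extend(pad_left + r for i, r in enumerate(right) if i not in matched)
    return out

leftouter = outer_join(join1, join2, "left")
rightouter = outer_join(join1, join2, "right")
fullouter = outer_join(join1, join2, "full")

for row in fullouter:
    print(row)
```

With this data, the left join keeps Bob (T2) padded with nulls, the right join keeps Dee (T4) padded with nulls, and the full join keeps both.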
13. COGROUP
COGROUP essentially performs a join and a group at the same time. Applied to multiple datasets, it produces one record per key, holding the key and, for each dataset, a bag of the tuples that have that key.
To perform a COGROUP, type:
grunt> cogrouped = COGROUP join1 BY Transaction_ID, join2 BY Transaction_ID;
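The shape of a COGROUP result can be sketched in plain Python as one tuple per key with one list (standing in for a bag) per input. The rows and the `cogroup` helper are invented for illustration; the keys are sorted here only to make the output repeatable.

```python
# Hypothetical rows, invented for illustration.
join1 = [("Ann", "T1", "Chair"), ("Bob", "T2", "Desk"), ("Ann", "T1", "Table")]
join2 = [("Ann", "T1", "Furniture"), ("Dee", "T4", "Office")]

def cogroup(left, right, key=1):
    # Collect every key seen in either input, then gather the matching
    # rows from each input into its own list (an empty list if none match).
    keys = sorted({row[key] for row in left} | {row[key] for row in right})
    return [
        (k, [r for r in left if r[key] == k], [r for r in right if r[key] == k])
        for k in keys
    ]

cogrouped = cogroup(join1, join2)
for record in cogrouped:
    print(record)
```

Unlike a join, the matching rows are not multiplied together: T1 yields a single record with two rows in the first bag and one row in the second, and keys present on only one side get an empty bag for the other.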
14. Relational Operator: UNION
UNION merges the contents of two or more datasets.
grunt> U = UNION join1, join2;
grunt> DUMP U;
What if we want to merge two datasets that have different schemas? For example:
grunt> join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:chararray, Department:chararray);
grunt> join1u = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:int, Department:chararray);
grunt> join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:chararray, Department:chararray);
grunt> Unioned = UNION join1u, join2;
grunt> DESCRIBE Unioned;
-- this throws an error ('cannot cast to bytearray') because Transaction_ID has a different data type in the two datasets.
15. Going back and reloading the data just to change the schema is tedious and time-consuming. Instead, we can cast a field explicitly inside a relational query, which leaves the originally loaded relation untouched.
grunt> joinM = FOREACH join2 GENERATE $0, (int)$1, $2;
grunt> unioned = UNION joinM, join1u;
grunt> DESCRIBE unioned;
Alternatively, UNION ONSCHEMA matches the fields by name rather than by position. Fields that share a name are still expected to have compatible data types.
grunt> U = UNION ONSCHEMA join1u, join2;
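The cast-then-union idea can be illustrated in plain Python, with `int` standing in for Pig's int and `str` for chararray. The rows are invented for illustration; only the alias names are borrowed from the slides.

```python
# Hypothetical rows: the same three fields, but Transaction_ID is an int
# in join1u and a string in join2.
join1u = [("Ann", 101, "Furniture"), ("Bob", 102, "Office")]
join2 = [("Cid", "103", "Lighting")]

# Like FOREACH join2 GENERATE $0, (int)$1, $2: cast the second field only.
joinM = [(name, int(tid), dept) for name, tid, dept in join2]

# Like UNION joinM, join1u: simply concatenate the two sets of rows.
unioned = joinM + join1u

# After the cast, every Transaction_ID has the same type.
print({type(row[1]).__name__ for row in unioned})
print(unioned)
```

The original `join2` list is left unchanged, just as the FOREACH statement leaves the loaded relation as it was.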
16. Relational Operator: RANK
RANK returns each tuple of a relation together with its rank.
Example: in a shell (outside grunt), create the data file:
$ vi /home/hduser/datasets/names
Zara,1,F
David,2,F
David,2,T
Alan,2,M
Calvin,3,M
Alan,5,M
Chris,8,M
Ellie,7,F
Bob,8,M
Carlos,2,M
Then press the 'ESC' key and type ':wq!' to save.
grunt> names = LOAD '/home/hduser/datasets/names' USING PigStorage(',') AS (n1:chararray, n2:int, n3:chararray);
grunt> DUMP names;
17. grunt> ranked = RANK names;
grunt> DUMP ranked;
(1,Zara,1,F)
(2,David,2,F)
(3,David,2,T)
(4,Alan,2,M)
(5,Calvin,3,M)
(6,Alan,5,M)
(7,Chris,8,M)
(8,Ellie,7,F)
(9,Bob,8,M)
(10,Carlos,2,M)
We can also rank by two fields, each with its own sort order:
grunt> ranked2 = RANK names BY n1 ASC, n2 DESC;
grunt> DUMP ranked2;
18. When ranking by fields, records that tie on those fields receive the same rank, and the rank values that follow are skipped, which leaves gaps in the sequence. To avoid the gaps, add the DENSE option:
grunt> rankedG = RANK names BY n1 DESC, n2 ASC DENSE;
grunt> DUMP rankedG;
(1,Zara,1,F)
(2,Ellie,7,F)
(3,David,2,F)
(3,David,2,T)
(4,Chris,8,M)
(5,Carlos,2,M)
(6,Calvin,3,M)
(7,Bob,8,M)
(8,Alan,2,M)
(9,Alan,5,M)
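To see what DENSE changes, the sketch below ranks the same ten records in plain Python, ordering by the first field descending and the second ascending. The `rank_by` helper is an assumption written for illustration, not part of Pig.

```python
names = [
    ("Zara", 1, "F"), ("David", 2, "F"), ("David", 2, "T"), ("Alan", 2, "M"),
    ("Calvin", 3, "M"), ("Alan", 5, "M"), ("Chris", 8, "M"), ("Ellie", 7, "F"),
    ("Bob", 8, "M"), ("Carlos", 2, "M"),
]

def rank_by(rows, dense=False):
    # Order by n1 DESC, then n2 ASC. Python's sort is stable, so sort on
    # the secondary key first and on the primary key second.
    ordered = sorted(rows, key=lambda r: r[1])
    ordered = sorted(ordered, key=lambda r: r[0], reverse=True)
    ranked, rank, previous = [], 0, None
    for position, row in enumerate(ordered, start=1):
        sort_key = (row[0], row[1])
        if sort_key != previous:
            # Dense: next consecutive number. Standard: the row's position,
            # which skips the numbers used up by ties.
            rank = rank + 1 if dense else position
            previous = sort_key
        ranked.append((rank,) + row)
    return ranked

standard = rank_by(names)
dense = rank_by(names, dense=True)

print([r[0] for r in standard])
print([r[0] for r in dense])
```

The two David records tie at rank 3 in both cases. Without the dense option the next record (Chris) gets rank 5; with it, Chris gets rank 4, matching the DENSE output listed above.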
19. Next
We will learn about UDFs (User Defined Functions).