RHive tutorial - basic functions
This tutorial explains how to load the RHive library and use RHive's basic functions.

Loading RHive
Load RHive as you would any other R package:

library(RHive)

But before loading RHive, do not forget to configure the HADOOP_HOME and HIVE_HOME
environment variables. If they are not set, you can set them temporarily before
loading the library, as follows. HADOOP_HOME is the home directory where Hadoop is
installed, and HIVE_HOME is the home directory where Hive is installed.
Consult RHive tutorial - RHive installation and setting for details on these
environment variables.

Sys.setenv(HIVE_HOME="/service/hive-0.7.1")
Sys.setenv(HADOOP_HOME="/service/hadoop-0.20.203.0")
library(RHive)


rhive.init
rhive.init performs RHive's internal initialization. If the environment variables
were configured correctly before loading RHive, it runs automatically.
But if they were not configured when RHive was loaded via library(RHive), the
following error messages will result.

rhive.connect()
Error in .jcall("java/lang/Class", "Ljava/lang/Class;", "forName", cl, :
  No running JVM detected. Maybe .jinit() would help.
Error in .jfindClass(as.character(class)) :
  No running JVM detected. Maybe .jinit() would help.
In this case, set HIVE_HOME and HADOOP_HOME as shown below, or exit R, configure
the environment variables, and restart R.

Sys.setenv(HIVE_HOME="/service/hive-0.7.1")
Sys.setenv(HADOOP_HOME="/service/hadoop-0.20.203.0")
rhive.init()

Or,

close R
export HIVE_HOME="/service/hive-0.7.1"
export HADOOP_HOME="/service/hadoop-0.20.203.0"
open R


rhive.connect
All RHive functions work only after a connection to the Hive server has been
established. If you call other RHive functions without first connecting via the
rhive.connect function, they will fail with errors such as the following.

Error in .jcast(hiveclient[[1]], new.class = "org/apache/hadoop/hive/service/HiveClient", :
  cannot cast anything but Java objects

Establishing a connection with the Hive server is simple:

rhive.connect()

The call above can additionally take a few more arguments, such as the server address.

rhiveConnection <- rhive.connect("10.1.1.1")

If the Hive server is installed on a machine other than the one running RHive,
you can connect remotely by handing the server address to the rhive.connect
function. And if you have multiple Hadoop and Hive clusters configured for RHive
and want to switch between them, then, just as with a DB client for a database
such as MySQL, you should create connections and hand them over to the functions
via arguments to explicitly select a connection.
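For instance, switching between two clusters might look like the following sketch. The server addresses are hypothetical, and the name of the connection argument is an assumption; consult the RHive documentation for the exact signature.

```r
# Hypothetical sketch: two Hive clusters, selected explicitly per call.
connA <- rhive.connect("10.1.1.1")  # cluster A (address made up)
connB <- rhive.connect("10.1.2.1")  # cluster B (address made up)

# Hand the desired connection over to a function as an argument
# (argument name assumed for illustration):
rhive.query("SELECT * FROM usarrests", hiveclient = connA)

rhive.close(connA)
rhive.close(connB)
```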

rhive.query
Users with Hive experience probably know that Hive supports SQL syntax for
handling data on MapReduce and HDFS. rhive.query sends SQL to Hive and receives
the results. Users who know SQL will find this a familiar example.


rhive.query("SELECT * FROM usarrests")

Running the example above prints the contents of a table named 'usarrests' to the
screen. Instead of just printing the returned result, you can also assign it to a
data.frame object.


resultDF <- rhive.query("SELECT * FROM usarrests")

One thing to beware of: if the data returned by rhive.query is bigger than the
memory of the machine running RHive, exhausting the available memory will cause
an error. So do not load data of that size into an R object. It is better to
first create a temporary table and insert the SQL results into it, as follows.

rhive.query("
CREATE TABLE new_usarrests (
  rowname  string,
  murder   double,
  assault  int,
  urbanpop int,
  rape     double
)")

rhive.query("INSERT OVERWRITE TABLE new_usarrests SELECT * FROM usarrests")

Consult the Hive documentation for a detailed account of Hive SQL.

rhive.close
When you have finished using Hive and no longer need RHive functions, use the
rhive.close function to terminate the connection.

rhive.close()

Alternatively, you can assign a specific connection to close it.

conn <- rhive.connect()
rhive.close(conn)


rhive.list.tables
The rhive.list.tables function returns the list of tables in Hive.

rhive.list.tables()
       tab_name
1         aids2
2 new_usarrests
3     usarrests

This is effectively identical to this:

rhive.query("SHOW TABLES")


rhive.desc.table
The rhive.desc.table Function shows the description of the chosen table.

rhive.desc.table("usarrests")
  col_name data_type comment
1  rowname    string
2   murder    double
3  assault       int
4 urbanpop       int
5     rape    double

This is effectively identical to this:

rhive.query("DESC usarrests")


rhive.load.table
The rhive.load.table function loads a Hive table's contents into an R data.frame
object.

df1 <- rhive.load.table("usarrests")
df1

This is effectively identical to this:

df1 <- rhive.query("SELECT * FROM usarrests")
df1


rhive.write.table
The rhive.write.table function is the reverse of rhive.load.table, and it is even
more convenient. Normally, to add data to a table in Hive you must first create
the table. rhive.write.table requires no such extra work: it creates a Hive table
from an R data.frame and inserts all the data in one step.

head(UScrime)
    M So  Ed Po1 Po2  LF  M.F Pop  NW  U1 U2 GDP Ineq     Prob    Time    y
1 151  1  91  58  56 510  950  33 301 108 41 394  261 0.084602 26.2011  791
2 143  0 113 103  95 583 1012  13 102  96 36 557  194 0.029599 25.2999 1635
3 142  1  89  45  44 533  969  18 219  94 33 318  250 0.083401 24.3006  578
4 136  0 121 149 141 577  994 157  80 102 39 673  167 0.015801 29.9012 1969
5 141  0 121 109 101 591  985  18  30  91 20 578  174 0.041399 21.2998 1234
6 121  0 110 118 115 547  964  25  44  84 29 689  126 0.034201 20.9995  682

rhive.write.table(UScrime)
[1] "UScrime"

rhive.list.tables()
       tab_name
1         aids2
2 new_usarrests
3     usarrests
4       uscrime

rhive.query("SELECT * FROM uscrime LIMIT 10")
   rowname   m so  ed po1 po2  lf   mf pop  nw  u1 u2 gdp ineq     prob    time    y
1        1 151  1  91  58  56 510  950  33 301 108 41 394  261 0.084602 26.2011  791
2        2 143  0 113 103  95 583 1012  13 102  96 36 557  194 0.029599 25.2999 1635
3        3 142  1  89  45  44 533  969  18 219  94 33 318  250 0.083401 24.3006  578
4        4 136  0 121 149 141 577  994 157  80 102 39 673  167 0.015801 29.9012 1969
5        5 141  0 121 109 101 591  985  18  30  91 20 578  174 0.041399 21.2998 1234
6        6 121  0 110 118 115 547  964  25  44  84 29 689  126 0.034201 20.9995  682
7        7 127  1 111  82  79 519  982   4 139  97 38 620  168 0.042100 20.6993  963
8        8 131  1 109 115 109 542  969  50 179  79 35 472  206 0.040099 24.5988 1555
9        9 157  1  90  65  62 553  955  39 286  81 28 421  239 0.071697 29.4001  856
10      10 140  0 118  71  68 632 1029   7  15 100 24 526  174 0.044498 19.5994  705

The rhive.write.table function fails with an error if the table to be saved
already exists in Hive. Hence, if you attempt to save a data.frame whose name
collides with an existing Hive table, you must delete the existing table before
using rhive.write.table.

if (rhive.exist.table("uscrime")) {
  rhive.query("DROP TABLE uscrime")
}

rhive.write.table(UScrime)
RHive - alias functions
RHive's function names look similar to S3 generic naming conventions, but many
are not actually generic; the dotted names are reserved for the S3 generic
functions that RHive may or may not support in the future. For users who dislike
the confusion caused by functions that contain "." yet are not generic, there
are alias functions with different names that serve the same roles. The alias
functions are as follows.

hiveConnect
This is the same as rhive.connect.

hiveQuery
This is the same as rhive.query.

hiveClose
This is the same as rhive.close.

hiveListTables
This is the same as rhive.list.tables.

hiveDescTable
This is the same as rhive.desc.table.

hiveLoadTable
This is the same as rhive.load.table.

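For example, the aliases are interchangeable with their dotted counterparts. A minimal sketch, assuming a reachable Hive server at the address used earlier in this tutorial:

```r
conn <- hiveConnect("10.1.1.1")               # same as rhive.connect("10.1.1.1")
df   <- hiveQuery("SELECT * FROM usarrests")  # same as rhive.query(...)
hiveClose(conn)                               # same as rhive.close(conn)
```
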
rhive.basic.cut
rhive.basic.cut converts one numerical column of a table into one factorized
column. The range of the numerical column is divided into intervals, and the
values are factorized according to which interval they fall into. rhive.basic.cut
takes six arguments: tablename (a table name), col (a numerical column name),
breaks, right, summary, and forcedRef. breaks gives the numerical cut points for
the column. right indicates whether the ends of the intervals are open or closed:
if TRUE, the intervals are closed on the right and open on the left; if FALSE,
vice versa. summary = TRUE returns the total counts of values falling into each
interval; if FALSE, the name of a new table containing the factorized column is
returned. forcedRef = TRUE forces rhive.basic.cut to return a table name instead
of the data frame returned for forcedRef = FALSE. The defaults of right, summary,
and forcedRef are TRUE, FALSE, and TRUE respectively.

Example for summary = FALSE

> table_name = rhive.basic.cut(tablename = "iris", col = "sepallength",
+   breaks = seq(0, 5, 0.5), right = FALSE, summary = FALSE, forcedRef = TRUE)
> table_name
[1] "rhive_result_1330382904"
attr(,"result:size")
[1] 4296
> results = rhive.query("select * from rhive_result_1330382904")
> head(results)
  rowname sepalwidth petallength petalwidth species sepallength
1       1        3.5         1.4        0.2  setosa        NULL
2       2        3.0         1.4        0.2  setosa   [4.5,5.0)
3       3        3.2         1.3        0.2  setosa   [4.5,5.0)
4       4        3.1         1.5        0.2  setosa   [4.5,5.0)
5       5        3.6         1.4        0.2  setosa        NULL
6       6        3.9         1.7        0.4  setosa        NULL

Example for summary = TRUE
> summary = rhive.basic.cut(tablename = "iris", col = "sepallength",
+   breaks = seq(0, 5, 0.5), right = FALSE, summary = TRUE, forcedRef = TRUE)
> summary
     NULL [4.0,4.5) [4.5,5.0)
      128         4        18




rhive.basic.cut2
rhive.basic.cut2 converts two numerical columns of a table into two factorized
columns. That is, the range of each numerical column is divided into intervals,
and the values in each column are factorized according to which interval they
fall into. rhive.basic.cut2 takes eight arguments: tablename (a table name), col1
and col2 (two column names), breaks1, breaks2, right, keepCol, and forcedRef.
breaks1 and breaks2 give the numerical cut points for the two columns. right
indicates whether the ends of the intervals are open or closed: if TRUE, the
intervals are closed on the right and open on the left; if FALSE, vice versa.
keepCol = TRUE keeps the two original numerical columns after the conversion;
otherwise, the factorized columns replace them. forcedRef = TRUE forces
rhive.basic.cut2 to return a table name instead of the data frame returned for
forcedRef = FALSE. The defaults of right, keepCol, and forcedRef are TRUE, FALSE,
and TRUE respectively.

Example for right = TRUE and keepCol = FALSE

> table_name = rhive.basic.cut2(tablename = "iris", col1 = "sepallength", col2
= "petallength", breaks1 = seq(0, 5, 0.5), breaks2 = seq(0, 5, 0.5), right =
TRUE, keepCol = FALSE, forcedRef = TRUE)

> table_name

[1] "rhive_result_1330385833"

attr(,"result:size")

[1] 5272

> results = rhive.query("select * from rhive_result_1330385833")

> head(results)
  rowname sepalwidth petalwidth species sepallength petallength rep
1       1        3.5        0.2  setosa        NULL   (1.0,1.5]   1
2       2        3.0        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
3       3        3.2        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
4       4        3.1        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
5       5        3.6        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
6       6        3.9        0.4  setosa        NULL   (1.5,2.0]   1

Example for right = FALSE and keepCol = TRUE

> table_name = rhive.basic.cut2(tablename = "iris", col1 = "sepallength",
+   col2 = "petallength", breaks1 = seq(0, 5, 0.5), breaks2 = seq(0, 5, 0.5),
+   right = FALSE, keepCol = TRUE, forcedRef = TRUE)
> table_name
[1] "rhive_result_1330315663"
attr(,"result:size")
[1] 6374
> results = rhive.query("select * from rhive_result_1330315663")
> head(results)
  rowname sepalwidth petalwidth species sepallength sepallength_cut petallength petallength_cut rep
1       1        3.5        0.2  setosa         5.1            NULL         1.4       [1.0,1.5)   1
2       2        3.0        0.2  setosa         4.9       [4.5,5.0)         1.4       [1.0,1.5)   1
3       3        3.2        0.2  setosa         4.7       [4.5,5.0)         1.3       [1.0,1.5)   1
4       4        3.1        0.2  setosa         4.6       [4.5,5.0)         1.5       [1.5,2.0)   1
5       5        3.6        0.2  setosa         5.0            NULL         1.4       [1.0,1.5)   1


rhive.basic.xtabs
rhive.basic.xtabs builds a contingency table from cross-classifying factors. It
takes a formula object and a table name as input arguments and returns a
contingency table in matrix format based on the given formula. For instance, in
the formula "ncontrols ~ agegp + alcgp", the two column names agegp and alcgp
are the cross-classifying factors. The observations for each combination of the
cross-classifying factors are summed over the column ncontrols.

Example for esoph data

> xtab_formula = as.formula(paste("ncontrols", "~", "agegp", "+", "alcgp", sep = ""))
> xtab_formula
ncontrols ~ agegp + alcgp
> table_result = rhive.basic.xtabs(formula = xtab_formula, tablename = "esoph")
> head(table_result)
       alcgp
agegp   0-39g/day 120+ 40-79 80-119
  25-34        61    5    45      5
  35-44        89   10    80     20
  45-54        78   15    81     39
  55-64        89   26    84     43
  65-74        71    8    53     29
  75+          27    3    12      2



rhive.basic.t.test
The rhive.basic.t.test function runs Welch's t-test on two samples. The
difference between the two sample means is tested under the alternative
hypothesis that the difference is not 0; that is, a two-sided test is performed.
The following example tests the mean difference between the irises' sepal
lengths and petal lengths. Pay attention to how the function is called with the
"sepallength" and "petallength" variables.

> rhive.basic.t.test("iris", "sepallength", "iris", "petallength")
[1] "t = 13.1422338118038, df = 211.542688378717, p-value = 0, mean of x : 5.84333333333333, mean of y : 3.758"
$statistic
       t
13.14223

$parameter
      df
211.5427

$p.value
[1] 0

$estimate
$estimate[[1]]
mean of x
 5.843333

$estimate[[2]]
mean of y
    3.758

Interpreting the results gives a p-value of 0, revealing a difference between
the means of sepal length and petal length. The resulting statistics are
returned as an R list object, and a string assembled from those statistics is
printed to the console.

The iris data comprises 150 observations provided by R. Running R's t.test on
this data gives a slightly different t-statistic of 13.0984. This is because the
variance t.test uses to compute the t-statistic is the sample variance, while
rhive.basic.t.test uses the population variance. As in this example, with little
data the t-statistics may deviate, but the larger the data gets, the smaller the
deviance becomes. Since rhive.basic.t.test was made with massive data analysis
in mind, it uses the population variance for speedy calculation.

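The comparison with base R can be reproduced without a Hive connection, using the built-in iris data; t.test performs Welch's two-sided test by default.

```r
# Base-R counterpart of the rhive.basic.t.test call above.
data(iris)
res <- t.test(iris$Sepal.Length, iris$Petal.Length)  # Welch's t-test
res$statistic  # about 13.0984, vs. rhive.basic.t.test's 13.1422
```
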
rhive.block.sample
The percent argument optionally sets the percentage of data to extract from the
table. Its default is 0.01, meaning 0.01% of the total data. However, this value
is not the exact ratio of sampled rows to total rows; it is closer to the ratio
of sampled blocks to total blocks, because the rhive.block.sample function
samples by block.

Consequently, the entire data set may be returned when rhive.block.sample is
used on Hive tables with little data. This occurs when the data is smaller than
the block size configured in Hive.

The seed argument specifies the random seed used when executing block sampling
in Hive. Given identical seeds, Hive's block sampling returns the same results.
So, to guarantee random samples on every call, it is best to assign a value to
the seed argument of rhive.block.sample, for instance by using R's sample
function.

The subset argument optionally specifies a condition on the data to be extracted
from the target Hive table when drawing the sample block. It is a character
string corresponding to the 'where' clause in Hive HQL, so it must use syntax
valid in an HQL where clause.
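Putting the three optional arguments described above together might look like the following sketch. The column name "state" and its value are hypothetical, chosen only to illustrate the subset condition.

```r
# Hypothetical sketch combining percent, seed, and subset.
seedNumber <- sample(1:2^16, 1)  # fresh random seed, as recommended above
rhive.block.sample("listvirtualmachines",
                   percent = 0.1,                  # sample roughly 0.1% of blocks
                   seed    = seedNumber,
                   subset  = "state = 'Running'")  # hypothetical HQL where-clause
```
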

The return value of rhive.block.sample is a character string: the name of the
Hive table that contains the sampled block. That is, rhive.block.sample
automatically creates a temporary Hive table holding the sampled block and
returns that table's name. The following example samples 0.01% of the data of a
Hive table called listvirtualmachines, using R's sample function to produce the
random seed for Hive's block sampling.

seedNumber <- sample(1:2^16, 1)

rhive.block.sample("listvirtualmachines", seed = seedNumber)

[1] "rhive_sblk_1330404552"

As per this example, a Hive table named "rhive_sblk_1330404552", containing
0.01% of the data from the "listvirtualmachines" table, has been created.



rhive.basic.scale
The rhive.basic.scale function scales a numerical column to mean 0 and standard
deviation 1. Pass the table name as the first argument and the column name as
the second.

In the returned table, a "scaled_<column name>" column saved as a string is
added. The table is also accessible and editable in RHive, just like other Hive
tables.

scaled <- rhive.basic.scale("iris", "sepallength")
attr(scaled, "scaled:center")
# [1] 5.843333
attr(scaled, "scaled:scale")
# [1] 0.8253013
> rhive.desc.table(scaled[[1]])
#              col_name data_type comment
# 1             rowname    string
# 2          sepalwidth    double
# 3         petallength    double
# 4          petalwidth    double
# 5             species    string
# 6         sepallength    double
# 7  sacled_sepallength    double




rhive.basic.by
The rhive.basic.by function runs a group-by on a specified column. The code
below applies group-by to the "species" column and returns the result of
applying the sum function to "sepallength". In the results you will find the sum
of sepallength for each species.

rhive.basic.by("iris", "species", "sum", "sepallength")
#      species   sum
# 1     setosa 250.3
# 2 versicolor 296.8
# 3  virginica 329.4




rhive.basic.merge
rhive.basic.merge makes a new data set by merging two tables on their common
columns.

# checking data
rhive.query('select * from iris limit 5')
  rowname sepallength sepalwidth petallength petalwidth species
1       1         5.1        3.5         1.4        0.2  setosa
2       2         4.9        3.0         1.4        0.2  setosa
3       3         4.7        3.2         1.3        0.2  setosa
4       4         4.6        3.1         1.5        0.2  setosa
5       5         5.0        3.6         1.4        0.2  setosa

rhive.query('select * from usarrests limit 5')
     rowname murder assault urbanpop rape
1    Alabama   13.2     236       58 21.2
2     Alaska   10.0     263       48 44.5
3    Arizona    8.1     294       80 31.0
4   Arkansas    8.8     190       50 19.5
5 California    9.0     276       91 40.6

## rhive.basic.merge
rhive.basic.merge('iris', 'usarrests', by.x = 'sepallength', by.y = 'murder')
  sepallength sepalwidth petallength petalwidth species assault urbanpop rape rowname
1         4.3        3.0         1.1        0.1  setosa     102       62 16.5      14
2         4.4        2.9         1.4        0.2  setosa     149       85 16.3       9
3         4.4        3.0         1.3        0.2  setosa     149       85 16.3      39
4         4.4        3.2         1.3        0.2  setosa     149       85 16.3      43
5         4.9        3.1         1.5        0.1  setosa     159       67 29.3      10

Merging is similar to a 'join' in SQL. The following is equivalent.

# Use join to extract and print the names of all rows not found
# to be common after merging.
# Should row names overlap, only print out the name of the former row.
rhive.big.query('select a.sepallength, a.sepalwidth, a.petallength, a.petalwidth, a.species, b.assault, b.urbanpop, b.rape, a.rowname from iris a join usarrests b on a.sepallength = b.murder')
  sepallength sepalwidth petallength petalwidth species assault urbanpop rape rowname
1         4.3        3.0         1.1        0.1  setosa     102       62 16.5      14
2         4.4        2.9         1.4        0.2  setosa     149       85 16.3       9
3         4.4        3.0         1.3        0.2  setosa     149       85 16.3      39
4         4.4        3.2         1.3        0.2  setosa     149       85 16.3      43
5         4.9        3.1         1.5        0.1  setosa     159       67 29.3      10

rhive.basic.mode
rhive.basic.mode returns the mode and its frequency within a specified column of
a Hive table.

rhive.basic.mode('iris', 'sepallength')

  sepallength freq
1           5   10
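What rhive.basic.mode computes server-side can be reproduced locally on the original dataset with base R: tabulate the column, then take the most frequent value and its count. The variable names below are illustrative.

```r
# Tabulate the column, then pick the most frequent value and its frequency.
tab <- table(datasets::iris$Sepal.Length)
mode_value <- names(tab)[which.max(tab)]
mode_freq <- as.integer(max(tab))
mode_value  # "5"
mode_freq   # 10
```

This matches the Hive result above: 5 is the most frequent sepallength, occurring 10 times. Note that which.max() returns the first maximum, so if several values tie for the mode only one of them is reported.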
  




rhive.basic.range
rhive.basic.range returns the minimum and maximum values of a specified numeric column of the Hive table.

rhive.basic.range('iris', 'sepallength')

[1] 4.3 7.9
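On the client side, the same pair of values can be obtained from the original dataset with base R's range(), which likewise returns the minimum followed by the maximum:

```r
# range() returns c(min, max) of a numeric vector.
rng <- range(datasets::iris$Sepal.Length)
rng  # 4.3 7.9
```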

RHive tutorials - Basic functions

  • 1. RHive tutorial - basic functions This tutorial explains how to load RHive library and use basic Functions for RHive. Loading RHive Load RHive with the method used when using any R package. Load RHive like below: library(RHive)   But before loading RHive, you must not forget to configure HADOOP_HOME and HIVE_HOME environment And if they are not set then you can temporarily set them before loading the library, like as follows. HADOOP_HOME is the home directory where Hadoop is installed and HIVE_HOME is the home directory where Hive is installed. Consult RHive tutorial - RHive installation and setting for details on environment variables. Sys.setenv(HIVE_HOME="/service/hive-­‐0.7.1")   Sys.setenv(HADOOP_HOME="/service/hadoop-­‐0.20.203.0")   library(RHive)   rhive.init rhive.init is a procedure that internally initializes and if, before loading RHive, environment variables were calibrated accurately then they will automatically run. But if these environment variable were not configured while RHive was loaded via library(RHIve) then the following error message will result. rhive.connect()   Error  in  .jcall("java/lang/Class",  "Ljava/lang/Class;",   "forName",  cl,    :      No  running  JVM  detected.  Maybe  .jinit()  would  help.   Error  in  .jfindClass(as.character(class))  :      No  running  JVM  detected.  Maybe  .jinit()  would  help.  
  • 2. For this case then designate HADOOP_HOME and HADOOP_HOME as shown below or exit R then configure environment variables and restart R. Sys.setenv(HIVE_HOME="/service/hive-­‐0.7.1")   Sys.setenv(HADOOP_HOME="/service/hadoop-­‐0.20.203.0")   rhive.init()   Or, close  R   export  HIVE_HOME="/service/hive-­‐0.7.1"   export  HADOOP_HOME="/service/hadoop-­‐0.20.203.0"   open  R   rhive.connect All Functions of RHive will only work after having connected to Hive server. If before using other Functions of RHive, you have not established a connection by using the rhive.connect Function, All RHive Functions will malfunction and produce the following errors when running. Error  in  .jcast(hiveclient[[1]],  new.class  =   "org/apache/hadoop/hive/service/HiveClient",    :      cannot  cast  anything  but  Java  objects   Establishing a connection with Hive server to use RHive is simple with the following: rhive.connect()   The example above can additionally assign a few more things. rhiveConnection  <-­‐  rhive.connect("10.1.1.1")   In the case the user’s Hive server is installed to a server other than the one with RHive installed, and has to remotely connect, a connection can be made by handing arguments over to the rhive.connect Function.
• 3. If you have multiple Hadoop and Hive clusters configured for RHive and want to switch between them, then, just as with a DB client for a database such as MySQL, you should create connections and pass them explicitly to functions via arguments to select which connection to use.

rhive.query
Users with Hive experience will know that Hive supports SQL syntax for handling data on Map/Reduce and HDFS. rhive.query sends SQL to Hive and receives the results. Users who know SQL will find examples like the following familiar.

rhive.query("SELECT * FROM usarrests")

Running the example above prints the contents of a table named 'usarrests' to the screen. Instead of only printing the returned result, you can also assign the result to a data.frame object.

resultDF <- rhive.query("SELECT * FROM usarrests")

One thing to beware of: if the data returned by rhive.query is bigger than the memory available on the machine running R, the query will exhaust memory and fail with an error. So do not fetch data of that size into an R object. It is better to first create a temporary table and insert the SQL results into it, as follows.

rhive.query("
CREATE TABLE new_usarrests (
  rowname  string,
  murder   double,
  assault  int,
  urbanpop int,
  rape     double
)")
• 4. rhive.query("INSERT OVERWRITE TABLE new_usarrests SELECT * FROM usarrests")

Consult the Hive documentation for a detailed account of Hive SQL.

rhive.close
If you have finished using Hive and no longer need RHive functions, use the rhive.close function to terminate the connection.

rhive.close()

Alternatively, you can close a specific connection:

conn <- rhive.connect()
rhive.close(conn)

rhive.list.tables
The rhive.list.tables function returns the list of tables in Hive.

rhive.list.tables()
       tab_name
1         aids2
2 new_usarrests
3     usarrests

This is effectively identical to:

rhive.query("SHOW TABLES")

rhive.desc.table
The rhive.desc.table function shows the description of the chosen table.

rhive.desc.table("usarrests")
• 5.   col_name data_type comment
1   rowname    string
2    murder    double
3   assault       int
4  urbanpop       int
5      rape    double

This is effectively identical to:

rhive.query("DESC usarrests")

rhive.load.table
The rhive.load.table function loads a Hive table's contents into an R data.frame object.

df1 <- rhive.load.table("usarrests")
df1

This is effectively identical to:

df1 <- rhive.query("SELECT * FROM usarrests")
df1

rhive.write.table
The rhive.write.table function is the counterpart of rhive.load.table, and it is even more useful. Normally, to add data to a table in Hive you must first create the table. rhive.write.table requires no such preparation: it creates a Hive table from an R data.frame and inserts all the data in one step.

head(UScrime)
    M So  Ed Po1 Po2  LF  M.F Pop  NW  U1 U2 GDP Ineq     Prob    Time    y
1 151  1  91  58  56 510  950  33 301 108 41 394  261 0.084602 26.2011  791
  • 6. 2  143    0  113  103    95  583  1012    13  102    96  36  557    194  0.029599   25.2999  1635   3  142    1    89    45    44  533    969    18  219    94  33  318    250  0.083401   24.3006    578   4  136    0  121  149  141  577    994  157    80  102  39  673    167  0.015801   29.9012  1969   5  141    0  121  109  101  591    985    18    30    91  20  578    174  0.041399   21.2998  1234   6  121    0  110  118  115  547    964    25    44    84  29  689    126  0.034201   20.9995    682       rhive.write.table(UScrime)   [1]  "UScrime"       rhive.list.tables()                tab_name   1                  aids2   2  new_usarrests   3          usarrests   4              uscrime       rhive.query("SELECT  *  FROM  uscrime  LIMIT  10")        rowname      m  so    ed  po1  po2    lf      mf  pop    nw    u1  u2  gdp   ineq          prob        time   1                1  151    1    91    58    56  510    950    33  301  108  41  394    261   0.084602  26.2011   2                2  143    0  113  103    95  583  1012    13  102    96  36  557    194   0.029599  25.2999   3                3  142    1    89    45    44  533    969    18  219    94  33  318    250   0.083401  24.3006   4                4  136    0  121  149  141  577    994  157    80  102  39  673    167   0.015801  29.9012   5                5  141    0  121  109  101  591    985    18    30    91  20  578    174   0.041399  21.2998   6                6  121    0  110  118  115  547    964    25    44    84  29  689    126  
• 7. 0.034201 20.9995
7         7 127  1 111  82  79 519  982   4 139  97 38 620 168 0.042100 20.6993
8         8 131  1 109 115 109 542  969  50 179  79 35 472 206 0.040099 24.5988
9         9 157  1  90  65  62 553  955  39 286  81 28 421 239 0.071697 29.4001
10       10 140  0 118  71  68 632 1029   7  15 100 24 526 174 0.044498 19.5994
      y
1   791
2  1635
3   578
4  1969
5  1234
6   682
7   963
8  1555
9   856
10  705

The rhive.write.table function raises an error if the table to be created in Hive already exists. Hence, if you attempt to save a data.frame whose name matches a table already in Hive, you must delete that table before using rhive.write.table.

if (rhive.exist.table("uscrime")) {
  rhive.query("DROP TABLE uscrime")
}

rhive.write.table(UScrime)
• 8. RHive - alias functions
RHive's function names look as though they follow S3 generic naming rules, but many are not actually generic. This leaves room for S3 generic functions that RHive may or may not support in the future. For users who dislike the confusion caused by functions that contain "." yet are not generic, there are functions with different names that serve the same roles. The alias functions are as follows.

hiveConnect
This is the same as rhive.connect.

hiveQuery
This is the same as rhive.query.

hiveClose
This is the same as rhive.close.

hiveListTables
This is the same as rhive.list.tables.

hiveDescTable
This is the same as rhive.desc.table.

hiveLoadTable
This is the same as rhive.load.table.
• 9. rhive.basic.cut
rhive.basic.cut converts a numerical column of a table into a factorized column. The range of the numerical column is divided into intervals, and the values are factorized according to which interval they fall into. rhive.basic.cut takes six arguments: tablename (a table name), col (a numerical column name), breaks, right, summary, and forcedRef. breaks gives the numerical cut points for the column. right indicates whether the ends of the intervals are open or closed: if TRUE, the intervals are closed on the right and open on the left; if FALSE, vice versa. summary = TRUE returns the total count of values falling into each interval; if FALSE, the name of a new table holding the factorized column is returned. forcedRef = TRUE forces rhive.basic.cut to return a table name; forcedRef = FALSE makes it return a data frame. The defaults of right, summary, and forcedRef are TRUE, FALSE, and TRUE respectively.

Example for summary = FALSE

> table_name = rhive.basic.cut(tablename = "iris", col = "sepallength", breaks = seq(0, 5, 0.5), right = FALSE, summary = FALSE, forcedRef = TRUE)
> table_name
[1] "rhive_result_1330382904"
attr(,"result:size")
[1] 4296
> results = rhive.query("select * from rhive_result_1330382904")
> head(results)
  rowname sepalwidth petallength petalwidth species sepallength
1       1        3.5         1.4        0.2  setosa        NULL
2       2        3.0         1.4        0.2  setosa   [4.5,5.0)
3       3        3.2         1.3        0.2  setosa   [4.5,5.0)
4       4        3.1         1.5        0.2  setosa   [4.5,5.0)
5       5        3.6         1.4        0.2  setosa        NULL
6       6        3.9         1.7        0.4  setosa        NULL

Example for summary = TRUE
• 10. > summary = rhive.basic.cut(tablename = "iris", col = "sepallength", breaks = seq(0, 5, 0.5), right = FALSE, summary = TRUE, forcedRef = TRUE)
> summary
     NULL [4.0,4.5) [4.5,5.0)
      128         4        18

rhive.basic.cut2
rhive.basic.cut2 converts two numerical columns of a table into two factorized columns. That is, the range of each numerical column is divided into intervals, and the values in each column are factorized according to which interval they fall into. rhive.basic.cut2 takes eight arguments: tablename (a table name), col1 and col2 (two column names), breaks1, breaks2, right, keepCol, and forcedRef. breaks1 and breaks2 give the numerical cut points for the two columns. right indicates whether the ends of the intervals are open or closed: if TRUE, the intervals are closed on the right and open on the left; if FALSE, vice versa. keepCol = TRUE keeps the two original numerical columns after the conversion; otherwise, the factorized columns replace them. forcedRef = TRUE forces rhive.basic.cut2 to return a table name; forcedRef = FALSE makes it return a data frame. The defaults of right, keepCol, and forcedRef are TRUE, FALSE, and TRUE respectively.

Example for right = TRUE and keepCol = FALSE

> table_name = rhive.basic.cut2(tablename = "iris", col1 = "sepallength", col2 = "petallength", breaks1 = seq(0, 5, 0.5), breaks2 = seq(0, 5, 0.5), right = TRUE, keepCol = FALSE, forcedRef = TRUE)
> table_name
[1] "rhive_result_1330385833"
attr(,"result:size")
[1] 5272
> results = rhive.query("select * from rhive_result_1330385833")
> head(results)
• 11.   rowname sepalwidth petalwidth species sepallength petallength rep
1       1        3.5        0.2  setosa        NULL   (1.0,1.5]   1
2       2        3.0        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
3       3        3.2        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
4       4        3.1        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
5       5        3.6        0.2  setosa   (4.5,5.0]   (1.0,1.5]   1
6       6        3.9        0.4  setosa        NULL   (1.5,2.0]   1

Example for right = FALSE and keepCol = TRUE

> table_name = rhive.basic.cut2(tablename = "iris", col1 = "sepallength", col2 = "petallength", breaks1 = seq(0, 5, 0.5), breaks2 = seq(0, 5, 0.5), right = FALSE, keepCol = TRUE, forcedRef = TRUE)
> table_name
[1] "rhive_result_1330315663"
attr(,"result:size")
[1] 6374
> results = rhive.query("select * from rhive_result_1330315663")
> head(results)
  rowname sepalwidth petalwidth species sepallength sepallength_cut petallength petallength_cut rep
1       1        3.5        0.2  setosa         5.1            NULL         1.4       [1.0,1.5)   1
2       2        3.0        0.2  setosa         4.9       [4.5,5.0)         1.4       [1.0,1.5)   1
3       3        3.2        0.2  setosa         4.7       [4.5,5.0)         1.3       [1.0,1.5)   1
4       4        3.1        0.2  setosa         4.6       [4.5,5.0)         1.5       [1.5,2.0)   1
• 12. 5       5        3.6        0.2  setosa         5.0            NULL         1.4       [1.0,1.5)   1

rhive.basic.xtabs
rhive.basic.xtabs builds a contingency table from cross-classifying factors. A formula object and a table name are given as input arguments, and a contingency table in matrix format is returned based on the formula. For instance, in the formula "ncontrols ~ agegp + alcgp", the two column names agegp and alcgp are the cross-classifying factors, and the observations for each combination of the factors are summed over the column ncontrols.

Example for the esoph data

> xtab_formula = as.formula(paste("ncontrols","~", "agegp", "+","alcgp",sep =""))
> xtab_formula
ncontrols ~ agegp + alcgp
> table_result = rhive.basic.xtabs(formula = xtab_formula, tablename = "esoph")
> head(table_result)
       alcgp
agegp   0-39g/day 120+ 40-79 80-119
  25-34        61    5    45      5
  35-44        89   10    80     20
  45-54        78   15    81     39
  55-64        89   26    84     43
  65-74        71    8    53     29
  75+          27    3    12      2

rhive.basic.t.test
The rhive.basic.t.test function runs Welch's t-test on two samples. The difference between the two samples' means is tested against the alternative hypothesis that the difference is not 0; that is, a two-sided test is performed.
• 13. The following is an example testing the mean difference between the irises' sepal lengths and petal lengths. Note how the function is called with the "sepallength" and "petallength" column names.

> rhive.basic.t.test("iris", "sepallength", "iris", "petallength")
[1] "t = 13.1422338118038, df = 211.542688378717, p-value = 0, mean of x : 5.84333333333333, mean of y : 3.758"
$statistic
       t
13.14223

$parameter
      df
211.5427

$p.value
[1] 0

$estimate
$estimate[[1]]
mean of x
 5.843333

$estimate[[2]]
mean of y
    3.758

>

Interpreting the results gives a p-value of 0, revealing a difference between the means of sepal length and petal length. The resulting statistics are returned as an R list object, and a string summarizing the statistics is printed to the console.

The iris data consists of 150 observations provided with R. Running R's t.test on the same data yields a slightly different t-statistic of 13.0984. This is because t.test computes the t-statistic with the sample variance, while rhive.basic.t.test uses the population variance. With little data, as in this example, the t-statistics may deviate, but the deviance dwindles as the data grows. Since rhive.basic.t.test is designed with massive data analysis in mind, it uses the population variance for speedy calculation.
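The deviance described above can be reproduced in plain R with the built-in iris data. This sketch recomputes the Welch statistic both ways: with the sample variance (what t.test uses) and with the population variance (what rhive.basic.t.test is described as using).

```r
# Reproduce the t-statistic deviance described above using the built-in
# iris data. R's t.test() uses sample variance (denominator n-1);
# recomputing with population variance (denominator n) gives the slightly
# larger statistic reported by rhive.basic.t.test.
x <- iris$Sepal.Length
y <- iris$Petal.Length

# Welch t-statistic with sample variance, as computed by t.test()
t.test(x, y)$statistic    # ~ 13.0984

# The same statistic with population variance
pop.var <- function(v) mean((v - mean(v))^2)
n1 <- length(x); n2 <- length(y)
(mean(x) - mean(y)) / sqrt(pop.var(x) / n1 + pop.var(y) / n2)  # ~ 13.1422
```

With n = 150 the two denominators differ only by the factor (n - 1)/n, which is why the statistics converge as the data grows.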
• 14. rhive.block.sample
The rhive.block.sample function samples data from a Hive table by block. The percent argument is optional and sets the percentage of data to extract from the total; its default value is 0.01, meaning 0.01% of the total data. Note that percent is not the ratio of the sampled row count to the total row count, but rather the ratio of sampled blocks to total blocks. Because sampling is done by block, rhive.block.sample may return the entire data when used on Hive tables smaller than the block size configured in Hive.

The seed argument specifies the random seed used when executing block sampling in Hive. Given identical seeds, Hive's block sampling returns the same results. So, to guarantee a different random sample on every call, it is best to assign a value to the seed argument of rhive.block.sample using R's sample function.

The subset argument is optional and specifies a condition on the data extracted from the target Hive table. It is a character value corresponding to the 'where' clause in Hive HQL, so it must use syntax valid in an HQL where clause.

rhive.block.sample returns a character value: the name of the Hive table that contains the sampled blocks. That is, rhive.block.sample automatically creates a temporary Hive table holding the sampled blocks and returns that table's name.

The following example samples 0.01% of a Hive table called listvirtualmachines, using R's sample function to pick the random seed for Hive's block sampling.

seedNumber <- sample(1:2^16, 1)

rhive.block.sample("listvirtualmachines", seed=seedNumber)

[1] "rhive_sblk_1330404552"
• 15. As per this example, a Hive table named "rhive_sblk_1330404552", containing 0.01% of the data from the Hive table "listvirtualmachines", has been created.

rhive.basic.scale
The rhive.basic.scale function standardizes a numerical column to mean 0 and standard deviation 1. The first argument is the table name and the second is the column name. The returned list refers to a table with an added "scaled_" column, stored as a string. This table is accessible and editable in RHive just like any other Hive table.

scaled <- rhive.basic.scale("iris", "sepallength")
attr(scaled, "scaled:center")
# [1] 5.843333
attr(scaled, "scaled:scale")
# [1] 0.8253013
> rhive.desc.table(scaled[[1]])
col_name data_type comment
# 1             rowname    string
# 2          sepalwidth    double
# 3         petallength    double
# 4          petalwidth    double
# 5             species    string
# 6         sepallength    double
# 7  sacled_sepallength    double

rhive.basic.by
The rhive.basic.by function runs group by on a specified column. Thus the code below applies group by on the "species" column, and returns the result of applying the sum function on
• 16. "sepallength". The results give the sum of sepallength for each species.

rhive.basic.by("iris", "species", "sum", "sepallength")
#     species   sum
# 1     setosa 250.3
# 2 versicolor 296.8
# 3  virginica 329.4

rhive.basic.merge
rhive.basic.merge builds a new data set by merging two tables, matching rows on the specified key columns (by.x and by.y).

# checking data
rhive.query('select * from iris limit 5')
  rowname sepallength sepalwidth petallength petalwidth species
1       1         5.1        3.5         1.4        0.2  setosa
2       2         4.9        3.0         1.4        0.2  setosa
3       3         4.7        3.2         1.3        0.2  setosa
4       4         4.6        3.1         1.5        0.2  setosa
5       5         5.0        3.6         1.4        0.2  setosa

rhive.query('select * from usarrests limit 5')
     rowname murder assault urbanpop rape
1    Alabama   13.2     236       58 21.2
2     Alaska   10.0     263       48 44.5
3    Arizona    8.1     294       80 31.0
4   Arkansas    8.8     190       50 19.5
5 California    9.0     276       91 40.6

## rhive.basic.merge
rhive.basic.merge('iris','usarrests',by.x='sepallength',by.y='
murder')
  sepallength sepalwidth petallength petalwidth species assault urbanpop rape rowname
1         4.3        3.0         1.1        0.1  setosa     102       62 16.5      14
2         4.4        2.9         1.4        0.2  setosa     149       85 16.3       9
3         4.4        3.0         1.3        0.2  setosa     149       85 16.3      39
4         4.4        3.2         1.3        0.2  setosa     149       85 16.3      43
5         4.9        3.1         1.5        0.1  setosa     159       67 29.3      10

Merging is similar to a 'join' in SQL. The following query is equivalent:

# Use join to extract and print the names of all rows not found to be common after merging.
# Should row names overlap, only print out the name of the former row.
rhive.big.query('select   a.sepallength,a.sepalwidth,a.petallength,a.petalwidth,a.species ,b.assault,b.urbanpop,b.rape,a.rowname  from  iris  a  join   usarrests  b  on  a.sepallength  =  b.murder')        sepallength  sepalwidth  petallength  petalwidth        species   assault  urbanpop  rape  rowname   1                    4.3                3.0                  1.1                0.1          setosa           102              62  16.5            14   2                    4.4                2.9                  1.4                0.2          setosa           149              85  16.3              9   3                    4.4                3.0                  1.3                0.2          setosa           149              85  16.3            39   4                    4.4                3.2                  1.3                0.2          setosa           149              85  16.3            43   5                    4.9                3.1                  1.5                0.1          setosa           159              67  29.3            10  
• 18. rhive.basic.mode
rhive.basic.mode returns the mode and its frequency for a specified column of a Hive table.

rhive.basic.mode('iris', 'sepallength')
  sepallength freq
1           5   10

rhive.basic.range
rhive.basic.range returns the minimum and maximum values of a specified numerical column of a Hive table.

rhive.basic.range('iris', 'sepallength')
[1] 4.3 7.9
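For comparison, the same two statistics can be computed locally in plain R on the built-in iris data (a sketch; the local column names are capitalized, unlike the lowercase Hive table columns above).

```r
# Base-R equivalents of rhive.basic.mode and rhive.basic.range,
# computed locally on the built-in iris data.
x <- iris$Sepal.Length

# Mode and its frequency: tabulate values and take the most frequent one.
tab <- table(x)
tab[which.max(tab)]   # value 5 occurs 10 times, matching the output above

# Minimum and maximum, as returned by rhive.basic.range.
range(x)              # 4.3 7.9
```

The RHive functions perform the same computations inside Hive, so only the small summary result travels back to R rather than the full column.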