Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query, and analysis.
For More Information visit us:
http://www.kalyanhadooptraining.com
2. Database Vs Data Warehouse
Historical data is the backbone of mission-critical business
decisions. Data is stored in tables in a database, so why do
Business Intelligence systems pull historical data from a
Data Warehouse rather than from a Database?
What is the difference between a Database and a Data Warehouse when
both of them hold tables with data, indexes, keys and so on?
Here are the differences:
http://www.orienit.com/
3. DATABASE
• Used for Online Transaction Processing (OLTP). It records
the data entered by users so that a history is kept.
• The tables and joins are complex since they are
normalized. This is done to reduce redundant data and
to save storage space.
• Entity-Relationship (ER) modeling techniques are used for
database design.
• Optimized for write operation.
• Performance is low for analysis queries.
4. DATA WAREHOUSE
• Used for Online Analytical Processing (OLAP). It reads
historical data so that users can make business decisions.
• The tables and joins are simple since they are
denormalized. This is done to reduce the response time for
analytical queries.
• Data modeling techniques (typically dimensional modeling) are
used for the Data Warehouse design.
• Optimized for read operations.
• High performance for analytical queries.
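As a rough illustration of the contrast between the two slides above, the sketch below asks the same question of a normalized schema and of a denormalized one. All table and column names (customers, policies, payments, policy_payments_fact) are hypothetical and only serve the example.

```sql
-- Normalized (OLTP-style) schema: the answer needs two joins across three tables.
SELECT c.city, SUM(pay.amount) AS total_premium
FROM customers c
JOIN policies p   ON p.customer_id = c.customer_id
JOIN payments pay ON pay.policy_id = p.policy_id
GROUP BY c.city;

-- Denormalized (warehouse-style) table: city is repeated on every row,
-- so the same report needs no join at all.
SELECT city, SUM(amount) AS total_premium
FROM policy_payments_fact
GROUP BY city;
```

The second query trades extra storage for a simpler and usually faster read path, which is the trade-off the bullets describe.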
5. General Data Flow :
(Ex: Online Insurance Registration)
• Customer enters the details in the Online Registration
form.
• The details are saved into the Database when the
customer presses the Submit button in the form.
• The Business Intelligence team of the insurance
company uses an ETL tool to pull the data from
the Database tables into the Data Warehouse tables.
• Business management uses business reporting
tools to pull data from the Data Warehouse tables and
generate business reports.
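The last two steps of this flow can be sketched in HiveQL. This is only an illustration: in practice a dedicated ETL tool usually performs the load, and the staging tables (stg_customers, stg_policies), the warehouse table (dw_registrations) and all column names are assumptions made for the example.

```sql
-- Hypothetical warehouse table, denormalized for reporting.
CREATE TABLE IF NOT EXISTS dw_registrations (
  customer_name STRING,
  city          STRING,
  policy_type   STRING,
  premium       DOUBLE,
  registered_on STRING
);

-- ETL step: flatten staged copies of the source database tables
-- into the single warehouse table.
INSERT INTO TABLE dw_registrations
SELECT c.name, c.city, p.policy_type, p.premium, p.registered_on
FROM stg_customers c
JOIN stg_policies p ON p.customer_id = c.customer_id;

-- Reporting step: a typical query a reporting tool might issue.
SELECT policy_type, COUNT(*) AS registrations, SUM(premium) AS total_premium
FROM dw_registrations
GROUP BY policy_type;
```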
6. STRING FUNCTIONS IN HIVE
The string functions in Hive are listed below:
1. ASCII( string str ) :
• The ASCII function converts the first character of the string into its
numeric ASCII value.
Example1: ASCII('hadoop') returns 104
Example2: ASCII('A') returns 65
2. CONCAT( string str1, string str2... ) :
• The CONCAT function concatenates all the strings.
Example: CONCAT('hadoop','-','hive') returns 'hadoop-hive'
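The two functions above can be tried in a single query. This sketch assumes a Hive version that allows SELECT without a FROM clause (Hive 0.13 or later); the column aliases are arbitrary.

```sql
SELECT
  ascii('hadoop')               AS first_char_code,  -- 104, the code of 'h'
  ascii('A')                    AS upper_a_code,     -- 65
  concat('hadoop', '-', 'hive') AS joined;           -- 'hadoop-hive'
```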
7. 3. CONCAT_WS( string delimiter, string str1, string str2... )
• The CONCAT_WS function is similar to the CONCAT function, but it
also takes a delimiter, which is placed between the strings being
concatenated.
Example: CONCAT_WS('-','hadoop','hive') returns 'hadoop-hive'
4. FIND_IN_SET( string search_string, string source_string_list )
• The FIND_IN_SET function searches for search_string in
source_string_list and returns the position of its first occurrence in
the list. The source string list must be comma delimited.
It returns 0 if the first argument contains a comma.
Example: FIND_IN_SET('ha','hao,mn,hc,ha,hef') returns 4
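A minimal sketch of both functions, including the two cases where FIND_IN_SET returns 0. It assumes a Hive version that allows SELECT without a FROM clause; the aliases are arbitrary.

```sql
SELECT
  concat_ws('-', 'hadoop', 'hive')         AS joined,     -- 'hadoop-hive'
  find_in_set('ha', 'hao,mn,hc,ha,hef')    AS found_at,   -- 4 (positions start at 1)
  find_in_set('xy', 'hao,mn,hc,ha,hef')    AS not_found,  -- 0, no such element
  find_in_set('ha,hc', 'hao,mn,hc,ha,hef') AS has_comma;  -- 0, search string contains a comma
```

Note that 'ha' does not match the element 'hao': the comparison is against whole list elements, not substrings.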
8. 5. LENGTH( string str ) :
• The LENGTH function returns the number of characters in a
string.
Example: LENGTH('hive') returns 4
6. LOWER( string str ), LCASE( string str )
• The LOWER or LCASE function converts the string into lower case letters.
Example: LOWER('HiVe') returns 'hive'
7. LPAD( string str, int len, string pad )
• The LPAD function returns the string with a length of len characters
left-padded with pad.
Example: LPAD('hive',6,'v') returns 'vvhive'
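The three functions above, run together. The last column is an extra illustration of a common use of LPAD (zero-padding an identifier); it is not taken from the slides. The query assumes a Hive version that allows SELECT without a FROM clause.

```sql
SELECT
  length('hive')       AS n_chars,     -- 4
  lower('HiVe')        AS lowered,     -- 'hive'
  lcase('HiVe')        AS lowered_too, -- 'hive', LCASE is a synonym of LOWER
  lpad('hive', 6, 'v') AS padded,      -- 'vvhive'
  lpad('42', 5, '0')   AS zero_padded; -- '00042'
```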
9. 8. LTRIM( string str ) :
• The LTRIM function removes all the leading spaces from the string.
Example: LTRIM(' hive') returns 'hive'
9. REPEAT( string str, int n ) :
• The REPEAT function repeats the specified string n times.
Example: REPEAT('hive',2) returns 'hivehive'
10. RPAD( string str, int len, string pad ) :
• The RPAD function returns the string with a length of len characters
right-padded with pad.
Example: RPAD('hive',6,'v') returns 'hivevv'
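A short sketch of LTRIM, REPEAT and RPAD. The LTRIM result is wrapped in brackets with CONCAT only to make the removed spaces visible; the query assumes a Hive version that allows SELECT without a FROM clause.

```sql
SELECT
  concat('[', ltrim('   hive'), ']') AS left_trimmed,  -- '[hive]'
  repeat('hive', 2)                  AS repeated,      -- 'hivehive'
  rpad('hive', 6, 'v')               AS right_padded;  -- 'hivevv'
```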
10. 11. REVERSE( string str ) :
• The REVERSE function gives the reversed string.
Example: REVERSE('hive') returns 'evih'
12. RTRIM( string str ) :
• The RTRIM function removes all the trailing spaces from the string.
Example: RTRIM('hive ') returns 'hive'
13. SPACE( int number_of_spaces ) :
• The SPACE function returns the specified number of spaces.
Example: SPACE(4) returns '    ' (four spaces)
14. SPLIT( string str, string pat ) :
• The SPLIT function splits the string around the pattern pat and returns an
array of strings. You can specify regular expressions as patterns.
Example: SPLIT('hive:hadoop',':') returns ["hive","hadoop"]
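The four functions on this slide can be sketched in one query. Brackets are added with CONCAT so that spaces are visible, and the last two columns go slightly beyond the slide by indexing the returned array and by using a regular expression as the pattern. The query assumes a Hive version that allows SELECT without a FROM clause.

```sql
SELECT
  reverse('hive')                     AS reversed,       -- 'evih'
  concat('[', rtrim('hive   '), ']')  AS right_trimmed,  -- '[hive]'
  concat('[', space(4), ']')          AS four_spaces,    -- '[    ]'
  split('hive:hadoop', ':')           AS parts,          -- ["hive","hadoop"]
  split('hive:hadoop', ':')[1]        AS second_part,    -- 'hadoop', array indexes start at 0
  split('a1b22c', '[0-9]+')           AS by_digits;      -- ["a","b","c"]
```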
11. 15. SUBSTR( string source_str, int start_position [,int length] ),
SUBSTRING( string source_str, int start_position [,int length] ) :
• The SUBSTR or SUBSTRING function returns the part of the source string that
begins at the start position and has the specified length in characters.
Positions are counted from 1. If the length is not given, it returns
everything from the start position to the end of the string.
Example1: SUBSTR('hadoop',4) returns 'oop'
Example2: SUBSTR('hadoop',4,2) returns 'oo'
16. TRIM( string str ) :
• The TRIM function removes both the leading and trailing spaces from
the string.
Example: TRIM(' hive ') returns 'hive'
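A minimal sketch of SUBSTR, SUBSTRING and TRIM. The TRIM result is wrapped in brackets only to show that spaces on both sides are gone; the query assumes a Hive version that allows SELECT without a FROM clause.

```sql
SELECT
  substr('hadoop', 4)                  AS from_fourth,   -- 'oop'
  substring('hadoop', 4, 2)            AS two_chars,     -- 'oo'
  concat('[', trim('   hive   '), ']') AS both_trimmed;  -- '[hive]'
```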
12. 17. UPPER( string str ), UCASE( string str ) :
• The UPPER or UCASE function converts the string into upper case
letters.
Example: UPPER('HiVe') returns 'HIVE'
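A short sketch of UPPER and UCASE. The last column nests UPPER around a trimming call as one might when normalizing messy text values; that combination is an illustration, not something the slides prescribe. The query assumes a Hive version that allows SELECT without a FROM clause.

```sql
SELECT
  upper('HiVe')          AS uppered,      -- 'HIVE'
  ucase('HiVe')          AS uppered_too,  -- 'HIVE', UCASE is a synonym of UPPER
  upper(trim('  hive ')) AS cleaned;      -- 'HIVE'
```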