Hive: Data Warehousing for Hadoop

Ben Lever, NICTA
Meetup #2, 27 Mar 2012 - http://sydney.bigdataaustralia.com.au/events/53934632/

Published in: Technology
Notes
  • # of users = 943; # of movies = 1,682; # of ratings = 100,000
  • Shell, Driver, Compiler, Execution engine, Metastore
  • Transcript

    • 1. Hive: Data Warehousing for Hadoop
        Ben Lever, @bmlever
        Big Data Analytics Meetup, 27 March 2012
    • 2. Another Data Warehousing System?
        • Problem:
            – Lots of data
        • Partial solution:
            – Hadoop
        • Another problem:
            – MapReduce can be hard
            – Schema information embedded in program
            – A lot of data is still structured
    • 3. Solution: Hive
        • A system for querying and managing structured data within Hadoop
            – MapReduce for execution
            – HDFS for storage
        • Designed for end-users that know more SQL than Java
        • Apache v2
        • hive.apache.org
    • 4. Working example: MovieLens
        • Movie ratings
        • 3 “tables”:

            Users        Movies         Ratings
            id           id             user id
            age          title          movie id
            gender       release date   rating (1 – 5)
            occupation   action         timestamp
            zip code     adventure
                         romance
                         ...

        • www.grouplens.org
    • 5. Demo
    • 6. So far
        • Hive shell
        • Creating and loading tables
        • Data model:
            – INT, BIGINT, TINYINT, STRING, etc
            – Also: FLOAT, DOUBLE, ARRAY, MAP, STRUCT
        • Simple queries with filtering
        • Table data is immutable
        • Schema on read vs schema on write
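The "schema on read" point on slide 6 can be illustrated with a minimal Python sketch. This is not Hive code: the sample rows, the `RATINGS_SCHEMA` list, and the `read_with_schema` helper are all assumptions made up for illustration. The idea shown is only that the raw file is stored untouched and column names and types are applied when a query reads it.

```python
import csv
import io

# Illustrative raw file contents (tab-separated), stored exactly as loaded.
# Nothing is validated or converted at load time.
RAW_RATINGS = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
)

# Hypothetical schema kept separately from the data: (column name, converter).
RATINGS_SCHEMA = [
    ("user_id", int),
    ("movie_id", int),
    ("rating", int),
    ("timestamp", int),
]


def read_with_schema(raw_text, schema):
    """Apply the schema while reading; the stored text is never modified."""
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    for fields in reader:
        yield {name: convert(value) for (name, convert), value in zip(schema, fields)}


# A "simple query with filtering": ratings below 3.
rows = list(read_with_schema(RAW_RATINGS, RATINGS_SCHEMA))
low_ratings = [row for row in rows if row["rating"] < 3]
print(low_ratings)
```

With schema on write, by contrast, the conversion and validation step would run once when the data is loaded rather than on every read.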
    • 7. Hive components (diagram)
        • Metastore: schema info, e.g. TABLE customer ( customer_id BIGINT, gender STRING, ... )
        • CLI: Hive query (SQL-like), e.g. SELECT * FROM customers WHERE gender = 'M';
        • Driver: launches MapReduce job
        • HDFS: raw source data (compressed)
    • 8. Metastore (source: Hadoop – The Definitive Guide)
    • 9. Other SQL-like features
        • Aggregation
            – COUNT, AVG
        • JOIN
        • GROUP BY
        • SORT BY
        • Sub queries
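The features listed on slide 9 can be tried out with a small sketch. It assumes Python's built-in `sqlite3` module as a stand-in for Hive, so the SQL below is SQLite's dialect rather than HiveQL, and the table contents are made up. One difference worth noting: Hive's SORT BY orders rows within each reducer, so the sketch uses a plain ORDER BY instead.

```python
import sqlite3

# In-memory database with two tiny, made-up tables modelled on the
# MovieLens example (movies and ratings).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER, title TEXT);
    CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER);
    INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO ratings VALUES (10, 1, 5), (11, 1, 3), (10, 2, 2);
    """
)

# Aggregation (COUNT, AVG) combined with JOIN, GROUP BY, and ordering.
query = """
    SELECT m.title, COUNT(*) AS num_ratings, AVG(r.rating) AS avg_rating
    FROM ratings r
    JOIN movies m ON r.movie_id = m.id
    GROUP BY m.title
    ORDER BY avg_rating DESC
"""
results = conn.execute(query).fetchall()
for title, num_ratings, avg_rating in results:
    print(title, num_ratings, avg_rating)
```

In Hive, a query of this shape would be compiled into MapReduce jobs rather than executed in-process.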
    • 10. Demo
    • 11. Built-in functions
        • Text mining:
            – ngrams()
            – context_ngrams()
            – sentences()
        • Statistics + mathematics:
            – stddev()
            – histogram_numeric()
            – log()
            – radians()
    • 12. User Defined Functions
        • Written in Java
        • User Defined Functions (UDFs):
            – Single row → Single row
            – e.g. mathematical and string functions
        • User Defined Aggregate Functions (UDAFs):
            – Multiple rows → Single row
            – e.g. AVG
        • User Defined Table Functions (UDTFs):
            – Single row → Multiple rows
            – e.g. “explode”
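The three function kinds on slide 12 differ only in how many rows go in and come out. The Python sketch below illustrates those shapes; it is not the Hive API (real Hive functions are written in Java, as the slide says), and the function names and sample values are hypothetical.

```python
def upper_udf(value):
    """UDF shape: one value from a single row in, one value out."""
    return value.upper()


def avg_udaf(values):
    """UDAF shape: values from multiple rows in, a single value out."""
    values = list(values)
    return sum(values) / len(values)


def explode_udtf(array_value):
    """UDTF shape: one row holding an array in, one output row per element."""
    for element in array_value:
        yield element


# Made-up sample data.
titles = ["toy story", "heat"]
ratings = [5, 3, 4]
genres_row = ["action", "adventure"]

udf_out = [upper_udf(title) for title in titles]  # same number of rows out as in
udaf_out = avg_udaf(ratings)                      # many rows collapse to one value
udtf_out = list(explode_udtf(genres_row))         # one row expands to many

print(udf_out, udaf_out, udtf_out)
```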
    • 13. Hive Clients (source: Hadoop – The Definitive Guide)
    • 14. Hive Server: JDBC, ODBC
    • 15. Sqoop
        • Move data between Hadoop and relational databases
        • Diagram labels: RDBMS, Sqoop, Hadoop, Hive Metastore, schema
        • http://incubator.apache.org/projects/sqoop.html
    • 16. Sqoop adapters
    • 17. Conclusion
        • Scales to handle much more data than traditional systems:
            – Leverages Hadoop HDFS and MapReduce
            – Relational/structured data
            – Schema on read vs schema on write
        • Supports rapid iteration of ad-hoc queries
            – SQL-like querying language
            – Complex queries (joins, etc) with minimal code
        • Is not a database replacement:
            – Treats data as immutable
            – No indexing
