Hive Poster

Hive Poster - Presentation Transcript

  1. Hive: A Warehousing Solution Over a Map-Reduce Framework
    Map-Reduce
    SQL vs Hadoop
    System Architecture
    hive> SELECT key, COUNT(1) FROM kv1 WHERE key > 100 GROUP BY key;
    vs.
    $ cat > /tmp/reducer.sh
    uniq -c | awk '{print $2" "$1}'
    $ cat > /tmp/map.sh
    awk -F '\001' '{if($1 > 100) print $1}'
    $ bin/hadoop jar contrib/hadoop-0.19.2-dev-streaming.jar \
        -input /user/hive/warehouse/kv1 \
        -file /tmp/map.sh -mapper map.sh \
        -file /tmp/reducer.sh -reducer reducer.sh \
        -output /tmp/largekey -numReduceTasks 1
    $ bin/hadoop dfs -cat /tmp/largekey/part*
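
To make the comparison concrete, the two streaming scripts can be mimicked in-process. This is a minimal Python sketch, not part of the original poster. `map_phase`, `reduce_phase`, and the sample rows are hypothetical, and it assumes that `kv1` is stored as text with Ctrl-A (`\x01`) between key and value and that keys are integers.

```python
from itertools import groupby

FIELD_SEP = "\x01"  # assumption: Ctrl-A separates the key from the value


def map_phase(lines):
    """Mimic map.sh: emit the key of every row whose key is greater than 100."""
    for line in lines:
        key = line.rstrip("\n").split(FIELD_SEP)[0]
        if int(key) > 100:
            yield key


def reduce_phase(keys):
    """Mimic reducer.sh: count each run of identical keys.

    sorted() stands in for the shuffle/sort step that Hadoop performs
    between the map and reduce phases.
    """
    for key, run in groupby(sorted(keys)):
        yield key, sum(1 for _ in run)


# Hypothetical sample rows standing in for /user/hive/warehouse/kv1.
rows = ["238\x01val_238", "86\x01val_86", "311\x01val_311", "238\x01val_238"]
result = list(reduce_phase(map_phase(rows)))
print(result)
```

The filter lives in the map phase and the count in the reduce phase. The one-line Hive query expresses the same split through `WHERE` and `GROUP BY`.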
    Data Model
    Queries & Map-Reduce Plans
    Query With Simple Filter
    SELECT count(1)
    FROM status_updates
    WHERE status LIKE 'michaeljackson'
    Query With Multi-Table Insert
    FROM (SELECT a.status, b.school, b.gender
          FROM status_updates a JOIN profiles b
          ON (a.userid = b.userid
              AND a.ds='2009-03-20')) subq1
    INSERT OVERWRITE TABLE gender_summary PARTITION(ds='2009-03-20')
      SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
    INSERT OVERWRITE TABLE school_summary PARTITION(ds='2009-03-20')
      SELECT subq1.school, COUNT(1) GROUP BY subq1.school
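
The point of the multi-table insert is that the joined rows are produced once and feed both summaries. The Python sketch below illustrates that idea under stated assumptions: the sample rows, the `multi_insert` helper, and the one-profile-per-user lookup are all hypothetical, and it says nothing about how Hive actually plans the query.

```python
from collections import Counter

# Hypothetical sample data; the poster does not show the real tables.
status_updates = [
    {"userid": 1, "status": "hello", "ds": "2009-03-20"},
    {"userid": 2, "status": "lunch", "ds": "2009-03-20"},
    {"userid": 1, "status": "old news", "ds": "2009-03-19"},
]
profiles = {
    1: {"school": "MIT", "gender": "F"},
    2: {"school": "CMU", "gender": "M"},
}


def multi_insert(updates, profiles_by_user, ds):
    """Scan the joined rows once and update both aggregates per row."""
    gender_summary, school_summary = Counter(), Counter()
    for row in updates:
        profile = profiles_by_user.get(row["userid"])
        # Inner join plus the partition filter a.ds = ds.
        if profile is None or row["ds"] != ds:
            continue
        gender_summary[profile["gender"]] += 1
        school_summary[profile["school"]] += 1
    return gender_summary, school_summary


gender, school = multi_insert(status_updates, profiles, "2009-03-20")
print(dict(gender), dict(school))
```

Writing two separate queries would repeat the join. Here the single loop plays the role of the shared `FROM` clause.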
    • Primitive Types
      • integer types, float, string, date, boolean
    • Nest-able Collections
      • array
      • map
    • User-defined types
      • structures with attributes that can be of any type
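
As a rough analogy for the type categories listed above, a single row can be modeled with Python type hints. This is only an illustrative sketch. `Location`, `Profile`, and every field name are hypothetical and do not come from the poster.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Location:
    """User-defined structure: named attributes, each of any type."""
    city: str
    country: str


@dataclass
class Profile:
    userid: int      # primitive: integer type
    school: str      # primitive: string
    verified: bool   # primitive: boolean
    interests: List[str] = field(default_factory=list)          # array
    visits: Dict[str, List[int]] = field(default_factory=dict)  # map nesting an array
    home: Optional[Location] = None                             # structure


row = Profile(1, "MIT", True, ["hive", "hadoop"],
              {"2009-03-20": [3, 5]}, Location("Boston", "US"))
print(row.visits["2009-03-20"][1], row.home.city)
```

The `visits` field shows what "nest-able" means here: a map whose values are themselves arrays.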
    Extensibility
    • TYPES
    • QUERY LANGUAGE
      • User-defined transformations and aggregations
      • Custom map/reduce scripts
    • ON-DISK DATA FORMAT
    • SERIALIZATION/DESERIALIZATION FORMAT
    • IN-MEMORY REPRESENTATION OF TYPES
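
To give a feel for what a pluggable serialization/deserialization format involves, here is a toy Python round trip for a row with an integer, an array, and a map. It is a sketch only. The control-character delimiters are an assumption about a delimited text layout, and `serialize` and `deserialize` are hypothetical helpers, not Hive's SerDe interface.

```python
# Assumed delimiters for a delimited text row: Ctrl-A between fields,
# Ctrl-B between collection items, Ctrl-C between a map key and its value.
FIELD, ITEM, KEY = "\x01", "\x02", "\x03"


def deserialize(line):
    """Turn one text line into (int, list of str, dict of str -> int)."""
    userid, tags, counts = line.split(FIELD)
    tag_list = tags.split(ITEM) if tags else []
    count_map = {}
    for pair in counts.split(ITEM) if counts else []:
        name, value = pair.split(KEY)
        count_map[name] = int(value)
    return int(userid), tag_list, count_map


def serialize(userid, tag_list, count_map):
    """Inverse of deserialize: build the delimited text line."""
    tags = ITEM.join(tag_list)
    counts = ITEM.join(f"{name}{KEY}{value}" for name, value in count_map.items())
    return FIELD.join([str(userid), tags, counts])


line = "7\x01a\x02b\x01x\x031\x02y\x032"
record = deserialize(line)
print(record)
```

Swapping in a different pair of functions changes the on-disk format without touching the queries, which is the kind of extension point the list describes.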
    Performance
    Query with Custom Map-Reduce Scripts
    SELECT TRANSFORM(subq2.school, subq2.meme, subq2.cnt)
           USING 'top10.py' AS (school, meme, cnt)
    FROM (SELECT subq1.school, subq1.meme, COUNT(1) AS cnt
          FROM (SELECT TRANSFORM(b.school, a.status)
                       USING 'meme-extractor.py' AS (school, meme)
                FROM status_updates a JOIN profiles b
                ON (a.userid = b.userid)
               ) subq1
          GROUP BY subq1.school, subq1.meme
          DISTRIBUTE BY school, meme
          SORT BY school, meme, cnt DESC
         ) subq2;
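
The poster does not show the contents of `top10.py`. The sketch below is one guess at what such a script could do, assuming rows arrive as tab-separated `school`, `meme`, `cnt` lines with all rows for a school adjacent to each other. `top_n_per_school` is a hypothetical name.

```python
import heapq
from itertools import groupby


def top_n_per_school(lines, n=10):
    """Yield the n highest-count memes for each run of rows sharing a school."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for school, group in groupby(rows, key=lambda r: r[0]):
        # cnt arrives as text, so compare it numerically.
        best = heapq.nlargest(n, group, key=lambda r: int(r[2]))
        for _, meme, cnt in best:
            yield "\t".join((school, meme, cnt))


# Hypothetical input; a real transform script would iterate over sys.stdin
# and print each yielded line to stdout.
sample = ["MIT\tlol\t3", "MIT\tcat\t5", "CMU\tdog\t1"]
top1 = list(top_n_per_school(sample, n=1))
print(top1)
```

Because `groupby` only merges adjacent rows, the script relies on the query delivering each school's rows together.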

ragho, 2 months ago

© All Rights Reserved
