Hive Poster

Published in: Technology, Lifestyle

Hive
A Warehousing Solution Over a Map-Reduce Framework

Map-Reduce

System Architecture

SQL vs Hadoop

    hive> SELECT key, COUNT(1) FROM kv1 WHERE key > 100 GROUP BY key;

vs.

    $ cat > /tmp/reducer.sh
    uniq -c | awk '{print $2" "$1}'
    $ cat > /tmp/map.sh
    awk -F '\001' '{if($1 > 100) print $1}'
    $ bin/hadoop jar contrib/hadoop-0.19.2-dev-streaming.jar \
        -input /user/hive/warehouse/kv1 \
        -file /tmp/map.sh -mapper map.sh \
        -file /tmp/reducer.sh -reducer reducer.sh \
        -output /tmp/largekey -numReduceTasks 1
    $ bin/hadoop dfs -cat /tmp/largekey/part*

Data Model

  • Primitive Types
      • integer types, float, string, date, boolean
  • Nest-able Collections
      • array<any-type>
      • map<primitive-type, any-type>
  • User-defined types
      • structures with attributes which can be of any-type

Extensibility

  • TYPES
  • QUERY LANGUAGE
      • User defined transformations and aggregations
      • Custom map/reduce scripts
  • ON-DISK DATA FORMAT
  • SERIALIZATION/DESERIALIZATION FORMAT
  • IN-MEMORY REPRESENTATION OF TYPES

Performance

Queries & Map-Reduce Plans

Query With Simple Filter

    SELECT count(1)
    FROM status_updates
    WHERE status LIKE 'michaeljackson'

Query With Multi-Table Insert

    FROM (SELECT a.status, b.school, b.gender
          FROM status_updates a JOIN profiles b
          ON (a.userid = b.userid
              AND a.ds='2009-03-20')) subq1
    INSERT OVERWRITE TABLE gender_summary PARTITION(ds='2009-03-20')
    SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
    INSERT OVERWRITE TABLE school_summary PARTITION(ds='2009-03-20')
    SELECT subq1.school, COUNT(1) GROUP BY subq1.school

Query with Custom Map-Reduce Scripts

    SELECT TRANSFORM(subq2.school, subq2.meme, subq2.cnt)
           USING 'top10.py' AS (school,meme,cnt)
    FROM (SELECT subq1.school, subq1.meme, COUNT(1) AS cnt
          FROM (SELECT TRANSFORM(b.school, a.status)
                       USING 'meme-extractor.py' AS (school,meme)
                FROM status_updates a JOIN profiles b
                ON (a.userid = b.userid)
               ) subq1
          GROUP BY subq1.school, subq1.meme
          DISTRIBUTE BY school, meme
          SORT BY school, meme, cnt desc
    ) subq2;
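
The poster names the script top10.py but does not show its contents. As a rough illustration of how a TRANSFORM script fits into the query above, here is a minimal sketch. It relies on one documented behavior: Hive streams each input row to the script as a tab-separated line on standard input and reads tab-separated lines back from standard output. Everything else is an assumption made for this example: the function names (parse_rows, top_n_per_school, format_row) are hypothetical, and the "keep the first N rows seen for each school" logic is a guess at what such a script might do, not the authors' implementation.

```python
from itertools import groupby


def parse_rows(lines):
    """Yield (school, meme, cnt) tuples from tab-separated text lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        school, meme, cnt = line.split("\t")
        yield school, meme, int(cnt)


def top_n_per_school(rows, n=10):
    """Keep the first n rows of each consecutive run of the same school.

    Assumption: rows for one school arrive next to each other, which is
    what a SORT BY starting with school arranges within one reducer.
    """
    for _school, group in groupby(rows, key=lambda row: row[0]):
        for i, row in enumerate(group):
            if i >= n:
                break  # groupby skips the rest of this run on its own
            yield row


def format_row(row):
    """Render one output row in the tab-separated form Hive reads back."""
    return "\t".join(str(field) for field in row)


# Small in-memory demonstration; a real script would iterate over sys.stdin
# and write each formatted row to sys.stdout instead.
sample = ["mit\tlol\t5\n", "mit\tomg\t4\n", "mit\twow\t1\n", "ucb\tlol\t9\n"]
for row in top_n_per_school(parse_rows(sample), n=2):
    print(format_row(row))
```

Used as an actual TRANSFORM script, the same functions would be driven by sys.stdin in place of the sample list, and the column names in the AS (school,meme,cnt) clause would describe the three fields written per line.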
