Hive Introduction
周海汉 (Zhou Haihan), 2013.4.18
Contents
• Hive overview
• Hive features
• HiveQL
• UDF
• Tips
• Discussion
Hive overview
• Official site: http://hive.apache.org/
• Latest version (as of April 2013): 0.10
• Contributed to Apache by Facebook
Hive modes
• Metadb (metastore database): embedded Derby, MySQL, or other
• Local mode: Derby, one user, one job
• Distributed mode: MySQL, multiple users
Supported Hadoop versions
• Hadoop 0.20~
• Hadoop 0.23~
Hive features
• Data warehouse
• HiveQL
• Storage on HDFS & HBase
• Rows with ^A-delimited fields
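The ^A in the feature list is the Ctrl-A control character (byte 0x01), which Hive uses as the default field delimiter in text tables. A minimal Python sketch of reading and writing such a line follows; the helper names and the sample record are invented for illustration and are not part of the slides.

```python
# Hive's default field delimiter for text tables is Ctrl-A (byte 0x01, shown as ^A).
FIELD_DELIM = "\x01"

def parse_row(line):
    """Split one ^A-delimited text line into a list of field strings."""
    return line.rstrip("\n").split(FIELD_DELIM)

def format_row(fields):
    """Join field values back into one ^A-delimited line."""
    return FIELD_DELIM.join(str(f) for f in fields)

# Hypothetical record: date, user id, IMEI (values invented for illustration).
sample = format_row(["2013-03-28", 1001, "imei-0001"])
print(parse_row(sample))
```

Every field comes back as a string; the table schema determines how Hive interprets each one.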
HiveQL
HiveQL: a partial subset of SQL
• No UPDATE or DELETE statements.
• Each query can use tables from only one database.
• IN/EXISTS and the HAVING clause are not supported.
• ...
HiveQL: beyond SQL
• Complex data structures
• struct
• array
• map
• ...
HiveQL: some built-in functions
• Statistics:
– sum, count, avg, min, max
– Population standard deviation: stddev_pop
– Sample standard deviation: stddev_samp
– Median: percentile
– Histogram: histogram_numeric
• Conditionals:
– if
– case
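The difference between the population and sample standard deviation is easy to mix up, so here is a small Python sketch using the standard `statistics` module as a stand-in for the Hive functions. The input values are invented for illustration, and the mapping to Hive function names in the comments is an analogy rather than Hive's implementation.

```python
import statistics

# Hypothetical sample values, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

pop_sd = statistics.pstdev(values)   # divides by n, comparable to stddev_pop
samp_sd = statistics.stdev(values)   # divides by n - 1, comparable to stddev_samp
median = statistics.median(values)   # the 50th percentile, comparable to percentile(col, 0.5)

print(pop_sd, samp_sd, median)
```

The sample version is always the larger of the two for the same data, because it divides the squared deviations by n - 1 instead of n.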
HiveQL: some built-in functions
• Date/time: year, date, unix_timestamp, ...
• Logical: and, or, not
• Operators: + - * / % | & ^ ~
• Math: round, floor, ceil, rand, exp, log, log2, pow, sqrt, hex, sin, ...
• String processing: trim, substr, length, split, get_json_object, parse_url, regexp_replace, regexp_extract
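For readers unfamiliar with the string helpers, the following Python sketch shows roughly what `get_json_object`, `parse_url`, `regexp_extract`, and `regexp_replace` return. It uses Python's standard library as an analogy, not Hive itself, and the JSON payload and URL are invented for illustration.

```python
import json
import re
from urllib.parse import urlparse

# Hypothetical log fields, invented for illustration.
payload = '{"userid": 42, "device": {"sysver": "4.1"}}'
url = "http://example.com/downloads.html?ref=slides"

# Roughly what get_json_object(payload, '$.device.sysver') extracts.
sysver = json.loads(payload)["device"]["sysver"]

# Roughly what parse_url(url, 'HOST') and parse_url(url, 'PATH') extract.
parts = urlparse(url)
host, path = parts.netloc, parts.path

# Roughly what regexp_extract(url, 'ref=(\\w+)', 1) extracts.
ref = re.search(r"ref=(\w+)", url).group(1)

# Roughly what regexp_replace(url, '\\?.*$', '') produces.
bare = re.sub(r"\?.*$", "", url)

print(sysver, host, path, ref, bare)
```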
HiveQL example: creating an external table over HDFS text files
CREATE EXTERNAL TABLE login(
  ldate string,
  userid int,
  proid int,
  imei string,
  sysver string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION 'hdfs://h46:9000/flume/loginlog';
HiveQL example: HBase external table
CREATE EXTERNAL TABLE lordstat_pid(
  key string COMMENT 'from deserializer',
  total int COMMENT 'from deserializer',
  win int COMMENT 'from deserializer',
  spring int COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'serialization.format'='1',
  'hbase.columns.mapping'=':key,i:t,i:win,i:spr')
TBLPROPERTIES (
  'hbase.table.name'='lordstat_pid');
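The entries in `hbase.columns.mapping` line up with the Hive columns by position: the first entry goes with the first column, and so on, with `:key` standing for the HBase row key. A rough Python sketch of that pairing, using the column names and mapping string from the table definition above; the `pair_columns` helper is hypothetical and only illustrates the rule.

```python
# Hive columns and the HBase mapping string from the table definition above.
hive_columns = ["key", "total", "win", "spring"]
mapping = ":key,i:t,i:win,i:spr"

def pair_columns(columns, mapping_spec):
    """Pair each Hive column with its HBase target, by position."""
    targets = mapping_spec.split(",")
    if len(targets) != len(columns):
        raise ValueError("mapping must have one entry per Hive column")
    pairs = {}
    for column, target in zip(columns, targets):
        if target == ":key":
            pairs[column] = ("row key", None)
        else:
            family, qualifier = target.split(":", 1)
            pairs[column] = (family, qualifier)
    return pairs

for column, target in pair_columns(hive_columns, mapping).items():
    print(column, "->", target)
```

So the Hive column `spring` reads the qualifier `spr` in column family `i`; the Hive name and the HBase qualifier do not have to match.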
HiveQL example: partitions
• hive> create external table glog1(ldate string, ltime string, threadid string, userid int) partitioned by (pdate string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
• hive> alter table login add partition(ym='201303',d='28') LOCATION 'hdfs://h46:9000/flume/loginlog/201303/28/';
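Since a new partition has to be registered for each day of logs, the statement is a natural candidate for generation by a script. A minimal Python sketch follows, assuming the `ym`/`d` partition keys and the `YYYYMM/DD` directory layout from the example above; the `add_partition_sql` helper is hypothetical.

```python
from datetime import date

def add_partition_sql(table, base_location, day):
    """Build an ALTER TABLE ... ADD PARTITION statement for one day of logs.

    Assumes ym=YYYYMM / d=DD partition keys and a YYYYMM/DD directory
    layout under base_location, as in the slide's example.
    """
    ym = day.strftime("%Y%m")
    d = day.strftime("%d")
    location = "%s/%s/%s/" % (base_location.rstrip("/"), ym, d)
    return ("alter table %s add partition(ym='%s',d='%s') LOCATION '%s';"
            % (table, ym, d, location))

print(add_partition_sql("login", "hdfs://h46:9000/flume/loginlog", date(2013, 3, 28)))
```

The generated text could then be passed to the Hive command line, for example from a daily scheduled job.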
Hive UDF
• Streaming
• UDF
• UDAF
• UDTF
Streaming
• Splitting strings with Python
import sys

def calcwin():
    for line in sys.stdin:
        (ldate, userid, roundbet, fold, allin, chipwon) = line.strip().split()
        # Assumption: the slide does not show how win is derived;
        # here a positive chipwon counts as a win.
        win = "1" if float(chipwon) > 0 else "0"
        print('\t'.join(["%s:%s" % (ldate, userid), win, fold, allin]))

calcwin()
Streaming
• Usage is similar:
• hive> from testpoker select transform(ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon) using 'calcpoker.py' as (ldate,gameid,userid,pid,win,fold,allin,cardtype,cards);
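With `transform ... using`, Hive by default sends each selected row to the script as one tab-separated line on standard input and reads tab-separated lines back from standard output. The Python sketch below imitates that contract locally with in-memory streams, which can be handy for testing a script before running it on the cluster. The `transform` and `mark_win` helpers, the two-column input, and the win rule are all assumptions made for illustration.

```python
import io

def transform(stdin, stdout, row_fn):
    """Apply row_fn to each tab-separated input line, as a TRANSFORM script would."""
    for line in stdin:
        fields = line.rstrip("\n").split("\t")
        stdout.write("\t".join(row_fn(fields)) + "\n")

def mark_win(fields):
    """Hypothetical row function: emit userid plus a win flag derived from chipwon."""
    userid, chipwon = fields
    win = "1" if int(chipwon) > 0 else "0"  # assumption: positive chipwon counts as a win
    return [userid, win]

# Simulate two rows that Hive would send to the script.
fake_in = io.StringIO("1001\t250\n1002\t-80\n")
fake_out = io.StringIO()
transform(fake_in, fake_out, mark_win)
print(fake_out.getvalue(), end="")
```

In a real script, `sys.stdin` and `sys.stdout` take the place of the two `StringIO` objects.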
UDF
package org.zhouhh;

import org.apache.hadoop.hive.ql.exec.UDF;

public class UDFTest extends UDF {
  // Returns the length of the string, or null for null input.
  public Integer evaluate(String s) {
    if (s == null) { return null; }
    return s.length();
  }
}
UDF
• add jar /path/testudf.jar;
• CREATE TEMPORARY FUNCTION testlength AS 'org.zhouhh.UDFTest';
• SELECT testlength(src.value) FROM src;
UDAF
• User-Defined Aggregation Function
package org.zhouhh;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

public class UDAFCount extends UDAF {
  public static class Evaluator implements UDAFEvaluator {
    private int mCount;
    // Reset the count before each aggregation.
    public void init() { mCount = 0; }
    // Called once per input row; counts non-null values.
    public boolean iterate(Object o) {
      if (o != null) mCount++;
      return true;
    }
    // Partial count handed on for merging.
    public Integer terminatePartial() { return mCount; }
    // Fold another partial count into this one.
    public boolean merge(Integer o) {
      if (o != null) mCount += o;
      return true;
    }
    public Integer terminate() { return mCount; }
  }
}
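The order in which Hive calls the evaluator methods is not obvious from the class alone. The Python sketch below imitates one plausible flow: each map task runs `init`, `iterate`, and `terminatePartial` over its own rows, and a reduce-side evaluator combines the partial results with `merge` before `terminate`. The class is a Python stand-in rather than Hive API, and the two-task split of the input is an assumption for illustration.

```python
class CountEvaluator:
    """Python stand-in for the Evaluator class above; not Hive API."""

    def init(self):
        self.count = 0

    def iterate(self, value):
        if value is not None:
            self.count += 1
        return True

    def terminate_partial(self):
        return self.count

    def merge(self, partial):
        if partial is not None:
            self.count += partial
        return True

    def terminate(self):
        return self.count

def run_aggregate(partitions):
    """Simulate map-side partial aggregation followed by a reduce-side merge."""
    partials = []
    for rows in partitions:            # one evaluator per map task
        evaluator = CountEvaluator()
        evaluator.init()
        for value in rows:
            evaluator.iterate(value)
        partials.append(evaluator.terminate_partial())
    final = CountEvaluator()           # reduce side
    final.init()
    for partial in partials:
        final.merge(partial)
    return final.terminate()

# Hypothetical input split across two map tasks; None stands for SQL NULL.
print(run_aggregate([[1, None, 3], [None, 5]]))
```

Because the partial result of a count is itself a count, `terminatePartial` and `terminate` can return the same value; aggregates such as an average need a richer partial state.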
UDAF
• add jar /path/testudaf.jar;
• CREATE TEMPORARY FUNCTION testcount AS 'org.zhouhh.UDAFCount';
• SELECT testcount(src.id) FROM src;
UDTF
• User-Defined Table-Generating Functions
• Address the need to turn one input row into multiple output rows (one-to-many mapping)
UDTF
• Extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.
• Implement three methods: initialize, process, and close.
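To show how the three methods cooperate, here is a rough Python stand-in for a map-exploding UDTF. In Hive the class would be written in Java and would call `forward` to emit each output row; this sketch only mirrors that shape. The class name, the output column names, and the input map are assumptions for illustration.

```python
class ExplodeMap:
    """Python stand-in for a map-exploding UDTF; not Hive API."""

    def initialize(self):
        # A real UDTF would validate argument types and declare output columns here.
        self.output_columns = ["col1", "col2"]
        self.rows = []

    def forward(self, row):
        # Stand-in for handing one output row back to Hive.
        self.rows.append(row)

    def process(self, properties):
        # Called once per input row; emits one output row per map entry.
        for key, value in properties.items():
            self.forward((key, value))

    def close(self):
        # Called after the last input row; nothing left to flush in this sketch.
        pass

udtf = ExplodeMap()
udtf.initialize()
udtf.process({"os": "android", "ver": "4.1"})  # hypothetical input row
udtf.close()
print(udtf.rows)
```

One call to `process` produced two output rows, which is the one-to-many behavior that ordinary UDFs cannot express.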
UDTF
• Usage
• 1. Directly in the select list: no other columns may be added, and group by, sort by, etc. are not allowed
• select explode_map(properties) as (col1,col2) from src;
• 2. With lateral view
• select src.id, mytable.col1, mytable.col2 from src lateral view explode_map(properties) mytable as col1, col2;
Tips
• struct
• Sample row: 1015826235 [{"product_id":220003038067,"timestamps":"1340321132000"},{"product_id":300003861266,"timestamps":"1340271857000"}]
Tips
CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable
(
  USER_ID BIGINT,
  NEW_ITEM ARRAY<STRUCT<PRODUCT_ID:BIGINT,TIMESTAMPS:STRING>>)
Tips
SELECT
  user_id,
  prod_and_ts.product_id as product_id,
  prod_and_ts.timestamps as timestamps
FROM
  SampleTable
  LATERAL VIEW explode(new_item) exploded_table as prod_and_ts;
Tips
USER_ID    | PRODUCT_ID   | TIMESTAMPS
-----------+--------------+--------------
1015826235 | 220003038067 | 1340321132000
1015826235 | 300003861266 | 1340271857000
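The effect of `LATERAL VIEW explode(...)` on an array of structs can be imitated with a nested loop: each array element becomes its own output row, joined back to the columns of its parent row. A minimal Python sketch using the sample row from the slides; the `lateral_view_explode` helper is hypothetical and only mirrors the query's behavior.

```python
# The sample row from the slides: a user id plus an array of structs.
sample_table = [
    {
        "user_id": 1015826235,
        "new_item": [
            {"product_id": 220003038067, "timestamps": "1340321132000"},
            {"product_id": 300003861266, "timestamps": "1340271857000"},
        ],
    }
]

def lateral_view_explode(rows):
    """Yield one output row per array element, keeping the parent row's user_id."""
    for row in rows:
        for prod_and_ts in row["new_item"]:
            yield (row["user_id"], prod_and_ts["product_id"], prod_and_ts["timestamps"])

for out in lateral_view_explode(sample_table):
    print(*out)
```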
Discussion ...
Thank you!
http://abloz.com
2013.4.18
@abloz

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 

Hive Introduction

  • 2. Contents • Hive overview • Hive features • HiveQL • UDF • Tips • Discussion
  • 3. Hive overview • Official site: http://hive.apache.org/ • Latest version at the time of the talk: 0.10 • Contributed to Apache by Facebook
  • 4. Hive modes • Metastore database: embedded Derby, MySQL, or other • Local mode: Derby, one user, one job • Distributed mode: MySQL, multiple users
  • 5. Supported Hadoop versions • Hadoop 0.20~ • Hadoop 0.23~
  • 7. Hive features • Data warehouse • HiveQL • HDFS & HBase • Rows with ^A-delimited fields
  • 10. HiveQL: a subset of SQL • No UPDATE or DELETE statements. • Each query can use tables from only one database. • IN/EXISTS and the HAVING clause are not supported. • ...
  • 11. HiveQL: beyond SQL • Complex data structures • struct • array • map • ...
  • 12. HiveQL: some built-in functions • Statistics: – sum, count, avg, min, max – Population standard deviation: stddev_pop – Sample standard deviation: stddev_samp – Median: percentile – Histogram: histogram_numeric • Conditionals: – if – case
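The two standard deviation functions differ only in the divisor (n for the population form, n - 1 for the sample form). A minimal sketch of that difference, using Python's standard `statistics` module rather than Hive itself; the values are made up for illustration:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical column values

pop_sd = statistics.pstdev(values)   # divides by n, as stddev_pop does
samp_sd = statistics.stdev(values)   # divides by n - 1, as stddev_samp does
median = statistics.median(values)   # the 0.5 percentile of the column

print(pop_sd, samp_sd, median)
```

On small groups the two deviations can differ noticeably, so it is worth picking the one that matches whether the rows are the whole population or a sample of it.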
  • 13. HiveQL: some built-in functions • Date and time: year, date, unix_timestamp, ... • Logical: and, or, not • Operators: + - * / % | & ^ ~ • Math: round, floor, ceil, rand, exp, log, log2, pow, sqrt, hex, sin, ... • String processing: trim, substr, length, split, get_json_object, parse_url, regexp_replace, regexp_extract
  • 14. HiveQL example: creating an external table over HDFS text files • CREATE EXTERNAL TABLE login( ldate string, userid int, proid int, imei string, sysver string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LOCATION 'hdfs://h46:9000/flume/loginlog'
  • 15. HiveQL example: HBase external table • CREATE EXTERNAL TABLE lordstat_pid( key string COMMENT 'from deserializer', total int COMMENT 'from deserializer', win int COMMENT 'from deserializer', spring int COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'serialization.format'='1', 'hbase.columns.mapping'=':key,i:t,i:win,i:spr') TBLPROPERTIES ( 'hbase.table.name'='lordstat_pid');
  • 16. HiveQL example: partitions • hive> create external table glog1(ldate string, ltime string, threadid string, userid int) partitioned by (pdate string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '; • hive> alter table login add partition(ym='201303',d='28') LOCATION 'hdfs://h46:9000/flume/loginlog/201303/28/';
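Because each partition of an external table has to be registered with its own location, the `alter table ... add partition` statement is usually generated per day. A minimal sketch, assuming the `ym`/`d` partition keys and the directory layout shown in the slide; the helper name `add_partition_sql` is hypothetical:

```python
from datetime import date

# Base path copied from the slide; adjust for a real cluster.
BASE = "hdfs://h46:9000/flume/loginlog"


def add_partition_sql(day, table="login", base=BASE):
    """Build the ADD PARTITION statement for one day's log directory."""
    ym = day.strftime("%Y%m")  # e.g. 201303
    d = day.strftime("%d")     # e.g. 28
    return (
        f"ALTER TABLE {table} ADD PARTITION (ym='{ym}', d='{d}') "
        f"LOCATION '{base}/{ym}/{d}/';"
    )


print(add_partition_sql(date(2013, 3, 28)))
```

The generated text can then be passed to the Hive command line by a daily scheduled job.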
  • 18. Hive UDF • Streaming • UDF • UDAF • UDTF
  • 19. Streaming • Splitting strings in Python • def calcwin(): • for line in sys.stdin: • (ldate,userid,roundbet,fold,allin,chipwon) = line.strip().split() • print '\t'.join(["%s:%s"%(ldate,userid), win,fold,allin])
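The streaming script on this slide is Python 2 and never defines `win`. A minimal Python 3 sketch of the same idea follows; the rule that a hand counts as won when `chipwon` is positive is an assumption, since the slide does not show how `win` is computed, and the sample rows are made up:

```python
import sys


def calc_win(line):
    """Turn one whitespace-separated input row into one tab-separated output row."""
    ldate, userid, roundbet, fold, allin, chipwon = line.strip().split()
    # Assumption: win is 1 when the player won chips in this round, else 0.
    win = "1" if int(chipwon) > 0 else "0"
    return "\t".join(["%s:%s" % (ldate, userid), win, fold, allin])


def main(stream=sys.stdin):
    """Entry point for use as a streaming script: one output row per input row."""
    for line in stream:
        if line.strip():
            print(calc_win(line))


# Local check with hypothetical rows instead of real stdin.
sample = ["2013-03-28\t1001\t50\t0\t1\t200\n", "2013-03-28\t1002\t50\t1\t0\t-50\n"]
for row in sample:
    print(calc_win(row))
```

When the file is used as a real streaming script, call `main()` at the bottom instead of the sample loop, so rows are read from standard input and written tab-separated to standard output.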
  • 20. Streaming • Similar usage • hive> from testpoker select transform(ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon) using 'calcpoker.py' as (ldate,gameid,userid,pid,win,fold,allin,cardtype,cards);
  • 21. UDF • public class UDFTest extends UDF { • public Integer evaluate(String s) { • if (s == null) { return null; } • return s.length(); } • }
  • 22. UDF • add jar /path/testudf.jar; • CREATE TEMPORARY FUNCTION testlength AS 'org.zhouhh.UDFTest'; • SELECT testlength(src.value) FROM src;
  • 23. UDAF • User-Defined Aggregation Function • public class UDAFCount extends UDAF { • public static class Evaluator implements UDAFEvaluator { • private int mCount; • public void init() { mCount = 0; } • public boolean iterate(Object o) { • if (o != null) mCount++; • return true; } • public Integer terminatePartial() { return mCount; } • public boolean merge(Integer o) { • mCount += o; return true; } • public Integer terminate() { return mCount; } } • }
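The order in which the evaluator methods are called is easier to see in a small simulation. The sketch below mirrors the slide's counting evaluator in Python (it is not Hive code): `iterate` and `terminate_partial` run once per input split on the map side, then `merge` and `terminate` combine the partial counts on the reduce side. The `run` helper and the sample splits are hypothetical:

```python
class CountEvaluator:
    """Python mirror of the counting evaluator's lifecycle methods."""

    def __init__(self):
        self.init()

    def init(self):
        self.count = 0

    def iterate(self, value):
        # Called once per input row; NULLs (None here) are not counted.
        if value is not None:
            self.count += 1
        return True

    def terminate_partial(self):
        # Partial result handed from the map side to the reduce side.
        return self.count

    def merge(self, partial):
        # Combine a partial result from another evaluator.
        if partial is not None:
            self.count += partial
        return True

    def terminate(self):
        return self.count


def run(splits):
    """Simulate one evaluator per split, followed by a final merge."""
    partials = []
    for rows in splits:
        evaluator = CountEvaluator()
        for value in rows:
            evaluator.iterate(value)
        partials.append(evaluator.terminate_partial())
    final = CountEvaluator()
    for partial in partials:
        final.merge(partial)
    return final.terminate()


print(run([[1, None, 3], [None, 5]]))  # prints 3
```

The split between `terminate_partial` and `merge` is what lets the aggregation run in parallel: each split only ever ships a single integer to the final stage.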
  • 24. UDAF • add jar /path/testudaf.jar; • CREATE TEMPORARY FUNCTION testcount AS 'org.zhouhh.UDAFCount'; • SELECT testcount(src.id) FROM src;
  • 25. UDTF • User-Defined Table-Generating Functions • Address the need to produce multiple output rows from one input row (one-to-many mapping)
  • 26. UDTF • Extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF. • Implement three methods: initialize, process, close
  • 27. UDTF • Usage • 1. Directly in the select list: no other columns can be added, and group by, sort by, etc. cannot be used • select explode_map(properties) as (col1,col2) from src; • 2. With lateral view • select src.id, mytable.col1, mytable.col2 from src lateral view explode_map(properties) mytable as col1, col2;
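The two usage forms differ in what ends up in each output row. A minimal Python sketch of the semantics (not Hive code): `explode_map` stands in for the UDTF's `process` step, emitting one row per map entry, and the hypothetical `lateral_view` helper joins each source row with the rows generated from it. The sample data is made up:

```python
def explode_map(properties):
    """Emit one (key, value) row per map entry, like a UDTF's process step."""
    for key, value in properties.items():
        yield (key, value)


def lateral_view(rows, column, udtf):
    """Pair each source row's id with every row the UDTF generates from it."""
    for row in rows:
        for generated in udtf(row[column]):
            yield (row["id"],) + generated


src = [
    {"id": 1, "properties": {"color": "red", "size": "L"}},
    {"id": 2, "properties": {"color": "blue"}},
]

# Form 1: the UDTF alone, so only the generated columns are available.
standalone = [pair for row in src for pair in explode_map(row["properties"])]

# Form 2: lateral view, so source columns can sit next to generated ones.
joined = list(lateral_view(src, "properties", explode_map))

print(standalone)
print(joined)
```

In the second form every generated row still carries `src.id`, which is why the lateral view form is the one to use when the result needs grouping or sorting by source columns.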
  • 30. Tips • CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable ( USER_ID BIGINT, NEW_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>>)
  • 31. Tips • SELECT user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps FROM SampleTable LATERAL VIEW explode(new_item) exploded_table as prod_and_ts;
  • 32. Tips • USER_ID | PRODUCT_ID | TIMESTAMPS • 1015826235 | 220003038067 | 1340321132000 • 1015826235 | 300003861266 | 1340271857000