SlideShare a Scribd company logo
Hive User Group Meeting August 2009
Hive Overview
Why Another Data Warehousing System? ,[object Object],[object Object],[object Object]
What is HIVE? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why SQL on Hadoop? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],*  Databases  – supported in metastore, but not yet in query language
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Types ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* More details about user defined types in extensibility section
Hive Architecture
Hive Query Language ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Query Language ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example Application ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* sqoop - http://www.cloudera.com/hadoop-sqoop
Example Query (Filter) ,[object Object],[object Object]
Example Query (Aggregation) ,[object Object],[object Object]
Facebook
[object Object],Facebook
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example Query (multi-group-by)
 
Hive Metastore
Single User Mode (Default)
Multi User Mode
Remote Server
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Meta Data Model
Hive Optimizations
Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Column Pruning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Predicate Pushdown ,[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object]
Partition Pruning ,[object Object],[object Object]
Partition Pruning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male key value 111 < 2, 25> 222 < 2, 32> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, 25> key value 222 < 1, 1> 222 < 2, 32> Shuffle Sort Pageid age 1 25 2 25 pageid age 1 32
Hive QL – Join ,[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Join Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Map Join page_view user Hash table pv_users key value 111 <1,2> 222 <2> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male Pageid age 1 25 2 25 1 32
Map Join ,[object Object],[object Object],[object Object],[object Object]
Parameters ,[object Object],[object Object],[object Object]
Future ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Group By ,[object Object],[object Object],[object Object]
Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age 1 25 1 25 pageid age count 1 25 3 pageid age 2 32 1 25 key value <1,25> 2 key value <1,25> 1 <2,32> 1 key value <1,25> 2 <1,25> 1 key value <2,32> 1 Shuffle Sort pageid age count 2 32 1
Group by Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object]
Multi GroupBy ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Group By in Map Reduce pv_users Key: userid Value: gender, age gender age userid M 25 1 M 25 2 M 25 1 M 24 1 F 24 2 F 24 1 gender dist count M 2 4 F 2 2 gender dist count M 1 3 F 1 1 age dist 24 1 25 1 gender dist count M 1 1 F 1 1 age dist 25 1 24 1 age dist 25 1 24 1
[object Object],[object Object],[object Object],[object Object]
Merging of small files ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Extensibility Features
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
Hive is an open system ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
File Format Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Existing File Formats Facebook *  Splitable: Capable of splitting the file so that a single huge file can be processed by multiple mappers in parallel. TEXTFILE SEQUENCEFILE RCFILE Data type text only  text/binary text/binary Internal Storage order Row-based Row-based Column-based Compression File-based Block-based Block-based Splitable* YES YES YES Splitable* after compression NO YES YES
When to add a new File Format ,[object Object],[object Object],Facebook
How to add a new File Format ,[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
SerDe Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
SerDe ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Existing SerDes Facebook * LazyObjects: deserialize the columns only when accessed. * Binary Sortable: binary format preserving the sort order. LazySimpleSerDe LazyBinarySerDe (HIVE-640) BinarySortable SerDe serialized format delimited proprietary binary proprietary binary sortable* deserialized format LazyObjects* LazyBinaryObjects* Writable ThriftSerDe (HIVE-706) RegexSerDe ColumnarSerDe serialized format Depends on the Thrift Protocol Regex formatted proprietary column-based deserialized format User-defined Classes, Java Primitive Objects ArrayList<String> LazyObjects*
When to add a new SerDe ,[object Object],[object Object],Facebook
How to add a new SerDe for text data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
How to add a new SerDe for binary data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
Map/Reduce Scripts Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Map/Reduce Scripts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
UDF Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
More efficient UDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Overloaded UDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Implicit Type Conversions for UDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Variable-length Arguments (HIVE-699) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Summary for UDFs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
UDAF Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Overloaded UDAF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
UDAF with Complex Intermediate Result ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
UDAF without Partial Aggregations ,[object Object],[object Object],[object Object],[object Object],Facebook
Summary for UDAFs ,[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Comparison of UDF/UDAF v.s. M/R scripts Facebook * UDTF: User-defined Table Generation Function HIVE-655 UDF/UDAF M/R scripts language Java any language data format in-memory objects serialized streams 1/1 input/output supported via UDF supported n/1 input/output supported via UDAF supported 1/n input/output not supported yet (UDTF) supported Speed faster Slower
[object Object],Facebook
How to contribute the work ,[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Q & A ,[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Hive – Roadmap
Branches and Releases ,[object Object],[object Object],[object Object],Facebook
Features available in 0.4 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Future stuff ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
More Future stuff ,[object Object],[object Object],[object Object],[object Object],Facebook

More Related Content

What's hot

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleDataWorks Summit
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
DataWorks Summit
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
t3rmin4t0r
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
 
Sqoop
SqoopSqoop
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
How Impala Works
How Impala WorksHow Impala Works
How Impala Works
Yue Chen
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


Cloudera, Inc.
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
DataWorks Summit
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
Internal Hive
Internal HiveInternal Hive
Internal Hive
Recruit Technologies
 
ORC Files
ORC FilesORC Files
ORC Files
Owen O'Malley
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
StreamNative
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
Alluxio, Inc.
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3
DataWorks Summit
 

What's hot (20)

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! Scale
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Sqoop
SqoopSqoop
Sqoop
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
How Impala Works
How Impala WorksHow Impala Works
How Impala Works
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Internal Hive
Internal HiveInternal Hive
Internal Hive
 
ORC Files
ORC FilesORC Files
ORC Files
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3
 

Similar to Hive User Meeting August 2009 Facebook

Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010ragho
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008
athusoo
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 Hive
Namit Jain
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 HiveZheng Shao
 
Hive
HiveHive
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
 
Hadoop and Hive
Hadoop and HiveHadoop and Hive
Hadoop and Hive
Zheng Shao
 
2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao
Jeff Hammerbacher
 
Hive Percona 2009
Hive Percona 2009Hive Percona 2009
Hive Percona 2009
prasadc
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
Zheng Shao
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
nzhang
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
Chirag Ahuja
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007paulguerin
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Serban Tanasa
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
InfluxData
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabad
Kelly Technologies
 
20081030linkedin
20081030linkedin20081030linkedin
20081030linkedin
Jeff Hammerbacher
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainYahoo Developer Network
 
Sql
SqlSql

Similar to Hive User Meeting August 2009 Facebook (20)

Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 Hive
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 Hive
 
Hive
HiveHive
Hive
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
 
Hadoop and Hive
Hadoop and HiveHadoop and Hive
Hadoop and Hive
 
2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao
 
Hive Percona 2009
Hive Percona 2009Hive Percona 2009
Hive Percona 2009
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabad
 
20081030linkedin
20081030linkedin20081030linkedin
20081030linkedin
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Sql
SqlSql
Sql
 

Recently uploaded

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

Hive User Meeting August 2009 Facebook

  • 1. Hive User Group Meeting August 2009
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 18.
  • 19.
  • 20.  
  • 22. Single User Mode (Default)
  • 25.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male key value 111 < 2, 25> 222 < 2, 32> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, 25> key value 222 < 1, 1> 222 < 2, 32> Shuffle Sort Pageid age 1 25 2 25 pageid age 1 32
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43. Hive QL – Map Join page_view user Hash table pv_users key value 111 <1,2> 222 <2> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male Pageid age 1 25 2 25 1 32
  • 44.
  • 45.
  • 46.
  • 47.
  • 48. Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age 1 25 1 25 pageid age count 1 25 3 pageid age 2 32 1 25 key value <1,25> 2 key value <1,25> 1 <2,32> 1 key value <1,25> 2 <1,25> 1 key value <2,32> 1 Shuffle Sort pageid age count 2 32 1
  • 49.
  • 50.
  • 51.
  • 52. Hive QL – Group By in Map Reduce pv_users Key: userid Value: gender, age gender age userid M 25 1 M 25 2 M 25 1 M 24 1 F 24 2 F 24 1 gender dist count M 2 4 F 2 2 gender dist count M 1 3 F 1 1 age dist 24 1 25 1 gender dist count M 1 1 F 1 1 age dist 25 1 24 1 age dist 25 1 24 1
  • 53.
  • 54.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61. Existing File Formats Facebook * Splitable: Capable of splitting the file so that a single huge file can be processed by multiple mappers in parallel. TEXTFILE SEQUENCEFILE RCFILE Data type text only text/binary text/binary Internal Storage order Row-based Row-based Column-based Compression File-based Block-based Block-based Splitable* YES YES YES Splitable* after compression NO YES YES
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 67. Existing SerDes Facebook * LazyObjects: deserialize the columns only when accessed. * Binary Sortable: binary format preserving the sort order. LazySimpleSerDe LazyBinarySerDe (HIVE-640) BinarySortable SerDe serialized format delimited proprietary binary proprietary binary sortable* deserialized format LazyObjects* LazyBinaryObjects* Writable ThriftSerDe (HIVE-706) RegexSerDe ColumnarSerDe serialized format Depends on the Thrift Protocol Regex formatted proprietary column-based deserialized format User-defined Classes, Java Primitive Objects ArrayList<String> LazyObjects*
  • 68.
  • 69.
  • 70.
  • 71.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87. Comparison of UDF/UDAF v.s. M/R scripts Facebook * UDTF: User-defined Table Generation Function HIVE-655 UDF/UDAF M/R scripts language Java any language data format in-memory objects serialized streams 1/1 input/output supported via UDF supported n/1 input/output supported via UDAF supported 1/n input/output not supported yet (UDTF) supported Speed faster Slower
  • 88.
  • 89.
  • 90.
  • 92.
  • 93.
  • 94.
  • 95.