ORC 2015: Faster, Better, Smaller
DataWorks Summit
Hadoop Summit 2015
1.
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC 2015: Faster, Better, Smaller
Prasanth Jayachandran
Apache Hive Team, Hortonworks
@prasanth_j
2.
Apache ORC – Optimized Row-Columnar File
+ Apache TLP – orc.apache.org
+ Type Specific Encodings
+ Came out of Apache Hive
+ Vectorized Readers (Java, C++)
+ Projection and Predicate Pushdown
+ Columnar Storage
+ Block Compression
+ Hive ACID transactions
+ Single SerDe Format
+ Protobuf Metadata Storage
3.
ORC: Format Specification
How does ORC store data?
4.
ORC File Layout
File Footer and Postscript
Stripes
  Indexes (row group indexes and Bloom filters, interleaved)
    Min/max stats and positions for every 10K rows
  Data
    Multiple streams per column, encoded and compressed independently
  Stripe Footer
    Locations of streams, type of encoding
Full specification at (1)
5.
ORC Writer
Schema: <i:int,m:map<k:string,v:struct<s:string,d:double>>,t:time>
One tree writer per flattened column
Multiple streams per column
Stream kinds: PRESENT, DATA, LENGTH, DICTIONARY_DATA, SECONDARY, ROW_INDEX, BLOOM_FILTER
6.
ORC Data Streams
Schema: <i:int,m:map<k:string,v:struct<s:string,d:double>>,t:time>
Streams can be suppressed. Example: the PRESENT stream is suppressed when all values in a stripe are non-null.
Diagram labels: IS_PRESENT, DATA, DICTIONARY, LENGTH, SECONDARY, Compression Buffers
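The stream suppression rule above can be illustrated with a toy writer. This is a minimal sketch, not ORC's actual writer: the function names and the list-based "streams" are assumptions made for illustration, and real streams are encoded and compressed bytes.

```python
from typing import Dict, List, Optional


def encode_int_column(values: List[Optional[int]]) -> Dict[str, list]:
    """Split a nullable int column into DATA and (optionally) PRESENT streams."""
    present = [v is not None for v in values]
    streams: Dict[str, list] = {"DATA": [v for v in values if v is not None]}
    # Suppress the PRESENT stream when every value is non-null.
    if not all(present):
        streams["PRESENT"] = present
    return streams


def decode_int_column(streams: Dict[str, list]) -> List[Optional[int]]:
    """Rebuild the column; a missing PRESENT stream means 'no nulls'."""
    present = streams.get("PRESENT")
    if present is None:
        return list(streams["DATA"])
    data = iter(streams["DATA"])
    return [next(data) if is_set else None for is_set in present]
```

A reader that sees no PRESENT stream can skip null checks entirely, which is what makes the vectorized fast path possible.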
7.
ORC: Features Timeline
How has ORC improved over time?
8.
Timeline: February 2013
Stinger Initiative Announcement*
  Roadmap to improve Apache Hive's performance by 100x
  Delivered in 100% Apache Open Source
SQL Engine (Vectorized SQL Engine) + Columnar Storage (ORC) + Distributed Execution (Apache Tez) = 100x
* http://hortonworks.com/blog/100x-faster-hive/
9.
Timeline: March 2013
Optimized Row Columnar (ORC) file format committed to Hive
  Hive version: 0.11
  Native data format in Hive
10.
Timeline: March 2013
Predicate Pushdown
  SARG interface
  Prune stripes and row groups based on min/max statistics
Improved Run Length Encoding
  Tighter bit packing
  Longer runs
  DELTA, SHORT_REPEAT, DIRECT, PATCHED_BASE
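Pruning on min/max statistics can be sketched as follows. This is an illustrative model only: `RowGroupStats` and the predicate helpers are hypothetical names, not the SARG interface itself, and integers stand in for any comparable column type.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RowGroupStats:
    min_value: int
    max_value: int


def may_contain_equal(stats: RowGroupStats, literal: int) -> bool:
    # column = literal can only match if literal lies inside [min, max].
    return stats.min_value <= literal <= stats.max_value


def may_contain_less_than(stats: RowGroupStats, literal: int) -> bool:
    # column < literal can only match if the smallest value is below literal.
    return stats.min_value < literal


def select_row_groups(
    stats_list: List[RowGroupStats],
    predicate: Callable[[RowGroupStats], bool],
) -> List[int]:
    """Return indexes of row groups that must still be read."""
    return [i for i, stats in enumerate(stats_list) if predicate(stats)]
```

The check is conservative: a selected row group may still contain no matching row, but a skipped one is guaranteed to contain none.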
11.
Run Length Encoding Improvements
Columns per dataset: RLE (hive 0.11) compression ratio, encoding time (ms), decoding time (ms); then RLE (hive >= 0.12) compression ratio, encoding time (ms), decoding time (ms)
Twitter Census API ID (24,556,361 records): 2.32, 1770, 1263; 6.97, 1558, 864
HTTP Archive (bytes.json): 79.4, 198, 191; 200.82, 263, 125
Github Archive (root.payload.name.txt.dict-len): 114.05, 21, 15; 260.73, 23, 15
AOL Querylog Epoch (36,389,577 records): 2.51, 553, 364; 3.7, 652, 246
Reference: https://issues.apache.org/jira/secure/attachment/12596722/ORC-Compression-Ratio-Comparison.xlsx
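To give a feel for why a DELTA-style encoding produces longer runs on sequences such as IDs or timestamps, here is a toy fixed-delta run encoder. It is a sketch under a loud assumption: it stores runs as `(base, delta, count)` tuples and does not reproduce ORC's actual byte layout or its choice among encodings.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (base, delta, count)


def encode_delta_runs(values: List[int]) -> List[Run]:
    """Greedily group values into runs with a constant step."""
    runs: List[Run] = []
    i, n = 0, len(values)
    while i < n:
        base = values[i]
        if i + 1 == n:
            runs.append((base, 0, 1))  # trailing single value
            break
        delta = values[i + 1] - base
        count = 2
        # Extend the run while the step between neighbours stays the same.
        while i + count < n and values[i + count] - values[i + count - 1] == delta:
            count += 1
        runs.append((base, delta, count))
        i += count
    return runs


def decode_delta_runs(runs: List[Run]) -> List[int]:
    out: List[int] = []
    for base, delta, count in runs:
        out.extend(base + delta * k for k in range(count))
    return out
```

A plain repeat-only scheme would need one entry per value for `100, 101, 102, 103`; the fixed-delta form stores it as a single run.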
12.
Timeline: April 2013, June 2013
Vectorized ORC readers
  Read and process columns in batches of size 1024
Null stream suppression
  Suppress PRESENT stream if no nulls in a stripe
  Enables fast path in vectorization
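The interaction between batching and null suppression can be sketched like this. It is a simplified model, assuming the column arrives as a Python list plus an optional `present` list of flags; `sum_column` is a hypothetical helper, not an ORC or Hive API.

```python
from typing import List, Optional

BATCH_SIZE = 1024  # batch size quoted on the slide


def sum_column(values: List[int], present: Optional[List[bool]] = None) -> int:
    """Sum a column batch by batch; present=None means the stream was suppressed."""
    total = 0
    for start in range(0, len(values), BATCH_SIZE):
        batch = values[start:start + BATCH_SIZE]
        if present is None:
            # Fast path: no per-row null check is needed.
            total += sum(batch)
        else:
            flags = present[start:start + BATCH_SIZE]
            total += sum(v for v, is_set in zip(batch, flags) if is_set)
    return total
```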
13.
Timeline: October 2013, November 2013
Statistics Interface
  Writer – Update statistics during load time
  Reader – ANALYZE TABLE .. NOSCAN
Split Elimination
  Stripe level column statistics
  Eliminate stripes that do not satisfy predicate conditions
14.
Timeline: February 2014, June 2014
Zero copy read path
  HDFS caching APIs to read directly into memory without extra data copies
Serialization Improvements
  Bit width alignment (trade-off space for speed)
  Unrolled bit packing and unpacking
  Buffered double reader and writer
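Bit width alignment and fixed-width bit packing can be sketched as below. Two assumptions are made here: the set of aligned widths is taken from the x axis of the integer-read chart on the following slide, and values are packed most significant bit first. The real reader unrolls these loops per width; this sketch only shows the layout.

```python
from typing import List

# Assumption: aligned widths as they appear on the chart's x axis.
ALIGNED_WIDTHS = (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64)


def align_bit_width(bits_needed: int) -> int:
    """Round a bit width up to the nearest aligned width (space for speed)."""
    for width in ALIGNED_WIDTHS:
        if bits_needed <= width:
            return width
    raise ValueError("values wider than 64 bits are not supported")


def pack_bits(values: List[int], bit_width: int) -> bytes:
    """Pack non-negative ints into bit_width bits each, MSB first."""
    acc = 0
    for v in values:
        if v < 0 or v >> bit_width:
            raise ValueError(f"{v} does not fit in {bit_width} bits")
        acc = (acc << bit_width) | v
    total_bits = bit_width * len(values)
    pad = (-total_bits) % 8  # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def unpack_bits(data: bytes, bit_width: int, count: int) -> List[int]:
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << bit_width) - 1
    return [(acc >> (total_bits - bit_width * (i + 1))) & mask for i in range(count)]
```

Packing five values at 3 bits takes 2 bytes; aligning to 4 bits takes 3 bytes, which is the space given up in exchange for simpler unpacking.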
15.
Serialization Improvements
Chart: ORC Read Integer Performance (smaller is better)
  X axis: Bit Width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64)
  Y axis: Mean Time (ms)
  Series: hive 0.13 unpacking; hive-1.0 unpacking (new)
16.
Serialization Improvements
Chart: ORC Read Double Performance (smaller is better)
  Mean time (ms) by double read mode:
    hive <= 0.13: 241.679
    buffered + BE: 171.045
    buffered + LE: 174.163
  ~1.4x improvement
17.
Timeline: June 2014, July 2014
Adaptive compression buffer size
  For tables with more than 1000 columns, adjust the compression buffer size based on available memory
  Avoids wide table OOMs
Fast stripe level file merging
  Many small files to few large files
  No decompression, no decoding
  ALTER TABLE … CONCATENATE
18.
Fast File Merging
Chart: ETL With File Merging – TPC-H 1000 Scale Lineitem (smaller is better)
  Total time in seconds, CONCAT-supporting file formats:
    ORC: load time 1091, merge time 245, total 1336
    RCFile: load time 651, merge time 816, total 1467
  ~3.33x improvement in merge time
19.
Timeline: July 2014
ORC Padding Improvements
  Pad bytes to avoid remote HDFS reads
  Last stripe is adjusted to fit within HDFS block boundary (worst case: 5% wastage)
Decouple stripe size vs block size
  Smaller stripes (64MB)
  More stripes per block (4 per block)
  Better parallelism & split elimination
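The padding decision can be worked through with a small sketch. The 64MB stripe size and 5% figure come from the slide; the 256MB block size is inferred from "4 per block", and the exact rule (pad when the leftover space is within the tolerance, otherwise shrink the stripe to fit) is an assumption about how the two bullets combine, not a copy of the writer's logic.

```python
from typing import Tuple

MB = 1024 * 1024


def plan_next_stripe(
    file_offset: int,
    block_size: int = 256 * MB,
    stripe_size: int = 64 * MB,
    padding_tolerance: float = 0.05,
) -> Tuple[str, int]:
    """Decide how to place the next stripe so it never straddles an HDFS block."""
    remaining = block_size - (file_offset % block_size)
    if remaining >= stripe_size:
        return ("write", stripe_size)  # a full stripe fits in this block
    if remaining <= padding_tolerance * stripe_size:
        return ("pad", remaining)  # waste a little, start at the next block
    return ("shrink", remaining)  # write a smaller stripe that ends at the boundary
```

With these numbers, an offset of 250MB leaves 6MB, which exceeds the 3.2MB tolerance, so the stripe is shrunk; an offset of 254MB leaves 2MB, which is padded.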
20.
Timeline: September 2014
String Dictionary Improvements
  Row group level checking
  Remember decision across stripes
  Avoids expensive RBTree insertions
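One way to picture "check at row group level, then remember the decision" is the toy writer below. The class name, the 0.8 distinct-ratio threshold, and the use of a sorted list instead of a red-black tree are all assumptions for illustration; the point is only that the ratio is measured once and later row groups skip dictionary building when it was rejected.

```python
from typing import List, Optional, Tuple

Encoded = Tuple[str, Optional[List[str]], list]


class StringColumnWriter:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical distinct/total cut-off
        self.use_dictionary = True
        self.decided = False        # remembered across row groups and stripes

    def write_row_group(self, values: List[str]) -> Encoded:
        if not self.decided:
            ratio = len(set(values)) / len(values) if values else 0.0
            self.use_dictionary = ratio <= self.threshold
            self.decided = True
        if not self.use_dictionary:
            # Direct encoding: no dictionary inserts at all.
            return ("DIRECT", None, list(values))
        dictionary = sorted(set(values))
        index = {s: i for i, s in enumerate(dictionary)}
        return ("DICTIONARY", dictionary, [index[v] for v in values])
```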
21.
String Dictionary Improvements
Chart: String Dictionary Improvements - TPC-H 1000 Scale Lineitem (smaller is better)
  Load time in seconds by Hive version:
    hive <= 0.13: 767
    hive > 0.13: 540
  ~1.4x improvement
22.
Timeline: September 2014
Improved ZLIB compression
  Different streams compressed with different zlib strategies/levels
  Compress integers and doubles differently
  Data and Dictionary stream - Looks for smaller byte patterns
  All other streams - Less LZ77, More Huffman
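Per-stream zlib tuning can be tried out with Python's standard `zlib` module, which exposes the same level and strategy knobs. The mapping below is an assumption made for illustration (default strategy for DATA and DICTIONARY_DATA, `Z_FILTERED` at a fast level for everything else); the slide does not give the exact levels ORC uses.

```python
import zlib
from typing import Dict, Tuple

# Hypothetical (level, strategy) per stream kind.
STREAM_SETTINGS: Dict[str, Tuple[int, int]] = {
    "DATA": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY),
    "DICTIONARY_DATA": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY),
}
# Z_FILTERED biases deflate toward Huffman coding and away from string matching.
OTHER_STREAMS: Tuple[int, int] = (zlib.Z_BEST_SPEED, zlib.Z_FILTERED)


def compress_stream(kind: str, payload: bytes) -> bytes:
    level, strategy = STREAM_SETTINGS.get(kind, OTHER_STREAMS)
    compressor = zlib.compressobj(
        level, zlib.DEFLATED, zlib.MAX_WBITS, zlib.DEF_MEM_LEVEL, strategy
    )
    return compressor.compress(payload) + compressor.flush()


def decompress_stream(blob: bytes) -> bytes:
    # The strategy only affects the compressor; any inflater can read the result.
    return zlib.decompress(blob)
```

Because the strategy is a compressor-side choice, readers need no changes to consume streams written this way.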
23.
ZLIB Improvements
Chart: Data Size Improvements - TPC-H 1000 Scale Lineitem (smaller is better)
  Data size in GBs by file format + compression codec:
    ORC + (old ZLIB): 178.5
    ORC + (new ZLIB): 172.2
    ORC + SNAPPY: 225.1
  New ZLIB: ~4% improvement over old ZLIB, ~1.3x smaller than SNAPPY
24.
ZLIB Improvements
Chart: Load Time Improvements - TPC-H 1000 Scale Lineitem (smaller is better)
  Values by file format + compression codec (y axis labelled "DataSizeinGBs" on the slide):
    ORC + (old ZLIB): 674
    ORC + (new ZLIB): 433
    ORC + SNAPPY: 389
  New ZLIB: ~1.6x improvement over old ZLIB, only ~10% slower than SNAPPY
25.
Timeline: September 2014
ACID transactions
  Order of millions of rows
  Not designed for OLTP requirements
Streaming Ingest via Flume or Storm
  Broken pattern: Add Partitions for Atomicity
Atomically add base and delta directories
  Minor compaction – Merge many delta files
  Major compaction – Re-write base files to incorporate delta file changes
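The base-plus-delta model and the two compaction levels can be sketched with plain dictionaries. This is a conceptual toy: row IDs map to values, each delta is a list of `(op, row_id, value)` events, and none of the names correspond to Hive's on-disk layout or APIs.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, int, Optional[str]]  # (op, row_id, value)
Delta = List[Event]


def apply_delta(rows: Dict[int, str], delta: Delta) -> Dict[int, str]:
    merged = dict(rows)
    for op, row_id, value in delta:
        if op == "delete":
            merged.pop(row_id, None)
        else:  # "insert" or "update"
            merged[row_id] = value
    return merged


def read_table(base: Dict[int, str], deltas: List[Delta]) -> Dict[int, str]:
    """Readers merge the base with every delta, oldest first."""
    rows = base
    for delta in deltas:
        rows = apply_delta(rows, delta)
    return rows


def minor_compaction(deltas: List[Delta]) -> List[Delta]:
    """Merge many delta files into one; the base is untouched."""
    return [[event for delta in deltas for event in delta]]


def major_compaction(base: Dict[int, str], deltas: List[Delta]):
    """Rewrite the base to incorporate all delta changes."""
    return read_table(base, deltas), []
```

Both compactions leave query results unchanged; they only reduce the number of files a reader has to merge.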
26.
Timeline: January 2015
hasNull flag in ORC internal index
  Better pruning of row groups
  Improves the performance of SELECT .. WHERE column IS NULL;
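With a per-row-group null flag, an IS NULL predicate reduces to a lookup in the index. A minimal sketch, assuming a hypothetical `RowGroupIndexEntry` record that carries the flag next to the min/max statistics:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroupIndexEntry:
    min_value: Optional[int]
    max_value: Optional[int]
    has_null: bool


def row_groups_for_is_null(index: List[RowGroupIndexEntry]) -> List[int]:
    """Only row groups flagged as containing a null need to be read."""
    return [i for i, entry in enumerate(index) if entry.has_null]
```

Min/max statistics alone cannot answer this question, because nulls do not take part in the min/max comparison; without the flag every row group has to be scanned.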
27.
hasNull in Index Improvement
Chart: select * from lineitem where l_shipdate is null (smaller is better)
  Execution time in seconds by Hive version:
    hive < 1.1.0: 66.73
    hive >= 1.1.0: 7.87
  ~8.5x improvement
  Bytes read: 208.77 GB vs 539 MB
28.
Timeline: February 2015
Bloom Filter Index
  Much better row group pruning when compared to min/max
  Bloom filter evaluated after the fast min/max based elimination
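The two-step check (cheap min/max first, Bloom filter second) can be sketched as follows. The filter here is a generic textbook Bloom filter built on SHA-256 with hypothetical sizing (1024 bits, 3 hashes); ORC's own hash functions and sizing are not reproduced.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, value: int) -> Iterator[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: int) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: int) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class RowGroup:
    def __init__(self, values: List[int]) -> None:
        self.min_value = min(values)
        self.max_value = max(values)
        self.bloom = BloomFilter()
        for v in values:
            self.bloom.add(v)


def may_match(group: RowGroup, key: int) -> bool:
    # Step 1: fast min/max elimination.
    if key < group.min_value or key > group.max_value:
        return False
    # Step 2: Bloom filter, consulted only for row groups that survive step 1.
    return group.bloom.might_contain(key)
```

The Bloom filter helps most for keys that fall inside a row group's min/max range but are not actually stored in it, which is the case min/max statistics cannot rule out.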
29.
Bloom Filter Indexes Improvements
Chart: select * from tpch_1000.lineitem where l_orderkey = 1212000001; (log scale, smaller is better)
  Rows read:
    No Indexes: 5,999,989,709
    Min-Max Indexes: 540,000
    Bloomfilter Indexes: 10,000
30.
Bloom Filter Indexes Improvements
Chart: select * from tpch_1000.lineitem where l_orderkey=1212000001; (smaller is better)
  Time taken (seconds):
    No Indexes: 74
    Min-Max Indexes: 4.5 (~16x improvement)
    Bloomfilter Indexes: 1.34 (~3.3x improvement)
31.
Timeline: April 2015
Split Strategies
  BI – Skip reading file footer
  ETL – Read and cache file footer
  HYBRID – Default. Chooses BI/ETL based on number of files and average file size
Group splits based on columnar projection size instead of file size
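The HYBRID choice can be sketched as a small decision function. The slide only says the choice depends on the number of files and the average file size; the specific rule and both thresholds below (`max_split_size`, `etl_file_threshold`) are hypothetical values picked for illustration.

```python
from typing import List

MB = 1024 * 1024


def choose_split_strategy(
    file_sizes: List[int],
    mode: str = "HYBRID",
    max_split_size: int = 256 * MB,   # hypothetical threshold
    etl_file_threshold: int = 10,     # hypothetical threshold
) -> str:
    """Return "BI" (skip footers) or "ETL" (read and cache footers)."""
    if mode in ("BI", "ETL"):
        return mode
    if mode != "HYBRID":
        raise ValueError(f"unknown split strategy: {mode}")
    if not file_sizes:
        return "BI"
    average = sum(file_sizes) / len(file_sizes)
    # Large files or few files: reading footers pays off through better splits.
    if average > max_split_size or len(file_sizes) <= etl_file_threshold:
        return "ETL"
    # Many small files: footer reads would dominate split generation time.
    return "BI"
```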
32.
Timeline: April 2015
ORC became Apache Top Level Project
  C++ reader with contributions from Hortonworks, HP and Microsoft
  Column encryption to encrypt sensitive columns
http://orc.apache.org/
33.
ORC: In Production
34.
ORC at Facebook
Compression: Saved more than 1,400 servers' worth of storage. (2)
Compression: Compression ratio increased from 5x to 8x globally. (2)
35.
ORC at Spotify
IO: 16x less HDFS read when using ORC versus Avro. (3)
CPU: 32x less CPU when using ORC versus Avro. (3)
36.
ORC at Yahoo!
Speedup: 6-50x speedup when using ORC versus Text File. (4)
Speedup: 1.6-30x speedup when using ORC versus RCFile. (4)
37.
ORC: LLAP and Sub-second
ORC – Pushing for Sub-second
38.
ORC: LLAP
+ JIT Performance for short queries
+ Row-group level caching
+ Asynchronous IO Elevator
+ Multi-threaded Column Vector processing
39.
ORC: Vectorization + SIMD
Allocation-free tight inner loops enable the JDK's auto-vectorization
Vectors can be filtered early in ORC
String dictionary can be used to binary-search
Vectorized SIMD Join
  Improves performance for single key joins
Example:
  Query: select ss_ext_tax + 1.0 from store_sales_orc;
  JVM Options: HADOOP_OPTS="-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly"
  Note: Make sure to have the HotSpot disassembler in $JAVA_HOME/jre/lib
Generated Assembly:
  0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2
  0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2
  0x00007f13d2e6afba: movslq %eax,%r10
  0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3
  ;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
AVX - Vector Addition Packed Double: 4 doubles loaded to 256 bit registers
40.
ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)
select * from tpch_1000.lineitem where l_orderkey=1212000001;
41.
Questions?
Interested? Stop by the Hortonworks booth to learn more
42.
Endnotes
(1) https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-orc-specORCFormatSpecification
(2) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/
(3) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
(4) http://www.slideshare.net/Hadoop_Summit/w-1205p230-aradhakrishnan-v3