SlideShare a Scribd company logo
1 of 11
Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive-14535 : Cloud storage
Gopal V
Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Cloud “FileSystems” are Strange Beasts
“There are no directories. Only paths.”
“There are no users. Only keys.”
“There are no permissions. Only acl rules.”
“There is consistency, but not as we know it.”
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
“Directories vs Paths.”
• Storage of Path information can be assumed to be a sorted hash-table.
• File listings are no longer listing off a tree, but prefix search
• Directories don’t need to necessarily exist for a path below it
• Listing a single level is more complex than a full-depth traversal
• Renames can cause rebalancing and moving about of the structure
• Adjacent files are sometimes more expensive than random ops
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
“Users & permissions vs keys & ACLs”
• Distinguishing the user for an accessing process has no meaning
• Access keys are often rotated and occasionally invalidated
• User identity can be mapped to a key (externally or by id management)
• Buckets are commonly used to differentiate stores, instead of permissions
• Permissions are rarely set or applied per-file, but across path patterns
• Permissions set to a directory need extra user checks to be useful (chmod +x)
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
“Consistency”
• Arguably the most complex issue
• Renames needn’t be consistent, creates can have collisions
• Reads can return old data for the same path when overwriting
• Versioned reads are complex to manage and hard to throw a “Time machine” over
• Cross-Region Replication often lags and doubles stale-read issues
Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Micro-Managed Hive Tables
• Support for all Hive input formats, including user ones
• Avoid rename operations as much as possible
• Never collide final paths for different inserts
• Ongoing inserts should be atomic across > 1 partitions
• Snapshot isolation for data reads for existing partitions being back-filled
• Stage data without accidental partial-reads for bucket replication
Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Micro-Managed Hive Tables
CREATE TABLE `web_returns_hive_commit`(…
`wr_net_loss` float)
PARTITIONED BY (`wr_returned_date_sk` int)
STORED AS <FORMAT>
LOCATION 's3a://hwdev-hive-14535/web_returns_hive_commit'
TBLPROPERTIES
('transactional'='true',
'transactional_properties'='insert_only');
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Micro-Managed Hive Tables
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0
-rw-rw-rw- 1 cloudbreak 1791 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0/000021_0
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821/mm_0
-rw-rw-rw- 1 cloudbreak 2186 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821/mm_0/000022_0
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822/mm_0
-rw-rw-rw- 1 cloudbreak 1814 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822/mm_0/000023_0
/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0/000021_0
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
“Take a number” for inserts
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Read: tracking committed data
• Similar to Hive-ACID (ORC)
• Committed txns disappear from the tracking data
• With each query, it takes a highest known txn + list of open/aborted txns
• All valid transactions are < max(transaction_id) and not IN (open_txns)
• The transaction filtering is done at the listing level for all formats
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Branch + Future Work
Current measurement has 21% reduction in partition load time (+HIVE-15368)
Time taken to load dynamic partitions: 350.846 seconds -> 274.715 seconds
Work continues in the branch for hive-14535
Work ongoing to optimize to take advantage of faster recursive listings
Discussions towards incremental refresh for cube engines for backfill
Questions?
Suggestions?

More Related Content

What's hot

Getting involved with Open Source at the ASF
Getting involved with Open Source at the ASFGetting involved with Open Source at the ASF
Getting involved with Open Source at the ASFHortonworks
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHortonworks
 
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...DataWorks Summit
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudDataWorks Summit/Hadoop Summit
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache HiveDataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 

What's hot (20)

Getting involved with Open Source at the ASF
Getting involved with Open Source at the ASFGetting involved with Open Source at the ASF
Getting involved with Open Source at the ASF
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
 
Apache Hive on ACID
Apache Hive on ACIDApache Hive on ACID
Apache Hive on ACID
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 

Viewers also liked

How Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform EducationHow Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform EducationHortonworks
 
Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPHortonworks
 
The path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesThe path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesHortonworks
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Hortonworks
 
Top 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data AnalyticsTop 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data AnalyticsHortonworks
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution Hortonworks
 
Scaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonScaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonHortonworks
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseHortonworks
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsHortonworks
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...Hortonworks
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambariHortonworks
 
Apache Hadoop 0.23
Apache Hadoop 0.23Apache Hadoop 0.23
Apache Hadoop 0.23Hortonworks
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Hortonworks
 
The Power of your Data Achieved - Next Gen Modernization
The Power of your Data Achieved - Next Gen ModernizationThe Power of your Data Achieved - Next Gen Modernization
The Power of your Data Achieved - Next Gen ModernizationHortonworks
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureHortonworks
 
Credit Card Analytics on a Connected Data Platform
Credit Card Analytics on a Connected Data PlatformCredit Card Analytics on a Connected Data Platform
Credit Card Analytics on a Connected Data PlatformHortonworks
 

Viewers also liked (20)

How Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform EducationHow Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform Education
 
Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
 
The path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesThe path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial Services
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Top 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data AnalyticsTop 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data Analytics
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution
 
Scaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonScaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC Isilon
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambari
 
Apache Hadoop 0.23
Apache Hadoop 0.23Apache Hadoop 0.23
Apache Hadoop 0.23
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09
 
The Power of your Data Achieved - Next Gen Modernization
The Power of your Data Achieved - Next Gen ModernizationThe Power of your Data Achieved - Next Gen Modernization
The Power of your Data Achieved - Next Gen Modernization
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, Future
 
Credit Card Analytics on a Connected Data Platform
Credit Card Analytics on a Connected Data PlatformCredit Card Analytics on a Connected Data Platform
Credit Card Analytics on a Connected Data Platform
 

Similar to Cloud storage filesystems and Hive transactional tables

Connecting Hadoop and Oracle
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and OracleTanel Poder
 
Webinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageWebinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageAvere Systems
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kiteJoey Echeverria
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatternsgrepalex
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloudgluent.
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Chris Nauroth
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Big Data Conference April 2015
Big Data Conference April 2015Big Data Conference April 2015
Big Data Conference April 2015Aaron Benz
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BIDataWorks Summit
 
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ Docker, Inc.
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMark Swarbrick
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoBig Data Joe™ Rossi
 

Similar to Cloud storage filesystems and Hive transactional tables (20)

Connecting Hadoop and Oracle
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and Oracle
 
Webinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageWebinar: Untethering Compute from Storage
Webinar: Untethering Compute from Storage
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kite
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
HDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and SupportabilityHDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and Supportability
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Big Data Conference April 2015
Big Data Conference April 2015Big Data Conference April 2015
Big Data Conference April 2015
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
MySQL highav Availability
MySQL highav AvailabilityMySQL highav Availability
MySQL highav Availability
 
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Revision
RevisionRevision
Revision
 
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Cloud storage filesystems and Hive transactional tables

  • 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hive-14535 : Cloud storage Gopal V
  • 2. Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloud “FileSystems” are Strange Beasts “There are no directories. Only paths.” “There are no users. Only keys.” “There are no permissions. Only acl rules.” “There is consistency, but not as we know it.”
  • 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved “Directories vs Paths.” • Storage of Path information can be assumed to be a sorted hash-table. • File listings are no longer listing off a tree, but prefix search • Directories don’t need to necessarily exist for a path below it • Listing a single level is more complex than a full-depth traversal • Renames can cause rebalancing and moving about of the structure • Adjacent files are sometimes more expensive than random ops
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved “Users & permissions vs keys & ACLs” • Distinguishing the user for an accessing process has no meaning • Access keys are often rotated and occasionally invalidated • User identity can be mapped to a key (externally or by id management) • Buckets are commonly used to differentiate stores, instead of permissions • Permissions are rarely set or applied per-file, but across path patterns • Permissions set to a directory need extra user checks to be useful (chmod +x)
  • 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved “Consistency” • Arguably the most complex issue • Renames needn’t be consistent, creates can have collisions • Reads can return old data for the same path when overwriting • Versioned reads are complex to manage and hard to throw a “Time machine” over • Cross-Region Replication often lags and doubles stale-read issues
  • 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Micro-Managed Hive Tables • Support for all Hive input formats, including user ones • Avoid rename operations as much as possible • Never collide final paths for different inserts • Ongoing inserts should be atomic across > 1 partitions • Snapshot isolation for data reads for existing partitions being back-filled • Stage data without accidental partial-reads for bucket replication
  • 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Micro-Managed Hive Tables CREATE TABLE `web_returns_hive_commit`(… `wr_net_loss` float) PARTITIONED BY (`wr_returned_date_sk` int) STORED AS <FORMAT> LOCATION 's3a://hwdev-hive-14535/web_returns_hive_commit' TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Micro-Managed Hive Tables drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820 drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0 -rw-rw-rw- 1 cloudbreak 1791 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0/000021_0 drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821 drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821/mm_0 -rw-rw-rw- 1 cloudbreak 2186 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821/mm_0/000022_0 drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822 drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822/mm_0 -rw-rw-rw- 1 cloudbreak 1814 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822/mm_0/000023_0 /web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0/000021_0
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved “Take a number” for inserts
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Read: tracking committed data • Similar to Hive-ACID (ORC) • Committed txns disappear from the tracking data • With each query, it takes a highest known txn + list of open/aborted txns • All valid transactions are < max(transaction_id) and not IN (open_txns) • The transaction filtering is done at the listing level for all formats
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Branch + Future Work Current measurement has 21% reduction in partition load time (+HIVE-15368) Time taken to load dynamic partitions: 350.846 seconds -> 274.715 seconds Work continues in the branch for hive-14535 Work ongoing to optimize to take advantage of faster recursive listings Discussions towards incremental refresh for cube engines for backfill Questions? Suggestions?