Hive on Tez with LLAP (Live Long and Process) can achieve query processing speeds of over 100,000 queries per hour. Reaching this level on a 45-node test cluster required tuning various Hive and YARN parameters: increasing the number of executor and I/O threads, adjusting memory allocation, and disabling consistent splits between LLAP daemons and data nodes. Future work includes adding a web UI for monitoring LLAP clusters and implementing column-level access controls, so that other frameworks like Spark can still access data through HiveServer2 while direct access to HDFS is prevented for security reasons.
Hive Analytic Workloads, Hadoop Summit San Jose 2014 (alanfgates)
- Hive has undergone significant development over the past few years focused on improving performance, scale, and SQL support. Major releases include 0.11, 0.12, and 0.13.
- The 0.13 release focuses on performance improvements like Hive on Tez and vectorized processing to improve query performance by 100x, as well as security features like SQL standard authorization.
- Ongoing work is focused on further SQL support, ACID compliance, and improvements to the query optimizer.
This document discusses strategies for achieving sub-second SQL query performance on Hadoop at scale. It describes two use cases: highly parallel batch reporting on a massive dataset, and online reporting with low latency requirements. For the latter use case, the document evaluates Hive LLAP and Phoenix, finding that Phoenix generally has lower latency, especially for queries with large result sets, through optimizations like skip scans, merging improvements, and table splitting. Tuning HBase and Phoenix configurations can further reduce latency.
The document discusses evolving HDFS to better support large-scale deployments. It summarizes HDFS's strengths in scaling to large clusters and data sizes. However, scaling to a large number of small files and blocks is challenging. The solution involves using partial namespaces to store only recently used metadata in memory, and block containers to group blocks together. This will generalize the storage layer to support different container types beyond HDFS blocks. Initial goals are to scale to billions of files and blocks per volume, with the ability to add more volumes for further scaling. The changes will also enable new use cases like block storage and caching data in cloud storage.
The document discusses evolving HDFS to support generalized storage containers in order to better scale the number of files and blocks. It proposes using block containers and a partial namespace approach to initially scale to billions of files and blocks, and eventually much higher numbers. The storage layer is being restructured to support various container types for use cases beyond HDFS like object storage and HBase.
This document discusses using Spark as an execution engine for Hive queries. It begins by explaining that Hive and Spark are both commonly used in the big data space, and that Hive on Spark uses the Hive optimizer with the Spark query engine, while Spark with a Hive context uses both the Catalyst optimizer and Spark engine. The document then covers challenges in deploying Hive on Spark, such as using a custom Spark JAR without Hive dependencies. It shows how the Hive EXPLAIN command works the same on Spark, and how the execution plan and stages differ between MapReduce and Spark. Overall, the document provides a high-level overview of using Spark as a query engine for Hive.
This document discusses adding ACID transaction support to Hive to allow for updates, deletes and inserts of rows. It describes how transactions will be implemented using delta files stored in HDFS and a transaction manager using the metastore database. The new features will initially support auto-commit transactions with snapshot isolation in Hive 0.13 and add explicit transaction commands like BEGIN, COMMIT, ROLLBACK in a later release. Streaming ingest of data is also supported using a new interface for small batch writes and commits. One limitation is that it initially supports only bucketed tables without sorting.
The document discusses LLAP (Live Long and Process), a new execution layer for Hive that enables sub-second analytical queries. LLAP uses daemons running on worker nodes to cache data in memory and keep query fragments executing between queries for faster performance. It allows for highly concurrent queries without specialized YARN queues. Benchmarks show LLAP providing up to 90% faster performance over Hive for queries against large datasets. LLAP also aims to serve as a unified data access layer for other systems like Spark SQL.
This document proposes a container-based sizing framework for Apache Hadoop/Spark clusters that uses a multi-objective genetic algorithm approach. It emulates container execution on different cloud platforms to optimize configuration parameters for minimizing execution time and deployment cost. The framework uses Docker containers with resource constraints to model cluster performance on various public clouds and instance types. Optimization finds Pareto-optimal configurations balancing time and cost across objectives.
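The notion of a Pareto-optimal configuration mentioned above can be illustrated with a small sketch. This is not the framework's genetic algorithm; it only shows the non-dominated filter that such a search would apply to its candidates. The configuration names, times, and costs are invented for illustration.

```python
from typing import List, Tuple

# Each candidate is (name, execution_time_seconds, cost_dollars).
# All names and numbers here are hypothetical.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


configs = [
    ("4 small nodes", 1200.0, 2.0),
    ("8 small nodes", 700.0, 3.5),
    ("4 large nodes", 650.0, 5.0),
    ("8 large nodes", 640.0, 9.0),
    ("6 large nodes", 800.0, 6.0),  # slower and costlier than "8 small nodes"
]

front_names = [name for name, _, _ in pareto_front(configs)]
print(front_names)
```

Every configuration left on the front trades time against cost; choosing among them is a policy decision rather than an optimization step.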
The document discusses large-scale stream processing in the Hadoop ecosystem. It provides examples of real-time stream processing use cases for computing player statistics and analyzing telco network data. It then summarizes several open source stream processing frameworks, including Apache Storm, Samza, Kafka Streams, Spark, Flink, and Apex. Key aspects like programming models, fault tolerance methods, and performance are compared for each framework. The document concludes with recommendations for further innovation in areas like dynamic scaling and batch integration.
Flexible and Real-Time Stream Processing with Apache Flink (DataWorks Summit)
This document provides an overview of stream processing with Apache Flink. It discusses the rise of stream processing and how it enables low-latency applications and real-time analysis. It then describes Flink's stream processing capabilities, including pipelining of data, fault tolerance through checkpointing and recovery, and integration with batch processing. The document also summarizes Flink's programming model, state management, and roadmap for further development.
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud (gluent.)
Hive was the first popular SQL layer built on Hadoop and has long been known as a heavyweight SQL engine suitable mainly for long-running batch jobs. This has greatly changed since Hive was announced to the world over 8 years ago. Hortonworks and the open source community have evolved Apache Hive into a fast, dynamic SQL on Hadoop engine capable of running highly concurrent query workloads over large datasets with sub-second response time.
The latest Hortonworks and Azure HDInsight platform versions fully support Hive with the LLAP execution engine for production use. In this webinar, we will go through the architecture of the Hive + LLAP engine and explain how it differs from previous Hive versions. We will then dive deeper and show how features like query vectorization and LLAP columnar caching bring further automatic performance improvements.
In the end, we will show how Gluent brings these new performance benefits to traditional enterprise database platforms via transparent data virtualization, allowing even your largest databases to benefit from all this without changing any application code. Join this webinar to learn about significant improvements in modern Hive architecture and how Gluent and Hive LLAP on Hortonworks or Azure HDInsight platforms can accelerate cloud migrations and greatly improve hybrid query performance!
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full-powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
This document provides an overview of debugging Hive queries with Hadoop in the cloud. It discusses Altiscale's Hadoop as a Service platform and perspective as an operational service provider. It then covers Hadoop 2 architecture, debugging tools, accessing logs in Hadoop 2, the Hive and Hadoop architecture, Hive logs, common Hive issues and case studies on stuck jobs and missing directories. The document aims to help users better understand and troubleshoot Hive queries running on Hadoop clusters.
This document summarizes Richard Xu's presentation on tuning Yarn, Hive, and queries on a Hadoop cluster. The initial issues with the cluster included jobs taking hours to finish when they were supposed to take minutes. Initial tuning focused on cluster configuration best practices and increasing Yarn capacity. Further tuning involved limiting user capacity, increasing resources for application masters, and tuning memory settings for MapReduce and Tez. Specific Hive query issues addressed were full table scans, non-deterministic functions, join orders, and data type mismatches. Tools discussed for analysis included Tez visualization and Lipwig. Lessons learned emphasized a holistic tuning approach and understanding data structures and explain plans. Long-lived execution (LLAP) was presented as providing in
This document summarizes updates to Apache Storm presented by P. Taylor Goetz of Hortonworks at Hadoop Summit 2016. Some key points include: Storm 0.9.x added high availability features and expanded integration capabilities. Storm 1.0 focused on maturity and improved performance. New features in Storm 1.0 include Pacemaker replacing Zookeeper, distributed caching, high availability Nimbus, native streaming windows, and state management with automatic checkpointing. Storm usability was also improved with features like dynamic log levels, tuple sampling for debugging, and distributed log searching. Future integrations and performance optimizations were also discussed.
In this talk we discuss the ORC (Optimized Row Columnar) file format and the features and performance optimizations added since its initial version (Hive 0.11, back in May 2013). We will also briefly cover the latest features and the future enhancements planned for Hive 0.15.
The document discusses new features in Apache Hadoop Common and HDFS for version 3.0. Key updates include upgrading the minimum Java version to Java 8, improving dependency management, adding a new Azure Data Lake Storage connector, and introducing erasure coding in HDFS to improve storage efficiency. Erasure coding in HDFS phase 1 allows for striping of small blocks and parallel writes/reads while trading off higher network usage compared to replication.
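The storage-efficiency claim can be made concrete with a little arithmetic. The sketch below compares 3-way replication with a Reed-Solomon layout of 6 data units and 3 parity units; that particular layout is an assumption chosen for illustration and is not stated in the summary above.

```python
def replication_overhead(replicas: int) -> float:
    """Extra storage, as a fraction of the raw data size, for n-way replication."""
    return float(replicas - 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage, as a fraction of the raw data size, for a striped EC layout."""
    return parity_units / data_units


# Assumed schemes: 3x replication versus an RS(6,3) layout (illustrative choice).
rep = replication_overhead(3)
ec = erasure_coding_overhead(6, 3)

print(f"3x replication stores {rep:.0%} extra data")
print(f"RS(6,3) stores {ec:.0%} extra data")
```

Under these assumptions replication needs 200% extra space while the erasure-coded layout needs 50%. The trade-off is the higher network usage noted above, since reads and reconstruction touch several nodes.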
The document summarizes Apache Phoenix and its past, present, and future as a SQL interface for HBase. It describes Phoenix's architecture and key features like secondary indexes, joins, aggregations, and transactions. Recent releases added functional indexes, the Phoenix Query Server, and initial transaction support. Future plans include improvements to local indexes, integration with Calcite and Hive, and adding JSON and other SQL features. The document aims to provide an overview of Phoenix's capabilities and roadmap for building a full-featured SQL layer over HBase.
CBlocks - Posix compliant file systems for HDFS (DataWorks Summit)
With YARN running Docker containers, it is possible to run applications that are not HDFS aware inside these containers. It is hard to customize these applications since most of them assume a Posix file system with rewrite capabilities. In this talk, we will dive into how we created a block storage layer, how it is being tested internally, and the storage containers that make it all possible.
The storage container framework was developed as part of Ozone (HDFS-7240). This talk will also explore the current state of Ozone along with CBlocks, covering the architecture of storage containers, how replication is handled, scaling to millions of volumes, and I/O performance optimizations.
LLAP enables sub-second analytical queries in Hive by running query fragments directly in memory on compute nodes using a long-running daemon process. It provides high performance scans and execution through an in-memory columnar cache shared across queries. LLAP queries are coordinated independently by Tez while utilizing Hive operators for processing and Tez for data transfers. It improves upon traditional MapReduce and Tez by keeping intermediate query results in memory rather than writing to disk.
This document summarizes techniques for optimizing Hive queries, including recommendations around data layout, format, joins, and debugging. It discusses partitioning, bucketing, sort order, normalization, text format, sequence files, RCFiles, ORC format, compression, shuffle joins, map joins, sort merge bucket joins, count distinct queries, using explain plans, and dealing with skew.
The Apache Hive ACID project aims to make continuously adding and modifying data in Hive tables efficient and allow long-running queries to run concurrently with updates. It introduces transactional tables that support SQL insert, update, and delete operations. Data is stored in multiple versions to allow concurrent reads and writes. Updates are written to delta files and merged periodically with the base data to improve performance and self-tune storage over time.
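The base-plus-delta idea can be sketched with an in-memory model. This is a deliberately simplified illustration, not Hive's on-disk format or API: rows live in a dictionary keyed by a hypothetical row id, each delta entry carries a transaction id, and compaction folds committed deltas into a new base.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# A delta entry: (transaction_id, row_id, new_row); new_row=None marks a delete.
Delta = Tuple[int, int, Optional[Row]]


def read_snapshot(base: Dict[int, Row], deltas: List[Delta], max_txn: int) -> Dict[int, Row]:
    """Merge base rows with every delta written at or before max_txn."""
    view = dict(base)
    for txn_id, row_id, new_row in sorted(deltas, key=lambda d: d[0]):
        if txn_id > max_txn:
            break  # later transactions are invisible to this reader
        if new_row is None:
            view.pop(row_id, None)
        else:
            view[row_id] = new_row
    return view


def compact(base: Dict[int, Row], deltas: List[Delta], up_to_txn: int):
    """Fold deltas up to up_to_txn into a new base; keep the newer deltas."""
    new_base = read_snapshot(base, deltas, up_to_txn)
    remaining = [d for d in deltas if d[0] > up_to_txn]
    return new_base, remaining


base = {1: {"name": "a"}, 2: {"name": "b"}}
deltas = [
    (10, 3, {"name": "c"}),   # insert
    (11, 1, {"name": "a2"}),  # update
    (12, 2, None),            # delete
]

old_reader = read_snapshot(base, deltas, max_txn=10)  # long-running query
new_reader = read_snapshot(base, deltas, max_txn=12)  # sees all changes
new_base, remaining = compact(base, deltas, up_to_txn=11)
```

A reader that started at transaction 10 keeps seeing its own consistent view while later writes land in new deltas, and reading the compacted base plus the remaining deltas yields the same result as reading the original files.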
Apache Hive is a data warehousing system for large volumes of data stored in Hadoop. However, the data is useless unless you can use it to add value to your company. Hive provides a SQL-based query language that dramatically simplifies the process of querying your large data sets. That is especially important while your data scientists are developing and refining their queries to improve their understanding of the data. In many companies, such as Facebook, Hive accounts for a large percentage of the total MapReduce queries that are run on the system. Although Hive makes writing large data queries easier for the user, there are many performance traps for the unwary. Many of them are artifacts of the way Hive has evolved over the years and the requirement that the default behavior must be safe for all users. This talk will present examples of how Hive users have made mistakes that made their queries run much much longer than necessary. It will also present guidelines for how to get better performance for your queries and how to look at the query plan to understand what Hive is doing.
NYC HUG - Application Architectures with Apache Hadoop (markgrover)
This document summarizes Mark Grover's presentation on application architectures with Apache Hadoop. It discusses processing clickstream data from web logs using techniques like deduplication, filtering, and sessionization in Hadoop. Specifically, it describes how to implement sessionization in MapReduce by using the user's IP address and timestamp to group log lines into sessions in the reducer.
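The reducer-side sessionization described above can be sketched in plain Python. The grouping step stands in for the MapReduce shuffle, and the 30-minute inactivity timeout is an assumed threshold rather than a value taken from the presentation; the function names and sample clicks are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed threshold, not from the talk


def group_by_ip(log_lines: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Stand-in for map + shuffle: collect timestamps under each IP key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for ip, timestamp in log_lines:
        grouped[ip].append(timestamp)
    return grouped


def reduce_sessions(timestamps: List[int]) -> List[List[int]]:
    """Reducer logic: sort one IP's clicks and split them on long gaps."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_TIMEOUT_SECONDS:
            sessions[-1].append(ts)   # continues the current session
        else:
            sessions.append([ts])     # gap too long: start a new session
    return sessions


clicks = [
    ("10.0.0.1", 0), ("10.0.0.2", 50), ("10.0.0.1", 600),
    ("10.0.0.1", 600 + 1801), ("10.0.0.2", 100),
]
result = {ip: reduce_sessions(ts) for ip, ts in group_by_ip(clicks).items()}
print(result)
```

In a real job the framework would deliver each IP's records to one reducer call, and a secondary sort on the timestamp could replace the in-memory `sorted` for users with very many clicks.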
Apache Tez: Accelerating Hadoop Query Processing (Bikas Saha)
Apache Tez is the new data processing framework in the Hadoop ecosystem. It runs on top of YARN - the new compute platform for Hadoop 2. Learn how Tez is built from the ground up to tackle a broad spectrum of data processing scenarios in Hadoop/BigData - ranging from interactive query processing to complex batch processing. With a high degree of automation built-in, and support for extensive customization, Tez aims to work out of the box for good performance and efficiency. Apache Hive and Pig are already adopting Tez as their platform of choice for query execution.
Apache Ambari is an open-source tool for provisioning, managing, and monitoring Hadoop clusters. It allows users to deploy Hadoop clusters, install and manage services, configure settings, and perform rolling upgrades with minimal downtime. Ambari 2.4 includes new features like role-based access control, Grafana integration for visualization, log search capabilities, and improved upgrade workflows.
This document discusses new features in Apache Hive 2.0, including:
1) Adding procedural SQL capabilities through HPLSQL for writing stored procedures.
2) Improving query performance through LLAP which uses persistent daemons and in-memory caching to enable sub-second queries.
3) Speeding up query planning by using HBase as the metastore instead of a relational database.
4) Enhancements to Hive on Spark such as dynamic partition pruning and vectorized operations.
5) Default use of the cost-based optimizer and continued improvements to statistics collection and estimation.
This document discusses the Stinger initiative to improve the performance of Apache Hive. Stinger aims to speed up Hive queries by 100x, scale queries from terabytes to petabytes of data, and expand SQL support. Key developments include optimizing Hive to run on Apache Tez, the vectorized query execution engine, cost-based optimization using Optiq, and performance improvements from the ORC file format. The goals of Stinger Phase 3 are to deliver interactive query performance for Hive by integrating these technologies.
The document discusses LLAP (Live Long and Process), a new execution layer for Hive that enables sub-second analytical queries. LLAP uses daemons running on worker nodes to cache data in memory and keep query fragments executing between queries for faster performance. It allows for highly concurrent queries without specialized YARN queues. Benchmarks show LLAP providing up to 90% faster performance over Hive for queries against large datasets. LLAP also aims to serve as a unified data access layer for other systems like Spark SQL.
This document proposes a container-based sizing framework for Apache Hadoop/Spark clusters that uses a multi-objective genetic algorithm approach. It emulates container execution on different cloud platforms to optimize configuration parameters for minimizing execution time and deployment cost. The framework uses Docker containers with resource constraints to model cluster performance on various public clouds and instance types. Optimization finds Pareto-optimal configurations balancing time and cost across objectives.
The document discusses large-scale stream processing in the Hadoop ecosystem. It provides examples of real-time stream processing use cases for computing player statistics and analyzing telco network data. It then summarizes several open source stream processing frameworks, including Apache Storm, Samza, Kafka Streams, Spark, Flink, and Apex. Key aspects like programming models, fault tolerance methods, and performance are compared for each framework. The document concludes with recommendations for further innovation in areas like dynamic scaling and batch integration.
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
This document provides an overview of stream processing with Apache Flink. It discusses the rise of stream processing and how it enables low-latency applications and real-time analysis. It then describes Flink's stream processing capabilities, including pipelining of data, fault tolerance through checkpointing and recovery, and integration with batch processing. The document also summarizes Flink's programming model, state management, and roadmap for further development.
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloudgluent.
Hive was the first popular SQL layer built on Hadoop and has long been known as a heavyweight SQL engine suitable mainly for long-running batch jobs. This has greatly changed since Hive was announced to the world over 8 years ago. Hortonworks and the open source community have evolved Apache Hive into a fast, dynamic SQL on Hadoop engine capable of running highly concurrent query workloads over large datasets with sub-second response time.
The latest Hortonworks and Azure HDInsight platform versions fully support Hive with LLAP execution engine for production use. In this webinar, we will go through the architecture of Hive + LLAP engine and explain how it differs from previous Hive versions. We will then dive deeper and show how features like query vectorization and LLAP columnar caching bring further automatic performance improvements.
In the end, we will show how Gluent brings these new performance benefits to traditional enterprise database platforms via transparent data virtualization, allowing even your largest databases to benefit from all this without changing any application code. Join this webinar to learn about significant improvements in modern Hive architecture and how Gluent and Hive LLAP on Hortonworks or Azure HDInsight platforms can accelerate cloud migrations and greatly improve hybrid query performance!
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
This document provides an overview of debugging Hive queries with Hadoop in the cloud. It discusses Altiscale's Hadoop as a Service platform and perspective as an operational service provider. It then covers Hadoop 2 architecture, debugging tools, accessing logs in Hadoop 2, the Hive and Hadoop architecture, Hive logs, common Hive issues and case studies on stuck jobs and missing directories. The document aims to help users better understand and troubleshoot Hive queries running on Hadoop clusters.
This document summarizes Richard Xu's presentation on tuning Yarn, Hive, and queries on a Hadoop cluster. The initial issues with the cluster included jobs taking hours to finish when they were supposed to take minutes. Initial tuning focused on cluster configuration best practices and increasing Yarn capacity. Further tuning involved limiting user capacity, increasing resources for application masters, and tuning memory settings for MapReduce and Tez. Specific Hive query issues addressed were full table scans, non-deterministic functions, join orders, and data type mismatches. Tools discussed for analysis included Tez visualization and Lipwig. Lessons learned emphasized a holistic tuning approach and understanding data structures and explain plans. Long-lived execution (LLAP) was presented as providing in
This document summarizes updates to Apache Storm presented by P. Taylor Goetz of Hortonworks at Hadoop Summit 2016. Some key points include: Storm 0.9.x added high availability features and expanded integration capabilities. Storm 1.0 focused on maturity and improved performance. New features in Storm 1.0 include Pacemaker replacing Zookeeper, distributed caching, high availability Nimbus, native streaming windows, and state management with automatic checkpointing. Storm usability was also improved with features like dynamic log levels, tuple sampling for debugging, and distributed log searching. Future integrations and performance optimizations were also discussed.
In this talk we speak about ORC (Optimized Row Columnar) file format, features and performance optimizations that went in after its initial version (Hive 0.11 back in May 2013). We will also briefly talk about the latest and greatest features, and future enhancements that are planned for Hive 0.15.
The document discusses new features in Apache Hadoop Common and HDFS for version 3.0. Key updates include upgrading the minimum Java version to Java 8, improving dependency management, adding a new Azure Data Lake Storage connector, and introducing erasure coding in HDFS to improve storage efficiency. Erasure coding in HDFS phase 1 allows for striping of small blocks and parallel writes/reads while trading off higher network usage compared to replication.
The document summarizes Apache Phoenix and its past, present, and future as a SQL interface for HBase. It describes Phoenix's architecture and key features like secondary indexes, joins, aggregations, and transactions. Recent releases added functional indexes, the Phoenix Query Server, and initial transaction support. Future plans include improvements to local indexes, integration with Calcite and Hive, and adding JSON and other SQL features. The document aims to provide an overview of Phoenix's capabilities and roadmap for building a full-featured SQL layer over HBase.
CBlocks - Posix compliant files systems for HDFSDataWorks Summit
With YARN running Docker containers, it is possible to run applications that are not HDFS aware inside these containers. It is hard to customize these applications since most of them assume a Posix file system with rewrite capabilities. In this talk, we will dive into how we created a block storage, how it is being tested internally and the storage containers which makes it all possible.
The storage container framework was developed as part of Ozone (HDFS-7240). This is talk will also explore the current state of Ozone along with CBlocks. This talk will explore architecture of storage containers, how replication is handled, scaling to millions of volumes and I/O performance optimizations.
LLAP enables sub-second analytical queries in Hive by running query fragments directly in memory on compute nodes using a long-running daemon process. It provides high performance scans and execution through an in-memory columnar cache shared across queries. LLAP queries are coordinated independently by Tez while utilizing Hive operators for processing and Tez for data transfers. It improves upon traditional MapReduce and Tez by keeping intermediate query results in memory rather than writing to disk.
This document summarizes techniques for optimizing Hive queries, including recommendations around data layout, format, joins, and debugging. It discusses partitioning, bucketing, sort order, normalization, text format, sequence files, RCFiles, ORC format, compression, shuffle joins, map joins, sort merge bucket joins, count distinct queries, using explain plans, and dealing with skew.
The Apache Hive ACID project aims to make continuously adding and modifying data in Hive tables efficient and allow long-running queries to run concurrently with updates. It introduces transactional tables that support SQL insert, update, and delete operations. Data is stored in multiple versions to allow concurrent reads and writes. Updates are written to delta files and merged periodically with the base data to improve performance and self-tune storage over time.
Apache Hive is a data warehousing system for large volumes of data stored in Hadoop. However, the data is useless unless you can use it to add value to your company. Hive provides a SQL-based query language that dramatically simplifies the process of querying your large data sets. That is especially important while your data scientists are developing and refining their queries to improve their understanding of the data. In many companies, such as Facebook, Hive accounts for a large percentage of the total MapReduce queries that are run on the system. Although Hive makes writing large data queries easier for the user, there are many performance traps for the unwary. Many of them are artifacts of the way Hive has evolved over the years and the requirement that the default behavior must be safe for all users. This talk will present examples of how Hive users have made mistakes that made their queries run much much longer than necessary. It will also present guidelines for how to get better performance for your queries and how to look at the query plan to understand what Hive is doing.
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
This document summarizes Mark Grover's presentation on application architectures with Apache Hadoop. It discusses processing clickstream data from web logs using techniques like deduplication, filtering, and sessionization in Hadoop. Specifically, it describes how to implement sessionization in MapReduce by using the user's IP address and timestamp to group log lines into sessions in the reducer.
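The reducer-side sessionization described above can be sketched in plain Python. This is only an illustration of the idea, not code from the presentation: the log line layout (`<ip> <ISO timestamp>`), the 30-minute inactivity gap, and the function names are all assumptions, and an in-memory dictionary stands in for the MapReduce shuffle.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity timeout; the summary does not state the value used.
SESSION_GAP = timedelta(minutes=30)

def map_phase(log_lines):
    """Emit (ip, timestamp) pairs; the IP address acts as the shuffle key."""
    for line in log_lines:
        ip, ts = line.split(" ", 1)  # hypothetical layout: "<ip> <ISO timestamp>"
        yield ip, datetime.fromisoformat(ts.strip())

def reduce_phase(timestamps):
    """Sort one IP's events by time and open a new session after each long gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_GAP:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # gap exceeded (or first event): new session
    return sessions

def sessionize(log_lines):
    grouped = defaultdict(list)       # stands in for the shuffle between map and reduce
    for ip, ts in map_phase(log_lines):
        grouped[ip].append(ts)
    return {ip: reduce_phase(tss) for ip, tss in grouped.items()}
```

Keying on the IP address alone is a simplification; users behind a shared address would be merged into one stream of events.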
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
Apache Tez is the new data processing framework in the Hadoop ecosystem. It runs on top of YARN - the new compute platform for Hadoop 2. Learn how Tez is built from the ground up to tackle a broad spectrum of data processing scenarios in Hadoop/BigData - ranging from interactive query processing to complex batch processing. With a high degree of automation built-in, and support for extensive customization, Tez aims to work out of the box for good performance and efficiency. Apache Hive and Pig are already adopting Tez as their platform of choice for query execution.
Apache Ambari is an open-source tool for provisioning, managing, and monitoring Hadoop clusters. It allows users to deploy Hadoop clusters, install and manage services, configure settings, and perform rolling upgrades with minimal downtime. Ambari 2.4 includes new features like role-based access control, Grafana integration for visualization, log search capabilities, and improved upgrade workflows.
This document discusses new features in Apache Hive 2.0, including:
1) Adding procedural SQL capabilities through HPLSQL for writing stored procedures.
2) Improving query performance through LLAP which uses persistent daemons and in-memory caching to enable sub-second queries.
3) Speeding up query planning by using HBase as the metastore instead of a relational database.
4) Enhancements to Hive on Spark such as dynamic partition pruning and vectorized operations.
5) Default use of the cost-based optimizer and continued improvements to statistics collection and estimation.
This document discusses the Stinger initiative to improve the performance of Apache Hive. Stinger aims to speed up Hive queries by 100x, scale queries from terabytes to petabytes of data, and expand SQL support. Key developments include optimizing Hive to run on Apache Tez, the vectorized query execution engine, cost-based optimization using Optiq, and performance improvements from the ORC file format. The goals of Stinger Phase 3 are to deliver interactive query performance for Hive by integrating these technologies.
June 2015 Berlin Buzzwords Presentation
http://berlinbuzzwords.de/file/bbuzz-2015-szehon-ho-hive-spark
https://berlinbuzzwords.de/session/hive-spark
Speaker Interview:
https://berlinbuzzwords.de/news/speaker-interview-szehon-ho
This document discusses Hive on Spark, which allows Apache Hive queries to run on Apache Spark. It provides background on Hive, Spark, and their limitations. Hive on Spark was developed by the Hive community to leverage Spark's more efficient execution while maintaining compatibility. Examples are given of how simple and join queries are translated from Hive operations to Spark transformations and actions. Improvements to Spark needed to better support Hive are also outlined. The author thanks contributors from various organizations working on Hive on Spark.
February 2015 Hive User Group meetup at LinkedIn
http://www.meetup.com/Hive-User-Group-Meeting/events/219794523/
Presentation about the physical join strategies used by Apache Hive and how they may be employed to optimize workflows.
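As a rough illustration of one such strategy, a map join loads the smaller table into an in-memory hash table and streams the larger table past it, so no shuffle is needed. The sketch below is plain Python, not Hive code; the function name and the dictionary row format are hypothetical.

```python
def map_join(big_rows, small_rows, key):
    """Inner join: hash the small side once, then stream the big side past it."""
    table = {}
    for row in small_rows:
        # Build side: several small-side rows may share one join key.
        table.setdefault(row[key], []).append(row)
    for row in big_rows:
        # Probe side: rows with no match are dropped (inner join semantics).
        for match in table.get(row[key], []):
            yield {**match, **row}
```

The approach only pays off when the small side fits in memory; otherwise a shuffle-based join is the usual fallback.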
Keynote slides from Big Data Spain Nov 2016. Has some thoughts on how Hadoop ecosystem is growing and changing to support the enterprise, including Hive, Spark, NiFi, security and governance, streaming, and the cloud.
Cloud deployments of Apache Hadoop are becoming more commonplace. Yet Hadoop and its applications don't integrate that well, something which starts right down at the file IO operations. This talk looks at how to make use of cloud object stores in Hadoop applications, including Hive and Spark. It will go from the foundational "what's an object store?" to the practical "what should I avoid" and the timely "what's new in Hadoop?", the latter covering the improved S3 support in Hadoop 2.8+. I'll explore the details of benchmarking and improving object store IO in Hive and Spark, showing what developers can do in order to gain performance improvements in their own code and, equally, what they must avoid. Finally, I'll look at ongoing work, especially "S3Guard" and what its fast and consistent file metadata operations promise.
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015alanfgates
The document discusses using Hive, HBase, Phoenix, and Calcite to build a single data store for both analytics and transaction processing. It describes some recent improvements to Hive like LLAP (Live Long and Process) that aim to achieve sub-second query response times, as well as using HBase as the Hive metastore to improve performance.
Keynote from Apache Big Data EU. This introduces training that we are doing at Hortonworks to help our employees understand and work well as part of the Apache Software Foundation.
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Mark Rittman
Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery, at massive scale and with compelling economics.
In this session we’ll focus on technical foundations around SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work, along with the more specialised underlying storage that each now works best with. We’ll also take a look to the future to see how SQL querying, data integration and analytics are likely to come together in the next five years to make Hadoop the default platform running mixed old-world/new-world analytics workloads.
Hive 0.14 adds ACID transactional support which allows for inserting, updating, and deleting rows in Hive tables. It uses a new transaction manager and lock manager to provide snapshot isolation across DML statements. Data is stored in HDFS in a layout of base files and transactional delta files which are compacted periodically. This allows Hive to support use cases beyond batch loads such as streaming data ingest and updating dimension tables.
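The base-plus-delta layout can be pictured with a small toy model. The sketch below is an assumption-laden simplification, not Hive's implementation: rows are keyed dictionaries rather than ORC files, a snapshot is reduced to "all transactions up to an ID", and the names `read_snapshot` and `compact` are hypothetical.

```python
def read_snapshot(base, deltas, max_txn):
    """Return the rows visible to a reader that sees transactions <= max_txn.

    base:   dict of key -> value (the compacted base data)
    deltas: list of (txn_id, [(op, key, value), ...]) written by DML statements
    """
    rows = dict(base)  # copy, so the base stays untouched for other readers
    for txn_id, ops in sorted(deltas, key=lambda d: d[0]):
        if txn_id > max_txn:
            continue   # written after this reader's snapshot: invisible
        for op, key, value in ops:
            if op == "delete":
                rows.pop(key, None)
            else:      # "insert" or "update"
                rows[key] = value
    return rows

def compact(base, deltas):
    """Fold every delta into a new base and return it with an empty delta list."""
    last_txn = max((txn_id for txn_id, _ in deltas), default=0)
    return read_snapshot(base, deltas, last_txn), []
```

Because readers merge deltas at read time, a long-running query keeps a consistent view while writers add new deltas; compaction then bounds how many deltas each read has to merge.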
This document discusses empowering Apache Hive with Apache Spark. It provides background on Hive and Spark, outlines the architecture and design principles for integrating the two, discusses challenges, and provides benchmarks comparing performance of Hive on Spark versus Hive on MapReduce and Tez. The key points are: Hive on Spark allows reusing existing Hive code and features while benefiting from Spark's improved performance; efforts involved contributions from both Hive and Spark communities; benchmarks on 320GB and 4TB datasets showed Hive on Spark was sometimes faster than Hive on Tez and significantly faster than Hive on MapReduce, though Tez performance improved with dynamic partition pruning not yet implemented for Hive on Spark.
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...Kevin Mao
Strata Hadoop World 2017 San Jose
Today’s enterprise architectures are often composed of a myriad of heterogeneous devices. Bring-your-own-device policies, vendor diversification, and the transition to the cloud all contribute to a sprawling infrastructure, the complexity and scale of which can only be addressed by using modern distributed data processing systems.
Kevin Mao outlines the system that Capital One has built to collect, clean, and analyze the security-related events occurring within its digital infrastructure. Raw data from each component is collected and preprocessed using Apache NiFi flows. This raw data is then written into an Apache Kafka cluster, which serves as the primary communications backbone of the platform. The raw data is parsed, cleaned, and enriched in real time via Apache Metron and Apache Storm and ingested into ElasticSearch, allowing operations teams to detect and monitor events as they occur. The refined data is also transformed into the Apache ORC data format and stored in Amazon S3, allowing data scientists to perform long-term, batch-based analysis.
Kevin discusses the challenges involved with architecting and implementing this system, such as data quality, performance tuning, and the impact of additional financial regulations relating to data governance, and shares the results of these efforts and the value that the data platform brings to Capital One.
Hive on spark is blazing fast or is it finalHortonworks
This presentation was given at the Strata + Hadoop World, 2015 in San Jose.
Apache Hive is the most popular and most widely used SQL solution for Hadoop. To keep pace with Hadoop’s increasingly vital role in the Enterprise, Hive has transformed from a batch-only, high-latency system into a modern SQL engine capable of both batch and interactive queries over large datasets. Hive’s momentum is accelerating: With Spark integration and a shift to in-memory processing on the horizon, Hive continues to expand the boundaries of Big Data.
In this talk the speakers examined Hive performance, past, present and future. In particular they looked at Hive’s origins as a petabyte scale SQL engine.
Through some numbers and graphs, they showed how Hive became 100x faster by moving beyond MapReduce, by vectorizing execution and by introducing a cost-based optimizer.
They detailed and discussed the challenges of scalable SQL on Hadoop.
They looked into Hive’s sub-second future, powered by LLAP and Hive on Spark.
And showed just how fast Hive on Spark really is.
Architecting a Next Generation Data Platformhadooparchbook
This document discusses a presentation on architecting Hadoop application architectures for a next generation data platform. It provides an overview of the presentation topics which include a case study on using Hadoop for an Internet of Things and entity 360 application. It introduces the key components of the proposed high level architecture including ingesting streaming and batch data using Kafka and Flume, stream processing with Kafka streams and storage in Hadoop.
This document summarizes the key concepts of qualitative research. It explains that qualitative research studies how reality is constructed from the perspective of the participants, in contrast with quantitative research. It then describes its characteristics, origins, and methods such as ethnography and grounded theory, as well as the differences from quantitative research. Finally, it covers topics such as the analysis of qualitative data and the representations used in reports.
Strata San Jose 2017 - Ben Sharma PresentationZaloni
The document discusses creating a modern data architecture using a data lake. It describes Zaloni as a provider of data lake management solutions, including a data lake management and governance platform and self-service data platform. It outlines key features of a data lake such as storing different types of data, creating standardized datasets, and providing shorter time to insights. The document also discusses Zaloni's data lake maturity model and reference architecture.
Overview of stinger interactive query for hiveDavid Kaiser
This document provides an overview of the Stinger initiative to improve the performance of Hive interactive queries. The Stinger project worked to optimize Hive so that queries return results in seconds instead of minutes or hours by implementing features like Hive on Tez, vectorized processing, predicate pushdown, the ORC file format, and a cost-based optimizer. These optimizations improved Hive performance by over 100 times, allowing interactive use of Hive for the first time on large datasets.
This document discusses interactive querying in Hadoop. It describes how Hive facilitates SQL querying over data stored in HDFS. Hive performance is improved through optimizations like using Tez as the execution engine instead of MapReduce, vectorized queries, and ORC file format. Tez is a dataflow framework that allows expressing queries as directed acyclic graphs (DAGs) of vertices and edges, avoiding the multi-step MapReduce approach and improving latency. The document provides examples of expressing Hive queries in Tez and demonstrates its capabilities.
Apache Tez : Accelerating Hadoop Query ProcessingTeddy Choi
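Jeff Markham, technical director for Asia at Hortonworks, gives an introduction to Tez. Tez is software that replaces MapReduce to accelerate query processing on Hadoop. He explains why Tez was created, how it is structured, how optimization is carried out, and how much performance has improved.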
Operating multi-tenant clusters requires careful planning of capacity for on-time launch of big data projects and applications within expected budget and with appropriate SLA guarantees. Making such guarantees with a set of standard hardware configurations is key to operate big data platforms as a hosted service for your organization.
This talk highlights the tools, techniques and methodology applied on a per-project or user basis across three primary multi-tenant deployments in the Apache Hadoop ecosystem, namely MapReduce/YARN and HDFS, HBase, and Storm due to the significance of capital investments with increasing scale in data nodes, region servers, and supervisor nodes respectively. We will demo the estimation tools developed for these deployments that can be used for capital planning and forecasting, and cluster resource and SLA management, including making latency and throughput guarantees to individual users and projects.
As we discuss the tools, we will share considerations that got incorporated to come up with the most appropriate calculation across these three primary deployments. We will discuss the data sources for calculations, resource drivers for different use cases, and how to plan for optimum capacity allocation per project with respect to given standard hardware configurations.
Bobby Evans and Tom Graves, the engineering leads for Spark and Storm development at Yahoo, will talk about how these technologies are used on Yahoo's grids and the reasons to use one or the other.
Bobby Evans is the low latency data processing architect at Yahoo. He is a PMC member on many Apache projects including Storm, Hadoop, Spark, and Tez. His team is responsible for delivering Storm as a service to all of Yahoo and maintaining Spark on YARN for Yahoo (although Tom really does most of that work).
Tom Graves is a Senior Software Engineer on the Platform team at Yahoo. He is an Apache PMC member on Hadoop, Spark, and Tez. His team is responsible for delivering and maintaining Spark on YARN for Yahoo.
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
The document discusses Hadoop 2.2.0 and new features in YARN and MapReduce. Key points include: YARN introduces a new application framework and resource management system that replaces the jobtracker, allowing multiple data processing engines besides MapReduce; MapReduce is now a library that runs on YARN; Tez is introduced as a new data processing framework to improve performance beyond MapReduce.
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsDataStax Academy
Apache Spark has grown to be one of the largest open source communities in big data, with over 190 developers and dozens of companies contributing. The latest 1.0 release alone includes contributions from 117 people. A clean API, interactive shell, distributed in-memory computation, stream processing, interactive SQL, and libraries delivering everything from machine learning to graph processing make it an excellent unified platform to solve a number of problems. Apache Spark works very well with a growing number of big data solutions, including Cassandra and Hadoop. Come learn about Apache Spark and see how easy it is for you to get started using Spark to build your own high performance big data applications today.
Owen O'Malley is an architect at Yahoo who works full-time on Hadoop. He discusses Hadoop's origins, how it addresses the problem of scaling applications to large datasets, and its key components including HDFS and MapReduce. Yahoo uses Hadoop extensively, including for building its Webmap and running experiments on large datasets.
Tez is a data processing framework that allows dataflow jobs to be expressed as directed acyclic graphs (DAGs). It is built on top of YARN for resource management and aims to provide better performance than MapReduce by enabling container reuse, late binding of tasks, and simplifying operations. Tez defines APIs for developers to express DAGs and processing logic to customize jobs.
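To make the DAG idea concrete, the sketch below runs a small dataflow graph in dependency order. It is plain Python using the standard `graphlib` module and does not use or imitate the actual Tez API; `run_dag`, the vertex functions, and the edge format are all hypothetical.

```python
from graphlib import TopologicalSorter

def run_dag(vertices, edges, source_data):
    """Run each vertex once, after all of its upstream vertices have finished.

    vertices: name -> function taking a list of upstream outputs
    edges:    (producer, consumer) pairs describing the data flow
    """
    upstream = {name: [] for name in vertices}
    for producer, consumer in edges:
        upstream[consumer].append(producer)

    outputs = {}
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    for name in TopologicalSorter(upstream).static_order():
        # Vertices with no upstream edge read the job's source data instead.
        inputs = [outputs[p] for p in upstream[name]] or [source_data]
        outputs[name] = vertices[name](inputs)
    return outputs
```

A three-vertex pipeline such as scan, count, and pick-the-top-word can be expressed as two edges and executed in one pass, without materializing a separate job per step as a chain of MapReduce jobs would.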
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
The document provides an overview of the Hadoop ecosystem and how several large companies such as Google, Yahoo, Facebook, and others use Hadoop in production. It discusses the key components of Hadoop including HDFS, MapReduce, HBase, Pig, Hive, Zookeeper and others. It also summarizes some of the large-scale usage of Hadoop at these companies for applications such as web indexing, analytics, search, recommendations, and processing massive amounts of data.
This document discusses Hive on Spark, including background on Hive, Spark, and the Shark project. It describes how Hive on Spark keeps the same physical abstraction as Hive on Tez/MR to be architecturally compatible. Examples are provided of how a simple query and join query are executed in MapReduce and Spark formats. Improvements to Spark for reduce-side joins and remote Spark contexts are also discussed.
Hortonworks' mission is to enable modern data architectures by delivering an enterprise-ready Apache Hadoop platform. They contribute the majority of code to Apache Hadoop and its related projects. Hortonworks develops the Hortonworks Data Platform (HDP), which provides core Hadoop services along with operational and data services to make Hadoop an enterprise data platform. Hortonworks aims to power data architectures by enabling Hadoop as a multi-purpose platform for batch, interactive, streaming and other workloads through projects like YARN, Tez, and improvements to Hive.
Apache Tajo is a big data warehouse system that runs on Hadoop. It supports SQL standards and features powerful distributed processing, advanced query optimization, and the ability to handle long-running queries (hours) and interactive analysis queries (100 milliseconds). Tajo uses a master-slave architecture with a TajoMaster managing metadata and slave TajoWorkers running query tasks in a distributed fashion.
EclipseCon Keynote: Apache Hadoop - An IntroductionCloudera, Inc.
Todd Lipcon explains why you should be interested in Apache Hadoop, what it is, and how it works. Todd also brings to light the Hadoop ecosystem and real business use cases that evolve around Hadoop and the ecosystem.
The Fundamentals Guide to HDP and HDInsightGert Drapers
This session will give you an architectural overview of, and an introduction to the inner workings of, HDP 2.0 (http://hortonworks.com/products/hdp-windows/) and HDInsight. The world has embraced the Hadoop toolkit to solve its data problems, from ETL and data warehouses to event processing pipelines. As Hadoop consists of many components, services and interfaces, understanding its architecture is crucial before you can successfully integrate it into your own environment.
This document discusses using Apache Spark with object stores like Amazon S3 and Microsoft Azure Blob Storage. It covers challenges around classpath configuration, credentials, code examples, and performance commitments when using these storage systems. Key points include using Hadoop connectors like S3A and WASB, configuring credentials through properties or environment variables, and tuning Spark for object store performance and consistency.
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
This document discusses Apache Spark, a fast and general engine for big data processing. It describes how Spark generalizes the MapReduce model through its Resilient Distributed Datasets (RDDs) abstraction, which allows efficient sharing of data across parallel operations. This unified approach allows Spark to support multiple types of processing, like SQL queries, streaming, and machine learning, within a single framework. The document also outlines ongoing developments like Spark SQL and improved machine learning capabilities.
Big data, just an introduction to Hadoop and Scripting LanguagesCorley S.r.l.
This document provides an introduction to Big Data and Apache Hadoop. It defines Big Data as large and complex datasets that are difficult to process using traditional database tools. It describes how Hadoop uses MapReduce and HDFS to provide scalable storage and parallel processing of Big Data. It provides examples of companies using Hadoop to analyze exabytes of data and common Hadoop use cases like log analysis. Finally, it summarizes some popular Hadoop ecosystem projects like Hive, Pig, and Zookeeper that provide SQL-like querying, data flows, and coordination.
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
Hive 3 new SQL features including LLAP, workload management, SQL over Kafka and JDBC data sources, integration with Spark via Hive Warehouse Connector, ACID 2, and constraints and default values
This document discusses improvements to Hive performance and functionality in the Stinger initiative. Stinger includes changes to Hive and a new project called Tez, with two main goals: improve Hive performance by 100x and extend Hive SQL for analytics. Stinger is divided into three phases, with phase 1 focusing on optimizations, phase 2 adding YARN resource management and Hive on Tez, and phase 3 adding a buffer cache and cost-based optimizer. Hive 0.11 delivers performance gains through optimizations like improved map joins and collapsing jobs. It also introduces new technologies like Tez, ORC files, and vectorization. Standard queries now run much faster, with some seeing over 50x speedup. Future work will further reduce query startup
This document discusses how coordinating the many tools of big data has become more complex with the rise of cloud computing and large datasets. It argues that while having many tools provides flexibility, it also leads to inefficiencies as tools do not integrate well and developers end up duplicating work. The document proposes that Hadoop can help address these issues by providing shared services that tools can leverage, such as common table management, metadata access, and a new execution engine called Tez that allows for more efficient pipelining of jobs compared to the traditional MapReduce approach. Coordinating tools through shared services allows users to focus on selecting the right tool for a task while reducing redundant development work.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of the slides for the talk I gave on the main changes brought by CCS TSI 2023 at the biggest Czech conference on railway communications and signalling systems, held at the Clarion Hotel Olomouc from 7 to 9 November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The video recording (in Czech) of the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefit it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. Some approaches can also lead to unnecessary spending, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately
TrustArc Webinar - 2024 Global Privacy Survey TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ... alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Taking AI to the Next Level in Manufacturing.pdf ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
query 52: star join followed by group/order (different keys), selective filter
query 55: same
query 28: 4-subquery join
query 12: star join over range of dates
query 1: SELECT pageURL, pageRank FROM rankings WHERE pageRank > X
SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue) FROM uservisits GROUP BY SUBSTR(sourceIP, 1, X)
SELECT sourceIP, totalRevenue, avgPageRank
FROM (SELECT sourceIP, AVG(pageRank) AS avgPageRank, SUM(adRevenue) AS totalRevenue
      FROM Rankings AS R, UserVisits AS UV
      WHERE R.pageURL = UV.destURL
        AND UV.visitDate BETWEEN Date('1980-01-01') AND Date('X')
      GROUP BY UV.sourceIP)
ORDER BY totalRevenue DESC LIMIT 1
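To make the shape of these three benchmark queries concrete (a filtered scan, a grouped aggregation, and a join with an aggregate), here is a minimal sketch that runs them on a few toy rows. It uses Python's built-in sqlite3 module as a stand-in engine, not Hive; the sample rows, the column types, and the values chosen for X are assumptions made purely for illustration, and the function name run_benchmark_queries is hypothetical.

```python
import sqlite3


def run_benchmark_queries(rank_x=10, prefix_x=4, date_x="1980-04-01"):
    """Run the scan, aggregation, and join queries on toy data.

    SQLite is only a stand-in for Hive here; all rows and X values are
    made-up assumptions, not benchmark data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rankings (pageURL TEXT, pageRank INTEGER);
        CREATE TABLE uservisits (
            sourceIP TEXT, destURL TEXT, visitDate TEXT, adRevenue REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO rankings VALUES (?, ?)",
        [("a.com", 5), ("b.com", 20), ("c.com", 50)],
    )
    conn.executemany(
        "INSERT INTO uservisits VALUES (?, ?, ?, ?)",
        [
            ("10.0.0.1", "a.com", "1980-02-01", 1.0),
            ("10.0.0.1", "b.com", "1980-03-01", 2.0),
            ("10.9.9.9", "c.com", "1980-03-15", 4.0),
            ("10.9.9.9", "c.com", "1985-01-01", 8.0),  # outside the date range
        ],
    )

    # Scan: a simple selective filter on one table.
    scan = conn.execute(
        "SELECT pageURL, pageRank FROM rankings "
        "WHERE pageRank > :x ORDER BY pageURL",
        {"x": rank_x},
    ).fetchall()

    # Aggregation: group revenue by an IP-address prefix of length X.
    agg = conn.execute(
        "SELECT SUBSTR(sourceIP, 1, :n), SUM(adRevenue) FROM uservisits "
        "GROUP BY SUBSTR(sourceIP, 1, :n) ORDER BY 1",
        {"n": prefix_x},
    ).fetchall()

    # Join: top-revenue source IP within a date range, with its average rank.
    # ISO date strings compare correctly as text, so no Date() call is needed.
    join = conn.execute(
        "SELECT sourceIP, totalRevenue, avgPageRank FROM ("
        "  SELECT UV.sourceIP AS sourceIP, AVG(R.pageRank) AS avgPageRank,"
        "         SUM(UV.adRevenue) AS totalRevenue"
        "  FROM rankings AS R, uservisits AS UV"
        "  WHERE R.pageURL = UV.destURL"
        "    AND UV.visitDate BETWEEN '1980-01-01' AND :d"
        "  GROUP BY UV.sourceIP"
        ") ORDER BY totalRevenue DESC LIMIT 1",
        {"d": date_x},
    ).fetchall()

    conn.close()
    return scan, agg, join


if __name__ == "__main__":
    for name, rows in zip(("scan", "aggregation", "join"), run_benchmark_queries()):
        print(name, rows)
```

Raising the prefix length or widening the date range changes how many groups and joined rows each query has to handle, which is the knob that X represents in the queries above.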
With Hive and Stinger we are focused on enabling the SQL ecosystem, and to do that we’ve put Hive on a clear roadmap to SQL compliance. That includes adding critical datatypes like character and date types, as well as implementing common SQL semantics seen in most databases.