An Overview on Optimization in Apache Hive: Past, Present, Future (DataWorks Summit)
Apache Hive has been continuously evolving to support a broad range of use cases, bringing it beyond its batch processing roots to its current support for interactive queries with sub-second response times using LLAP. However, the development of its execution internals is not sufficient to guarantee efficient performance, since poorly optimized queries can create a bottleneck in the system. Hence, each release of Hive has included new features for its optimizer aimed at generating better plans and delivering improvements to query execution. In this talk, we present the development of the optimizer since its initial release. We describe its current state and how Hive leverages the latest Apache Calcite features to generate the most efficient execution plans. We show numbers demonstrating the improvements brought to Hive performance, and we discuss future directions for the next-generation Hive optimizer, which include an enhanced cost model, materialized views support, and complex query decorrelation.
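As a rough illustration of what cost-based planning buys, consider the toy Python sketch below. The cardinalities are invented and this is not Hive or Calcite code; it only conveys the flavor of the comparison an optimizer makes when deciding whether to push a filter below a join.

```python
# Toy cost comparison between two logically equivalent plans. "Cost" here is
# just the number of intermediate rows each plan materializes; real optimizers
# combine CPU, IO, and network estimates derived from table statistics.
ORDERS = 1_000_000_000      # fact table cardinality (assumed)
CUSTOMERS = 1_000_000       # dimension cardinality (assumed)
SELECTIVITY = 0.001         # estimated fraction of customers passing a filter

# Plan A: join orders to customers first, filter the joined result afterwards.
plan_a = ORDERS + int(ORDERS * SELECTIVITY)

# Plan B: push the filter below the join, so the join sees 1000x fewer keys.
plan_b = int(CUSTOMERS * SELECTIVITY) + int(ORDERS * SELECTIVITY)

print(f"plan A intermediate rows: {plan_a:,}")   # 1,001,000,000
print(f"plan B intermediate rows: {plan_b:,}")   # 1,001,000
# Given statistics like these, the optimizer picks plan B: roughly a
# thousandfold reduction in intermediate data for the same query result.
```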
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet (DataWorks Summit)
The landscape for storing your big data is quite complex, with several competing formats and different implementations of each format. Understanding your use of the data is critical for picking the format. Depending on your use case, the different formats perform very differently. Although you can use a hammer to drive a screw, it isn’t fast or easy to do so.
The use cases that we’ve examined are:
* reading all of the columns
* reading a few of the columns
* filtering using a filter predicate
* writing the data
Furthermore, different kinds of data have distinct properties. We've used three real schemas:
* the NYC taxi data http://tinyurl.com/nyc-taxi-analysis
* the Github access logs http://githubarchive.org
* a typical sales fact table with generated data
Finally, open source benchmarks that are available to all interested parties are hugely valuable, and all of the benchmark code is available from Apache.
Speaker
Owen O'Malley, Co-founder & Technical Fellow, Hortonworks
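The sketch below shows how the four access patterns above can be timed with PySpark. It is not the Apache benchmark code the talk refers to; the paths are placeholders, the NYC-taxi-style column names are assumptions, and reading Avro assumes the spark-avro package is on the classpath.

```python
# A minimal timing harness over the four access patterns, one format at a time.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-bench").getOrCreate()

def timed(label, action):
    start = time.time()
    action()
    print(f"{label}: {time.time() - start:6.2f}s")

for fmt in ["orc", "parquet", "avro", "json"]:
    df = spark.read.format(fmt).load(f"/bench/taxi.{fmt}")  # placeholder path
    # rdd.count() forces every column to be deserialized, unlike df.count().
    timed(f"{fmt} read all columns", lambda: df.rdd.count())
    timed(f"{fmt} read few columns",
          lambda: df.select("trip_distance", "fare_amount").rdd.count())
    # Columnar formats (ORC, Parquet) can evaluate this predicate from
    # min/max statistics and skip whole stripes/row groups.
    timed(f"{fmt} predicate filter",
          lambda: df.filter(df.trip_distance > 100).count())
    timed(f"{fmt} write",
          lambda: df.write.mode("overwrite").format(fmt).save(f"/bench/copy.{fmt}"))
```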
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full-powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
Speaker
Alan Gates, Co-founder, Hortonworks
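As a hedged sketch of the merge-based ingestion mentioned above, the snippet below submits a Hive MERGE statement (available on ACID tables since Hive 2.2) through PyHive; the host, tables, and columns are hypothetical.

```python
# A minimal upsert sketch using Hive's MERGE on an ACID table. The host,
# port, and schema below are illustrative assumptions.
from pyhive import hive

cursor = hive.connect(host="hiveserver2", port=10000).cursor()

# Fold a batch of staged change records into the warehouse table: update the
# customers that already exist, insert the ones that do not.
cursor.execute("""
    MERGE INTO dim_customer AS t
    USING staging_customer_updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET email = s.email, city = s.city
    WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.email, s.city)
""")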
Security is one of the fundamental features for enterprise adoption. Specifically, for SQL users, row/column-level access control is important. However, when a cluster is used as a data warehouse accessed by various user groups in different ways, it is difficult to guarantee data governance in a consistent way. In this talk, we focus on SQL users and discuss how to provide row/column-level access control with common access control rules throughout the whole cluster across various SQL engines, e.g., Apache Spark 2.1, Apache Spark 1.6 and Apache Hive 2.1. If some of the rules are changed, all engines are updated consistently in near real time. Technically, we enable Spark Thrift Server to work with the identity given by the JDBC connection and take advantage of the Hive LLAP daemon as a shared and secured processing engine. We demonstrate row-level filtering, column-level filtering and various column maskings in Apache Spark with Apache Ranger, using Apache Ranger as a single security control center.
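The sketch below illustrates the semantics of the row-filter and masking policies described above. Policies are authored centrally in Ranger, not in the query; the host, port, table, and the rewritten form shown in comments are illustrative assumptions.

```python
# Illustrative only: what a Ranger row-filter plus column-mask policy amounts
# to from the SQL user's point of view. The engine enforces the policies
# transparently; no query rewriting is done by the user.
from pyhive import hive

# Connecting through the Spark Thrift Server / HiveServer2 with the user's
# identity is what lets Ranger select the right policies for this session.
cursor = hive.connect(host="thrift-server", port=10016,
                      username="analyst").cursor()

# The analyst submits an ordinary query...
cursor.execute("SELECT name, ssn, region FROM hr.employees")

# ...but with a row-filter policy (region = 'EMEA') and a masking policy on
# ssn for the 'analyst' group, the engine evaluates it roughly as if it were:
#   SELECT name, mask_show_last_n(ssn, 4) AS ssn, region
#   FROM hr.employees
#   WHERE region = 'EMEA'
for row in cursor.fetchall():
    print(row)  # only EMEA rows; ssn masked except the last four characters
```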
LLAP (Live Long and Process) is the newest query acceleration engine for Hive 2.0, which entered GA in 2017. LLAP brings a new set of trade-offs and optimizations that allow for efficient and secure multi-user BI systems on the cloud. In this talk, we discuss the specifics of building a modern BI engine within those boundaries, designed to be fast and cost-effective on the public cloud. The LLAP cache focuses on speeding up common BI query patterns on the cloud while avoiding most of the operational administration overheads of maintaining a caching layer: the cache is automatically coherent, features intelligent eviction, and supports custom file formats from text to ORC. We also explore the possibilities of combining the cache with a transactional storage layer that supports online UPDATEs and DELETEs without full data reloads. LLAP by itself, as a relational data layer, extends the same caching and security advantages to any other data processing framework. We give an overview of the structure of such a hybrid system, where both Hive and Spark use LLAP to provide SQL query acceleration on the cloud with new, improved concurrent query support and production-ready tools and UI.
Speaker
Sergey Shelukhin, Member of Technical Staff, Hortonworks
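A hedged sketch of the hybrid arrangement described above, with Spark executing SQL through LLAP via the Hive Warehouse Connector shipped with HDP 3; the JDBC URL, port, and table are assumptions, not details from the talk.

```python
# Spark querying Hive through LLAP daemons via the Hive Warehouse Connector.
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = (
    SparkSession.builder.appName("spark-on-llap")
    # Points Spark at HiveServer2-Interactive, the LLAP-backed endpoint.
    .config("spark.sql.hive.hiveserver2.jdbc.url",
            "jdbc:hive2://llap-host:10500/")
    .getOrCreate()
)

hive = HiveWarehouseSession.session(spark).build()

# The query runs inside LLAP daemons, so Spark inherits the shared cache and
# the same Ranger row/column enforcement that Hive queries get.
df = hive.executeQuery(
    "SELECT page, count(*) AS views FROM web.clicks GROUP BY page")
df.show()
```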
Cloudy with a chance of Hadoop - real world considerations (DataWorks Summit)
Over the last eighteen months, we have seen significant adoption of Hadoop eco-system centric big data processing in Microsoft Azure and Amazon AWS. In this talk we present some of the lessons learned and architectural considerations for cloud-based deployments including security, fault tolerance and auto-scaling.
We look at how Hortonworks Data Cloud and Cloudbreak can automate that scaling of Hadoop clusters, showing how it can react dynamically to workloads, and what that can deliver in cost-effective Hadoop-in-cloud deployments.
Protecting your Critical Hadoop Clusters Against Disasters (DataWorks Summit)
Our enterprise customers are deploying business-critical applications on Hadoop clusters and now want a business continuity solution that will protect against disasters and cover both processed and unstructured data with varying recovery point objective (RPO) requirements. Our customers are also asking for backup & restore of select unstructured data and databases, in case of accidental deletion by users. They are asking us to automagically tier and move data that becomes less frequently accessed over time to high-density, slower media or the cloud. We will unveil a product suite that is going to solve those customer pain points in phases, starting with disaster recovery of the Hadoop ecosystem with single-source-of-truth enforcement. We will also take a deep dive into the architecture, which required extensive changes in Hive, HDFS, Ranger, and Atlas (more in the pipeline), and demonstrate the end-to-end functioning of our data lifecycle management.
Speakers:
Jeff Sposetti, Product Management, Hortonworks
Venkat Ranganathan, Director of Engineering, Hortonworks
How to Use Apache Zeppelin with HWX HDB (Hortonworks)
Part five in a five-part series, this webcast will be a demonstration of the integration of Apache Zeppelin and Pivotal HDB. Apache Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more. This webinar will demonstrate the configuration of the psql interpreter and the basic operations of Apache Zeppelin when used in conjunction with Hortonworks HDB.
Many organizations currently process various types of data in different formats. Most often this data is in free form, and as the number of consumers of this data grows, it becomes imperative that this free-flowing data adhere to a schema. A schema helps data consumers form expectations about the type of data they are getting and shields them from immediate impact if the upstream source changes its format. Having a uniform schema representation also gives the data pipeline an easy way to integrate and support various systems that use different data formats.
SchemaRegistry is a central repository for storing and evolving schemas. It provides an API and tooling to help developers and users register a schema and consume that schema without being impacted when the schema changes. Users can tag different schemas and versions, register for notifications of schema changes by version, etc.
In this talk, we will go through the need for a schema registry and schema evolution, and showcase the integration with Apache NiFi, Apache Kafka, and Apache Storm.
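The snippet below illustrates the schema-evolution problem a registry solves, using fastavro rather than the SchemaRegistry client API: the registry's job is to store versioned writer schemas so a consumer can decode old records with a newer reader schema.

```python
# Backward-compatible Avro evolution: old bytes, new reader schema.
import io
import fastavro

writer_schema_v1 = {
    "type": "record", "name": "Truck",
    "fields": [{"name": "id", "type": "long"},
               {"name": "speed", "type": "int"}],
}
# v2 adds a field with a default: a backward-compatible evolution that a
# registry would accept and version.
reader_schema_v2 = {
    "type": "record", "name": "Truck",
    "fields": [{"name": "id", "type": "long"},
               {"name": "speed", "type": "int"},
               {"name": "fuel_level", "type": "int", "default": -1}],
}

buf = io.BytesIO()
fastavro.schemaless_writer(buf, writer_schema_v1, {"id": 7, "speed": 55})
buf.seek(0)

# Old bytes decode cleanly under the new schema; fuel_level takes its default.
record = fastavro.schemaless_reader(buf, writer_schema_v1, reader_schema_v2)
print(record)  # {'id': 7, 'speed': 55, 'fuel_level': -1}
```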
Deep Learning on YARN: Running Distributed TensorFlow etc. on Hadoop Clusters (DataWorks Summit)
Deep learning is useful for enterprise tasks in the fields of speech recognition, image classification, AI chatbots, and machine translation, just to name a few.
In order to train deep learning/machine learning models, applications such as TensorFlow, MXNet, Caffe, and XGBoost can be leveraged. And sometimes these applications will be used together to solve different problems.
To make distributed deep learning/machine learning applications easily launched, managed, and monitored, we introduced, in Apache Hadoop 3.x, YARN native services along with other improvements such as first-class GPU support, container-DNS support, scheduling improvements, etc. These improvements make running distributed deep learning/machine learning applications on YARN as simple as running them locally, which lets machine learning engineers focus on algorithms instead of worrying about the underlying infrastructure. Also, with these improvements, YARN can better manage a shared cluster that runs deep learning/machine learning alongside other services and ETL jobs.
In this session, we will take a closer look at these improvements and show how to run these applications on YARN with demos, so the audience can start trying these applications on YARN after the talk.
Speakers
Wangda Tan, Staff Software Engineer, Hortonworks
Sunil Govindan, Staff Engineer, Hortonworks
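As a hedged sketch of what launching such a workload can look like, the snippet below submits a service spec to the YARN Services REST API introduced with YARN native services in Hadoop 3.x; the Docker image, ResourceManager address, and resource sizes are illustrative assumptions.

```python
# Launch a distributed training job as a YARN native service over REST.
import json
import requests

service_spec = {
    "name": "distributed-tf",
    "version": "1.0",
    "components": [
        {
            "name": "worker",
            "number_of_containers": 4,
            "artifact": {"id": "my-registry/tf:latest", "type": "DOCKER"},
            "launch_command": "python /train.py --role worker",
            "resource": {"cpus": 4, "memory": "8192"},
        },
        {
            "name": "ps",
            "number_of_containers": 1,
            "artifact": {"id": "my-registry/tf:latest", "type": "DOCKER"},
            "launch_command": "python /train.py --role ps",
            "resource": {"cpus": 2, "memory": "4096"},
        },
    ],
}

# user.name query param assumes simple (non-Kerberos) auth on the RM.
resp = requests.post(
    "http://resourcemanager:8088/app/v1/services?user.name=ml-engineer",
    data=json.dumps(service_spec),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)
```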
Apache Hadoop 3 is coming! As the next major milestone for Hadoop and big data, it is attracting everyone's attention as it showcases several bleeding-edge technologies and significant features across all components of Apache Hadoop: Erasure Coding in HDFS, Docker container support, Apache Slider integration and native service support, Application Timeline Service version 2, Hadoop library updates and client-side classpath isolation, etc. In this talk, we will first give an update on the status of the Hadoop 3.0 release work in the Apache community and the path through alpha and beta towards GA. Then we will dive deep into each new feature, covering its development progress and maturity status in Hadoop 3. Last but not least, as a new major release, Hadoop 3.0 will contain some incompatible API or CLI changes that could make upgrades challenging for downstream projects and existing Hadoop users; we will walk through these major changes and explore their impact on other projects and users.
Speaker: Sanjay Radia, Founder and Chief Architect, Hortonworks
Apache Ambari is an extensible framework that simplifies provisioning, managing and monitoring Hadoop clusters. Apache Ambari was built on a standardized stack-based operations model. Stacks wrap services of all shapes and sizes with a consistent definition and lifecycle-control layer; thereby providing a consistent approach for managing and monitoring the services. This also provided a natural extension point for operators and the community to bring in their own add-on services and “plug-in” the new services into the stack.
However, one of the fundamental limitations of the current Apache Ambari architecture has been the strong one-to-one coupling between entities. For instance, a cluster is tied to a single stack and a Hadoop operator can only deploy services defined in that stack, a cluster can have only a single instance of a service, and a host can have only a single instance of a component. Taking into consideration the various use-case scenarios that cannot be enabled due to these limitations, there is a growing need to revamp the Ambari architecture.
In this talk, we propose a revamped Apache Ambari architecture that will open up the floodgates for a wide range of scenarios that wouldn’t have been possible thus far. We will focus the discussion on a new mpack-based operations model that will replace the stack-based operations model. A management package is a self-contained deployment artifact that includes all the details for deploying, managing and upgrading a set of services bundled in the package. A third-party provider can also build their own management package containing their custom services. This eliminates the need to plug their services into a stack and lets them define their own upgrade story for these custom services. A Hadoop operator will be able to deploy a Hadoop cluster with a mix of services across multiple packages instead of being limited to a single stack. For example, it would be possible to deploy a cluster with HDFS from HDP and NiFi from HDF.
Further, we will also discuss the architectural changes needed to enable a multi-instance architecture in future Ambari releases: supporting multiple instances of a service in a cluster and multiple instances of a component on a host, as well as future-proofing the Ambari architecture to leverage some of the advancements happening in the Hadoop community, like YARN services (YARN-4692). We will wrap up the conversation with a brief overview of other improvements planned for future releases of Ambari.
Apache Ambari is now the preferred way of provisioning, managing and monitoring Hadoop clusters. Ambari helps users manage Hadoop clusters, simplifying actions such as upgrades, configuration management, service management, etc. From release 2.0, Ambari started supporting automated Rolling Upgrades. This was further enhanced in release 2.2.0.0 with support for Express Upgrades, which allow users to upgrade large-scale clusters faster but require cluster downtime.
This talk will cover planning and execution of Hadoop cluster upgrades from an operational perspective. The talk will also cover the internals of the upgrade process, including the various stages such as pre-upgrade, backup, service checks, configuration upgrades, and finalization. Finally, the talk will cover troubleshooting upgrade failures, monitoring services during upgrades, and post-upgrade actions. The presentation will conclude with a case study of how the upgrade process works on a large cluster (including aspects such as planning the upgrade, the amount of time required for the various stages, and troubleshooting).
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems (Hortonworks)
Customers are preparing themselves to analyze and manage an increasing quantity of structured and unstructured data. Business leaders introduce new analytical workloads faster than IT departments can handle. Legacy IT infrastructure needs to evolve to deliver operational improvements and cost containment, while increasing flexibility to meet future requirements. By providing HDP on IBM Power Systems, Hortonworks and IBM are giving customers more choice in selecting the architectural platform that is right for them. In this webinar, we’ll discuss some of the challenges with deploying big data platforms, and how choosing solutions built with HDP on IBM Power Systems can offer tangible benefits and flexibility to accommodate changing needs.
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop (DataWorks Summit)
Analytics and machine learning continue to be the top use cases for deploying big data platforms such as Hadoop. SAS recognised the potential and power of the Hadoop platform early on and has been integrating analytical solutions with Hadoop to leverage its power and flexibility for analytical workloads. The combination of SAS and Hadoop offers developers and organisations an approach that can accelerate the development and deployment of big data analytics applications that are mature, proven and scalable. Furthermore, by giving developers and analysts analytical applications that are rich, proven and collaborative, SAS allows more users across different skill levels to unleash the value of data stored in the big data platform more easily and quickly.
In this session, we will cover common big data analytics use cases, the depth and breadth of SAS analytical capabilities on Hadoop, and how SAS solutions are integrated into the Hadoop ecosystem via technologies such as Hive, YARN and Spark.
Speaker
Felix Liao, SAS Institute Australia & New Zealand
Apache Hadoop YARN is a modern resource-management platform that can host multiple data processing engines for various workloads like batch processing (MapReduce), interactive SQL (Hive, Tez), real-time processing (Storm), existing services and a wide variety of custom applications. These applications can all co-exist on YARN and share a single data center in a cost-effective manner with the platform worrying about resource management, isolation and multi-tenancy.
YARN is now adding support for services in a first-class manner. This talk will first cover the challenges of running services on YARN, and then move on to the changes that were made to the ResourceManager to support scheduling services on YARN (such as affinity and anti-affinity). The talk will then cover the changes made in the NodeManager and features such as container restart and container upgrades. The talk will also cover new additions to YARN like the new application manager (which will allow users to bring services workloads onto YARN by providing features such as container orchestration and management) and the DNS server that uses the YARN registry to enable service discovery.
Cost-based query optimization in Apache Hive 0.14 (Julian Hyde)
Tez is making Hive faster, and now cost-based optimization (CBO) is making it smarter. A new initiative in Hive introduces cost-based optimization for the first time, based on the Optiq framework. Optiq's lead developer Julian Hyde shows the improvements that CBO is bringing in Apache Hive 0.14.
For those interested in Hive internals, he gives an overview of the Optiq framework and shows some of the improvements that are coming to future versions of Hive with the Stinger.next initiative.
SQL on Hadoop: Batch, Interactive and Beyond
A public presentation showing the history of SQL on Hadoop and where Hortonworks is looking to go with 100% open source technology.
Apache Hive, Apache SparkSQL, Apache Phoenix, and Apache Druid
Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. The Ambari engineering team will talk about improvements being made to the automation, metrics, logging, upgrade, and other core frameworks within Ambari as the project is being re-imagined.
Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects on a few operating systems and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
As part of this talk, the engineering team will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari community has changed the product to handle them. The future is fast approaching, and with it comes new on-premise and cloud deployment architectures. See how Apache Ambari is being re-imagined to handle these new challenges.
Speaker
Paul Codding, Product Management Director, Hortonworks
Oliver Szabo, Senior Software Engineer, Hortonworks
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics... (VMware Tanzu)
SpringOne Platform 2016
Speaker: Ian Fyfe; Director, Product Marketing, Hortonworks
Apache Hadoop is the most powerful and popular platform for ingesting, storing and processing enormous amounts of “big data”. However, due to its original roots as a batch processing system, doing interactive business analytics with Hadoop has historically suffered from slow response times, or forced business analysts to extract data summaries out of Hadoop into separate data marts. This talk will discuss the different options for implementing speed-of-thought business analytics and machine learning tools directly on top of Hadoop including Apache Hive on Tez, Apache Hive on LLAP, Apache HAWQ and Apache MADlib.
Atlanta meetup presentation: a discussion of big data processing engines (Hive, HBase, Druid, Spark). Weighs the relative strengths of each engine and the use cases each engine is most suited for.
Future of Data New Jersey - HDF 3.0 Deep Dive (Aldrin Piri)
Presentation on new features of HDF 3.0 presented on August 8, 2017 to the Future of Data: New Jersey Meetup group. This event was hosted by Honeywell in Morris Plains, NJ.
https://www.meetup.com/futureofdata-princeton/events/240972326/
In 2017, more and more corporations are looking to reduce operational overheads in their enterprise data warehouse (EDW) installations. Hortonworks just launched the industry’s first turnkey EDW optimization solution together with our partners Syncsort and AtScale. Join Hortonworks’ CTO Scott Gnau to learn more about this exciting solution and its three use cases.
Data Con LA 2018 - Streaming and IoT by Pat Alwell (Data Con LA)
Hortonworks DataFlow (HDF) is built with the vision of creating a platform that enables enterprises to build dataflow management and streaming analytics solutions that collect, curate, analyze and act on data in motion across the datacenter and cloud. Do you want to be able to provide a complete end-to-end streaming solution, from an IoT device all the way to a dashboard for your business users with no code? Come to this session to learn how this is now possible with HDF 3.1.
Apache Hive is a rapidly evolving project which continues to enjoy great adoption in the big data ecosystem. Although Hive started primarily as a batch ingestion and reporting tool, the community is hard at work improving it along many different dimensions and use cases. This talk will provide an overview of the latest and greatest features and optimizations which have landed in the project over the last year. Materialized views, micro-managed tables, and workload management are some noteworthy features.
I will deep-dive into some optimizations which promise to provide major performance gains. Support for ACID tables has also improved considerably. Although some of these features and enhancements are not novel and have existed for years in other DB systems, implementing them in Hive poses some unique challenges, and the resulting lessons are generally applicable in many other contexts. I will also provide a glimpse of what is expected to come in the near future.
Speaker: Ashutosh Chauhan, Engineering Manager, Hortonworks
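For a flavor of one of the noteworthy features above, here is a hedged sketch of a materialized view using Hive 3 syntax, submitted through PyHive; the host and tables are hypothetical.

```python
# Materialized views let Hive's optimizer transparently rewrite matching
# queries to read a pre-aggregated view instead of the base table.
from pyhive import hive

cursor = hive.connect(host="hiveserver2", port=10000).cursor()

cursor.execute("""
    CREATE MATERIALIZED VIEW sales_by_day AS
    SELECT sold_date, sum(amount) AS total_amount
    FROM sales
    GROUP BY sold_date
""")

# With rewriting enabled, this query can be answered from sales_by_day
# without scanning the (much larger) sales table.
cursor.execute("SELECT sold_date, sum(amount) FROM sales GROUP BY sold_date")
print(cursor.fetchall())
```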
Hadoop & Cloud Storage: Object Store Integration in Production (Chris Nauroth)
Today's typical Apache Hadoop deployments use HDFS for persistent, fault-tolerant storage of big data files. However, recent emerging architectural patterns increasingly rely on cloud object storage such as S3, Azure Blob Store, and GCS, which are designed for cost-efficiency, scalability and geographic distribution. Hadoop supports pluggable file system implementations to enable integration with these systems for use cases such as off-site backup or even complex multi-step ETL, but applications may encounter unique challenges related to eventual consistency, performance and differences in semantics compared to HDFS. This session explores those challenges and presents recent work to address them in a comprehensive effort spanning multiple Hadoop ecosystem components, including the object store FileSystem connectors, Hive, Tez and ORC. Our goal is to improve correctness, performance, security and operations for users that choose to integrate Hadoop with cloud storage. We use S3 and the S3A connector as a case study.
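A minimal sketch of what the integration looks like from the application side, with Spark reading and writing through the S3A connector; the bucket, paths, and credentials are placeholders.

```python
# Pointing Spark/Hadoop at S3 through the S3A connector. These properties are
# normally set in core-site.xml; inline here for illustration only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("s3a-demo")
    .config("spark.hadoop.fs.s3a.access.key", "PLACEHOLDER")
    .config("spark.hadoop.fs.s3a.secret.key", "PLACEHOLDER")
    .getOrCreate()
)

# Reads and writes go through the object store rather than HDFS; rename is a
# copy in S3, which is why the committer and consistency work matters.
df = spark.read.orc("s3a://my-bucket/warehouse/events/")
df.filter("event_type = 'click'").write.parquet("s3a://my-bucket/out/clicks/")
```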
Apache Deep Learning 101 - DWS Berlin 2018 (Timothy Spann)
Apache Deep Learning 101 with Apache MXNet, Apache NiFi, MiniFi, Apache Tika, Apache OpenNLP, Apache Spark, Apache Hive, Apache HBase, Apache Livy and Apache Hadoop. Using Python, we run various existing models via MXNet Model Server and via Python APIs. We also use NLP for entity resolution.
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase (HBaseCon)
Trafodion, open sourced by HP, reflects 20+ years of investment in a full-fledged RDBMS built on Tandem's OLTP heritage and geared towards a wide set of mixed query workloads. In this talk, we will discuss how HP integrated Trafodion with HBase to take full advantage of the Trafodion database engine and the HBase storage engine, covering 3-tier architecture, storage, salting/partitioning, data movement, and more.
Keynote slides from Big Data Spain, November 2016. Offers some thoughts on how the Hadoop ecosystem is growing and changing to support the enterprise, including Hive, Spark, NiFi, security and governance, streaming, and the cloud.
Effective data governance is imperative to the success of Data Lake initiatives. Without governance policies and processes, information discovery and analysis is severely impaired. In this session we will provide an in-depth look into the Data Governance Initiative launched collaboratively between Hortonworks and partners from across industries. We will cover the objectives of Data Governance Initiatives and demonstrate key governance capabilities of the Hortonworks Data Platform.
There is increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but for massive amounts of data, training a new model can take hours. This is a problem when the model needs to be more up-to-date; for example, when recommending TV programs while they are being transmitted, the model should take into consideration users who are watching a program at that time.
The promise of online recommendation systems is fast adaptation to changes, but methods of online machine learning from streams are commonly believed to be more restricted and hence less accurate than batch-trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system for uniting batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
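The toy sketch below conveys the batch-plus-online idea (it is not the talk's Flink or Spark code): a slowly refreshed batch score is blended with an exponentially decayed online counter, so a live burst of viewers shifts recommendations quickly.

```python
# Blend a batch-trained prior with decayed online evidence per item.
batch_score = {"news": 0.8, "sports": 0.5}   # e.g., from nightly training
online_count, last_seen = {}, {}
HALF_LIFE = 300.0                            # seconds; assumed tuning knob

def decayed(item, now):
    gap = now - last_seen.get(item, now)
    return online_count.get(item, 0.0) * 0.5 ** (gap / HALF_LIFE)

def observe(item, now):                      # called per event on the stream
    online_count[item] = decayed(item, now) + 1.0
    last_seen[item] = now

def score(item, now):                        # batch prior + online evidence
    return batch_score.get(item, 0.0) + 0.1 * decayed(item, now)

now = 0.0
for _ in range(50):                          # burst: a program starts airing
    observe("sports", now)
print(sorted(batch_score, key=lambda i: -score(i, now)))  # ['sports', 'news']
```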
DeepLearning is not just hype - it outperforms state-of-the-art ML algorithms, one by one. In this talk we will show how DeepLearning can be used for detecting anomalies on IoT sensor data streams at high speed, using DeepLearning4J on top of different big data engines like Apache Spark and Apache Flink. Key in this talk is the absence of any large training corpus, since we are using unsupervised machine learning - a domain that current DL research treats step-motherly. As we can see in this demo, LSTM networks can learn very complex system behavior - in this case, data coming from a physical model simulating bearing vibration data. One drawback of DeepLearning is that normally a very large labeled training data set is required. This is particularly interesting since we can show how unsupervised machine learning can be used in conjunction with DeepLearning - no labeled data set is necessary. We are able to detect anomalies and predict breaking bearings with 10-fold confidence. All examples and all code will be made publicly available and open sourced. Only open source components are used.
QE automation for large systems is a great step forward in increasing system reliability. In the big data world, multiple components have to come together to provide end users with business outcomes. This means that QE automation scenarios need to be detailed around actual use cases, cutting across components. The system tests potentially generate large amounts of data on a recurring basis, and verifying it is a tedious job. Given the multiple levels of indirection, the rate of false positives relative to actual defects is higher, and they are generally wasteful.
At Hortonworks, we’ve designed and implemented an automated log analysis system, Mool, using statistical data science and ML. The work in progress currently has a batch data pipeline followed by an ensemble ML pipeline, which feeds into the recommendation engine. The system identifies the root cause of test failures by correlating the failing test cases with current and historical error records across multiple components. The system works in unsupervised mode, with no perfect model, stable build, or source-code version to refer to. In addition, the system provides limited recommendations to file or reopen past tickets and compares run profiles with past runs.
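The toy sketch below conveys the correlation step in spirit (an assumed approach, not Mool's implementation, with hypothetical ticket IDs): score a new failure's log against historical failures by token overlap and surface the closest past root cause.

```python
# Nearest historical failure by Jaccard similarity over log tokens.
def tokens(log):
    return {t for t in log.lower().split() if len(t) > 3}

history = {  # hypothetical past tickets mapped to their error signatures
    "HIVE-123: metastore connection pool exhausted":
        "error timeout acquiring connection from pool datanucleus metastore",
    "YARN-456: container killed, vmem limit exceeded":
        "container killed exceeded virtual memory limit pmem vmem",
}

def best_match(new_log):
    new = tokens(new_log)
    def jaccard(old_log):
        old = tokens(old_log)
        return len(new & old) / len(new | old)
    return max(history, key=lambda ticket: jaccard(history[ticket]))

print(best_match("ERROR timeout while acquiring metastore connection from pool"))
# -> HIVE-123: metastore connection pool exhausted
```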
Improving business performance is never easy! The Natixis Pack is like Rugby. Working together is key to scrum success. Our data journey would undoubtedly have been so much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Use the platform appropriately (not every problem calls for Big Data, and standard databases are not dead)
• What are the most usable, the most interesting and the most promising technologies in the Apache Hadoop community
We will finish the story of a successful rugby team with insight into the special skills needed from each player to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security, and project management, as well as the industrial usage of Hive, HBase, Kafka, and Spark within a Corporate and Investment Bank environment framed by regulatory constraints.
HBase has established itself as the backend for many operational and interactive use cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features, HBase has come a long way, offering advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tuneable write-ahead logging, and so on. This talk is based on the research for the upcoming second edition of the speaker's HBase book, combined with practical experience from medium to large HBase projects around the world. You will learn how to plan for HBase, from selecting the matching use cases and determining the number of servers needed, through to performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
There has been an explosion of data digitising our physical world – from cameras, environmental sensors and embedded devices, right down to the phones in our pockets. This means that companies now have new ways to transform their businesses – both operationally, and through their products and services – by leveraging this data and applying fresh analytical techniques to make sense of it. But are they ready? In most cases, the answer is “no”.
In this session, we’ll be discussing the challenges facing companies trying to embrace the Analytics of Things, and how Teradata has helped customers work through and turn those challenges to their advantage.
In this talk, we will present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s on Spotify's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HDFS (HopsFS) and for the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared-nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity Scheduler, enabling scalability to clusters with >20K nodes. We will discuss the journey of how we reached this milestone, including some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open source and supports a pluggable database backend for distributed metadata, although it currently only supports MySQL Cluster as a backend. Hops opens up new directions for Hadoop when metadata is available for tinkering in a mature relational database.
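To make "partition-pruned index scans" concrete, here is a toy sketch (not HopsFS code; all names and the partitioning scheme are illustrative assumptions) of the underlying idea: inodes stored as rows keyed by (parent_id, name) and partitioned by parent_id, so that listing a directory touches exactly one database partition.

```python
# Toy sketch: inodes as rows keyed by (parent_id, name), partitioned by
# parent_id, so a directory listing is a single-partition ("pruned") scan.
NUM_PARTITIONS = 4
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def put_inode(parent_id, name, inode_id):
    partitions[parent_id % NUM_PARTITIONS][(parent_id, name)] = inode_id

def list_dir(parent_id):
    part = partitions[parent_id % NUM_PARTITIONS]  # touches one partition only
    return [name for (pid, name), _ in part.items() if pid == parent_id]

def resolve(path):
    """Resolve /a/b/c by walking (parent_id, name) lookups from the root."""
    inode = 0  # root inode id
    for comp in filter(None, path.split("/")):
        inode = partitions[inode % NUM_PARTITIONS][(inode, comp)]
    return inode

put_inode(0, "user", 1)
put_inode(1, "alice", 2)
print(resolve("/user/alice"), list_dir(1))  # 2 ['alice']
```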
In high-risk manufacturing industries, regulatory bodies stipulate continuous monitoring and documentation of critical product attributes and process parameters. On the other hand, sensor data coming from production processes can be used to gain deeper insights into optimization potential. By establishing a central production data lake based on Hadoop, with Talend Data Fabric as the basis for a unified architecture, the German pharmaceutical company HERMES Arzneimittel was able to meet compliance requirements as well as unlock new business opportunities, enabling use cases like predictive maintenance, predictive quality assurance, and open-world analytics. Learn how Talend Data Fabric enabled HERMES Arzneimittel to become data-driven and transform Big Data projects from challenging, hard-to-maintain hand-coding jobs into repeatable, future-proof integration designs.
Talend Data Fabric combines Talend products into a common set of powerful, easy-to-use tools for any integration style: real-time or batch, big data or master data management, on-premises or in the cloud.
While you could be tempted to assume that data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like "What happens if the entire datacenter fails?" or "How do I recover into a consistent state of data, so that applications can continue to run?" are not at all trivial to answer for Hadoop. Did you know that HDFS snapshots do not treat open files as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross-region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic, as most customers are still in the "single use case" or PoC phase, where data governance concerns such as backup and disaster recovery (BDR) are not (yet) important. This talk first introduces you to the overarching issues and difficulties of backup and data safety, looking at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, the management components and so on, and finally shows you a viable approach using built-in tools. You will also learn not to take this topic lightly, and what is needed to implement and guarantee continuous operation of Hadoop cluster based solutions.
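As a minimal sketch of one built-in tool in this space, the following wraps HDFS's standard snapshot commands (`hdfs dfsadmin -allowSnapshot` and `hdfs dfs -createSnapshot`) to take rolling snapshots of selected directories. The directory paths and naming scheme are illustrative assumptions, and the caveat above about open files still applies.

```python
import subprocess
from datetime import datetime, timezone

# Illustrative paths; adapt to your own cluster layout.
DIRS = ["/data/warehouse", "/data/landing"]

def snapshot(path):
    name = "bdr-" + datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    # The directory must first be made snapshottable (an admin operation).
    subprocess.run(["hdfs", "dfsadmin", "-allowSnapshot", path], check=True)
    # Create a named snapshot; it appears under <path>/.snapshot/<name>.
    subprocess.run(["hdfs", "dfs", "-createSnapshot", path, name], check=True)
    return f"{path}/.snapshot/{name}"

for d in DIRS:
    print("created", snapshot(d))
```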
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Smart TV Buyer Insights Survey 2024 (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, the aspects they look for in a new TV, and their TV buying preferences.
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI-powered automation capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and networking with automation peers.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
Climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori (Peter Spielvogel)
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
The Metaverse and AI: how can decision-makers harness the Metaverse for their... (Jen Stirrup)
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
ORM (object relational mapping)
- Combines exhaustive search with a greedy algorithm
- Exhaustive search finds every possible plan using rewriting rules
- Not practical for a large number of joins
- Greedy algorithm builds the plan iteratively
- Uses a heuristic to choose the best join to add next (see the sketch below)
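A minimal sketch of this greedy strategy, using illustrative table names and a toy cost heuristic based on estimated output cardinality (not any particular optimizer's code):

```python
import itertools

def estimate(card_a, card_b, selectivity=0.1):
    # Toy cost heuristic: estimated output cardinality of joining two inputs.
    return card_a * card_b * selectivity

def greedy_join_order(tables):  # tables: {name: cardinality}
    remaining = dict(tables)
    # Seed with the cheapest two-table join.
    a, b = min(itertools.combinations(remaining, 2),
               key=lambda p: estimate(remaining[p[0]], remaining[p[1]]))
    order, card = [a, b], estimate(remaining.pop(a), remaining.pop(b))
    while remaining:
        # Greedily add whichever remaining table joins most cheaply.
        nxt = min(remaining, key=lambda t: estimate(card, remaining[t]))
        card = estimate(card, remaining.pop(nxt))
        order.append(nxt)
    return order

print(greedy_join_order({"orders": 1_000_000, "customers": 50_000,
                         "nations": 25, "lineitem": 6_000_000}))
# -> ['nations', 'customers', 'orders', 'lineitem']
```

Unlike exhaustive search, this considers only one candidate per step, so it stays practical for many joins at the cost of possibly missing the optimal plan.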