Cutting-edge Hadoop clusters are bound to need custom (add-on) services that are not available in the Hadoop distribution of their choice. Agility is crucial for companies to integrate any service into existing large-scale Hadoop clusters with ease.
Apache Ambari manages the Hadoop cluster and solves this problem by extending the stack with add-on services, each of which can be a new Apache project, a different Hadoop file system, or an internal tool. This talk covers how to create a service definition in Ambari to manage lifecycle commands and configs, plus advanced topics like packaging, installing from multiple repositories, recommending and validating configs using Service Advisor, running custom commands, defining dependencies on configs and other services, and more. We will also cover how to create custom metrics and dashboards using Ambari Metrics System and Grafana, how to generate alerts, and how to enable security by authenticating with Kerberos.
Further, we will discuss the future of service definitions and how Ambari 3.0 will support custom services through Management Packs to enable Hadoop vendors to release software faster.
Speaker
Jayush Luniya, Principal Software Engineer, Hortonworks
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
DataWorks Summit
Apache Ambari manages Hadoop at large scale, and it becomes increasingly difficult for cluster admins to keep the machinery running smoothly as data grows and clusters scale from 30 to 3000 agents. To test at scale, Ambari has a Performance Stack that allows a VM to host as many as 50 Ambari Agents. The simulated stack, with 50 Agents per VM, can stress-test Ambari Server with the same load as a 3000-node cluster. This talk will cover how to tune the performance of Ambari and MySQL, and share performance benchmarks for features like deploy times, bulk operations, installation of bits, and Rolling and Express Upgrade. Moreover, the speaker will show how to use Ambari Metrics System and Grafana to plot performance and detect anomalies, and will share tips on improving performance for a more responsive experience. Lastly, the talk will discuss roadmap features in Ambari 3.0 for improving performance and scale.
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
DataWorks Summit
Apache Knox Gateway is a proxy for interacting with Apache Hadoop clusters securely. It provides authentication, service-level authorization, and many other extensions to secure any HTTP interaction in your cluster. One main feature of Apache Knox Gateway is the ability to extend the reach of your REST APIs to the internet while still securing your cluster and working with Kerberos. Recent contributions to the Apache Knox community have added support for Single Sign On (SSO) based on Pac4j 1.8.9, a security engine that provides SSO support through SAML2, OAuth, OpenID, and CAS. In addition, through recent community contributions, Apache Ambari and Apache Ranger can now also provide SSO authentication through Knox. This paper discusses the architecture of Knox SSO, explains how enterprise users can benefit from this feature, and presents enterprise use cases for Knox SSO, including integration with open source Shibboleth, ADFS (Windows Server IdP), and the Okta cloud IdP.
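Apache kafka performance(throughput) - without data loss and guaranteeing dat...
SANG WON PARK
Sharing test results that show how much performance Apache Kafka delivers in a specific environment (no data loss, with message ordering strictly guaranteed).
To guarantee message ordering, partitions cannot be distributed across the Apache Kafka cluster, so the performance benefit of partitioning is not available.
This test therefore measures only Apache Kafka's unit performance, that is, the performance of a single partition.
If the number of partitions is increased later, performance can likely be estimated from the single-partition figure measured here.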
Apache Kafka is becoming the message bus for transferring huge volumes of data from various sources into Hadoop.
It is also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices for deploying Apache Kafka in production: how to secure a Kafka cluster, how to pick topic partitions, how to upgrade to newer versions, and how to migrate to the new Kafka Producer and Consumer APIs.
We will also talk about the best practices involved in running a producer/consumer.
In the Kafka 0.9 release, we added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Kafka now allows authentication of users and access control over who can read from and write to a Kafka topic. Apache Ranger also uses the pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase an open-sourced Kafka REST API and an Admin UI that help users create topics, reassign partitions, issue Kafka ACLs, and monitor consumer offsets.
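Understanding metrics for Apache Kafka monitoring, and optimization approaches
SANG WON PARK
As Apache Kafka's role in big data architectures grows and it takes on greater importance, concerns about its performance are growing as well.
Across various projects, I worked to understand the metrics needed to monitor Apache Kafka and summarized the configuration settings used to optimize against them.
[Understanding metrics for Apache Kafka monitoring, and optimization approaches]
Covers the metrics needed to monitor Apache Kafka performance and summarizes how to optimize performance from four perspectives (throughput, latency, durability, availability). For each of the three modules that make up Kafka (Producer, Broker, Consumer), performance optimization …
[Understanding metrics for Apache Kafka monitoring]
To monitor the state of Apache Kafka, you need to look at the metrics produced in four areas: System (OS), Producer, Broker, and Consumer.
This post organizes the producer/broker/consumer indicators around the JMX metrics provided by the JVM.
It does not cover every indicator; it reflects my understanding of the ones I consider meaningful.
[Optimizing Apache Kafka performance configuration]
Performance goals are divided into four (Throughput, Latency, Durability, Availability), and for each goal the post summarizes which Kafka configuration settings to adjust and how.
After applying the tuned parameters, you need to run performance tests and monitor the resulting metrics in order to optimize for your current workload.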
[OpenInfra Days Korea 2018] (Track 1) TACO (SKT All Container OpenStack): Clo...
OpenStack Korea Community
- Due to a font issue, please download the slides here: http://bit.ly/openinfradays-day1-skt-taco
- Presenter: 안재석, SK Telecom
- Description: https://event.openinfradays.kr/2018/session1/track_1_4
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given for 2nd-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
Data in Hadoop is getting bigger every day and the number of data consumers is growing, so organizations are now looking at making their Hadoop clusters compliant with federal regulations and commercial demands. Apache Ranger simplifies the management of security policies across all components in Hadoop. Ranger provides granular access controls to data.
The deck describes which security tools are available in Hadoop and their purpose, then moves on to discuss Apache Ranger in detail.
Apache Spark 2.3, released in February 2018, is the fourth release in the 2.x line and has a lot of new improvements. One of the notable improvements is ORC support. Apache Spark 2.3 adds a native ORC file format implementation by using the latest Apache ORC 1.4.1. Users can switch between “native” and “hive” ORC file formats. The Hive ORC file format is the existing one, used up to Spark 2.2.
In this talk, I'll cover three key changes. First of all, performance. The new native ORC implementation is 2x to 11x faster on the 10TB TPC-DS benchmark. Vectorized query execution over ORC files improves Spark ORC query execution greatly. In particular, ORC filter pushdown can be faster than Parquet due to in-file indexes. Second, as a part of native ORC support, Spark 2.3 can convert Hive ORC tables into Spark ORC data sources automatically. This solves several existing ORC issues, and Spark 2.4 will enable it by default. Last, but not least, Spark 2.3 officially supports Structured Streaming over ORC data sources. You can create a streaming dataset over ORC files.
Speaker
Dongjoon Hyun, Staff Software Engineer, Hortonworks
Performance Optimizations in Apache Impala
Cloudera, Inc.
Apache Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, not delivered by batch frameworks such as Hive or Spark. Impala is written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard components (HDFS, HBase, Metastore, Sentry) and is able to read the majority of the widely-used file formats (e.g. Parquet, Avro, RCFile).
To reduce latency, such as that incurred from utilizing MapReduce or by reading data remotely, Impala implements a distributed architecture based on daemon processes that are responsible for all aspects of query execution and that run on the same machines as the rest of the Hadoop infrastructure. Impala employs runtime code generation using LLVM in order to improve execution times and uses static and dynamic partition pruning to significantly reduce the amount of data accessed. The result is performance that is on par with or exceeds that of commercial MPP analytic DBMSs, depending on the particular workload. Although initially designed for running on-premises against HDFS-stored data, Impala can also run on public clouds and access data stored in various storage engines such as object stores (e.g. AWS S3), Apache Kudu and HBase. In this talk, we present Impala's architecture in detail and discuss the integration with different storage engines and the cloud.
Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)
Lee Myring
A quick introduction to log aggregation in a local Docker development environment using Fluentd followed by a demonstration using a publicly available GitHub repo.
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
The event, held on 11th December 2018, was a technical presentation about running MS SQL Server 2017 on Linux. We started off by using containers and proceeded to look at High Availability and Data Protection, more specifically:
- Supported features & Linux differences
- Installing SQL Server on a Linux Container
- Accessing SMB 3.0 shared storage using Samba
- Setting up a Failover Cluster using Pacemaker
- Setting up AlwaysOn Availability Groups using Pacemaker
- Authenticating to SQL Server using AD Authentication
- Setting up Read-Scale Cross-Platform Availability Groups
https://techspark.mt/sql-server-on-linux-11th-december-2018/
Tackle Containerization Advisor (TCA) for Legacy Applications
Konveyor Community
Recording of presentation: https://youtu.be/VapEooROERw
With the adoption of cloud services and the reliability and resiliency it offers, enterprises are eager to understand how many of their legacy applications can be containerized.
We propose Tackle Containerization Advisor (TCA), a framework that provides a containerization advisory for legacy applications.
Given an application description in terms of its technical components, TCA proposes a multi-step process that standardizes the raw inputs and curates technology stack into various components, detects missing components and finally recommends the best possible containerization approach.
Presenter: Anup Kalia, Research Staff Member @ IBM Research
GitHub: https://github.com/konveyor/tackle-container-advisor
Building RESTful services using SCA and JAX-RS
Luciano Resende
REST is an important aspect of the Web 2.0 world. Building RESTful services can be a challenge, as REST is just an architectural style. JAX-RS emerges as the programming model that guides Java developers in building REST services. On the other hand, we often need to assemble services, both RESTful and traditional, into an enterprise composite application. SCA gives us the power to define and compose services in a technology-neutral fashion. This talk shares the ideas we explored in the Apache Tuscany project for combining the power of SCA and JAX-RS, using the JAX-RS runtime from the Apache Wink project. The Tuscany Java SCA runtime provides integration with REST services out of the box via several extensions. The Tuscany REST binding (binding.rest) leverages JAX-RS annotations to map business operations to HTTP operations such as POST, GET, PUT and DELETE, providing a REST view of SCA services. The REST binding also allows SCA components to invoke existing RESTful services via JAX-RS annotated interfaces, without having to deal with HTTP clients directly. JAX-RS applications and resources can be dropped into the SCA assembly as a JAX-RS implementation (implementation.jaxrs). Tuscany also enriches the JAX-RS runtime with more databindings to support data representations and transformations without intervention from application code. This session will teach you how to model, implement, invoke and expose RESTful services using SCA and JAX-RS. We'll walk you through a sample application developed using Apache Tuscany and Wink.
In this talk we will look at what is in the new v0.13 release and what to look forward to. Besides the awaited module improvements and the new provider syntax and ecosystem, we will also focus on other changes, such as simplified Terraform Cloud collaboration, the newly stable validation rules feature, and other improvements. We will also take a quick look at planning your upgrade around possible breaking changes and at how to get started with the v0.13 release.
Real-time Big Data Analytics Engine using Impala
Jason Shih
Cloudera Impala is an open-source engine, released under the Apache License, that enables real-time, interactive analytical SQL queries over data stored in HBase or HDFS. The work was inspired by Google's Dremel paper, which is also the basis for Google BigQuery. It provides access to the same unified storage platform through its own distributed query engine and does not use MapReduce. In addition, it uses the same metadata, SQL syntax (HiveQL-like), ODBC driver, and user interface (Hue Beeswax) as Hive. Alongside the traditional Hadoop approach, which aims to provide a low-cost solution for resilient, batch-oriented distributed data processing, more and more effort in the Big Data world is going into finding the right solution for ad-hoc, fast queries and real-time processing of large datasets. In this presentation, we'll explore how to run interactive queries in Impala, the advantages of the approach, its architecture, and how it optimizes data systems, including a practical performance analysis.
Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop and Hortonworks Data Platform (HDP). Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers views to the end user. In this session Hortonworks will cover 3 key integration points of Apache Ambari including Stacks, Views and Blueprints and deliver working examples of each.
WebSphere Technical University: Top WebSphere Problem Determination Features
Chris Bailey
Problem determination is an important focus area in the IBM WebSphere Application Server. Serviceability improvements have been added that have greatly improved the ability to find root causes of problems in both the full IBM WebSphere Application Server profile and the newer Liberty profile. The session focuses on how to effectively use the serviceability improvements added to the application server since V8.0. These include high performance extensible logging, cross-component trace, IBM Support Assistant data collector, timed operations, memory leak detection/prevention, and IBM Support Assistant 5.
Presented at the WebSphere Technical University 2014, Dusseldorf
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
Timofey Turenko
The presentation describes the CI environment for our product, MaxScale, a database proxy server. To test such a product we need a setup that consists of tens of machines: locally hosted virtual machines as well as machines from different clouds. All our Jenkins jobs are implemented as Jenkins Job Builder code. The presentation also covers MDBCI, our tool for managing virtual machines (a wrapper over Vagrant).
WildFly Core is a fully modular application server, which is used as the base to build the WildFly EE container and much more. Functionality such as EE is implemented as sets of extensions, also known as subsystems.
Extensions give you low level access to application server’s functionalities such as
JBoss Modules for class loading
Domain management model
Deployment processors
Modular Service Container (aka service kernel)
Apache Ambari provides a 100% open source and intuitive set of tools to monitor, manage and efficiently provision your Apache Hadoop cluster. Ambari simplifies the operation and hides the complexity of Hadoop, making Hadoop appear like a single, cohesive data platform. Hadoop cluster provisioning and ongoing management can be a complicated task, especially when there are hundreds or thousands of nodes involved. Ambari allows you to control Hadoop cluster services from a single point. In this session, we will provide an overview of the Apache Ambari key features, architecture and web service-based APIs.
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
5. Stack Terminologies
Term | Definition | Examples
STACK | Defines a set of Services, where to obtain the software packages and how to manage the lifecycle. | HDP-2.3, HDP-2.2
SERVICE | Defines the Components that make up the service. | HDFS, YARN
COMPONENT | The building blocks of a Service, which adhere to a certain lifecycle. | NAMENODE, DATANODE, OOZIE_SERVER
CATEGORY | The category of a Component. | MASTER, SLAVE, CLIENT
REPO | Repository metadata where the artifacts reside. | http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.3.0.0
6. Ambari Stacks
• Stacks define Services + Repo
• What is a stack composed of and where to get the bits
• Each service has a definition
• What components are part of the Service
• Each service has defined lifecycle commands
• start, stop, status, install, configure
• Lifecycle is controlled via command scripts
[Diagram: Ambari Server with Stack (Service Definitions in xml, Command Scripts in python), Ambari Agents, Repos]
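To make the idea of lifecycle commands driven by a command script concrete, here is a minimal, self-contained sketch. It does not use Ambari's actual script base class or agent code; the class name `ExampleServiceScript`, the `run_command` helper, and the state it tracks are all hypothetical and only illustrate how a command name such as start or status could be dispatched to a method.

```python
# Hypothetical sketch: NOT Ambari's real base class or agent dispatch code.
class ExampleServiceScript:
    """Maps lifecycle command names to methods, as a command script might."""

    def __init__(self):
        self.installed = False
        self.running = False
        self.log = []  # records the order in which commands ran

    def install(self):
        self.installed = True
        self.log.append("install")

    def configure(self):
        self.log.append("configure")

    def start(self):
        # Assumption: configuration is re-applied before every start.
        self.configure()
        self.running = True
        self.log.append("start")

    def stop(self):
        self.running = False
        self.log.append("stop")

    def status(self):
        # Report whether the component is currently running.
        return "RUNNING" if self.running else "STOPPED"


# The lifecycle commands listed on the slide.
LIFECYCLE_COMMANDS = ("install", "configure", "start", "stop", "status")


def run_command(script, command):
    """Dispatch a lifecycle command name to the matching method."""
    if command not in LIFECYCLE_COMMANDS:
        raise ValueError("unknown lifecycle command: %s" % command)
    return getattr(script, command)()


if __name__ == "__main__":
    script = ExampleServiceScript()
    for cmd in ("install", "start"):
        run_command(script, cmd)
    print(run_command(script, "status"))
```

Keeping the dispatch in one place means a new component only has to supply the lifecycle methods, which is the property the stack framework relies on.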
7. Ambari Stack Framework
• Ambari’s extensible stack framework allows different stack vendors to create their own custom stacks.
9. Current State
• Extensible Framework
• Provides ability for onboarding new Hadoop distributions
• Key role in avoiding vendor lock-in
• Stack driven
• Service lifecycle management
• Custom service commands
• UI layouts and themes
• Upgrade packs
• Stack advisors
• Metrics
• Kerberos
• Stack Inheritance
• Code reuse between stack versions
• Common Service Definitions
• Code reuse between multiple stacks
• ODPi Operations Spec
10. Limitations – Common Services
• AMBARI-7201 focused on adding notion of common services
• Service definition scripts were not refactored
• Common service scripts still tightly coupled with HDP distribution
• Hardcoded references for stack name (HDP), stack version checks (HDP-2.2+), stack tools (hdp-select, conf-select), installation paths (/usr/hdp), method names (format_hdp_stack_version), etc.
11. Limitations – Custom Services
• Rudimentary approach to adding third-party custom services (see Ambari Wiki)
• No release vehicle for building, releasing and deploying custom services
• No service level extension points for stack features
• Role command order
• Stack advisor
• Upgrade packs
• Repositories
• Custom service definitions are not self-contained and require hacks inside stack definitions
12. Limitations – Release Management
• Ambari core and stack definitions released together
• Bug fix in stack definition requires Ambari release
• New stack version requires Ambari release
• Stack releases and Ambari releases have to be coordinated
• Need to decouple stack definition releases from Ambari core release
14. Stack Featurization
• Provides a basic framework for refactoring stack-specific hardcoded logic in common service definitions.
• Features are defined at the stack level and determine which execution logic in a service definition is triggered.
• All common service definitions have been stack featurized and HDP-specific hardcodings have been removed.
• Stacks similar to HDP can now use common service definitions.
• EPIC: AMBARI-13363
Apache JIRA | Description | Target Release
AMBARI-15420 | Generalize resource management library | 2.4.0.0
AMBARI-13364 | Parameterize stack information used by common services | 2.4.0.0
AMBARI-15329 | Remove HDP hardcoding | 2.4.0.0
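The idea of stack featurization can be sketched as a feature table that service code queries instead of hardcoding checks such as "HDP-2.2+". The feature names, version bounds, and the helper `supports_feature` below are hypothetical and chosen only for illustration; they are not taken from Ambari's actual feature definitions.

```python
# Hypothetical feature table: names and versions are illustrative only.
STACK_FEATURES = {
    "rolling_upgrade": {"min_version": "2.2.0.0"},
    "config_versioning": {"min_version": "2.3.0.0"},
    "legacy_tool": {"min_version": "2.0.0.0", "max_version": "2.2.0.0"},
}


def parse_version(version):
    """Turn a dotted version string such as '2.3.0.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_feature(features, feature, stack_version):
    """True if stack_version lies in [min_version, max_version) for the feature."""
    spec = features.get(feature)
    if spec is None:
        return False  # unknown features are treated as unsupported
    current = parse_version(stack_version)
    if "min_version" in spec and current < parse_version(spec["min_version"]):
        return False
    if "max_version" in spec and current >= parse_version(spec["max_version"]):
        return False
    return True


if __name__ == "__main__":
    # Service logic branches on a feature, not on a stack name or version.
    if supports_feature(STACK_FEATURES, "rolling_upgrade", "2.3.0.0"):
        print("rolling upgrade steps enabled")
```

Because the table lives at the stack level, a stack similar to HDP could ship its own table and reuse the same service code unchanged.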
18. Service Level Extension Points
Apache JIRA | Description | Ambari Wiki | Target Release
AMBARI-15388 | Upgrade Pack Extensions | Ambari Wiki | 2.4.0.0
AMBARI-15226 | Stack Advisor Extensions | Ambari Wiki | 2.4.0.0
AMBARI-9363 | Role command order Extensions | Ambari Wiki | 2.4.0.0
AMBARI-11268 | Quick Links for custom services | Ambari Wiki | 2.4.0.0
AMBARI-15538 | Service-specific repo definitions | | 2.X.X.X
• Facilitate integration of third-party custom services to a stack definition
• Self-contained service definitions without requiring changes to stack definition
• Upgrade Pack Extensions
• Service-level upgrade packs that specify how to upgrade the custom service
• Defines how to integrate into the stack upgrade pack
• Stack Advisor Extensions
• Service Advisors (service_advisor.py) define recommendations and validations for the service
• Stack framework extends stack advisors with service advisors
• Role Command Order
• Define role command order at service level
• Stack framework merges all role command orders to create dependencies
• Quick Links
• Define quick links (quicklinks.json) in the service definition that are used by the Ambari Web UI
• No hardcoded quick links in the Ambari Web UI
• Repo Definitions
• Service specific repositories so that third party artifacts can reside on their own repositories
• Under development
EPIC: AMBARI-15537
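The merging of service-level role command orders described above can be sketched as combining per-service dependency maps into one. The map layout (command name to list of prerequisite commands), the function `merge_role_command_orders`, and the FOO_SERVER component are assumptions made for illustration, not Ambari's actual file format or merge code.

```python
def merge_role_command_orders(*service_orders):
    """Merge per-service {command: [prerequisites]} maps into one map.

    Prerequisites for the same command are combined, keeping first-seen
    order and dropping duplicates.
    """
    merged = {}
    for order in service_orders:
        for command, prerequisites in order.items():
            target = merged.setdefault(command, [])
            for prerequisite in prerequisites:
                if prerequisite not in target:
                    target.append(prerequisite)
    return merged


if __name__ == "__main__":
    # Hypothetical orders: one from the stack, one from a custom service.
    stack_order = {"DATANODE-START": ["NAMENODE-START"]}
    custom_order = {
        "FOO_SERVER-START": ["NAMENODE-START", "DATANODE-START"],
        "DATANODE-START": ["NAMENODE-START"],
    }
    for command, deps in merge_role_command_orders(stack_order, custom_order).items():
        print(command, "after", ", ".join(deps))
```

With this shape, a custom service states its own ordering constraints without editing the stack-level definition.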
19. Upgrade Pack Service Extension
Without Upgrade Pack Extensions:
- Upgrade logic defined at stack level
- Custom services need to modify the stack's upgrade-packs in order to integrate themselves into the cluster
With Upgrade Pack Extensions:
- Each service can define upgrade-pack XML files placed in the service's upgrades/ folder
ambari-server/src/test/resources/stacks/HDP/2.0.5/services/HDFS/upgrades/HDP/2.2.0/upgrade_test_15388.xml
<target>2.4.*</target>
<target-stack>HDP-2.4.0</target-stack>
<type>ROLLING</type>
<prerequisite-checks>
  <check>org.apache.ambari.server.checks.FooCheck</check>
</prerequisite-checks>
<order>
  <group xsi:type="cluster" name="PRE_CLUSTER" title="Pre {{direction.text.proper}}">
    <add-after-group-entry>HDFS</add-after-group-entry>
    <execute-stage service="FOO" component="BAR" title="Backup FOO">
      <task xsi:type="manual">
        <message>Back FOO up.</message>
      </task>
    </execute-stage>
  </group>
  ….
20. Service Advisor
Without Service Advisor:
- Stack advisor defined at stack level
- Custom services need to modify stack advisor files in order to dynamically recommend and validate service configurations and the layout of the service on the cluster
With Service Advisor:
- Each service can choose to define its own service advisor and explicitly extend the parent's service-advisor script
ambari-server/src/main/resources/common-services/HAWQ/2.0.0/service_advisor.py
ambari-server/src/main/resources/common-services/PXF/3.0.0/service_advisor.py
def getServiceComponentLayoutValidations(self, services, hosts):
    componentsListList = [service["components"] for service in services["services"]]
    componentsList = [item["StackServiceComponents"] for sublist in componentsListList for item in sublist]
    hawqMasterHosts = self.getHosts(componentsList, "HAWQMASTER")
    hawqStandbyHosts = self.getHosts(componentsList, "HAWQSTANDBY")
    hawqSegmentHosts = self.getHosts(componentsList, "HAWQSEGMENT")
    datanodeHosts = self.getHosts(componentsList, "DATANODE")
    # Generate WARNING if any HAWQSEGMENT is not colocated with a DATANODE
    mismatchHosts = sorted(set(hawqSegmentHosts).symmetric_difference(set(datanodeHosts)))
    if len(mismatchHosts) > 0:
        hostsString = ', '.join(mismatchHosts)
        message = "HAWQ Segment must be installed on all DataNodes. "
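The colocation check at the heart of the excerpt above can be isolated into a small standalone function. This is a sketch only: the function names, the shape of the returned warning dictionary, and the host names are hypothetical and do not reproduce the validation item format that Ambari expects.

```python
def find_colocation_mismatches(segment_hosts, datanode_hosts):
    """Return hosts that run exactly one of the two components, sorted.

    The symmetric difference contains every host that has a segment without
    a DataNode, or a DataNode without a segment.
    """
    return sorted(set(segment_hosts).symmetric_difference(datanode_hosts))


def validate_colocation(segment_hosts, datanode_hosts):
    """Return a list with one warning if the two host sets differ."""
    mismatches = find_colocation_mismatches(segment_hosts, datanode_hosts)
    if not mismatches:
        return []
    # Hypothetical result shape, for illustration only.
    return [{
        "level": "WARN",
        "message": "Components are not colocated on: " + ", ".join(mismatches),
    }]


if __name__ == "__main__":
    segments = ["host1", "host2"]
    datanodes = ["host2", "host3"]
    for item in validate_colocation(segments, datanodes):
        print(item["level"], item["message"])
```

Using the symmetric difference rather than a one-way set difference flags both directions of the mismatch in a single pass.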
21. Quick Links
Without Quick Links Extensions:
- Hardcoded quick links in the Ambari Web UI
With Quick Links Extensions:
- A service can add a list of quick links to the Ambari Web UI by referencing a JSON file in a predefined format from metainfo.xml
ambari-server/src/main/resources/stacks/HDP/2.3/services/HBASE/metainfo.xml
ambari-server/src/main/resources/stacks/HDP/2.3/services/HBASE/quicklinks/quicklinks.json
<quickLinksConfigurations>
  <quickLinksConfiguration>
    <fileName>quicklinks.json</fileName>
    <default>true</default>
  </quickLinksConfiguration>
</quickLinksConfigurations>
mapQuickLinks: function (finalJson, item){
  if(!(item && item.ServiceInfo)) return;
  var quickLinks = {
    OOZIE: [19],
    GANGLIA: [20],
    STORM: [31],
    FALCON: [32],
    RANGER: [33],
    SPARK: [34],
    MY_CUSTOM_SERVICE: [35]
  };
{
  "name": "default",
  "description": "default quick links configuration",
  "configuration": {
    "protocol":
    {
      "type":"http"
    },
    "links": [ {
      "name": "hbase_master_ui",
      "label": "HBase Master UI",
      ….. }
    }, {
      "name": "hbase_logs",
      "label": "HBase Logs",
      …
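As a rough illustration of why a declarative quick-links file removes the need for hardcoded entries in the UI, the sketch below reads a quick-links document and lists its links. It uses only the fields visible in the excerpt above (name, description, protocol type, link name and label); the fields elided on the slide are left out rather than guessed, and the helper `list_quick_links` is hypothetical.

```python
import json

# Reduced quick-links document containing only the fields shown on the slide.
QUICKLINKS_JSON = """
{
  "name": "default",
  "description": "default quick links configuration",
  "configuration": {
    "protocol": {"type": "http"},
    "links": [
      {"name": "hbase_master_ui", "label": "HBase Master UI"},
      {"name": "hbase_logs", "label": "HBase Logs"}
    ]
  }
}
"""


def list_quick_links(document):
    """Return (name, label, protocol) for every link in the document."""
    config = json.loads(document)["configuration"]
    protocol = config["protocol"]["type"]
    return [(link["name"], link["label"], protocol) for link in config["links"]]


if __name__ == "__main__":
    for name, label, protocol in list_quick_links(QUICKLINKS_JSON):
        print("%s: %s (%s)" % (name, label, protocol))
```

A generic reader like this works for any service that ships such a file, so adding a custom service requires no change to the reader itself.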
22. Stack Extensions
• A stack extension is a collection of custom services that are packaged together.
• Provides a REST API for installing extensions.
• Extensions are staged at /var/lib/ambari-server/resources/extensions
• After installation, extensions must be explicitly linked with the supported stack versions.
• Custom services contained in the extension may be added to the cluster like any other service in the stack.
• EPIC: AMBARI-12885
25. Ambari Management Packs
• Release artifact to bundle stack definitions, service definitions, add-on service definitions, views, etc.
• Decouples stack definition releases from Ambari core release.
• Also provides a release vehicle for add-on services.
• Released as tarballs that contain metadata describing the applicability and contents of the management pack
• Staging Location: /var/lib/ambari-server/resources/mpacks
• Final stack definition can be an overlay of multiple management packs.
• EPIC: AMBARI-14854
• Ambari Wiki
• Target Release: 2.4.0.0
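The statement that a final stack definition can be an overlay of multiple management packs can be pictured with a small sketch. Everything here is hypothetical: the pack names, the "services" mapping, the version strings, and the rule that later packs override earlier ones are assumptions for illustration, not the actual management pack metadata format or processing logic.

```python
def overlay_stack_definition(*mpacks):
    """Build one service map by layering packs in the order given.

    Assumption: a later pack overrides an earlier pack for the same service.
    """
    stack = {}
    for mpack in mpacks:
        for service, version in mpack["services"].items():
            stack[service] = {"version": version, "provided_by": mpack["name"]}
    return stack


if __name__ == "__main__":
    # Hypothetical packs: a base stack pack plus an add-on service pack.
    base_pack = {
        "name": "example-stack-mpack",
        "services": {"HDFS": "1.0.0", "YARN": "1.0.0"},
    }
    addon_pack = {
        "name": "example-addon-mpack",
        "services": {"FOO": "0.1.0"},
    }
    for service, info in sorted(overlay_stack_definition(base_pack, addon_pack).items()):
        print(service, info["version"], "from", info["provided_by"])
```

Recording which pack provided each service keeps the overlay traceable when several packs are installed side by side.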
29. Future Goals
• Service level repos
• Management Pack++
• Short Term Goals (Ambari 2.4.0.0)
o Release vehicle for stacks
o HDP management pack, IOP management pack
o Release vehicle for add-on services (custom services)
o Microsoft-R management pack
o Retrofit in existing stack processing infrastructure
o Command line to update stack and service definitions
• Long Term Goals (Ambari 2.4+)
o Release HDP stacks as mpacks
o Build management pack processing infrastructure
o Dynamic creation of stack definitions by processing mpacks
o REST API for adding/removing mpacks
• UI wizards stack driven