This document discusses Apache Ambari and provides the following information:
1) It provides a background on Apache Ambari, describing it as an open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters.
2) It discusses recent Ambari releases including versions 2.2.0, 2.2.2 and 2.4.0 GA.
3) It describes features of Ambari including alerts and metrics, blueprints, security setup using Kerberos and RBAC, log search, automated cluster upgrades and extensibility options.
Jeff Sposetti of the Ambari project discusses Apache Ambari, the Apache project used to help deploy and provision Hadoop clusters:
- Ambari Overview and the Community
- Ambari Architecture - Provisioning Clusters and Services - Standard Services: HDFS, YARN, MR2, Hive; New Services: Storm, Falcon
- Management and Monitoring Capabilities - Nagios and Ganglia Integration
- Key Innovation Features - Ambari Stacks providing a dynamic service lifecycle - Ambari Blueprints powering OpenStack Savanna - Ambari Views enabling custom UI development
Managing your Hadoop Clusters with Apache Ambari - DataWorks Summit
Deploying, configuring, and managing large Apache Hadoop and HBase clusters can be quite complex. Once you have your clusters, keeping them up and running and making sure that SLAs are met presents even more challenges and headaches for Hadoop operators. To make matters worse, managing upgrades can be a nightmare. Hadoop users face their own fair share of difficulties, such as slow-running jobs and not knowing why they are slow. For third-party software vendors interested in incorporating Hadoop management and monitoring capabilities, there is no obvious, easy solution. Apache Ambari aims to make the lives of Hadoop operators, users, and integrators simpler by providing a management interface to do all of that and more. This session presents the use of Ambari's Web UI for Hadoop operators (deploying, managing, and monitoring) as well as Hadoop users (job analytics). The talk will also touch upon Ambari's REST API and how it is used in the real world. The session concludes by revealing the future roadmap of Ambari, including queue management, upgrade, disaster recovery, high availability, and more.
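For a taste of that REST API, here is a minimal sketch that lists services and their states. The server address, default admin credentials, and the cluster name `c1` are assumptions for illustration, not details from the talk:

```python
import requests

AMBARI = "http://localhost:8080/api/v1"   # assumed Ambari server address
AUTH = ("admin", "admin")                 # default credentials; change in production
HEADERS = {"X-Requested-By": "ambari"}    # required by Ambari on writes, harmless on reads

# List services in cluster "c1" and ask Ambari to include each service's state.
resp = requests.get(
    f"{AMBARI}/clusters/c1/services",
    params={"fields": "ServiceInfo/state"},
    auth=AUTH,
    headers=HEADERS,
)
resp.raise_for_status()
for item in resp.json()["items"]:
    info = item["ServiceInfo"]
    print(info["service_name"], "->", info.get("state"))
```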
How to build a streaming Lakehouse with Flink, Kafka, and Hudi - Flink Forward
Flink Forward San Francisco 2022.
With a real-time processing engine like Flink and a transactional storage layer like Hudi, it has never been easier to build end-to-end low-latency data platforms connecting sources like Kafka to data lake storage. Come learn how to blend Lakehouse architectural patterns with real-time processing pipelines built on Flink and Hudi. We will dive deep into how Flink can leverage the newest features of Hudi, like multi-modal indexing that dramatically improves query and write performance, data skipping that reduces query latency by 10x for large datasets, and many more innovations unique to Flink and Hudi.
by
Ethan Guo & Kyle Weller
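As a rough sketch of the pattern described above (not code from the talk), the following PyFlink SQL job reads events from a Kafka topic and continuously upserts them into a Hudi table. The topic name, schema, and paths are illustrative, and the Kafka and Hudi Flink connector jars must be on the classpath:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming table environment; Kafka and Hudi connector jars are assumed available.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical Kafka source of order events.
t_env.execute_sql("""
    CREATE TABLE orders_kafka (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Hudi sink table on the data lake; MERGE_ON_READ keeps writes low-latency.
t_env.execute_sql("""
    CREATE TABLE orders_hudi (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'hudi',
        'path' = 's3a://my-lake/orders_hudi',
        'table.type' = 'MERGE_ON_READ'
    )
""")

# Continuously upsert Kafka events into the Hudi table.
t_env.execute_sql("INSERT INTO orders_hudi SELECT * FROM orders_kafka")
```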
Cutting-edge Hadoop clusters are bound to need custom (add-on) services that are not available in the Hadoop distribution of choice. Agility is crucial for companies to integrate any service into existing large-scale Hadoop clusters with ease.
Apache Ambari manages the Hadoop cluster and solves this problem by extending the stack with add-on services, which can be a new Apache project, a different Hadoop file system, or an internal tool. This talk covers how to create a service definition in Ambari to manage lifecycle commands and configs, plus advanced topics like packaging, installing from multiple repositories, recommending and validating configs using Service Advisor, running custom commands, defining dependencies on configs and other services, and more. We will also cover how to create custom metrics and dashboards using Ambari Metrics System and Grafana, generate alerts, and enable security by authenticating with Kerberos.
Further, we will discuss the future of service definitions and how Ambari 3.0 will support custom services through Management Packs to enable Hadoop vendors to release software faster.
Speaker
Jayush Luniya, Principal Software Engineer, Hortonworks
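To make the service-definition idea concrete, here is a minimal sketch of the kind of Python lifecycle script Ambari runs for a custom service. The service name, paths, and commands are hypothetical; the `resource_management` imports are Ambari's real scripting library:

```python
# master.py - minimal Ambari lifecycle script for a hypothetical add-on service.
from resource_management.libraries.script.script import Script
from resource_management.core.resources.system import Execute

class MyServiceMaster(Script):
    def install(self, env):
        # Installs the packages declared in the service's metainfo.xml.
        self.install_packages(env)

    def configure(self, env):
        # A real script renders config files from the configs Ambari manages.
        pass

    def start(self, env):
        self.configure(env)
        Execute("/usr/lib/myservice/bin/start.sh")   # hypothetical start command

    def stop(self, env):
        Execute("/usr/lib/myservice/bin/stop.sh")    # hypothetical stop command

    def status(self, env):
        # Ambari calls this to check liveness, typically via a PID file.
        from resource_management.libraries.functions.check_process_status import check_process_status
        check_process_status("/var/run/myservice/myservice.pid")

if __name__ == "__main__":
    MyServiceMaster().execute()
```

The script is referenced from the service's metainfo.xml, which is where lifecycle commands, packages, and config dependencies are declared.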
Performance Troubleshooting Using Apache Spark Metrics - Databricks
Performance troubleshooting of distributed data processing systems is a complex task. Apache Spark comes to the rescue with a large set of metrics and instrumentation that you can use to understand and improve the performance of your Spark-based applications. You will learn about the available metric-based instrumentation in Apache Spark: executor task metrics and the Dropwizard-based metrics system. The talk will cover how the Hadoop and Spark service at CERN uses Apache Spark metrics for troubleshooting performance and measuring production workloads. Notably, the talk will cover how to deploy a performance dashboard for Spark workloads and the use of sparkMeasure, a tool based on the Spark Listener interface. The speaker will discuss the lessons learned so far and what improvements you can expect in this area in Apache Spark 3.0.
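As an illustration of the sparkMeasure tool mentioned above, a minimal PySpark sketch (it assumes `pip install sparkmeasure` and the matching `ch.cern.sparkmeasure` jar on the Spark classpath; the workload is illustrative):

```python
from pyspark.sql import SparkSession
from sparkmeasure import StageMetrics

# The sparkMeasure jar must be available, e.g. via
# --packages ch.cern.sparkmeasure:spark-measure_2.12:0.23 (version is illustrative).
spark = SparkSession.builder.appName("metrics-demo").getOrCreate()

stage_metrics = StageMetrics(spark)
stage_metrics.begin()
spark.range(0, 10_000_000).selectExpr("sum(id)").show()   # workload to measure
stage_metrics.end()
stage_metrics.print_report()   # aggregated executor task metrics for the stages run
```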
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi - DataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Tuning Apache Ambari performance for Big Data at scale with 3000 agents - DataWorks Summit
Apache Ambari manages Hadoop at large scale, and it becomes increasingly difficult for cluster admins to keep the machinery running smoothly as data grows and clusters scale from 30 to 3000 agents. To test at scale, Ambari has a Performance Stack that allows a VM to host as many as 50 Ambari Agents. The simulated stack and 50 Agents per VM can stress-test Ambari Server with the same load as a 3000-node cluster. This talk will cover how to tune the performance of Ambari and MySQL, and share performance benchmarks for features like deploy times, bulk operations, installation of bits, and Rolling & Express Upgrades. Moreover, the speaker will show how to use Ambari Metrics System and Grafana to plot performance, detect anomalies, and share tips on how to improve performance for a more responsive experience. Lastly, the talk will discuss roadmap features in Ambari 3.0 for improving performance and scale.
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro - Databricks
Zstandard is a fast compression algorithm which you can use in Apache Spark in various ways. In this talk, I briefly summarize the evolution of Apache Spark in this area, four main use cases, their benefits, and the next steps (a config sketch for these settings appears below the list):
1) Zstandard can optimize Spark local disk IO by compressing shuffle files significantly. This is very useful in K8s environments. It's beneficial not only when you use `emptyDir` with the `memory` medium, but also because it maximizes the OS cache benefit when you use shared SSDs or container-local storage. In Spark 3.2, SPARK-34390 takes advantage of Zstandard's buffer pool feature, and its performance gain is impressive, too.
2) Event log compression is another area to save storage cost on cloud storage like S3 and to improve usability. SPARK-34503 officially switched the default event log compression codec from LZ4 to Zstandard.
3) Zstandard data file compression can give you more benefits when you use ORC/Parquet files as your input and output. Apache ORC 1.6 supports Zstandard already, and Apache Spark enables it via SPARK-33978. The upcoming Parquet 1.12 will support Zstandard compression.
4) Last but not least, since Apache Spark 3.0, Zstandard is used to serialize/deserialize MapStatus data instead of Gzip.
There is more community work to utilize Zstandard to improve Spark. For example, the Apache Avro community also supports Zstandard, and SPARK-34479 aims to support Zstandard in Spark's Avro file format in Spark 3.2.0.
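A minimal sketch of the corresponding Spark settings. The property names are real Spark configs, but their availability depends on your Spark/ORC/Parquet versions, and the paths are illustrative:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("zstd-demo")
    # 1) Compress shuffle and other local-disk IO with Zstandard.
    .config("spark.io.compression.codec", "zstd")
    # 2) Compress event logs with Zstandard (the default codec since SPARK-34503).
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.compress", "true")
    .config("spark.eventLog.compression.codec", "zstd")
    # 3) Write ORC and Parquet data files with Zstandard.
    .config("spark.sql.orc.compression.codec", "zstd")
    .config("spark.sql.parquet.compression.codec", "zstd")
    # 4) MapStatus ser/de uses Zstandard automatically since Spark 3.0 (no config).
    .getOrCreate()
)

df = spark.range(1_000_000).withColumnRenamed("id", "value")
df.write.mode("overwrite").orc("/tmp/zstd_demo_orc")          # zstd-compressed ORC
df.write.mode("overwrite").parquet("/tmp/zstd_demo_parquet")  # zstd-compressed Parquet
```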
Step-by-Step Introduction to Apache Flink - Slim Baltagi
This is a talk that I gave at the 2nd Apache Flink meetup in the Washington DC area, hosted and sponsored by Capital One on November 19, 2015. You will quickly learn, in a step-by-step way (see the sketch after this list):
1. How to set up and configure your Apache Flink environment
2. How to use Apache Flink tools
3. How to run the examples in the Apache Flink bundle
4. How to set up your IDE (IntelliJ IDEA or Eclipse) for Apache Flink
5. How to write your Apache Flink program in an IDE
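The original talk predates PyFlink, so as a present-day stand-in for step 5, here is a minimal sketch of a first Flink program using the Python Table API (a batch word count; assumes `pip install apache-flink`):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

# A tiny word-count-style job using the Table API in batch mode.
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

table = t_env.from_elements(
    [("hello",), ("flink",), ("hello",)],
    ["word"],
)
result = (
    table.group_by(col("word"))
         .select(col("word"), col("word").count.alias("cnt"))
)

# to_pandas() collects the result locally for a quick look.
print(result.to_pandas())
```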
Tez is the next-generation Hadoop query processing framework written on top of YARN. Computation topologies in higher-level languages like Pig/Hive can be naturally expressed in the new graph dataflow model exposed by Tez. Multi-stage queries can be expressed as a single Tez job, resulting in lower latency for short queries and improved throughput for large-scale queries. MapReduce has been the workhorse for Hadoop, but its monolithic structure made innovation slower. YARN separates resource management from application logic and thus enables the creation of Tez, a more flexible and generic framework for data processing, for the benefit of the entire Hadoop query ecosystem.
Deep Dive: Memory Management in Apache Spark - Databricks
Memory management is at the heart of any data-intensive system. Spark, in particular, must arbitrate memory allocation between two main use cases: buffering intermediate data for processing (execution) and caching user data (storage). This talk will take a deep dive into the memory management designs adopted in Spark since its inception and discuss their performance and usability implications for the end user.
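As a small illustration of the unified memory model this talk covers, these are the knobs that divide the heap between execution and storage in modern Spark. The values shown are the defaults; this is a sketch, not tuning advice:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("memory-demo")
    # Fraction of (heap - 300MB reserved) shared by execution and storage.
    .config("spark.memory.fraction", "0.6")
    # Portion of that unified region protected for storage (caching);
    # execution can borrow the rest and evict cached blocks beyond it.
    .config("spark.memory.storageFraction", "0.5")
    .getOrCreate()
)

df = spark.range(10_000_000)
df.cache().count()                                        # materialize cache (storage memory)
df.groupBy((df.id % 10).alias("bucket")).count().show()   # aggregation (execution memory)
```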
ORC files were originally introduced in Hive, but have now migrated to an independent Apache project. This has sped up the development of ORC and simplified integrating ORC into other projects, such as Hadoop, Spark, Presto, and NiFi. There are also many new tools that are built on top of ORC, such as Hive's ACID transactions and LLAP, which provides incredibly fast reads for your hot data. LLAP also provides strong security guarantees that allow each user to see only the rows and columns that they have permission for.
This talk will discuss the details of the ORC and Parquet formats and what the relevant tradeoffs are. In particular, it will discuss how to format your data and the options to use to maximize your read performance. Specifically, we'll discuss when and how to use ORC's schema evolution, bloom filters, and predicate push down. It will also show you how to use the tools to translate ORC files into human-readable formats, such as JSON, and display the rich metadata from the file, including the types in the file and the min, max, and count for each column.
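The ORC project ships this kind of inspection as a Java CLI (orc-tools); as a Python-flavored sketch of the same idea, using pyarrow (an assumption of this example, not a tool from the talk), you can dump rows as JSON and compute per-column min/max/count:

```python
import json
import pyarrow.orc as orc
import pyarrow.compute as pc

# Read an ORC file (path is illustrative) and inspect it.
table = orc.ORCFile("/tmp/example.orc").read()

print(table.schema)            # column names and types stored in the file
print("rows:", table.num_rows)

# Per-column min, max, and non-null count (works for primitive column types).
for name in table.column_names:
    col = table[name]
    mm = pc.min_max(col)
    print(name,
          "min:", mm["min"].as_py(),
          "max:", mm["max"].as_py(),
          "count:", len(col) - col.null_count)

# Dump the first few rows as human-readable JSON.
for row in table.slice(0, 5).to_pylist():
    print(json.dumps(row, default=str))
```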
Cosco: An Efficient Facebook-Scale Shuffle Service - Databricks
Cosco is an efficient shuffle-as-a-service that powers Spark (and Hive) jobs at Facebook warehouse scale. It is implemented as a scalable, reliable and maintainable distributed system. Cosco is based on the idea of partial in-memory aggregation across a shared pool of distributed memory. This provides vastly improved efficiency in disk usage compared to Spark's built-in shuffle. Long term, we believe the Cosco architecture will be key to efficiently supporting jobs at ever larger scale. In this talk we'll take a deep dive into the Cosco architecture and describe how it's deployed at Facebook. We will then describe how it's integrated to run shuffle for Spark, and contrast it with Spark's built-in sort-based shuffle mechanism and SOS (presented at Spark+AI Summit 2018).
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc... - Databricks
Spark SQL is a highly scalable and efficient relational processing engine with easy-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk will dive into the technical details of Spark SQL, spanning the entire lifecycle of a query execution. The audience will get a deeper understanding of Spark SQL and understand how to tune Spark SQL performance.
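A quick sketch of the kind of inspection this lifecycle walk-through enables: asking Spark SQL for its plans and flipping two common tuning knobs. The configs are real Spark settings (adaptive execution shipped in Spark 3.0); the values and data are illustrative:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("sql-tuning-demo")
    # Adaptive query execution re-optimizes plans at runtime (Spark 3.0+).
    .config("spark.sql.adaptive.enabled", "true")
    # Threshold under which joins become broadcast hash joins (10 MB here).
    .config("spark.sql.autoBroadcastJoinThreshold", str(10 * 1024 * 1024))
    .getOrCreate()
)

big = spark.range(1_000_000).withColumnRenamed("id", "k")
small = spark.range(100).withColumnRenamed("id", "k")

joined = big.join(small, "k")
# Show the parsed, analyzed, and optimized logical plans plus the physical plan.
joined.explain(mode="extended")
```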
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark - Bo Yang
The slides explain how shuffle works in Spark and help people understand more details about Spark internals. They show how the major classes are implemented, including ShuffleManager (SortShuffleManager), ShuffleWriter (SortShuffleWriter, BypassMergeSortShuffleWriter, UnsafeShuffleWriter), and ShuffleReader (BlockStoreShuffleReader).
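As a companion sketch (my summary of the selection rules, not taken from the slides), which ShuffleWriter SortShuffleManager picks depends on a few conditions and one config, roughly as the comments below describe:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("shuffle-demo")
    # BypassMergeSortShuffleWriter is used when there is no map-side
    # aggregation and the partition count is at or below this threshold.
    .config("spark.shuffle.sort.bypassMergeThreshold", "200")
    .getOrCreate()
)
sc = spark.sparkContext

rdd = sc.parallelize(range(1_000_000)).map(lambda x: (x % 100, 1))

# reduceByKey performs map-side aggregation, so SortShuffleWriter handles it;
# groupByKey with few partitions and no map-side combine can take the bypass path;
# UnsafeShuffleWriter (tungsten-sort) needs a relocatable serializer and no
# map-side aggregation, and is chosen automatically when eligible.
print(rdd.reduceByKey(lambda a, b: a + b).count())
```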
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ... - StreamNative
Lakehouses are quickly growing in popularity as a new approach to data platform architecture, bringing some of the long-established benefits of the OLTP world to OLAP, including transactions, record-level updates/deletes, and change streams. In this talk, we will discuss Apache Hudi and how it unlocks possibilities of building your own fully open-source Lakehouse featuring a rich set of integrations with existing technologies, including Apache Pulsar. In this session, we will present: - What Lakehouses are, and why they are needed. - What Apache Hudi is and how it works. - A use case and demo that applies Apache Hudi's DeltaStreamer tool to ingest data from Apache Pulsar.
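DeltaStreamer itself is a Spark utility launched with spark-submit; as a rough PySpark sketch of the equivalent write path (table name, keys, and paths are illustrative, and the Hudi Spark bundle must be on the classpath), an upsert into a Hudi table looks like this:

```python
from pyspark.sql import SparkSession

# Requires the Hudi bundle, e.g.
# --packages org.apache.hudi:hudi-spark3-bundle_2.12:<version> (version illustrative).
spark = SparkSession.builder.appName("hudi-ingest-demo").getOrCreate()

events = spark.createDataFrame(
    [("e1", "click", 1700000000), ("e2", "view", 1700000001)],
    ["event_id", "kind", "ts"],
)

(events.write.format("hudi")
    .option("hoodie.table.name", "events")
    .option("hoodie.datasource.write.recordkey.field", "event_id")   # record key
    .option("hoodie.datasource.write.precombine.field", "ts")        # dedupe ordering
    .option("hoodie.datasource.write.operation", "upsert")
    .mode("append")
    .save("/tmp/hudi/events"))
```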
Dynamic Column Masking and Row-Level Filtering in HDP - Hortonworks
As enterprises around the world bring more of their sensitive data into Hadoop data lakes, balancing the need for democratized access to data without sacrificing strong security principles becomes paramount. In this webinar, Srikanth Venkat, director of product management for security & governance, will demonstrate two new data protection capabilities in Apache Ranger: dynamic column masking and row-level filtering of data stored in Apache Hive. These features have been introduced as part of the HDP 2.5 platform release.
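These policies are usually created in the Ranger admin UI, but Ranger also exposes them over REST. The following is a rough sketch of creating a column-masking policy for a Hive column; the service, database, table, and group names are made up, and the exact JSON accepted can vary by Ranger version:

```python
import requests

RANGER = "http://ranger-admin:6080"   # assumed Ranger admin address
AUTH = ("admin", "admin")             # demo credentials only

# Mask all but the last 4 characters of sales.customers.ssn for group 'analysts'.
policy = {
    "service": "cl1_hive",            # hypothetical Hive service repo name
    "name": "mask-customer-ssn",
    "policyType": 1,                  # 1 = data masking policy
    "resources": {
        "database": {"values": ["sales"]},
        "table":    {"values": ["customers"]},
        "column":   {"values": ["ssn"]},
    },
    "dataMaskPolicyItems": [{
        "groups": ["analysts"],
        "accesses": [{"type": "select", "isAllowed": True}],
        "dataMaskInfo": {"dataMaskType": "MASK_SHOW_LAST_4"},
    }],
}

resp = requests.post(f"{RANGER}/service/public/v2/api/policy",
                     json=policy, auth=AUTH)
resp.raise_for_status()
print("created policy id:", resp.json().get("id"))
```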
The Power of your Data Achieved - Next Gen Modernization - Hortonworks
Fueled by ever-changing customer behaviors and an increasing number of industry disruptions, the modern enterprise requires analytics to stay ahead of the game. Today’s data warehouse needs continuous enhancements to address new requirements for advanced analytics, real-time streaming data, Big Data, and unstructured data. The focus should be on developing a forward-looking, future-proof view and holistically addressing the combination of forces that are impacting the existing operational model.
Webinar Series Part 5: New Features of HDF 5 - Hortonworks
An overview of the newest features of Hortonworks DataFlow, highlighting the new processors, the new user interface, edge intelligence powered by Apache MiNiFi, new support for multi-tenancy, and the new zero-master clustering architecture.
How to Use Apache Zeppelin with HWX HDB - Hortonworks
Part five in a five-part series, this webcast will be a demonstration of the integration of Apache Zeppelin and Pivotal HDB. Apache Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more. This webinar will demonstrate the configuration of the psql interpreter and the basic operations of Apache Zeppelin when used in conjunction with Hortonworks HDB.
Top 5 Strategies for Retail Data Analytics - Hortonworks
It’s an exciting time for retailers as technology is driving a major disruption in the market. Whether you are just beginning to build a retail data analytics program or you have been gaining advanced insights from your data for quite some time, join Eric and Shish as we explore the trends, drivers and hurdles in retail data analytics
Hortonworks Data in Motion Webinar Series Part 7: Apache Kafka NiFi Better Tog... - Hortonworks
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding-free solution to get many different formats and protocols in and out of Kafka, and complements Kafka with full audit trails and interactive command and control. Storm complements NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm and Kafka can augment each other for creating a new dataplane connecting multiple systems within your enterprise with ease, speed and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
The path to a Modern Data Architecture in Financial Services - Hortonworks
Delivering Data-Driven Applications at the Speed of Business: Global Banking AML use case.
Chief Data Officers in financial services have unique challenges: they need to establish an effective data ecosystem under strict governance and regulatory requirements. They need to build the data-driven applications that enable risk and compliance initiatives to run efficiently. In this webinar, we will discuss the case of a global banking leader and the anti-money laundering solution they built on the data lake. With a single platform to aggregate structured and unstructured information essential to determine and document AML case disposition, they reduced mean time for case resolution by 75%. They have a roadmap for building over 150 data-driven applications on the same search-based data discovery platform so they can mitigate risks and seize opportunities, at the speed of business.
Apache Ambari: Managing Hadoop and YARN - Hortonworks
Part of the Hortonworks YARN Ready Webinar Series, this session is about management of Apache Hadoop and YARN using Apache Ambari. This series targets developers and we will feature a demo on Ambari.
Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop and Hortonworks Data Platform (HDP). Recent innovations in Apache Ambari have focused on opening it up as a pluggable management platform that can automate cluster provisioning, deploy third-party software, and provide custom operational and developer views to the end user. In this session Hortonworks will cover three key integration points of Apache Ambari, including Stacks, Views and Blueprints, and deliver working examples of each.
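To make the Blueprints integration point concrete, here is a minimal sketch of registering a blueprint over Ambari's REST API. The stack version, blueprint name, and single-node layout are illustrative:

```python
import requests

AMBARI = "http://localhost:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}   # Ambari requires this header on writes

# A deliberately tiny blueprint: one host group running HDFS roles only.
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.6"},
    "host_groups": [{
        "name": "master",
        "cardinality": "1",
        "components": [
            {"name": "NAMENODE"},
            {"name": "SECONDARY_NAMENODE"},
            {"name": "DATANODE"},
            {"name": "HDFS_CLIENT"},
        ],
    }],
}

resp = requests.post(f"{AMBARI}/blueprints/single-node-hdfs",
                     json=blueprint, auth=AUTH, headers=HEADERS)
resp.raise_for_status()
print("blueprint registered")
```

A cluster is then created by POSTing a cluster-creation template that maps concrete hosts onto the blueprint's host groups.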
Deploying and Managing Hadoop Clusters with AMBARI - DataWorks Summit
Deploying, configuring, and managing large Hadoop and HBase clusters can be quite complex. Just upgrading one Hadoop component on a 2000-node cluster can take a lot of time and expertise, and there have been few tools specialized for Hadoop cluster administrators. AMBARI is an Apache incubator project to deliver Monitoring and Management functionality for Hadoop clusters. This paper presents the AMBARI tools for cluster management, specifically: Cluster pre-configuration and validation; Hadoop software deployment, installation, and smoketest; Hadoop configuration and re-config; and a basic set of management ops including start/stop service, add/remove node, etc. In providing these capabilities, AMBARI seeks to integrate with (rather than replace) existing open-source packaging and deployment technology available in most data centers, such as Puppet and Chef, Yum, Apt, and Zypper.
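The management ops described here have since settled into Ambari's REST API. As a sketch against a modern Ambari server (the cluster name `c1` and credentials are assumptions), stopping or starting a service is a PUT on its desired state:

```python
import requests

AMBARI = "http://localhost:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}

def set_service_state(cluster: str, service: str, state: str) -> None:
    """Ask Ambari to move a service to the desired state.
    'INSTALLED' means stopped; 'STARTED' means running."""
    body = {
        "RequestInfo": {"context": f"Set {service} to {state} via API"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    resp = requests.put(f"{AMBARI}/clusters/{cluster}/services/{service}",
                        json=body, auth=AUTH, headers=HEADERS)
    resp.raise_for_status()

set_service_state("c1", "HDFS", "INSTALLED")  # stop HDFS
set_service_state("c1", "HDFS", "STARTED")    # start HDFS
```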
How Universities Use Big Data to Transform Education - Hortonworks
Student performance data is increasingly being captured as part of software-based and online classroom exercises and testing. This data can be augmented with behavioral data captured from sources such as social media, student-professor meeting notes, blogs, student surveys, and so forth to discover new insights to improve student learning. The results transcend traditional IT departments to focus on issues like retention, research, and the delivery of content and courses through new modalities.
Hortonworks is partnering with Microsoft to show you how the Hortonworks Data Platform (HDP) running on the Microsoft stack enables you to develop a “single view of a student”.
Scaling real time streaming architectures with HDF and Dell EMC Isilon - Hortonworks
Streaming Analytics are the new normal. Customers are exploring use cases that have quickly transitioned from batch to near real time. Hortonworks Data Flow / Apache NiFi and Isilon provide a robust scalable architecture to enable real time streaming architectures. Explore our use cases and demo on how Hortonworks Data Flow and Isilon can empower your business for real time success
Hortonworks Technical Workshop: Operations with Ambari - Hortonworks
Ambari continues on its journey of provisioning, monitoring and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities including updated metric collections; Kerberos setup automation and developer views for Big Data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look into Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the github demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. The Ambari engineering team will talk about improvements being made to the automation, metrics, logging, upgrade, and other core frameworks within Ambari as the project is being re-imagined.
Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects, on a few operating systems, and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
As part of this talk, the engineering team will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari community has changed the product to handle them. The future is fast approaching, and with it comes new on-premise and cloud deployment architectures. See how Apache Ambari is being re-imagined to handle these new challenges.
Speakers
Paul Codding, Product Management Director, Hortonworks
Oliver Szabo, Senior Software Engineer, Hortonworks
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s... - Data Con LA
This talk draws on our experience in debugging and analyzing Hadoop jobs to describe some methodical approaches to this and present current and new tracing and tooling ideas that can help semi-automate parts of this difficult problem.
Manage Add-on Services in Apache Ambari - Jayush Luniya
Cloudbreak (https://hortonworks.com/open-source/cloudbreak/), as part of the Hortonworks Data Platform (https://hortonworks.com/products/data-center/hdp/) (HDP), makes it easy to provision, configure and elastically grow clusters across cloud infrastructure providers including Amazon Web Services, Microsoft Azure, Google Cloud Platform and OpenStack.
It has been designed and developed with a microservice architecture from the beginning and is shipped in Docker.
This talk is about the challenges of scaling the application using Kubernetes on ACS (https://azure.microsoft.com/en-us/services/container-service/) (Azure Container Service) and how we handle hosted-service challenges like:
- Exposing metrics regarding the application
- Collecting application, k8s, and system logs
- Handling scenarios when one of the application nodes dies
- Preparing for high workloads
Learn more about the meetup here: https://www.meetup.com/futureofdata-budapest/events/244277121/
This presentation was part of a Future of Data meetup in Budapest. It is about how we run Cloudbreak in highly available / fault-tolerant mode on Kubernetes hosted on Azure.
Hadoop is used to run large-scale jobs that are sub-divided into many tasks executed over multiple machines. There are complex dependencies between these tasks, and at scale there can be 10s to 1000s of tasks running over 100s to 1000s of machines, which increases the challenge of making sense of their performance. Add to that pipelines of such jobs that logically run a business workflow as another level of complexity. This does not even account for other users competing for resources on the same Hadoop cluster, which can have an even more drastic impact on your job performance and your SLAs. No wonder the question of why Hadoop jobs run slower than expected remains a perennial source of grief for developers. In this talk, we will draw on our experience in debugging and analyzing Hadoop jobs to describe some methodical approaches to this, and present current and new tracing and tooling ideas that can help semi-automate parts of this difficult problem.
Hadoop operations started on-prem, driven primarily by Apache Ambari. However, the agility and flexibility of the cloud have driven many Hadoop cluster operations to the cloud and to hybrid environments. The cloud is enabling many ephemeral on-demand use cases, which is a game-changing opportunity for analytic workloads. But all of this comes with the challenges of running enterprise workloads in the cloud securely and with ease.
Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects, on a few operating systems, and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
In this session, we will also take you through Cloudbreak as a solution to simplify provisioning and managing enterprise workloads while providing an open and common experience for deploying workloads across clouds. We will discuss the challenges (and opportunities) to run enterprise workloads in the cloud and will go through a live demo of how the latest from Cloudbreak enables enterprises to easily and securely run Apache Hadoop. This includes deep-dive discussion on Ambari Blueprints, recipes, custom images, and enabling Kerberos -- which are all key capabilities for Enterprise deployments.
As part of this talk, we will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari and Cloudbreak community has changed the products to handle them. The future is fast approaching, and with it comes new on-premise and cloud deployment architectures. See how Apache Ambari and Cloudbreak are being re-imagined to handle these new challenges.
Speaker: Santosh Gowda, Principal Solutions Engineer, Hortonworks
Demand for cloud is through the roof. Cloud is turbocharging the Enterprise IT landscape with agility and flexibility, and discussions of cloud architecture now dominate Enterprise IT. Cloud is enabling many ephemeral on-demand use cases, which is a game-changing opportunity for analytic workloads. But all of this comes with the challenges of running enterprise workloads in the cloud securely and with ease.
In this session, we will take you through Cloudbreak as a solution to simplify provisioning and managing enterprise workloads while providing an open and common experience for deploying workloads across clouds. We will discuss the challenges (and opportunities) to run enterprise workloads in the cloud and will go through how the latest from Cloudbreak enables enterprises to easily and securely run big data workloads. This includes deep-dive discussion on autoscaling, Ambari Blueprints, recipes, custom images, and enabling Kerberos -- which are all key capabilities for Enterprise deployments.
As a last topic we will discuss how we deployed and operate Cloudbreak as a Service internally which enables rapid cluster deployment for prototyping and testing purposes.
Speakers
Peter Darvasi, Cloudbreak Partner Engineer, Hortonworks
Richard Doktorics, Staff Engineer, Hortonworks
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level - Hortonworks
The HDF 3.3 release delivers several exciting enhancements and new features, but the most noteworthy of them is the addition of support for Kafka 2.0 and Kafka Streams.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-3-taking-stream-processing-next-level/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy - Hortonworks
Forrester forecasts that direct spending on the Internet of Things (IoT) will exceed $400 billion by 2023. From manufacturing and utilities to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Getting the Most Out of Your Data in the Cloud with Cloudbreak - Hortonworks
Cloudbreak, a part of Hortonworks Data Platform (HDP), simplifies provisioning and cluster management within any cloud environment to help your business on its path to a hybrid cloud architecture.
https://hortonworks.com/webinar/getting-data-cloud-cloudbreak-live-demo/
Johns Hopkins - Using Hadoop to Secure Access Log Events - Hortonworks
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys - Hortonworks
Cybersecurity today is a big data problem. There's a ton of data landing on you faster than you can load it, let alone search it. In order to make sense of it, we need to act on data in motion and use both machine learning and the most advanced pattern recognition system on the planet: your SOC analysts. Advanced visualization makes your analysts more efficient and helps them find the hidden gems, or bombs, in masses of logs and packets.
https://hortonworks.com/webinar/catch-hacker-real-time-live-visuals-bots-bad-guys/
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Curing Kafka Blindness with Hortonworks Streams Messaging Manager - Hortonworks
With the growth of Apache Kafka adoption in all major streaming initiatives across large organizations, the operational and visibility challenges associated with Kafka are on the rise as well. Kafka users want better visibility in understanding what is going on in the clusters as well as within the stream flows across producers, topics, brokers, and consumers.
With no tools in the market that readily address the challenges of the Kafka Ops teams, the development teams, and the security/governance teams, Hortonworks Streams Messaging Manager is a game-changer.
https://hortonworks.com/webinar/curing-kafka-blindness-hortonworks-streams-messaging-manager/
Interpretation Tool for Genomic Sequencing Data in Clinical Environments - Hortonworks
The healthcare industry—with its huge volumes of big data—is ripe for the application of analytics and machine learning. In this webinar, Hortonworks and Quanam present a tool that uses machine learning and natural language processing in the clinical classification of genomic variants to help identify mutations and determine clinical significance.
Watch the webinar: https://hortonworks.com/webinar/interpretation-tool-genomic-sequencing-data-clinical-environments/
IBM+Hortonworks = Transformation of the Big Data Landscape - Hortonworks
Last year IBM and Hortonworks jointly announced a strategic and deep partnership. Join us as we take a close look at the partnership accomplishments and the conjoined road ahead with industry-leading analytics offers.
View the webinar here: https://hortonworks.com/webinar/ibmhortonworks-transformation-big-data-landscape/
In this exclusive Premier Inside Out, you will hear from Druid committer Slim Bouguerra, Staff Software Engineer, and Product Manager Will Xu. These Hortonworkers will explain the vision for these components, review new features, share some best practices, and answer your questions.
View the webinar here: https://hortonworks.com/webinar/hortonworks-premier-apache-druid/
Accelerating Data Science and Real Time Analytics at Scale - Hortonworks
Gaining business advantages from big data is moving beyond just the efficient storage and deep analytics on diverse data sources to using AI methods and analytics on streaming data to catch insights and take action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
Time Series: Applying Advanced Analytics to Industrial Process Data - Hortonworks
Thanks to sensors and the Internet of Things, industrial processes now generate a sea of data. But are you plumbing its depths to find the insight it contains, or are you just drowning in it? Now, Hortonworks and Seeq team up to bring advanced analytics and machine learning to time-series data from manufacturing and industrial processes.
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ... - Hortonworks
Trimble Transportation Enterprise is a leading provider of enterprise software to over 2,000 transportation and logistics companies. They have designed an architecture that leverages Hortonworks Big Data solutions and Machine Learning models to power up multiple Blockchains, which improves operational efficiency, cuts down costs and enables building strategic partnerships.
https://hortonworks.com/webinar/blockchain-with-machine-learning-powered-by-big-data-trimble-transportation-enterprise/
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense - Hortonworks
For years, the healthcare industry has had problems of data scarcity and latency. Clearsense solved the problem by building an open-source Hortonworks Data Platform (HDP) solution while providing decades' worth of clinical expertise. Clearsense is delivering smart, real-time streaming data to its healthcare customers, enabling mission-critical data to feed clinical decisions.
https://hortonworks.com/webinar/delivering-smart-real-time-streaming-data-healthcare-customers-clearsense/
Making Enterprise Big Data Small with Ease - Hortonworks
Every division in an organization builds its own database to keep track of its business. When the organization becomes big, those individual databases grow as well. The data in each database can become siloed, with no visibility into the data in the other databases.
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Driving Digital Transformation Through Global Data Management - Hortonworks
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experiences and create new products, yet these projects often stall in the ideation phase due to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategy to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features - Hortonworks
Hortonworks DataFlow (HDF) is the complete solution that addresses the most complex streaming architectures of today's enterprises. More than 20 billion IoT devices are active on the planet today, and thousands of use cases across IIoT, healthcare and manufacturing warrant capturing data-in-motion and delivering actionable intelligence right NOW. "Data decay" happens in a matter of seconds in today's digital enterprises.
To meet all the needs of such fast-moving businesses, we have made significant enhancements and new streaming features in HDF 3.1.
https://hortonworks.com/webinar/series-hdf-3-1-technical-deep-dive-new-streaming-features/
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A... - Hortonworks
Join the Hortonworks product team as they introduce HDF 3.1 and the core components for a modern data architecture to support stream processing and analytics.
You will learn about the three main themes that HDF addresses:
Developer productivity
Operational efficiency
Platform interoperability
https://hortonworks.com/webinar/series-hdf-3-1-redefining-data-motion-modern-data-architectures/
Unlock Value from Big Data with Apache NiFi and Streaming CDC - Hortonworks
Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. It provides an end-to-end platform that can collect, curate, analyze, and act on data in real time, on-premises or in the cloud, with a drag-and-drop visual interface. It's being used across industries on large amounts of data that had been stored in isolation, which made collaboration and analysis difficult.
Join industry experts from Hortonworks and Attunity as they explain how Apache NiFi and streaming CDC technology provides a distributed, resilient platform for unlocking the value of data in new ways.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides from Nordic Testing Days, June 6, 2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
The New Frontiers of AI in RPA with UiPath Autopilot™ - UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of how Autopilot is used in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
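As a taste of the Python binding the webinar builds on, here is a minimal pypowsybl sketch: it loads the bundled IEEE 14-bus test network, runs an AC power flow, and reads bus voltages back as a pandas DataFrame. The exact steps of the workshop notebook are not reproduced here.

```python
# A minimal sketch of the PowSyBl Python binding (pypowsybl); the workshop
# notebook's exact steps are assumptions, this just shows the basic flow.
import pypowsybl as pp

# Load the bundled IEEE 14-bus test network instead of building one by hand.
network = pp.network.create_ieee14()

# Run an AC power flow and inspect convergence of the main component.
results = pp.loadflow.run_ac(network)
print(results[0].status)

# Network data is exposed as pandas DataFrames: inspect bus voltages.
print(network.get_buses()[["v_mag", "v_angle"]].head())
```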
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined tools from two critical Linux packages -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean, optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) 2022.
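DIAR's own algorithm is not spelled out in this description, but the underlying idea of discarding behaviour-neutral bytes can be illustrated with a generic, afl-tmin-style trimmer: repeatedly drop byte ranges from a seed and keep each removal whenever an observable behaviour fingerprint stays unchanged. The exit-code/stdout oracle below is a simplification of the coverage signals a real fuzzer would use.

```python
# Not DIAR itself: a generic afl-tmin-style seed trimmer illustrating the
# idea of dropping bytes that don't affect observable behaviour.
import os, subprocess, tempfile

def behaviour(target, data):
    """Run target on data; return a comparable behaviour fingerprint."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        proc = subprocess.run([target, path], capture_output=True, timeout=5)
        return (proc.returncode, proc.stdout)
    finally:
        os.unlink(path)

def trim_seed(target, seed, chunk=64):
    baseline = behaviour(target, seed)
    while chunk >= 1:
        i = 0
        while i < len(seed):
            candidate = seed[:i] + seed[i + chunk:]
            if behaviour(target, candidate) == baseline:
                seed = candidate          # bytes were uninteresting: drop them
            else:
                i += chunk                # bytes mattered: keep, move on
        chunk //= 2
    return seed

# Hypothetical usage:
# trimmed = trim_seed("/usr/bin/xmllint", open("seed.xml", "rb").read())
```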
The Metaverse and AI: how can decision-makers harness the Metaverse for their... - Jen Stirrup
The Metaverse was popularized in science fiction, and it is now moving closer to being part of our daily lives through social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
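As background for the active-learning theme, the sketch below shows generic uncertainty sampling: train on a small labeled pool, then repeatedly query the examples the model is least sure about. It uses synthetic data and scikit-learn, and it is an illustration of the concept only, not UiPath's implementation.

```python
# Generic uncertainty-sampling active learning on synthetic data; this
# illustrates the concept only and is not UiPath's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # synthetic ground truth

# Start from a tiny labeled pool containing both classes.
labeled = list(np.flatnonzero(y == 0)[:5]) + list(np.flatnonzero(y == 1)[:5])
unlabeled = [i for i in range(1000) if i not in set(labeled)]

for rnd in range(5):
    model = LogisticRegression().fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[unlabeled])[:, 1]
    # Query the 20 examples the model is least certain about (prob near 0.5).
    picks = [unlabeled[i] for i in np.argsort(np.abs(probs - 0.5))[:20]]
    labeled += picks                               # a human would label these
    unlabeled = [i for i in unlabeled if i not in picks]
    print(f"round {rnd}: accuracy {model.score(X, y):.3f}")
```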
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... - James Anderson
Effective Application Security in the Software Delivery Lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) involves many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
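To make the DBOM idea concrete, here is a hedged sketch that records what was deployed, where, and with which content digest, emitted as JSON. The schema, service name, and artifact path are illustrative assumptions, not a standard format.

```python
# A hedged sketch of capturing a minimal deployment bill of materials
# (DBOM). The schema and names here are illustrative, not a standard.
import datetime, hashlib, json, pathlib

def artifact_record(path):
    """Fingerprint one deployed artifact by name, digest, and size."""
    data = pathlib.Path(path).read_bytes()
    return {
        "name": pathlib.Path(path).name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }

pathlib.Path("app.jar").write_bytes(b"demo artifact")   # stand-in for a real build

dbom = {
    "deployment": "payments-service",                   # hypothetical service
    "environment": "production",
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "artifacts": [artifact_record("app.jar")],
}
print(json.dumps(dbom, indent=2))
```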
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
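The "skipping" intuition behind SAFA can be illustrated very loosely, and with none of Reef's zero-knowledge machinery, by contrasting a matcher that examines every position of a document with one that jumps straight to candidate occurrences of the relevant substring:

```python
# A loose, non-cryptographic illustration of SAFA's "skipping" idea for a
# pattern like ".*needle.*"; none of Reef's proof machinery appears here.
def steps_naive(doc: str, needle: str) -> int:
    """Character-by-character scan: count one step per position examined."""
    steps = 0
    for i in range(len(doc) - len(needle) + 1):
        steps += 1
        if doc[i:i + len(needle)] == needle:
            break
    return steps

def steps_skipping(doc: str, needle: str) -> int:
    """Jump straight to the first candidate occurrence: one step total."""
    return 1 if doc.find(needle) != -1 else 0

doc = "x" * 1_000_000 + "needle"
print(steps_naive(doc, "needle"))     # 1,000,001 positions examined
print(steps_skipping(doc, "needle"))  # 1 jump
```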
Enhancing Performance with Globus and the Science DMZ - Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
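For a sense of what driving such transfers looks like in code, the sketch below submits a recursive directory transfer with the Globus Python SDK (globus-sdk). The client ID, endpoint UUIDs, and paths are placeholders, and the login step uses the SDK's standard native-app flow.

```python
# A minimal sketch of submitting a transfer with the Globus Python SDK.
# CLIENT_ID, the endpoint UUIDs, and the paths are all placeholders.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"     # placeholder
SRC_ENDPOINT = "SRC-ENDPOINT-UUID"          # placeholder
DST_ENDPOINT = "DST-ENDPOINT-UUID"          # placeholder

# Standard native-app login flow: visit the URL, paste the code back.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_tokens["access_token"])
)

# Batch a recursive directory copy into a single asynchronous task.
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT, label="dmz-demo")
tdata.add_item("/data/experiment/", "/ingest/experiment/", recursive=True)
task = tc.submit_transfer(tdata)
print("Task ID:", task["task_id"])
```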