Garbage collection is the most famous (or infamous) JVM mechanism, and it dates back to Java 1.0. Every Java developer knows about its existence, yet most of the time we wish we could ignore its behavior and assume it works perfectly. Unfortunately this is not the case, and if you ignore it, GC may hit you really hard... in production. Furthermore, the information that you find on the web is often misleading. In this event we will try to demystify some of the misconceptions around GC by understanding how different GC mechanisms work and how to make the right decisions in order to make them work for you.
Linux Performance Analysis: New Tools and Old Secrets Brendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
Storage Capacity Management on Multi-tenant Kafka Cluster with Nurettin Omeroglu HostedbyConfluent
"I will be presenting how we do smart, automated capacity management on multi-tenant Kafka clusters at Booking.com. It was a long journey. In this end-to-end story, I will present what the issues were at the beginning, how we came up with a plan, designed and implemented it, and applied it smoothly to our existing clusters, and how clients can now monitor their usage and even get alerted before their reserved capacity is reached. What were the challenges and our learnings? What is next?
Why? At Booking.com, the infra team manages 60 different Kafka clusters with hundreds of topics in each. Some clusters run with a hundred brokers. As there are hundreds of Kafka clients from tens of different departments, it is highly likely that some of the clients will start abusing the cluster. Especially during peak times, when retention is set via retention.ms, or when the underlying message size changes, it is hard to predict the total occupied storage. Finding the relevant clients, deciding which data to discard, and dealing with so many unknowns in a short period of time can be a hassle. These are also not fun activities, just toil for the team.
What? To avoid such boring issues, the team chose to build a smart mechanism and put quotas in place. It freed up time for developing new features instead of chasing people to resolve collisions. You can think of it as an extension to the built-in producer/consumer throttling rate limits provided by Apache Kafka, but it is much more than that. Several components will be explained during the presentation. One of them is our custom-built control plane, which manages the communication between clients and servers and automates many things.
Another is the custom policies that we plugged in on the Kafka side to validate, on the server side, any configuration that is attempted (including malicious configuration). The talk guarantees learning and shows examples of Kafka-at-scale problems at Booking.com."
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil... HostedbyConfluent
"Just as the Apache Kafka Brokers provide JMX metrics to monitor your cluster's health, Kafka Streams provides a rich set of metrics for monitoring your application's health and performance. The metrics to observe for a given use-case of Kafka Streams will vary significantly from application to application. Learning how to build and customize monitoring of those applications will help you maintain a healthy Kafka Streams ecosystem.
Takeaways
* An analysis and overview of the provided metrics, including the new end-to-end metrics of Kafka Streams 2.7.
* See how to extract metrics from your application using existing JMX tooling.
* Walk through how to build a dashboard for observing those metrics.
* Explore options for adding additional JMX resources and Kafka Streams metrics to your application.
* Verify that you built your dashboard correctly by creating a data control set to validate it.
* Go beyond what you can collect from the Kafka Streams metrics."
ORC files were originally introduced in Hive, but have now migrated to an independent Apache project. This has sped up the development of ORC and simplified integrating ORC into other projects, such as Hadoop, Spark, Presto, and NiFi. There are also many new tools that are built on top of ORC, such as Hive’s ACID transactions and LLAP, which provides incredibly fast reads for your hot data. LLAP also provides strong security guarantees that allow each user to only see the rows and columns that they have permission for.
This talk will discuss the details of the ORC and Parquet formats and the relevant tradeoffs. In particular, it will discuss how to format your data and which options to use to maximize your read performance, including when and how to use ORC’s schema evolution, bloom filters, and predicate pushdown. It will also show you how to use the tools to translate ORC files into human-readable formats, such as JSON, and display the rich metadata from the file, including the type in the file and the min, max, and count for each column.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU-based on-heap cache, but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
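To make the idea of an LRU (least recently used) cache concrete, here is a minimal sketch in Python. It is not HBase's implementation: the class name `LRUCache` and the entry-count capacity are assumptions chosen for illustration, whereas a real block cache would typically be bounded by bytes.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative LRU cache bounded by number of entries (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("block-a", 1)
cache.put("block-b", 2)
cache.get("block-a")     # touch block-a so block-b becomes the oldest
cache.put("block-c", 3)  # exceeds capacity, so block-b is evicted
```

Because the capacity is fixed, everything beyond it is evicted, which is the limitation the abstract describes when the cache has to fit inside the Java heap.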
Communication between microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, and service outages. We will walk through stability patterns like timeouts, circuit breakers, and bulkheads, and discuss how they improve the stability of microservices.
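As a rough illustration of the circuit breaker pattern mentioned above, the following Python sketch fails fast once a downstream call has failed too often, then lets a trial call through after a cool-down. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are assumptions made for this example and are not taken from the talk or from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative circuit breaker (hypothetical names and defaults)."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call as `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands for any function that may raise. While the circuit is open the caller gets `CircuitOpenError` immediately instead of waiting on a service that is already struggling, which is how the pattern limits cascading failures.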
Kafka on ZFS: Better Living Through Filesystems confluent
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution Karan Singh
In this presentation, I explain how Ceph Object Storage performance can be improved drastically, together with some object storage best practices, recommendations, and tips. I also cover the Ceph Shared Data Lake, which is getting very popular.
One sink to rule them all: Introducing the new Async Sink Flink Forward
Flink Forward San Francisco 2022.
Next time you want to integrate with a new destination for a demo, concept or production application, the Async Sink framework will bootstrap development, allowing you to move quickly without compromise. In Flink 1.15 we introduced the Async Sink base (FLIP-171), with the goal to encapsulate common logic and allow developers to focus on the key integration code. The new framework handles things like request batching, buffering records, applying backpressure, retry strategies, and at least once semantics. It allows you to focus on your business logic, rather than spending time integrating with your downstream consumers. During the session we will dive deep into the internals to uncover how it works, why it was designed this way, and how to use it. We will code up a new sink from scratch and demonstrate how to quickly push data to a destination. At the end of this talk you will be ready to start implementing your own Flink sink using the new Async Sink framework.
by Steffen Hausmann & Danny Cranmer
Reactive Microservices with Spring 5: WebFlux Trayan Iliev
On November 27 Trayan Iliev from IPT presented “Reactive microservices with Spring 5: WebFlux” @Dev.bg in Betahaus Sofia. IPT – Intellectual Products & Technologies has been organizing Java & JavaScript trainings since 2003.
Spring 5 introduces a new model for end-to-end functional and reactive web service programming with Spring 5 WebFlux, Spring Data & Spring Boot. The main topics include:
– Introduction to reactive programming, Reactive Streams specification, and project Reactor (as WebFlux infrastructure)
– REST services with WebFlux – comparison between annotation-based and functional reactive programming approaches for building.
– Router, handler and filter functions
– Using reactive repositories and reactive database access with Spring Data. Building end-to-end non-blocking reactive web services using Netty-based web runtime
– Reactive WebClients and integration testing. Reactive WebSocket support
– Realtime event streaming to WebClients using JSON Streams, and to JS client using SSE.
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva... Monica Beckwith
Learn what you need to know to experience nirvana in the evaluation of G1 GC, whether you are migrating from Parallel GC to G1 GC or from CMS GC to G1 GC.
You also get a walk-through of some case study data.
Storage tiering and erasure coding in Ceph (SCaLE13x) Sage Weil
Ceph is designed around the assumption that all components of the system (disks, hosts, networks) can fail, and has traditionally leveraged replication to provide data durability and reliability. The CRUSH placement algorithm is used to allow failure domains to be defined across hosts, racks, rows, or datacenters, depending on the deployment scale and requirements.
Recent releases have added support for erasure coding, which can provide much higher data durability and lower storage overheads. However, in practice erasure codes have different performance characteristics than traditional replication and, under some workloads, come at some expense. At the same time, we have introduced a storage tiering infrastructure and cache pools that allow alternate hardware backends (like high-end flash) to be leveraged for active data sets while cold data are transparently migrated to slower backends. The combination of these two features enables a surprisingly broad range of new applications and deployment configurations.
This talk will cover a few Ceph fundamentals, discuss the new tiering and erasure coding features, and then discuss a variety of ways that the new capabilities can be leveraged.
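The storage overhead argument for erasure coding can be checked with a little arithmetic. The sketch below compares raw bytes stored per usable byte for full replication and for an erasure code with k data chunks and m coding chunks. The specific values (3 replicas, k=4, m=2) are assumptions picked for illustration, not figures from the talk.

```python
from fractions import Fraction


def replication_overhead(copies):
    """Raw bytes stored per usable byte with `copies` full replicas."""
    return Fraction(copies, 1)


def erasure_overhead(k, m):
    """Raw bytes stored per usable byte with k data and m coding chunks."""
    return Fraction(k + m, k)


# Both layouts below survive the loss of any two copies or chunks.
for label, overhead in [
    ("3x replication", replication_overhead(3)),
    ("erasure code k=4, m=2", erasure_overhead(4, 2)),
]:
    print(f"{label}: {float(overhead):.2f}x raw storage per usable byte")
```

With these example parameters the erasure-coded layout needs half the raw capacity of triple replication for the same number of tolerated losses. The price is that reads and recovery must touch several chunks, which is the performance tradeoff the abstract refers to.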
PostgreSQL is a very popular and feature-rich DBMS. At the same time, PostgreSQL has a set of annoying wicked problems, which haven't been resolved in decades. Miraculously, with just a small patch to PostgreSQL core extending this API, it appears possible to solve wicked PostgreSQL problems in a new engine made within an extension.
Apache Storm 0.9 basic training - Verisign Michael Noll
Apache Storm 0.9 basic training (130 slides) covering:
1. Introducing Storm: history, Storm adoption in the industry, why Storm
2. Storm core concepts: topology, data model, spouts and bolts, groupings, parallelism
3. Operating Storm: architecture, hardware specs, deploying, monitoring
4. Developing Storm apps: Hello World, creating a bolt, creating a topology, running a topology, integrating Storm and Kafka, testing, data serialization in Storm, example apps, performance and scalability tuning
5. Playing with Storm using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/09/15/apache-storm-training-deck-and-tutorial/
Many thanks to the Twitter Engineering team (the creators of Storm) and the Apache Storm open source community!
Bobby Evans and Tom Graves, the engineering leads for Spark and Storm development at Yahoo, will talk about how these technologies are used on Yahoo's grids and the reasons to use one or the other.
Bobby Evans is the low latency data processing architect at Yahoo. He is a PMC member on many Apache projects including Storm, Hadoop, Spark, and Tez. His team is responsible for delivering Storm as a service to all of Yahoo and maintaining Spark on YARN for Yahoo (although Tom really does most of that work).
Tom Graves is a Senior Software Engineer on the Platform team at Yahoo. He is an Apache PMC member on Hadoop, Spark, and Tez. His team is responsible for delivering and maintaining Spark on YARN for Yahoo.
This tutorial covers advanced consumer topics like custom deserializers, ConsumerRebalanceListener to rewind to a certain offset, manual assignment of partitions to implement a "priority queue", “at least once” message delivery semantics Consumer Java example, “at most once” message delivery semantics Consumer Java example, “exactly once” message delivery semantics Consumer Java example, and a lot more.
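The difference between "at most once" and "at least once" delivery comes down to whether the consumer commits its offset before or after processing a record. The following pure-Python simulation sketches that ordering with a simulated crash. It does not use the Kafka client API, and the function names `consume` and `run_with_one_crash` are hypothetical.

```python
def consume(log, committed, commit_first, crash_at=None):
    """Process records from `committed` onward; return (processed, committed).

    commit_first=True  -> commit the offset before processing (at most once)
    commit_first=False -> commit the offset after processing (at least once)
    crash_at is the offset at which a simulated crash happens between the
    two steps (after the first one, before the second).
    """
    processed = []
    offset = committed
    while offset < len(log):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return processed, committed  # crashed before processing
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at:
                return processed, committed  # crashed before committing
            committed = offset + 1
        offset += 1
    return processed, committed


def run_with_one_crash(log, commit_first, crash_at):
    """Run once with a crash, then restart from the committed offset."""
    first, committed = consume(log, 0, commit_first, crash_at)
    second, _ = consume(log, committed, commit_first)
    return first + second


records = ["a", "b", "c"]
at_most_once = run_with_one_crash(records, commit_first=True, crash_at=1)
at_least_once = run_with_one_crash(records, commit_first=False, crash_at=1)
```

In this simulation the commit-first run never processes record "b", while the process-first run handles "b" twice. Exactly-once processing needs more than reordering these two steps, for example storing the offset atomically together with the processing result.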
UKOUG version of a presentation trying to establish the sensible limits of parallelism on a couple of hardware configurations. Detailed white paper is at http://oracledoug.com/px_slaves.pdf
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop Databricks
Tech-talk at Bay Area Apache Spark Meetup.
Apache Spark 2.0 will ship with the second generation Tungsten engine. Building upon ideas from modern compilers and MPP databases, and applying them to data processing queries, we have started an ongoing effort to dramatically improve Spark’s performance and bring execution closer to bare metal. In this talk, we’ll take a deep dive into Apache Spark 2.0’s execution engine and discuss a number of architectural changes around whole-stage code generation/vectorization that have been instrumental in improving CPU efficiency and gaining performance.
Stress Testing at Twitter: a tale of New Year Eves Herval Freire
Failure testing is a fundamental piece of Twitter’s reliability engineering. Over the years, we developed a rich toolchain that allows us to detect and fix scalability problems long before they happen. In this talk, we’ll cover some of the strategies we employ and discuss our always evolving approach to API stress testing and its “unit test” equivalent, redline testing.
introduction to data processing using Hadoop and Pig Ricardo Varela
In this talk we introduce data processing with big data and review the basic concepts of MapReduce programming with Hadoop. We also comment on the use of Pig to simplify the development of data processing applications.
YDN Tuesdays are geek meetups organized on the first Tuesday of each month by YDN in London.
With tens of thousands of Java servers running in production in the enterprise, Java has become a language of choice for building production systems. If our machines are to exhibit acceptable performance, they require regular tuning. This talk takes a detailed look at techniques for tuning a Java server.
Beyond the RTOS: A Better Way to Design Real-Time Embedded Software Miro Samek
Embedded software developers from different industries are independently re-discovering patterns for building concurrent software that is safer, more responsive and easier to understand than naked threads of a Real-Time Operating System (RTOS). These best practices universally favor event-driven, asynchronous, non-blocking, encapsulated active objects with state machines instead of naked, blocking RTOS threads. This presentation explains the concepts related to this increasingly popular "reactive approach", and specifically how they apply to real-time embedded systems.
FPGA based 10G Performance Tester for HW OpenFlow Switch Yutaka Yasuda
SDN operators need to measure the performance of their OpenFlow (OF) hardware switches on site, because latency can differ by a factor of 1000 depending on the specified flow entry. An ASIC can forward in several μs, but the software (CPU) path may take milliseconds.
To protect yourself from an unexpected performance plunge, monitor the health of your switches on site.
GraphRAG is All You need? LLM & Knowledge Graph Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Neuro-symbolic is not enough, we need neuro-*semantic* Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need in order to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
Epistemic Interaction - tuning interfaces to provide information for AI support Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
18. Little’s Law
L=λW
The long-term average number of customers in a stable system L
is equal to the long-term average effective arrival rate, λ, multiplied
by the average time a customer spends in the system, W; or
expressed algebraically: L = λW.
http://en.wikipedia.org/wiki/Little's_law
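As a rough illustration of the formula, the sketch below applies it to a message stream. The class and method names are hypothetical, and any rates or latencies fed into it are made-up example figures rather than measurements.

```java
// Little's Law: L = lambda * W.
// Hypothetical helper; the inputs are whatever rates and latencies you have measured.
final class LittlesLaw {
    private LittlesLaw() {}

    /** L: average number of items in the system, from arrival rate (items/s) and time in system (s). */
    static double itemsInSystem(double arrivalRatePerSec, double avgTimeInSystemSec) {
        return arrivalRatePerSec * avgTimeInSystemSec;
    }

    /** W: average time in the system (s), from the rearranged form W = L / lambda. */
    static double timeInSystemSec(double itemsInSystem, double arrivalRatePerSec) {
        if (arrivalRatePerSec <= 0) {
            throw new IllegalArgumentException("arrival rate must be positive");
        }
        return itemsInSystem / arrivalRatePerSec;
    }
}
```

For example, LittlesLaw.itemsInSystem(4000, 0.25) returns 1000.0: at 4,000 messages per second and an average of 250 ms in the system, about 1,000 messages are in flight at any moment.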
29. Externalize Configuration
Hard-coded values require
recompilation/repackaging.
conf.setNumWorkers(3);
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Values from external config.
No repackaging!
conf.setNumWorkers(props.get("num.workers"));
builder.setSpout("spout", new RandomSentenceSpout(), props.get("spout.parallelism"));
builder.setBolt("split", new SplitSentence(), props.get("split.parallelism")).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), props.get("count.parallelism")).fieldsGrouping("split", new Fields("word"));
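One way the external values might be loaded is sketched below, assuming they live in a standard Java properties file. The slide does not say what `props` is, so the `TopologySettings` class, its method names, and the default of 1 are all hypothetical; the key names follow the slide's pattern, using the standard spelling of "parallelism".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: read worker and parallelism settings from a properties source
// instead of hard-coding them. Names and defaults are assumptions.
final class TopologySettings {
    final int numWorkers;
    final int spoutParallelism;
    final int splitParallelism;
    final int countParallelism;

    private TopologySettings(Properties props) {
        numWorkers = intValue(props, "num.workers", 1);
        spoutParallelism = intValue(props, "spout.parallelism", 1);
        splitParallelism = intValue(props, "split.parallelism", 1);
        countParallelism = intValue(props, "count.parallelism", 1);
    }

    /** Loads settings from any Reader, e.g. a FileReader over an external file. */
    static TopologySettings load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return new TopologySettings(props);
    }

    /** Convenience overload for properties text already held in memory. */
    static TopologySettings parse(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Missing keys fall back to the default; malformed numbers throw NumberFormatException.
    private static int intValue(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}
```

The loaded fields would then take the place of the literals, for instance conf.setNumWorkers(settings.numWorkers), so a change in the file needs a topology resubmit but no rebuild.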
36. Parallelism == Manifold
Take input from one big pipe and
distribute it to many smaller pipes
The bigger the size difference, the
more parallelism you will need
41. Sizeup — Fire
What are my water
sources? What GPM
can they support?
How many lines (hoses)
do I need?
How much water will I
need to flow to put this
fire out?
42. Sizeup — Storm
What are my input
sources?
At what rate do they
deliver messages?
What size are the
messages?
What's my slowest data
sink?
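The size-up questions above can be turned into back-of-envelope arithmetic. The sketch below is one hypothetical way to do that: it assumes the slowest sink scales roughly linearly with the number of parallel writers, which will not hold for every data store, and all names are made up for illustration.

```java
// Back-of-envelope sizing: how much data arrives, and how many parallel
// writers the slowest sink needs to keep up. Linear scaling is an assumption.
final class Sizeup {
    private Sizeup() {}

    /** Inbound volume in bytes per second (rate * average message size). */
    static long inboundBytesPerSec(long messagesPerSec, long avgMessageBytes) {
        return Math.multiplyExact(messagesPerSec, avgMessageBytes);
    }

    /** Minimum writers needed for the sink to absorb the input rate, rounded up. */
    static long writersNeeded(long messagesPerSec, long writesPerSecPerWriter) {
        if (writesPerSecPerWriter <= 0) {
            throw new IllegalArgumentException("per-writer rate must be positive");
        }
        return (messagesPerSec + writesPerSecPerWriter - 1) / writesPerSecPerWriter;
    }
}
```

With illustrative figures of 50,000 messages per second at 1,024 bytes each, inboundBytesPerSec gives 51,200,000 bytes per second; if one writer sustains 4,000 writes per second, writersNeeded returns 13.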
53. Example
10 Worker Nodes
16 Cores / Machine
(10 * 16) - 10 = 150 “Parallelism Units” available
54. Example
10 Worker Nodes
16 Cores / Machine
(10 * 16) - 10 = 150 “Parallelism Units” available (* 10-100 if I/O bound)
Distribute this among the tasks in the topology. Higher for slow tasks, lower for fast tasks.
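The arithmetic on the slide, and one possible way of spreading the result across components, can be sketched as follows. The proportional split by weight is an assumption made for illustration (the slide only says slow tasks get more), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the "parallelism units" budget and a simple weighted split.
final class ParallelismBudget {
    private ParallelismBudget() {}

    /** (nodes * coresPerNode) - nodes, as computed on the slide. */
    static int units(int workerNodes, int coresPerNode) {
        return workerNodes * coresPerNode - workerNodes;
    }

    /**
     * Splits the budget in proportion to per-component weights (higher weight
     * for slower tasks). Uses integer division with a floor of 1 per component,
     * so a few units may be left unassigned; assumes the budget is at least
     * as large as the number of components.
     */
    static Map<String, Integer> distribute(int budget, Map<String, Integer> weights) {
        int totalWeight = 0;
        for (int w : weights.values()) {
            if (w <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            totalWeight += w;
        }
        if (totalWeight == 0) {
            throw new IllegalArgumentException("no components given");
        }
        Map<String, Integer> hints = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            hints.put(e.getKey(), Math.max(1, budget * e.getValue() / totalWeight));
        }
        return hints;
    }
}
```

ParallelismBudget.units(10, 16) yields 150, matching the slide. With hypothetical weights of 1, 2 and 3 for "spout", "split" and "count", distribute(150, weights) assigns 25, 50 and 75.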
60. Key Settings
topology.max.spout.pending
Spout/Bolt API: Controls how many tuples are in-flight (not ack’ed)
Trident API: Controls how many batches are in flight (not committed)
63. Key Settings
topology.message.timeout.secs
Controls how long a tuple tree (Spout/Bolt API) or batch (Trident API) has to
complete processing before Storm considers it timed out and fails it.
Default value is 30 seconds.
64. Key Settings
topology.message.timeout.secs
Q: “Why am I getting tuple/batch failures for no apparent reason?”
A: Timeouts due to a bottleneck.
Solution: Look at the “Complete Latency” metric. Increase timeout and/or
increase component parallelism to address the bottleneck.
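A minimal sketch of how the two settings and the latency check might look in code. It uses a plain map with the key names from the slides so that it runs without Storm on the classpath; the `TimeoutTuning` class, its methods, and the idea of flagging latency above a fixed fraction of the timeout are assumptions for illustration, not part of Storm.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: hold the two key settings in a map and flag a topology whose
// "Complete Latency" is getting close to the message timeout.
final class TimeoutTuning {
    private TimeoutTuning() {}

    /** Builds a settings map using the two configuration keys named on the slides. */
    static Map<String, Object> settings(int maxSpoutPending, int messageTimeoutSecs) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.max.spout.pending", maxSpoutPending);
        conf.put("topology.message.timeout.secs", messageTimeoutSecs);
        return conf;
    }

    /**
     * True when the observed complete latency exceeds the given fraction of
     * the timeout. The fraction is an arbitrary threshold chosen by the caller.
     */
    static boolean nearTimeout(double completeLatencyMs, int messageTimeoutSecs, double fraction) {
        return completeLatencyMs > messageTimeoutSecs * 1000.0 * fraction;
    }
}
```

With the default 30 second timeout and a threshold of 0.8, a complete latency of 27,000 ms is flagged while 5,000 ms is not. A flagged topology is a candidate for a longer timeout, more parallelism on the bottleneck component, or both.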
69. Nimbus
Generally light load
Can collocate Storm UI service
m1.xlarge (or equivalent) should suffice
Save the big metal for Supervisor/Worker machines…
78. ZooKeeper Considerations
Use dedicated machines, preferably
bare-metal if an option
Start with 3 node ensemble
(can tolerate 1 node loss)
I/O is ZooKeeper’s main bottleneck
Dedicated disk for ZK storage
SSDs greatly improve performance
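The "3 nodes tolerate 1 loss" figure follows from ZooKeeper needing a majority of the ensemble to stay available. The helper below is a hypothetical sketch of that arithmetic, not a ZooKeeper API.

```java
// Majority-quorum arithmetic for a ZooKeeper ensemble of a given size.
final class ZkEnsemble {
    private ZkEnsemble() {}

    /** Number of servers that must be up to form a majority. */
    static int quorumSize(int ensembleSize) {
        requirePositive(ensembleSize);
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can be lost while a majority remains. */
    static int tolerableFailures(int ensembleSize) {
        requirePositive(ensembleSize);
        return (ensembleSize - 1) / 2;
    }

    private static void requirePositive(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble size must be at least 1");
        }
    }
}
```

tolerableFailures(3) is 1 and tolerableFailures(5) is 2. An ensemble of 4 still tolerates only 1 loss, which is why odd sizes are the usual choice.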
79. Recap
Know/track your latencies and code appropriately
Externalize configuration
Scaling is a factor of balancing the I/O and CPU requirements of your use
case
Dev + DevOps + Ops coordination and collaboration is essential