Akka is the platform for the next generation of event-driven, scalable and fault-tolerant architectures on the JVM.
We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it's because we are using the wrong tools and the wrong level of abstraction.
Akka is here to change that.
Using the Actor Model together with Software Transactional Memory, we raise the abstraction level and provide a better platform to build correct concurrent and scalable applications.
For fault tolerance we adopt the "Let it crash" / "Embrace failure" model, which has been used with great success in the telecom industry to build self-healing applications and systems that never stop.
Actors also provide the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.
Akka is Open Source and available under the Apache 2 License.
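The actor idea behind this can be sketched with nothing but the JDK (this is a conceptual illustration, not Akka's actual API): each actor owns a private mailbox and processes one message at a time, so its internal state needs no locks even under heavy concurrency.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal actor sketch using only the JDK (not Akka's API): a private mailbox,
// a single processing loop, and state that is touched by exactly one thread.
public class CounterActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0;                        // no locks needed: single-threaded access
    private final Thread loop = new Thread(this::run);

    public void start() { loop.start(); }

    // asynchronous, non-blocking send
    public void tell(String msg) { mailbox.offer(msg); }

    private void run() {
        try {
            while (true) {
                String msg = mailbox.take();      // one message at a time
                if (msg.equals("stop")) return;
                if (msg.equals("inc")) count++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // wait for "stop" to drain the mailbox, then read the final state
    public int awaitCount() {
        try { loop.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return count;
    }

    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        actor.start();
        for (int i = 0; i < 1000; i++) actor.tell("inc");
        actor.tell("stop");
        System.out.println(actor.awaitCount());   // 1000, with no locks in user code
    }
}
```

Real Akka adds supervision hierarchies on top of this loop, which is where the "Let it crash" model lives: a parent actor decides whether to restart, resume, or stop a failed child.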
Apache Kafka performance (latency) benchmark v0.3 (SANG WON PARK)
A performance test of how fast (with how low a latency) image data can be delivered using Apache Kafka.
The end goal is to use Kafka as the message queue that feeds large volumes of real-time video/image data into AI (ML/DL) models, so we wanted to verify how quickly images sent from devices such as drones or manufacturing-line equipment can be delivered to an AI model.
We therefore ran a simple test that sends images through Kafka,
and measured how much it reduces latency (compared with the HTTP protocol and raw sockets).
[Conclusions so far]
- Apache Kafka is a solution optimized for throughput when handling large volumes of requests.
- These results were obtained by tuning only a few producer options,
- so they are preliminary; improving Kafka's latency will likely require many more experiments.
- In short, single-request latency is clearly slower,
- but when comparing average latency over bulk processing, the average latency drops significantly.
Test Code : https://github.com/freepsw/kafka-latency-test
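The conclusion above (worse single-request latency, much better average latency in bulk) comes down to amortizing a fixed per-send overhead over a batch. A back-of-the-envelope sketch, where the function name and all numbers are illustrative and not taken from the benchmark:

```java
// Illustrative arithmetic only (numbers are made up, not from the benchmark):
// batching amortizes the fixed per-send overhead over many records.
public class AmortizedLatency {
    // fixed cost paid once per network round trip, plus a small per-record cost
    static double avgLatencyMs(int recordsPerBatch, double fixedOverheadMs, double perRecordMs) {
        double batchLatency = fixedOverheadMs + recordsPerBatch * perRecordMs;
        return batchLatency / recordsPerBatch;   // average latency per record
    }

    public static void main(String[] args) {
        double single = avgLatencyMs(1, 5.0, 0.05);    // one record per send: ~5.05 ms each
        double batched = avgLatencyMs(500, 5.0, 0.05); // 500 records per send: ~0.06 ms each
        System.out.println(single + " vs " + batched);
    }
}
```

This is the same trade-off the Kafka producer exposes through options like `batch.size` and `linger.ms`: waiting to fill a batch hurts the first record's latency but lowers the average across all records.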
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook on its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
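The canonical Kafka Streams example is a word count, whose DSL topology is conceptually stream → flatMapValues(split) → groupBy(word) → count(). Sketched below with plain `java.util.stream` as a stand-in, so it runs without a Kafka cluster or the kafka-streams dependency; the real DSL applies the same shape continuously to an unbounded stream:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Conceptual stand-in for the Kafka Streams word-count topology:
// stream -> flatMapValues(split) -> groupBy(word) -> count().
// Plain JDK streams are used here so the sketch runs with no Kafka dependency.
public class WordCountSketch {
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+"))) // flatMapValues
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(),              // groupBy
                         Collectors.counting()));                                 // count
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(List.of("hello kafka", "hello streams"));
        System.out.println(counts);   // hello=2, kafka=1, streams=1 (map order unspecified)
    }
}
```

The key difference in real Kafka Streams is that the counts live in a fault-tolerant state store backed by a changelog topic, and results update continuously as new records arrive.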
Apache Iceberg - A Table Format for Huge Analytic Datasets (Alluxio, Inc.)
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Huge Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Kafka Tiered Storage separates compute and data storage into two independently scalable layers. Uber's Kafka Improvement Proposal (KIP) #405 describes two-tiered storage, which is a major step towards cloud-native Kafka. It stores the most recent data locally and offloads older data to a remote storage service. Operationally, the benefit is faster routine cluster maintenance activities. At LinkedIn, Kafka tiered storage is strongly desired to reduce the cost of running Kafka in the Azure cloud environment. As KIP-405 does not dictate the implementation of the remote storage substrate, LinkedIn's choice for tiering Kafka in Azure deployments is the Azure Blob Service. This presentation will begin with the motivation behind LinkedIn's efforts to adopt Kafka Tiered Storage. Next, the architecture of KIP-405 will be discussed. Finally, the Remote Storage Manager for Azure Blobs, which is a work in progress, will be presented.
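The local/remote split described above can be sketched as a toy two-tier log (this is an illustration of the idea, not KIP-405's actual interfaces): recent segments stay on local disk, older closed segments are offloaded to a remote store, and reads transparently fall back to the remote tier.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the tiered-storage idea (not the real KIP-405 API): keep only
// the newest segments locally, offload older ones to a remote store such as
// blob storage, and serve reads from whichever tier holds the segment.
public class TieredLog {
    record Segment(long baseOffset, String data) {}

    private final List<Segment> local = new ArrayList<>();
    private final List<Segment> remote = new ArrayList<>();  // stand-in for blob storage
    private final int maxLocalSegments;

    TieredLog(int maxLocalSegments) { this.maxLocalSegments = maxLocalSegments; }

    void append(Segment s) {
        local.add(s);
        // offload the oldest local segments once the local tier exceeds its budget
        while (local.size() > maxLocalSegments) {
            remote.add(local.remove(0));
        }
    }

    // reads transparently fall back to the remote tier for old offsets
    Segment read(long baseOffset) {
        for (Segment s : local) if (s.baseOffset() == baseOffset) return s;
        for (Segment s : remote) if (s.baseOffset() == baseOffset) return s;
        return null;
    }

    int localCount() { return local.size(); }
    int remoteCount() { return remote.size(); }
}
```

The operational benefit follows directly from this shape: a broker replacement only needs to copy the small local tier, because the bulk of the history already lives in remote storage.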
Video: https://youtu.be/V5gaBE5CMwg?t=1387
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, and service outages. We will walk through stability patterns like timeouts, circuit breakers, and bulkheads, and discuss how they improve the stability of Microservices.
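The circuit breaker is the most mechanical of these patterns, so here is a minimal sketch (illustrative, not a production library): after N consecutive failures the breaker opens and fails fast, shielding callers from a struggling downstream service until a retry window passes.

```java
import java.util.function.Supplier;

// Minimal circuit-breaker sketch (illustrative, not a production library).
public class CircuitBreaker {
    enum State { CLOSED, OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int failureThreshold;
    private long openedAt = 0;
    private final long retryAfterMillis;

    CircuitBreaker(int failureThreshold, long retryAfterMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    <T> T call(Supplier<T> action, T fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < retryAfterMillis) {
                return fallback;                 // fail fast: don't touch the dependency
            }
            state = State.CLOSED;                // retry window elapsed: probe again
            failures = 0;
        }
        try {
            T result = action.get();
            failures = 0;                        // success resets the failure count
            return result;
        } catch (RuntimeException e) {
            if (++failures >= failureThreshold) {
                state = State.OPEN;              // trip the breaker
                openedAt = System.currentTimeMillis();
            }
            return fallback;
        }
    }

    State state() { return state; }
}
```

A fuller implementation adds a distinct half-open state and failure-rate windows, but the essential benefit is already visible: once open, the breaker turns slow, cascading failures into fast local fallbacks.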
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S... (Spark Summit)
What if you could get the simplicity, convenience, interoperability, and storage niceties of an old-fashioned CSV with the speed of a NoSQL database and the storage requirements of a gzipped file? Enter Parquet.
At The Weather Company, Parquet files are a quietly awesome and deeply integral part of our Spark-driven analytics workflow. Using Spark + Parquet, we’ve built a blazing fast, storage-efficient, query-efficient data lake and a suite of tools to accompany it.
We will give a technical overview of how Parquet works and how recent improvements from Tungsten enable SparkSQL to take advantage of this design to provide fast queries by overcoming two major bottlenecks of distributed analytics: communication costs (IO bound) and data decoding (CPU bound).
A Thorough Comparison of Delta Lake, Iceberg and Hudi (Databricks)
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with the Hive Metastore, these table formats are trying to solve problems that have stood in traditional data lakes for a long time, with declared features like ACID transactions, schema evolution, upsert, time travel, incremental consumption, etc.
Real-time Analytics with Trino and Apache Pinot (Xiang Fu)
Trino summit 2021:
Overview of the Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support with the power of Apache Pinot's real-time analytics, giving you the best of both worlds.
The Parquet Format and Performance Optimization Opportunities (Databricks)
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
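Two of the techniques named above, dictionary encoding and min/max (predicate-pushdown) skipping, are simple enough to illustrate in a few lines. This is a toy model of the ideas, not Parquet's actual on-disk layout:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of two Parquet ideas (not Parquet's actual on-disk format):
// dictionary encoding for a low-cardinality column, and min/max statistics
// that let a reader skip entire row groups.
public class ColumnarSketch {
    // Dictionary encoding: store each distinct value once, then small integer
    // codes per row; for a column like country codes this shrinks storage a lot.
    static int[] dictionaryEncode(List<String> column, List<String> dictOut) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        int[] codes = new int[column.size()];
        for (int i = 0; i < column.size(); i++) {
            codes[i] = dict.computeIfAbsent(column.get(i), k -> dict.size());
        }
        dictOut.addAll(dict.keySet());
        return codes;
    }

    // Min/max skipping: a row group whose [min, max] range cannot contain the
    // predicate value is never read from disk at all.
    static boolean canSkip(long groupMin, long groupMax, long predicateEquals) {
        return predicateEquals < groupMin || predicateEquals > groupMax;
    }
}
```

The 'many small files' problem mentioned in the abstract is the flip side of the same design: these statistics and dictionaries live per file and per row group, so thousands of tiny files mean thousands of footers to open before any skipping can pay off.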
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
This is the presentation I gave at JavaDay Kiev 2015 on the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level stuff, and can be used as an introduction to Apache Spark.
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap... (Flink Forward)
Flink Forward San Francisco 2022.
Being in the payments space, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on this in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems totalling over $640 Billion in payment volume annually. We use many components of Flink’s flexible DataStream API to perform aggregations and abstract away the complexities of stream processing from our downstreams. In this talk, we’ll walk through our experience from the very beginning to what we have in production today. We’ll share stories around the technical details and trade-offs we encountered along the way.
by Jeff Chao
Monitoring Apache Kafka
When you are running systems in production, clearly you want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka… what does “up and running” even mean?
Experienced Apache Kafka users know what is important to monitor, which alerts are critical and how to respond to them. They don’t just collect metrics - they go the extra mile and use additional tools to validate availability and performance on both the Kafka cluster and their entire data pipelines.
In this presentation, we'll discuss best practices of monitoring Apache Kafka. We'll look at which metrics are critical to alert on, which are useful in troubleshooting, and which may actually be misleading. We'll review a few "worst practices" - common mistakes that you should avoid. We'll then look at what metrics don't tell you - and how to cover those essential gaps.
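One metric that monitoring discussions like this consistently treat as critical is consumer lag. Conceptually it is just (log end offset) minus (committed offset) per partition; the toy computation below stands in for offsets you would actually fetch from Kafka's admin APIs or JMX metrics:

```java
import java.util.Map;

// Toy consumer-lag computation: lag per partition is the gap between the
// broker's log end offset and the consumer group's committed offset.
// In practice these offsets come from Kafka's admin APIs or JMX metrics.
public class LagCheck {
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long lag = 0;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long consumed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag += Math.max(0, e.getValue() - consumed);   // never count negative lag
        }
        return lag;
    }
}
```

The number alone can mislead, which is the talk's point about gaps: a lag of 10,000 messages is harmless on a high-throughput topic and alarming on a quiet one, so lag is best alerted on as a trend or a time delay rather than an absolute count.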
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka (Kai Wähner)
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, Paypal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track&trace, live betting, and much more.
Cloud Native Night August 2016, Munich: Talk by Julius Volz (@juliusvolz, Co-founder at Prometheus).
Join our Meetup: www.meetup.com/cloud-native-muc
Abstract: This talk is on monitoring dynamic cloud environments with Prometheus.
Stream Processing – Concepts and Frameworks (Guido Schmutz)
More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. It is one thing to collect these events at the velocity they arrive, without losing a single message. An Event Hub and a data flow engine can help here. It's another thing to do some (complex) analytics on the data. There is always the option to first store the data in a sink of choice and analyze it later. Storing even a high-volume event stream is feasible and not a challenge anymore. But this adds to the end-to-end latency, and it takes minutes if not hours to present results. If you need to react fast, you simply can't afford to first store the data. You need to process it directly on the data stream. This is called Stream Processing or Stream Analytics. In this talk I will present the important concepts a Stream Processing solution should support, and then dive into some of the most popular frameworks available on the market and how they compare.
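One concept every stream processing solution in this space supports is windowed aggregation. A tumbling time window is the simplest variant: each event is assigned to exactly one fixed-size window by truncating its timestamp, and an aggregate (here a count) is kept per window. A minimal sketch of that idea:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a tumbling time window, a core stream-processing concept:
// each event falls into exactly one fixed-size window, determined by
// truncating its timestamp to the window boundary.
public class TumblingWindow {
    private final long sizeMillis;
    private final Map<Long, Long> countsByWindowStart = new TreeMap<>();

    TumblingWindow(long sizeMillis) { this.sizeMillis = sizeMillis; }

    void onEvent(long timestampMillis) {
        long windowStart = (timestampMillis / sizeMillis) * sizeMillis; // truncate to boundary
        countsByWindowStart.merge(windowStart, 1L, Long::sum);
    }

    Map<Long, Long> counts() { return countsByWindowStart; }
}
```

Real frameworks layer the hard parts on top of this shape: event time versus processing time, out-of-order arrivals, watermarks that decide when a window is complete, and fault-tolerant state, which is where the frameworks compared in the talk differ most.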
Optimizing spark jobs through a true understanding of spark core. Learn: What is a partition? What is the difference between read/shuffle/write partitions? How to increase parallelism and decrease output files? Where does shuffle data go between stages? What is the "right" size for your spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn't adding nodes decrease my compute time?
Building a Streaming Microservice Architecture: with Apache Spark Structured ... (Databricks)
As we continue to push the boundaries of what is possible with respect to pipeline throughput and data serving tiers, new methodologies and techniques continue to emerge to handle larger and larger workloads.
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React... (Lightbend)
The Big Data industry emerged in response to the unprecedented sizes of data sets collected by Internet companies and the particular needs they had to store and use that data.
Today, the need to process that data more quickly is morphing Big Data architectures into Fast Data architectures. This session discusses the forces driving this trend and the most popular tools that have emerged to address particular design challenges:
Spark - For sophisticated processing of data streams, as well as traditional batch-mode processing.
Kafka - For durable and scalable ingestion and distribution of data streams.
Cassandra - For scalable, flexible persistence.
Reactive Platform: Lagom, Akka, and Play - For integration of other components and building microservices.
Mesos - For cluster resource management.
---
About the presenter:
Dean Wampler, Ph.D. is the Architect for Big Data Products and Services and a member of the office of the CTO at Lightbend. He is designing the product strategy and technical architecture for Lightbend's Spark on Mesos products and emerging streaming tools built around Spark and Lightbend’s ConductR and Akka products. Dean has written books on Scala, Functional Programming, and Hive for O'Reilly. He speaks at and co-organizes many industry conferences. He also organizes several Chicago-area user groups and contributes to many open-source projects, including Apache Spark. Dean has a Ph.D. in Physics from the University of Washington.
Introduction to Akka 2. Explains what Akka's actors are all about and how to utilize them to write scalable and fault-tolerant systems.
Talk given at JavaZone 2012.
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Lightbend
Things were easier when all our data used to be offline, analyzed overnight in batches. Now our data is online, in motion, and generated constantly. For architects, developers and their businesses, this means that there is an urgent need for tools and applications that can deliver real-time (or near real-time) streaming ETL capabilities.
In this session by Konrad Malawski, author, speaker and Senior Akka Engineer at Lightbend, you will learn how to build these streaming ETL pipelines with Akka Streams, Alpakka and Apache Kafka, and why they matter to enterprises that are increasingly turning to streaming Fast Data applications.
Everyone in the Scala world is using or looking into using Akka for low-latency, scalable, distributed or concurrent systems. I'd like to share my story of developing and productionizing multiple Akka apps, including low-latency ingestion and real-time processing systems, and Spark-based applications.
When does one use actors vs futures?
Can we use Akka with, or in place of, Storm?
How did we set up instrumentation and monitoring in production?
How does one use VisualVM to debug Akka apps in production?
What happens if the mailbox gets full?
What is our Akka stack like?
I will share best practices for building Akka and Scala apps, pitfalls and things we'd like to avoid, and a vision of where we would like to go for ideal Akka monitoring, instrumentation, and debugging facilities. Plus backpressure and at-least-once processing.
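The at-least-once processing mentioned at the end can be sketched as a small delivery buffer (concept only, not Akka's API; class and method names here are hypothetical): every message stays "in flight" until the consumer acknowledges it, and anything unacked is redelivered, so a message may arrive more than once but is never lost.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of at-least-once delivery (concept only, not Akka's API): keep each
// message buffered until it is acknowledged; unacked messages are redelivered,
// so duplicates are possible but loss is not.
public class AtLeastOnceBuffer {
    private final Map<Long, String> inFlight = new LinkedHashMap<>();
    private long nextId = 0;

    // deliver a message and remember it until the consumer acks it
    long send(String payload) {
        long id = nextId++;
        inFlight.put(id, payload);
        return id;
    }

    void ack(long id) { inFlight.remove(id); }

    // everything not yet acked goes out again (e.g. after a timeout or restart)
    List<String> redeliver() {
        return new ArrayList<>(inFlight.values());
    }
}
```

The practical consequence, and a common Akka pitfall, is that at-least-once consumers must be idempotent: because redelivery can duplicate a message, processing it twice has to be safe.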
We are drowning in complexity—can we do better? (Jonas Bonér)
Today’s vast cloud-native infrastructure ecosystem is excellent. Unfortunately, it has grown very complex and hard to navigate. What tools to use for what job? How to compose them into a single coherent system? How to ensure the application’s guarantees and SLAs holistically? It can easily be overwhelming, and a lot falls on the Ops/SRE team that needs to manage it all.
Serverless to the rescue? Yes and no. It does provide a fantastic promise of a better DX for developers. But it has fallen short of this promise, stopped in its tracks halfway there.
Can we do better? Definitely. What we need is a new category of managed platforms that do full "vertical integration" of all infrastructure, providing a simple and high-level programming model that allows the developer to focus on just three things: API definition, domain data, and business logic—i.e., working on direct business value. The rest, all of the rest, should be outsourced to the platform itself. Let me show you what I mean.
Kalix: Tackling the Cloud to Edge Continuum (Jonas Bonér)
Read this blog for an overview of Kalix:
https://www.kalix.io/blog/kalix-move-to-the-cloud-extend-to-the-edge-go-beyond
Abstract:
What will the future of the Cloud and Edge look like for us as developers? We have great infrastructure nowadays, but that only solves half of the problem. The Serverless developer experience shows the way, but it’s clear that FaaS is not the final answer. What we need is a programming model and developer UX that takes full advantage of new Cloud and Edge infrastructure, allowing us to build general-purpose applications, without needless complexity.
What if you only had to think about your business logic, public API, and how your domain data is structured, not worry about how to store and manage it? What if you could not only be serverless but become “databaseless” and forget about databases, storage APIs, and message brokers?
Instead, what if your data just existed wherever it needed to be, co-located with the service and its user, at the edge, in the cloud, or in your own private network—always there and available, always correct and consistent? Where the data is injected into your services on an as-needed basis, automatically, timely, efficiently, and intelligently.
Services, powered with this "data plane" of application state—attached to and available throughout the network—can run anywhere in the world: from the public Cloud to 10,000s of PoPs out at the Edge of the network, in close physical proximity to their users, where the co-location of state, processing, and end-user ensures ultra-low latency and high throughput.
Sounds exciting? Let me show you how we are making this vision a reality building a distributed real-time Data Plane PaaS using technologies like Akka, Kubernetes, gRPC, Linkerd, and more.
The Reactive Principles: Design Principles For Cloud Native Applications (Jonas Bonér)
Reactive Summit Keynote 2020: https://www.youtube.com/watch?v=e5kek8vx2ws
Abstract: Building applications for the cloud means embracing a radically different architecture than that of a traditional single-machine monolith, requiring new tools, practices, and design patterns. The cloud’s distributed nature brings its own set of concerns–building a Cloud Native, Edge Native, or Internet of Things (IoT) application means building and running a distributed system on unreliable hardware and across unreliable networks. In this keynote session by Jonas Bonér, creator of Akka, founder/CTO of Lightbend, and Chair of the Reactive Foundation, we’ll review a set of Reactive Principles that enable the design and implementation of Cloud Native applications–applications that are highly concurrent, distributed, performant, scalable, and resilient, while at the same time conserving resources when deploying, operating, and maintaining them.
The Serverless experience is revolutionary and will grow to dominate the future of Cloud. Function-as-a-Service (FaaS) however—with its ephemeral, stateless, and short-lived functions—is only the first step. FaaS is great for processing-intensive, parallelizable workloads, moving data from A to B, providing enrichment and transformation along the way. But it is quite limited and constrained in what use-cases it addresses well, which makes it very hard/inefficient to implement general-purpose application development and distributed systems protocols.
What’s needed is a next-generation Serverless platform and programming model for general-purpose application development in the new world of real-time data and event-driven systems. What is missing are ways to manage distributed state in a scalable and available fashion, support for long-lived virtual stateful services, ways to physically co-locate data and processing, and options for choosing the right data consistency model for the job.
This talk will discuss the challenges, requirements, and introduce you to our proposed solution: Cloudstate—an Open Source project building the next generation Stateful Serverless and leveraging state models such as Event Sourcing, CQRS, and CRDTs, running on Akka, gRPC, Knative, Kubernetes, and GraalVM, in a polyglot fashion with support for Go, JavaScript, Java, Swift, Scala, Python, Kotlin, and more.
In this talk, we will explore the nature of events, what it means to be event-driven, and how we can unleash the power of events and commands by applying an events-first domain-driven design to microservices-based architectures.
We will start by developing a solid theoretical understanding of how to design systems of event-driven microservices. Then we will discuss the practical tools and techniques you can use to reap the most benefit from that design, as well as, most importantly, what to avoid along the way.
We’ll discuss how an events-first design approach to building microservices can improve the following characteristics over competing techniques:
- increase certainty
- increase resilience
- increase scalability
- increase traceability
- increase loose coupling
- reduce risk
Skeptics should definitely attend.
How Events Are Reshaping Modern SystemsJonas Bonér
Event-driven architecture and design have been getting a lot of attention in recent years. It’s an old concept that has been around for decades, so why this sudden peak of interest?
In this talk, we will explore the nature of events, what it means to be event-driven, and how we can unleash the power of events. The goal is to arm you with a solid theoretical understanding of how to design an event-driven system, what tools and techniques you can use to reap the most benefit from its design, and perhaps most importantly, what to avoid.
We'll discuss how an event-driven design can help:
- drive autonomy
- reduce risk
- increase certainty
- increase loose coupling
- increase scalability
- increase resilience
- increase traceability
Skeptics should definitely attend.
Reactive Microsystems: The Evolution of Microservices at ScaleJonas Bonér
Everyone is talking about Microservices and there is more confusion than ever about what the promise of Microservices really means—and how to deliver on it. To address this situation, we will explore Microservices from first principles, distill their essence and put them in their true context: Distributed Systems.
Distributed systems are very hard, and we—system developers—have been spoiled by centralized servers for too long. Slicing an existing system into various REST services and wiring them back together again with synchronous protocols and traditional enterprise tools—designed for monolithic architectures—will set us up for failure.
As if that wasn’t enough, we can’t only think about systems of Microservices. In order to make each Microservice scalable and resilient in and of itself, we have to design each Microservice as a Distributed System—a «Microsystem»—architected from the ground up using the Reactive principles and Events-first Domain Driven Design.
In this talk I’ll walk you through the evolution of such a system, discussing what you need to know in order to design a Scalable Microservices Architecture.
Everyone is talking about microservices, and there is more confusion than ever about what the promise of microservices really means and how to deliver on it. To address this we will explore microservices from first principles, distilling their essence and putting them in their true context: distributed systems.
What many people forget is that microservices are distributed and collaborative by nature and only make sense as systems—one collaborator is no collaborator. It is in between the microservices that the most interesting and rewarding, and also challenging, problems arise—enter the world of distributed systems.
Distributed systems are by definition complex, and we system developers have been spoiled by centralized servers for too long to easily understand what this really means. Slicing an existing system into various REST services and wiring them back together again with synchronous protocols and traditional enterprise tools—designed for monolithic architectures—will set us up for failure.
As if that wasn’t enough, we can’t just think about systems of microservices. In order to make each microservice resilient and elastic in and of itself, we have to design each individual microservice as a distributed system—a «microsystem»—architected from the ground up using the reactive principles.
Without Resilience, Nothing Else MattersJonas Bonér
It doesn’t matter how beautiful, loosely coupled, scalable, highly concurrent, non-blocking, responsive and performant your application is—if it isn't running, then it's 100% useless. Without resilience, nothing else matters.
Most developers understand what the word resilience means, at least superficially, but way too many lack a deeper understanding of what it really means in the context of the system that they are working on now. I find it really sad to see, since understanding and managing failure is more important today than ever. Outages are incredibly costly—for many definitions of cost—and can sometimes take down whole businesses.
In this talk we will explore the essence of resilience. What does it really mean? What are its mechanics and characterizing traits? How do other sciences and industries manage it, and what can we learn from that? We will see that everything hints at the same conclusion: that failure is inevitable and needs to be embraced, and that resilience is by design.
Video: https://www.parleys.com/tutorial/life-beyond-illusion-present
Summary: The idea of the present is an illusion. Everything we see, hear and feel is just an echo from the past. But this illusion has influenced us and the way we view the world in so many ways; from Newton’s physics with a linearly progressing timeline accruing absolute knowledge along the way to the von Neumann machine with its total ordering of instructions updating mutable state with full control of the “present”. But unfortunately this is not how the world works. There is no present, all we have is facts derived from the merging of multiple pasts. The truth is closer to Einstein’s physics where everything is relative to one’s perspective.
As developers we need to wake up and break free from the perceived reality of living in a single globally consistent present. The advent of multicore and cloud computing architectures meant that most applications today are distributed systems—multiple cores separated by the memory bus or multiple nodes separated by the network—which puts a harsh end to this illusion. Facts travel at the speed of light (at best), which makes the distinction between past and perceived present even more apparent in a distributed system where latency is higher and where facts (messages) can get lost.
The only way to design truly scalable and performant systems that can construct a sufficiently consistent view of history—and thereby our local “present”—is by treating time as a first class construct in our programming model and to model the present as facts derived from the merging of multiple concurrent pasts.
In this talk we will explore what all this means to the design of our systems, how we need to view and model consistency, consensus, communication, history and behaviour, and look at some practical tools and techniques to bring it all together.
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven SystemsJonas Bonér
Abstract:
The demands and expectations for applications have changed dramatically in recent years. Applications today are deployed on a wide range of infrastructure; from mobile devices up to thousands of nodes running in the cloud—all powered by multi-core processors. They need to be rich and collaborative, have a real-time feel with millisecond response time and should never stop running. Additionally, modern applications are a mashup of external services that need to be consumed and composed to provide the features at hand.
We are seeing a new type of applications emerging to address these new challenges—these are being called Reactive Applications. In this talk we will discuss four key traits of Reactive; Responsive, Resilient, Elastic and Message-Driven—how they impact application design, how they interact, their supporting technologies and techniques, how to think when designing and building them—all to make it easier for you and your team to Go Reactive.
Intended Audience:
Programmers, architects, CIO/CTOs and everyone with a desire to challenge the status quo and expand their horizons on how to tackle the current and future challenges in the computing industry.
Abstract:
Reactive applications need to be able to respond to demand, be elastic and ready to scale up, down, in and out—taking full advantage of mobile, multi-core and cloud computing architectures—in real time.
In this talk we will discuss the guiding principles making this possible through the use of share-nothing and non-blocking designs, applied all the way down the stack. We will learn how to deliver systems that provide reactive supply to changing demand.
I gave this talk at React Conf 2014 in London. Recording available here: https://www.youtube.com/watch?v=mBFdj7w4aFA
Building Reactive Systems with Akka (in Java 8 or Scala)Jonas Bonér
Learn how to build Reactive Systems with Akka. Examples in both Java 8 and Scala.
Abstract:
The demands and expectations for applications have changed dramatically in recent years. Applications today are deployed on a wide range of infrastructure; from mobile devices up to thousands of nodes running in the cloud—all powered by multi-core processors. They need to be rich and collaborative, have a real-time feel with millisecond response time and should never stop running. Additionally, modern applications are a mashup of external services that need to be consumed and composed to provide the features at hand. We are seeing a new type of applications emerging to address these new challenges—these are being called Reactive Applications.
In this talk we will introduce you to Akka and discuss how it can help you deliver on the four key traits of Reactive; Responsive, Resilient, Elastic and Message-Driven. We will start with the basics of Akka and work our way towards some of its more advanced modules such as Akka Cluster and Akka Persistence—all driven through code and practical examples.
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsJonas Bonér
The demands and expectations for applications have changed dramatically in recent years. Applications today are deployed on a wide range of infrastructure; from mobile devices up to thousands of nodes running in the cloud—all powered by multi-core processors. They need to be rich and collaborative, have a real-time feel with millisecond response time and should never stop running. Additionally, modern applications are a mashup of external services that need to be consumed and composed to provide the features at hand.
We are seeing a new type of applications emerging to address these new challenges—these are being called Reactive Applications. In this talk we will discuss four key traits of Reactive; Event-Driven, Scalable, Resilient and Responsive—how they impact application design, how they interact, their supporting technologies and techniques, how to think when designing and building them—all to make it easier for you and your team to Go Reactive.
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVMJonas Bonér
My talk for JavaOne 2009
Abstract:
Writing concurrent programs in the Java programming language is hard, and writing correct concurrent programs is even harder. What should be noted is that the main problem is not concurrency itself but the use of mutable shared state. Reasoning about concurrent updates to, and guarding of, mutable shared state is extremely difficult. It imposes problems such as dealing with race conditions, deadlocks, live locks, thread starvation, and the like.
It might come as a surprise to some people, but there are alternatives to so-called shared-state concurrency (which has been adopted by C, C++, and the Java programming language and become the default industry-standard way of dealing with concurrency problems).
This session discusses the importance of immutability and explores alternative paradigms such as dataflow concurrency, message-passing concurrency, and software transactional memory. It includes a pragmatic discussion of the drawbacks and benefits of each paradigm and, through hands-on examples, shows you how each one, in its own way, can raise the abstraction level and give you a model that is much easier to reason about and use. The presentation also shows you how, by choosing the right abstractions and technologies, you can make hard concurrency problems close to trivial. All discussions are driven by examples using state-of-the-art implementations available for the JVM.
Short (45 min) version of my 'Pragmatic Real-World Scala' talk. Discussing patterns and idioms discovered during 1.5 years of building a production system for finance; portfolio management and simulation.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence gathering facilities spread across more than 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I have been wondering, as an “infrastructure container Kubernetes guy,” how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we will discuss what cloud/on-premises strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
3. The devil is in the state
Wednesday, December 30, 2009
4. Wrong, let me rephrase
5. The devil is in the mutable state
6. Definitions & Philosophy
7. What is a Value?
A Value is something that does not change.
Discussion based on http://clojure.org/state by Rich Hickey
8. What is an Identity?
A stable logical entity associated with a series of different Values over time.
9. What is State?
The Value an entity with a specific Identity has at a particular point in time.
10. How do we know if something has State?
If a function is invoked with the same arguments at two different points in time and returns different values... then it has state.
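That test can be made concrete in a few lines of Java. The class and method names below are my own illustration, not code from the deck: a pure function returns the same value for the same arguments at any point in time, while a method backed by a hidden mutable field does not.

```java
// square() is a pure function: same argument, same result, at any time.
class Pure {
    static int square(int x) {
        return x * x;
    }
}

// next() has state: invoked with the same (zero) arguments at two
// different points in time, it returns different values, because it
// reads and writes a hidden mutable field.
class Stateful {
    private int calls = 0;

    int next() {
        calls += 1;
        return calls;
    }
}

class StateDemo {
    public static void main(String[] args) {
        System.out.println(Pure.square(3) == Pure.square(3)); // always true
        Stateful s = new Stateful();
        System.out.println(s.next() == s.next());             // false: 1 vs 2
    }
}
```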
11. The Problem
The unification of Identity & Value. They are not the same.
12. We need to separate Identity & Value
...add a level of indirection:
> Software Transactional Memory: Managed References
> Message-Passing Concurrency: Actors/Active Objects
> Dataflow Concurrency: Dataflow (Single-Assignment) Variables
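On the JVM, one simple form of that indirection is a managed reference such as `java.util.concurrent.atomic.AtomicReference`: the reference is the stable Identity, and the immutable objects it points to over time are the Values. A minimal sketch of the idea (my own illustration, not code from the deck):

```java
import java.util.concurrent.atomic.AtomicReference;

// An immutable Value: once constructed it never changes.
final class Balance {
    final int amount;
    Balance(int amount) { this.amount = amount; }
}

class ManagedRefDemo {
    public static void main(String[] args) {
        // The AtomicReference is the Identity; Balance objects are Values.
        AtomicReference<Balance> account = new AtomicReference<>(new Balance(100));

        // Updating means atomically swapping in a *new* Value,
        // never mutating the old one; updateAndGet retries under contention.
        Balance after = account.updateAndGet(b -> new Balance(b.amount - 30));
        System.out.println(after.amount); // prints 70
    }
}
```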
13. The problems with Shared-State Concurrency
14. Shared-State Concurrency
> Concurrent access to shared, mutable state
> Protect mutable state with locks
> The default model in Java, C#, C/C++, Ruby, Python, etc.
15. Shared-State Concurrency is incredibly hard
> Inherently very hard to use reliably
> Even the experts get it wrong
16. Example of Shared-State Concurrency
Transfer funds between bank accounts
17. Account
public class Account {
  private int balance;

  public void withdraw(int amount) {
    balance -= amount;
  }

  public void deposit(int amount) {
    balance += amount;
  }
}
Not thread-safe
18. Let’s make it thread-safe
public class Account {
  private int balance;

  public synchronized void withdraw(int amount) {
    balance -= amount;
  }

  public synchronized void deposit(int amount) {
    balance += amount;
  }
}
Thread-safe, right?
19. It’s still broken
Transfers are not atomic
20. Let’s write an atomic transfer method
public class Account {
  ...
  public synchronized void transferTo(Account to, int amount) {
    this.withdraw(amount);
    to.deposit(amount);
  }
  ...
}
This will work, right?
21. Let’s transfer funds
Account alice = ...
Account bob = ...

// in one thread
alice.transferTo(bob, 10);

// in another thread
bob.transferTo(alice, 3);
22. Might lead to DEADLOCK
Darn, this is really hard!!!
23. We need to enforce lock ordering
> How?
> Java won’t help us
> Need to use code convention (names etc.)
> Requires knowledge about the internal state and implementation of Account
> ...runs counter to the principles of encapsulation in OOP
> Opens up a Can of Worms
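One common convention, shown here as my own illustration rather than code from the deck, is to acquire both account locks in a globally agreed order, e.g. by a unique account id, so the two opposing transfers always lock the same account first and can never deadlock each other:

```java
class Account {
    private final long id;   // unique id, used only to order lock acquisition
    private int balance;

    Account(long id, int balance) {
        this.id = id;
        this.balance = balance;
    }

    void withdraw(int amount) { balance -= amount; }
    void deposit(int amount)  { balance += amount; }
    int balance()             { return balance; }

    void transferTo(Account to, int amount) {
        // Both threads agree to lock the lower-id account first, so
        // alice->bob and bob->alice transfers acquire the two locks
        // in the same order and cannot deadlock each other.
        Account first  = this.id < to.id ? this : to;
        Account second = this.id < to.id ? to   : this;
        synchronized (first) {
            synchronized (second) {
                this.withdraw(amount);
                to.deposit(amount);
            }
        }
    }
}
```

As the slide notes, nothing in Java enforces this: the ordering lives in a code convention and leaks the internals of Account, which is exactly why the deck moves on to higher-level alternatives.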
24. The problem with locks
Locks do not compose:
> Taking too few locks
> Taking too many locks
> Taking the wrong locks
> Taking locks in the wrong order
> Error recovery is hard
25. It’s just too hard
26. Java bet on the wrong horse?
Perhaps, but we’re not completely screwed. There are alternatives.
27. We need better and more high-level abstractions
28. Alternative Paradigms
> Software Transactional Memory (STM)
> Message-Passing Concurrency (Actors)
> Dataflow Concurrency
30. Actors
• Originates in a 1973 paper by Carl Hewitt
• Implemented in Erlang, Occam, Oz
• Encapsulates state and behavior
• Closer to the definition of OO than classes
31. Alan Kay (father of Smalltalk and OOP)
“OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things.”
“Actually I made up the term ‘object-oriented’, and I can tell you I did not have C++ in mind.”
Replace C++ with Java or C#
32. Actors
• Implements Message-Passing Concurrency
• Share NOTHING
• Isolated lightweight processes
• Communicates through messages
• Asynchronous and non-blocking
33. Actor Model of Concurrency
• No shared state … hence, nothing to synchronize
• Each actor has a mailbox (message queue)
34. Actor Model of Concurrency
• Non-blocking send
• Blocking receive
• Messages are immutable
• Highly performant and scalable
• SEDA-style (Staged Event-Driven Architecture)
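The model above can be sketched in a few lines of plain Java with a BlockingQueue as the mailbox: send just enqueues and returns (non-blocking), while the actor’s own thread blocks waiting for the next message, so its state is never shared. This is a toy illustration of the idea, not Akka’s implementation; all names are mine.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy actor: private state, a mailbox, and one thread draining it.
class CounterActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0; // touched only by the actor's own thread

    private final Thread loop = new Thread(() -> {
        try {
            while (true) {
                String msg = mailbox.take(); // blocking receive
                if (msg.equals("stop")) return;
                count++;                     // safe: single-threaded state
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });

    CounterActor() { loop.start(); }

    // Non-blocking send: enqueue the message and return immediately.
    void send(String msg) { mailbox.offer(msg); }

    // Drain the mailbox, stop the actor, and read its final state.
    // join() makes the actor's writes visible to the caller.
    int await() throws InterruptedException {
        send("stop");
        loop.join();
        return count;
    }
}

class ActorDemo {
    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        actor.send("inc");
        actor.send("inc");
        System.out.println(actor.await()); // prints 2
    }
}
```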
35. Actor Model of Concurrency
• Easier to reason about
• Raised abstraction level
• Easier to avoid:
  – Race conditions
  – Deadlocks
  – Starvation
  – Live locks
37. Two different models
• Thread-based
• Event-based
  – Very lightweight
  – Can easily create millions on a single workstation (6.5 million on 4 GB RAM)
38. Actor libs for the JVM
> Akka (Java/Scala)
> Kilim (Java)
> Jetlang (Java)
> Actor’s Guild (Java)
> ActorFoundry (Java)
> Actorom (Java)
> FunctionalJava (Java)
> GParallelizer (Groovy)
> Fan Actors (Fan)
44. Reply
class SomeActor extends Actor {
  def receive = {
    case User(name) =>
      // use implicit sender
      sender.get ! ("Hi " + name)
  }
}
45. Reply
class SomeActor extends Actor {
  def receive = {
    case User(name) =>
      // use reply
      reply("Hi " + name)
  }
}
46. Send: !!
// uses Future with default timeout
val resultOption = actor !! Message
val result = resultOption getOrElse defaultResult

// uses Future with explicit timeout
(actor !! (Message, 1000)).getOrElse(
  throw new Exception("Timed out"))
47. Reply
class SomeActor extends Actor {
  def receive = {
    case User(name) =>
      // use reply
      reply("Hi " + name)
  }
}
48. Reply
class SomeActor extends Actor {
  def receive = {
    case User(name) =>
      // store away the sender future
      // to resolve later or somewhere else
      ... = senderFuture
  }
}
49. Start / Stop
actor.start
actor.stop
spawn(classOf[MyActor])

// callback
override def shutdown = {
  ... // clean up before shutdown
}
50. Active Objects: Java
public class Counter {
  private int counter = 0;

  public void count() {
    counter++;
    System.out.println(counter);
  }
}
53. Active Objects
class Counter {
  private var counter = 0

  def count = {
    counter += 1
    println(counter)
  }
}
54. Create: POSO
val counter = ActiveObject.newInstance(
  classOf[Counter], 1000)
55. Send
counter.count
56. @oneway
class Counter {
  private var counter = 0

  @oneway def count = {
    counter += 1
    println(counter)
  }
}
57. Immutable messages
// define the case class
case class Register(user: User)

// create and send a new case class message
actor ! Register(user)

// tuples
actor ! (username, password)

// lists
actor ! List("bill", "bob", "alice")
58. Actors: config
<akka>
  version = "0.6"
  <actor>
    timeout = 5000
    serialize-messages = off
  </actor>
</akka>
60. Dispatchers
class Dispatchers {
  def newThreadBasedDispatcher(actor: Actor)
  def newExecutorBasedEventDrivenDispatcher(name: String)
  ...
}
61. Set dispatcher
class MyActor extends Actor {
dispatcher = Dispatchers
.newThreadBasedDispatcher(this)
...
}
actor.dispatcher = dispatcher // before started
62. EventBasedDispatcher
Fluent DSL
val dispatcher =
Dispatchers.newExecutorBasedEventDrivenDispatcher
.withNewThreadPoolWithBoundedBlockingQueue(100)
.setCorePoolSize(16)
.setMaxPoolSize(128)
.setKeepAliveTimeInMillis(60000)
.setRejectionPolicy(new CallerRunsPolicy)
.buildThreadPool
63. When to use which dispatcher?
64. Thread-based actors
• One thread per Actor
• Good:
• Threads (actors) don’t block each other
• Good for long-running tasks
• Bad:
• Poor scalability
• Bad for short-running tasks
• Use for a limited number of Actors
• Use for low frequency of messages
65. Event-based actors
• Backed by thread pool
• Lightweight:
• Can create millions of Actors
• 6.5 million on 4 GB RAM
• Best scalability and performance
• 2 million messages in 8 seconds
66. Single-threaded event-based actors
• Best performance
• Millions of Actors
• Bad:
• One actor can block all other actors
• Does not take advantage of multicore
67. MessageQueues
Unbounded LinkedBlockingQueue
Bounded LinkedBlockingQueue
Bounded ArrayBlockingQueue
Bounded SynchronousQueue
Plus different options per queue
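The mailbox types above correspond to the standard java.util.concurrent queues. A minimal sketch (plain JDK, not Akka's internal mailbox code) of how a bounded mailbox behaves when it fills up:

```scala
import java.util.concurrent.ArrayBlockingQueue

// Bounded mailbox sketch: capacity 2; offer() returns false when full
val mailbox = new ArrayBlockingQueue[String](2)

assert(mailbox.offer("msg1"))  // accepted
assert(mailbox.offer("msg2"))  // accepted
assert(!mailbox.offer("msg3")) // rejected: queue is full

mailbox.poll()                 // consumer drains one message
assert(mailbox.offer("msg3"))  // now there is room again
```

A bounded mailbox trades possible message rejection (or blocking, with put) for a hard cap on memory, which is why the queue choice is a per-dispatcher option.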
76. Supervision
class Supervisor extends Actor {
trapExit = List(classOf[Throwable])
faultHandler =
Some(OneForOneStrategy(5, 5000))
def receive = {
case Register(actor) =>
link(actor)
}
}
77. Manage failure
class FaultTolerant extends Actor {
...
override def preRestart(reason: AnyRef) = {
... // clean up before restart
}
override def postRestart(reason: AnyRef) = {
... // init after restart
}
}
78. Declarative config
RestartStrategy(
AllForOne, // restart policy
10, // max # of restart retries
5000) // within time in millis
LifeCycle(
// Permanent: always be restarted
// Temporary: restarted only if it exited with an error
Permanent)
83. Remote Server
// use host & port in config
RemoteNode.start
RemoteNode.start(classLoader)
RemoteNode.start(
"localhost", 9999)
RemoteNode.start(
"localhost", 9999, classLoader)
84. Remote Server
// use host & port in config
val server = new RemoteServer
server.start("localhost", 9999)
90. STM: overview
> See the memory (heap and stack) as a
transactional dataset
> Similar to a database
begin
commit
abort/rollback
> Transactions are retried automatically upon
collision
> Rolls back the memory on abort
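The begin/commit/abort-and-retry cycle above is the classic optimistic-concurrency loop. A minimal sketch of the retry-on-collision idea using a plain AtomicReference (this is an illustration of the semantics, not Akka's STM implementation):

```scala
import java.util.concurrent.atomic.AtomicReference

// A single "transactional" cell holding an immutable value
val ref = new AtomicReference(Map.empty[String, Int])

def atomicUpdate(f: Map[String, Int] => Map[String, Int]): Unit = {
  var done = false
  while (!done) {
    val snapshot = ref.get                      // begin: take a snapshot
    val updated  = f(snapshot)                  // work on immutable data
    done = ref.compareAndSet(snapshot, updated) // commit; retry on clash
  }
}

atomicUpdate(_ + ("answer" -> 42))
assert(ref.get.apply("answer") == 42)
```

Because all work happens on an immutable snapshot, "rolling back" is free: a failed commit simply discards the candidate value and retries from a fresh snapshot.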
95. Managed References
• Typical OO: direct references to mutable objects
• Anything can change at any time
• Consistency is a user problem
• Encapsulation doesn’t solve concurrency problems
• Clojure: indirect, managed references to immutable objects
• Unifies identity and value
• Managed Reference: separates Identity and Value
Copyright Rich Hickey 2009
96. Managed References
• Separates Identity from Value
- Values are immutable
- Identity (Ref) holds Values
• Change is a function
• Compare-and-swap (CAS)
• Abstraction of time
• Can only be altered within a transaction
97. Managed References
val ref = TransactionalState.newRef(
HashTrie[String, User]())
val users = ref.get
val newUsers = users + // creates new HashTrie
("bill" -> new User("bill", "secret"))
ref.swap(newUsers)
98. Managed References
val usersRef = TransactionalState.newRef(
HashTrie[String, User]())
for (users <- usersRef) {
users + (name -> user)
}
val user = for (users <- usersRef) yield {
users(name)
}
99. Managed References
for {
users <- usersRef
user <- users
roles <- rolesRef
role <- roles
if user.hasRole(role)
} {
... // do stuff
}
100. Managed References
convenience classes
// wraps a Ref with a HashTrie
val users = TransactionalState.newMap[String, User]
// wraps a Ref with a Vector
val users = TransactionalState.newVector[User]
101. Persistent
datastructures
• Immutable
• Change is a function
• Old version still available after change
• Very fast with performance guarantees (near
constant time)
• Thread safe
• Iteration safe
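These properties can be seen with the standard Scala immutable collections, which use the same persistent-data-structure technique: "updating" returns a new version and leaves the old one intact.

```scala
import scala.collection.immutable.HashMap

// Change is a function: v1 => v2; v1 survives unchanged
val v1 = HashMap("bill" -> 1)
val v2 = v1 + ("bob" -> 2)

assert(v1.size == 1)          // old version still available
assert(v2.size == 2)          // new version sees the update
assert(!(v1 contains "bob"))  // v1 was never mutated
```

Because no version is ever mutated, readers can iterate over a snapshot safely while writers keep producing new versions.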
103. Structural sharing with path copying
• Programming with values is critical
• By eschewing morphing in place, we just need to manage the succession of values (states) of an identity
• A timeline coordination problem
• Several semantics possible
• Managed references
• Variable-like cells with coordination semantics
(diagram: path copying; the new HashMap root shares structure with the old version, and only the path to the change is copied)
Copyright Rich Hickey 2009
104. Persistent datastructures: HashTrie
import se.scalablesolutions.akka.collection._
val hashTrie = new HashTrie[K, V]
// API (+ extends Map)
def get(key: K): V
def +[A >: V](pair: (K, A)): HashTrie[K, A]
def -(key: K): HashTrie[K, V]
def empty[A]: HashTrie[K, A]
105. Persistent datastructures: Vector
import se.scalablesolutions.akka.collection._
val vector = new Vector[T]
// API (+ extends RandomAccessSeq)
def apply(i: Int): T
def +[A >: T](obj: A): Vector[A]
def pop: Vector[T] // remove tail
def update[A >: T](i: Int, obj: A): Vector[A]
106. Akka STM
• Transactional Memory
- Atomic, Consistent, Isolated (ACI)
- MVCC
- Rolls back in memory and retries
automatically on clash
• Works together with Managed References
• Map, Vector and Ref abstraction
107. STM: declarative API
class UserRegistry extends Transactor {
private lazy val storage =
TransactionalState.newMap[String, User]
def receive = {
case NewUser(user) =>
storage + (user.name -> user)
...
}
}
108. STM: declarative API
class UserRegistry extends Actor {
makeTransactionRequired
private lazy val storage =
TransactionalState.newMap[String, User]
def receive = {
case NewUser(user) =>
storage + (user.name -> user)
...
}
}
109. STM: declarative Java API
@transactionrequired
class UserRegistry {
}
110. STM: high-order fun API
import se.scalablesolutions.akka.stm.Transaction._
atomic {
.. // do something within a transaction
}
atomic(maxNrOfRetries) {
.. // do something within a transaction
}
atomicReadOnly {
.. // do something within a transaction
}
111. STM: high-order fun API
import se.scalablesolutions.akka.stm.Transaction._
atomically {
.. // try to do something
} orElse {
.. // if tx clash, try to do something else
}
112. STM: monadic API
val userStorage =
TransactionalState.newMap[String, User]
for (tx <- Transaction()) {
userStorage.put(user.name, user)
}
113. STM: monadic API
val userStorage =
TransactionalState.newMap[String, User]
val users = for {
tx <- Transaction()
name <- userNames
if userStorage.contains(name)
} yield userStorage.get(name) // transactional
114. STM: config
<akka>
<stm>
service = on
distributed = off
</stm>
</akka>
115. STM: disable
TransactionManagement.disableTransactions
123. Persistence
• Pluggable storage backend
- Cassandra
- MongoDB
- Redis
• Map, Vector and Ref abstraction
• MVCC
• Works together with STM
124. Akka Persistence API
// transactional Cassandra-backed Map
val map = CassandraStorage.newMap
// transactional Redis-backed Vector
val vector = RedisStorage.newVector
// transactional Mongo-backed Ref
val ref = MongoStorage.newRef
125. Persistence: config
<akka>
<storage>
<cassandra>
hostname = "127.0.0.1"
port = 9160
storage-format = "protobuf"
consistency‐level = quorum
</cassandra>
<mongodb>
hostname = "127.0.0.1"
port = 27017
dbname = "mydb"
</mongodb>
</storage>
</akka>
126. Akka’s Cassandra API
val sessions = new CassandraSessionPool(
keyspace,
StackPool(SocketProvider(host, port)),
Protocol.Binary,
consistencyLevel)
Create a session pool
127. Akka Cassandra API
// get a column
val column = sessions.withSession { session =>
session | (key, new ColumnPath(
columnFamily, superColumn, serializer.out(name)))
}
val value = if (column.isDefined)
Some(serializer.in(column.get.value, None)) else
None
Automatic connection management
128. Akka Cassandra API
// add a column
sessions.withSession { session =>
session ++|
(key,
new ColumnPath(cf, null, serializer.out(name)),
serializer.out(value),
System.currentTimeMillis,
consistencyLevel)
}
138. Security: service
class SampleAuthenticationService
extends DigestAuthenticationActor {
// Use an in-memory nonce-map as default
override def mkNonceMap = new HashMap[String, Long]
// Change this to whatever you want
override def realm = "sample"
// Username, password and roles for a username
override def userInfo(uname: String): Option[UserInfo] = {
... // get user with password and roles
Some(UserInfo(uname, password, roles))
}
}
139. Security: usage
class SecuredActor extends Actor {
@RolesAllowed(Array("admin"))
def resetSystem =
(this !! Reset).getOrElse(
<error>Could not reset system</error>)
def receive = {
case Reset => ...
}
}
140. Security: config
<akka>
<rest>
service = on
hostname = "localhost"
port = 9998
filters = ["AkkaSecurityFilterFactory"]
authenticator = "SimpleAuthenticationService"
</rest>
</akka>