In recent years, serverless has gained momentum in the realm of cloud computing. Broadly speaking, it comprises function as a service (FaaS) and backend as a service (BaaS). The distinction between the two is that under FaaS, one writes and maintains the code (e.g., the functions) for serverless compute; in contrast, under BaaS, the platform provides the functionality and manages the operational complexity behind it. Serverless provides a great means to boost development velocity. With greatly reduced infrastructure costs, more agile and focused teams, and faster time to market, enterprises are increasingly adopting serverless approaches to gain a key advantage over their competitors.
Early use cases of serverless include data transformation in batch and ETL scenarios and data processing using MapReduce patterns. As a natural extension, serverless is now being used in the streaming context, for example for real-time bidding, fraud detection, and intrusion detection. Serverless is, arguably, naturally suited to extracting insights from fast data, that is, high-volume, high-velocity data. Example tasks in this regard include filtering and reducing noise in the data and leveraging machine learning and deep learning models to provide continuous insights about business operations.
We walk the audience through the landscape of streaming systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage. We overview the inception and growth of the serverless paradigm. Further, we deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
Baking intelligence into a serverless flow is paramount from a business perspective. To this end, we detail different serverless patterns—event processing, machine learning, and analytics—for different use cases and highlight the trade-offs. We present perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of serverless streaming architectures and algorithms. The topics covered include an introduction to streaming, an introduction to serverless, serverless and streaming requirements, Apache Pulsar, application domains, serverless event processing patterns, serverless machine learning patterns, and serverless analytics patterns.
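As a concrete illustration of the noise-filtering task mentioned above, a minimal sketch in the style of a native Python Pulsar function follows: the runtime invokes process() once per message, and returning None drops the event. The JSON event shape and the threshold are hypothetical.

```python
import json

# Hypothetical noise-reduction filter, written in the style of a native
# Python Pulsar function: the runtime calls process() per message; returning
# None drops the event, returning a value forwards it to the output topic.
THRESHOLD = 0.5  # hypothetical cutoff below which a reading is noise

def process(input):
    event = json.loads(input)
    # Drop low-signal readings so downstream consumers see only useful data.
    if abs(event.get("value", 0.0)) < THRESHOLD:
        return None
    return json.dumps(event)
```

In a real deployment, the function would be registered with Pulsar along with its input and output topics; the filtering logic itself stays this simple.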
Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, Uber (Hosted by Confluent)
Kafka is a vital part of the data infrastructure in many organizations. When a Kafka cluster grows and more data is stored in it for a longer duration, several issues related to scalability, efficiency, and operations become important to address. Kafka cluster storage is typically scaled by adding more broker nodes to the cluster, but this also adds needless memory and CPU to the cluster, making overall storage less cost-efficient than keeping older data in external storage.
Tiered storage extends Kafka's storage beyond the local storage available on the Kafka cluster by retaining older data in cheaper stores, such as HDFS, S3, Azure, or GCS, with minimal impact on the internals of Kafka.
We will talk about:
- How tiered storage addresses the above problems and brings several other advantages
- The high-level architecture of tiered storage
- Future work planned as part of tiered storage
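As a sketch, enabling tiered storage (per KIP-405, which shipped as early access in Kafka 3.6) involves broker- and topic-level settings along the following lines; the plugin class name is a placeholder for a concrete remote-storage implementation, and property names may vary by Kafka version.

```properties
# Broker: turn on the tiered storage subsystem (KIP-405; Kafka 3.6+ names).
remote.log.storage.system.enable=true
# Placeholder class name for a concrete remote-storage implementation.
remote.log.storage.manager.class.name=com.example.S3RemoteStorageManager

# Topic: opt in, keep ~1 day locally, retain 30 days overall (mostly remote).
remote.storage.enable=true
local.retention.ms=86400000
retention.ms=2592000000
```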
Kafka is a high-throughput, fault-tolerant, scalable platform for building high-volume, near-real-time data pipelines. This presentation is about tuning Kafka pipelines for high performance.
We discuss select configuration parameters and deployment topologies essential for achieving higher throughput and lower latency across the pipeline, along with lessons learned in troubleshooting and optimizing a truly global data pipeline that replicates 100 GB of data in under 25 minutes.
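To make the throughput-versus-latency trade-off concrete, the sketch below contrasts two illustrative producer configurations. The property names follow the Apache Kafka producer; the specific values are hypothetical starting points, not universal recommendations.

```python
# Illustrative Kafka producer settings tuned in opposite directions.
# Values are hypothetical starting points, not universal recommendations.
high_throughput_producer = {
    "linger.ms": 20,             # wait up to 20 ms to fill larger batches
    "batch.size": 262144,        # 256 KiB batches amortize per-request overhead
    "compression.type": "lz4",   # cheap compression shrinks network transfer
    "acks": "all",               # full durability at some latency cost
}

low_latency_producer = {
    "linger.ms": 0,              # send immediately, accept smaller batches
    "batch.size": 16384,         # default-sized batches
    "compression.type": "none",  # skip compression CPU on the hot path
    "acks": "1",                 # leader-only ack reduces round trips
}
```

Which side of the trade-off to favor depends on the pipeline: bulk replication tolerates batching delay, while interactive use cases usually do not.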
Kafka Streams Windowing Behind the Curtain, Neil Buesing, Principal Solutions Architect, Rill
https://www.meetup.com/TwinCities-Apache-Kafka/events/279316299/
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environments | DataWorks Summit
The scheduler of a container orchestration system, such as YARN or K8s, is a critical component that users rely on to plan resources and manage applications.
Assessing where we are today: YARN effectively has two powerful schedulers (the Fair and Capacity schedulers), both of which serve many strong use cases in the big data ecosystem. YARN can scale up to 50k nodes per cluster, schedule 20k containers per second, and is extremely efficient at managing batch workloads.
The K8s default scheduler is an industry-proven solution for efficiently managing long-running services. However, as more big data apps move to K8s and the cloud, many features—such as hierarchical queues for better multi-tenancy support, fair resource sharing, and preemption—are either missing or not yet mature enough to support big data apps running on K8s.
At this point, no solution exists that addresses the need for a unified resource scheduling experience across platforms. That makes it extremely difficult to manage workloads running in different environments, from on-premises to cloud.
Hence, a common scheduler that evolves from the proven capabilities of YARN and K8s and improves on them for the cloud will focus on use cases like:
Better bin-packing scheduling (and gang scheduling)
Autoscale up and shrink policy management
Effectively running batch workloads and services with clear SLAs
In summary, as a separate initiative we are improving core, cloud-aware scheduling capabilities to manage both K8s and YARN clusters, and the above-mentioned cases will be the core focus of this initiative. More details of our work will be presented in this talk.
Infrastructure-as-Code with Pulumi: Better than all the others (like Ansible)? | Jonas Hecht
There's a new Infrastructure-as-Code (IaC) kid on the block: Pulumi is there to frighten the established players: Chef, Puppet, Terraform, CloudFormation, Ansible... But is it really the "better" tool, and how could they be compared? Is it only hype-driven? We'll find out, including lots of example code. (ContainerConf / Continuous Lifecycle 2019 talk in Mannheim)
Example GitHub code: https://github.com/jonashackt/pulumi-python-aws-ansible
https://github.com/jonashackt/pulumi-typescript-aws-fargate
Simplified Machine Learning Architecture with an Event Streaming Platform (Apache Kafka) | Kai Wähner
Machine learning is separated into model training and model inference. ML frameworks typically load historical data from a data store like HDFS or S3 to train models. This talk shows how you can completely avoid such a data store by ingesting streaming data directly via Apache Kafka from any source system into TensorFlow for model training and model inference, using the capabilities of the "TensorFlow I/O" add-on.
The talk compares this modern streaming architecture to traditional batch and big data alternatives and explains benefits such as the simplified architecture, the ability to reprocess events in the same order for training different models, and the possibility of building a scalable, mission-critical, real-time ML architecture with far fewer headaches and problems.
Key takeaways for the audience
• Scalable open source Machine Learning infrastructure
• Streaming ingestion into TensorFlow without the need for another data store like HDFS or S3 (leveraging TensorFlow I/O and its Kafka plugin)
• Stream Processing using analytic models in mission-critical deployments to act in Real Time
• Learn how Apache Kafka open source ecosystem including Kafka Connect, Kafka Streams and KSQL help to build, deploy, score and monitor analytic models
• Comparison and trade-offs between this modern streaming approach and traditional batch model training infrastructures
Tutorial: Modern Real-Time Streaming Architectures | Karthik Ramasamy
Across diverse industry segments, there has been a shift in focus from big data to fast data, stemming in part from the deluge of high-velocity data streams as well as the need for instant data-driven insights. Accordingly, there has been a proliferation of messaging and streaming frameworks that enterprises use to satisfy the needs of various applications.
Drawing on their experience operating streaming systems at Twitter scale, Karthik Ramasamy, Sanjeev Kulkarni, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. They also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, they explore the interplay between storage and stream processing and speculate about future developments.
Topics include:
Basic requirements of stream processing
Streaming and one-pass algorithms
Different types of streaming architectures
An in-depth review of streaming frameworks
Deploying and operating stream processing applications
Lessons learned from building a real-time stack using Apache Pulsar and Apache Heron at Twitter Scale
Common Issues with the Apache Kafka® Producer | Confluent
Badai Aqrandista, Confluent, Senior Technical Support Engineer
This session will be about a common issue in the Kafka producer: producer batch expiry. We will discuss the Kafka producer internals, the issue's common causes, such as a slow network or small batches, and how to overcome them. We will also share some examples along the way!
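For orientation, the Java producer expires a batch when it cannot be delivered within `delivery.timeout.ms`; the sketch below shows the related settings (Java producer property names) with hypothetical values, including the constraint the producer itself enforces between them.

```python
# Settings that influence producer batch expiry (Java producer property
# names); the values are illustrative assumptions, not recommendations.
producer_config = {
    "delivery.timeout.ms": 300000,  # allow up to 5 min before a batch expires
    "request.timeout.ms": 30000,    # how long to wait on a single broker request
    "linger.ms": 5,                 # don't hold batches too long on slow links
    "retries": 2147483647,          # retry freely within the delivery timeout
}

# The Java producer enforces this relationship at startup:
#   delivery.timeout.ms >= linger.ms + request.timeout.ms
assert (producer_config["delivery.timeout.ms"]
        >= producer_config["linger.ms"] + producer_config["request.timeout.ms"])
```

Raising the delivery timeout, or reducing how long batches linger, are the usual mitigations when slow networks cause expiry.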
https://www.meetup.com/apache-kafka-sydney/events/279651982/
Apache Kafka is becoming the message bus for transferring huge volumes of data from various sources into Hadoop. It is also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through best practices for deploying Apache Kafka in production: how to secure a Kafka cluster, how to pick topic partitions, upgrading to newer versions, and migrating to the new Kafka producer and consumer APIs. We will also talk about best practices for running producers and consumers.
In the Kafka 0.9 release, we added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Kafka now allows authentication of users and access control over who can read from and write to a Kafka topic. Apache Ranger also uses a pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase an open-sourced Kafka REST API and an Admin UI that help users create topics, reassign partitions, issue Kafka ACLs, and monitor consumer offsets.
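For concreteness, a secured client configuration of the kind described above might look like the following; the hostnames, file paths, and password are placeholders, and exact property names vary slightly across Kafka versions.

```properties
# Hypothetical Kafka client security settings (Kerberos over TLS);
# paths and password are placeholders.
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/kafka/client.truststore.jks
ssl.truststore.password=changeit
```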
Stream Processing with Apache Kafka and .NET | Confluent
Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent
Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix, Walmart, Airbnb, Goldman Sachs and LinkedIn. In this talk Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake | Databricks
Change Data Capture (CDC) is a typical use case in real-time data warehousing. It tracks the change log (binlog) of a relational (OLTP) database and replays these change logs in a timely fashion to external storage, such as Delta or Kudu, for real-time OLAP. To implement a robust CDC streaming pipeline, many factors must be considered, such as how to ensure data accuracy, how to handle schema changes in the OLTP source, and whether the pipeline is easy to build for a variety of databases with less code.
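The core of replaying change logs can be sketched in a few lines: each binlog-style event is applied to a target table in order. The event shape and field names below are hypothetical, chosen only to illustrate the insert/update/delete replay logic.

```python
# Minimal sketch of replaying CDC change events into an in-memory "table",
# assuming each event carries an operation type, a primary key, and a row.
def apply_change_events(table, events):
    for ev in events:
        key = ev["key"]
        if ev["op"] in ("insert", "update"):
            table[key] = ev["row"]   # upsert keeps replay idempotent
        elif ev["op"] == "delete":
            table.pop(key, None)     # tolerate deletes of already-absent rows
    return table

events = [
    {"op": "insert", "key": 1, "row": {"name": "alice"}},
    {"op": "update", "key": 1, "row": {"name": "alice-2"}},
    {"op": "delete", "key": 1},
    {"op": "insert", "key": 2, "row": {"name": "bob"}},
]
print(apply_change_events({}, events))  # {2: {'name': 'bob'}}
```

A production pipeline layers ordering guarantees, schema handling, and exactly-once sinks on top of this basic replay loop.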
The Evolution of Netflix's S3 Data Warehouse (Strata NY 2018) | Ryan Blue
In the last few years, Netflix’s S3 data warehouse has grown to more than 100 PB. In that time, the company has shared several techniques and released open source tools for working around S3’s quirks, including s3mper to work around eventual consistency, S3 multipart committers to commit data without renames, and the batchid pattern for cross-partition atomic commits.
Ryan Blue and Daniel Weeks share lessons learned, the tools Netflix currently uses and those it has retired, and the improvements it is rolling out, including Iceberg, a new table format for S3 that is replacing many of the company’s current tools. Iceberg enables a new generation of improvements, including:
* Snapshot isolation with no directory listing or file renames
* Distributed planning to relieve metastore bottlenecks
* Improved data layout for S3 performance
* Immediately available writes from streaming applications
* Opportunistic compaction and data optimization
Building Cloud-Native App Series - Part 11 of 11
Microservices Architecture Series
Service Mesh - Observability
- Zipkin
- Prometheus
- Grafana
- Kiali
Spark (Structured) Streaming vs. Kafka Streams | Guido Schmutz
Independent of the source of data, the integration and analysis of event streams is becoming more important in the world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. In this session we compare two popular streaming analytics solutions: Spark Streaming and Kafka Streams.
Spark is a fast, general engine for large-scale data processing, designed to provide a more efficient alternative to Hadoop MapReduce. Spark Streaming brings Spark's language-integrated API to stream processing, letting you write streaming applications the same way you write batch jobs. It supports both Java and Scala.
Kafka Streams is the stream processing solution that is part of Kafka. It is provided as a Java library and can therefore be easily integrated with any Java application.
This presentation shows how you can implement stream processing solutions with each of the two frameworks, discusses how they compare, and highlights the differences and similarities.
Building Cloud-Native App Series - Part 3 of 11
Microservices Architecture Series
AWS Kinesis Data Streams
AWS Kinesis Firehose
AWS Kinesis Data Analytics
Apache Flink - Analytics
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A... | Amazon Web Services
Customers regularly use Apache Spark running on Amazon EMR to process large amounts of data. As time to insight and the ability to act quickly based on those insights become core differentiators for customers, there is a greater need to be able to analyze data in real time. In this session, we teach you several design patterns to process and analyze real-time streaming data using Amazon EMR and Amazon Kinesis data services.
Temporal Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent (Hosted by Confluent)
Joins in Kafka Streams and ksqlDB are a killer feature for data processing, and basic join semantics are well understood. However, in a streaming world records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data, and it is important to understand how they impact the join result.
In this talk we deep dive into the different types of joins, with a focus on their temporal aspects. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge of temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice to get the desired join results. Last, we give an overview of recent developments, and an outlook on future ones, that improve joins even further.
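The temporal aspect of a stream-stream join can be illustrated with a small pure-Python sketch: two records join only when their keys match and their timestamps fall within the join window, mirroring (in simplified form) the windowed-join semantics Kafka Streams applies per key. The (key, value, timestamp) record shape is a hypothetical stand-in.

```python
# Pure-Python sketch of a windowed (temporal) stream-stream inner join.
# Records are hypothetical (key, value, timestamp_ms) tuples.
def windowed_join(left, right, window_ms):
    joined = []
    for lk, lv, lt in left:
        for rk, rv, rt in right:
            # Same key AND timestamps within the window -> emit a result.
            if lk == rk and abs(lt - rt) <= window_ms:
                joined.append((lk, (lv, rv), max(lt, rt)))
    return joined

left = [("user1", "click", 1000), ("user2", "click", 5000)]
right = [("user1", "purchase", 1400), ("user2", "purchase", 9000)]
# Only user1's events fall within the 1-second window; user2's are 4 s apart.
print(windowed_join(left, right, window_ms=1000))
```

This makes the talk's central point tangible: the same data with different timestamps, or a different window size, produces a different join result.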
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
High Performance Computing on AWS: Accelerating Innovation with virtually unl... | Amazon Web Services
In this session, learn how you can innovate without limits, reduce costs, and get your results to market faster by moving your HPC workloads to AWS. Learn how you can use HPC on AWS to let your research needs dictate your HPC architecture requirements, not the other way around. Understand how to create, operate, and tear down secure, well-optimized HPC clusters in minutes.
This presentation explains what serverless is all about, explaining the context from the Dev and Ops points of view and presenting the various ways to achieve serverless (Functions as a Service, BaaS, ...). It also presents the various competitors on the market and demos one of them, OpenFaaS. Finally, it enlarges the picture, positioning serverless, combined with edge computing and IoT, as a valuable triptych that cloud vendors are leveraging to create end-to-end offers.
AWS Partner Webcast - Disaster Recovery: Implementing DR Across On-premises a... | Amazon Web Services
Organizations leveraging Amazon Web Services (AWS) can choose from a variety of Disaster Recovery (DR) strategies to deploy across on-premises infrastructure and one or more AWS regions.
Join us to learn how Attunity is helping Amazon customers implement durable, low cost DR solutions. Using Attunity, customers can automate and accelerate the replication of critical structured data, unstructured data, content, and applications across on-premises and AWS service environments. Also learn how you can utilize multiple AWS regions for added resiliency. Attunity customer LeaseHawk will share their story on using Attunity services to implement DR with AWS.
What you'll learn:
- Options for how you can implement Disaster Recovery strategies with AWS
- How to use Attunity to make data available across environments
- A customer’s perspective on best practices
Are you deploying Windows on AWS? Are you interested in taking advantage of existing investments when running Windows workloads on AWS? In this session we will discuss real world customer examples including SharePoint, Exchange, SQL Server, and Remote Desktop Services with licensing options. We will explore deployment options and provide an overview of the AWS created QuickStarts and QuickLaunches to help with speed of deployment. This session will also include migration options for customers running End of Extended Support products such as Windows Server 2003 and SQL Server 2005.
This topic introduces the need for a unique architecture style for Cloud Native application deployments. Further, the fit of DevOps, the usage of microservices, and the runtime of Cloud Native applications (* as a Service) are covered in detail. The need for distributed computing in the cloud for Cloud Native applications is easy to understand; insights on the same are covered.
Why Your Digital Transformation Strategy Demands Middleware Modernization | VMware Tanzu
Your current middleware platform is costing you more than you think. It wasn't designed to support high-velocity software releases and frequent iteration of applications—prerequisites for success in today’s world. A new, modern approach to middleware is needed that enables both developer productivity and operational efficiency.
Join Pivotal’s Rohit Kelapure and Perficient’s Joel Thimsen as they discuss:
- The limitations of traditional middleware
- The benefits of middleware modernization
- Your options for modernization, including a cloud-native platform
- Tips for overcoming some common challenges
Presenters: Rohit Kelapure, Pivotal, Joel Thimsen, Perficient & Jeff Kelly, Pivotal (Host)
Solving enterprise challenges through scale out storage & big compute | Avere Systems
Google Cloud Platform, Avere Systems, and Cycle Computing experts will share best practices for advancing solutions to big challenges faced by enterprises with growing compute and storage needs. In this “best practices” webinar, you’ll hear how these companies are working to improve results that drive businesses forward through scalability, performance, and ease of management.
The slides were from a webinar presented January 24, 2017. The audience learned:
- How enterprises are using Google Cloud Platform to gain compute and storage capacity on-demand
- Best practices for efficient use of cloud compute and storage resources
- How to overcome the need for file systems within a hybrid cloud environment
- How to eliminate latency between cloud and data center architectures
- How to best manage simulation, analytics, and big data workloads in dynamic environments
- A look at market dynamics drawing companies to new storage models over the next several years
Presenters communicated a foundation to build infrastructure to support ongoing demand growth.
These slides were presented at the Red Hat "Achieving True Integration Agility with Microservices, Containers and API's" workshop in Santa Clara on 10/26
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen, Confluent
Afternoon opening presentation during Confluent’s streaming event in Paris, presented by Hans Jespersen, VP WW Systems Engineering at Confluent.
Understand the core concepts of “cloud computing” and how businesses around the world are running the infrastructure that supports their websites to lower costs, improve time-to-market, and enable rapid scalability, matching resources to the demands of users. Whether you are an enterprise looking for IT innovation, agility, and resiliency or a small or medium business that wants to accelerate growth without a big upfront investment of cash or time in technology, the AWS Cloud provides a complete set of services at zero upfront cost, available with a few clicks and within minutes.
Build & Deploy Scalable Cloud Applications in Record Time | RightScale
RightScale Webinar: August 11, 2009 - Watch this webinar to see a hands-on demonstration of the WaveMaker Visual Ajax Studio and Rapid Deployment Framework to illustrate how easy it is to build your app in WaveMaker. We demonstrate the one-button push from WaveMaker to deploying your application on the cloud with the RightScale Cloud Management Platform. From there, we show you how easy it is to manage, automate, and scale your application running on the cloud.
In the wake of IoT becoming ubiquitous, there has been large interest in the industry in developing novel techniques for anomaly detection (AD) at the Edge. Example applications include, but are not limited to, smart cities/grids of sensors, industrial process control in manufacturing, smart homes, wearables, connected vehicles, and agriculture (sensing for soil moisture and nutrients). What makes anomaly detection at the Edge different? The following constraints, be it due to the sensors or the applications, necessitate the development of new algorithms for AD.
* Very low power and low compute/memory resources
* High data volume making centralized AD infeasible owing to the communication overhead
* Need for low latency to drive fast action taking
* Guaranteeing privacy
In this talk we shall throw light on the above in detail. Subsequently, we shall walk through the algorithm design process for anomaly detection at the Edge. Specifically, we shall dive into the need to build small models/ensembles owing to limited memory on the sensors. Further, we discuss how to train on data in an online fashion, as long-term historical data is not available due to limited storage. Given the need for data compression to contain the communication overhead, can one carry out anomaly detection on compressed data? We shall throw light on building small models, sequential and one-shot learning algorithms, compressing the data with the models, and limiting the communication to only the data corresponding to the anomalies and the model description. We shall illustrate the above with concrete examples from the wild!
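The low-memory and online-training constraints above often push edge anomaly detection toward constant-memory estimators. As an illustrative sketch (not an algorithm from the talk), the following flags outliers using Welford's online mean/variance update; the 3-sigma threshold and the sample stream are made-up values:

```python
# Illustrative sketch: constant-memory online anomaly detection suitable
# for a resource-constrained edge device. Mean and variance are updated
# incrementally (Welford's algorithm), so no historical data is stored.
class OnlineAnomalyDetector:
    def __init__(self, threshold=3.0):
        self.n = 0          # number of observations seen
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations
        self.threshold = threshold  # flag points beyond this many std devs

    def update(self, x):
        """Ingest one observation; return True if it looks anomalous."""
        is_anomaly = False
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's incremental update
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = OnlineAnomalyDetector(threshold=3.0)
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 50.0, 10.0]
flags = [detector.update(x) for x in stream]
# The spike (50.0) is flagged; the regular readings are not.
```

Because only three floats of state are kept per stream, the same pattern also limits communication: a sensor can transmit only the flagged observations.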
Sequence-to-Sequence Modeling for Time Series | Arun Kejariwal
In this talk we overview Sequence-2-Sequence (S2S) and explore its early use cases. We walk the audience through how to leverage S2S modeling for several use cases, particularly with regard to real-time anomaly detection and forecasting.
Sequence-to-Sequence Modeling for Time Series | Arun Kejariwal
Sequence-to-sequence modeling (Seq2Seq) is now being used for applications based on time series data. We overview Seq2Seq and explore its early use cases. We then walk the audience through how to leverage Seq2Seq modeling for a couple of concrete use cases: real-time anomaly detection and forecasting.
In this talk we walk through an architecture in which models are served in real time and are updated, using Apache Pulsar, without restarting the application at hand. We then describe how to apply Pulsar functions to support two example uses—sampling and filtering—and explore a concrete case study of the same.
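As a flavor of the two example uses, here is a minimal, framework-independent sketch of sampling and filtering logic in the shape of a Pulsar function's process routine; the 10% sampling rate, the `value` field, and the threshold are illustrative assumptions, and all Pulsar wiring (context, topics) is omitted:

```python
import random

# Illustrative sketch of the two example uses: probabilistic sampling and
# threshold filtering, written as plain callables shaped like a Pulsar
# function's process(input) routine. Returning None means "drop the event".
# Framework wiring is omitted; the parameters are made up.
def sample(event, rate=0.10, rng=random.random):
    """Forward roughly `rate` of events downstream; drop the rest."""
    return event if rng() < rate else None

def filter_above(event, threshold=100.0):
    """Forward only events whose 'value' field exceeds the threshold."""
    return event if event.get("value", 0.0) > threshold else None

events = [{"value": v} for v in (42.0, 150.0, 99.9, 300.0)]
kept = [e for e in events if filter_above(e) is not None]
# kept contains the events with value 150.0 and 300.0
```

In a real deployment each callable would be registered as its own Pulsar function, so sampling rates and thresholds can be updated without restarting the downstream application.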
Designing Modern Streaming Data Applications | Arun Kejariwal
Many industry segments have been grappling with fast data (high-volume, high-velocity data). The enterprises in these industry segments need to process this fast data just in time to derive insights and act upon it quickly. Such tasks include but are not limited to enriching data with additional information, filtering and reducing noisy data, enhancing machine learning models, providing continuous insights on business operations, and sharing these insights just in time with customers. In order to realize these results, an enterprise needs to build an end-to-end data processing system, from data acquisition, data ingestion, data processing, and model building to serving and sharing the results. This presents a significant challenge, due to the presence of multiple messaging frameworks and several streaming computing frameworks and storage frameworks for real-time data.
In this tutorial we lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline: messaging frameworks, streaming computing frameworks, storage frameworks for real-time data, and more. We also share case studies from the IoT, gaming, and healthcare as well as our experience operating these systems at internet scale at Twitter and Yahoo. We conclude by offering our perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of messaging systems, streaming systems, storage systems for streaming data, and reinforcement learning-based systems that will power fast processing and analysis of a large (potentially of the order of hundreds of millions) set of data streams.
Topics include:
* An introduction to streaming
* Common data processing patterns
* Different types of end-to-end stream processing architectures
* How to seamlessly move data across different frameworks
* Case studies: Healthcare and the IoT
* Data sketches for mining insights from data streams
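The last topic, data sketches, can be illustrated with a count-min sketch: a fixed-memory summary for approximate frequency queries over a stream. This is a generic textbook sketch, not code from the tutorial; the width/depth and event names are made up:

```python
import hashlib

# Illustrative count-min sketch: a fixed-memory stream summary whose
# frequency estimates never undercount the true frequency. Width/depth are
# small for demonstration; real deployments size them from error bounds.
class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent-ish hash per row, derived from a keyed digest.
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        # Taking the minimum across rows limits overcounting collisions.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for event in ["login"] * 50 + ["click"] * 7 + ["error"] * 3:
    cms.add(event)
# Estimates are always >= true counts and bounded by the stream length.
```

The memory footprint is width × depth counters regardless of how many distinct items the stream carries, which is what makes sketches attractive for mining insights from unbounded streams.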
There has been a shift from big data to live streaming data to facilitate faster data-driven decision making. As the number of live data streams grow—partly a result of the expanding IoT—it is critical to develop techniques to better extract actionable insights.
One current application, anomaly detection, is a necessary but insufficient step, due to the fact that anomaly detection over a set of live data streams may result in an anomaly fatigue, limiting effective decision making. One way to address the above is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps in surfacing actionable insights faster.
In this talk, we explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
Topics include:
* An overview of correlation analysis
* Robust correlation analysis
* Overview of alternative measures, such as co-median
* Trade-offs between speed and accuracy
* Correlation analysis in large dimensions
In this talk we walk the audience through how to marry correlation analysis with anomaly detection, discuss how the topics are intertwined, and detail the challenges one may encounter, based on production data. We also showcase how deep learning can be leveraged to learn nonlinear correlation, which in turn can be used to further contain the false positive rate of an anomaly detection system. Further, we provide an overview of how correlation can be leveraged for common representation learning.
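To give a flavor of robust correlation analysis, the stdlib-only sketch below contrasts Pearson correlation with the rank-based Spearman measure on made-up data containing one wild outlier; the choice of measures and the data are illustrative, not taken from the talk:

```python
# Illustrative sketch: Pearson correlation is easily distorted by a single
# anomalous point, while a rank-based measure (Spearman) is far less so.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def ranks(vals):
    # Rank of each value in its series (no ties in this example).
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0] * len(vals)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 10, 12, 14, 1000]   # one wild outlier
# The relationship is perfectly monotone, so Spearman reports 1.0, while
# Pearson is substantially attenuated by the outlier.
```

This is the sense in which robust measures help contain false positives: an anomalous point in one stream does not wash out the underlying relationship between streams.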
There has been a shift from big data to live streaming data to facilitate faster data-driven decision making. As the number of live data streams grow—partly a result of the expanding IoT—it is critical to develop techniques to better extract actionable insights.
One current application, anomaly detection, is a necessary but insufficient step, due to the fact that anomaly detection over a set of live data streams may result in an anomaly fatigue, limiting effective decision making. One way to address the above is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps in surfacing actionable insights faster.
In this talk we explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
Topics include:
* An overview of correlation analysis
* Robust correlation analysis
* Trade-offs between speed and accuracy
* Multi-modal correlation analysis
Detection and filtering of anomalies in live data is of paramount importance for robust decision making. To this end, in this talk we share techniques for anomaly detection in live data.
In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, cover the typical challenges in modern real-time big data platforms, and offer insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.
Anomaly detection in real-time data streams using Heron | Arun Kejariwal
Twitter has become the de facto medium for consumption of news in real time, and billions of events are generated and analyzed on a daily basis. To analyze these events, Twitter designed its own next-generation streaming system, Heron. Arun Kejariwal and Karthik Ramasamy walk you through how Heron is used to detect anomalies in real-time data streams. Although there’s been over 75 years of prior work in anomaly detection, most of the techniques cannot be used off the shelf because they’re not suitable for high-velocity data streams. Arun and Karthik explain how to make trade-offs between accuracy and speed and discuss incremental approaches that marry sampling with robust measures such as median and MCD for anomaly detection.
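To give a flavor of the robust, incremental approach mentioned above, here is a stdlib-only sketch that flags points deviating from the window median by more than k MADs (median absolute deviations); the window, threshold, and scaling constant are illustrative choices, not details from the talk:

```python
import statistics

# Illustrative sketch: flag points whose deviation from the window median
# exceeds k times the scaled MAD. Unlike mean/std, median and MAD barely
# move when an anomaly enters the window, which is what makes robust
# measures (median, MCD) attractive for high-velocity streams.
def mad_anomalies(window, k=3.0):
    med = statistics.median(window)
    mad = statistics.median(abs(x - med) for x in window)
    scale = 1.4826 * mad            # consistency constant for Gaussian data
    if scale == 0:
        return []                   # degenerate window: no spread at all
    return [x for x in window if abs(x - med) / scale > k]

window = [12.1, 11.8, 12.0, 12.3, 11.9, 12.2, 40.5, 12.0]
# mad_anomalies(window) flags only the 40.5 spike
```

Pairing such a robust test with sampling, as the talk discusses, trades a small loss in accuracy for the speed needed to keep up with the stream.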
Data Data Everywhere: Not An Insight to Take Action Upon | Arun Kejariwal
The big data era is characterized by ever-increasing velocity and volume of data. Over the last two or three years, several talks at Velocity have explored how to analyze operations data at scale, focusing on anomaly detection, performance analysis, and capacity planning, to name a few topics. Knowledge sharing of the techniques for the aforementioned problems helps the community to build highly available, performant, and resilient systems.
A key aspect of operations data is that data may be missing—referred to as “holes”—in the time series. This may happen for a wide variety of reasons, including (but not limited to):
# Packets being dropped due to unresponsive downstream services
# A network hiccup
# Transient hardware or software failure
# An issue with the data collection service
“Holes” in the time series can potentially skew the analysis of data. This in turn can materially impact decision making. Arun Kejariwal presents approaches for analyzing operations data in the presence of “holes” in the time series: highlighting how missing data impacts common data analyses such as anomaly detection and forecasting, discussing the implications of missing data on time series of different granularities, such as minutely and hourly, and exploring a gamut of techniques that can be used to address the missing data issue (e.g., approximating the data using interpolation, regression, ensemble methods, etc.). Arun then walks you through how these techniques can be leveraged using real data.
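One of the remedies listed above, interpolation, can be sketched in a few lines; the series and the simple linear scheme below are illustrative (the talk also covers regression and ensemble methods):

```python
# Illustrative sketch: fill "holes" (None values) in a regularly spaced
# time series by linear interpolation between the nearest observed
# neighbors. Leading/trailing holes are left untouched here, since they
# lack a neighbor on one side.
def interpolate_holes(series):
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                       # find the end of this hole
            if i > 0 and j < len(out):       # hole has both neighbors
                left, right = out[i - 1], out[j]
                gap = j - (i - 1)
                for k in range(i, j):
                    out[k] = left + (right - left) * (k - (i - 1)) / gap
            i = j
        else:
            i += 1
    return out

cpu = [40.0, 42.0, None, None, 48.0, 50.0]
# interpolate_holes(cpu) -> [40.0, 42.0, 44.0, 46.0, 48.0, 50.0]
```

Interpolation is cheap but assumes local smoothness; as the abstract notes, granularity matters, since a two-hour hole is a very different gap from two missing minutely points.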
Real Time Analytics: Algorithms and Systems | Arun Kejariwal
In this tutorial, an in-depth overview of the streaming analytics landscape -- applications, algorithms, and platforms -- is presented. We walk through how the field has evolved over the last decade and then discuss the current challenges: the impact of the other three Vs, viz., Volume, Variety, and Veracity, on big data streaming analytics.
Finding bad apples early: Minimizing performance impact | Arun Kejariwal
The big data era is characterized by the ever-increasing velocity and volume of data. In order to store and analyze the ever-growing data, the operational footprint of data stores and Hadoop have also grown over time. (As per a recent report from IDC, the spending on big data infrastructure is expected to reach $41.5 billion by 2018.) The clusters comprise several thousands of nodes. The high performance of such clusters is vital for delivering the best user experience and productivity of teams.
The performance of such clusters is often limited by slow/bad nodes. Finding slow nodes in large clusters is akin to finding a needle in a haystack; hence, manual identification of slow/bad nodes is not practical. To this end, we developed a novel statistical technique to automatically detect slow/bad nodes in clusters comprising hundreds to thousands of nodes. We modeled the problem as a classification problem and employed a simple, yet very effective, distance measure to determine slow/bad nodes. The key highlights of the proposed technique are the following:
# Robustness against anomalies (note that anomalies may occur, for example, due to an ad-hoc heavyweight job on a Hadoop cluster)
# Given the varying data characteristics of different services, no one model fits all. Consequently, we parameterized the threshold used for classification
The proposed technique works well with both hourly and daily data, and has been in use in production by multiple services. This has not only eliminated manual investigation efforts, but has also mitigated the impact of slow nodes, which used to get detected after several weeks/months of lag!
We shall walk the audience through how the techniques are being used with real data.
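The abstract does not disclose the exact distance measure, but the spirit of the technique can be sketched as follows: score each node by a robust distance from the cluster median and classify nodes beyond a parameterized threshold as slow. The node names, the latency metric, and the MAD-based distance below are illustrative assumptions:

```python
import statistics

# Illustrative sketch of slow-node classification: compare each node's
# metric (e.g., mean task latency) against the cluster median, scaled by
# the MAD. Using median/MAD keeps the classifier robust to anomalies such
# as one ad-hoc heavyweight job skewing the mean. The threshold is the
# parameterized knob mentioned in the abstract.
def find_slow_nodes(latency_by_node, threshold=3.0):
    values = list(latency_by_node.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad or 1e-9          # guard against zero MAD
    return sorted(node for node, v in latency_by_node.items()
                  if (v - med) / scale > threshold)   # one-sided: slow only

cluster = {"node-%02d" % i: 100.0 + (i % 5) for i in range(20)}
cluster["node-07"] = 400.0                # one slow node
# find_slow_nodes(cluster) -> ['node-07']
```

Because the test is one-sided and per-service thresholds can differ, the `threshold` parameter plays the role of the per-service tuning the abstract describes.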
Climate Impact of Software Testing at Nordic Testing Days | Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 | Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 | Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Transcript: Selling digital books in 2024: Insights from industry leaders - T... | BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of the features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Elevating Tactical DDD Patterns Through Object Calisthenics | Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
State of ICS and IoT Cyber Threat Landscape Report 2024 preview | Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
SAP Sapphire 2024 - ASUG301: Building better apps with SAP Fiori | Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
UiPath Test Automation using UiPath Test Suite series, part 4 | DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Removing Uninteresting Bytes in Software Fuzzing | Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
UiPath Test Automation using UiPath Test Suite series, part 5 | DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
3. WHY BOTHER?
✦ CODE: written in a high-level language, e.g., Python, JavaScript, …
✦ COMPUTATION: demand-driven execution; runs whenever new requests arrive
✦ BILLING: pay based on runtime, at ~millisecond granularity
5. EVENT-DRIVEN APPLICATION EXAMPLE: IMAGE RESIZING [1]
[Diagram: an image saved to cloud storage triggers a serverless function (λ), which saves a thumbnail back to cloud storage and the thumbnail's path to a cloud database.]
[1] Slide adapted from a talk by Eric Jonas and Johann Schleier-Smith, “A Berkeley View on Cloud Computing”
6. BATCH ANALYTICS EXAMPLE: VIDEO ANALYTICS
[Diagram: traffic video analytics for law enforcement. Video encoding/decoding runs at the source; when no car is detected, frames are filtered locally; when a car is detected, the video is analyzed in the cloud using DNNs via serverless functions (λ).]
7. STREAMING EXAMPLE: FIGHTING SPAM ON TWITTER
✦ Fight spammy content, engagements, and behaviors on Twitter
✦ Spam campaigns come in large batches
✦ Despite randomized tweaks, enough similarity among spammy entities is preserved
[Diagram: tweets flow through a message queue to serverless functions (λ) performing similarity and clustering, backed by a key-value store, separating spammy tweets from regular tweets.]
8. A REAL USE-CASE: HOW FINANCIAL ENGINES CUT COSTS 90% USING SERVERLESS [1]
✦ Financial Engines: Independent Investment Advisor
๏ 9 million people across 743 companies, $1.8 trillion in assets
✦ Automated portfolio management using computational engines
๏ Core engine component: Integer programming optimizer (IPO)
๏ Linear programming to compute optimization/feasibility
[1] Financial Engines Cuts Costs 90% Using AWS Lambda and Serverless Computing, https://aws.amazon.com/solutions/case-studies/financial-engines/
10. NEED TO DO A LOT OF WORK …
✦ Scaling in response to load variations
✦ Request routing and load balancing
✦ Monitoring to respond to problems
✦ Provision servers based on budget, requirements
✦ System upgrades, including security patching
✦ Migration to new hardware as it becomes available
…
11. A REAL USE-CASE: HOW FINANCIAL ENGINES CUT COSTS 90% USING SERVERLESS [1]
✦ AWS Lambda function for each IPO request
๏ Run as many copies of the IPO function as needed in parallel
✦ Serverless benefits
๏ Up to 94% cost savings annually, not including operational savings
๏ 200-300M IPO requests/month, 60,000 per minute at peak
๏ Increased reliability: just instantiate new lambda requests on crash
[Diagram: many parallel λ solver functions, each packaged with the solver library.]
[1] Financial Engines Cuts Costs 90% Using AWS Lambda and Serverless Computing, https://aws.amazon.com/solutions/case-studies/financial-engines/
12. EVOLUTION OF CLOUD PLATFORMS
* On-prem virtualization: easy switch from legacy infrastructure
* VMs in the cloud
* Platform as a Service (PaaS): App Engine, Heroku
* Backend as a Service (BaaS): added cloud services (e.g., storage, pub-sub)
* Container Orchestration: Borg, Kubernetes
* Serverless Platforms: ✦ AWS Lambda, Google Cloud Functions, Azure Functions ✦ BigQuery, DynamoDB ✦ Cloud Dataflow
13. EVOLUTION OF SHARING RESOURCES [1]
[Diagram: four stacks with increasing virtualization.]
* No sharing: each machine runs its own App / Runtime / OS / Hardware stack
* Virtual machines: multiple App / Runtime / OS stacks share hardware via a VM layer
* Containers: multiple App / Runtime pairs share one OS and the hardware
* FaaS: multiple Apps share a common Runtime, OS, and hardware
[1] Serverless Computation with OpenLambda, Hendrickson et al.
14. SERVERLESS TODAY: FUNCTION-AS-A-SERVICE (FAAS)
✦ Many FaaS platforms: AWS Lambda, Google Cloud Functions, IBM Cloud Functions, Azure Functions, Cloudflare Workers, Alibaba Function Compute
✦ Different pricing models, resource allocations [1,2]
✦ Security and isolation support
✦ Programming language support, OS support, etc.
[1] Peeking Behind the Curtains of Serverless Platforms, Wang et al.
[2] Evaluation of Production Serverless Computing Environments, Lee et al.
15. FAAS ORCHESTRATION [1]
✦ Many orchestration frameworks: AWS Step Functions, Azure Durable Functions, IBM Composer
✦ Varying pricing models, programming models, parallel execution support, state management, architectures, etc. [1]
✦ Serverless trilemma:
๏ black boxes
๏ substitution principle
๏ double-billing
[1] Comparison of FaaS Orchestration Systems, Lopez et al.
16. SERVERLESS IS MORE THAN FaaS …
Serverless = FaaS + BaaS
✦ Object Storage (e.g., S3)
✦ Key-Value Stores (e.g., DynamoDB)
✦ Database (e.g., Cloud Firestore)
✦ Data Processing (e.g., Cloud Dataflow)
✦ Messaging
✦ Complexity hiding
✦ Consumption-based billing
✦ Automatic scaling
17. … NOT EVERYTHING IS SERVERLESS!
✦ The “buzzword” effect
๏ Cloud providers market services as “serverless” without the defining properties:
‣ Complexity hiding
‣ Consumption-based billing
‣ Automatic scaling
✦ “Semi”-serverless
๏ Services that lack one or more of these properties
21. THE COST OF SERVERLESS
Function execution cost
✦ Charged in ~100 ms increments
✦ Charged per GB of memory allocated
Data transfer cost
✦ Charged per GB
✦ Function fusion: combine functions to avoid data transfer, for performance and cost
๏ But fusing functions with different memory requirements can be expensive…
✦ Function placement: place a function close to its data source for cost savings
๏ But limited compute power at the source may slow things down…
✦ How to balance cost with performance? [1]
✦ Use fusion and placement judiciously to optimize cost and performance
[1] Costless: Optimizing Cost of Serverless Computing through Function Fusion and Placement, Elgamal et al.
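The fusion-versus-placement trade-off above can be sketched with a toy cost model. All prices and workload numbers below are illustrative assumptions, not provider list prices:

```java
// Toy cost model: fusing two functions avoids the data-transfer charge but
// bills the whole pipeline at the larger memory allocation.
public class FusionCost {
    static final double PRICE_PER_GB_SECOND = 0.0000166667; // assumed compute price
    static final double PRICE_PER_GB_TRANSFER = 0.09;       // assumed transfer price

    // Two separate functions: each billed at its own memory size, plus the
    // cost of moving the intermediate data between them.
    static double separate(double mem1Gb, double sec1, double mem2Gb, double sec2,
                           double transferGb) {
        return mem1Gb * sec1 * PRICE_PER_GB_SECOND
             + mem2Gb * sec2 * PRICE_PER_GB_SECOND
             + transferGb * PRICE_PER_GB_TRANSFER;
    }

    // Fused function: no transfer, but both stages run at max(mem1, mem2).
    static double fused(double mem1Gb, double sec1, double mem2Gb, double sec2) {
        return Math.max(mem1Gb, mem2Gb) * (sec1 + sec2) * PRICE_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        // A 128 MB stage feeding a 3 GB stage with tiny intermediate data:
        // fusion bills the cheap stage at 3 GB, so it can cost more.
        double sep = separate(0.125, 10, 3.0, 5, 0.001);
        double fus = fused(0.125, 10, 3.0, 5);
        System.out.printf("separate=%.6f fused=%.6f%n", sep, fus);
    }
}
```

With a large intermediate dataset the comparison flips, which is why Costless searches over fusion and placement choices rather than applying either blindly.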
25. TRADING SUPPORT PLATFORM
Scenario
✦ Major bank looking to move to a next-generation data pipeline to support continuous reconciliation of trading activity
Challenges
✦ Zero tolerance for data loss
✦ Performance at scale difficult to achieve
✦ Need to support future data and usage growth
26. INDUSTRIAL IOT ANALYTICS
✦ Data from sensors on power generation equipment
✦ Combined with data from sensors in the distribution network
✦ Brought together and analyzed in the cloud
✦ For immediate insights into capacity, failures, and alerts
27. STREAMING DATA TRANSFORMATIONS
✦ Move best-fit transformations and those needed for fast data access into streaming systems
✦ Provide users and applications access to data at multiple stages of transformation
✦ Leverage batch systems for specialized capabilities and complex transformations
28. CONNECTED VEHICLE
Scenario
✦ Continuously arriving data generated by connected cars needs to be quickly collected, processed, and distributed to applications and partners
Challenges
✦ Requires scalability to handle growing data sources and volumes without a complex mix of technologies
Solution
✦ Leverage Apache Pulsar to provide a data backbone that can receive, transform, and distribute data at scale
29. CONNECTED VEHICLE
✦ Telemetry data from connected vehicles transmitted and published to Pulsar
✦ Data cleansing, enrichment, and refinement processed inside Pulsar
✦ Data made available to internal teams for analysis and reports
✦ Data feeds supplied to partners and partner applications
30. DATA-DRIVEN WORKFLOWS
Scenario
✦ Application processes incoming events and documents that generate processing workflows
Challenges
✦ Operational burdens and scalability challenges of existing technologies grow as data grows
Solution
✦ Process incoming events and data and create work queues in the same system
✦ Decrypt, extract, convert, dispatch, process, store
32. BIG DATA ANALYTICS
✦ Analyze large volumes of data
✦ Wide range of applications: text analytics, machine learning, predictive analytics, data mining, statistics, natural language processing
Why serverless?
✦ No server management
✦ Transparent resource elasticity
✦ Pay for what you use
Building analytics on FaaS platforms
✦ PyWren, Flint, Locus, ExCamera, …
33. BIG DATA ANALYTICS: SORT
✦ Partition tasks (λ) write intermediate data to shared storage; merge tasks (λ) read it back
✦ Intermediate storage: Redis OR S3
๏ S3: high capacity, low IOPS
๏ Redis: low capacity, high IOPS
34. BIG DATA ANALYTICS: LOCUS
✦ Hybrid sort: partition, merge, and final-merge lambdas combine cheap high-capacity storage (S3) with fast low-capacity storage (Redis)
36. VIDEO ENCODING/DECODING
How is it done today?
✦ Video = series of chunks
๏ Chunk = KeyFrame (large) + InterFrames (small deltas from the KeyFrame)
✦ Parallel encoding: each thread encodes its own chunk, starting from a fresh KeyFrame
✦ High parallelism = worse compression (more KeyFrames)
37. VIDEO ANALYTICS: EXCAMERA
VIDEO ENCODING/DECODING ON AWS LAMBDA
✦ Same chunked scheme, with one Lambda per chunk instead of one thread per chunk
38. VIDEO ANALYTICS: EXCAMERA
VIDEO ENCODING/DECODING ON AWS LAMBDA
✦ Serial rebase pass: chunks after the first are re-encoded against the preceding chunk’s final state, replacing their KeyFrames with InterFrames
✦ 60× faster and 6× cheaper than Google’s vpxenc on 128 cores
39. VIDEO ANALYTICS: EXCAMERA
Making lambdas talk to each other
✦ Lambdas are only permitted outbound TCP/IP connections
✦ Each lambda (A, B, C, …) establishes an outbound connection to a rendezvous server (R) at init
✦ If A wants to talk to B, it sends R an init message connect(A, B)
๏ R forwards all of A’s subsequent messages to B
56. MULTITENANCY
SEVERAL TEAMS SHARING THE SAME CLUSTER
✦ Authentication / authorization / namespaces / admin APIs
✦ I/O isolation between writes and reads
๏ Provided by the storage layer - ensures readers draining backlog won’t affect publishers
Soft isolation
✦ Storage quotas — flow control — back-pressure — rate limiting
Hardware isolation
✦ Constrain some tenants to a subset of brokers or bookies
57. STORAGE TIERING
TAKING ADVANTAGE OF LOW-COST CLOUD STORAGE
✦ Offload cold topic data to lower-cost storage (e.g., cloud storage, HDFS)
✦ Manual or automatic (configurable threshold)
✦ Transparent to publishers and consumers
✦ Allows near-infinite event storage at low cost
58. SCHEMA REGISTRY
MAKING SENSE OF THE BYTES IN DATA
✦ Provides type safety to applications built on top of Pulsar
✦ Two approaches
๏ Client-side enforcement: type safety enforcement is up to the application
๏ Server-side enforcement: the system enforces type safety and ensures that producers and consumers remain in sync
✦ The schema registry enables clients to upload data schemas on a per-topic basis
✦ Schemas dictate which data types are recognized as valid for that topic
✦ Supports JSON, protobuf, and binary schemas
59. SCHEMA REGISTRY
MAKING SENSE OF THE BYTES IN DATA
✦ Means for publishers and consumers to communicate the structure of topic data
✦ Validates schema as data is published
✦ Supports JSON, protobuf, and binary schemas

PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar://localhost:6650")
    .build();

Producer<SensorReading> producer = client
    .newProducer(JSONSchema.of(SensorReading.class))
    .topic("sensor-data")
    .create();

Consumer<SensorReading> consumer = client
    .newConsumer(JSONSchema.of(SensorReading.class))
    .topic("sensor-data")
    .subscriptionName("sensor-subscriber")
    .subscribe();
60. ON-THE-FLY SCALABILITY
ADJUST PULSAR ON DEMAND BASED ON LOAD
Scale serving
✦ New broker nodes immediately available to process requests, no data rebalancing required
Scale processing
✦ Add threads, processes, or containers to increase parallelism
Scale storage retention
✦ Add bookie nodes to increase capacity, no data redistribution required
61. TOPIC COMPACTION
✦ Efficient way to enable a consumer to catch up to current state
✦ Process that creates a version of a topic containing only the current value for each key
✦ Triggered via a simple command

Complete topic:
{key: “A”, value: “foo”}
{key: “B”, value: “foobar”}
{key: “B”, value: “bar”}
{key: “A”, value: “binky”}
{key: “A”, value: “bar”}

Compacted topic:
{key: “B”, value: “bar”}
{key: “A”, value: “bar”}
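A minimal sketch of what compaction computes, latest value per key wins; this mimics only the semantics, not Pulsar's actual compaction mechanics:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Replays the complete topic and keeps only the latest value seen per key.
public class CompactionSketch {
    record Msg(String key, String value) {}

    static Map<String, String> compact(List<Msg> topic) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Msg m : topic) {
            latest.put(m.key(), m.value()); // later values overwrite earlier ones
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Msg> topic = List.of(
            new Msg("A", "foo"), new Msg("B", "foobar"), new Msg("B", "bar"),
            new Msg("A", "binky"), new Msg("A", "bar"));
        System.out.println(compact(topic)); // {A=bar, B=bar}
    }
}
```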
62. SQL QUERYING
Enable SQL clients to directly query data in Streamlio
✦ Integrated with the schema registry
✦ Uses Presto as the query engine
✦ Query engine reads data directly from the storage layer
✦ Data visible to the SQL engine as soon as it is published
66. RESILIENCY AND RECOVERY
BROKER, BOOKIE AND DATA CENTER FAILURES
✦ Broker failure
๏ Topic reassigned to an available broker based on load
๏ Can reconstruct the previous state consistently
๏ No data needs to be copied
✦ Bookie failure
๏ Immediate switch to a new node
๏ Background process copies segments to other bookies to maintain the replication factor
✦ Datacenter failure
๏ Built-in multi-datacenter replication
๏ Brokers in any datacenter can immediately serve replicated topics
67. BROKER FAILURE RECOVERY
๏ Topic reassigned to an available broker based on load
๏ Can reconstruct the previous state consistently
๏ No data needs to be copied
๏ Failure handled transparently by the client library
69. BOOKIE FAILURE RECOVERY
๏ After a write failure, BookKeeper immediately switches writes to a new bookie, within the same segment
๏ As long as any 3 bookies remain in the cluster, writes can continue
๏ In the background, a many-to-many recovery process regains the configured replication factor
71. MULTI-DATACENTER REPLICATION
DISASTER RECOVERY
๏ Scalable asynchronous replication
๏ Integrated in the broker message flow
๏ Simple configuration to add/remove regions
๏ Example: topic T1 replicated across Data Centers A, B, and C, with producers (P1-P3) and consumers (C1, C2) attached in each region via subscriptions (S1)
72. SYNCHRONOUS REPLICATION
DISASTER RECOVERY
✦ Each topic owned by one broker at a time, i.e., in one datacenter
✦ ZooKeeper cluster spread across multiple locations
✦ Broker commits writes to bookies in both datacenters
✦ In the event of a datacenter failure, a broker in the surviving datacenter assumes ownership of the topic
73. ASYNCHRONOUS REPLICATION
DISASTER RECOVERY
✦ Two independent clusters, primary and standby, each with its own ZooKeeper
✦ Configured tenants and namespaces replicate to the standby
✦ Data published to the primary is asynchronously replicated to the standby
✦ Producers and consumers restarted in the second datacenter upon primary failure
81. WHAT’S NEEDED: STREAM-NATIVE COMPUTATION
✦ Simplest possible API
๏ Method/Procedure/Function
๏ Multi-language API
๏ Scale developers
✦ Message-bus-native concepts
๏ Input/Output/Log as topics
✦ Flexible runtime
๏ Simple standalone applications vs. system-managed applications
82. PULSAR FUNCTIONS
Execute user-defined functions f(x) to process and transform data
✦ Dynamic filtering, transformation, routing, and analytics
✦ Easy for developers: serverless deployment, fully managed by the cluster
✦ Multiple input topics, multiple output topics
✦ Access to windows of messages
✦ Integrated global state storage
✦ Integrated with the schema registry
83. PULSAR FUNCTIONS
SDK-LESS API

import java.util.function.Function;

public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}
84. PULSAR FUNCTIONS
SDK API

import org.apache.pulsar.functions.api.PulsarFunction;
import org.apache.pulsar.functions.api.Context;

public class ExclamationFunction implements PulsarFunction<String, String> {
    @Override
    public String process(String input, Context context) {
        return input + "!";
    }
}
85. PULSAR FUNCTIONS
INPUT AND OUTPUT
✦ Function executed for every message of the input topic
๏ Supports multiple topics as inputs
✦ Function output goes to the output topic
๏ Function output can be void/null
✦ SerDe takes care of serialization/deserialization of messages
๏ Custom SerDe can be provided by users
๏ Integrates with the schema registry
87. PULSAR FUNCTIONS
AS A STANDALONE APPLICATION

bin/pulsar-admin functions localrun \
  --input persistent://sample/standalone/ns1/test_input \
  --output persistent://sample/standalone/ns1/test_result \
  --className org.mycompany.ExclamationFunction \
  --jar myjar.jar

✦ Runs as a standalone process
✦ Run as many instances as you want; the framework automatically balances data
✦ Run and manage via Mesos/K8s/Nomad/your favorite tool
88. PULSAR FUNCTIONS
RUNNING INSIDE A PULSAR CLUSTER
✦ Create and delete functions in a Pulsar cluster
✦ Pulsar brokers run functions as threads, processes, or Docker containers
✦ Unifies messaging and compute clusters into one, significantly improving manageability
✦ Ideal match for edge or small startup environments
✦ “Serverless in a jar”
90. PULSAR FUNCTIONS - DEPLOYMENT (CONTD.)
✦ Co-located: each node runs a broker plus a worker hosting function instances (e.g., wordcount-1 and transform-2 on Node 1; transform-1 and dataroute-1 on Node 2; wordcount-2 and transform-3 on Node 3)
91. PULSAR FUNCTIONS - DEPLOYMENT (CONTD.)
✦ Separated: function workers run on their own nodes (Nodes 1-3), independent of the broker nodes (Brokers 1-3 on Nodes 4-6)
92. PULSAR FUNCTIONS - DEPLOYMENT (CONTD.)
✦ Containerized: each function instance runs in its own pod (Pods 1-6), alongside broker pods (Brokers 1-3 on Pods 7-9)
94. PULSAR FUNCTIONS
BUILT-IN STATE
✦ Functions can store state in stream storage
๏ The framework provides a simple library around this
✦ Supports server-side operations like counters
✦ Simplified application development
๏ No need to stand up an extra system
95. PULSAR FUNCTIONS
BUILT-IN STATE MANAGEMENT
✦ Pulsar uses BookKeeper as its stream storage
✦ Functions can store state in BookKeeper
✦ The framework provides the Context object for users to access state
✦ Supports server-side operations like counters
✦ Simplified application development
๏ No need to stand up an extra system to develop/test/integrate/operate
96. PULSAR FUNCTIONS
STATE EXAMPLE

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.PulsarFunction;

public class CounterFunction implements PulsarFunction<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // split("\\s+") rather than split("."): String.split takes a regex,
        // and "." would match every character, yielding only empty tokens
        for (String word : input.split("\\s+")) {
            context.incrCounter(word, 1);
        }
        return null;
    }
}
97. PULSAR FUNCTIONS
STATE IMPLEMENTATION
✦ The built-in state management is powered by the Table Service in BookKeeper
✦ BP-30: Table Service
๏ Originated as built-in metadata management within BookKeeper
๏ Exposed for general usage, e.g., state management for Pulsar Functions
✦ Available from Pulsar 2.4
98. PULSAR FUNCTIONS
STATE IMPLEMENTATION (CONTD.)
✦ Updates are written to log streams in BookKeeper
✦ Materialized into a key/value table view
✦ The key/value table is indexed with RocksDB for fast lookup
✦ The source of truth is the log streams in BookKeeper
✦ RocksDB instances are transient key/value indexes
✦ RocksDB instances are incrementally checkpointed and stored into BookKeeper for fast recovery
99. EVENT PROCESSING DESIGN PATTERNS
✦ DYNAMIC DATA ROUTING
✦ ETL
✦ DATA ENRICHMENT
✦ FILTERING
✦ WINDOW AGGREGATION
109. STATEFUL SERVERLESS APPLICATIONS
✦ MapReduce (Spark, Hadoop), stateful streaming, video analytics, … generate and exchange intermediate data or ephemeral state
✦ Need a serverless layer for sharing and exchanging ephemeral state
✦ Requirements: low latency and high IOPS; lifetime management; fine-grained elasticity
118. EXISTING APPROACHES
✦ Sorting data on PyWren using Locus [NSDI’19]: map tasks shuffle to reduce tasks through Redis
✦ Video encoding in ExCamera [NSDI’17]: tasks exchange state through a rendezvous server
✦ These solutions are ad hoc; general-purpose alternatives: Pocket [OSDI’18], Anna [VLDB’19, IEEE TKDE’19]
✦ Requirements: low latency and high IOPS; lifetime management; fine-grained elasticity
121. JIFFY: MEMORY MANAGEMENT UNIT FOR SERVERLESS OS
✦ Jiffy: remote ephemeral storage, decoupled from the compute (CPU) resources
✦ Application: scale ephemeral storage resources independent of other resources
✦ Cloud provider: multiplex ephemeral storage for high utilization
✦ Challenges:
๏ What is the right interface?
๏ How can we share ephemeral storage across applications with isolation?
๏ How should we manage lifetimes of application storage?
๏ How to facilitate efficient communication across tasks?
122. JIFFY INTERFACE
✦ Stateful programming models (MapReduce, dataflow, streaming dataflow, Piccolo, …): use data structures to exchange state between tasks
✦ Distributed data structure layer: wraps “blocks” to efficiently support rich semantics
๏ FIFO queues: Enqueue(), Dequeue()
๏ Files: Read(), Write()
๏ Hash tables: Get(), Put(), …
๏ B-trees: Lookup(), Insert(), …
✦ Virtual memory layer: transparent memory scaling at “block” granularity for each namespace
๏ CreateNamespace(), DestroyNamespace()
123. JIFFY: HIGH UTILIZATION WITH ISOLATION
✦ Apps (App#1 … App#N) share ephemeral storage (Server#1 … Server#N) through per-namespace data structures (DS#1 … DS#N)
✦ Isolation: separate data structure per namespace
✦ Multiplexing: blocks multiplexed across data structures
✦ Transparent scaling by adding/removing blocks and data-structure-specific repartitioning
✦ High utilization by multiplexing ephemeral storage across apps
✦ Provides isolation guarantees across applications
137. JIFFY: STATE LIFETIME MANAGEMENT
✦ New challenges in serverless compute platforms: independent compute/memory lifetimes
✦ Goal: couple the lifetime of storage resources to the application lifetime
✦ Existing storage systems do not couple them; programming languages do, via scoping and garbage collection
✦ Challenge: identify data scope and lifetime when compute and storage are separated
✦ Jiffy approach: hierarchical namespaces (/App1/Task1/Subtask1, …) with lease management
๏ Application tasks periodically send lease renewals; each lease records its duration and last renewal
141. JIFFY: INTER-TASK COMMUNICATION
✦ With plain ephemeral remote storage, how does task B know it has data to consume from task A?
✦ Jiffy: built-in notification mechanism to indicate availability of data
๏ B issues Subscribe(Put); when A calls Put(K, V), Jiffy sends B Notify(Put, K, V)
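The Subscribe/Notify flow can be sketched with a toy in-memory store; names like subscribePut are illustrative, not Jiffy's actual API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Toy shared store: consumers register put-callbacks so they learn about
// new data without polling the storage layer.
public class NotifyingStore {
    private final Map<String, String> data = new HashMap<>();
    private final List<BiConsumer<String, String>> putSubscribers = new ArrayList<>();

    void subscribePut(BiConsumer<String, String> cb) { putSubscribers.add(cb); }

    void put(String key, String value) {
        data.put(key, value);
        // Notify(Put, K, V): fan out to every subscriber.
        for (BiConsumer<String, String> cb : putSubscribers) cb.accept(key, value);
    }

    String get(String key) { return data.get(key); }

    public static void main(String[] args) {
        NotifyingStore store = new NotifyingStore();
        List<String> consumedByB = new ArrayList<>();
        store.subscribePut((k, v) -> consumedByB.add(k + "=" + v)); // task B
        store.put("frame-1", "encoded");                            // task A
        System.out.println(consumedByB); // [frame-1=encoded]
    }
}
```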
142. JIFFY: SYSTEM OVERVIEW
✦ Control path: the Jiffy client sends lease renewals to the directory service (hierarchical namespaces, lease management, block-level allocator)
✦ Data path: the client reads/writes the storage service (one data structure per namespace, notification framework)
143. JIFFY: KEY IDEAS
✦ Separation of control plane and data plane
✦ Hierarchical namespaces for resource multiplexing and lifetime management
✦ Elastic scaling at millisecond timescales
✦ Isolation between tasks
145. HOW WELL DOES JIFFY PERFORM?
Serverless platform: AWS Lambda
Storage service: Amazon EC2 (m4.16xlarge instances)
Compared storage systems: Redis, Apache Crail, Pocket, DynamoDB, Amazon S3
✦ Latency/IOPS/MBps comparable to state of the art (Redis, Apache Crail, Pocket)
๏ ~100 µs/operation for 64 B requests, at ~100,000 operations per second
✦ Transparent fine-grained elasticity for various data structures within 2-500 ms
146. PERFORMANCE FOR STATEFUL APPLICATIONS
✦ Benchmarks (task latency):
๏ Encode a 15-min 4K video on ExCamera: ExCamera vs. ExCamera + Jiffy
๏ Sort 50 GB of data on PyWren (map and reduce tasks): S3 vs. Redis vs. Jiffy
๏ TPC-DS queries Q1-Q5 on 100 GB of data on Hive: local HDFS vs. Jiffy
✦ Takeaway: Jiffy performance is comparable to state of the art, even while providing fine-grained transparent elasticity, lifetime management, etc.
147. BENEFITS OF MULTIPLEXING
✦ Workload: 50 GB sort jobs (Sort-1 … Sort-5) arriving every 50 s, on a storage system with a fixed 50 GB total capacity
✦ With Redis, each job is delayed until capacity becomes available; with Jiffy, multiplexing keeps used capacity high without such delays
152. MEMBERSHIP: APPLICATIONS
✦ IP/device ID blacklisting
✦ Databases (e.g., speed up semi-join operations), caches, routers, storage systems
✦ Reduce space requirements in probabilistic routing tables
155. BLOOM FILTER
✦ Natural generalization of hashing
✦ False positives are possible
✦ No false negatives
✦ No deletions allowed
✦ For false positive rate ε, # hash functions k = log2(1/ε)
✦ False positive rate ≈ (1 − e^(−kn/m))^k, where n = # elements, k = # hash functions, m = # bits in the array
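A toy Bloom filter wired to the sizing rule above (k = log2(1/ε) hash functions, m ≈ 1.44·k·n bits); the double-hashing scheme used to derive the k indexes is an assumption for illustration:

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: k bit positions per element, no deletions.
public class BloomSketch {
    private final BitSet bits;
    private final int m, k;

    BloomSketch(int expectedElements, double eps) {
        this.k = Math.max(1, (int) Math.ceil(Math.log(1 / eps) / Math.log(2)));
        this.m = Math.max(1, (int) Math.ceil(1.44 * k * expectedElements));
        this.bits = new BitSet(m);
    }

    // Derive the i-th index from two base hashes (double hashing).
    private int index(Object x, int i) {
        int h1 = x.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(Object x) {
        for (int i = 0; i < k; i++) bits.set(index(x, i));
    }

    boolean mightContain(Object x) {
        for (int i = 0; i < k; i++) if (!bits.get(index(x, i))) return false;
        return true; // possibly present: false positives happen with rate ~eps
    }

    public static void main(String[] args) {
        BloomSketch f = new BloomSketch(1000, 0.01);
        f.add("apache");
        f.add("pulsar");
        System.out.println(f.mightContain("pulsar")); // true: no false negatives
    }
}
```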
156. CUCKOO FILTER [1]
✦ Key highlights
๏ Add and remove items dynamically
๏ For false positive rate ε < 3%, more space-efficient than a Bloom filter
๏ Higher performance than a Bloom filter for many real workloads
๏ Asymptotically worse performance than a Bloom filter
‣ Min fingerprint size ∝ log(# entries in table)
✦ Overview
๏ Stores only a fingerprint of each inserted item
‣ Original key and value bits of each item are not retrievable
๏ Set membership query for item x: search the hash table for the fingerprint of x
[1] Fan et al. (2014). “Cuckoo Filter: Practically Better Than Bloom”, CoNEXT.
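Partial-key cuckoo hashing can be sketched as follows: one fingerprint slot per bucket, with the alternate bucket derived from the current bucket and the fingerprint alone, so entries can be relocated without the original key (table sizes and hash constants are illustrative):

```java
// Toy cuckoo filter: 8-bit fingerprints, one slot per bucket, bounded kicks.
public class CuckooSketch {
    private final int[] table; // fingerprint per bucket; 0 means empty
    private final int mask;

    CuckooSketch(int bucketsPow2) {
        table = new int[bucketsPow2];
        mask = bucketsPow2 - 1;
    }

    private int fingerprint(Object x) { return (x.hashCode() & 0xFF) | 1; } // nonzero
    private int bucket(Object x) { return x.hashCode() >>> 8 & mask; }
    // i2 = i1 XOR hash(fp): computable from either bucket plus the fingerprint.
    private int altBucket(int i, int fp) { return (i ^ (fp * 0x5bd1e995)) & mask; }

    boolean insert(Object x) {
        int fp = fingerprint(x), i = bucket(x);
        for (int kick = 0; kick < 64; kick++) {
            if (table[i] == 0) { table[i] = fp; return true; }
            int victim = table[i]; // evict the resident fingerprint...
            table[i] = fp;
            fp = victim;           // ...and try it in its alternate bucket
            i = altBucket(i, fp);
        }
        return false; // too many kicks: table considered full
    }

    boolean mightContain(Object x) {
        int fp = fingerprint(x), i = bucket(x);
        return table[i] == fp || table[altBucket(i, fp)] == fp;
    }

    public static void main(String[] args) {
        CuckooSketch f = new CuckooSketch(1024);
        f.insert("pulsar");
        f.insert("heron");
        System.out.println(f.mightContain("pulsar")); // true
    }
}
```

Because membership is checked by fingerprint only, distinct keys sharing a fingerprint and bucket pair yield false positives, which is the ε the filter is sized for.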
157. CUCKOO FILTER
Cuckoo hashing [1]
✦ High space occupancy
✦ Practical implementations: multiple items/bucket
✦ Example uses: software-based Ethernet switches
Cuckoo filter [2]
✦ Uses a multi-way associative cuckoo hash table
✦ Employs partial-key cuckoo hashing
๏ Stores the fingerprint of an item
๏ Relocates existing fingerprints to their alternative locations
[1] R. Pagh and F. Rodler. “Cuckoo hashing,” Journal of Algorithms, 51(2):122-144, 2004.
[2] Fan et al. (2014). “Cuckoo Filter: Practically Better Than Bloom”, CoNEXT.
158. ADAPTIVE CUCKOO FILTER [1]
KEY HIGHLIGHTS
✦ Motivation
๏ Minimize the false positive rate
✦ Selectively remove false positives without introducing false negatives
✦ Maintain a replica of the cuckoo hash table with raw elements
✦ Indices of buckets are determined by hash values of the element, not solely by the fingerprint
✦ Allow different hash functions for the fingerprints
๏ Enables removal and reinsertion of elements to remove false positives
✦ At the cost of added insertion complexity and space overhead
[1] Mitzenmacher et al. (2017). “Adaptive Cuckoo Filters”.
159. CONCURRENT CUCKOO FILTER [1]
✦ Support for multiple writers
✦ Optimistic cuckoo hashing
๏ Minimizes the size of the locked critical section during updates
✦ Leverages Intel’s Hardware Transactional Memory (HTM)
๏ Optimizes TSX lock elision to reduce the transactional abort rate
✦ Algorithmic/architectural tuning
๏ Breadth-first search for an empty slot
๏ Lock after discovering a cuckoo path
๏ Striped fine-grain spin locks
๏ Increased set-associativity
๏ Prefetching
[1] Li et al. (2014), “Algorithmic Improvements for Fast Concurrent Cuckoo Hashing”.
160. CUCKOO FILTER VARIANTS
✦ Cuckoo++ hash tables [Scouarnec 2018]
✦ Morton filter [Breslow et al. 2018]
✦ Smart Cuckoo [Sun et al. 2017]
✦ Position-aware Cuckoo [Kwon et al. 2018]
162. LEARNED BLOOM FILTER [1]
KEY HIGHLIGHTS
✦ Bloom filter as a binary classifier: predict whether a key exists in a set or not (membership)
๏ Subtlety: no false negatives allowed
‣ Learned model + auxiliary data structure
✦ Learn the structure of lookup keys
๏ Minimize collisions between keys and non-keys
๏ Leverage continuous functions to capture the underlying data distribution
✦ Learn different models for read-heavy vs. write-heavy workloads
[1] Kraska et al. (2018). “The Case for Learned Index Structures”, SIGMOD.
164. NEURAL BLOOM FILTER [1]
KEY HIGHLIGHTS
✦ Inputs arrive at high throughput, or are ephemeral
๏ Few-shot neural data structures
✦ Learning membership in one shot via meta-learning
✦ Overview
๏ Sample tasks from a common distribution
๏ Network learns to specialize to a given task with few examples
[1] Rae et al. (2019). “Meta-Learning Neural Bloom Filters”.
166. FREQUENT ELEMENTS
✦ Count-Sketch [Charikar et al. 2002]
✦ Count-Min [Cormode & Muthukrishnan 2005]
✦ Count-Min-Log [Pitel & Fouquier 2015]
✦ Learned Count-Min [Hsu et al. 2019]
167. COUNT-MIN [1]
✦ A two-dimensional array of counts with w columns and d rows
✦ Each entry of the array is initially zero
✦ d hash functions are chosen uniformly at random from a pairwise independent family:
h_1, …, h_d : {1 … n} → {1 … w}
✦ Update
๏ For a new element i, for each row j and k = h_j(i), increment the k-th column by one
✦ Point query: â_i = min_j sketch[j, h_j(i)], where sketch is the table
✦ Parameters (ε, δ): w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉
[1] Cormode and Muthukrishnan (2005). “An Improved Data Stream Summary: The Count-Min Sketch and its Applications”. J. Algorithms 55: 29-38.
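The parameterization above translates directly into code; a minimal Count-Min sketch, assuming simple (a·x + b) mod p pairwise-independent hashes over integer items:

```java
import java.util.Random;

// Count-Min sketch with w = ceil(e/eps) columns and d = ceil(ln(1/delta)) rows.
public class CountMin {
    private static final long P = 2147483647L; // Mersenne prime 2^31 - 1
    private final long[][] counts;
    private final long[] a, b;
    private final int w, d;

    CountMin(double eps, double delta, long seed) {
        w = (int) Math.ceil(Math.E / eps);
        d = (int) Math.ceil(Math.log(1 / delta));
        counts = new long[d][w];
        a = new long[d];
        b = new long[d];
        Random r = new Random(seed);
        for (int j = 0; j < d; j++) {         // pairwise-independent family
            a[j] = 1 + r.nextInt((int) (P - 1));
            b[j] = r.nextInt((int) P);
        }
    }

    private int h(int j, int x) { return (int) (Math.floorMod(a[j] * x + b[j], P) % w); }

    void update(int item, long c) {
        for (int j = 0; j < d; j++) counts[j][h(j, item)] += c;
    }

    long query(int item) {
        long min = Long.MAX_VALUE; // min over rows: always an over-estimate
        for (int j = 0; j < d; j++) min = Math.min(min, counts[j][h(j, item)]);
        return min;
    }

    public static void main(String[] args) {
        CountMin cm = new CountMin(0.01, 0.01, 7);
        for (int i = 0; i < 5; i++) cm.update(42, 1);
        System.out.println(cm.query(42)); // 5 (exact here: no colliding items)
    }
}
```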
168. COUNT-SKETCH: FEATURE SELECTION [1]
✦ Millions/billions of features are routine
๏ NLP, genomics, computational biology, chemistry
✦ Accuracy vs. performance trade-off
๏ Model vs. runtime
✦ Model interpretability
✦ Feature hashing
๏ Loss of interpretability
✦ Count-Sketch + top-k heap
๏ Top-k values of the sketch used for iterative updates
[1] Aghazadeh et al. (2018). “MISSION: Ultra Large-Scale Feature Selection using Count-Sketches”.
169. COUNT-MIN VARIANTS [1]
✦ Count-Min sketch with conservative update (CU sketch)
๏ Update an item with frequency c by raising only the counters that constrain its estimate
๏ Avoids unnecessary updating of counter values => reduces over-estimation error
๏ Still prone to over-estimation error on low-frequency items
✦ Lossy Conservative Update (LCU) - SWS
๏ Divide the stream into windows
๏ At window boundaries, ∀ 1 ≤ i ≤ w, 1 ≤ j ≤ d, decrement sketch[i,j] if 0 < sketch[i,j] ≤
[1] Cormode, G. 2009. Encyclopedia entry on “Count-Min Sketch”. In Encyclopedia of Database Systems. Springer, 511-516.
170. COUNT-MIN-LOG [1]
✦ Minimize the error of low-frequency items
✦ Overview
๏ Same structure as the Count-Min sketch with conservative update
๏ Replaces the classical binary counting cells with log counting cells
[1] Pitel and Fouquier (2015). “Count-Min-Log sketch: Approximately counting with approximate counters”.
171. UnivMon [1]
✦ Universal sketch: online sketching step + offline estimation step
✦ Provably accurate for estimating a large class of functions
✦ Generality
๏ Delays binding to the application of interest
✦ High fidelity
✦ Applications
๏ Changepoint/global iceberg detection
๏ Entropy estimation
[1] Liu et al. (2016). “One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon”.
172. HASH-PIPE [1]
✦ Need for line-rate processing: 10-100 Gbps
✦ Limited memory in switching hardware
๏ Memory ∝ # heavy flows
✦ Small time budget: 1 ns
๏ Manipulate state and process packets at each stage
๏ Process each packet only once
[1] Sivaraman et al. (2017). “Heavy-Hitter Detection Entirely in the Data Plane”.
173. LEARNED COUNT-MIN [1]
✦ Exploit patterns in the input
๏ For example, in text data, word frequency ∝ 1/word length
✦ Mitigate large estimation errors
๏ Collisions between high-frequency elements
✦ Learn properties to identify heavy hitters
✦ Does not need to know the data distribution a priori
✦ Logarithmic improvement in the error bound
✦ Key high-level idea
๏ Assign each heavy hitter to its own unique bucket
[1] Hsu et al. (2019). “Learning-based Frequency Estimation Algorithms”, ICLR.
174. LEARNED COUNT-MIN [1]
✦ The frequency of an element in a unique bucket is exact
✦ Provably reduces estimation errors
[1] Hsu et al. (2019). “Learning-based Frequency Estimation Algorithms”, ICLR.
175. REAL-TIME FREQUENT ELEMENTS IN PULSAR & HERON
✦ Streamlio (Apache Pulsar and Apache Heron): data sources 1-3 publish to topics T1-T3, cleaning functions (clean-fn 1, clean-fn 2) preprocess the streams, and a trending topology (trend-topology 3) feeds the trending application
PRIVATE COUNT-MIN
✦ Why not employ homomorphic encryption over raw inputs for privacy-preserving aggregation?
✦ Perform private aggregation over the sketches, rather than the raw inputs
✦ Reduce the communication and computation complexity
๏ Linear to logarithmic in the size of the input
✦ Real-world privacy-friendly systems
๏ Recommendations for media streaming services
๏ Out-of-dictionary words → auto-complete
๏ Prediction of user locations
‣ Improve transportation services and predict future trends
✦ Federated learning
[1] Melis et al. (2016). "Efficient Private Statistics with Succinct Sketches".
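What makes aggregation over sketches work is that Count-Min tables are linear. A minimal sketch of that property, assuming all parties share the same hash salts (the homomorphic-encryption layer of Melis et al. is deliberately not shown):

```python
import numpy as np

def cm_sketch(items, width=64, depth=3, seed=7):
    """Plain Count-Min sketch as a small numpy array. Parties that share
    the same seed (hash salts) produce cell-wise compatible sketches."""
    rng = np.random.RandomState(seed)
    salts = [int(s) for s in rng.randint(0, 2**31, size=depth)]
    table = np.zeros((depth, width), dtype=np.int64)
    for x in items:
        for i in range(depth):
            table[i, hash((salts[i], x)) % width] += 1
    return table

# Linearity: cm_sketch(a) + cm_sketch(b) == cm_sketch(a + b) cell-wise,
# so servers can sum (or homomorphically aggregate) the logarithmic-size
# sketches instead of collecting raw inputs.
```

Because the aggregate of sketches equals the sketch of the combined stream, the server learns only what the merged sketch reveals, at communication cost proportional to the sketch rather than the input.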
FEDERATED & DIFFERENTIALLY PRIVATE
✦ Discover the heavy hitters but not their frequencies
✦ Without additional noise
✦ Iterative algorithm [1]
๏ Randomly select a set of users
๏ Each user votes on a single-character extension to an already discovered popular prefix
๏ Server aggregates the received votes using a trie structure and prunes nodes whose counts fall below a chosen threshold θ
[1] Zhu et al. (2019). "Federated Heavy Hitters with Differential Privacy".
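The iterative prefix-extension loop above can be sketched as follows. This is a bare-bones rendering without the user sampling and differential-privacy analysis of Zhu et al.; `collect_votes` is a hypothetical callable standing in for one round of user reports, returning a (prefix, next_character) vote per participating user.

```python
from collections import Counter

def federated_heavy_prefixes(collect_votes, theta, rounds=10):
    """Grow popular prefixes one character per round; prune below theta."""
    popular = {''}                        # start from the empty prefix
    for _ in range(rounds):
        votes = Counter(collect_votes(popular))
        # Keep only one-character extensions whose votes clear the threshold
        extended = {p + c for (p, c), n in votes.items() if n >= theta}
        if not extended:
            break
        popular = extended
    return popular
```

Because only vote counts on prefixes (not raw strings or exact frequencies) reach the server, the protocol reveals which items are heavy without reporting how heavy they are.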
CARDINALITY ESTIMATION
✦ Bit-pattern observables
๏ Hash values as strings
๏ Occurrence of particular patterns in the binary representation
๏ Example: HyperLogLog [Flajolet et al. 2008]
✦ Order-statistic observables
๏ Hash values as real numbers
๏ k-th smallest value
‣ Insensitive to distribution of repeated values
๏ Example: MinCount [Giroire, 2000]
CARDINALITY ESTIMATION FLAVORS
✦ Sketch-based vs. sampling-based
✦ Uniform hashing vs. logarithmic hashing
✦ Interval-based vs. bucket-based
✦ Examples
๏ Adaptive sampling, Distinct sampling, Method-of-Moments estimator, (Smoothed) Jackknife estimator
๏ LogLog, SuperLogLog, HyperLogLog, and HyperLogLog++
๏ MinCount
๏ Counting Bloom filter
HYPERLOGLOG
✦ Apply hash function h to every element in a multiset
✦ Cardinality of the multiset is 2^max(ϱ), where ϱ is the position of the leftmost 1-bit, i.e., the bit pattern 0^(ϱ-1)1 is observed at the beginning of a hash value
✦ The above suffers from high variance
๏ Employ stochastic averaging
๏ Partition the input stream into m = 2^p sub-streams S_i using the first p bits of the hash values
๏ Estimate: E = α_m · m² · (Σ_{j=1}^{m} 2^(−M[j]))^(−1), where M[j] is the maximum ϱ observed in sub-stream j and α_m is a bias-correction constant
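The whole scheme fits in a few lines. A minimal illustrative implementation, without the small- and large-range corrections that production variants such as HyperLogLog++ add:

```python
import hashlib
import math

def hll_cardinality(items, p=12):
    """Minimal HyperLogLog estimate with m = 2^p registers."""
    m = 1 << p
    M = [0] * m
    for x in items:
        # 64-bit hash of the element
        h = int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:8], 'big')
        idx = h >> (64 - p)                   # first p bits pick the register
        w = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rho = (64 - p) - w.bit_length() + 1   # position of leftmost 1-bit
        M[idx] = max(M[idx], rho)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias correction for large m
    return alpha * m * m / sum(2.0 ** -r for r in M)
```

With p = 12 (4096 registers, about 2–3 KB of state) the standard error is roughly 1.04/√m ≈ 1.6%, which is why a few kilobytes suffice to count billions of distinct items.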
HYPERLOGLOG OPTIMIZATIONS
✦ Use of a 64-bit hash function
๏ Total memory requirement grows from 5 · 2^p to 6 · 2^p bits, where p is the precision
✦ Empirical bias correction
๏ Uses empirically determined data for cardinalities smaller than 5m and the unmodified raw estimate otherwise
✦ Sparse representation
๏ For n ≪ m, store an integer obtained by concatenating the bit patterns for idx and ϱ(w)
๏ Use variable-length encoding for integers: a variable number of bytes per integer
๏ Use difference encoding: store the difference between successive elements
✦ Other optimizations [1, 2]
[1] http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
[2] http://antirez.com/news/75
FASTER TRAINING
✦ Stochastic/incremental gradient descent
๏ Slow to converge
✦ Variance reduction, accelerated gradient descent
๏ AdaBound, AMSGrad, Nesterov, Adamax, Adam, RMSProp, AdaDelta
‣ Stragglers worsen the convergence
✦ Key idea [1]
๏ Select a subset of training data points along with their corresponding learning rates
๏ Greedily maximize the facility-location function
‣ Minimizes the upper bound on the estimation error of the full gradient
[1] Mirzasoleiman et al. (2019). "Data Sketching for Faster Training of Machine Learning Models".
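The greedy step can be sketched as classic (1 − 1/e) submodular maximization of the facility-location function F(S) = Σ_i max_{j∈S} sim(i, j). This is a minimal sketch assuming a precomputed nonnegative similarity matrix; the actual method of Mirzasoleiman et al. additionally derives per-point learning rates, omitted here.

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k columns of `sim` maximizing the facility-location
    objective. `sim` is an n x n nonnegative similarity matrix."""
    n = sim.shape[0]
    best = np.zeros(n)      # best[i] = max similarity of point i to chosen set
    chosen = []
    for _ in range(k):
        # Marginal gain of adding each candidate column j
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        j = int(np.argmax(gains))
        chosen.append(j)
        best = np.maximum(best, sim[:, j])
    return chosen
```

Each selected point "covers" the points most similar to it, so the chosen subset's weighted gradient tracks the full gradient closely.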
OPTIMIZATION
✦ Problem statement: minimize Σ_n f_n(x) + h(x)
๏ f_n(x): smooth functions
๏ h(x): non-smooth function (such as an l1 or l2 penalty)
✦ Leverage ADMM [1]
๏ Worker w updates its own copy x_w and the master updates the global variable z
[1] Aytekin and Johansson (2019). "Harnessing the Power of Serverless Runtimes for Large-Scale Optimization".
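For this split, the standard consensus-ADMM iterations (following Boyd et al. 2010; notation assumed, with W workers, penalty parameter ρ, scaled duals u_w, and bars denoting worker averages) map directly onto the worker/master roles above:

```latex
x_w^{k+1} = \arg\min_{x}\Big( f_w(x) + \tfrac{\rho}{2}\,\lVert x - z^k + u_w^k \rVert_2^2 \Big)
\quad \text{(in parallel, one serverless worker per } w\text{)}

z^{k+1} = \operatorname{prox}_{h/(W\rho)}\big( \bar{x}^{k+1} + \bar{u}^{k} \big)
\quad \text{(master update of the global variable)}

u_w^{k+1} = u_w^k + x_w^{k+1} - z^{k+1}
\quad \text{(scaled dual update)}
```

Each x-update is an independent stateless task, which is what makes the scheme a natural fit for function-as-a-service workers.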
OPTIMIZATION
✦ Large-scale optimization problems
๏ Second-order methods
‣ Use gradient and Hessian
‣ Faster convergence
‣ Do not require step-size tuning
‣ Computationally prohibitive when training data is large
๏ Go serverless
‣ Invoke thousands of workers
‣ Communication costs (# iterations)
‣ Compute an approximate Hessian
✦ Matrix sketching
๏ Randomized Numerical Linear Algebra (RandNLA)
๏ Built-in resiliency against stragglers
‣ Leverage error-correcting codes to create redundant computation
[1] Gupta et al. (2019). "OverSketched Newton: Fast Convex Optimization for Serverless Systems".
OPTIMIZATION
✦ Gradient computation
๏ Matrix-vector multiplication
‣ Coded matrix multiplication: distributed, straggler-resilient
[1] Illustration borrowed from Gupta et al. (2019). "OverSketched Newton: Fast Convex Optimization for Serverless Systems".
OPTIMIZATION
✦ Hessian computation
๏ Matrix-matrix multiplication (MM)
‣ Block partitioning of input matrices
‣ Sparse sketching matrix based on Count-Sketch
✦ Applications: distributed, straggler-resilient
๏ Ridge-regularized linear regression
[1] Illustration borrowed from Gupta et al. (2019). "OverSketched Newton: Fast Convex Optimization for Serverless Systems".
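The Count-Sketch-based compression can be sketched in numpy. This is an illustrative RandNLA building block, not the OverSketched Newton pipeline itself, which combines a related construction with coded, block-partitioned serverless computation.

```python
import numpy as np

def count_sketch(A, s, seed=0):
    """Apply an s x n Count-Sketch matrix S to A (n x d): every row of A
    is hashed to one of s buckets and multiplied by a random sign."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    buckets = rng.integers(0, s, size=n)        # hash: row index -> bucket
    signs = rng.choice([-1.0, 1.0], size=n)     # random +/-1 per row
    SA = np.zeros((s, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)  # scatter-add signed rows
    return SA

# The Gram (Hessian-like) matrix A.T @ A can then be approximated by
# (SA).T @ (SA), computed on the much smaller sketched matrix SA.
```

Because S is sparse (one nonzero per column), sketching costs a single pass over A, and E[(SA)ᵀ(SA)] = AᵀA, which is what licenses using the sketch in place of the exact Hessian.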
INFERENCE IN SERVERLESS ENVIRONMENTS
✦ Key challenge: low latency
๏ Cold start: moving large amounts of model data within and across servers
✦ Persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy
[1] Illustration borrowed from Dakkak et al. (2018). "TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments".
WHAT MAKES IOT ANALYTICS DIFFERENT?
✦ More data
๏ High-volume, continuous data in motion from multiple sensors
๏ Store, blend, and manage time-series data
✦ More complexity
๏ Use of multiple analytics techniques
๏ Distributed analytics (edge)
✦ More automation
๏ Integration with operations systems and BPS
๏ Bidirectional communication and control of endpoints
WHAT MAKES IOT ANALYTICS DIFFERENT?
Devices → Gateways → Data Collectors → Data Transport → Processing → Repositories → Applications
CHALLENGES
✦ Latency: delay resulting from data transmission from edge to cloud or datacenter may exceed application requirements
✦ Capacity: the volume of data streams would require expensive network bandwidth to collect and transmit detailed data
✦ Processing lag: the time required to process incoming data streams to make them ready for applications may exceed requirements
✦ Complexity: a complicated mix of technologies and tools creates inconsistency and operational burdens
WHAT'S NEEDED?
✦ Simplified infrastructure for data movement and processing
✦ Performance and scalability to keep up with data
✦ Ability to process, understand, and act on data wherever it is
✦ Resilient, scalable data movement
๏ From edge to cloud to datacenter (and back)
✦ Unified platform
๏ Consistent development and processing environment across edge, cloud, and datacenter
✦ Intelligence everywhere
๏ Dynamically filter, process, analyze, and route data as needed at edge, cloud, and datacenter
IOT DATA FABRIC
Apache Pulsar: edge, cloud, datacenter
✦ Integrated solution for event data movement, processing, and storage
✦ Scalable for deployment across edge, cloud, and datacenter
✦ Simple framework for filtering, transformation, enrichment, and analytics
✦ Built on Apache Pulsar open source technology, proven at massive scale
IOT ARCHITECTURE WITH APACHE PULSAR
Devices → Gateways → Data Collectors → Data Transport → Processing → Repositories → Applications
Apache Pulsar underpins the pipeline
MISSING PIECES: SECURITY
✦ Increased co-residency: side channels
๏ Rowhammer attacks on DRAM [1]
๏ Exploiting micro-architectural vulnerabilities
✦ Information leakage via network communications
✦ Potential solutions
๏ Hardware-level security and isolation
๏ Lightweight and secure container isolation
๏ Task-placement strategies
MISSING PIECES: SLA GUARANTEES
✦ Increased multiplexing = less predictable performance
๏ Resource-allocation delays
๏ Scheduling delays
๏ Cold-start latencies
✦ Potential solutions
๏ Hardware-level isolation, container-level isolation
๏ Bin-packing based on performance needs (throughput, latency)
๏ Bin-packing based on complementary resource needs
MISSING PIECES: HETEROGENEOUS HARDWARE
✦ Only CPU resources, no hardware heterogeneity
๏ GPUs
๏ TPUs
๏ FPGAs
✦ Not fundamental; providers eventually will offer them
✦ Leads to new opportunities
๏ Greater degree of multiplexing for different resource types
๏ Bin-pack applications with different hardware needs
ACKNOWLEDGEMENTS
Rachit Agarwal, Ion Stoica, Aditya Akella, Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Qifan Pu, Vaishaal Shankar, Joao Menezes Carreira, Karl Krauth, Neeraja Yadwadkar, Joseph Gonzalez, Raluca Ada Popa, David A. Patterson
SERVERLESS
✦ Peeking Behind the Curtains of Serverless Platforms [Wang et al. 2018]
✦ The Serverless Data Center: Hardware Disaggregation Meets Serverless Computing [Pemberton and Schleier-Smith 2019]
✦ A Berkeley View on Serverless Computing [Jonas et al. 2018]
✦ SAND: Towards High-Performance Serverless Computing [Akkus et al. 2018]
✦ The Server Is Dead, Long Live the Server: Rise of Serverless Computing, Overview of Current State and Future Trends in Research and Industry [Castro et al. 2019]
✦ Agile Cold Starts for Scalable Serverless [Mohan et al. 2019]
SERVERLESS
✦ Trust More, Serverless [Brenner and Kapitza 2019]
✦ Clemmys: Towards Secure Remote Execution in FaaS [Trach et al. 2019]
✦ No More, No Less: A Formal Model for Serverless Computing [Gabbrielli et al. 2019]
✦ Serverless Computing: One Step Forward, Two Steps Back [Hellerstein et al. 2019]
✦ Formal Foundations of Serverless Computing [Jangda et al. 2019]
SERVERLESS ANALYTICS/MACHINE LEARNING
✦ numpywren: Serverless Linear Algebra [Shankar et al. 2018]
✦ Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure [Pu et al. 2019]
✦ A Serverless Real-Time Data Analytics Platform for Edge Computing [Nastic et al. 2017]
✦ Serving Deep Learning Models in a Serverless Platform [Ishakian et al. 2017]
✦ A Case for Serverless Machine Learning [Carreira et al. 2018]
✦ BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services [Bhattacharjee et al. 2019]
✦ Serverless Data Analytics with Flint [Kim and Lin 2018]
✦ Exploring Serverless Computing for Neural Network Training [Feng et al. 2018]
ACCELERATED STOCHASTIC GRADIENT DESCENT
✦ A Method for Unconstrained Convex Minimization Problem with the Rate of Convergence O(1/k²) [Nesterov 1983]
✦ On the Momentum Term in Gradient Descent Learning Algorithms [Qian 1999]
✦ Adaptive Subgradient Methods for Online Learning and Stochastic Optimization [Duchi et al. 2011]
✦ Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction [Johnson and Zhang 2013]
✦ Adam: A Method for Stochastic Optimization [Kingma and Ba 2015]
✦ Incorporating Nesterov Momentum into Adam [Dozat 2016]
✦ Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning [Shang et al. 2017]
✦ On the Convergence of Adam and Beyond [Reddi et al. 2019]
OPTIMIZATION
✦ RandNLA: Randomized Numerical Linear Algebra [Drineas and Mahoney 2016]
✦ OverSketched Newton: Fast Convex Optimization for Serverless Systems [Gupta et al. 2019]
✦ Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers [Boyd et al. 2010]
✦ Proximal Algorithms [Parikh and Boyd 2014]
✦ Newton-MR: Newton's Method Without Smoothness or Convexity [Roosta et al. 2018]
APPROXIMATION
✦ A Stochastic Approximation Method [Robbins and Monro 1951]
✦ On a Stochastic Approximation Method [Chung 1954]
✦ An Analysis of Approximations for Maximizing Submodular Set Functions - I [Nemhauser et al. 1978]
✦ An Analysis of Approximations for Maximizing Submodular Set Functions - II [Nemhauser et al. 1978]
✦ Accelerated Greedy Algorithms for Maximizing Submodular Set Functions [Minoux 1978]
MEMBERSHIP
✦ The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables [Chazelle et al. 2004]
✦ Improving Retouched Bloom Filter for Trading Off Selected False Positives Against False Negatives [Donnet et al. 2010]
✦ Don't Thrash: How to Cache Your Hash on Flash [Bender et al. 2012]
✦ Cuckoo Filter: Practically Better Than Bloom [Fan et al. 2014]
✦ Bloom Filters in Adversarial Environments [Naor and Yogev 2015]
✦ A General-Purpose Counting Filter: Making Every Bit Count [Pandey et al. 2017]
✦ Bloom Filters, Adaptivity, and the Dictionary Problem [Bender et al. 2018]
✦ Multiple Set Matching and Pre-Filtering with Bloom Multifilters [Concas et al. 2019]
FREQUENT ELEMENTS
✦ Augmented Sketch: Faster and More Accurate Stream Processing [Roy et al. 2016]
✦ Heavy-Hitter Detection Entirely in the Data Plane [Sivaraman et al. 2017]
✦ MISSION: Ultra Large-Scale Feature Selection Using Count-Sketches [Aghazadeh et al. 2018]
✦ Network-Wide Heavy Hitter Detection with Commodity Switches [Harrison et al. 2018]
CARDINALITY ESTIMATION: NEURAL NETWORK BASED APPROACHES
✦ Cardinality Estimation Using Neural Networks [Liu et al. 2015]
✦ Learned Cardinalities: Estimating Correlated Joins with Deep Learning [Kipf et al. 2018]
✦ Cardinality Estimation with Local Deep Learning Models [Woltmann et al. 2019]
✦ An Empirical Analysis of Deep Learning for Cardinality Estimation [Ortiz et al. 2019]
FEDERATED LEARNING
✦ Federated Optimization: Distributed Machine Learning for On-Device Intelligence [Konečný et al. 2016]
✦ Communication-Efficient Learning of Deep Networks from Decentralized Data [McMahan et al. 2016]
✦ Federated Learning: Strategies for Improving Communication Efficiency [Konečný et al. 2016]
✦ Towards Federated Learning at Scale: System Design [Bonawitz et al. 2019]
✦ Asynchronous Federated Optimization [Xie et al. 2019]
✦ Federated Heavy Hitters with Differential Privacy [Zhu et al. 2019]