Tableapp architecture migration story for GCPUG.TW (Yen-Wen Chen)
This document summarizes the migration of a web application called TABLEAPP from AWS to GCP. It describes the original AWS architecture, problems encountered such as slow scaling, and the goals for the migration, such as improving performance and reducing costs. It then details experiments with Docker containers and Kubernetes on both GCP and AWS. The selected solution deployed Kubernetes on GCP's Container Engine for auto-scaling and easy management. The new GCP architecture integrated Kubernetes, Cloud SQL, Cloud Storage, and other services, resulting in faster deployment times, higher performance, better log collection, and a 40% cost reduction compared to the original AWS architecture.
Spark Summit - Mobius C# Binding for Apache Spark (shareddatamsft)
Slides used for the talk at Spark Summit West - https://spark-summit.org/2016/events/mobius-c-language-binding-for-spark.
With Mobius, developers can use .NET with Apache Spark. This talk covers writing a Spark driver program in C# using Mobius, the internal architecture of Mobius, observations of C# applications running in a Spark cluster, and recommended best practices. Mobius is open-sourced at http://github.com/Microsoft/Mobius.
From AWS to GCP, TABLEAPP Architecture Story (Yen-Wen Chen)
TABLEAPP is migrating from AWS to GCP due to scaling issues with their AWS architecture. They propose using Kubernetes on GCP to containerize their application and allow for easier auto-scaling. This will eliminate wasted resources and slow provisioning times. They present a new GCP architecture using Kubernetes, Cloud SQL, Cloud Load Balancing, and other GCP services. Migrating has reduced costs by 40% while maintaining availability and performance.
Flink Forward San Francisco 2018 keynote: Anand Iyer - "Apache Flink + Apach... (Flink Forward)
Over the past few months, the Apache Flink and Apache Beam communities have been busy developing an industry leading solution to author batch and streaming pipelines with Python. This was made possible by a significant effort to revamp Beam’s portability framework, build the corresponding Flink Runner, and simplify Flink’s artifact distribution & deployment mechanisms.
What is the “killer big-data app” enabled by this integration: production TensorFlow pipelines. Building production machine learning pipelines that process large distributed data sets can get complex. In this talk, we will describe a set of open source libraries developed at Google, that simplify and unify pre and post processing stages for a production TensorFlow pipeline. These libraries are authored on Beam’s python SDK, and can be run on Apache Flink at scale.
Last, but not least, we will describe how Beam & Flink aim to bring the power of big-data to newer audiences, in particular, developers of the Go programming language.
End-to-end large messages processing with Kafka Streams & Kafka Connect (confluent)
This document discusses processing large messages with Kafka Streams and Kafka Connect. It describes how large messages can exceed Kafka's maximum message size limit. It proposes using an S3-backed serializer to store large messages in S3 and send pointers to Kafka instead. This allows processing logic to remain unchanged while handling large messages. The serializer transparently retrieves messages from S3 during deserialization.
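As a rough illustration of the claim-check pattern this abstract describes, here is a minimal Python sketch; an in-memory dict stands in for the S3 bucket, and the class names and size threshold are invented for the example, not the actual serializer's API:

```python
import uuid

# Size above which payloads are offloaded (Kafka's default
# max.message.bytes is ~1 MB; a tiny limit keeps the demo readable).
MAX_INLINE_BYTES = 32

class FakeS3:
    """In-memory stand-in for an S3 bucket."""
    def __init__(self):
        self.objects = {}
    def put(self, key, data):
        self.objects[key] = data
    def get(self, key):
        return self.objects[key]

class S3BackedSerializer:
    """Claim-check serializer: large payloads go to the object store,
    and only a small pointer record travels through Kafka."""
    def __init__(self, store):
        self.store = store

    def serialize(self, payload: bytes) -> bytes:
        if len(payload) <= MAX_INLINE_BYTES:
            return b"inline:" + payload
        key = str(uuid.uuid4())
        self.store.put(key, payload)          # offload to "S3"
        return b"s3:" + key.encode()          # pointer goes to Kafka

    def deserialize(self, record: bytes) -> bytes:
        tag, _, body = record.partition(b":")
        if tag == b"inline":
            return body
        return self.store.get(body.decode())  # transparent retrieval

ser = S3BackedSerializer(FakeS3())
small = ser.serialize(b"hello")
big = ser.serialize(b"x" * 1000)
assert ser.deserialize(small) == b"hello"
assert ser.deserialize(big) == b"x" * 1000
```

Because offloading happens inside the serializer, the surrounding Kafka Streams topology never needs to know whether a payload was inlined or fetched back from the store.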
How to build a Linked Data Platform (in the W3C sense, https://www.w3.org/TR/ldp/) without actually building one. We'll look into the rich set of services provided by Amazon as part of AWS and see if we can configure them to look like an LDP (spoiler - yes, we can).
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros... (HostedbyConfluent)
Confluent Platform 6.0 and Project Metamorphosis complete the event streaming platform by providing elastic scalability, infinite storage, and global access, transforming Kafka. Key features include self-balancing clusters and dynamic scaling on Confluent Cloud, tiered storage and infinite retention on the platform, and cluster linking to simplify hybrid and multi-cloud deployments. These new capabilities help remove limitations on scale, storage, and deployment that traditionally challenged Kafka applications.
The document introduces the AWS Cloud Development Kit (CDK) framework, explaining that it allows defining cloud infrastructure as reusable components using familiar programming languages like JavaScript. It notes that CDK accelerates onboarding to AWS since there is little new to learn compared to CloudFormation, and that resources can be defined at a higher level of abstraction. The document also compares CDK to AWS Serverless Application Model (SAM), noting that while both build on CloudFormation, CDK supports local testing and deploying a wider range of resource types through programming languages rather than YAML/JSON templates.
How to collect and utilize logs at Kubernetes with Elastic Stack (Rakuten Group, Inc.)
- Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that share resources into pods and allows containers in a pod to find each other and communicate using localhost. Pods run on nodes which are physical or virtual machines.
- There are different approaches to logging in Kubernetes, including sending logs from pods to a log backend directly or indirectly via nodes. Common destinations include Elasticsearch and Splunk, with Kibana used for searching and visualization. Logs can be searched and alerts generated based on their contents.
- Application performance monitoring (APM) tools integrate with applications like Rails to capture metrics on CPU, memory, and transactions, and send structured log data to backends for creating graphs and dashboards.
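As a small illustration of the "structured log data" idea, here is a hedged Python sketch that formats each log record as one JSON line on stdout — the shape that node-level collectors such as Fluentd or Filebeat typically pick up from a pod's output. The pod name field is a made-up example:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line, so a node-level
    log agent can parse fields instead of scraping free text."""
    def format(self, record):
        return json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "pod": "web-7f9c",          # hypothetical pod name
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("request served")  # emits one JSON object per line
```

Once logs arrive in Elasticsearch as structured documents, the per-field search and alerting described above becomes a query rather than a regex.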
Meteor is the next take on agile development on the full JavaScript stack. Based on established JavaScript tools like Node, jQuery and Underscore, it still brings a fresh and integrated approach. And MongoDB is very much at its heart: Minimongo implements a client-side MongoDB API for manipulating your data model; data is transparently replicated between client and server; and using WebSockets, MongoDB oplog events replicate immediately to all clients, making it simple to build distributed applications "Google Docs style."
RedisConf17 - Redfin - The Real Estate Brokerage and the In-memory Database (Redis Labs)
Redfin uses Redis as an in-memory database to power caching, rate limiting, real-time analytics and more. Redis provides fast performance and useful data structures and features like Lua scripting, pub/sub, hyperloglog and hashes. Some current uses include caching API responses, database queries and map data. Redis is also used for rate limiting external APIs. Future potential uses discussed include session storage, text search, chat, distributed locks, modules for new data types and behaviors.
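The rate-limiting use mentioned above is commonly built on Redis's INCR and EXPIRE commands. The sketch below mimics that fixed-window idiom in plain Python, with a dict standing in for the Redis server; the class and parameter names are illustrative, not Redfin's implementation:

```python
import time

class FixedWindowLimiter:
    """Fixed-window rate limiter mirroring the Redis INCR + EXPIRE
    idiom; a local dict stands in for the Redis server."""
    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.counters = {}  # key -> (window_number, count)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_s)
        start, count = self.counters.get(key, (window, 0))
        if start != window:           # window rolled over (EXPIRE)
            start, count = window, 0
        count += 1                    # INCR
        self.counters[key] = (start, count)
        return count <= self.limit

rl = FixedWindowLimiter(limit=3, window_s=60)
results = [rl.allow("api-key-1", now=t) for t in (0, 1, 2, 3)]
# first three calls in the window pass, the fourth is throttled
```

With real Redis the counter lives server-side, so every application instance shares the same window — which is exactly why Redis is a natural fit for limiting calls to external APIs.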
Achieving end-to-end visibility into complex event-sourcing transactions usin... (HostedbyConfluent)
The use of event-sourcing systems like Kafka is growing rapidly among Node.js applications. Building systems around an event-driven architecture simplifies horizontal scalability in distributed computing models and makes them more resilient to failure. With these advantages come new challenges - how do we get visibility into these complex processes?
Event-driven architecture is asynchronous by nature. Tracking the communication between different components is extremely difficult, yet essential, when debugging or hunting down bottlenecks in the system.
In this talk, I will present ways to achieve end-to-end and granular visibility into complex event-sourcing transactions using distributed tracing. I will use open-source tools like OpenTelemetry, Jaeger, and Zipkin to showcase a complex Node.js system using Kafka.
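To make the tracing idea concrete, here is a simplified Python sketch of context propagation through message headers — the mechanism that OpenTelemetry's propagators automate. The traceparent string loosely follows the W3C Trace Context header format, and all function names here are invented for the example:

```python
import uuid

def new_traceparent():
    """Minimal W3C-style traceparent: version-traceid-spanid-flags."""
    return f"00-{uuid.uuid4().hex}-{uuid.uuid4().hex[:16]}-01"

def inject(headers, traceparent):
    """Producer side: stamp the trace context onto message headers."""
    headers["traceparent"] = traceparent
    return headers

def extract(headers):
    """Consumer side: recover the context so the next span joins
    the same end-to-end trace instead of starting a new one."""
    return headers.get("traceparent")

# Producer starts a trace and publishes the context with the event.
tp = new_traceparent()
msg = {"value": b"order-created", "headers": inject({}, tp)}

# Consumer extracts it; the trace id ties both spans together
# in a backend like Jaeger or Zipkin.
assert extract(msg["headers"]) == tp
trace_id = extract(msg["headers"]).split("-")[1]
```

Because the id travels inside the Kafka message itself, the trace survives the asynchronous hop that normally breaks request-scoped tracing.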
The automation challenge: Kubernetes operators vs Helm charts (Ana-Maria Mihalceanu)
Helm charts and Kubernetes operators both provide tools for automating application deployments to Kubernetes clusters. Helm charts package Kubernetes configurations and allow deploying multiple configurations as a single application, while operators package human operational knowledge to manage applications over their lifetime. Some benefits of operators include maintaining resources securely with HTTPS, creating backups, and configuring clusters, while Helm charts are better for stateless applications where settings don't need ongoing maintenance. The document discusses converting an existing Helm chart to a Kubernetes operator to deploy and automatically manage an application.
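The "operational knowledge" an operator encodes usually takes the form of a reconciliation loop: repeatedly compare desired state against actual state and act on the difference. The Python sketch below shows that core idea under simplified assumptions — resources are plain dicts and the function name is illustrative, not any operator SDK's API:

```python
def reconcile(desired, actual):
    """One pass of an operator-style control loop: compute the
    actions needed to drive actual state toward desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions

desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
actual = {"web": {"replicas": 2}}
plan = reconcile(desired, actual)
# plan: update "web" to 3 replicas, create the missing "db"
```

A Helm chart renders the desired state once at install time; an operator runs a loop like this continuously, which is why it can also handle day-2 tasks such as backups and cluster reconfiguration.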
Serverless computing allows developers to build and run applications and services without having to manage infrastructure. It uses third party services to handle servers and allows developers to focus only on their application code. Serverless applications are built using event-driven compute services like AWS Lambda, Azure Functions, and Google Cloud Functions. These services allow code to be triggered by events and auto-scale as needed, without the need to provision or manage servers.
This document discusses moving MongoDB to the cloud. It provides an overview of MongoDB hosting options including on-premises data centers, cloud providers, and hosted databases. It outlines some key reasons to move to the cloud, such as cost-effectiveness, reduced need for staffing, and improved availability. It also covers important considerations for strategy planning including instance types, high availability strategy, security, and migration/rollback strategies. Finally, it discusses two common strategies for migrating - adding a cloud server to an existing replica set with no downtime, or taking backups and restoring to the cloud which requires downtime.
MongoDB World 2018: Building Serverless Apps with MongoDB Atlas on Google Clo... (MongoDB)
This document discusses building serverless apps with MongoDB Atlas on Google Cloud Platform (GCP). It describes using MongoDB Atlas as the database for a global web app with users in the US, UK, and Australia to gain native scaling capabilities and address latency concerns. It demonstrates creating a "Hello World" Node.js app on GCP App Engine connected to a MongoDB Atlas cluster on GCP for proof of concept.
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand... (Luke Tillman)
Transitioning a legacy monolithic application to microservices is a daunting task by itself, and it only gets more complicated as you start to dig through all the libraries and frameworks out there meant to help. In this talk, we'll cover the transition of a real Cassandra-based application to a microservices architecture using gRPC from Google and Falcor from Netflix. (Yes, Falcor is more than just a magical luck dragon from an awesome 80's movie.) We'll talk about why these technologies were a good fit for the project, as well as why Cassandra is often a great choice once you go down the path of microservices. And since all the code for the project is open source, you'll have plenty to dig into afterwards.
This document summarizes a data engineering project for analyzing trending topics by geo-location:
The project involves building a pipeline to ingest real-time social media data from Kafka into HDFS for batch processing with Spark and storing results in Cassandra, with the goal of exposing trending hashtag data via a web API. Some initial components including a simple Flask API are complete, while work remains on real-time streaming, a NoSQL database interface, and fully configuring the cluster. The presenter has a computer science degree and experience as a software engineer at Citrix and a university research center.
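The trending-hashtag aggregation at the heart of the batch job can be sketched in a few lines of Python. This is a toy stand-in for the Spark computation, with an invented post format of (geo, text) pairs:

```python
import re
from collections import Counter

def trending(posts, top_n=2):
    """Count hashtags per geo across a batch of posts - the same
    aggregation the Spark batch job would perform at scale."""
    tags = Counter()
    for geo, text in posts:
        for tag in re.findall(r"#(\w+)", text.lower()):
            tags[(geo, tag)] += 1
    return tags.most_common(top_n)

posts = [
    ("US", "Loving the game #nba #sports"),
    ("US", "#NBA finals tonight"),
    ("UK", "Rainy again #weather"),
]
top = trending(posts)
# ("US", "nba") leads with two mentions
```

In the real pipeline the same grouping key, (geo, hashtag), would be computed by Spark over HDFS data, with the counts written to Cassandra for the web API to serve.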
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop... (confluent)
Running a multi-tenant Kafka platform designed for the enterprise can be challenging. You need to manage and plan for data growth, support an ever-increasing number of use cases, and ensure your developers can be productive with the latest tools in the Apache Kafka ecosystem — all while maintaining the stability and performance of Kafka itself.
At Bloomberg, we run a fully-managed, multi-tenant Kafka platform that is used by developers across the enterprise. The variety of use cases for Kafka leads to bursty workloads, latency-sensitive workloads, and topologies where partitions are fanned out across hundreds or thousands of consumer groups running side-by-side in the same cluster.
In this talk, we will give a brief overview of our platform and share some of our experiences and tools for running multi-tenant stretched clusters, managing data growth with compression, and mitigating the impact of various application patterns on shared clusters.
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf... (HostedbyConfluent)
Studying the ""how"" of Kafka makes you better at using Kafka, but studying its ""whys"" makes you better at so much more. In looking at the tradeoffs behind a system like Kafka, we learn to reason more clearly about distributed systems and to make high-stakes technology adoption decisions more effectively. These are skills we all want to improve!
In this talk, we'll examine trade-offs on which our favorite distributed messaging system takes opinionated positions:
- Whether to store data contiguously or using an index
- How many storage tiers are best?
- Where should metadata live?
- And more.
It's always useful to dissect a modern distributed system with the goal of understanding it better, and it's even better to learn deeper architectural principles in the process. Come to this talk for a generous helping of both.
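The first trade-off above - a contiguous, offset-addressed log versus an indexed store - can be illustrated with a toy Python sketch. The class name and API are invented for the example, not Kafka's:

```python
class PartitionLog:
    """Append-only log: records are stored contiguously and addressed
    by offset, so sequential reads are simple slices rather than
    per-record index lookups."""
    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1   # the new record's offset

    def read(self, offset, max_records=10):
        """A consumer tracks only its offset; replay is a range read."""
        return self._records[offset:offset + max_records]

log = PartitionLog()
for r in (b"a", b"b", b"c"):
    log.append(r)
assert log.read(1) == [b"b", b"c"]  # resume from offset 1
```

Storing records contiguously is what makes consumer position a single integer and replay a cheap sequential scan - a property an index-organized store gives up in exchange for random-access lookups.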
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni... (Seldon)
Speaker: Barbara Fusinska, Machine Learning Strategic Cloud Engineer at Google
Title: Hassle Free, Scalable, Machine Learning with Kubeflow
Abstract: Kubeflow uses Kubernetes' strengths to build a toolkit for data scientists where they can create, train and publish models in a hassle-free and scalable way. The goal is to run machine learning workflows without needing to think about the infrastructure. In this talk, Barbara will discuss the capabilities of Kubeflow from the data scientist's perspective. The presentation will introduce how you can use the platform to build models and deploy them, adjusting the computation environment.
Bio: Barbara is a Machine Learning Strategic Cloud Engineer at Google with a strong software development background. While working with a variety of different companies, she gained experience in building diverse software systems. This experience brought her focus to the Data Science and Big Data field. She believes in the importance of data and metrics when growing a successful business. Alongside collaborating on data architectures, Barbara still enjoys programming. She currently speaks at conferences in between her work in London. She tweets at @BasiaFusinska and you can follow her blog.
Thanks to all TensorFlow London meetup organisers and supporters:
Seldon.io
Altoros
Rewired
Google Developers
Rise London
Building Language Agnostic APIs with gRPC - JavaDay Istanbul 2017 (Mustafa AKIN)
This document discusses gRPC, an open-source RPC framework created by Google. It provides high performance for communication between microservices, supporting millions of calls per second. gRPC uses Protocol Buffers to define service interfaces, generates code for client and server implementations, and communicates over HTTP/2. It allows defining services independently of implementations and supports features like bi-directional streaming. The document outlines how gRPC works, language support, advantages over other solutions, example usage, and companies that use it in production.
Nowadays the Kappa Architecture is surely one of the best architectural patterns for implementing a streaming system. While the choice for the log/journal side is usually straightforward thanks to engines like Apache Kafka, DistributedLog and Pravega, which perfectly fit the write side of this architecture, we didn't find an open-source counterpart able to fully satisfy all the requirements we believe are essential for a time series database, such as: high availability, partition tolerance, optimized time series management, security, out-of-the-box Apache Flink integration, ad-hoc front-end streaming features based on the WebSocket protocol, and natural real-time analytics readiness. For this reason we decided to start the development of NSDB (Natural Series DB). During this talk we will introduce the main concepts behind NSDB, focusing on our starting goals and its architecture, and giving an overview of its first draft implementation. We will also explain how it leverages Akka Cluster and how it partitions data on a time basis.
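Time-based partitioning, as mentioned at the end, can be sketched as routing each data point to a shard keyed by its time bucket. The Python below is an illustrative toy, not NSDB's implementation; the bucket size and class name are assumptions:

```python
from collections import defaultdict

BUCKET_S = 3600  # one shard per hour (illustrative interval)

class TimePartitionedSeries:
    """Route each point to a shard keyed by its time bucket, so
    range queries only touch the shards that overlap the range."""
    def __init__(self):
        self.shards = defaultdict(list)

    def write(self, ts, value):
        self.shards[ts // BUCKET_S].append((ts, value))

    def query(self, start, end):
        out = []
        # Only scan buckets overlapping [start, end], not the whole set.
        for bucket in range(start // BUCKET_S, end // BUCKET_S + 1):
            out.extend(p for p in self.shards.get(bucket, [])
                       if start <= p[0] <= end)
        return out

db = TimePartitionedSeries()
for ts, v in ((10, 1.0), (3700, 2.0), (7300, 3.0)):
    db.write(ts, v)
recent = db.query(0, 4000)  # touches only the first two shards
```

Pruning by bucket is also what makes time-based partitioning attractive for distribution: each bucket can live on a different cluster node, which is broadly how an Akka-cluster-backed store can spread a series across machines.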
This document discusses implementing and testing a self-managed logging and visualization solution for a Kubernetes cluster. It considers tools like FluentD, Elasticsearch, Kibana, Helm, and Kops for collecting, processing, and visualizing logs. A turn-key deployment approach using Helm is recommended to install all stack components from a single chart and leverage dependencies. Concerns about authentication, capacity planning, and security hardening are noted for future improvement.
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...HostedbyConfluent
One of the great things about running applications in the cloud is that you only pay for the resources that you use. But that also makes it more important than ever for our applications to be resource-efficient. This becomes even more critical when we use serverless functions.
Micronaut is an application framework that provides dependency injection, developer productivity features, and excellent support for Apache Kafka. By performing dependency injection, AOP, and other productivity-enhancing magic at compile time, Micronaut allows us to build smaller, more efficient microservices and serverless functions.
In this session, we'll explore the ways that Apache Kafka and Micronaut work together to enable us to build fast, efficient, event-driven applications. Then we'll see it in action, using the AWS Lambda Sink Connector for Confluent Cloud.
Serverless Big Data Architecture on Google Cloud Platform at Credit OK (Kriangkrai Chaonithi)
Serverless Big Data Architecture on Google Cloud Platform was presented by Kriangkrai Chaonithi. The presentation covered Credit OK's use of serverless architecture on GCP for their big data analytics platform. Credit OK processes large amounts of customer data from over 400 sites to perform credit scoring. They use Google Cloud Functions to ingest data from sites, as well as Compute Engine and Google Cloud Storage. This serverless architecture allows them to automatically scale infrastructure as needed, reducing costs since they only pay for resources used. While serverless architectures don't require managing servers, there are still resource limits that must be considered to avoid issues like exhausted worker pools during peak loads.
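As a hedged sketch of the ingestion step, the Python below imitates a storage-triggered serverless function handler. The event fields follow the general shape of Cloud Storage notifications (bucket/name/size), but the bucket name, file path, and handler name are all invented for illustration:

```python
def ingest(event, context=None):
    """Sketch of a background function fired by an object-finalize
    event: normalize the notification into a record for the next
    pipeline stage (hypothetical shape, not Credit OK's code)."""
    record = {
        "source": f"gs://{event['bucket']}/{event['name']}",
        "size": int(event.get("size", 0)),
    }
    # In the real pipeline this record would be written onward,
    # e.g. to Cloud Storage or a warehouse table, for scoring.
    return record

out = ingest({"bucket": "creditok-raw",
              "name": "site-042/data.csv",
              "size": "2048"})
```

Because each uploaded file triggers its own invocation, scale-out happens per event with no servers to provision, which is the cost model the presentation highlights; the worker-pool limits it warns about show up when too many such invocations fire at once.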
This is the presentation deck I wrote for the LA TrueCar meetup. In it we discuss three use cases for Lambda@Edge, which I call "the Swiss Army Knife of CDNs".
1. The document discusses using a serverless architecture to build a reservation itinerary application for a hospitality group managing 7500 properties worldwide.
2. Key parts of the serverless solution include using AWS Lambda, Kinesis, DynamoDB, API Gateway and other services to process reservation data from multiple sources and expose APIs for mobile and web clients.
3. Challenges in the serverless implementation included unpredictable logging in CloudWatch, performance issues with Java SDK and DOM parsers, and ensuring data consistency when storing logs in DynamoDB. These were addressed through alternative approaches.
How to collect and utilize logs at Kubernetes with Elastic StackRakuten Group, Inc.
- Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that share resources into pods and allows containers in a pod to find each other and communicate using localhost. Pods run on nodes which are physical or virtual machines.
- There are different approaches to logging in Kubernetes including sending logs from pods to a log backend directly or indirectly via nodes. Common backends include Elasticsearch, Splunk, and Kibana. Logs can be searched and alerts generated based on their contents.
- Application performance monitoring (APM) tools integrate with applications like Rails to capture metrics on CPU, memory, transactions and send structured log data to backends for creating graphs and dashboards without
Meteor is the next take on agile development on the full JavaScript stack. Based on established JavaScript tools like Node, JQuery and Underscore, it still brings a fresh and integrated approach. And MongoDB is very much its heart: Minimongo implements a client side MongoDB API for manipulating your data model; Transparent replication of data between client and server; Using WebSockets, MongoDB oplog events replicate immediately to all clients, making it simple to do distributed applications "Google Docs style."
RedisConf17 - Redfin - The Real Estate Brokerage and the In-memory Database Redis Labs
Redfin uses Redis as an in-memory database to power caching, rate limiting, real-time analytics and more. Redis provides fast performance and useful data structures and features like Lua scripting, pub/sub, hyperloglog and hashes. Some current uses include caching API responses, database queries and map data. Redis is also used for rate limiting external APIs. Future potential uses discussed include session storage, text search, chat, distributed locks, modules for new data types and behaviors.
Achieving end-to-end visibility into complex event-sourcing transactions usin...HostedbyConfluent
Event-sourcing systems usage like Kafka is growing rapidly among Node.js applications. Building systems around an event-driven architecture simplifies horizontal scalability in distributed computing models and makes them more resilient to failure. With these advantages, we face new challenges - how to get visibility into these complex processes.
Event-driven architecture is async by nature. Tracking the communication between different components is both extremely difficult and important when debugging or figuring out bottlenecks in the system.
In this talk, I will present ways to achieve end-to-end and granular visibility into complex event-sourcing transactions using distributed tracing. I will use open-source tools like OpenTelemetry, Jaeger, and Zipkin to showcase a complex Node.js system using Kafka.
The automation challenge Kubernetes operators vs Helm chartsAna-Maria Mihalceanu
Helm charts and Kubernetes operators both provide tools for automating application deployments to Kubernetes clusters. Helm charts package Kubernetes configurations and allow deploying multiple configurations as a single application, while operators package human operational knowledge to manage applications over their lifetime. Some benefits of operators include maintaining resources securely with HTTPS, creating backups, and configuring clusters, while Helm charts are better for stateless applications where settings don't need ongoing maintenance. The document discusses converting an existing Helm chart to a Kubernetes operator to deploy and automatically manage an application.
Serverless computing allows developers to build and run applications and services without having to manage infrastructure. It uses third party services to handle servers and allows developers to focus only on their application code. Serverless applications are built using event-driven compute services like AWS Lambda, Azure Functions, and Google Cloud Functions. These services allow code to be triggered by events and auto-scale as needed, without the need to provision or manage servers.
This document discusses moving MongoDB to the cloud. It provides an overview of MongoDB hosting options including on-premises data centers, cloud providers, and hosted databases. It outlines some key reasons to move to the cloud, such as cost-effectiveness, reduced need for staffing, and improved availability. It also covers important considerations for strategy planning including instance types, high availability strategy, security, and migration/rollback strategies. Finally, it discusses two common strategies for migrating - adding a cloud server to an existing replica set with no downtime, or taking backups and restoring to the cloud which requires downtime.
MongoDB World 2018: Building Serverless Apps with MongoDB Atlas on Google Clo...MongoDB
This document discusses building serverless apps with MongoDB Atlas on Google Cloud Platform (GCP). It describes using MongoDB Atlas as the database for a global web app with users in the US, UK, and Australia to gain native scaling capabilities and address latency concerns. It demonstrates creating a "Hello World" Node.js app on GCP App Engine connected to a MongoDB Atlas cluster on GCP for proof of concept.
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...Luke Tillman
Transitioning a legacy monolithic application to microservices is a daunting task by itself and it only gets more complicated as you start to dig through all the libraries and frameworks out there meant to help. In this talk, we'll cover the transition of a real Cassandra-based application to a microservices architecture using Grpc from Google and Falcor from Netflix. (Yes, Falcor is more than just a magical luck dragon from an awesome 80's movie.) We'll talk about why these technologies were a good fit for the project as well as why Cassandra is often a great choice once you go down the path of microservices. And since all the code for the project is open source, you'll have plenty to dig into afterwards.
This document summarizes a data engineering project for analyzing trending topics by geo-location in 3 sentences or less:
The project involves building a pipeline to ingest real-time social media data from Kafka into HDFS for batch processing with Spark and storing results in Cassandra, with the goal of exposing trending hashtag data via a web API. Some initial components including a simple Flask API are complete, while work remains on real-time streaming, a NoSQL database interface, and fully configuring the cluster. The presenter has a computer science degree and experience as a software engineer at Citrix and a university research center.
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...confluent
Running a multi-tenant Kafka platform designed for the enterprise can be challenging. You need to manage and plan for data growth, support an ever-increasing number of use cases, and ensure your developers can be productive with the latest tools in the Apache Kafka ecosystem — all while maintaining the stability and performance of Kafka itself.
At Bloomberg, we run a fully-managed, multi-tenant Kafka platform that is used by developers across the enterprise. The variety of use cases for Kafka leads to bursty workloads, latency-sensitive workloads, and topologies where partitions are fanned out across hundreds or thousands of consumer groups running side-by-side in the same cluster.
In this talk, we will give a brief overview of our platform and share some of our experiences and tools for running multi-tenant stretched clusters, managing data growth with compression, and mitigating the impact of various application patterns on shared clusters.
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...HostedbyConfluent
Studying the "how" of Kafka makes you better at using Kafka, but studying its "whys" makes you better at so much more. In looking at the tradeoffs behind a system like Kafka, we learn to reason more clearly about distributed systems and to make high-stakes technology adoption decisions more effectively. These are skills we all want to improve!
In this talk, we'll examine trade-offs on which our favorite distributed messaging system takes opinionated positions:
- Whether to store data contiguously or using an index
- How many storage tiers are best?
- Where should metadata live?
- And more.
It's always useful to dissect a modern distributed system with the goal of understanding it better, and it's even better to learn deeper architectural principles in the process. Come to this talk for a generous helping of both.
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...Seldon
Speaker: Barbara Fusinska, Machine Learning Strategic Cloud Engineer at Google
Title: Hassle Free, Scalable, Machine Learning with Kubeflow
Abstract: Kubeflow uses the strengths of Kubernetes to build a toolkit for data scientists where they can create, train and publish models in a hassle-free and scalable way. The goal is to run machine learning workflows without needing to think about the infrastructure. In this talk, Barbara will discuss the capabilities of Kubeflow from the data scientist's perspective. The presentation will introduce how you can use the platform to build models and deploy them, adjusting the computation environment.
Bio: Barbara is a Machine Learning Strategic Cloud Engineer at Google with strong software development background. While working with a variety of different companies, she gained experience in building diverse software systems. This experience brought her focus to the Data Science and Big Data field. She believes in the importance of the data and metrics when growing a successful business. Alongside collaborating around data architectures, Barbara still enjoys programming activities. Currently speaking at conferences in-between working in London. She tweets at @BasiaFusinska and you can follow her blog.
Thanks to all TensorFlow London meetup organisers and supporters:
Seldon.io
Altoros
Rewired
Google Developers
Rise London
Building Language Agnostic APIs with gRPC - JavaDay Istanbul 2017Mustafa AKIN
This document discusses gRPC, an open-source RPC framework created by Google. It provides high performance for communication between microservices, supporting millions of calls per second. gRPC uses Protocol Buffers to define service interfaces, generates code for client and server implementations, and communicates over HTTP/2. It allows defining services independently of implementations and supports features like bi-directional streaming. The document outlines how gRPC works, language support, advantages over other solutions, example usage, and companies that use it in production.
Nowadays the Kappa Architecture is surely one of the best architectural patterns for implementing a streaming system. While the choice for the log/journal side is usually straightforward thanks to engines like Apache Kafka, DistributedLog and Pravega, which perfectly fit the write side of this architecture, we didn't find an open source counterpart able to fully satisfy all the requirements we believe are essential for a time series database: high availability, partition tolerance, optimized time series management, security, out-of-the-box Apache Flink integration, ad-hoc front-end streaming features based on the WebSocket protocol, and natural real-time analytics readiness. For this reason we decided to start the development of NSDB (Natural Series DB). During this talk we will introduce the main concepts behind the ideation of NSDB, focusing on our starting goals and its architecture, and giving an overview of its first draft implementation. We will eventually explain how it leverages Akka Cluster and how it partitions data on a time basis.
This document discusses implementing and testing a self-managed logging and visualization solution for a Kubernetes cluster. It considers tools like FluentD, Elasticsearch, Kibana, Helm, and Kops for collecting, processing, and visualizing logs. A turn-key deployment approach using Helm is recommended to install all stack components from a single chart and leverage dependencies. Concerns about authentication, capacity planning, and security hardening are noted for future improvement.
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...HostedbyConfluent
One of the great things about running applications in the cloud is that you only pay for the resources that you use. But that also makes it more important than ever for our applications to be resource-efficient. This becomes even more critical when we use serverless functions.
Micronaut is an application framework that provides dependency injection, developer productivity features, and excellent support for Apache Kafka. By performing dependency injection, AOP, and other productivity-enhancing magic at compile time, Micronaut allows us to build smaller, more efficient microservices and serverless functions.
In this session, we'll explore the ways that Apache Kafka and Micronaut work together to enable us to build fast, efficient, event-driven applications. Then we'll see it in action, using the AWS Lambda Sink Connector for Confluent Cloud.
Serverless Big Data Architecture on Google Cloud Platform at Credit OKKriangkrai Chaonithi
Serverless Big Data Architecture on Google Cloud Platform was presented by Kriangkrai Chaonithi. The presentation covered Credit OK's use of serverless architecture on GCP for their big data analytics platform. Credit OK processes large amounts of customer data from over 400 sites to perform credit scoring. They use Google Cloud Functions to ingest data from sites, as well as Compute Engine and Google Cloud Storage. This serverless architecture allows them to automatically scale infrastructure as needed, reducing costs since they only pay for resources used. While serverless architectures don't require managing servers, there are still resource limits that must be considered to avoid issues like exhausted worker pools during peak loads.
This is the presentation deck I wrote for the LA TrueCar meetup. In it we discuss three use cases for Lambda@Edge, which I call "the Swiss Army Knife of CDNs".
1. The document discusses using a serverless architecture to build a reservation itinerary application for a hospitality group managing 7500 properties worldwide.
2. Key parts of the serverless solution include using AWS Lambda, Kinesis, DynamoDB, API Gateway and other services to process reservation data from multiple sources and expose APIs for mobile and web clients.
3. Challenges in the serverless implementation included unpredictable logging in CloudWatch, performance issues with Java SDK and DOM parsers, and ensuring data consistency when storing logs in DynamoDB. These were addressed through alternative approaches.
Serverless computing provides several key benefits including no need to provision or manage servers, automatic scaling with usage, only paying for resources used, and built-in high availability and fault tolerance. Some common use cases for serverless include backend services and apps, data processing, voice and chat bots, and web applications. When building serverless applications, developers should consider factors like repository structure, cold starts, package size and dependencies, timeouts and errors, performance optimization, and security practices.
This document discusses running R code on Amazon Lambda. It presents a solution using custom R runtimes provided by Bakdata to deploy R packages and functions to Lambda. Shell scripts automate the deployment process, creating the necessary AWS infrastructure including VPC, S3, API Gateway and Lambda function. A simple example R function is shown running on Lambda. While Lambda has limitations like memory and timeout restrictions, it is suitable for running modularized R code on an as-needed basis without managing servers.
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
Following the popularity of “Cloud Revolution: Exploring the New Wave of Serverless Spatial Data,” we’re thrilled to announce this much-anticipated encore webinar.
In this sequel, we’ll dive deeper into the Cloud-Native realm by uncovering practical applications and FME support for these new formats, including COGs, COPC, FlatGeoBuf, GeoParquet, STAC, and ZARR.
Building on the foundation laid by industry leaders Michelle Roby of Radiant Earth and Chris Holmes of Planet in the first webinar, this second part offers an in-depth look at the real-world application and behind-the-scenes dynamics of these cutting-edge formats. We will spotlight specific use-cases and workflows, showcasing their efficiency and relevance in practical scenarios.
Discover the vast possibilities each format holds, highlighted through detailed discussions and demonstrations. Our expert speakers will dissect the key aspects and provide critical takeaways for effective use, ensuring attendees leave with a thorough understanding of how to apply these formats in their own projects.
Elevate your understanding of how FME supports these cutting-edge technologies, enhancing your ability to manage, share, and analyze spatial data. Whether you’re building on knowledge from our initial session or are new to the serverless spatial data landscape, this webinar is your gateway to mastering cloud-native formats in your workflows.
Speaker: Cory Mintz, Lead Engineer, MongoDB
It has been one year since MongoDB Atlas was revealed at MongoDB World 2016. In this session lead Atlas engineer Cory Mintz will walk you through new Atlas features that have been built over the last year, how they work under the hood, and some of the implementation challenges that the team faced. Cory will discuss the building blocks of IaaS providers and how we leverage them in Atlas. VPC peering and Live Import are on the agenda, as well as features so new that you will hear them announced at MongoDB World.
What You Will Learn:
- How to use the latest features in MongoDB Atlas
Build DynamoDB-Compatible Apps with PythonScyllaDB
Join us for a developer workshop where we’ll go hands-on to build DynamoDB-compatible applications that can be deployed wherever you want: on-premises or on any public cloud. You will also learn how to migrate existing DynamoDB workloads. We’ll use Python as well as the ScyllaDB Alternator interface, which is an open source DynamoDB-compatible API.
In the process you’ll discover the features and best practices that enable your applications to squeeze maximum performance out of your current database infrastructure – and avoid cloud vendor lock-in.
If you’re an application developer who wants more flexible and faster DynamoDB, this workshop is for you!
KubeCon London 2016 Ronana Cloud Native SDNRomana Project
This document summarizes a presentation about Romana, an open source cloud native SDN for Kubernetes. It discusses how traditional enterprise networking is mismatched for cloud native applications and proposes using only IP routing without overlays. Romana encodes tenant and network segment information directly in IP addresses to provide traffic segmentation without the complexity of overlays or MAC learning. It integrates with Kubernetes through the CNI plugin and network policies to provide a simple yet powerful cloud native networking solution.
GDG London Workshop: Build GCP infrastructure with Terraform Pradeep Bhadani
This document provides an overview of using Terraform to build infrastructure on Google Cloud Platform (GCP). It introduces Terraform and its key concepts like infrastructure as code, state management, and the Terraform lifecycle of init, plan, apply, and destroy. It also covers setting up Terraform and GCP command line tools and includes an invitation to a hands-on workshop to build sample infrastructure on GCP using Terraform.
1. TCS provides consulting services for AWS including solutions for eCommerce, social analytics, mobility, digital marketing, and applications. It has over 800 consultants trained on AWS.
2. TCS is a Premier AWS Partner with dedicated AWS training and certifications for consultants. It has experience with large enterprise customers on over 70 engagements across industries.
3. TCS offers services for migrating applications to AWS using its D2D methodology, and for building new applications on AWS using services like EC2, RDS, and EMR. It provides tools and frameworks to enable cloud adoption.
GDG DevFest Romania - Architecting for the Google Cloud PlatformMárton Kodok
Learn about FaaS and PaaS architectural patterns that make use of Cloud Functions, Pub/Sub, Dataflow, Kubernetes, and platforms that hide the management of servers from the user and have changed how we develop and deploy future software.
We discuss the difference between an event-driven approach, which means you can trigger a function whenever something interesting happens within the cloud environment, and the simpler HTTP approach, as well as per-invocation quotas and pricing, and the advantages and disadvantages of serverless systems.
Build and Manage Serverless APIs (APIDays Nordic, May 19th 2016)3scale
Presentation gave by Nicolas Grenié (@picsoung) at APIdays Nordic in Tampere, Finland in 2016
He covered the principles of serverless infrastructure, explaining the pros and cons about it and the different platforms.
He also gave an overview of the Serverless (serverless.com) framework.
DCEU 18: Developing with Docker ContainersDocker, Inc.
Laura Frank Tacho - Director of Engineering, CloudBees
Wouldn't it be great for a new developer on your team to have their dev environment totally set up on their first day? What about having the confidence that your dev environment mirrors testing and prod? Containers enable this to become reality, along with other great benefits like keeping dependencies nice and tidy and making packaged code easier to share. Come learn about the ways containers can help you build and ship software easily, and walk away with two actionable steps you can take to start using Docker containers for development.
Running your Spring Apps in the Cloud Javaone 2014cornelia davis
Walk through what it took to bring a Spring app initially built for 2nd-platform (infrastructure-dependent) deployment, and make it deployable to the 3rd platform (Cloud Foundry).
Cloud Native Applications on OpenShiftSerhat Dirik
This document discusses cloud native development and DevOps using OpenShift Container Platform. It begins by defining cloud native as involving both application architecture and the development, deployment and management processes used. It then discusses how containers evolve application delivery and how container platforms are part of the DevOps tool kit. The document outlines the path to DevOps, emphasizing culture, automation and using the right platform. It also notes that DevOps and containers often go hand in hand, with many DevOps adopters using containers. The document then discusses various capabilities of OpenShift and how it supports cloud native development.
In this session, we will look at 10 common use cases for AWS Lambda such as REST APIs, WebSockets, IoT and building event-driven systems. We will also touch on some of the latest platform features such as Provisioned Concurrency, EFS integration and Lambda Destinations and when and where we should use them.
Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...Andrejs Prokopjevs
This presentation covers the idea of logical hostname feature and its possible use case with E-Business Suite, why it is a must-have configuration for DR, how it can improve your test/dev instance cloning and lifecycle processes, especially in a cloud deployment, support overview by 11i/R12.0/R12.1, and why it is a very hot topic right now for R12.2. Additionally, we will describe possible advanced configuration scenarios like container based virtualization. The content is based on real client environment implementation experience.
Similar to AWS Meetup Paris - Short URL project by Pernod Ricard (20)
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way that breaks data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is repaid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. It covers every productivity app included in Office 365, outlines common Office 365 migration scenarios, and explains how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Digital Marketing Trends in 2024 | Guide for Staying Ahead
AWS Meetup Paris - Short URL project by Pernod Ricard
1. PR HQ – IT SOLUTIONS
SHORT URL
#AWS #PernodRicard #Meetup
Life is too short for long URLs.
13 NOVEMBER 2018
2. 2
Charles Rapp
Tech Lead @ Pernod Ricard
charlesr.app/twitter
charlesr.app/linkedin
charlesr.app/github
3. 3
Few words about Pernod Ricard
18,500 employees in 85 affiliates
€9B net sales
Co-leader worldwide in the Wine & Spirits industry
Hundreds of brands
And a tag line…
Créateur de convivialité ("creator of conviviality")
7. 7
What we designed in AWS as a first draft
[Architecture diagram — labels: AWS Cloud, Availability zone, Public, Logs, Default origin]
8. 8
Our architecture in detail
Route 53 is mandatory for managing DNS directly in AWS, in order to be able to set an Alias on the @ (apex) record.
9. 9
Our architecture in detail
As this project needs to be as fast as possible, a CDN is the right tool for that, and CloudFront fills that role.
10. 10
Our architecture in detail
Two S3 buckets are attached to the CloudFront distribution: one for logging all events from CloudFront, and one as the default origin of CloudFront, as a fallback.
11. 11
Our architecture in detail
In the end, plain AWS Lambda is not really used in this case. We use AWS Lambda@Edge for processing requests coming from CloudFront. More details in the next slides.
12. 12
Our architecture in detail
DynamoDB, a NoSQL database, stores our mapping between short and long URLs. More details in the next slides.
13. 13
Choice of Lambda@Edge
Ø It is a feature of CloudFront that is globally replicated
Ø It lets us run code directly in CloudFront
Ø We can develop in Node.js
Ø Extremely scalable
14. 14
Lambda vs Lambda@Edge
A game of differences!

Lambda:
Ø Specifications: from 128 MB up to 3008 MB of memory
Ø Supported languages: Java, Node.js, C#, Python
Ø Pricing: based on the number of requests and the volume of memory used per second; free tier available
Ø Particularity: 15-minute timeout

Lambda@Edge:
Ø Specifications: from 128 MB up to 3008 MB of memory *
Ø Supported languages: Node.js
Ø Pricing: based on the number of requests and the volume of memory used per second
Ø Particularities: only usable along a CloudFront distribution; environment variables can't be set up; timeout depends on the event (between 5 and 30 s); out-of-the-box parallelization
15. 15
How Lambda@Edge works with CloudFront
Lambda@Edge can interact at 4 different moments of a request going through a CloudFront distribution:
Ø Viewer Request
When CloudFront receives a request, before the cache check
Ø Origin Request
When CloudFront forwards a request to the origin
Ø Origin Response
When CloudFront receives a response from the origin, before it caches the object
Ø Viewer Response
When CloudFront sends the response back to the end user
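All four trigger points share the same handler signature; the event envelope tells the function which moment fired. A minimal sketch, assuming the standard Lambda@Edge event shape (the sample URI is made up):

```javascript
// Every Lambda@Edge trigger receives the same event envelope;
// cf.config.eventType identifies which of the four moments fired.
function describeTrigger(event) {
  const cf = event.Records[0].cf;
  // One of: 'viewer-request', 'origin-request', 'origin-response', 'viewer-response'
  return cf.config.eventType;
}

// Example event, trimmed to the fields used here
const sampleEvent = {
  Records: [{ cf: { config: { eventType: 'origin-request' }, request: { uri: '/promo' } } }],
};
console.log(describeTrigger(sampleEvent)); // 'origin-request'
```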
16. 16
How we use it
Origin Request event handler
Step by step:
1. The client sends a request to CloudFront
2. CloudFront asks the origin for this request; this action triggers a Lambda@Edge function
3. The Lambda@Edge function requests an item from DynamoDB based on the client request
4. CloudFront modifies its response to apply the redirection found in DynamoDB
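The steps above can be sketched as an Origin Request handler. This is a hedged illustration, not the production code: the in-memory `redirects` map stands in for the DynamoDB GetItem call of step 3, and the short id and target URL are invented for the example.

```javascript
// Sketch of the Origin Request handler (steps 1-4). The `redirects` map
// stands in for the DynamoDB Global Table lookup.
const redirects = new Map([
  ['promo', 'https://example.com/very/long/landing-page-url'], // hypothetical mapping
]);

async function lookupRedirect(shortId) {
  return redirects.get(shortId); // real version: DynamoDB GetItem
}

// CloudFront invokes this on every Origin Request (step 2)
const handler = async (event) => {
  const request = event.Records[0].cf.request;
  const shortId = request.uri.replace(/^\//, '');

  const target = await lookupRedirect(shortId); // step 3
  if (target) {
    // Step 4: answer with a redirect instead of forwarding to the origin
    return {
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: { location: [{ key: 'Location', value: target }] },
    };
  }
  return request; // unknown id: fall through to the default origin bucket
};
```

Returning the request object unchanged lets CloudFront continue on to the default S3 origin, which matches the fallback bucket described on slide 10.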
17. 17
Storing redirections in the database
DynamoDB with Global Tables lets us store our redirections globally.
Example of an item
Items stored in the DB should be as light as possible for performance (and cost) purposes, and also flexible enough for future needs.
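The item screenshot does not survive in this transcript, so here is a hypothetical item shape consistent with the slides; the attribute names are assumptions, not taken from the deck:

```javascript
// Hypothetical redirect item: small (for performance and cost), yet flexible
// (an optional rules list for the per-request logic shown on the last slide).
const item = {
  id: 'promo',                                      // short path, the partition key
  target: 'https://example.com/long/landing-page',  // the long URL
  rules: [],                                        // optional routing rules, empty by default
};
console.log(Object.keys(item)); // the item carries only three attributes
```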
18. 18
How we deploy the project
The redirect engine is fully managed from a single Git repository, using CI/CD for deployment:
Ø Developers git push to the Bitbucket repository
Ø The pipeline triggers a code review by SonarQube
Ø It manages the Terraform configuration with Terragrunt
Ø It publishes the new version to AWS resources with AWS SAM
19. 19
How to administer
Ø Development of a static website as a back office
Ø Implementation of a REST API for CRUD on DynamoDB using Lambda
Ø Authentication by Azure AD
Ø RDS database for all functional information
→ Again, fully serverless
22. 22
One more thing
As we saw, the architecture design is quite complex just to redirect users from A to B.
Since the Lambda@Edge function receives the request from CloudFront, we can play with all the information we get:
- User-Agent
- Headers added by CloudFront (country, device, referrer, …)
Rules are stored in an item in DynamoDB, then processed by the Lambda@Edge function.
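One way such stored rules could drive the redirect decision is sketched below. The rule and item shapes are hypothetical; `cloudfront-viewer-country` is a real header CloudFront can add when configured to forward it, but its use here is an illustrative assumption:

```javascript
// Sketch: pick a target URL from per-item rules based on CloudFront headers.
// Headers use the CloudFront event format: lowercase key -> [{ key, value }].
function resolveTarget(item, headers) {
  for (const rule of item.rules || []) {
    const values = headers[rule.header];
    if (values && values[0].value === rule.match) {
      return rule.target; // first matching rule wins
    }
  }
  return item.target; // no rule matched: default long URL
}

const item = {
  target: 'https://example.com/global',
  rules: [
    { header: 'cloudfront-viewer-country', match: 'FR', target: 'https://example.com/fr' },
  ],
};
const headers = {
  'cloudfront-viewer-country': [{ key: 'CloudFront-Viewer-Country', value: 'FR' }],
};
console.log(resolveTarget(item, headers)); // 'https://example.com/fr'
```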