This document provides an overview and agenda for a presentation on real-time data processing using AWS Lambda. The presentation covers serverless real-time data processing concepts, processing streaming data with Lambda and Kinesis, a streaming data processing demo, a data processing pipeline with Lambda and MapReduce, and a big data processing solution demo. It also discusses a customer story of Fannie Mae using distributed computing with Lambda for financial modeling. Key topics include serverless processing of real-time streaming data, a map-reduce model for serverless distributed computing, benchmarks of serverless distributed computing, and Fannie Mae's journey migrating their high performance computing workloads to AWS Lambda.
Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda - Amazon Web Services
DynamoDB Streams is a feature of DynamoDB that allows you to access a stream of all changes made to your DynamoDB tables in the last rolling 24 hours. You can use AWS Lambda to process event data generated from a DynamoDB Stream.
In this webinar, we will cover key Amazon DynamoDB Streams and AWS Lambda features, walk through sample use cases for real-time data processing, and discuss best practices on using the services together. We'll then demonstrate setting up Amazon DynamoDB Streams and an associated Lambda function to capture and perform custom computations on database table updates, all without setting up any infrastructure.
Learning Objectives:
· Understand key Amazon DynamoDB Streams and AWS Lambda features
· Learn how to set up a real-time data modification framework using Amazon DynamoDB Streams and AWS Lambda
· Learn sample use cases, best practices and tips on using AWS Lambda with Amazon DynamoDB Streams
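The table-update capture described above can be sketched as a minimal Lambda handler. The event shape follows the DynamoDB Streams record format (each record carries an event name and, depending on the stream view type, old and/or new item images); the per-record computation is a placeholder for your own logic:

```python
import json

def handler(event, context):
    """Process one batch of DynamoDB Streams records.

    Each record names the change (INSERT, MODIFY, REMOVE) and may carry
    the item's NewImage and/or OldImage in DynamoDB's attribute-value
    format, e.g. {"id": {"S": "1"}}.
    """
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record["eventName"]
        counts[name] = counts.get(name, 0) + 1
        if name in ("INSERT", "MODIFY"):
            new_image = record["dynamodb"].get("NewImage", {})
            # Custom computation on the updated item goes here, e.g.
            # recomputing an aggregate or pushing a notification.
            print(json.dumps(new_image))
    return counts
```

Lambda polls the stream and invokes this handler with batches of records automatically once the event source mapping is in place; no servers are involved.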
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. Learn the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service, and see the DynamoDB console first-hand. See a walk-through demo of building a serverless web application using this high-performance key-value and JSON document store.
BRIEF HISTORY OF DATA PROCESSING
RELATIONAL (SQL) VS. NONRELATIONAL (NOSQL)
Why NoSQL?
ACID VS CAP
DynamoDB - what is it?
DynamoDB ARCHITECTURE
Conditional Writes
Provisioned throughput
QUERY VS SCAN
Operations
Benefits
Limitations
DEMO
A closer look at the MySQL and PostgreSQL compatible relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. We’ll explore how Aurora uses the AWS cloud to provide high reliability, high durability, and high throughput.
Speakers:
Steve Abraham - Principal Database Specialist Solutions Architect, AWS
Peter Dachnowicz - Sr. Technical Account Manager, AWS
Do you want to run your code without the cost and effort of provisioning and managing servers? Find out how in this deep dive session on AWS Lambda, which allows you to run code for virtually any type of application or back end service – all with zero administration. During the session, we’ll look at a number of key AWS Lambda features and benefits, including automated application scaling with high availability; pay-as-you-consume billing; and the ability to automatically trigger your code from other AWS services or from any web or mobile app.
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ... - Amazon Web Services
The Amazon Aurora MySQL-compatible Edition is a fully managed relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. It is purpose-built for the cloud using a new architectural model and distributed systems techniques. It provides far higher performance, availability, and durability than previously possible using conventional monolithic database architectures. Amazon Aurora packs a lot of innovations in the engine and storage layers. In this session, we do a deep-dive into some key innovations behind Amazon Aurora MySQL-compatible edition. We explore new improvements to the service and discuss best practices and optimal configurations.
Microservices, Kubernetes and Istio - A Great Fit! - Animesh Singh
Microservices and containers are now influencing application design and deployment patterns. Sixty percent of all new applications will use cloud-enabled continuous-delivery microservice architectures and containers. Service discovery, registration, and routing are fundamental tenets of microservices. Kubernetes provides a platform for running microservices and can automate their deployment, leveraging features such as kube-dns, ConfigMaps, and Ingress for managing them. This configuration works fine for deployments up to a certain size. However, with complex deployments consisting of a large fleet of microservices, additional features are required to augment Kubernetes.
Kafka Streams - From pub/sub to a complete stream processing platform - Paolo Castagna
A presentation on the Kafka Streams APIs (part of Apache Kafka) and the innovative capabilities they bring to the world of open source stream processing engines. Simplicity (while remaining powerful) and a focus on developers are the biggest innovations.
In addition to running databases in Amazon EC2, AWS customers can choose among a variety of managed database services. These services save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service; Amazon RDS, a relational database service in the cloud; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We’ll cover how each service might help support your application, how much each service costs, and how to get started.
by Kashif Imran, Sr. Solutions Architect, AWS
Serverless computing allows you to build and run applications without the need for provisioning or managing servers. With serverless computing, you can build web, mobile, and IoT backends; run stream processing or big data workloads; run chatbots, and more. In this session, you’ll learn how to get started with serverless computing with AWS Lambda, which lets you run code without provisioning or managing servers. We’ll introduce you to the basics of building with Lambda and how you can benefit from features such as continuous scaling, built-in high availability, integrations with AWS and third-party apps, and subsecond metering pricing. We’ll also introduce you to the broader portfolio of AWS services that help you build serverless applications with Lambda, including Amazon API Gateway, Amazon DynamoDB, AWS Step Functions, and more.
These slides are a recap of the micro-ROS Humble Hawksbill release, including all the new updates from micro-ROS and Micro XRCE-DDS 2.1.1.
Watch the live presentation at: https://www.youtube.com/watch?v=PCZr0umED-0&t=1161s
In this session we will explore the world’s first cloud-scale file system and its targeted use cases. Session attendees will learn about EFS’s benefits, how to identify applications that are appropriate for use with EFS, and details about its performance and security models. The target audience is file system administrators, application developers, and application owners that operate or build file-based applications.
Introduction to AWS Outposts - CMP203 - Chicago AWS Su... - Amazon Web Services
Companies are moving existing on-premises applications to the cloud as fast as possible to become more agile and lower costs. However, certain workloads must remain on-premises due to low latency or local data-processing requirements. AWS Outposts brings fully managed, native AWS services, infrastructure, and operating models to virtually any data center, co-location space, or on-premises facility. In this tech talk, we provide an introduction to AWS Outposts and how it works, as well as present customer use cases. We also explore ways to use AWS-cloud native APIs to support workloads that must remain on-premises for a truly consistent hybrid experience.
Introduction to apache kafka, confluent and why they matterPaolo Castagna
This is a short and introductory presentation on Apache Kafka (including the Kafka Connect and Kafka Streams APIs, both part of Apache Kafka) and other open source components that are part of the Confluent platform (such as KSQL).
This was the first Kafka Meetup in South Africa.
AWS Webcast - Cost and Performance Optimization in Amazon RDSAmazon Web Services
Amazon RDS makes it easy to set up, operate, and scale relational databases in the cloud. The service offers a variety of options for optimizing the performance level delivered, as well as optimizing your spending. In this webinar, we will show a variety of techniques for implementing the right performance level for your application.
Learning Objectives:
• Understand the Amazon RDS options that change database performance and cost
• Select the appropriate performance and cost level for your specific application
Who Should Attend:
• Technical Amazon RDS customers and prospective customers
Let's walk step by step through setting up a continuous integration and continuous deployment environment using managed tools on Google Cloud, without getting burned managing servers.
Demos
Flask Demo
https://github.com/alvarowolfx/flask-demo
Multi-Environment Demo
https://github.com/alvarowolfx/gcloud-ci-cd-demo
IoT Use Case
https://medium.com/google-cloud/serverless-continuous-integration-and-ota-update-flow-using-google-cloud-build-and-arduino-d5e1cda504bf
https://github.com/alvarowolfx/gcloud-ota-arduino-update
References
Mobile app deployment
Android APK
https://cloud.google.com/community/tutorials/building-android-apk-with-cloud-build-gradle-docker-image
Flutter and Cloud Build
https://medium.com/@lidemin/flutter-ci-cd-with-cloud-build-android-9cd12ade8306
Other runtime environments
Google App Engine
https://medium.com/google-cloud/continuous-delivery-in-google-cloud-platform-cloud-build-with-app-engine-8355d3a11ff5
Cloud Functions
https://cloud.google.com/cloud-build/docs/deploying-builds/deploy-functions
https://medium.com/swlh/how-to-ci-cd-on-google-cloud-platform-1e631cded335
https://cloud.google.com/devops
https://github.com/GoogleCloudPlatform/github-actions/blob/master/get-secretmanager-secrets/README.md
https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#github-context
https://cloud.google.com/cloud-build/docs/configuring-builds/substitute-variable-values#yaml_2
https://cloud.google.com/cloud-build/docs/building/build-go#building_using_go_modules
Cloud Run Quickstart - https://www.youtube.com/watch?v=3OP-q55hOUI
https://fireship.io/lessons/ci-cd-with-google-cloud-build/
Building Big Data Applications with Serverless Architectures - June 2017 AWS... - Amazon Web Services
Learning Objectives:
- Use cases and best practices for serverless big data applications
- Leverage AWS technologies such as AWS Lambda and Amazon Kinesis
- Learn to perform ETL, event processing, ad-hoc analysis, real-time processing, and MapReduce with serverless architectures
Building data processing applications is challenging and time-consuming, and often requires specialized expertise to deploy and operate. With serverless computing, you can perform real-time stream processing of multiple data types without needing to spin up servers or install software, allowing you to deploy big data applications quickly and more easily. Come learn how you can use AWS Lambda with Amazon Kinesis to analyze streaming data in real-time and then store the results in a managed NoSQL database such as Amazon DynamoDB. You’ll learn tips and tricks for doing in-line processing, data manipulation, and even distributed MapReduce on large data sets.
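A minimal sketch of such in-line stream processing, assuming click-stream records are JSON objects with a hypothetical `page` field. Kinesis delivers record payloads base64-encoded, so each must be decoded first; writing the resulting tallies to a DynamoDB table is left out here:

```python
import base64
import json

def handler(event, context):
    """Aggregate click-stream events from one Kinesis batch.

    Kinesis record payloads arrive base64-encoded; decode each one,
    parse the JSON body, and tally clicks per page. In a full pipeline
    the tallies would then be written to DynamoDB.
    """
    clicks_per_page = {}
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        click = json.loads(payload)
        page = click["page"]  # hypothetical field for illustration
        clicks_per_page[page] = clicks_per_page.get(page, 0) + 1
    return clicks_per_page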
Serverless architecture can eliminate the need to provision and manage servers required to process files or streaming data in real time.
In this session, we will cover the fundamentals of using AWS Lambda to process data from sources such as Amazon DynamoDB Streams, Amazon Kinesis, and Amazon S3. We will walk through sample use cases for real-time data processing and discuss best practices on using these services together. We will then demonstrate how to set up a real-time stream processing solution using just Amazon Kinesis and AWS Lambda, all without the need to run or manage servers.
Serverless architectures can eliminate the need to provision and manage servers required to process files or streaming data in real time. In this session, we will cover the fundamentals of using AWS Lambda to process data from sources such as Amazon DynamoDB Streams, Amazon Kinesis, and Amazon S3. We will walk through sample use cases for real-time data processing and discuss best practices on using these services together. We will then demonstrate run a live demonstration on how to set up a real-time stream processing solution using just Amazon Kinesis and AWS Lambda, all without the need to run or manage servers.
Learning Objectives:
• Learn the fundamentals of using AWS Lambda with various AWS data sources
• Understand best practices of using AWS Lambda with Amazon Kinesis
Who Should Attend:
• Developers
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...Amazon Web Services
If you are interested to know more about AWS Chicago Summit, please use the following to register: http://amzn.to/1RooPPL
Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams. AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you. AWS Lambda can run code in response to data in Amazon Kinesis streams, making it easy to build big data applications that respond quickly to new information. In this webinar, we will cover key Kinesis and Lambda features, walk through sample use cases for stream processing, and discuss best practices on using the services together. We'll then demonstrate setting up an Amazon Kinesis stream and an associated Lambda function to capture and perform custom computations on click-stream data, all without setting up any infrastructure.
Learning Objectives: • Understand key Amazon Kinesis and AWS Lambda features • Learn how to setup streaming data capture and processing framework using AWS Lambda • Learn sample use cases, best practices and tips on using AWS Lambda with Amazon Kinesis
Who Should Attend: • Developers, Devops Engineers, IT Operations Professionals
Nesta sessão faremos uma demonstração de controle e defesa de tráfego aéreo utilizando processamento em tempo real. Trataremos das boas práticas para ingestão, armazenamento, processamento e visualização de dados através de serviços da AWS como Kinesis, DynamoDB, Lambda, Redshift, Quicksight e Amazon Machine Learning.
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
Get answers to technical questions, frequently asked by those starting to work with streaming data. Learn best practices for building a real-time streaming data architecture on AWS with Amazon Kinesis, Spark Streaming, AWS Lambda, and Amazon EMR. First, we will focus on building a scalable, durable streaming data ingestion workflow from data producers like mobile devices, servers, or even web browsers. We will provide guidelines to minimize duplicates and achieve exactly-once processing semantics in your stream-processing applications. Then, we will show some of the proven architectures for processing streaming data using a combination of tools including Amazon Kinesis Stream, AWS Lambda, and Spark Streaming running on Amazon EMR.
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
It is becoming increasingly important to analyze real time streaming data. It allows organizations to remain competitive by uncovering relevant, actionable insights. AWS makes it easy to capture, store, and analyze real-time streaming data.
In this webinar, we will guide you through some of the proven architectures for processing streaming data, using a combination of tools including Amazon Kinesis Streams, AWS Lambda, and Spark Streaming on Amazon Elastic MapReduce (EMR). We will then talk about common use cases and best practices for real-time data analysis on AWS.
Learning Objectives:
Understand how you can analyze real-time data streams using Amazon Kinesis, AWS Lambda, and Spark running on Amazon EMR
Learn use cases and best practices for streaming data applications on AWS
NASA LandSat data can be stored, transformed, navigated, and visualized. In this session we will explore how the LandSat dataset is stored in Amazon Simple Storage Service (S3), one of the recommended cloud storage services in AWS for storage of petabytes of data, and how data stored in S3 can be processed on the server with the Lambda service, visualized for users, and made available to search engines.
Create by: Ben Snively, Senior Solutions Architect
AWS Lambda allows any Node.js app to be run at scale in a massively parallel environment with no up-front costs or planning. This session shows how to use Lambda to build dynamic analytic data flows that can be tuned as they execute, based on initial results, to provide real-time output streamed to web clients. This process enables a cost-effective and responsive user experience for ad hoc big data jobs and lets developers focus on how data is consumed and presented, instead of how it is obtained.
Getting Started with Serverless Architectures | AWS Public Sector Summit 2016Amazon Web Services
By building your application with AWS Lambda, Amazon API Gateway, and Amazon DynamoDB, you can free yourself from the burden of managing servers while gaining agility and simple scaling. After introducing the basics of building microservices with AWS Lambda and Amazon API Gateway, the session highlights how the Democratic National Committee (DNC) Technology Team uses AWS Lambda and Amazon DynamoDB microservices to provide campaigns and state parties customized applications on top of a core data platform. This serverless architecture has helped the DNC Technology Team improve their microservice functionality and development process, ensuring their applications are performant through the extremely erratic usage levels of a campaign cycle.
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)Amazon Web Services
Serverless architecture can eliminate the need to provision and manage servers required to process files or streaming data in real time.
In this session, we will cover the fundamentals of using AWS Lambda to process data in real-time from push sources such as AWS Iot and pull sources such as Amazon DynamoDB Streams or Amazon Kinesis. We will walk through sample use cases and demonstrate how to set up some of these real-time data processing solutions. We'll also discuss best practices and do a deep dive into AWS Lambda real-time stream processing.
You also hear from speakers from Thomson Reuters, who discuss how the company leverages AWS for its Product Insight service. The service provides insights to collect usage analytics for Thomson Reuters products. The speakers walk through its architecture and demonstrate how they leverage Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Route 53, and AWS KMS for near real-time access to data being collected around the globe. They also outline how applying AWS methodologies benefited its business, such as time-to-market and cross-region ingestion, auto-scaling capabilities, low-latency, security features, and extensibility.
Real-time event processing monitors the incoming data stream and initiates action based on detected events like fraud, error or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or to populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services to be covered include: AWS Lambda and the Kinesis Client Library (KCL).
2. Agenda
What’s Serverless Real-Time Data Processing?
Processing Streaming Data with Lambda and Kinesis
Streaming Data Processing Demo
Data Processing Pipeline with Lambda and MapReduce
Building a Big Data Processing Solution Demo
What’s Serverless Real-Time Data Processing?
Serverless Processing of Real-Time Streaming Data
Serverless Data Processing with Distributed Computing
Customer Story:
Fannie Mae - Distributed Computing with Lambda
4. AWS Lambda
Efficient performance at scale: Easy to author, deploy, maintain, secure, and manage. Focus on business logic to build back-end services that perform at scale.
Bring your own code: Stateless, event-driven code with native support for the Node.js, Java, Python, and C# languages.
No infrastructure to manage: Compute without managing infrastructure like Amazon EC2 instances and Auto Scaling groups.
Cost-effective: Automatically matches capacity to the request rate. Compute is billed in 100 ms increments.
Triggered by events: Direct sync and async API calls, AWS service integrations, and third-party triggers.
6. Serverless Real-Time Data Processing Is..
[Architecture diagram: event sources (IoT data, financial data, clickstream data, log data) feed data streams into a Lambda function (Node.js, Python, Java, or C#), which processes the data streams and outputs data to databases and cloud services - with no servers to provision or manage.]
7. Lambda Real-Time Event Sources
HOW IT WORKS
ASYNCHRONOUS PUSH MODEL (e.g. Amazon S3, Amazon SNS): mapping owned by the event source, which invokes Lambda via the event source API; resource-based policy permissions; async invocation with concurrent executions.
SYNCHRONOUS PUSH MODEL (e.g. Amazon Alexa, AWS IoT): mapping owned by the event source, which invokes Lambda via the event source API; resource-based policy permissions; sync invocation.
STREAM PULL MODEL (e.g. Amazon DynamoDB, Amazon Kinesis): mapping owned by Lambda, which polls the streams and invokes the function when new records are found on the stream; Lambda execution role policy permissions; sync invocation.
9. Amazon Kinesis
Real-time: Collect real-time data streams and promptly respond to key business events and operational triggers, with real-time latencies.
Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure.
Amazon Kinesis offering: Managed services for streaming data ingestion and processing.
• Amazon Kinesis Streams: Build applications that process or analyze streaming data.
• Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3 and Amazon Redshift.
• Amazon Kinesis Analytics: Analyze data streams using SQL queries.
10. Processing Real-Time Streams: Lambda + Amazon Kinesis
• Streaming data is sent to Amazon Kinesis and stored in shards.
• Multiple Lambda functions can be triggered to process the same Amazon Kinesis stream for "fan out".
• Lambda can process data and store results, e.g. to DynamoDB or S3.
• Lambda can aggregate data to services like Amazon Elasticsearch Service for analytics.
• Lambda sends event data and function info to Amazon CloudWatch for capturing metrics and monitoring.
[Diagram: Amazon Kinesis -> AWS Lambda -> Amazon DynamoDB and Amazon S3; a second AWS Lambda function -> Amazon Elasticsearch Service; Amazon CloudWatch for metrics and monitoring.]
11. Processing Streams: Set Up Amazon Kinesis Stream
Streams
Made up of shards
Each shard ingests (writes) data up to 1 MB/sec
Each shard emits (reads) data up to 2 MB/sec
Each shard supports up to 5 read transactions/sec
Data
All data is stored and is replayable for 24 hours
The partition key is used to distribute PUTs across shards; choose a key with more distinct groups than shards
Make sure the partition key distribution is even to optimize parallel throughput
Best practice
Determine an initial size (number of shards) to plan for expected maximum demand
Leverage the "Help me decide how many shards I need" option in the console
Use this formula for the number of shards:
max(incoming_write_bandwidth_in_KB / 1000, outgoing_read_bandwidth_in_KB / 2000)
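As a small illustration of the sizing formula above (the workload numbers are hypothetical, not values from the deck):

```python
import math

def number_of_shards(avg_record_size_kb, records_per_second, number_of_consumers):
    """Initial shard count per the Kinesis sizing formula:
    max(incoming_write_bandwidth_KB / 1000, outgoing_read_bandwidth_KB / 2000)."""
    incoming_write_bandwidth_kb = avg_record_size_kb * records_per_second
    outgoing_read_bandwidth_kb = incoming_write_bandwidth_kb * number_of_consumers
    return max(
        math.ceil(incoming_write_bandwidth_kb / 1000),  # write limit: 1 MB/s per shard
        math.ceil(outgoing_read_bandwidth_kb / 2000),   # read limit: 2 MB/s per shard
    )

# Example: 4 KB records at 1,000 records/sec with 3 consumers.
# Writes need 4000/1000 = 4 shards; reads need 12000/2000 = 6 shards.
print(number_of_shards(4, 1000, 3))  # -> 6
```

The read side dominates here because every additional consumer multiplies the outgoing bandwidth, which is why the formula takes the maximum of the two terms.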
12. Processing Streams: Create Lambda Functions
Memory
CPU allocation is proportional to the memory configured
Increasing memory makes your code execute faster (if CPU bound)
Increasing memory allows for larger record sizes to be processed
Timeout
Increasing the timeout allows for longer-running functions, but a longer wait in case of errors
Permission model
The execution role defined for Lambda must have permission to access the stream
Retries
With Amazon Kinesis, Lambda retries until the data expires (24 hours)
Best practice
Write Lambda function code to be stateless
Instantiate AWS clients and database clients outside the scope of the function handler
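A minimal sketch of those best practices for a Kinesis-triggered function. The handler name and payload shape are illustrative assumptions; in a real function, SDK and database clients would sit in the commented section so warm invocations re-use their connections:

```python
import base64
import json

# Clients (AWS SDK, database connections) would be instantiated here,
# outside the handler, so they are re-used across warm invocations.

def handler(event, context):
    """Stateless processing of a Kinesis batch: decode each record's
    base64-encoded payload and return the parsed results."""
    results = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(json.loads(payload))
    return results

# Local smoke test with a Kinesis-shaped event (hypothetical payload).
event = {"Records": [
    {"kinesis": {"data": base64.b64encode(b'{"clicks": 7}').decode()}}
]}
print(handler(event, None))  # -> [{'clicks': 7}]
```

Keeping the handler free of instance state means any concurrent execution can process any batch, which is what lets Lambda scale out per shard.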
13. Processing Streams: Configure Event Source
Amazon Kinesis is mapped as an event source in Lambda
Batch size
Max number of records that Lambda will send to one invocation
Not equivalent to the effective batch size
The effective batch size is evaluated every 250 ms, calculated as:
MIN(records available, batch size, 6 MB)
Increasing the batch size allows fewer Lambda function invocations, with more data processed per function
Best practices
Set the starting position to "Trim Horizon" to read from the start of the stream (all data)
Set it to "Latest" to read only the most recent data
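One way this mapping can be wired up is via the AWS CLI (a sketch; the function name and stream ARN are placeholders), supplying the batch size and starting position discussed above:

```shell
# Map a Kinesis stream to a Lambda function, reading from the start of
# the stream (TRIM_HORIZON) with up to 100 records per invocation.
aws lambda create-event-source-mapping \
  --function-name my-stream-processor \
  --event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/my-stream \
  --starting-position TRIM_HORIZON \
  --batch-size 100
```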
14. Processing Streams: How It Works
Polling
Concurrent polling and processing per shard
Lambda polls every 250 ms if no records are found
Grabs as much data as possible in one GetRecords call (a batch)
Batching
Batches are passed to the Lambda invocation through function parameters
Batch size may impact duration if the Lambda function takes longer to process more records
Sub-batches are held in memory for the invocation payload
Synchronous invocation
Batches are invoked as the synchronous RequestResponse type
Lambda honors Amazon Kinesis at-least-once semantics
Each shard blocks in order of synchronous invocation
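The effective-batch-size rule above (MIN of records available, configured batch size, and the 6 MB payload cap) can be sketched as follows; this is a simplification that assumes record sizes are known up front:

```python
def effective_batch(record_sizes, configured_batch_size, max_payload_bytes=6_000_000):
    """Take records (given as byte sizes) up to MIN(records available,
    configured batch size, 6 MB payload limit)."""
    batch, payload = [], 0
    for size in record_sizes[:configured_batch_size]:
        if payload + size > max_payload_bytes:
            break  # next record would exceed the 6 MB payload cap
        batch.append(size)
        payload += size
    return batch

# 5 records of 2 MB each with batch size 10: only 3 fit under the 6 MB cap.
print(len(effective_batch([2_000_000] * 5, 10)))  # -> 3
```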
15. Processing Streams: Tuning Throughput
If the put/ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind
Maximum theoretical throughput:
# shards * 2 MB / Lambda function duration (s)
Effective theoretical throughput:
# shards * batch size (MB) / Lambda function duration (s)
[Diagram: source -> Amazon Kinesis shards -> Lambda functions (polls a batch, waits for response) -> destinations 1 and 2. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.]
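As a worked illustration of those two formulas (the shard count, batch size, and duration are hypothetical numbers):

```python
def max_throughput_mb_s(shards, duration_s):
    # Maximum theoretical throughput: each shard can emit up to 2 MB
    # per function duration.
    return shards * 2 / duration_s

def effective_throughput_mb_s(shards, batch_size_mb, duration_s):
    # Effective throughput is bounded by the configured batch size instead.
    return shards * batch_size_mb / duration_s

# 10 shards, 1 MB batches, 0.5 s function duration:
print(max_throughput_mb_s(10, 0.5))           # -> 40.0 (MB/s ceiling)
print(effective_throughput_mb_s(10, 1, 0.5))  # -> 20.0 (MB/s achieved)
```

Comparing the two numbers shows the tuning levers: a larger batch size or a shorter function duration closes the gap to the ceiling, while more shards raise the ceiling itself.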
16. Processing Streams: Tuning Throughput with Retries
Retries
Lambda will retry on execution failures until the record has expired
Throttles and errors impact duration, and duration directly impacts throughput
Best practice
Retry with exponential backoff of up to 60 s
Effective theoretical throughput with retries:
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry )
[Diagram: source -> Amazon Kinesis shards -> Lambda functions (polls a batch, receives error, receives error, receives success) -> destinations 1 and 2. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.]
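The recommended exponential backoff capped at 60 seconds might be scheduled like this inside custom retry logic (a generic sketch, not tied to any specific SDK):

```python
def backoff_delays(max_delay_s=60, attempts=8, base_s=1):
    """Exponential backoff delays: 1, 2, 4, ... seconds, capped at max_delay_s."""
    return [min(base_s * 2 ** i, max_delay_s) for i in range(attempts)]

print(backoff_delays())  # -> [1, 2, 4, 8, 16, 32, 60, 60]
```

Capping the delay keeps a persistently failing batch from stalling its shard for minutes at a time, since each shard processes batches strictly in order.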
17. Processing Streams: Common Observations
The effective batch size may be less than configured during low throughput
The effective batch size will increase during higher throughput
Increased Lambda duration -> decreased number of invokes and GetRecords calls
Too many consumers of your stream may compete with Amazon Kinesis read limits and induce ReadProvisionedThroughputExceeded errors and metrics
18. Processing Streams: Monitoring with CloudWatch
Monitoring Amazon Kinesis Streams
• GetRecords: effective throughput
• PutRecord: bytes, latency, records, etc.
• GetRecords.IteratorAgeMilliseconds: how old your last processed records were
Monitoring Lambda functions
• Invocation count: times the function was invoked
• Duration: execution/processing time
• Error count: number of errors
• Throttle count: number of times the function was throttled
• Iterator age: time elapsed between the batch being received and the final record being written to the stream
Debugging
• Review all metrics
• Make custom logs
• View RAM consumed
• Search for log events
AWS X-Ray: coming soon!
20. Serverless Distributed Computing: Map-Reduce Model
Why serverless data processing with distributed computing?
Removes difficult infrastructure management: cluster administration and complex configuration tools
Enables simple, elastic, user-friendly distributed data processing
Eliminates the complexity of state management
Brings distributed computing power to the masses
21. Serverless Distributed Computing: Map-Reduce Model
Why serverless data processing with distributed computing? (continued)
Eliminates utilization concerns
Makes code simpler by removing the complexities of multi-threaded processing to optimize server usage
Cost-effective option for running ad hoc MapReduce jobs
Easier, automatic horizontal scaling
Provides the ability to process scientific and analytics applications
23. Serverless Distributed Computing: PyWren
PyWren is a prototype developed at the University of California, Berkeley
Uses Python with stateless AWS Lambda functions for large-scale data analytics
Achieved roughly 30-40 MB/s per-core write and read performance to the S3 object store
Scaled to 60-80 GB/s across 2,800 simultaneous functions
24. Serverless Distributed Computing: Benchmark
Using the Amazon MapReduce reference architecture framework with Lambda
Dataset queries:
Scan query on Rankings (90M rows, 6.38 GB of data)
Select query on page rankings
Aggregation query on UserVisits (775M rows, ~127 GB of data)
Dataset sizes: Rankings - 90 million rows, 6.38 GB; UserVisits - 775 million rows, 126.8 GB; Documents - 136.9 GB
25. Serverless Distributed Computing: Benchmark
Using the Amazon MapReduce reference architecture framework with Lambda
A subset of the AMPLab benchmark was run to compare with other data processing frameworks
Performance benchmarks (execution time for each workload, in seconds):

TECHNOLOGY              SCAN 1A   SCAN 1B   AGGREGATE 2A
Amazon Redshift (HDD)      2.49      2.61        25.46
Serverless MapReduce      39        47          200
Impala - Disk - 1.2.3     12.015    12.015      113.72
Impala - Mem - 1.2.3       2.17      3.01        84.35
Shark - Disk - 0.8.1       6.6       7           151.4
Shark - Mem - 0.8.1        1.7       1.8         83.7
Hive - 0.12 YARN          50.49     59.93       730.62
Tez - 0.2.0               28.22     36.35       377.48
38. Data Processing with AWS: Next Steps
Learn more about AWS Serverless at https://aws.amazon.com/serverless
Explore the AWS Lambda reference architectures on GitHub:
Real-time streaming: https://github.com/awslabs/lambda-refarch-streamprocessing
Distributed computing reference architecture (serverless MapReduce): https://github.com/awslabs/lambda-refarch-mapreduce
39. Data Processing with AWS: Next Steps
Create an Amazon Kinesis stream: visit the Amazon Kinesis console and configure a stream to receive data, e.g. data from social media feeds.
Create and test a Lambda function to process streams from Amazon Kinesis by visiting the Lambda console. The first 1M requests each month are on us!
Read the Developer Guide and try the Lambda and Amazon Kinesis tutorial: http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
Send questions, comments, and feedback to the AWS Lambda forums.
TEW Notes: A shard is a uniquely identified group of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity.
Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The data capacity of your stream is a function of the number of shards that you specify for the stream
You should plan for expected maximum demand when you provision shards. Therefore, before you create a stream, you need to determine an initial size for it. A shard is a unit of throughput capacity. On the Create Stream console page, select the checkbox labelled "Help me decide how many shards I need".
To determine the initial size of a stream, you'll need the following input values:
The average size of the data record written to the stream in kilobytes (KB), rounded up to the nearest 1 KB, the data size (average_data_size_in_KB).
The number of data records written to and read from the stream per second (records_per_second).
The number of Amazon Kinesis Streams applications that consume data concurrently and independently from the stream, that is, the consumers (number_of_consumers).
The incoming write bandwidth in KB (incoming_write_bandwidth_in_KB), which is equal to the average_data_size_in_KB multiplied by the records_per_second.
The outgoing read bandwidth in KB (outgoing_read_bandwidth_in_KB), which is equal to the incoming_write_bandwidth_in_KB multiplied by the number_of_consumers.
number_of_shards = max(incoming_write_bandwidth_in_KB/1000, outgoing_read_bandwidth_in_KB/2000)
You can dynamically resize your stream or add and remove shards after you create the stream and while there is an Amazon Kinesis Streams application consuming data from the stream.
You can increase the retention period up to 168 hours (7 days) using the IncreaseStreamRetentionPeriod operation, and decrease the retention period down to a minimum of 24 hours using the DecreaseStreamRetentionPeriod operation.
AT_SEQUENCE_NUMBER - Start reading from the position denoted by a specific sequence number, provided in the value StartingSequenceNumber.
AFTER_SEQUENCE_NUMBER - Start reading right after the position denoted by a specific sequence number, provided in the value StartingSequenceNumber.
AT_TIMESTAMP - Start reading from the position denoted by a specific timestamp, provided in the value Timestamp.
TRIM_HORIZON - Start reading at the last untrimmed record in the shard in the system, which is the oldest data record in the shard.
LATEST - Start reading just after the most recent record in the shard, so that you always read the most recent data in the shard.
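As a sketch of how these iterator types are supplied in practice (the stream name and shard ID are placeholders), the AWS CLI accepts them via --shard-iterator-type:

```shell
# Read the oldest available data in the shard (TRIM_HORIZON).
aws kinesis get-shard-iterator \
  --stream-name my-stream \
  --shard-id shardId-000000000000 \
  --shard-iterator-type TRIM_HORIZON

# Read from a specific point in time instead (AT_TIMESTAMP).
aws kinesis get-shard-iterator \
  --stream-name my-stream \
  --shard-id shardId-000000000000 \
  --shard-iterator-type AT_TIMESTAMP \
  --timestamp 2017-01-01T00:00:00Z
```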
TEW Notes
•Write your Lambda function code in a stateless style, and ensure there is no affinity between your code and the underlying compute infrastructure.
•Instantiate AWS clients and database clients outside the scope of the handler to take advantage of connection re-use.
•Make sure you have set +rx permissions on your files in the uploaded ZIP to ensure Lambda can execute code on your behalf.
•Lower costs and improve performance by minimizing the use of startup code not directly related to processing the current event.
•Use the built-in CloudWatch monitoring of your Lambda functions to view and optimize request latencies.
•Delete old Lambda functions that you are no longer using.
TEW: Records are always read in order (FIFO) within a shard; Trim Horizon reads from the start of the stream, Latest reads only the current (most recent) data.
TEW: IteratorAge is strictly for stream-based invocations and works by determining the age of the last record for each batch of processed records. Age is quantified by measuring the time elapsed between when Lambda received the batch in question and when the final record within that batch was written to the stream.
TEW: Note: this architecture flow was painstakingly reproduced from the reference architecture diagram.