Here are the key steps to collect data using Kinesis:
1. Create a Kinesis Data Firehose delivery stream to load data from the Kinesis Agent into an S3 bucket in near real time.
2. Use a LogGenerator Python script to generate OrderHistory CSV files capturing online order data.
3. Install the Kinesis Agent on an EC2 instance and configure it to read the CSV files, transform them to JSON, and deliver them to the Firehose delivery stream for loading into S3.
4. Create a Kinesis Data Stream to also capture the order data, with replay capabilities.
5. Configure the Kinesis Agent to read the CSV files and deliver the JSON records to the Kinesis Data Stream as well.
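The agent-side setup in steps 3 and 5 can be sketched in Python. This is a minimal illustration only: the column names and log path below are assumptions (the real ones come from the LogGenerator script), and in practice the agent itself reads /etc/aws-kinesis/agent.json and performs the CSVTOJSON conversion.

```python
import json

# Hypothetical column names for the OrderHistory CSV files; the real
# LogGenerator script defines the actual names and order.
ORDER_FIELDS = ["order_id", "customer_id", "item", "quantity", "order_date"]

def csv_record_to_json(line, fields=ORDER_FIELDS):
    """Mimic the agent's CSVTOJSON option for a single CSV record."""
    return json.dumps(dict(zip(fields, line.strip().split(","))))

def agent_config(firehose_stream, data_stream,
                 file_pattern="/var/log/orders/*.csv"):
    """Build an /etc/aws-kinesis/agent.json document with two flows:
    one delivering to the Firehose delivery stream (step 3) and one
    delivering to the Kinesis Data Stream (step 5)."""
    flow = {
        "filePattern": file_pattern,
        "dataProcessingOptions": [
            {"optionName": "CSVTOJSON", "customFieldNames": ORDER_FIELDS}
        ],
    }
    return {
        "flows": [
            dict(flow, deliveryStream=firehose_stream),
            dict(flow, kinesisStream=data_stream),
        ]
    }
```

Writing `agent_config(...)` out as JSON and restarting the agent would wire both flows at once; one set of monitored files can feed Firehose and a Data Stream in parallel.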
Amazon aws big data demystified | Introduction to streaming and messaging flu...Omid Vahdaty
This document provides an overview of streaming data and messaging concepts including batch processing, streaming, streaming vs messaging, challenges with streaming data, and AWS services for streaming and messaging like Kinesis, Kinesis Firehose, SQS, and Kafka. It discusses use cases and comparisons for these different services. For example, Kinesis is suitable for complex analytics on streaming data while SQS focuses on per-event messaging. Firehose automatically loads streaming data into AWS services like S3 and Redshift without custom coding.
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
This document provides an overview of real-time streaming data on AWS and best practices for using Amazon Kinesis, Spark Streaming, AWS Lambda, and Amazon EMR. It discusses ingesting streaming data using Kinesis Streams and Firehose, processing data with Kinesis Client Library, Spark Streaming, and AWS Lambda, and integrating with data stores like S3, Redshift and Elasticsearch. Example use cases are also presented from companies like Sonos, publishers and gaming companies.
Introduction to streaming and messaging flume,kafka,SQS,kinesis Omid Vahdaty
Does big data leave you a bit confused? Messaging? Batch processing? Data streaming? In-flight analytics? Cloud? Open source? Flume? Kafka? Flafka (both)? SQS? Kinesis? Firehose?
Amazon Kinesis is a managed service for real-time processing of streaming big data at any scale. It allows users to create streams to ingest and process large amounts of data in real-time. Kinesis provides high durability, performance, and elasticity through features like automatic shard management and the ability to seamlessly scale streams. It also offers integration with other AWS services like S3, Redshift, and DynamoDB for storage and analytics. The document discusses various aspects of Kinesis including how to ingest and consume data, best practices, and advantages over self-managed solutions.
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. In this webinar, developers will learn how to build and deploy a streaming data processing application with Amazon Kinesis. We will cover the following: - A brief overview of Amazon Kinesis and drill down on key technical concepts. - Amazon Kinesis Client Library capabilities that enable customers to build fault tolerant, continuous processing applications that scale elastically. - The role of the supporting connector library for moving data into stores like S3 and Redshift. - Best practices for streaming data ingestion and processing with Amazon Kinesis.
Amazon Kinesis is the AWS service for real-time streaming big data ingestion and processing. This talk gives a detailed exploration of Kinesis stream processing. We'll discuss in detail techniques for building, and scaling Kinesis processing applications, including data filtration and transformation. Finally we'll address tips and techniques to emitting data into S3, DynamoDB, and Redshift.
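As a concrete sketch of the ingest/consume cycle those talks describe, the helpers below shape a record for `put_record` and replay a shard from its oldest retained record. The stream name, shard ID, and the choice of customer_id as partition key are assumptions for illustration.

```python
import json

def build_put_kwargs(stream_name, payload, partition_key):
    """Shape one record for Kinesis put_record; the partition key
    determines which shard receives the record."""
    return {
        "StreamName": stream_name,
        "Data": (json.dumps(payload) + "\n").encode("utf-8"),
        "PartitionKey": partition_key,
    }

def read_from_start(client, stream_name, shard_id, limit=100):
    """Replay a shard from TRIM_HORIZON, i.e. the oldest retained record."""
    it = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    return client.get_records(ShardIterator=it, Limit=limit)["Records"]

# Usage against a live stream (not executed here):
#   import boto3
#   kinesis = boto3.client("kinesis")
#   order = {"customer_id": 42, "item": "book"}
#   # Keying by customer keeps one customer's events ordered on one shard.
#   kinesis.put_record(**build_put_kwargs("orders", order,
#                                         str(order["customer_id"])))
#   records = read_from_start(kinesis, "orders", "shardId-000000000000")
```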
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
This document summarizes a presentation on real-time streaming data on AWS. It discusses Amazon Kinesis, Spark Streaming, AWS Lambda, and Amazon EMR. The presentation covers an overview of streaming vs batch processing, common streaming data use cases and design patterns, a deep dive on Amazon Kinesis, examples of ingesting and processing streaming data, and a case study of how Sizmek uses these services for their real-time analytics needs.
1. The document discusses using AWS Lambda and Amazon Kinesis for real-time data processing in a serverless architecture. It describes how Lambda functions can be triggered by data ingestion in Kinesis streams to process streaming data without needing to manage servers.
2. Key benefits highlighted include automatic scaling of compute capacity, paying only for resources used, and focusing on business logic rather than infrastructure management. Best practices discussed include monitoring for errors/throttling and distributing load evenly across shards.
3. The demo portion shows how to set up a Kinesis stream, Lambda function, and configure the integration between the two for processing streaming data in real-time at scale in a serverless manner.
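A minimal version of the Lambda consumer in that demo might look like the sketch below. The counting logic stands in for real business logic, and the error handling is deliberately simplified: an unhandled exception would make Lambda retry the entire batch from the stream, so per-record failures are counted instead of raised.

```python
import base64
import json

def handler(event, context):
    """Lambda entry point for a Kinesis stream trigger.

    Kinesis delivers each record's payload base64-encoded under
    event["Records"][i]["kinesis"]["data"].
    """
    processed, failed = 0, 0
    for record in event["Records"]:
        try:
            order = json.loads(base64.b64decode(record["kinesis"]["data"]))
            # Real business logic would go here, e.g. writing to DynamoDB.
            processed += 1
        except (ValueError, KeyError):
            failed += 1
    return {"processed": processed, "failed": failed}
```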
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elasticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
Visiting AWS Seminar (Guro, Gasan, Pangyo) - How to Leverage Big Data on AWS (Ilho Kim, Solutions Architect) Amazon Web Services Korea
The document discusses AWS big data services and tools. It provides an overview of AWS building blocks for big data like Amazon S3, Kinesis, DynamoDB, Redshift and EMR. It covers topics like log data collection and storage using Kinesis, data analytics using services like Redshift and EMR, and collaboration and sharing of data. Generation, collection and storage of data, analytics and computation, and collaboration and sharing are highlighted as key aspects of a big data platform on AWS.
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Amazon Web Services
Log analytics is a common big data use case that allows you to analyze log data from websites, mobile devices, servers, sensors, and more for a wide variety of applications including digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. In this tech talk, we will walk you step-by-step through the process of building an end-to-end analytics solution that ingests, transforms, and loads streaming data using Amazon Kinesis Firehose, Amazon Kinesis Analytics and AWS Lambda. The processed data will be saved to an Amazon Elasticsearch Service cluster, and we will use Kibana to visualize the data in near real-time.
Learning Objectives:
1. Reference architecture for building a complete log analytics solution
2. Overview of the services used and how they fit together
3. Best practices for log analytics implementation
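The transform step in that pipeline is a Firehose data-transformation Lambda. A sketch of one is below; Firehose hands records base64-encoded and expects each one back with the same recordId and a result of Ok, Dropped, or ProcessingFailed. The added `source` field is a hypothetical enrichment for illustration.

```python
import base64
import json

def handler(event, context):
    """Firehose data-transformation Lambda: enrich records in flight."""
    out = []
    for rec in event["records"]:
        try:
            doc = json.loads(base64.b64decode(rec["data"]))
            doc["source"] = "firehose"  # hypothetical enrichment field
            out.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                # Trailing newline keeps one JSON document per line in S3/ES.
                "data": base64.b64encode(
                    (json.dumps(doc) + "\n").encode("utf-8")).decode("utf-8"),
            })
        except ValueError:
            # Hand the record back unmodified; Firehose routes it to the
            # configured error output.
            out.append({"recordId": rec["recordId"],
                        "result": "ProcessingFailed",
                        "data": rec["data"]})
    return {"records": out}
```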
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Amazon Web Services
1) The document discusses serverless real-time data processing using AWS Lambda and Amazon Kinesis.
2) It provides an example of how streaming data can be captured in an Amazon Kinesis stream and processed by AWS Lambda functions to output the results to databases or cloud services.
3) The document also discusses how Fannie Mae used a distributed computing approach with AWS Lambda to perform mortgage simulations, achieving a 3x performance increase over their existing process.
NASA LandSat data can be stored, transformed, navigated, and visualized. In this session we will explore how the LandSat dataset is stored in Amazon Simple Storage Service (S3), one of the recommended cloud storage services in AWS for storage of petabytes of data, and how data stored in S3 can be processed on the server with the Lambda service, visualized for users, and made available to search engines.
Created by: Ben Snively, Senior Solutions Architect
In this session we will give a demonstration of air traffic control and defense using real-time processing. We will cover best practices for data ingestion, storage, processing, and visualization using AWS services such as Kinesis, DynamoDB, Lambda, Redshift, QuickSight, and Amazon Machine Learning.
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
It is becoming increasingly important to analyze real time streaming data. It allows organizations to remain competitive by uncovering relevant, actionable insights. AWS makes it easy to capture, store, and analyze real-time streaming data.
In this webinar, we will guide you through some of the proven architectures for processing streaming data, using a combination of tools including Amazon Kinesis Streams, AWS Lambda, and Spark Streaming on Amazon Elastic MapReduce (EMR). We will then talk about common use cases and best practices for real-time data analysis on AWS.
Learning Objectives:
Understand how you can analyze real-time data streams using Amazon Kinesis, AWS Lambda, and Spark running on Amazon EMR
Learn use cases and best practices for streaming data applications on AWS
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
This presentation deck will cover specific services such as Amazon S3, Kinesis, Redshift, Elastic MapReduce, and DynamoDB, including their features and performance characteristics. It will also cover architectural designs for the optimal use of these services based on dimensions of your data source (structured or unstructured data, volume, item size and transfer rates) and application considerations - for latency, cost and durability. It will also share customer success stories and resources to help you get started.
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Emprovise
Highlights of AWS ReInvent 2023 in Las Vegas. Contains new announcements, deep dive into existing services and best practices, recommended design patterns.
Raleigh DevDay 2017: Real time data processing using AWS LambdaAmazon Web Services
This document provides an overview of serverless real-time data processing using AWS Lambda and Amazon Kinesis. It discusses how Lambda functions can be used to process streaming data from Kinesis in real-time. An example pipeline is shown where data is ingested into Kinesis and Lambda functions are triggered to process the data and output results. Distributed computing with Lambda is also briefly discussed.
AWS Summit Seoul 2015 - Big Data and Real-Time Streaming Analytics Using the AWS Cloud Amazon Web Services Korea
This document discusses real-time data processing using Amazon Web Services. It describes how to use Amazon Kinesis for real-time data ingestion and processing and Amazon Elastic MapReduce (EMR) for batch processing. It provides examples of using EMR for batch processing large amounts of log data and for interactive querying of data stored in Amazon S3. It also discusses using Kinesis as a data broker to distribute streaming data to multiple applications and using Kinesis with EMR, Spark, and Storm for real-time analytics.
This document summarizes a presentation by Randall Hunt on AWS re:Invent re:CAP. It provides information about Randall Hunt and his background. It then outlines the rough agenda for the presentation, which will cover topics including compute instances, containers, serverless, databases, analytics, storage, application integrations, DevOps and security, developer tools, AI/ML, IoT, mobile, media services, AR/VR, and Alexa for Business.
This session is recommended for anyone interested in understanding how to use AWS big data services to develop real-time analytics applications. In this session, you will get an overview of a number of Amazon's big data and analytics services that enable you to build highly scalable cloud applications that immediately and continuously analyze large sets of distributed data. We'll explain how services like Amazon Kinesis, EMR and Redshift can be used for data ingestion, processing and storage to enable real-time insights and analysis into customer, operational and machine-generated data and log files. We'll explore system requirements, design considerations, and walk through a specific customer use case to illustrate the power of real-time insights on their business.
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul MaddoxAWS Riyadh User Group
This document provides an overview and agenda for an AWS storage, compute, containers, serverless, and management tools presentation. It includes summaries of several upcoming AWS services and features related to EBS, S3, EC2, EKS, Fargate, Lambda, and AWS Compute Optimizer. The speaker is introduced as Paul Maddox, Principal Architect at AWS, with a background in development, SRE, and systems architecture.
- Amazon Kinesis is a real-time data streaming platform that allows for processing of streaming data in the AWS cloud. It includes Kinesis Streams, Kinesis Firehose, and Kinesis Analytics.
- Kinesis Streams allows users to build custom applications to process or analyze streaming data. It is a high-throughput, low-latency service for real-time data processing over large, distributed data streams.
- Key concepts of Kinesis Streams include shards for partitioning streaming data, producers for ingesting data, data records as the unit of stored data, and consumers for reading and processing streaming data.
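The shard concept above can be illustrated with a small sketch: Kinesis takes the MD5 hash of a record's partition key, reads it as a 128-bit integer, and routes the record to the shard whose hash-key range contains it. The equal-sized ranges assumed here match what stream creation with N shards produces by default; resharding can change the ranges.

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Approximate Kinesis routing: MD5(partition_key) as a 128-bit
    integer, mapped into one of num_shards equal hash-key ranges."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    # h < 2**128, so this lands in [0, num_shards); min() is a safety net.
    return min(h * num_shards // 2 ** 128, num_shards - 1)
```

The practical consequence: the same key always lands on the same shard (preserving per-key ordering), so evenly distributed keys are what spread throughput across shards.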
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. In this presentation, you'll get an overview of Amazon Redshift, including how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Learn how, with just a few clicks in the AWS Management Console, you can set up a fully functional data warehouse, ready to accept data, without learning any new languages and while easily plugging in the existing business intelligence tools and applications you use today. This webinar is ideal for anyone looking to gain deeper insight into their data without the usual challenges of time, cost, and effort. In this webinar, you will learn how to: understand what Amazon Redshift is and how it works; create a data warehouse interactively through the AWS Management Console; and load data into your new Amazon Redshift data warehouse from S3. Who should attend: IT professionals, developers, and line-of-business managers.
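Loading data from S3 into Redshift, as mentioned above, is typically done with the COPY command, which the cluster runs in parallel across its slices. The helper below only assembles the statement; the bucket path and IAM role ARN are placeholders, and the SQL would be executed through any Postgres-compatible driver connected to the cluster.

```python
def copy_from_s3(table, s3_path, iam_role, fmt="CSV"):
    """Build a Redshift COPY statement that bulk-loads S3 data.

    All arguments are caller-supplied; the values used below are
    illustrative placeholders, not real resources.
    """
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt};"
    )

# Hypothetical usage; run the resulting SQL via e.g. psycopg2:
sql = copy_from_s3(
    "orders",
    "s3://my-bucket/orders/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
```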
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Amazon Web Services
Originally, Hadoop was used as a batch analytics tool; however, this is rapidly changing, as applications move towards real-time processing and streaming. Amazon Elastic MapReduce has made running Hadoop in the cloud easier and more accessible than ever. Each day, tens of thousands of Hadoop clusters are run on the Amazon Elastic MapReduce infrastructure by users of every size, from university students to Fortune 50 companies. We recently launched Amazon Kinesis, a managed service for real-time processing of high volume, streaming data. Amazon Kinesis enables a new class of big data applications which can continuously analyze data at any volume and throughput, in real-time. Adi will discuss each service, dive into how customers are adopting the services for different use cases, and share emerging best practices. Learn how you can architect Amazon Kinesis and Amazon Elastic MapReduce together to create a highly scalable real-time analytics solution which can ingest and process terabytes of data per hour from hundreds of thousands of different concurrent sources. Forever change how you process web site click-streams, marketing and financial transactions, social media feeds, logs and metering data, and location-tracking events.
Lagom and Panaya presented on accelerating and securing SAP S4/HANA migrations. Panaya's platform reduces migration time by 50% and risk by automating impact analysis, prioritizing testing and fixes, and providing real-time visibility into changes. It helped clients like FGC complete record-time migrations with zero failures. Lagom recommended first assessing impact with Panaya's S/4HANA Impact Assessment Pack before beginning the staged journey to S/4HANA via options like S/4 JumpStart or S/4 Lightning. Key benefits included a scalable, repeatable process and actionable plan that reduces errors.
ASUG83511 - Accelerate Digital Transformation at General Mills.pdfSreeGe1
General Mills and SAP presented on using SAP Data Hub to accelerate General Mills' digital transformation. The presentation provided General Mills' data integration journey and challenges with their existing enterprise data warehouse. It outlined the key capabilities desired in a data integration solution, including real-time replication, federated analytics, data governance, and data science capabilities. Finally, it reviewed SAP Data Hub's capabilities and General Mills' proof of concept, and discussed SAP's roadmap to address additional requirements.
Similar to 1.0 - AWS-DAS-Collection-Kinesis.pdf (20)
Lagom and Panaya presented on accelerating and securing SAP S4/HANA migrations. Panaya's platform reduces migration time by 50% and risk by automating impact analysis, prioritizing testing and fixes, and providing real-time visibility into changes. It helped clients like FGC complete record-time migrations with zero failures. Lagom recommended first assessing impact with Panaya's S/4HANA Impact Assessment Pack before beginning the staged journey to S/4HANA via options like S/4 JumpStart or S/4 Lightning. Key benefits included a scalable, repeatable process and actionable plan that reduces errors.
ASUG83511 - Accelerate Digital Transformation at General Mills.pdfSreeGe1
Â
General Mills and SAP presented on using SAP Data Hub to accelerate General Mills' digital transformation. The presentation provided General Mills' data integration journey and challenges with their existing enterprise data warehouse. It outlined the key capabilities desired in a data integration solution, including real-time replication, federated analytics, data governance, and data science capabilities. Finally, it reviewed SAP Data Hub's capabilities and General Mills' proof of concept, and discussed SAP's roadmap to address additional requirements.
This document provides an overview of new features and enhancements in SAP BW/4HANA 2.0 SP04, including improved modeling flexibility, data integration, data tiering optimization, data protection and privacy capabilities, and integration with SAP Analytics Cloud. Key updates include new join types and operators for CompositeProviders, automated generation of process chains from data flows, support for updating and deleting cold store data, and integration with SAP Data Protection Workbench for S/4HANA Cloud. The document also notes upcoming end of maintenance dates for various BW products and releases.
Data Migration Tools for the MOVE to SAP S_4HANA - Comparison_ MC _ RDM _ LSM...SreeGe1
Â
The document compares different SAP tools for migrating data to SAP S/4HANA, including the Migration Cockpit, Rapid Data Migration, and Legacy System Migration Workbench. It provides an overview of the basic features of each tool as well as a more detailed comparison of their content, conditions, and value propositions. The Migration Cockpit is highlighted as providing predefined migration objects to minimize custom coding and risk, while reducing time and effort compared to the other tools.
You can share encrypted EBS snapshots with other AWS accounts by ensuring the snapshot is encrypted with a custom encryption key rather than the default key, marking the snapshot as private, and giving the other accounts permissions to the custom encryption key. To use a shared encrypted snapshot to create volumes in your account, you must first create a copy of the snapshot and can choose to re-encrypt it with your own key during the copy process. If access to the original encryption key is later revoked, access to volumes created from copies of the originally shared snapshot will not be impacted.
This document contains 10 questions about AWS S3.
Question 1 asks about ways to encrypt data at rest and in transit in S3, with answers being to use server-side encryption with KMS keys or customer provided keys, and to encrypt at the source before transferring.
Question 2 asks how to speed up file uploads to S3, with the answer being to use multipart upload which divides large files into multiple parts.
Question 3 asks how an application can verify a file was successfully uploaded to S3, with the answer being the application can check the HTTP response code and MD5 checksum if server-side encryption was used.
Test Automation Tool for SAP S4HANA Cloud.pptxSreeGe1
Â
The document discusses SAP's test automation tool for SAP S/4HANA Cloud. It provides an overview of the tool's usage areas and benefits, including automating business process testing and test data management. Standard automated test scripts are delivered for SAP best practices processes. The tool supports UI and API-based automation and is integrated with SAP's ALM tools.
S4H_747 How to Approach Remote Cutover (2).pptxSreeGe1
Â
The document provides guidance for executing an SAP project remotely through a "remote cutover" when onsite delivery is not possible. It outlines key roles and responsibilities, tools to support remote collaboration, and activities for planning and executing a remote cutover. This includes preparing for remote work, ensuring prerequisites are met, and business preparation tasks which may still require some onsite work. The playbook is intended to guide remote cutover execution in conjunction with SAP's standard methodology and processes.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Â
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Â
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Â
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. đ This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. đť
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. đĽď¸
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. đ
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Â
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
Â
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Â
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
Â
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Â
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
Â
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
Â
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Â
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
Â
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power gridâs behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Â
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Donât worry, we can help with all of this!
Weâll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. Weâll provide examples and solutions for those as well. And naturally weâll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Â
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Ivantiâs Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There weâll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
2. DATA COLLECTION INTRODUCTION
• Real Time - Immediate actions
  • Kinesis Data Streams (KDS)
  • Simple Queue Service (SQS)
  • Internet of Things (IoT)
• Near-real time - Reactive actions
  • Kinesis Data Firehose (KDF)
  • Database Migration Service (DMS)
• Batch - Historical Analysis
  • Snowball
  • Data Pipeline
4. AWS KINESIS OVERVIEW
• Kinesis is a managed alternative to Apache Kafka
• Great for application logs, metrics, IoT, clickstreams
• Great for "real-time" big data
• Great for streaming processing frameworks (Spark, NiFi, etc.)
• Data is automatically replicated across 3 AZs
• Kinesis components:
  • Kinesis Streams: low-latency streaming ingest at scale
  • Kinesis Analytics: perform real-time analytics on streams using SQL
  • Kinesis Firehose: load streams into S3, Redshift, ElasticSearch, ...
6. AWS KINESIS OVERVIEW
• Streams are divided into ordered Shards / Partitions
• Data retention is 1 day by default, can be extended up to 7 days
• Ability to reprocess / replay data
• Multiple applications can consume the same stream
• Real-time processing with scalable throughput
• Once data is inserted into Kinesis, it can't be deleted (immutability)
7. AWS KINESIS STREAMS SHARDS
• One stream is made of many different shards
• Billing is per provisioned shard; you can have as many shards as you want
• Batching available, or per-message calls
• The number of shards can evolve over time (reshard / merge)
• Records are ordered per shard
8. AWS KINESIS STREAMS - RECORDS
• Data Blob: the data being sent, serialized as bytes. Up to 1 MB. Can represent anything
• Record Key: sent alongside a record, helps to group records in shards. Same key = same shard. Use a highly distributed key to avoid the "hot partition" problem
• Sequence Number: unique identifier for each record put in shards. Added by Kinesis after ingestion
9. DATA ORDERING FOR KINESIS
• Imagine you have 100 trucks (truck_1, truck_2, ... truck_100) on the road sending their GPS positions regularly into AWS.
• You want to consume the data in order for each truck, so that you can track their movement accurately.
• How should you send that data into Kinesis?
• Answer: send using a "Partition Key" value of the "truck_id"
• The same key will always go to the same shard
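The "same key, same shard" routing can be sketched in plain Python: Kinesis MD5-hashes the partition key into a 128-bit integer and routes the record to the shard owning that hash-key range. The even split of the range below is a simplifying assumption for illustration, not the exact production layout:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    # Kinesis MD5-hashes the partition key to a 128-bit integer and
    # routes it to the shard owning that hash-key range. An even split
    # of the range across shards is assumed here for illustration.
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(h // range_size, num_shards - 1)

# The same truck_id always hashes to the same shard, so all GPS
# positions for one truck stay in order relative to each other.
shards = {f"truck_{i}": shard_for_key(f"truck_{i}", 4) for i in range(1, 101)}
```

Because the mapping is a pure function of the key, ordering per truck is preserved no matter how many producers send data concurrently.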
10. AWS KINESIS STREAMS RECORDS
• Producer:
  • 1 MB/s or 1000 messages/s at write PER SHARD
  • "ProvisionedThroughputExceeded" otherwise
• Consumer Classic:
  • 2 MB/s at read PER SHARD across all consumers
  • 5 API calls per second PER SHARD across all consumers
  • If 3 different applications are consuming, possibility of throttling
• Data Retention:
  • 24 hours data retention by default
  • Can be extended to 7 days
12. AWS KINESIS PRODUCER SDK
• APIs that are used are PutRecord (one record) and PutRecords (many records)
• PutRecords uses batching and increases throughput => fewer HTTP requests
• ProvisionedThroughputExceeded if we go over the limits
• + AWS Mobile SDK: Android, iOS, etc.
• Use case: low throughput, higher latency, simple API, AWS Lambda
• Managed AWS sources for Kinesis Data Streams:
  • CloudWatch Logs
  • AWS IoT
  • Kinesis Data Analytics
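As a sketch of the PutRecords batch shape (not code from the deck), the request can be built in plain Python; the stream name and event fields below are illustrative, and with boto3 the resulting dict would be passed to `put_records`:

```python
import json

def build_put_records_batch(stream_name, events):
    # Shape of a PutRecords request: up to 500 records per call, each
    # with a Data blob (<= 1 MB) and a PartitionKey used for sharding.
    return {
        "StreamName": stream_name,
        "Records": [
            {
                "Data": json.dumps(e).encode("utf-8"),
                "PartitionKey": str(e["truck_id"]),  # illustrative key choice
            }
            for e in events
        ],
    }

batch = build_put_records_batch(
    "gps-stream",  # hypothetical stream name
    [{"truck_id": 17, "lat": 48.85, "lon": 2.35}],
)
# With boto3 this would be sent as:
#   boto3.client("kinesis").put_records(**batch)
```

Batching many events into one call is what gives PutRecords its throughput advantage over per-message PutRecord calls.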
13. AWS KINESIS API - EXCEPTIONS
• ProvisionedThroughputExceeded exceptions
• Happen when sending more data than provisioned (exceeding the MB/s or TPS for any shard)
• Make sure you don't have a hot shard (e.g. your partition key is bad and too much data goes to one partition)
• Solutions:
  • Retries with backoff
  • Increase shards (scaling)
  • Ensure your partition key is a good one
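The "retries with backoff" solution can be sketched as follows; the exception name comes from the slide, while the schedule values and the `RuntimeError` stand-in are assumptions for illustration:

```python
def backoff_schedule(max_retries=5, base=0.1, cap=5.0):
    # Exponential backoff delays: base, 2*base, 4*base, ... capped at `cap`.
    return [min(cap, base * 2 ** i) for i in range(max_retries)]

def send_with_retries(send, record, max_retries=5):
    # `send` stands in for a PutRecord call; RuntimeError models a
    # ProvisionedThroughputExceeded error here. Real code would
    # time.sleep(delay) (ideally with jitter) before each retry.
    last_error = None
    for delay in backoff_schedule(max_retries):
        try:
            return send(record)
        except RuntimeError as err:
            last_error = err
    raise last_error
```

Backoff only buys time; if throttling persists, the durable fixes are adding shards or choosing a better-distributed partition key.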
14. KINESIS PRODUCER LIBRARY (KPL)
• Easy-to-use and highly configurable C++ / Java library
• Used for building high-performance, long-running producers
• Automated and configurable retry mechanism
• Synchronous or asynchronous API (better performance for async)
• Submits metrics to CloudWatch for monitoring
• Batching (both turned on by default) - increase throughput, decrease cost:
  • Collect records and write to multiple shards in the same PutRecords API call
  • Aggregate - increased latency
    • Capability to store multiple records in one record (go over the 1000 records per second limit)
    • Increase payload size and improve throughput (maximize the 1 MB/s limit)
• Compression must be implemented by the user
• KPL records must be decoded (de-aggregated) with the KCL or a special helper library
15. KINESIS PRODUCER LIBRARY (KPL) BATCHING
• We can influence the batching efficiency by introducing some delay with RecordMaxBufferedTime (default 100 ms)
16. KINESIS PRODUCER LIBRARY - WHEN NOT TO USE
• The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable)
• Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
• Applications that cannot tolerate this additional delay may need to use the AWS SDK directly
17. KINESIS AGENT
• Monitors log files and sends them to Kinesis Data Streams
• Java-based agent, built on top of the KPL
• Installed in Linux-based server environments
• Features:
  • Write from multiple directories and write to multiple streams
  • Routing feature based on directory / log file
  • Pre-process data before sending to streams (single line, CSV to JSON, log to JSON, ...)
  • The agent handles file rotation, checkpointing, and retry upon failures
  • Emits metrics to CloudWatch for monitoring
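For the CSV-to-JSON pre-processing case, the agent's configuration (typically /etc/aws-kinesis/agent.json) might look like the following sketch. The `flows` and `dataProcessingOptions` keys follow the agent's documented format, while the file path, stream name, and field names are illustrative:

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/orders/OrderHistory*.csv",
      "kinesisStream": "order-stream",
      "dataProcessingOptions": [
        {
          "optionName": "CSVTOJSON",
          "customFieldNames": ["order_id", "customer_id", "amount", "order_date"]
        }
      ]
    }
  ]
}
```

Each flow maps one file pattern to one destination stream, so a single agent can tail several directories and fan them out to different streams.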
19. KINESIS CONSUMER SDK - GETRECORDS
• Classic Kinesis - records are polled by consumers from a shard
• Each shard has 2 MB total aggregate read throughput
• GetRecords returns up to 10 MB of data (then throttles for 5 seconds) or up to 10000 records
• Maximum of 5 GetRecords API calls per shard per second = 200 ms latency
• If 5 consumer applications consume from the same shard, every consumer can poll once per second and receive less than 400 KB/s
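The last bullet's arithmetic follows directly from the shard limits, since the fixed read budget is shared across all polling consumers:

```python
SHARD_READ_MB_PER_SEC = 2.0   # aggregate read limit per shard
SHARD_CALLS_PER_SEC = 5       # GetRecords calls per shard per second

def per_consumer_budget(num_consumers):
    # The shard's fixed read budget is shared by every polling consumer.
    return (SHARD_READ_MB_PER_SEC / num_consumers,
            SHARD_CALLS_PER_SEC / num_consumers)

mb_per_sec, polls_per_sec = per_consumer_budget(5)
# 5 consumers on one shard: 0.4 MB/s (~400 KB/s) and 1 poll/s each.
```

This shared budget is exactly the limitation that Enhanced Fan-Out (covered later) removes.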
20. KINESIS CLIENT LIBRARY (KCL)
• Java-first library, but exists for other languages too (Golang, Python, Ruby, Node, .NET, ...)
• Reads records from Kinesis produced with the KPL (de-aggregation)
• Shares multiple shards among multiple consumers in one "group"; shard discovery
• Checkpointing feature to resume progress
• Leverages DynamoDB for coordination and checkpointing (one row per shard)
  • Make sure you provision enough WCU / RCU
  • Or use On-Demand capacity for DynamoDB
  • Otherwise DynamoDB may slow down the KCL
• Record processors will process the data
• ExpiredIteratorException => increase WCU
21. KINESIS CONNECTOR LIBRARY
• Older Java library (2016), leverages the KCL library
• Writes data to:
  • Amazon S3
  • DynamoDB
  • Redshift
  • ElasticSearch
• Kinesis Firehose replaces the Connector Library for a few of these targets, Lambda for the others
22. AWS LAMBDA SOURCING FROM KINESIS
• AWS Lambda can source records from Kinesis Data Streams
• The Lambda consumer has a library to de-aggregate records from the KPL
• Lambda can be used to run lightweight ETL to:
  • Amazon S3
  • DynamoDB
  • Redshift
  • ElasticSearch
  • Anywhere you want
• Lambda can be used to trigger notifications / send emails in real time
• Lambda has a configurable batch size (more in the Lambda section)
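A minimal handler sketch for a Kinesis-triggered Lambda: the event shape (`Records[].kinesis.data`, base64-encoded) follows the standard Kinesis event format, while the ETL destination is left as a comment and the sample event is hand-built for local testing:

```python
import base64
import json

def handler(event, context):
    # Each invocation delivers a batch; the payload of every record is
    # base64-encoded inside event["Records"][i]["kinesis"]["data"].
    out = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # ... lightweight ETL here: write to S3 / DynamoDB / anywhere ...
        out.append(payload)
    return {"processed": len(out)}

# A hand-built sample event, mimicking what Kinesis would deliver:
sample_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(
            json.dumps({"truck_id": 17, "lat": 48.85}).encode()).decode()}}
    ]
}
result = handler(sample_event, None)
```

Note that KPL-aggregated records would additionally need de-aggregation before the `json.loads` step.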
23. KINESIS ENHANCED FAN-OUT
• Game-changing feature from August 2018
• Works with KCL 2.0 and AWS Lambda (Nov 2018)
• Each consumer gets 2 MB/s of provisioned throughput per shard
• That means 20 consumers will get 40 MB/s per shard, aggregated
• No more shared 2 MB/s limit!
• Enhanced fan-out: Kinesis pushes data to consumers over HTTP/2
• Reduced latency (~70 ms)
24. ENHANCED FAN-OUT VS STANDARD CONSUMERS
• Standard consumers:
  • Low number of consuming applications (1, 2, 3, ...)
  • Can tolerate ~200 ms latency
  • Minimize cost
• Enhanced fan-out consumers:
  • Multiple consumer applications for the same stream
  • Low latency requirements (~70 ms)
  • Higher costs (see the Kinesis pricing page)
  • Default limit of 5 consumers using enhanced fan-out per data stream
25. KINESIS OPERATIONS - ADDING SHARDS
• Also called "shard splitting"
• Can be used to increase the stream capacity (1 MB/s of data in per shard)
• Can be used to divide a "hot shard"
• The old shard is closed and will be deleted once its data expires
26. KINESIS OPERATIONS - MERGING SHARDS
• Decrease the stream capacity and save costs
• Can be used to group two shards with low traffic
• Old shards are closed and deleted based on data expiration
27. OUT-OF-ORDER RECORDS AFTER RESHARDING
• After a reshard, you can read from child shards
• However, data you haven't read yet could still be in the parent
• If you start reading the child before completely reading the parent, you could read data for a particular hash key out of order
• After a reshard, read entirely from the parent until you don't have new records
• Note: the Kinesis Client Library (KCL) has this logic already built in, even after resharding operations
28. KINESIS OPERATIONS - AUTO SCALING
• Auto Scaling is not a native feature of Kinesis
• The API call to change the number of shards is UpdateShardCount
• We can implement Auto Scaling with AWS Lambda
• See: https://aws.amazon.com/blogs/big-data/scaling-amazon-kinesis-data-streams-with-aws-application-auto-scaling/
29. KINESIS SCALING LIMITATIONS
• Resharding cannot be done in parallel; plan capacity in advance
• You can only perform one resharding operation at a time, and each takes a few seconds
• For 1000 shards, it takes 30K seconds (8.3 hours) to double the shards to 2000
• You can't do the following:
  • Scale more than twice for each rolling 24-hour period for each stream
  • Scale up to more than double your current shard count for a stream
  • Scale down below half your current shard count for a stream
  • Scale up to more than 10000 shards in a stream
  • Scale a stream with more than 10000 shards down, unless the result is fewer than 10000 shards
  • Scale up to more than the shard limit for your account
• If you need to scale more often than this, you can request that AWS increase the limit
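The per-call bounds above can be expressed as a small helper that clamps a desired UpdateShardCount target to what a single call allows (a sketch; the 10000-shard ceiling is taken from the slide, and the real API call is shown only as a comment):

```python
def clamp_shard_target(current: int, desired: int, max_shards: int = 10000) -> int:
    # One UpdateShardCount call can at most double the shard count,
    # at most halve it, and never exceed the shard ceiling.
    lower = max(1, (current + 1) // 2)
    upper = min(current * 2, max_shards)
    return max(lower, min(desired, upper))

# After clamping, the real call would be:
#   kinesis.update_shard_count(StreamName=stream,
#                              TargetShardCount=target,
#                              ScalingType="UNIFORM_SCALING")
target = clamp_shard_target(current=1000, desired=5000)
```

Reaching a target far above double (or far below half) therefore takes several calls spread over multiple days, which is why capacity should be planned in advance.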
30. KINESIS SECURITY
• Control access / authorization using IAM policies
• Encryption in flight using HTTPS endpoints
• Encryption at rest using KMS
• Client-side encryption must be manually implemented (harder)
• VPC endpoints available for Kinesis to access it from within a VPC
31. KINESIS DATA STREAMS - HANDLING DUPLICATES FOR PRODUCERS
• Producer retries can create duplicates due to network timeouts
• Although the two records have identical data, they have unique sequence numbers
• Fix: embed a unique record ID in the data to de-duplicate on the consumer side
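The suggested fix, embedding a client-generated unique ID so the consumer can de-duplicate, can be sketched as follows (the `record_id` field name is an arbitrary choice):

```python
import json
import uuid

def make_record(payload: dict) -> bytes:
    # Producer side: embed a client-generated unique ID, because a
    # retried put is assigned a *new* Kinesis sequence number.
    return json.dumps({"record_id": str(uuid.uuid4()), **payload}).encode()

def deduplicate(raw_records):
    # Consumer side: keep only the first occurrence of each record_id.
    seen, unique = set(), []
    for raw in raw_records:
        rec = json.loads(raw)
        if rec["record_id"] not in seen:
            seen.add(rec["record_id"])
            unique.append(rec)
    return unique

original = make_record({"order": 42})
received = [original, original]  # a network-timeout retry duplicated it
deduped = deduplicate(received)
```

A persistent store (rather than an in-memory set) would be needed if the consumer itself can restart, which is the topic of the next slide.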
32. KINESIS DATA STREAMS â HANDLING DUPLICATES
FOR CONSUMERS
ĂConsumer retries can make your application read the same data twice
ĂConsumer retries happen when record processors restart:
Ă A worker terminates unexpectedly
Ă Worker instances are added or removed
Ă Shards are merged or split
Ă The application is deployed
ĂFixes:
Ă Make your consumer application idempotent
Ă If the final destination can handle duplicates, itâs recommended to do it there
• More info: https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html
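One way to make a consumer idempotent is to track the embedded record IDs it has already processed and skip replays. A toy sketch; the in-memory set stands in for a durable store (e.g. DynamoDB) that would survive worker restarts:

```python
import json

class IdempotentConsumer:
    """Skips records whose embedded record_id was already processed.
    The seen-set is in memory for illustration only; a real worker
    needs durable state so de-duplication survives restarts."""

    def __init__(self):
        self._seen = set()
        self.processed = []

    def handle(self, raw: bytes) -> bool:
        record = json.loads(raw)
        record_id = record["record_id"]
        if record_id in self._seen:  # replay after a restart / reshard
            return False
        self._seen.add(record_id)
        self.processed.append(record)
        return True
```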
33. AWS KINESIS DATA FIREHOSE
• Fully managed service, no administration
• Near real time (60 seconds minimum latency for non-full batches)
• Loads data into Redshift / Amazon S3 / ElasticSearch / Splunk
• Automatic scaling
• Supports many data formats
• Data conversion from JSON to Parquet / ORC (only for S3)
• Data transformation through AWS Lambda (e.g. CSV => JSON)
• Supports compression when the target is Amazon S3 (GZIP, ZIP, and SNAPPY)
• Only GZIP if the data is further loaded into Redshift
• Spark / KCL do not read from KDF
• Pay for the amount of data going through Firehose
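Because Firehose is fully managed, the producer side reduces to a single API call. A sketch using boto3's `put_record`; the newline delimiter is a common convention so records land in S3 one per line:

```python
def send_to_firehose(stream_name: str, line: str):
    """Push one newline-delimited record into a Firehose delivery stream
    (requires AWS credentials and an existing delivery stream)."""
    import boto3  # lazy import: the call itself needs AWS credentials
    client = boto3.client("firehose")
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": (line + "\n").encode("utf-8")},
    )
```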
36. FIREHOSE BUFFER SIZING
• Firehose accumulates records in a buffer
• The buffer is flushed based on time and size rules
• Buffer Size (e.g. 32MB): if that buffer size is reached, it's flushed
• Buffer Time (e.g. 2 minutes): if that time is reached, it's flushed
• Firehose can automatically increase the buffer size to increase throughput
• High throughput => Buffer Size will be hit
• Low throughput => Buffer Time will be hit
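The flush-on-size-or-time behaviour can be modelled in a few lines. This is a toy simulation of the buffering rule, not Firehose's actual implementation; the 32MB/2-minute defaults mirror the example values above:

```python
import time

class FlushBuffer:
    """Toy model of Firehose buffering: flush when either the size
    threshold or the time threshold is hit, whichever comes first."""

    def __init__(self, max_bytes: int = 32 * 1024 * 1024, max_seconds: float = 120.0):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._records, self._size = [], 0
        self._started = time.monotonic()

    def add(self, record: bytes):
        """Buffer a record; return the flushed batch if a rule fired, else None."""
        self._records.append(record)
        self._size += len(record)
        age = time.monotonic() - self._started
        if self._size >= self.max_bytes or age >= self.max_seconds:
            return self._flush()
        return None  # still buffering

    def _flush(self):
        batch, self._records, self._size = self._records, [], 0
        self._started = time.monotonic()
        return batch
```

Under high throughput the size rule fires first; under low throughput records sit until the time rule fires, which is why Firehose is "near" real time.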
37. AWS KINESIS DATA STREAMS VS FIREHOSE
Streams
• Going to write custom code (producer / consumer)
• Real time (~200 ms latency for classic)
• Must manage scaling (shard splitting / merging)
• Data storage for 1 to 7 days, replay capability, multiple consumers
• Use with Lambda to insert data in real time into ElasticSearch (for example)
Firehose
• Fully managed; sends to S3, Splunk, Redshift, ElasticSearch
• Serverless data transformations with Lambda
• Near real time (lowest buffer time is 1 minute)
• Automated scaling
• No data storage
38. CLOUDWATCH LOGS SUBSCRIPTION FILTERS
• You can stream CloudWatch Logs into:
  ◦ Kinesis Data Streams
  ◦ Kinesis Data Firehose
  ◦ AWS Lambda
• Using CloudWatch Logs Subscription Filters
• You can enable them using the AWS CLI
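The same `PutSubscriptionFilter` call the CLI wraps can be issued from boto3. A sketch with an empty filter pattern (matches all log events); the filter name is illustrative, and the role must allow CloudWatch Logs to write to the destination:

```python
def stream_logs_to_kinesis(log_group: str, destination_arn: str, role_arn: str):
    """Create a subscription filter forwarding every event from a log
    group to a Kinesis stream (requires AWS credentials and an IAM role
    that CloudWatch Logs can assume)."""
    import boto3  # lazy import: the call itself needs AWS credentials
    client = boto3.client("logs")
    client.put_subscription_filter(
        logGroupName=log_group,
        filterName="to-kinesis",   # illustrative name
        filterPattern="",          # empty pattern matches all events
        destinationArn=destination_arn,
        roleArn=role_arn,
    )
```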
42. USE CASE – EXERCISE 1 – DATA COLLECTION
• Create a Kinesis Firehose delivery stream
• Generate an OrderHistory CSV file using a LogGenerator Python script
• Publish the data to an S3 bucket from Firehose using the Kinesis Agent
• Create a Kinesis Data Stream
• Publish data from the Kinesis Agent to the Kinesis Data Stream
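The agent side of this exercise is driven by its JSON configuration (typically `/etc/aws-kinesis/agent.json`). A sketch covering both steps, with one flow converting the CSV to JSON on its way to Firehose and a second flow feeding the Data Stream; the file paths, stream names, and field names are placeholders for this exercise:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/orders/OrderHistory*.csv",
      "deliveryStream": "order-firehose",
      "dataProcessingOptions": [
        {
          "optionName": "CSVTOJSON",
          "customFieldNames": ["order_id", "customer_id", "amount", "order_date"]
        }
      ]
    },
    {
      "filePattern": "/var/log/orders/OrderHistory*.csv",
      "kinesisStream": "order-stream"
    }
  ]
}
```

The `CSVTOJSON` processing option maps each CSV column to the corresponding name in `customFieldNames`, producing the JSON records that land in S3.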