This document provides an overview of building a real-time streaming data visualization system with Amazon Kinesis Analytics. It begins with an introduction and outlines what will be covered, including streaming data, the Amazon Kinesis platform, Kinesis Analytics patterns and features, and a walkthrough of an end-to-end streaming data example. It then provides details on the Amazon Kinesis platform and services, demonstrates how to write SQL queries to analyze streaming data in Kinesis Analytics, and shows how to output results to destinations like Kinesis streams, S3, and DynamoDB. It concludes with notes on deploying Kinesis Analytics applications and testing.
Serverless Streaming Data Processing using Amazon Kinesis Analytics (Amazon Web Services)
by Adrian Hornsby, Technical Evangelist, AWS
As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real-time with standard SQL—there is no need to learn new programming languages or processing frameworks. In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ... (Amazon Web Services)
by Adrian Hornsby, Technical Evangelist, AWS
Amazon Kinesis is a platform for streaming data on AWS, offering powerful services to make it easy to load and analyze streaming data. In this session, you’ll learn about how AWS customers are transitioning from batch to real-time processing using Amazon Kinesis, and how to get started. We will provide an overview of streaming data applications and introduce the Amazon Kinesis platform and its services. We will walk through a production use case to demonstrate how to ingest streaming data, prepare it, and analyze it to gain actionable insights in real time using Amazon Kinesis. We will also provide pointers to tutorials and other resources so you can quickly get started with your streaming data application.
Big Data Architectural Patterns and Best Practices on AWS (Amazon Web Services)
by Dario Rivera, Solutions Architect, AWS
The world is producing an ever-increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Amazon Kinesis Analytics is the easiest way to process streaming data in real time with standard SQL without having to learn new programming languages or processing frameworks. Amazon Kinesis Analytics enables you to create and run SQL queries on streaming data so that you can gain actionable insights and respond to your business and customer needs promptly. In this session, we will provide an overview of the capabilities of Amazon Kinesis Analytics. We will show you how you can build an entire stream processing pipeline to collect, ingest, process, and emit streaming data using Amazon Kinesis Analytics, Amazon Kinesis Firehose, and Amazon Kinesis Streams.
AWS Kinesis - Streams, Firehose, Analytics (Serhat Can)
An introduction to AWS Kinesis, including Kinesis Streams, Firehose, and Analytics. It focuses on Kinesis Streams concepts such as partition keys, sequence numbers, sharding, and the KCL, and offers a simple comparison of Amazon Kinesis Streams with similar services like Kafka and SQS.
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose (Amazon Web Services)
Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to transform and load streaming data into Amazon Redshift so that you can use existing analytics and business intelligence tools to extract information in near real-time and respond promptly. In this session, we will dive deep into using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively.
"The only real mistake is the one from which we learn nothing.” So how do we learn from system failures? This session will move beyond “blameless” postmortems and show how to use data to avoid and mitigate future failures. We will share the best practices for gathering systems-related data and people-related data. You will then learn how to apply the data to formulate actionable response plans and avoid repeating failures.
This session is brought to you by AWS Summit New York City sponsor, Datadog."
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl... (Amazon Web Services)
Learning Objectives:
- Understand key requirements for collecting, preparing, and loading streaming data into data lakes
- Get an overview of transmitting data using Amazon Kinesis Firehose
- Learn how to perform data transformations with Amazon Kinesis Firehose
Data lakes enable your employees across the organization to access and analyze massive amounts of unstructured and structured data from disparate data sources, many of which generate data continuously and rapidly. Making this data available in a timely fashion for analysis requires a streaming solution that can durably and cost-effectively ingest this data into your data lake. Amazon Kinesis Firehose is a fully managed service that makes it easy to prepare and load streaming data into AWS. In this tech talk, we will provide an overview of Amazon Kinesis Firehose and dive deep into how you can use the service to collect, transform, batch, compress, and load real-time streaming data into your Amazon S3 data lakes.
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. In this webinar, developers will learn how to build and deploy a streaming data processing application with Amazon Kinesis. We will cover the following:
- A brief overview of Amazon Kinesis and a drill-down into key technical concepts
- Amazon Kinesis Client Library capabilities that enable customers to build fault-tolerant, continuous processing applications that scale elastically
- The role of the supporting connector library for moving data into stores like S3 and Redshift
- Best practices for streaming data ingestion and processing with Amazon Kinesis
Real-time event processing monitors the incoming data stream and initiates action based on detected events such as fraud, errors, or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services to be covered include AWS Lambda and the Kinesis Client Library (KCL).
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306) (Amazon Web Services)
You have billions of events in your fact table, all of them waiting to be visualized. Enter Tableau… but wait: how can you ensure scalability and speed with your data in Amazon S3, Spark, Amazon Redshift, or Presto? In this talk, you’ll hear how Albert Wong and Srikanth Devidi at Netflix use Tableau on top of their big data stack. Albert and Srikanth also show how you can get the most out of a massive dataset using Tableau, and help guide you through the problems you may encounter along the way. Session sponsored by Tableau.
A quick overview of AWS Kinesis: What is Kinesis, what problems does Kinesis solve, and how might you integrate Kinesis with an existing data warehouse.
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources, allowing you to easily write applications that process information in real time from sources such as website clickstreams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data.
This introductory webinar, presented by Adi Krishnan, Senior Product Manager for Amazon Kinesis, will provide you with an overview of the service, sample use cases, and some examples of customer experiences with the service so you can better understand its capabilities and see how it might be integrated into your own applications.
In this session, storage experts will walk you through the object storage offering, Amazon S3, a bulk data repository that can deliver 99.999999999% durability and scale past trillions of objects worldwide. Learn about the different ways you can accelerate data transfer to S3 and get a close look at some of the new tools available for you to secure and manage your data more efficiently. Announced at re:Invent 2016, see how you can use Amazon Athena with S3 to run serverless analytics on your data and as a bonus, walk away with some code snippets to use with S3. Hear AWS customers talk about the solutions they have built with S3 to turn their data into a strategic asset, instead of just a cost center. And bring your toughest questions to our experts on hand and walk away that much smarter on how to use object storage from AWS.
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315) (Amazon Web Services)
During this session, Greg Brandt and Liyin Tang, data infrastructure engineers from Airbnb, will discuss the design and architecture of Airbnb's streaming ETL infrastructure, which exports data from RDS for MySQL and DynamoDB into Airbnb's data warehouse using a system called SpinalTap. We will also discuss how we leverage Spark Streaming to compute derived data from tracking topics and/or database tables, and HBase to provide immediate data access and generate cleanly time-partitioned Hive tables.
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR (Amazon Web Services)
by Dario Rivera, Solutions Architect, AWS
Amazon EMR is a managed service that lets you process and analyze extremely large data sets using the latest versions of over 15 open-source frameworks in the Apache Hadoop and Spark ecosystems. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost-efficient. Finally, we dive into some of our recent launches to keep you current on our latest features. This session will feature Asurion, a provider of device protection and support services for over 280 million smartphones and other consumer electronics devices.
by Joyjeet Banerjee, Solutions Architect, AWS
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3. Level 200
In this session, we will show you how easy it is to start querying your data stored in Amazon S3 with Amazon Athena. First, we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Finally, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
SRV420: Analyzing Streaming Data in Real-time with Amazon Kinesis (Amazon Web Services)
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, you’ll learn about how AWS customers are transitioning from batch to stream processing using Kinesis, and how to get started. We will provide an overview of streaming applications and introduce the Kinesis capabilities. We will walk through a production use case to demonstrate how to ingest streaming data, prepare it, and analyze it to gain actionable insights in real time using Kinesis. We will also provide pointers to tutorials and other resources so you can quickly get started with your streaming data application.
As the amount of data created by applications increases, keeping pace becomes increasingly difficult. This session covers how to properly collect, manage, and usefully present data using services from Amazon Web Services such as Amazon Redshift and Amazon Kinesis. The session includes a live demo on parsing the generated data and managing it.
Alex Smith, Solutions Architect, Amazon Web Services, ASEAN
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis... (Amazon Web Services)
As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real-time with standard SQL—there is no need to learn new programming languages or processing frameworks.
In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics... (Amazon Web Services)
It is becoming increasingly important to analyze real-time streaming data, as doing so allows organizations to remain competitive by uncovering relevant, actionable insights. AWS makes it easy to capture, store, and analyze real-time streaming data.
In this webinar, we will guide you through some of the proven architectures for processing streaming data, using a combination of tools including Amazon Kinesis Streams, AWS Lambda, and Spark Streaming on Amazon Elastic MapReduce (EMR). We will then talk about common use cases and best practices for real-time data analysis on AWS.
Learning Objectives:
- Understand how you can analyze real-time data streams using Amazon Kinesis, AWS Lambda, and Spark running on Amazon EMR
- Learn use cases and best practices for streaming data applications on AWS
BDA307: Real-time Streaming Applications on AWS, Patterns and Use Cases (Amazon Web Services)
In this session, you will learn best practices for implementing simple to advanced real-time streaming data use cases on AWS. First, we’ll review decision points on near-real-time versus real-time scenarios. Next, we will take a look at streaming data architecture patterns that include Amazon Kinesis Analytics, Amazon Kinesis Firehose, Amazon Kinesis Streams, Spark Streaming on Amazon EMR, and other open source libraries. Finally, we will dive deep into the most common of these patterns and cover design and implementation considerations.
AWS has a large and growing portfolio of big data management and analytics services, designed to integrate into solution architectures to meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, to explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Real-time processing of streaming data is a common architectural pattern used in many applications. Amazon Kinesis Analytics is the easiest way to process streaming data in real time with standard SQL without having to learn new programming languages or processing frameworks. We will present how to use Amazon Kinesis Analytics on streaming data and gain actionable insights from your data.
Services: Amazon Kinesis Analytics, Amazon Redshift, Amazon Elasticsearch Service, and Amazon S3.
Presenters: Kobi Biton & Ran Tessler
So, you have IoT Devices connected to IoT Hub sending telemetry data into the Microsoft Azure cloud. Now what? This session will take you through setting up real-time stream processing of IoT data. We’ll look at integrating services like Azure Stream Analytics, Azure Functions, and Cosmos DB to build a highly scalable stream processing backend for any IoT solution. You’ll leave this session better prepared to handle real-time IoT stream processing in Azure; plus you’ll do it with less code by utilizing serverless Azure Functions.
This session is recommended for anyone interested in understanding how to use AWS big data services to develop real-time analytics applications. In this session, you will get an overview of a number of Amazon's big data and analytics services that enable you to build highly scalable cloud applications that immediately and continuously analyze large sets of distributed data. We'll explain how services like Amazon Kinesis, EMR, and Redshift can be used for data ingestion, processing, and storage to enable real-time insights and analysis into customer, operational, and machine-generated data and log files. We'll explore system requirements, design considerations, and walk through a specific customer use case to illustrate the power of real-time insights on their business.
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016 (Amazon Web Services)
Amazon Kinesis provides services for you to work with streaming data on AWS. Learn how to load streaming data continuously and cost-effectively to Amazon S3 and Amazon Redshift using Amazon Kinesis Firehose without writing custom stream processing code. Get an introduction to building custom stream processing applications with Amazon Kinesis Streams for specialized needs.
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017 (Amazon Web Services)
Real-time streaming analytics has become popular across many verticals and use cases. In AdTech, gaming, financial services, and IoT, AWS customers are leveraging the Amazon Kinesis platform to ingest billions of events every day and process them in real time. In this session, we will discuss Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics. We will show best practices and design patterns for integrating the Amazon Kinesis platform with other services like Amazon EMR, Redshift, Amazon Elasticsearch Service, and AWS Lambda, as well as third-party connectors like Storm, Spark, and more.
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real (Amazon Web Services LATAM)
In this session, we demonstrate air traffic control and defense using real-time processing.
We cover best practices for data ingestion, storage, processing, and visualization with AWS services such as Kinesis, DynamoDB, Lambda, Redshift, QuickSight, and Amazon Machine Learning.
Deep Dive and Best Practices for Real Time Streaming Applications (Amazon Web Services)
Get answers to technical questions frequently asked by those starting to work with streaming data. Learn best practices for building a real-time streaming data architecture on AWS with Amazon Kinesis, Spark Streaming, AWS Lambda, and Amazon EMR. First, we will focus on building a scalable, durable streaming data ingestion workflow from data producers like mobile devices, servers, or even web browsers. We will provide guidelines to minimize duplicates and achieve exactly-once processing semantics in your stream-processing applications. Then, we will show some of the proven architectures for processing streaming data using a combination of tools including Amazon Kinesis Streams, AWS Lambda, and Spark Streaming running on Amazon EMR.
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018 (Amazon Web Services)
Speaker: Tara Walker, AWS
Customer Speaker: Digitata
Level: 200
Amazon Kinesis is a platform for streaming data on AWS, offering powerful services to make it easy to load and analyze streaming data. With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. In this session, we will provide an overview of streaming data applications with the Amazon Kinesis platform and present an end-to-end streaming data solution including data ingestion, real-time processing, and persistence.
Processing streaming data is becoming increasingly important to many organisations who need to analyse incoming data both in near real-time and in batch. In this session we will look at the best practices and patterns for analysing streaming data with AWS Kinesis Streams, Kinesis Firehose and Kinesis Analytics.
Speaker: Johnathon Meichtry, Principal Solutions Architect, Amazon Web Services
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da... (WSO2)
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
How can your business benefit from going serverless? (Adrian Hornsby)
Serverless architectures let you build and deploy applications and services with infrastructure resources that require zero administration. In the past, you had to provision and scale servers to run your application code, install and operate distributed databases, and build and run custom software to handle API requests. Now AWS provides a stack of scalable, fully managed services that eliminates these operational complexities. In this session, you will learn about the basics of serverless and especially how your business can benefit from it.
Moving Forward with AI - as presented at the Prosessipäivät 2018 (Adrian Hornsby)
https://www.oppia.fi/prosessipaivat/
Self-driving cars. Commercial drones. Smart cameras. Movie and music creation. Powerful & intelligent robots. Over the past few years, a new revolution has brought AI almost to the level of science fiction. However, most companies are not worried about far-off futuristic applications of AI; they want to know what AI can do - today - for their organisations. Distinguishing the hype from reality can be a bit confusing, especially when you consider the attention that AI gets from the media and commentators. So, how can your organisation get started and put AI to work for you? That is the question I will answer in this talk. From greater customer intimacy to increasing competitive advantage and improving efficiency, I will discuss and show how AI can be used today to help the organisation in more impactful ways.
Chaos Engineering: Why Breaking Things Should Be Practised (Adrian Hornsby)
As presented at the AWS London Summit 2018
With the rise of micro-services and large-scale distributed architectures, software systems have grown increasingly complex and hard to understand. Adding to that complexity, the velocity of software delivery has also dramatically increased, resulting in failures being harder to predict and contain.
While the cloud allows for high availability, redundancy and fault-tolerance, no single component can guarantee 100% uptime. Therefore, we have to understand availability but especially learn how to design architectures with failure in mind.
And since failures have become more and more chaotic in nature, we must turn to chaos engineering in order to identify failures before they become outages.
In this talk, I will deep dive into availability, reliability and large-scale architectures and make an introduction to chaos engineering, a discipline that promotes breaking things on purpose in order to learn how to build more resilient systems.
Slides from my talk at the Data Innovations Summit on MXNet Model Server.
https://www.datainnovationsummit.com/
Apache MXNet Model Server (MMS) is a flexible and easy to use tool for serving deep learning models exported from MXNet or the Open Neural Network Exchange (ONNX).
https://github.com/awslabs/mxnet-model-server
Building a Multi-Region, Active-Active Serverless Backends (Adrian Hornsby)
From understanding reliability and availability, this talk walks you through the why and the how of building multi-region, active-active applications, and especially why serverless is a great fit.
The slides from my talk at the AWS DevDays in the Nordics.
https://aws.amazon.com/events/Devdays-Nordics/agenda/
Objectives:
- Understand Serverless Key Concepts.
- Understand Event Processing Architecture.
- Understand Operation Automation Architecture.
- Understand Web Application Architecture.
- Understand Data Processing Architecture.
* Kinesis-based apps.
* IoT-based apps.
re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin... (Adrian Hornsby)
In this session, you will learn about our strategy for driving machine learning innovation for our customers and learn what’s new from AWS in the machine learning space. We will discuss and demonstrate the latest new services for ML on AWS: Amazon SageMaker, AWS DeepLens, Amazon Rekognition Video, Amazon Translate, Amazon Transcribe, and Amazon Comprehend. Attend this session to understand how to make the most of machine learning in the cloud.
re:Invent re:Cap - Big Data & IoT at Any Scale (Adrian Hornsby)
This session covers the most recent Big Data & IoT announcements at re:Invent. Learn about trends and use cases for understanding your data and implementing an Internet of Things (IoT) project. Hear about how AWS customers are using AWS IoT to connect their devices to the cloud and solve business challenges with IoT.
Devoxx: Building AI-powered applications on AWS (Adrian Hornsby)
Slides from my talk at devoxx2018
The video: https://www.youtube.com/watch?v=-izfBVlHkSc
https://cfp.devoxx.be/2017/talk/XEO-9942/Building_Serverless_AI-powered_Applications_on_AWS
Slides from my talk at the first AWS Community Day in Bangalore
https://www.meetup.com/awsugblr/events/243819403/
Speaker notes: https://medium.com/@adhorn/10-lessons-from-10-years-of-aws-part-1-258b56703fcf
and https://medium.com/@adhorn/10-lessons-from-10-years-of-aws-part-2-5dd92b533870
The list is not in any particular order :)
Developing Sophisticated Serverless Applications with AI (Adrian Hornsby)
The slides from my talk at the Serverless Summit in India http://inserverless.com
Developing advanced AI enabled applications with serverless technology on AWS
Serverless Streaming Data Processing using Amazon Kinesis Analytics
1. Build a Real-Time Streaming Data Visualization System with Amazon Kinesis Analytics
Adrian Hornsby (@adhorn)
Technical Evangelist with AWS
San Francisco Loft - 2017
2. • Technical Evangelist, Developer Advocate, … Software Engineer
• My @home is in Finland
• Previously:
• Solutions Architect @AWS
• Lead Cloud Architect @Dreambroker
• Director of Engineering, Software Engineer, DevOps, Manager, ... @Hdm
• Researcher @Nokia Research Center
• and a bunch of other stuff.
• Love climbing and ginger shots.
3. What to Expect from the Session
• Streaming data overview
• Amazon Kinesis platform review
• Amazon Kinesis Analytics Overview
• Amazon Kinesis Analytics patterns
• Streaming data end-to-end example and walk-through
6. Amazon Kinesis makes it easy to work with real-time streaming data
Amazon Kinesis Streams
• For technical developers
• Collect and stream data for ordered, replayable, real-time processing
Amazon Kinesis Firehose
• For all developers, data scientists
• Easily load massive volumes of streaming data into Amazon S3, Redshift, Elasticsearch
Amazon Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
7. Amazon Kinesis Streams
• Reliably ingest and durably store streaming data at low cost
• Build custom real-time applications to process streaming data
10. Amazon Kinesis Analytics
• Interact with streaming data in real-time using SQL
• Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
12. Kinesis Analytics
Pay for only what you use
Automatic elasticity
Standard SQL for analytics
Real-time processing
Easy to use
13. Use SQL to build real-time applications
Easily write SQL code to process streaming data
Connect to streaming source
Continuously deliver SQL results
14. Connect to streaming source
• Streaming data sources include Kinesis Firehose or Kinesis Streams
• Input formats include JSON, .csv, variable column, unstructured text
• Each input has a schema; the schema is inferred, but you can edit it
• Reference data sources (S3) for data enrichment
15. Write SQL code
• Build streaming applications with one-to-many SQL statements
• Robust SQL support and advanced analytic functions
• Extensions to the SQL standard to work seamlessly with streaming data
• Support for at-least-once processing semantics
16. Continuously deliver SQL results
• Send processed data to multiple destinations
• S3, Amazon Redshift, Amazon ES (through Firehose)
• Streams (with AWS Lambda integration for custom destinations)
• End-to-end processing speed as low as sub-second
• Separation of processing and data delivery
18. Generate time series analytics
• Compute key performance indicators over time periods
• Combine with static or historical data in S3 or Amazon Redshift
(Diagram: Streams or Firehose → Analytics → Streams/Firehose → Amazon Redshift, S3, and custom real-time destinations)
19. Create real-time alarms and notifications
• Build sequences of events from the stream, like user sessions in a clickstream or app behavior through logs
• Identify events (or a series of events) of interest, and react to the data through alarms and notifications
(Diagram: Streams or Firehose → Analytics → Streams → Amazon SNS, Amazon CloudWatch, Lambda)
20. Feed real-time dashboards
• Validate and transform raw data, and then process it to calculate meaningful statistics
• Send processed data downstream for visualization in BI and visualization services
(Diagram: Streams or Firehose → Analytics → Amazon ES, Amazon Redshift, Amazon RDS → Amazon QuickSight)
22. Scenario Requirements
Data to capture every second:
• Total distinct users
• Number of users for each Operating System
• Number of users in each quadrant
Output Requirements
• Update a DynamoDB table every second with each aggregate value
25. Data Input
Source JSON Data
• Once per second, using the JavaScript SDK:
• Unique Cognito ID (anonymous user)
• OS
• Quadrant
• Data sent to a Kinesis Stream
(Diagram: JavaScript SDK + Amazon Cognito + Amazon S3 → Amazon Kinesis Stream)
{
"recordTime": 1486505943.204,
"cognitoId": "us-east-1:3626e211-d2a3-447b-8231-e1f4e0486f44",
"os": "Android",
"quadrant": "A"
}
26. How is raw data mapped to a schema?
Amazon Kinesis stream → Amazon Kinesis Analytics
cognitoId   os        quadrant
<guid1>     Android   A
<guid2>     iOS       B
Source Data for Kinesis Analytics
{
"recordTime": 1486505943.204,
"cognitoId": "us-east-1:<guid>",
"os": "Android",
"quadrant": "A"
}
27. How is streaming data accessed with SQL?
STREAM
• Analogous to a TABLE
• Represents continuous data flow
CREATE OR REPLACE STREAM DISTINCT_USER_STREAM (
    COGNITO_ID VARCHAR(64),
    DEVICE VARCHAR(32),
    OS VARCHAR(32),
    QUADRANT CHAR(1),
    DT TIMESTAMP);
28. How is streaming data accessed with SQL?
PUMP
• Continuous INSERT query
• Inserts data from one in-application stream to another
CREATE OR REPLACE PUMP "DISTINCT_USER_PUMP" AS
INSERT INTO "DISTINCT_USER_STREAM"
SELECT STREAM DISTINCT
"cognitoId",
...
29. How do we model our data?
SOURCE_STREAM (from the Kinesis stream):
• cognitoId
• os
• quadrant
DISTINCT_USER_STREAM (in-application):
• COGNITO_ID
• OS
• QUADRANT
• DT
DESTINATION_SQL_STREAM (to the output Kinesis stream):
• UNIQUE_USER_COUNT
• ANDROID_COUNT
• IOS_COUNT
• OTHER_OS_COUNT
• QUADRANT_A_COUNT
• QUADRANT_B_COUNT
• QUADRANT_C_COUNT
• QUADRANT_D_COUNT
(Diagram: Kinesis stream → Kinesis Analytics application, with pumps moving data between in-application streams → Kinesis stream)
30. How do we get distinct user records?
Use PUMP to insert distinct records into in-app STREAM
CREATE OR REPLACE PUMP "DISTINCT_USER_PUMP" AS
INSERT INTO "DISTINCT_USER_STREAM"
SELECT STREAM DISTINCT
"cognitoId",
"device",
"os",
"quadrant",
FLOOR(s.ROWTIME TO SECOND)
FROM "SOURCE_SQL_STREAM_001" s;
(Diagram: SOURCE_STREAM (cognitoId, os, quadrant) → DISTINCT_USER_STREAM (COGNITO_ID, OS, QUADRANT, DT))
31. How do we aggregate streaming data?
• A common requirement in streaming analytics is to perform set-based operations (count, average, max, min, ...) over events that arrive within a specified period of time
• You cannot simply aggregate over an entire table as you would in a typical static database
• How do we define a subset of a potentially infinite stream?
• Windowing functions!
32. Windowing Concepts
• Windows can be tumbling or sliding
• Windows are fixed length
• The output record carries the timestamp of the end of the window
(Diagram: a timeline of events split into Window1, Window2, Window3; an aggregate function (sum) emits one output event per window, e.g. 18 and 14)
33. Comparing Types of Windows
• Output is created at the end of the window
• The output of the window is a single event, based on the aggregate function used
Tumbling window: aggregate per time interval
Sliding window: windows constantly re-evaluated
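Slide 34 shows the tumbling-window SQL; sliding windows never get their own code slide, so here is a minimal sketch for contrast (the 10-second window length is illustrative; DISTINCT_USER_STREAM is the in-application stream created on slide 27):

    -- For every arriving row, count the users seen over the preceding 10 seconds.
    -- The window is re-evaluated continuously, not once per interval.
    SELECT STREAM OS,
        COUNT(COGNITO_ID) OVER W1 AS USERS_LAST_10_SEC
    FROM "DISTINCT_USER_STREAM"
    WINDOW W1 AS (
        PARTITION BY OS
        RANGE INTERVAL '10' SECOND PRECEDING);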
34. How do we aggregate per second?
• Tumbling window, group by time period
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
COUNT(dus.COGNITO_ID) AS UNIQUE_USER_COUNT,
COUNT((CASE WHEN dus.OS = 'Android' THEN COGNITO_ID ELSE null END)) AS ANDROID_COUNT,
COUNT((CASE WHEN dus.OS = 'iOS' THEN COGNITO_ID ELSE null END)) AS IOS_COUNT,
COUNT((CASE WHEN dus.OS = 'Windows Phone' THEN COGNITO_ID ELSE null END)) AS WINDOWS_PHONE_COUNT,
COUNT((CASE WHEN dus.OS = 'other' THEN COGNITO_ID ELSE null END)) AS OTHER_OS_COUNT,
COUNT((CASE WHEN dus.QUADRANT = 'A' THEN COGNITO_ID ELSE null END)) AS QUADRANT_A_COUNT,
COUNT((CASE WHEN dus.QUADRANT = 'B' THEN COGNITO_ID ELSE null END)) AS QUADRANT_B_COUNT,
COUNT((CASE WHEN dus.QUADRANT = 'C' THEN COGNITO_ID ELSE null END)) AS QUADRANT_C_COUNT,
COUNT((CASE WHEN dus.QUADRANT = 'D' THEN COGNITO_ID ELSE null END)) AS QUADRANT_D_COUNT,
ROWTIME
FROM "DISTINCT_USER_STREAM" dus
GROUP BY
FLOOR(dus.ROWTIME TO SECOND);
36. Processing a Kinesis Stream with AWS Lambda
(Diagram: a Kinesis stream with shards 1 through n, each polled by its own Lambda function instance, getting records once per second, up to 10k records per read)
• Single instance of the Lambda function per shard
• Polls each shard once per second
• Lambda function instances are created and removed automatically as the stream is scaled
37. Persist aggregated data in DynamoDB
(Diagram: Amazon Kinesis Stream → Lambda event source mapping → Lambda function → Amazon DynamoDB)
event.Records.forEach((record) => {
    // Kinesis record payloads arrive base64-encoded
    const payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
    var docClient = new AWS.DynamoDB.DocumentClient();
    var table = "user-quadrant-data";
    var data = JSON.parse(payload);
    // Map the aggregate row emitted by Kinesis Analytics to a DynamoDB item
    var params = {
        TableName: table,
        Item: {
            "dataType": "quadrantRollup",
            "windowtime": (new Date(data.WINDOW_TIME)).getTime(),
            "userCount": data.UNIQUE_USER_COUNT,
            "quadrantA": data.QUADRANT_A_COUNT,
            "quadrantB": data.QUADRANT_B_COUNT, ...
        }
    };
    docClient.put(params, function(err, data) { ...
Narrative: The reality is that most data is produced continuously and is coming at us at lightning speeds due to an explosive growth of real-time data sources.
TP: Machine data will make up 40% of our digital universe by 2020
Narrative: Whether it is log data coming from mobile and web applications, purchase data from ecommerce sites, or sensor data from IoT devices, it all delivers information that can help companies learn about what their customers, organization, and business are doing right now.
TP: Customer Benefits
Improve operational efficiencies, improve customer experiences, new business models
Smart building: reduce energy costs, cut maintenance, increase safety and security
Smart textiles: monitor skin temperature, monitor stress
Narrative: So how much is this data worth? Well, it depends…
Recent data is highly valuable
If you act on it in time
Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
If you have the means to combine them
Narrative: Processing real-time data as it arrives lets you make decisions much faster and get the most value from your data. But building your own custom applications to process streaming data is complicated and resource intensive. You need to train or hire developers with the right skill sets, wait months for the applications to be built and fine-tuned, and then operate and scale the applications as the business grows.
All of this takes a lot of time and money, and, at the end of the day, many companies just never get there; they settle for the status quo and live with information that is hours or days old.
Narrative: You need a different set of analytical tools to collect and analyze real-time streaming data than what you have traditionally used for data at rest. With traditional analytics, you gather the information, store it in a database, and analyze it hours, days, or weeks later. Analyzing real-time data requires a different approach. Instead of running database queries on stored data, streaming analytics platforms have to process the data continuously and before the data lands in a database. And streaming data comes in at an incredible rate that can vary up and down all the time. Streaming analytics platforms have to be able to process this data when it arrives, often at speeds of millions and even tens of millions of events per hour.
Key requirements of stream processing
Durable: durable ingest so that processing can be repeatable
Continuous: always processing the latest data
Fast: frequency (micro-batches, size of batches, true streaming) and speed (sub-second, minute, hour)
Correct: at-most-once, at-least-once, and exactly-once processing; event time, ingest time, processing time
Reactive: ability to process and respond in near real-time; feedback mechanisms to send processed data to live applications
Reliable: highly available, fast failovers
Since the launch of Amazon Kinesis in 2013, the ecosystem has evolved, and we introduced Kinesis Firehose and Kinesis Analytics.
Streams launched in GA at re:Invent 2013, Firehose at re:Invent 2015, and Analytics in August 2016.
We have continuously iterated to make it easier for customers to use streaming data and to expand the functionality of real-time processing.
Together, these three products make up the Amazon Kinesis streaming data platform
Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming data using Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
A shard is a group of data records in a stream. When you create a stream, you specify the number of shards for the stream.
Each shard supports up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The total capacity of a stream is the sum of the capacities of its shards. You can increase or decrease the number of shards in a stream as needed; note, however, that you are charged on a per-shard basis.
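As a hypothetical sizing example: ingesting 2,000 records per second at 2 KB each is roughly 4 MB per second of writes. The 1 MB-per-second per-shard write limit then requires at least 4 shards (the 1,000-records-per-second limit alone would only require 2), and those 4 shards give consumers up to 8 MB per second of aggregate read capacity.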
Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery Stream and apply SQL skills.
Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies
Easy scalability: Elastically scales to match data throughput for most workloads
Easy and interactive experience: Complete most stream processing use cases in minutes, and easily progress toward sophisticated scenarios
Zero admin: Capture and deliver streaming data into S3, Redshift, Elasticsearch, and other AWS destinations without writing an application or managing infrastructure
Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into S3, and other destinations in as little as 60 secs, set up in minutes
Seamless elasticity: Seamlessly scales to match data throughput
Connect to streaming source - Select Kinesis Streams or Firehose delivery streams as input data for your SQL code
Easily write SQL code to process streaming data - Author applications with a single SQL statement, or build sophisticated applications with multi-step pipelines with advanced analytical functions
Continuously deliver SQL results - Configure one to many outputs for your processed data for real-time alerts, visualizations, and batch analytics
There are three major components of a streaming application: an input stream, SQL code, and output destinations.
Inputs include Kinesis Streams and Firehose.
Standard SQL code consists of one or more SQL queries which make up an application. Most applications will have one to three SQL statements that perform ETL after initial schematization, and then several SQL queries to generate real-time analytics on the transformed data. This illustrates the point of the “streaming application” concept; you are chaining SQL statements together in serial and parallel through filters, joins, and merges to build a streaming graph within your application. We believe most customers will have between 5-10 SQL statements, but some customers will build sophisticated applications with 50-100 statements.
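To make the multi-step pipeline idea concrete, here is a minimal two-step sketch under assumed names (CLEAN_STREAM, CLEAN_PUMP, AGG_STREAM, and AGG_PUMP are hypothetical; SOURCE_SQL_STREAM_001 is the default input stream name used elsewhere in this deck): the first pump does light ETL into an intermediate in-application stream, and the second aggregates it per second.

    -- Step 1: ETL pump - drop malformed events into a cleansed stream
    CREATE OR REPLACE STREAM "CLEAN_STREAM" (OS VARCHAR(32), QUADRANT CHAR(1));
    CREATE OR REPLACE PUMP "CLEAN_PUMP" AS
    INSERT INTO "CLEAN_STREAM"
    SELECT STREAM "os", "quadrant"
    FROM "SOURCE_SQL_STREAM_001"
    WHERE "os" IS NOT NULL AND "quadrant" IS NOT NULL;

    -- Step 2: analytics pump - aggregate the cleansed stream per second
    CREATE OR REPLACE STREAM "AGG_STREAM" (QUADRANT CHAR(1), EVENT_COUNT INTEGER);
    CREATE OR REPLACE PUMP "AGG_PUMP" AS
    INSERT INTO "AGG_STREAM"
    SELECT STREAM c.QUADRANT, COUNT(*)
    FROM "CLEAN_STREAM" c
    GROUP BY c.QUADRANT, FLOOR(c.ROWTIME TO SECOND);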
Outputs include Kinesis Streams and Firehose (S3, Redshift, and Elasticsearch, coming at the Chicago Summit). The next outputs will be CloudWatch Metrics and QuickSight, but we have yet to confirm whether these will be in place for GA.
complex, nested JSON structures supported
Inputs are continuously read, processed, and then written to output destination. This is not like a traditional database; data is continuously processed as opposed to running a query and then seeing a single result set.
One level of nesting is supported completely; two levels of nesting are supported for a single structure.
Deeper nesting (2+ levels) is supported by specifying row paths to the desired fields.
Multiple record types are supported through a canonical schema and/or by manipulating structures in SQL using string functions.
Simple mathematical operators (AVG, STDEV, etc.)
String manipulations (SUBSTRING, POSITION)
Tumbling, Sliding, and Custom Windows
Advanced analytics (random sampling, anomaly detection)
Define (streaming) SQL queries against streams.
Join streams to other streams and tables.
Aggregate queries over a time range or a series of rows.
Register specific SQL statements as (stream) views.
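As an illustration of the stream-to-table join mentioned above, a reference data set loaded from S3 can enrich each event. This is a hedged sketch: REGION_LOOKUP and its columns are hypothetical names, while SOURCE_SQL_STREAM_001 matches the default input stream used elsewhere in this deck.

    -- Enrich each event with a region looked up from S3 reference data
    CREATE OR REPLACE STREAM "ENRICHED_STREAM" (OS VARCHAR(32), REGION VARCHAR(32));
    CREATE OR REPLACE PUMP "ENRICH_PUMP" AS
    INSERT INTO "ENRICHED_STREAM"
    SELECT STREAM s."os", r.REGION
    FROM "SOURCE_SQL_STREAM_001" s
    JOIN "REGION_LOOKUP" r
      ON s."quadrant" = r.QUADRANT;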
A STREAM is used to represent both input and in-application data; it is similar to a continuously updated table.
Windows are simplified to make them more approachable to the average SQL developer.
ROWTIME is a special column on streams that lets you specify processing windows; it can be used to window by time of processing (recommended) or by user event time.
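A minimal sketch of the two windowing options (EVENT_TS is a hypothetical TIMESTAMP column mapped from the event payload; everything else follows the deck's examples):

    -- Processing-time window (recommended): group on the ROWTIME system column
    SELECT STREAM COUNT(*) AS EVENTS_PER_MINUTE
    FROM "SOURCE_SQL_STREAM_001" s
    GROUP BY FLOOR(s.ROWTIME TO MINUTE);

    -- Event-time alternative: group on a timestamp carried in the event itself
    -- GROUP BY FLOOR(s."EVENT_TS" TO MINUTE);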
At least once processing - data will be re-processed until it has been successfully delivered to a configured destination
Delivery speed is heavily dependent upon your SQL queries (i.e., streaming ETL will be very fast versus performing 10-minute aggregations)
Output formats include JSON and CSV
This type of application is useful in many scenarios, such as:
Continuously computing business metrics like the number of purchases, and sending results to a data warehouse like Amazon Redshift for visualization with a BI tool like Amazon QuickSight.
Or calculating the number of application errors and sending results to Amazon Elasticsearch for visualization with Kibana.
Amazon Kinesis Analytics provides the mechanisms to take raw, real-time data and output processed, meaningful information to a variety of destinations in real-time.
Tumbling windows:
Repeat
Are non-overlapping
An event can belong to only one tumbling window
Sliding windows:
Continuously move forward in time
Produce output only when an event occurs
Every window has at least one event
Events can belong to more than one sliding window
AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. AWS Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device. You can also use AWS Lambda to create new back-end services where compute resources are automatically triggered based on custom requests. With AWS Lambda you pay only for the requests served and the compute time required to run your code. Billing is metered in increments of 100 milliseconds, making it cost-effective and easy to scale automatically from a few requests per day to thousands per second.