by Joyjeet Banerjee, Enterprise Solutions Architect, AWS
Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to transform and load streaming data into Amazon Redshift so that you can use existing analytics and business intelligence tools to extract information in near real time and respond promptly. In this session, we will dive deep into using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively. Level: 200
(BDT208) A Technical Introduction to Amazon Elastic MapReduce (Amazon Web Services)
"Amazon EMR provides a managed framework that makes it easy, cost-effective, and secure to run data processing frameworks such as Apache Hadoop, Apache Spark, and Presto on AWS. In this session, you learn the key design principles behind running these frameworks in the cloud and the feature set that Amazon EMR offers. We discuss the benefits of decoupling compute and storage, and strategies to take advantage of the scale and parallelism that the cloud offers while lowering costs. Additionally, you hear from AOL’s Senior Software Engineer on how they used these strategies to migrate their Hadoop workloads to the AWS cloud and lessons learned along the way.
In this session, you learn the benefits of decoupling storage and compute and allowing them to scale independently; how to run Hadoop, Spark, Presto, and other supported Hadoop applications on Amazon EMR; how to use Amazon S3 as a persistent data store and process data directly from Amazon S3; deployment strategies and how to avoid common mistakes when deploying at scale; and how to use Spot Instances to scale your transient infrastructure effectively."
In this session, we introduce AWS Glue, provide an overview of its components, and share how you can use AWS Glue to automate discovering your data, cataloging it, and preparing it for analysis.
by Taz Sayed, Sr Technical Account Manager AWS and Marie Yap, Enterprise Solutions Architect AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll learn how to get started, how to support applications, and how to scale.
Need to start querying data instantly? Amazon Athena is an interactive query service that makes it easy to run interactive queries on data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing your data immediately.
In this presentation, we will show you how easy Amazon Athena makes it to query your data stored in S3.
by Joyjeet Banerjee, Solutions Architect, AWS
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3. Level 200
In this session, we will show you how easy it is to start querying your data stored in Amazon S3 with Amazon Athena. First, we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Finally, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
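For readers who want a concrete picture of the two steps this abstract describes, here is a hypothetical sketch: a CREATE EXTERNAL TABLE statement that defines a schema over data already in S3, followed by a standard SQL query against it. The bucket, table, and column names are invented for illustration; in practice you would submit these statements through the Athena console or an SDK client.

```python
# Hypothetical Athena workflow: define a schema over existing S3 data, then
# query it with plain SQL. Nothing is loaded into Athena itself; the table is
# just metadata pointing at the S3 location. All names below are made up.

CREATE_TABLE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
    request_time  string,
    client_ip     string,
    status_code   int,
    bytes_sent    bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
LOCATION 's3://example-bucket/access-logs/'
"""

QUERY = """
SELECT status_code, COUNT(*) AS hits
FROM access_logs
GROUP BY status_code
ORDER BY hits DESC
"""
```

Once the table exists, interactive queries like the one above can be run repeatedly from the built-in query editor without any cluster to manage.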
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL (Amazon Web Services)
Amazon Athena is a new interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3.
In this session, we will show you how easy it is to start querying your data stored in Amazon S3 with Amazon Athena. First, we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Finally, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
In this session, we explore the new Network Load Balancer that was launched as part of the Elastic Load Balancing service and can load balance any kind of TCP traffic. It offers customers a high-performance, scalable, low-cost load balancer that can handle millions of requests per second at very low latencies. Come and learn more about the new Network Load Balancer.
An overview of Amazon Kinesis Firehose, Amazon Kinesis Analytics, and Amazon Kinesis Streams so you can quickly get started with real-time, streaming data.
by Peter Dalton, Principal Consultant AWS and Taz Sayed, Sr Technical Account Manager AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll learn how to get started, how to support applications, and how to scale.
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3.
Performing real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit (Amazon Web Services)
In this session, we discuss several options for performing real-time extract, transform, and load (ETL) using Amazon Kinesis, AWS Lambda, AWS Glue, and Amazon S3. We provide an overview of the different options that have distinct advantages in building real-time ETL applications before loading a data lake or warehouse.
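As a rough illustration of the Lambda-based option mentioned above, the sketch below decodes base64-encoded records as they arrive from a Kinesis stream and applies a simple transformation before they would be written onward to a data lake. The event shape follows the standard Kinesis-to-Lambda record format, but the payload field names and the transformation are invented for illustration.

```python
import base64
import json

def transform(record: dict) -> dict:
    """Decode one Kinesis record and normalize its payload.

    Kinesis delivers record data to Lambda base64-encoded, so the first step
    is always a decode; the renaming below is a made-up example of the kind
    of light ETL done before landing data in S3.
    """
    payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
    return {
        "user_id": payload["userId"],            # rename to snake_case
        "event_type": payload["event"].lower(),  # normalize casing
    }

def handler(event: dict) -> list:
    # A Lambda handler receives a batch of records per invocation.
    return [transform(r) for r in event["Records"]]

# Hand-built event simulating what Kinesis would deliver:
raw = base64.b64encode(json.dumps({"userId": 42, "event": "CLICK"}).encode()).decode()
result = handler({"Records": [{"kinesis": {"data": raw}}]})
```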
Come learn about new and existing Amazon S3 features that can help you better protect your data, save on cost, and improve usability, security, and performance. We will cover a wide variety of Amazon S3 features and go into depth on several newer features with configuration and code snippets, so you can apply the learnings on your object storage workloads.
Data Science & Best Practices for Apache Spark on Amazon EMR (Amazon Web Services)
Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing data volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault-tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
Amazon Elastic MapReduce is one of the largest Hadoop operators in the world. Since its launch five years ago, AWS customers have launched more than 5.5 million Hadoop clusters.
In this talk, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters and other Amazon EMR architectural patterns. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost efficient.
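The cost-efficiency point above is easy to quantify with back-of-the-envelope arithmetic: transient clusters that run nodes on Spot Instances pay a fraction of the on-demand rate. The prices below are made-up placeholders, not real AWS rates.

```python
# Illustrative only: hypothetical per-instance-hour prices, not AWS pricing.
ON_DEMAND_PRICE = 0.40  # $/instance-hour (placeholder)
SPOT_PRICE = 0.12       # $/instance-hour (placeholder)

def cluster_cost(instances: int, hours: float, price: float) -> float:
    """Total cost of a transient cluster of identical instances."""
    return instances * hours * price

# A 20-node cluster running a 3-hour nightly job:
on_demand = cluster_cost(20, 3, ON_DEMAND_PRICE)
spot = cluster_cost(20, 3, SPOT_PRICE)
savings = 1 - spot / on_demand  # fraction saved by using Spot capacity
```

At these placeholder rates the Spot cluster costs 70% less per run, which is why short-lived, fault-tolerant EMR workloads are a natural fit for Spot capacity.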
Speakers:
Ian Meyers, AWS Solutions Architect
Ian McDonald, IT Director, SwiftKey
Organizations need to gain insight and knowledge from a growing number of data sources, including Internet of Things (IoT) devices, APIs, clickstreams, and unstructured and log data. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. In this session, we introduce key ETL features of AWS Glue and cover common use cases ranging from scheduled nightly data warehouse loads to near real-time, event-driven ETL flows for your data lake. We discuss how to build scalable, efficient, and serverless ETL pipelines using AWS Glue. Additionally, Merck will share how they built an end-to-end ETL pipeline for their application release management system and launched it in production in less than a week using AWS Glue.
Today, organizations find themselves in a data-rich world, with a growing need to make all this data agile and accessible for analysis so they can derive insights that drive strategic decisions. Creating a data lake helps you manage all the disparate sources of data you are collecting (in its original format) and extract value. In this session, learn how to architect and implement a data lake in the AWS Cloud. Learn about best practices as we walk through architectural blueprints.
Amazon Kinesis Analytics is the easiest way to process streaming data in real time with standard SQL, without having to learn new programming languages or processing frameworks. Amazon Kinesis Analytics enables you to create and run SQL queries on streaming data so that you can gain actionable insights and respond to your business and customer needs promptly. In this session, we will provide an overview of the capabilities of Amazon Kinesis Analytics. We will show you how you can build an entire stream processing pipeline to collect, ingest, process, and emit streaming data using Amazon Kinesis Analytics, Amazon Kinesis Firehose, and Amazon Kinesis Streams.
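To make the idea of SQL over streams concrete, here is a pure-Python simulation of a tumbling-window aggregation, the kind of query Kinesis Analytics expresses in streaming SQL. This only illustrates the windowing semantics; the event data and window size are invented.

```python
from collections import defaultdict

def tumbling_counts(events, window_seconds=60):
    """Count events per type in fixed, non-overlapping time windows.

    events: iterable of (epoch_seconds, event_type) pairs.
    Returns {window_index: {event_type: count}} -- the same result a
    streaming-SQL GROUP BY over a tumbling window would emit, computed here
    in one batch for clarity.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, etype in events:
        windows[ts // window_seconds][etype] += 1
    return {w: dict(c) for w, c in windows.items()}

# Three events: two clicks in the first minute, one view in the second.
counts = tumbling_counts([(0, "click"), (30, "click"), (61, "view")])
```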
Amazon Kinesis provides services for you to work with streaming data on AWS. Learn how to load streaming data continuously and cost-effectively to Amazon S3 and Amazon Redshift using Amazon Kinesis Firehose without writing custom stream processing code. Get an introduction to building custom stream processing applications with Amazon Kinesis Streams for specialized needs.
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce SPICE - a new Super-fast, Parallel, In-memory Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides smart visualizations and graphs optimized for your different data types, ensuring the most suitable visualization for your analysis, and how to share these visualization stories using the built-in collaboration tools.
Presented by: Matthew McClean, AWS Partner Solutions Architect, Amazon Web Services
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Noritaka Sekiyama)
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Hadoop / Spark Conference Japan 2019)
# English version #
http://hadoop.apache.jp/hcj2019-program/
by Joyjeet Banerjee, Enterprise Solutions Architect, AWS
Amazon Aurora is a MySQL- and PostgreSQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep dive session, we’ll discuss best practices and explore new features in areas like high availability, security, performance management and database cloning. Level 300
Operations: Production Readiness Review – How to Stop Bad Things from Happening (Amazon Web Services)
There is more to deploying code than pushing the deploy button. A good practice that many companies follow is a Production Readiness Review (PRR), essentially a pre-flight checklist before a service launches. It helps ensure new services are properly architected, monitored, secured, and more. We’ll walk through an example PRR and discuss the value of ensuring each item is properly taken care of before your service launches.
The presentation at DevFest Tokyo 2017 / @__timakin__
An introduction to blockchain and why Go is a good fit for implementing one.
It also describes blockchain projects built in Go.
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosa Riera (Spark Summit)
Spark Streaming has supported Kafka since its inception, but a lot has changed since then, on both the Spark and Kafka sides, to make this integration more fault-tolerant and reliable. Apache Kafka 0.10 (actually since 0.9) introduced the new Consumer API, built on top of a new group coordination protocol provided by Kafka itself. So a new Spark Streaming integration comes to the playground, with a design similar to the 0.8 Direct DStream approach. However, there are notable differences in usage, and many exciting new features. In this talk, we will cover the main differences between this new integration and the previous one (for Kafka 0.8), and why Direct DStreams have replaced Receivers for good. We will also see how to achieve different semantics (at-least-once, at-most-once, exactly-once) with code examples. Finally, we will briefly introduce how Billy Mobile uses this integration to ingest and process the continuous stream of events from our ad network.
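A toy sketch of the exactly-once idea the talk covers: commit processed results and consumer offsets together in one atomic step, so that a batch replayed after a failure is never double-applied. The "store" here is an in-memory dict standing in for a transactional sink; real implementations commit offsets alongside output in the same database transaction.

```python
# Output sink and the last committed offset, updated together. In production
# this would be a single transactional write; a dict suffices to show the logic.
store = {"results": [], "offset": 0}

def process_batch(records, start_offset):
    """Apply a batch of records idempotently.

    If the batch starts before the committed offset (i.e. it is a replay
    after a crash), skip the records that were already applied.
    """
    if start_offset < store["offset"]:
        records = records[store["offset"] - start_offset:]
        start_offset = store["offset"]
    # Commit results and the new offset "atomically".
    store["results"].extend(r.upper() for r in records)
    store["offset"] = start_offset + len(records)

process_batch(["a", "b"], 0)
process_batch(["a", "b", "c"], 0)  # simulated replay overlapping the first batch
```

Despite "a" and "b" being delivered twice, each record is applied exactly once because the offset check deduplicates the replay.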
Go's simplicity and concurrency model make it an appealing choice for backend systems, but how does it fare for latency-sensitive applications? In this talk, we explore the other side of the coin by providing some tips on writing high-performance Go and lessons learned in the process. We do a deep dive on low-level performance optimizations in order to make Go a more compelling option in the world of systems programming, but we also consider the trade-offs involved.
Andrew Betts Web Developer, The Financial Times at Fastly Altitude 2016
Running custom code at the Edge using a standard language is one of the biggest advantages of working with Fastly’s CDN. Andrew gives you a tour of all the problems the Financial Times and Nikkei solve in VCL and how their solutions work.
Want to build a custom app for Google Home or Google Assistant? Learn the basic concepts and how you can create a custom app to reach your users on new platforms (Google Home, Android, iPhone, and more) and help them get things done.
We'll use serverless tools like Google Cloud Functions as well as API.AI to do intelligent routing of commands to entities and intents.
Video of this talk available at: https://www.youtube.com/watch?v=C492KgDMO0c&list=PLlCd2ljeqltbJQQ79eyxbresnaKkP0TgS&index=1
(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose (Amazon Web Services)
Amazon Kinesis Firehose is a fully-managed, elastic service to deliver real-time data streams to Amazon S3, Amazon Redshift, and other destinations. In this session, we start with overviews of Amazon Kinesis Firehose and Amazon Kinesis Analytics. We then discuss how Amazon Kinesis Firehose makes it even easier to get started with streaming data, without writing a stream processing application or provisioning a single resource. You learn about the key features of Amazon Kinesis Firehose, including its companion agent that makes emitting data from data producers even easier. We walk through capture and delivery with an end-to-end demo, and discuss key metrics that will help developers and architects understand their streaming data flow. Finally, we look at some patterns for data consumption as the data streams into S3. We show two examples: using AWS Lambda, and how you can use Apache Spark running within Amazon EMR to query data directly in Amazon S3 through EMRFS.
AWS October Webinar Series - Introducing Amazon Kinesis Firehose (Amazon Web Services)
Extracting real-time information from streaming data generated by mobile devices, sensors, and servers used to require distributed systems skills and writing custom code.
Amazon Kinesis Firehose makes it easy to load streaming data into AWS, and enables you to process or analyze data in near real-time with business intelligence tools you’re already using.
In this webinar, we will provide an overview of Amazon Kinesis Firehose. We will then walk through a demo showing how to create an Amazon Kinesis Firehose delivery stream, send data to the stream, and configure it to load the data automatically into Amazon S3 and Amazon Redshift.
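As a hypothetical illustration of the "send data to the stream" step in the demo, the sketch below only builds record payloads so their shape is visible; the real call would go through an AWS SDK client's put-record API against a delivery stream you created. The field names and values are invented.

```python
import json

def make_record(event: dict) -> dict:
    """Build one Firehose-style record from a JSON-serializable event.

    Firehose delivers raw bytes to the destination; newline-delimited JSON
    keeps the resulting S3 objects easy to query later.
    """
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}

# A small batch of made-up sensor readings:
records = [make_record({"sensor": i, "temp_c": 20 + i}) for i in range(3)]
```

Firehose then buffers records by size or time interval and writes them in batches to the configured destination (S3, Redshift, and so on), with no consumer code to run.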
Amazon Kinesis Firehose launched at the end of 2015 and provides an easy way for customers to ingest streaming data into Amazon Redshift, Amazon Elasticsearch Service, and Amazon S3. In 2016, we introduced new features that help customers implement in-line processing within Kinesis Firehose using AWS Lambda. Amazon Kinesis Firehose makes it easy to load real-time, streaming data into AWS without having to build custom stream processing applications. This session is designed for developers, data engineers, and analysts who are looking to load, analyze, and get powerful insights from real-time streaming data using existing analytics tools. We will explore key features and walk through how to use Kinesis Firehose to ingest streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016 (Amazon Web Services)
Amazon Kinesis provides services for you to work with streaming data on AWS. Learn how to load streaming data continuously and cost-effectively to Amazon S3 and Amazon Redshift using Amazon Kinesis Firehose without writing custom stream processing code. Get an introduction to building custom stream processing applications with Amazon Kinesis Streams for specialized needs.
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20... | Amazon Web Services
Log analytics is a common big data use case that allows you to analyze log data from websites, mobile devices, servers, sensors, and more for a wide variety of applications including digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. In this tech talk, we will walk you step-by-step through the process of building an end-to-end analytics solution that ingests, transforms, and loads streaming data using Amazon Kinesis Firehose, Amazon Kinesis Analytics and AWS Lambda. The processed data will be saved to an Amazon Elasticsearch Service cluster, and we will use Kibana to visualize the data in near real-time.
Learning Objectives:
1. Reference architecture for building a complete log analytics solution
2. Overview of the services used and how they fit together
3. Best practices for log analytics implementation
In this session, we will present a demonstration of air traffic control and defense using real-time processing. We will cover best practices for data ingestion, storage, processing, and visualization using AWS services such as Kinesis, DynamoDB, Lambda, Redshift, QuickSight, and Amazon Machine Learning.
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics... | Amazon Web Services
It is becoming increasingly important to analyze real-time streaming data, because it allows organizations to remain competitive by uncovering relevant, actionable insights. AWS makes it easy to capture, store, and analyze real-time streaming data.
In this webinar, we will guide you through some of the proven architectures for processing streaming data, using a combination of tools including Amazon Kinesis Streams, AWS Lambda, and Spark Streaming on Amazon Elastic MapReduce (EMR). We will then talk about common use cases and best practices for real-time data analysis on AWS.
Learning Objectives:
Understand how you can analyze real-time data streams using Amazon Kinesis, AWS Lambda, and Spark running on Amazon EMR
Learn use cases and best practices for streaming data applications on AWS
AWS has a large and growing portfolio of big data management and analytics services, designed to integrate into solution architectures to meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, to explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose | Amazon Web Services
Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to transform and load streaming data into Amazon Redshift so that you can use existing analytics and business intelligence tools to extract information in near real-time and respond promptly. In this session, we will dive deep using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively.
Deep Dive and Best Practices for Real Time Streaming Applications | Amazon Web Services
Get answers to technical questions, frequently asked by those starting to work with streaming data. Learn best practices for building a real-time streaming data architecture on AWS with Amazon Kinesis, Spark Streaming, AWS Lambda, and Amazon EMR. First, we will focus on building a scalable, durable streaming data ingestion workflow from data producers like mobile devices, servers, or even web browsers. We will provide guidelines to minimize duplicates and achieve exactly-once processing semantics in your stream-processing applications. Then, we will show some of the proven architectures for processing streaming data using a combination of tools including Amazon Kinesis Stream, AWS Lambda, and Spark Streaming running on Amazon EMR.
Cloud Data Migration Strategies - AWS May 2016 Webinar Series | Amazon Web Services
AWS offers a variety of methods to migrate your data into the cloud. You may want to perform regular backups, start collecting device streams, migrate a single large datastore, or simply establish dedicated connectivity and figure out what to do next. Which AWS cloud data migration offering is right for your needs?
This webinar will give you an overview of the six data migration tools we offer, including the strengths and weaknesses of each, as well as their complementary opportunities.
Learning Objectives:
• An overview of cloud data migration
• The basics of the six services (Direct Connect, Storage Gateway, Snowball, Transfer Acceleration, Firehose, 3rd party partners)
• An overview of the Amazon Content Distribution network and how it can help with long distance transfers into and out of the cloud
• Special emphasis on the new Amazon S3 Transfer Acceleration feature
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud | Amazon Web Services
Different types and sizes of data require different strategies. In this session, learn about the various features and services available for migrating data, be it small ongoing transactional data or large multi-petabyte volumes. Come learn how customers are using the latest network, streaming and large scale ingest features for their cloud data migrations to AWS storage services.
Presented at the AWS Summit in London, here's a deep dive on getting started with Amazon Kinesis and use-case with Jampp, the world's leading mobile app marketing platform.
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri... | Amazon Web Services
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce SPICE, a new Super-fast, Parallel, In-memory Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides you with smart visualizations and graphs that are optimized for your different data types, to ensure the most suitable and appropriate visualization for your analysis, and how to share these visualization stories using the built-in collaboration tools.
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data | Amazon Web Services
Analyzing large data sets requires significant compute and storage capacity that can vary in size based on the amount of input data and the analysis required. This characteristic of big data workloads is ideally suited to the pay-as-you-go cloud model, where applications can easily scale up and down based on demand. Learn how Amazon S3 can help scale your big data platform. Hear from Redfin and Twitter about how they build their big data platforms on AWS and how they use S3 as an integral piece of their big data platforms.
AWS Webcast - Managing Big Data in the AWS Cloud_20140924 | Amazon Web Services
This presentation deck will cover specific services such as Amazon S3, Kinesis, Redshift, Elastic MapReduce, and DynamoDB, including their features and performance characteristics. It will also cover architectural designs for the optimal use of these services based on dimensions of your data source (structured or unstructured data, volume, item size and transfer rates) and application considerations - for latency, cost and durability. It will also share customer success stories and resources to help you get started.
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
1. Pop-up Loft
Streaming Data Analytics with Amazon Kinesis Firehose and
Redshift
Joyjeet Banerjee
Enterprise Solutions Architect
3. What to Expect From This Session
Amazon Kinesis streaming data on the AWS cloud
• Amazon Kinesis Streams
• Amazon Kinesis Firehose (focus of this session)
• Amazon Kinesis Analytics
4. What to Expect From This Session
Amazon Kinesis and streaming data on the AWS cloud
Amazon Kinesis Firehose
• Key concepts
• Experience for S3 and Redshift
• Putting data using the Amazon Kinesis Agent and Key APIs
• Pricing
• Data delivery patterns
• Understanding key metrics
• Troubleshooting
5. Amazon Kinesis: Streaming Data Done the AWS Way
Makes it easy to capture, deliver, and process real-time data streams
Pay as you go, no up-front costs
Elastically scalable
Right services for your specific use cases
Real-time latencies
Easy to provision, deploy, and manage
6. Amazon Kinesis Streams
[Architecture diagram] Fully managed service for real-time processing of streaming data: millions of sources producing 100s of terabytes per hour send data through an authenticating, authorizing front end into durable, highly consistent storage that replicates data across three data centers (Availability Zones). The ordered stream of events supports multiple readers: custom-built streaming applications (KCL), real-time dashboards and alarms, machine learning algorithms or sliding-window analytics, aggregation and archival to S3, and aggregate analysis in Hadoop or a data warehouse. Inexpensive: $0.014 per 1,000,000 PUT Payload Units.
9. Amazon Kinesis Streams, features from customer…
Kinesis Producer Library
PutRecords API, 500 records or 5 MB payload
Kinesis Client Library in Python, Node.js, Ruby…
Server-Side Timestamps
Increased individual max record payload 50 KB to 1 MB
Reduced end-to-end propagation delay
Extended Stream Retention from 24 hours to 7 days
10. Amazon Kinesis Streams
Build your own data streaming applications
Easy administration: Simply create a new stream, and set the desired level of
capacity with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming big data
using Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
11. Sushiro: Kaiten Sushi Restaurants
380 stores stream data from sushi plate sensors to Kinesis
14. Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3 and Amazon Redshift
Zero administration: Capture and deliver streaming data into S3, Redshift, and
other destinations without writing an application or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data
for delivery into data destinations in as little as 60 seconds using simple configurations.
Seamless elasticity: Seamlessly scales to match data throughput without intervention
Capture and submit streaming data to Firehose → Firehose loads streaming data continuously into S3 and Redshift → analyze streaming data using your favorite BI tools
16. Amazon Kinesis Firehose: 3 Simple Concepts
1. Delivery Stream: The underlying entity of Firehose. Use Firehose by creating a delivery stream to a specified destination and sending data to it.
• You do not have to create a stream or provision shards.
• You do not have to specify partition keys.
2. Records: A data producer sends data blobs as large as 1,000 KB to a delivery stream. Each data blob is called a record.
3. Data Producers: Producers send records to a delivery stream. For example, a web server that sends log data to a delivery stream is a data producer.
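The three concepts above lend themselves to a very small producer sketch. The stream name and event fields below are hypothetical, and Python with boto3 is an assumption (the deck shows no code); the boto3 import sits inside the sender so that actually sending, which needs AWS credentials, stays optional.

```python
import json

def build_record(event: dict) -> dict:
    # Firehose accepts opaque data blobs up to 1,000 KB; a trailing
    # newline keeps records separable after Firehose concatenates them.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}

def send_event(stream_name: str, event: dict) -> None:
    # Requires AWS credentials. Note what is absent here, per the
    # concepts above: no stream/shard provisioning, no partition keys.
    import boto3  # local import keeps build_record testable offline
    firehose = boto3.client("firehose")
    firehose.put_record(DeliveryStreamName=stream_name,
                        Record=build_record(event))

# Example (not executed here):
# send_event("my-delivery-stream", {"page": "/home", "latency_ms": 42})
```

The newline delimiter is a common convention when the destination tooling (e.g. Redshift COPY with JSON) expects one record per line; it is a choice, not a Firehose requirement.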
17. Amazon Kinesis Firehose Console Experience
Unified Console Experience for Firehose and Streams
18. Amazon Kinesis Firehose Console Experience (S3)
Create fully managed resources for delivery without building an app
19. Amazon Kinesis Firehose Console Experience
Configure data delivery options simply using the console
22. Amazon Kinesis Firehose to Redshift
A two-step process
1. Use a customer-provided S3 bucket as an intermediate destination.
• Still the most efficient way to do large-scale loads to Redshift.
• Never lose data: it is always safe and available in your S3 bucket.
2. Firehose issues the customer-provided COPY command synchronously. It continuously issues a new COPY command once the previous COPY command is finished and acknowledged back from Redshift.
23. Amazon Kinesis Firehose Console
Configure data delivery to Redshift simply using the console
24. Amazon Kinesis Agent
Software agent makes submitting data to Firehose easy
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Delivers all data in a reliable, timely, and simple manner
• Emits AWS CloudWatch metrics to help you better monitor and
troubleshoot the streaming process
• Supported on Amazon Linux AMI with version 2015.09 or later, or Red Hat
Enterprise Linux version 7 or later. Install on Linux-based server
environments such as web servers, front ends, log servers, and more
• Also enabled for Streams
25. Amazon Kinesis Firehose API Overview
• CreateDeliveryStream: Create a delivery stream. You specify the S3 bucket information where your data will be delivered.
• DeleteDeliveryStream: Delete a delivery stream.
• DescribeDeliveryStream: Describe a delivery stream and return its configuration information.
• ListDeliveryStreams: Return a list of delivery streams under this account.
• UpdateDestination: Update the destination S3 bucket information for a delivery stream.
• PutRecord: Put a single data record (up to 1,000 KB) to a delivery stream.
• PutRecordBatch: Put multiple data records (up to 500 records or 5 MB) to a delivery stream.
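The PutRecordBatch limits above (500 records or 5 MB per call) can be respected with a small batching helper; a sketch only, with a hypothetical stream name, and the actual send kept behind a function that requires AWS credentials.

```python
MAX_BATCH_RECORDS = 500            # PutRecordBatch: up to 500 records...
MAX_BATCH_BYTES = 5 * 1024 * 1024  # ...or 5 MB per call

def chunk_records(blobs):
    """Split raw byte payloads into PutRecordBatch-sized batches."""
    batches, current, current_bytes = [], [], 0
    for blob in blobs:
        # Start a new batch when adding this blob would break either limit.
        if current and (len(current) >= MAX_BATCH_RECORDS
                        or current_bytes + len(blob) > MAX_BATCH_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"Data": blob})
        current_bytes += len(blob)
    if current:
        batches.append(current)
    return batches

def send_all(stream_name, blobs):
    # Requires AWS credentials. PutRecordBatch can partially fail, so a
    # real producer should inspect FailedPutCount and retry failed entries.
    import boto3
    firehose = boto3.client("firehose")
    for batch in chunk_records(blobs):
        firehose.put_record_batch(DeliveryStreamName=stream_name,
                                  Records=batch)
```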
26. Amazon Kinesis Firehose Pricing
Simple, pay-as-you-go and no up-front costs
Per GB of data ingested: $0.035
28. Amazon Kinesis Streams is a service for workloads that require custom
processing, per incoming record, with sub-1-second processing latency,
and a choice of stream processing frameworks.
Amazon Kinesis Firehose is a service for workloads that require
zero administration, the ability to use existing analytics tools based
on S3 or Redshift, and a data latency of 60 seconds or higher.
31. Data Delivery Methodology for Amazon S3
Controlling size and frequency of S3 objects
• A single delivery stream delivers to a single S3 bucket
• Buffer size/interval values control the size and frequency of data delivery
• Buffer size: 1 MB to 128 MB, or
• Buffer interval: 60 to 900 seconds
• Firehose concatenates records into a single larger object
• Whichever condition is satisfied first triggers data delivery
• (Optional) compress data after it is buffered
• GZIP, ZIP, SNAPPY
• Delivered S3 objects might be smaller than the total data put into Firehose
• For Amazon Redshift, GZIP is the only supported format currently
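The "whichever condition is satisfied first" rule can be sketched as a tiny predicate. This is an illustrative model only (the real buffering happens inside the service), and the 5 MB / 300 s defaults are assumptions, not stated on the slide.

```python
def should_flush(buffered_bytes: int, seconds_since_flush: float,
                 size_mb: int = 5, interval_s: int = 300) -> bool:
    # Delivery triggers on whichever buffering condition is met first:
    # buffer size (1-128 MB) or buffer interval (60-900 seconds).
    assert 1 <= size_mb <= 128 and 60 <= interval_s <= 900
    return (buffered_bytes >= size_mb * 1024 * 1024
            or seconds_since_flush >= interval_s)
```

Tuning is a throughput/latency trade: a larger buffer size yields fewer, bigger S3 objects (cheaper downstream loads), while a shorter interval bounds how stale delivered data can be.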
32. Data Delivery Methodology for Amazon S3
AWS IAM Roles and Encryption
• Firehose needs an IAM role to access your S3 bucket
• The delivery stream assumes the specified role to gain access to the target S3 bucket
• (Optional) encrypt data with AWS Key Management Service
• AWS KMS is a managed service that makes it easy for you to create and control the encryption keys used to encrypt your data
• Uses Hardware Security Modules (HSMs) to protect keys
• Firehose passes the KMS key ID to S3, which uses the existing integration
• S3 uses the bucket and object name as encryption context
33. Data Delivery Methodology for Amazon S3
S3 Object Name Format
• Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH before putting objects to Amazon S3. This translates to an Amazon S3 folder structure; each ‘/’ is a sub-folder
• Specify an S3 prefix to get your own top-level folder
• Modify this by adding your own top-level folder with ‘/’ to get myApp/YYYY/MM/DD/HH
• Or simply prepend text without a ‘/’ to get myapp YYYY/MM/DD/HH
• Expect the following naming pattern: “DeliveryStreamName-DeliveryStreamVersion-YYYY-MM-DD-HH-MM-SS-RandomString”
• DeliveryStreamVersion == 1 if the delivery stream config is not updated
• DeliveryStreamVersion increases by 1 for every config change
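The naming pattern above can be reproduced with a small helper; a sketch in which the prefix is hypothetical and the random suffix is a placeholder (the real suffix is generated by Firehose).

```python
from datetime import datetime, timezone

def s3_object_key(stream: str, version: int, now: datetime,
                  prefix: str = "", random_suffix: str = "RandomString") -> str:
    # Firehose prepends a UTC YYYY/MM/DD/HH "folder" path, then names the
    # object DeliveryStreamName-Version-YYYY-MM-DD-HH-MM-SS-RandomString.
    folder = now.strftime("%Y/%m/%d/%H")
    name = f"{stream}-{version}-{now.strftime('%Y-%m-%d-%H-%M-%S')}-{random_suffix}"
    return f"{prefix}{folder}/{name}"

key = s3_object_key("my-stream", 1,
                    datetime(2016, 3, 1, 14, 30, 5, tzinfo=timezone.utc),
                    prefix="myApp/")
# → "myApp/2016/03/01/14/my-stream-1-2016-03-01-14-30-05-RandomString"
```

Because the hour folder is part of the key, downstream jobs (Hive partitions, COPY prefixes) can select an hour of data by listing a single prefix.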
34. Data Delivery Methodology for Amazon S3
If Firehose cannot reach your S3 bucket
• If your Amazon S3 bucket is not reachable by Amazon
Kinesis Firehose, Firehose tries to access the S3 bucket
every 5 seconds until the issue is resolved.
• Firehose durably stores your data for 24 hours
• It resumes delivery (per configurations) as soon as your S3
bucket becomes available and catches up
• Data can be lost if bucket is not accessible for > 24 hours
36. Data Delivery Methodology for Amazon Redshift
2-step process executed on your behalf
• Use customer-provided S3 bucket as an intermediate
destination
• Still the most efficient way to do large scale loads to Redshift
• Never lose data, always safe, and available in your S3 bucket
• Firehose issues customer-provided Redshift COPY
command synchronously
• Loads data from the intermediate S3 bucket to the Redshift cluster
• Single delivery stream loads into a single Redshift cluster,
database, and table
37. Data Delivery Methodology for Amazon Redshift
Frequency of loads into Redshift Cluster
• Based on delivery to S3 – buffer size and interval
• Redshift COPY command frequency
• Continuously issues the Redshift COPY command once the
previous one is acknowledged from Redshift.
• Frequency of COPYs from Amazon S3 to Redshift is determined
by how fast your Amazon Redshift cluster can finish the COPY
command
• For efficiency, Firehose uses manifest copy
• Intermediate S3 bucket contains ‘manifests’ folder – that holds a
manifest of files that are to be copied into Redshift
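The manifest-based COPY that Firehose issues on your behalf can be approximated as a SQL string builder. A sketch only: the table, manifest URI, and role ARN below are hypothetical, and the exact options are part of the customer-provided COPY command, so treat this shape as illustrative.

```python
def manifest_copy_sql(table: str, manifest_s3_uri: str, iam_role_arn: str) -> str:
    # MANIFEST makes one COPY cover exactly the staged objects listed in
    # the manifest file; GZIP matches the only compression format Firehose
    # currently supports for Redshift delivery.
    return (f"COPY {table} "
            f"FROM '{manifest_s3_uri}' "
            f"CREDENTIALS 'aws_iam_role={iam_role_arn}' "
            f"MANIFEST GZIP")

sql = manifest_copy_sql(
    "events",                                    # hypothetical table
    "s3://my-bucket/manifests/2016/03/01/m1",    # hypothetical manifest
    "arn:aws:iam::123456789012:role/firehose",   # hypothetical role
)
```

The manifest is what keeps loads exact: without it, a prefix-based COPY could pick up objects that arrived mid-load or re-load ones already copied.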
38. Data Delivery Methodology for Amazon Redshift
Redshift COPY command mechanics
• Firehose issues the Redshift COPY command with zero
error tolerance
• If a single record within a file fails, the whole batch of files
within the COPY command fails
• Firehose does not perform partial file or batch copies to
your Amazon Redshift table
• Skipped objects' information is delivered to S3 bucket as
manifest file in the errors folder
39. Data Delivery Methodology for Amazon Redshift
If the Redshift cluster is not accessible
• If Firehose cannot reach the cluster, it retries every 5 minutes for up to a total of 60 minutes
• If the cluster is still not reachable after 60 minutes, Firehose skips the current batch of S3 objects and moves on to the next
• As before, skipped objects' information is delivered to the S3 bucket as a manifest file in the errors folder
• Use this information for manual backfill as appropriate
41. Amazon Kinesis Firehose Monitoring
Load to S3 specific metrics
• Incoming.Bytes: The number of bytes ingested into the Firehose delivery stream.
• Incoming.Records: The number of records ingested into the Firehose delivery stream.
• DeliveryToS3.Bytes: The number of bytes delivered to Amazon S3.
• DeliveryToS3.DataFreshness: Age of the oldest record in Firehose. Any record older than this age has been delivered to the S3 bucket.
• DeliveryToS3.Records: The number of records delivered to Amazon S3.
• DeliveryToS3.Success: Sum of successful Amazon S3 put commands over the sum of all Amazon S3 put commands.
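These metrics live in CloudWatch and can be pulled programmatically. A sketch assuming boto3, the AWS/Firehose namespace, and the DeliveryStreamName dimension; the request-building helper is plain Python, while the actual fetch needs AWS credentials.

```python
from datetime import datetime, timedelta, timezone

def freshness_query(stream_name: str) -> dict:
    # Parameters for CloudWatch GetMetricStatistics: last hour of
    # DeliveryToS3.DataFreshness, 5-minute periods, worst (Maximum) value.
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": "DeliveryToS3.DataFreshness",
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": now - timedelta(hours=1),
        "EndTime": now,
        "Period": 300,
        "Statistics": ["Maximum"],
    }

def fetch_freshness(stream_name: str):
    # Requires AWS credentials; illustrative only.
    import boto3
    cloudwatch = boto3.client("cloudwatch")
    return cloudwatch.get_metric_statistics(**freshness_query(stream_name))
```

The same query shape, with a different MetricName, covers the other metrics listed above (Incoming.Bytes, DeliveryToS3.Success, and so on).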
42. Amazon Kinesis Firehose Monitoring
Load to Redshift specific metrics
• DeliveryToRedshift.Bytes: The number of bytes copied to Amazon Redshift.
• DeliveryToRedshift.Records: The number of records copied to Amazon Redshift.
• DeliveryToRedshift.Success: Sum of successful Amazon Redshift COPY commands over the sum of all Amazon Redshift COPY commands.
43. Amazon Kinesis Firehose Troubleshooting
Is data flowing end-to-end?
• Key question: Is data flowing through Firehose from
producers to the S3 destination end-to-end?
• Key metrics: Incoming.Bytes, and Incoming.Records
metrics along with DeliveryToS3.Success and
DeliveryToRedshift.Success can be used to confirm
that data is flowing end-to-end
• Additionally, check the Put* APIs' Bytes, Latency, and
Request metrics for Firehose and/or S3
44. Amazon Kinesis Firehose Troubleshooting
What is being captured, and what is being delivered?
• Key question: How much data is being captured and delivered to the
destination?
• Key metrics: Check Put* Records, Bytes, and Requests to determine
the data volume captured by Firehose. Next, check
DeliveryToS3.Bytes/Records and
DeliveryToRedshift.Bytes/Records
• Additionally, check the S3 bucket-related metrics above to confirm
the data volume delivered to S3. Verify against Incoming.Records and
Incoming.Bytes
45. Amazon Kinesis Firehose Troubleshooting
Is my data showing up on time?
• Key question: Is Firehose delivering data in a timely
way?
• Key metrics: DeliveryToS3.DataFreshness, in seconds,
helps determine the freshness of data. It shows the
maximum age in seconds of the oldest undelivered record
in Firehose.
• Any record older than this age has been delivered
to S3.
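A common way to act on this metric is a CloudWatch alarm on DeliveryToS3.DataFreshness. The sketch below is an assumption-laden example: the threshold rule (a multiple of the configured buffer interval) and the alarm/topic names are illustrative, and the AWS-calling function is defined but not executed.

```python
def freshness_threshold_seconds(buffer_interval_seconds, safety_factor=3):
    """A starting-point alarm threshold: a few multiples of the delivery
    stream's buffer interval (an assumption; tune for your workload)."""
    return buffer_interval_seconds * safety_factor

def create_freshness_alarm(stream_name, threshold_seconds, sns_topic_arn):
    """Alarm when DeliveryToS3.DataFreshness exceeds the threshold
    (requires boto3 and AWS credentials; not run here)."""
    import boto3
    cw = boto3.client("cloudwatch")
    cw.put_metric_alarm(
        AlarmName=stream_name + "-data-freshness",
        Namespace="AWS/Firehose",
        MetricName="DeliveryToS3.DataFreshness",
        Dimensions=[{"Name": "DeliveryStreamName", "Value": stream_name}],
        Statistic="Maximum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=float(threshold_seconds),
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[sns_topic_arn],
    )
```

With a 300-second buffer interval and the default safety factor, the alarm would fire once records sit undelivered for more than 15 minutes.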
46. Amazon Kinesis Firehose Troubleshooting
Is something wrong with my Firehose, bucket, or cluster?
• Key question: Is something wrong with Firehose or with the
customer's own configuration, such as the S3 bucket?
• Key metrics: The DeliveryToS3.Success metric computes the
sum of successful S3 puts over the sum of all S3 puts issued by
Firehose. Similarly, the DeliveryToRedshift.Success metric
computes the sum of successful COPY commands over the sum
of all COPY commands.
• If this trends below 1, it suggests potential issues with the S3
bucket configuration such that Firehose puts to S3 are failing.
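Because the Success metrics are ratios, the health check reduces to "is the average still effectively 1?". A minimal sketch, assuming the samples are Average datapoints of DeliveryToS3.Success or DeliveryToRedshift.Success; the tolerance value is an assumption to absorb floating-point noise.

```python
def delivery_healthy(success_samples, tolerance=0.999):
    """success_samples: Average values of DeliveryToS3.Success or
    DeliveryToRedshift.Success over recent periods. A mean trending
    below 1 means some puts or COPY commands are failing."""
    if not success_samples:
        return False  # no datapoints at all is itself worth investigating
    return sum(success_samples) / len(success_samples) >= tolerance
```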
47. Amazon Kinesis Firehose Troubleshooting
Some key steps if data isn't showing up in S3
• Check the Incoming.Bytes and Incoming.Records metrics to ensure
that data is being put successfully.
• Check the DeliveryToS3.Success metric to ensure that Firehose is
putting data to your S3 bucket.
• Ensure that the S3 bucket specified in your delivery stream still
exists.
• Ensure that the IAM role specified in your delivery stream has
access to your S3 bucket.
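The bucket-existence step above can be scripted. A sketch assuming the destination is identified by its bucket ARN, as it appears in the delivery stream's S3 configuration; the AWS-calling function is defined but not executed.

```python
def bucket_from_arn(bucket_arn):
    """Bucket name from an S3 bucket ARN such as 'arn:aws:s3:::my-bucket'."""
    return bucket_arn.split(":::", 1)[1]

def s3_destination_exists(bucket_arn):
    """Confirm the delivery stream's destination bucket still exists and is
    reachable with the caller's credentials (requires boto3; not run here).
    head_bucket raises ClientError for both missing buckets (404) and
    access problems (403), so either failure surfaces here."""
    import boto3
    from botocore.exceptions import ClientError
    s3 = boto3.client("s3")
    try:
        s3.head_bucket(Bucket=bucket_from_arn(bucket_arn))
        return True
    except ClientError:
        return False
```

Note this checks the caller's access, not the delivery stream's IAM role; verifying the role itself means inspecting its attached policies.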
48. Amazon Kinesis Firehose Troubleshooting
Some key steps if data isn't showing up in Redshift
• Check the Incoming.Bytes and Incoming.Records metrics to ensure
that data is being put successfully, and the DeliveryToS3.Success
metric to ensure that data is being staged in the S3 bucket.
• Check the DeliveryToRedshift.Success metric to ensure that
Firehose has copied data from your S3 bucket to the cluster.
• Check the Redshift STL_LOAD_ERRORS table to verify the reason
for the COPY failure.
• Ensure that the Redshift configuration in Firehose is accurate.
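The STL_LOAD_ERRORS step looks like this in practice: connect to the cluster with any DB-API driver (psycopg2, for example, since Redshift speaks the PostgreSQL protocol) and query the most recent failures. The query function is a sketch and is not executed here.

```python
# starttime, filename, line_number, colname, and err_reason are columns of
# Redshift's stl_load_errors system table.
RECENT_LOAD_ERRORS_SQL = """
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
"""

def recent_load_errors(conn):
    """Fetch the latest COPY failures; `conn` is any DB-API connection
    to the cluster (e.g. from psycopg2.connect). Not run here."""
    with conn.cursor() as cur:
        cur.execute(RECENT_LOAD_ERRORS_SQL)
        return cur.fetchall()
```

The err_reason column usually points directly at the fix, such as a column count mismatch or a bad timestamp format in the staged S3 objects.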
50. Amazon Kinesis Analytics
Analyze data streams continuously with standard SQL
• Apply SQL on streams: Easily connect to data streams and apply
existing SQL skills.
• Build real-time applications: Perform continual processing on
streaming big data with sub-second processing latencies.
• Scale elastically: Elastically scales to match data throughput without
any operator intervention.
Kinesis Analytics connects to Kinesis streams and Firehose delivery
streams and runs standard SQL queries against data streams. It can
send processed data to analytics tools so you can create alerts and
respond in real time.
Amazon Confidential
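To make "standard SQL on streams" concrete, here is a sketch of what an application might look like: the SQL text defines a destination in-application stream and a pump that continuously inserts windowed counts into it, and the application is created through the Kinesis Analytics API. The stream/pump names, the ticker schema, and the 10-second tumbling window are illustrative assumptions, and the AWS-calling function is defined but not executed.

```python
TUMBLING_COUNT_SQL = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker VARCHAR(8), trade_count INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM ticker, COUNT(*)
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY ticker,
           STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' SECOND);
"""

def create_analytics_app(app_name, application_code):
    """Create a Kinesis Analytics application (requires boto3 and AWS
    credentials; input/output configuration omitted; not run here)."""
    import boto3
    client = boto3.client("kinesisanalytics")
    return client.create_application(
        ApplicationName=app_name,
        ApplicationCode=application_code,
    )
```

The pump keeps running against the attached stream, so the destination stream continuously receives one count per ticker per window.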
52. Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS
• Amazon Kinesis Streams: Build your own custom applications that
process or analyze streaming data
• Amazon Kinesis Firehose: Easily load massive volumes of streaming
data into Amazon S3 and Redshift
• Amazon Kinesis Analytics: Easily analyze data streams using
standard SQL queries