- The document discusses data analysis techniques on AWS, covering Amazon EMR, Kinesis, Glue, S3, and Personalize. It provides example architectures for ingesting and analyzing streaming and batch data with AWS services, and covers data processing techniques such as TF-IDF and recommendations using Amazon Personalize.
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi... - Amazon Web Services
Organizations processing mission-critical, high-volume data must be able to achieve high levels of throughput and durability in their data processing workflows. In this session, we will learn how DataXu is using Amazon Kinesis, Amazon S3, and Amazon EMR for its patented approach to programmatic marketing. Every second, the DataXu Marketing Cloud processes over 1 million ad requests and makes more than 40 billion decisions to select and bid on the ad impressions that are most likely to convert. In addition to addressing the scalability and availability of the platform, we will explore Amazon Kinesis producer and consumer applications that support high levels of scalability and durability in mission-critical record processing.
ABD330: Combining Batch and Stream Processing to Get the Best of Both Worlds - Amazon Web Services
Today, many architects and developers are looking to build solutions that integrate batch and real-time data processing, and deliver the best of both approaches. Lambda architecture (not to be confused with the AWS Lambda service) is a design pattern that leverages both batch and real-time processing within a single solution to meet the latency, accuracy, and throughput requirements of big data use cases. Come join us for a discussion on how to implement Lambda architecture (batch, speed, and serving layers) and best practices for data processing, loading, and performance tuning.
ABD318: Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ... - Amazon Web Services
Learn how to architect a data lake where different teams within your organization can publish and consume data in a self-service manner. As organizations aim to become more data-driven, data engineering teams have to build architectures that can cater to the needs of diverse users - from developers, to business analysts, to data scientists. Each of these user groups employs different tools, has different data needs, and accesses data in different ways.
In this talk, we will dive deep into assembling a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue. The session will feature Mohit Rao, Architect and Integration lead at Atlassian, the maker of products such as JIRA, Confluence, and Stride. First, we will look at a couple of common architectures for building a data lake. Then we will show how Atlassian built a self-service data lake, where any team within the company can publish a dataset to be consumed by a broad set of users.
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl... - Amazon Web Services
Learning Objectives:
- Understand key requirements for collecting, preparing, and loading streaming data into data lakes
- Get an overview of transmitting data using Amazon Kinesis Firehose
- Learn how to perform data transformations with Amazon Kinesis Firehose
Data lakes enable your employees across the organization to access and analyze massive amounts of unstructured and structured data from disparate data sources, many of which generate data continuously and rapidly. Making this data available in a timely fashion for analysis requires a streaming solution that can durably and cost-effectively ingest this data into your data lake. Amazon Kinesis Firehose is a fully managed service that makes it easy to prepare and load streaming data into AWS. In this tech talk, we will provide an overview of Amazon Kinesis Firehose and dive deep into how you can use the service to collect, transform, batch, compress, and load real-time streaming data into your Amazon S3 data lakes.
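The collect, batch, compress, and load flow described above can be sketched with boto3's Firehose client. This is a minimal illustration rather than the session's own code: the delivery stream name and event fields are hypothetical, and the 500-record cap reflects Firehose's documented PutRecordBatch limit.

```python
import gzip
import json

# Firehose PutRecordBatch accepts at most 500 records per call,
# so we chunk events into batches of that size before sending.
MAX_RECORDS_PER_BATCH = 500

def make_batches(events, max_records=MAX_RECORDS_PER_BATCH):
    """Serialize events as newline-delimited JSON and group them into
    PutRecordBatch-sized lists of {'Data': bytes} records."""
    records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]
    return [records[i:i + max_records] for i in range(0, len(records), max_records)]

def compress(payload: bytes) -> bytes:
    """Gzip a payload before delivery to reduce S3 storage and transfer size."""
    return gzip.compress(payload)

def deliver(events, stream_name="example-delivery-stream"):
    """Send events to a (hypothetical) Firehose delivery stream.
    Requires AWS credentials; not exercised in this sketch."""
    import boto3
    firehose = boto3.client("firehose")
    for batch in make_batches(events):
        firehose.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
```

In practice Firehose itself can also buffer, compress, and transform records server-side; the client-side batching above simply keeps each API call within the service limits.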
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ... - Amazon Web Services
by Adrian Hornsby, Technical Evangelist, AWS
Amazon Kinesis is a platform for streaming data on AWS, offering powerful services to make it easy to load and analyze streaming data. In this session, you’ll learn about how AWS customers are transitioning from batch to real-time processing using Amazon Kinesis, and how to get started. We will provide an overview of streaming data applications and introduce the Amazon Kinesis platform and its services. We will walk through a production use case to demonstrate how to ingest streaming data, prepare it, and analyze it to gain actionable insights in real time using Amazon Kinesis. We will also provide pointers to tutorials and other resources so you can quickly get started with your streaming data application.
While a Data Lake can support completely unstructured data, getting performant analytics at scale requires some data preparation. We'll look at how to use Amazon Kinesis, AWS Glue, and Amazon EMR to make raw data ready for high-performance analytics.
Speakers:
Roger Dahlstrom - Solutions Architect, AWS
Bobby Malik - Sr. Technical Account Manager, AWS
How to Build a Data Lake with the AWS Glue Data Catalog (ABD213-R) re:Invent 2017 - Amazon Web Services
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
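As a rough sketch of the crawler workflow this abstract describes: point a Glue crawler at an S3 prefix and let it infer schemas and write table definitions into a Data Catalog database. The crawler name, IAM role, database, and path below are all placeholders.

```python
def crawler_config(name, role_arn, database, s3_path):
    """Build the kwargs for glue.create_crawler(): the crawler scans the
    S3 prefix, extracts schema metadata, and adds table definitions to
    the named Data Catalog database."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

def run_crawler(cfg):
    """Create and start the crawler (requires AWS credentials)."""
    import boto3
    glue = boto3.client("glue")
    glue.create_crawler(**cfg)
    glue.start_crawler(Name=cfg["Name"])
```

Once the crawler has populated the catalog, Athena, EMR, and Redshift Spectrum can all query the resulting tables by name, which is the integration the session highlights.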
In this session, we'll review the features and architecture of the new AWS Data Pipeline service and explain how you can use it to better manage your data-driven workloads. We'll then go over a few examples of setting up and provisioning a pipeline in the system.
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We... - Amazon Web Services
Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to ingest streaming data into Amazon Redshift, so that you can use existing analytics and business intelligence tools to extract information in near real time and respond promptly. In this webinar, we will dive deep into using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively. Join us to:
- Understand the basics of ingesting streaming data from sources such as mobile devices, servers, and websites with Amazon Kinesis Firehose
- Get a closer look at how to automate delivery of streaming data to Amazon Redshift reliably using Amazon Kinesis Firehose
- Learn techniques to detect, troubleshoot, and avoid data loading problems
Who should attend: Developers, data analysts, data engineers, architects
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana... - Amazon Web Services
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
AWS September Webinar Series - Building Your First Big Data Application on AWS - Amazon Web Services
The Big Data ecosystem is moving so fast that it is nearly impossible to keep pace. Meanwhile, the strong demand for analytical and data management skills will continue to grow. So, how can you get up to speed?
Join us for this webinar where we will help you get ramped up on how to use Amazon’s Big Data web services. In just 50 minutes, we will build a Big Data application using Amazon Elastic MapReduce and other AWS Big Data Services. In addition, we will review best practices and architecture design patterns for Big Data. Attending re:Invent? One more reason not to miss this webinar, as it will help you get ready for some of our Big Data deep dives!
Learning Objectives:
Learn about key AWS Big Data services including Amazon S3, Amazon EMR, Amazon Kinesis, and Amazon Redshift
Learn about Big Data architectural patterns
How to ingest data to Amazon S3
How to start an Amazon EMR cluster
Help those attending re:Invent to get up to speed with Big Data services
Who Should Attend:
Architects and developers, interested in starting a Big Data initiative
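One of the objectives above, starting an Amazon EMR cluster, can be sketched with boto3. The release label, instance types, and counts are illustrative defaults to adapt; EMR_DefaultRole and EMR_EC2_DefaultRole are the standard default roles, which must already exist in the account.

```python
def cluster_spec(name="bigdata-demo", release="emr-6.15.0"):
    """Build the run_job_flow() request for a small Spark cluster:
    one master node and two core nodes, kept alive after steps finish
    so you can submit work interactively."""
    return {
        "Name": name,
        "ReleaseLabel": release,
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
    }

def start_cluster(spec):
    """Launch the cluster and return its cluster ID (requires credentials)."""
    import boto3
    emr = boto3.client("emr")
    return emr.run_job_flow(**spec)["JobFlowId"]
```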
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ... - Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Speaker:
Paul Armstrong, Solutions Architect, Amazon Web Services
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing - Amazon Web Services
AWS Batch is a fully-managed service that enables developers, scientists, and engineers to easily and efficiently run batch computing workloads of any scale on AWS. AWS Batch automatically provisions compute resources and optimizes the workload distribution based on the quantity and scale of the workloads. With AWS Batch, there is no need to install or manage batch computing software, allowing you to focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2, Spot Instances, and AWS Lambda. AWS Batch reduces operational complexities, saving time and reducing costs. In this session, Principal Product Managers Jamie Kinney and Dougal Ballantyne describe the core concepts behind AWS Batch and details of how the service functions. The presentation concludes with relevant use cases and sample code.
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who are looking to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud or looking to further expand their skills and expertise. In this series, we will cover : 'Modern Data Architectures for Business Insights at Scale'.
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM... - Amazon Web Services
Amazon QuickSight is a fast BI service that makes it easy for you to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. QuickSight is built to harness the power and scalability of the cloud, so you can easily run analysis on large datasets and support hundreds of thousands of users. In this session, we'll demonstrate how you can easily get started with Amazon QuickSight, uploading files, connecting to S3 and Redshift, and creating analyses from visualizations that are optimized based on the underlying data. Once we've built our analysis and dashboard, we'll show you how easy it is to share it with colleagues and stakeholders in just a few seconds. And with SPICE - QuickSight's in-memory calculation engine - you can go from data to insights faster than ever.
Modern Data Architectures for Real-Time Analytics & Engagement - Amazon Web Services
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who are looking to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud or looking to further expand their skills and expertise. In this series, we will cover: 'Modern Data Architectures for Real-time Analytics and Engagement'.
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series - Amazon Web Services
Streaming data applications can deliver compelling, near real-time user experiences, but building the back-end infrastructure to collect and process streaming data is difficult. Amazon Kinesis Firehose makes it easy for you to load streaming data into AWS without having to build custom stream processing applications. In this webinar, we will introduce Amazon Kinesis Firehose and discuss how to ingest streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service using Amazon Kinesis Firehose. We will also highlight key use cases based on real-world examples from IoT, AdTech, E-Commerce, and Gaming. Join us to:
- Get an introduction to streaming data and an overview of Amazon Kinesis Firehose
- Learn about common streaming data use cases from IoT, Ad Tech, E-Commerce, and Gaming
- Understand how to use Amazon Kinesis Firehose to load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
Who should attend: Developers, data analysts, data engineers, architects
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019 - Amazon Web Services
Organisations are increasingly gaining insight and knowledge from a number of IoT, API, clickstream, unstructured, and log data sources. Learn how AWS Glue makes it easy to build and manage enterprise-grade data pipelines to ingest, clean, transform, and automatically catalogue data, which enables a variety of use cases such as ad-hoc analytics, data warehousing, big data analysis, and machine learning. Also, find out how to integrate an end-to-end CI/CD pipeline to automate the release management process for your serverless data pipelines.
Data Analytics Week at the San Francisco Loft
Preparing Data for the Lake
While a Data Lake can support completely unstructured data, getting performant analytics at scale requires some data preparation. We'll look at how to use Amazon Kinesis, AWS Glue, and Amazon EMR to make raw data ready for high-performance analytics.
Speakers:
John Mallory - Principal Business Development Manager Storage (Object), AWS
Hemant Borole - Sr. Big Data Consultant, AWS
An overview of Amazon Kinesis Firehose, Amazon Kinesis Analytics, and Amazon Kinesis Streams so you can quickly get started with real-time, streaming data.
“The only real mistake is the one from which we learn nothing.” So how do we learn from system failures? This session will move beyond “blameless” postmortems and show how to use data to avoid and mitigate future failures. We will share the best practices for gathering systems-related data and people-related data. You will then learn how to apply the data to formulate actionable response plans and avoid repeating failures.
This session is brought to you by AWS Summit New York City sponsor, Datadog.
Learning Objectives:
- Learn the common use-cases for using Athena, AWS' interactive query service on S3
- Learn best practices for creating tables and partitions and performance optimizations
- Learn how Athena handles security, authorization, and authentication
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service... - Amazon Web Services
Learning Objectives:
- Understand how to easily build an end to end, real time log analytics solution
- Get an overview of collecting and processing data in real-time using Amazon Kinesis
- Learn how to Interactively query and visualize your log data using Amazon Elasticsearch Service
Log analytics is a common big data use case that allows you to analyze log data from websites, mobile devices, servers, sensors, and more for a wide variety of applications such as digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. Moving your log analytics to real time can speed up your time to information, allowing you to get insights in seconds or minutes instead of hours or days. In this session, you will learn how to ingest and deliver logs with no infrastructure using Amazon Kinesis Firehose. We will show how Amazon Kinesis Analytics can be used to process log data in real time to build responsive analytics. Finally, we will show how to use Amazon Elasticsearch Service to interactively query and visualize your log data.
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic... - Amazon Web Services
Learning Objectives:
- Understand how to build a serverless big data solution quickly and easily
- Learn how to discover and prepare all your data for analytics
- Learn how to query and visualize analytics on all your data to create actionable insights
Data comes in a variety of forms, and in order to gain insight from this data you need to have the right platform in place. AWS has the services to cover all types of data, whether you need databases for structured data, Hadoop for unstructured data, or a streaming engine for high-velocity data. In this session we will cover the various data analytics services on AWS and when to use them.
Amazon S3 adds new S3 Event Notifications for S3 Lifecycle, S3 Intelligent-Ti... - Dhaval Soni
You can now build event-driven applications using Amazon S3 Event Notifications that trigger when objects are transitioned or expired (deleted) with S3 Lifecycle, or moved within the S3 Intelligent-Tiering storage class to its Archive Access or Deep Archive Access tiers. You can also trigger S3 Event Notifications for any changes to object tags or access control lists (ACLs). You can generate these new notifications for your entire bucket, or for a subset of your objects using prefixes or suffixes, and choose to deliver them to Amazon EventBridge, Amazon SNS, Amazon SQS, or an AWS Lambda function.
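A minimal sketch of wiring up these notifications with boto3, assuming a pre-existing SQS queue (the queue ARN, bucket, and key prefix below are placeholders); the event names are the documented lifecycle and tagging event types this launch added.

```python
def notification_config(queue_arn, prefix="logs/"):
    """Build the NotificationConfiguration for
    s3.put_bucket_notification_configuration(): deliver lifecycle-expiration,
    lifecycle-transition, and object-tagging events for keys under `prefix`
    to an SQS queue."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": [
                    "s3:LifecycleExpiration:*",
                    "s3:LifecycleTransition",
                    "s3:ObjectTagging:*",
                ],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }

def apply_config(bucket, cfg):
    """Attach the notification configuration to the bucket
    (requires AWS credentials and queue permissions)."""
    import boto3
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=cfg
    )
```

Delivery to Amazon EventBridge instead is enabled with a separate `EventBridgeConfiguration` block, after which routing rules are defined on the EventBridge side rather than per event type here.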
AWS supports a robust suite of tools and services that makes analyzing and processing large amounts of data in the cloud faster and more efficient. In this builders session, AWS storage and data experts guide you through Amazon S3, Amazon Glacier, and our query-in-place services, such as Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum. We also provide best practices around using them with other analytics services, like Amazon EMR and AWS Glue to build data lakes and deploy other analytics solutions. Understanding of a data lake construct, AWS storage services, and AWS analytics tools is recommended.
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with... - Amazon Web Services
Learn how to build a data lake for analytics in Amazon S3 and Amazon Glacier. In this session, we discuss best practices for data curation, normalization, and analysis on Amazon object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes using query-in-place technology, such as Amazon Athena and Amazon Redshift Spectrum. We also review custom analytics integration using Apache Spark, Apache Hive, Presto, and other technologies in Amazon EMR. You'll also get a chance to hear from Airbnb & Viber about their solutions for Big Data analytics using S3 as a data lake.
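One of the query-in-place options mentioned above, S3 Select, can be sketched as follows. The bucket, key, and SQL expression are hypothetical; the request shape follows the `select_object_content` API, which runs SQL against a single object and returns only the matching rows.

```python
def select_params(bucket, key,
                  expression="SELECT s.user_id FROM S3Object s WHERE s.status = 'error'"):
    """Build the select_object_content() request: run SQL directly against
    a gzipped CSV object in S3, using the first row as column headers."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }

def run_select(params):
    """Execute the query and yield raw result bytes from the event stream
    (requires AWS credentials)."""
    import boto3
    resp = boto3.client("s3").select_object_content(**params)
    for event in resp["Payload"]:
        if "Records" in event:
            yield event["Records"]["Payload"]
```

Because filtering happens server-side, only the selected columns and rows cross the network, which is what lets these services reduce or eliminate ETL for simple scans.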
In this session, we'll review the features and architecture of the new AWS Data Pipeline service and explain how you can use it to better manage your data-driven workloads. We'll then go over a few examples of setting up and provisioning a pipeline in the system.
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...Amazon Web Services
Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to ingest streaming data into Amazon Redshift so that you can use existing analytics and business intelligence tools to extract information in near real-time and respond promptly. In this webinar, we will dive deep using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively. Join us to: - Understand the basics of ingesting streaming data from sources such as mobile devices, servers, and websites with Amazon Kinesis Firehose - Get a closer look at how to automate delivery of streaming data to Amazon Redshift reliably using Amazon Kinesis Firehose - Learn techniques to detect, troubleshoot, and avoid data loading problems Who should attend: Developers, data analysts, data engineers, architects
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Amazon Web Services
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
AWS September Webinar Series - Building Your First Big Data Application on AWS Amazon Web Services
The Big Data ecosystem is moving so fast that is nearly impossible to keep pace. Meanwhile, the strong demand for high analytical and data management skills will continue to grow. So, how can you get up to speed?
Join us for this webinar where we will help you get ramped up on how to use Amazon’s Big Data web services. In just 50 minutes, we will build a Big Data application using Amazon Elastic MapReduce and other AWS Big Data Services. In addition, we will review best practices and architecture design patterns for Big Data. Attending re:Invent? One more reason not to miss this webinar, as it will help you get ready for some of our Big Data deep dives!
Learning Objectives:
Learn about key AWS Big Data services including Amazon S3, Amazon EMR, Amazon Kinesis, and Amazon Redshift
Learn about Big Data architectural patterns
How to ingest data to Amazon S3
How to start an Amazon EMR cluster
Help those attending re:Invent to get up to speed with Big Data services
Who Should Attend:
Architects and developers, interested in starting a Big Data initiative
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ...Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Speaker:
Paul Armstrong, Solutions Architect, Amazon Web Services
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computingAmazon Web Services
AWS Batch is a fully-managed service that enables developers, scientists, and engineers to easily and efficiently run batch computing workloads of any scale on AWS. AWS Batch automatically provisions compute resources and optimizes the workload distribution based on the quantity and scale of the workloads. With AWS Batch, there is no need to install or manage batch computing software, allowing you to focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2, Spot Instances, and AWS Lambda. AWS Batch reduces operational complexities, saving time and reducing costs. In this session, Principal Product Managers Jamie Kinney and Dougal Ballantyne describe the core concepts behind AWS Batch and details of how the service functions. The presentation concludes with relevant use cases and sample code.
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who are looking to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud or looking to further expand their skills and expertise. In this series, we will cover : 'Modern Data Architectures for Business Insights at Scale'.
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...Amazon Web Services
Amazon QuickSight is a fast BI service that makes it easy for you to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. QuickSight is built to harness the power and scalability of the cloud, so you can easily run analysis on large datasets, and support hundreds of thousands of users. In this session, we’ll demonstrate how you can easily get started with Amazon QuickSight, uploading files, connecting to S3 and Redshift and creating analyses from visualizations that are optimized based on the underlying data. Once we’ve built our analysis and dashboard, we’ll show you easy it is to share it with colleagues and stakeholders in just a few seconds. And with SPICE – QuckSight’s in-memory calculation engine – you can go from data to insights, faster than ever.
Modern Data Architectures for Real Time Analytics & EngagementAmazon Web Services
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who are looking to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud or looking to further expand their skills and expertise. In this series, we will cover: 'Modern Data Architectures for Real-time Analytics and Engagement'.
Introduction to Amazon Kinesis Firehose - AWS August Webinar SeriesAmazon Web Services
Streaming data applications can deliver compelling, near real-time user experiences, but building the back-end infrastructure to collect and process streaming data is difficult. Amazon Kinesis Firehose makes it easy for you to load streaming data into AWS without having to build custom stream processing applications. In this webinar, we will introduce Amazon Kinesis Firehose and discuss how to ingest streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service using Amazon Kinesis Firehose. We will also highlight key use cases based on real-world examples from IoT, AdTech, E-Commerce, and Gaming. Join us to: - Get an introduction to streaming data and an overview of Amazon Kinesis Firehose - Learn about common streaming data use cases from IoT, Ad Tech, E-Commerce, and Gaming - Understand how to use Amazon Kinesis Firehose to load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service Who should attend: Developers, data analysts, data engineers, architects
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Amazon Web Services
Organisations are increasingly gaining insight and knowledge from a number of IoT, API, clickstream, unstructured, and log data sources. Learn how AWS Glue makes it easy to build and manage enterprise-grade data pipelines to ingest, clean, transform, and automatically catalogue data, enabling a variety of use cases such as ad-hoc analytics, data warehousing, big data analysis, and machine learning. Also, find out how to integrate an end-to-end CI/CD pipeline to automate the release management process for your serverless data pipelines.
Data Analytics Week at the San Francisco Loft
Preparing Data for the Lake
While a data lake can support completely unstructured data, getting performant analytics at scale requires some data preparation. We'll look at how to use Amazon Kinesis, AWS Glue, and Amazon EMR to make raw data ready for high-performance analytics.
Speakers:
John Mallory - Principal Business Development Manager Storage (Object), AWS
Hemant Borole - Sr. Big Data Consultant, AWS
An overview of Amazon Kinesis Firehose, Amazon Kinesis Analytics, and Amazon Kinesis Streams so you can quickly get started with real-time, streaming data.
“The only real mistake is the one from which we learn nothing.” So how do we learn from system failures? This session will move beyond “blameless” postmortems and show how to use data to avoid and mitigate future failures. We will share best practices for gathering systems-related data and people-related data. You will then learn how to apply the data to formulate actionable response plans and avoid repeating failures.
This session is brought to you by AWS Summit New York City sponsor, Datadog.
Learning Objectives:
- Learn the common use cases for Athena, AWS' interactive query service on S3
- Learn best practices for creating tables and partitions and performance optimizations
- Learn how Athena handles security, authorization, and authentication
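One of the partitioning best practices alluded to above is laying data out in S3 with Hive-style key=value prefixes, which Athena's partition pruning can use to skip irrelevant objects. A minimal sketch of generating such prefixes (bucket and table names are hypothetical):

```python
from datetime import date, timedelta

def partition_prefix(bucket, table, day):
    """Build a Hive-style S3 prefix (year=/month=/day=) so that Athena's
    partition pruning can skip objects outside the queried date range."""
    return (f"s3://{bucket}/{table}/"
            f"year={day.year}/month={day.month:02d}/day={day.day:02d}/")

def daily_prefixes(bucket, table, start, days):
    """Prefixes for a run of consecutive days, e.g. for a backfill job."""
    return [partition_prefix(bucket, table, start + timedelta(n)) for n in range(days)]

# A matching Athena table would declare
#   PARTITIONED BY (year string, month string, day string)
# and the partitions written under these prefixes can then be registered
# with `MSCK REPAIR TABLE <table>` or per-partition ALTER TABLE ADD PARTITION.

if __name__ == "__main__":
    print(partition_prefix("my-data-lake", "clickstream", date(2019, 5, 1)))
```

With this layout, a query filtered on `year`, `month`, and `day` scans only the matching prefixes, which is where most of the cost and latency savings come from.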
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...Amazon Web Services
Learning Objectives:
- Understand how to easily build an end-to-end, real-time log analytics solution
- Get an overview of collecting and processing data in real time using Amazon Kinesis
- Learn how to interactively query and visualize your log data using Amazon Elasticsearch Service
Log analytics is a common big data use case that allows you to analyze log data from websites, mobile devices, servers, sensors, and more for a wide variety of applications such as digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. Moving your log analytics to real time can speed up your time to information allowing you to get insights in seconds or minutes instead of hours or days. In this session, you will learn how to ingest and deliver logs with no infrastructure using Amazon Kinesis Firehose. We will show how Amazon Kinesis Analytics can be used to process log data in real time to build responsive analytics. Finally, we will show how to use Amazon Elasticsearch Service to interactively query and visualize your log data.
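Before log data can be queried or visualized, producers typically normalize raw log lines into structured records. As an illustrative sketch (the field names are our own, not prescribed by any AWS service), here is a parser for Apache common-log-format lines that yields JSON-ready dicts suitable for a Firehose or Elasticsearch pipeline:

```python
import json
import re

# Apache "common log format": host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line):
    """Turn one raw access-log line into a JSON-ready dict, or None if it
    doesn't match (malformed lines would usually go to an error stream)."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec

if __name__ == "__main__":
    line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
    print(json.dumps(parse_log_line(line)))
```

Structured records like these are what make downstream aggregations (status-code counts, latency percentiles) cheap to express in Kinesis Analytics or Elasticsearch.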
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Amazon Web Services
Learning Objectives:
- Understand how to build a serverless big data solution quickly and easily
- Learn how to discover and prepare all your data for analytics
- Learn how to query and visualize analytics on all your data to create actionable insights
Data comes in a variety of forms and in order to gain insight from this data you need to have the right platform in place. AWS has the services to cover all types of data, whether you need databases for structured data, Hadoop for unstructured data or a streaming engine for high-velocity data. In this session we will cover the various data analytics services on AWS and when to use them.
Amazon S3 Adds New S3 Event Notifications for S3 Lifecycle, S3 Intelligent Ti...Dhaval Soni
You can now build event-driven applications using Amazon S3 Event Notifications that trigger when objects are transitioned or expired (deleted) with S3 Lifecycle, or moved within the S3 Intelligent-Tiering storage class to its Archive Access or Deep Archive Access tiers. You can also trigger S3 Event Notifications for any changes to object tags or access control lists (ACLs). You can generate these new notifications for your entire bucket, or for a subset of your objects using prefixes or suffixes, and choose to deliver them to Amazon EventBridge, Amazon SNS, Amazon SQS, or an AWS Lambda function.
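A notification configuration covering the new event types described above can be sketched as a plain Python dict, in the shape boto3's `put_bucket_notification_configuration` expects (the ARNs, prefix, and function names here are hypothetical; this is a sketch, not applied to a real bucket):

```python
# Sketch of a bucket notification configuration routing the new S3 event
# types: lifecycle events go to an SQS queue, tag/ACL changes to a Lambda.
notification_config = {
    "QueueConfigurations": [
        {
            # Route lifecycle expirations and transitions under logs/ to SQS.
            "QueueArn": "arn:aws:sqs:us-east-1:111122223333:lifecycle-events",
            "Events": ["s3:LifecycleExpiration:*", "s3:LifecycleTransition"],
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "logs/"}]}},
        }
    ],
    "LambdaFunctionConfigurations": [
        {
            # Invoke a Lambda whenever object tags or ACLs change.
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:on-tag-change",
            "Events": ["s3:ObjectTagging:*", "s3:ObjectAcl:Put"],
        }
    ],
}

# Applying it would look like (not run here):
#   s3 = boto3.client("s3")
#   s3.put_bucket_notification_configuration(
#       Bucket="my-bucket", NotificationConfiguration=notification_config)

if __name__ == "__main__":
    print(sorted(notification_config))
```

The `Filter` block implements the prefix/suffix scoping mentioned above, so only a subset of the bucket's objects generates events.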
AWS supports a robust suite of tools and services that makes analyzing and processing large amounts of data in the cloud faster and more efficient. In this builders session, AWS storage and data experts guide you through Amazon S3, Amazon Glacier, and our query-in-place services, such as Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum. We also provide best practices around using them with other analytics services, like Amazon EMR and AWS Glue to build data lakes and deploy other analytics solutions. Understanding of a data lake construct, AWS storage services, and AWS analytics tools is recommended.
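To make the "query-in-place" idea concrete: S3 Select evaluates a SQL expression server-side so only matching rows cross the network. The local filter below merely mimics the effect of such an expression on a small CSV (column names are hypothetical); the commented boto3 call shows the shape of the real server-side request:

```python
import csv
import io

def select_rows(csv_text, column, value):
    """Local illustration of what an S3 Select expression like
         SELECT * FROM S3Object s WHERE s.status = '500'
    returns for a CSV object: only the rows matching the predicate."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == value]

# The equivalent server-side call with boto3 (sketch, not run here):
#   s3.select_object_content(
#       Bucket="my-bucket", Key="logs/2019/05/01.csv",
#       Expression="SELECT * FROM S3Object s WHERE s.status = '500'",
#       ExpressionType="SQL",
#       InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
#       OutputSerialization={"JSON": {}})

if __name__ == "__main__":
    data = "path,status\n/a,200\n/b,500\n/c,500\n"
    print(select_rows(data, "status", "500"))
```

The payoff is that for a multi-gigabyte object, the filtering happens inside S3 and only the matching rows are transferred and billed as returned data.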
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Amazon Web Services
Learn how to build a data lake for analytics in Amazon S3 and Amazon Glacier. In this session, we discuss best practices for data curation, normalization, and analysis on Amazon object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes using query-in-place technology, such as Amazon Athena and Amazon Redshift Spectrum. We also review custom analytics integration using Apache Spark, Apache Hive, Presto, and other technologies in Amazon EMR. You'll also get a chance to hear from Airbnb & Viber about their solutions for Big Data analytics using S3 as a data lake.
From the Amazon Web Services Singapore & Malaysia Summits 2015, Track 2 Breakout, 'Big Data and Analytics'. Presented by Russell Nash – AWS Solutions Architect
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...Amazon Web Services
Data lakes are helping enterprises of all sizes and industries make the most of their data. However, building a data lake requires consideration of your goals and an understanding of data lakes, including data ingestion, data consumption, and usability layers. In this chalk talk, AWS experts and representatives from Sysco, a Fortune 50 company and leader in food distribution and marketing, discuss parts of a data lake, design considerations, and the pros and cons of different architectural designs. They share guidance around data tracking, costs, user access, synchronization, and data integrity so that your data lake complies with governance requirements and works towards your data goals. Sysco representatives share their data lake experiences, best practices, and lessons learned. We highlight Amazon S3 and S3 Select, Amazon Athena, Amazon EMR, Amazon EC2, and Amazon Redshift Spectrum.
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
In this session we will bring some clarity to the increasingly complex big data landscape and look at the common patterns for the ingest, storage, processing, and analysis of different types of data on the AWS platform.
Speaker: Russell Nash, Solutions Architect, Amazon Web Services
Featured Customer - TechnologyOne
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...Amazon Web Services
Building big data applications often requires integrating a broad set of technologies to store, process, and analyze the increasing variety, velocity, and volume of data being collected by many organizations. In this session, we show how you can build entire big data applications using a core set of managed services including Amazon S3, Amazon Kinesis, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, and Amazon QuickSight.
We walk you through the steps of building and securing a big data application using the AWS Big Data Platform. We also share best practices and common use cases for AWS big data services, including tips to help you choose the best services for your specific application.
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Amazon Web Services
AWS provides a broad platform of managed services to help you build, secure, and seamlessly scale end-to-end Big Data applications quickly and with ease. Want to get ramped up on how to use Amazon's big data web services? Learn when to use which service? Want to write your first big data application on AWS? Join us in this session as we discuss reference architecture, design patterns, and best practices for pulling together various AWS services to meet your big data challenges.
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges IT teams struggle with. This session will show how to build business intelligence applications leveraging AWS, from raw data import, consumption, and storage through to information production. We will also cover best practices for services such as Amazon Redshift and Amazon RDS, and how to use applications such as SAP HANA, Jaspersoft, and others.
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)Amazon Web Services
Join us for this general session where AWS big data experts present an in-depth look at the current state of big data. Learn about the latest big data trends and industry use cases. Hear how other organizations are using the AWS big data platform to innovate and remain competitive. Take a look at some of the most recent AWS big data announcements, as we kick off the Big Data re:Source Mini Con.
Deep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech TalksAmazon Web Services
Learning Objectives:
- Learn about all of the new features released in Amazon S3 & Glacier.
- Understand how to get started with each of these features.
- Determine how these features enhance security and manageability of AWS Storage solutions.
Optimizing data lakes with Amazon S3 - STG302 - New York AWS SummitAmazon Web Services
Data comes in many different forms that don’t easily fit into a traditional database structure. This is where data lakes help, enabling you to store vast amounts of data in its raw form. In this session, AWS experts dive into the benefits of Amazon S3 for building and managing data lakes in the AWS Cloud. Learn about the Amazon S3 integrations with the AWS analytics suite and Amazon FSx for Lustre. Also learn how to seamlessly run big data analytics, high performance computing applications, machine learning training models, media data processing workloads, and more, across your Amazon S3 data lakes.
Amazon Web Services gives you fast access to flexible and low cost IT resources, so you can rapidly scale and build virtually any big data application including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing regardless of volume, velocity, and variety of data.
https://aws.amazon.com/webinars/anz-webinar-series/
■ Date & Venue
Thursday, June 22, 2023, 7:00 PM ~
■ Agenda
Dissecting Twitter's Recommendation System
A look at Twitter's recommendation system, released as open source on April 5, 2023.
We cover everything from the recommendation algorithms that generate Twitter's personalization/ranking candidates to the pipeline that supports them.
■ Speaker
Myeonghwi Lee, Data Scientist, KakaoStyle
1. Enablement & Deployment
- Offline basic training (held biweekly)
- Free online learning pages (by feature, by topic, webinars, YouTube lectures)
- Help & user community
- Tableau Blueprint
2. References
- Tableau Public: best visualization examples, searchable sample dashboards
- Tableau Conference & Tableau Experience
3. Tableau Certification
- Desktop: Specialist, Associate, Professional | Server: Associate, Professional
- https://www.tableau.com/ko-kr/learn/certification
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
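The automated validation idea above can be made concrete with a minimal sketch: a rule table applied to each record at ingestion time, so bad rows are flagged before they reach downstream consumers (field names and thresholds here are hypothetical, chosen purely for illustration):

```python
# Minimal automated data-quality check: each rule maps a field name to a
# predicate; a record is clean only if every rule passes.
RULES = {
    "user_id": lambda v: isinstance(v, str) and v != "",
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
}

def validate(record):
    """Return the list of field names whose rules the record violates
    (missing fields count as violations); an empty list means clean."""
    return [field for field, ok in RULES.items()
            if field not in record or not ok(record[field])]

if __name__ == "__main__":
    print(validate({"user_id": "u1", "age": 34}))   # clean record
    print(validate({"user_id": "", "age": 200}))    # violates both rules
```

In practice the violation list would be attached to the record and routed to a quarantine stream, which is what enables the root-cause analysis that lineage tracking supports.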
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/