Check out how you can easily query raw data in various formats in Amazon S3, transform it into a canonical form, analyze it, and build dashboards to get more insights from your data.
An introduction to key architectural concepts for building a data lake using Amazon S3 as the storage layer and making that data available for processing with a broad set of analytic options, including Amazon EMR and open-source frameworks such as Apache Hadoop, Spark, Presto, and more.
Learn how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes.
Need to start querying data instantly? Amazon Athena is an interactive query service that makes it easy to run interactive queries on data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing your data immediately.
In this presentation, we will show you how easy Amazon Athena makes it to query your data stored in S3.
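As a sketch of the workflow these sessions describe, define a schema over files already sitting in S3, then query them with standard SQL. The bucket, table, and column names below are hypothetical, not taken from the sessions:

```python
# Hypothetical Athena DDL: an external table over CSV logs already in S3.
# Athena reads the files in place; nothing is loaded or moved.
def create_table_ddl(database: str, table: str, s3_location: str) -> str:
    return f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (
  request_time string,
  status_code  int,
  bytes_sent   bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '{s3_location}'
""".strip()

# A standard SQL query you might then run through the Athena query editor or API.
def error_count_query(database: str, table: str) -> str:
    return (
        f"SELECT status_code, COUNT(*) AS hits "
        f"FROM {database}.{table} "
        f"WHERE status_code >= 500 GROUP BY status_code"
    )

ddl = create_table_ddl("weblogs", "access_logs", "s3://example-bucket/logs/")
query = error_count_query("weblogs", "access_logs")
```

Because the table is external, dropping it discards only the schema; the S3 objects stay untouched.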
In this session, we introduce AWS Glue, provide an overview of its components, and share how you can use AWS Glue to automate discovering your data, cataloging it, and preparing it for analysis.
Amazon Elastic MapReduce is one of the largest Hadoop operators in the world. Since its launch five years ago, AWS customers have launched more than 5.5 million Hadoop clusters.
In this talk, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters and other Amazon EMR architectural patterns. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost efficient.
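One way to picture the transient-cluster pattern the talk describes, with S3 as the system of record and a short-lived cluster that terminates when its job finishes, is the shape of the job-flow request. Instance types, counts, and paths here are illustrative assumptions, not a recommended configuration:

```python
# Illustrative EMR job-flow request for a short-lived cluster that reads from
# and writes back to S3 (instead of long-lived HDFS) and auto-terminates.
def transient_cluster_request(name: str, log_uri: str, step_jar: str) -> dict:
    return {
        "Name": name,
        "LogUri": log_uri,  # logs persist in S3 after the cluster is gone
        "Instances": {
            "InstanceCount": 4,
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "KeepJobFlowAliveWhenNoSteps": False,  # the transient pattern: terminate when idle
        },
        "Steps": [{
            "Name": "nightly-aggregation",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": step_jar,
                "Args": ["s3://example-bucket/raw/",    # input read from S3
                         "s3://example-bucket/agg/"],   # output written back to S3
            },
        }],
    }

req = transient_cluster_request("nightly-etl", "s3://example-bucket/emr-logs/",
                                "s3://example-bucket/jars/agg.jar")
```

Because no state lives on the cluster, the same request can be resubmitted nightly and sized up or down per run, which is the cost lever the talk highlights.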
Speakers:
Ian Meyers, AWS Solutions Architect
Ian McDonald, IT Director, SwiftKey
Today, organizations find themselves in a data-rich world, with a growing need for agility and for making all of this data accessible for analysis, deriving keen insights to drive strategic decisions. Creating a data lake helps you to manage all the disparate sources of data you are collecting (in its original format) and extract value. In this session, learn how to architect and implement a data lake in the AWS Cloud. Learn about best practices as we walk through architectural blueprints.
IT organizations today need to support a modern, flexible, global workforce and ensure their users can be productive from anywhere. Moving desktops and applications to AWS offers improved security, scale, and performance with cloud economics. In this session, we provide an overview of Amazon WorkSpaces and discuss the use cases for it. Then, we dive deep into best practices for implementing Amazon WorkSpaces, including integrating with your existing identity, security, networking, and storage solutions.
Diego Magalhaes, Senior Solutions Architect, Amazon Web Services
An overview of Amazon Kinesis Firehose, Amazon Kinesis Analytics, and Amazon Kinesis Streams so you can quickly get started with real-time, streaming data.
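A detail worth knowing before getting started with Kinesis Streams: each record carries a partition key, and the service routes the record to a shard by hashing that key (MD5 over a 128-bit hash-key range). This sketch mimics that routing for a stream whose shards split the range evenly; real shards can have arbitrary ranges after resharding:

```python
import hashlib

# Simplified model of Kinesis Streams shard routing: MD5 the partition key,
# then find which shard's hash-key range contains the result.
def shard_for(partition_key: str, num_shards: int) -> int:
    hash_key = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    range_size = 2 ** 128 // num_shards
    return min(hash_key // range_size, num_shards - 1)

# Records with the same key always land on the same shard,
# which is what preserves per-key ordering.
s1 = shard_for("device-42", 4)
s2 = shard_for("device-42", 4)
```

The practical consequence: choose a partition key with enough distinct values (e.g., a device or session ID) or hot keys will concentrate traffic on a single shard.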
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce SPICE, a new Super-fast, Parallel, In-memory Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides you with smart visualizations and graphs that are optimized for your different data types, to ensure the most suitable visualization for your analysis, and how to share these visualization stories using the built-in collaboration tools.
Presented by: Matthew McClean, AWS Partner Solutions Architect, Amazon Web Services
Come learn about new and existing Amazon S3 features that can help you better protect your data, save on cost, and improve usability, security, and performance. We will cover a wide variety of Amazon S3 features and go into depth on several newer features with configuration and code snippets, so you can apply the learnings on your object storage workloads.
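To give a flavor of the kind of configuration snippet such a session covers, here is a lifecycle rule that transitions aging objects to colder storage classes and eventually expires them. The prefix and day counts are illustrative assumptions, not recommendations from the talk:

```python
# Illustrative S3 lifecycle configuration: move log objects to Standard-IA
# after 30 days, to Glacier after 90, and delete them after a year.
def log_lifecycle(prefix: str) -> dict:
    return {
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": prefix},   # applies only to keys under this prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    }

config = log_lifecycle("logs/")
```

A document in this shape is what you would pass to the S3 put-bucket-lifecycle-configuration API; the cost saving comes from S3 doing the tiering automatically rather than your application rewriting objects.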
Data Warehouse or Data Lake, Which Do I Choose? (DATAVERSITY)
Today’s data-driven companies have a choice to make: where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al.) or the data lake (AWS S3 et al.). There are pros and cons to each approach. While a data warehouse gives you strong data management with analytics, it doesn’t handle semi-structured and unstructured data well, tightly couples storage and compute, and carries expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they are only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
by Joyjeet Banerjee, Solutions Architect, AWS
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3. Level 200
In this session, we will show you how easy it is to start querying your data stored in Amazon S3, with Amazon Athena. First we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Then, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
Organizations are struggling to make sense of their data within antiquated data platforms. Snowflake, the data warehouse built for the cloud, can help.
As the volume and types of data continue to grow, customers often have valuable data that is not easily discoverable and available for analytics. A common challenge for data engineering teams is architecting a data lake that can cater to the needs of diverse users, from developers to business analysts to data scientists. In this session, dive deep into building a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, and AWS Glue. Learn how AWS Glue crawlers can automatically discover your data, extracting and cataloging relevant metadata to reduce the operational work of preparing your data for downstream consumers.
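The crawler-friendly layout such a session builds toward is typically a Hive-style partitioned key scheme, which Glue crawlers recognize and register as table partitions for Athena. The bucket layout and names here are illustrative, not the session's exact code:

```python
from datetime import date

# Hive-style partitioned key layout (dt=YYYY-MM-DD). Glue crawlers detect the
# dt= directories as partitions, so each day's data is catalogued automatically
# and Athena can prune partitions instead of scanning the whole table.
def partitioned_key(table: str, event_day: date, filename: str) -> str:
    return f"datalake/{table}/dt={event_day.isoformat()}/{filename}"

key = partitioned_key("clickstream", date(2017, 6, 1), "part-0000.parquet")
```

Writing columnar files (such as Parquet) under this layout is a common way to cut both Athena scan costs and query latency.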
What is elastic data warehousing, and how does Snowflake uniquely enable it? Learn about the requirements needed to support flexible, elastic data warehousing using cloud infrastructure.
[Bespin Global Partner Session] Building a Data Analytics Environment on a Distributed Data Lake - Bespin Global, Jang Ik... (Amazon Web Services Korea)
Circumstances vary from company to company, but most enterprises today already have an analytics environment in place and analyze their data with it. Even so, business users complain that the data they want to analyze is missing or that the environment fails to reflect changing business requirements, while the IT operations teams that provide it struggle to deliver an analytics environment in time as those requirements change. As a solution, this session presents a case study of building an analytics environment on the AWS Cloud that can consolidate structured and unstructured data, whether it exists as databases in operational systems or as files created by hand on business users' PCs, and that makes it far more flexible to scale and change the underlying infrastructure.
Replay link: https://youtu.be/YvYfNZHMJkI
Collecting and analyzing large amounts of data has become essential for many organizations. Data lakes have become a popular strategy for storing all kinds of structured and unstructured data and centralizing it in a single source. Join this webinar to discover how you can easily build and manage a secure data lake using AWS services.
Every day, businesses across a wide variety of industries share data to support insights that drive efficiency and new business opportunities. However, existing methods for sharing data involve great effort on the part of data providers to share data, and involve great effort on the part of data customers to make use of that data.
Existing approaches to data sharing (such as e-mail, FTP, EDI, and APIs) carry significant overhead and friction. For one, legacy approaches such as e-mail and FTP were never intended to support today's big data volumes. Other data sharing methods involve similarly enormous effort. All of these methods require not only that the data be extracted, copied, transformed, and loaded, but also that the related schemas and metadata be transported as well. This creates a burden on data providers to deconstruct and stage data sets, and that burden is mirrored for the data recipient, who must reconstruct the data.
As a result, companies are handicapped in their ability to fully realize the value in their data assets.
Snowflake Data Sharing allows companies to grant instant access to ready-to-use data to any number of partners or data customers without any data movement, copying, or complex pipelines.
Using Snowflake Data Sharing, companies can derive new insights and value from data much more quickly and with significantly less effort than current data sharing methods. As a result, companies now have a new approach and a powerful new tool to get the full value out of their data assets.
AWS October Webinar Series - Introducing Amazon QuickSight (Amazon Web Services)
Amazon QuickSight is a very fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data.
In this webinar, we will demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We will also introduce SPICE, a new Super-fast, Parallel, In-memory, Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users. Lastly, you will see how Amazon QuickSight provides you with smart visualizations and graphs that are optimized for your different data types, to ensure the most suitable and appropriate visualization to conduct your analysis, and how to share these visualization stories using the built-in collaboration tools.
An introduction to the Snowflake data warehouse and its architecture for a big data company: centralized data management, Snowpipe and the COPY INTO command for data loading, stream loading, and batch processing.
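To make the loading part of that introduction concrete, here is a sketch of the COPY INTO statement used for batch loading from a stage; Snowpipe runs essentially the same COPY definition continuously as new files arrive. The table, stage, and file-format names are hypothetical:

```python
# Build a Snowflake COPY INTO statement for loading staged files into a table.
# ON_ERROR = 'CONTINUE' skips bad rows instead of failing the whole load.
def copy_into(table: str, stage: str, file_format: str) -> str:
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}') "
        f"ON_ERROR = 'CONTINUE'"
    )

stmt = copy_into("raw.events", "events_stage", "json_format")
```

The difference between the two modes is orchestration, not syntax: batch COPY runs on a schedule you control, while Snowpipe triggers the load per arriving file.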
In part one, you will learn about the benefits of moving Oracle Database workloads to AWS, licensing, and key aspects to consider. Part two covers how to execute migrations, key success factors, and a demonstration.
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3.
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent (Amazon Web Services)
Learn about architecture best practices for combining AWS storage and database technologies. We outline AWS storage options (Amazon EBS, Amazon EC2 instance storage, Amazon S3, and Amazon Glacier) along with AWS database options including Amazon ElastiCache (in-memory data store), Amazon RDS (SQL database), Amazon DynamoDB (NoSQL database), Amazon CloudSearch (search), Amazon EMR (Hadoop), and Amazon Redshift (data warehouse). Then we discuss how to architect your database tier by using the right database and storage technologies to achieve the required functionality, performance, availability, and durability—at the right cost.
In this session, you will learn how to easily access your data on S3, and how to visualize and generate insights from Amazon Athena and other data sources through Amazon QuickSight. In addition, we will share some tips & best practices for using Athena & QuickSight.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Amazon QuickSight is a fast, cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from various data sources (Amazon Redshift, Amazon Athena, Amazon EMR, Amazon RDS, and more).
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL. (Amazon Web Services)
Amazon Athena is a new interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3.
In this session, we will show you how easy it is to start querying your data stored in Amazon S3 with Amazon Athena. First, we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Finally, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL (Amazon Web Services)
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3.
In this webinar, we will show you how easy it is to start querying your data stored in Amazon S3, with Amazon Athena. First we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Then, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
Learning Objectives:
• Learn about the capabilities and features of Amazon Athena
• Understand the different use cases
• Describe how to run queries and options to store and visualize results
• Understand integration with other AWS big data services such as Amazon QuickSight
Conceptually, a data lake is a flat data store that collects data in its original form, without the need to enforce a predefined schema. Instead, new schemas or views are created “on demand”, providing a far more agile and flexible architecture while enabling new types of analytical insights. AWS provides many of the building blocks required to help organizations implement a data lake. In this session, we will introduce key concepts for a data lake and present aspects related to its implementation. We will discuss critical success factors and pitfalls to avoid, as well as operational aspects such as security, governance, search, indexing, and metadata management. We will also provide insight into how AWS enables a data lake architecture.
A data lake is a flat data store to collect data in its original form, without the need to enforce a predefined schema. Instead, new schemas or views are created “on demand”, providing a far more agile and flexible architecture while enabling new types of analytical insights. AWS provides many of the building blocks required to help organizations implement a data lake. In this session, we introduce key concepts for a data lake and present aspects related to its implementation. We discuss critical success factors and pitfalls to avoid, as well as operational aspects such as security, governance, search, indexing, and metadata management. We also provide insight on how AWS enables a data lake architecture. Attendees get practical tips and recommendations to get started with their data lake implementations on AWS.
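Schema-on-read, the idea these abstracts lean on, can be shown in miniature: raw records keep their original shape in the lake, and a "view" is just a projection applied at query time, so new views can be defined on demand without rewriting stored data. The record shapes below are invented for illustration:

```python
import json

# Raw records land in the lake as-is; fields may vary and nothing is rejected,
# because no schema is enforced at write time.
raw_records = [
    '{"user": "a", "event": "click", "ts": 1, "extra": {"page": "/"}}',
    '{"user": "b", "event": "view", "ts": 2}',
]

def view(records, fields):
    """Schema-on-read: project only the requested fields at query time,
    tolerating records where a field is absent."""
    return [{f: json.loads(r).get(f) for f in fields} for r in records]

events = view(raw_records, ["user", "event"])
```

A different consumer can call `view` with a different field list tomorrow, which is the agility the abstract contrasts against a warehouse's fixed, write-time schema.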
AWS March 2016 Webinar Series: Building Your Data Lake on AWS (Amazon Web Services)
Uncovering new, valuable insights from big data requires organizations to collect, store, and analyze increasing volumes of data from multiple, often disparate sources at disparate points in time. This makes it difficult to handle big data with data warehouses or relational database management systems alone.
A Data Lake allows you to store massive amounts of data in its original form, without the need to enforce a predefined schema, enabling a far more agile and flexible architecture, which makes it easier to gain new types of analytical insights from your data.
In this webinar, we will introduce key concepts of a Data Lake and present aspects related to its implementation. We will discuss critical success factors, pitfalls to avoid as well as operational aspects such as security, governance, search, indexing and metadata management.
Learning Objectives:
• Learn how AWS can help enable a Data Lake architecture
• Understand some of the key architectural considerations when building a Data Lake
• Hear some of the important Data Lake implementation considerations
Who Should Attend:
• Data architects, data scientists, advanced AWS developers
In this session, storage experts will walk you through the object storage offering, Amazon S3, a bulk data repository that can deliver 99.999999999% durability and scale past trillions of objects worldwide. Learn about the different ways you can accelerate data transfer to S3 and get a close look at some of the new tools available for you to secure and manage your data more efficiently. See how you can use Amazon Athena, announced at re:Invent 2016, with S3 to run serverless analytics on your data and, as a bonus, walk away with some code snippets to use with S3. Hear AWS customers talk about the solutions they have built with S3 to turn their data into a strategic asset, instead of just a cost center. Bring your toughest questions to our experts on hand, and walk away that much smarter on how to use object storage from AWS.
SQL Bits 2020: Designing Performant and Scalable Data Lakes using Azure Data... (Rukmani Gopalan)
Cloud Storage is evolving rapidly, and our Azure Storage portfolio has added a ton of new industry leading capabilities. In this session you will learn the do's and don'ts of building data lakes on Azure Data Lake Storage. You will learn about the commonly used patterns, how to set up your accounts and pipelines to maximize performance, how to organize your data and various options to secure access to your data. We will also cover customer use cases and highlight planned enhancements and upcoming features.
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS... (Amazon Web Services)
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Speaker: Xiaoyong Han, Solution Architect, AWS
Data collection and storage is a primary challenge for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• Overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• Leveraging Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection
Today organizations find themselves in a data-rich world with a growing need for increased agility and accessibility of all this data for analysis and deriving keen insights to drive strategic decisions. Creating a data lake helps you to manage all the disparate sources of data you are collecting (in its original format) and extract value. In this session, learn how to architect and implement an analytics data lake. Hear customer examples of best practices and learn from their architectural blueprints.
Antoine Genereux takes us on a detailed overview of the Database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017 (Amazon Web Services)
In today’s session we will share with you an overview of the typical challenges when adopting Big Data, and how the AWS Big Data platform allows you to tackle these challenges and leverage the right analytical/Big Data solutions in order to become successful with your strategy (whiteboard presentation).
11. Introducing Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL.
12. Athena is Serverless
• No infrastructure or administration
• Zero spin-up time
• Transparent upgrades
13. Amazon Athena is Easy To Use
• Log into the console
• Create a table:
  • Type in a Hive DDL statement
  • Use the console Add Table wizard
  • Use tables from Glue’s Data Catalog
• Start querying
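As an illustration of the Hive DDL route above, the sketch below assembles a minimal CREATE EXTERNAL TABLE statement for CSV data in S3. The bucket, table, and column names are hypothetical; the statement is built as a Python string so the pieces are easy to see:

```python
# Minimal Hive DDL for an external CSV table in Athena. Bucket and column
# names are hypothetical. EXTERNAL means dropping the table never touches
# the underlying S3 data.
columns = {"request_ts": "string", "user_id": "bigint", "url": "string"}
location = "s3://my-raw-logs/access/"  # hypothetical bucket

col_defs = ",\n  ".join(f"{name} {typ}" for name, typ in columns.items())
ddl = (
    "CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (\n"
    f"  {col_defs}\n"
    ")\n"
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
    f"LOCATION '{location}';"
)
print(ddl)
```

Pasting the resulting statement into the Athena query editor registers the table in the catalog, after which it can be queried immediately.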
14. Amazon Athena is Highly Available
• You connect to a service endpoint or log into the console
• Athena uses warm compute pools across multiple Availability Zones
• Your data is in Amazon S3, which is also highly available and designed for 99.999999999% durability
15. Query Data Directly from Amazon S3
• No loading of data; query data in its raw format
  • Avro, text, CSV, JSON, weblogs, AWS service logs
  • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost
• No ETL required
• Stream data directly from Amazon S3
• Take advantage of Amazon S3 durability and availability
16. Familiar Technologies Under the Covers
Presto, used for SQL queries:
• In-memory distributed query engine
• ANSI-SQL compatible with extensions
Hive, used for DDL functionality:
• Complex data types
• Multitude of formats
• Supports data partitioning
• EXTERNAL tables – no impact on underlying data
17. Use ANSI SQL
• Start writing ANSI SQL
• Support for complex joins, nested queries & window functions
• Support for complex data types (arrays, structs)
• Support for partitioning of data by any key (date, time, custom keys), e.g. Year, Month, Day, Hour or Customer Key, Date
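Partitioning works because Athena prunes partitions laid out as Hive-style key=value prefixes in S3, scanning only the prefixes a query's filter selects. A small sketch of building such a prefix for one day of data (the bucket and table names are hypothetical):

```python
# Athena prunes partitions laid out as Hive-style key=value prefixes in S3.
# A WHERE clause on year/month/day then scans only the matching prefixes.
from datetime import date

def partition_prefix(bucket, table, day):
    """S3 prefix for one day of a year/month/day-partitioned table."""
    return (f"s3://{bucket}/{table}/"
            f"year={day.year}/month={day.month:02d}/day={day.day:02d}/")

prefix = partition_prefix("my-raw-logs", "access_logs", date(2017, 6, 1))
print(prefix)  # s3://my-raw-logs/access_logs/year=2017/month=06/day=01/
```

A query filtered with `WHERE year = '2017' AND month = '06' AND day = '01'` would then read only the objects under that prefix, which is the main lever behind the cost and speed advice later in the deck.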
18. Amazon Athena Supports Multiple Data Formats
• Text files, e.g. CSV, TSV, custom delimiter
• Apache web logs, CloudTrail logs
• JSON (simple, nested), Avro
• Columnar formats, e.g. Apache Parquet & Apache ORC
• Logstash Grok for unstructured text files
• Compressed files (Snappy, Zlib, GZIP, and LZO)
• Encrypted data (SSE-S3, SSE-KMS, CSE-KMS)
Tip: use large (128 MB – 1 GB) compressed files.
19. Amazon Athena is Fast
• Tuned for performance
• Automatically parallelizes queries
• Results are streamed to the console and also stored in S3
• Improve query performance:
  • Compress your data
  • Use columnar formats
  • Partition your data
20. Amazon Athena is Cost Effective
• Pay per query: $5 per TB scanned from S3
• DDL queries and failed queries are free
• Reduce costs:
  • Compress your data
  • Use columnar formats
  • Partition your data
21. Apache Parquet and Apache ORC – Columnar Formats
PARQUET
• Columnar (column major) format
• Schema segregated into footer
• All data is pushed to the leaf
• Integrated compression and indexes
• Support for predicate pushdown
ORC
• Apache top-level project
• Schema segregated into footer
• Column major with stripes
• Integrated compression, indexes, and stats
• Support for predicate pushdown
22. Pay By the Query - $5/TB Scanned
• Pay by the amount of data scanned per query
• Ways to save costs: compress, convert to columnar format, use partitioning
• Free: DDL queries, failed queries

Dataset                               | Size on Amazon S3     | Query run time | Data scanned          | Cost
Logs stored as text files             | 1 TB                  | 237 seconds    | 1.15 TB               | $5.75
Logs stored in Apache Parquet format* | 130 GB                | 5.13 seconds   | 2.69 GB               | $0.013
Savings                               | 87% less with Parquet | 34x faster     | 99% less data scanned | 99.7% cheaper
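The cost column above follows directly from the pay-per-TB-scanned model. A small sketch reproducing the figures, assuming a flat $5 per TB and the commonly documented 10 MB minimum billable scan per query:

```python
# Reproducing the cost column from the $5-per-TB-scanned price.
# Assumes the commonly documented 10 MB per-query minimum billable scan.
PRICE_PER_TB = 5.0
MIN_BYTES = 10 * 1024**2  # 10 MB minimum per query

def athena_cost(bytes_scanned):
    """Approximate Athena cost in USD for one query."""
    return max(bytes_scanned, MIN_BYTES) / 2**40 * PRICE_PER_TB

text_cost = athena_cost(1.15 * 2**40)     # 1.15 TB scanned as text
parquet_cost = athena_cost(2.69 * 2**30)  # 2.69 GB scanned as Parquet
print(round(text_cost, 2), round(parquet_cost, 3))  # 5.75 0.013
```

The two results match the table: converting to Parquet cuts the bill not by changing the price, but by shrinking the bytes a query has to scan.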
23. Converting to ORC and PARQUET
• Use Glue!
• You can use Hive CTAS to convert data:
  CREATE TABLE new_key_value_store
  STORED AS PARQUET
  AS
  SELECT col_1, col2, col3 FROM noncolumnartable
  SORT BY new_key, key_value_pair;
• You can also use Spark to convert the files into Parquet / ORC
  • 20 lines of PySpark code, running on EMR
  • Converts 1 TB of text data into 130 GB of Parquet with Snappy compression
  • Total cost $5
https://github.com/awslabs/aws-big-data-blog/tree/master/aws-blog-spark-parquet-conversion
24. Start Querying with Amazon Athena
• Review console
• Run Glue crawler to create canonical table definition
• Change datatypes
• Run some simple queries
(Diagram: the Glue Data Catalog describes both the raw S3 data and the canonical data; an ETL job converts raw data to canonical form; Amazon Athena queries it and Amazon QuickSight uses Athena.)
25. ETL Uber Data into Parquet
glueContext.write_dynamic_frame.from_options(
    frame = datasource1,
    connection_type = "s3",
    connection_options = {"path": "s3://ianrob-nyc-transportation"},
    format = "parquet",
    transformation_ctx = "datasink4")
27. Using Amazon Athena with Amazon QuickSight
QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources, including:
• Amazon RDS
• Amazon S3
• Amazon Redshift
• Amazon Athena
29. AWS Glue Regional Availability Plan
Planned schedule | Regions
At launch        | US East (N. Virginia)
Q3 2017          | US East (Ohio), US West (Oregon)
Q4 2017          | EU (Ireland), Asia Pacific (Tokyo), Asia Pacific (Sydney)
2018             | Rest of the public regions