This document provides an overview of data warehousing and data lake concepts. It discusses the key stages of the data lifecycle: collection, storage, analysis, and consumption. Different data storage options such as Amazon S3, DynamoDB, and Redshift are presented, along with considerations for choosing a tool based on data characteristics. The document also covers stream storage options and best practices for building cost-conscious, decoupled data architectures.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g., table definitions and schemas) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
Ryan Malecky - Solutions Architect, EdTech, AWS
Rajakumar Sampathkumar - Sr. Technical Account Manager, AWS
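To make the cataloging step concrete: a crawler samples records and records a name and a type for each field it finds. The following is a hypothetical toy model of that idea for newline-delimited JSON, not how AWS Glue crawlers are implemented (real crawlers use classifiers and handle nested, partitioned, and non-JSON data). The type names simply follow Hive/Athena conventions, and the widening rules are an assumption chosen for illustration.

```python
import json
from typing import Any


def infer_type(value: Any) -> str:
    """Map a value parsed from JSON to a Hive/Athena-style column type name."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    # Strings, and (in this simplified sketch) nested objects and arrays.
    return "string"


def infer_schema(json_lines: list[str]) -> dict[str, str]:
    """Infer a flat column -> type mapping from newline-delimited JSON records."""
    numeric = {"bigint", "double"}
    schema: dict[str, str] = {}
    for line in json_lines:
        record = json.loads(line)
        for column, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            observed = infer_type(value)
            previous = schema.get(column)
            if previous is None:
                schema[column] = observed
            elif previous != observed:
                # Widen mixed numeric columns to double; any other conflict becomes string.
                schema[column] = "double" if {previous, observed} <= numeric else "string"
    return schema
```

For example, two records `{"id": 1, "price": 9.5}` and `{"id": 2, "price": 10}` yield `price` as `double`, because the integer and float observations are widened to the more general numeric type.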
This document provides an introduction to AWS Glue. It notes that ETL development consumes, on average, 70% of data warehouse resources. AWS Glue is a fully managed ETL service that runs ETL processes on a serverless Apache Spark environment. Its features include a data catalog, job authoring tools that generate Python/Spark code, and job execution on serverless Spark. Use cases include understanding your data, querying data lakes on S3, and building event-driven ETL pipelines. The presentation also demonstrates AWS Glue and reviews its pricing.
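In an event-driven ETL pipeline, a new object landing in S3 typically triggers a function (for example, an AWS Lambda function) that starts an ETL job for that object. The sketch below shows only the parsing step, using the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`). Starting the job itself (for instance through Glue's StartJobRun API) is not shown, and the `--source_path` argument name and bucket names are hypothetical.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def build_job_arguments(bucket: str, key: str) -> dict[str, str]:
    """Hypothetical mapping from a new S3 object to arguments for an ETL job run."""
    # Glue job arguments are conventionally passed with a leading '--';
    # the argument name itself is an illustrative assumption.
    return {"--source_path": f"s3://{bucket}/{key}"}
```

Keeping the parsing separate from the job-start call makes the handler easy to test locally with a sample event, without AWS credentials.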
by Joyjeet Banerjee, Solutions Architect, AWS
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no infrastructure to set up or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena; it works directly with data stored in S3.
Level: 200
In this session, we will show you how easy it is to start querying data stored in Amazon S3 with Amazon Athena. First, we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Finally, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
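Creating a schema for data already in S3 means issuing a `CREATE EXTERNAL TABLE` statement that points at an S3 prefix. The helper below is a minimal sketch that assembles such a statement for JSON data using the OpenX JSON SerDe; the database, table, column, and bucket names in any call are hypothetical, and you would run the resulting SQL in the Athena query editor or through its API.

```python
from typing import Optional


def athena_json_table_ddl(
    database: str,
    table: str,
    columns: dict[str, str],
    s3_location: str,
    partitions: Optional[dict[str, str]] = None,
) -> str:
    """Build a CREATE EXTERNAL TABLE statement for JSON files under an S3 prefix."""
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    if not s3_location.endswith("/"):
        s3_location += "/"  # Athena expects LOCATION to point at a folder-style prefix
    cols = ",\n".join(f"  `{name}` {ctype}" for name, ctype in columns.items())
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n{cols}\n)"
    if partitions:
        # Partition columns are declared separately, not repeated in the column list.
        parts = ", ".join(f"`{name}` {ctype}" for name, ctype in partitions.items())
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += "\nROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'"
    ddl += f"\nLOCATION '{s3_location}';"
    return ddl
```

Partitioning ties directly to cost: Athena charges by data scanned, and a query that filters on a partition column only reads the matching prefixes. If the data is laid out with Hive-style prefixes such as `dt=2018-01-01/`, partitions can then be registered with `MSCK REPAIR TABLE`.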
Going Serverless - an Introduction to AWS Glue (Michael Rainey)
Going "serverless" is the latest technology trend for enterprises moving their processing to the cloud, including data integration and ETL tools. But what does that mean, and when should I use serverless ETL? In this session, we'll dive into the world of Amazon's fully managed data processing service, AWS Glue. With no servers to provision or resources to allocate, and an easy-to-populate metadata catalog, AWS Glue allows data engineers to focus on their craft: building data transformations and pipelines. Understanding the similarities and differences between Glue and traditional ETL tools such as Oracle Data Integrator will prepare attendees for the new world of data integration. Presented at Collaborate 18.
Introduction to AWS Glue: Data Analytics Week at the San Francisco Loft (Amazon Web Services)
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g., table definitions and schemas) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
John Mallory - Principal Business Development Manager, Storage, AWS
Asim Kumar Sasmal - Big Data Consultant, AWS Professional Services
AWS Glue is a fully managed, serverless extract, transform, and load (ETL) service that makes it easy to move data between data stores. AWS Glue simplifies and automates the difficult and time-consuming tasks of data discovery, conversion mapping, and job scheduling so you can spend more of your time querying and analyzing your data using Amazon Redshift Spectrum and Amazon Athena. In this session, we introduce AWS Glue, provide an overview of its components, and share how you can use AWS Glue to automate discovering your data, cataloging it, and preparing it for analysis.
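"Conversion mapping" in scripts generated by AWS Glue is usually expressed with the `ApplyMapping` transform, which takes tuples of (source field, source type, target field, target type). The sketch below emulates that idea on plain Python dictionaries so the semantics are easy to see; it is not Glue's DynamicFrame API, and the `CASTS` table and field names are hypothetical.

```python
from typing import Any, Callable

# Hypothetical converters keyed by target type name.
CASTS: dict[str, Callable[[Any], Any]] = {
    "string": str,
    "long": int,
    "double": float,
}

# (source name, source type, target name, target type), mirroring ApplyMapping's tuples.
Mapping = tuple[str, str, str, str]


def apply_mapping(records: list[dict[str, Any]], mappings: list[Mapping]) -> list[dict[str, Any]]:
    """Rename and cast fields; fields without a mapping are dropped."""
    out = []
    for record in records:
        row: dict[str, Any] = {}
        for src_name, _src_type, dst_name, dst_type in mappings:
            value = record.get(src_name)
            # Missing or null source fields become None in the output.
            row[dst_name] = None if value is None else CASTS[dst_type](value)
        out.append(row)
    return out
```

As in the Glue transform, any source field that is not listed in the mappings (such as an `extra` column) does not appear in the output, which is a convenient way to project and rename in one step.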
(ISM213) Building and Deploying a Modern Big Data Architecture on AWS (Amazon Web Services)
The AWS platform enables large enterprises to use data to solve business problems and uncover opportunities more easily and affordably than ever before. However, to truly take advantage of AWS, enterprises need a way to collect, store, process, analyze, and continually execute on their data.
Datapipe has been an AWS partner for more than five years. In that time, it has developed a proprietary process for the deployment of AWS environments, as well as the processing and evaluation of big data analytics to optimize these environments over time. This flexible solution includes automation tools, continuous monitoring, and cloud analytics. It protects against architectural sprawl and continually redesigns for scalability. This kind of continuous build environment allows Datapipe to examine the AWS environment as a complete picture and ensure the cloud environment is running as efficiently and effectively as possible, ultimately reducing overhead costs for the enterprise.
In this session, Jason Woodlee, Senior Director of Cloud Products at Datapipe, will discuss the technical details of designing and deploying a modern big data architecture on AWS, including application purpose and design, development environment and language overview, DevOps automation best practices, and continuous build and test frameworks. Session sponsored by Datapipe.
Today, organizations find themselves in a data-rich world, with a growing need for more agile and accessible data that they can analyze to derive insights and drive strategic decisions. Creating a data lake helps you manage all the disparate sources of data you collect, keep that data in its original format, and extract value from it. In this session, learn how to architect and implement an Analytics Data Lake. Hear customer examples of best practices and learn from their architectural blueprints.
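Keeping data "in its original format" usually comes down to an S3 key layout. One common convention, shown here as a hypothetical sketch rather than anything the session prescribes, is a raw zone organized by source and by Hive-style date partitions, so that catalog crawlers and Athena can recognize the partitions later. The `raw/` prefix and source names are illustrative assumptions.

```python
from datetime import datetime, timezone


def raw_zone_key(source: str, filename: str, received_at: datetime) -> str:
    """Build an S3 object key that stores a file unchanged under date partitions."""
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    # Normalize to UTC so files from different regions land in consistent partitions.
    ts = received_at.astimezone(timezone.utc)
    return (
        f"raw/{source}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/{filename}"
    )
```

The file itself is written byte-for-byte as received; transformed or curated copies would go under a separate prefix, so the original always remains available for reprocessing.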
By using a Data Lake, you no longer need to worry about structuring or transforming data before storing it. A Data Lake on AWS enables your organization to analyze data more rapidly, helping you quickly discover new business insights. Join us for our webinar to learn about the benefits of building a Data Lake on AWS and how your organization can begin reaping its rewards. In this session, we will share a methodology for implementing a Data Lake on AWS and best practices for getting the most from your Data Lake.
Speaker: Russell Nash, APAC Solution Architect, DW, AWS APAC
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac... (Amazon Web Services)
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
This document discusses a presentation about big data and analytics on AWS. It describes what big data is, provides examples of AWS services for ingesting, storing, processing, analyzing and visualizing big data. It also provides examples of industries using AWS for data analysis and discusses Amazon Kinesis for real-time processing of streaming data. Finally, it discusses putting the various AWS services together in an end-to-end big data workflow.
Taking the Performance of your Data Warehouse to the Next Level with Amazon R... (Amazon Web Services)
Amazon Redshift gives you fast SQL query performance on large data sets. We will discuss optimisation from end to end, all the way from loading through to querying to ensure your end users get the data they need, when they need it.
Speaker: Russell Nash, Solutions Architect, Amazon Web Services
Featured Customer - Domain
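On the loading side, general Amazon Redshift guidance is to load from S3 with `COPY` and to split the input into several files whose count is a multiple of the cluster's slice count, so every slice shares the work. The sketch below applies that rule and builds a matching `COPY` statement; the table name, S3 prefix, IAM role ARN, and target file size are hypothetical parameters, and the talk itself may recommend different specifics.

```python
import math


def split_file_count(total_bytes: int, slices: int, target_file_bytes: int) -> int:
    """Choose a file count near the target size that is a multiple of the slice count."""
    if slices <= 0 or target_file_bytes <= 0:
        raise ValueError("slices and target_file_bytes must be positive")
    raw_count = max(1, math.ceil(total_bytes / target_file_bytes))
    # Round up so each slice receives the same number of files to load in parallel.
    return math.ceil(raw_count / slices) * slices


def copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """COPY every object under the prefix (gzip-compressed CSV assumed)."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "GZIP;"
    )
```

For example, about 10 GB of data on a cluster with 8 slices and a 1 GB target per file gives 16 files rather than 10, because 10 files would leave some slices idle during the second round.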
AWS Data Services provide a suite of serverless analytics tools including Amazon Athena for interactive SQL queries, AWS Glue for ETL and data cataloging, and Amazon S3 for exabyte-scale data storage. Together these services enable building a data lake architecture for ingesting, storing, discovering, and analyzing all types of data at scale.
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech Talks (Amazon Web Services)
Learning Objectives:
- Discover dark data that you are currently not analyzing.
- Analyze dark data without moving it into your data warehouse.
- Visualize the results of your dark data analytics.
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |... (Amazon Web Services)
AWS offers many data services, each optimized for a specific set of structure, size, latency, and concurrency requirements. Making the best use of all these specialized services has historically required custom, error-prone data transformation and transport. Now, users can use the AWS Data Pipeline service to orchestrate data flows between Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon Redshift, and on-premises data stores, seamlessly and efficiently applying EC2 instances and EMR clusters to process and transform data. In this session, we demonstrate how you can use AWS Data Pipeline to coordinate your Big Data workflows, applying the optimal data storage technology to each part of your data integration architecture. Swipely's Head of Engineering shows how Swipely uses AWS Data Pipeline to build batch analytics, backfilling all their data, while using resources efficiently. As a result, Swipely launches novel product features with less development time and less operational complexity.
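Orchestrating data flows comes down to running each activity only after the activities it depends on have finished. That dependency reasoning can be sketched with Python's standard `graphlib` module; the activity names below are hypothetical, and this is not AWS Data Pipeline's pipeline-definition format, just the scheduling idea behind it.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every activity follows its prerequisites."""
    # Raises graphlib.CycleError if the dependencies contain a cycle.
    return list(TopologicalSorter(dependencies).static_order())


def parallel_stages(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group activities into stages whose members could run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)  # pretend the whole stage finished before the next one
    return stages
```

With a hypothetical pipeline in which `emr_transform` depends on two copy activities and `load_redshift` depends on `emr_transform`, `parallel_stages` returns the two copies as the first stage, the EMR step second, and the Redshift load last.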
Choosing the Right Database for the Job: Relational, Cache, or NoSQL? (Amazon Web Services)
Developers and DBAs from a traditional relational background are spoilt for choice when looking to integrate caching and NoSQL into an application architecture to solve scaling problems and reduce costs. Even when using relational databases there are 3 managed database services on AWS for the MySQL engine alone. Trying to evaluate all the options often creates analysis paralysis, resulting in a reluctance to try something new or different. This session will guide you through a series of use cases that use different databases to solve business problems that customers face today.
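A frequent first step when adding caching to a relational application is the cache-aside pattern: read from the cache, fall back to the database on a miss, and store the result with an expiry. The sketch below is a minimal in-process illustration in which a dictionary stands in for a cache service such as Amazon ElastiCache and the `load` callable stands in for a database query; both stand-ins and the TTL value are assumptions for illustration.

```python
import time
from typing import Any, Callable


class CacheAside:
    """Application-managed cache in front of a slower data store (cache-aside pattern)."""

    def __init__(
        self,
        load: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit that has not expired yet
        value = self._load(key)  # miss or expired entry: query the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to the database so the next read reloads fresh data.
        self._entries.pop(key, None)
```

The injectable `clock` exists so that expiry can be tested deterministically; in production the default monotonic clock would be used.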
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora: a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak, Sr. Manager of Software Development
Data warehousing is a critical component for analysing and extracting actionable insights from your data. Amazon Redshift allows you to deploy a scalable data warehouse in a matter of minutes and start analysing your data right away using your existing business intelligence tools.
This document discusses Amazon Athena and AWS Glue. It provides an overview of Athena and how it uses Presto to query data in S3. It then covers common Athena issues, such as 503 Slow Down errors and the requirement that each JSON record sit on a single line. It also covers common AWS Glue issues, such as crawler problems and ETL jobs that run slowly or fail. Recommendations include using partitions, enabling job metrics, and contacting support with specific logs and information.
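Because Athena's JSON SerDes read one record per line, a pretty-printed JSON file (for example, a top-level array spread over many lines) must be rewritten before it can be queried. A minimal sketch of that conversion, assuming each file fits in memory and contains either a JSON array of records or a single object:

```python
import json


def to_json_lines(text: str) -> str:
    """Rewrite pretty-printed JSON (an array of records or one object) as one record per line."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    # Compact separators remove optional whitespace; json.dumps escapes any newline
    # inside string values, so each record is guaranteed to occupy a single line.
    return "\n".join(
        json.dumps(record, separators=(",", ":"), ensure_ascii=False) for record in records
    ) + "\n"
```

For large files, a streaming parser would be a better fit than `json.loads`, but the output format (newline-delimited JSON) is the same.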
Amazon Kinesis is a platform for streaming data ingestion, processing, and analytics on AWS. The presentation discusses three Amazon Kinesis services - Kinesis Streams, Kinesis Firehose, and Kinesis Analytics. It provides an overview of each service and examples of how customers use streaming data and these services for applications like IoT, online gaming, advertising, and financial services. It also includes a demo of building a serverless IoT analytics solution on AWS using these streaming data services.
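In Kinesis Streams, the partition key chosen by the producer determines which shard receives a record: the key is hashed with MD5 to a 128-bit integer, and the record goes to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space evenly (as for a newly created stream that has not been resharded) and approximates the range boundaries with integer arithmetic; interpreting the digest as a big-endian integer is also an assumption.

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition keys map into a 128-bit hash key space


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard owning this key, assuming evenly split hash key ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

This is why high-cardinality partition keys (such as a device or user ID) spread load across shards, while a single constant key sends every record to the same shard.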
Serverless Streaming Data Processing using Amazon Kinesis Analytics (Amazon Web Services)
by Adrian Hornsby, Technical Evangelist, AWS
As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real time with standard SQL; there is no need to learn new programming languages or processing frameworks. In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries over streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
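Streaming SQL queries typically aggregate rows over time windows. The simplest kind is a tumbling window: fixed-size, non-overlapping intervals, where each event belongs to exactly one window. The sketch below shows those semantics in plain Python over a finite list of (timestamp in seconds, key) events; it is an illustration of the concept, not Kinesis Analytics code, and a real streaming engine would emit each window's result as the window closes and would also have to handle late-arriving data.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_counts(
    events: Iterable[tuple[float, str]], window_seconds: int
) -> dict[tuple[int, str], int]:
    """Count events per (window start, key) in fixed, non-overlapping time windows."""
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Floor the timestamp to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

With 60-second windows, events at 0.5 s and 59.9 s fall into the window starting at 0, while an event at exactly 60.0 s starts the next window.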
AWS Webcast - Managing Big Data in the AWS Cloud_20140924 (Amazon Web Services)
This presentation deck will cover specific services such as Amazon S3, Kinesis, Redshift, Elastic MapReduce, and DynamoDB, including their features and performance characteristics. It will also cover architectural designs for the optimal use of these services based on dimensions of your data source (structured or unstructured data, volume, item size and transfer rates) and application considerations - for latency, cost and durability. It will also share customer success stories and resources to help you get started.
by Ben Willett, Solutions Architect, AWS
How do you get data from your sources into your Redshift data warehouse? We'll show how to use AWS Glue and Amazon Kinesis Firehose to make it easy to automate the work to get data loaded.
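Kinesis Firehose buffers incoming records and delivers a batch when either a configured buffer size or a buffer interval is reached, whichever comes first; for Redshift destinations, it stages each batch in S3 and then issues a `COPY`. The class below is a hypothetical sketch of that size-or-time buffering rule, useful for reasoning about delivery latency and batch sizes; it is not Firehose's implementation, and the thresholds are arbitrary parameters.

```python
import time
from typing import Callable, Optional


class SizeOrTimeBuffer:
    """Flush buffered records when a byte limit or an age limit is hit, whichever is first."""

    def __init__(
        self,
        max_bytes: int,
        max_seconds: float,
        flush: Callable[[bytes], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_bytes = max_bytes
        self._max_seconds = max_seconds
        self._flush = flush
        self._clock = clock
        self._records: list[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None  # when the current batch started

    def add(self, record: bytes) -> None:
        if not self._records:
            self._opened_at = self._clock()
        self._records.append(record)
        self._size += len(record)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically so a quiet stream still flushes on the time limit."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self._records:
            return
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._opened_at >= self._max_seconds
        if too_big or too_old:
            self._flush(b"".join(self._records))
            self._records, self._size, self._opened_at = [], 0, None
```

The trade-off this models: larger buffers produce fewer, bigger files (efficient for `COPY`), while shorter intervals reduce the delay before data becomes queryable.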
Two years ago, if someone had claimed they could stand up a petabyte-scale data warehouse in under an hour and then have a non-technical business user querying it live 30 minutes later without knowing SQL or any coding language, they would have been laughed out of the room. These days, that’s called taking advantage of disruptive technology. Amazon Web Services and Tableau Software have shifted the entire paradigm for how organizations store and access their data and, ultimately, how they innovate with it. Combining the fast, scalable, and inexpensive services that AWS provides for housing data with Tableau’s unbelievably flexible and user-friendly visual analytics solution means that, within hours, an organization can securely put its massive data assets into the hands of its domain experts without expensive overhead or lengthy ramp-up time.
Attend this webinar to learn how Amazon Web Services and Tableau Software are used together every day to:
- Empower visual ad hoc data discovery against big data
- Revolutionize corporate reporting and dashboards
- Promote data-driven decision making at every level
The presentation will include:
- A live demonstration of AWS and Tableau working together
- A real customer case study focused on fraud detection and online video metrics
- Live Q&A and an opportunity to trial both solutions
Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics. AWS services to be covered include Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2 (Amazon Web Services)
This document discusses building data warehouses and data lakes in the cloud using AWS services. It provides an overview of AWS databases, analytics, and machine learning services that can be used to store and analyze data at scale. These services allow customers to migrate existing data warehouses to the cloud, build new data warehouses and data lakes more cost effectively, and gain insights from their data more easily.
Amazon Web Services gives you fast access to flexible and low cost IT resources, so you can rapidly scale and build virtually any big data application including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing regardless of volume, velocity, and variety of data.
https://aws.amazon.com/webinars/anz-webinar-series/
Data Analytics Week at the San Francisco Loft
Data Warehouses and Data Lakes
Organizations use reports, dashboards, and analytics tools to extract insights from their data, monitor performance, and support decision making. To support these tools, data must be collected and prepared for use. We'll look at two approaches: the structured, centralized data repository of a Data Warehouse, and the less-structured repository of a Data Lake. We'll compare these approaches, examine the services that support each, and explore how they work together.
Speakers:
John Mallory - Principal Business Development Manager, Storage, AWS
Hemant Borole - Sr. Big Data Consultant, AWS
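One way to see the difference between the two approaches is where the schema is applied. A warehouse applies it on write: records are validated and converted at load time, and rows that do not fit are rejected. A lake applies it on read: raw records are stored as received, and the schema is applied when a query runs. The sketch below contrasts the two with in-memory lists standing in for the stores; `SCHEMA` and its columns are hypothetical.

```python
import json
from typing import Any, Optional

# Hypothetical schema: column name -> Python type used to convert the value.
SCHEMA = {"order_id": int, "amount": float}


def conform(raw_line: str) -> Optional[dict[str, Any]]:
    """Parse one JSON record and coerce it to SCHEMA, or return None if it does not fit."""
    try:
        record = json.loads(raw_line)
        return {col: typ(record[col]) for col, typ in SCHEMA.items()}
    except (ValueError, KeyError, TypeError):
        return None


def load_into_warehouse(raw_line: str, table: list[dict[str, Any]]) -> bool:
    """Schema-on-write: conform before storing; reject the row if it does not fit."""
    row = conform(raw_line)
    if row is None:
        return False
    table.append(row)
    return True


def store_in_lake(raw_line: str, lake: list[str]) -> None:
    """Schema-on-read: keep the raw record exactly as received."""
    lake.append(raw_line)


def query_lake(lake: list[str]) -> list[dict[str, Any]]:
    """Apply the schema only at query time, skipping records that do not fit."""
    return [row for row in map(conform, lake) if row is not None]
```

The lake keeps every raw record, so a later schema change can be applied to historical data; the warehouse guarantees clean, typed rows but discards whatever failed validation at load time.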
by Amy Che, Sr Solutions Delivery Manager AWS and Marie Yap, Technical Account Manager AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into the Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You will learn how to get started, how to support applications, and how to scale.
By using a Data Lake, you no longer need to worry about structuring or transforming data before storing it. A Data Lake on AWS enables your organization to more rapidly analyze data, helping you quickly discover new business insights. Join us for our webinar to learn about the benefits of building a Data Lake on AWS and how your organization can begin reaping their rewards. In this session, we will share methodology for implementing a Data Lake on AWS and best practices for getting the most from your Data Lake.
Speaker: Russell Nash,
APAC Solution Architect, DW, AWS APAC
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
This document discusses a presentation about big data and analytics on AWS. It describes what big data is, provides examples of AWS services for ingesting, storing, processing, analyzing and visualizing big data. It also provides examples of industries using AWS for data analysis and discusses Amazon Kinesis for real-time processing of streaming data. Finally, it discusses putting the various AWS services together in an end-to-end big data workflow.
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Amazon Web Services
Amazon Redshift gives you fast SQL query performance on large data sets. We will discuss optimisation from end to end, all the way from loading through to querying to ensure your end users get the data they need, when they need it.
Speaker: Russell Nash, Solutions Architect, Amazon Web Services
Featured Customer - Domain
AWS Data Services provide a suite of serverless analytics tools including Amazon Athena for interactive SQL queries, AWS Glue for ETL and data cataloging, and Amazon S3 for exabyte-scale data storage. Together these services enable building a data lake architecture for ingesting, storing, discovering, and analyzing all types of data at scale.
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech TalksAmazon Web Services
Learning Objectives:
- Discover dark data that you are currently not analyzing.
- Analyze dark data without moving it into your data warehouse.
- Visualize the results of your dark data analytics.
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Amazon Web Services
AWS offers many data services, each optimized for a specific set of structure, size, latency, and concurrency requirements. Making the best use of all specialized services has historically required custom, error-prone data transformation and transport. Now, users can use the AWS Data Pipeline service to orchestrate data flows between Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon Redshift, and on-premise data stores, seamlessly and efficiently applying EC2 instances and EMR clusters to process and transform data. In this session, we demonstrate how you can use AWS Data Pipeline to coordinate your Big Data workflows, applying the optimal data storage technology to each part of your data integration architecture. Swipely's Head of Engineering shows how Swipely uses AWS Data Pipeline to build batch analytics, backfilling all their data, while using resources efficiently. Consequently, Swipely launches novel product features with less development time and less operational complexity.
Choosing the Right Database for the Job: Relational, Cache, or NoSQL?Amazon Web Services
Developers and DBAs from a traditional relational background are spoilt for choice when looking to integrate caching and NoSQL into an application architecture to solve scaling problems and reduce costs. Even when using relational databases there are 3 managed database services on AWS for the MySQL engine alone. Trying to evaluate all the options often creates analysis paralysis, resulting in a reluctance to try something new or different. This session will guide you through a series of use cases that use different databases to solve business problems that customers face today.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including, connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including, Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora— a MySQL-compatible, highly-available relational database engine, which provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
Data warehousing is a critical component for analysing and extracting actionable insights from your data. Amazon Redshift allows you to deploy a scalable data warehouse in a matter of minutes and starts to analyse your data right away using your existing business intelligence tools.
This document discusses Amazon Athena and AWS Glue. It provides an overview of Athena and how it uses Presto to query data in S3. It then discusses common issues with Athena like 503 slow down errors and only supporting single line JSON. It also discusses AWS Glue and common issues with crawlers, ETL jobs being slow or failing. It provides recommendations like using partitions, enabling job metrics, and contacting support with specific logs and information.
Amazon Kinesis is a platform for streaming data ingestion, processing, and analytics on AWS. The presentation discusses three Amazon Kinesis services - Kinesis Streams, Kinesis Firehose, and Kinesis Analytics. It provides an overview of each service and examples of how customers use streaming data and these services for applications like IoT, online gaming, advertising, and financial services. It also includes a demo of building a serverless IoT analytics solution on AWS using these streaming data services.
Serverless Streaming Data Processing using Amazon Kinesis Analytics
by Adrian Hornsby, Technical Evangelist, AWS
As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real-time with standard SQL—there is no need to learn new programming languages or processing frameworks. In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
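To make "real-time processing with standard SQL" more concrete, the sketch below shows in plain Python what a tumbling-window aggregation computes: each event falls into exactly one fixed, non-overlapping window, and counts are grouped by window start and key. It illustrates the windowing semantics only and is not Kinesis Analytics code; the event tuples and the `tumbling_window_counts` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping time windows.

    Each event is (epoch_seconds, key). The result maps
    (window_start_epoch_seconds, key) -> count.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        # Align each timestamp to the start of its window (a floor operation).
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0.5, "sensor-a"), (30.0, "sensor-a"), (59.9, "sensor-b"),
          (60.0, "sensor-a"), (125.0, "sensor-b")]
print(tumbling_window_counts(events))
# {(0, 'sensor-a'): 2, (0, 'sensor-b'): 1, (60, 'sensor-a'): 1, (120, 'sensor-b'): 1}
```

A streaming engine emits each window's counts once the window closes rather than after reading all input, but the grouping it performs is the same.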
AWS Webcast - Managing Big Data in the AWS Cloud (2014-09-24)
This presentation deck will cover specific services such as Amazon S3, Kinesis, Redshift, Elastic MapReduce, and DynamoDB, including their features and performance characteristics. It will also cover architectural designs for the optimal use of these services based on dimensions of your data source (structured or unstructured data, volume, item size, and transfer rates) and on application considerations such as latency, cost, and durability. It will also share customer success stories and resources to help you get started.
by Ben Willett, Solutions Architect, AWS
How do you get data from your sources into your Redshift data warehouse? We'll show how to use AWS Glue and Amazon Kinesis Firehose to make it easy to automate the work to get data loaded.
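Kinesis Firehose loads data into Redshift by buffering incoming records until a size or time threshold is reached, staging each batch in S3, and then issuing a Redshift `COPY`. The sketch below imitates only the buffering step. The `MicroBatcher` class, its default thresholds, and the injectable clock are assumptions for illustration, not the Firehose API.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Buffer records; flush when either a size or an age threshold is hit."""

    def __init__(self, flush: Callable[[List[bytes]], None],
                 max_bytes: int = 5 * 1024 * 1024, max_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush              # e.g. "write batch to S3, then COPY"
        self._max_bytes = max_bytes      # illustrative size threshold
        self._max_seconds = max_seconds  # illustrative age threshold
        self._clock = clock
        self._buffer: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def put(self, record: bytes) -> None:
        if self._opened_at is None:
            self._opened_at = self._clock()  # age is measured from the first record
        self._buffer.append(record)
        self._size += len(record)
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._opened_at >= self._max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._flush(self._buffer)
        # Start a fresh buffer so the flushed list is never mutated afterwards.
        self._buffer, self._size, self._opened_at = [], 0, None


batches: List[List[bytes]] = []
batcher = MicroBatcher(batches.append, max_bytes=10, max_seconds=60)
for rec in [b"aaaa", b"bbbb", b"cccc", b"dd"]:
    batcher.put(rec)
batcher.flush()  # drain whatever is left, e.g. on shutdown
print([len(batch) for batch in batches])  # [3, 1]
```

Batching matters here because Redshift loads far more efficiently from a few large `COPY` operations than from many single-row inserts.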
Two years ago, if someone had claimed they could stand up a petabyte-scale data warehouse in under an hour and then have a non-technical business user querying it live 30 minutes later without knowing any SQL or coding language, they would have been laughed out of the room. These days, that's called taking advantage of disruptive technology. Amazon Web Services and Tableau Software have changed how organizations store and access their data and, ultimately, how they innovate with it. The fast, scalable, and inexpensive services that AWS provides for housing data, combined with Tableau's unbelievably flexible and user-friendly visual analytics solution, mean that within hours an organization can securely put the power of its massive data assets into the hands of its domain experts without expensive overhead or lengthy ramp-up time.
Attend this webinar to learn how Amazon Web Services and Tableau Software are used together every day to:
• Empower visual ad hoc data discovery against big data
• Revolutionize corporate reporting and dashboards
• Promote data-driven decision making at every level
The presentation will include:
• A live demonstration of AWS and Tableau working together
• A real customer case study focused on fraud detection and online video metrics
• Live Q&A and an opportunity to trial both solutions
Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics. AWS services to be covered include Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
This document discusses building data warehouses and data lakes in the cloud using AWS services. It provides an overview of AWS databases, analytics, and machine learning services that can be used to store and analyze data at scale. These services allow customers to migrate existing data warehouses to the cloud, build new data warehouses and data lakes more cost effectively, and gain insights from their data more easily.
Amazon Web Services gives you fast access to flexible and low cost IT resources, so you can rapidly scale and build virtually any big data application including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing regardless of volume, velocity, and variety of data.
https://aws.amazon.com/webinars/anz-webinar-series/
Data Analytics Week at the San Francisco Loft
Data Warehouses and Data Lakes
Organizations use reports, dashboards, and analytics tools to extract insights from their data, monitor performance, and support decision making. To support these tools, data must be collected and prepared for use. We'll look at two approaches: the structured, centralized repository of a Data Warehouse and the less-structured repository of a Data Lake. We'll compare these approaches, examine the services that support each, and explore how they work together.
Speakers:
John Mallory - Principal Business Development Manager, Storage, AWS
Hemant Borole - Sr. Big Data Consultant, AWS
by Amy Che, Sr Solutions Delivery Manager AWS and Marie Yap, Technical Account Manager AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon's family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into the Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You will learn how to get started, how to support applications, and how to scale.
by Andre Hass, Specialist Technical Account Manager, AWS
Organizations use reports, dashboards, and analytics tools to extract insights from their data, monitor performance, and support decision making. To support these tools, data must be collected and prepared for use. We'll look at two approaches: the structured, centralized repository of a Data Warehouse and the less-structured repository of a Data Lake. We'll compare these approaches, examine the services that support each, and explore how they work together.
by Ben Willett, Solutions Architect, AWS
Organizations use reports, dashboards, and analytics tools to extract insights from their data, monitor performance, and support decision making. To support these tools, data must be collected and prepared for use. We'll look at two approaches: the structured, centralized repository of a Data Warehouse and the less-structured repository of a Data Lake. We'll compare these approaches, examine the services that support each, and explore how they work together.
Organizations use reports, dashboards, and analytics tools to extract insights from their data, monitor performance, and support decision making. To support these tools, data must be collected and prepared for use. We'll look at two approaches: the structured, centralized repository of a Data Warehouse and the less-structured repository of a Data Lake. We'll compare these approaches, examine the services that support each, and explore how they work together.
Speakers:
Amy Che - Sr. Solutions Delivery Manager, AWS
Karan Desai - Solutions Architect, AWS
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
This document discusses big data analytics architectural patterns and best practices. It covers collecting and storing data from various sources, processing and analyzing data using tools like Amazon Redshift, Amazon Athena and Amazon EMR, and selecting the appropriate tools based on factors like data structure, access patterns, and data temperature. It also discusses stream/real-time analytics tools and machine learning approaches.
This document discusses big data analytics and architectural principles for building big data solutions. It covers collecting and storing data from various sources, processing and analyzing data using services like Amazon Kinesis, Redshift, EMR and Athena, and choosing the right tools based on factors like data structure, access patterns, and latency requirements. Key principles emphasized include building decoupled systems, leveraging managed services, using event-driven architectures, and focusing on cost efficiency.
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Speaker: Shafreen Sayyed, AWS
Level: 200
Traditional data storage and analytic tools no longer provide the agility and flexibility required to deliver relevant business insights. We are seeing more and more organisations shift to a data lake solution. This approach allows you to store massive amounts of data in a central location so it's readily available to be categorized, processed, analyzed, and consumed by diverse organizational groups. In this session, we'll assemble a data lake using services such as Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue, with integration with Amazon Redshift Spectrum.
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
The document discusses building a data lake using Amazon S3 and Amazon Glacier for storage. It covers what big data is, what a data lake is, the business outcomes a data lake can deliver, how to secure the data lake, and examples of what can be done with analytics services on AWS. The presentation shows how services such as Amazon Comprehend, Amazon Transcribe, Kinesis, Athena, and QuickSight can be used for natural language processing, audio analysis, real-time streaming, and visualization.
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Flexibility is key when building and scaling a data lake. The analytics solutions you use in the future will almost certainly be different from the ones you use today, and choosing the right storage architecture gives you the agility to quickly experiment with, and migrate to, the latest analytics solutions. In this session, we explore best practices for building a data lake in Amazon S3 and Amazon Glacier that can take advantage of an entire array of AWS, open source, and third-party analytics tools. We explore use cases for traditional analytics tools, including Amazon EMR and AWS Glue, as well as query-in-place tools like Amazon Athena, Amazon Redshift Spectrum, Amazon S3 Select, and Amazon Glacier Select.
Cloud computing gives you a number of advantages, such as the ability to scale your web application or website on demand. If you have a new web application and want to use cloud computing, you might be asking yourself, "Where do I start?" Join us to understand best practices for scaling your resources from one to millions of users. We’ll show you how to best combine different AWS services, how to make smarter decisions for architecting your application, and how to scale your infrastructure in the cloud.
This document discusses implementing a data lake on AWS to securely store, categorize, and analyze all types of data in a centralized repository. It describes key attributes of a data lake like decoupled storage and compute, rapid ingestion and transformation, and schema on read. It then outlines various AWS services that can be used to build a data lake like S3, Athena, EMR, Redshift, Glue, and Kinesis. It provides examples of streaming IoT data into a data lake and running queries and analytics on the data.
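"Schema on read" means raw data is stored as-is and a schema is applied only when the data is read, so different consumers can project the same files differently. A minimal sketch, assuming CSV input and a hypothetical `read_with_schema` helper whose schema is simply a mapping from column names to Python converter functions:

```python
import csv
import io
from typing import Any, Callable, Dict, List

# Hypothetical schema type: column name -> converter applied at read time.
Schema = Dict[str, Callable[[str], Any]]


def read_with_schema(raw_csv: str, schema: Schema) -> List[Dict[str, Any]]:
    """Apply a schema while reading raw CSV text (schema on read).

    Only the columns named in the schema are returned, converted
    to the requested types; the stored text itself is never changed.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [{col: convert(row[col]) for col, convert in schema.items()} for row in reader]


raw = "device_id,temp_c,ts\nsensor-1,21.5,1530000000\nsensor-2,19.0,1530000060\n"

# Two consumers read the same raw data with different schemas.
ops_view = read_with_schema(raw, {"device_id": str, "temp_c": float})
audit_view = read_with_schema(raw, {"device_id": str, "ts": int})
print(ops_view)
# [{'device_id': 'sensor-1', 'temp_c': 21.5}, {'device_id': 'sensor-2', 'temp_c': 19.0}]
print(audit_view)
# [{'device_id': 'sensor-1', 'ts': 1530000000}, {'device_id': 'sensor-2', 'ts': 1530000060}]
```

Services such as Athena apply the same idea at scale: the table definition in the catalog plays the role of `schema`, and the objects in S3 stay untouched.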
Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...
In this session, we discuss what we learned from migrating the financial ledger and accounting system that Amazon uses from Oracle to AWS. We share the performance and cost benefits for enterprises that migrate critical systems from Oracle to AWS, the decision frameworks used to pick the appropriate AWS service for each application, and best practices in project management.
What options does a developer have for running a database in the cloud? This session looks at the different options and how to choose the right database for your workload.
The document discusses AWS Glue Data Catalog and Amazon Athena. It provides an overview of AWS Glue Data Catalog as a unified metadata repository across data sources. It then describes Amazon Athena as an interactive query service that allows users to analyze data stored in Amazon S3 using standard SQL. Various use cases are presented that demonstrate how customers can use AWS Glue Data Catalog and Amazon Athena together to build data lakes on AWS.
The document discusses AWS Glue Data Catalog and Amazon Athena. It provides an overview of AWS Glue Data Catalog as a unified metadata repository across data sources. It then describes Amazon Athena as an interactive query service that allows users to analyze data stored in Amazon S3 using standard SQL. Various use cases are presented that demonstrate how customers can use AWS Glue Data Catalog and Amazon Athena together to build data lakes and perform analytics on data stored in S3.
Database Freedom. Database migration approaches to get to the Cloud - Marcus ...
The document discusses database migration approaches to move to the cloud using AWS services. It covers how the database market and application architectures are changing, as well as how AWS offers database freedom and services for various database and analytics workloads. These include managed relational databases like RDS and Aurora, NoSQL databases and caches, data warehouses, analytics services, and the Database Migration Service for migrating databases to AWS.
The AWS Big Data services are inherently built to run at scale. In this session, you will learn how to develop an enterprise-scale big data application using AWS services such as Amazon EMR, Amazon Redshift and Redshift Spectrum, Amazon Athena, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight, and AWS Glue. This session will also cover different architectural patterns and customer use cases.
How to build forecasting services using ML and deep learn...
Forecasting is an important process for many companies and is used in many areas to try to accurately predict the growth and distribution of a product, the resources needed on production lines, financial presentations, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that has a time component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to build Big Data applications in Server... mode
The variety and volume of data created every day is growing ever faster and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment only established companies can afford. But the elasticity of the Cloud and, in particular, Serverless services allow us to break through these limits.
So let's see how to develop Big Data applications quickly, without worrying about infrastructure, devoting all our resources to turning our ideas into innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and show how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation with the goal of increasing the pace of innovation. During this period we learned how changing our approach to application development allowed us to greatly increase agility and release velocity and, ultimately, to build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only application architecture but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances
Container adoption keeps growing.
When correctly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, yielding average savings of 70% compared with On-Demand Instances. In this session we will explore together the characteristics of Spot Instances and how they can easily be used on AWS. We will also learn how Spreaker uses Spot Instances to run different kinds of applications in production at a fraction of the On-Demand cost!
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Lea... services
To create value and build a distinctive, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and build the differentiating elements of your offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, also through a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of...
With the traditional approach to IT, implementing DevOps techniques was difficult for many years; until recently these techniques often involved manual activities that occasionally caused application downtime and interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, ensuring greater system reliability and significantly improving business continuity.
AWS offers AWS OpsWorks, a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances using Chef and Puppet.
Find out how to use AWS OpsWorks to ensure the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows workloads
Want to learn about the options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support Group Policy management, authentication, and authorization. In this session, we will discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis using artificial intelligence techniques is evolving and improving at a rapid pace. In this webinar we will explore the possibilities that AWS services offer for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud ™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a broad range of AWS services, taking full advantage of the AWS cloud while protecting existing VMware investments.
Many organizations take advantage of the cloud by migrating their Oracle workloads, gaining significant benefits in agility and cost efficiency.
Migrating these workloads can introduce complexity during application modernization and refactoring, and performance risks can also arise when applications are moved out of on-premises data centers.
Build your first serverless ledger-based app with QLDB and NodeJS
Many companies today build applications with ledger-style functionality, for example to verify the history of credits and debits in bank transactions or to track the supply-chain flow of their products.
These solutions are built on ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but these are complex and costly tools to manage.
Amazon QLDB removes the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's features.
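A "cryptographically verifiable" log is typically built with hash chaining: each entry stores a digest that covers the previous entry's digest, so altering any past record makes verification fail. The sketch below is a generic SHA-256 hash chain written for illustration; it is not QLDB's journal format or API, and the `Ledger` class is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class Ledger:
    """Append-only log where each entry commits to the hash of the previous one."""
    entries: List[Dict[str, Any]] = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, data: Dict[str, Any]) -> str:
        # sort_keys makes the serialization, and thus the hash, deterministic.
        payload = prev_hash + json.dumps(data, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, data: Dict[str, Any]) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev_hash, data)
        self.entries.append({"data": data, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["data"]):
                return False
            prev_hash = entry["hash"]
        return True


ledger = Ledger()
ledger.append({"account": "A-1", "type": "credit", "amount": 100})
ledger.append({"account": "A-1", "type": "debit", "amount": 40})
print(ledger.verify())  # True
ledger.entries[0]["data"]["amount"] = 1000  # tamper with history
print(ledger.verify())  # False
```

Publishing only the latest hash is enough for an outside party to detect later tampering, which is what makes such a log verifiable rather than merely append-only.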
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for delivering an exceptional user experience to end users. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will explore several scenarios, seeing how AppSync can help solve these use cases by building modern APIs with real-time and offline data update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: myths to debunk
Many organizations take advantage of the cloud by migrating their Oracle workloads, gaining significant benefits in agility and cost efficiency.
Migrating these workloads can introduce complexity during application modernization and refactoring, and performance risks can also arise when applications are moved out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips for easing the migration of Oracle workloads and accelerating the move to the cloud; they also dive into the architecture and show how to take full advantage of VMware Cloud ™ on AWS.
1) The document discusses building a minimum viable product (MVP) using Amazon Web Services (AWS).
2) It provides an example of an MVP for an omni-channel messenger platform, built starting in 2017, that connects ecommerce stores to customers via web chat, Facebook Messenger, WhatsApp, and other channels.
3) The founder discusses how they started with an MVP in 2017 with 200 ecommerce stores in Hong Kong and Taiwan, and have since expanded to over 5000 clients across Southeast Asia using AWS for scaling.
This document discusses pitch decks and fundraising materials. It explains that venture capitalists will typically spend only 3 minutes and 44 seconds reviewing a pitch deck. Therefore, the deck needs to tell a compelling story to grab their attention. It also provides tips on tailoring different types of decks for different purposes, such as creating a concise 1-2 page teaser, a presentation deck for pitching in-person, and a more detailed read-only or fundraising deck. The document stresses the importance of including key information like the problem, solution, product, traction, market size, plans, team, and ask.
This document discusses building serverless web applications using AWS services like API Gateway, Lambda, DynamoDB, S3 and Amplify. It provides an overview of each service and how they can work together to create a scalable, secure and cost-effective serverless application stack without having to manage servers or infrastructure. Key services covered include API Gateway for hosting APIs, Lambda for backend logic, DynamoDB for database needs, S3 for static content, and Amplify for frontend hosting and continuous deployment.
This document provides tips for fundraising from startup founders Roland Yau and Sze Lok Chan. It discusses generating competition to create urgency for investors, fundraising in parallel rather than sequentially, having a clear fundraising narrative focused on what you do and why it's compelling, and prioritizing relationships with people over firms. It also notes how the pandemic has changed fundraising, with examples of deals done virtually during this time. The tips emphasize being fully prepared before fundraising and cultivating connections with investors in advance.
AWS HK Startup Day: Building Interactive websites while automating for efficien...
This document discusses Amazon's machine learning services for building conversational interfaces and extracting insights from unstructured text and audio. It describes Amazon Lex for creating chatbots, Amazon Comprehend for natural language processing tasks like entity extraction and sentiment analysis, and how they can be used together for applications like intelligent call centers and content analysis. Pre-trained APIs simplify adding machine learning to apps without requiring ML expertise.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies running Docker containers through an orchestration layer that controls their deployment and lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.