The document discusses building data lakes and analytics on AWS. It describes how data lakes extend the traditional data warehousing approach by allowing structured, semi-structured, and unstructured data to be stored and analyzed cost-effectively at massive scale. It provides an overview of the AWS services that can be used for data ingestion, storage, processing, analysis, and machine learning with data lakes.
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018 | Amazon Web Services
Amazon EMR provides a flexible range of service customization options, enabling customers to use it as a building block for their data platforms. In this session, AWS customers Salesforce.com and Vanguard discuss in detail how they use Amazon EMR to build a self-service, secure, and auditable data engineering platform. Customers who want to optimize their design and configurations should attend this session to learn best practices from customer experts. Topics include achieving cost-efficient scale, using notebooks, processing streaming data, rapid prototyping of applications and data pipelines, architecting for both transient and persistent clusters, setting up advanced security and authorization controls, and enabling easy self service for users.
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Migrating Databases to the Cloud with AWS Database Migration Service (DAT207)... | Amazon Web Services
Learn how to convert and migrate your relational databases, nonrelational databases, and data warehouses to the cloud. AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT) can help with homogeneous migrations as well as migrations between different database engines, such as Oracle or SQL Server, to Amazon Aurora. Hear from Verizon about how they intend to migrate critical databases to Amazon Aurora with PostgreSQL compatibility from their current on-premises Oracle databases, and learn how they intend to deal with challenges such as conversion of legacy code and complex data types, supporting business resiliency, and maintaining data synchronization during the transition phase.
Amazon Aurora is a relational database built for the cloud and is compatible with MySQL and PostgreSQL. It combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. We'll cover some of the key innovations in the Aurora database engine and storage layers, explain recently announced features, such as Aurora Serverless, Aurora Multi-Master, and Aurora Parallel Query, and discuss best practices and optimal configurations. See why Aurora is a great fit for new application development and for migrations from overpriced, restrictive commercial databases.
AWS delivers an integrated suite of services that provide everything needed to quickly and easily build and manage a data lake for analytics. AWS-powered data lakes can handle the scale, agility, and flexibility required to combine different types of data and analytics approaches to gain deeper insights, in ways that traditional data silos and data warehouses cannot. In this session, we will show you how you can quickly build a data lake on AWS that ingests, catalogs and processes incoming data and makes it ready for analysis. Using a live demo, we demonstrate the capabilities of AWS provided analytical services such as AWS Glue, Amazon Athena and Amazon EMR and how to build a Data Lake on AWS step-by-step.
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL | Amazon Web Services
Matillion ETL, easily deployable from Amazon Web Services (AWS) Marketplace, helps Citrix collate and summarize data and augment it with more traditional business data from Microsoft SQL Server for additional context. Join our webinar to learn how organizations of any size can move data to the cloud quickly, accurately, and affordably with Matillion ETL.
Join our webinar to learn:
- How Citrix moved data to Amazon Redshift with speed and accuracy.
- How to make informed, business-critical decisions by analyzing data with Amazon Redshift.
- How to speed time-to-value for your analytics initiatives using Matillion’s push-down ELT architecture.
AWS Immersion Day - Image Data Insights & Analytics Specialist Session - June... | Amazon Web Services
Learn how to incorporate video data and analytics into your data management and business decision process. Discover how industry leaders are using AWS to do the heavy lifting with image data and innovating quickly. Our specialists will cover common issues and provide best practices, from using IoT devices to collect data, to training an ML model, to using these models on the edge without network connectivity.
by Jon Handler, Principal Solutions Architect and Sanjay Dhar, Solutions Architect, AWS
Nearly everything in IT (servers, applications, websites, connected devices, and other things) generates discrete, time-stamped records of events called logs. Processing and analyzing these logs to gain actionable insights is log analytics. We'll look at how to use centralized log analytics across multiple sources with Amazon Elasticsearch Service.
Build Data Lakes & Analytics on AWS: Patterns & Best Practices | Amazon Web Services
With over 90% of today’s data generated in the last two years, the rate of data growth is showing no sign of slowing down. In this session, we step through the challenges and best practices for capturing data, understanding what data you own, driving insights, and predicting the future using AWS services. We frame the session and demonstrations around common pitfalls of building data lakes and how to successfully drive analytics and insights from data. We also discuss the architecture patterns that bring together key AWS services, including Amazon S3, AWS Glue, Amazon Athena, Amazon Kinesis, and Amazon Machine Learning. Discover the real-world application of data lakes for roles including data scientists and business users.
Stephen Moon, Sr. Solutions Architect, Amazon Web Services
James Juniper, Solution Architect for the Geo-Community Cloud, Natural Resources Canada
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018 | Amazon Web Services
Gain in-depth knowledge and best practices for migrating commercial data warehouses to Amazon Redshift using AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT). We use an example based on an Oracle data warehouse, and we discuss approaches to migrate it to Amazon Redshift. We also discuss some of the common challenges, limitations, and workarounds, as well as the option of using AWS Snowball to migrate very large data warehouses to Amazon Redshift.
Introduction to AWS Glue: Data Analytics Week at the SF Loft | Amazon Web Services
Introduction to AWS Glue: Data Analytics Week at the San Francisco Loft
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
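As a rough illustration of "pointing AWS Glue to your data", the sketch below builds the request for a crawler over an S3 prefix. It is a minimal sketch, not the presenters' code: the crawler name, IAM role ARN, database name, and bucket path are all hypothetical placeholders, and the actual call through boto3 (`create_crawler`, `start_crawler`) is left in comments because it needs AWS credentials.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for a Glue CreateCrawler request.

    All argument values are supplied by the caller; nothing here
    contacts AWS.
    """
    return {
        "Name": name,                # crawler name (hypothetical)
        "Role": role_arn,            # IAM role the crawler assumes
        "DatabaseName": database,    # Data Catalog database for the discovered tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


# Hypothetical values, for illustration only.
request = crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**request)
#   glue.start_crawler(Name=request["Name"])
```

Once such a crawler has run, the tables it discovers appear in the named Data Catalog database.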
Level: Intermediate
Speakers:
John Mallory - Principal Business Development Manager, Storage, AWS
Asim Kumar Sasmal - Big Data Consultant, AWS Professional Services
Your data has value for multiple business functions in your organization. Shorten your time to analytics and make faster, better decisions based on data.
In this session you will learn how you can access your data from a myriad of tools such as multiple EMR clusters, Athena & Redshift.
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019 | Amazon Web Services
Organisations are increasingly gaining insight and knowledge from a number of IoT, API, clickstream, unstructured, and log data sources. Learn how AWS Glue makes it easy to build and manage enterprise-grade data pipelines to ingest, clean, transform, and automatically catalogue data, which enables a variety of use cases such as ad-hoc analytics, data warehousing, big data analysis, and machine learning. Also, find out how to integrate an end-to-end CI/CD pipeline to automate the release management process for your serverless data pipelines.
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ... | Amazon Web Services
Level 200: Visualize Your Data in Data Lake with AWS Athena and AWS Quicksight
Enterprises today are building data lakes that store large volumes of structured and unstructured data for analysis, but building the required data models and infrastructure takes a lot of time. How to run quick queries without managing servers or databases is the next big question for every enterprise.
In this workshop, eCloudvalley, the first and only Premier Consulting Partner in GCR, will demonstrate how to use serverless architecture to visualize your data using Amazon Athena and Amazon QuickSight.
You can easily query and visualize the data stored in Amazon S3 and gain business insights by combining these two services. You can also build business reports with other tools such as AWS IoT and Amazon Kinesis Firehose.
Reason to Attend:
- Learn how to quickly search thousands of records on S3 with serverless Amazon Athena
- Learn how to use Amazon QuickSight to retrieve information from your database quickly and create detailed reports
Building Your Geospatial Data Lake (WPS324) - AWS re:Invent 2018 | Amazon Web Services
In this chalk talk, we discuss how to design a data lake, and how to permission different groups and applications to access and analyze datasets. Learn from subject-matter experts about a variety of AWS technologies for populating your data lake, monitoring new ingestion, and processing data for meaningful analysis. We examine considerations for structured data, such as relevant database engines with geospatial support, as well as considerations for unstructured data in the form of object storage. In addition, we address how to protect and secure data based on an organization’s needs.
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent... | Amazon Web Services
Data lakes are emerging as the most common architecture built in data-driven organizations today. A data lake enables you to store unstructured, semi-structured, or fully-structured raw data as well as processed data for different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning. Well-designed data lakes ensure that organizations get the most business value from their data assets. In this session, you learn about the common challenges and patterns for designing an effective data lake on the AWS Cloud, with wisdom distilled from various customer implementations. We walk through patterns to solve data lake challenges, like real-time ingestion, choosing a partitioning strategy, file compaction techniques, database replication to your data lake, handling mutable data, machine learning integration, security patterns, and more.
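To make "choosing a partitioning strategy" concrete, here is a minimal sketch of one common convention: Hive-style `key=value` prefixes partitioned by date, a layout that Amazon Athena and AWS Glue recognize as partitions. The abstract does not say which strategy the session recommends, so the date-based scheme, the function name, and the example prefix and file name are all assumptions for illustration.

```python
from datetime import datetime, timezone


def partitioned_key(prefix, event_time, filename):
    """Return an object key laid out as prefix/year=YYYY/month=MM/day=DD/filename."""
    return (
        f"{prefix}/year={event_time:%Y}"
        f"/month={event_time:%m}"
        f"/day={event_time:%d}/{filename}"
    )


# Hypothetical dataset prefix and file name.
key = partitioned_key(
    "clickstream",
    datetime(2018, 11, 27, tzinfo=timezone.utc),
    "part-0000.parquet",
)
print(key)  # clickstream/year=2018/month=11/day=27/part-0000.parquet
```

Queries that filter on `year`, `month`, or `day` can then skip every prefix outside the requested range, which is the main reason to partition along the columns you filter on most often.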
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018 | Amazon Web Services
In this session, you have the opportunity to learn the fundamental building blocks of a data lake on AWS. You design and build a serverless pipeline to ingest, process, optimize and query data in your very own data lake. We discuss different optimizations and best practices to tune your architecture for future growth.
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018 | Amazon Web Services
In this workshop, learn how to automatically catalog datasets in your Amazon S3 data lake using AWS Glue crawlers. Also, learn how to interactively author ETL scripts in an Amazon SageMaker notebook connected to an AWS Glue development endpoint. Finally, learn how to deploy your ETL scripts into production by turning your ETL script into managed AWS Glue jobs and add appropriate AWS Glue scheduling and triggering conditions. The resulting datasets will automatically get registered in the AWS Glue Data Catalog, and you can then query these new datasets from Amazon Athena. Knowledge of Python and familiarity with big data applications is preferred but not required. Attendees must bring their own laptops.
by Avijit Goswami, Sr Solutions Architect AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll learn how to get started, how to support applications, and how to scale.
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te... | Amazon Web Services
Learning Objectives:
- Get an inside look at Amazon S3 Select and how it helps to accelerate application performance
- Learn about how Amazon Glacier Select helps you extend your data lake to archival storage
- Understand how different applications can leverage these features
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS | Amazon Web Services
Unni Pillai, Specialist Solution Architect, ASEAN, AWS.
Daniel Muller, Head of Cloud Infrastructure, Spuul.
As the volume and types of data continue to grow, customers often have valuable data that is not easily discoverable and available for analytics. A common challenge for data engineering teams is architecting a data lake that can cater to the needs of diverse users, from developers to business analysts to data scientists.
In this session, we will dive deep into building a data lake using Amazon S3, Amazon Kinesis, Amazon Athena and AWS Glue. We will also see how AWS Glue crawlers can automatically discover your data, extracting and cataloguing relevant metadata to reduce operations in preparing your data for downstream consumers.
Furthermore, learn from our customer Spuul how they moved from data warehouse-based analytics to a serverless data lake. Why and how did Spuul undertake this journey? Hear about the benefits and challenges they encountered.
ABD206-Building Visualizations and Dashboards with Amazon QuickSight | Amazon Web Services
Just as a picture is worth a thousand words, a visual is worth a thousand data points. A key aspect of our ability to gain insights from our data is to look for patterns, and these patterns are often not evident when we simply look at data in tables. The right visualization will help you gain a deeper understanding in a much shorter time. In this session, we will show you how to quickly and easily visualize your data using Amazon QuickSight. We will show you how you can connect to data sources, generate custom metrics and calculations, create comprehensive business dashboards with various chart types, and set up filters and drill-downs to slice and dice the data.
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su... | Amazon Web Services
Data preparation is always a challenge. Why care about infrastructure?
Come learn how to deploy your Spark jobs in minutes using our managed services, EMR & Glue, and focus on your business needs.
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent... | Amazon Web Services
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In this session, we live demo exciting new capabilities the team has been busy building. SendGrid, a leader in trusted email delivery, discusses how they used Athena to reinvent a popular feature of their platform.
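For a sense of what "standard SQL over data in Amazon S3" looks like in practice, the sketch below assembles the request for an Athena query. It is only a sketch under stated assumptions: the table, database, and results bucket are hypothetical, and the boto3 call (`start_query_execution`) is shown in comments because it requires AWS credentials.

```python
def athena_query_request(sql, database, output_location):
    """Build keyword arguments for an Athena StartQueryExecution request."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table, database, and bucket names.
query = athena_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_location="s3://example-athena-results/",
)

# With boto3 installed and credentials configured, the query could be submitted as:
#   import boto3
#   athena = boto3.client("athena")
#   execution = athena.start_query_execution(**query)
#   print(execution["QueryExecutionId"])
```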
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
With over 90% of today’s data generated in the last two years, the rate of data growth is showing no sign of slowing down. In this session, we step through the challenges and best practices for capturing data, understanding what data you own, driving insights, and predicting the future using AWS services. We frame the session and demonstrations around common pitfalls of building data lakes and how to successfully drive analytics and insights from data. We also discuss the architecture patterns brought together key AWS services, including Amazon S3, AWS Glue, Amazon Athena, Amazon Kinesis, and Amazon Machine Learning. Discover the real-world application of data lakes for roles including data scientists and business users.
Stephen Moon, Sr. Solutions Architect, Amazon Web Services
James Juniper, Solution Architect for the Geo-Community Cloud, Natural Resources Canada
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018Amazon Web Services
Gain in-depth knowledge and best practices for migrating commercial data warehouses to Amazon Redshift using AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT). We use an example based on an Oracle data warehouse, and we discuss approaches to migrate it to Amazon Redshift. We also discuss some of the common challenges, limitations, and workarounds, as well as the option of using AWS Snowball to migrate very large data warehouses to Amazon Redshift.
Introduction to AWS Glue: Data Analytics Week at the SF LoftAmazon Web Services
Introduction to AWS Glue: Data Analytics Week at the San Francisco Loft
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
John Mallory - Principal Business Development Manager, Storage, AWS
Asim Kumar Sasmal - Big Data Consultant, AWS Professional Services
Your data has value for multiple business functions in your organization. Shorten your time to analytics and take faster, better decisions based on data.
In this session you will learn how you can access your data from a myriad of tools such as multiple EMR clusters, Athena & Redshift.
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Amazon Web Services
Organisations are increasingly gaining insight and knowledge from a number of IoT, API, clickstream, unstructured, and log data sources. Learn how AWS Glue makes it easy to build and manage enterprise-grade data pipelines to ingest, clean, transform, and automatically catalogue data, which enables a variety of use cases such as ad-hoc analytics, data warehousing, big data analysis, and machine learning. Also, find out how to intergrate an end-to-end CI/CD pipeline to automate the release management process for your serverless data pipelines.
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Amazon Web Services
Level 200: Visualize Your Data in Data Lake with AWS Athena and AWS Quicksight
Nowadays, enterprises are building Data Lake which store lots of structured and unstructured data for data analysis. But it takes lots of time for building the data modeling and infrastructure that is required. How to make quick data queries without servers and databases is the next big question for every enterprises.
In this workshop, eCloudvalley, the first and only Premier Consulting Partner in GCR, will demonstrate how to use serverless architecture to visualize your data using Amazon Athena and Amazon Quicksight.
You can easily query and visualize the data in your S3, and get business insights with the combination of these two services. Also, you can also build business reports with other tools such as AWS IoT, Amazon Kinesis Firehose.
Reason to Attend:
Learn how to quickly search for thousands of data on S3 via serverless Amazon's Athena
Learn how to use AWS QuickSight to retrieve information from your database quickly and create detailed reports
Building Your Geospatial Data Lake (WPS324) - AWS re:Invent 2018Amazon Web Services
In this chalk talk, we discuss how to design a data lake, and how to permission different groups and applications to access and analyze datasets. Learn from subject-matter experts about a variety of AWS technologies for populating your data lake, monitoring new ingestion, and processing data for meaningful analysis. We examine considerations for structured data, such as relevant database engines with geospatial support, as well as considerations for unstructured data in the form of object storage. In addition, we address how to protect and secure data based on an organization’s needs.
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
Data lakes are emerging as the most common architecture built in data-driven organizations today. A data lake enables you to store unstructured, semi-structured, or fully-structured raw data as well as processed data for different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning. Well-designed data lakes ensure that organizations get the most business value from their data assets. In this session, you learn about the common challenges and patterns for designing an effective data lake on the AWS Cloud, with wisdom distilled from various customer implementations. We walk through patterns to solve data lake challenges, like real-time ingestion, choosing a partitioning strategy, file compaction techniques, database replication to your data lake, handling mutable data, machine learning integration, security patterns, and more.
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Amazon Web Services
In this session, you have the opportunity to learn the fundamental building blocks of a data lake on AWS. You design and build a serverless pipeline to ingest, process, optimize and query data in your very own data lake. We discuss different optimizations and best practices to tune your architecture for future growth.
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Amazon Web Services
In this workshop, learn how to automatically catalog datasets in your Amazon S3 data lake using AWS Glue crawlers. Also, learn how to interactively author ETL scripts in an Amazon SageMaker notebook connected to an AWS Glue development endpoint. Finally, learn how to deploy your ETL scripts into production by turning your ETL script into managed AWS Glue jobs and add appropriate AWS Glue scheduling and triggering conditions. The resulting datasets will automatically get registered in the AWS Glue Data Catalog, and you can then query these new datasets from Amazon Athena. Knowledge of Python and familiarity with big data applications is preferred but not required. Attendees must bring their own laptops.
by Avijit Goswami, Sr Solutions Architect AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll will learn how to get started, how to support applications, and how to scale.
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Amazon Web Services
Learning Objectives:
- Get an inside look at Amazon S3 Select and how it helps to accelerate application performance
- Learn about how Amazon Glacier Select helps you extend your data lake to archival storage
- Understand how different applications can leverage these features
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
Unni Pillai, Specialist Solution Architect, ASEAN, AWS.
Daniel Muller, Head of Cloud Infrastructure, Spuul.
As the volume and types of data continues to grow, customers often have valuable data that is not easily discoverable and available for analytics. A common challenge for data engineering teams is architecting a data lake that can cater to the needs of diverse users - from developers to business analysts to data scientists.
In this session, we will dive deep into building a data lake using Amazon S3, Amazon Kinesis, Amazon Athena and AWS Glue. We will also see how AWS Glue crawlers can automatically discover your data, extracting and cataloguing relevant metadata to reduce operations in preparing your data for downstream consumers.
Furthermore, learn from our customer Spuul how they moved from data-warehouse-based analytics to a serverless data lake. Why and how did Spuul undertake this journey? Hear about the benefits and challenges they encountered.
ABD206 - Building Visualizations and Dashboards with Amazon QuickSight
by Amazon Web Services
Just as a picture is worth a thousand words, a visual is worth a thousand data points. A key aspect of our ability to gain insights from our data is to look for patterns, and these patterns are often not evident when we simply look at data in tables. The right visualization will help you gain a deeper understanding in a much shorter time. In this session, we will show you how to quickly and easily visualize your data using Amazon QuickSight. We will show you how you can connect to data sources, generate custom metrics and calculations, create comprehensive business dashboards with various chart types, and set up filters and drill-downs to slice and dice the data.
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...
by Amazon Web Services
Data preparation is always a challenge. Why care about infrastructure?
Come learn how to deploy your Spark jobs in minutes using our managed services, Amazon EMR and AWS Glue, so you can focus on your business needs.
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
by Amazon Web Services
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In this session, we live demo new capabilities the team has been building. SendGrid, a leader in trusted email delivery, discusses how they used Athena to reinvent a popular feature of their platform.
Modern data is massive, quickly evolving, unstructured, and increasingly hard to catalog and understand across multiple consumers and applications. This presentation will guide you through the best practices for designing a robust data architecture, highlighting the benefits and typical challenges of data lakes and data warehouses. We will build a scalable solution based on managed services such as Amazon Athena, AWS Glue, and AWS Lake Formation.
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
by Amazon Web Services
With over 90% of today’s data generated in the last two years, the rate of data growth is showing no sign of slowing down. In this session, we step through the challenges and best practices for capturing data, understanding what data you own, driving insights, and predicting the future using AWS services. We frame the session and demonstrations around common pitfalls of building data lakes and how to successfully drive analytics and insights from data. We also discuss the architecture patterns that bring together key AWS services, including Amazon S3, AWS Glue, Amazon Athena, Amazon Kinesis, and Amazon Machine Learning. Discover the real-world application of data lakes for roles including data scientists and business users.
Stephen Moon, Sr. Solutions Architect, Amazon Web Services
James Juniper, Solution Architect for the Geo-Community Cloud, Natural Resources Canada
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon Machine Learning (Amazon ML) services work together to build a successful data lake for various roles, including data scientists and business users.
A data lake is an architectural approach that allows you to store massive amounts of data in a central location, so it's readily available to be categorized, processed, analyzed, and consumed by diverse groups within an organization. In this session, we will introduce the Data Lake concept and its implementation on AWS. We will explain the different roles our services play and how they fit into the Data Lake picture.
AWS Floor 28 - Building Data lake on AWS
by Adir Sharabi
AWS makes it easy to build and operate highly scalable and flexible data platforms to collect, process, and analyze data so you can get timely insights and react quickly to new information. In this session we talk about how to improve over time using your data: how to take your everyday data and build relevant business insights that help you continuously improve your business processes and keep innovating based on your data.
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
by Amazon Web Services
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
by Amazon Web Services
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
by Amazon Web Services
Business analysts require easy access to data from across different parts of the business. In this session, learn why more customers have adopted Amazon Redshift than any other cloud-native Data Warehouse, and how they are building a broader analytics capability with data lakes on AWS.
Understand how AWS built machine learning (ML) into the services, taking away many of the time-intensive tasks of building an analytics platform. We cover why these customers choose Amazon Redshift for the accessibility to analysts, business reporting, deep security, ability to scale from GB to PB, and integration with the broader platform.
Learn about these customers, who are increasingly opening insights to data analysts for data discovery and to data scientists for machine learning. We also share how AWS services such as AWS Glue and the coming ML-enabled AWS Lake Formation take away most of the heavy lifting.
Learn about data lifecycle best practices in the AWS Cloud, so you can optimize performance and lower the costs of data ingestion, staging, storage, cleansing, analytics and visualization, and archiving.
How to build forecasting services using ML and deep learn...
by Amazon Web Services
Forecasting is an important process for many companies and is used in a variety of areas to accurately predict the growth and distribution of a product, the use of the resources needed on production lines, financial presentations, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we show how to pre-process data that has a time component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to build Big Data applications in Server...
by Amazon Web Services
The variety and volume of data created every day is growing at an ever faster pace and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building Big Data clusters at large scale looks like an investment only established companies can afford. But the elasticity of the cloud and, in particular, serverless services let us break through these limits.
So let's see how to develop Big Data applications quickly, without worrying about infrastructure, while devoting all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS Cloud. In this session we present the main features of the service and how to deploy your application in a few steps.
Twenty years ago Amazon went through a radical transformation with the goal of increasing the pace of innovation. Over that time we learned how changing our approach to application development allowed us to greatly increase agility and release velocity and, ultimately, to build more reliable and scalable applications. In this session we explain how we define modern applications and how building modern apps affects not only the application architecture but also the organizational structure, the development release pipelines, and even the operating model. We also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances
by Amazon Web Services
Container usage keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, leading to average savings of 70% compared with On-Demand Instances. In this session we explore the characteristics of Spot Instances and how easily they can be used on AWS. We also learn how Spreaker uses Spot Instances to run different kinds of applications, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us the question – how to monetise Open APIs, simplify Fintech integrations and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with the Machine Lea... services
by Amazon Web Services
To create value and build a distinctive, recognizable offering, successful startups know how to combine established technologies with purpose-built innovative components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on machine learning technologies, we see how to select the artificial intelligence services offered by AWS and, with the help of a demo, how to build custom machine learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployments of...
by Amazon Web Services
With the traditional approach to IT, it was difficult for many years to implement DevOps techniques, which until now often involved manual activities that occasionally caused application downtime and interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, ensuring greater system reliability and resulting in significant improvements in business continuity.
AWS offers AWS OpsWorks as a configuration management tool that aims to automate and simplify the management and deployment of EC2 instances through Chef and Puppet workloads.
Learn how to use AWS OpsWorks to ensure the reliability of your application installed on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads
by Amazon Web Services
Want to learn about the options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support Group Policy management, authentication, and authorization. In this session, we discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to the detection of fraud or manufacturing defects, image and video analysis based on artificial intelligence techniques is evolving and improving at a rapid pace. In this webinar we explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, making full use of the AWS Cloud while protecting existing VMware investments.
Build your first serverless ledger-based app with QLDB and NodeJS
by Amazon Web Services
Many companies today build applications with ledger-like functionality, for example to verify the history of credits and debits in banking transactions or to track the supply chain flow of their products.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log but are complex and costly to manage.
Amazon QLDB eliminates the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we learn how to build a complete serverless application that uses the features of QLDB.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for delivering an exceptional user experience to end users. In this session we learn how to tackle modern API design challenges with GraphQL, an open source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We dive into several scenarios to understand how AppSync can help address these use cases by creating modern APIs with real-time and offline data update capabilities.
We also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: debunking the myths
by Amazon Web Services
Many organizations take advantage of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity when modernizing and refactoring applications, and on top of that come performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips for easing and simplifying the migration of Oracle workloads and accelerating the move to the cloud; they also take a closer look at the architecture and demonstrate how to make full use of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer that controls deployment and the related lifecycle. In this session we present the main features of the service, the reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
1. Build Data Lakes and Analytics on AWS
@2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
2. Data Is Defined Many Different Ways
Volume, Velocity, Variety, Veracity, Value, Visualization, Variability
3. Data Is Changing → Analytics Are Adopting
Capture and store new data at PB-EB scale
Do new types of analytics in a cost-effective way
New types of analytics:
• Machine learning
• Big data processing
• Real-time analytics
• Full-text search
4. Most Important: Driving Value from Data
Public Sector entities that successfully generate value from their data will be able to offer better citizen services and data-driven decisions
What are those use cases?
• Analytics
• Smart Cities
• AI/ML
• Data lake
5. What is a Data Lake?
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
6. Traditionally, Analytics Used to Look Like This
OLTP ERP CRM LOB
Data warehouse
Business intelligence
• Relational data
• TBs–PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc
• Large initial CAPEX + $10K–$50K/TB/year
7. Data Lakes Extend the Traditional Approach
Data warehouse
Business intelligence
OLTP ERP CRM LOB
• Relational and nonrelational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
Devices Web Sensors Social
Data lake
Big data processing, real-time, machine learning
8. A Data Lake is not an Enterprise Data Warehouse
Data Lake | EDW
Complementary to EDW (not replacement) | EDW can be sourced from Data Lake
Schema on read (no predefined schemas) | Schema on write (predefined schemas)
Structured/semi-structured/unstructured data | Structured data only
Fast ingestion of new data/content | Time consuming to introduce new content
Data Science + Prediction/Advanced Analytics + BI use cases | BI use cases
Data at low level of detail/granularity | Data at summary/aggregated level of detail
Loosely defined SLAs | Tight SLAs (production schedules)
Flexibility in tools (open source/tools for advanced analytics) | Limited flexibility in tools (SQL only)
Elastic storage and compute capacity, decoupled | Explicitly sized environments, compute and storage scaled in linearly
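The "schema on read" versus "schema on write" distinction in the comparison above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the sample records, the field names (`device`, `temp_c`), and the helper `read_with_schema`. It is not how any particular AWS service implements schema on read; it only shows the idea that raw records are stored untouched and each reader applies its own schema when it reads.

```python
import json

# Raw events land in the lake exactly as produced; no schema is enforced on write.
RAW_LINES = [
    '{"device": "a1", "temp_c": "21.5", "ts": "2019-03-01T10:00:00Z"}',
    '{"device": "b2", "temp_c": 19, "firmware": "1.0.3"}',
    '{"device": "c3"}',
]

# Each consumer declares the schema it needs at read time: field name -> caster.
TEMPERATURE_SCHEMA = {"device": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project them onto `schema`, skipping misfits."""
    rows = []
    for line in lines:
        record = json.loads(line)
        try:
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (KeyError, TypeError, ValueError):
            # The raw line stays in storage; only this reader ignores it.
            continue
    return rows


rows = read_with_schema(RAW_LINES, TEMPERATURE_SCHEMA)
print(rows)
```

A second consumer could read the same raw lines with a different schema (for example one that asks for `firmware`) without any change to what was stored, which is the flexibility the Data Lake column describes.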
9. Data Lakes from AWS
• Unmatched durability and availability at EB scale
• Best security, compliance, and audit capabilities
• Object-level controls for fine-grained access
• Fastest performance by retrieving subsets of data
• The most ways to bring data in
• 2x as many integrations with partners
• Analyze with broadest set of analytics & ML services
Data Lake on AWS
Analytics | Machine learning
Real-time data movement | On-premises data movement
10. Data Lakes, Analytics, and IoT Portfolio from AWS
Broadest, deepest set of analytic services
Machine learning:
• Managed ML Service
• Deep Learning AMIs
• Video and Image Recognition
• Conversational Interfaces
• Deep-Learning Video Camera
• Natural Language Processing
• Language Translation
• Speech Recognition
• Text-to-Speech
Analytics:
• Interactive Analysis
• Hadoop & Spark
• Data Warehousing
• Full-text search
• Real-time analytics
• Dashboards & Visualizations
On-premises data movement:
• Dedicated Network connection
• Secure appliances
• Ruggedized Shipping Container
• Database migration
Real-time data movement:
• Connect Devices to AWS
• Real-time Data Streams
• Real-time Video Streams
Data Lake on AWS
Storage | Archival Storage | Data Catalog
11. Data Lakes, Analytics, and IoT Portfolio from AWS
Broadest, deepest set of analytic services
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine learning
Real-time data movement | On-premises data movement
12. How do I ingest my data?
13. How do I drive value?
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine learning
Real-time data movement | Traditional data movement
14. Ingest data based on the type of data
Open and comprehensive
• Data movement from on-premises datacenters
  • Dedicated network connection: AWS Direct Connect
  • Secure appliances: AWS Snowball
  • Ruggedized shipping container: AWS Snowmobile
  • Database migration: AWS Database Migration Service
  • Gateway that lets applications write to the cloud: AWS Storage Gateway
• Data movement from real-time sources
  • Connect devices to AWS: AWS IoT Core
  • Real-time data streams: Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams
  • Real-time video streams: Amazon Kinesis Video Streams
Amazon S3
Amazon Glacier
AWS Glue
15. Real-time data movement and data lakes on AWS
Amazon Kinesis Data Firehose
AWS Glue Data Catalog
Amazon S3 Data
Data Lake on AWS
Amazon Kinesis Data Streams
Data definition
Kinesis Agent
Apache Kafka
AWS SDK
LOG4J
Flume
Fluentd
AWS Mobile SDK
Kinesis Producer Library
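As a rough illustration of the producer side of this slide, the sketch below groups application events into batches shaped like the `Records` argument of the Kinesis Data Firehose `PutRecordBatch` API, which accepts at most 500 records per call. The helper name `to_firehose_batches`, the sample events, and the delivery stream name in the comment are hypothetical. The sketch only builds payloads and makes no AWS call, and it ignores the per-record and per-call size limits that a real producer would also need to respect.

```python
import json

MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_batches(events, batch_size=MAX_RECORDS_PER_BATCH):
    """Yield lists of {"Data": bytes} records holding newline-delimited JSON."""
    batch = []
    for event in events:
        # A trailing newline keeps records separable once they are
        # concatenated into a single object in Amazon S3.
        data = (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")
        batch.append({"Data": data})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


events = [{"sensor": "s-%d" % i, "value": i} for i in range(1200)]
batches = list(to_firehose_batches(events))
print([len(b) for b in batches])

# With AWS credentials configured, each batch could then be sent with boto3
# (the delivery stream name is a placeholder):
#   import boto3
#   firehose = boto3.client("firehose")
#   firehose.put_record_batch(DeliveryStreamName="example-stream", Records=batch)
```

The Kinesis Agent and Kinesis Producer Library named on the slide handle batching and retries themselves; a hand-rolled batcher like this is only relevant when writing through the AWS SDK directly.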
16. IMPORTANT: Ingest data in its raw form …
Open and comprehensive
Amazon S3
Amazon Glacier
AWS Glue
• Store the data in its raw form BEFORE:
  • Transforming
  • Analyzing
  • Manipulating
  • Doing … anything … to it
Formats: CSV, ORC, Grok, Avro, Parquet, JSON
• This becomes your source of record you can always go back to …
• Lifecycle policies allow you to shift it to warm and cold storage.
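The lifecycle point above can be sketched as an S3 lifecycle configuration. The snippet below only builds the configuration dictionary in the shape expected by the S3 `PutBucketLifecycleConfiguration` API; the prefix `raw/`, the 30-day and 365-day thresholds, the rule ID, and the bucket name in the comment are assumptions made for illustration, not values from the slides.

```python
import json


def raw_zone_lifecycle(prefix="raw/", warm_after_days=30, cold_after_days=365):
    """Build an S3 lifecycle configuration for objects under `prefix`."""
    if cold_after_days <= warm_after_days:
        raise ValueError("cold tier must come after warm tier")
    return {
        "Rules": [
            {
                "ID": "tier-" + prefix.strip("/"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Warm storage: infrequent-access class.
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    # Cold storage: archival class.
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = raw_zone_lifecycle()
print(json.dumps(config, indent=2))

# Applying it needs credentials and a real bucket (the name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=config)
```

Because the rule only transitions objects and never expires them, the raw data remains available as the source of record.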
17. Preparing raw data for consumption
Raw data stored in Data Lake:
Preparation:
• Normalized
• Partitioned
• Compressed
• Storage Optimized
ELT (Extract – Load – Transform)
Raw Ingestion
Curated DataSets
Data Catalog
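Two of the preparation steps listed above, partitioning and compression, can be sketched with the Python standard library alone. The sample CSV, its column names, and the `year=/month=` directory layout are assumptions for illustration (the layout follows the common Hive-style partition naming). In practice this step would more likely run as an AWS Glue or Amazon EMR job and write a columnar format such as Parquet, which needs a third-party library and is left out here.

```python
import csv
import gzip
import io
import os
import tempfile
from collections import defaultdict

# Hypothetical raw data as it might have been ingested.
RAW_CSV = """event_date,user_id,amount
2019-01-15,u1,10.50
2019-01-20,u2,3.25
2019-02-02,u1,7.00
"""


def write_partitioned(raw_csv, out_dir):
    """Split rows by year/month and write one gzip-compressed CSV per partition."""
    partitions = defaultdict(list)
    reader = csv.DictReader(io.StringIO(raw_csv))
    for row in reader:
        year, month, _day = row["event_date"].split("-")
        partitions[(year, month)].append(row)

    written = []
    for (year, month), rows in sorted(partitions.items()):
        part_dir = os.path.join(out_dir, "year=" + year, "month=" + month)
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-0000.csv.gz")
        with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=reader.fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(os.path.relpath(path, out_dir))
    return written


out_dir = tempfile.mkdtemp(prefix="curated-")
paths = write_partitioned(RAW_CSV, out_dir)
print(paths)
```

With this layout, a query engine that understands partitions can skip every directory whose `year` and `month` do not match the query filter, so less data is read per query.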
18. Which tool should I use to analyze my data?
19. How do I drive value?
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine Learning
Real-time data movement | Traditional data movement
20. Different tools for different users…
Business Reporting
Data Catalog
Central Storage
SageMaker
Machine Learning/Deep Learning
Data Scientists
Data Engineer
21. Amazon Athena – interactive analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
• Query instantly: Zero setup cost; just point to Amazon S3 and start querying
• Pay per query: Pay only for queries run; save 30%–90% on per-query costs through compression
• Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
• Easy: Serverless, with zero infrastructure and zero administration; integrated with Amazon QuickSight
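The pay-per-query point above follows from pricing by the amount of data scanned, so anything that shrinks the scanned bytes shrinks the bill. The arithmetic below is a sketch under stated assumptions: a price of $5.00 per TB scanned (an assumed figure, so check current regional pricing), a hypothetical 3 TB text dataset, and an assumed 3:1 compression ratio. It ignores any per-query minimum charge.

```python
PRICE_PER_TB_USD = 5.00  # assumed price per TB scanned; verify against current pricing
BYTES_PER_TB = 1024 ** 4


def query_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Cost of one query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


raw_bytes = 3 * BYTES_PER_TB      # hypothetical uncompressed text dataset
compressed_bytes = raw_bytes / 3  # assumed 3:1 compression ratio

raw_cost = query_cost_usd(raw_bytes)
compressed_cost = query_cost_usd(compressed_bytes)
saving = 1 - compressed_cost / raw_cost
print(f"raw: ${raw_cost:.2f}  compressed: ${compressed_cost:.2f}  saving: {saving:.0%}")
```

Under these assumptions a full scan drops from $15.00 to $5.00, a saving of about 67%, which falls inside the 30%–90% range quoted on the slide. Partitioning and columnar formats reduce scanned bytes further because a query can skip partitions and columns it does not need.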
22. Amazon EMR – big data processing
Analytics and ML at scale
19 open-source projects: Apache Hadoop, Spark, HBase, Presto, and more
Enterprise-grade security
• Latest versions: Updated with the latest open source frameworks within 30 days of release
• Low cost: Flexible billing with per-second billing, Amazon EC2 Spot, Reserved Instances, and Auto Scaling to reduce costs 50%-80%
• Use Amazon S3 storage: Process data directly in the Amazon S3 data lake securely with high performance using the EMRFS connector
• Easy: Launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning
23. Amazon Redshift – data warehousing
Fast, powerful, simple, and fully managed data warehouse at 1/10 the cost
Massively parallel, scale from gigabytes to petabytes
• Fast at scale: Columnar storage technology to improve I/O efficiency and scale query performance
• Inexpensive: As low as $1,000 per terabyte per year, 1/10 the cost of traditional data warehouse solutions; start at $0.25 per hour
• Open file formats: Analyze optimized data formats on the latest SSD, and all open data formats in Amazon S3
• Secure: Audit everything; encrypt data end-to-end; extensive certification and compliance
24. Machine Learning & Big Data
25. Data Lakes driving Machine Learning
Better Decisions
Object Storage, Databases, Data warehouse, Streaming analytics, BI, Hadoop, Spark/Presto, Elasticsearch
Better Products
Machine Learning, Deep Learning/AI
More Users
More Data
Click stream, User activity, Generated content, Purchases, Clicks, Likes, Sensor data
26. Agility in Machine Learning
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine Learning
Real-time data movement | On-premises data movement
27. Varied ML Use Cases
28. Broadest and deepest set of capabilities
The Amazon ML Stack
29. Modern data architecture for media enrichment
Insights to enhance viewer engagement, personalization, monetization
30. Activity Highlights and Suspicious Activity Pathing
31. Sentiment analysis
Discover insights and relationships in text, social media, etc.
32. In Summary…
• Data lakes and data warehouses complement each other
• Loose coupling, but highly performant
• Storage, analytics, metadata management, etc.
• Future-proof your analytics
• Choosing the best tool for the job
• Elasticity and multiple clusters for dedicated purposes
• Don’t forget metadata management
33. Do you want to build a Data Lake?
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence