Speaker: Xiaoyong Han, Solution Architect, AWS
Data collection and storage is a primary challenge for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• Overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• Leveraging Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection
Speaker: Yian Han, Senior Business Development Manager, AWS
AWS offers a family of intelligent services that provide cloud-native machine learning and deep learning technologies to address your different use cases and needs. For developers looking to add managed AI services to their applications, AWS brings natural language understanding (NLU) with Amazon Lex, text-to-speech (TTS) with Amazon Polly, visual search and image recognition with Amazon Rekognition, and developer-focused machine learning with Amazon Machine Learning. In this talk you will learn about these services and see demos of their capabilities.
Speaker: Bob Yin, Senior Product Specialist, Informatica
These Informatica Cloud offerings are pre-built packages that deliver quick time-to-value for customers looking to fast-track cloud data management initiatives. For example, customers can quickly kick-start a new Amazon Redshift data warehouse project and use Informatica Cloud Connector for Amazon Redshift to load it with meaningful connected data from cloud sources such as Salesforce.com or on-premises sources such as relational databases, all within hours, not months.
Speaker: George Chiu 邱志威, Sr. Industry Consultant, Teradata
Learn how Netflix engages customers by leveraging Teradata as a critical component of its data and analytics platform to create a data-driven, customer-focused business.
Speaker: Ivan Cheng, Solution Architect, AWS
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
Amazon Web Services gives you fast access to flexible, low-cost IT resources, so you can rapidly scale and build virtually any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing, regardless of the volume, velocity, and variety of data.
https://aws.amazon.com/webinars/anz-webinar-series/
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premises deployments to AWS in order to save costs, increase availability, and improve performance. AWS offers a broad set of analytics services, including solutions for batch processing, stream processing, machine learning, data workflow orchestration, and data warehousing. This session will focus on identifying the components and workflows in your current environment and on best practices for migrating these workloads to the right AWS data analytics product. We will cover services such as Amazon EMR, Amazon Athena, Amazon Redshift, Amazon Kinesis, and more. We will also feature Vanguard, an American investment management company based in Malvern, Pennsylvania with over $4.4 trillion in assets under management. Ritesh Shah, Sr. Program Manager for Cloud Analytics Program at Vanguard, will describe how they orchestrated their migration to AWS analytics services, including Hadoop and Spark workloads to Amazon EMR. Ritesh will highlight the technical challenges they faced and overcame along the way, as well as share common recommendations and tuning tips to accelerate the time to production.
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS (Amazon Web Services)
Unni Pillai, Specialist Solution Architect, ASEAN, AWS.
Daniel Muller, Head of Cloud Infrastructure, Spuul.
As the volume and types of data continue to grow, customers often have valuable data that is not easily discoverable and available for analytics. A common challenge for data engineering teams is architecting a data lake that can cater to the needs of diverse users, from developers to business analysts to data scientists.
In this session, we will dive deep into building a data lake using Amazon S3, Amazon Kinesis, Amazon Athena and AWS Glue. We will also see how AWS Glue crawlers can automatically discover your data, extracting and cataloguing relevant metadata to reduce operations in preparing your data for downstream consumers.
Furthermore, learn from our customer Spuul how they moved from data-warehouse-based analytics to a serverless data lake. Why and how did Spuul undertake this journey? Hear about the benefits and challenges they encountered.
AWS Webcast - Managing Big Data in the AWS Cloud_20140924 (Amazon Web Services)
This presentation deck will cover specific services such as Amazon S3, Kinesis, Redshift, Elastic MapReduce, and DynamoDB, including their features and performance characteristics. It will also cover architectural designs for the optimal use of these services based on dimensions of your data source (structured or unstructured data, volume, item size and transfer rates) and application considerations - for latency, cost and durability. It will also share customer success stories and resources to help you get started.
Today organizations find themselves in a data-rich world, with a growing need for agility and accessibility across all this data so they can analyze it and derive insights that drive strategic decisions. Creating a data lake helps you manage all the disparate sources of data you are collecting, in their original format, and extract value from them. In this session, learn how to architect and implement an Analytics Data Lake. Hear customer examples of best practices and learn from their architectural blueprints.
AWS November Webinar Series - Architectural Patterns & Best Practices for Big... (Amazon Web Services)
The world is producing an ever-increasing volume, velocity, and variety of data. For many consumers, batch analytics is no longer enough; they need sub-second analysis on fast-moving data. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how?
If you missed this popular presentation at re:Invent, attend this webinar where we simplify big data processing as a pipeline comprising various stages: ingest, store, process, analyze & visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, and durability. Finally, we provide a reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems.
Learning Objectives:
- Understand key AWS Big Data services including S3, Amazon EMR, Kinesis, and Redshift
- Learn architectural patterns for Big Data
- Hear best practices for building Big Data applications on AWS
- Didn’t make it to re:Invent? Here’s another chance to attend this popular presentation
Who Should Attend:
- Architects, developers and data scientists who are looking to start a Big Data initiative
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ... (Amazon Web Services)
Level 200: Visualize Your Data in Data Lake with AWS Athena and AWS Quicksight
Nowadays, enterprises are building data lakes that store large amounts of structured and unstructured data for analysis. But building the required data models and infrastructure takes a lot of time. How to run quick queries without servers and databases is the next big question for every enterprise.
In this workshop, eCloudvalley, the first and only Premier Consulting Partner in GCR, will demonstrate how to use serverless architecture to visualize your data using Amazon Athena and Amazon QuickSight.
You can easily query and visualize the data in Amazon S3 and get business insights by combining these two services. You can also build business reports with other tools such as AWS IoT and Amazon Kinesis Firehose.
Reasons to Attend:
Learn how to quickly search thousands of data items on S3 using serverless Amazon Athena
Learn how to use Amazon QuickSight to retrieve information from your database quickly and create detailed reports
Big Data and Analytics – End to End on AWS – Russell Nash (Amazon Web Services)
In this session we will look at the common patterns for the ingest, storage, processing and analysis of different types of data on the AWS platform and illustrate how you can harness the power and scale of the cloud to drive innovation in your own business.
We will introduce key concepts for a data lake and present aspects related to its implementation. We will also discuss critical success factors, pitfalls to avoid, operational aspects, and insights on how AWS enables a serverless data lake architecture.
Speaker: Sebastien Menant, Solutions Architect, Amazon Web Services
A data lake is an architectural approach that allows you to store massive amounts of data in a central location, so it's readily available to be categorized, processed, analyzed and consumed by diverse groups within an organization. In this session, we will introduce the Data Lake concept and its implementation on AWS. We will explain the different roles our services play and how they fit into the Data Lake picture.
Collecting, maintaining, and analyzing data is key to keeping pace within any industry today. In addition to being a critical competitive asset, maintaining corporate data requires careful foundational planning to ensure that the data is secure at all stages. Your big data may include not only proprietary non-public information, but also controlled data that must adhere to regulations such as HIPAA or ITAR. Securing this data while maintaining access for authorized data analytics and reporting workloads can pose significant challenges. In this talk, you’ll learn about strategies leveraging tools such as AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), Amazon S3, and Amazon EMR to secure your big data workloads in the cloud.
Level: 200
Speaker: Hannah Marlowe - Consultant, Federal, WWPS Professional Services
A data lake allows an organisation to store all of its data, structured and unstructured, in one centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema and you no longer need to know what questions you want to ask of your data beforehand. In this session we will explore the architecture of a Data Lake on AWS and cover topics such as storage, processing and security.
Speakers:
Tom McMeekin, Associate Solutions Architect, Amazon Web Services
Structured, Unstructured and Streaming Big Data on the AWS (Amazon Web Services)
Using AWS to solve business problems and uncover new opportunities with data has never been easier or more affordable. Now, businesses of all sizes and across all industries can take advantage of big data technologies and easily collect, store, process, analyze, and share their data. Gain a thorough understanding of what AWS offers across the big data lifecycle and learn architectural best practices for applying these technologies to your projects. We will also deep dive into how to use AWS services such as Kinesis, DynamoDB, Redshift, and QuickSight to optimize logging, build real-time applications, and analyze and visualize data at any scale.
(ISM213) Building and Deploying a Modern Big Data Architecture on AWS (Amazon Web Services)
"The AWS platform enables large enterprises to use data to solve business problems and uncover opportunities more easily and affordably than ever before. However, to truly take advantage of AWS, enterprises need a way to collect, store, process, analyze, and continually execute on their data.
Datapipe has been an AWS partner for more than five years. In that time, it has developed a proprietary process for the deployment of AWS environments, as well as the processing and evaluation of big data analytics to optimize these environments over time. This flexible solution includes automation tools, continuous monitoring, and cloud analytics. It protects against architectural sprawl and continually redesigns for scalability. This kind of continuous build environment allows Datapipe to examine the AWS environment as a complete picture and ensure the cloud environment is running as efficiently and effectively as possible, ultimately reducing overhead costs for the enterprise.
In this session, Jason Woodlee, Senior Director of Cloud Products at Datapipe, will discuss the technical details of designing and deploying a modern big data architecture on AWS, including application purpose and design, development environment and language overview, DevOps automation best practices, and continuous build and test frameworks. Session sponsored by Datapipe."
How to build a data lake with AWS Glue Data Catalog (ABD213-R) re:Invent 2017 (Amazon Web Services)
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora, a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak, Sr. Manager of Software Development
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who want to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud, or want to expand their skills and expertise. In this series, we will cover 'Modern Data Architectures for Business Insights at Scale'.
Designing and building your data architecture on AWS to gain insights and uncover new opportunities to scale and grow your business has never been easier. Join this workshop to learn how you can gain insights at scale with the right big data applications.
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana... (Amazon Web Services)
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Today organizations find themselves in a data rich world with a growing need for increased agility and accessibility of all this data for analysis and deriving keen insights to drive strategic decisions. Creating a data lake helps you to manage all the disparate sources of data you are collecting, in its original format and extract value. In this session learn how to architect and implement an Analytics Data Lake. Hear customer examples of best practices and learn from their architectural blueprints.
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...Amazon Web Services
The world is producing an ever-increasing volume, velocity, and variety of data. For many consumers, batch analytics is no longer enough; they need sub-second analysis on fast-moving data. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how?
If you missed this popular presentation at re:Invent, attend this webinar where we simplify big data processing as a pipeline comprising various stages: ingest, store, process, analyze & visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, and durability. Finally, we provide a reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems.
Learning Objectives:
Understand key AWS Big Data services including S3, Amazon EMR, Kinesis, and Redshift
Learn architectural patterns for Big Data
Hear best practices for building Big Data applications on AWS
Didn’t make it to re:Invent? Here’s another chance to attend this popular presentation
Who Should Attend:
Architects, developers and data scientists who are looking to start a Big Data initiative
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Amazon Web Services
Level 200: Visualize Your Data in Data Lake with AWS Athena and AWS Quicksight
Nowadays, enterprises are building Data Lake which store lots of structured and unstructured data for data analysis. But it takes lots of time for building the data modeling and infrastructure that is required. How to make quick data queries without servers and databases is the next big question for every enterprises.
In this workshop, eCloudvalley, the first and only Premier Consulting Partner in GCR, will demonstrate how to use serverless architecture to visualize your data using Amazon Athena and Amazon Quicksight.
You can easily query and visualize the data in your S3, and get business insights with the combination of these two services. Also, you can also build business reports with other tools such as AWS IoT, Amazon Kinesis Firehose.
Reason to Attend:
Learn how to quickly search for thousands of data on S3 via serverless Amazon's Athena
Learn how to use AWS QuickSight to retrieve information from your database quickly and create detailed reports
Big Data and Analytics – End to End on AWS – Russell NashAmazon Web Services
In this session we will look at the common patterns for the ingest, storage, processing and analysis of different types of data on the AWS platform and illustrate how you can harness the power and scale of the cloud to drive innovation in your own business.
We will introduce key concepts for a data lake and present aspects related to its implementation. Also discussing critical success factors, pitfalls to avoid operational aspects, and insights on how AWS enables a server-less data lake architecture.
Speaker: Sebastien Menant, Solutions Architect, Amazon Web Services
A data lake is an architectural approach that allows you to store massive amounts of data into a central location, so it's readily available to be categorized, processed, analyzed and consumed by diverse groups within an organization.In this session, we will introduce the Data Lake concept and its implementation on AWS.We will explain the different roles our services play and how they fit into the Data Lake picture.
Collecting, maintaining, and analyzing data is key to keeping pace within any industry today. In addition to being a critical competitive asset, maintaining corporate data requires careful foundational planning to ensure that the data is secure at all stages. Your big data may include not only proprietary non-public information, but also controlled data that must adhere to regulations such as HIPAA or ITAR. Securing this data while maintaining access for authorized data analytics and reporting workloads can pose significant challenges. In this talk, you’ll learn about strategies leveraging tools such as AWS Identity and Access Management (IAM), AWS Key Management Service (KMS) , Amazon S3, and Amazon EMR to secure your big data workloads in the cloud.
Level: 200
Speaker: Hannah Marlowe - Consultant, Federal, WWPS Professional Services
Data Lake allows an organisation to store all of their data, structured and unstructured, in one, centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema and you no longer need to know what questions you want to ask of your data beforehand. In this session we will explore the architecture of a Data Lake on AWS and cover topics such as storage, processing and security.
Speakers:
Tom McMeekin, Associate Solutions Architect, Amazon Web Services
Structured, Unstructured and Streaming Big Data on the AWSAmazon Web Services
Using AWS has never been easier or more affordable to solve business problems and uncover new opportunities using data. Now, businesses of all sizes and across all industries can take advantage of big data technologies and easily collect, store, process, analyze, and share their data. Gain a thorough understanding of what AWS offers across the big data lifecycle and learn architectural best practices for applying these technologies to your projects. We will also deep dive into how to use AWS services such as Kinesis, DynamoDB, Redshift, and Quicksight to optimize logging, build real-time applications, and analyze and visualize data at any scale.
(ISM213) Building and Deploying a Modern Big Data Architecture on AWSAmazon Web Services
"The AWS platform enables large enterprises to use data to solve business problems and uncover opportunities more easily and affordably than ever before. However, to truly take advantage of AWS, enterprises need a way to collect, store, process, analyze, and continually execute on their data.
Datapipe has been an AWS partner for more than five years. In that time, it has developed a proprietary process for the deployment of AWS environments, as well as the processing and evaluation of big data analytics to optimize these environments over time. This flexible solution includes automation tools, continuous monitoring, and cloud analytics. It protects against architectural sprawl and continually redesigns for scalability. This kind of continuous build environment allows Datapipe to examine the AWS environment as a complete picture and ensure the cloud environment is running as efficiently and effectively as possible, ultimately reducing overhead costs for the enterprise.
In this session, Jason Woodlee, Senior Director of Cloud Products at Datapipe, will discuss the technical details of designing and deploying a modern big data architecture on AWS, including application purpose and design, development environment and language overview, DevOps automation best practices, and continuous build and test frameworks. Session sponsored by Datapipe."
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including, connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including, Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora— a MySQL-compatible, highly-available relational database engine, which provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who want to use the AWS Cloud to build and transform their business, who are new to the AWS Cloud, or who want to expand their skills and expertise. In this series, we will cover 'Modern Data Architectures for Business Insights at Scale'.
Designing and building your data architecture on AWS to gain insights and uncover new opportunities to scale and grow your business has never been easier. Join this workshop to learn how you can gain insights at scale with the right big data applications.
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana... - Amazon Web Services
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Speaker: Jhen-Wei Huang, Solution Architect, AWS
Artificial Intelligence (AI) and deep learning are now ready to power your business, just as they power much of the innovation at Amazon.com, including autonomous drones, robots, Amazon Alexa, Amazon Go, and solutions to many other hard and important business problems. Come and learn why and how to get started with deep learning, and what you can expect from a future with better AI in the cloud and on the edge.
SAN Extension Design and Solutions
Help your organization ensure data is backed up and available, both locally and remotely. Expert guidance on the pros and cons of various SAN design options, and which to choose.
Learn from Cisco experts in this technical session as we cover the following:
• How to achieve the desired recovery time objective and recovery point objective for the business
• Pros and cons of various SAN extension solution designs, and which one to choose
• Cisco solutions and products for SAN extension using native Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), and Fibre Channel over Internet Protocol (FCIP) protocols
• Best practices collected from a decade’s worth of experience with some of the largest deployments in the world
• Configuration guidelines/best practices to increase the return on your investment
• Guidelines for increasing performance and security while lowering solution costs
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS... - Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017 - Amazon Web Services
In today's session we share an overview of the typical challenges of adopting Big Data, and show how the AWS Big Data platform lets you tackle these challenges and use the right analytics and Big Data solutions to make your strategy succeed (whiteboard presentation).
Antoine Genereux takes us on a detailed overview of the Database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
(BDT310) Big Data Architectural Patterns and Best Practices on AWS - Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv - Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Big Data Architectural Patterns and Best Practices on AWS - Amazon Web Services
by Dario Rivera, Solutions Architect, AWS
The world is producing an ever-increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Presented by: Arie Leeuwesteijn, Principal Solutions Architect, Amazon Web Services
Customer Guest: Sander Kieft, Sanoma
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS - Amazon Web Services
With an ever-increasing set of technologies to process big data, organizations often struggle to understand how to build scalable and cost-effective big data applications.
In this webinar, we will simplify big data processing as a pipeline comprising various stages, and then show you how to choose the right technology for each stage based on criteria such as data structure, design patterns, and best practices.
Learning Objectives:
Understand key AWS Big Data services including S3, Amazon EMR, Kinesis, and Redshift
Learn architectural patterns for Big Data
Hear best practices for building Big Data applications on AWS
Who Should Attend:
Architects, developers and data scientists who are looking to start a Big Data initiative
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Speakers:
Neel Mitra - Solutions Architect, AWS
Roger Dahlstrom - Solutions Architect, AWS
Database and Analytics on the AWS Cloud - AWS Innovate Toronto - Amazon Web Services
Antoine Genereux, AWS Solutions Architect, takes us on a tour of database solutions available for the AWS Cloud, and powerful analytics and business intelligence reporting tools.
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on... - 1Strategy
In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Big Data Architectural Patterns and Best Practices on AWS - Amazon Web Services
In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. John Pignata, AWS Startup Solutions Architect, will discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. He will provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Level: Intermediate
Speakers:
Tony Nguyen - Senior Consultant, ProServe, AWS
Hannah Marlowe - Consultant - Federal, AWS
Data Analytics Week at the San Francisco Loft
Using Data Lakes
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Speakers:
John Mallory - Principal Business Development Manager Storage (Object), AWS
Hemant Borole - Sr. Big Data Consultant, AWS
How to build forecasting services using ML and deep learn... - Amazon Web Services
Forecasting is an important process for many companies and is used in a range of areas to predict accurately the growth and distribution of a product, the resources needed on production lines, financial projections, and much more. Amazon uses advanced forecasting techniques, and some of these capabilities have been made available to all AWS customers.
In this session we show how to pre-process data that has a time component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to build Big Data applications in Server... - Amazon Web Services
The variety and volume of data created every day keeps growing faster and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment only established companies can afford. But the elasticity of the cloud and, in particular, serverless services let us break through these limits.
We will see how to develop Big Data applications quickly, without worrying about infrastructure, dedicating all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we present the main features of the service and show how to deploy your application in a few steps.
Twenty years ago Amazon went through a radical transformation aimed at increasing the pace of innovation. In that time we learned how changing our approach to application development allowed us to significantly increase agility and release velocity and, ultimately, to build more reliable and scalable applications. In this session we explain how we define modern applications and how building modern apps affects not only the application architecture but also the organizational structure, the development release pipelines, and even the operating model. We also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances - Amazon Web Services
Container usage keeps growing.
When designed correctly, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, for average savings of 70% compared with On-Demand Instances. In this session we look at the characteristics of Spot Instances and how easily they can be used on AWS. We also learn how Spreaker uses Spot Instances to run different types of applications, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us the question – how to monetise Open APIs, simplify Fintech integrations and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to Open Finance marketplace presentation on October 20th.
Event Agenda :
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering stand out in the market with Machine Lea... - Amazon Web Services
To create value and build a distinctive, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, with the help of a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployments of the... - Amazon Web Services
With the traditional approach to IT, for many years it was difficult to implement DevOps techniques, which often involved manual activities that occasionally caused application downtime and interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, providing greater system reliability and resulting in significant improvements in business continuity.
AWS offers AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances using Chef and Puppet workloads.
Find out how to use AWS OpsWorks to make the application you run on EC2 instances more reliable.
Microsoft Active Directory on AWS to support your Windows Workloads - Amazon Web Services
Do you want to know the options for running Microsoft Active Directory on AWS? When you move Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support Group Policy management, authentication, and authorization. In this session, we discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis based on artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we explore the options AWS services provide for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud ™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, taking full advantage of the AWS cloud while protecting existing VMware investments.
Many organizations take advantage of the cloud by migrating their Oracle workloads, gaining significant benefits in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, along with performance risks that can be introduced when applications are moved out of on-premises data centers.
Build your first serverless ledger-based app with QLDB and NodeJS - Amazon Web Services
Many companies today build applications with ledger-like functionality, for example to verify the history of credits and debits in banking transactions or to track the supply chain flow of their products.
These solutions are built on ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build complex custom systems by providing a fully managed serverless ledger database.
In this session we find out how to build a complete serverless application that uses the features of QLDB.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for giving end users an exceptional user experience. In this session we learn how to tackle modern API design challenges with GraphQL, an open source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We look in depth at several scenarios to understand how AppSync can help address these use cases by creating modern APIs with real-time and offline data update capabilities.
We also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: debunking the myths - Amazon Web Services
Many organizations take advantage of the cloud by migrating their Oracle workloads, gaining significant benefits in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, along with performance risks that can be introduced when applications are moved out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical steps to ease and simplify the migration of Oracle workloads and accelerate the move to the cloud; they examine the architecture and demonstrate how to take full advantage of VMware Cloud ™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer that controls deployment and the related lifecycle. In this session we present the main features of the service, the reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
1. Big Data Architectural Patterns and Best Practices on AWS
Name: Han xiaoyong
Department: SA
Company: AWS
2. What to Expect from the Session
Big data challenges
Architectural principles
How to simplify big data processing
What technologies should you use?
• Why?
• How?
Reference architecture
Design patterns
8. Architectural Principles
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns
• Immutable logs, materialized views
Be cost-conscious
• Big data ≠ big cost
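The log-centric principle above (immutable logs, materialized views) can be sketched in a few lines. This is a minimal in-memory illustration, not an AWS API: the `AppendOnlyLog` class, the `build_totals_view` function, and the `(user_id, amount)` event shape are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event shape for illustration: (user_id, amount).
Event = Tuple[str, int]


class AppendOnlyLog:
    """An immutable log: events can be appended and read, never changed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read(self, start: int = 0) -> Tuple[Event, ...]:
        # Return a tuple so callers cannot mutate the stored history.
        return tuple(self._events[start:])


def build_totals_view(log: AppendOnlyLog) -> Dict[str, int]:
    """Materialized view: per-user totals derived by replaying the log."""
    totals: Dict[str, int] = defaultdict(int)
    for user_id, amount in log.read():
        totals[user_id] += amount
    return dict(totals)


log = AppendOnlyLog()
log.append(("alice", 10))
log.append(("bob", 5))
log.append(("alice", -3))
view = build_totals_view(log)
```

Because the view is derived only from the log, it can be dropped and rebuilt at any time, or rebuilt with different logic, without touching the stored events.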
9. Simplify Big Data Processing
COLLECT → STORE → PROCESS/ANALYZE → CONSUME
Time to answer (Latency)
Throughput
Cost
11. Types of Data
[Diagram: COLLECT stage. Applications (mobile apps, web apps, data centers, AWS Direct Connect) produce RECORDS (in-memory data structures, database records). Logging (Amazon CloudWatch, AWS CloudTrail) and Transport (AWS Import/Export Snowball) produce DOCUMENTS and FILES (search documents, log files). Messaging produces MESSAGES. IoT (devices, sensors and IoT platforms, AWS IoT) produces STREAMS (data streams). Data types shown: transactions, files, events.]
12. Data Characteristics: Hot, Warm, Cold
| | Hot data | Warm data | Cold data |
| Volume | MB–GB | GB–TB | PB–EB |
| Item size | B–KB | KB–MB | KB–TB |
| Latency | ms | ms, sec | min, hrs |
| Durability | Low–high | High | Very high |
| Request rate | Very high | High | Low |
| Cost/GB | $$-$ | $-¢¢ | ¢ |
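One way to apply the latency row of the hot/warm/cold characteristics is a small classifier. This is only a sketch: the function name `classify_temperature` and the numeric cutoffs (0.1 seconds and 60 seconds) are assumptions made for illustration; the slide gives only the orders of magnitude (ms; ms to sec; min to hrs).

```python
def classify_temperature(required_latency_seconds: float) -> str:
    """Classify data as hot, warm, or cold from the latency a reader needs.

    The cutoffs below are illustrative assumptions, not values from the slide.
    """
    if required_latency_seconds < 0.1:   # millisecond-class access
        return "hot"
    if required_latency_seconds < 60:    # up to seconds
        return "warm"
    return "cold"                         # minutes to hours


examples = {
    "session cache lookup": classify_temperature(0.005),
    "dashboard query": classify_temperature(2),
    "yearly archive restore": classify_temperature(4 * 3600),
}
```

In practice the other rows (request rate, item size, cost per GB) would feed the same decision; latency is used alone here to keep the sketch short.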
17. What About Amazon SQS?
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• No client ordering (Standard)
• FIFO queue preserves client ordering
• No streaming MapReduce
• No parallel consumption
• Amazon SNS can publish to multiple SNS subscribers (queues or Lambda functions)
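[Diagram: a publisher sends to an Amazon SNS topic, which delivers to subscribers: AWS Lambda functions and Amazon SQS queues read by consumers. With a Standard queue, messages 1 to 4 can arrive out of order or duplicated; with a FIFO queue they arrive in the order sent.]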
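The difference between Standard and FIFO ordering can be illustrated with a toy model. This is a pure-Python simulation, not the SQS API: `standard_delivery` and `fifo_delivery` are hypothetical helpers, and deduplicating on the message value stands in for a real deduplication ID.

```python
import random
from typing import List


def standard_delivery(messages: List[int], rng: random.Random) -> List[int]:
    """Toy model of a Standard queue: at-least-once, no ordering guarantee."""
    delivered = list(messages)
    delivered.append(rng.choice(messages))  # a message may arrive twice
    rng.shuffle(delivered)                  # arrival order is not guaranteed
    return delivered


def fifo_delivery(messages: List[int]) -> List[int]:
    """Toy model of a FIFO queue: send order preserved, duplicates dropped."""
    seen = set()
    delivered = []
    for message in messages:
        if message not in seen:  # deduplicate on the message value
            seen.add(message)
            delivered.append(message)
    return delivered


sent = [1, 2, 3, 4]
standard = standard_delivery(sent, random.Random(0))
fifo = fifo_delivery(sent + [2])  # a producer retry re-sends message 2
```

A consumer of the Standard model has to tolerate reordering and repeats (for example by making its processing idempotent), whereas the FIFO model hands back `[1, 2, 3, 4]` unchanged.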
18. Which Stream/Message Storage Should I Use?
| | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS (Standard) | Amazon SQS (FIFO) |
| AWS managed | Yes | Yes | Yes | No | Yes | Yes |
| Guaranteed ordering | Yes | Yes | No | Yes | No | Yes |
| Delivery (deduping) | Exactly-once | At-least-once | At-least-once | At-least-once | At-least-once | Exactly-once |
| Data retention period | 24 hours | 7 days | N/A | Configurable | 14 days | 14 days |
| Availability | 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ | 3 AZ |
| Scale / throughput | No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limits / automatic | 300 TPS / queue |
| Parallel consumption | Yes | Yes | No | Yes | No | No |
| Stream MapReduce | Yes | Yes | N/A | Yes | N/A | N/A |
| Row/object size | 400 KB | 1 MB | Destination row/object size | Configurable | 256 KB | 256 KB |
| Cost | Higher (table cost) | Low | Low | Low (+admin) | Low-medium | Low-medium |
Hot Warm
19. File Storage
[Diagram: COLLECT and STORE stages. COLLECT sources: Applications (mobile apps, web apps, data centers, AWS Direct Connect) emitting RECORDS; Logging (Amazon CloudWatch, AWS CloudTrail) and Transport (AWS Import/Export Snowball) emitting DOCUMENTS and FILES; Messaging emitting MESSAGES; IoT (devices, sensors and IoT platforms, AWS IoT) emitting STREAMS. STORE options shown: stream storage (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams; hot), message storage (Amazon SQS), and file storage (Amazon S3).]
20. Why Is Amazon S3 Good for Big Data?
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Hadoop clusters & Amazon EC2 Spot Instances
• Multiple & heterogeneous analysis clusters can use the same data
• Unlimited number of objects and volume of data
• Very high bandwidth – no aggregate throughput limit
• Designed for 99.99% availability – can tolerate zone failure
• Designed for 99.999999999% durability
• No need to pay for data replication
• Native support for versioning
• Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies
• Secure – SSL, client/server-side encryption at rest
• Low cost
21. What About HDFS & Data Tiering?
• Use HDFS for very frequently accessed (hot) data
• Use Amazon S3 Standard for frequently accessed data
• Use Amazon S3 Standard – IA for less frequently accessed data
• Use Amazon Glacier for archiving cold data
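The S3 tiers above are typically moved between with a lifecycle policy. The sketch below builds a rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`, without calling AWS. The rule ID, the `logs/` prefix, and the 30-day and 365-day thresholds are assumptions chosen for illustration; `storage_class_for_age` is a hypothetical helper that only shows which tier an object of a given age would be in.

```python
# Lifecycle rule in the shape boto3 expects; values are illustrative assumptions.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-aging-data",          # hypothetical rule name
            "Filter": {"Prefix": "logs/"},    # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class_for_age(age_days: int, rule: dict) -> str:
    """Return the storage class an object of this age would be in."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
tiers = [storage_class_for_age(days, rule) for days in (0, 45, 400)]
```

HDFS is outside this policy: data kept on cluster-local HDFS for hot access would be copied to or from S3 by the processing job itself.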
22. In-memory, Database, Search
[Diagram: COLLECT and STORE stages. COLLECT sources: Applications (mobile apps, web apps, data centers, AWS Direct Connect) emitting RECORDS; Logging (Amazon CloudWatch, AWS CloudTrail) and Transport (AWS Import/Export Snowball) emitting DOCUMENTS and FILES; Messaging emitting MESSAGES; IoT (devices, sensors and IoT platforms, AWS IoT) emitting STREAMS. STORE options shown: stream storage (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams; hot), message storage (Amazon SQS), and file storage (Amazon S3), with In-memory, Database, and Search labeled on the STORE side.]
23. Best Practice: Use the Right Tool for the Job
Search
• Amazon Elasticsearch Service
In-memory
• Amazon ElastiCache: Redis, Memcached
SQL
• Amazon Aurora
• Amazon RDS: MySQL, PostgreSQL, Oracle, SQL Server
NoSQL
• Amazon DynamoDB
• Cassandra
• HBase
• MongoDB
24. [Diagram: COLLECT and STORE stages. COLLECT sources: Applications (mobile apps, web apps, data centers, AWS Direct Connect) emitting RECORDS; Logging (Amazon CloudWatch, AWS CloudTrail) and Transport (AWS Import/Export Snowball) emitting DOCUMENTS and FILES; Messaging emitting MESSAGES; IoT (devices, sensors and IoT platforms, AWS IoT) emitting STREAMS. STORE options shown: stream storage (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams; hot), message storage (Amazon SQS), search (Amazon Elasticsearch Service), NoSQL (Amazon DynamoDB), file (Amazon S3), cache (Amazon ElastiCache), and SQL (Amazon RDS).]
Amazon ElastiCache
• Managed Memcached or Redis service
Amazon DynamoDB
• Managed NoSQL database service
Amazon RDS
• Managed relational database service
Amazon Elasticsearch Service
• Managed Elasticsearch service
25. Which Data Store Should I Use?
Data structure → Fixed schema, JSON, key-value
Access patterns → Store data in the format you will access it
Data characteristics → Hot, warm, cold
Cost → Right cost
26. Data Structure and Access Patterns
| Access Patterns | What to use? |
| Put/Get (key, value) | In-memory, NoSQL |
| Simple relationships → 1:N, M:N | NoSQL |
| Multi-table joins, transaction, SQL | SQL |
| Faceting, search | Search |

| Data Structure | What to use? |
| Fixed schema | SQL, NoSQL |
| Schema-free (JSON) | NoSQL, Search |
| (Key, value) | In-memory, NoSQL |
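The two lookups on this slide can be combined: a store category is a candidate only if it suits both the access pattern and the data structure. The sketch below encodes the slide's rows as dictionaries; the key strings and the `candidate_stores` function are hypothetical names introduced for the example.

```python
from typing import Dict, List

# Rows transcribed from the slide; the dictionary keys are illustrative labels.
ACCESS_PATTERN_TO_STORE: Dict[str, List[str]] = {
    "put/get (key, value)": ["In-memory", "NoSQL"],
    "simple relationships (1:N, M:N)": ["NoSQL"],
    "multi-table joins, transactions, SQL": ["SQL"],
    "faceting, search": ["Search"],
}

DATA_STRUCTURE_TO_STORE: Dict[str, List[str]] = {
    "fixed schema": ["SQL", "NoSQL"],
    "schema-free (JSON)": ["NoSQL", "Search"],
    "(key, value)": ["In-memory", "NoSQL"],
}


def candidate_stores(access_pattern: str, data_structure: str) -> List[str]:
    """Return store categories that fit both the access pattern and the data."""
    by_access = ACCESS_PATTERN_TO_STORE[access_pattern]
    by_structure = DATA_STRUCTURE_TO_STORE[data_structure]
    return [store for store in by_access if store in by_structure]


kv = candidate_stores("put/get (key, value)", "(key, value)")
search = candidate_stores("faceting, search", "schema-free (JSON)")
mismatch = candidate_stores("multi-table joins, transactions, SQL", "schema-free (JSON)")
```

An empty result, as in the last call, suggests the workload should either reshape its data (for example, give it a fixed schema) or be split across more than one store.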
27. Which Data Store Should I Use?
| | Amazon ElastiCache | Amazon DynamoDB | Amazon RDS/Aurora | Amazon ES | Amazon S3 | Amazon Glacier |
| Average latency | ms | ms | ms, sec | ms, sec | ms, sec, min (~ size) | hrs |
| Typical data stored | GB | GB–TBs (no limit) | GB–TB (64 TB max) | GB–TB | MB–PB (no limit) | GB–PB (no limit) |
| Typical item size | B-KB | KB (400 KB max) | KB (64 KB max) | B-KB (2 GB max) | KB-TB (5 TB max) | GB (40 TB max) |
| Request rate | High – very high | Very high (no limit) | High | High | Low – high (no limit) | Very low |
| Storage cost GB/month | $$ | ¢¢ | ¢¢ | ¢¢ | ¢ | ¢4/10 |
| Durability | Low - moderate | Very high | Very high | High | Very high | Very high |
| Availability | High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ | High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ |
Columns run from hot data (left) through warm data to cold data (right).
28. Cost-Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project. The design calls for
many small files, perhaps up to a billion during peak. The
total size would be on the order of 1.5 TB per month…”
| Request rate (Writes/sec) | Object size (Bytes) | Total size (GB/month) | Objects per month |
| 300 | 2048 | 1483 | 777,600,000 |
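The figures in this example follow from the request rate and object size. A short calculation reproduces them, assuming a 30-day month and 1 GB = 1024³ bytes (both assumptions; the slide does not state them, but they match its numbers).

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

writes_per_second = 300
object_size_bytes = 2048

# Objects written in one month at a constant request rate.
objects_per_month = writes_per_second * SECONDS_PER_MONTH

# Total volume in GB, using 1 GB = 1024**3 bytes.
total_gb_per_month = objects_per_month * object_size_bytes / 1024**3
```

Here `objects_per_month` comes to 777,600,000 and `total_gb_per_month` rounds to 1483, which is in line with the roughly 1.5 TB per month and "up to a billion" files in the quoted question.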
34. Which Stream & Message Processing Technology Should I Use?
| | Amazon EMR (Spark Streaming) | Apache Storm | KCL Application | Amazon Kinesis Analytics | AWS Lambda | Amazon SQS Application |
| AWS managed | Yes (Amazon EMR) | No (Do it yourself) | No (EC2 + Auto Scaling) | Yes | Yes | No (EC2 + Auto Scaling) |
| Serverless | No | No | No | Yes | Yes | No |
| Scale / throughput | No limits / ~ nodes | No limits / ~ nodes | No limits / ~ nodes | Up to 8 KPU / automatic | No limits / automatic | No limits / ~ nodes |
| Availability | Single AZ | Configurable | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ |
| Programming languages | Java, Python, Scala | Almost any language via Thrift | Java, others via MultiLangDaemon | ANSI SQL with extensions | Node.js, Java, Python | AWS SDK languages (Java, .NET, Python, …) |
| Uses | Multistage processing | Multistage processing | Single stage processing | Multistage processing | Simple event-based triggers | Simple event-based triggers |
| Reliability | KCL and Spark checkpoints | Framework managed | Managed by KCL | Managed by Amazon Kinesis Analytics | Managed by AWS Lambda | Managed by SQS Visibility Timeout |
35. Which Analysis Tool Should I Use?
| | Amazon Redshift | Amazon Athena | Amazon EMR (Presto, Spark, Hive) |
| Use case | Optimized for data warehousing | Ad-hoc interactive queries | Presto: interactive query; Spark: general purpose (iterative ML, RT, ..); Hive: batch |
| Scale/throughput | ~ Nodes | Automatic / No limits | ~ Nodes |
| AWS managed service | Yes | Yes, serverless | Yes |
| Storage | Local storage | Amazon S3 | Amazon S3, HDFS |
| Optimization | Columnar storage, data compression, and zone maps | CSV, TSV, JSON, Parquet, ORC, Apache Web log | Framework dependent |
| Metadata | Amazon Redshift managed | Athena Catalog Manager | Hive Meta-store |
| BI tools support | Yes (JDBC/ODBC) | Yes (JDBC) | Yes (JDBC/ODBC & Custom) |
| Access controls | Users, groups, and access controls | AWS IAM | Integration with LDAP |
| UDF support | Yes (Scalar) | No | Yes |
Slow
36. What About ETL?
[Diagram: ETL sits between the STORE and PROCESS / ANALYZE stages.]
Data Integration Partners
Reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes.
https://aws.amazon.com/big-data/partner-solutions/
AWS Glue
AWS Glue is a fully managed ETL service that makes it easy to understand your data sources, prepare the data, and move it reliably between data stores.
39. [Diagram: CONSUME stage, following COLLECT, ETL, STORE, and PROCESS / ANALYZE. Consumers shown: applications and APIs (apps and services); analysis and visualization (Amazon QuickSight); notebooks; IDE. Audiences: business users; data scientists and developers.]
49. Summary
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns
• Immutable log, batch, interactive & real-time views
Be cost-conscious
• Big data ≠ big cost