Learn about architecture best practices for combining AWS storage and database technologies. We outline AWS storage options (Amazon EBS, Amazon EC2 instance storage, Amazon S3, and Amazon Glacier) along with AWS database options including Amazon ElastiCache (in-memory data store), Amazon RDS (SQL database), Amazon DynamoDB (NoSQL database), Amazon CloudSearch (search), Amazon EMR (Hadoop), and Amazon Redshift (data warehouse). Then we discuss how to architect your database tier by using the right database and storage technologies to achieve the required functionality, performance, availability, and durability, at the right cost.
2. The Third Platform
• Built on:
  – Mobile devices
  – Cloud services
  – Social technologies
  – Big data
• Billions of users
• Millions of apps
3. Data Volume, Velocity, Variety
• 2.7 zettabytes (ZB) of data exists in the digital universe today
  – 1 ZB = 1 billion terabytes
• 450 billion transactions per day by 2020
• More unstructured data than structured data
4. Common Questions from Database Developers
Cloud Migration
• How do I move (my data) to the cloud?
Data/Storage Technologies
• What data store should I use?
  – SQL or NoSQL?
  – Hadoop or DW?
  – What about search?
Management Concerns
• Is my data (in the cloud) secure?
• Relational features w/o management nightmares?
• My data volume, velocity, and variety are exploding!
• How can I reduce cost?
Performance and Delivery
• Need low latency (ms or µs)
• Need high throughput
• Need to ship in days – not years!
6. Cloud Data Tier Architecture – Use the Right Tool for the Job!
[Diagram: Client Tier → App/Web Tier → Data Tier; the data tier comprises Search, Cache, Blob Store, ETL, NoSQL, SQL, Data Warehouse, and Hadoop components]
10. AWS Primitive Compute and Storage
Compute Capabilities
• Many different EC2 instance types
  – General purpose
  – Compute optimized
  – Storage optimized
  – Memory optimized
• Host any major data storage technology
  – RDBMS
  – NoSQL
  – Cache
Raw Storage Options
• EC2 instance store (ephemeral)
• Amazon Elastic Block Store (EBS)
  – Standard volume: 1 TB, ~100 IOPS per volume
  – Provisioned IOPS volume: 1 TB, up to 4,000 IOPS per volume
  – Stripe multiple volumes for higher IOPS or storage
Primitives add flexibility, but also come with operational burden!
11. AWS Data Tier Architecture - Use the right tool for the job!
[Diagram: Data Tier comprising Amazon ElastiCache, Amazon CloudSearch, Amazon Elastic MapReduce, Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, and AWS Data Pipeline]
20. Data Characteristics: Hot, Warm, Cold
              Hot        Warm      Cold
Volume        MB–GB      GB–TB     PB
Item size     B–KB       KB–MB     KB–TB
Latency       ms         ms, sec   min, hrs
Durability    Low–High   High      Very High
Request rate  Very High  High      Low
Cost/GB       $$-$       $-¢¢      ¢
22. What data store should I use?
Service              Avg latency           Data volume          Item size          Request rate              Storage cost ($/GB/mo)  Durability
ElastiCache          ms                    GB                   B–KB               Very High                 $$                      Low
Amazon DynamoDB      ms                    GB–TBs (no limit)    KB (64 KB max)     Very High                 ¢¢                      Very High
Amazon RDS           ms, sec               GB–TB (3 TB max)     KB (~row size)     High                      ¢¢                      High
Amazon CloudSearch   ms, sec               GB–TB                KB (1 MB max)      High                      $                       High
Amazon Redshift      sec, min              TB–PB (1.6 PB max)   KB (64 K max)      Low                       ¢                       High
Amazon EMR (Hive)    sec, min, hrs         GB–PB (~nodes)       KB–MB              Low                       ¢                       Moderate
Amazon S3            ms, sec, min (~size)  GB–PB (no limit)     KB–GB (5 TB max)   Low–Very High (no limit)  ¢                       Very High
Amazon Glacier       hrs                   GB–PB (no limit)     GB (40 TB max)     Very Low (no limit)       ¢                       Very High
(Services run hot to cold from top to bottom: ElastiCache serves the hottest data, Amazon Glacier the coldest.)
23. AWS Data Tier Architecture - Use the right tool for the job!
[Diagram repeated: Data Tier comprising Amazon ElastiCache, Amazon CloudSearch, Amazon Elastic MapReduce, Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, and AWS Data Pipeline]
25. Cost Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…”
Request rate (writes/sec): 300 · Object size (bytes): 2,048 · Total size (GB/month): 1,483 · Objects per month: 777,600,000
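A quick back-of-the-envelope sketch of this comparison in Python; the per-request and per-GB prices below are illustrative placeholders rather than current AWS pricing, so substitute the rates for your region:

```python
# Rough monthly cost comparison for the workload above: 300 writes/sec of
# 2 KB objects (~777.6M objects, ~1,483 GB per month). All prices are
# assumed placeholders -- look up current S3/DynamoDB pricing.
WRITES_PER_SEC = 300
OBJECT_BYTES = 2048
SECONDS_PER_MONTH = 30 * 24 * 3600

objects_per_month = WRITES_PER_SEC * SECONDS_PER_MONTH        # 777,600,000
gb_per_month = objects_per_month * OBJECT_BYTES / 1024**3     # ~1,483 GB

# --- Amazon S3: pay per PUT request plus per GB-month stored ---
S3_PUT_PRICE = 0.005 / 1000      # assumed $ per PUT request
S3_STORAGE_PRICE = 0.03          # assumed $ per GB-month
s3_cost = objects_per_month * S3_PUT_PRICE + gb_per_month * S3_STORAGE_PRICE

# --- Amazon DynamoDB: pay for provisioned write capacity plus storage ---
# One write capacity unit covers one 1 KB write/sec, so a 2 KB item
# consumes two units per write.
wcu_needed = WRITES_PER_SEC * (OBJECT_BYTES // 1024)
DDB_WCU_HOURLY = 0.00065         # assumed $ per WCU-hour
DDB_STORAGE_PRICE = 0.25         # assumed $ per GB-month
ddb_cost = (wcu_needed * DDB_WCU_HOURLY * 24 * 30
            + gb_per_month * DDB_STORAGE_PRICE)

print(f"S3:       ~${s3_cost:,.0f}/month")
print(f"DynamoDB: ~${ddb_cost:,.0f}/month")
```

With many tiny objects, per-request charges dominate the S3 bill; that trade-off is exactly what this example is probing.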
30. Amazon RDS
When to use
• Transactions
• Complex queries
• Medium to high query/write rate
  – Up to 30K IOPS (15K reads + 15K writes)
• 100s of GB to low TBs
• Workload can fit in a single node
• High durability
When not to use
• Massive read/write rates
  – Example: 150K write requests per second
• Data size or throughput demands sharding
  – Example: 10s or 100s of terabytes
• Simple Get/Put and queries that a NoSQL store can handle
• Complex analytics
[Diagram: push-button scaling, Multi-AZ deployment across AZ 1 and AZ 2 within a region, and read replicas]
31. Amazon RDS Best Practices
• Use the right DB instance class
• Use EBS-optimized instances
  – db.m1.large, db.m1.xlarge, db.m2.2xlarge, db.m2.4xlarge, db.cr1.8xlarge
• Use provisioned IOPS
• Use Multi-AZ for high availability
• Use read replicas for
  – Scaling reads
  – Schema changes
  – Additional failure recovery
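A minimal sketch of these practices with the AWS SDK for Python (boto3), assuming a MySQL engine; the identifiers, instance class, and credentials are placeholders (the deck's db.m1/db.m2 classes predate current generations):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Multi-AZ instance on provisioned IOPS storage (identifiers are placeholders).
rds.create_db_instance(
    DBInstanceIdentifier="orders-primary",
    Engine="mysql",
    DBInstanceClass="db.m5.large",      # pick a class sized for the workload
    AllocatedStorage=500,               # GB
    StorageType="io1",
    Iops=5000,                          # provisioned IOPS for consistent latency
    MultiAZ=True,                       # synchronous standby in a second AZ
    MasterUsername="admin",
    MasterUserPassword="REPLACE_ME",
)

# Read replica to offload read traffic from the primary.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="orders-replica-1",
    SourceDBInstanceIdentifier="orders-primary",
)
```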
32. Amazon DynamoDB
When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent/low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries
When not to use
• Need multi-item/row or cross-table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data
33. Amazon DynamoDB Best Practices
• Keep item size small
• Store metadata in Amazon DynamoDB and large blobs in Amazon S3
• Use a table with a hash key for extremely high scale
• Use a table per day, week, month, etc. for storing time series data
• Use conditional/OCC updates
• Use hash-range key to model
  – 1:N relationships
  – Multi-tenancy
• Avoid hot keys and hot partitions
[Diagram: time series tables such as Events_table_2012 and Events_table_2012_05_week1/2/3, each keyed on Event_id (hash key) and Timestamp (range key) with attributes Attribute1 … AttributeN]
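A minimal boto3 sketch of the hash-range and conditional-update ideas, reusing the slide's Events_table naming; table settings are assumptions:

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

# Weekly time series table keyed on Event_id (hash) + Timestamp (range),
# mirroring the Events_table_* naming in the slide.
table = dynamodb.create_table(
    TableName="Events_table_2012_05_week1",
    KeySchema=[
        {"AttributeName": "Event_id", "KeyType": "HASH"},
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "Event_id", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "N"},
    ],
    BillingMode="PAY_PER_REQUEST",   # modern convenience; the deck used provisioned throughput
)
table.wait_until_exists()

# Conditional (optimistic) write: only insert if this event has not been
# recorded yet, so retries stay idempotent.
table.put_item(
    Item={"Event_id": "game-42", "Timestamp": 1369000000, "Attribute1": "score"},
    ConditionExpression="attribute_not_exists(Event_id)",
)
```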
34. Amazon ElastiCache (Memcached)
When to use
• Transient key-value store
• Need to speed up reads/writes
• Caching frequent SQL, NoSQL, or DW query results
• Saving transient and frequently updated data
  – Increment/decrement game scores/counters
  – Web application session storage
• Best-effort deduplication
When not to use
• Storing infrequently used data
• Need persistence
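A sketch of the cache-aside pattern with a TTL, using the pymemcache client; the ElastiCache node endpoint and the query function are placeholders:

```python
import json
from pymemcache.client.base import Client

# Placeholder endpoint for an ElastiCache memcached node.
cache = Client(("my-cluster.abc123.0001.use1.cache.amazonaws.com", 11211))

def get_top_sellers(run_dw_query):
    """Cache-aside: serve from memcached, fall back to the warehouse."""
    key = "top_sellers"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = run_dw_query()                         # expensive DW query (placeholder)
    cache.set(key, json.dumps(result), expire=300)  # TTL keeps results fresh
    return result
```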
35. Amazon ElastiCache (Memcached) Best Practices
• Use autodiscovery
• Share memcached client objects in application
• Use TTLs
• Consider memory overhead for connections
• Use Amazon CloudWatch alarms / SNS alerts for
  – Number of connections
  – Swap memory usage
  – Freeable memory
36. Amazon ElastiCache (Redis)
When to use
• Key-value store with advanced data structures
  – Strings, lists, sets, sorted sets, hashes
• Caching
• Leaderboards
• High-speed sorting
• Atomic counters
• Queuing systems
• Activity streams
When not to use
• Need “native” sharding or scale-out
• Need “hard” persistence
• Data won’t fit in memory
• Need transaction rollback even under exceptions
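For instance, a leaderboard is a few sorted-set calls with redis-py; the endpoint and key names here are placeholders:

```python
import redis

# Placeholder ElastiCache Redis endpoint.
r = redis.Redis(host="my-redis.abc123.0001.use1.cache.amazonaws.com", port=6379)

def record_score(player, points):
    # ZINCRBY atomically bumps the player's score in the sorted set.
    r.zincrby("leaderboard", points, player)

def top_players(n=10):
    # Highest scores first, with their values.
    return r.zrevrange("leaderboard", 0, n - 1, withscores=True)

record_score("alice", 120)
record_score("bob", 95)
print(top_players())
```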
37. Amazon ElastiCache (Redis) Best Practices
• Use TTL
• Use the right instance types
  – Instances with high ECU/vCPU and network performance yield the highest throughput; example: m2.4xlarge, m2.2xlarge
• Use read replicas
  – Increase read throughput
  – Promote read replicas to primary (AOF cannot protect against all failure modes)
• Use an RDB file snapshot for on-premises to Amazon ElastiCache migration
• Key parameter group settings
  – Avoid “AOF with fsync always” – huge impact on performance
  – AOF (+ RDB) with fsync everysec – best durability + performance
  – Pub-sub: set client-output-buffer-limit-pubsub-hard-limit and client-output-buffer-limit-pubsub-soft-limit based on the workloads
38. Amazon CloudSearch
When to use
• No search expertise
• Full-text search
• Ranking
• Relevance
• Structured and unstructured data
• Faceting
  – $0 to $10 (4 items)
  – $10 and above (3 items)
When not to use
• As a replacement for a database
  – Not as a system of record
  – Transient data
  – Nonatomic updates
39. Amazon CloudSearch Best Practices
• Batch documents for uploading
• Use Amazon CloudSearch for searching and another store for retrieving full records for the UI (i.e., don’t use return fields)
• Include other data like popularity scores in documents
• Use stop words to remove common terms
• Use fielded queries to reduce match sets
• Query latency is proportional to query specificity
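A sketch of batched uploading with the boto3 cloudsearchdomain client; the domain endpoint and document fields are assumptions:

```python
import json
import boto3

# Placeholder document endpoint for a CloudSearch domain.
cs = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://doc-products-abc123.us-east-1.cloudsearch.amazonaws.com",
)

products = [{"id": str(i), "title": f"Product {i}", "popularity": i % 100}
            for i in range(1000)]

# One batch of "add" operations instead of 1,000 single-document uploads.
batch = [{"type": "add", "id": p["id"],
          "fields": {"title": p["title"], "popularity": p["popularity"]}}
         for p in products]

cs.upload_documents(
    documents=json.dumps(batch).encode("utf-8"),
    contentType="application/json",
)
```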
40. Amazon Redshift
When to use
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g. daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column based
• Very high durability
When not to use
• OLTP workloads
  – 1000s of concurrent users
  – Large number of singleton updates
41. Amazon Redshift Best Practices
• Use the COPY command to load large data sets from Amazon S3, Amazon DynamoDB, or Amazon EMR/EC2/Unix/Linux hosts
  – Split your data into multiple files
  – Use GZIP or LZOP compression
  – Use a manifest file
• Choose a proper sort key
  – Range or equality on WHERE clause
• Choose a proper distribution key
  – Join column, foreign key or largest dimension, group-by column
  – Avoid distribution key for denormalized data
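A sketch of such a load driven from Python with psycopg2; the table definition, S3 paths, and IAM role ARN are assumptions:

```python
import psycopg2

# Placeholder Redshift connection details.
conn = psycopg2.connect(
    host="dw-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="sales", user="admin", password="REPLACE_ME",
)

with conn, conn.cursor() as cur:
    # Sort key matches the WHERE-clause range; distribution key is the
    # join column so co-located rows join without shuffling.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS daily_sales (
            sale_date DATE,
            store_id  INTEGER,
            total     DECIMAL(12,2)
        )
        DISTKEY (store_id)
        SORTKEY (sale_date);
    """)
    # COPY from gzipped, pre-split files listed in a manifest on S3.
    cur.execute("""
        COPY daily_sales
        FROM 's3://my-bucket/loads/daily_sales.manifest'
        CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopy'
        GZIP
        MANIFEST
        DELIMITER '|';
    """)
```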
42. Amazon Elastic MapReduce
When to use
• Batch analytics/processing
  – Answers in minutes or hours
• Structured and unstructured data
• Parallel scans of the entire dataset with uniform query performance
• Supports Hive QL + other languages
• GB, TB, or PB of data
• Replicated data store (HDFS) for ad hoc and real-time queries (HBase)
When not to use
• Real-time analytics (DW)
  – Need answers in seconds
• 1000s of concurrent users
43. Amazon Elastic MapReduce Best Practices
• Choose between transient and persistent clusters for best TCO
• Leverage Amazon S3 integration for highly durable and interim storage
• Right-size cluster instances based on each job – not one size fits all
• Leverage resizing and spot to add and remove capacity cost-effectively
• Tuning cluster instances can be easier than tuning Hadoop code
[Diagram: resizing cuts a job flow’s duration from 14 hours to 7 hours]
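A sketch of a transient cluster with spot task capacity via boto3; the names, instance counts, bid price, and release label are placeholders:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Transient cluster: terminates itself when the steps finish, and uses
# spot instances for the task group to cut cost.
emr.run_job_flow(
    Name="nightly-etl",
    ReleaseLabel="emr-6.15.0",              # placeholder release
    Applications=[{"Name": "Hive"}],
    LogUri="s3://my-bucket/emr-logs/",
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
            {"Name": "tasks", "InstanceRole": "TASK", "Market": "SPOT",
             "BidPrice": "0.10", "InstanceType": "m5.xlarge",
             "InstanceCount": 8},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,   # transient
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```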
44. AWS Data Pipeline
When to use
• Automate movement and transformation of data (ETL in the cloud)
• Dependency management
  – Data
  – Control
• Schedule management
• Transient Amazon EMR clusters
• Regular data move patterns
  – Every hour, day
  – Every 30 minutes
• Amazon DynamoDB backups
  – Cross region
When not to use
• Less than 15-minute scheduling interval
• Execution latency less than a minute
• Event-based scheduling
45. AWS Data Pipeline Best Practices
• Use dependency-based rather than time-based scheduling
• Make your activities idempotent
• Add in your tools using the shell activity
• Use Amazon S3 for staging
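As an illustration of idempotent activities (not Data Pipeline's own API), a shell-activity-style script can key its output on the scheduled time and skip work already done; the bucket and prefix are placeholders:

```python
import sys
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-staging-bucket"                     # placeholder

def transform(scheduled_time: str) -> bytes:
    # Placeholder for the real transformation step.
    return f"rows for {scheduled_time}\n".encode()

def run_activity(scheduled_time: str):
    """Idempotent activity: output is keyed on the scheduled time, so a
    retried run detects completed work instead of duplicating it."""
    key = f"etl/output/{scheduled_time}/result.csv"
    try:
        s3.head_object(Bucket=BUCKET, Key=key)
        return                                   # already done, safe on retry
    except ClientError:
        pass                                     # not found: do the work
    s3.put_object(Bucket=BUCKET, Key=key, Body=transform(scheduled_time))

if __name__ == "__main__":
    run_activity(sys.argv[1])                    # e.g. 2013-11-14T02:00:00
```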
46. Amazon S3
When to use
• Store large objects
• Key-value store – Get/Put/List
• Unlimited storage
• Versioning
• Very high durability
  – 99.999999999%
• Very high throughput (via parallel clients)
• Use for storing persistent data
  – Backups
  – Source/target for EMR
  – Blob store with metadata in SQL or NoSQL
When not to use
• Complex queries
• Very low latency (ms)
• Search
• Read-after-write consistency for overwrites
• Need transactions
47. Amazon S3 Best Practices
• Use a random hash prefix for keys
• Ensure a random access pattern
• Use Amazon CloudFront for high-throughput GETs and PUTs
• Leverage the high durability, high throughput design of Amazon S3 for backup and as a common storage sink
  – Durable sink between data services
  – Supports decoupling and asynchronous delivery
• Consider RRS for lower-cost, lower-durability storage of derivatives or copies
• Consider parallel threads and multipart upload for faster writes
• Consider parallel threads and range GET for faster reads
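A sketch of the parallel-transfer advice using boto3's managed transfers; the bucket, key, and thresholds are placeholders:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Managed transfer: files above the threshold are split into parts and
# uploaded by a pool of threads (multipart upload under the hood).
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart at 64 MB
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=10,                     # parallel threads
)
s3.upload_file("backup.tar.gz", "my-bucket", "backups/backup.tar.gz",
               Config=config)

# Parallel-friendly reads: fetch a byte range instead of the whole object.
part = s3.get_object(Bucket="my-bucket", Key="backups/backup.tar.gz",
                     Range="bytes=0-16777215")
first_16mb = part["Body"].read()
```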
48. Amazon Glacier
When to use
• Infrequently accessed data sets
• Very low cost storage
• Data retrieval times of several hours are acceptable
• Encryption at rest
• Very high durability
  – 99.999999999%
• Unlimited amount of storage
When not to use
• Frequent access
• Low-latency access
49. Amazon Glacier Best Practices
• Reduce request and storage costs with aggregation
  – Aggregate your files into bigger files before sending them to Amazon Glacier
  – Store checksums along with your files
  – Use a format that allows you to access files within your aggregate archive
• Improve speed and reliability with multipart upload
• Reduce costs with ranged retrievals
• Maintain your own index in a highly durable store
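A sketch of the aggregation idea: bundle small files into one tar archive with per-file checksums, upload it as a single Glacier archive, and keep the index in a durable store; the vault name, bucket, and paths are placeholders:

```python
import hashlib
import json
import tarfile

import boto3

glacier = boto3.client("glacier", region_name="us-east-1")
files = ["logs/a.log", "logs/b.log", "logs/c.log"]   # placeholder paths

# Aggregate: one tar archive (a seekable format) instead of many tiny uploads.
index = {}
with tarfile.open("bundle.tar", "w") as tar:
    for path in files:
        with open(path, "rb") as f:
            index[path] = hashlib.sha256(f.read()).hexdigest()
        tar.add(path)

with open("bundle.tar", "rb") as body:
    resp = glacier.upload_archive(vaultName="log-archive", body=body)

# Glacier offers no cheap listing, so persist the index (archive id ->
# member files + checksums) in a durable store such as S3 or DynamoDB.
index_record = {"archiveId": resp["archiveId"], "members": index}
boto3.client("s3").put_object(
    Bucket="my-index-bucket", Key="glacier/bundle.tar.json",
    Body=json.dumps(index_record).encode(),
)
```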
50. Amazon EC2 + Amazon EBS/Instance Storage
When to use
• Alternate data store technologies
• Hand-tuned performance needs
• Direct/admin access required
When not to use
• When a managed service will do the job
• When operational experience is low
51. Amazon EBS Best Practices
• Pick the right EC2 instance type
  – Higher “network performance” instances for driving more Amazon EBS IOPS
  – EBS-optimized EC2 instances for dedicated throughput between EC2 & Amazon EBS
• Use provisioned IOPS volumes for database workloads requiring consistent IOPS
• Use standard volumes for workloads requiring low to moderate IOPS and occasional bursts
• Stripe multiple Amazon EBS volumes for higher IOPS or storage
  – RAID0 for higher I/O
  – RAID10 for highest local durability
• Amazon EBS snapshots
  – Quiesce the file system and take a snapshot
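A sketch of provisioning such a volume with boto3; the availability zone, size, IOPS figure, and instance ID are placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Provisioned IOPS volume for a database needing consistent I/O.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,                 # GB
    VolumeType="io1",
    Iops=4000,                # matches the deck's per-volume ceiling
)

# Attach to the (placeholder) database instance; stripe several such
# volumes with RAID0 when one volume's IOPS ceiling is not enough.
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0",
                  Device="/dev/sdf")
```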
52. Amazon EC2 Best Practices
[Diagram: instance family price/performance sweet spots – HI: best IOPS/$; HS: best GB/$; best vCPU/$; best memory (GiB)/$]
55. AWS Data Tier Architecture - Use the right tool for the job!
[Diagram repeated: Data Tier comprising Amazon ElastiCache, Amazon CloudSearch, Amazon Elastic MapReduce, Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, and AWS Data Pipeline]