by Darin Briskman, Technical Evangelist, AWS
DynamoDB queries enable consistent low latency at any scale, using the partition key, sort key, local secondary indexes, and global secondary indexes. Amazon Elasticsearch Service enables flexible search, including ranking and aggregation. Adding Elasticsearch to DynamoDB opens new capabilities that combine the power of query and search. Learn how Amazon.com uses this combination and how you can use it, too. Level: 200
by Kwesi Edwards, Business Development Manager, AWS
Database migration doesn’t need to be difficult or time-consuming. Learn how AWS Database Migration Service provides an easy, secure migration from on-premises and Amazon EC2 environments to Amazon RDS, Amazon Redshift, Amazon DynamoDB, and EC2 databases, with minimal downtime. We’ll also see how the AWS Schema Conversion Tool automatically converts your schema and a majority of the custom code, so you can get up and running in the cloud quickly and inexpensively. We’ll discuss alternative data migration strategies for special use cases. Level 200
AWS Learning Webinar: Spot Instances Benefits & Best Practices Explained
Deep Dive on Amazon EC2 Spot Instances
Amazon EC2 Spot instances allow you to bid on spare Amazon EC2 computing capacity, saving up to 90% over On-Demand EC2 pricing. In this session, we’ll dive deep into how Amazon EC2 Spot instances allow you to run and scale a variety of workloads, including containerized environments and applications such as stateless web services, image rendering, big data analytics, CI/CD pipelines, and massively parallel computations, for a fraction of the cost. This webinar will help architects, engineers, and developers from organizations of all sizes understand when and how to run your environment on EC2 Spot Instances.
Speaker: Chad Schmutzer, Solution Architect - Spot, Amazon Web Services
Serverless Applications at Global Scale with Multi-Regional Deployments - AWS...
Learning Objectives:
- Input and decision points when architecting a serverless multi-regional application
- Active-active Multi-Regional API with API Gateway and Lambda
- Replication with DynamoDB
DAT317: Migrating Databases and Data Warehouses to the Cloud
In this introductory session, we look at how to convert and migrate your commercial databases and data warehouses to the cloud and gain your database freedom. AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT) have been used to migrate tens of thousands of databases. These include Oracle and SQL Server to Amazon Aurora, Teradata and Netezza to Amazon Redshift, MongoDB to Amazon DynamoDB, and many other data source and target combinations. Learn how to easily and securely migrate your data and procedural code, enjoy flexibility and cost savings, and gain new opportunities.
Cloud storage provides education with a reliable, scalable, and secure alternative to on-premises storage systems. AWS offers eight different object, file, and block storage options supporting applications, archiving, and compliance needs. This webinar will provide an overview of the services and key education use cases ranging from data center replacement to video storage and file sharing. Services covered include Amazon S3, Amazon EFS, Amazon EBS, Amazon Glacier, Amazon Storage Gateway, and the Snowball family.
by Darin Briskman, Technical Evangelist, AWS
SQL is a powerful tool to query data, but it doesn't cover everything you might need. Sometimes, the precision of SQL is a limitation that can be overcome by using the flexibility and inherent ranking of search. Learn how to use AWS services to create fully managed solutions using Amazon Aurora and Amazon Elasticsearch Service to combine the power of query and search. Level: 200
MCL314: Unlocking Media Workflows Using Amazon Rekognition
Companies can have large amounts of image and video content in storage with little or no insight about what they have—effectively sitting on an untapped licensing and advertising goldmine. Learn how media companies are using Amazon Rekognition APIs for object or scene detection, facial analysis, facial recognition, or celebrity recognition to automatically generate metadata for images to provide new licensing and advertising revenue opportunities. Understand how to use Amazon Rekognition APIs to index faces into a collection at high scale, filter frames from a video source for processing, perform face matches that populate a person index in Elasticsearch, and use the Amazon Rekognition celebrity match feature to optimize the process for faster time to market and more accurate results.
Best Practices for Migrating Oracle Databases to the Cloud - AWS Online Tech ...
Learning Objectives:
- Learn how to migrate Oracle databases to the cloud
- Learn how to run additional components of the Oracle stack on AWS
- Get acquainted with other database options on AWS
by Joyjeet Banerjee, Enterprise Solutions Architect, AWS
Amazon Aurora is a MySQL- and PostgreSQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep dive session, we’ll discuss best practices and explore new features in areas like high availability, security, performance management and database cloning. Level 300
Use AWS DMS to Securely Migrate Your Oracle Database to Amazon Aurora with Mi...
Changing database engines is often daunting to customers. However, the value of a highly scalable, cost-effective, and fully managed service, such as Amazon Aurora, can make the challenge worth it. In this hands-on lab, we demonstrate how to take advantage of the AWS Schema Conversion Tool (SCT) and AWS Database Migration Service (DMS) to facilitate and simplify migrating an Oracle database to the Amazon Aurora PostgreSQL-compatible Edition. We connect to an Oracle (source) and a PostgreSQL (target) instance, and convert the Oracle database schema and code objects to PostgreSQL using AWS SCT. Then, we migrate and replicate the data using AWS DMS. AWS credits are provided. Bring your laptop, and have an active AWS account.
We have recently seen some convergence of different database technologies. Many customers are evaluating heterogeneous migrations as their database needs have evolved or changed. Evaluating the best database to use for a job isn’t as clear as it was ten years ago. In this session, we discuss the ideal use cases for relational and nonrelational data services, including Amazon ElastiCache for Redis, Amazon DynamoDB, Amazon Aurora, and Amazon Redshift. This session digs into how to evaluate a new workload for the best managed database option.
ABD201: Big Data Architectural Patterns and Best Practices on AWS
In this session, we simplify big data processing as a data bus comprising various stages: collect, store, process, analyze, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Run Your CI/CD Pipeline at Scale for a Fraction of the Cost - AWS Online Tech...
Learning Objectives:
- Learn how Amazon EC2 Spot Instances can help run and scale your CI/CD pipeline for a fraction of the cost
- Learn how to deploy and configure the EC2 Spot Fleet plugin for Jenkins
- Leverage the full scale of the AWS cloud for faster results
Increasingly, valuable customer data sources are dispersed among on-premises data centers, SaaS providers, partners, third-party data providers, and public datasets. Building a data lake on AWS offers a foundation for storing on-premises, third-party, and public datasets cost effectively with high performance. This workshop introduces AWS tools and technologies you can use to analyze and extract value from petabyte-scale datasets, including Amazon Athena and Amazon Redshift Spectrum.
This presentation compares three modern architecture patterns that startups are building their businesses around. It includes a realistic analysis of cost, team management, and security implications of each approach. It covers AWS Elastic Beanstalk, Amazon ECS, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, and Amazon CloudFront. Attendees will also hear from venture capital investor Third Rock Ventures (TRV), which has launched 40+ biotech startups over the last 10 years. TRV will outline how it launches cloud native startups that turn bleeding edge science into new treatments across the spectrum of disease, with highlights drawn from Relay Therapeutics and Tango Therapeutics.
Design Patterns and Best Practices for Data Analytics with Amazon EMR (ABD305)
Amazon EMR is one of the largest Hadoop operators in the world, enabling customers to run ETL, machine learning, real-time processing, data science, and low-latency SQL at petabyte scale. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about lowering cost with Auto Scaling and Spot Instances, and security best practices for encryption and fine-grained access control. Finally, we dive into some of our recent launches to keep you current on our latest features.
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
In this session, we discuss the evolution of database and analytics services in AWS, the new database and analytics services and features we launched this year, and our vision for continued innovation in this space. We are witnessing an unprecedented growth in the amount of data collected, in many different forms. Storage, management, and analysis of this data require database services that scale and perform in ways not possible before. AWS offers a collection of database and other data services—including Amazon Aurora, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon ElastiCache, Amazon Kinesis, and Amazon EMR—to process, store, manage, and analyze data. In this session, we provide an overview of AWS database and analytics services and discuss how customers are using these services today.
STG309: Deep Dive Using Hybrid Storage with AWS Storage Gateway to Solve On-Pr...
Enterprises of all sizes face continuing data growth and persistent requirements to back up and recover application data. The pains of recurring storage hardware purchasing, management, and failures are still acute for many IT organizations. Some also need to integrate on-premises datasets with in-cloud workloads, such as big data processing and analytics. Learn how to use AWS Storage Gateway to connect on-premises applications to AWS storage services using standard storage protocols, such as NFS, iSCSI, and VTL. Storage Gateway enables hybrid cloud storage solutions for backup and disaster recovery, file sharing, in-cloud processing, or bulk ingest for migration. We discuss use cases with real-life customer stories, and offer best practices.
Moving your File Data to Amazon EFS - AWS Online Tech Talks
Learning Objectives:
- Recognize why and when to use Amazon EFS and its economic benefits versus other solutions
- Understand key elements to optimize the movement of your data to Amazon EFS
- See Amazon EFS in action with a live demo
Migrating Massive Databases and Data Warehouses to the Cloud - ENT327 - re:In...
Databases continue to grow to be multiple terabytes in size, but migrating to the cloud doesn't have to take days or create disruption for your business. To perform data migration at petabyte scale with minimal impact to your business, you can now use the new combination of AWS Database Migration Service replication agents and AWS Snowball. In this session, we discuss how to extract large-scale data from an on-premises Oracle database and migrate it to Amazon Aurora. We then outline a step-by-step process for converting your Oracle schema to a PostgreSQL-based schema.
DAT324: Expedia Flies with DynamoDB: Lightning-Fast Stream Processing for Trave...
Building rich, high-performance streaming data systems requires fast, on-demand access to reference data sets to implement complex business logic. In this talk, Expedia will discuss the architectural challenges the company faced, and how DAX + DynamoDB fits into the overall architecture and met their design requirements. Additionally, you will hear how DAX enabled Expedia to add caching to their existing applications in hours, where it previously took much longer. Session attendees will walk away with three key outputs: 1) Expedia’s overall architectural patterns for streaming data; 2) how they uniquely leverage DynamoDB, DAX, Apache Spark, and Apache Kafka to solve these problems; 3) the value that DAX provides and how it enabled them to improve their performance and throughput and reduce costs, all without having to write any new code.
Containers have revolutionized the way we build, package, deploy, and run applications. While containers initially only supported code and tooling for Linux applications, Docker now offers API and toolchain support for running Windows Servers in containers.
This webinar was held in March 2018 to an Australian and New Zealander audience.
ABD304-R: Best Practices for Data Warehousing with Amazon Redshift & Spectrum
Most companies are over-run with data, yet they lack critical insights to make timely and accurate business decisions. They are missing the opportunity to combine large amounts of new, unstructured big data that resides outside their data warehouse with trusted, structured data inside their data warehouse. In this session, we take an in-depth look at how modern data warehousing blends and analyzes all your data, inside and outside your data warehouse without moving the data, to give you deeper insights to run your business. We will cover best practices on how to design optimal schemas, load data efficiently, and optimize your queries to deliver high throughput and performance.
ABD327: Migrating Your Traditional Data Warehouse to a Modern Data Lake
In this session, we discuss the latest features of Amazon Redshift and Redshift Spectrum, and take a deep dive into its architecture and inner workings. We share many of the recent availability, performance, and management enhancements and how they improve your end user experience. You also hear from 21st Century Fox, who presents a case study of their fast migration from an on-premises data warehouse to Amazon Redshift. Learn how they are expanding their data warehouse to a data lake that encompasses multiple data sources and data formats. This architecture helps them tie together siloed business units and get actionable 360-degree insights across their consumer base.
AWS Commercial Management and Cost Optimisation - Dec 2017
Technical levers and strategic mechanisms for AWS Commercial Management and Cost Optimisation. Includes 2017 commercially relevant updates.
Speaker: Peter Shi, Commercial Architect, BD AWS APAC
In this session, you learn how to set up a crawler to automatically discover your data and build your AWS Glue Data Catalog. You then auto-generate an AWS Glue ETL script, download it, and interactively edit it using a Zeppelin notebook, connected to an AWS Glue development endpoint. After that, you upload this script to Amazon S3, reuse it across multiple jobs, and add trigger conditions to run the jobs. The resulting datasets automatically get registered in the AWS Glue Data Catalog and you can then query these new datasets from Amazon EMR and Amazon Athena. Prerequisites: Knowledge of Python and familiarity with big data applications is preferred but not required. Attendees must bring their own laptops.
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Gain in-depth knowledge and best practices for migrating commercial data warehouses to Amazon Redshift using AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT). We use an example based on an Oracle data warehouse, and we discuss approaches to migrate it to Amazon Redshift. We also discuss some of the common challenges, limitations, and workarounds, as well as the option of using AWS Snowball to migrate very large data warehouses to Amazon Redshift.
Organisations involved in Big Data and Analytics spend a lot of time preparing data for analysis, which often involves large-scale movement and transformation. In this session we will explore AWS Glue, a new service designed to assist with the process of cataloguing, transforming, and scheduling your data pipeline.
Speaker: Cassandra Bonner, Solutions Architect, Amazon Web Services
Data Analytics Week at the San Francisco Loft
Using Data Lakes
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Speakers:
John Mallory - Principal Business Development Manager Storage (Object), AWS
Hemant Borole - Sr. Big Data Consultant, AWS
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Learning Objectives:
- Understand how to build a serverless big data solution quickly and easily
- Learn how to discover and prepare all your data for analytics
- Learn how to query and visualize analytics on all your data to create actionable insights
BDA402: Deep Dive: Log Analytics with Amazon Elasticsearch Service
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elasticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Level: Intermediate
Speakers:
Tony Nguyen - Senior Consultant, ProServe, AWS
Hannah Marlowe - Consultant - Federal, AWS
AWS Databases
- Database models (SQL vs. NoSQL)
- Amazon Relational Database Service (RDS) concepts, including database instances, security groups, and parameter and option groups
- Amazon DynamoDB concepts, including data model and supported operations
A quick overview of Redshift and common use cases, followed by tools and links for performance tuning; how Redshift fits in the AWS data services; and a list of key new features since the last meetup in September 2016, including Redshift Spectrum, which allows one to run SQL directly on your data sitting on Amazon S3. It also covers the Redshift ecosystem of data integration, BI, consultancy, and data modelling partners.
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Amazon EMR is a managed Hadoop service that makes it easy for customers to use big data frameworks and applications like Hadoop, Spark, and Presto to analyze data stored in HDFS or on Amazon S3, Amazon’s highly scalable object storage service. In this webinar, we will introduce the latest release of Amazon EMR. With Amazon EMR release 5.0, customers can now launch the latest versions of popular open source frameworks including Apache Spark 2.0, Hive 2.1, Presto 0.151, Tez 0.8.4, and Apache Hadoop 2.7.2. We will walk through a demo to show you how to deploy a Hadoop environment within minutes. We will cover common use cases and best practices to lower costs using Amazon S3 as your data store and Amazon EC2 Spot Instances, which allow you to bid on spare Amazon EC2 computing capacity.
Learning Objectives:
• Describe the new features and updated frameworks in Amazon EMR 5.0
• Learn best practices and real-world applications for Amazon EMR
• Understand how to use EC2 Spot pricing to save costs
• Explain the advantages of decoupling storage and compute with Amazon S3 as storage layer for EMR workloads
AWS re:Invent 2016 was AWS’ largest event yet with over 32,000 attendees, 400 breakout sessions, and two keynotes of new product announcements. In this talk, we’ll explore the core themes of AWS re:Invent 2016 such as serverless and artificial intelligence. We will also drill down into several of the services and features unveiled including AWS Batch, AWS Shield, Aurora for Postgres, X-Ray, Polly, Lex, Rekognition, AWS Step Functions. Light appetizers and refreshments will be provided.
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
In this session, we discuss how Spark and Presto complement the Netflix big data platform stack that started with Hadoop, and the use cases that Spark and Presto address. Also, we discuss how we run Spark and Presto on top of the Amazon EMR infrastructure; specifically, how we use Amazon S3 as our data warehouse and how we leverage Amazon EMR as a generic framework for data-processing cluster management.
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Speakers:
Neel Mitra - Solutions Architect, AWS
Roger Dahlstrom - Solutions Architect, AWS
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the San Francisco Loft
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
John Mallory - Principal Business Development Manager, Storage, AWS
Asim Kumar Sasmal - Big Data Consultant, AWS Professional Services
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premise deployments to AWS in order to save costs, increase availability, and improve performance. AWS offers a broad set of analytics services, including solutions for batch processing, stream processing, machine learning, data workflow orchestration, and data warehousing. This session will focus on identifying the components and workflows in your current environment; and providing the best practices to migrate these workloads to the right AWS data analytics product. We will cover services such as Amazon EMR, Amazon Athena, Amazon Redshift, Amazon Kinesis, and more. We will also feature Vanguard, an American investment management company based in Malvern, Pennsylvania with over $4.4 trillion in assets under management. Ritesh Shah, Sr. Program Manager for Cloud Analytics Program at Vanguard, will describe how they orchestrated their migration to AWS analytics services, including Hadoop and Spark workloads to Amazon EMR. Ritesh will highlight the technical challenges they faced and overcame along the way, as well as share common recommendations and tuning tips to accelerate the time to production.
How to Build Forecasting Services Using ML and Deep Learning Algorithms
Forecasting is an important process for a great many companies and is used in many areas to try to accurately predict the growth and distribution of a product, the resources needed on production lines, financial projections, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that contains a temporal component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: How to Build Big Data Applications in Serverless Mode
The variety and quantity of data created every day is accelerating ever faster and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters appears to be an investment accessible only to established companies. But the elasticity of the Cloud and, in particular, Serverless services allow us to break through these limits.
We will see how to develop Big Data applications rapidly, without worrying about infrastructure, dedicating all our resources to developing the ideas behind innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. In that period we learned how changing our approach to application development allowed us to dramatically increase agility and release velocity and, ultimately, to build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only application architecture, but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to Spend Up to 90% Less with Containers and Spot Instances
The use of containers keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, yielding average savings of 70% compared to On-Demand instances. In this session we will look at the characteristics of Spot Instances and how easily they can be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of different kinds, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. AWS and FinConecta would therefore like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda :
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make Your Startup's Market Offering Unique with Machine Learning Services
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, with the help of a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: Automate the Management and Deployment of Your EC2 Instances
With the traditional approach to IT, implementing DevOps techniques was difficult for many years; until now they have often involved manual activities, occasionally leading to application downtime that interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, guaranteeing greater system reliability and yielding significant improvements in business continuity.
AWS provides AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances using Chef and Puppet.
Learn how to leverage AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to Support Your Windows Workloads
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis powered by artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore what AWS services make possible when applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are holding a free virtual event next Wednesday, October 14th from 12:00 to 13:00 dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, fully exploiting the potential of the AWS cloud while protecting existing VMware investments.
Create Your First Serverless Ledger-Based App with QLDB and NodeJS
Many companies today build applications with ledger-style functionality, for example to verify the history of credits and debits in banking transactions or to track the supply-chain flow of their products.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's capabilities.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for giving end users a great user experience. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dig into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data-update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: Myths to Debunk
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, and performance risks can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and simplify the migration of Oracle workloads while accelerating the transformation to the cloud; they dig into the architecture and show how to fully exploit the potential of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies managing Docker containers through an orchestration layer controlling deployment and lifecycle. In this session we will present the service's main characteristics, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
2. AWS Data Services to Accelerate Your Move to the Cloud
(Diagram of the AWS data services portfolio:)
- Migration for DB Freedom: Database Migration, Schema Conversion
- Databases to Elevate your Apps (Relational, Non-Relational & In-Memory): RDS Open Source, RDS Commercial, Aurora, DynamoDB & DAX, ElastiCache
- Analytics to Engage your Data (Inline, Data Warehousing, Reporting, Data Lake): EMR, Amazon Redshift, Redshift Spectrum, Athena, Elasticsearch Service, QuickSight, Glue
- Amazon AI to Drive the Future: Lex, Polly, Rekognition, Machine Learning, Deep Learning (MXNet)
4. DynamoDB: Non-Relational Managed Database Service
- Schemaless data model
- Consistent low latency performance
- Predictable provisioned throughput
- Seamless scalability with no storage limits
- High durability & availability (replication across 3 facilities)
- Easy administration – we scale for you!
- Low cost
DynamoDB Accelerator (DAX) offers caching without coding for sub-millisecond read latency and up to 10x throughput. (Diagram: App → DAX → DynamoDB.)
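A minimal Python (boto3) sketch of the key-based access pattern described above; the CustomerOrders table and its attributes follow the illustration on the next slide, and standard AWS credentials are assumed:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("CustomerOrders")  # table name from the slide illustration

# Query by partition key; DynamoDB serves this with consistent low latency
# at any scale. To read through DAX instead, you would swap in a DAX client
# (e.g., the amazon-dax-client package) without changing the query itself.
resp = table.query(KeyConditionExpression=Key("CustomerId").eq(1))
for item in resp["Items"]:
    print(item)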
5. Highly available and durable
Data in the CustomerOrdersTable is always replicated to three Availability Zones (3-way replication). (Diagram: partitions A, B, and C of the table spread across hosts 1–9 in Availability Zones A, B, and C; the item OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E] is placed on a partition via Hash(1) = 7B.)
7. Consistently fast at any scale
(Chart: latency in milliseconds stays consistently in the single digits as requests grow into the millions.)
10. Local secondary indexes
- Alternate sort key attribute
- Index is local to a partition key
- 10 GB max per partition key, i.e. LSIs limit the # of sort keys!
(Table: the base partition key A1 paired with alternate sort keys A3, A4, or A5; the remaining attributes A2–A5 are projected alongside.)
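As a rough illustration, the following Python (boto3) sketch defines a table with one local secondary index; all table, attribute, and index names here are hypothetical:

import boto3

client = boto3.client("dynamodb")
client.create_table(
    TableName="CustomerOrders",  # hypothetical
    AttributeDefinitions=[
        {"AttributeName": "CustomerId", "AttributeType": "N"},
        {"AttributeName": "OrderId", "AttributeType": "N"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerId", "KeyType": "HASH"},  # partition key
        {"AttributeName": "OrderId", "KeyType": "RANGE"},    # sort key
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "ByOrderDate",  # hypothetical
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},  # same partition key
            {"AttributeName": "OrderDate", "KeyType": "RANGE"},  # alternate sort key
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)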
11. Global secondary indexes
- Alternate partition (+sort) key
- Sparse
- Can be added or removed anytime
- Reads and writes provisioned separately for GSIs
- Projections: KEYS_ONLY, INCLUDE (e.g. INCLUDE A2), or ALL
(Table: the index partition key A3 is stored with the table key A1; the attributes carried along depend on the projection type.)
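Since GSIs can be added anytime and are provisioned separately, a sketch of adding one to an existing table might look like this in Python (boto3); again, every name is hypothetical:

import boto3

client = boto3.client("dynamodb")
client.update_table(
    TableName="CustomerOrders",
    AttributeDefinitions=[
        {"AttributeName": "ASIN", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "ByASIN",  # hypothetical
            "KeySchema": [{"AttributeName": "ASIN", "KeyType": "HASH"}],  # alternate partition key
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            # GSIs get their own read/write capacity, separate from the table
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)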
12. DynamoDB Streams
- Ordered stream of item changes
- Exactly once, strictly ordered by key
- Highly durable, scalable
- 24 hour retention
- Sub-second latency
- Compatible with Kinesis Client Library
(Diagram: updates to partitions A, B, and C of a DynamoDB table flow into DynamoDB stream shards 1, 2, and 3, which KCL workers in an application consume via GetRecords. Shards have a lineage and automatically close after time or when the associated DynamoDB partition splits.)
13. DynamoDB Streams and Triggers
- Implemented as AWS Lambda functions
- Scale automatically
- C#, Java, Node.js, Python
(Diagram: a trigger Lambda function fans stream changes out to targets such as Amazon SNS, Amazon ES, and Amazon ElastiCache.)
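A minimal sketch of such a trigger in Python, forwarding new and modified items to Amazon SNS; the topic ARN is hypothetical, and the event shape is the standard DynamoDB Streams Lambda event:

import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:item-changes"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is a DynamoDB attribute-value map, e.g. {"OrderId": {"N": "1"}};
            # it is present only if the stream is configured with NEW_IMAGE
            new_image = record["dynamodb"]["NewImage"]
            sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(new_image))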
16. Integration with Amazon EMR
The Elasticsearch-Hadoop (ES-Hadoop) connector enables several Hadoop stack applications running on EMR or EC2 to power real-time search and analytics with Amazon Elasticsearch Service, as well as beautiful visualizations with Kibana. It seamlessly moves data between Hadoop and Elasticsearch, letting you index Hadoop data (HDFS/EMRFS) into, and query it from, Amazon Elasticsearch Service.
(Diagram: Amazon EMR ↔ ES-Hadoop ↔ Amazon ES.)
17. ES-Hadoop Connector – for Spark & Friends
(Diagram: Hadoop applications on EMR/EC2 – Spark, Storm, Hive, and friends – use the ES-Hadoop connector to index data to and query* data from an Amazon Elasticsearch cluster, which powers analyze, search, visualize, and discover use cases.)
* With Spark SQL, at runtime, Spark SQL translates to Query DSL. Data is filtered at source.
18. ES-Hadoop Connector – considerations
- Performance: Since Amazon Elasticsearch cluster nodes are not collocated on EMR cluster nodes, local discovery should be disabled so the ES-Hadoop Connector only connects through the declared es.nodes during all operations, including reads and writes: es.nodes.wan.only should be set to true. Since partition-to-partition parallelism cannot be achieved, performance may be impacted at scale, and ES-Hadoop connector tasks should be tested for bottlenecks.
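A sketch of what this looks like from PySpark, under assumed endpoint and index names (the elasticsearch-hadoop jar must already be on the cluster classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-hadoop-demo").getOrCreate()
df = spark.createDataFrame([(1, "guitar"), (2, "sports")], ["id", "interest"])

(df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "search-mydomain.us-east-1.es.amazonaws.com")  # hypothetical endpoint
    .option("es.port", "443")
    .option("es.net.ssl", "true")
    .option("es.nodes.wan.only", "true")     # connect only via the declared es.nodes
    .option("es.resource", "interests/doc")  # hypothetical index/type
    .save())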
19. ES-Hadoop Connector – considerations (contd.)
- Security:
  - For an EMR cluster in a public subnet, use an IP-based access policy with Amazon Elasticsearch to whitelist the EMR IPs.
  - For an EMR cluster in a private subnet, use an identity-based access policy with Amazon Elasticsearch and install the AWS ES/Kibana proxy on EMR nodes via a bootstrap action.
20. Kinesis Firehose delivery architecture with transformations
(Diagram: a data source sends source records into a Firehose delivery stream; a data transformation function produces transformed records that are delivered to Amazon Elasticsearch Service; transformation failures and delivery failures are written to an S3 bucket.)
22. Elasticsearch works with structured JSON
{
  "name": {
    "first": "Jon",
    "last": "Smith"
  },
  "age": 26,
  "city": "palo alto",
  "years_employed": 4,
  "interests": [
    "guitar",
    "sports"
  ]
}
- Documents contain fields – name/value pairs
- Fields can nest
- Value types include text, numerics, dates, and geo objects
- Field values can be single or array
- When you send documents to Elasticsearch they should arrive as JSON*
*ES 5 can work with unstructured documents
23. If your data is not already in structured JSON, you must transform it, creating structured JSON that Elasticsearch "understands".
24. The most basic way to transform data
- Run a script in Amazon EC2, Lambda, etc. that reads data from your data source, creates JSON documents, and ships them to Amazon Elasticsearch Service directly
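For instance, a rough Python sketch of such a script, parsing Apache-style log lines (like the one on slide 36) into JSON documents and shipping them to the Amazon ES _bulk API; the endpoint and index are hypothetical, and a production version would sign its requests:

import json
import re
import urllib.request

ES_ENDPOINT = "https://search-mydomain.us-east-1.es.amazonaws.com"  # hypothetical
LINE = re.compile(r'(?P<host>\S+) (?P<ident>\S+) (?P<auth>\S+) '
                  r'\[(?P<timestamp>[^\]]+)\] '
                  r'"(?P<verb>\S+) (?P<request>\S+)[^"]*" '
                  r'(?P<status>\d+) (?P<size>\d+)')

# Build a newline-delimited _bulk body: one action line per document line
bulk_body = []
with open("access.log") as f:
    for line in f:
        m = LINE.match(line)
        if not m:
            continue
        bulk_body.append(json.dumps({"index": {"_index": "logs", "_type": "doc"}}))
        bulk_body.append(json.dumps(m.groupdict()))

req = urllib.request.Request(
    ES_ENDPOINT + "/_bulk",
    data=("\n".join(bulk_body) + "\n").encode("utf-8"),
    headers={"Content-Type": "application/x-ndjson"},
)
print(urllib.request.urlopen(req).read().decode())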
25. Logstash simplifies transformation
- Logstash is open-source ETL over streams. Run it colocated with your application, or read from your source
- Many input plugins and output plugins make it easy to connect to Logstash
- Grok pattern matching to pull out values and re-write
(Diagram: Logstash running alongside the application instance.)
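A minimal Logstash pipeline along these lines might look like the following sketch; the log path and endpoint are assumptions, and %{COMMONAPACHELOG} is Logstash's stock grok pattern for Apache access logs:

input  { file { path => "/var/log/httpd/access_log" } }  # assumed log location
filter { grok { match => { "message" => "%{COMMONAPACHELOG}" } } }
output {
  elasticsearch {
    hosts => ["https://search-mydomain.us-east-1.es.amazonaws.com:443"]  # hypothetical
    index => "logs"
  }
}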
26. Elasticsearch 5 ingest processors
When you index documents, you can specify a pipeline. The pipeline can have a series of processors that pre-process the data before indexing. Twenty processors are available; some are simple:
{ "append": { "field": "field1", "value": ["item2", "item3", "item4"] } }
Others are more complex, like the Grok processor for regex with aliased expressions.
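As a sketch of how such a pipeline is wired up (Python; the endpoint and names are hypothetical), you register it under _ingest/pipeline and then index with ?pipeline=:

import json
import urllib.request

ES = "https://search-mydomain.us-east-1.es.amazonaws.com"  # hypothetical endpoint

def put(path, body):
    # Small helper: HTTP PUT a JSON body to the cluster
    req = urllib.request.Request(ES + path, data=json.dumps(body).encode("utf-8"),
                                 method="PUT",
                                 headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req).read()

# Register a pipeline containing the append processor from the slide
put("/_ingest/pipeline/demo", {
    "description": "append demo",
    "processors": [
        {"append": {"field": "field1", "value": ["item2", "item3", "item4"]}}
    ],
})

# Index a document through the pipeline
put("/logs/doc/1?pipeline=demo", {"field1": ["item1"]})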
27. Firehose transformations add robust delivery
- Inline calls to Lambda for free-form changes to the underlying data
- Failed transforms tracked and delivered to S3
(Diagram: as on slide 20 – source records flow from the data source through the Firehose delivery stream and the data transformation function into Amazon Elasticsearch Service, with transformation and delivery failures landing in an S3 bucket.)
28. Firehose transformations add robust delivery
- Inline calls to Lambda for free-form changes to the underlying data
- Failed transforms tracked and delivered to S3
(Diagram: the same flow, with transformed records additionally staged in an intermediate Amazon S3 bucket and copied to a backup S3 bucket alongside delivery to Amazon Elasticsearch Service.)
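The transformation function itself follows Firehose's standard record contract (recordId / result / base64-encoded data); a minimal Python sketch, with the uppercase transform standing in for real logic:

import base64

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # placeholder for a real transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    # Records marked "ProcessingFailed" are what Firehose tracks and delivers to S3
    return {"records": output}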
30. Cluster is a collection of nodes
(Diagram: an Amazon ES cluster with dedicated master nodes plus data nodes Instance 1, Instance 2, and Instance 3; copies of shards 1, 2, and 3 are spread across the data nodes, which serve queries and updates.)
31. Data pattern
(Diagram: one index per day – logs_01.21.2017 through logs_01.27.2017. Each index has multiple shards (Shard 1, Shard 2, Shard 3); each shard contains a set of documents; each document contains a set of fields and values – host, ident, auth, timestamp, etc.)
32. Indices and Mappings
Index: product
- Type: cellphone – document addressed as http://hostname/product/cellphone/1; fields: make (keyword), inventory (int), location (geo point)
- Type: reviews – document addressed as http://hostname/product/reviews/1; fields: make (keyword), review (text), rating (float), date (date)
34. Shards
- Indexes are split into multiple shards
- Primary shards are defined at index creation
- Defaults to 5 primary and 1 replica shard
- Shards allow:
  - Horizontal scale
  - Distributing and parallelizing operations to increase throughput
  - Creating replicas to provide high availability in case of failures
35. Shards … contd
- A shard is a Lucene index
- The number of replica shards can be changed on the fly, but not the number of primary shards
- To change the number of primary shards, the index needs to be re-created
- Shards are automatically balanced when the cluster is re-sized
36. Field indexes
Example document: 199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
Its fields: host, ident, auth, timestamp, verb, request, status, size.
Elasticsearch creates an index for each field, containing the decomposed values of those fields. The host field index, for example, holds terms such as 199.72.81.55, unicomp6.unicomp.net, 199.120.110.21, burger.letters.com, 205.212.115.106, and d104.aa.net, each pointing to its postings – the list of matching document IDs (1, 4, 8, 12, 30, 42, 58, 100, ...).
37. host:199.72.81.55 AND verb:GET
- Look up each term's postings: host:199.72.81.55 → 1, 4, 8, 12, 30, 42, 58, 100, ...; verb:GET → 1, 4, 9, 50, 58, 75, 90, 103, ...
- AND-merge the two lists → 1, 4, 58
- Score the survivors → 1.2, 3.7, 0.4
- Sort by score → 4, 1, 58
The index data structures support fast retrieval and merging. Scoring and sorting support best-match retrieval.
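A toy Python sketch of that merge step – intersecting two sorted postings lists, then ordering the survivors by score (the scores are the made-up values from the slide, not a real relevance function):

def intersect(a, b):
    # Walk both sorted postings lists in step, keeping shared doc IDs
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

host = [1, 4, 8, 12, 30, 42, 58, 100]
verb = [1, 4, 9, 50, 58, 75, 90, 103]
hits = intersect(host, verb)        # [1, 4, 58]
scores = {1: 1.2, 4: 3.7, 58: 0.4}  # stand-in scores from the slide
print(sorted(hits, key=lambda d: scores[d], reverse=True))  # [4, 1, 58]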
38. Index and Document Command Examples
Create an index called product:
$ curl -XPUT 'http://hostname/product/'
Get the list of indices:
$ curl 'http://hostname/_cat/indices'
health status index   uuid    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   product 95SQ4TS 5   1   0          0            260b       260b
40. What happens at Index Operation
HTTP PUT http://hostname/product/cellphone/1
1. The indexing operation arrives at a node in the Elasticsearch cluster
2. The target shard is determined by hashing the document ID
3. The current node forwards the document to the node holding the primary shard
4. The primary shard ensures all replica shards replay the same indexing operation
(Diagram: three instances; the request lands on one node, is routed to the primary shard, then fanned out to the replicas.)
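For step 2, Elasticsearch's routing rule boils down to the following (routing defaults to the document _id, and hash is Elasticsearch's internal Murmur3-based hash):

# Shard selection for an index operation
shard_num = hash(routing) % number_of_primary_shards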
41. Mappings
1. Mappings are used to define types of documents
2. Mappings define the various fields in a document
3. Mapping types:
   1. Core: text or keyword, numeric, date, boolean
   2. Arrays and multi-fields:
      - Arrays – "tags": ["blue", "red"]
      - Multi-fields – index the same data with different settings
   3. Pre-defined fields: _ttl, _size; _uid, _id, _type, _index; _all, _source
42. Mapping command examples
Create an index called product with mapping cellphone and field make as type text:
curl -XPUT 'http://hostname/product' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "cellphone": {
      "properties": {
        "make": {
          "type": "text"
        }
      }
    }
  }
}'
43. Mapping command examples
Add a new mapping, reviews, with field review as text and field rating as integer, to the existing index product:
curl -XPUT 'http://hostname/product/_mapping/reviews' -H 'Content-Type: application/json' -d'
{
  "properties": {
    "review": {
      "type": "text"
    },
    "rating": {
      "type": "integer"
    }
  }
}'
44. Mapping command examples
Add a new field, inventory, as integer to the existing mapping cellphone in index product:
curl -XPUT 'http://hostname/product/_mapping/cellphone' -H 'Content-Type: application/json' -d'
{
  "properties": {
    "inventory": {
      "type": "integer"
    }
  }
}'