Slide-deck used in Bend Web Design and Development Meetup (http://web.archive.org/web/20150728021205/http://www.meetup.com/Bend-Web-Design-and-Development/events/222592014/)
(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r... | Amazon Web Services
Enterprises are starting to deploy large-scale Hadoop clusters to extract value from the data they generate. These clusters often span hundreds of nodes. To speed up time to value, many of the newer deployments are happening on AWS rather than in the traditional on-premises, bare-metal world, and Cloudera supports just such deployments. In this session, Cloudera shares lessons learned and best practices for deploying multi-tenant Hadoop clusters on AWS. They cover what reference deployments look like, which AWS services are relevant for Hadoop deployments, network configurations, instance types, backup and disaster recovery considerations, and security considerations. They also discuss what works well, what doesn't, and what remains to be done to improve the operability of Hadoop on AWS.
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307) | Amazon Web Services
Elasticsearch has quickly become the leading open source technology for scaling search and building document services. Many software providers rely on it to serve the needs of high-performance production applications.
In this talk, we go deep on lessons learned from three years in production, scaling from a few shards to more than 100 spread across hundreds of nodes on AWS to serve real-time queries against hundreds of millions of documents.
Attendees will learn:
* How to capacity plan for ES on AWS
* How to scale and reshard on AWS with zero downtime
* What AWS and ES metrics to collect and alert on
* Tips on day to day ES operations
Session sponsored by SignalFx.
Amazon EMR provides a managed framework that makes it easy, cost-effective, and secure to run data processing frameworks such as Apache Hadoop, Apache Spark, and Presto on AWS. In this session, you learn the key design principles behind running these frameworks in the cloud and the feature set that Amazon EMR offers. We discuss the benefits of decoupling compute and storage, allowing each to scale independently, and strategies to take advantage of the scale and parallelism the cloud offers while lowering costs. You also learn how to run Hadoop, Spark, Presto, and other supported applications on Amazon EMR; how to use Amazon S3 as a persistent data store and process data directly from Amazon S3; deployment strategies and how to avoid common mistakes when deploying at scale; and how to use Spot Instances to scale your transient infrastructure effectively.
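As a concrete sketch of the pattern this abstract describes (persistent data in S3, a transient cluster, Spot capacity for stateless task nodes), the shape of a cluster request can be written out as a plain dict, the same structure boto3's `emr.run_job_flow` accepts. Bucket names, instance types, and counts below are illustrative assumptions, not recommendations.

```python
# Sketch of a transient EMR cluster request that keeps persistent data in S3
# and uses Spot capacity for task nodes. All names and sizes are made up.

def transient_cluster_config(log_bucket):
    """Build a request in the shape expected by boto3's emr.run_job_flow."""
    return {
        "Name": "transient-analytics",
        "LogUri": f"s3://{log_bucket}/emr-logs/",  # logs outlive the cluster
        "Applications": [{"Name": "Spark"}, {"Name": "Presto"}],
        "Instances": {
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate when steps finish
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Spot is safe for task nodes: losing one loses no data,
                # because the data of record lives in S3, not HDFS.
                {"Name": "task-spot", "InstanceRole": "TASK",
                 "Market": "SPOT", "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
        },
    }

config = transient_cluster_config("my-example-bucket")
```

In a real deployment this dict would be passed to `boto3.client("emr").run_job_flow(**config)` along with a release label and IAM roles.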
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS | Amazon Web Services
AWS is a great fit for both steady-state and episodic computational workloads. Here we present some common architecture patterns for analyzing genomic and other biomedical data on scalable, high-throughput computational clusters on AWS. This talk covers bootstrapping a traditional Beowulf compute cluster on Amazon EC2, plus data transfer and storage strategies for Amazon S3.
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service | Amazon Web Services
Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution.
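The first step in a log-analytics pipeline like the one this session describes is turning raw Apache access-log lines into structured documents that can be indexed. A minimal sketch of that parsing step, with the actual Elasticsearch ingestion left out of scope:

```python
import re

# Minimal parser for Apache common-log-format lines, the example data set the
# session uses. Each parsed dict is the kind of JSON document you would ship
# to an Elasticsearch index; the ingestion client itself is not shown.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def parse_log_line(line):
    """Return a dict suitable for indexing, or None if the line is malformed."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    doc = m.groupdict()
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc

doc = parse_log_line(
    '203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326'
)
```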
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks | Amazon Web Services
Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. Apache MXNet is a fully featured, flexibly programmable, and ultra-scalable deep learning framework supporting innovative deep models including convolutional neural networks (CNNs) and long short-term memory networks (LSTMs). This Tech Talk will show you how to launch the deep learning AWS CloudFormation template and deploy the Deep Learning AMI to train your own deep neural network on MNIST to recognize handwritten digits and test it for accuracy.
Learning Objectives:
- Learn about the features and benefits of Apache MXNet
- Learn about the deep learning AMIs with the tools you need for DL
- Learn how to train a neural network using MXNet
AWS provides a wide set of services to manage your data, allowing customers to choose the right tool for the right workload. Learn how to make your databases up to 10x faster and less expensive with Amazon ElastiCache for Redis, and how to use DynamoDB Accelerator (DAX) to access your DynamoDB data faster with no additional development effort. If you need fast access to your data, these services might be the right fit for your workload.
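Both ElastiCache for Redis (used manually) and DAX (transparently) accelerate the same read path: check the cache first, fall back to the database on a miss, and populate the cache for next time. A toy sketch of that cache-aside pattern, with a plain dict standing in for the cache and a function standing in for the database:

```python
import time

# Cache-aside ("lazy loading") sketch: the read path that ElastiCache for
# Redis and DAX both accelerate. A dict stands in for the cache and a callable
# for the database; in production the get/set would go to Redis or DAX.
class CacheAside:
    def __init__(self, load_from_db, ttl_seconds=60):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._cache = {}          # key -> (value, expiry timestamp)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[1] > time.time():
            self.hits += 1        # fast path: served from cache
            return entry[0]
        self.misses += 1          # slow path: fall through to the database
        value = self._load(key)
        self._cache[key] = (value, time.time() + self._ttl)
        return value

db_calls = []
store = CacheAside(lambda k: db_calls.append(k) or f"row:{k}")
first = store.get("user#1")   # miss: hits the "database"
second = store.get("user#1")  # hit: served from cache, no second DB call
```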
Data Science & Best Practices for Apache Spark on Amazon EMR | Amazon Web Services
Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing data volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault-tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
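The DAG model the abstract mentions means Spark transformations are lazy: they only record lineage, and nothing executes until an action runs. A toy illustration in plain Python (not PySpark; the class name and API are invented for illustration):

```python
# Toy illustration (plain Python, not PySpark) of Spark's lazy DAG model:
# transformations only record work; an action such as collect() triggers one
# fused pass over the data.
class ToyRDD:
    def __init__(self, data, ops=()):
        self._data = data
        self._ops = list(ops)     # recorded lineage of transformations

    def map(self, fn):            # transformation: nothing runs yet
        return ToyRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):       # transformation: nothing runs yet
        return ToyRDD(self._data, self._ops + [("filter", pred)])

    def collect(self):            # action: execute the whole lineage now
        out = []
        for item in self._data:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif kind == "filter" and not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

squares = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = squares.collect()  # [0, 4, 16, 36, 64]
```

Real Spark goes much further (partitioning, shuffles, fault tolerance via lineage), but the record-then-execute shape is the same.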
Organizations often need to quickly analyze large amounts of data, such as logs generated from a wide variety of sources and formats. However, traditional approaches require a lot of time and effort designing complex data transformation and loading processes and configuring data warehouses. Using AWS, you can start querying your datasets within minutes. In this session, you will learn how to deploy a managed Presto environment in minutes to interactively query log data using standard ANSI SQL. Presto is a popular open source SQL engine for running interactive analytic queries against data sources of all sizes. We will talk about common use cases and best practices for running Presto on Amazon EMR.
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven... | Amazon Web Services
Amazon Elastic MapReduce is one of the largest Hadoop operators in the world. Since its launch four years ago, our customers have launched more than 5.5 million Hadoop clusters. In this talk, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters and other Amazon EMR architectural patterns. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost efficient.
Amazon EMR is one of the largest Hadoop operators in the world. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We will also share best practices to keep your Amazon EMR cluster cost-efficient. Finally, we dive into some of our recent launches to keep you current on our latest features.
(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems | Amazon Web Services
Elite Dangerous is a Kickstarter-backed, massive-scale space MMO by Frontier Games. With no prior experience with AWS, Frontier used EC2, S3, RDS, DynamoDB, ElastiCache, and CloudFormation to deploy a cross-platform PC and console MMO experience that is sold and distributed worldwide. Every action made by each of the 825,000 (and counting) Elite Dangerous players drives the combined game's story forward and impacts a live galactic commodities market running on EC2 and RDS in real time. Frontier uses AWS to create a simulation of the entire 400 billion star systems of the Milky Way galaxy using physics engines running on Amazon EC2. Finally, learn how Elite distributes updates and DLC to game clients using Amazon S3 and Amazon CloudFront.
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven... | Amazon Web Services
Amazon Elastic MapReduce is one of the largest Hadoop operators in the world. Since its launch five years ago, AWS customers have launched more than 5.5 million Hadoop clusters. In this talk, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters and other Amazon EMR architectural patterns. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost efficient.
In this session, you will learn how to easily access your data on S3, and how to visualize and generate insights from Amazon Athena and other data sources through Amazon QuickSight. In addition, we will share some tips and best practices for using Athena and QuickSight.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Amazon QuickSight is a fast, cloud-powered business analytics service that makes it easy to build visualizations, perform ad hoc analysis, and quickly get business insights from various data sources (Amazon Redshift, Amazon Athena, Amazon EMR, Amazon RDS, and more).
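The Athena workflow sketched in this session amounts to two SQL statements: DDL that defines an external table over files already sitting in S3, and an ANSI SQL query against it. A hedged sketch that just builds those statements as strings (bucket, prefix, and column names are invented; in practice you would submit them with boto3's Athena client via `start_query_execution`):

```python
# Sketch of the Athena side: define an external table over logs in S3, then
# query it with ANSI SQL. Names are illustrative; statements are only built
# here, not submitted.
def external_table_ddl(table, bucket, prefix):
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  ip string, ts string, method string, path string,\n"
        "  status int, bytes bigint\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '\n"
        f"LOCATION 's3://{bucket}/{prefix}/'"
    )

def top_errors_query(table, limit=10):
    # Which paths are failing most? Athena scans the S3 data directly.
    return (
        f"SELECT path, count(*) AS hits FROM {table}\n"
        "WHERE status >= 500\n"
        f"GROUP BY path ORDER BY hits DESC LIMIT {limit}"
    )

ddl = external_table_ddl("web_logs", "my-example-bucket", "access-logs")
query = top_errors_query("web_logs")
```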
BDT303 Data Science with Elastic MapReduce - AWS re:Invent 2012 | Amazon Web Services
In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud | Jaipaul Agonus
This presentation is a real-world case study about moving a large portfolio of batch analytical programs that process 30 billion or more transactions every day, from a proprietary MPP database appliance architecture to the Hadoop ecosystem in the cloud, leveraging Hive, Amazon EMR, and S3.
(BDT303) Running Spark and Presto on the Netflix Big Data Platform | Amazon Web Services
In this session, we discuss how Spark and Presto complement the Netflix big data platform stack that started with Hadoop, and the use cases that Spark and Presto address. Also, we discuss how we run Spark and Presto on top of the Amazon EMR infrastructure; specifically, how we use Amazon S3 as our data warehouse and how we leverage Amazon EMR as a generic framework for data-processing cluster management.
Amazon Elastic Block Store (Amazon EBS) provides persistent block-level storage volumes for use with Amazon EC2 instances. In this technical session, we conduct a detailed analysis of the types of Amazon EBS block storage, including General Purpose (SSD) and Provisioned IOPS (SSD), as well as the new Throughput Optimized HDD and Cold HDD. Along the way, we share Amazon EBS best practices for performance, management, and security.
(BDT208) A Technical Introduction to Amazon Elastic MapReduce | Amazon Web Services
Amazon EMR provides a managed framework which makes it easy, cost effective, and secure to run data processing frameworks such as Apache Hadoop, Apache Spark, and Presto on AWS. In this session, you learn the key design principles behind running these frameworks on the cloud and the feature set that Amazon EMR offers. We discuss the benefits of decoupling compute and storage and strategies to take advantage of the scale and the parallelism that the cloud offers, while lowering costs. Additionally, you hear from AOL's Senior Software Engineer on how they used these strategies to migrate their Hadoop workloads to the AWS cloud and lessons learned along the way.
In this session, you learn the benefits of decoupling storage and compute and allowing them to scale independently; how to run Hadoop, Spark, Presto, and other supported Hadoop applications on Amazon EMR; how to use Amazon S3 as a persistent data store and process data directly from Amazon S3; deployment strategies and how to avoid common mistakes when deploying at scale; and how to use Spot Instances to scale your transient infrastructure effectively.
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013 | Amazon Web Services
A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what and why. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data science-as-a-service approach, how they made a hybrid AWS/data center setup work well, their open-source Hadoop-related software, and more.
Data Science with Spark on Amazon EMR - Pop-up Loft Tel Aviv | Amazon Web Services
Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing data volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault-tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200) | Amazon Web Services
Interested in finding out more about the AWS data warehouse service, Amazon Redshift? Then join us for this introductory-level technical session, where you can learn more about the ways AWS customers are using Redshift and the benefits it has delivered to their organisations, as well as tips and tricks for getting the most from Redshift.
(BDT320) New! Streaming Data Flows with Amazon Kinesis Firehose | Amazon Web Services
Amazon Kinesis Firehose is a fully-managed, elastic service to deliver real-time data streams to Amazon S3, Amazon Redshift, and other destinations. In this session, we start with overviews of Amazon Kinesis Firehose and Amazon Kinesis Analytics. We then discuss how Amazon Kinesis Firehose makes it even easier to get started with streaming data, without writing a stream processing application or provisioning a single resource. You learn about the key features of Amazon Kinesis Firehose, including its companion agent that makes emitting data from data producers even easier. We walk through capture and delivery with an end-to-end demo, and discuss key metrics that will help developers and architects understand their streaming data flow. Finally, we look at some patterns for data consumption as the data streams into S3. We show two examples: using AWS Lambda, and how you can use Apache Spark running within Amazon EMR to query data directly in Amazon S3 through EMRFS.
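On the producer side, the agent and SDK callers described above have to respect the service's batch limits: Firehose's PutRecordBatch accepts at most 500 records and roughly 4 MB per call. A sketch of the pure buffering/chunking step (the actual delivery call, e.g. boto3's `firehose.put_record_batch`, is left as a comment):

```python
# Producer-side sketch: chunk buffered records into PutRecordBatch-sized
# batches (at most 500 records and ~4 MB per call). Pure function; actual
# delivery with boto3 is only indicated in the trailing comment.
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024

def batch_records(records):
    """Split an iterable of byte strings into batches within Firehose limits."""
    batches, current, current_bytes = [], [], 0
    for record in records:
        # Flush the current batch if adding this record would exceed a limit.
        if current and (
            len(current) >= MAX_RECORDS_PER_BATCH
            or current_bytes + len(record) > MAX_BYTES_PER_BATCH
        ):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(record)
        current_bytes += len(record)
    if current:
        batches.append(current)
    return batches

# Delivery would look roughly like:
# for batch in batch_records(buffered):
#     firehose.put_record_batch(DeliveryStreamName=stream,
#                               Records=[{"Data": r} for r in batch])

batches = batch_records([b"x" * 100] * 1200)  # 1200 small records
```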
Introduction to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of Spot EC2 instances to reduce costs, and other Amazon EMR architectural best practices.
Learn from Accubits Technologies
High Performance Computing (HPC) generally refers to the practice of aggregating computing power to deliver much higher performance than a typical desktop computer or workstation can, in order to solve large problems in science, engineering, or business.
Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing data volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault-tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)Amazon Web Services
Interested in finding out more about the AWS data warehouse service, Amazon Redshift? Then join us for this introductory level technical session where you can learn more about the way in which AWS customers are using Redshift and the benefits that they have delivered to their organisations as a result as well as tricks and tips for getting the most from Redshift.
(BDT320) New! Streaming Data Flows with Amazon Kinesis FirehoseAmazon Web Services
Amazon Kinesis Firehose is a fully-managed, elastic service to deliver real-time data streams to Amazon S3, Amazon Redshift, and other destinations. In this session, we start with overviews of Amazon Kinesis Firehose and Amazon Kinesis Analytics. We then discuss how Amazon Kinesis Firehose makes it even easier to get started with streaming data, without writing a stream processing application or provisioning a single resource. You learn about the key features of Amazon Kinesis Firehose, including its companion agent that makes emitting data from data producers even easier. We walk through capture and delivery with an end-to-end demo, and discuss key metrics that will help developers and architects understand their streaming data flow. Finally, we look at some patterns for data consumption as the data streams into S3. We show two examples: using AWS Lambda, and how you can use Apache Spark running within Amazon EMR to query data directly in Amazon S3 through EMRFS.
Introduction to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of Spot EC2 instances to reduce costs, and other Amazon EMR architectural best practices.
Learn from Accubits Technologies
High Performance Computing (HPC) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
AWS Chicago user group - October 2015 "reInvent Replay"Cohesive Networks
Greg Khairallah, Business Development Manager for Big Data and Analytics at Amazon Web Services, presented the big news from reInvent on October 22, 2015 at Cohesive Networks' offices.
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elaticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
AWS is hosting the first FSI Cloud Symposium in Hong Kong, which will take place on Thursday, March 23, 2017 at Grand Hyatt Hotel. The event will bring together FSI customers, industry professional and AWS experts, to explore how to turn the dream of transformation, innovation and acceleration into reality by exploiting Cloud, Voice to Text and IoT technologies. The packed agenda includes expert sessions on a host of pressing issues, such as security and compliance, as well as customer experience sharing on how cloud computing is benefiting the industry.
Speaker: Pawan Agnihotri, Principal Solutions Architect, AWS
Introduction to AWS products, services, and common solutions. Overview of fundamentals to become more proficient in identifying AWS services to help make informed decisions about IT solutions based on business requirements. Helps you get started working on AWS.
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elaticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
Amazon RDS is a fully managed relational database service that enables you to launch an optimally configured, secure, and highly available database with just a few clicks. It manages time-consuming database administration tasks, freeing you to focus on your applications and business. In this session, we review the capabilities of the service and the latest available features.
Amazon RDS is a fully managed relational database service that enables you to launch an optimally configured, secure, and highly available database with just a few clicks. In this session, we review the service’s capabilities and its latest features. We also show you how Amazon RDS manages time-consuming database administration tasks, freeing you to focus on your applications and business.
Amazon Web Services (AWS) can make hosting scalable, highly-available websites and web applications easier and less expensive for the Enterprise Education customers. Join us for an informative webinar on tools AWS provides to elastically scale your architecture to avoid underutilized resources while reducing complexity with templates, partners, and tools to do much of the heavy lifting of creating and running a website for you.
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elaticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
What's New in Amazon Relational Database Service (DAT203) - AWS re:Invent 2018Amazon Web Services
Amazon Relational Database Service (Amazon RDS) is a fully managed relational database service that enables you to launch an optimally configured, secure, and highly available database with just a few clicks. It manages time-consuming database administration tasks, freeing you to focus on your applications and business. We review the capabilities of the service and review the latest available featurese.
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017Amazon Web Services
At Netflix, we make the best use of Amazon EC2 instance types and features to create a high- performance cloud, achieving near bare-metal speed for our workloads. This session summarizes the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and helps you improve performance, reduce latency outliers, and make better use of EC2 features. We show how to choose EC2 instance types, how to choose between Xen modes (HVM, PV, or PVHVM), and the importance of EC2 features such SR-IOV for bare-metal performance. We also cover basic and advanced kernel tuning and monitoring, including the use of Java and Node.js flame graphs and performance counters.
Amazon EC2 changes the economics of computing and provides you with complete control of your computing resources. It is designed to make web-scale cloud computing easier for developers. In this session, we will take you on a journey, starting with the basics of key management and security groups and ending with an explanation of Auto Scaling and how you can use it to match capacity and costs to demand using dynamic policies. We will also discuss tools and best practices that will help you build failure resilient applications that take advantage of the scale and robustness of AWS regions.
2. The 5 Pillars of AWS Well-Architected Framework
• Global Accessibility
• Agility
• Scalability
• Breadth of Functionality
• Pay-As-You-Go Pricing
• Shared Model of Responsibility
Slide courtesy of
5. Summary
Service | Availability | Regions
Organizations | Preview | us-east-1
New Instance Types | Mixed | Mixed
Lightsail | Now | us-east-1
Athena | Now | us-east-1, us-west-2
Polly | Now | us-east-1, us-west-2, us-east-2, eu-west-1
Rekognition | Now | us-east-1, us-west-2, eu-west-1
Lex | Preview | us-east-1
Aurora with PostgreSQL | Preview | us-east-1
Greengrass | Preview | us-east-1, us-west-2, eu-west-1, ap-southeast-2, ap-northeast-2, ap-northeast-1, ap-southeast-1
Snowball Edge | Now | Same as Snowball
Snowmobile | Now | All regions
AppStream 2.0 | Now | us-east-1, us-west-2, eu-west-1, ap-northeast-1
Shield Standard | Now | Everywhere
Elastic GPU | Preview | us-east-1
Shield Advanced | Now | us-east-1, us-west-2, eu-west-1, ap-northeast-1
CodeBuild | Now | us-east-1, us-west-2, eu-west-1
Batch | Preview | us-east-1
Step Functions | Now | us-east-1, us-east-2, us-west-2, eu-west-1, ap-northeast-1
OpsWorks for Chef Automate | Now | us-east-1, us-west-2, eu-west-1
EC2 Systems Manager | Now | Most regions
Personal Health Dashboard | Now | All regions
Blox | Now | N/A
X-Ray | Preview | All regions except cn-north-1, us-gov-west-1
Pinpoint | Now | us-east-1
Lambda@Edge | Preview | All edge locations
Glue | Pre-announcement | -
*Services included in my deeper-dive
8. New Instance Types
General Purpose Computing
• T2 (Burstable CPU)
  • Added t2.xlarge, t2.2xlarge
  • Available today (including GovCloud)
• R3 → R4 (Memory Optimized)
  • Upgraded from Ivy Bridge to Broadwell
  • Added 16xlarge with 488 GiB RAM
  • Available today (including GovCloud)
• C4 → C5 (Compute Optimized)
  • Upgraded from Haswell to Skylake
  • Available 2017
High Performance I/O
• I2 → I3
  • 5 sizes; 3.3M IOPS for random reads, 8 GB/s total throughput for sequential reads
  • Ideal for transactional workloads, high-performance databases, real-time analytics, NoSQL databases
  • Upgraded from Ivy Bridge to Broadwell
  • SSDs upgraded to NVMe-based drives
  • 3.3 million random IOPS (4 KiB block size)
  • 16 GB/s throughput
  • Up to 64 vCPUs, 488 GiB RAM, 15.2 TB storage (vs. 32 vCPUs / 244 GiB / 6.4 TB for I2)
  • Added encryption at rest
9. Recently Released Instance Types
• X1 memory-optimized instances
  • 1,952 GiB of DDR4-based memory, 8x the memory offered by any other Amazon EC2 instance
  • Each X1 instance is powered by four Intel® Xeon® E7 8880 v3 (Haswell) processors and offers 128 vCPUs
• G2 → P2 GPU instances
  • P2s provide up to 16 NVIDIA K80 GPUs, 64 vCPUs, and 732 GiB of host memory
  • A combined 192 GB of GPU memory, 40,000 parallel processing cores, 70 teraflops of single-precision floating point performance, and over 23 teraflops of double-precision floating point performance
  • GPUDirect™ (peer-to-peer GPU communication) capabilities for up to 16 GPUs, so that multiple GPUs can work together within a single host
  • ENA-based Enhanced Networking for cluster P2 instances
10. Elastic GPUs
Name | GPU Memory
eg1.medium | 1 GiB
eg1.large | 2 GiB
eg2.xlarge | 4 GiB
eg2.2xlarge | 8 GiB
OpenEye Scientific on AWS
• Flexible: Elastic GPUs come in a wide range of sizes and attach to a wide range of EC2 instances, allowing users to scale GPUs independently of CPU and RAM.
• Cost-effective: a fraction of the cost of purchasing full GPU instances
• OpenGL-compliant: run any graphics-intensive application
• Workstation quality: capable of running a wide range of demanding graphics workloads, such as 3D modeling. Streaming options include NICE Desktop Cloud Visualization
11. Machine Learning & AI in Medical Software
• High performance & scalability
• CPU/GPU cluster networks
  • Support up to 20 Gbps of low-latency networking
• Elastic GPUs
• ELB vs. EFS (Elastic File System)
• Application Elastic Load Balancing
• ECS RunTask & Blox
12.
13. Amazon EFS & Amazon EBS
• EBS gp2 (SSD) has a maximum throughput of 160 MB/s per volume and up to 1,250 MB/s aggregate to the instance.
• EFS throughput scales with the data stored in the file system:
  • All file systems deliver a consistent baseline performance of 50 MB/s per TB of storage.
  • All file systems (regardless of size) can burst to 100 MB/s.
  • File systems larger than 1 TB can burst to 100 MB/s per TB of storage.
  • As you add data to your file system, the maximum throughput available to the file system scales linearly and automatically with your storage.
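The scaling rules above reduce to simple arithmetic. A minimal sketch of the slide's numbers (an illustration, not an official AWS formula):

```python
def efs_baseline_mbps(storage_tb):
    """Baseline throughput: 50 MB/s per TB of data stored."""
    return 50.0 * storage_tb

def efs_burst_mbps(storage_tb):
    """Burst throughput: 100 MB/s floor; above 1 TB, 100 MB/s per TB."""
    return 100.0 * max(1.0, storage_tb)

# A 2 TB file system: 100 MB/s baseline, 200 MB/s burst.
for tb in (0.5, 1, 2, 10):
    print(tb, "TB ->", efs_baseline_mbps(tb), "MB/s baseline,",
          efs_burst_mbps(tb), "MB/s burst")
```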
14. EC2 F1 instances
Preview
F1 is the first compute instance with customer-programmable FPGA hardware for application acceleration.
• Speed up applications 30x with dedicated access to high-performance FPGAs
• Includes Hardware Developer Kit (HDK) and developer AMI
• Amazon FPGA Images (AFIs) can be reused across F1 instances
• Ideal for data-flow applications including transcoding, financial and risk modeling, genomic analysis, big data processing, and large-scale simulations
15. EC2 F1 instances
Preview
• Vivado Design Suite supported for ultra-high productivity with next-generation C/C++ and IP-based design
16. "We have developed DRAGEN™, the world's first bioinformatics processor that uses a field-programmable gate array (FPGA) to provide hardware-accelerated implementations of genome pipeline algorithms and genome data compression, and it has been shown to speed whole genome data analysis from hours to minutes, while maintaining high accuracy and reducing costs."
Pieter van Rooyen, Ph.D.
Chief Executive Officer, Edico Genome
• The reconfigurable DRAGEN™ Bio-IT Platform can be loaded with highly optimized algorithms including Whole Genome or Exome, RNAseq, Methylome, Microbiome, and Cancer.
• Creating an AWS product offering (currently workflows are provided as a service)
17. AWS CodeBuild
• Fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy.
18. Introducing AWS Batch
Preview - us-east-1
Easily and efficiently run hundreds of thousands of batch computing jobs on AWS.
• Fully Managed: eliminates the need to operate batch processing solutions
• Cost-Optimized Resource Provisioning: dynamically scales compute resources to any quantity required
• Integrated with AWS: natively integrated with the AWS platform
19. AWS Batch Key Features
• Dynamic Spot Bidding
• Integrated Monitoring and Logging
• Fine-grained Access Control
• Priority-based Job Scheduling
• Granular Job Definitions
• Simple Job Dependency Modeling
• Support for Popular Workflow Engines
• Dynamic Compute Resource Provisioning and Scaling
• ECS Task Placement Engine manages EC2 instances and containers according to job specifications such as binpacking, spread, and affinity (an EC2 Container Service (ECS) feature)
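Job dependency modeling can be sketched against Batch's SubmitJob API. The helper below only builds the request dict (job names, queue, and job IDs are hypothetical; no AWS call is made):

```python
# Sketch: chaining two AWS Batch jobs via the dependsOn parameter of
# SubmitJob. Pass the dict to boto3.client("batch").submit_job(**req).
def batch_job_request(name, queue, definition, depends_on_job_ids=()):
    """Build the kwargs dict for a Batch SubmitJob call."""
    req = {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
    }
    if depends_on_job_ids:
        # Each entry makes this job wait for the listed job to finish.
        req["dependsOn"] = [{"jobId": jid} for jid in depends_on_job_ids]
    return req

step1 = batch_job_request("align-reads", "genomics-queue", "aligner:1")
step2 = batch_job_request("call-variants", "genomics-queue", "caller:1",
                          depends_on_job_ids=["<jobId-returned-for-step1>"])
```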
20. Introducing Blox (Available Now)
A collection of open source projects for container management and orchestration on Amazon ECS
• Customized scheduling and orchestration
• Build schedulers and integrate third-party schedulers on top of ECS
• Leverage Amazon ECS to fully manage and scale your clusters
Build Custom Schedulers
End-to-End Developer Experience
Integrate 3rd Party Schedulers
21. Introducing AWS Step Functions
Available Now - us-east-1, us-east-2, us-west-2, eu-west-1, ap-northeast-1
Coordinate distributed applications using visual workflows.
• Visually arrange components as a series of steps, in a matter of minutes
• Triggers and tracks each step so applications execute in order and as expected
• Handles errors with built-in retry and fallback
• Works with AWS Lambda, Amazon EC2, Amazon EC2 Container Service (ECS), Amazon CloudWatch, and Auto Scaling
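The built-in retry and fallback behavior is expressed in the Amazon States Language. A minimal sketch (the Lambda ARN is a placeholder):

```python
import json

# Sketch: a single-task state machine with Retry (exponential backoff)
# and Catch (route any error to a fallback state).
definition = {
    "Comment": "Task with retry and fallback",
    "StartAt": "DoWork",
    "States": {
        "DoWork": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:do-work",
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Catch": [{
                "ErrorEquals": ["States.ALL"],
                "Next": "Fallback",
            }],
            "End": True,
        },
        "Fallback": {
            "Type": "Pass",
            "Result": "work failed, handled gracefully",
            "End": True,
        },
    },
}
state_machine_json = json.dumps(definition, indent=2)
```

This JSON is what you would pass as the `definition` when creating the state machine.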
22. "Compared to other available methods, GT-Scan2 identifies genomic location with higher sensitivity and specificity ...[and]... democratizes the ability to find optimal CRISPR target sites by offering this complex computation as a cloud-service using AWS Lambda functions [in an architecture with DynamoDB, API Gateway, S3, and SNS]."
Denis Bauer, PhD
Team Leader, Transformational Bioinformatics, CSIRO
• CRISPR-Cas9 technology can be used to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA
• New genome editing technology introduces a revolutionary way of approaching cancer treatment
• Identification of robust on-target CRISPR sites is computationally intensive and time-sensitive
Cloud services for computationally guided genome engineering: researchers at the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia used AWS resources to power an innovative tool (GT-Scan2) for genome editing.
23. Other Announcements
Lambda C# support.
Lambda@Edge allows you to run JavaScript code on the AWS edge network. (Preview)
Amazon Lightsail provides fixed price virtual private servers. (GA)
26. Amazon Athena
Analyze large amounts of data stored in Amazon S3 using standard SQL. No hardware to run.
[Diagram: S3 bucket → EC2 analysis instances managed by Simple Workflow Service (SWF) → transformed data → Redshift or RDS database; Athena queries the S3 bucket directly.]
27. Amazon Athena
Based on the Presto distributed SQL engine. Queries are parallelized across hundreds or thousands of cores.
Can process a variety of data formats, including:
• JSON
• CSV
• Log files
• Text with custom delimiters
• Apache Parquet (columnar format)
• Apache ORC (columnar format)
Available today in N. Virginia and Oregon.
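Querying goes through Athena's StartQueryExecution API. The helper below only assembles the request (database, table, and bucket names are hypothetical; pass the dict to `boto3.client("athena").start_query_execution(**params)` to actually run it):

```python
# Sketch: the three pieces a StartQueryExecution request needs --
# the SQL, the database to resolve table names in, and an S3 location
# where Athena writes result files.
def athena_query_params(sql, database, output_s3_uri):
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }

params = athena_query_params(
    "SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_s3_uri="s3://my-athena-results/",
)
```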
30. Amazon Polly
Text-to-speech service that can be integrated into your own applications.
• Supports 47 male and female voices.
• Handles 24 different languages.
• Works well with unadorned text.
• Context sensitive: "I live on Main St." vs. "St. Teresa"
• Also supports Speech Synthesis Markup Language (SSML) for more detailed pronunciation.
Available today in Northern Virginia, Oregon, Ohio, and Ireland.
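SSML is how you take manual control of cases like the "St." ambiguity above. A sketch of a SynthesizeSpeech request (the voice is illustrative; pass the dict to `boto3.client("polly").synthesize_speech(**request)` to get an audio stream back):

```python
# Sketch: building a Polly SynthesizeSpeech request that uses SSML.
# <say-as interpret-as="address"> forces "St." to be read as "Street".
def polly_request(ssml, voice_id="Joanna", output_format="mp3"):
    return {
        "Text": ssml,
        "TextType": "ssml",       # tells Polly to parse SSML markup
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }

ssml = ('<speak>I live on Main '
        '<say-as interpret-as="address">St.</say-as></speak>')
request = polly_request(ssml)
```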
31. Amazon Lex
Build conversational text and voice interfaces.
• Automatic Speech Recognition (ASR)
• Natural Language Understanding (NLU)
• Same technology that powers Amazon Alexa.
32. OhioHealth Speech Recognition & ML Mobile App
Behind the scenes ML algorithm workflow
OhioHealth Mobile App
"We are excited about utilizing evolving speech recognition and natural language processing technology to enhance the lives of our customers. Amazon Lex represents a great opportunity for us to deliver a new experience to our patients."
Michael Krouse
Senior Vice President, Operational Support, and Chief Information Officer, OhioHealth
36. AWS Snowball Edge
Improves on Snowball:
More Connectivity
• 10GBASE-T, 10/25 Gb SFP28, 40 Gb QSFP+
• 3G cellular, Wi-Fi for IoT devices
• PCIe expansion port
More Storage
• 100 TB of storage
Horizontal Scaling and Clustering
• Add capacity and durability
• Rack mountable
37. AWS Snowball Edge
New Storage Endpoints
• Supports a subset of the S3 API.
• NFS v3 and v4.1 support.
• File and directory metadata are mapped to S3 metadata.
Local Lambda Functions
• Can filter, clean, analyze, track, and summarize data as it arrives.
• Python 2.7, 128 MB environment support.
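The filter-as-data-arrives idea looks like an ordinary Lambda handler. A minimal sketch (the event shape is hypothetical; the on-device trigger would pass arriving records to the handler):

```python
# Sketch: a small handler a Snowball Edge local Lambda could run to
# filter and summarize records before they are stored or shipped.
def handler(event, context):
    """Keep only sensor readings above a threshold; report what was dropped."""
    readings = event.get("readings", [])
    kept = [r for r in readings if r.get("value", 0) > 100]
    return {
        "kept": kept,
        "dropped": len(readings) - len(kept),
    }

result = handler({"readings": [{"value": 50}, {"value": 150}]}, None)
```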
39. AWS Snowmobile
Import up to 100 PB of data
Standard shipping container (45' long x 9.6' high x 8' wide); water-proof and climate-controlled
350 kW power required (AWS can arrange for a generator)
Data is encrypted using KMS keys
Chain-of-custody tracking; video surveillance; GPS tracking with cellular and satellite telemetry; security vehicle escort; on-premises security guards available
Multiple 40 Gb/s connections; 1 Tb/s aggregate throughput
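A back-of-the-envelope check on those two numbers: filling 100 PB at the full 1 Tb/s aggregate rate takes on the order of a week and a half.

```python
# 100 PB at 1 Tb/s: how long does loading a Snowmobile take at line rate?
capacity_bits = 100e15 * 8        # 100 PB expressed in bits
link_bps = 1e12                   # 1 Tb/s aggregate throughput
seconds = capacity_bits / link_bps
days = seconds / 86400
print(round(days, 1))             # roughly 9.3 days at full line rate
```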
41. AWS Organizations
Organizations allows you to create groups of AWS
accounts to more easily manage security and automation
settings.
• Centrally manage multiple accounts to help you scale.
• Control which AWS services are available to individual accounts.
• Automate new account creation.
• Simplify billing.
In preview.
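The account-creation automation mentioned above maps to the Organizations CreateAccount API. A minimal sketch, assuming boto3's `organizations` client; the email and account name are placeholders, and the live call is commented out since the service was in preview and requires an Organizations-enabled management account.

```python
# Sketch: automating new member-account creation in AWS Organizations.
new_account = {
    "Email": "team-dev@example.com",  # placeholder address
    "AccountName": "dev-sandbox",     # placeholder name
}

# import boto3
# org = boto3.client("organizations")
# resp = org.create_account(**new_account)
# # Account creation is asynchronous; poll the returned status.
# status = resp["CreateAccountStatus"]["State"]
```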
42. AWS Shield
Managed DDoS protection.
All AWS customers benefit from the automatic protections of AWS Shield
Standard at no additional charge. AWS Shield Standard defends against the
most common, frequently occurring network and transport layer DDoS attacks
that target your website or applications.
AWS Shield Advanced provides enhanced detection and mitigation customized
to your application. The AWS DDoS Response Team (DRT) applies manual
mitigations against more complex and sophisticated DDoS attacks. DDoS cost
protection provides a safeguard against scaling charges as a result of a DDoS
attack.
Available now.
43. Mia D. Champion, PhD
Technical & Research Computing
Business Development Manager
miachamp@amazon.com
Editor's Notes
Expanded T2 Instances – The T2 instances offer great performance for workloads that do not need to use the full CPU on a consistent basis. Our customers use them for general purpose workloads such as application servers, web servers, development environments, continuous integration servers, and small databases. We’ll be adding the t2.xlarge (16 GiB of memory) and the t2.2xlarge (32 GiB of memory). Like the existing T2 instances, the new sizes will offer a generous amount of baseline performance (up to 4x that of the existing instances), along with the ability to burst to an entire core when you need more compute power.
New R4 Instances – The R4 instances are designed for today’s memory-intensive Business Intelligence, in-memory caching, and database applications and offer up to 488 GiB of memory. The R4 instances improve on the popular R3 instances with a larger L3 cache and higher memory speeds. On the network side, the R4 instances support up to 20 Gbps of ENA-powered network bandwidth when used within a Placement Group, along with 12 Gbps of dedicated throughput to EBS. Instances are available in six sizes, with up to 64 vCPUs and 488 GiB of memory.
C5 instances include the next generation of Intel’s Xeon processors (code named Skylake) with AVX-512 and up to 72 vCPUs (twice that of previous generation compute-optimized instances) and 144 GiB of memory, giving them the best price-to-compute performance of any Amazon EC2 instance. C5 instances also feature new AWS hardware acceleration that delivers three times the Amazon EBS bandwidth of C4 instances. C5 instances are ideal for compute-intensive scientific modeling, financial operations, machine learning, and distributed analytics that require high performance for floating point calculations.
I3 instances are ideal for the most demanding input/output (I/O) intensive relational databases, NoSQL databases, transactional systems, and analytics workloads. I3 instances have 15.2 TB of fast, low latency locally attached storage backed by Non-Volatile Memory Express (NVMe) based SSDs, delivering up to nine times the IOPS of the previous generation: 3.3 million random IOPS at a 4 KB block size, with total I/O throughput of 16 GB per second.
Intel Processors:
Nehalem: Intel Core i7 875K
Sandy Bridge: Intel Core i7 2600K
Ivy Bridge: Intel Core i7 3770K
Haswell: Intel Core i7 4790K
Broadwell: Intel Core i7 5775C
Elastic GPUs allow customers to add low-cost graphics acceleration to Amazon EC2 instances over the network.
Flexible: Elastic GPUs come in a wide range of sizes and attach to a wide range of EC2 instances.
Cost-effective: a fraction of the cost of purchasing full GPU instances.
OpenGL-compliant: run any graphics-intensive application.
Workstation quality: capable of running a wide range of demanding graphics workloads, such as 3D modeling.
Elastic GPU is a new Amazon EC2 feature which provides flexible, low-cost, workstation-quality, OpenGL-compliant graphics acceleration to existing EC2 instance types. With Elastic GPUs, customers with graphics applications are no longer constrained to fixed hardware configurations and limited GPU selection. Customers now have the flexibility to select from a wide range of Amazon EC2 instances and configure the amount of graphics acceleration for their workloads. The actual physical GPU device is invisible to customers, and they do not need to know which GPU is serving their API calls.
Customers can get graphics acceleration by simply specifying a flag when launching an instance and installing software. There’s no need to worry about configuring the GPU device and tuning vendor specific driver parameters. There’s no confusion about choosing the right GPU card. Everything is managed by AWS.